Programming and Data Analysis¶

Getting started with Python

Yao-Jen Kuo yaojenkuo@ntu.edu.tw from DATAINPOINT

Getting Started¶

Tools we use to write/run Python programs¶

  • Terminal (Anaconda Prompt or the macOS Terminal).
  • Text editor (Visual Studio Code is recommended).
  • Python interpreter (Miniconda is recommended).

Learn a bit more about what terminal/bash/command line is¶

  • Chapter 2 The Command Line, Technical Foundations of Informatics
  • Learn Enough Command Line to be Dangerous
  • Unix Command Summary

Using Stack Overflow and Pythontutor.com to help us learn programming¶

  • Stack Overflow
  • Pythontutor.com

How to use Stack Overflow efficiently?¶

  • The first post is the question itself.
  • The second post, if it has a green check mark, is the answer chosen by the person who asked.
  • The third post is the answer most up-voted by others.

Functions¶

What is print() in our previous example?¶

print("Hello world!")

print() is one of the so-called built-in functions in Python.

What is a function¶

A function is a named sequence of statements that performs a computation, either mathematical, symbolic, or graphical. When we define a function, we specify the name and the sequence of statements. Later, we can call the function by name.

How do we analyze a function?¶

  • function name.
  • inputs and parameters, if any.
  • the sequence of statements in the code block that belongs to the function.
  • outputs, if any.

Take bubble tea shop for instance¶

[Image: bubble tea shop]

Source: Google Search

What is a built-in function?¶

A built-in function is pre-defined, so we can call it by name without defining it first.

How many built-in functions are available?¶

  • print()
  • help()
  • type()
  • ...etc.

Source: https://docs.python.org/3/library/functions.html
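
As a rough way to answer the question above, the names exposed by the standard builtins module can be listed and filtered. This is only a sketch: the filter (callable, lowercase, no leading underscore) is an assumption about what counts as a "function", some of the names it keeps (int, str, type) are technically classes, and the count differs between Python versions.

```python
import builtins

# Keep callable, lowercase names; this drops exception classes
# (e.g. ValueError) and constants such as True, False and None.
builtin_names = [
    name for name in dir(builtins)
    if callable(getattr(builtins, name))
    and name.islower()
    and not name.startswith("_")
]

print(len(builtin_names))  # the count varies across Python versions
print(builtin_names[:5])
```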

Get HELP with help()¶

In [1]:
help(print)
Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.

    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.

In [2]:
help(type)
Help on class type in module builtins:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict, **kwds) -> a new type
 |
 |  Methods defined here:
 |
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |
 |  __dir__(self, /)
 |      Specialized __dir__ implementation for types.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __instancecheck__(self, instance, /)
 |      Check if an object is an instance.
 |
 |  __or__(self, value, /)
 |      Return self|value.
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __ror__(self, value, /)
 |      Return value|self.
 |
 |  __setattr__(self, name, value, /)
 |      Implement setattr(self, name, value).
 |
 |  __sizeof__(self, /)
 |      Return memory consumption of the type object.
 |
 |  __subclasscheck__(self, subclass, /)
 |      Check if a class is a subclass.
 |
 |  __subclasses__(self, /)
 |      Return a list of immediate subclasses.
 |
 |  mro(self, /)
 |      Return a type's method resolution order.
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  __prepare__(...)
 |      __prepare__() -> dict
 |      used to create the namespace for the class statement
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(*args, **kwargs)
 |      Create and return a new object.  See help(type) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __abstractmethods__
 |
 |  __annotations__
 |
 |  __dict__
 |
 |  __text_signature__
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __base__ = <class 'object'>
 |      The base class of the class hierarchy.
 |
 |      When called, it accepts no arguments and returns a new featureless
 |      instance that has no instance attributes and cannot be given any.
 |
 |
 |  __bases__ = (<class 'object'>,)
 |
 |  __basicsize__ = 920
 |
 |  __dictoffset__ = 264
 |
 |  __flags__ = 2156420354
 |
 |  __itemsize__ = 40
 |
 |  __mro__ = (<class 'type'>, <class 'object'>)
 |
 |  __type_params__ = ()
 |
 |  __weakrefoffset__ = 368

We can also help() on help()¶

In [3]:
help(help)
Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |
 |  Methods defined here:
 |
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object

Besides built-in functions and functions from libraries, we sometimes need to define our own functions¶

  • def the name of our function
  • return the output of our function
def function_name(INPUTS: type, PARAMETERS: type, ...) -> type:
    """
    docstring: print documentation when help() is called
    """
    # sequence of statements
    return OUTPUTS
In [4]:
# Definition
def add(x: int, y: int) -> int:
    """
    Equivalent to x + y
    >>> add(5, 6)
    11
    >>> add(55, 66)
    121
    >>> add(8, 7)
    15
    """
    return x + y

help(add)
Help on function add in module __main__:

add(x: int, y: int) -> int
    Equivalent to x + y
    >>> add(5, 6)
    11
    >>> add(55, 66)
    121
    >>> add(8, 7)
    15

Call the function by name after defining it¶

In [5]:
print(add(5, 6))
11

Programming based on testing is called TDD, Test-Driven Development¶

  • Test-driven development (TDD) is a software development process relying on software requirements being converted to test cases before software is fully developed.
  • Our assignments and exams are the minimal version of TDD.
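
To make the idea concrete, the `>>>` examples in a docstring can serve as test cases, and the standard doctest module can run them. This is a minimal sketch that assumes the docstring examples are the tests; the actual grading setup for the assignments may work differently.

```python
import doctest

def add(x: int, y: int) -> int:
    """
    Equivalent to x + y
    >>> add(5, 6)
    11
    >>> add(55, 66)
    121
    >>> add(8, 7)
    15
    """
    return x + y

# Parse the examples out of the docstring and run them against add().
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 3
```

If the body of add() were changed to return x - y, the same run would report three failures, which is the feedback loop TDD relies on.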

Arithmetic Operators in Python¶

Symbols that represent computations¶

  • +, -, *, / are quite straightforward.
  • ** for exponentiation.
  • % for remainder.
  • // for floor-divide.
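
A short sketch of the less familiar operators in the list above. The values 7 and 2 are arbitrary and chosen only for illustration.

```python
a, b = 7, 2

true_division = a / b     # / always returns a float
floor_division = a // b   # // rounds down to the nearest integer
remainder = a % b         # % gives what is left after floor division
power = a ** b            # ** raises a to the power of b

print(true_division)   # 3.5
print(floor_division)  # 3
print(remainder)       # 1
print(power)           # 49

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2
print(negative_floor)  # -4
```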

When an expression contains more than one operator, the order of evaluation depends on the operator precedence¶

  1. Parentheses have the highest precedence.
  2. Exponentiation has the next highest precedence.
  3. Multiplication and division have higher precedence than addition and subtraction.
  4. Operators with the same precedence are evaluated from left to right (exponentiation is the exception and is evaluated from right to left).
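
The precedence rules can be checked with a few small expressions. The numbers are arbitrary; the last line shows that chained exponentiation groups from right to left, unlike the other arithmetic operators.

```python
no_parentheses = 2 + 3 * 4           # multiplication before addition
with_parentheses = (2 + 3) * 4       # parentheses are evaluated first
power_first = 2 * 3 ** 2             # exponentiation before multiplication
left_to_right = 10 - 4 - 3           # same precedence: (10 - 4) - 3
right_to_left_power = 2 ** 3 ** 2    # evaluated as 2 ** (3 ** 2)

print(no_parentheses)       # 14
print(with_parentheses)     # 20
print(power_first)          # 18
print(left_to_right)        # 3
print(right_to_left_power)  # 512
```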

Converting Fahrenheit to Celsius¶

\begin{equation} \text{Celsius}(^{\circ}C) = \left( \text{Fahrenheit}(^{\circ}F) - 32 \right) \times \frac{5}{9} \end{equation}

In [6]:
def convert_fahrenheit_to_celsius(x: int) -> float:
    """
    Converting from fahrenheit scale to celsius scale.
    >>> convert_fahrenheit_to_celsius(32)
    0.0
    >>> convert_fahrenheit_to_celsius(212)
    100.0
    """
    out = (x - 32) * 5/9
    return out

print(convert_fahrenheit_to_celsius(32))
print(convert_fahrenheit_to_celsius(212))
0.0
100.0

How to properly use functions?¶

  • Using arguments to adjust the output of a defined function.
  • Differentiate functions from methods.
  • Be aware of the update mechanism.

The sorted() function takes a bool argument for its reverse parameter¶

In [7]:
list_to_be_sorted = [11, 5, 7, 2, 3]
print(sorted(list_to_be_sorted, reverse=True))
print(sorted(list_to_be_sorted))
[11, 7, 5, 3, 2]
[2, 3, 5, 7, 11]

Different syntax¶

function_name(OBJECT, ARGUMENTS) # function
OBJECT.method_name(ARGUMENTS)    # method

list has a sort() method that works like the sorted() function¶

In [8]:
list_to_be_sorted = [11, 5, 7, 2, 3]
print(sorted(list_to_be_sorted))
list_to_be_sorted.sort()
print(list_to_be_sorted)
[2, 3, 5, 7, 11]
[2, 3, 5, 7, 11]

How is the list_to_be_sorted being updated?¶

In [9]:
# update through return
list_to_be_sorted = [11, 5, 7, 2, 3]
sorted_list = sorted(list_to_be_sorted)
print(sorted_list)
[2, 3, 5, 7, 11]
In [10]:
# update through change of state
list_to_be_sorted = [11, 5, 7, 2, 3]
list_to_be_sorted.sort()
print(list_to_be_sorted)
[2, 3, 5, 7, 11]
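
A common slip that follows from the two update mechanisms is assigning the result of sort() to a name. The sketch below, with an illustrative list called numbers, shows that sorted() leaves the original list alone, while sort() changes the list in place and returns None.

```python
numbers = [11, 5, 7, 2, 3]

# sorted() returns a new list; the original is untouched.
returned_by_sorted = sorted(numbers)
print(numbers)             # [11, 5, 7, 2, 3]
print(returned_by_sorted)  # [2, 3, 5, 7, 11]

# sort() changes the list itself and returns None.
returned_by_sort = numbers.sort()
print(numbers)             # [2, 3, 5, 7, 11]
print(returned_by_sort)    # None
```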

Variables¶

We usually don't just print out literal values¶

In [11]:
print("Hello world!")
Hello world!

It is more useful to refer to a literal value by an object name¶

In [12]:
hello_world = "Hello world!"
print(hello_world.lower())
print(hello_world.upper())
print(hello_world.swapcase())
print(hello_world.title())
hello world!
HELLO WORLD!
hELLO WORLD!
Hello World!

A variable is a name that refers to a value¶

variable_name = literal_value

Choose names for our variables: don'ts¶

  • Do not reuse the names of built-in functions.
  • Keywords cannot be used as names.
  • Names cannot start with a number.

Source: https://www.python.org/dev/peps/pep-0008/
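
These rules can be checked programmatically. The helper below, check_variable_name, is hypothetical and written only for illustration; it relies on the standard keyword and builtins modules and on str.isidentifier().

```python
import builtins
import keyword

def check_variable_name(name: str) -> str:
    """Hypothetical helper: classify a candidate variable name."""
    if keyword.iskeyword(name):
        return "keyword"             # e.g. class, def, return
    if not name.isidentifier():
        return "invalid"             # e.g. starts with a number
    if hasattr(builtins, name):
        return "shadows a built-in"  # legal, but hides the built-in
    return "ok"

for candidate in ["total_price", "2nd_place", "class", "print"]:
    print(candidate, "->", check_variable_name(candidate))
```

Note the difference in severity: a keyword or a name starting with a number is a SyntaxError, while reusing a built-in name such as print is allowed by Python but makes the original function unreachable under that name.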

Choose names for our variables: dos¶

  • Use a lowercase single letter, word, or words.
  • Separate words with underscores to improve readability (so-called snake case).
  • Be meaningful.

Source: https://www.python.org/dev/peps/pep-0008/

Using # to write comments in our program¶

A comment can appear on a line by itself or at the end of a line.

In [13]:
# Turn fahrenheit to celsius
def from_fahrenheit_to_celsius(x: int) -> float:
    out = (x - 32) * 5/9
    return out

print(from_fahrenheit_to_celsius(32))  # turn 32 fahrenheit to celsius
print(from_fahrenheit_to_celsius(212)) # turn 212 fahrenheit to celsius
0.0
100.0

Everything from # to the end of the line is ignored during execution¶

Data Types¶

Values belong to different types; we commonly use¶

  • int and float for numeric computing.
  • str for text.
  • bool for conditionals.
  • NoneType for undefined values.

Use type() function to check the type of a certain value/variable¶

In [14]:
print(type(5566))
print(type(42.195))
print(type("Hello world!"))
print(type(False))
print(type(True))
print(type(None))
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'bool'>
<class 'NoneType'>

str¶

How to form a str?¶

Use paired ', ", or """ to enclose letters strung together.

In [15]:
str_with_single_quotes = 'Hello world!'
str_with_double_quotes = "Hello world!"
str_with_triple_double_quotes = """Hello world!"""
print(type(str_with_single_quotes))
print(type(str_with_double_quotes))
print(type(str_with_triple_double_quotes))
<class 'str'>
<class 'str'>
<class 'str'>

If a str value itself contains single/double quotes, we might get a SyntaxError¶

mcd = 'I'm lovin' it!'

Use \ to escape or paired " or paired """¶

In [16]:
mcd = 'I\'m lovin\' it!'
mcd = "I'm lovin' it!"
mcd = """I'm lovin' it!"""

Great features of strings formed with paired """¶

  • A paragraph.
  • Docstring.

Use paired """ for a paragraph¶

In [17]:
storyline = """
Chronicles the experiences of a formerly successful banker\
 as a prisoner in the gloomy jailhouse of Shawshank after\
 being found guilty of a crime he did not commit. The film\
 portrays the man's unique way of dealing with his new, torturous\
 life; along the way he befriends a number of fellow prisoners,\
 most notably a wise long-term inmate named Red.
"""
In [18]:
sql_query = """
SELECT *
  FROM world
 WHERE country = 'Taiwan';
"""

Use paired """ for docstring¶

In [19]:
def from_fahrenheit_to_celsius(x: int) -> float:
    """
    Turns fahrenheit to celsius.
    """
    return (x - 32) * 5/9

help(from_fahrenheit_to_celsius)
Help on function from_fahrenheit_to_celsius in module __main__:

from_fahrenheit_to_celsius(x: int) -> float
    Turns fahrenheit to celsius.

We've seen arithmetic operators for numeric values¶

How about those for str?

str type takes + and *¶

  • + for concatenation.
  • * for repetition.
In [20]:
mcd = "I'm lovin' it!"
print(mcd)
print(mcd + mcd)
print(mcd * 3)
I'm lovin' it!
I'm lovin' it!I'm lovin' it!
I'm lovin' it!I'm lovin' it!I'm lovin' it!

Format our str¶

  • The .format() way.
  • The f-string way.
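
For comparison with f-strings, here is a small sketch of the .format() way. The function name hello_anyone_with_format is made up for this example; the format specifications inside the braces follow the same mini-language that f-strings use.

```python
def hello_anyone_with_format(anyone: str) -> str:
    # {} is a placeholder that .format() fills in, left to right.
    return "Hello {}!".format(anyone)

greeting = hello_anyone_with_format("Anakin Skywalker")
two_decimals = "{:.2f}".format(3.141592)
with_commas = "${:,}".format(1000000)

print(greeting)      # Hello Anakin Skywalker!
print(two_decimals)  # 3.14
print(with_commas)   # $1,000,000
```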

The f-string way: use {} to embed values in a formatted string¶

In [21]:
def hello_anyone(anyone: str) -> str:
    out = f"Hello {anyone}!"
    return out

print(hello_anyone("Anakin Skywalker"))
print(hello_anyone("Luke Skywalker"))
Hello Anakin Skywalker!
Hello Luke Skywalker!

Commonly used format¶

  • {:.nf} for float format.
  • {:,} for comma format.
In [22]:
def format_pi(pi: float) -> str:
    return f"{pi:.2f}"

print(format_pi(3.1415))
print(format_pi(3.141592))
3.14
3.14
In [23]:
def format_thousands(ntd: int) -> str:
    return f"${ntd:,}"

print(format_thousands(1000))
print(format_thousands(1000000))
print(format_thousands(1000000000))
$1,000
$1,000,000
$1,000,000,000

More formats with f-string¶

https://www.w3schools.com/python/ref_string_format.asp

bool¶

How to form a bool?¶

  • Use keywords True and False directly.
  • Use relational operators.
  • Use logical operators.

Use keywords True and False directly¶

In [24]:
print(True)
print(type(True))
print(False)
print(type(False))
True
<class 'bool'>
False
<class 'bool'>

Use relational operators¶

We have ==, !=, >, <, >=, <=, in, not in as common relational operators to compare values.

In [25]:
print(5566 == 5566.0)
print(5566 != 5566.0)
print('56' in '5566')
True
False
True

Use logical operators¶

  • We have and, or, not as common logical operators to manipulate bool values.
  • and gives True only if both sides are True.
  • or gives False only if both sides are False.
In [26]:
print(True and True)  # get True only when both sides are True
print(True and False)
print(False and False)
print(True or True)
print(True or False)
print(False or False) # get a False only when both sides are False
# not simply flips a bool value
print(not True)
print(not False)
True
False
False
True
True
False
False
True

An example of using logical operators¶

Good marathon weather is often described as dry and cold. Say the probabilities of dry and of cold on race day are both 50% and the two are independent; then there is a 25% chance of good marathon weather.

In [27]:
def is_good_marathon_weather(is_dry: bool, is_cold: bool) -> bool:
    return is_dry and is_cold

print(is_good_marathon_weather(True, True))
print(is_good_marathon_weather(True, False))
print(is_good_marathon_weather(False, True))
print(is_good_marathon_weather(False, False))
True
False
False
False

An example of using logical operators(cont'd)¶

Now suppose good marathon weather is described as dry or cold. With dry and cold each having a 50% probability on race day, and the two being independent, there is a 75% chance of good marathon weather.

In [28]:
def is_good_marathon_weather(is_dry: bool, is_cold: bool) -> bool:
    return is_dry or is_cold

print(is_good_marathon_weather(True, True))
print(is_good_marathon_weather(True, False))
print(is_good_marathon_weather(False, True))
print(is_good_marathon_weather(False, False))
True
True
True
False
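
The 25% and 75% figures can be checked by enumerating the four combinations of dry and cold. This sketch assumes the two conditions are independent and each 50% likely, so all four combinations are equally likely; the function names dry_and_cold and dry_or_cold are chosen here for illustration.

```python
from itertools import product

def dry_and_cold(is_dry: bool, is_cold: bool) -> bool:
    return is_dry and is_cold

def dry_or_cold(is_dry: bool, is_cold: bool) -> bool:
    return is_dry or is_cold

# All four (is_dry, is_cold) combinations, assumed equally likely.
outcomes = list(product([True, False], repeat=2))

# True counts as 1 and False as 0 when summed.
p_and = sum(dry_and_cold(d, c) for d, c in outcomes) / len(outcomes)
p_or = sum(dry_or_cold(d, c) for d, c in outcomes) / len(outcomes)

print(p_and)  # 0.25
print(p_or)   # 0.75
```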

bool is quite useful in control flow and filtering data.¶

NoneType¶

Python has a special type, the NoneType, with a single value, None¶

  • This is used to represent undefined values.
  • It is not the same as False, or an empty string '' or 0.
In [29]:
a_none_type = None
print(type(a_none_type))
print(a_none_type == False)
print(a_none_type == '')
print(a_none_type == 0)
print(a_none_type == None)
<class 'NoneType'>
False
False
False
True
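
Comparing with == works here, but PEP 8 recommends is and is not when checking for None, because None is a single object. A brief sketch of that idiom, which also shows that None is falsy without being equal to False:

```python
a_none_type = None

is_none = a_none_type is None          # identity check, the usual idiom
is_not_none = a_none_type is not None
as_bool = bool(a_none_type)            # None is falsy...
equals_false = a_none_type == False    # ...but it is not equal to False

print(is_none)       # True
print(is_not_none)   # False
print(as_bool)       # False
print(equals_false)  # False
```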

A function without a return statement actually returns None, whose type is NoneType¶

In [30]:
def hello_anyone(anyone: str) -> None:
    print(f"Hello {anyone}!")

hello_anyone("Anakin Skywalker")
hello_anyone("Luke Skywalker")
Hello Anakin Skywalker!
Hello Luke Skywalker!
In [31]:
func_out = hello_anyone("Anakin Skywalker")
type(func_out)
Hello Anakin Skywalker!
Out[31]:
NoneType

Besides type() function, data types can also be validated via isinstance() function¶

In [32]:
an_integer = 5566
a_float = 42.195
a_str = "5566"
a_bool = False
a_none_type = None

print(isinstance(an_integer, int))
print(isinstance(a_float, float))
print(isinstance(a_str, str))
print(isinstance(a_bool, bool))
print(isinstance(a_none_type, type(None))) # print(a_none_type == None)
True
True
True
True
True
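
One difference between the two checks is worth knowing: isinstance() also accepts subclasses, and in Python bool is a subclass of int. The sketch below uses illustrative variable names to contrast the two approaches.

```python
a_bool = True

bool_is_int_instance = isinstance(a_bool, int)  # bool is a subclass of int
bool_type_is_int = type(a_bool) == int          # the exact type is bool
bool_subclasses_int = issubclass(bool, int)

# isinstance() also accepts a tuple of types.
is_number = isinstance(42.195, (int, float))

print(bool_is_int_instance)  # True
print(bool_type_is_int)      # False
print(bool_subclasses_int)   # True
print(is_number)             # True
```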

Data types can be dynamically converted using functions¶

  • int() for converting to int.
  • float() for converting to float.
  • str() for converting to str.
  • bool() for converting to bool.

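Upcasting (to a supertype) is always allowed¶

Each single step along this chain works: NoneType -> bool -> int -> float -> str. Skipping steps is not guaranteed; for example, int(None) raises a TypeError.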

In [33]:
print(bool(None))
print(int(True))
print(float(1))
print(str(1.0))
False
1
1.0
1.0

Downcasting (to a subtype) needs a second look¶

In [34]:
print(float('1.0'))
print(int('1'))
print(bool('False'))
print(bool('NoneType'))
1.0
1
True
True
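
The last two results are True because bool() on a str only checks whether the string is empty; it does not read the text. The sketch below shows that, along with two other conversions that deserve the second look: int() rejects a str that looks like a float, and int() on a float drops the fractional part.

```python
empty_is_false = bool("")         # only the empty string is False
text_false_is_true = bool("False")  # any non-empty string is True

# int() cannot parse a str that contains a decimal point.
try:
    int("1.0")
    int_from_float_text_failed = False
except ValueError:
    int_from_float_text_failed = True

via_float = int(float("1.0"))  # convert in two steps instead
truncated = int(42.195)        # the fractional part is dropped

print(empty_is_false)              # False
print(text_false_is_true)          # True
print(int_from_float_text_failed)  # True
print(via_float)                   # 1
print(truncated)                   # 42
```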