Data Structures in Python
Kuo, Yao-Jen yaojenkuo@ntu.edu.tw from DATAINPOINT
In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
A software engineer's main job is to perform operations on data. We can simplify those operations into three steps: take some input, process it, and return the output.
This is quite similar to what we saw in the definition of a function.
A data structure determines how and where we put the data to be processed. A good choice of data structure can make our programs more efficient.
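To make the efficiency point concrete, here is a minimal sketch that runs the same membership test against a list and a set. The names `numbers_list`, `numbers_set`, and `target` are made up for illustration, and the measured timings will vary from machine to machine.

```python
import timeit

# Hypothetical data: the same 100,000 integers stored two ways.
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)
target = 99_999  # worst case for the list: the last element

# Both containers give the same answer...
print(target in numbers_list)
print(target in numbers_set)

# ...but the list is scanned element by element, while the set uses hashing.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

On most machines the set lookup should come out far faster, which is the kind of gain a suitable data structure can offer.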
Python's built-in data structures:
list
tuple
dict as in dictionary
set
Quite similar to the comparison of built-in functions vs. self-defined/third-party functions.
list
Lists are the basic ordered and mutable data collection type in Python. They can be defined with comma-separated values between brackets.
primes = [2, 3, 5, 7, 11]
print(type(primes)) # use type() to check type
print(len(primes)) # use len() to check how many elements are stored in the list
<class 'list'>
5
List methods: .append(), .pop(), .remove(), .insert(), .sort()
We can use TAB and SHIFT - TAB for documentation prompts in a notebook environment.
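sorted() function and a list's sort() method
A list's sort() method sorts the list in place and returns None, while the sorted() function returns a new sorted list.
primes.append(13) # appending an element to the end of a list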
print(primes)
primes.pop() # popping out the last element of a list
print(primes)
primes.remove(2) # removing the first occurrence of an element within a list
print(primes)
primes.insert(0, 2) # inserting certain element at a specific index
print(primes)
primes.sort(reverse=True) # sorting a list, reverse=False => ascending order; reverse=True => descending order
print(primes)
[2, 3, 5, 7, 11, 13]
[2, 3, 5, 7, 11]
[3, 5, 7, 11]
[2, 3, 5, 7, 11]
[11, 7, 5, 3, 2]
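The difference between the sorted() function and a list's sort() method can be sketched as follows. The names `unsorted_primes`, `ascending`, and `result` are chosen only for this illustration.

```python
unsorted_primes = [11, 3, 7, 2, 5]

# sorted() returns a new sorted list and leaves the original untouched.
ascending = sorted(unsorted_primes)
print(ascending)        # [2, 3, 5, 7, 11]
print(unsorted_primes)  # [11, 3, 7, 2, 5]

# list.sort() sorts in place and returns None.
result = unsorted_primes.sort()
print(result)           # None
print(unsorted_primes)  # [2, 3, 5, 7, 11]
```

A common slip is writing `primes = primes.sort()`, which rebinds the name to None.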
primes.sort()
print(primes[0]) # the first element
print(primes[1]) # the second element
2
3
print(primes[-1]) # the last element
print(primes[-2]) # the second last element
11
7
# slicing syntax
LIST[start:stop:step]
print(primes[0:3:1]) # slicing the first 3 elements
print(primes[-3:len(primes):1]) # slicing the last 3 elements
print(primes[0:len(primes):2]) # slicing every second element
[2, 3, 5]
[5, 7, 11]
[2, 5, 11]
The defaults are start=0, stop=len(LIST), and step=1, so we can do the same slicing more concisely by omitting them.
print(primes[:3]) # slicing the first 3 elements
print(primes[-3:]) # slicing the last 3 elements
print(primes[::2]) # slicing every second element
print(primes[::-1]) # a particularly useful tip is to specify a negative step
[2, 3, 5]
[5, 7, 11]
[2, 5, 11]
[11, 7, 5, 3, 2]
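Two further properties of slicing are worth a quick sketch: a slice is a new list, and slice bounds that run past the end do not raise an error, unlike plain indexing. The name `first_three` is made up for this example.

```python
primes = [2, 3, 5, 7, 11]

# A slice is a new list; changing it does not affect the original.
first_three = primes[:3]
first_three.append(100)
print(first_three)  # [2, 3, 5, 100]
print(primes)       # [2, 3, 5, 7, 11]

# Slices tolerate out-of-range bounds, whereas indexing raises IndexError.
print(primes[3:100])  # [7, 11]
try:
    primes[100]
except IndexError as e:
    print(e)
```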
tuple
Tuples are in many ways similar to lists, but they are defined with parentheses rather than brackets.
primes = (2, 3, 5, 7, 11)
print(type(primes)) # use type() to check type
print(len(primes)) # use len() to check how many elements are stored in the tuple
<class 'tuple'>
5
Once they are created, their size and contents cannot be changed.
primes = [2, 3, 5, 7, 11]
primes[-1] = 13
print(primes)
primes = tuple(primes)
[2, 3, 5, 7, 13]
try:
    primes[-1] = 11
except TypeError as e:
    print(e)
'tuple' object does not support item assignment
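Since a tuple cannot be modified in place, one way to get a "changed" tuple is to build a new one. The sketch below assumes we want to replace the last element; the name `corrected` is made up for illustration.

```python
primes = (2, 3, 5, 7, 13)

# Build a new tuple from a slice plus a one-element tuple.
corrected = primes[:-1] + (11,)
print(corrected)  # (2, 3, 5, 7, 11)
print(primes)     # (2, 3, 5, 7, 13), the original tuple is unchanged

# A single-element tuple needs a trailing comma.
print(type((11,)))  # <class 'tuple'>
print(type((11)))   # <class 'int'>
```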
TUPLE.<TAB>
def get_locale(country: str, city: str) -> tuple:
    return country, city # comma-separated return values are packed into a tuple
print(get_locale("Taiwan", "Taipei"))
print(type(get_locale("Taiwan", "Taipei")))
('Taiwan', 'Taipei')
<class 'tuple'>
my_country, my_city = get_locale("Taiwan", "Taipei")
print(my_country)
print(my_city)
Taiwan
Taipei
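Unpacking works with any tuple, not only with function return values. As a small sketch, the example below shows starred unpacking and the swap idiom; the names `first` and `rest` are chosen only for illustration.

```python
primes = (2, 3, 5, 7, 11)

# Starred unpacking collects the remaining elements into a list.
first, *rest = primes
print(first)  # 2
print(rest)   # [3, 5, 7, 11]

# Swapping two names relies on tuple packing and unpacking.
my_country, my_city = "Taipei", "Taiwan"  # deliberately in the wrong order
my_country, my_city = my_city, my_country
print(my_country, my_city)  # Taiwan Taipei
```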
dict
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation. They can be created via a comma-separated list of key:value pairs within braces.
boston_celtics = {
    'isNBAFranchise': True,
    'city': "Boston",
    'fullName': "Boston Celtics",
    'tricode': "BOS",
    'teamId': 1610612738,
    'nickname': "Celtics",
    'confName': "East",
    'divName': "Atlantic"
}
print(type(boston_celtics))
print(len(boston_celtics))
<class 'dict'>
8
print(boston_celtics['city'])
print(boston_celtics['confName'])
print(boston_celtics['divName'])
Boston
East
Atlantic
boston_celtics['isMyFavorite'] = True
print(boston_celtics)
{'isNBAFranchise': True, 'city': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': 1610612738, 'nickname': 'Celtics', 'confName': 'East', 'divName': 'Atlantic', 'isMyFavorite': True}
Use del to remove a key:value pair from a dictionary
del boston_celtics['isMyFavorite']
print(boston_celtics)
{'isNBAFranchise': True, 'city': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': 1610612738, 'nickname': 'Celtics', 'confName': 'East', 'divName': 'Atlantic'}
Dictionary methods: .keys(), .values(), .items()
print(boston_celtics.keys())
print(boston_celtics.values())
print(boston_celtics.items())
dict_keys(['isNBAFranchise', 'city', 'fullName', 'tricode', 'teamId', 'nickname', 'confName', 'divName'])
dict_values([True, 'Boston', 'Boston Celtics', 'BOS', 1610612738, 'Celtics', 'East', 'Atlantic'])
dict_items([('isNBAFranchise', True), ('city', 'Boston'), ('fullName', 'Boston Celtics'), ('tricode', 'BOS'), ('teamId', 1610612738), ('nickname', 'Celtics'), ('confName', 'East'), ('divName', 'Atlantic')])
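The view returned by .items() is most often used in a for loop. The sketch below works on a shortened, separately named copy of the dictionary (`team`, a name made up here) so that it runs on its own, and it also shows .get() as a way to read a key that might be missing.

```python
# A shortened copy of the dictionary above, under a different name.
team = {
    'city': "Boston",
    'nickname': "Celtics",
    'confName': "East",
}

# .items() yields (key, value) tuples that can be unpacked in a for loop.
for key, value in team.items():
    print(f"{key}: {value}")

# .get() returns a default instead of raising KeyError for a missing key.
print(team.get('divName', 'unknown'))  # unknown
print('city' in team)                  # True
```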
set
The fourth basic collection is the set, an unordered collection of unique items. Sets are defined much like lists and tuples, except they use braces.
primes = {2, 3, 5, 7, 11}
odds = {1, 3, 5, 7, 9}
print(type(primes))
print(len(odds))
<class 'set'>
5
|: Union operator.
&: Intersection operator.
-: Difference operator.
^: Symmetric difference operator.
print(primes | odds) # with an operator
print(primes.union(odds)) # equivalently with a method
{1, 2, 3, 5, 7, 9, 11}
{1, 2, 3, 5, 7, 9, 11}
print(primes & odds) # with an operator
print(primes.intersection(odds)) # equivalently with a method
{3, 5, 7}
{3, 5, 7}
print(primes - odds) # with an operator
print(primes.difference(odds)) # equivalently with a method
{2, 11}
{2, 11}
print(sorted((primes - odds) | (odds - primes))) # union of the two differences
print(primes ^ odds) # with an operator
print(primes.symmetric_difference(odds)) # equivalently with a method
[1, 2, 9, 11]
{1, 2, 9, 11}
{1, 2, 9, 11}
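Because set items are unique, a common use is removing duplicates from a list. The following is a minimal sketch; `rolls` and `unique_rolls` are hypothetical names and values.

```python
# Hypothetical list with repeated values.
rolls = [3, 1, 3, 6, 1, 1, 5]

# Converting to a set drops duplicates; sorted() gives a predictable display order.
unique_rolls = set(rolls)
print(sorted(unique_rolls))           # [1, 3, 5, 6]
print(len(rolls), len(unique_rolls))  # 7 4

# Sets are mutable: add() ignores values that are already present.
unique_rolls.add(6)
unique_rolls.add(2)
print(sorted(unique_rolls))  # [1, 2, 3, 5, 6]

# Subset checks use <= or .issubset().
print({3, 5} <= unique_rolls)  # True
```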
()[]{}
(): tuple.
[]: list.
{}: .format() or f-strings, dict with key: value pairs, set.
The official NBA API returns compound dictionaries that contain other dictionaries/lists as values.
The balldontlie API also returns compound dictionaries that contain other dictionaries/lists as values.
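Reading such compound structures comes down to chaining keys and indexes from the outside in. The sketch below uses a hypothetical `response` dictionary; its keys and values are invented for illustration and are not the real schema of either API.

```python
# Hypothetical response, NOT the actual schema of the NBA or balldontlie API.
response = {
    'data': [
        {'id': 1, 'full_name': "Boston Celtics", 'conference': "East"},
        {'id': 2, 'full_name': "Los Angeles Lakers", 'conference': "West"},
    ],
    'meta': {'total_count': 2},
}

# Chain keys and indexes from the outside in.
print(response['meta']['total_count'])   # 2
print(response['data'][0]['full_name'])  # Boston Celtics

# Loop over the inner list of dictionaries.
east_teams = [
    team['full_name']
    for team in response['data']
    if team['conference'] == "East"
]
print(east_teams)  # ['Boston Celtics']
```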