Data Structures in Python
Kuo, Yao-Jen yaojenkuo@ntu.edu.tw from DATAINPOINT
In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
A software engineer's main job is to perform operations on data. We can simplify that operation into three steps: take some input, process it, and return the output.
This is quite similar to the definition of a function.
A data structure determines how and where we put the data to be processed. A good choice of data structure can improve efficiency.
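As a rough illustration of how the choice of data structure affects efficiency, the sketch below stores the same values in a list and in a set and then checks membership. The data and variable names are hypothetical. A list is scanned element by element, while a set uses hashing, so the set lookup is typically much faster on large collections.

```python
# Hypothetical data: the same 100,000 integers stored two ways.
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

target = 99_999  # the last element, the slowest case for a list scan

# Both checks give the same answer; only the lookup cost differs.
found_in_list = target in numbers_list  # scans the list from the start
found_in_set = target in numbers_set    # hash lookup
print(found_in_list, found_in_set)
```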
list
tuple
dict, as in dictionary
set
Built-in data structures compare to self-defined/third-party ones much as built-in functions compare to self-defined/third-party functions.
list
Lists are the basic ordered and mutable data collection type in Python. They can be defined with comma-separated values between brackets.
primes = [2, 3, 5, 7, 11]
print(type(primes)) # use type() to check type
print(len(primes)) # use len() to check how many elements are stored in the list
<class 'list'>
5
.append()
.pop()
.remove()
.insert()
.sort()
We can use TAB and SHIFT-TAB for documentation prompts in a notebook environment.
sorted() function versus a list's sort() method: sorted() returns a new sorted list, while sort() sorts the list in place and returns None.
primes.append(13) # appending an element to the end of a list
print(primes)
primes.pop() # popping out the last element of a list
print(primes)
primes.remove(2) # removing the first occurrence of an element within a list
print(primes)
primes.insert(0, 2) # inserting certain element at a specific index
print(primes)
primes.sort(reverse=True) # sorting a list, reverse=False => ascending order; reverse=True => descending order
print(primes)
[2, 3, 5, 7, 11, 13]
[2, 3, 5, 7, 11]
[3, 5, 7, 11]
[2, 3, 5, 7, 11]
[11, 7, 5, 3, 2]
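The list methods above change the list in place. The built-in sorted() function behaves differently: it leaves the original untouched and returns a new list. A minimal sketch of the contrast, using a hypothetical scores list:

```python
scores = [7, 2, 11, 3, 5]  # hypothetical example data

ascending = sorted(scores)  # built-in function: returns a new sorted list
print(ascending)            # [2, 3, 5, 7, 11]
print(scores)               # [7, 2, 11, 3, 5], the original is unchanged

result = scores.sort()      # list method: sorts in place
print(result)               # None
print(scores)               # [2, 3, 5, 7, 11]
```

A common mistake is writing scores = scores.sort(), which replaces the list with None.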
primes.sort()
print(primes[0]) # the first element
print(primes[1]) # the second element
2
3
print(primes[-1]) # the last element
print(primes[-2]) # the second last element
11
7
# slicing syntax
LIST[start:stop:step]
print(primes[0:3:1]) # slicing the first 3 elements
print(primes[-3:len(primes):1]) # slicing the last 3 elements
print(primes[0:len(primes):2]) # slicing every second element
[2, 3, 5]
[5, 7, 11]
[2, 5, 11]
The defaults are start=0, stop=len(LIST), and step=1, so we can do the same slicing by omitting them.
print(primes[:3]) # slicing the first 3 elements
print(primes[-3:]) # slicing the last 3 elements
print(primes[::2]) # slicing every second element
print(primes[::-1]) # a particularly useful tip: a negative step reverses the list
[2, 3, 5]
[5, 7, 11]
[2, 5, 11]
[11, 7, 5, 3, 2]
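Two further properties of slicing are worth a quick sketch: a slice returns a new list rather than a view of the original, and out-of-range slice bounds are clipped silently, whereas out-of-range indexing raises an error. The variable names below are illustrative only.

```python
primes = [2, 3, 5, 7, 11]

first_ten = primes[:10]      # stop beyond the end is clipped, no error
print(first_ten)             # [2, 3, 5, 7, 11]

copy_of_primes = primes[:]   # a full slice makes a shallow copy
copy_of_primes.append(13)
print(primes)                # [2, 3, 5, 7, 11], the original is unchanged

try:
    primes[10]               # indexing past the end raises IndexError
except IndexError as e:
    print(e)                 # list index out of range
```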
tuple
Tuples are in many ways similar to lists, but they are defined with parentheses rather than brackets.
primes = (2, 3, 5, 7, 11)
print(type(primes)) # use type() to check type
print(len(primes)) # use len() to check how many elements are stored in the tuple
<class 'tuple'>
5
Tuples are immutable: once they are created, their size and contents cannot be changed.
primes = [2, 3, 5, 7, 11]
primes[-1] = 13
print(primes)
primes = tuple(primes)
[2, 3, 5, 7, 13]
try:
    primes[-1] = 11
except TypeError as e:
    print(e)
'tuple' object does not support item assignment
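One practical consequence of immutability: a tuple of hashable values can be used as a dictionary key, while a list cannot. The sketch below uses a hypothetical lookup table keyed by (latitude, longitude) pairs; the coordinates are rounded and for illustration only.

```python
# Hypothetical lookup table keyed by (latitude, longitude) tuples.
coordinates = {}
coordinates[(25.03, 121.57)] = "Taipei"
print(coordinates[(25.03, 121.57)])

list_key_failed = False
try:
    coordinates[[25.03, 121.57]] = "Taipei"  # a list is mutable, so unhashable
except TypeError as e:
    list_key_failed = True
    print(e)
```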
TUPLE.<TAB>
def get_locale(country: str, city: str) -> tuple:
    return country, city # multiple return values are packed into a tuple
print(get_locale("Taiwan", "Taipei"))
print(type(get_locale("Taiwan", "Taipei")))
('Taiwan', 'Taipei')
<class 'tuple'>
my_country, my_city = get_locale("Taiwan", "Taipei")
print(my_country)
print(my_city)
Taiwan
Taipei
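The same unpacking syntax works outside of function returns. A short sketch with illustrative variable names shows swapping two values and collecting the remaining elements with a starred name:

```python
a, b = 1, 2
a, b = b, a      # the right-hand side is packed into a tuple, then unpacked
print(a, b)      # 2 1

first, *rest = (2, 3, 5, 7, 11)
print(first)     # 2
print(rest)      # [3, 5, 7, 11], the starred name always receives a list
```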
dict
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation. They can be created via a comma-separated list of key:value pairs within braces.
boston_celtics = {
    'isNBAFranchise': True,
    'city': "Boston",
    'fullName': "Boston Celtics",
    'tricode': "BOS",
    'teamId': 1610612738,
    'nickname': "Celtics",
    'confName': "East",
    'divName': "Atlantic"
}
print(type(boston_celtics))
print(len(boston_celtics))
<class 'dict'>
8
print(boston_celtics['city'])
print(boston_celtics['confName'])
print(boston_celtics['divName'])
Boston
East
Atlantic
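Indexing with a key that does not exist raises a KeyError. A minimal sketch of the usual alternatives, using a trimmed copy of the dictionary and a hypothetical 'arena' key that is not in the data:

```python
team = {'city': "Boston", 'nickname': "Celtics"}  # trimmed copy for illustration

try:
    team['arena']                      # 'arena' is a hypothetical missing key
except KeyError as e:
    print("missing key:", e)           # missing key: 'arena'

print(team.get('arena'))               # None
print(team.get('arena', "unknown"))    # unknown
print('city' in team)                  # True
```

The .get() method is convenient when a missing key is expected; plain indexing is preferable when a missing key signals a bug.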
boston_celtics['isMyFavorite'] = True
print(boston_celtics)
{'isNBAFranchise': True, 'city': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': 1610612738, 'nickname': 'Celtics', 'confName': 'East', 'divName': 'Atlantic', 'isMyFavorite': True}
Use del to remove a key:value pair from a dictionary
del boston_celtics['isMyFavorite']
print(boston_celtics)
{'isNBAFranchise': True, 'city': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': 1610612738, 'nickname': 'Celtics', 'confName': 'East', 'divName': 'Atlantic'}
.keys()
.values()
.items()
print(boston_celtics.keys())
print(boston_celtics.values())
print(boston_celtics.items())
dict_keys(['isNBAFranchise', 'city', 'fullName', 'tricode', 'teamId', 'nickname', 'confName', 'divName'])
dict_values([True, 'Boston', 'Boston Celtics', 'BOS', 1610612738, 'Celtics', 'East', 'Atlantic'])
dict_items([('isNBAFranchise', True), ('city', 'Boston'), ('fullName', 'Boston Celtics'), ('tricode', 'BOS'), ('teamId', 1610612738), ('nickname', 'Celtics'), ('confName', 'East'), ('divName', 'Atlantic')])
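These view objects are most often used in loops. Since .items() yields (key, value) tuples, each pair can be unpacked directly in the for statement. A short sketch on a trimmed copy of the dictionary:

```python
team = {'city': "Boston", 'nickname': "Celtics", 'confName': "East"}

lines = []
for key, value in team.items():   # each item is a (key, value) tuple
    lines.append(f"{key}: {value}")

print("\n".join(lines))
```

Dictionaries preserve insertion order (guaranteed since Python 3.7), so the loop visits the keys in the order they were written.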
set
The fourth basic collection is the set, an unordered collection of unique items. Sets are defined much like lists and tuples, except they use braces.
primes = {2, 3, 5, 7, 11}
odds = {1, 3, 5, 7, 9}
print(type(primes))
print(len(odds))
<class 'set'>
5
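Two details follow from this definition and are sketched below with hypothetical data: converting a list to a set drops duplicates, and empty braces create a dict rather than a set, so an empty set has to be written as set().

```python
rolls = [3, 5, 3, 1, 5, 5]     # hypothetical data with duplicates

unique_rolls = set(rolls)      # duplicates are dropped
print(len(unique_rolls))       # 3
print(sorted(unique_rolls))    # [1, 3, 5]

print(type({}))                # <class 'dict'>
print(type(set()))             # <class 'set'>
```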
| : Union operator.
& : Intersection operator.
- : Difference operator.
^ : Symmetric difference operator.
print(primes | odds) # with an operator
print(primes.union(odds)) # equivalently with a method
{1, 2, 3, 5, 7, 9, 11}
{1, 2, 3, 5, 7, 9, 11}
print(primes & odds) # with an operator
print(primes.intersection(odds)) # equivalently with a method
{3, 5, 7}
{3, 5, 7}
print(primes - odds) # with an operator
print(primes.difference(odds)) # equivalently with a method
{2, 11}
{2, 11}
print(sorted((primes - odds) | (odds - primes))) # the union of the two differences
print(primes ^ odds) # with an operator
print(primes.symmetric_difference(odds)) # equivalently with a method
[1, 2, 9, 11]
{1, 2, 9, 11}
{1, 2, 9, 11}
()
[]
{}
()
tuple.
[]
list.
{}
Placeholders in .format() or f-strings.
dict with key: value pairs.
set.
The official NBA API returns compound dictionaries that contain other dictionaries/lists as values.
The balldontlie API also returns compound dictionaries that contain other dictionaries/lists as values.
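Working with such compound structures means chaining key lookups and indexes. The payload below is hypothetical and only mimics the general shape described above; the key names are assumptions and are not taken from either API.

```python
# Hypothetical payload; real API responses may use different keys.
response = {
    'data': [
        {'id': 1, 'full_name': "Boston Celtics", 'conference': "East"},
        {'id': 2, 'full_name': "Los Angeles Lakers", 'conference': "West"},
    ],
    'meta': {'total_count': 2},
}

# dict -> list -> dict: chain a key, an index, then another key.
first_team = response['data'][0]
print(first_team['full_name'])

# Filter the inner list with a list comprehension.
east_teams = [team['full_name'] for team in response['data']
              if team['conference'] == "East"]
print(east_teams)
print(response['meta']['total_count'])
```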