NumPy Basics
NumPy is the foundational library for data processing in Python. Its core object is ndarray, the multidimensional array.
I include this in the main track not because you need to do complex scientific computing from the start, but because when you later move on to Pandas, machine learning, or numerical computing, many concepts will come back to arrays, shapes, and vectorization.
Creating Arrays
import numpy as np
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])
Common creation methods:
np.zeros((2, 3))
np.ones((2, 3))
np.arange(0, 10, 2)
np.linspace(0, 1, 5)
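As a quick sketch of what these four calls return (the variable names here are chosen only for illustration):

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0
ones = np.ones((2, 3))          # 2x3 array filled with 1.0
evens = np.arange(0, 10, 2)     # start at 0, stop before 10, step 2
points = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, endpoint included

print(zeros.shape)      # (2, 3)
print(evens.tolist())   # [0, 2, 4, 6, 8]
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note the difference between the last two: arange takes a step size and excludes the stop value, while linspace takes a count and includes it.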
Understand three attributes first
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3)
print(arr.ndim) # 2
print(arr.dtype) # int64 or int32, depends on platform
shape: Size of the array along each dimension
ndim: Number of dimensions
dtype: Element type
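To see how the three attributes relate, here is a small sketch comparing a 1-D and a 2-D array (the names vector and matrix are illustrative only):

```python
import numpy as np

vector = np.array([1.5, 2.5, 3.5])           # 1-D array of floats
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array of integers

print(vector.shape, vector.ndim, vector.dtype)  # (3,) 1 float64
print(matrix.shape, matrix.ndim)                # (2, 3) 2

# ndim always equals the length of the shape tuple
print(len(matrix.shape) == matrix.ndim)         # True
```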
Indexing and Slicing
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[0, 1]) # 2
print(arr[:, 1]) # [2 5]
print(arr[0:2, 1:]) # [[2 3] [5 6]]
NumPy arrays support slicing like regular Python sequences, but they also accept one index or slice per axis inside a single pair of brackets, which makes multidimensional indexing more expressive.
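One way to see the difference is to pull the same column out of a nested list and out of an array. This is only a sketch, and the variable names are made up for the comparison:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6]]
arr = np.array(rows)

# Nested list: one index per bracket, so selecting a column needs a loop
second_column_list = [row[1] for row in rows]

# NumPy array: index both axes at once (all rows, column 1)
second_column_arr = arr[:, 1]

print(second_column_list)          # [2, 5]
print(second_column_arr.tolist())  # [2, 5]
print(arr[0:2, 1:].tolist())       # [[2, 3], [5, 6]]
```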
Vectorization
One of NumPy's most valuable features is that many operations don't require writing for loops yourself.
arr = np.array([1, 2, 3, 4])
print(arr * 2) # [2 4 6 8]
print(arr + 10) # [11 12 13 14]
print(arr ** 2) # [1 4 9 16]
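For comparison, here is a sketch of the same doubling written with an explicit loop over a plain list and then as a vectorized expression (names are illustrative):

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# Explicit loop over a plain Python list
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# Vectorized: the same operation applied to every element at once
doubled_vec = arr * 2

print(doubled_loop)          # [2, 4, 6, 8]
print(doubled_vec.tolist())  # [2, 4, 6, 8]
```

Both produce the same values; the vectorized form is shorter, and the per-element work happens inside NumPy rather than in Python-level loop code.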
Arrays can also perform element-wise operations directly with each other:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y) # [5 7 9]
print(x * y) # [ 4 10 18]
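A small sketch of where element-wise operations are handy, using hypothetical price and quantity data invented for this example:

```python
import numpy as np

# Hypothetical data, made up for illustration
prices = np.array([2.0, 0.5, 4.0])
quantities = np.array([3, 10, 1])

# Element-wise multiplication pairs up items at the same position
line_totals = prices * quantities
order_total = line_totals.sum()

print(line_totals.tolist())  # [6.0, 5.0, 4.0]
print(order_total)           # 15.0
```

The two arrays here have the same shape, so each element is matched with the one at the same position in the other array.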