NumPy Tutorial: Arrays and Operations
In this tutorial, you'll learn the core of NumPy: arrays and the operations on them. We cover key concepts, practical examples, and best practices to help you apply them effectively.
What You'll Learn
Use NumPy for numerical computing: create arrays, perform element-wise operations, use broadcasting, and leverage NumPy's vectorization for fast computations.
Why It Matters
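NumPy is the foundation of data science in Python. pandas and scikit-learn build directly on NumPy arrays, and TensorFlow and PyTorch provide tensor types that convert to and from them. Understanding NumPy is essential for working with any of these libraries.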
Real-World Use
Typical uses include image processing (images are commonly loaded as NumPy arrays), financial calculations on arrays of stock prices, and computing statistics over millions of data points efficiently.
What is NumPy?
NumPy provides the ndarray (N-dimensional array), which is faster and more memory-efficient than Python lists for numerical data.
import numpy as np
# Python list
py_list = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in py_list] # Slow for large data
# NumPy array
np_array = np.array([1, 2, 3, 4, 5])
doubled = np_array * 2 # Fast, no loop needed
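To make the memory claim concrete, here is a rough sketch that compares the storage used by a list and an array holding the same integers. The variable names and the choice of `np.int64` are assumptions made for illustration; exact list sizes depend on the Python build, so no output is shown.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # dtype fixed explicitly for this sketch

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores raw 8-byte integers in one contiguous buffer.
array_bytes = np_array.nbytes  # data buffer only, n * 8 bytes

print(list_bytes, array_bytes)
```

The array figure counts only the data buffer, not the small fixed-size object header, so treat the comparison as approximate.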
Creating Arrays
# From a list
arr = np.array([1, 2, 3, 4, 5])
# 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# array([[1, 2, 3],
# [4, 5, 6]])
# Zeros and ones
zeros = np.zeros((3, 4)) # 3 rows, 4 cols, all 0
ones = np.ones((2, 3)) # All 1
# Range
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Linear spacing
linear = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1.0]
# Random
random = np.random.rand(3, 3) # Uniform [0, 1)
normal = np.random.randn(1000) # Normal distribution
integers = np.random.randint(0, 100, 10) # Random integers
# Identity matrix
identity = np.eye(3)
# array([[1., 0., 0.],
# [0., 1., 0.],
# [0., 0., 1.]])
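The `np.random.rand`, `randn`, and `randint` functions above still work, but NumPy's documentation recommends the newer `Generator` interface for new code. A minimal sketch of the equivalent calls, assuming an arbitrary seed of 42 for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice

uniform = rng.random((3, 3))              # uniform on [0, 1), shape (3, 3)
normal = rng.standard_normal(1000)        # standard normal, 1000 samples
integers = rng.integers(0, 100, size=10)  # 10 integers in [0, 100)
```

Passing the same seed again produces the same sequence, which helps when an example or test needs to be repeatable.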
Array Attributes
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr.shape # (2, 3): 2 rows, 3 columns
arr.ndim # 2: number of dimensions
arr.size # 6: total elements
arr.dtype # int64 on most platforms: data type
arr.itemsize # 8: bytes per element (for int64)
arr.nbytes # 48: total bytes (size × itemsize)
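Because the default integer type can differ between platforms, it can help to set the dtype explicitly. The sketch below, with illustrative variable names, shows how the dtype drives `itemsize` and `nbytes`:

```python
import numpy as np

small = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(small.dtype, small.itemsize, small.nbytes)  # int32 4 24

# astype returns a converted copy; the original keeps its dtype.
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.nbytes)            # float64 48
```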
Indexing and Slicing
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Basic indexing
arr[0, 1] # 2 (row 0, col 1)
arr[2, 3] # 12
# Slicing
arr[0, :] # First row: [1, 2, 3, 4]
arr[:, 1] # Second column: [2, 6, 10]
arr[0:2, 1:3] # Submatrix: [[2, 3], [6, 7]]
# Boolean indexing
arr[arr > 5] # Values > 5: [6, 7, 8, 9, 10, 11, 12]
# Fancy indexing
arr[[0, 2], :] # Rows 0 and 2
arr[:, [0, 3]] # Columns 0 and 3
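One detail worth knowing about the indexing styles above: basic slices return views that share memory with the original array, while boolean and fancy indexing return copies. A small sketch, using illustrative variable names:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

row_view = arr[0, :]     # basic slice: a view onto arr
row_view[0] = 100        # also changes arr[0, 0]

picked = arr[[0, 2], :]  # fancy indexing: an independent copy
picked[0, 1] = -1        # arr is not affected

print(arr[0, 0])                         # 100
print(arr[0, 1])                         # 2
print(np.shares_memory(arr, row_view))   # True
print(np.shares_memory(arr, picked))     # False
```

Call `.copy()` on a slice when you need to modify it without touching the source array.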
Reshaping
arr = np.arange(12)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Reshape to 3×4
reshaped = arr.reshape(3, 4)
# [[0, 1, 2, 3],
# [4, 5, 6, 7],
# [8, 9, 10, 11]]
# Flatten back
flat = reshaped.flatten() # New copy
flat2 = reshaped.ravel() # View when possible (shared memory), otherwise a copy
# Transpose
matrix = np.array([[1, 2], [3, 4]])
matrix.T
# array([[1, 3],
# [2, 4]])
# Add dimension
arr = np.array([1, 2, 3])
arr[np.newaxis, :] # Shape (1, 3)
arr[:, np.newaxis] # Shape (3, 1)
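Two reshaping details are easy to miss: passing `-1` lets NumPy infer one dimension, and the copy/view difference between `flatten` and `ravel` shows up as soon as you write to the result. A short sketch with illustrative names:

```python
import numpy as np

arr = np.arange(12)
auto = arr.reshape(3, -1)   # -1 asks NumPy to infer the size: shape (3, 4)

flat_copy = auto.flatten()  # always a copy
flat_view = auto.ravel()    # a view here, because auto is contiguous

flat_view[0] = 99           # writes through to auto and arr
flat_copy[1] = -5           # touches only the copy

print(auto.shape)  # (3, 4)
print(arr[0])      # 99
print(arr[1])      # 1
```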
Math Operations
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
a + b # [11, 22, 33, 44]
a - b # [-9, -18, -27, -36]
a * b # [10, 40, 90, 160]
a / b # [0.1, 0.1, 0.1, 0.1]
a ** 2 # [1, 4, 9, 16]
# Universal functions
np.sqrt(a) # [1, 1.41, 1.73, 2]
np.exp(a) # [e, e², e³, e⁴]
np.log(a) # Natural log
np.sin(a) # Sine
# Aggregation
a.sum() # 10
a.mean() # 2.5
a.std() # ~1.12
a.min() # 1
a.max() # 4
a.argmin() # 0 (index of min)
a.argmax() # 3 (index of max)
a.cumsum() # [1, 3, 6, 10]
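The aggregations above reduce a whole array to one value. On 2D arrays, the `axis` argument reduces along one dimension instead. A brief sketch, using an illustrative 2×3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)        # collapse the rows: [5, 7, 9]
row_sums = m.sum(axis=1)        # collapse the columns: [6, 15]
row_means = m.mean(axis=1)      # [2.0, 5.0]
col_max_idx = m.argmax(axis=0)  # row index of each column's max: [1, 1, 1]
```

A useful way to remember it: the axis you pass is the one that disappears from the result's shape.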
Broadcasting
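When two arrays have different shapes, NumPy broadcasts them: it compares the shapes starting from the trailing dimension and treats any dimension of size 1 (or any missing dimension) as if it were stretched to match, without copying data. Shapes that cannot be matched this way raise a ValueError.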
# Add a scalar to every element
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr + 100
# [[101, 102, 103],
# [104, 105, 106]]
# Add a 1D array to every row
row = np.array([10, 20, 30])
arr + row
# [[11, 22, 33],
# [14, 25, 36]]
# Add a column vector to every column
col = np.array([[1], [2]])
arr + col
# [[2, 3, 4],
# [6, 7, 8]]
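The same rules explain both a common use of broadcasting and a common error. The sketch below centers each column by subtracting its mean, then shows a shape pair that fails and one way to fix it. Variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# Center each column: (2, 3) minus (3,) broadcasts across the rows.
centered = arr - arr.mean(axis=0)       # [[-1.5, -1.5, -1.5], [1.5, 1.5, 1.5]]

# (2, 3) and (2,) do not line up on the trailing dimension (3 vs 2).
try:
    arr + np.array([1, 2])
    failed = False
except ValueError:
    failed = True

# Reshaping to (2, 1) lets the same values broadcast across each row.
fixed = arr + np.array([1, 2])[:, np.newaxis]  # [[2, 3, 4], [6, 7, 8]]
```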
Performance Comparison
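import time
# NumPy vs Python list: sum of squares
n = 1_000_000
# Python list
py_list = list(range(n))
start = time.perf_counter() # perf_counter is the recommended clock for short timings
result = sum(x ** 2 for x in py_list)
print(f"Python list: {time.perf_counter() - start:.4f}s")
# ~0.15s (varies by machine)
# NumPy
np_arr = np.arange(n, dtype=np.int64) # explicit dtype avoids overflow where the default int is 32-bit
start = time.perf_counter()
result = (np_arr ** 2).sum()
print(f"NumPy: {time.perf_counter() - start:.4f}s")
# ~0.005s, roughly 30× faster (varies by machine)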
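A single timed run can be noisy. As a sketch of a somewhat more robust measurement, the standard library's `timeit` module can repeat each version and report the best time. The function names here are illustrative, and no timings are shown because they depend on the machine.

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # int64 so the sum of squares cannot overflow

def python_version():
    # Pure-Python loop over list elements.
    return sum(x * x for x in py_list)

def numpy_version():
    # Vectorized: the loop runs in compiled code.
    return (np_arr ** 2).sum()

# Take the fastest of three runs to reduce interference from other processes.
py_time = min(timeit.repeat(python_version, number=1, repeat=3))
np_time = min(timeit.repeat(numpy_version, number=1, repeat=3))
print(f"Python list: {py_time:.4f}s, NumPy: {np_time:.4f}s")
```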
Practice
# 1. Create a 5×5 array of random integers
# 2. Calculate the mean of each column
# 3. Replace all values > 50 with 0
# 4. Create a 3D array and reshape to 2D
# 5. Use broadcasting to normalize (0-1) each column