NumPy Tutorial — Arrays and Operations

DodaTech

This tutorial introduces NumPy, Python's core library for numerical arrays. It covers the key concepts, practical examples, and best practices you need to apply it effectively.

What You'll Learn

Use NumPy for numerical computing — create arrays, perform element-wise operations, use broadcasting, and leverage NumPy's vectorization for fast computations.

Why It Matters

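NumPy is the foundation of Python data science. pandas and scikit-learn build directly on NumPy arrays, and TensorFlow and PyTorch tensors follow the same array model and convert to and from NumPy arrays. Knowing NumPy makes all of these libraries easier to learn.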

Real-World Use

Typical uses include image processing (images are commonly loaded as NumPy arrays), financial calculations on arrays of stock prices, and computing statistics on millions of data points efficiently.

What is NumPy?

NumPy provides the ndarray (N-dimensional array) — faster and more memory-efficient than Python lists for numerical data.

import numpy as np

# Python list
py_list = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in py_list]   # Slow for large data

# NumPy array
np_array = np.array([1, 2, 3, 4, 5])
doubled = np_array * 2               # Fast, no loop needed
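
The memory claim above can be checked with a rough sketch like the following. It assumes CPython, where `sys.getsizeof` reports per-object sizes. The list total is only an estimate, and the exact numbers vary by Python version and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list holds pointers to separate int objects, so count the list
# container plus every int it references (a rough upper estimate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw 8-byte values in one contiguous buffer.
array_bytes = np_array.nbytes

print(f"list:  ~{list_bytes} bytes")
print(f"array:  {array_bytes} bytes")
```

The contiguous buffer is also what lets NumPy run the loop in compiled code instead of the Python interpreter.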

Creating Arrays

# From a list
arr = np.array([1, 2, 3, 4, 5])

# 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# array([[1, 2, 3],
#        [4, 5, 6]])

# Zeros and ones
zeros = np.zeros((3, 4))     # 3 rows, 4 cols, all 0
ones = np.ones((2, 3))       # All 1

# Range
range_arr = np.arange(0, 10, 2)   # [0, 2, 4, 6, 8]

# Linear spacing
linear = np.linspace(0, 1, 5)     # [0, 0.25, 0.5, 0.75, 1.0]

# Random
random = np.random.rand(3, 3)     # Uniform [0, 1)
normal = np.random.randn(1000)    # Normal distribution
integers = np.random.randint(0, 100, 10)  # Random integers

# Identity matrix
identity = np.eye(3)
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]])
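
The `np.random.rand`, `randn`, and `randint` functions above still work, but NumPy's documentation recommends the newer `Generator` interface for new code. A minimal sketch of the equivalent calls, with an arbitrary seed chosen here only to make the output reproducible:

```python
import numpy as np

# The seed value 42 is an arbitrary choice for this example.
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))              # uniform floats in [0, 1)
normal = rng.standard_normal(1000)        # normal distribution, mean 0, std 1
integers = rng.integers(0, 100, size=10)  # integers in [0, 100)

# Re-creating the generator with the same seed reproduces the same numbers.
repeat = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(uniform, repeat))    # True
```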

Array Attributes

arr = np.array([[1, 2, 3], [4, 5, 6]])

arr.shape       # (2, 3) — 2 rows, 3 columns
arr.ndim        # 2 — number of dimensions
arr.size        # 6 — total elements
arr.dtype       # int64 (default integer type on most platforms)
arr.itemsize    # 8 (bytes per element for int64)
arr.nbytes      # 48 (total bytes: size × itemsize)
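
Because the default integer type can differ between platforms, it can help to set `dtype` explicitly. The sketch below shows how the choice changes `itemsize` and `nbytes`; the variable names are just for illustration.

```python
import numpy as np

# Request 32-bit integers instead of the platform default.
small = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(small.dtype, small.itemsize, small.nbytes)   # int32 4 24

# astype returns a converted copy; the original array is unchanged.
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.itemsize, as_float.nbytes)   # float64 8 48
```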

Indexing and Slicing

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Basic indexing
arr[0, 1]        # 2 — row 0, col 1
arr[2, 3]        # 12

# Slicing
arr[0, :]        # First row: [1, 2, 3, 4]
arr[:, 1]        # Second column: [2, 6, 10]
arr[0:2, 1:3]    # Submatrix: [[2, 3], [6, 7]]

# Boolean indexing
arr[arr > 5]     # Values > 5: [6, 7, 8, 9, 10, 11, 12]

# Fancy indexing
arr[[0, 2], :]   # Rows 0 and 2
arr[:, [0, 3]]   # Columns 0 and 3
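
One detail worth knowing about the indexing styles above: basic slices return views that share memory with the original array, while boolean and fancy indexing return copies. A small sketch to illustrate the difference:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# Basic slicing returns a view: writing through it changes arr.
row_view = arr[0, :]
row_view[0] = 100
print(arr[0, 0])                           # 100

# Fancy indexing returns a copy: writing to it leaves arr alone.
picked = arr[[0, 2], :]
picked[0, 0] = -1
print(arr[0, 0])                           # 100

print(np.shares_memory(arr, row_view))     # True
print(np.shares_memory(arr, picked))       # False
```

Call `.copy()` on a slice when you need an independent array.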

Reshaping

arr = np.arange(12)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Reshape to 3×4
reshaped = arr.reshape(3, 4)
# [[0, 1, 2, 3],
#  [4, 5, 6, 7],
#  [8, 9, 10, 11]]

# Flatten back
flat = reshaped.flatten()     # New copy
flat2 = reshaped.ravel()      # View when possible (shared memory), otherwise a copy

# Transpose
matrix = np.array([[1, 2], [3, 4]])
matrix.T
# array([[1, 3],
#        [2, 4]])

# Add dimension
arr = np.array([1, 2, 3])
arr[np.newaxis, :]      # Shape (1, 3)
arr[:, np.newaxis]      # Shape (3, 1)
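
Two reshaping details can be sketched briefly: passing `-1` lets NumPy infer one dimension from the array's size, and `np.shares_memory` shows whether `flatten` and `ravel` share data with the source array.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns.
auto = arr.reshape(3, -1)
print(auto.shape)                           # (3, 4)

flat_copy = auto.flatten()   # always a new copy
flat_view = auto.ravel()     # a view here, because auto is contiguous

print(np.shares_memory(auto, flat_copy))    # False
print(np.shares_memory(auto, flat_view))    # True
```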

Math Operations

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

a + b       # [11, 22, 33, 44]
a - b       # [-9, -18, -27, -36]
a * b       # [10, 40, 90, 160]
a / b       # [0.1, 0.1, 0.1, 0.1]
a ** 2      # [1, 4, 9, 16]

# Universal functions
np.sqrt(a)     # [1, 1.41, 1.73, 2]
np.exp(a)      # [e, e², e³, e⁴]
np.log(a)      # Natural log
np.sin(a)      # Sine

# Aggregation
a.sum()        # 10
a.mean()       # 2.5
a.std()        # ~1.12
a.min()        # 1
a.max()        # 4
a.argmin()     # 0 (index of min)
a.argmax()     # 3 (index of max)
a.cumsum()     # [1, 3, 6, 10]
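
On 2D arrays, the same aggregation methods accept an `axis` argument. A short sketch, using a small example matrix `m` that is not part of the tutorial's earlier code:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()             # 21, all elements
col_sums = m.sum(axis=0)    # [5, 7, 9], collapses rows: one value per column
row_sums = m.sum(axis=1)    # [6, 15], collapses columns: one value per row
col_means = m.mean(axis=0)  # [2.5, 3.5, 4.5]
row_max = m.max(axis=1)     # [3, 6]

print(total, col_sums, row_sums, col_means, row_max)
```

A useful way to remember it: `axis` names the dimension that disappears from the result.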

Broadcasting

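Broadcasting lets NumPy combine arrays of different but compatible shapes. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1 (missing leading dimensions count as 1). No data is copied in the process: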

# Add a scalar to every element
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr + 100
# [[101, 102, 103],
#  [104, 105, 106]]

# Add a 1D array to every row
row = np.array([10, 20, 30])
arr + row
# [[11, 22, 33],
#  [14, 25, 36]]

# Add a column vector to every column
col = np.array([[1], [2]])
arr + col
# [[2, 3, 4],
#  [6, 7, 8]]
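
To see which shape pairs are compatible without building arrays, one option is `np.broadcast_shapes`. This sketch assumes NumPy 1.20 or newer, where that function was added. It also shows the error raised when shapes do not line up.

```python
import numpy as np

# Compatible: trailing dimensions match, or one of them is 1.
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)

# Incompatible: trailing dimensions are 3 and 2, and neither is 1.
failed = False
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as exc:
    failed = True
    print("Broadcast error:", exc)
```

Reshaping the length-2 array to `(2, 1)` with `np.newaxis` would make the second addition valid.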

Performance Comparison

import time

# NumPy vs Python list: sum of squares
n = 1_000_000

# Python list
py_list = list(range(n))
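start = time.perf_counter()   # perf_counter is better suited to timing than time.time
result = sum(x ** 2 for x in py_list)
print(f"Python list: {time.perf_counter() - start:.4f}s")
# ~0.15s (varies by machine)

# NumPy
np_arr = np.arange(n)
start = time.perf_counter()
result = (np_arr ** 2).sum()
print(f"NumPy: {time.perf_counter() - start:.4f}s")
# ~0.005s, roughly 30× faster in this run; exact timings vary by machine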
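
A single timed run can be noisy. As a sketch of a slightly more robust measurement, the standard library's `timeit.repeat` can run each version several times and keep the best result. The repeat count of 3 is an arbitrary choice, and no specific timings are claimed here.

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)   # explicit int64 avoids overflow

def list_sum_of_squares():
    return sum(x * x for x in py_list)

def numpy_sum_of_squares():
    return int((np_arr ** 2).sum())

# Both versions should agree before their speed is compared.
py_result = list_sum_of_squares()
np_result = numpy_sum_of_squares()

# Run each version 3 times and keep the fastest run.
list_time = min(timeit.repeat(list_sum_of_squares, number=1, repeat=3))
numpy_time = min(timeit.repeat(numpy_sum_of_squares, number=1, repeat=3))

print(f"Python list: {list_time:.4f}s")
print(f"NumPy:       {numpy_time:.4f}s")
```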

Practice

# 1. Create a 5×5 array of random integers
# 2. Calculate the mean of each column
# 3. Replace all values > 50 with 0
# 4. Create a 3D array and reshape to 2D
# 5. Use broadcasting to normalize (0-1) each column
