Pandas Merge, Join, and Concatenate — Combining DataFrames
In this tutorial, you'll learn about Pandas Merge, Join, and Concatenate. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
What You'll Learn
Combine multiple DataFrames using merge, join, and concat — SQL-style joins (inner, outer, left, right), concatenating rows and columns, and handling mismatched data.
Why It Matters
Real-world data comes from multiple sources. Merging and joining is how you combine customer tables with orders, or sales data with product information.
Real-World Use
Joining a customer table with an orders table on customer_id, appending monthly sales reports into one DataFrame, or combining product details with inventory levels.
Concatenation — Stacking DataFrames
pd.concat() stacks DataFrames vertically (rows) or horizontally (columns):
Vertical (Appending Rows)
import pandas as pd
df1 = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Charlie", "Diana"]})
# Stack rows
combined = pd.concat([df1, df2], ignore_index=True)
# id name
# 0 1 Alice
# 1 2 Bob
# 2 3 Charlie
# 3 4 Diana
Horizontal (Adding Columns)
df_left = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
df_right = pd.DataFrame({"age": [25, 30, 35], "salary": [50000, 60000, 70000]})
combined = pd.concat([df_left, df_right], axis=1)
# id name age salary
# 0 1 Alice 25 50000
# 1 2 Bob 30 60000
# 2 3 Charlie 35 70000
Merge — SQL-Style Joins
pd.merge() combines DataFrames based on a key column:
customers = pd.DataFrame({
"customer_id": [1, 2, 3, 4],
"name": ["Alice", "Bob", "Charlie", "Diana"],
"city": ["NYC", "LA", "Chicago", "NYC"],
})
orders = pd.DataFrame({
"order_id": [101, 102, 103, 104],
"customer_id": [1, 2, 2, 5],
"amount": [100, 200, 150, 300],
})
Inner Join
Only rows with matching keys in both DataFrames:
result = pd.merge(customers, orders, on="customer_id", how="inner")
# customer_id name city order_id amount
# 0 1 Alice NYC 101 100
# 1 2 Bob LA 102 200
# 2 2 Bob LA 103 150
Left Join
All rows from left (customers), matching rows from right:
result = pd.merge(customers, orders, on="customer_id", how="left")
# customer_id name city order_id amount
# 0 1 Alice NYC 101.0 100.0
# 1 2 Bob LA 102.0 200.0
# 2 2 Bob LA 103.0 150.0
# 3 3 Charlie Chicago NaN NaN
# 4 4 Diana NYC NaN NaN
Right Join
All rows from right, matching rows from left:
result = pd.merge(customers, orders, on="customer_id", how="right")
# customer_id name city order_id amount
# 0 1.0 Alice NYC 101 100
# 1 2.0 Bob LA 102 200
# 2 2.0 Bob LA 103 150
# 3 5.0 NaN NaN 104 300
Outer Join
All rows from both DataFrames:
result = pd.merge(customers, orders, on="customer_id", how="outer")
# customer_id name city order_id amount
# 0 1.0 Alice NYC 101.0 100.0
# 1 2.0 Bob LA 102.0 200.0
# 2 2.0 Bob LA 103.0 150.0
# 3 3.0 Charlie Chicago NaN NaN
# 4 4.0 Diana NYC NaN NaN
# 5 5.0 NaN NaN 104.0 300.0
Merge on Different Column Names
customers = pd.DataFrame({
"cust_id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
})
orders = pd.DataFrame({
"order_id": [101, 102, 103],
"customer_id": [1, 2, 4],
"amount": [100, 200, 300],
})
result = pd.merge(
customers, orders,
left_on="cust_id",
right_on="customer_id",
how="left",
)
Join — Merge on Index
df1 = pd.DataFrame({"name": ["Alice", "Bob"]}, index=[1, 2])
df2 = pd.DataFrame({"age": [25, 30]}, index=[1, 2])
result = df1.join(df2)
# name age
# 1 Alice 25
# 2 Bob 30
Handling Overlapping Columns
# If both DataFrames have the same column name
df1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df2 = pd.DataFrame({"id": [1, 2], "value": [30, 40]})
# Use suffixes
result = pd.merge(df1, df2, on="id", suffixes=("_left", "_right"))
# id value_left value_right
# 0 1 10 30
# 1 2 20 40
Comparison Table
| Method | What It Does | SQL Equivalent |
|---|---|---|
concat(axis=0) |
Stack rows | UNION ALL |
concat(axis=1) |
Stack columns | — |
merge(how="inner") |
Only matching keys | INNER JOIN |
merge(how="left") |
Keep all left rows | LEFT JOIN |
merge(how="right") |
Keep all right rows | RIGHT JOIN |
merge(how="outer") |
Keep all rows | FULL OUTER JOIN |
join() |
Merge on index | — |
Practice
# 1. Create two DataFrames with a shared key column
# 2. Perform inner, left, right, and outer joins
# 3. Observe which rows are kept and which get NaN
# 4. Use concat to combine monthly data
# 5. Handle overlapping column names with suffixes
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro