Jugal kishore

Jun 14, 2025 • 2 min read

📊 Mastering Pandas: A Practical Guide to Reading, Cleaning, and Aggregating Data

Pandas is a powerful data manipulation library in Python that simplifies working with structured data. In this article, we’ll walk through three crucial topics that every data enthusiast or professional must master:

  1. Reading, Writing, and Selecting Data with Pandas

  2. Data Cleaning and Handling Missing Values in Pandas

  3. Aggregation, Grouping, and Combining Data in Pandas

Let’s dive right in! 🚀


🔍 1. Reading, Writing, and Selecting Data with Pandas: Practical Guide

✅ Reading Data

CSV (comma-separated values) is one of the most common formats for structured data. Pandas reads it with the read_csv() function.

import pandas as pd

# Reading a CSV file
df = pd.read_csv("data.csv")
print(df.head())  # First 5 rows

You can also read from Excel, JSON, and SQL databases:

# Excel
df = pd.read_excel("data.xlsx")

# JSON
df = pd.read_json("data.json")
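
For SQL sources, here is a minimal sketch using Python's built-in sqlite3 module and pd.read_sql_query(). The in-memory database, the employees table, and its rows are made up for illustration; a real project would connect to its own database instead.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database so the example is self-contained
# (the table name and rows are hypothetical)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 25), ("Bob", 31)],
)
conn.commit()

# read_sql_query runs the query and returns the result as a DataFrame
df_sql = pd.read_sql_query(
    "SELECT name, age FROM employees WHERE age > 26", conn
)
conn.close()

print(df_sql)
```

Filtering in the SQL query itself keeps the DataFrame small, which helps when the underlying table is large.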

✅ Writing Data

Save your DataFrame to various formats using:

# Save to CSV
df.to_csv("output.csv", index=False)

# Save to Excel
df.to_excel("output.xlsx", index=False)
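
To see what index=False does without touching the disk, the sketch below writes a small, made-up DataFrame to an in-memory text buffer and reads it back. The names df_out, buffer, and df_back are only for this illustration.

```python
import io

import pandas as pd

df_out = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 31]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```

Without index=False, the CSV would gain an extra unnamed first column holding the row labels, which then shows up as a stray column when the file is read again.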

✅ Selecting Data

Accessing Columns

df['column_name']
df[['col1', 'col2']]

Accessing Rows

df.loc[0]     # By index label (the row whose label is 0)
df.iloc[0]    # By integer position (the first row)
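
The difference between loc and iloc only becomes visible when the index is not the default 0, 1, 2. A small sketch, using a made-up DataFrame with the labels 10, 20, 30:

```python
import pandas as pd

# Hypothetical data with a non-default index
df_idx = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 31, 40]},
    index=[10, 20, 30],
)

by_label = df_idx.loc[10]     # row whose index label is 10
by_position = df_idx.iloc[0]  # first row, whatever its label is
last_row = df_idx.iloc[-1]    # negative positions count from the end

print(by_label["name"], by_position["name"], last_row["name"])

# df_idx.loc[0] would raise a KeyError here, because no row is labeled 0
```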

Filtering Rows

# All rows where age > 25
df[df['age'] > 25]
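
Filters can be combined. The sketch below uses a made-up people DataFrame; note that each condition needs its own parentheses and that pandas uses & and | rather than Python's and / or.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [22, 31, 40],
    "city": ["Pune", "Delhi", "Pune"],
})

# Both conditions must hold: older than 25 AND living in Pune
mask = (people["age"] > 25) & (people["city"] == "Pune")

# loc accepts the boolean mask for rows and a list of columns to keep
result = people.loc[mask, ["name", "age"]]
print(result)
```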

🧹 2. Data Cleaning and Handling Missing Values in Pandas

✅ Identifying Missing Data

df.isnull().sum()
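
As a quick illustration on made-up data, isnull() returns a table of True/False values, so sum() counts the missing entries per column and mean() gives the share of missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in both columns
sample = pd.DataFrame({
    "age": [25, np.nan, 30, np.nan],
    "city": ["Pune", "Delhi", None, "Goa"],
})

missing_counts = sample.isnull().sum()   # number of missing values per column
missing_share = sample.isnull().mean()   # fraction of missing values per column

print(missing_counts)
print(missing_share)
```

Here age has 2 missing values out of 4 (a share of 0.5) and city has 1 (a share of 0.25).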

✅ Dropping Missing Data

df.dropna(inplace=True)  # Drop rows with any missing values

You can also drop rows only when specific columns have missing values:

df.dropna(subset=['column1'], inplace=True)
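
dropna() has a few more switches worth knowing: axis, how, and thresh. A small sketch on a made-up DataFrame (these calls return new DataFrames and leave raw unchanged):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "c" is entirely empty, row 1 is entirely empty
raw = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [np.nan, np.nan, np.nan],
})

# Drop columns in which every value is missing
no_empty_cols = raw.dropna(axis=1, how="all")

# Drop rows in which every value is missing
no_empty_rows = raw.dropna(how="all")

# Keep only rows that have at least 2 non-missing values
at_least_two = raw.dropna(thresh=2)

print(no_empty_cols)
print(no_empty_rows)
print(at_least_two)
```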

✅ Filling Missing Data

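# Option A: fill every missing value with a constant
df.fillna(0, inplace=True)

# Option B: fill one column with its mean (use this instead of Option A;
# after Option A there is nothing left to fill). Assign the result back:
# fillna(inplace=True) on a selected column may not update df in recent pandas.
df['salary'] = df['salary'].fillna(df['salary'].mean())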

✅ Replacing Data

# Replace specific values
df.replace("N/A", pd.NA, inplace=True)
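
Placeholder strings often hide inside columns that should be numeric. The sketch below is one way to handle that, on made-up survey data; it swaps in np.nan rather than pd.NA so that pd.to_numeric() can turn the column into plain floats afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical column where missing answers were typed as text
survey = pd.DataFrame({"score": ["10", "N/A", "7", "missing"]})

# Replace several placeholder strings in one call
cleaned = survey.replace(["N/A", "missing"], np.nan)

# Convert the remaining text to numbers; missing entries stay NaN
cleaned["score"] = pd.to_numeric(cleaned["score"])

print(cleaned)
print(cleaned["score"].isna().sum())
```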

🧪 Example:

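import numpy as np
import pandas as pd

# Small sample with one missing value in each column
data = {
    'name': ['Alice', 'Bob', 'Charlie', np.nan],
    'age': [25, np.nan, 30, 22],
    'salary': [50000, 60000, np.nan, 40000]
}

df = pd.DataFrame(data)

# Fill each column with its own default: a placeholder name,
# the mean age, and the median salary
df.fillna({'name': 'Unknown', 'age': df['age'].mean(), 'salary': df['salary'].median()}, inplace=True)
print(df)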

📊 3. Aggregation, Grouping, and Combining Data in Pandas Explained

✅ Aggregation

# Get summary statistics
df.describe()

# Mean of a column
df['salary'].mean()

✅ Grouping Data

# Group by department and calculate average salary
df.groupby('department')['salary'].mean()

You can also apply multiple aggregations:

df.groupby('department')['salary'].agg(['mean', 'max', 'min'])
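
When the result needs readable column names, named aggregation is a handy variant. A minimal sketch with a made-up staff DataFrame; the output names avg_salary, max_salary, and headcount are arbitrary choices.

```python
import pandas as pd

# Hypothetical sample data
staff = pd.DataFrame({
    "department": ["HR", "HR", "IT", "IT", "IT"],
    "salary": [40000, 50000, 60000, 70000, 80000],
})

# Each keyword becomes an output column: name=(source column, function)
summary = staff.groupby("department").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    headcount=("salary", "count"),
)
print(summary)
```

The result has one row per department, indexed by the department name.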

✅ Combining DataFrames

Concatenation

pd.concat([df1, df2], axis=0)
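
One detail to watch when stacking rows: concat keeps each DataFrame's original index, so labels can repeat. A short sketch with two made-up DataFrames, q1 and q2:

```python
import pandas as pd

# Hypothetical quarterly data with the same columns
q1 = pd.DataFrame({"store": ["A", "B"], "sales": [100, 200]})
q2 = pd.DataFrame({"store": ["C"], "sales": [300]})

# Default behavior: original row labels are kept (0, 1, 0)
stacked = pd.concat([q1, q2], axis=0)

# ignore_index=True renumbers the rows from 0
renumbered = pd.concat([q1, q2], axis=0, ignore_index=True)

print(stacked)
print(renumbered)
```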

Merging

pd.merge(df1, df2, on='employee_id', how='inner')

Joining (on index)

df1.join(df2, how='outer')
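
The how argument decides which keys survive. The sketch below compares inner, left, and outer on two made-up DataFrames whose employee_id values only partly overlap, and shows the equivalent index-based join.

```python
import pandas as pd

# Hypothetical data: ids 2 and 3 appear in both tables
employees = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
salaries = pd.DataFrame({
    "employee_id": [2, 3, 4],
    "salary": [60000, 55000, 70000],
})

# inner: only ids present in both tables
inner = pd.merge(employees, salaries, on="employee_id", how="inner")

# left: every employee, with NaN salary where no match exists
left = pd.merge(employees, salaries, on="employee_id", how="left")

# outer: every id from either table; indicator adds a _merge column
# saying where each row came from
outer = pd.merge(
    employees, salaries, on="employee_id", how="outer", indicator=True
)

# join works on the index, so move the key into the index first
joined = employees.set_index("employee_id").join(
    salaries.set_index("employee_id"), how="outer"
)

print(inner)
print(left)
print(outer)
print(joined)
```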

🧪 Example:

df_sales = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'sales': [1000, 1500, 2000]
})

df_region = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'region': ['North', 'East', 'West']
})

# Merge both DataFrames
df_merged = pd.merge(df_sales, df_region, on='store')
print(df_merged)

# Group by region and get total sales
print(df_merged.groupby('region')['sales'].sum())

✍️ Final Thoughts

Pandas is a must-know for any data analyst or backend developer dealing with structured data. Mastering these three skills (reading and writing data, cleaning it, and aggregating it) will make you faster and more confident when you build or debug data pipelines.
