
Pandas is a powerful data manipulation library in Python that simplifies working with structured data. In this article, we’ll walk through three crucial topics that every data enthusiast or professional must master:
Reading, Writing, and Selecting Data with Pandas
Data Cleaning and Handling Missing Values in Pandas
Aggregation, Grouping, and Combining Data in Pandas
Let’s dive right in! 🚀
Reading, Writing, and Selecting Data with Pandas

The most common format for structured data is CSV (comma-separated values). Pandas provides the read_csv() function to load such a file into a DataFrame.
import pandas as pd
# Reading a CSV file
df = pd.read_csv("data.csv")
print(df.head())  # First 5 rows

You can also read from Excel, JSON, and SQL databases:
# Excel
df = pd.read_excel("data.xlsx")
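
Reading from a SQL database follows the same pattern. The sketch below is one possible way to do it, assuming an in-memory SQLite database so it runs without any external files; the employees table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database (an assumption, so the sketch needs no files).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 25), ("Bob", 31)],
)
conn.commit()

# Run a query and load the result set straight into a DataFrame.
df_sql = pd.read_sql_query(
    "SELECT name, age FROM employees WHERE age > 30", conn
)
print(df_sql)

conn.close()
```

For other database engines, pandas generally expects a SQLAlchemy connection rather than a raw driver connection.
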
# JSON
df = pd.read_json("data.json")

Save your DataFrame to various formats using:
# Save to CSV
df.to_csv("output.csv", index=False)
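
To see why index=False is usually worth passing, here is a small sketch that writes the same DataFrame with and without it and reads each result back. It uses an in-memory text buffer instead of a real file, and the df_demo data is made up for illustration.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 31]})

# Default behaviour: the row index is written as an extra, unnamed column.
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the actual data columns are written.
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'name', 'age']
print(cols_without_index)  # ['name', 'age']
```
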
# Save to Excel
df.to_excel("output.xlsx", index=False)

Accessing Columns
df['column_name']     # One column, returned as a Series
df[['col1', 'col2']]  # Several columns, returned as a DataFrame

Accessing Rows
df.loc[0] # By label/index
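
The difference between label-based access with loc and position-based access with iloc only becomes visible when the index is not the default 0, 1, 2, and so on. A minimal sketch, assuming a made-up people DataFrame whose index labels are 10, 20, and 30:

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 31, 30]},
    index=[10, 20, 30],  # custom labels, not positions
)

row_by_label = people.loc[10]     # row whose index label is 10
row_by_position = people.iloc[0]  # first row, whatever its label
print(row_by_label.equals(row_by_position))  # True

# There is no row labelled 0, so a label lookup for 0 fails.
try:
    people.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)  # False
```
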
df.iloc[0] # By position

Filtering Rows
# All rows where age > 25
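df[df['age'] > 25]

Data Cleaning and Handling Missing Values in Pandas

# Count missing values in each column
df.isnull().sum()

df.dropna(inplace=True)  # Drop rows with any missing values

You can also drop rows only when specific columns are missing:
df.dropna(subset=['column1'], inplace=True)

# Fill with a constant
df.fillna(0, inplace=True)
# Fill with mean of a column. Assign the result back: calling
# fillna(..., inplace=True) on a selected column may not update df
# in recent pandas versions.
df['salary'] = df['salary'].fillna(df['salary'].mean())

# Replace specific values
df.replace("N/A", pd.NA, inplace=True)

import numpy as np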
data = {
    'name': ['Alice', 'Bob', 'Charlie', np.nan],
    'age': [25, np.nan, 30, 22],
    'salary': [50000, 60000, np.nan, 40000]
}
df = pd.DataFrame(data)
df.fillna({'name': 'Unknown', 'age': df['age'].mean(), 'salary': df['salary'].median()}, inplace=True)
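
To check that a fill like this actually removed every gap, you can compare the missing-value counts before and after. The sketch below rebuilds the same sample data under a different name (staff, chosen here only for illustration) and uses the non-inplace form of fillna, which returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", np.nan],
    "age": [25, np.nan, 30, 22],
    "salary": [50000, 60000, np.nan, 40000],
})

missing_before = staff.isnull().sum()  # one missing value per column

# Per-column fill values: a placeholder name, the mean age, the median salary.
filled = staff.fillna({
    "name": "Unknown",
    "age": staff["age"].mean(),
    "salary": staff["salary"].median(),
})

missing_after = filled.isnull().sum()  # no missing values remain

print(missing_before)
print(missing_after)
print(filled)
```
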
print(df)

Aggregation, Grouping, and Combining Data in Pandas

# Get summary statistics
df.describe()
# Mean of a column
df['salary'].mean()

# Group by department and calculate average salary
df.groupby('department')['salary'].mean()

You can also apply multiple aggregations:
df.groupby('department')['salary'].agg(['mean', 'max', 'min'])

Concatenation
pd.concat([df1, df2], axis=0)

Merging
pd.merge(df1, df2, on='employee_id', how='inner')

Joining (on index)
df1.join(df2, how='outer')

df_sales = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'sales': [1000, 1500, 2000]
})
df_region = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'region': ['North', 'East', 'West']
})
# Merge both DataFrames
df_merged = pd.merge(df_sales, df_region, on='store')
print(df_merged)
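
In this example every store appears in both DataFrames, so the default inner merge keeps all rows. As a rough sketch of what changes when the keys do not line up, the variation below swaps in a hypothetical store D that has a region but no sales, and compares how='inner' with how='outer'. The indicator=True option adds a _merge column showing where each row came from.

```python
import pandas as pd

sales = pd.DataFrame({"store": ["A", "B", "C"], "sales": [1000, 1500, 2000]})
# Store C has no region here, and store D (hypothetical) has no sales.
regions = pd.DataFrame({"store": ["A", "B", "D"], "region": ["North", "East", "South"]})

# Inner merge: only stores present in both frames survive.
inner = pd.merge(sales, regions, on="store", how="inner")

# Outer merge: all stores are kept, with NaN where a side has no match.
outer = pd.merge(sales, regions, on="store", how="outer", indicator=True)

print(inner)
print(outer)
```
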
# Group by region and get total sales
print(df_merged.groupby('region')['sales'].sum())

Pandas is a must-know for any data analyst or backend developer dealing with structured data. Mastering these concepts (reading and writing data, cleaning it, and aggregating it) will significantly boost your productivity and understanding of data pipelines.