
Python Pandas Module Tutorial

Beginner-friendly pandas tutorial covering installation, Series/DataFrame, CSV import, inspection, selection (iloc/loc), grouping, merging, concatenation, and common operations with examples.

Drake Nguyen

Founder · System Architect

• Feb. 24, 2026, 11:55 a.m. • 3 min read

Overview: pandas tutorial for beginners

This pandas tutorial introduces the core features of the Python pandas library for working with tabular data. You'll learn how to install pandas, use its primary data structures (Series and DataFrame), import CSV files, inspect and manipulate data, and perform common data-wrangling operations such as groupby, merge, and concat. The examples use idiomatic pandas code so you can follow along.

Installing pandas

To start using pandas, install it with your preferred package manager. For pip-based installs (recommended for many users):

python -m pip install pandas

For users with conda environments:

conda install pandas

Running pip as python -m pip installs pandas into the interpreter that the python command refers to, which matters when several Python versions are installed. After installation, import pandas and NumPy in your script:

import pandas as pd
import numpy as np

Key pandas data structures

pandas Series: a one-dimensional labeled array for homogeneous data (similar to a column).
pandas DataFrame: a two-dimensional, tabular structure with labeled axes (rows and columns); supports heterogeneous column types.
Panel: the older 3-D structure was removed in pandas 0.25; modern workflows use a DataFrame with a MultiIndex instead.
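To make the difference between the two structures concrete, here is a minimal sketch. The variable names (temps, frame, column) are illustrative only, and the values are borrowed from the sample data used later in this tutorial.

```python
import pandas as pd

# A Series: one-dimensional, labeled, with a single dtype for all values
temps = pd.Series([33, 30, 29], index=["a", "b", "c"], name="AvgHighC")

# A DataFrame: two-dimensional, and each column may have its own dtype
frame = pd.DataFrame({"Capital": ["Mumbai", "Bengaluru"], "AvgHighC": [30, 29]})

print(temps.dtype)    # one dtype for the whole Series
print(frame.dtypes)   # one dtype per column

# Selecting a single column of a DataFrame gives back a Series
column = frame["AvgHighC"]
print(type(column).__name__)
```

In practice you can think of a DataFrame as a collection of Series that share the same row index.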

Creating a DataFrame

You can construct a DataFrame from lists, dictionaries, NumPy arrays, or other DataFrames. The constructor signature is roughly:

pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Example: create a DataFrame from a list of dictionaries:

df = pd.DataFrame([
    {"State": "Andhra Pradesh", "Capital": "Hyderabad", "Literacy": 89, "AvgHighC": 33},
    {"State": "Maharashtra", "Capital": "Mumbai", "Literacy": 77, "AvgHighC": 30},
    {"State": "Karnataka", "Capital": "Bengaluru", "Literacy": 82, "AvgHighC": 29}
])
print(df)
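The other input types mentioned above work in a similar way. The sketch below builds one DataFrame from a dictionary of lists and another from a NumPy array. The column names a and b and the row labels x, y, z are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# Dictionary of lists: keys become column names, lists become column values
from_dict = pd.DataFrame({
    "State": ["Maharashtra", "Karnataka"],
    "Literacy": [77, 82],
})

# 2-D NumPy array: supply column and row labels explicitly
values = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
from_array = pd.DataFrame(values, columns=["a", "b"], index=["x", "y", "z"])

print(from_dict)
print(from_array)
```

If you omit index, pandas assigns a default integer index starting at 0.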

Read CSV into DataFrame

Loading tabular data from a CSV file into a DataFrame is a common first step. pd.read_csv parses the file and returns a DataFrame:

# read cities.csv (assumed to be in the working directory) into a DataFrame
data = pd.read_csv('cities.csv')
print(data.head())
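If you don't have a cities.csv file at hand, you can try read_csv on an in-memory string. The sketch below is self-contained. The column names and numbers in csv_text are made-up placeholders, not real data. It also shows two commonly used read_csv parameters, usecols and dtype.

```python
import io

import pandas as pd

# Placeholder CSV content; the values are invented for illustration only
csv_text = """city,state,population
Mumbai,Maharashtra,100
Bengaluru,Karnataka,200
"""

# read_csv accepts a path or any file-like object
data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["city", "population"],   # load only these columns
    dtype={"population": "int64"},    # set the column type explicitly
)
print(data.head())
```

Replacing io.StringIO(csv_text) with a file path gives the same result for a file on disk.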

Inspecting a DataFrame

When working with large datasets, these helpers are essential:

df.head(n) — first n rows
df.tail(n) — last n rows
df.dtypes — column data types
df.index, df.columns, df.values — index, column labels, and raw values
df.describe() — statistical summary (count, mean, std, min, max, percentiles)
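Here is a short sketch that runs these helpers on a small frame built from the sample data above. It also uses df.info() and to_numpy(), which the pandas documentation recommends over df.values when you need the underlying array.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Andhra Pradesh", "Maharashtra", "Karnataka"],
    "Literacy": [89, 77, 82],
    "AvgHighC": [33, 30, 29],
})

print(df.head(2))    # first two rows
print(df.tail(1))    # last row
print(df.dtypes)     # dtype of each column
df.info()            # row count, non-null counts, dtypes, memory usage

# describe() summarizes only the numeric columns by default
summary = df.describe()
print(summary.loc[["mean", "min", "max"]])

# to_numpy() returns the underlying values as a NumPy array
arr = df[["Literacy", "AvgHighC"]].to_numpy()
print(arr.shape)
```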

Sorting and selecting

Sort rows by a column:

df.sort_values('Literacy', ascending=False)
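Note that sort_values returns a new DataFrame and leaves the original untouched unless you reassign it. The sketch below, using the sample data from earlier, also shows sorting by several columns and resetting the index afterwards. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Andhra Pradesh", "Maharashtra", "Karnataka"],
    "Literacy": [89, 77, 82],
    "AvgHighC": [33, 30, 29],
})

# Sort descending by one column; df itself is not modified
ranked = df.sort_values("Literacy", ascending=False)

# The original index labels travel with the rows; reset them if needed
ranked_reset = ranked.reset_index(drop=True)

# Sort by several columns, each with its own direction
multi = df.sort_values(["AvgHighC", "Literacy"], ascending=[True, False])

print(ranked)
print(multi)
```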

Select columns and rows:

# single column (returns Series)
capitals = df['Capital']

# multiple columns (returns DataFrame)
subset = df[['State', 'Capital']]

# row slicing by integer position
first_three = df[0:3]

# select by integer position (iloc) and by label (loc)
row2 = df.iloc[1]        # second row by position
cell = df.iloc[1, 1]     # second row, second column by position
# "row_label" is a placeholder: it must exist in df.index, otherwise loc raises KeyError
row_by_label = df.loc["row_label"]

pandas iloc vs loc explained

iloc uses integer-based indexing (like Python lists), while loc uses label-based indexing (row/column names). Use iloc when you need positional selection and loc when working with meaningful labels.
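The difference is easiest to see on a DataFrame with a meaningful index. The sketch below assumes the state names from the earlier sample are used as row labels. One detail worth knowing: an iloc slice excludes its end position, while a loc slice includes its end label.

```python
import pandas as pd

df = pd.DataFrame(
    {"Capital": ["Hyderabad", "Mumbai", "Bengaluru"], "AvgHighC": [33, 30, 29]},
    index=["Andhra Pradesh", "Maharashtra", "Karnataka"],
)

by_position = df.iloc[1]           # second row, counted from 0
by_label = df.loc["Maharashtra"]   # the same row, selected by its label

# iloc slices stop before the end position
pos_slice = df.iloc[0:2]

# loc slices include the end label
label_slice = df.loc["Andhra Pradesh":"Maharashtra"]

# loc also takes a row label and a column label together
cell = df.loc["Karnataka", "Capital"]

print(by_position)
print(label_slice)
print(cell)
```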

Filtering rows

Filter rows with boolean conditions or membership tests:

# rows where Literacy > 90 (empty for the sample df above, whose maximum is 89)
high_lit = df[df['Literacy'] > 90]

# filter multiple values using isin
sel = df[df['State'].isin(['Karnataka', 'Tamil Nadu'])]
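Conditions can also be combined. The sketch below, using the sample data from earlier, shows the element-wise operators & (and), | (or) and ~ (not), plus the equivalent DataFrame.query call. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Andhra Pradesh", "Maharashtra", "Karnataka"],
    "Literacy": [89, 77, 82],
    "AvgHighC": [33, 30, 29],
})

# Both conditions must hold
warm_and_literate = df[(df["Literacy"] > 80) & (df["AvgHighC"] >= 30)]

# Negate a membership test with ~
not_karnataka = df[~df["State"].isin(["Karnataka"])]

# query() expresses the same filter as a string
same_with_query = df.query("Literacy > 80 and AvgHighC >= 30")

print(warm_and_literate)
print(not_karnataka)
```

Using Python's plain and / or between two Series raises a ValueError, so stick to & and | in boolean masks.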

Renaming, adding and deleting columns

Rename columns in place:

df.rename(columns={'Literacy': 'Literacy_percentage'}, inplace=True)

Add a column from a Series or scalar:

df['Runrate'] = pd.Series([80, 70, 60], index=df.index)
# or set a scalar for all rows
# df['Source'] = 'survey'

Delete columns:

del df['Runrate']
# or
# df.pop('Runrate')
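The operations above modify df directly. If you would rather keep the original intact, a possible alternative is to use the variants that return a new DataFrame: rename without inplace, assign, and drop. The sketch below assumes the same sample columns. The Source column and its value are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Andhra Pradesh", "Maharashtra", "Karnataka"],
    "Literacy": [89, 77, 82],
})

# rename() without inplace=True returns a renamed copy
renamed = df.rename(columns={"Literacy": "Literacy_percentage"})

# assign() returns a copy with the new column added
with_source = renamed.assign(Source="survey")

# drop(columns=...) returns a copy without the listed columns
trimmed = with_source.drop(columns=["Source"])

print(df.columns.tolist())           # original is unchanged
print(with_source.columns.tolist())
print(trimmed.columns.tolist())
```

Because each call returns a new object, these methods can be chained into a single expression.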

Data wrangling: merge, groupby, concat

pandas offers powerful utilities for combining and aggregating data:

pd.merge(): database-style joins between two DataFrames on one or more keys.
df.groupby(): split data into groups, apply aggregate functions, and combine the results.
pd.concat(): stack DataFrames vertically (axis=0, the default) or horizontally (axis=1).

# merge two dataframes on a column (df1 and df2 are assumed to both contain Employee_id)
merged = pd.merge(df1, df2, on='Employee_id')

# groupby example and aggregation (df2 is assumed to contain Employee_name)
grouped = df2.groupby('Employee_name').size()   # or .agg({'Salary':'sum'})

# concatenate vertically
combined = pd.concat([df1, df2], ignore_index=True)
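The snippets above rely on df1 and df2 already existing. Here is a self-contained sketch with two small hypothetical frames. The Department column and all names and salary figures are invented for illustration.

```python
import pandas as pd

# Hypothetical data: df1 holds departments, df2 holds salary records
df1 = pd.DataFrame({
    "Employee_id": [1, 2, 3],
    "Department": ["Ops", "Eng", "Eng"],
})
df2 = pd.DataFrame({
    "Employee_id": [1, 2, 2, 4],
    "Employee_name": ["Asha", "Ravi", "Ravi", "Meera"],
    "Salary": [100, 200, 50, 300],
})

# Inner join (the default): only ids present in both frames survive
merged = pd.merge(df1, df2, on="Employee_id")

# Left join: keep every row of df1, filling gaps with NaN
left = pd.merge(df1, df2, on="Employee_id", how="left")

# groupby: count rows and sum salaries per employee name
counts = df2.groupby("Employee_name").size()
totals = df2.groupby("Employee_name")["Salary"].sum()

# concat: stack rows; columns missing from one frame become NaN
combined = pd.concat([df1, df2], ignore_index=True)

print(merged)
print(totals)
print(combined)
```

The how parameter of pd.merge also accepts "right" and "outer", which mirror the corresponding SQL joins.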

Create DataFrame from dict of Series

You can build a DataFrame from a dictionary where each value is a Series. Indexes align automatically and missing entries become NaN:

d = {
    'Matches_played': pd.Series([400, 300, 200], index=['Sachin','Kohli','Raina']),
    'Position': pd.Series([1,2,3,4], index=['Sachin','Kohli','Raina','Dravid'])
}
df_series = pd.DataFrame(d)
print(df_series)
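To see what the alignment does to the data, the following sketch rebuilds the same frame and inspects the missing entry. Dravid has no Matches_played value, so that cell is NaN, and the column is stored as float64 because NaN is a floating-point value. Filling the gap with 0 is only an example choice.

```python
import pandas as pd

d = {
    "Matches_played": pd.Series([400, 300, 200], index=["Sachin", "Kohli", "Raina"]),
    "Position": pd.Series([1, 2, 3, 4], index=["Sachin", "Kohli", "Raina", "Dravid"]),
}
df_series = pd.DataFrame(d)

# Boolean mask of missing values in one column
missing = df_series["Matches_played"].isna()
print(missing)

# The integer inputs were upcast to float to make room for NaN
print(df_series["Matches_played"].dtype)

# Replace NaN in a specific column; returns a new DataFrame
filled = df_series.fillna({"Matches_played": 0})
print(filled)
```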

Column operations and inspecting internals

Use attributes to inspect and manipulate internals:

df.columns — column labels
df.dtypes — column types
df.values — ndarray of underlying values

Tip: When experimenting interactively, call df.head(), df.info() and df.describe() often to understand the shape and types of your data before transforming it.

Next steps

This pandas tutorial covers the essentials to get started with data analysis in Python: installing pandas, creating and inspecting DataFrames, indexing and selection (iloc vs loc), filtering, basic statistics, and data-wrangling with merge, groupby, and concat. As you progress, explore time-series methods, groupby aggregation strategies, and performance tips (using categorical dtypes, vectorized operations and avoiding Python loops).
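As a small taste of the performance tips mentioned above, here is a minimal sketch of a vectorized operation and a categorical conversion. The AvgHighF column is a made-up name for illustration, and the tiny frame is far too small to show any real speed-up.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Karnataka", "Maharashtra", "Karnataka"],
    "AvgHighC": [29, 30, 29],
})

# Vectorized arithmetic: applied to the whole column at once, no Python loop
df["AvgHighF"] = df["AvgHighC"] * 9 / 5 + 32

# Categorical dtype: stores each distinct string once plus integer codes
df["State"] = df["State"].astype("category")

print(df)
print(df["State"].cat.categories.tolist())
```

Categorical columns tend to pay off when a column has many rows but few distinct values.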

References

Consult the official pandas documentation for comprehensive guides and API details to deepen your knowledge.