Pandas cheatsheet
1. Creating a Pandas Series
import pandas as pd
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
2. Creating a Pandas DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda', 'Tom'],
'Age': [28, 24, 35, 32, 28],
'City': ['New York', 'Paris', 'Berlin', 'London', 'New York']}
df = pd.DataFrame(data, index = ["a9", "b5", "d2", "r1", "x6"])
print(df)
3. Reading and Writing to CSV
-
Reading/writing CSV:
df = pd.read_csv('data.csv') df.to_csv('output.csv', index=False)
4. Accessing Rows and Columns (Indexing and Slicing)
-
Accessing rows:
df.iloc[3] # Access by position df.loc['b5'] # Access by index label
-
Multiple indices:
df.loc[["b5", "d2"]] df.iloc[[2, 3]]
-
Slicing rows:
df.iloc[1:4:2] # Slicing from row 1 to 4, with steps of 2
-
Accessing columns:
df['Name'] # Single column -> Series df[['Name', 'Age']] # Multiple columns -> DF
5. statistics
-
Statistical info of columns:
df['Age'].mean() df['Age'].median() df['Age'].sum() df['Age'].min() df['Age'].max() df['Age'].var() df['Age'].std() df['Age'].count() # Count non-null values
6. Filtering Data
-
Single condition: Filter rows based on a single condition.
df[df['Age'] > 30]
-
Filter with multiple conditions: You can combine multiple conditions using
&
(and) or|
(or), and parentheses()
.# Example: Filter rows where Age > 30 and City is 'New York' filtered_df = df[(df['Age'] > 30) & (df['City'] == 'New York')] # Example: Filter rows where Age < 30 or City is 'Paris' filtered_df = df[(df['Age'] < 30) | (df['City'] == 'Paris')]
7. Grouping and Looping Over Groups
-
Grouping by ‘City’ and Looping Over Groups:
grouped = df.groupby('City') for city, group in grouped: print(f"\nCity: {city}") print(group)
8. Common Aggregation Functions
-
Aggregation after
groupby
:grouped['Age'].mean() grouped['Age'].sum() grouped['Age'].count() grouped['Age'].max() grouped['Age'].min() grouped['Age'].median() grouped['Age'].std() grouped['Age'].var()
9. Column Operations: Adding, Multiplying, etc.
-
Arithmetic operations on columns:
# Adding a new column by combining other columns df['Salary'] = [50000, 60000, 75000, 65000, 55000] df['Bonus'] = df['Salary'] * 0.10 # Adding, subtracting, multiplying columns df['Total Compensation'] = df['Salary'] + df['Bonus'] df['Years Until Retirement'] = 65 - df['Age']
10. Sorting Data
-
Sort by a single column:
df.sort_values(by='Age')
-
Sort by multiple columns:
df.sort_values(by=['Age', 'Salary'], ascending=[True, False])