- .loc and .iloc
- pandas.DataFrame.mean and pandas.DataFrame.std
- pandas.DataFrame.plot
- concat
- groupby/agg

🔧 Activities spaced throughout the session
The central pandas object is the DataFrame, a labelled table-like data structure. pandas builds on numpy and matplotlib, adding labels, missing-data handling, reshaping, grouping, and other functionality that simplifies data manipulation and analysis.
See the pandas docs for more information.
We have used header=None above because there is no header row in our data.
Set DataFrame.columns to assign human-readable column names.
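As a minimal sketch of these two steps, the snippet below reads a small headerless CSV and then names its columns. The inline CSV text, the three-day width, and the day_N naming scheme are illustrative assumptions, not the lesson's actual data file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a headerless CSV file: 3 patients x 3 days.
csv_text = "0,1,2\n0,2,4\n1,1,3\n"

# header=None tells pandas the first line is data, not column names,
# so the columns get the default integer labels 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Replace the default integer labels with human-readable names.
df.columns = [f"day_{i}" for i in range(df.shape[1])]

print(df)
```

Without header=None, pandas would treat the first data row as column names and the table would silently lose one patient.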
List Comprehensions
List comprehensions are a concise way to create lists. They replace longer for loops and can make code more readable. A list comprehension has the basic form [expression for item in iterable].
An equivalent for loop for the list comprehension used above would be:
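The original snippet is not reproduced here, so the sketch below assumes the comprehension built column names of the form day_0, day_1, and so on. Both the naming scheme and n_days are assumptions made for illustration.

```python
n_days = 3  # assumed number of columns, for illustration only

# List comprehension version: one expression builds the whole list.
names_comprehension = [f"day_{i}" for i in range(n_days)]

# Equivalent for loop: start with an empty list and append one item at a time.
names_loop = []
for i in range(n_days):
    names_loop.append(f"day_{i}")

print(names_loop)
```

Both versions produce the same list; the comprehension simply folds the initialise, loop, and append steps into a single line.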
F-Strings
Python f-strings format strings by embedding variables and expressions directly inside {} placeholders.
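A short sketch of the idea, using made-up values (patient and mean_value are hypothetical names chosen for this example):

```python
patient = 3
mean_value = 6.14875

# Variables and expressions go inside the braces.
label = f"day_{patient}"

# A format specifier after the colon controls presentation,
# here rounding to two decimal places.
summary = f"patient {patient}: mean = {mean_value:.2f}"

print(label)
print(summary)
```

The string must be prefixed with f; without it, the braces are kept as literal text.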
DataFrame.info() quickly shows wrong dtypes or unexpected missing values.
Use DataFrame.describe() to get summary statistics about data.
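To make the two checks concrete, here is a sketch on a tiny table with two deliberately planted problems. The data is invented for illustration and is not the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Small illustrative table; the NaN and the text column are deliberate problems.
df = pd.DataFrame(
    {
        "day_0": [0.0, 1.0, np.nan],
        "day_1": [1.0, 2.0, 3.0],
        "day_2": ["2", "4", "3"],  # numbers stored as text by mistake
    }
)

# info() prints dtypes and non-null counts: day_0 has 2 non-null values
# out of 3, and day_2 has a non-numeric dtype.
df.info()

# describe() summarises numeric columns by default, so day_2 is absent.
stats = df.describe()
print(stats)
```

A column missing from the describe() output is often the first hint that it was read with the wrong dtype.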
Tip
Clear, human-readable names avoid off-by-one mistakes.
Use single brackets for one column, double brackets for multiple columns.
Use .iloc (by position) and .loc (by label).
Use boolean masks to filter rows.
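The three selection styles above can be sketched as follows. The row labels p0, p1, p2 and the values are hypothetical, chosen only to keep the example small.

```python
import pandas as pd

df = pd.DataFrame(
    {"day_0": [0, 0, 1], "day_1": [1, 2, 1], "day_2": [2, 4, 3]},
    index=["p0", "p1", "p2"],  # hypothetical patient labels
)

one_col = df["day_1"]              # single brackets -> Series
two_cols = df[["day_0", "day_2"]]  # double brackets -> DataFrame

by_position = df.iloc[0, 1]        # first row, second column
by_label = df.loc["p1", "day_2"]   # row "p1", column "day_2"

# A boolean mask keeps only the rows where the condition is True.
mask = df["day_2"] > 2
filtered = df[mask]

print(filtered)
```

Note that .iloc counts from zero and ignores labels, while .loc uses the labels exactly as they appear in the index and columns.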
Tip
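- Filter with [] and assign with .loc[...] = value.
- axis=0 → down rows; axis=1 → across columns.
- Avoid chained assignment such as df[df["day_0"]>0]["day_1"]=... because it may write to a temporary copy instead of df.

Compute per-day mean/min/max, then plot.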
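A minimal sketch of a single-step .loc assignment and of the two axis directions, on invented data. The replacement value 10 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"day_0": [0, 2, 1], "day_1": [1, 2, 1], "day_2": [2, 4, 3]})

# One .loc call selects the rows and the column together, so the
# assignment is applied to df itself rather than to a temporary copy.
df.loc[df["day_0"] > 0, "day_1"] = 10

# axis=0 collapses down the rows: one value per column (per day).
per_day_mean = df.mean(axis=0)

# axis=1 collapses across the columns: one value per row (per patient).
per_patient_max = df.max(axis=1)

print(per_day_mean)
print(per_patient_max)
```

Calling per_day_mean.plot() would then chart the per-day values, assuming Matplotlib is installed.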
Tip
- Common reductions include mean, std, idxmax, etc.
- DataFrame.plot() is a thin wrapper around Matplotlib, which makes it fast for exploration.

We can also use assign to create new columns. This is slightly more verbose, but it returns a new DataFrame instead of modifying df in place, so df only changes if you reassign the result. It is preferred for more declarative code.
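As a sketch of the assign approach, the snippet below adds two derived columns to invented data. The column names patient_mean and peak_day are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"day_0": [0, 2, 1], "day_1": [1, 2, 1], "day_2": [2, 4, 3]})
day_cols = ["day_0", "day_1", "day_2"]

# assign returns a new DataFrame; df itself is left untouched.
with_stats = df.assign(
    patient_mean=df[day_cols].mean(axis=1),  # mean across days, per row
    peak_day=df[day_cols].idxmax(axis=1),    # label of the largest day, per row
)

print(with_stats)
```

Because the original df is unchanged, the step can be rerun or chained with other methods without side effects.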
Vectorised where possible:
Row-wise if necessary:
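The lesson's own snippets are not shown here; the following is a sketch of both styles computing the same quantity (the change from day_0 to day_2) on invented data.

```python
import pandas as pd

df = pd.DataFrame({"day_0": [0, 2, 1], "day_1": [1, 2, 1], "day_2": [2, 4, 3]})

# Vectorised: whole columns are combined in a single operation.
vectorised = df["day_2"] - df["day_0"]

# Row-wise: apply with axis=1 calls the function once per row, which is
# slower but allows arbitrary Python logic.
row_wise = df.apply(lambda row: row["day_2"] - row["day_0"], axis=1)

print(vectorised.equals(row_wise))
```

The results match, but the vectorised form is usually shorter and scales better as the table grows.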
Tip
groupby(...).agg(...) is the standard summarization pattern.
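A minimal sketch of the pattern, assuming a grouping column named source and two day columns. All of the data is invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "source": ["a", "a", "b", "b"],  # hypothetical grouping column
        "day_0": [0, 2, 1, 3],
        "day_1": [1, 3, 5, 7],
    }
)

# Split rows by source, pick the day columns, then compute two
# statistics per column in a single call.
summary = df.groupby("source")[["day_0", "day_1"]].agg(["mean", "max"])

print(summary)
```

The result has one row per group and a two-level column index, so a single cell is addressed as summary.loc["a", ("day_0", "mean")].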
Notice the row labels (patient IDs) are preserved and sorted along with the data.
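The code this remark refers to is not shown, so the sketch below assumes it sorted per-patient values with sort_values. The labels and numbers are made up.

```python
import pandas as pd

# Hypothetical per-patient means, indexed by patient ID.
means = pd.Series([3.5, 1.2, 2.8], index=["p0", "p1", "p2"])

# Sorting reorders the values and carries each label along with its value.
ranked = means.sort_values(ascending=False)

print(ranked)
```

Because labels travel with their values, ranked.index still tells you which patient each number belongs to.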
This workflow loads multiple files, labels them, combines them, summarises the result, and exports it. Here it is written step by step, with more comments than usual, so each step is clear.
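The lesson's own code is not reproduced here. The following is a sketch of such a workflow under stated assumptions: the file names and contents are invented, in-memory text stands in for files on disk, and the patient and source column names are illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-ins for several headerless CSV files on disk.
files = {
    "file_a.csv": "0,1,2\n0,2,4\n",
    "file_b.csv": "1,3,5\n1,1,3\n",
}

frames = []
for name, text in files.items():
    # Step 1: load one file; header=None because there is no header row.
    df = pd.read_csv(io.StringIO(text), header=None)

    # Step 2: give the columns readable names.
    df.columns = [f"day_{i}" for i in range(df.shape[1])]

    # Step 3: turn the row index into an explicit patient identifier.
    df = df.reset_index().rename(columns={"index": "patient"})

    # Step 4: record which file each row came from.
    df["source"] = name

    frames.append(df)

# Step 5: stack the per-file tables into one.
combined = pd.concat(frames, ignore_index=True)

# Step 6: summarise every day column per source in one call.
day_cols = [c for c in combined.columns if c.startswith("day_")]
summary = combined.groupby("source")[day_cols].agg(["mean", "max"])

# Step 7: export; a StringIO buffer stands in for a file path here.
buffer = io.StringIO()
summary.to_csv(buffer)
print(buffer.getvalue())
```

With real data, the dictionary would be replaced by a list of file paths passed to read_csv, and to_csv would receive an output path instead of the buffer.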
- reset_index ensures each patient has an explicit identifier.
- source preserves provenance across files.
- groupby("source")[day_cols].agg(...) lets you compute many stats in one call.
- to_csv makes results reusable in other tools.

- Handle missing and mistyped data (isna, dropna, type checks).
- groupby plus agg is the main pattern for grouped statistics.
- Inspect data with head(), info(), and summary statistics.

This module introduces pandas for tabular data workflows. Learners practice selecting columns, filtering rows, handling missing values, grouping data, and exporting clean summaries for downstream analysis.
The concepts in this module connect directly to practical data handling and exploration in Python.
| Submodule | Python Connection | Why It Matters |
|---|---|---|
| DataFrame Basics | pandas DataFrame | DataFrames are the core structure for real-world tables. |
| Filtering and Selection | Indexing and selecting data | Precise selection keeps analyses targeted and correct. |
| Grouped Summaries | groupby | Group-wise statistics are essential for comparison tasks. |
Attribution
This lesson is derived from materials developed by the Software Carpentry project.
The original content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://github.com/swcarpentry/python-novice-inflammation/blob/main/LICENSE.md