🧠 Activities spaced throughout the session
Every programmer encounters errors, both those who are just beginning, and those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.
Errors in Python have a very specific form, called a traceback. Let's examine one:
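Since the code cell itself is not reproduced in this copy, here is a sketch based on the original Software Carpentry example; it is laid out so that the failing print falls on line 9, and the error is caught at the end only so the script can keep running (in a notebook the bare call would show the full two-level traceback):

```python
import traceback

def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry',
    ]
    print(ice_creams[3])   # only indices 0, 1, and 2 exist

# In a notebook you would simply call favorite_ice_cream() and read
# the resulting traceback.
try:
    favorite_ice_cream()
except IndexError as err:
    caught = err
    traceback.print_exc()
```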
This particular traceback has two levels. You can determine the number of levels by looking for the number of arrows on the left hand side. In this case:
The first shows code from the cell above, with an arrow pointing to Line 11 (which is favorite_ice_cream()).
The second shows some code in the function favorite_ice_cream, with an arrow pointing to Line 9 (which is print(ice_creams[3])).
The last level is the actual place where the error occurred. The other level(s) show what function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function favorite_ice_cream. Inside this function, the program encountered an error on Line 9, when it tried to run the code print(ice_creams[3]).
Long Tracebacks
Sometimes, you might see a traceback that is very long -- sometimes they might even be 20 levels deep! This can make it seem like something horrible happened, but the length of the error message does not reflect severity; rather, it indicates that your program called many functions before it encountered the error. Most of the time, the actual place where the error occurred is at the bottom-most level, so you can skip down the traceback to the bottom.
So what error did the program actually encounter? In the last line of the traceback, Python helpfully tells us the category or type of error (in this case, it is an IndexError) and a more detailed error message (in this case, it says "list index out of range").
If you encounter an error and don't know what it means, it is still important to read the traceback closely. That way, if you fix the error, but encounter a new one, you can tell that the error changed. Additionally, sometimes knowing where the error occurred is enough to fix it, even if you don't entirely understand the message.
If you do encounter an error you don't recognize, try looking at the official documentation on errors. However, note that you may not always be able to find the error there, as it is possible to create custom errors. In that case, hopefully the custom error message is informative enough to help you figure out what went wrong.
Read the Python code and the resulting traceback below, and answer the following questions:
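The exercise code is not reproduced in this copy; the sketch below is consistent with the solution that follows (the exact message strings are assumptions), with print_message indexing a seven-item list by day number:

```python
def print_message(day):
    messages = [
        'Hello, world!',
        'Today is Tuesday!',
        'It is the middle of the week.',
        'Today is Donnerstag in German!',
        'Last day of the week!',
        'Hooray for the weekend!',
        'Aw, the weekend is almost over.',
    ]
    print(messages[day])

def print_sunday_message():
    print_message(7)   # valid indices are 0 through 6

# Caught here so the script keeps running; in a notebook the bare
# call would show the full traceback.
try:
    print_sunday_message()
except IndexError as err:
    sunday_err = err
```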
The traceback shows that the error occurred inside the function print_message. The error type is IndexError, and the message is "list index out of range". You can then infer that 7 is not the right index to use with messages.

Better errors on newer Pythons
Newer versions of Python have improved error printouts. If you are debugging errors, it is often helpful to use the latest Python version, even if you support older versions of Python.
When you forget a colon at the end of a line, accidentally add one space too many when indenting under an if statement, or forget a parenthesis, you will encounter a syntax error. This means that Python couldn't figure out how to read your program. This is similar to forgetting punctuation in English: for example, this text is difficult to read there is no punctuation there is also no capitalization why is this hard because you have to figure out where each sentence ends you also have to figure out where each sentence begins to some extent it might be ambiguous if there should be a sentence break or not.
People can typically figure out what is meant by text with no punctuation, but people are much smarter than computers. If Python doesn't know how to read the program, it will give up and inform you with an error. For example:
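In a notebook you would type the faulty definition directly and read the traceback; in this sketch the broken code is kept in a string and handed to compile() so that the surrounding script remains valid:

```python
# Missing colon at the end of the def line.
bad_code = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "    return msg\n"
)

try:
    compile(bad_code, '<example>', 'exec')
except SyntaxError as err:
    colon_err = err
    print('SyntaxError:', err.msg)
```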
Here, Python tells us that there is a SyntaxError on line 1, and even puts a little arrow in the place where there is an issue. In this case the problem is that the function definition is missing a colon at the end.
Actually, the function above has two issues with syntax. If we fix the problem with the colon, we see that there is also an IndentationError, which means that the lines in the function definition do not all have the same indentation:
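Continuing the compile()-based sketch, restoring the colon but leaving the last line over-indented raises an IndentationError instead:

```python
# Colon restored, but the return line is indented more deeply than
# the lines above it.
fixed_colon = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "        return msg\n"
)

try:
    compile(fixed_colon, '<example>', 'exec')
except IndentationError as err:
    indent_err = err
    print('IndentationError:', err.msg)
```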
Both SyntaxError and IndentationError indicate a problem with the syntax of your program, but an IndentationError is more specific: it always means that there is a problem with how your code is indented.
Tabs and Spaces
Some indentation errors are harder to spot than others. In particular, mixing spaces and tabs can be difficult to spot because they are both whitespace. In the example below, the first two lines in the body of the function some_function are indented with tabs, while the third line is indented with spaces. If you're working in a Jupyter notebook, be sure to copy and paste this example rather than trying to type it in manually because Jupyter automatically replaces tabs with spaces.
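Because this page is plain text, the sketch below spells the whitespace out as escape sequences (\t for a tab); in a notebook the two kinds of indentation would look identical on screen:

```python
# First two body lines indented with a tab, third with spaces.
mixed_indent = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

try:
    compile(mixed_indent, '<example>', 'exec')
except TabError as err:
    tab_err = err
    print('TabError:', err.msg)
```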
Visually it is impossible to spot the error. Fortunately, Python does not allow you to mix tabs and spaces.
Another very common type of error is called a NameError, and occurs when you try to use a variable that does not exist. For example:
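For instance, referencing a variable a that was never assigned (the error is caught here only so the script keeps running):

```python
try:
    print(a)   # 'a' has never been defined
except NameError as err:
    name_err = err
    print(err)
```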
Variable name errors come with some of the most informative error messages, which are usually of the form "name 'the_variable_name' is not defined".
Why does this error message occur? That's a harder question to answer, because it depends on what your code is supposed to do. However, there are a few very common reasons why you might have an undefined variable. The first is that you meant to use a string, but forgot to put quotes around it:
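A sketch of that mistake, meaning to print the string 'hello' but leaving off the quotes:

```python
try:
    print(hello)   # should have been print('hello')
except NameError as err:
    quote_err = err
    print(err)
```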
The second reason is that you might be trying to use a variable that does not yet exist. In the following example, count should have been defined (e.g., with count = 0) before the for loop:
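A sketch of that situation; the loop reads count before anything has been assigned to it:

```python
try:
    for number in range(10):
        count = count + number   # count was never initialized
    print('The count is:', count)
except NameError as err:
    count_err = err
    print(err)
```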
Finally, the third possibility is that you made a typo when you were writing your code. Let's say we fixed the error above by adding the line Count = 0 before the for loop. Frustratingly, this actually does not fix the error. Remember that variables are case-sensitive, so the variable count is different from Count. We still get the same error, because we still have not defined count:
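A sketch of the capitalization mix-up:

```python
Count = 0
try:
    for number in range(10):
        count = count + number   # lowercase 'count' is still undefined
except NameError as err:
    case_err = err
    print(err)
```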
Next up are errors having to do with containers (like lists and strings) and the items within them. If you try to access an item in a list or a string that does not exist, then you will get an error. This makes sense: if you asked someone what day they would like to get coffee, and they answered "caturday", you might be a bit annoyed. Python gets similarly annoyed if you try to ask it for an item that doesn't exist:
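For example, asking a three-item list for a fourth item:

```python
letters = ['a', 'b', 'c']
print('Letter #1 is', letters[0])
print('Letter #2 is', letters[1])
print('Letter #3 is', letters[2])

try:
    print('Letter #4 is', letters[3])   # there is no index 3
except IndexError as err:
    letter_err = err
    print(err)
```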
Here, Python is telling us that there is an IndexError in our code, meaning we tried to access a list index that did not exist.
The last type of error we'll cover today is the kind associated with reading and writing files: FileNotFoundError. If you try to read a file that does not exist, you will receive a FileNotFoundError telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an io.UnsupportedOperation error. More generally, problems with input and output manifest as OSErrors, which may show up as a more specific subclass; you can see the list in the Python docs. Each corresponds to a UNIX errno, which you can see in the error message.
One reason for receiving this error is that you specified an incorrect path to the file. For example, if I am currently in a folder called myproject, and I have a file in myproject/writing/myfile.txt, but I try to open myfile.txt, this will fail. The correct path would be writing/myfile.txt. It is also possible that the file name or its path contains a typo.
A related issue can occur if you use the 'write' flag instead of the 'read' flag. Python will not give you an error if you try to open a file for writing when the file does not exist. However, if you meant to open a file for reading, but accidentally opened it for writing, and then try to read from it, you will get an io.UnsupportedOperation error telling you that the file was not opened for reading:
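A sketch using a temporary scratch file (so the example does not touch any real data); the file is opened for writing and then read from:

```python
import io
import tempfile

with tempfile.TemporaryFile(mode='w') as handle:
    try:
        handle.read()   # opened with 'w', so reading is unsupported
    except io.UnsupportedOperation as err:
        read_err = err
        print(err)
```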
These are the most common errors with files, though many others exist. If you get an error that you've never seen before, searching the Internet for that error type often reveals common reasons why you might get that error.
Is this a SyntaxError or an IndentationError? It is a SyntaxError, for the missing (): at the end of the first line, followed by an IndentationError for the mismatch between the second and third lines. A fixed version is:
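Assuming the exercise code from the original Software Carpentry lesson, the fixed version looks like this:

```python
def another_function():
    print('Syntax errors are annoying.')
    print('But at least Python tells us about them!')
    print('So they are usually not too hard to fix.')

another_function()
```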
What kind of NameError do you think this is? In other words, is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not? There are three NameErrors: one for number being misspelled, one for message not being defined, and one for a not being in quotes.
Fixed version:
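Assuming the exercise code from the original Software Carpentry lesson, a fixed version (with message defined, number spelled consistently, and the string quoted) is:

```python
message = ''
for number in range(10):
    # use 'a' when the number is a multiple of 3, otherwise use 'b'
    if (number % 3) == 0:
        message = message + 'a'
    else:
        message = message + 'b'
print(message)
```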
This is an IndexError; the last entry is seasons[3], so seasons[4] doesn't make sense. A fixed version is:
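Assuming the seasons list from the original exercise, a fixed version is:

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is', seasons[3])
```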
Our previous lessons have introduced the basic tools of programming: variables and lists, file I/O, loops, conditionals, and functions. What they haven't done is show us how to tell whether a program is getting the right answer, and how to tell if it's still getting the right answer as we make changes to it.
To achieve that, we need to:
- write programs that check their own operation,
- write and run tests for widely used functions, and
- make sure we know what 'correct' actually means.
The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry (the kind done with lumber), the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes.
The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion's condition. If it's true, Python does nothing, but if it's false, Python halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn't positive:
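A sketch of that loop; the AssertionError is caught here so the script can report it, whereas in a notebook the bare loop would simply halt:

```python
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0

try:
    for num in numbers:
        assert num > 0.0, 'Data should only contain positive values'
        total += num
    print('total is:', total)
except AssertionError as err:
    assert_err = err
    print('AssertionError:', err)
```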
🛡️ Assertions catch mistakes early and make code safer.
Programs like the Firefox browser are full of assertions: 10-20% of the code they contain is there to check that the other 80-90% is working correctly. Broadly speaking, assertions fall into three categories:
A precondition is something that must be true at the start of a function in order for it to work correctly.
A postcondition is something that the function guarantees is true when it finishes.
An invariant is something that is always true at a particular point inside a piece of code.
For example, suppose we are representing rectangles using a tuple of four coordinates (x0, y0, x1, y1), representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:
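A sketch of such a function, following the original lesson's layout so that the line numbers cited below (6, 8, 9, 14, 20, 21) line up; the division on line 14 contains the bug discussed afterwards, and the failing call is caught so the script keeps running:

```python
def normalize_rectangle(rect):
    """Normalize a rectangle so that it is at the origin
    and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1), giving
    the lower-left and upper-right corners."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = dx / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = dx / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

# Taller than wide: the postconditions pass.
tall = normalize_rectangle((0.0, 0.0, 1.0, 5.0))
print(tall)

# Wider than tall: a postcondition is triggered.
try:
    normalize_rectangle((0.0, 0.0, 5.0, 1.0))
except AssertionError as err:
    post_err = err
    print('AssertionError:', err)
```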
The preconditions on lines 6, 8, and 9 catch invalid inputs:
The postconditions on lines 20 and 21 help us catch bugs by telling us when our calculations might have been incorrect. For example, if we normalize a rectangle that is taller than it is wide, everything seems OK:
but if we normalize one that's wider than it is tall, the assertion is triggered:
Re-reading our function, we realize that line 14 should divide dy by dx rather than dx by dy. In a Jupyter notebook, you can display line numbers by typing Ctrl+M followed by L. If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn't. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.
But assertions aren't just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.
Most good programmers follow two rules when adding assertions to their code. The first is, fail early, fail often. The greater the distance between when and where an error occurs and when it's noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.
The second rule is, turn bugs into assertions or tests. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven't regressed (i.e., haven't re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky.
An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it's given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:
Most novice programmers would solve this problem like this:
1. Write range_overlap.
2. Call it interactively on two or three different inputs.
3. If it produces the wrong answer, fix the function and re-run that test.

This clearly works (after all, thousands of scientists are doing it right now) but there's a better way:
1. Write a short function for each test.
2. Write a range_overlap function that should pass those tests.
3. If range_overlap produces any wrong answers, fix it and re-run the test functions.

Writing the tests before writing the function they exercise is called test-driven development (TDD). Its advocates believe it produces better code faster because:

1. If people write tests after writing the thing to be tested, they are subject to confirmation bias; that is, they subconsciously write tests to show that their code is correct, rather than to find errors.
2. Writing tests helps programmers figure out what the function is actually supposed to do.
We start by defining an empty function range_overlap:
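A minimal stub, enough to call but with no logic yet:

```python
def range_overlap(ranges):
    """Return common overlap among a set of [left, right] ranges."""
    pass
```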
Here are three test statements for range_overlap:
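One way to write them (the empty stub is repeated so this cell runs on its own; with no logic yet, the first assertion fails):

```python
def range_overlap(ranges):
    pass   # empty stub for now

try:
    assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)
    assert range_overlap([(2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0)
    assert range_overlap([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0)
except AssertionError:
    tests_fail = True
    print('AssertionError: the tests fail, as expected for an empty stub')
```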
The error is actually reassuring: we haven't implemented any logic in range_overlap yet, so if the tests passed, it would indicate that we've written an entirely ineffective test.
And as a bonus of writing these tests, we've implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.
Something important is missing, though. We don't have any tests for the case where the ranges don't overlap at all:
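For example, two ranges with a gap between them:

```python
# No common overlap here: the first range ends before the second begins.
non_overlapping = [(0.0, 1.0), (5.0, 6.0)]
```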
What should range_overlap do in this case: fail with an error message, produce a special value like (0.0, 0.0) to signal that there's no overlap, or something else? Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is best before we're emotionally invested in whatever we happened to write before we realized there was an issue.
And what about this case?
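Two ranges that touch at a single point:

```python
# The ranges share exactly one endpoint and nothing more.
touching = [(0.0, 1.0), (1.0, 2.0)]
```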
Do two segments that touch at their endpoints overlap or not? Mathematicians usually say "yes", but engineers usually say "no". The best answer is "whatever is most useful in the rest of our program", but again, any actual implementation of range_overlap is going to do something, and whatever it is ought to be consistent with what it does when there's no overlap at all.
Since weâre planning to use the range this function returns as the X axis in a time series chart, we decide that:
- every overlap has to have non-zero width, and
- we will return the special value None when there's no overlap.

None is built into Python and means "nothing here". (Other languages often call the equivalent value null or nil.) With that decision made, we can finish writing our last two tests:
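The last two tests, shown here together with the earlier three so the cell runs on its own; with the still-empty stub, the non-None assertions are the ones that fail:

```python
def range_overlap(ranges):
    pass   # still the empty stub

try:
    assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)
    assert range_overlap([(2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0)
    assert range_overlap([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0)
    assert range_overlap([(0.0, 1.0), (5.0, 6.0)]) is None
    assert range_overlap([(0.0, 1.0), (1.0, 2.0)]) is None
except AssertionError:
    still_failing = True
    print('AssertionError: range_overlap is still unimplemented')
```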
Again, we get an error because we haven't written our function, but we're now ready to do so:
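A first draft, close to the original lesson's version; note the hard-coded starting values, which will matter shortly:

```python
def range_overlap(ranges):
    """Return common overlap among a set of [left, right] ranges."""
    max_left = 0.0
    min_right = 1.0
    for (left, right) in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)
```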
Take a moment to think about why we calculate the left endpoint of the overlap as the maximum of the input left endpoints, and the overlap right endpoint as the minimum of the input right endpoints. We'd now like to re-run our tests, but they're scattered across three different cells. To make running them easier, let's put them all in a function:
We can now test range_overlap with a single function call:
The first test that was supposed to produce None fails, so we know something is wrong with our function. We don't know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we're initializing max_left and min_right to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: always initialize from data.
Suppose you are writing a function called average that calculates the average of the numbers in a NumPy array. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor's: can you think of a function that will pass your tests but not his/hers or vice versa?
Given a sequence of car counts, the function get_total_cars returns the total number of cars.
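The function is not reproduced in this copy; a version consistent with the solution below is:

```python
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total
```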
Explain in words what the assertions in this function check, and for each one, give an example of input that will make that assertion fail.
- The first assertion checks that the input sequence values is not empty. An empty sequence such as [] will make it fail.
- The second assertion checks that each value in the sequence can be turned into an integer. Input such as [1, 2, 'c', 3] will make it fail.
- The third assertion checks that the total of the values is greater than zero. Input such as [-10, 2, 3] will make it fail.

Once testing has uncovered problems, the next step is to fix them. Many novices do this by making more-or-less random changes to their code until it seems to produce the right answer, but that's very inefficient (and the result is usually only correct for the one case they're testing). The more experienced a programmer is, the more systematically they debug, and most follow some variation on the rules explained below.
The first step in debugging something is to know what it's supposed to do. "My program doesn't work" isn't good enough: in order to diagnose and fix problems, we need to be able to tell correct output from incorrect. If we can write a test case for the failing case (i.e., if we can assert that with these inputs, the function should produce that result) then we're ready to start debugging. If we can't, then we need to figure out how we're going to know when we've fixed things.
But writing test cases for scientific software is frequently harder than writing test cases for commercial applications, because if we knew what the output of the scientific code was supposed to be, we wouldn't be running the software: we'd be writing up our results and moving on to the next program. In practice, scientists tend to do the following:
Test with simplified data. Before doing statistics on a real data set, we should try calculating statistics for a single record, for two identical records, for two records whose values are one step apart, or for some other case where we can calculate the right answer by hand.
Test a simplified case. If our program is supposed to simulate magnetic eddies in rapidly-rotating blobs of supercooled helium, our first test should be a blob of helium that isn't rotating, and isn't being subjected to any external electromagnetic fields. Similarly, if we're looking at the effects of climate change on speciation, our first test should hold temperature, precipitation, and other factors constant.
Compare to an oracle. A test oracle is something whose results are trusted, such as experimental data, an older program, or a human expert. We use test oracles to determine if our new program produces the correct results. If we have a test oracle, we should store its output for particular cases so that we can compare it with our new results as often as we like without re-running that program.
Check conservation laws. Mass, energy, and other quantities are conserved in physical systems, so they should be in programs as well. Similarly, if we are analyzing patient data, the number of records should either stay the same or decrease as we move from one analysis to the next (since we might throw away outliers or records with missing values). If "new" patients start appearing out of nowhere as we move through our pipeline, it's probably a sign that something is wrong.
Visualize. Data analysts frequently use simple visualizations to check both the science they're doing and the correctness of their code (just as we did in the opening lesson of this tutorial). This should only be used for debugging as a last resort, though, since it's very hard to compare two visualizations automatically.
We can only debug something when it fails, so the second step is always to find a test case that makes it fail every time. The "every time" part is important because few things are more frustrating than debugging an intermittent problem: if we have to call a function a dozen times to get a single failure, the odds are good that we'll scroll past the failure when it actually occurs.
As part of this, it's always important to check that our code is "plugged in", i.e., that we're actually exercising the problem that we think we are. Every programmer has spent hours chasing a bug, only to realize that they were actually calling their code on the wrong data set, running it with the wrong configuration parameters, or using the wrong version of the software entirely. Mistakes like these are particularly likely to happen when we're tired, frustrated, and up against a deadline, which is one of the reasons late-night (or overnight) coding sessions are almost never worthwhile.
If it takes 20 minutes for the bug to surface, we can only do three experiments an hour. This means that we'll get less data in more time and that we're more likely to be distracted by other things as we wait for our program to fail, which means the time we are spending on the problem is less focused. It's therefore critical to make it fail fast.
As well as making the program fail fast in time, we want to make it fail fast in space, i.e., we want to localize the failure to the smallest possible region of code:
The smaller the gap between cause and effect, the easier the connection is to find. Many programmers therefore use a divide and conquer strategy to find bugs, i.e., if the output of a function is wrong, they check whether things are OK in the middle, then concentrate on either the first or second half, and so on.
N things can interact in N! different ways, so every line of code that isn't run as part of a test means more than one thing we don't need to worry about.
Replacing random chunks of code is unlikely to do much good. (After all, if you got it wrong the first time, you'll probably get it wrong the second and third as well.) Good programmers therefore change one thing at a time, for a reason. They are either trying to gather more information ("is the bug still there if we change the order of the loops?") or test a fix ("can we make the bug go away by sorting our data before processing it?").
Every time we make a change, however small, we should re-run our tests immediately, because the more things we change at once, the harder it is to know what's responsible for what (those N! interactions again). And we should re-run all of our tests: more than half of fixes made to code introduce (or re-introduce) bugs, so re-running all of our tests tells us whether we have regressed.
Good scientists keep track of what they've done so that they can reproduce their work, and so that they don't waste time repeating the same experiments or running ones whose results won't be interesting. Similarly, debugging works best when we keep track of what we've done and how well it worked. If we find ourselves asking, "Did left followed by right with an odd number of lines cause the crash? Or was it right followed by left? Or was I using an even number of lines?" then it's time to step away from the computer, take a deep breath, and start working more systematically.
Records are particularly useful when the time comes to ask for help. People are more likely to listen to us when we can explain clearly what we did, and we're better able to give them the information they need to be useful.
Version Control Revisited
Version control is often used to reset software to a known state during debugging, and to explore recent changes to code that might be responsible for bugs. In particular, most version control systems (e.g. git, Mercurial) have:
- a blame command that shows who last changed each line of a file;
- a bisect command that helps with finding the commit that introduced an issue.

And speaking of help: if we can't find a bug in 10 minutes, we should be humble and ask for help. Explaining the problem to someone else is often useful, since hearing what we're thinking helps us spot inconsistencies and hidden assumptions. If you don't have someone nearby to share your problem description with, get a rubber duck!
Asking for help also helps alleviate confirmation bias. If we have just spent an hour writing a complicated program, we want it to work, so we're likely to keep telling ourselves why it should, rather than searching for the reason it doesn't. People who aren't emotionally invested in the code can be more objective, which is why they're often able to spot the simple mistakes we have overlooked.
Part of being humble is learning from our mistakes. Programmers tend to get the same things wrong over and over: either they don't understand the language and libraries they're working with, or their model of how things work is wrong. In either case, taking note of why the error occurred and checking for it next time quickly turns into not making the mistake at all.
And that is what makes us most productive in the long run. As the saying goes, A week of hard work can sometimes save you an hour of thought. If we train ourselves to avoid making some kinds of mistakes, to break our code into modular, testable chunks, and to turn every assumption (or mistake) into an assertion, it will actually take us less time to produce working programs, not more.
Take a function that you have written today, and introduce a tricky bug. Your function should still run, but will give the wrong output. Switch seats with your neighbor and attempt to debug the bug that they introduced into their function. Which of the principles discussed above did you find helpful?
You are assisting a researcher with Python code that computes the Body Mass Index (BMI) of patients. The researcher is concerned because all patients seemingly have unusual and identical BMIs, despite having different physiques. BMI is calculated as weight in kilograms divided by the square of height in metres.
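The researcher's script is not reproduced in this copy; the reconstruction below is consistent with the solution that follows (the patient data values are assumptions chosen to match the BMIs given there):

```python
patients = [[70, 1.8], [80, 1.9], [150, 1.7]]

def calculate_bmi(weight, height):
    return weight / (height ** 2)

for patient in patients:
    weight, height = patients[0]
    bmi = calculate_bmi(height, weight)
    print("Patient's BMI is: %f" % bmi)
```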
Use the debugging principles in this exercise and locate problems with the code. What suggestions would you give the researcher for ensuring any later changes they make work correctly? What bugs do you spot?
- Add print statements inside the calculate_bmi function, like print('weight:', weight, 'height:', height), to make clear what the BMI is based on.
- Change print("Patient's BMI is: %f" % bmi) to print("Patient's BMI (weight: %f, height: %f) is: %f" % (weight, height, bmi)), in order to be able to distinguish bugs in the function from bugs in the loop.

The loop is not being used correctly: height and weight are always set to the first patient's data during each iteration of the loop.
The height and weight variables are reversed in the function call to calculate_bmi(...); the correct BMIs are 21.604938, 22.160665, and 51.903114.
- An error having to do with the "grammar" or syntax of the program is called a SyntaxError. If the issue has to do with how the code is indented, then it will be called an IndentationError.
- A NameError will occur when trying to use a variable that does not exist. Possible causes are that a variable definition is missing, a variable reference differs from its definition in spelling or capitalization, or the code contains a string that is missing quotes around it.
- Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an IndexError.
- Trying to read a file that does not exist will give you a FileNotFoundError. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an IOError.

This module builds debugging and defensive programming habits for reliable code. Learners interpret tracebacks, isolate failures, validate assumptions, and use systematic workflows to reduce recurring errors.
The concepts in this module connect directly to practical data handling and exploration in Python.
| Submodule | Python Connection | Why It Matters |
|---|---|---|
| Exceptions and Tracebacks | Errors and Exceptions | Reading tracebacks quickly shortens debug cycles. |
| Assertions and Contracts | assert statement | Assertions catch invalid states early. |
| Interactive Debugging | pdb debugger | Step-through debugging reveals actual runtime behavior. |
Attribution
This lesson is derived from materials developed by the Software Carpentry project.
The original content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://github.com/swcarpentry/python-novice-inflammation/blob/main/LICENSE.md