Command-Line Programs
- Use the values of command-line arguments in a program.
- Handle flags and files separately in a command-line program.
- Read data from standard input in a program so that it can be used in a pipeline.
- How can I write Python programs that will work like Unix command-line tools?
The Jupyter Notebook and other interactive tools are great for prototyping code and exploring data, but sooner or later we will want to use our program in a pipeline or run it in a shell script to process thousands of data files. In order to do that in an efficient way, we need to make our programs work like other Unix command-line tools. For example, we may want a program that reads a dataset and prints the average inflammation per patient.
This program does exactly what we want - it prints the average inflammation per patient for a given file.
$ python ../code/readings_04.py --mean inflammation-01.csv
5.45
5.425
6.1
...
6.4
7.05
5.9
We might also want to look at the minimum of the first four lines:
$ head -4 inflammation-01.csv | python ../code/readings_06.py --min
or the maximum inflammations in several files one after another:
$ python ../code/readings_04.py --max inflammation-*.csv
Our scripts should do the following:
- If no filename is given on the command line, read data from standard input.
- If one or more filenames are given, read data from them and report statistics for each file separately.
- Use the --min, --mean, or --max flag to determine what statistic to print.
To make this work, we need to know how to handle command-line arguments in a program, and understand how to handle standard input. We'll tackle these questions in turn below.
Command-Line Arguments
We are going to create a file containing our Python code, then use the bash shell to run it. Using the text editor of your choice, save the following in a text file called sys_version.py:
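A minimal sketch of such a script, consistent with the output shown in this section, might look like this (the exact contents of sys_version.py are an assumption):

```python
import sys

# sys.version is a string describing the running Python interpreter.
print('version is', sys.version)
```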
The first line imports a library called sys, which is short for "system". It defines values such as sys.version, which describes which version of Python we are running. We can run this script from the command line like this:
$ python sys_version.py
version is 3.4.3+ (default, Jul 28 2015, 13:17:50)
[GCC 4.9.3]
Create another file called argv_list.py and save the following text to it.
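One plausible version of argv_list.py, assuming it does nothing more than print the list, is:

```python
import sys

# sys.argv is the list of command-line values, starting with the script name.
print('sys.argv is', sys.argv)
```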
The strange name argv stands for "argument values". Whenever Python runs a program, it takes all of the values given on the command line and puts them in the list sys.argv so that the program can determine what they were. If we run this program with no arguments:
$ python argv_list.py
sys.argv is ['argv_list.py']
the only thing in the list is the name of our script as we typed it on the command line, which is always sys.argv[0]. If we run it with a few arguments, however:
$ python argv_list.py first second third
sys.argv is ['argv_list.py', 'first', 'second', 'third']
then Python adds each of those arguments to that magic list.
With this in hand, let's build a version of readings.py that always prints the per-patient mean of a single data file. The first step is to write a function that outlines our implementation, and a placeholder for the function that does the actual work. By convention this function is usually called main, though we can call it whatever we want:
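As a rough sketch, readings_01.py could look like the following. The body is an assumption: it relies on numpy.loadtxt and numpy.mean to compute one mean per row, and it only defines main without calling it.

```python
import sys
import numpy


def main():
    # sys.argv[0] is the script name; sys.argv[1] is the data file to read.
    script = sys.argv[0]
    filename = sys.argv[1]
    # Each row of the CSV file holds the readings for one patient.
    data = numpy.loadtxt(filename, delimiter=',')
    # axis=1 averages across the columns, giving one mean per row.
    for row_mean in numpy.mean(data, axis=1):
        print(row_mean)
```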
$ cat ../code/readings_01.py
This function gets the name of the script from sys.argv[0], because that's where it's always put, and the name of the file to process from sys.argv[1]. Here's a simple test:
$ python ../code/readings_01.py inflammation-01.csv
There is no output because we have defined a function, but haven't actually called it. Let's add a call to main:
$ cat ../code/readings_02.py
and run that:
$ python ../code/readings_02.py inflammation-01.csv
5.45
5.425
6.1
5.9
5.55
6.225
5.975
6.65
6.625
6.525
6.775
5.8
6.225
5.75
5.225
6.3
6.55
5.7
5.85
6.55
5.775
5.825
6.175
6.1
5.8
6.425
6.05
6.025
6.175
6.55
6.175
6.35
6.725
6.125
7.075
5.725
5.925
6.15
6.075
5.75
5.975
5.725
6.3
5.9
6.75
5.925
7.225
6.15
5.95
6.275
5.7
6.1
6.825
5.975
6.725
5.7
6.25
6.4
7.05
5.9
Handling Multiple Files
The next step is to teach our program how to handle multiple files. Since 60 lines of output per file is a lot to page through, we'll start by using three smaller files, each of which has three days of data for two patients:
$ ls small-*.csv
small-01.csv small-02.csv small-03.csv
$ cat small-01.csv
0,0,1
0,1,2
$ python ../code/readings_02.py small-01.csv
0.333333333333
1.0
Using small data files as input also allows us to check our results more easily: here, for example, we can see that our program is calculating the mean correctly for each line, whereas we were really taking it on faith before. This is yet another rule of programming: test the simple things first.
We want our program to process each file separately, so we need a loop that executes once for each filename. If we specify the files on the command line, the filenames will be in sys.argv, but we need to be careful: sys.argv[0] will always be the name of our script, rather than the name of a file. We also need to handle an unknown number of filenames, since our program could be run for any number of files.
The solution to both problems is to loop over the contents of sys.argv[1:]. The "1" tells Python to start the slice at location 1, so the program's name isn't included; since we've left off the upper bound, the slice runs to the end of the list, and includes all the filenames. Here's our changed program readings_03.py:
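A sketch of what this version might contain, assuming the same numpy-based row means as before and changing only the loop over sys.argv[1:]:

```python
import sys
import numpy


def main():
    script = sys.argv[0]
    # sys.argv[1:] skips the script name and keeps every filename given.
    for filename in sys.argv[1:]:
        data = numpy.loadtxt(filename, delimiter=',')
        for row_mean in numpy.mean(data, axis=1):
            print(row_mean)
```

Saved as a script, the file would end with a call to main(). If no filenames are given, the slice is empty and the loop body never runs.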
$ cat ../code/readings_03.py
and here it is in action:
$ python ../code/readings_03.py small-01.csv small-02.csv
0.333333333333
1.0
13.6666666667
11.0
Handling Command-Line Flags
The next step is to teach our program to pay attention to the --min, --mean, and --max flags. These always appear before the names of the files, so we could do this:
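One way this could be written is sketched below. It assumes the flag is always sys.argv[1] and the filenames follow it, and it uses numpy.min, numpy.mean, and numpy.max along axis=1; the real readings_04.py may differ in detail.

```python
import sys
import numpy


def main():
    script = sys.argv[0]
    action = sys.argv[1]       # assumed to be --min, --mean, or --max
    filenames = sys.argv[2:]   # everything after the flag

    for filename in filenames:
        data = numpy.loadtxt(filename, delimiter=',')

        # No validation yet: an unrecognized flag is not caught up front.
        if action == '--min':
            values = numpy.min(data, axis=1)
        elif action == '--mean':
            values = numpy.mean(data, axis=1)
        elif action == '--max':
            values = numpy.max(data, axis=1)

        for val in values:
            print(val)
```

Saved as a script, the file would end with a call to main().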
$ cat ../code/readings_04.py
This works:
$ python ../code/readings_04.py --max small-01.csv
1.0
2.0
but there are several things wrong with it:
- main is too large to read comfortably.
- If we give only one additional argument on the command line instead of at least two (one for the flag and one for the filename), the program does not throw an exception but runs anyway. It treats sys.argv[1] as the action, even if it is a filename, and assumes that the file list is empty. Silent failures like this are always hard to debug.
- The program should check if the submitted action is one of the three recognized flags.
This version pulls the processing of each file out of the loop into a function of its own. It also checks that action is one of the allowed flags before doing any processing, so that the program fails fast:
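A sketch of this refactoring follows. The function name process matches the one used later in this lesson, but the assertion message and the exact layout are assumptions, so line counts may differ from the original file.

```python
import sys
import numpy


def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]
    # Fail fast: refuse to run with a flag we do not recognize.
    assert action in ['--min', '--mean', '--max'], \
        'Action is not one of --min, --mean, or --max: ' + action
    for filename in filenames:
        process(filename, action)


def process(filename, action):
    # Load one data set and print the requested statistic for each row.
    data = numpy.loadtxt(filename, delimiter=',')

    if action == '--min':
        values = numpy.min(data, axis=1)
    elif action == '--mean':
        values = numpy.mean(data, axis=1)
    elif action == '--max':
        values = numpy.max(data, axis=1)

    for val in values:
        print(val)
```

Saved as a script, the file would end with a call to main().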
$ cat ../code/readings_05.py
This is four lines longer than its predecessor, but broken into more digestible chunks of 8 and 12 lines.
Handling Standard Input
The next thing our program has to do is read data from standard input if no filenames are given so that we can put it in a pipeline, redirect input to it, and so on. Let's experiment in another script called count_stdin.py:
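A minimal sketch of such a script is shown here. The helper count_lines is a hypothetical name; the counting is wrapped in a function so that the same logic can be tried on any open file, not only sys.stdin.

```python
import sys


def count_lines(stream):
    # Iterating over an open file (or sys.stdin) yields one line at a time.
    count = 0
    for line in stream:
        count += 1
    return count


def main():
    # sys.stdin is already open when the program starts.
    print(count_lines(sys.stdin), 'lines in standard input')
```

Saved as a script, the file would end with a call to main().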
$ cat ../code/count_stdin.py
This little program reads lines from a special "file" called sys.stdin, which is automatically connected to the program's standard input. We don't have to open it (Python and the operating system take care of that when the program starts up), but we can do almost anything with it that we could do to a regular file. Let's try running it as if it were a regular command-line program:
$ python ../code/count_stdin.py < small-01.csv
2 lines in standard input
A common mistake is to try to run something that reads from standard input like this:
$ python ../code/count_stdin.py small-01.csv
i.e., to forget the < character that redirects the file to standard input. In this case, there's nothing in standard input, so the program waits at the start of the loop for someone to type something on the keyboard. In the Notebook there is no way for us to do this, so our program is stuck and we have to halt it using the Interrupt option from the Kernel menu; in a shell, we can halt it with Ctrl-C.
We now need to rewrite the program so that it loads data from sys.stdin if no filenames are provided. Luckily, numpy.loadtxt can handle either a filename or an open file as its first parameter, so we don't actually need to change process. Only main changes:
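The change to main could be sketched as follows. The process function is repeated so that the example is self-contained; the branch on len(filenames) is an assumption about how the original script chooses between standard input and named files.

```python
import sys
import numpy


def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]
    assert action in ['--min', '--mean', '--max'], \
        'Action is not one of --min, --mean, or --max: ' + action
    if len(filenames) == 0:
        # No filenames: read the data from standard input instead.
        process(sys.stdin, action)
    else:
        for filename in filenames:
            process(filename, action)


def process(source, action):
    # source may be a filename or an open file such as sys.stdin.
    data = numpy.loadtxt(source, delimiter=',')

    if action == '--min':
        values = numpy.min(data, axis=1)
    elif action == '--mean':
        values = numpy.mean(data, axis=1)
    elif action == '--max':
        values = numpy.max(data, axis=1)

    for val in values:
        print(val)
```

Saved as a script, the file would end with a call to main().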
$ cat ../code/readings_06.py
Let's try it out:
$ python ../code/readings_06.py --mean < small-01.csv
0.333333333333
1.0
That's better. In fact, that's done: the program now does everything we set out to do.
Arithmetic on the Command Line
Write a Python program that adds, subtracts, multiplies, or divides two numbers provided on the command line:
$ python arith.py --add 1 2
3.0
$ python arith.py --subtract 3 4
-1.0
Finding Particular Files
Using the glob module introduced earlier, write a simple version of ls that shows files in the current directory with a particular suffix. A call to this script should look like this:
$ python my_ls.py py
left.py
right.py
zero.py
Changing Flags
Rewrite readings.py so that it uses -n, -m, and -x instead of --min, --mean, and --max respectively. Is the code easier to read? Is the program easier to understand?
Adding a Help Message
Separately, modify readings.py so that if no parameters are given (i.e., no action is specified and no filenames are given), it prints a message explaining how it should be used.
Adding a Default Action
Separately, modify readings.py so that if no action is given it displays the means of the data.
A File-Checker
Write a program called check.py that takes the names of one or more inflammation data files as arguments and checks that all the files have the same number of rows and columns. What is the best way to test your program?
Counting Lines
Write a program called line_count.py that works like the Unix wc command:
- If no filenames are given, it reports the number of lines in standard input.
- If one or more filenames are given, it reports the number of lines in each, followed by the total number of lines.
Generate an Error Message
Write a program called check_arguments.py that prints usage then exits the program if no arguments are provided. (Hint: You can use sys.exit() to exit the program.)
$ python check_arguments.py
usage: python check_arguments.py filename.txt
$ python check_arguments.py filename.txt
Thanks for specifying arguments!
- The sys library connects a Python program to the system it is running on.
- The list sys.argv contains the command-line arguments that a program was run with.
- Avoid silent failures.
- The pseudo-file sys.stdin connects to a program's standard input.