Writing & Debugging Scripts

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Introduction

This session follows the previous sessions of “Approaching a Scripting Project” and “Introducing Git & GitHub” where we created our initial project workspace. Here, we do the actual coding for our tool that reads in standardized ARGOS tracking data and presents the user with the turtle’s location data for a provided date.

We approach this as a set of smaller, more manageable tasks, using Git to track our changes as we go along.

Learning Objectives

On completion of this exercise, you should be able to:

  • Initialize a script file with “front matter”
  • Split strings into a list and access items in this list
  • Read and write text files with Python’s file object
  • Create a for loop and use it to repeat a section of code
  • Create a while loop and use it to repeat a section of code
  • Apply conditional statements & comparative operators
  • Identify and debug scripting errors (syntax and logical)

Task Outline


» Setting VSCode to run code snippets in the Interactive Window

I find it’s easier to track your code and its output using VSCode’s interactive window than the terminal, so we’ll change the setting so that’s the default.

  • Open VS Code’s settings (<Ctrl>-,) and search for “jupyter execute”.
  • Check the box next to the setting “When pressing shift+enter, send selected code in a Python file to the Jupyter interactive...

» Task 1. Data prep

Coding projects begin with creating your project workspace and linking it to a Git/GitHub repository – something we did in the previous sessions. We still have a few remaining data prep tasks to carry out, including creating our Python script file, and setting up our VSCode project.

  • First, ensure that you have a Git-enabled workspace on your machine that has the sara.txt data file, located in the data/raw/ subfolder and a corresponding metadata file. You should also have a README.md file. This is the exact same workspace we created in the Git/GitHub lab exercise we just completed.
  • Open your workspace in VSCode.

  • Next, create a new Python script file in your project folder, that is, a text file we’ll name ARGOSTrackingTool.py.

    • Select File>New File. Then select Python File from the dropdown.

    • Save the file as ARGOSTrackingTool.py in your project folder.

  • Add the following “front matter” to your Python script. (Feel free to modify the format and content.) This comment section provides a quick preview of what this script does, useful in case this script gets separated from its workspace for whatever reason:

    #-------------------------------------------------------------
    # ARGOSTrackingTool.py
    #
    # Description: Reads in an ARGOS tracking data file and allows
    #   the user to view the location of the turtle for a specified
    #   date entered via user input.
    #
    # Author: John Fay (john.fay@duke.edu)
    # Date:   Fall 2024
    #--------------------------------------------------------------
    
  • Save the file.

  • Finally, stage & commit your changes to your Git repository (message = “Initial commit of ARGOSTrackingTool script”) then push those changes to your GitHub repository.

» Task 2. Parse one line of tracking data

In our first coding task, we create a string variable called lineString, setting it equal to a line of ARGOS data copied from the sara.txt file and pasted into our script. Then we parse this line of data into its components (stored in a list called lineData) so that we can print out, in a readable format, information about that record.

In case you are wondering why we are starting with a line of data simply copied from the ARGOS file and pasted into our script, it’s because it simplifies the task of figuring out how to deal with a line once it’s read in. Once we nail that, we should be able to read in all the lines from the input file using the code we develop here.

  • Open your ARGOSTrackingTool.py file in VSCode (if not open already).
  • Add the lines of code below to your script, underneath the “front matter” code added in the previous step.
    # Parse Data
    # Copy and paste a line of data as the lineString variable value
    lineString = ""
      
    # Use the split command to parse the items in lineString into a list object
    lineData = lineString
      
    # Assign variables to specific items in the list
    record_id = lineData[]   # ARGOS tracking record ID
    obs_date = lineData[2]   # Observation date
    obs_lc = lineData[]      # Observation Location Class
    obs_lat = lineData[]     # Observation Latitude
    obs_lon = lineData[]     # Observation Longitude
      
    # Print information to the user
    print (f"Record {record_id} indicates Sara was seen at {obs_lat}N and {obs_lon}W on {obs_date}")
    
  • Fill in the missing code:
    • Paste in a line of data on the second line (lineString = ).
    • Split the lineString in the subsequent Python statement (lineData = ...)
    • Find and insert the correct index values for the variable assignment statements (record_id = ..., etc.).
  • Run the code to check for errors.
    :point_right: You may want to run individual lines in the interactive window and inspect variable values throughout the process to ensure the code is working as you expect.

  • Stage & commit these changes to your ARGOSTrackingTool.py script to your Git repository. (Adding whatever message you think is appropriate…)

    Link to what the code should look like after Task 2

    Explanation of the above code:

    • Line 13: Assigns the first data line in sara.txt to the string variable called lineString.
    • Line 16: The items of information included in the ARGOS data are separated by tabs. This line parses the line into a list (lineData) of data items using the string split() function on the string lineString.
    • Lines 19-23: Assigns selected items in the lineData list to their own variables to make for easier use of these values throughout the script.
    • Line 26: Prints a simple message indicating where Sara was seen given the values contained, and parsed from, the initial lineString variable.
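
To make the split-and-index idea concrete, here is a minimal sketch. The line of data below is made up for illustration (it only mimics the layout of a record, assuming whitespace-separated items); in your own script you would paste a real line copied from sara.txt.

```python
# A made-up, whitespace-delimited line laid out like an ARGOS record
# (hypothetical values; use a real line copied from sara.txt in your script)
lineString = "20616 29051 7/3/2003 9:13 3 66 33.898 -77.958"

# split() with no arguments breaks the string apart on any whitespace
# (spaces or tabs) and returns a list of strings
lineData = lineString.split()

# Items are retrieved by their zero-based position in the list
record_id = lineData[0]   # first item
obs_date = lineData[2]    # third item
obs_lat = lineData[6]     # seventh item
obs_lon = lineData[7]     # eighth item

print(f"Record {record_id}: lat {obs_lat}, lon {obs_lon} on {obs_date}")
```

Note that every item in the list is a string, even the ones that look like numbers.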

» Side Task : Python’s File Object

Now that we have a simple script working with a sample of data, let’s expand it so that, instead of having to copy and paste lines of data from the sara.txt file into our script, we can read the data directly from the text file. This is done using Python’s file object, so let’s first pause and examine this file object.

♦ Reading text files…

  • To do these steps, you need a Python terminal, i.e., one with the >>> prompt. If you don’t have one open, you can create one by running a line of code in your script using <Shift>-<Enter>.

  • In the Python terminal, type the following at the command prompt to open the sara.txt file in read mode:

    fileObj = open('data/raw/sara.txt','r')

  • With the file open, we can read (and print) a single line from the file using the readline() command:

    print(fileObj.readline())

  • Hit the up-arrow key to recall the last command you ran (the readline one above) and run it again. You’ll see that Python prints the next line in the file. It’s good to note this behavior, i.e., that once a line is read, Python moves to the next line in the file.

  • We can also store the contents of a single line as a variable:

    lineString = fileObj.readline(); print(lineString)

  • If we want to reset the “cursor” back to the first line, we can use fileObj.seek(0).

  • Alternatively, we can read the entire contents of a text file into a list of lines by replacing readline() with readlines(). Here we point the variable “lineList” to the entire list of lines, and then print the last item in that list:

    lineList = fileObj.readlines(); print(lineList[-1])

  • And when we are finished reading the file, we should close it to release Python’s hold on the file:
    fileObj.close()
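
If you would like to see the whole read sequence in one place, the sketch below walks through it. It builds a small throwaway file (a hypothetical demo.txt in a temporary folder) so that it runs without needing sara.txt; the file name and contents are assumptions made purely for illustration.

```python
import os
import tempfile

# Build a small throwaway file so the example does not depend on sara.txt
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
setup = open(tmp_path, "w")
setup.write("first\nsecond\nthird\n")
setup.close()

fileObj = open(tmp_path, "r")
line1 = fileObj.readline()      # reads 'first\n'
line2 = fileObj.readline()      # reads 'second\n': the cursor has moved on
fileObj.seek(0)                 # move the cursor back to the start of the file
lineList = fileObj.readlines()  # reads all remaining lines into a list
at_end = fileObj.readline()     # nothing left to read, so we get ''
fileObj.close()

print(lineList[-1])
```

Notice that each line keeps its trailing newline character, and that reading past the end of the file returns an empty string rather than raising an error.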

♦ Writing to text files

  • Now, we’ll create a new text file by using the open() function in write mode:
    newFile = open('newfile.txt','w')

    *Careful when writing to files as it will overwrite any existing file with that name without warning!*

  • And we can write a string to this file:
    newFile.write("Hello world\nIt's me")

  • Be sure to close it!
    newFile.close()

  • Take a look at your new file in your favorite text editor!

♦ Appending to text files…

  • Lastly, we can add to an existing text file by opening a file in append mode:
    open('newfile.txt','a').write("See what I did here")
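
One thing the examples above leave to you is remembering to close each file. A common alternative, not used elsewhere in this exercise, is Python's with statement, which closes the file automatically when the indented block ends. The sketch below writes to a throwaway path in a temporary folder (an assumption, so that it doesn't touch your project files).

```python
import os
import tempfile

# Hypothetical throwaway location for the demo file
path = os.path.join(tempfile.mkdtemp(), "newfile.txt")

# 'w' creates the file (or overwrites an existing one)
with open(path, "w") as newFile:
    newFile.write("Hello world\nIt's me\n")

# 'a' adds to the end of the existing file
with open(path, "a") as newFile:
    newFile.write("See what I did here\n")

# 'r' reads it back; read() returns the whole file as one string
with open(path, "r") as newFile:
    contents = newFile.read()

print(contents)
print(newFile.closed)  # the file was closed when the with block ended
```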

» Task 3: Read the data directly from the ARGOS file

Now that we have a handle on Python’s file object, let’s return to our ARGOSTrackingTool.py script and read the data into our script in place of just copying and pasting it in. The code snippet below can serve as a template for what’s next. You can copy and paste this into your script just after the front matter (replacing all existing code after the front matter).

#Create a variable pointing to the data file
file_name = '█'

#Create a file object from the file
file_object = open(█,'r')

#Read contents of file into a list
line_list = file_object.█

#Close the file
file_object.█

#Pretend we read one line of data from the file
lineString = line_list[█]

#Split the string into a list of data items
lineData = lineString.split()

#Extract items in list into variables
record_id = lineData[0]
obs_date = lineData[2]
obs_lc = lineData[4]
obs_lat = lineData[6]
obs_lon = lineData[7]

#Print the location of sara
print(f"Record {record_id} indicates Sara was seen at lat:{obs_lat},lon:{obs_lon} on {obs_date}")

Now, you’ll have to make the following edits (where you see the █ character) so that it runs correctly:

  1. Set the file_name variable to a string indicating the location of the sara.txt data file. This path is relative to the folder the script runs from (here, the project folder containing our script), so the full relative path will be “./data/raw/sara.txt”.
  2. Next, set the file_object variable to be a Python “file object” created by opening the file whose path is stored in the file_name variable in “read-only” mode.
  3. Apply the readlines() function to the file object to read its entire contents as a list of lines stored as the line_list variable.
  4. With all the contents of the file object extracted into a local variable, close the file object.
  5. Now, instead of assigning the lineString variable to a string that was copied from the sara.txt file and pasted in your script, assign the lineString variable to the item at index 100 of the line_list list object.
  • The remaining lines are the same as in the previous script…
  • Run the code and if it runs successfully, commit the changes to your Git repository.

Link to what the code should look like after Task 3


» Task 4a: Process all lines in the ARGOS file using a for loop

Now let’s expand on what we did above and loop through all lines in the file. To do this we’ll replace the line where we extract one line from our line list (by its index) with a “for” loop that iterates through all [data] lines and processes each just as we did the one.

  1. Replace the line lineString = line_list[100] with for lineString in line_list:.
  2. Select all lines below that and indent them (by hitting the tab key when selected in VSCode).
  3. Run the script with these changes

→ You get an error here because the first 17 or so lines are not data lines and thus cannot be parsed correctly!

One way to fix this issue is simply to skip the first 17 records that we iterate through:

  1. Alter the line for lineString in line_list: to for lineString in line_list[17:]:.
    Be sure you understand why this fixes this issue; ask if you do not.

Perhaps a more robust way to solve this issue is to ensure a line is, in fact, a data line before processing it (i.e., parsing it into values). This we can do with conditional execution, i.e. an “if” statement. If you look at the sara.txt file, you’ll notice that all the metadata lines begin with a “#”, and the one line after that begins with a “u”. So, we can add a conditional statement that inspects the first character of the lineString variable in each iteration of the for loop: if it begins with either a # or a u, then we’ll skip to the next item in line_list using the continue statement:

  1. Revert the “for” loop line to remove the [17:] from it.
  2. Just below the “for” loop line, insert code that evaluates whether the first character of lineString occurs “in” the tuple ("#","u").
  3. If the above is true: execute the continue statement to skip the lines that parse the string into variables…
  4. Run the code to see that it works without error.
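
The skip-with-continue pattern described in these steps can be sketched as follows. The three lines in line_list are made-up stand-ins for what readlines() would return (one metadata line, one header line, one data line); they are assumptions for illustration, not actual contents of sara.txt.

```python
# Made-up stand-in for the list that readlines() would return
line_list = [
    "# a metadata line\n",
    "uid tag date time lc iq lat lon\n",
    "20616 29051 7/3/2003 9:13 3 66 33.898 -77.958\n",
]

parsed = []
for lineString in line_list:
    # Skip metadata lines (start with '#') and the header line (starts with 'u')
    if lineString[0] in ("#", "u"):
        continue  # jump straight to the next item in line_list

    # Only data lines reach this point
    lineData = lineString.split()
    parsed.append((lineData[0], lineData[2]))

print(parsed)
```

Because continue jumps to the next iteration, none of the parsing code below it runs for the skipped lines.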

Link to what the code should look like after Task 4a


» Task 4b: Process all lines in the ARGOS file using a while loop

The for loop works quite well, but for it to work, Python has to store the entire contents of the text file into the computer’s memory. That’s fine in our example, but what if we had an enormous file? A while loop, combined with using the readline() function (vs readlines()) on our file object, allows us to process just one line at a time, bypassing the need to load the entire file into memory. Thus, it’s good to know how to do this…

To implement a While loop in this code:

  • Change Line 19 to lineString = file_object.readline() (We just want to read one line at a time in our while loop…)

    • You may also want to update the comment above the line…
  • Delete Lines 20, 21, & 22. (We don’t want to close the file just yet…)

  • Replace the for loop (and comment - now lines 21 & 22) with a while loop:

    #Iterate through lines
    while lineString:
    

    It may seem peculiar that our while statement does not evaluate an explicit comparison that yields True or False (e.g. while lineString != "":), but this actually works. Why? Because when the readline() function hits the end of the file, it returns an empty string, which equates to “False” when evaluated as a Boolean. Thus while lineString: will end when the last record has been read, and the loop will terminate!

    :question:What would happen if we ran the code right now??

  • At the end of all indented lines (should be line 38) , insert the following code:

      
        # Move to the next line
        lineString = file_object.readline()
      
    # Close the file
    file_object.close()
    
  • Run your code. It will likely run and produce no output, nor any error. Can you think why?

  • Click the trash can icon in the top right corner of your terminal. This interrupts the script from running.

  • Amend the bit of code that checks to see if the line read in from the sara.txt file is a data line or not to:

    if lineString[0] in ("#","u"):
        lineString = file_object.readline()
        continue
    
  • Run the code again, and if it runs correctly, commit the changes.

Link to what the code should look like after Task 4b


Explanation:

  • Instead of reading in all lines as a list, we read only one line in at a time, first outside the while loop (Line 19), and then continue reading lines within the while loop (Line 42).
  • We removed the lines that read all lines into a list object and instead just read one line of data, the first line.
  • Then, in Line 22, we initiated our While loop. You may have noted that there is no comparison to evaluate in the while statement, just the variable name. We could have used while lineString != "", but by just supplying the variable name, the While loop will run as long as the variable holds a non-empty value. And this works because when the readline() call at the end of the loop fetches the next line of data from the file and no lines are left, it returns an empty string, which, in turn, shuts down the While loop!
  • Also note that we don’t close the file object until we have completed the While loop.
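
Here is a compact sketch of the while-loop pattern explained above. To keep it self-contained, it uses io.StringIO as a stand-in for the opened file (it offers the same readline() and close() calls), and the three lines of text are made up for illustration.

```python
import io

# io.StringIO stands in for the open file; the contents are made up
file_object = io.StringIO(
    "# a metadata line\n"
    "uid tag date time lc iq lat lon\n"
    "20616 29051 7/3/2003 9:13 3 66 33.898 -77.958\n"
)

record_ids = []
lineString = file_object.readline()   # read the first line before the loop
while lineString:                     # '' (end of file) evaluates as False
    if lineString[0] in ("#", "u"):
        # Fetch the next line BEFORE skipping, or we would loop forever
        lineString = file_object.readline()
        continue
    record_ids.append(lineString.split()[0])
    # Move to the next line
    lineString = file_object.readline()

# Close the file only after the loop has finished
file_object.close()
print(record_ids)
```

If you comment out the readline() inside the if block, the loop never advances past the first metadata line, which is exactly the "runs forever with no output" behavior seen earlier.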

» Side-task: Reverting a commit

Say you actually did want to use a for loop in your analysis, not a while loop. This is where Git is handy. Unfortunately, the Git interface in VSCode is not as friendly to do this, but we can always use Git commands to manage our repository. In this case, we’ll use the git revert command to undo a specific commit in our repository. To do this, however, we’ll need to get the ID of the commit we want to revert.

  • Find the ID of the commit to revert – from GitHub.com

    • Navigate to your repository on GitHub.com
    • Below the green Code button you’ll see a listing of the number of commits you’ve made in your repository. Click that to get more info on each commit.
    • Each commit is listed with its message and on the right side an ID. Just to left of each ID (called a “SHA”) is an icon you can click to copy the SHA to the clipboard.
    • Copy the SHA associated with the While loop commit to your clipboard (for example, a7adcd81d6a737d63f6d8a4df2a7787145a2b622).
  • [Alternate] Find the ID of the commit to revert – from VSCode
    • In VSCode, open the Explorer pane by clicking the top icon in the Activity Bar
    • At the bottom of the pane is an entry called TIMELINE. Click that while your script is active.
    • This lists all the saves and commits to your file. Right click the commit associated with the While loop commit and select “Copy Commit ID”.
  • Undo the commit

    • Open a new terminal in VSCode.

    • Type git status to ensure Git is receiving commands.

    • If all is good, type git revert followed by the SHA you obtained above, and then ending with --no-edit

      git revert a7adcd81d6a737d63f6d8a4df2a7787145a2b622 --no-edit
      
    • Have a look at your code: it should have reverted back to the for loop.

  • Push your “reversion” to GitHub

    • Open the Source Control pane in VSCode and push (Sync Changes)
    • Check your GitHub site to see that the latest version has the for loop, not the while loop.

    :point_right: The git revert doesn’t actually remove the commit, but rather it creates a new commit that undoes the changes in the commit we specified. This is in line with the notion that Git really doesn’t like to lose versions…


» Task 5: Building Python dictionaries of ARGOS observations

So far, we’ve just pulled in data and printed it to the interactive window. More than likely, you’ll want to do some analysis with each observation record – which you could do with some more analytical Python statements within your while… or for… loops. Or perhaps you’ll want to perform analyses on collections of ARGOS observations, in which case you’ll want to be able to extract them from a collection.

In this task, we create a pair of Python dictionaries – one containing date information and one containing location information – with values referenced by their ARGOS record ID. These dictionaries, in turn, enable us to extract date- and location-specific observations, identified by their IDs, in subsequent analyses. This will serve as an important step in arriving at our final scripted tool, which allows a user to identify two ARGOS observations and calculate distance between them.

  • Prior to the line where the “for” loop starts, create two empty dictionaries, one called date_dict and the other called location_dict. These will hold each record’s observation date and location coordinates, respectively, and will be “keyed” by the record’s record_id value.
  • Just after the existing line where you print the status of the turtle (should currently be the last line in the script), add new code that adds items to the date_dict and location_dict dictionaries. Here, the key will be the record_id (i.e. a unique value for each line of data), and the values will be the observation date (obs_date) and a tuple of the observation latitude (obs_lat) and longitude (obs_lon), respectively.
  • Run the code and commit the changes.
  • Push your changes to the GitHub repository (on-line)
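
As a rough illustration of the idea (not the linked solution itself), the sketch below fills the two dictionaries from a short list of already-parsed records. The records list is a hypothetical stand-in for the values your loop extracts from each line, and the coordinates for the second record are made up.

```python
# Create the two empty dictionaries before the loop
date_dict = {}
location_dict = {}

# Hypothetical parsed records: (record_id, obs_date, obs_lat, obs_lon)
records = [
    ("20616", "7/3/2003", "33.898", "-77.958"),
    ("20620", "7/3/2003", "33.900", "-77.960"),  # made-up coordinates
]

for record_id, obs_date, obs_lat, obs_lon in records:
    # The record ID is the key in both dictionaries
    date_dict[record_id] = obs_date
    location_dict[record_id] = (obs_lat, obs_lon)  # a tuple of (lat, lon)

print(list(location_dict.keys())[0])
print(location_dict["20616"])
```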

Link to what the code should look like after Task 5

When complete, test how our script did. In the VSCode console…

  • Get the key for the first item in the location_dict dictionary:

    list(location_dict.keys())[0] (→ Should return '20616')

  • Get the value for that key:

    print (location_dict['20616']) (→ Should return ('33.898', '-77.958'))

  • When was record “24719” observed? (→ Should be July 25, 2003)


» Task 6: Filtering records added to our dictionary

The method used to collect ARGOS data can produce some wildly errant results, which is why the location classification attribute is included with observation data. Location classification values of 1, 2, or 3 are “acceptable” values, with other values indicating suspect records.

With this being the case, we want to omit these suspect records from being included in our location and date dictionaries. We do this with Python’s if…else… statements. Specifically, we need to insert an if... statement before adding the records to the date and location dictionaries such that only records with an obs_lc value of 1, 2, or 3 get added:

  • Insert the following code just prior to the line where values are printed & added to the dictionaries (Line 45):

        if obs_lc in ("1","2","3"):
    
  • Indent the lines that add the records to the dictionary so that they only run if the above statement is true.

  • Run the script again in the interactive window.

  • At the prompt in the interactive window, type the command print(len(date_dict)).

    • Now, only 332 (of the original 2496) records should have been added to the dictionaries…
  • If everything is working, commit your changes to your Git repository.
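
A small sketch of the filter, using made-up record IDs and location class values, is shown below. The point to notice is that obs_lc comes from split(), so it is a string and must be compared against the strings "1", "2", and "3" rather than the numbers 1, 2, and 3.

```python
# Hypothetical records: (record_id, obs_lc); the values are made up
records = [("1001", "3"), ("1002", "B"), ("1003", "1"), ("1004", "0")]

kept = {}
for record_id, obs_lc in records:
    # Only add records whose location class is "1", "2", or "3"
    if obs_lc in ("1", "2", "3"):
        kept[record_id] = obs_lc

print(len(kept))

# Comparing the string against numbers would silently match nothing
print("3" in (1, 2, 3))
```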

Link to what the code should look like after Task 6


» Task 7: Display information of a user-specified ARGOS observation

Our objectives here include:

  1. Adding code that asks the user to specify a date;
  2. Using this date to find the record number(s), i.e., the keys associated with that date - using a reverse-lookup of our dictionary;
  3. And finally, use that record number to extract the coordinates from the location dictionary.

Task 7.1 Asking the user to specify a date.

For this step, we have to learn a new Python command, specifically the input() command. Find out about this command in VSCode’s help window by activating the help window and typing “input” as the Object. The help is not super helpful. You could search the web for “Python input function” and likely find some quick tutorials, but what we do below should suffice.

  • First, delete all existing terminals and interactive windows in VSCode to clear all variables.

  • Run a single line of code in the interactive window to get the interactive prompt.

  • At the prompt, type x = input("Pick a number: ") and run the command.

    Nothing immediately appears in the terminal because Python is waiting for you to type in a number!

  • The Python console is awaiting your input. Type a number immediately after the “Pick a number:” prompt and hit enter.

    What is the value of x in the variable explorer? What is the variable’s data type?

:point_right: Python’s input() function is a quick way to have your script ask the user for input at the Python console and store the input as a variable. The variable is always stored as a string, but it can usually be converted to other data types quite easily.
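
To see what "always stored as a string" means in practice, here is a short sketch. Because input() waits for typing, the example assigns the string "42" directly as a stand-in for what input() would hand back; that substitution is an assumption made so the snippet can run unattended.

```python
# Stand-in for: x = input("Pick a number: ") with the user typing 42
x = "42"

print(type(x))   # the value is a string, not a number
print(x + x)     # so + joins the two strings together

# Convert the string when you need a number
x_int = int(x)
x_float = float(x)
print(x_int + x_int)
```

If the text can't be converted (say the user typed "forty-two"), int() raises a ValueError.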

Now we are ready to enter code asking for the user to enter a date.

  • Just after the front matter in your script add the following statement:

    # Ask the user for a date, specifying the format
    user_date = input("Enter a date (M/D/YYYY):")
    
  • Run the script and enter a date when asked.

  • Check the value of the user_date variable in the variable explorer or in the Python console.

If all checks out, great! However, as we are still developing our script, it will be a pain to have to manually enter a date each time we run the script, so instead, we’ll “hard code” a date, and then when everything is working, revert our code to ask for a date.

  • Revise the user_date line to:

    user_date = "7/3/2003" #input("Enter a date (M/D/YYYY)")
    
  • Run the script to confirm no errors. (Optionally, commit your changes to Git…)

Task 7.2 Perform a reverse lookup of dictionary keys from values

The only way to extract keys from a dictionary for items that match a specific value is to iterate through all items, inspect their values, and add the key to a list if the value matches a given criterion. We need to use this technique to find which keys in the date_dict correspond to the date values we want to search for.

To do this, we loop through all items in the dictionary and print key/value pairs where the value matches the user_date date.

  • Add the following code at the end of your script.

    # Loop through all key, value pairs in the date_dictionary
    for key, value in date_dict.items():
        #See if the date (the value) matches the user date
        if value == user_date:
            print(key,value)
    
  • Run the script to make sure the two keys returned are correct (20616 and 20620).

    This is a typical for loop except for the fact that, because date_dict.items() returns key-value tuples, we can unpack each one into separate variables: key is assigned the key and value is assigned the value of the current item in the loop.

It works! Let’s add some complexity. Instead of printing the key and value, let’s add the key to a list. Of course we have to create an empty list first.

  • Before the for loop (at Line 54), insert the following:

    #Initialize key list
    keys = []
    
  • Then, replace the print(key,value) line with: keys.append(key).

  • Run the script in the interactive console, then see if the keys list has members by printing it at the console.

If all is good, then we can add more code to extract the location using the keys in the keys list:

  • Add the following at the end of your current script, making sure it’s “dedented” so it doesn’t run as part of your for loop…

    #Reveal locations for each key in keys
    for key in keys:
        lat, lng = location_dict[key]
        print(f"On {user_date}, Sara the turtle was seen at {lat}d Lat, {lng}d Lng.")
    
  • Run the code to check that it works.
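
For reference, the reverse lookup can also be written more compactly with a list comprehension. This is an alternative style, not the approach used in the steps above, and the dictionaries here are small hypothetical stand-ins (record 99999 and the coordinates for 20620 are made up).

```python
# Small stand-in dictionaries; record 99999 and some coordinates are made up
date_dict = {"20616": "7/3/2003", "20620": "7/3/2003", "99999": "7/4/2003"}
location_dict = {
    "20616": ("33.898", "-77.958"),
    "20620": ("33.900", "-77.960"),
    "99999": ("34.000", "-78.000"),
}
user_date = "7/3/2003"

# Collect every key whose value matches the user's date
keys = [key for key, value in date_dict.items() if value == user_date]

# Look up the location for each matching key
for key in keys:
    lat, lng = location_dict[key]
    print(f"On {user_date}, Sara the turtle was seen at {lat}d Lat, {lng}d Lng.")
```

The comprehension does the same work as the loop plus keys.append(key); use whichever reads more clearly to you.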

If so, we are ready to revert our user input from “hard coded” back to interactive.

  • Alter Line 13 so that it reads: user_date = input("Enter a date (M/D/YYYY)")
  • Run the code again, entering “7/3/2003”. Is it working? If so, commit your changes to the Git repository.

Link to what the code should look like after Task 7


Handling Errors

How does our script do if we enter an improper date (recall the date we tried above was 7/3/2003)?

  • Run your script, entering My Birthday, 2023 as the date. Note any output generated.
  • Run the script again, entering 07\03\2003 as the date.

Running the script with a date that does not occur in the ARGOS dataset (or dictionaries) produces some odd output. In this case, it’s just some incomplete output, but in other cases adding the wrong input can lead to some ugly red error messages. Either way, not handling errant input and other hiccups in scripts makes them less user friendly than they should be.

To fix this we need some method of handling errors. There are two approaches to doing this: one to handle errors you may anticipate (like a missing date) and one to handle any error, including unexpected ones. Task 8 below takes the first approach.
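
As a brief aside on the catch-all approach: in Python it is usually written with a try/except block. The sketch below is only an illustration under assumed names (the describe() function and the one-record location_dict are hypothetical); it shows a lookup that reports a friendly message instead of an ugly red traceback when something goes wrong.

```python
# Hypothetical one-record dictionary for illustration
location_dict = {"20616": ("33.898", "-77.958")}

def describe(record_id):
    """Return a message for a record ID, even if the lookup fails."""
    try:
        lat, lng = location_dict[record_id]
        return f"Record {record_id}: {lat}, {lng}"
    except Exception as err:
        # Runs for any error raised inside the try block
        return f"Could not look up record {record_id!r} ({type(err).__name__})"

ok = describe("20616")
bad = describe("no-such-id")
print(ok)
print(bad)
```

Catching Exception is deliberately broad here; when you know which error to expect (such as KeyError), naming it explicitly is generally the safer habit.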


» Task 8: Handling Errors with if statements

Here we are going to insert an if statement to see if any keys were returned after doing our reverse-lookup of the date_dict values for the user supplied date.

  • Insert, just after the reverse-lookup loop (and before the loop that reveals locations), the following lines:

    # Report whether no keys were found
    if len(keys) == 0:
        print(f"Sara was not located on {user_date}")
    
  • Run the script, entering a bogus date.

Now your code adds a nice message to the user that the script ran, but found no records. But let’s now avoid running the code that [attempts to] look up the locations for our [empty] key list.

  • After the if statement we just added, insert an else: statement and indent all the existing code beneath it in your script.

We’ll see no difference in running the script, and indeed this is a somewhat trivial change to how our script functions, but you get the idea that, if we can anticipate specific errors, we can code for those errors.

  • Commit your changes to the Git repository.
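
Put together, the if/else guard looks roughly like the sketch below. Wrapping it in a function (report_locations, a hypothetical name) and returning the messages as a list are choices made here only so both branches can be tried side by side; in your script the same logic sits at the end of the file and prints directly.

```python
def report_locations(keys, user_date, location_dict):
    """Return the messages the script would print for a list of matching keys."""
    messages = []
    if len(keys) == 0:
        # Anticipated problem: no records match the date
        messages.append(f"Sara was not located on {user_date}")
    else:
        # Only look up locations when there is something to look up
        for key in keys:
            lat, lng = location_dict[key]
            messages.append(
                f"On {user_date}, Sara the turtle was seen at {lat}d Lat, {lng}d Lng."
            )
    return messages

location_dict = {"20616": ("33.898", "-77.958")}
no_match = report_locations([], "My Birthday, 2023", location_dict)
match = report_locations(["20616"], "7/3/2003", location_dict)
print(no_match)
print(match)
```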

Link to what the code should look like after Task 8


♦ Summary

Looking back

Before beginning this tutorial, it’s quite possible that, had you looked at the final ARGOSTrackingTool.py script, you would have seen nothing but an unrecognizable jumble of gobbledygook. Indeed, there’s a lot going on in this script, but through the process of developing this script bit by bit and in learning more about the fundamentals of Python, the general logic and flow of the script should now be coherent.

Furthermore, if you look back on the general process we used in developing the final script here, you may note some useful tactics for writing scripts. While the ability to “whip out” good clean scripts depends much on one’s experience and knowledge of the language, mixed in with a smidgen of artistry, some of the foundations of script writing are seen in this very exercise: First, it’s useful to map out the overall logic of a script before diving in and writing it. Second, copious comments can make a script more understandable to you and to others. Third, the use of variables - well named variables - can make your script run efficiently and elegantly.

Looking ahead

We’re still just getting started with the fundamentals of writing Python scripts. We haven’t even ventured beyond what can be done with Python’s built-in functions. What we’ll soon tinker with is the astounding library of additional plug-ins/extensions/modules/etc. that has been written for the Python language - with ESRI’s ArcPy module being one of them. So that’s what’s in store next.