Problem Set 3 - Writing Scripts

ENV 859 - Geospatial Data Analytics   |   Fall 2023   |   Instructor: John Fay  

The materials for this problem set are located in this zip file. Download and unpack to your V: drive, then open the folder in VS Code as you would any new coding workspace.

Part I.

The following exercises consist of editing an existing Python script (ProblemSet3_Part1.py). Open this script in VS Code and you’ll notice that the tasks are separated by lines of code starting with #%%; this notation demarcates code cells within VS Code that behave much like code cells in Jupyter Notebooks. Above each code cell, you’ll see options to run or debug that cell, but you can also run selected line(s) individually by right-clicking and selecting Run Selection/Line in Interactive Window. (Note that <shift>-<enter> runs the code cell, not the selected line(s)!)
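As a rough illustration of the cell notation described above, the sketch below shows a two-cell script. The cell titles and variable names are made up for this example and are not taken from ProblemSet3_Part1.py.

```python
#%% Cell 1: define a value
# Everything from this marker down to the next "#%%" line is one cell
greeting = "Hello from cell 1"

#%% Cell 2: use the value
# To Python itself the markers are ordinary comments, so running the
# whole file executes both cells from top to bottom
message = greeting + ", printed in cell 2"
print(message)
```

Because the markers are plain comments, the same file still runs as a normal script outside VS Code.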

NOTES:

If you are unable to fully complete any of the scripts, get as close as you can to the finished product, and I will award partial credit where it’s due. Feel free to be creative in striving for partial credit. For example, if you are unable to read in a file to get the first line, copy the first line and paste it in as a variable.

I will be paying attention to script efficiency and readability in scoring these. Your primary task is that the script executes what’s asked, but beyond that, take time to create clean, logical, and readable code. Tag each script with your name and date. Also, supply abundant comments within your script that briefly describe each major step in the script’s flow, and use descriptive variable names where appropriate.


1. (10 pts) Python syntax & string manipulation

Open ProblemSet3_Part1.py in VS Code. Debug/edit the first code cell so that it successfully prints the following three lines to the console, exactly as shown below.

[Image: PS3_Task1, the expected console output]

NOTE: Do NOT use multiline strings in your code, and strive to use as few edits as possible.


2. (15 pts) Lists and iteration

Just below the code you wrote for Task 1 in the ProblemSet3_Part1.py script, you’ll see a code cell break: #%% Task 2. Below that code break, write a brief code snippet that does the following:

  1. Assigns the string “W:\859_data\triangle” to a variable named data_folder.
    (Note: for this exercise, the data do not actually have to exist in this folder, but the value should be in the format of a valid path…)

  2. Creates a list object called data_list containing the following string objects:
    • streams.shp
    • stream_types.csv
    • naip_imagery.tif
  3. Assigns a variable named user_item and sets it to the string “roads.shp”.
  4. Adds the user_item string object to the data_list list.
  5. Loops through each item in the data_list list and, for each item, prints the full Windows path of the dataset, created by concatenating the data_folder string with the item in the data_list. The output should appear in the interactive window as follows:

[Image: PS3_Task2, the expected console output]
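One thing worth knowing when building Windows paths from strings is that Python treats the backslash as an escape character. The sketch below uses a made-up folder and made-up file names (not the ones from this task) to show how a plain string, a raw string, and a string with doubled backslashes differ.

```python
# Hypothetical folder and file names, used only for illustration
plain_folder = "C:\temp\new_data"      # "\t" and "\n" are read as a tab and a newline
raw_folder = r"C:\temp\new_data"       # the r prefix keeps every backslash literal
escaped_folder = "C:\\temp\\new_data"  # doubling each backslash also works

print(raw_folder == escaped_folder)    # True
print(raw_folder == plain_folder)      # False

# Join the folder and each file name with a single backslash
file_names = ["rivers.shp", "lakes.csv"]
full_paths = []
for file_name in file_names:
    full_path = raw_folder + "\\" + file_name
    full_paths.append(full_path)
    print(full_path)
```

Note that a raw string cannot end in a single backslash, which is why the separator above is written as "\\".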


3. (23 pts) Lists and iteration

Add a new code cell break (#%% Task 3) at the end of your current script document. Then add code that does the following:

  1. Creates an empty list variable named user_numbers.

  2. Iterates the following process three times (using the range() function and a for loop):

    • Uses the input() function to ask the user to “Enter an integer:”.

    • Adds the user supplied integer to the user_numbers list created in 3.1.

  3. Sorts the user_numbers in ascending numeric order.

  4. Prints the highest value in the user_numbers, i.e., the last value when sorted, to the interactive window.

Run your script using the numbers: 1, 100, and 20. Does it return 100? It should, but if not, try to explain in words (in a commented portion of your script code cell) what might be going wrong.

►Challenge question - 2 pts.

  • Copy your code cell above into a new code cell (“Task 3 - Challenge”)
  • Alter it so that it again sorts user_numbers, but this time in descending numeric order, and then prints the entire contents of the list. (The output should be [100, 20, 1].)
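For reference, Python offers two built-in ways to sort a list, and both accept a reverse argument. The sketch below uses a made-up list of integers (the name scores is hypothetical) to show the difference between them.

```python
# Hypothetical values, already stored as integers
scores = [3, 12, 7]

# sorted() returns a new list and leaves the original untouched
descending_copy = sorted(scores, reverse=True)

# list.sort() reorders the list in place and returns None
scores.sort()

print(descending_copy)  # [12, 7, 3]
print(scores)           # [3, 7, 12]
print(scores[-1])       # 12, the last item of an ascending list
```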

Part II.

The remaining questions deal with datasets related to Global Fishing Watch’s Daily Fishing Effort and Vessel Presence research on Global Patterns of Transshipment Behavior. Two data files from the project’s database - transshipment_vessels_20180723.csv and loitering_events_20180723.csv, as well as a READ_ME text file describing these files, should be in your ProblemSet3 workspace.

Take a look at these files and familiarize yourself with the data: both its format and what data each includes.

You will, through a series of semi-guided steps, pull the data stored in these files into Python objects that will allow us to query the data. More specifically, you will:

  • Create a dictionary listing the attributes of each vessel listed in the transshipment_vessels... file
  • Loop through the records in the loitering_events... dataset, and for each vessel observed within a defined geographic region, print information about the vessel.

4. (25 pts) Lists, dictionaries, string manipulation, and iteration

The task here is to convert the data held in the transshipment_vessels_20180723.csv into a format such that we can specify the vessel’s Maritime Mobile Service Identity or MMSI code and it will return the fleet to which the vessel belongs.

The sequence of tasks provided below will lead to that end, but you will need to add the correct code to execute the steps. Complete the following tasks in the code boxes below them.

Task 4.1: Reading in the data and displaying the column headers

Create a new Python script called ProblemSet3_Part2.py. Paste in the code below and replace the redacted areas so that it successfully:

  1. Creates a file object named fileObj by opening the transshipment_vessels_20180723.csv text file. [This is done for you.]
  2. Reads in the entire contents (i.e., all lines) of the fileObj into a list variable named lineList. Be sure to close the file once you have its contents stored in a local variable.
  3. Creates a variable called headerLineString to hold the first item in the lineList list.
  4. Prints the contents of this headerLineString variable.
#%% Task 4.1 

#Create a Python file object, i.e., a link to the file's contents
fileObj = open(file='data/raw/transshipment_vessels_20180723.csv',mode='█')

#Read the entire contents into a list object
lineList = fileObj.█()

#Release the link to the file object (now that we have all its contents)
fileObj.█() #Close the file

#Save the contents of the first line in the list of lines to the variable "headerLineString"
headerLineString = █

#Print the contents of the headerLine
print(█)

Result should be: mmsi,shipname,callsign,fleet_iso3,fleet_name,imo
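As an aside, Python also offers the with statement, which closes a file automatically instead of requiring an explicit close call. The sketch below is only an illustration of that alternative pattern: it writes a made-up three-line CSV to a temporary file and reads it back, and none of its names or data come from the problem set files.

```python
import os
import tempfile

# Hypothetical three-line CSV written to a temporary file for the demo
sample_text = "id,name,country\n1,Alpha,Norway\n2,Beta,Vanuatu\n"
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as tmp_file:
    tmp_file.write(sample_text)
    sample_path = tmp_file.name

# The with statement closes the file automatically when the block ends
with open(sample_path, mode="r") as file_obj:
    line_list = file_obj.readlines()

os.remove(sample_path)  # clean up the temporary file

header_line = line_list[0]
print(repr(header_line))  # 'id,name,country\n'
print(len(line_list))     # 3
print(file_obj.closed)    # True
```

Note that each line read this way keeps its trailing newline character, which matters when the line is later split into values.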

Task 4.2: Splitting the header string into a list of column names and extracting index values

As above, paste the code below into your Python script and replace the redacted parts so that it:

  1. Splits the contents of the first line into a list variable called headerItems. (Note that the items in this file are separated by commas…)

  2. Uses the index() function to find the indices associated with the mmsi, shipname, and fleet_name items in the headerItems list and assigns the index values to variables called mmsi_idx, name_idx, and fleet_idx, respectively.
    If you are unable to do this step, just set the mmsi_idx, name_idx, and fleet_idx variables to the numbers 0, 1, 4, respectively.

  3. Prints the value of each index value.

#%% Task 4.2

#Split the headerLineString into a list of header items
headerItems = █

#List the index of the mmsi, shipname, and fleet_name values
mmsi_idx = █
name_idx = █
fleet_idx = █

#Print the values
print(mmsi_idx,name_idx,fleet_idx)

Result should be: 0 1 4
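The split-then-index pattern can be tried out on a small string first. The sketch below uses a made-up header line (not the one from the vessels file) and hypothetical variable names; it also shows why stripping the trailing newline before splitting can matter for the last column.

```python
# Hypothetical header line, including the trailing newline a file read would leave
header_line = "id,name,country\n"

# strip() removes the newline so the last column name is clean
column_names = header_line.strip().split(",")

# list.index() returns the position of the first matching item
name_position = column_names.index("name")
country_position = column_names.index("country")

print(column_names)                     # ['id', 'name', 'country']
print(name_position, country_position)  # 1 2

# Without strip(), the last item would keep its newline
unstripped_names = header_line.split(",")
print(repr(unstripped_names[-1]))       # 'country\n'
```

Calling index() with a value that is not in the list raises a ValueError, so an unstripped last column is a common source of that error.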

Task 4.3: Iterating through the data lines and adding values to a dictionary

In the cell below, write code that:

  1. Creates an empty dictionary object named vesselDict
  2. Loops through all the data lines in the lineList variable created above. (Remember to skip the first line as it contains header information, not actual data).
  3. In each iteration of the loop:
    • Splits the data line (a string) into a list of values
    • Extracts the mmsi value from this list. Use the mmsi_idx variable created above to get this value.
    • Likewise, extracts the fleet value from this list
    • Adds an item to the vesselDict dictionary with the key set to the mmsi and the value set to the fleet.

#%% Task 4.3
#Create an empty dictionary
vesselDict = █
#Iterate through all lines (except the header) in the data file:
for █:
    #Split the data into values
    █
    #Extract the mmsi value from the list using the mmsi_idx value
    mmsi = █
    #Extract the fleet value
    fleet = █
    #Adds info to the vesselDict dictionary
    █


The resulting vesselDict dictionary should have 1040 items in it.

While the lineList has 1122 items, not all are translated into dictionary items. If you look carefully at the transshipment_vessels_20180723.csv file, you will see that not all records have “mmsi” values…
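The gap between the number of lines and the number of dictionary items follows from how dictionaries handle keys: each key can appear only once. The sketch below illustrates this with made-up records (all names and values are hypothetical), two of which have an empty id field.

```python
# Hypothetical data lines: two records have an empty id field
data_lines = [
    "101,Alpha,Norway",
    "102,Beta,Vanuatu",
    ",Gamma,Panama",
    ",Delta,Liberia",
]

country_by_id = {}
for data_line in data_lines:
    values = data_line.split(",")
    record_id = values[0]
    country = values[2]
    # Assigning to an existing key replaces its value instead of adding an item
    country_by_id[record_id] = country

print(len(data_lines))     # 4
print(len(country_by_id))  # 3
print(country_by_id[""])   # Liberia
```

Both records with an empty id share the key "", so only the last one survives in the dictionary.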

Task 4.4: Using your dictionary

In a new code cell in your script, add code that:

  1. Assigns the string value 258799000 to a variable named vesselID

  2. Uses the vesselDict dictionary to lookup the fleet value for the vessel with the MMSI equal to the vesselID value.

  3. Prints the statement:

    Vessel # 258799000 flies the flag of Norway
    

    using the vesselID and the fleet value extracted above to construct the string that’s printed.
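A dictionary lookup and a formatted print can be sketched as follows. The dictionary contents and variable names here are made up for illustration, and the f-string is just one of several ways to build the output string.

```python
# Hypothetical lookup table keyed by id strings
country_by_id = {"101": "Norway", "102": "Vanuatu"}

record_id = "101"
country = country_by_id[record_id]  # raises KeyError if the key is absent
sentence = f"Record {record_id} maps to {country}"
print(sentence)                     # Record 101 maps to Norway

# get() returns a fallback value instead of raising when the key is missing
fallback = country_by_id.get("999", "unknown")
print(fallback)                     # unknown

# Keys match on type as well as value: the integer 101 is not the string "101"
print(101 in country_by_id)         # False
```

The last line is why values read from a text file, which arrive as strings, need to be looked up with string keys.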


5. (25 pts) Scripting task

In this exercise, we use the GFW “loitering event” dataset. This dataset contains location and movement data of vessels classified as moving idly, as opposed to steaming to a certain destination. We want to write some Python code that scans these data for particular records, namely those loitering events that cross the equator (from south to north) and originated within a certain longitudinal band (from 165°E to 170°E). If any such records are found, the code prints the MMSI of the vessel and its fleet, using the dictionary created above to look up the fleet.

I’ve provided the following pseudocode to help you. Feel free to deviate from it, but I’ve given it to you to help out, not to trick you or anything like that.

→ Append your code to the end of the script you wrote for Task 4 as it uses the dictionary you created there.

Pseudocode for Task 5:

  • Open the loitering_events_20180723.csv file into a file object variable. (Tip: the code provided in Task 4.1 above can serve as a good template)

  • Construct a list of all lines in the csv file. (Again, just as you did above…)

  • Loop through each data line (i.e. skip the header line) in this line list, and at each iteration:

    • Split the line string into a list of data items.

    • Store the transshipment_mmsi, starting & ending latitude, and starting & ending longitude values into their own respective variables (e.g. mmsi = ...)

    • Examine the starting and ending latitude (the 2nd and 4th columns in the csv) to determine whether the event crosses the equator, passing from the southern hemisphere to the north. It’s useful to create a Boolean variable that stores whether this is true or false.

    • Examine the starting longitude to see whether it falls between 165°E and 170°E. Again, it’s useful to create a Boolean variable that stores whether this is true or false.

    • If both the latitude and longitude constraints are true, use the value of the transshipment_mmsi for the current line to query the vesselDict created above and print the vessel’s mmsi and its fleet.

  • BONUS: If no vessels meet your criteria, print a message that states “No vessels met criteria”
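The Boolean-variable idea in the pseudocode can be sketched on a few made-up records. Everything below is hypothetical: the record layout, the values, the variable names, the assumption that east longitudes are stored as positive numbers, and the choice to treat the band limits as inclusive. The actual column order in the loitering events file should be checked against the READ_ME.

```python
# Hypothetical records: (id, start_lat, start_lon, end_lat, end_lon) as text fields
records = [
    ("101", "-1.5", "166.0", "0.8", "166.4"),  # crosses northward, inside the band
    ("102", "2.0", "167.0", "3.1", "167.2"),   # stays in the northern hemisphere
    ("103", "-0.4", "150.0", "0.2", "150.3"),  # crosses, but outside the band
]

match_ids = []
for record_id, start_lat, start_lon, end_lat, end_lon in records:
    # Text values must be converted to numbers before numeric comparison
    start_lat, end_lat = float(start_lat), float(end_lat)
    start_lon = float(start_lon)

    # Store each test result in its own Boolean variable
    crosses_northward = start_lat < 0 and end_lat > 0
    in_band = 165 <= start_lon <= 170  # assumes inclusive limits

    if crosses_northward and in_band:
        match_ids.append(record_id)

# An empty list is falsy, which gives a simple "nothing found" check
if match_ids:
    print(match_ids)  # ['101']
else:
    print("No records met criteria")
```

Naming the two tests separately keeps the final if statement short and makes each condition easy to check on its own while debugging.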

Include abundant comments in your code! They make it easier to award partial credit if your code fails to work…

Results for Task 5 should read:

Vessel #576276000 flies the flag of Vanuatu
Vessel #441034000 flies the flag of South Korea

When complete, zip your project folder, including your two Python scripts and the csv files, and submit the zip file to Sakai.


Rubric

| Part | Task | Step | Description | Pts Possible |
|---|---|---|---|---|
| Part 1 | 1 | a | Variables included in code | 4 |
| | | b | Code prints exactly as shown | 4 |
| | | c | Minimal number of edits | 2 |
| | 2 | 1 | Assigns W:\859_data\triangle to data_folder variable | 2 |
| | | 2 | Creates data_list containing string items | 2 |
| | | 3 | Creates user_item variable and sets it to roads.shp | 2 |
| | | 4 | Adds user_item string object to data_list list | 2 |
| | | 5 | Loops through data_list and prints full path | 2 |
| | 3 | 1 | Creates empty list variable named user_numbers | 5 |
| | | 2 | Iterates using range function and for loop | 9 |
| | | 3 | Sorts user_numbers in ascending numeric order | 5 |
| | | 4 | Prints highest value in user_numbers | 5 |
| | 3 - Bonus | a | Sorts user_numbers in descending numeric order | 1 |
| | | b | Prints entire contents of the list | 1 |
| Part 2 | 4.1 | 2 | Reads contents of fileObj into lineList variable | 1 |
| | | 3 | Creates headerLineString variable | 1 |
| | | 3 | Closes file after storing contents in a local variable | 1 |
| | | 4 | Prints headerLineString contents | 1 |
| | 4.2 | 1 | Splits contents of first line into headerItems variable | 2 |
| | | 2 | Uses index() function to find indices; assigns index value to variables | 4 |
| | | 3 | Prints value of each index variable | 2 |
| | 4.3 | 1 | Creates vesselDict empty dictionary | 1 |
| | | 2 | Loops through data lines and splits the data line into list of values | 2 |
| | | 2 | Loops through data and extracts mmsi value and fleet value | 2 |
| | | 3 | Adds an item to vesselDict with key set to mmsi and value set to fleet | 2 |
| | 4.4 | 1 | Assigns 258799000 value to vesselID variable | 1 |
| | | 2 | Uses vesselDict to find vessel with MMSI equal to vesselID value | 3 |
| | | 3 | Prints statement using vesselID and the fleet value extracted previously | 2 |
| | 5 | 1 | Opens .csv file into a file object variable | 3 |
| | | 2 | Constructs list of all lines in the .csv file | 3 |
| | | 3 | Loops through each data line, and at each iteration: | 2 |
| | | 3 | Splits line string into a list of data items | 2 |
| | | 3 | Stores values in their own respective variables | 2 |
| | | 3 | Examines latitude | 5 |
| | | 3 | Examines longitude | 5 |
| | | 3 | If latitude and longitude constraints TRUE, prints mmsi and fleet | 5 |
| | 5 - Bonus | a | If no vessels meet criteria, print “No vessels met criteria.” | 2 |