Problem Set 3 - Writing Scripts
The materials for this problem set are located in this zip file. Download and unpack them to your V: drive, then open the folder in VS Code as you would any new coding workspace. You are welcome to use Git/GitHub with this assignment, but it is not required or expected.
Part I.
The following exercises consist of editing an existing Python script (ProblemSet3_Part1.py). Open this script in VS Code and you’ll notice that the tasks are separated by lines of code starting with #%%; this notation marks code cells within VS Code that behave much like code cells in Jupyter Notebooks. Above each code cell, you’ll see options to run or debug that cell, but you can also run selected line(s) individually by right-clicking and selecting Run Selection/Line in Interactive Window. (Note that <shift>-<enter> runs the code cell, not the selected line(s)!)
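As a minimal sketch of the cell notation described above (the cell titles and variable names here are made up for illustration and are not taken from the assignment script), a file with two cells might look like this. Each #%% line starts a new cell, and variables defined in one cell remain available to later cells in the same interactive session.

```python
#%% Cell 1: define a value
greeting = "Hello from the first cell"

#%% Cell 2: use the value defined in the cell above
message = greeting.upper()
print(message)
```

Running the file as an ordinary script treats the #%% lines as plain comments, so the same file works both ways.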
NOTES:
If you are unable to fully complete any of the scripts, get as close as you can to the finished product, and I will award partial credit where it’s due. Feel free to be creative in striving for partial credit. For example, if you are unable to read in a file to get the first line, copy the first line and paste it in as a variable.
I will be paying attention to script efficiency and readability in scoring these. Your primary task is that the script execute what’s asked, but beyond that, take time to create clean, logical, and readable code. Tag each script with your name and date. Also, supply abundant comments within your script that briefly describe each major step in the script’s flow, and use descriptive variable names where appropriate.
1. (10 pts) Python syntax & string manipulation
Open the ProblemSet3_Part1.py script in VS Code. Debug/edit the first code cell so that it successfully prints out the following three lines, exactly as shown below, to the console.
NOTE: Do NOT use multiline strings in your code, and strive to use as few edits as possible.
2. (15 pts) Lists and iteration
Just below the code you wrote for Task 1 in the ProblemSet3_Part1.py script, you’ll see a code cell break: #%% Task 2. Below that code break, write a brief code snippet that does the following:
- Assigns a variable named data_folder to the string object that prints as "W:\859_data\triangle". (Note for this exercise, data does not actually have to exist in this folder, but the value should be in the format of a valid path…)
- Creates a list object called data_list containing the following string objects:
  - streams.shp
  - stream_types.csv
  - naip_imagery.tif
- Assigns a variable named user_item and sets it to the string “roads.shp”.
- Adds the user_item string object to the data_list list.
- Loops through each item in the data_list list, and for each item prints the full Windows path of the dataset, created by concatenating the data_folder string with the item in the data_list.

The output should appear in the console as follows:
3. (23 pts) Lists and iteration
Add a new code cell break (#%% Task 3) at the end of your current script document. Then add code that does the following:
- Creates an empty list variable named user_numbers.
- Iterates the following process three times (using the range() function and a for loop):
  - Uses the input() function to ask the user to “Enter an integer:”. (Note: you can assume that the user will correctly enter an integer.)
  - Adds the user-supplied integer to the user_numbers list created in 3.1.
- Sorts the user_numbers list in ascending numeric order.
- Prints the highest value in user_numbers, i.e., the last value when sorted, to the interactive window.
Run your script using the numbers: 1, 100, and 20. Does it return 100? It should, but if not, try to explain in words (in a commented portion of your script code cell) what might be going wrong.
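One thing worth checking when a sort gives a surprising result is the type of the values being sorted. The sketch below is only an illustration: raw_entries is a hypothetical stand-in for typed input (input() always returns strings), and it compares sorting those strings directly with sorting them after conversion to int.

```python
# Hypothetical stand-in for three values typed by a user; input() returns strings
raw_entries = ["1", "100", "20"]

# Strings are compared character by character, so "20" sorts after "100"
sorted_as_text = sorted(raw_entries)

# Converting each entry to int first gives a numeric ordering
sorted_as_numbers = sorted(int(entry) for entry in raw_entries)

print(sorted_as_text)     # ['1', '100', '20']
print(sorted_as_numbers)  # [1, 20, 100]
```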
►Challenge question - 2 pts.
- Copy your code cell above into a new code cell (“Task 3 - Challenge”).
- Alter it so that it again sorts the user_numbers list, but this time in descending numeric order, and then prints the entire contents of the list. (The output should be [100, 20, 1].)
Part II.
The remaining questions deal with datasets related to Global Fishing Watch’s Daily Fishing Effort and Vessel Presence research on Global Patterns of Transshipment Behavior. Two data files from the project’s database, transshipment_vessels_20180723.csv and loitering_events_20180723.csv, as well as a READ_ME text file describing these files, should be in your ProblemSet3 workspace.
Take a look at these files and familiarize yourself with the data: both its format and what data each includes.
You will, through a series of semi-guided steps, pull the data stored in these files into Python objects that will allow us to query the data. More specifically, you will:
- Create a dictionary listing the attributes of each vessel listed in the transshipment_vessels_20180723.csv file.
- Loop through the records in the loitering_events_20180723.csv file, and for each vessel observed within a defined geographic region, print information about the vessel.
4. (25 pts) Lists, dictionaries, string manipulation, and iteration
The task here is to convert the data held in the transshipment_vessels_20180723.csv file into a format such that we can specify a vessel’s Maritime Mobile Service Identity (MMSI) code and it will return the fleet to which the vessel belongs.
The sequence of tasks provided below will lead to that end, but you will need to add the correct code to execute the steps. Complete the following tasks in the code boxes below them.
Task 4.1: Reading in the data and displaying the column headers
Create a new Python script called ProblemSet3_Part2.py. Paste in the code below and replace the redacted areas so that it successfully:
- Creates a file object named fileObj by opening the transshipment_vessels_20180723.csv text file. [This is done for you.]
- Reads in the entire contents (i.e., all lines) of the fileObj into a list variable named lineList. Be sure to close the file once you have its contents stored in a local variable.
- Creates a variable called headerLineString to hold the first item in the lineList list.
- Prints the contents of this headerLineString variable.
#%% Task 4.1
#Create a Python file object, i.e., a link to the file's contents
fileObj = open(file='data/raw/transshipment_vessels_20180723.csv',mode='█')
#Read the entire contents into a list object
lineList = fileObj.█()
#Release the link to the file objects (now that we have all its contents)
fileObj.█() #Close the file
#Save the contents of the first line in the list of lines to the variable "headerLineString"
headerLineString = █
#Print the contents of the headerLine
print(█)
Result should be:
mmsi,shipname,callsign,fleet_iso3,fleet_name,imo
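The general pattern of opening a text file, reading all of its lines into a list, and closing it can be sketched as follows. This is an illustration under stated assumptions, not the assignment solution: the file contents, column names, and variable names (sample_text, file_obj, line_list, header_line) are all hypothetical, and the sketch writes its own small temporary CSV so that it runs without the course data.

```python
import os
import tempfile

# Build a tiny stand-in CSV (hypothetical contents) so the sketch runs anywhere
sample_text = "id,name,country\n1,Alpha,Norway\n2,Beta,Chile\n"
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

# Open the file for reading, pull every line into a list, then close the file
file_obj = open(sample_path, mode="r")
line_list = file_obj.readlines()
file_obj.close()

# The first list item is the header line; strip() drops the trailing newline
header_line = line_list[0].strip()
print(header_line)

# Remove the temporary file created for this sketch
os.remove(sample_path)
```

Note that each item returned by readlines() keeps its trailing newline character, which is why the sketch strips the header before printing it.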
Task 4.2: Splitting the header string into a list of column names and extracting index values
As above, paste the code below into your Python script and replace the redacted parts so that it:
- Splits the contents of the first line into a list variable called headerItems. (Note that the items in this file are separated by commas…)
- Uses the index() function to find the indices associated with the mmsi, shipname, and fleet_name items in the headerItems list and assigns the index values to variables called mmsi_idx, name_idx, and fleet_idx, respectively. If you are unable to do this step, just set the mmsi_idx, name_idx, and fleet_idx variables to the numbers 0, 1, and 4, respectively.
- Prints the value of each index variable.
#%% Task 4.2
#Split the headerLineString into a list of header items
headerItems = █
#List the index of the mmsi, shipname, and fleet_name values
mmsi_idx = █
name_idx = █
fleet_idx = █
#Print the values
print(mmsi_idx,name_idx,fleet_idx)
Result should be:
0 1 4
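Splitting a delimited string and then looking up the position of a column name can be sketched with a made-up header, as below. The header text and the names header_items, country_idx, and tonnage_idx are assumptions chosen for illustration and do not come from the vessel file.

```python
# Hypothetical header line, already stripped of its trailing newline
header_line = "id,name,country,tonnage"

# split(",") breaks the string at each comma and returns a list of strings
header_items = header_line.split(",")

# list.index() returns the zero-based position of the first matching item
country_idx = header_items.index("country")
tonnage_idx = header_items.index("tonnage")

print(header_items)
print(country_idx, tonnage_idx)  # 2 3
```

Looking positions up by name rather than hard-coding them means the later code keeps working if the column order in the file changes.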
Task 4.3: Iterating through the data lines and adding values to a dictionary
In the cell below, write code that:
- Creates an empty dictionary object named vesselDict.
- Loops through all the data lines in the lineList variable created above. (Remember to skip the first line as it contains header information, not actual data.)
- In each iteration of the loop:
  - Splits the data line (a string) into a list of values.
  - Extracts the mmsi value from this list. Use the mmsi_idx variable created above to get this value.
  - Likewise, extracts the fleet value from this list.
  - Adds an item to the vesselDict dictionary with the key set to the mmsi and the value set to the fleet.

#%% Task 4.3
#Create an empty dictionary
vesselDict = █
#Iterate through all lines (except the header) in the data file:
for █:
    #Split the data into values
    █
    #Extract the mmsi value from the list using the mmsi_idx value
    mmsi = █
    #Extract the fleet value
    fleet = █
    #Adds info to the vesselDict dictionary
    █
The resulting vesselDict dictionary should have 1040 items in it.
Note: While the lineList has 1122 items, not all are translated into dictionary items. If you look carefully at the transshipment_vessels_20180723.csv file, you will see that not all records have “mmsi” values…
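One general reason a dictionary can end up with fewer items than the lines it was built from is that dictionary keys are unique: assigning to a key that already exists overwrites the earlier value instead of adding a new item. The sketch below shows this with made-up records in which two rows have a blank id. The data and names (lines, lookup, id_idx, country_idx) are hypothetical, and the sketch does not check whether this accounts for the full difference in the vessel file.

```python
# Hypothetical lines as they might come back from readlines(): header + 4 records
lines = [
    "id,name,country\n",
    "101,Alpha,Norway\n",
    ",Beta,Chile\n",      # blank id
    ",Gamma,Peru\n",      # blank id again
    "104,Delta,Japan\n",
]
id_idx, country_idx = 0, 2

lookup = {}
# lines[1:] skips the header line
for line in lines[1:]:
    values = line.strip().split(",")
    # Both blank ids map to the same key "", so the second overwrites the first
    lookup[values[id_idx]] = values[country_idx]

print(len(lines) - 1, "data lines ->", len(lookup), "dictionary items")
```

Here four data lines produce only three dictionary items, because the two blank ids share the single key "".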
Task 4.4: Using your dictionary
In a new code cell in your script, add code that:
- Assigns the string value 440196000 to a variable named vesselID.
- Uses the vesselDict dictionary to look up the fleet value for the vessel with the MMSI equal to the vesselID value.
- Prints the statement:
  Vessel # 440196000 flies the flag of South Korea
  using the vesselID and the fleet value extracted above to construct the string that’s printed.
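Looking up a value by key and building a printed sentence from the result can be sketched as follows. The dictionary contents and the names lookup, record_id, and statement are hypothetical. Note that the key is a string, matching how values read from a text file are stored.

```python
# Hypothetical lookup table keyed by id strings
lookup = {"101": "Norway", "104": "Japan"}

# The key must match the stored type: "104" (a string), not 104 (an int)
record_id = "104"
country = lookup[record_id]

# An f-string inserts variable values into the printed sentence
statement = f"Record # {record_id} belongs to {country}"
print(statement)

# dict.get() returns a fallback instead of raising KeyError for a missing key
missing = lookup.get("999", "unknown")
print(missing)
```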
5. (25 pts) Scripting task
In this exercise, we use the GFW “loitering event” dataset. This dataset contains location and movement data of vessels classified as moving idly, as opposed to steaming to a certain destination. We want to write some Python code that scans these data for particular records, namely those loitering events that cross the equator (from south to north) and originated within a certain longitudinal band (from 145°E to 155°E). And if any records are found, it prints out the MMSI of the vessel and its fleet - using the dictionary created above to tell us.
I’ve provided the following pseudocode to help you. Feel free to deviate from it, but I’ve given it to you to help out, not to trick you or anything like that.
→ Append your code to the end of the script you wrote for Task 4 as it uses the dictionary you created there.
Pseudocode for Task 5:
Open the loitering_events_20180723.csv file into a file object variable. (Tip: the code provided in Task 4.1 above can serve as a good template.)
Construct a list of all lines in the csv file. (Again, just as you did above…)
Loop through each data line (i.e., skip the header line) in this line list, and at each iteration:
    Split the line string into a list of data items.
    Store the transshipment_mmsi, starting & ending latitude, and starting & ending longitude values into their own respective variables (e.g. mmsi = ...).
    Examine the starting and ending latitude (the 2nd and 4th columns in the csv) to determine whether the event crosses the equator, passing from the southern hemisphere to the north.
    Examine the starting longitude to see whether it falls between 145°E and 155°E. It’s useful to create a Boolean variable that stores whether this is true or false.
    If both the latitude and longitude constraints are true, then use the value of the transshipment_mmsi for the current line to query the vesselDict created above to print the vessel’s mmsi and its fleet.
BONUS: If no vessels meet your criteria, print a message that states “No vessels met criteria”.
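The idea of storing each test in its own Boolean variable and then combining them can be sketched with made-up records, as below. Everything here is an assumption for illustration: the rows, the column order (id, start latitude, start longitude, end latitude, end longitude), and the names rows, lookup, and matches are hypothetical, and treating the band limits as inclusive and the equator itself as "not crossed" are choices made for this sketch only.

```python
# Hypothetical records: id, start_lat, start_lon, end_lat, end_lon
rows = [
    "101,-1.5,150.0,0.8,150.4",   # south to north, inside the band
    "102,2.0,150.0,3.1,151.0",    # stays in the north
    "103,-0.4,170.2,0.6,170.9",   # crosses, but outside the band
]
lookup = {"101": "Norway", "102": "Chile", "103": "Japan"}

matches = []
for row in rows:
    record_id, start_lat, start_lon, end_lat, end_lon = row.split(",")

    # Values split from text are strings, so convert before comparing numbers
    crosses_equator = float(start_lat) < 0 and float(end_lat) > 0
    in_lon_band = 145 <= float(start_lon) <= 155

    # Report only the records that pass both tests
    if crosses_equator and in_lon_band:
        matches.append(record_id)
        print(f"Record #{record_id} belongs to {lookup[record_id]}")

# An empty list is falsy, so this runs only when nothing matched
if not matches:
    print("No records met the criteria")
```

Collecting the matching ids in a list is one simple way to know, after the loop ends, whether anything met the criteria.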
►Include abundant comments in your code! They make it easier to award partial credit if your code fails to work…◄
Results for Task 5 should read:
Vessel #352085000 flies the flag of Panama
Vessel #357020000 flies the flag of Panama
Vessel #441690000 flies the flag of South Korea
Vessel #529445000 flies the flag of Kiribati
Vessel #529120000 flies the flag of Kiribati
When complete, zip your project folder, including your two Python scripts and the csv files, and submit the zip file to Sakai.
Rubric
Part | Task | Step | Description | Pts Possible |
---|---|---|---|---|
Part 1 | 1 | a | Variables included in code | 4 |
Part 1 | 1 | b | Code prints exactly as shown | 4 |
Part 1 | 1 | c | Minimal number of edits | 2 |
Part 1 | 2 | 1 | Assigns W:\859_data\triangle to data_folder variable | 2 |
Part 1 | 2 | 2 | Creates data_list containing string items | 2 |
Part 1 | 2 | 3 | Creates user_item variable and sets it to roads.shp | 2 |
Part 1 | 2 | 4 | Adds user_item string object to data_list list | 2 |
Part 1 | 2 | 5 | Loops through data_list and prints full path | 2 |
Part 1 | 3 | 1 | Creates empty list variable named user_numbers | 5 |
Part 1 | 3 | 2 | Iterates using range function and for loop | 9 |
Part 1 | 3 | 3 | Sorts user_numbers in ascending numeric order | 5 |
Part 1 | 3 | 4 | Prints highest value in user_numbers | 5 |
Part 1 | 3 - Bonus | a | Sorts user_numbers in descending numeric order | 1 |
Part 1 | 3 - Bonus | b | Prints entire contents of the list | 1 |
Part 2 | 4.1 | 2 | Reads contents of fileObj into lineList variable | 1 |
Part 2 | 4.1 | 3 | Creates headerLineString variable | 1 |
Part 2 | 4.1 | 3 | Closes file after storing contents in a local variable | 1 |
Part 2 | 4.1 | 4 | Prints headerLineString contents | 1 |
Part 2 | 4.2 | 1 | Splits contents of first line into headerItems variable | 2 |
Part 2 | 4.2 | 2 | Uses index() function to find indices; assigns index value to variables | 4 |
Part 2 | 4.2 | 3 | Prints value of each index variable | 2 |
Part 2 | 4.3 | 1 | Creates vesselDict empty dictionary | 1 |
Part 2 | 4.3 | 2 | Loops through data lines and splits the data line into list of values | 2 |
Part 2 | 4.3 | 2 | Loops through data and extracts mmsi value and fleet value | 2 |
Part 2 | 4.3 | 3 | Adds an item to vesselDict with key set to mmsi and value set to fleet | 2 |
Part 2 | 4.4 | 1 | Assigns correct value to vesselID variable | 1 |
Part 2 | 4.4 | 2 | Uses vesselDict to find vessel with MMSI equal to vesselID value | 3 |
Part 2 | 4.4 | 3 | Prints statement using vesselID and the fleet value extracted previously | 2 |
Part 2 | 5 | 1 | Opens .csv file into a file object variable | 3 |
Part 2 | 5 | 2 | Constructs list of all lines in the .csv file | 3 |
Part 2 | 5 | 3 | Loops through each data line, and at each iteration: | 2 |
Part 2 | 5 | 3 | Splits line string into a list of data items | 2 |
Part 2 | 5 | 3 | Stores values in their own respective variables | 2 |
Part 2 | 5 | 3 | Evaluates cross-equator criteria | 5 |
Part 2 | 5 | 3 | Evaluates start longitude criteria | 5 |
Part 2 | 5 | 3 | If both criteria are met, prints mmsi and fleet | 5 |
Part 2 | 5 - Bonus | a | If no vessels meet criteria, print “No vessels met criteria.” | 2 |