GIS & Data Science - Section Overview

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Introduction

Data science is a fast-emerging and “sexy” topic these days, but what exactly is “data science”? Here we’ll discuss what’s behind the movement and the role GIS plays in it. We’ll examine the data science workflow of data engineering, data visualization/exploration, analysis/modeling/scripting, and sharing/collaboration. We’ll also discuss the importance of reproducibility and transparency. Additionally, we’ll examine key data structures used in data science, specifically the dataframe and its spatial counterpart, the spatially enabled dataframe, learning how these are constructed from various data sources and used in analyses. It’s here that we’ll take a deep dive into ESRI’s ArcGIS API for Python, a powerful new package that links GIS, data science, and our next topic: cloud-based GIS.

Time permitting, we’ll also examine the non-ESRI, open-source alternatives for including spatial analysis in data science tasks. These include GDAL, GeoPandas, Shapely, Fiona, OSM, and Folium. We may also explore technologies such as machine learning, artificial intelligence, and image processing from a spatial analysis perspective, and we could examine the spatial analysis tools supported in R.


Section organization & learning outcomes

Topic | Learning Objectives

Introduction to data science
• Explain why data science has become a “hot topic” these days
• Explain the advantages of open science and reproducible workflows
• Explain the concept of tidy data and its importance in data analysis
• List and describe each step in the traditional data science workflow
• Differentiate a tidy data set from a messy one
• Describe the roles GIS plays in data science
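
To make the tidy-versus-messy distinction concrete, here is a minimal sketch using pandas. The table, its column names (`site`, `year`, `flow`), and its values are invented for illustration and do not come from the course materials. In the messy layout the years are spread across column headers; in the tidy layout each variable is a column and each observation is a row.

```python
import pandas as pd

# Hypothetical "messy" table: one row per site, one column per year.
messy = pd.DataFrame({
    "site": ["A", "B"],
    "2022": [10.5, 8.2],
    "2023": [11.1, 7.9],
})

# Tidy form: one row per (site, year) observation.
tidy = messy.melt(id_vars="site", var_name="year", value_name="flow")
tidy["year"] = tidy["year"].astype(int)  # year labels were text column names

print(tidy)
```

The tidy version is longer but easier to filter, group, and plot, because `year` is now data rather than part of the column names.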

Scientific data structures: NumPy Arrays
• Explain the difference between a Python list and a NumPy vector
• Create NumPy arrays of various shapes, sizes, & values
• Convert feature classes to NumPy arrays using ArcPy
• Compute basic statistics on NumPy arrays
• Convert a raster to a NumPy array using ArcPy
• Explain the concept and utility of stacked arrays
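
Several of the NumPy objectives above can be previewed in a few lines. This is only a sketch with made-up values; the ArcPy conversions (`arcpy.da.FeatureClassToNumPyArray` and `arcpy.RasterToNumPyArray`) need an ArcGIS installation and are left out, and the two "bands" below are stand-ins for rasters rather than real data.

```python
import numpy as np

# A Python list and a NumPy vector respond differently to the same operator.
py_list = [1, 2, 3]
vector = np.array(py_list)
doubled_list = py_list * 2     # repeats the list
doubled_vector = vector * 2    # multiplies each element

# Arrays of various shapes, sizes, and values.
zeros = np.zeros((2, 3))            # 2 rows x 3 columns of 0.0
grid = np.arange(12).reshape(3, 4)  # values 0..11 in 3 rows x 4 columns

# Basic statistics, over the whole array or along one axis.
mean_all = grid.mean()
col_max = grid.max(axis=0)

# Stacked arrays: two same-shaped 2-D "bands" combined into one 3-D array.
band1 = np.ones((2, 2))
band2 = np.full((2, 2), 5.0)
stack = np.stack([band1, band2])
band_mean = stack.mean(axis=0)      # cell-by-cell mean across the bands

print(doubled_list, doubled_vector, stack.shape)
```

Stacking is useful for multi-band or multi-date rasters because a single call along the stack axis summarizes every cell at once.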

Scientific data structures: Pandas DataFrames
• Describe the structure and basic functionality of a Pandas dataframe
• Read tabular data into a Pandas dataframe
• Reveal key properties of a dataframe (size, shape, datatypes, etc.)
• Select data by columns
• Generate descriptive statistics
• Generate basic plots
• Generate dataframes from lists and dictionaries
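
As a rough preview of these dataframe objectives, the sketch below reads a small table, inspects it, selects columns, and summarizes it. The CSV text and its column names are hypothetical, and it is read from an in-memory string so the example runs without any files. Plotting (for example `df.plot()`) is omitted because it requires matplotlib.

```python
import io
import pandas as pd

# Hypothetical tabular data, held in memory instead of a file on disk.
csv_text = "site,depth_m,temp_c\nA,1.5,20.0\nB,3.0,18.5\nC,4.5,17.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Key properties of the dataframe.
shape = df.shape        # (rows, columns)
dtypes = df.dtypes      # data type of each column

# Select one column (a Series) or several columns (a dataframe).
depths = df["depth_m"]
subset = df[["site", "temp_c"]]

# Descriptive statistics for the numeric columns.
summary = df.describe()

# Build dataframes from a dictionary and from a list of rows.
from_dict = pd.DataFrame({"site": ["A", "B"], "depth_m": [1.5, 3.0]})
from_rows = pd.DataFrame([["A", 1.5], ["B", 3.0]], columns=["site", "depth_m"])

print(shape)
print(summary)
```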

Processing data w/DataFrames
• Execute basic calculations on a dataframe
• Select rows and columns in a dataframe
• Filter and update data
• Deal with missing data
• Convert/coerce data types
• List unique values
• Sort data
• Write data to a file
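
The processing steps listed above might look like the following in pandas. This is a sketch under stated assumptions: the table, its columns (`site`, `type`, `flow`), and the choice to fill the missing value with the column mean are all illustrative, not a recommended method for real data.

```python
import pandas as pd

# Hypothetical table: flow values arrive as text and one is missing.
df = pd.DataFrame({
    "site": ["A", "B", "C", "D"],
    "type": ["well", "stream", "well", "stream"],
    "flow": ["10", "8", None, "6"],
})

# Convert/coerce the text column to numbers; the missing value becomes NaN.
df["flow"] = pd.to_numeric(df["flow"])

# Deal with missing data (here: fill with the mean of the known values).
df["flow"] = df["flow"].fillna(df["flow"].mean())

# Basic calculation producing a new column.
df["flow_x2"] = df["flow"] * 2

# Filter rows, then update a value selected by row condition and column name.
wells = df[df["type"] == "well"]
df.loc[df["site"] == "D", "flow"] = 6.5

# List unique values and sort.
types = df["type"].unique()
ranked = df.sort_values("flow", ascending=False)

# Write to CSV; with no path given, to_csv returns the text instead of saving.
csv_out = df.to_csv(index=False)
print(csv_out)
```

Passing a file path to `to_csv` (for example a hypothetical `"flows.csv"`) would write the same text to disk.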

Scientific data structures: GeoPandas & GeoDataFrames
• Create, edit, & describe properties of geometric objects using Shapely
• Manage vector spatial data using GeoPandas and geodataframes
• Re-project geodataframes from one projection to another
• Perform spatial analysis using GeoPandas
    – Calculating distances
    – Nearest neighbors
    – Joining data
• Read, write, and explore raster spatial data using Rasterio
• Work with rasters as nd-arrays with NumPy and scikit-image
• Create static maps using GeoPandas and Contextily
• Create interactive maps using Bokeh and Folium
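
The geometry column of a geodataframe holds Shapely objects, so the first objective can be previewed with Shapely alone. The sketch below assumes Shapely is installed and uses invented planar coordinates with no coordinate reference system; the names `origin`, `stations`, and `square` are hypothetical. Re-projection and spatial joins, which GeoPandas provides through `GeoDataFrame.to_crs` and `geopandas.sjoin`, are not shown.

```python
from shapely.geometry import Point, Polygon

# Create geometric objects from plain coordinates.
origin = Point(0, 0)
stations = [Point(3, 4), Point(1, 1), Point(6, 8)]
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# Describe properties of a geometry.
area = square.area        # in squared coordinate units
bounds = square.bounds    # (minx, miny, maxx, maxy)

# Calculate distances (planar, in coordinate units).
dist = origin.distance(stations[0])

# Nearest neighbor: the station with the smallest distance to the origin.
nearest = min(stations, key=origin.distance)

# A simple spatial relationship test.
inside = square.contains(stations[1])

print(area, dist, (nearest.x, nearest.y), inside)
```

Distances computed this way are only meaningful in a projected coordinate system, which is one reason re-projection appears among the objectives.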

Data Visualization
• Basic plotting with Pandas
• Fine-tuned plots in Pandas
• Plotting with ggplot & Plotnine
• Mapping w/Folium & GeoPandas
• Mapping w/the ArcGIS Python API