GIS & Data Science: Section Overview
Introduction
Data science is a fast-emerging and “sexy” topic these days, but what exactly is “data science”? Here we’ll discuss what’s behind the big movement and the role GIS plays in it. We’ll examine the data science workflow of data engineering, data visualization/exploration, analysis/modeling/scripting, and sharing/collaboration. We’ll also discuss the importance of reproducibility and transparency. Additionally, we’ll examine key data structures used in data science, specifically the dataframe and its spatial counterpart, the spatially enabled dataframe, learning how they are constructed from various data sources and used in analyses. It’s here that we’ll take a deep dive into ESRI’s ArcGIS API for Python, a powerful new package that links GIS, data science, and our next topic: cloud-based GIS.
Time permitting, we’ll also examine open-source, non-ESRI alternatives for including spatial analysis in data science tasks. These include GDAL, GeoPandas, Shapely, Fiona, OSM, and Folium. We may also explore technologies such as machine learning, artificial intelligence, and image processing from a spatial analysis perspective, and we could examine the spatial analysis tools supported in R.
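As a rough illustration of the dataframe mentioned above, the sketch below builds a small Pandas dataframe from a dictionary. The site names, coordinates, and flow values are made up for this example. A spatial dataframe typically differs by carrying a dedicated geometry column rather than plain longitude/latitude columns; that step is not shown here.

```python
import pandas as pd

# Hypothetical sample records; every value here is illustrative only.
records = {
    "site": ["A", "B", "C"],
    "longitude": [-78.94, -78.90, -79.05],
    "latitude": [36.00, 35.99, 35.91],
    "flow_cfs": [12.5, 8.0, 20.1],
}

# Each dictionary key becomes a column; each list position becomes a row.
sites = pd.DataFrame(records)

# Key properties of the dataframe: its shape and per-column data types.
print(sites.shape)
print(sites.dtypes)

# A simple descriptive statistic computed on one column.
mean_flow = sites["flow_cfs"].mean()
print(round(mean_flow, 2))
```

The same table could be read from a CSV file or another tabular source; the dictionary is used here only to keep the example self-contained.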
Section organization & learning outcomes
Topic | Learning Objectives |
---|---|
Introduction to data science | • Explain why data science has become a “hot topic” these days • Explain the advantages of open science and reproducible workflows • Explain the concept of tidy data and its importance in data analysis • List and describe each step in the traditional data science workflow • Differentiate a tidy data set from a messy one • Describe the roles GIS plays in data science |
Scientific data structures: NumPy Arrays | • Explain the difference between a Python list and a NumPy vector • Create NumPy arrays of various shapes, sizes, & values • Convert feature classes to NumPy arrays using ArcPy • Compute basic statistics on NumPy arrays • Convert a raster to a NumPy array using ArcPy • Explain the concept and utility of stacked arrays |
Scientific data structures: Pandas DataFrames | • Describe the structure and basic functionality of a Pandas dataframe • Read tabular data into a Pandas dataframe • Reveal key properties of a dataframe (size, shape, datatypes, etc.) • Select data by columns • Generate descriptive statistics • Generate basic plots • Generate dataframes from lists and dictionaries |
Processing data w/DataFrames | • Execute basic calculations on a dataframe • Select rows and columns in a dataframe • Filter and update data • Deal with missing data • Convert/coerce data types • List unique values • Sort data • Write data to a file |
Scientific data structures: GeoPandas & GeoDataFrames | • Create, edit, & describe properties of geometric objects using Shapely • Manage vector spatial data using GeoPandas and geodataframes • Re-project geodataframes from one projection to another • Perform spatial analysis using GeoPandas – Calculating distances – Nearest neighbors – Joining data • Read, write, and explore raster spatial data using Rasterio • Work with rasters as nd-arrays with NumPy and Scikit-Image • Create static maps using GeoPandas and Contextily • Create interactive maps using Bokeh and Folium |
Data Visualization | • Basic plotting with Pandas • Fine-tuned plots in Pandas • Plotting with ggplot & Plotnine • Mapping w/Folium & GeoPandas • Mapping w/the ArcGIS Python API |
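Two of the objectives in the table above, the difference between a Python list and a NumPy array and the difference between a messy and a tidy data set, can be sketched in a few lines. This is a minimal illustration, not course material: the station names, years, and rainfall values are invented, and `melt` is only one of several ways to reshape a table into tidy form.

```python
import numpy as np
import pandas as pd

# A Python list and a NumPy array respond differently to the same operator.
values = [1, 2, 3]
doubled_list = values * 2        # repeats the list: [1, 2, 3, 1, 2, 3]
arr = np.array(values)
doubled_arr = arr * 2            # multiplies element by element: [2, 4, 6]

# A "messy" table: the year variable is spread across column headers.
# Station names and rainfall values are hypothetical.
messy = pd.DataFrame({
    "station": ["S1", "S2"],
    "2019": [10.0, 7.5],
    "2020": [11.0, 8.0],
})

# A tidy version: one row per observation, one column per variable.
tidy = messy.melt(id_vars="station", var_name="year", value_name="rainfall")

print(doubled_list)
print(doubled_arr)
print(tidy)
```

In the tidy form, each row is a single station-year observation, which makes filtering, grouping, and plotting by year straightforward.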