Data Visualization

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Note: This section is optional and will not appear on any problem set. Not directly, at least…

A good portion of the data science workflow is visualizing your data, either in the data exploration phase or in sharing your findings. As you might guess, many, many options exist for constructing plots and graphs of your data, which makes the topic quite a challenge to distill into a single session. With that in mind, we focus on the very basics of plotting your data.

Compounding matters is the fact that plotting data is, at first, much more challenging to do in code than in other platforms you may be used to (e.g. Excel or Tableau). However, once you get the structure of plotting commands and a bit of the lingo, your skill acquisition rate should accelerate. And the payoff is reusable code that can be run over and over again, with just minor tweaks, to build graphics that are compelling, reactive to code changes (and thus potentially interactive), and transparently reproducible.

With our limited time on this topic, we focus on a few major Python packages used for creating plots, and also on the basics of visualization. Specifically, the learning objectives include:

Notebook Learning Objectives
1. Basic Plotting with Pandas
   • Wrangling data into proper formats for plotting
   • How to examine data from a plotting perspective
   • Grouping & aggregating data for better plots
   • Setting basic plot types with kind
   • Different kinds of plots: line, bar, barh
   • Set plot colors with color
   • Set plot canvas size with figsize
   • Set plot colors with colormap
   • Set plot labels with labels
2. More Pandas plotting
   • Creating time series plots
   • Using dataframe indices in your plots
   • Describing plots in terms of “geoms” (or “axes”) and “aesthetics”
3. Plotting with ggplot & plotnine
   • The Grammar of Graphics & ggplot
   • Installing packages on the fly with pip
   • Create a stacked bar plot
   • Create a facet plot
   • Customize labels and coordinate systems
   • Exporting plots
4. Visualizing Spatial Data
   • Creating static maps with geopandas and contextily
   • Creating interactive maps with mplleaflet
   • Creating interactive maps with folium
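
To give a flavor of the first notebook's objectives, here is a minimal sketch that groups a small table and plots it with pandas. The data and the column names (region, year, value) are invented for illustration and are not the course dataset. The kind, color, figsize, and colormap arguments are the pandas plotting options named in the objectives.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data, made up purely for this example
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East", "East"],
    "year": [2022, 2023, 2022, 2023, 2022, 2023],
    "value": [10, 12, 7, 9, 4, 6],
})

# Group and aggregate first: one total per region is easier to plot
totals = df.groupby("region")["value"].sum()

# Horizontal bar plot: kind picks the plot type, color the bar color,
# figsize the canvas size in inches (width, height)
ax = totals.plot(kind="barh", color="steelblue", figsize=(6, 3),
                 title="Total value by region")
ax.set_xlabel("Total value")

# Reshape so each region is its own column and year is the index;
# pandas then draws one line per column, colored from the colormap
wide = df.pivot(index="year", columns="region", values="value")
ax2 = wide.plot(kind="line", colormap="viridis", figsize=(6, 3))
ax2.set_ylabel("Value")
```

In a Jupyter notebook the matplotlib.use("Agg") line can be dropped, and the plots typically render inline below the cell.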

Section Prep

Conda environment

As we did with spatial dataframes, we’ll need a custom Python environment to run our notebooks. We can use the same one we used for spatial dataframes. Consult that page for info on constructing it, or just continue using the one you already built.

GitHub repository

The materials for this session are in a GitHub repository here: https://github.com/ENV859/DataViz. You should fork and clone this repository to your local machine. In this repository you’ll see a shortcut to open Jupyter Notebooks using the gis environment we created in the spatial dataframes