Background
Learning objectives:
- Explain why data science has grown so quickly in recent years
- Define reproducibility and transparency in a data science context
- Describe each step in the data science workflow
- Differentiate a tidy dataset from a messy one
- Describe ways in which you might explore a dataset
What is “Data Science”?
- More than a trendy job title
- Product of the data deluge, increased computing power, and more accessible scripting
- Like fracking: giving us access to previously untapped resources
- Also: thanks to Hadley Wickham, a concerted effort to make data analysis transparent & reproducible
- The data science workflow:
  - Data engineering
  - Data exploration
  - Data analysis
  - Presenting results and sharing workflows
  - Post mortem
Data science and GIS
- Natural partners: Spatial Data Science
- GIS is an “information system”
- Esri is investing in the field; it was a major theme at the 2020 Esri User Conference (UC 2020)
- ArcGIS Pro + Python make a good data science platform
- Spatial statistics, machine learning, AI
- Big uses in spatial data:
  - Look for spatial patterns; space-time mapping
  - Data enrichment
The data science workflow
Data engineering
- Identifying and accessing raw data
- Ingesting data into a coding environment
- Addressing missing data
- Tidying / reshaping data
- Needs: data structures (arrays, matrices, data frames)
- Solutions: NumPy and Pandas
- EXERCISE: Cleaning data
- EXERCISE: Data Enrichment
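
The data engineering steps above can be sketched with Pandas roughly as follows. This is a minimal illustration, not the exercise solution: the table, the column names (county, pop_2018, area_km2) and the choice to interpolate missing values are all assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical raw data: one row per county, one column per year ("wide", messy).
raw_csv = io.StringIO(
    "county,pop_2018,pop_2019,pop_2020\n"
    "Alpha,1000,1050,1100\n"
    "Beta,2000,,2100\n"
    "Gamma,1500,1525,\n"
)

# Ingest the raw data into a data frame.
wide = pd.read_csv(raw_csv)

# Tidy / reshape: one row per county-year observation, one column per variable.
tidy = wide.melt(id_vars="county", var_name="year", value_name="population")
tidy["year"] = tidy["year"].str.replace("pop_", "", regex=False).astype(int)
tidy = tidy.sort_values(["county", "year"]).reset_index(drop=True)

# Address missing data: count the gaps first, then fill them.
# Interpolating within each county is only one option (assumption);
# dropping or flagging the rows may be more appropriate for real data.
n_missing = int(tidy["population"].isna().sum())
tidy["population"] = tidy.groupby("county")["population"].transform(
    lambda s: s.interpolate(limit_direction="both")
)

# Enrich: join a second (hypothetical) table of county attributes.
areas = pd.DataFrame(
    {"county": ["Alpha", "Beta", "Gamma"], "area_km2": [50.0, 80.0, 60.0]}
)
enriched = tidy.merge(areas, on="county", how="left", validate="many_to_one")
enriched["density"] = enriched["population"] / enriched["area_km2"]

print(f"filled {n_missing} missing values")
print(enriched)
```

The validate="many_to_one" argument makes the merge fail loudly if the lookup table has duplicate counties, which is a cheap guard against silently duplicated rows during enrichment.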
Data exploration, including visualization
- Knowing your data source: uncovering biases
- Describing data in terms of summaries and distributions
- Helpful plots to explore your data
- EXERCISE: Exploring data
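
One way to describe a dataset in terms of summaries and distributions is sketched below. The data are randomly generated stand-ins (the region and income columns are hypothetical), so the numbers themselves carry no meaning; the point is the kind of question each call answers.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a categorical column and a right-skewed numeric column.
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "region": rng.choice(["north", "south"], size=200),
        "income": rng.lognormal(mean=10.5, sigma=0.5, size=200),
    }
)

# Summaries: count, mean, standard deviation, min, quartiles, max.
summary = df["income"].describe()

# Summaries per group: do the groups differ, and are they evenly sampled?
by_region = df.groupby("region")["income"].agg(["count", "mean", "median"])

# Distribution shape: sample skewness and binned counts (a text histogram).
skewness = df["income"].skew()
counts, bin_edges = np.histogram(df["income"], bins=10)

print(summary)
print(by_region)
print(f"skewness: {skewness:.2f}")
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:>10.0f} to {right:>10.0f}: {'#' * int(count)}")
```

With matplotlib installed, df["income"].plot.hist(bins=10) draws the same histogram as a figure, and df.boxplot(column="income", by="region") compares the groups visually.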
Data analysis
- Scripting…
- Statistics: hot spot analysis, Random Forests, AI
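
As one example of the statistics listed above, here is a minimal NumPy sketch of a hot spot statistic, the Getis-Ord Gi* z-score. It assumes binary distance-band weights that include the point itself; the function name, the grid and the threshold are made up for illustration, and this is not the ArcGIS implementation.

```python
import numpy as np


def getis_ord_gi_star(coords, values, threshold):
    """Return a Gi* z-score for every point (hypothetical helper).

    Weights are binary: 1 if two points are within `threshold` of each
    other (a point is always its own neighbor), 0 otherwise.
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size

    # Pairwise Euclidean distances between all points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= threshold).astype(float)

    # Global mean and (population) standard deviation of the values.
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)

    # Local weighted sum compared with its expectation under no clustering.
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical 5 x 5 grid with a cluster of high values in one corner.
grid = np.ones((5, 5))
grid[:2, :2] = 10.0
rows, cols = np.indices(grid.shape)
coords = np.column_stack([rows.ravel(), cols.ravel()])

z = getis_ord_gi_star(coords, grid.ravel(), threshold=1.0)
print(z.reshape(grid.shape).round(2))
```

Large positive scores mark clusters of high values (hot spots) and large negative scores mark clusters of low values (cold spots). The threshold must be small enough that no neighborhood covers every point, otherwise the denominator is zero.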
Sharing & communicating results
- Plots and maps
- Stories
- Dashboards