Background

Learning objectives:

  • Explain why data science has risen to prominence in recent years
  • Define reproducibility and transparency in a data science context
  • Describe each step in the data science workflow
  • Differentiate a tidy dataset from a messy one
  • Describe ways in which you might explore a dataset
  • What is “Data Science”?

    • More than a ticket to a “sexy” job
    • Product of the data deluge, increased computing power, and more accessible scripting
    • Like fracking: giving us access to previously untapped resources
    • Also: thanks to Hadley Wickham, a concerted effort to make data analysis transparent & reproducible
    • The data science workflow:
      • Data engineering
      • Data exploration
      • Data analysis
      • Presenting results and sharing workflows
      • Post mortem
  • Data science and GIS

    • Natural partners: Spatial Data Science
      • GIS is an “information system”
      • Esri is investing in the field; a big focus at the Esri User Conference (UC) 2020
    • ArcGIS Pro + Python: a good data science platform
    • Spatial statistics, machine learning, AI
      • Big uses in spatial data
      • Look for spatial patterns; space-time mapping
    • Data enrichment
  • The data science workflow

    • Data engineering

      • Identifying and accessing raw data
      • Ingesting data into a coding environment
      • Addressing missing data
      • Tidying / reshaping data
      • Needs: data structures such as arrays, matrices, and data frames
      • Solutions: NumPy and pandas
      • EXERCISE: Cleaning data
      • EXERCISE: Data Enrichment
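
The data engineering steps above (ingesting, addressing missing data, tidying) can be sketched with pandas roughly as follows. This is a minimal illustration, not the course exercise: the station names, the rainfall values, and the column name `rainfall_mm` are all made up, and dropping incomplete rows is only one of several ways to handle missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical "messy" table: one row per station, one column per year,
# with missing readings encoded as NaN. All values are invented.
messy = pd.DataFrame(
    {
        "station": ["A", "B", "C"],
        "2019": [12.1, np.nan, 9.8],
        "2020": [11.7, 10.2, np.nan],
    }
)

# Addressing missing data: count the gaps first, then decide how to handle them.
missing_per_column = messy.isna().sum()

# Tidying: reshape so each row is one observation (station, year, value).
tidy = messy.melt(id_vars="station", var_name="year", value_name="rainfall_mm")
tidy["year"] = tidy["year"].astype(int)

# One simple option (an assumption, not a recommendation): drop incomplete rows.
tidy_complete = tidy.dropna(subset=["rainfall_mm"]).reset_index(drop=True)

print(missing_per_column)
print(tidy_complete)
```

In the tidy form each variable is a column and each observation is a row, which is the layout most pandas grouping and plotting functions expect.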
    • Data exploration, including visualization

      • Knowing your data source: uncovering biases
      • Describing data in terms of summaries and distributions
      • Helpful plots to explore your data
      • EXERCISE: Exploring data
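
Describing data in terms of summaries and distributions might look something like the sketch below. The dataset, the `region` and `value` column names, and the choice of three bins are assumptions made purely for illustration; the bin counts are the numbers a histogram would draw.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: made-up values for illustration only.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south", "east"],
        "value": [2.0, 4.0, 6.0, 8.0, 10.0, 30.0],
    }
)

# Summaries: count, mean, std, min, quartiles, and max for the numeric column.
summary = df["value"].describe()

# Group-level counts can reveal uneven sampling, one possible source of bias.
by_region = df.groupby("region")["value"].agg(["count", "mean"])

# Distribution: counts per equal-width bin, the numbers behind a histogram.
counts, bin_edges = np.histogram(df["value"], bins=3)

print(summary)
print(by_region)
print(counts, bin_edges)
```

Here the mean sits above the median because of a single large value, the kind of skew that a histogram or box plot tends to make obvious at a glance.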
    • Data analysis

      • Scripting…
      • Statistics: hot spot analysis, random forests, AI
    • Sharing & communicating results

      • Plots and maps
      • Stories
      • Dashboards