  • Reading a data file with Pandas
    • Multiple formats supported; CSV is the most common
    • CSV options: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
      • Skip rows (skiprows)
      • Comment lines (comment)
      • Use specific columns (usecols)
      • Data types (dtype)
      • Missing-value markers (na_values)
  • Exploring data
    • Viewing: head()/tail()/sample()
    • Size: len(), shape
    • Info: columns, info(), dtypes
    • Structure: index, shape
  • Selecting data: columns
  • Descriptive statistics
    • mean(), describe()
  • Quick plots
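
The CSV options listed above can be sketched in a single read_csv call. This is a minimal illustration, not a recipe for any particular dataset: the file contents, column names, and the "missing" marker are all invented, and the file is held in memory so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical file: a title line above the header, one comment line,
# and the word "missing" standing in for an absent value.
raw = io.StringIO(
    "Station export (illustrative data)\n"
    "site,year,temp_c,notes\n"
    "# rows below are invented for this example\n"
    "A,2020,11.5,ok\n"
    "B,2020,missing,sensor fault\n"
    "A,2021,12.1,ok\n"
)

df = pd.read_csv(
    raw,
    skiprows=1,                                 # skip the title line above the header
    comment="#",                                # ignore lines that start with '#'
    usecols=["site", "year", "temp_c"],         # read only these columns
    dtype={"site": "string", "year": "int64"},  # override the inferred dtypes
    na_values=["missing"],                      # treat "missing" as NaN
)

print(df.head())              # first rows, formatted
print(df.shape)               # (rows, columns)
print(df["temp_c"].mean())    # mean skips the missing value by default
```

With a real file, the in-memory buffer would be replaced by a path or URL passed as the first argument.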

https://www.epa.gov/enviroatlas/data-download-step-2

1. Exploring Data with Pandas

  • Basic form of a dataframe

    • Rows and columns (vs numpy array)
    • Values in a column all have the same data type
    • Can be seen as a list of lists or as a collection of dictionaries
    • Role of the index
  • Loading data with Pandas

    • Basic format: read_csv
    • Overriding default data types
    • Skipping rows, comments
    • Only reading certain columns
  • Viewing and inspecting data

    • Viewing formatted data with head(), tail(), sample()
    • DataFrame properties: len(), shape, columns, index, dtypes, info()
  • Selecting columns

    • Selecting one column → Series
    • Selecting more than one column → DataFrame
  • Descriptive statistics

    • Stats on a series: mean, min, max, median, std, percentiles
    • Stats on a dataframe
    • Correlations with corr
  • Basic Plots

    • Setting plot types
    • Axes
  • Selecting data
  • Filtering, sorting, & grouping data
  • Data cleaning
  • Joining dataframes
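
The inspection, column selection, and statistics topics in this section could look roughly like the sketch below. The DataFrame and its values are made up purely to show the calls; nothing here comes from a real dataset.

```python
import pandas as pd

# Invented values, used only to illustrate the calls.
df = pd.DataFrame(
    {
        "site": ["A", "B", "C", "D"],
        "temp_c": [10.0, 12.0, 14.0, 16.0],
        "rain_mm": [80.0, 70.0, 60.0, 50.0],
    }
)

# Viewing and inspecting
print(df.head(2))            # first two rows
print(len(df), df.shape)     # row count, then (rows, columns)
print(df.columns.tolist())   # column names
print(df.dtypes)             # one dtype per column
df.info()                    # summary of index, columns, non-null counts

# Selecting columns
temp = df["temp_c"]                   # one column gives a Series
subset = df[["temp_c", "rain_mm"]]    # a list of columns gives a DataFrame

# Descriptive statistics
mean_temp = temp.mean()
median_temp = temp.median()
p90_temp = temp.quantile(0.9)         # 90th percentile
summary = subset.describe()           # count, mean, std, min, quartiles, max
r = subset.corr().loc["temp_c", "rain_mm"]  # pairwise correlation
```

Note the double brackets in the second selection: the inner pair is an ordinary Python list of column names.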

2. Processing Data with Pandas

  • Basic calculations
  • Filtering and updating data
  • Dealing with missing data
  • Data type conversions
  • Unique values
  • Sorting data
  • Grouping and transforming data
  • Joining data
  • Writing data to a file
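
As a rough walk-through of the processing topics above, the sketch below touches each one once. Both tables, their column names, and the 11 °C threshold are hypothetical, and the output is written to an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

# Invented data: temperatures stored as text, with one missing value.
readings = pd.DataFrame(
    {
        "site": ["A", "B", "A", "C"],
        "temp_c": ["11.5", "9.0", None, "14.0"],
    }
)
sites = pd.DataFrame(
    {"site": ["A", "B", "C"], "region": ["north", "south", "south"]}
)

# Data type conversion: text to float (the missing entry becomes NaN).
readings["temp_c"] = pd.to_numeric(readings["temp_c"])

# Missing data: fill the gap with the column mean.
readings["temp_c"] = readings["temp_c"].fillna(readings["temp_c"].mean())

# Basic calculation: derive a new column from an existing one.
readings["temp_f"] = readings["temp_c"] * 9 / 5 + 32

# Filtering and updating: flag the rows above an arbitrary threshold.
readings["warm"] = False
readings.loc[readings["temp_c"] > 11, "warm"] = True

# Unique values, sorting, grouping.
site_names = readings["site"].unique()
by_temp = readings.sort_values("temp_c", ascending=False)
mean_by_site = readings.groupby("site")["temp_c"].mean()

# Joining: attach the region of each site (left join keeps every reading).
joined = readings.merge(sites, on="site", how="left")

# Writing: to_csv also accepts a file path in place of the buffer.
buffer = io.StringIO()
joined.to_csv(buffer, index=False)
print(buffer.getvalue())
```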