ENV 859

Reading a data file with Pandas
- Multiple formats are supported; CSV is the most common
- CSV options: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
  - Skip rows (skiprows)
  - Comment lines (comment)
  - Use specific columns (usecols)
  - Data types (dtype)
  - Missing values (na_values)
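A minimal sketch of the read_csv options above, run on a small made-up CSV held in a string so it needs no file on disk. The column names, values, title line, and the -999 missing-value code are all hypothetical, not taken from the EPA data.

```python
import io

import pandas as pd

# Hypothetical CSV text: a title line, a comment line, and -999 as a missing-value code
csv_text = """Air quality export
# units: micrograms per cubic meter
site,year,pm25,notes
A01,2019,8.5,ok
A02,2019,-999,sensor down
A03,2020,7.1,ok
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,                            # skip the title line above the header
    comment="#",                           # ignore lines that start with '#'
    usecols=["site", "year", "pm25"],      # read only these columns
    dtype={"site": str, "year": "int64"},  # override the inferred data types
    na_values=[-999],                      # treat -999 as missing (NaN)
)
print(df)
```

Note that skiprows counts lines from the top of the file, including commented ones, so the comment line is dropped by comment="#" rather than by skiprows here.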
Exploring data
- Viewing: head()/tail()/sample()
- Size: len(), shape
- Info: columns, info(), dtypes
- Structure: index, shape
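A quick sketch of the exploration calls listed above, using a small hypothetical DataFrame in place of a loaded file:

```python
import pandas as pd

# Small hypothetical DataFrame standing in for a loaded CSV
df = pd.DataFrame({
    "site": ["A01", "A02", "A03", "A04", "A05"],
    "year": [2019, 2019, 2020, 2020, 2021],
    "pm25": [8.5, None, 7.1, 9.3, 6.8],
})

print(df.head(3))                    # first 3 rows (default is 5)
print(df.tail(2))                    # last 2 rows
print(df.sample(2, random_state=0))  # 2 random rows; random_state makes it repeatable

print(len(df))      # number of rows
print(df.shape)     # (rows, columns)
print(df.columns)   # column labels
print(df.index)     # row labels (a default RangeIndex here)
print(df.dtypes)    # data type of each column
df.info()           # columns, non-null counts, dtypes, memory usage
```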
Selecting data: columns
Descriptive statistics
- mean(), describe()
Quick plots

https://www.epa.gov/enviroatlas/data-download-step-2

Basic form of a dataframe
- Rows and columns (compared with a NumPy array)
- Values in a column all have the same data type
- Can be seen as a list of lists or as a collection of dictionaries
- Role of the index
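A short sketch of the points above: the same small table built column-wise (a dict of lists) and row-wise (a list of dicts), per-column dtypes versus a single NumPy dtype, and an index used as row labels. The site IDs and values are hypothetical.

```python
import pandas as pd

# Column-oriented: dict keys are column names, values are equal-length lists
by_column = pd.DataFrame({"site": ["A01", "A02"], "pm25": [8.5, 7.1]})

# Row-oriented: a list of dicts, one dict per row
by_row = pd.DataFrame([
    {"site": "A01", "pm25": 8.5},
    {"site": "A02", "pm25": 7.1},
])

# Both describe the same table
print(by_column.equals(by_row))

# Each column keeps its own dtype (strings in site, float64 in pm25) ...
print(by_column.dtypes)
# ... while a NumPy array needs one dtype for everything, so it falls back to object
print(by_column.to_numpy().dtype)

# The index labels the rows; use site IDs instead of the default 0, 1, ...
indexed = by_column.set_index("site")
print(indexed.loc["A02", "pm25"])  # look up a value by row label and column name
```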
Loading data with Pandas
- Basic usage: read_csv()
- Overriding default data types
- Skipping rows, comments
- Only reading certain columns
Viewing and inspecting data
- Viewing formatted data with head(), tail(), sample()
- Dataframe properties: len(), shape, columns, index, dtypes, info()
Selecting columns
- Selecting 1 column → series
- Selecting >1 column → dataframe
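A minimal sketch of the two selection forms above, on hypothetical data. A single label returns a Series; a list of labels returns a DataFrame, even when the list holds only one name.

```python
import pandas as pd

df = pd.DataFrame({
    "site": ["A01", "A02", "A03"],
    "year": [2019, 2019, 2020],
    "pm25": [8.5, 7.2, 7.1],
})

one = df["pm25"]                 # single label -> Series
several = df[["site", "pm25"]]   # list of labels -> DataFrame
also_frame = df[["pm25"]]        # one-item list -> still a DataFrame

print(type(one).__name__)
print(type(several).__name__)
print(several.shape)
```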
Descriptive statistics
- Stats on a series: mean, min, max, median, std, percentiles
- Stats on a dataframe
- Correlations with corr()
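A sketch of the statistics above on a tiny hypothetical table. Series methods return one number; the same methods on a DataFrame return one value per column, and corr() returns a column-by-column matrix (Pearson by default).

```python
import pandas as pd

df = pd.DataFrame({
    "pm25":  [8.0, 6.0, 10.0, 4.0],
    "ozone": [30.0, 25.0, 40.0, 20.0],
})

s = df["pm25"]
print(s.mean(), s.min(), s.max(), s.median())  # 7.0 4.0 10.0 7.0
print(s.std())                    # sample standard deviation (ddof=1)
print(s.quantile([0.25, 0.75]))   # 25th and 75th percentiles

print(df.mean())      # one mean per column
print(df.describe())  # count, mean, std, min, 25%, 50%, 75%, max per numeric column
print(df.corr())      # pairwise Pearson correlation matrix
```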
Basic plots
- Setting plot types
- Axes
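A minimal plotting sketch, assuming matplotlib is installed (pandas uses it for .plot()). The kind argument sets the plot type, and the returned matplotlib Axes object is where titles, labels, and limits are adjusted. The data are hypothetical, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2018, 2019, 2020, 2021],
    "pm25": [9.1, 8.7, 8.2, 7.9, 7.5],
})

# kind= selects the plot type, e.g. "line", "bar", "hist", "box", "scatter"
ax = df.plot(kind="line", x="year", y="pm25", legend=False)

# The Axes object controls title, axis labels, and axis limits
ax.set_title("Annual mean PM2.5 (hypothetical data)")
ax.set_ylabel("PM2.5")
ax.set_ylim(0, 10)

# plt.show()  # display the figure in an interactive session
```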
Selecting data
Filtering, sorting, & grouping data
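A brief sketch of filtering, sorting, and grouping on hypothetical state-level readings:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["NC", "NC", "VA", "VA", "SC"],
    "year":  [2019, 2020, 2019, 2020, 2020],
    "pm25":  [8.0, 7.0, 9.0, 8.0, 6.5],
})

# Filtering: a boolean Series keeps only the rows where it is True
recent = df[df["year"] == 2020]

# Combine conditions with & (and) or | (or), each wrapped in parentheses
nc_high = df[(df["state"] == "NC") & (df["pm25"] > 7.5)]

# Sorting: highest pm25 first
ranked = df.sort_values("pm25", ascending=False)

# Grouping: mean pm25 for each state
by_state = df.groupby("state")["pm25"].mean()
print(by_state)
```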
Data cleaning
Joining dataframes
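A minimal joining sketch using pd.merge on a shared key column. Both tables, including the county names, are hypothetical.

```python
import pandas as pd

readings = pd.DataFrame({"site": ["A01", "A02", "A03"], "pm25": [8.5, 7.2, 7.1]})
sites = pd.DataFrame({"site": ["A01", "A02", "A04"], "county": ["Durham", "Wake", "Orange"]})

# Inner join (the default): keep only sites present in both tables
inner = pd.merge(readings, sites, on="site")

# Left join: keep every reading; county is NaN where there is no match
left = pd.merge(readings, sites, on="site", how="left")
print(left)
```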