- Reading a data file with Pandas
- Multiple formats: CSV is most popular
- CSV options: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Skip rows (skiprows)
- Comment lines (comment)
- Use specific columns (usecols)
- Data types (dtype)
- Missing-value markers (na_values)
- Exploring data
- Viewing: head()/tail()/sample()
- Size: len(), shape
- Info: columns, info, dtypes
- Structure: index, shape
- Selecting data: columns
- Descriptive statistics
- mean, describe
- Quick plots
https://www.epa.gov/enviroatlas/data-download-step-2
1. Exploring Data with Pandas
- Basic form of a dataframe
  - Rows and columns (vs. NumPy array)
  - Values in a column all have the same data type
  - Can be seen as a list of lists or as a collection of dictionaries
  - Role of the index
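
A minimal sketch of the points above, using made-up sample data (the column names, values, and index labels are assumptions for illustration only). It builds a dataframe from a dictionary of columns, shows that each column carries one data type, and uses the index to look up a row by label.

```python
import pandas as pd

# Hypothetical sample data: a dictionary mapping column names to values
data = {
    "site": ["A", "B", "C"],
    "temp_c": [12.5, 15.1, 9.8],
    "count": [3, 7, 5],
}

# The index labels each row; here we supply our own labels
df = pd.DataFrame(data, index=["s1", "s2", "s3"])

print(df)
print(df.dtypes)   # one data type per column
print(df.index)    # the row labels

# The index lets us fetch a row (or a single cell) by label
row = df.loc["s2"]
temp_s2 = df.loc["s2", "temp_c"]

# The same table viewed as a collection of dictionaries, one per row
records = df.to_dict("records")
df2 = pd.DataFrame(records, index=df.index)
```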
- Loading data with Pandas
  - Basic format: read_csv
  - Overriding default data types
  - Skipping rows, comments
  - Only reading certain columns
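
The loading options listed above could be combined roughly as follows. The file contents are invented and read from an in-memory buffer so the sketch runs on its own; with a real file, a path would be passed to `read_csv` instead. The `-999` missing-value marker is an assumption about this hypothetical file.

```python
import io
import pandas as pd

# Hypothetical file contents: a title line to skip, a comment line,
# and -999 used as a missing-value marker
raw = io.StringIO(
    "Exported by some tool\n"
    "site,date,temp_c,count,notes\n"
    "# station B sensor was faulty on day 1\n"
    "A,2020-01-01,12.5,3,ok\n"
    "B,2020-01-01,-999,7,check\n"
    "C,2020-01-02,9.8,5,ok\n"
)

df = pd.read_csv(
    raw,
    skiprows=1,                                  # skip the title line
    comment="#",                                 # ignore comment lines
    usecols=["site", "temp_c", "count"],         # read only these columns
    dtype={"site": "string", "count": "int32"},  # override inferred types
    na_values=["-999"],                          # treat -999 as missing
)

print(df)
print(df.dtypes)
```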
- Viewing and inspecting data
  - Viewing formatted data with head(), tail(), sample()
  - Dataframe properties: len, shape, columns, index, dtypes, info
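
A short sketch of the viewing and inspection calls named above, again on invented data. The `random_state` argument is used here only to make the random sample repeatable.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "site": ["A", "B", "C", "D", "E"],
    "temp_c": [12.5, 15.1, 9.8, 14.0, 11.2],
    "count": [3, 7, 5, 4, 6],
})

first_rows = df.head(2)                       # first 2 rows
last_rows = df.tail(2)                        # last 2 rows
some_rows = df.sample(n=2, random_state=0)    # 2 randomly chosen rows

n_rows = len(df)          # number of rows
dims = df.shape           # (rows, columns)
print(df.columns)         # column labels
print(df.index)           # row labels
print(df.dtypes)          # data type of each column
df.info()                 # summary: dtypes, non-null counts, memory use
```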
- Selecting columns
  - Selecting 1 column → series
  - Selecting >1 column → dataframe
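
The difference between the two selection forms can be seen in a small example (sample data is made up). A single label in brackets returns a Series, while a list of labels returns a DataFrame, even if the list holds only one label.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "site": ["A", "B", "C"],
    "temp_c": [12.5, 15.1, 9.8],
    "count": [3, 7, 5],
})

one_col = df["temp_c"]               # single label -> Series
two_cols = df[["site", "temp_c"]]    # list of labels -> DataFrame
one_col_df = df[["temp_c"]]          # one-item list -> still a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(type(one_col_df).__name__)
```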
- Descriptive statistics
  - Stats on a series: mean, min, max, median, std, percentiles
  - Stats on a dataframe
  - Correlations with corr
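
As an illustration of the statistics listed above, the sketch below uses two invented numeric columns that are perfectly negatively correlated, so the result of `corr` is easy to check by eye. All names and values are assumptions.

```python
import pandas as pd

# Hypothetical sample data: rain falls as temperature rises
df = pd.DataFrame({
    "temp_c": [10.0, 12.0, 14.0, 16.0, 18.0],
    "rain_mm": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Stats on a series
temp = df["temp_c"]
temp_mean = temp.mean()
temp_min = temp.min()
temp_max = temp.max()
temp_median = temp.median()
temp_std = temp.std()                      # sample standard deviation
quartiles = temp.quantile([0.25, 0.75])    # 25th and 75th percentiles

# Stats on a dataframe: one result per column
col_means = df.mean()
summary = df.describe()                    # count, mean, std, min, quartiles, max

# Pairwise correlations between columns
correlations = df.corr()
print(summary)
print(correlations)
```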
- Basic plots
  - Setting plot types
  - Axes
- Selecting data
- Filtering, sorting, & grouping data
- Data cleaning
- Joining dataframes
2. Processing Data with Pandas
- Basic calculations
- Filtering and updating data
- Dealing with missing data
- Data type conversions
- Unique values
- Sorting data
- Grouping and transforming data
- Joining data
- Writing data to a file
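
The processing topics above can be sketched end to end on a small invented dataset. The order differs slightly from the list (the type conversion comes first so that later numeric comparisons work), and every name and value is an assumption for illustration. The result is written to an in-memory buffer here; a file path would be used in practice.

```python
import io
import pandas as pd

# Hypothetical sample data
obs = pd.DataFrame({
    "site": ["A", "B", "C", "A", "B"],
    "temp_c": [12.5, None, 9.8, 14.0, 15.5],
    "count": ["3", "7", "5", "4", "6"],   # stored as text on purpose
})
sites = pd.DataFrame({
    "site": ["A", "B", "C"],
    "region": ["north", "south", "north"],
})

# Data type conversion: text -> integer
obs["count"] = obs["count"].astype(int)

# Missing data: count the gaps, then fill them with the column mean
n_missing = obs["temp_c"].isna().sum()
obs["temp_c"] = obs["temp_c"].fillna(obs["temp_c"].mean())

# Basic calculation: derive a new column
obs["temp_f"] = obs["temp_c"] * 9 / 5 + 32

# Filtering with a boolean mask, and updating matching rows with .loc
warm = obs[obs["temp_c"] > 13]
obs.loc[obs["count"] > 6, "count"] = 6    # cap counts at 6

# Unique values
site_names = obs["site"].unique()

# Sorting
hottest_first = obs.sort_values("temp_c", ascending=False)

# Grouping (one value per group) and transforming (one value per row)
totals = obs.groupby("site")["count"].sum()
obs["site_mean_temp"] = obs.groupby("site")["temp_c"].transform("mean")

# Joining on a shared column
merged = obs.merge(sites, on="site", how="left")

# Writing to CSV (a buffer stands in for a file path here)
buf = io.StringIO()
merged.to_csv(buf, index=False)
print(buf.getvalue())
```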