What is a Data Frame?
1. What is a Data Frame?
- Table of data
- Rows represent observations
- Columns represent attributes; all values within a column share one data type, though different columns can have different types
- Cells can be referenced by integer position (iloc) or by row and column labels (loc); the row labels form the index
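A minimal sketch of these ideas, assuming a small hypothetical table of people (the names and values are invented for illustration). It shows one dtype per column and the difference between position-based (`iloc`) and label-based (`loc`) access:

```python
import pandas as pd

# Hypothetical example data: each row is one observation (a person),
# each column is one attribute.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo"],
        "age": [34, 28, 45],
        "height_m": [1.65, 1.80, 1.72],
    },
    index=["a", "b", "c"],  # row labels (the index)
)

# Each column has a single dtype; different columns may differ.
print(df.dtypes)

# Position-based access: second row, second column (zero-based positions)
print(df.iloc[1, 1])        # → 28

# Label-based access: row label "b", column label "age"
print(df.loc["b", "age"])   # → 28
```

Both lookups reach the same cell here; `iloc` would keep working if the row labels changed, while `loc` would keep working if the rows were reordered.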
2. Loading data into a Data Frame
read_csv()
- Quite customizable
- Applies defaults when reading data (e.g., it infers each column's data type); these can be overridden, for instance with the dtype parameter
- Other reader functions for other formats: read_html(), read_excel(), …
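A minimal sketch of overriding an inferred type, assuming a hypothetical CSV held in a string (wrapped in `io.StringIO` so `read_csv()` can read it like a file). Zip codes are a common case where the default inference loses information:

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a real file.
csv_text = """zip,store,sales
02134,North,120
10001,South,95
"""

# Default: read_csv infers types, so "zip" becomes an integer column
# and the leading zero is dropped.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["zip"].tolist())   # → [2134, 10001]

# Override the inferred type for one column with the dtype parameter.
typed_df = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})
print(typed_df["zip"].tolist())     # → ['02134', '10001']
```

Passing a dict to `dtype` changes only the named columns; the other columns keep their inferred types.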
3. Viewing and inspecting data frame properties
4. Selecting columns
5. Descriptive statistics
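A combined sketch for items 3 to 5, assuming a small hypothetical table of animals. It uses common pandas calls for inspecting a data frame, selecting columns, and computing summary statistics:

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird"],
    "weight_kg": [4.0, 12.5, 30.0, 0.5],
    "age_years": [3, 5, 8, 1],
})

# 3. Viewing and inspecting properties
print(df.head(2))           # first two rows
print(df.shape)             # (rows, columns) → (4, 3)
print(df.size)              # total number of cells → 12
print(list(df.columns))     # column labels
df.info()                   # dtypes, non-null counts, memory usage

# 4. Selecting columns
subset = df[["species", "weight_kg"]]   # list of labels → DataFrame
weights = df["weight_kg"]               # single label → Series
print(type(subset).__name__, type(weights).__name__)  # → DataFrame Series

# 5. Descriptive statistics
print(weights.mean())       # → 11.75
print(df.describe())        # count, mean, std, min, quartiles, max (numeric columns)
```

Note the double brackets in `df[["species", "weight_kg"]]`: the inner list is what makes the result a DataFrame rather than a Series.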
5.3.1 Pandas - Intro to Data Frames
Time | Topic |
---|---|
0:45 | What is a Data Frame? |
3:10 | Data frame as a list of lists |
5:20 | Data frame as a collection of dictionaries |
10:20 | Loading data into a data frame (read_csv()) |
14:30 | Exploring your data - revealing column data types |
16:35 | Specifying data types when importing with read_csv() |
18:06 | Specifying a column to be the index when importing with read_csv() |
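A minimal sketch of the construction routes listed in this video, assuming hypothetical rows. The "collection of dictionaries" is shown here as a list of dictionaries, one common form (an assumption about what the video covers); the last part uses the `index_col` parameter of `read_csv()` to make a column the index:

```python
import io
import pandas as pd

# Option 1: list of lists; column names are supplied separately.
rows = [
    ["Ana", 34],
    ["Ben", 28],
]
df_from_lists = pd.DataFrame(rows, columns=["name", "age"])

# Option 2: list of dictionaries; keys become column names.
records = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": 28},
]
df_from_dicts = pd.DataFrame(records)

# Both constructions produce the same table.
print(df_from_lists.equals(df_from_dicts))  # → True

# Option 3: read CSV text and use the "name" column as the row index.
csv_text = "name,age\nAna,34\nBen,28\n"
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="name")
print(df_indexed.loc["Ben", "age"])         # → 28
```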
5.3.2 Pandas - Exploring Data
Time | Topic |
---|---|
0:20 | Inspecting the data with head(), tail(), and sample() |
1:45 | Reading in raw text files stored on the internet; more on read_csv() |
4:45 | Revealing aspects of your data frame: len(df), df.shape, df.size |
6:42 | Listing the columns of your data frame with df.columns |
8:05 | Listing the index of your data frame with df.index |
8:55 | Setting the index column when reading in your data with read_csv() |
10:35 | Listing data frame info with df.info() |
13:15 | Selecting specific columns from your data frame into a new data frame |
16:15 | Selecting a single column into a Series object; what is a Series object? |
17:20 | Referring to columns: brackets (df['column']) vs. dot notation (df.column) |
19:21 | Descriptive statistics for a column of data |
20:39 | Quantiles |
21:25 | Correlations among numeric columns |
22:58 | Styling your correlation output |
24:47 | Generating summary stats with df.describe() |
25:25 | Listing unique values and number of unique values with df['column'].unique() and df.nunique() |
26:52 | Listing number of records for each value with value_counts() |
28:12 | Basic plots in Pandas: histograms with df.hist() |
30:32 | Boxplots |
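A minimal sketch of several exploration calls from this video, assuming hypothetical data. Note that `unique()` is called on a single column (a Series), while `nunique()` also works on the whole DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird", "dog"],
    "weight_kg": [4.0, 12.0, 30.0, 0.5, 20.0],
    "age_years": [3, 5, 8, 1, 6],
})

# unique() is a Series method: distinct values in order of appearance
print(df["species"].unique())
# nunique() works on a Series, or per column on the whole DataFrame
print(df["species"].nunique())       # → 3
print(df.nunique())

# value_counts(): number of records for each value, most frequent first
print(df["species"].value_counts())

# Quantiles of one column (0.5 is the median)
print(df["weight_kg"].quantile([0.25, 0.5, 0.75]))

# Pairwise correlations, restricted to numeric columns
print(df.corr(numeric_only=True))
```

Plotting calls such as df.hist() and df.boxplot() additionally need matplotlib installed, so they are left out of this sketch.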