What is a Data Frame?

1. What is a Data Frame?

  • Table of data
  • Rows represent observations
  • Each column (attribute) holds a single data type; different columns can have different types
  • Cells in the table can be referenced by integer position (iloc) or by row and column labels (loc)
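
A minimal sketch of the points above, assuming pandas is installed and imported as pd. The column names, row labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each row is one observation, each column one attribute.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],
        "age": [34, 27, 45],
        "height_m": [1.62, 1.80, 1.75],
    },
    index=["a", "b", "c"],  # row labels
)

# Each column carries its own single data type.
print(df.dtypes)

# Position-based access: second row, second column (positions are zero-based).
by_position = df.iloc[1, 1]

# Label-based access: row labelled "b", column labelled "age".
by_label = df.loc["b", "age"]
```

Both lookups reach the same cell here; iloc counts positions, while loc uses the labels in the index and the column names.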

2. Loading data into a Data Frame

  • read_csv()
    • Quite customizable
    • Infers column data types by default; the inferred types can be overridden (dtype parameter)
  • Other options:
    • read_html(), read_excel(),…
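
As a rough illustration of overriding the defaults, the sketch below reads a small in-memory CSV twice. The CSV text and its column names (id, zip, score) are hypothetical; in practice the first argument would usually be a file path or URL.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "id,zip,score\n1,02134,3.5\n2,10001,4.0\n"

# Default parsing: pandas infers the types, so "zip" is read as an
# integer and the leading zero is lost.
default_df = pd.read_csv(io.StringIO(csv_text))

# Override the inferred type for "zip" and use "id" as the index.
typed_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"zip": str},
    index_col="id",
)

print(default_df.dtypes)
print(typed_df.dtypes)
```

Reading identifier-like columns as strings is one common reason to set dtype explicitly.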

3. Viewing and inspecting data frame properties
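
A short sketch of common inspection calls, using a small made-up data frame (the city and temp_c columns are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical data frame used only to show the inspection calls.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
        "temp_c": [4.0, 19.5, 31.0, 6.5],
    }
)

first_rows = df.head(2)      # first 2 rows
last_rows = df.tail(1)       # last row
n_rows = len(df)             # number of rows
shape = df.shape             # (rows, columns)
n_cells = df.size            # rows * columns
columns = list(df.columns)   # column labels
index = df.index             # row labels (a default RangeIndex here)

df.info()                    # prints index, column dtypes and non-null counts
```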

4. Selecting columns
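
The following sketch shows the usual ways to select columns, again on made-up data. Column names are hypothetical.

```python
import pandas as pd

# Hypothetical data frame with three columns.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "temp_c": [4.0, 19.5, 31.0],
        "rain_mm": [55, 1, 120],
    }
)

# A list of column names returns a new DataFrame.
subset = df[["city", "temp_c"]]

# A single column name returns a Series (one labelled column of values).
temps = df["temp_c"]

# Dot notation returns the same Series, but only works when the column
# name is a valid Python identifier and does not clash with an existing
# DataFrame attribute or method.
same_temps = df.temp_c
```

Bracket notation works for any column name, so it is the safer default.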

5. Descriptive statistics
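
A minimal sketch of descriptive statistics on a small made-up data frame. The column names (group, x, y) and values are assumptions chosen so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: one categorical column and two numeric columns.
df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "c"],
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
    }
)

mean_x = df["x"].mean()            # mean of one column
median_x = df["x"].quantile(0.5)   # 0.5 quantile, i.e. the median
summary = df.describe()            # count, mean, std, min, quartiles, max

# Pairwise correlations, computed on the numeric columns only.
corr = df[["x", "y"]].corr()

groups = df["group"].unique()         # distinct values, in order of appearance
n_groups = df["group"].nunique()      # number of distinct values
counts = df["group"].value_counts()   # number of rows per value
```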


5.3.1 Pandas - Intro to Data Frames

Time Topic
0:45 What is a Data Frame?
3:10 Dataframe as a list of lists
5:20 Dataframe as a collection of dictionaries
10:20 Loading data into a dataframe (read_csv())
14:30 Exploring your data - revealing column data types
16:35 Specifying data types when importing with read_csv()
18:06 Specifying a column to be the index when importing with read_csv()
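
The two ways of thinking about a data frame listed above can be sketched as follows. This assumes "collection of dictionaries" means one dictionary per row; the names and values are made up.

```python
import pandas as pd

# A data frame built from a list of lists: one inner list per row,
# with the column names supplied separately.
rows = [["Ann", 34], ["Bob", 27]]
from_lists = pd.DataFrame(rows, columns=["name", "age"])

# A data frame built from a list of dictionaries: one dictionary per
# row, with the keys becoming the column names.
records = [{"name": "Ann", "age": 34}, {"name": "Bob", "age": 27}]
from_records = pd.DataFrame(records)

print(from_lists)
print(from_records)
```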

5.3.2 Pandas - Exploring Data

Time Topic
0:20 Inspecting the data with head(), tail() and sample()
1:45 Reading in raw text files stored on the internet; more on read_csv()
4:45 Revealing aspects of your dataframe: len(df), df.shape, df.size
6:42 Listing columns of your dataframe with df.columns
8:05 Listing the index of your dataframe with df.index
8:55 Setting the index column when reading in your data with read_csv()
10:35 Listing data frame info with df.info()
13:15 Selecting specific columns from your data frame into a new dataframe
16:15 Selecting a single column into a Series object; what is a Series object?
17:20 Referring to columns: brackets (df['column']) vs dot notation (df.column)
19:21 Descriptive statistics for a column of data
20:39 Quantiles
21:25 Correlations among numeric columns
22:58 Styling your correlation output
24:47 Generating summary stats with df.describe()
25:25 Listing unique values and number of unique values with df['column'].unique() and df.nunique()
26:52 Listing number of records for each value with value_counts()
28:12 Basic plots in Pandas: histograms with df.hist()
30:32 Boxplots

5.3.3 Pandas - Analysis 1