Here, we continue our exploration of the Pandas package, focusing on the various ways data stored in dataframes can be analyzed. Specifically, we look at creating new columns, selecting subsets of data, updating values, grouping and aggregating, and pivot tables.
#Import the pandas package
import pandas as pd
#Initial import of surveys dataset
survey_df = pd.read_csv('../data/surveys.csv',
index_col='record_id',
dtype={'plot_id':'str'})
survey_df.head()
A typical operation in dataframes is creating new fields from values in other fields. We'll show this by converting hindfoot_length (currently in mm) to inches (25.4 mm per inch).
#Create a field of hindfoot length in inches
survey_df['hindfoot_length_in'] = survey_df['hindfoot_length'] / 25.4
#Review the output
survey_df.head()
► YOU TRY IT:
#Create a field of weight in ounces
#Review the output
survey_df.tail()
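One possible solution sketch for the exercise, shown on a tiny inline frame so it runs without the CSV. It assumes the weight column is recorded in grams (28.3495 g per ounce); the values below are made up:

```python
import pandas as pd

# Tiny stand-in for survey_df (hypothetical values; weight assumed to be in grams)
df = pd.DataFrame({'weight': [40.0, 48.0, 29.0]})

# Create a field of weight in ounces (28.3495 g per ounce)
df['weight_oz'] = df['weight'] / 28.3495

# Review the output
df.tail()
```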
We often want to isolate specific rows and/or columns of data from our dataframe for further analysis. Pandas offers many ways to do this. Here we'll review these techniques.
iloc
Recall that dataframes can be considered lists (rows) of lists (columns). As such, we can select specific rows by their integer index, either individually or in slices. This is done using the dataframe's iloc indexer. Refer to the Pandas help on iloc for more info on this functionality: iloc is short for integer location.
#Fetch the 4th row of data
survey_df.iloc[3]
#Fetch the 101st through 110th rows of data
survey_df.iloc[100:110]
#Fetch the 3rd, 5th, and 10th rows
survey_df.iloc[[2,4,9]]
iloc
The iloc indexer can also be used to select columns. Which columns to select is passed as the second parameter to iloc, so we need to specify which rows we want first. If we want to retrieve the selected columns for all rows, we can just pass ":" as the first parameter.
#Select all rows and the 4th column of the dataset
survey_df.iloc[:,3]
#Select the first 5 rows and the 4th column of the dataset
survey_df.iloc[:5,3]
→ So you see, selecting subsets is just a matter of working with the row/column integer indices and passing them to the iloc indexer.
loc
While iloc selects rows and columns by their absolute position, loc selects rows and columns by their labels. Yes, each row and column has a label. A column's label is its column name; a row's label is its label index (not to be confused with its integer index, which is simply its row number).
When Pandas reads in a CSV file, the CSV file determines the column names, but row labels are assigned a sequential numeric value by default. When we read the data into our survey_df, we specified the values in the record_id column to become our index, i.e., our row labels. We can now use these labels to select rows, and combine them with column names to select row/column combinations:
#Fetch the row with row label of 10
survey_df.loc[10]
#Fetch value in the species_id column in the row with row label of 10
survey_df.loc[10,'species_id']
Because our dataframe's row index (i.e. its row labels) happens to be numeric, we can fetch slices of values. We can also fetch multiple columns by passing them in as a list.
#Fetch the species_id and sex columns for rows with labels between 10 and 20
survey_df.loc[10:20,['species_id','sex']]
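One detail worth noting: a loc slice includes both endpoints, while an iloc slice excludes the stop position. A quick sketch on a toy frame (labels and values are made up):

```python
import pandas as pd

# Toy frame with a numeric label index starting at 10
df = pd.DataFrame({'x': [10, 20, 30, 40, 50]}, index=[10, 11, 12, 13, 14])

# iloc slices exclude the stop position: positions 0 and 1 only
print(len(df.iloc[0:2]))   # 2 rows

# loc slices include the stop label: labels 10, 11, and 12
print(len(df.loc[10:12]))  # 3 rows
```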
Instead of selecting data by position or by index, we can also query our dataframe for all records meeting specific criteria. In this first example, we'll use the dataframe's query() function.
#Fetch all records from the year 1997 into a new variable
records_1997 = survey_df.query('year == 1997')
records_1997.shape
#Fetch all records from 1997 for female individuals
females_1997 = survey_df.query('year == 1997 & sex == "F"')
females_1997.shape
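query() can also reference Python variables by prefixing them with @, which keeps the criteria easy to change. A small self-contained sketch, with toy data standing in for the surveys dataset:

```python
import pandas as pd

# Toy stand-in for survey_df (hypothetical values)
df = pd.DataFrame({'year': [1996, 1997, 1997, 1998],
                   'sex':  ['F', 'F', 'M', 'F']})

# Reference a Python variable inside the query string with @
target_year = 1997
subset = df.query('year == @target_year & sex == "F"')
print(subset.shape)  # (1, 2)
```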
Queries are handy, but as more criteria are used, the query statement can get cumbersome. Masks provide a somewhat simpler, or at least more modular, approach.
A "mask" is a Pandas series (i.e., one column of data) with a Boolean value for each row:
survey_df
#Create a mask for all records with hindfeet smaller than 35mm
mask_smallfeet = survey_df['hindfoot_length'] < 35
mask_smallfeet
We can now apply this mask to select records where the mask value is True by using it with the loc indexer.
survey_df.loc[mask_smallfeet]
What's nice about masks is that we can combine multiple masks. Think of each Boolean value as an on/off switch that determines whether or not the row is added to the final set. We use & (the "and" switch) to combine switches such that each must be "on", and | (the "or" switch) to require only one switch to be on.
#Create a mask of males
mask_male = survey_df['sex'] == 'M'
mask_male
#Create a mask for records collected in April
mask_april = survey_df['month'] == 4
#Filter records meeting all criteria
data_subset = survey_df.loc[mask_smallfeet & mask_male & mask_april]
data_subset.shape
#Filter records meeting any criteria
data_subset = survey_df.loc[mask_smallfeet | mask_male | mask_april]
data_subset.shape
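Pandas also supports ~ to invert a mask, a "not" switch to go along with & and |. A toy example, independent of the surveys data (values are made up):

```python
import pandas as pd

# Toy stand-in for survey_df (hypothetical values)
df = pd.DataFrame({'hindfoot_length': [30, 36, 32, 40],
                   'sex': ['M', 'F', 'M', 'M']})

mask_small = df['hindfoot_length'] < 35
mask_male = df['sex'] == 'M'

# Small-footed males
both = df.loc[mask_small & mask_male]
# Everything except small-footed males
rest = df.loc[~(mask_small & mask_male)]
print(both.shape[0], rest.shape[0])  # 2 2
```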
► YOU TRY IT: Use the query() function to select all male individuals with weight greater than 41 g.
#Extract males weighing more than 41 g - using query
subset = survey_df.query("sex == 'M' and weight > 41")
subset['hindfoot_length'].median()
#Extract males weighing more than 41 g - using masks
mask_weight = survey_df['weight'] > 41
subset_2 = survey_df.loc[mask_male & mask_weight]
subset_2['hindfoot_length'].median()
We've learned how to select data in specific rows and columns. Now we'll update those values.
Our dataset has some erroneous data: records occurring on the "31st" day of April and September, months that have only 30 days. We'll replace those dates with the 30th day of the month. To do this, we select rows where the month is 4 or 9 and the day is 31, and then, with those rows selected, update the values in the day column to 30.
#Create masks for day = 31 and for months 4 and 9
mask_day = survey_df['day'] == 31
mask_april = survey_df['month'] == 4
mask_sept = survey_df['month'] == 9
Applying the masks returns a dataframe of the selected rows:
#Apply the masks
survey_df.loc[mask_day & (mask_april | mask_sept)]
We can add a column designation to just get one column with the selected rows:
#Fetch just the day column of the filtered rows
survey_df.loc[mask_day & (mask_april | mask_sept), 'day']
Rather than just display to the screen, we can assign a new value to this selection:
#Update the values
survey_df.loc[mask_day & (mask_april | mask_sept), 'day'] = 30
Check our results...
#What's the largest value in the day column for records where month = 4 or 9?
survey_df.query('month == 4 or month == 9')['day'].max()
With the dates fixed, we can now create a date attribute
survey_df['date'] = pd.to_datetime(survey_df[['year','month','day']])
survey_df.head()
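For reference, to_datetime assembles one datetime per row from year/month/day columns. A minimal self-contained illustration (made-up dates):

```python
import pandas as pd

# Toy frame with year/month/day columns (hypothetical dates)
df = pd.DataFrame({'year': [1997, 1997], 'month': [4, 9], 'day': [30, 15]})

# Assemble a datetime from the three component columns
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df['date'].tolist())
```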
We often want to calculate summary statistics grouped by subsets or attributes within fields of our data. For example, we might want to calculate the average weight of all individuals per some variable like species_id, plot_id, or sex. This is done with Pandas' groupby() function.
So let's go through the process of grouping data by sex (a variable without a lot of unique values) and computing mean weight and mean hindfoot length of males vs females.
First, some review:
survey_df['weight'].describe()
print(" Min: ", survey_df['weight'].min())
print(" Max: ", survey_df['weight'].max())
print(" Mean: ", survey_df['weight'].mean())
print(" Std Dev: ", survey_df['weight'].std())
print(" Count: ", survey_df['weight'].count())
#Group the data by unique values in the `sex` field
grouped_data = survey_df.groupby('sex')
#This creates a Pandas "grouped dataframe" object
type(grouped_data)
We can apply aggregate functions to the grouped object: min(), median(), count(), etc.
#Compute the median value of all numeric fields for each group
#(numeric_only=True skips non-numeric columns like species_id)
grouped_data.median(numeric_only=True)
# Provide a list of fields if you want to restrict what columns are reported
grouped_data[['hindfoot_length','weight']].median()
# Or, use the describe function to reveal all summary stats for the grouped data
grouped_data['weight'].describe()
# Challenge 1: Show the *count* of records, grouped by sex
#grouped_data.█()
grouped_data.count()
► Note the count of the weight field in the resulting dataframe.
# Challenge 2: Just show the 'month' column in the above statement
#grouped_data.█()['█']
grouped_data.count()['month']
grouped_data2 = survey_df.groupby(['plot_id','sex'])
grouped_data2.mean(numeric_only=True)
# Challenge 3
grouped_data2 = survey_df.groupby(['plot_id','sex'])
grouped_data2.mean(numeric_only=True)
► YOU TRY IT: Use the describe function on that result to compute summary statistics...
# Challenge 4
We can also supply a dictionary of aggregating functions so that each column in the grouped result is aggregated exactly how we want (i.e., instead of computing just the sum or mean of all columns). This dictionary is built by specifying the column name as the key and the aggregate function(s) as the values. Below is an example that aggregates the data by sex, computing the minimum and maximum of the year, the median of the hindfoot_length, and the mean weight.
#Create the aggregate function dictionary
aggFuncs = {
"year": ['count','min','max'],#compute count, min, and max of the year attribute
"hindfoot_length": 'median', #compute the median hindfoot length
"weight": 'mean'} #compute the mean weight
#Apply the dictionary
survey_df.groupby('sex').agg(aggFuncs)
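The same agg pattern on a tiny inline frame, just to show the shape of the result (values are made up; the columns of the output form a MultiIndex of column name and function):

```python
import pandas as pd

# Toy stand-in for survey_df (hypothetical values)
df = pd.DataFrame({'sex': ['F', 'M', 'F', 'M'],
                   'year': [1996, 1997, 1998, 1997],
                   'weight': [30.0, 40.0, 34.0, 44.0]})

# Column name -> aggregate function(s)
aggFuncs = {'year': ['min', 'max'], 'weight': 'mean'}
result = df.groupby('sex').agg(aggFuncs)
print(result)
```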
And finally, we can transform our data via pivot tables. In this operation, we take two categorical variables in our dataset, creating row labels from one and column names from the other.
survey_df.pivot_table(
values = 'weight',
index = 'species_id',
columns= 'year',
aggfunc ='mean'
).fillna(-1).T.style.background_gradient(cmap = 'YlGnBu')
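The mechanics of pivot_table are easier to see on a toy frame (hypothetical species and weights). Cells with no matching records come back as NaN, which is why the styled example above fills them with -1:

```python
import pandas as pd

# Toy stand-in for survey_df (hypothetical values)
df = pd.DataFrame({'species_id': ['NL', 'NL', 'DM', 'DM'],
                   'year': [1996, 1997, 1996, 1996],
                   'weight': [150.0, 160.0, 40.0, 44.0]})

# Rows labeled by species_id, columns by year, cells = mean weight
pivot = df.pivot_table(values='weight', index='species_id',
                       columns='year', aggfunc='mean')
print(pivot)
```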