We can plot our summary stats using Pandas, too. First, to enable plots to appear in our notebook, we use the 'magic' command %matplotlib inline. (Note: if you use %matplotlib notebook instead, you get interactive plots, but they can be a bit less reliable.)
Documentation on plotting in Pandas is here:
http://pandas.pydata.org/pandas-docs/stable/visualization.html#basic-plotting-plot
Let's try a few examples:
#Import pandas
import pandas as pd
# Make sure figures appear inline in the IPython Notebook
%matplotlib inline
#Read in the surveys.csv file
surveys_df = pd.read_csv('../data/surveys.csv')
surveys_df.head()
#Group data by species id and compute row counts
species_counts = surveys_df.groupby('species_id')['record_id'].count()
# create a quick bar chart by setting `kind` to 'bar'
species_counts.plot(kind='bar',
                    figsize=(15, 3),           #Sets the size of the plot
                    title='Count by species',  #Sets the title
                    logy=True);                #Converts y axis to log scale
#Challenge 1: Plot average weight per plot
data = surveys_df.groupby('█')['█'].mean()
#Create a plot as the variable "ax"
ax = data.plot(kind='bar',
               title="Mean weight by plot",
               figsize=(10, 4))
#Set axis labels for the "ax" plot
ax.set(xlabel='Plot ID',
       ylabel='Mean weight (g)');
#Challenge 1: Plot average weight per plot
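#Select the weight column before averaging; recent pandas versions raise an error when asked to average non-numeric columns
data = surveys_df.groupby('plot_id')['weight'].mean()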
#Create a plot as the variable "ax"
ax = data.plot(kind='bar',
               title="Mean weight by plot",
               figsize=(10, 4))
#Set axis labels for the "ax" plot
ax.set(xlabel='Plot ID',
       ylabel='Mean weight (g)');
#Challenge 2:
data = surveys_df.groupby('█').count()['█']
data.plot(kind='pie',title='Total records, by sex');
#Challenge 2:
data = surveys_df.groupby('sex').count()['record_id']
data.plot(kind='pie',title='Total records, by sex');
#Pandas has lots of plotting options...
surveys_df.boxplot(column=['weight'],by='month',figsize=(15,3));
Create a stacked bar plot, with weight on the Y axis, and the stacked variable being sex. The plot should show total weight by sex for each plot. Some tips are below to help you solve this challenge:
d = {'one': pd.Series([1., 2., 3.],
                      index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.],
                      index=['a', 'b', 'c', 'd'])}
pd.DataFrame(d)
We can plot the above with:
# plot stacked data so columns 'one' and 'two' are stacked
my_df = pd.DataFrame(d)
my_df.plot(kind='bar',stacked=True,title="The title of my graph");
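One detail worth noticing in the toy data above: the two Series do not share the same index, since 'one' has no entry for 'd'. The sketch below rebuilds the same data to show how pandas aligns the two Series on the union of their labels and marks the gap as NaN. The name my_df_filled and the choice to replace the gap with zero are only illustrative assumptions, not part of the lesson.

```python
import pandas as pd

# Same toy data as in the tip above
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
my_df = pd.DataFrame(d)

# Row 'd' exists only in 'two', so column 'one' holds NaN there
print(my_df)

# Optional (illustrative): turn the missing value into an explicit zero
my_df_filled = my_df.fillna(0)
print(my_df_filled)
```

Whether a gap should become zero depends on what the data mean; for totals it is often reasonable, for averages usually not.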
Try .unstack() on some of the DataFrames above and see what it yields. Start by transforming the grouped data (by plot and sex) into an unstacked layout, then create a stacked plot.
#Group data by plot and by sex, then sum the weights within each plot/sex group
by_plot_sex = surveys_df.groupby(['plot_id','sex'])
plot_sex_count = by_plot_sex['weight'].sum()
plot_sex_count
Below we’ll use .unstack() on our grouped data to figure out the total weight that each sex contributed to each plot.
by_plot_sex = surveys_df.groupby(['plot_id','sex'])
plot_sex_count = by_plot_sex['weight'].sum()
dfPlotSex = plot_sex_count.unstack()
dfPlotSex.head()
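To see what .unstack() does in isolation, here is a minimal sketch on a few made-up rows. The names toy_df, toy_sums and toy_wide and all the values are hypothetical; they are not taken from surveys.csv. Grouping by two columns gives a Series with a two-level index, and .unstack() moves the innermost level (here sex) into columns.

```python
import pandas as pd

# Made-up rows for illustration only (not from surveys.csv)
toy_df = pd.DataFrame({
    'plot_id': [1, 1, 1, 2, 2],
    'sex':     ['F', 'M', 'M', 'F', 'M'],
    'weight':  [10.0, 20.0, 5.0, 7.0, 3.0],
})

# Grouping by two columns yields a Series indexed by (plot_id, sex)
toy_sums = toy_df.groupby(['plot_id', 'sex'])['weight'].sum()
print(toy_sums)

# unstack() pivots the innermost index level (sex) into columns,
# leaving one row per plot_id and one column per sex
toy_wide = toy_sums.unstack()
print(toy_wide)
```

Each column of the unstacked table becomes one segment of the stacked bar, which is why this layout is needed before plotting.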
Now, create a stacked bar plot with that data where the weights for each sex are stacked by plot.
Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:
s_plot = dfPlotSex.plot(kind='bar',stacked=True,title="Total weight by plot and sex")
s_plot.set_ylabel("Weight")
s_plot.set_xlabel("Plot");
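As an aside, a table with the same layout can also be built in one step with DataFrame.pivot_table instead of groupby followed by .unstack(). The sketch below uses a few made-up rows (toy_df and its values are hypothetical, not from surveys.csv) and compares the two routes; it is an alternative to try, not the approach used in this lesson.

```python
import pandas as pd

# Made-up rows for illustration only (not from surveys.csv)
toy_df = pd.DataFrame({
    'plot_id': [1, 1, 1, 2, 2],
    'sex':     ['F', 'M', 'M', 'F', 'M'],
    'weight':  [10.0, 20.0, 5.0, 7.0, 3.0],
})

# Route 1: group, sum, then unstack (as in the lesson)
toy_unstacked = toy_df.groupby(['plot_id', 'sex'])['weight'].sum().unstack()

# Route 2: pivot_table with plot_id as rows, sex as columns, summed weight as values
toy_pivot = toy_df.pivot_table(index='plot_id', columns='sex',
                               values='weight', aggfunc='sum')

print(toy_pivot)
# Compare the underlying values of the two tables
print((toy_pivot.values == toy_unstacked.values).all())
```

Either table could then be passed to .plot(kind='bar', stacked=True) in the same way as dfPlotSex above.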