Quick & Easy Plotting of Data Using Pandas

We can plot our summary stats using Pandas, too. First, to make plots appear in our notebook, we use the IPython 'magic' command %matplotlib inline. (If you use %matplotlib notebook instead, you get interactive plots, but they can be less reliable.)
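%matplotlib inline is an IPython magic, so it only works inside a notebook. If you run the same pandas code as a plain Python script, one option is to select matplotlib's non-interactive "Agg" backend and save each figure to a file. The small Series and the file name species_counts.png below are hypothetical, chosen only for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no window or notebook needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for real summary stats
counts = pd.Series({"AA": 12, "BB": 3, "CC": 5}, name="count")

ax = counts.plot(kind="bar", title="Count by species")  # Series.plot returns a matplotlib Axes

# Hypothetical output path in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "species_counts.png")
ax.figure.savefig(out_path)  # write the figure to disk instead of showing it inline
plt.close(ax.figure)         # release the figure's memory
```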

Documentation on plotting in Pandas is here:
http://pandas.pydata.org/pandas-docs/stable/visualization.html#basic-plotting-plot

Let's try a few examples:

In [1]:
#Import pandas
import pandas as pd
# make sure figures appear inline in the IPython/Jupyter notebook
%matplotlib inline
In [2]:
#Read in the surveys.csv file
surveys_df = pd.read_csv('../data/surveys.csv')
surveys_df.head()
Out[2]:
record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN
1 2 7 16 1977 3 NL M 33.0 NaN
2 3 7 16 1977 2 DM F 37.0 NaN
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
In [3]:
#Group data by species id and compute row counts
species_counts = surveys_df.groupby('species_id')['record_id'].count()
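count() counts the non-missing values of the selected column within each group, so counting record_id (which is never empty) gives the number of rows per species. A minimal sketch on a hypothetical five-row DataFrame shows how this differs from counting a column with gaps, and from size(), which counts rows whether or not they contain missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of surveys_df
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "species_id": ["NL", "NL", "DM", "DM", "DM"],
    "weight": [np.nan, 40.0, np.nan, 35.0, 38.0],
})

rows_per_species = df.groupby("species_id")["record_id"].count()  # non-null record_ids per group
weights_per_species = df.groupby("species_id")["weight"].count()  # NaN weights are not counted
sizes = df.groupby("species_id").size()                            # every row, NaN or not

print(rows_per_species)
print(weights_per_species)
print(sizes)
```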
In [17]:
# create a quick bar chart by setting `kind` to 'bar'
species_counts.plot(kind='bar',
                    figsize=(15,3),           #Sets the size of the plot
                    title='Count by species', #Sets the title
                    logy=True);               #Converts y axis to log scale
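Series.plot returns a matplotlib Axes, so you can inspect what the keyword arguments did. A minimal sketch with hypothetical counts (using the "Agg" backend so it also runs outside a notebook) checks that logy=True switched the y axis to a log scale and that one bar was drawn per species:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running outside a notebook, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts spanning several orders of magnitude
counts = pd.Series({"AA": 5, "BB": 500, "CC": 50})

ax = counts.plot(kind="bar", figsize=(15, 3), title="Count by species", logy=True)

yscale = ax.get_yscale()  # "log" because of logy=True
title = ax.get_title()    # the title passed to plot()
n_bars = len(ax.patches)  # one Rectangle per bar
plt.close(ax.figure)
print(yscale, title, n_bars)
```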

Challenge - Plots

  1. Create a plot of the average weight across all species for each plot.
    Hint: first summarize the data on plot_id, computing the mean of the weight column, then follow the syntax above.
In [ ]:
#Challenge 1: Plot average weight per plot
data = surveys_df.groupby('█')['█'].mean()
#Create a plot as the variable "ax"
ax = data.plot(kind='bar',
               title="Mean weight by plot",
               figsize = (10,4))
#Set axis labels for the "ax" plot
ax.set(xlabel='Plot ID',
       ylabel='Mean weight (g)');
In [30]:
#Challenge 1: Plot average weight per plot
data = surveys_df.groupby('plot_id')['weight'].mean()
#Create a plot as the variable "ax"
ax = data.plot(kind='bar',
               title="Mean weight by plot",
               figsize = (10,4))
#Set axis labels for the "ax" plot
ax.set(xlabel='Plot ID',
       ylabel='Mean weight (g)');
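In pandas 2.0 and later, calling .mean() on a whole grouped DataFrame raises a TypeError when it contains text columns such as species_id or sex, because numeric_only no longer defaults to True. Selecting the column first and then aggregating avoids the problem and only averages what you need. A sketch on a hypothetical miniature dataset; note that mean() skips missing weights:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of surveys_df, including a text column
df = pd.DataFrame({
    "plot_id": [1, 1, 2, 2, 2],
    "sex": ["M", "F", "F", "M", "F"],
    "weight": [40.0, np.nan, 30.0, 50.0, 46.0],
})

# Select the weight column first, then average it within each plot_id group
mean_weight = df.groupby("plot_id")["weight"].mean()
print(mean_weight)
```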
  2. Create a pie chart showing the proportion of records (counted by record_id) from males versus females for the entire dataset.
    Hint: you need to group on sex and then compute the count of record_ids in the resulting grouped object.
In [ ]:
#Challenge 2:
data = surveys_df.groupby('█').count()['█']
data.plot(kind='pie',title='Total records, by sex');
In [33]:
#Challenge 2:
data = surveys_df.groupby('sex').count()['record_id']
data.plot(kind='pie',title='Total records, by sex');
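A pie chart draws each slice as a share of the total, so the proportions it shows are the counts divided by their sum. A minimal sketch with a hypothetical sex column computes those shares explicitly and with value_counts(normalize=True); both drop missing values by default, much as groupby drops records with no sex:

```python
import pandas as pd

# Hypothetical sex column; None marks a record with no sex recorded
sex = pd.Series(["F", "M", "M", "F", "M", None, "M", "F"])

counts = sex.value_counts()                    # missing values are dropped by default
shares = counts / counts.sum()                 # each slice's fraction of the pie
shares_direct = sex.value_counts(normalize=True)
print(shares)
print(shares_direct)
```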
In [34]:
#Pandas has lots of plotting options...
surveys_df.boxplot(column=['weight'],by='month',figsize=(15,3));
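Each box summarizes one month's weights: the line inside the box marks the median, and the box spans the first to third quartile. A sketch on hypothetical data computes those same numbers with groupby and quantile, which can help when reading the plot:

```python
import pandas as pd

# Hypothetical weights for two months
df = pd.DataFrame({
    "month": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "weight": [10.0, 20.0, 30.0, 40.0, 50.0, 5.0, 15.0, 25.0, 35.0, 45.0],
})

# 25th percentile, median and 75th percentile of weight for each month,
# reshaped so each quantile becomes a column
box_stats = df.groupby("month")["weight"].quantile([0.25, 0.5, 0.75]).unstack()
print(box_stats)
```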

Advanced Plotting...

Create a stacked bar plot with weight on the Y axis and sex as the stacked variable. The plot should show the total weight by sex for each plot. Some tips below will help you solve this challenge:

In [35]:
d = {'one' : pd.Series([1., 2., 3.], 
                       index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], 
                       index=['a', 'b', 'c', 'd'])}
pd.DataFrame(d)
Out[35]:
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
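The NaN in row d appears because pandas aligns the Series on their index labels: 'one' has no 'd' label, so that cell is filled with a missing value. A small sketch confirming the alignment:

```python
import pandas as pd

d = {"one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
     "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

print(df.index.tolist())          # ['a', 'b', 'c', 'd'] (union of both indexes)
print(df["one"].isna().tolist())  # [False, False, False, True]
```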

We can plot the above with:

In [36]:
# plot stacked data so columns 'one' and 'two' are stacked
my_df = pd.DataFrame(d)
my_df.plot(kind='bar',stacked=True,title="The title of my graph");
  • You can use the .unstack() method to turn grouped data into columns for plotting. Try running .unstack() on some of the DataFrames above and see what it yields.
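As a minimal sketch of what .unstack() does, take a hypothetical Series indexed by two levels (plot_id, sex): unstacking moves the innermost level into columns, giving one row per plot and one column per sex, which is the wide layout a stacked bar plot needs:

```python
import pandas as pd

# Hypothetical totals indexed by (plot_id, sex)
index = pd.MultiIndex.from_tuples(
    [(1, "F"), (1, "M"), (2, "F"), (2, "M")], names=["plot_id", "sex"])
totals = pd.Series([10.0, 20.0, 30.0, 40.0], index=index, name="weight")

wide = totals.unstack()  # innermost level ('sex') becomes the columns
print(wide)
```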

Start by transforming the grouped data (by plot and sex) into an unstacked layout, then create a stacked plot.

In [37]:
#Group data by plot and by sex, and then calculate a sum of weights for each plot.
by_plot_sex = surveys_df.groupby(['plot_id','sex'])
plot_sex_count = by_plot_sex['weight'].sum()
plot_sex_count
Out[37]:
plot_id  sex
1        F      38253.0
         M      59979.0
2        F      50144.0
         M      57250.0
3        F      27251.0
         M      28253.0
4        F      39796.0
         M      49377.0
5        F      21143.0
         M      23326.0
6        F      26210.0
         M      27245.0
7        F       6522.0
         M       6422.0
8        F      37274.0
         M      47755.0
9        F      44128.0
         M      48727.0
10       F       2359.0
         M       2776.0
11       F      34638.0
         M      43106.0
12       F      51825.0
         M      57420.0
13       F      24720.0
         M      30354.0
14       F      32770.0
         M      46469.0
15       F      12455.0
         M      11037.0
16       F       5446.0
         M       6310.0
17       F      42106.0
         M      48082.0
18       F      27353.0
         M      26433.0
19       F      11297.0
         M      11514.0
20       F      33206.0
         M      25988.0
21       F      15481.0
         M       9815.0
22       F      34656.0
         M      35363.0
23       F       3352.0
         M       3883.0
24       F      22951.0
         M      18835.0
Name: weight, dtype: float64

Below we’ll use .unstack() on our grouped data to lay out the total weight that each sex contributed to each plot, with one row per plot_id and one column per sex.

In [38]:
by_plot_sex = surveys_df.groupby(['plot_id','sex'])
plot_sex_count = by_plot_sex['weight'].sum()
dfPlotSex = plot_sex_count.unstack()
dfPlotSex.head()
Out[38]:
sex F M
plot_id
1 38253.0 59979.0
2 50144.0 57250.0
3 27251.0 28253.0
4 39796.0 49377.0
5 21143.0 23326.0
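If some plot had records from only one sex, .unstack() would leave a NaN in the missing cell. Series.unstack accepts a fill_value argument to substitute a value such as 0 instead, which for total weight reads as "no weight recorded". A sketch with hypothetical totals in which plot 2 has no female records:

```python
import pandas as pd

# Hypothetical totals; the (2, 'F') combination is missing
index = pd.MultiIndex.from_tuples(
    [(1, "F"), (1, "M"), (2, "M")], names=["plot_id", "sex"])
totals = pd.Series([10.0, 20.0, 40.0], index=index, name="weight")

with_nan = totals.unstack()            # missing combination becomes NaN
filled = totals.unstack(fill_value=0)  # missing combination becomes 0
print(with_nan)
print(filled)
```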

Now, create a stacked bar plot from that data, with the weights for each sex stacked within each plot.

Rather than display the data as a table, we can plot it, stacking the values for each sex, as follows:

In [39]:
s_plot = dfPlotSex.plot(kind='bar',stacked=True,title="Total weight by plot and sex")
s_plot.set_ylabel("Weight")
s_plot.set_xlabel("Plot");