# coding: utf-8
#
# Bokeh Tutorial
#
# 10. High Level Charts
# This section covers the `bokeh.charts` interface, which is a high-level API that is especially useful for exploratory data analysis (for instance, in a Jupyter notebook). It provides functions for quickly producing many standard chart types, often with a single line of code. We will look at the following types in this notebook:
#
# * [Scatter Plot](#Scatter-Plot)
# * [Bar Chart](#Bar-Chart)
# * [Histogram](#Histogram)
# * [Box Plot](#Box-Plot)
# In[1]:
from bokeh.io import output_notebook, show
output_notebook()
# # Scatter Plot
#
# A high-level scatter plot is provided by `bokeh.charts.Scatter`.
#
# For this section, we will use the "iris" data set. First, let's import it and take a look at a few rows:
# In[2]:
from bokeh.sampledata.iris import flowers
flowers.head()
# In[3]:
from bokeh.charts import Scatter
# A basic scatter chart takes the data (in this case a pandas DataFrame) as the first argument, and specifies the `x` and `y` coordinates for the scatter as the names of columns in the data.
# In[4]:
p = Scatter(flowers, x='petal_length', y='petal_width')
show(p)
# By passing a column name for the `color` parameter, you can make `Scatter` automatically color the markers according to the groups in that column. Let's also add a legend by specifying its location as the value of the `legend` parameter (in this case `"top_left"`).
# In[5]:
p = Scatter(flowers, x='petal_length', y='petal_width', color='species', legend='top_left')
show(p)
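# As a rough illustration of what "coloring by group" means, the sketch below uses plain pandas to pair each distinct value of a grouping column with a color. The `toy_flowers` table and the `palette` list are made up for this example, and this is not how `Scatter` is implemented internally; it only shows the idea of one color per group.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
toy_flowers = pd.DataFrame({
    'petal_length': [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Illustrative palette; it needs at least as many colors as there are groups.
palette = ['red', 'green', 'blue']

# One color per distinct value of the grouping column, in order of first appearance.
color_by_species = dict(zip(toy_flowers['species'].unique(), palette))

# Every row then takes the color assigned to its group.
row_colors = toy_flowers['species'].map(color_by_species)
print(color_by_species)
```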
# By passing a column name for the `marker` parameter, you can make `Scatter` automatically vary the marker shapes according to the groups in that column. Let's try that as an exercise.
# In[6]:
# EXERCISE: vary the marker shape by passing a column name as the `marker` keyword argument
# # Bar Chart
#
# A high-level bar chart is provided by `bokeh.charts.Bar`.
#
# For this section, we will use the "autompg" data set. Let's import it and take a quick look:
# In[7]:
from bokeh.sampledata.autompg import autompg
autompg.head()
# In[8]:
from bokeh.charts import Bar
# A basic bar chart takes the data (again a DataFrame) as the first argument, along with the following parameters:
#
# * `label` - the column whose groups label the x-axis
# * `values` - the column whose values are aggregated within each group to give the bar heights
# * `agg` - the name of the aggregation to perform over the values (e.g., `"mean"`, `"max"`, etc.)
#
# A simple example that also specifies some other properties such as `title` and `legend` is shown below:
# In[9]:
p = Bar(autompg, label='cyl', values='mpg', agg='max',
        title="Max MPG by CYL", legend=None, tools='crosshair')
show(p)
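# To see the numbers such a chart would display, a comparable aggregation can be sketched in plain pandas. The `toy_cars` table below is made up, and the assumption that `Bar` behaves like a pandas group-by is ours rather than something stated above; treat this as a way to check bar heights, not as the chart's implementation.

```python
import pandas as pd

# Hypothetical miniature stand-in for the autompg data (values are made up).
toy_cars = pd.DataFrame({
    'cyl': [4, 4, 6, 6, 8, 8],
    'mpg': [30.0, 34.0, 20.0, 22.0, 14.0, 16.0],
})

# Group rows by the `label` column, then apply the `agg` function to the `values` column.
max_mpg_by_cyl = toy_cars.groupby('cyl')['mpg'].agg('max')
print(max_mpg_by_cyl)
```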
# By passing another column name as the `group` parameter, the aggregations can be further subdivided by the groups in that column, and the bars grouped visually. The example below demonstrates this, as well as adding a legend by specifying its location:
# In[10]:
p = Bar(autompg, label='yr', values='mpg', agg='median', group='origin',
        title="Median MPG by YR, grouped by ORIGIN", legend='top_left', tools='crosshair')
show(p)
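# Subdividing by a second column corresponds, roughly, to grouping on two keys. The sketch below shows this with plain pandas on a made-up `toy_cars_by_year` table: each row of the result is one x-axis label and each column is one sub-group, so each cell would be one bar. This is an analogy under our own assumptions, not the chart's actual code path.

```python
import pandas as pd

# Hypothetical miniature stand-in for the autompg data (values are made up).
toy_cars_by_year = pd.DataFrame({
    'yr': [70, 70, 70, 71, 71, 71],
    'origin': [1, 1, 2, 1, 2, 2],
    'mpg': [18.0, 22.0, 26.0, 20.0, 28.0, 30.0],
})

# Aggregate within each (label, group) pair, then pivot the sub-groups into columns.
median_mpg = (
    toy_cars_by_year
    .groupby(['yr', 'origin'])['mpg']
    .median()
    .unstack('origin')
)
print(median_mpg)
```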
# Similarly, bars for subgroups can be stacked visually, by providing a column name for the `stack` parameter. Let's try that as an exercise.
# In[11]:
# EXERCISE: change the chart above to stack the bars with title "Median MPG by YR, stacked by ORIGIN"
# # Histogram
#
# A high-level histogram is provided by `bokeh.charts.Histogram`.
#
# For this section, we will construct our own synthetic data set that has values generated from two different probability distributions.
# In[12]:
import pandas as pd
import numpy as np
# build some distributions
mu, sigma = 0, 0.5
normal = pd.DataFrame({'value': np.random.normal(mu, sigma, 1000), 'type': 'normal'})
lognormal = pd.DataFrame({'value': np.random.lognormal(mu, sigma, 1000), 'type': 'lognormal'})
# create a pandas data frame
df = pd.concat([normal, lognormal])
df[995:1005]
# In[13]:
from bokeh.charts import Histogram
# A basic histogram takes the data as the first parameter, and a column name as the `values` parameter. Optionally, you can also specify the number of bins to use by giving a value for the `bins` parameter. The example below shows the distribution of ***all*** the values (both the "normal" and "lognormal" values).
# In[14]:
hist = Histogram(df, values='value', bins=30)
show(hist)
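# For intuition about what the `bins` value controls, the sketch below bins a sample with `numpy.histogram`. Whether `Histogram` uses NumPy for its binning is not stated here, so this is only an illustration of equal-width binning; the `sample` array and the fixed seed are choices made for this example.

```python
import numpy as np

# Seeded generator so the sample is reproducible (the seed value is arbitrary).
rng = np.random.RandomState(0)
sample = rng.normal(0, 0.5, 1000)

# 30 equal-width bins spanning the range of the data.
counts, edges = np.histogram(sample, bins=30)

# There is always one more bin edge than there are bins.
print(len(counts), len(edges))
```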
# It's also possible to generate multiple histograms at once by grouping the data. The column to group by is specified by the `color` parameter (and the histogram for each group is colored differently automatically). Let's try that as an exercise.
# In[15]:
# EXERCISE: generate histograms for each "type" of distribution, and add a legend to the top left.
# # Box Plot
#
# A high-level box plot is provided by `bokeh.charts.BoxPlot`.
#
# For this section we will use the "iris" data set again.
# In[16]:
from bokeh.charts import BoxPlot
# A basic box plot takes the data as the first argument, as well as column names for:
#
# * `label` - the column whose groups label the x-axis
# * `values` - the column whose values are summarized for each group
#
# A simple example that also specifies some other properties, such as `title` and `color`, is shown below:
# In[17]:
p = BoxPlot(flowers, label='species', values='petal_width', tools='crosshair', color='#aa4444',
            xlabel='', ylabel='petal width, cm', title='Distributions of petal widths')
show(p)
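# A box plot conventionally summarizes each group by its median and quartiles. The sketch below computes those per-group statistics with plain pandas on a made-up `toy_widths` table. Whisker and outlier rules vary between tools and are not specified above, so they are left out; this is a way to inspect the summary numbers, not a description of how `BoxPlot` computes them.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
toy_widths = pd.DataFrame({
    'species': ['setosa'] * 5 + ['virginica'] * 5,
    'petal_width': [0.1, 0.2, 0.2, 0.3, 0.4, 1.8, 1.9, 2.0, 2.1, 2.5],
})

# Lower quartile, median, and upper quartile of the values within each label group.
quartiles = (
    toy_widths
    .groupby('species')['petal_width']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```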
# Instead of a single color, the boxes and whiskers can be colored by group, according to the values in one of the columns. This is done by passing a column name as the `color` parameter. Let's try that as an exercise.
# In[18]:
# EXERCISE: color the boxes by "species" and add a legend to the top left
# ---
#
# # Further reading
#
#
# http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/charts/file/
#
# http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/howto/charts/
#
# http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts.ipynb
#
# http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts%20Demo.ipynb