Bokeh is an interactive visualization library that targets modern web browsers for presentation. It is good for:
- Interactive visualization in modern browsers
- Standalone HTML documents, or server-backed apps
- Large, dynamic or streaming data
among other things like plotting spatial data on maps. While it is best utilized in Jupyter notebooks and for creating visualizations in HTML and JavaScript, it can also generate output files in formats like PNG and SVG. Bokeh is also capable of creating great-looking visualizations with very few commands.
Bokeh has several submodules and generally requires quite a few imports. bokeh.io is used to establish where the output plot is intended to be displayed. bokeh.plotting provides functions to create figures and glyphs for a plot/graphic. bokeh.models gives the user a way to turn Python dictionaries or Pandas DataFrames into data that Bokeh can display quickly. The imports relevant to our discussion are shown below. Of particular importance is the bokeh.io.output_notebook function that gives us the ability to display Bokeh plots in output cells of Jupyter notebooks.
import bokeh.io
import bokeh.plotting
import bokeh.models
import numpy as np
import pandas as pd
import os
bokeh.io.output_notebook()
There are three things required for a Bokeh plot:
- figure() -- Controls the canvas. Things like: figure size, title, interactive tools, toolbar location.
- data source -- possibly from a Pandas DataFrame
- glyphs or line types -- the data points and/or line styles
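The three ingredients above can be sketched with a plain Python dictionary as the data source. The numbers are made up for illustration, and scatter is used as the glyph method purely as a choice for this sketch.

```python
import bokeh.models
import bokeh.plotting

# 1. The figure controls the canvas (title, tools, toolbar location).
p = bokeh.plotting.figure(title="Porosity vs. permeability (made-up values)",
                          tools="pan,box_zoom,reset",
                          toolbar_location="above")

# 2. The data source: a dict of equal-length columns (illustrative numbers only).
data = {"porosity": [0.10, 0.15, 0.20], "permeability": [5.0, 40.0, 120.0]}
source = bokeh.models.ColumnDataSource(data=data)

# 3. The glyph: the x and y strings are column names looked up in the source.
renderer = p.scatter(x="porosity", y="permeability", size=8, source=source)

# bokeh.io.show(p) would display the figure.
```

Because the glyph refers to columns by name, the same plotting call works whether the source was built from a dictionary or from a DataFrame.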
An example is shown below. First we load our CSV file into a Pandas DataFrame.
df = pd.read_csv('datasets/200wells.csv'); df.head(n=3)
The Bokeh plotting commands are shown below. The ColumnDataSource class converts our Pandas DataFrame into a Bokeh source for plotting. The circle member function creates the glyph to display. There are others such as Line or Arc. The full list is here.
p = bokeh.plotting.figure()
data_source = bokeh.models.ColumnDataSource(df)
p.circle(x='porosity', y='permeability', source=data_source)
bokeh.io.show(p)
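To see what the DataFrame conversion described above produces, the source can be inspected directly. This is a minimal sketch that uses a small made-up DataFrame as a stand-in; it is not the contents of 200wells.csv.

```python
import pandas as pd
import bokeh.models

# Made-up stand-in for the wells table.
df_demo = pd.DataFrame({"porosity": [0.10, 0.15, 0.20],
                        "permeability": [5.0, 40.0, 120.0]})

source_demo = bokeh.models.ColumnDataSource(df_demo)

# Each DataFrame column becomes a named column in the source;
# the unnamed default index is carried along as a column called "index".
print(source_demo.column_names)
print(len(source_demo.data["porosity"]))
```

The column names are what glyph methods refer to through their x and y arguments.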
df = pd.read_csv('datasets/33013014020000.csv', parse_dates=['date'])
The previous example used minimal styling to produce the Bokeh plot. This example shows more options, such as those used for plotting time series data, adding axis labels, and controlling the tools available in the toolbar. More visual styling options can be seen in the Bokeh documentation.
p = bokeh.plotting.figure(width=400, height=300,
                          x_axis_type='datetime', x_axis_label='Date',
                          y_axis_label='Oil (bbls)', tools='pan,box_zoom')
data_source = bokeh.models.ColumnDataSource(df)
p.line(x='date', y='volume_oil_formation_bbls', source=data_source)
bokeh.io.show(p)
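Outside a notebook, a figure can be written to a standalone HTML file instead of being shown in an output cell. The sketch below assumes bokeh.io.save with CDN resources; the data values and the temporary output path are placeholders chosen for illustration.

```python
import os
import tempfile

import bokeh.io
import bokeh.plotting
import bokeh.resources

p_html = bokeh.plotting.figure(title="Standalone HTML example")
p_html.line(x=[1, 2, 3, 4], y=[10, 12, 9, 15])  # made-up values

# Write to a temporary directory; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "plot.html")
bokeh.io.save(p_html, filename=out_path, title="Bokeh plot",
              resources=bokeh.resources.CDN)
```

The resulting file can be opened directly in a browser or served by a standard web server. With CDN resources, the page loads the BokehJS library over the network when it is opened.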
Bokeh offers a couple of options for visualizing geographic and/or spatial data on maps. Its interactivity makes it a superior library to Matplotlib for these kinds of plots, especially when the intended output is a Jupyter notebook or website.
In the example below we will plot all of Pioneer Natural Resources' (PXD) oil and gas wells in the Permian Basin on a Google map. First we read the latitude and longitude information from a CSV file into a Pandas DataFrame.
df = pd.read_csv('datasets/pxd_permian_wells.csv'); df.head(n=3)
Plotting data on Google Maps in Bokeh deviates somewhat from the standard workflow. Instead of the usual figure function, it uses a dedicated bokeh.models.GMapOptions class to set map options and bokeh.plotting.gmap to create the figure. After figure creation, setting a data source and adding glyphs proceeds as usual.
The gmap function requires a Google API key as the first argument. In this example, the API key is taken from a system environment variable called 'GOOGLE_API_KEY'. Instructions for getting an API key are here.
map_options = bokeh.models.GMapOptions(lat=np.mean(df['latitude_surface_hole'].values),
                                       lng=np.mean(df['longitude_surface_hole'].values),
                                       map_type="terrain", zoom=5)
p = bokeh.plotting.gmap(os.environ['GOOGLE_API_KEY'], map_options, title="Well Locations",
                        tools='box_select,tap,pan,wheel_zoom,reset', width=600, height=400)
source = bokeh.models.ColumnDataSource(df)
p.circle(x='longitude_surface_hole', y='latitude_surface_hole', size=15,
         fill_color="blue", fill_alpha=0.8, source=source)
bokeh.io.show(p)
In addition to Google, other map tile providers such as Carto, OpenStreetMap, Wikimedia, and Esri can supply the map backgrounds. The following fairly complex example shows off a different tile provider along with some interactivity in a Bokeh plot. The code for this is beyond the scope of this introduction, but hopefully it gives you a few ideas of the types of things that you can do in Bokeh.
Hovering your mouse over wells in the contour plot displays tooltip information and updates the time series plot with production data. To "freeze" the production plot on a particular well or wells, click on the well. To return to full interactivity, click anywhere on the canvas away from a well.
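One practical difference from the Google Maps example is that tile-based maps in Bokeh work in Web Mercator coordinates rather than latitude and longitude, so well locations generally need converting first. The helper below is a hypothetical sketch of that conversion using the standard spherical Web Mercator formula; the function name and the sample coordinates are illustrative and are not taken from the example above. The call that attaches a tile source differs between Bokeh versions, so it is left out here.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator, in metres


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Rough location in the Permian Basin region (illustrative, not from the dataset).
x_m, y_m = lonlat_to_web_mercator(-102.08, 32.0)
print(x_m, y_m)
```

A figure created with x_axis_type='mercator' and y_axis_type='mercator' labels its axes in degrees even though the underlying data are in metres.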
With Bokeh, you can make sophisticated interactive visualizations with callbacks. There are two types of callbacks:
JavaScript callbacks allow for transformations of the plot's data sources and other features, e.g. $x$/$y$-axis scaling, by writing JavaScript code that is executed on set interactions, e.g. clicking or hovering over a glyph. These allow for fast updating of the plot display while maintaining the standalone nature of the figure, i.e. plots can still be output to standalone HTML and embedded in web sites backed by standard web servers. JavaScript callbacks are used to provide the interactivity in the previous example.
Python callbacks allow for transformations of any and all plot features, data sources, etc. through the execution of arbitrary Python code. These types of callbacks require a Bokeh server to be running such that the Python code can be executed.
Both types of callbacks can be used with widgets, although an easier-to-use widget toolkit built on top of Bokeh, called Panel, is recommended for sophisticated widget and dashboard creation.
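As a minimal sketch of the JavaScript callback style described above, the example below wires a slider to a line plot. It is not the code behind the earlier well map; the data, the slider range, and the variable names are assumptions made for illustration.

```python
import bokeh.layouts
import bokeh.models
import bokeh.plotting

x = [0.0, 0.25, 0.5, 0.75, 1.0]
cb_source = bokeh.models.ColumnDataSource(data={"x": x, "y": list(x)})

cb_plot = bokeh.plotting.figure(title="y = x ** power")
cb_plot.line(x="x", y="y", source=cb_source)

slider = bokeh.models.Slider(start=0.1, end=4.0, value=1.0, step=0.1, title="power")

# The JavaScript below runs in the browser whenever the slider value changes.
# cb_obj is the model that triggered the callback (here, the slider).
callback = bokeh.models.CustomJS(args={"source": cb_source}, code="""
    const power = cb_obj.value;
    const x = source.data.x;
    const y = Array.from(x, (v) => Math.pow(v, power));
    source.data = { x, y };
""")
slider.js_on_change("value", callback)

layout = bokeh.layouts.column(slider, cb_plot)
# bokeh.io.show(layout) would display the slider above the plot.
```

Because the update logic lives in the JavaScript string, the layout keeps working when saved as standalone HTML. A Python callback registered with on_change would instead need a running Bokeh server.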
Matplotlib is the de facto standard plotting library for Python. It has the ability to create virtually any two-dimensional visualization you have ever seen including standard plots, bar charts, box plots, contour and surface plots, etc.
HoloViews is a plotting package with a similar interface to Bokeh, but allows you to choose the backend to be either Bokeh (best for web) or Matplotlib (best for print publications) from a unified front end.
Plotly is another modern plotting library primarily targeting web-based visualizations and offers built-in dashboarding capabilities.
Altair, the newest of the group, is based on Vega-Lite, a JavaScript visualization grammar similar to the Grammar of Graphics implementation in the R programming language.
Further Reading
Further reading on Bokeh can be found in the official documentation.