Interactive static plots in Bokeh
Matplotlib is my go-to tool for plotting in Python; I like that it is essentially infinitely customizable, which lets you create polished-looking plots. But it suffers from at least a few drawbacks that may make it undesirable for some applications:
- Its interface is unintuitive
- It doesn’t generalize well to large data sets
- It is non-interactive out of the box
Although the three deficiencies listed above are unrelated, a single plotting tool, Bokeh, seems to fill all of the gaps left by matplotlib. Bokeh is a Python library that turns Python code into pure JSON data, which the BokehJS library then uses as instructions for rendering the plot client-side, in the browser.
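To make the JSON part concrete, here is a minimal sketch, assuming a reasonably recent Bokeh release that provides bokeh.embed.json_item. It builds a plot entirely in Python and serializes it without opening a browser; the resulting dictionary is the kind of payload BokehJS reads on the client side. The target id 'myplot' is just a placeholder.

```python
import json

import bokeh.embed
import bokeh.plotting

# Build a tiny plot in Python; nothing is drawn at this point
plot = bokeh.plotting.figure(width=300, height=200, title='sketch')
plot.line([1, 2, 3], [4, 6, 5], line_width=2)

# Serialize the plot's object graph into a JSON-compatible dict.
# 'myplot' is a placeholder id for the HTML element BokehJS should fill.
item = bokeh.embed.json_item(plot, target='myplot')
payload = json.dumps(item)

print(sorted(item.keys()))
print(f'{len(payload)} characters of JSON')
```

On a web page, BokehJS's Bokeh.embed.embed_item(item) would render this payload into the element whose id matches the target.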
I fooled around in Bokeh a bit and created a couple of visualizations to show off how easy it is to get up and running. Since I don’t host my own website, I have no way of serving dynamic plots that need server-side processing, so the plots here are static HTML. Keep in mind that if you’re running the full Bokeh server, you can add much more interactivity to your plots.
Here are a couple of simple examples; you can find the code for both at the end of the post. I’m really impressed by how quickly I went from installation to putting this blog post together.
Simple linked plots
This simple figure shows how you can link the x-axes of two separate Bokeh plots, which is immensely helpful when looking for correlations between two different data sets, such as the historical CO2 and temperature measurements in the following figure.
Sources:
Histogram plot with brushing
This plot demonstrates Bokeh’s ‘brushing’ feature, which lets you highlight the data points inside a region you draw with the lasso (or box) select tool. I think this would be much more useful if coupled with additional processing in Python on the server side. For example, in the plot below we could regenerate the histograms over only the selected points, which might be a useful visualization. That kind of Python-side processing isn’t possible in the static case, unfortunately. Still pretty cool, though!
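Here is a minimal sketch of that Python-side processing, using NumPy alone. The helper selection_histograms is hypothetical, not part of Bokeh: it assumes all points live in one (N, 2) array and that you already have the indices of the selected points (in a Bokeh server app they would come from the data source's selected.indices). It reuses the full-data bin edges so the highlighted bars line up with the original ones.

```python
import numpy as np

def selection_histograms(points, selected_indices, edges_x, edges_y):
    """Recompute x and y histograms over only the selected points.

    points: (N, 2) array of (x, y) coordinates.
    selected_indices: integer indices of the selected points (assumed input).
    edges_x, edges_y: bin edges from the full-data histograms.
    """
    selected = points[np.asarray(selected_indices, dtype=int)]
    # Passing the original edges as bins keeps the bars aligned
    counts_x, _ = np.histogram(selected[:, 0], bins=edges_x)
    counts_y, _ = np.histogram(selected[:, 1], bins=edges_y)
    return counts_x, counts_y

# Toy data: four points, binned over the full data set
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
_, edges_x = np.histogram(points[:, 0], bins=3)
_, edges_y = np.histogram(points[:, 1], bins=3)

# Pretend the lasso selected the first two points
sel_x, sel_y = selection_histograms(points, [0, 1], edges_x, edges_y)
print(sel_x, sel_y)  # → [1 1 0] [1 1 0]
```

An empty selection simply yields all-zero counts, so the callback needs no special case for it.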
Conclusions
There’s definitely waaaaay more to Bokeh than I was able to explore in this blog post. What I like is that it’s easy to get up and running, its high-level interface means you don’t have to memorize use cases for hundreds of different commands like you do with matplotlib, and the interactivity adds a flashy element that makes plots impressive to show off. But it’s apparent that to make the most of Bokeh, you really need to run a Bokeh server to add dynamic processing capabilities to the plots. Maybe something for the future.
Code
Imports
import csv
import numpy as np
import numpy.random
import bokeh.plotting
import bokeh.layouts
import bokeh.embed
Temperature and CO2
File I/O & data loading
Temperature
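temp_file_path = './temp'
temp_years = []
temp_temps = []
# Read space-delimited rows of (year, temperature anomaly)
with open(temp_file_path, 'r') as temp_file_handle:
    temp_file_reader = csv.reader(temp_file_handle, delimiter=' ')
    for row in temp_file_reader:
        # Drop empty fields produced by runs of spaces; skip blank lines
        row = [ele for ele in row if ele != '']
        if not row:
            continue
        # Convert to numbers so Bokeh draws numeric axes rather than strings
        temp_years.append(float(row[0]))
        temp_temps.append(float(row[1]))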
CO2
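co2_file_path = './co2'
co2_years = []
co2_co2s = []
with open(co2_file_path, 'r') as co2_file_handle:
    co2_file_reader = csv.reader(co2_file_handle, delimiter=' ')
    for row in co2_file_reader:
        # Drop empty fields produced by runs of spaces; skip blank lines
        row = [ele for ele in row if ele != '']
        if not row:
            continue
        # Column 2 holds the date, column 3 the CO2 concentration;
        # '-99.99' marks a missing measurement, so skip those rows
        if row[3] != '-99.99':
            co2_years.append(float(row[2]))
            co2_co2s.append(float(row[3]))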
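The CO2 loader splits each line on spaces, drops the empty fields that runs of spaces leave behind, skips the '-99.99' missing-value marker, and keeps two columns; the temperature loader is the same pattern without the missing-value check. A small helper can capture that. read_columns is a hypothetical name, and the sample rows below are made up purely to exercise the code; it also converts both columns to floats so Bokeh receives numeric data.

```python
import csv
import io

def read_columns(lines, x_col, y_col, missing=None):
    """Read two numeric columns from space-delimited text (hypothetical helper).

    Runs of spaces produce empty fields, so those are dropped before indexing.
    Rows whose y field equals `missing` (e.g. '-99.99') are skipped.
    """
    xs, ys = [], []
    for row in csv.reader(lines, delimiter=' '):
        row = [ele for ele in row if ele != '']
        if not row:
            continue  # blank line
        if missing is not None and row[y_col] == missing:
            continue  # missing measurement
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    return xs, ys

# Made-up rows: one missing value, one blank line, one double space
sample = io.StringIO('2000 1 2000.0 370.0\n2000 2 2000.1 -99.99\n\n2001 1  2001.0 371.5\n')
years, co2 = read_columns(sample, x_col=2, y_col=3, missing='-99.99')
print(years, co2)  # → [2000.0, 2001.0] [370.0, 371.5]
```

Inside a with open(co2_file_path) as f: block, read_columns(f, x_col=2, y_col=3, missing='-99.99') would mirror the CO2 loader, and read_columns(f, 0, 1) on the temperature file the other one.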
Create interactive plot
# Create a static HTML file as output
bokeh.plotting.output_file('temp_co2.html', title='NASA historical temperature and CO2 annual mean records')
# Create the two figures; sharing temp_figure.x_range links the x-axes,
# so panning or zooming one plot moves the other
temp_figure = bokeh.plotting.figure(title='temperature', x_axis_label='year', y_axis_label='avg temp anomaly (deg C)', width=500, height=250)
co2_figure = bokeh.plotting.figure(title='co2', x_axis_label='year', y_axis_label='avg conc. co2 (ppm)', width=500, height=250, x_range=temp_figure.x_range)
# Plot in the two figures; legend_label sets a fixed legend entry
temp_figure.line(temp_years, temp_temps, legend_label='temp', line_width=2)
co2_figure.line(co2_years, co2_co2s, legend_label='co2')
# Stack the two figures vertically
layout = bokeh.layouts.column(temp_figure, co2_figure)
# Display results (writes temp_co2.html and opens it in a browser)
bokeh.plotting.show(layout)
# Save a copy of the results
bokeh.plotting.save(layout, './co2_temp.html')
Scatter
Create data
# Cluster one
mu_one = [1, 1]
covar_one = np.array([[1,.9],[.9,1]])
cluster_one = numpy.random.multivariate_normal(mu_one, covar_one, 200)
# Cluster two
mu_two = [-5,-4]
covar_two = np.array([[4,1], [1,4]])
cluster_two = numpy.random.multivariate_normal(mu_two, covar_two, 300)
# Cluster three
mu_three = [5,-3]
covar_three = np.array([[2,1.5], [1.5,2]])
cluster_three = numpy.random.multivariate_normal(mu_three, covar_three, 100)
# Pool each coordinate across all three clusters and bin it into a 10-bin histogram
# x histogram
data_x = np.concatenate((cluster_one[:, 0], cluster_two[:, 0], cluster_three[:, 0]), axis=0)
vals_x, edges_x = np.histogram(data_x, bins=10)
# y histogram
data_y = np.concatenate((cluster_one[:, 1], cluster_two[:, 1], cluster_three[:, 1]), axis=0)
vals_y, edges_y = np.histogram(data_y, bins=10)
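np.histogram returns bins counts but bins + 1 edges, so bar i spans edges[i] to edges[i + 1]. The hypothetical helper below (not part of Bokeh or NumPy) makes that mapping onto the left/right/bottom/top arguments of Bokeh's quad glyph explicit, for vertical bars (an x histogram) and horizontal bars (a y histogram), where the roles of edges and counts swap axes.

```python
import numpy as np

def histogram_quads(values, bins=10, horizontal=False):
    """Map np.histogram output onto quad() coordinate arguments.

    Hypothetical helper: vertical bars by default,
    horizontal bars when horizontal=True.
    """
    counts, edges = np.histogram(values, bins=bins)
    # len(edges) == len(counts) + 1, so bar i spans edges[i] .. edges[i + 1]
    lower, upper = edges[:-1], edges[1:]
    base = np.zeros_like(counts)
    if horizontal:
        # Bin edges run along the y-axis; bars grow rightward from x = 0
        return {'bottom': lower, 'top': upper, 'left': base, 'right': counts}
    # Bin edges run along the x-axis; bars grow upward from y = 0
    return {'left': lower, 'right': upper, 'bottom': base, 'top': counts}

values = [0.0, 0.5, 1.0, 1.5, 2.0]
vertical = histogram_quads(values, bins=2)
horizontal = histogram_quads(values, bins=2, horizontal=True)
print(vertical['left'], vertical['right'], vertical['top'])  # → [0. 1.] [1. 2.] [2 3]
```

With it, the x histogram could be drawn as x_histogram_figure.quad(fill_color='#c8c8c8', **histogram_quads(data_x)), and the y histogram the same way with horizontal=True.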
Create interactive plot
bokeh.plotting.output_file('scatter_plot.html', title = 'Bivariate gaussians')
##############
# Scatter plot
##############
scatter_figure = bokeh.plotting.figure(width=320, height=320,
    tools="pan,wheel_zoom,box_select,lasso_select,reset", title='Bivariate Gaussians')
# Cluster one
scatter_figure.scatter(x=cluster_one[:, 0], y=cluster_one[:, 1], color='red', size=3)
# Cluster two
scatter_figure.scatter(x=cluster_two[:, 0], y=cluster_two[:, 1], color='blue', size=3)
# Cluster three
scatter_figure.scatter(x=cluster_three[:, 0], y=cluster_three[:, 1], color='green', size=3)
###############
# Histogram
###############
x_histogram_figure = bokeh.plotting.figure(toolbar_location=None, width=scatter_figure.width, height=200, x_range=scatter_figure.x_range,
    min_border=10, y_axis_location='right')
# The y-histogram bars grow rightward from zero, so its count axis starts at 0
y_histogram_figure = bokeh.plotting.figure(toolbar_location=None, width=200, height=scatter_figure.height, y_range=scatter_figure.y_range,
    min_border=10, x_axis_location='above', x_range=(0, 1.1 * vals_y.max()))
# x- Histogram
x_histogram = x_histogram_figure.quad(top = vals_x, bottom = 0, left = edges_x[:-1], right = edges_x[1:], fill_color = '#c8c8c8', line_color = '#727272')
# y-histogram: horizontal bars, so bin edges set bottom/top and counts set right
y_histogram = y_histogram_figure.quad(top=edges_y[1:], bottom=edges_y[:-1], left=0, right=vals_y, fill_color='#c8c8c8', line_color='#727272')
figure = bokeh.layouts.column(bokeh.layouts.row(scatter_figure, y_histogram_figure), bokeh.layouts.row(x_histogram_figure, bokeh.layouts.Spacer(width=200, height=200)))
# Display the figure
bokeh.plotting.show(figure)
# Save figure
bokeh.plotting.save(figure, './scatterplot.html')