Interactive static plots in Bokeh

January 1, 2017

Matplotlib is my go-to tool for plotting in Python; I like that it is essentially infinitely customizable, allowing you to create polished-looking plots. But it has at least a few drawbacks that may make it undesirable for some applications:

  1. Its interface is unintuitive
  2. It doesn’t generalize well to large data sets
  3. It is non-interactive out of the box

Although the three deficiencies listed above are unrelated, a single plotting tool, Bokeh, seems to fill the gaps left by matplotlib. Bokeh is a Python library: you write Python code that generates pure JSON data, which in turn serves as instructions for rendering a plot client-side via the BokehJS library.
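To make the JSON step concrete, here is a minimal sketch using `bokeh.embed.json_item`, which is available in recent Bokeh releases and is not used in the code at the end of this post. The data values and the `myplot` target id are made up for illustration.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Made-up data, only here to give the figure something to draw
fig = figure(title="demo", width=300, height=200)
fig.line([1, 2, 3], [4, 6, 5], line_width=2)

# json_item returns a plain dict describing the plot;
# "myplot" is a hypothetical id of the HTML element to render into
item = json_item(fig, "myplot")

# The dict serializes to an ordinary JSON string
payload = json.dumps(item)
```

On the page, BokehJS would take this payload (for example via `Bokeh.embed.embed_item`) and draw the plot into the element with id `myplot`.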

I fooled around in Bokeh a bit and created a couple of visualizations to show off how easy it is to get up and running. Since I don’t host my own website, I don’t have a way of creating dynamic plots with server-side processing, so the plots here are static HTML. Keep in mind that if you’re using the full Bokeh server you can add much more interactivity to your plots.

Here are a couple of simple examples; you can find the code at the end of the post. I’m really impressed at how quickly I went from installation to putting this blog post together.

Simple linked plots

This is a simple figure that shows how you can link the x-axes of two separate Bokeh plots, so that panning or zooming one plot moves the other in step. This is immensely helpful when looking for correlations between two different data sets, as in the following figure that shows historical CO2 and temperature measurements.

NASA historical temperature and CO2 annual mean records

Sources:

1. NASA CO2 data
2. NASA temperature data
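The x-axis linking used in the figure above comes down to handing one figure's `x_range` to the other, so both plots hold the same range object. Here is a stripped-down sketch; the years and values are placeholder numbers, not the NASA data.

```python
from bokeh.layouts import column
from bokeh.plotting import figure

# Placeholder data, not real measurements
years = [2000, 2001, 2002, 2003]
series_a = [1.0, 1.5, 1.2, 1.8]
series_b = [10.0, 9.0, 11.0, 12.0]

top = figure(title="series a", width=500, height=250)

# Reusing top.x_range links the x-axes; the y-axes stay independent
bottom = figure(title="series b", width=500, height=250, x_range=top.x_range)

top.line(years, series_a)
bottom.line(years, series_b)

# Stack the two linked plots vertically
layout = column(top, bottom)
```

Because the range object is shared rather than copied, a pan or zoom on either plot updates both.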

Histogram plot with brushing

This plot demonstrates Bokeh’s ‘brushing’ feature, which lets you highlight the data points inside a region that you draw on the plot with the lasso tool. I think this would be much more useful if coupled with additional processing in Python on the server side. For example, in the plot below we could generate histograms over only the selected points, which might be a useful visualization. Not possible in the static case, unfortunately. Still pretty cool, though!

Bivariate gaussians
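The server-side idea mentioned above, recomputing the histograms over only the selected points, would boil down to something like the following NumPy sketch. The helper `histogram_of_selection` is hypothetical, and the selection is simulated with a simple mask; in a Bokeh server app the indices would presumably come from the data source's `selected.indices` instead.

```python
import numpy as np


def histogram_of_selection(values, selected_indices, bin_edges):
    """Count only the selected values, using fixed bin edges."""
    values = np.asarray(values)
    selected = values[np.asarray(selected_indices, dtype=int)]
    counts, _ = np.histogram(selected, bins=bin_edges)
    return counts


# Synthetic data standing in for the x-coordinates of the scatter plot
rng = np.random.default_rng(0)
x = rng.normal(size=500)

# Bin edges computed once over the full data set
_, edges = np.histogram(x, bins=10)

# Simulated lasso selection: every point with x > 0
selected = np.flatnonzero(x > 0)

counts = histogram_of_selection(x, selected, edges)
```

Keeping the bin edges fixed means the selection histogram can be drawn on top of the full histogram without the bars shifting.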

Conclusions

There’s definitely waaaaay more to Bokeh than I was able to explore in this blog post. What I like is that it’s easy to get up and running, its high-level interface means that you don’t have to memorize use cases for hundreds of different commands like you do with matplotlib, and the interactivity adds a flashy element to plots that makes them impressive to show off. But it’s apparent that to make the most of Bokeh, one really needs to be running a Bokeh server to add dynamic processing capabilities to the plots. Maybe something for the future.

Code

plots

Imports

In [ ]:
import csv

import numpy as np
import numpy.random

import bokeh.plotting
import bokeh.layouts
import bokeh.embed

Temperature and CO$_{2}$

File io & data loading

Temp

In [ ]:
temp_file_path = './temp'

temp_years = []
temp_temps = []
# Use a context manager so the file is closed after reading
with open(temp_file_path, 'r') as temp_file_handle:
    temp_file_reader = csv.reader(temp_file_handle, delimiter = ' ')
    for row in temp_file_reader:
        # csv yields strings; convert to numbers so the axes are numeric
        temp_years.append(float(row[0]))
        temp_temps.append(float(row[1]))
                            

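CO$_{2}$

In [ ]:
co2_file_path = './co2'

co2_years = []
co2_co2s = []
# Use a context manager so the file is closed after reading
with open(co2_file_path, 'r') as co2_file_handle:
    co2_file_reader = csv.reader(co2_file_handle, delimiter = ' ')
    for row in co2_file_reader:
        # Runs of spaces produce empty fields; drop them
        row = [ele for ele in row if ele != '']
        # Skip rows holding -99.99, which appears to be the missing-value marker
        if row[3] != '-99.99':
            # csv yields strings; convert to numbers so the axes are numeric
            co2_years.append(float(row[2]))
            co2_co2s.append(float(row[3]))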
                            

Create interactive plot

In [ ]:
# Create static html file as output
bokeh.plotting.output_file('temp_co2.html', title = 'NASA historical temperature and CO2 annual mean records')

# Create the two figures; passing temp_figure.x_range to the second
# figure makes both plots share (and therefore link) their x-axes
temp_figure = bokeh.plotting.figure(title = 'temperature', x_axis_label = 'year', y_axis_label = 'avg temp anomaly (deg C)', width = 500, height = 250)

co2_figure = bokeh.plotting.figure(title = 'co2', x_axis_label = 'year', y_axis_label = 'avg conc. co2 (ppm)', width = 500, height = 250, x_range = temp_figure.x_range)

# Plot in the two figures
# (legend_label is the keyword in Bokeh 1.4 and later; earlier releases used legend)
temp_figure.line(temp_years, temp_temps, legend_label = 'temp', line_width = 2)
co2_figure.line(co2_years, co2_co2s, legend_label = 'co2')

# Create the combined figure
figure = bokeh.layouts.column(temp_figure, co2_figure)

# Display results (writes and opens temp_co2.html)
bokeh.plotting.show(figure)

# Save a second copy under a different file name
bokeh.plotting.save(figure, './co2_temp.html')
                            

Scatter

Create data

In [ ]:
# Cluster one
mu_one = [1, 1]
covar_one = np.array([[1, .9], [.9, 1]])
cluster_one = numpy.random.multivariate_normal(mu_one, covar_one, 200)

# Cluster two
mu_two = [-5, -4]
covar_two = np.array([[4, 1], [1, 4]])
cluster_two = numpy.random.multivariate_normal(mu_two, covar_two, 300)

# Cluster three
mu_three = [5, -3]
covar_three = np.array([[2, 1.5], [1.5, 2]])
cluster_three = numpy.random.multivariate_normal(mu_three, covar_three, 100)

# X Histogram
data_x = np.concatenate((cluster_one[:,0], cluster_two[:,0], cluster_three[:,0]), axis = 0)
vals_x, edges_x = np.histogram(data_x, bins = 10)

data_y = np.concatenate((cluster_one[:,1], cluster_two[:,1], cluster_three[:,1]), axis = 0)
vals_y, edges_y = np.histogram(data_y, bins = 10)
                            

Create interactive plot

In [ ]:
bokeh.plotting.output_file('scatter_plot.html', title = 'Bivariate gaussians')

# Shared sizes so the histograms line up with the scatter plot
# (width/height are used as in the temperature/CO2 figures;
# plot_width/plot_height were removed in Bokeh 3.0)
scatter_size = 320
histogram_size = 200

##############
# Scatter plot
##############

scatter_figure = bokeh.plotting.figure(width = scatter_size, height = scatter_size,
                                       tools = "pan,wheel_zoom,box_select,lasso_select,reset", title = 'Bivariate Gaussians')

# Cluster one
scatter_figure.circle(x = cluster_one[:,0], y = cluster_one[:,1], color = 'red', size = 3)

# Cluster two
scatter_figure.circle(x = cluster_two[:,0], y = cluster_two[:,1], color = 'blue', size = 3)

# Cluster three
scatter_figure.circle(x = cluster_three[:,0], y = cluster_three[:,1], color = 'green', size = 3)

###############
# Histogram
###############

# The x histogram shares the scatter plot's x-axis
x_histogram_figure = bokeh.plotting.figure(toolbar_location = None, width = scatter_size, height = histogram_size,
                                           x_range = scatter_figure.x_range, min_border = 10, y_axis_location = 'right')

# The y histogram shares the scatter plot's y-axis; its bars start at 0,
# so the count axis starts at 0 as well
y_histogram_figure = bokeh.plotting.figure(toolbar_location = None, width = histogram_size, height = scatter_size,
                                           y_range = scatter_figure.y_range, min_border = 10, x_axis_location = 'above',
                                           x_range = (0, int(vals_y.max())))

# x- Histogram: vertical bars between consecutive bin edges
x_histogram = x_histogram_figure.quad(top = vals_x, bottom = 0, left = edges_x[:-1], right = edges_x[1:], fill_color = '#c8c8c8', line_color = '#727272')

# y- Histogram: horizontal bars, lower bin edge at the bottom and upper edge at the top
y_histogram = y_histogram_figure.quad(bottom = edges_y[:-1], top = edges_y[1:], left = 0, right = vals_y, fill_color = '#c8c8c8', line_color = '#727272')

# Arrange the plots in a 2x2 grid, with a spacer filling the empty corner
figure = bokeh.layouts.column(bokeh.layouts.row(scatter_figure, y_histogram_figure),
                              bokeh.layouts.row(x_histogram_figure, bokeh.layouts.Spacer(width = histogram_size, height = histogram_size)))

# Display the figure
bokeh.plotting.show(figure)

# Save figure
bokeh.plotting.save(figure, './scatterplot.html')