Building a JSON API for California state code

State legislative websites are a mess. They are generally a pain to navigate, it is difficult to find information on them, and they make it nearly impossible to access the data conveniently outside of the web user interface. While governments, California's included, have made progress with technology in recent years, the state still has a way to go.

A few weeks ago I started a side project with the goal of making state code in the United States easier to access through APIs. So far, Virginia is the only state I've found that offers this on its own. The project was prompted by my frustration with another side project, where I had to scrape the West Virginia legislature site to calculate the percentage rate for bills sponsored by each member of the state legislature.

With HTML being the only format in which most states make their code accessible on the web, scraping is needed to get the data. Some states are easier to scrape than others. North Carolina, for example, has unique URLs for parts of the state code, which makes them easier to grab, unlike California, where a headless browser like Poltergeist with Capybara is needed to jump through the pages.

Fortunately, after some digging, I found that California provides an up-to-date dump of the state code that is available through FTP. After massaging their scripts a bit, I imported the data into a SQL database (here's a MySQL dump on S3).
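To give a rough idea of what an import step like this can look like, here is a minimal sketch. It is not the script I used, and it makes some loud assumptions: that the rows arrive as tab-delimited text with exactly three fields, that a made-up `sections` table with `code`, `section_number`, and `content` columns is the target, and that SQLite stands in for MySQL so the example stays self-contained.

```python
import csv
import io
import sqlite3


def load_rows(conn, handle):
    """Load tab-delimited rows into a hypothetical `sections` table.

    Assumes three fields per row (code, section number, content).
    Rows with any other field count are skipped rather than guessed at.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections"
        " (code TEXT, section_number TEXT, content TEXT)"
    )
    # QUOTE_NONE keeps quotation marks in the legal text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [tuple(row) for row in reader if len(row) == 3]
    conn.executemany("INSERT INTO sections VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Placeholder data, not real state code.
    sample = io.StringIO("CODE_A\t100\tExample text\nCODE_B\t1\tMore text\n")
    conn = sqlite3.connect(":memory:")
    print(load_rows(conn, sample), "rows loaded")
```

Skipping malformed rows instead of padding them is a deliberate choice here: with legal text it seems safer to lose a row visibly than to store one with shifted columns.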

Next up was building an API on top of it. I briefly flirted with Go and Martini, as I've been playing with them more recently, but ended up deciding to use Rails master now that rails-api has been merged into Rails core in preparation for Rails 5.

There are endpoints for sections and codes, along with the ability to filter by code, division, chapter, and article.
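The filtering idea can be sketched roughly as follows. This is not the Rails code from the project. It is a small Python illustration with SQLite, and the table name, column names, and sample rows are all hypothetical. Only the four filter names come from the description above.

```python
import sqlite3

# The four filters mentioned above. Anything else in the request is ignored.
ALLOWED_FILTERS = ("code", "division", "chapter", "article")

COLUMNS = "code, division, chapter, article, section_number"


def build_section_query(params):
    """Turn request parameters into a parameterized SQL query."""
    clauses = []
    values = []
    for name in ALLOWED_FILTERS:
        if params.get(name) is not None:
            # Column names come from the whitelist, values are bound with "?".
            clauses.append(f"{name} = ?")
            values.append(params[name])
    sql = f"SELECT {COLUMNS} FROM sections"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return sql, values


def find_sections(conn, params):
    sql, values = build_section_query(params)
    return conn.execute(sql, values).fetchall()


def demo_connection():
    """In-memory database with placeholder rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sections (id INTEGER PRIMARY KEY, code TEXT,"
        " division TEXT, chapter TEXT, article TEXT, section_number TEXT)"
    )
    conn.executemany(
        f"INSERT INTO sections ({COLUMNS}) VALUES (?, ?, ?, ?, ?)",
        [
            ("CODE_A", "1", "1", "1", "100"),
            ("CODE_A", "1", "2", "1", "200"),
            ("CODE_B", "1", "1", "1", "1"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(find_sections(conn, {"code": "CODE_A", "chapter": "2"}))
```

Keeping the allowed filters in one whitelist means unknown query parameters never reach the SQL string, and the values themselves are always passed as bound parameters.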

More documentation and the source code for the API are available on GitHub at tylerpearson/california-laws-api.

In addition, I indexed the state code in Elasticsearch so that people can easily find the parts of the state code relevant to what they are looking for.
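As an illustration of the kind of request a search like this might send, here is a sketch that builds an Elasticsearch query body as a plain dictionary. The `bool`, `match`, and `term` constructs are standard Elasticsearch query DSL, but the field names `content` and `code` are assumptions for the example, not the actual mapping used by the project, and the `term` filter assumes `code` is indexed as an exact-value field.

```python
import json


def build_search_body(text, code=None, size=10):
    """Build a query body: full-text match, optionally limited to one code.

    Field names ("content", "code") are hypothetical.
    """
    body = {
        "query": {"bool": {"must": [{"match": {"content": text}}]}},
        "size": size,
    }
    if code is not None:
        # A filter narrows the results without affecting relevance scoring.
        body["query"]["bool"]["filter"] = [{"term": {"code": code}}]
    return body


if __name__ == "__main__":
    print(json.dumps(build_search_body("speed limit", code="CODE_A"), indent=2))
```

The resulting dictionary could be serialized to JSON and sent to an index's `_search` endpoint with whatever HTTP client or Elasticsearch library is at hand.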

The site's hosted on Heroku for convenience. In production, Elasticsearch runs through Bonsai, and SSL is handled through Cloudflare and Heroku.

If you have any ideas for improvements, please feel free to open an issue or pull request.