Data, suddenly available
Hack Tyler is an idea born out of pragmatism and self-exorcism, but underlying that are my beliefs about open governments, open data and the power of public service. One of the more persuasive statements of this ethos I’ve heard is “Public Equals Online”, the name of the Sunlight Foundation’s 2010 campaign for government transparency. It’s not enough that governments produce and warehouse data that is legally accessible to the public; that is the equivalent of building a park in the mountains and not telling anyone it exists. In order for data to be truly public it must be like the town square: open, accessible and obvious. The corollary benefit is, of course, that someone can come along and build useful things with it.
So it is with great pleasure that I note that the Smith County Mapsite (which also warehouses GIS data for the City of Tyler) now holds official shapefiles for bus routes and bus stops. This is proper survey data and supersedes the data I described aggregating in my last blog post. This raises a few important points:
- I was wrong. I should have asked for the data first. My desire to get things done probably cost me more time than it would have taken to ask for the data. In addition, I made an ill-founded assumption about what data existed. (The Tyler GIS department clearly has good maps.)
- Public equals online. This data is now public; it wasn’t before. This is a success. Now it’s time to learn from this and ask for better timetable data.
- I wasn’t wasting my time. It has always been my belief that you don’t influence governments by explaining how awesome things could be. You influence them by proving something is useful and then explaining how much more awesome it could be. It’s clear that in some (perhaps indirect) sense Hack Tyler caused these files to become public. I’m putting that in the “win” category.
- As far as hand-crafted shapefiles go, I didn’t do too badly:
Hack Tyler data:
Using the official data, I can also revise another calculation: in fact, 72.7% of all streets in Tyler are within a half-mile of a bus stop. That’s not very far given that, according to a Tyler city planner, all buses in Tyler are equipped with bike racks.
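For readers curious how a figure like this is computed, here is a minimal pure-Python sketch of a buffer test. It assumes stops and street centerlines have already been projected into a planar coordinate system measured in feet, and it counts a street as covered if any part of it falls within half a mile of any stop. The post doesn’t say whether the percentage is by street count or by street length, so counting features is an assumption here, as are all the function names. A GIS tool such as QGIS would do the same job with a buffer, an intersect and a spatial index; this brute-force version just makes the geometry explicit.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
HALF_MILE_FT = 5280 / 2  # 2,640 feet; assumes coordinates are in feet


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def street_touches_buffer(street: List[Point], stops: List[Point], radius: float) -> bool:
    """True if any piece of the street polyline lies within `radius` of any stop.

    This is equivalent to asking whether the street intersects the union of
    circular buffers of `radius` drawn around the stops.
    """
    return any(
        point_segment_distance(stop, a, b) <= radius
        for stop in stops
        for a, b in zip(street, street[1:])
    )


def coverage(streets: List[List[Point]], stops: List[Point], radius: float = HALF_MILE_FT) -> float:
    """Fraction of street features (not street length) that touch a stop buffer."""
    if not streets:
        return 0.0
    hits = sum(street_touches_buffer(s, stops, radius) for s in streets)
    return hits / len(streets)
```

For example, with one street 2,000 feet from a stop and another 3,000 feet away, `coverage` reports 0.5.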
Hacking Tyler Transit
Why bus schedules? In my first post I named them at the top of my list of datasets I would like to build on. I also mentioned that I intended to avoid buying a car once I moved, a statement that provoked significant eye-rolling. I’ve been told that no one rides the bus in Tyler or that only poor people do. A fellow hacker who grew up in Tyler told me he didn’t even know they had a bus system. This isn’t really a surprise: Tyler has low population density (1,982 people per square mile, according to Wolfram Alpha) and a food desert in its urban core. I was stunned to discover that a transit system even existed. So why do I think it’s a good idea to digitize the bus schedule? Five reasons:
- I need it. It’s not just that I don’t want to drive. It’s that I suck at driving. Having access to public transit is an immediately useful thing for me.
- Tyler has several colleges, but none of them even mention the bus system on their websites. If building this app means one student takes the bus instead of driving, then it will be a success.
- It’s easy. (Mostly, more on this below.)
- It’s an excellent pilot project. The data is available (albeit in a terrible format) and the shape of the application I will build is relatively straightforward.
- Financial freedom, green living, world peace, etc.
The first thing I needed in order to build this app was data for routes, schedules and stop locations. The Tyler Transit agency publishes a route map as a PDF, though it only includes a very small number of stops. They publish schedule data for weekdays and Saturdays as PDFs. These PDFs only include estimated arrival times for five stops per route, less than ten percent of the total number of stops. Stop location data isn’t available anywhere online, so I emailed Tyler Transit and asked for a complete list. I requested an Excel document; they sent me a PDF of a scan of a printout of a web application.
I don’t raise these data quality issues as an affront to Tyler Transit. Through my own experiences and those of my many friends in the open government community, I’ve learned that this is the state of public data in much of the US. I want to help change that, but right now I’m not trying to open governments; I’m just trying to build a transit app, so I did what a pragmatic geek has to do sometimes:
I keyed them.
Lacking an obvious way to extract the data I needed from a scanned PDF, I took two hours and re-keyed the spreadsheet. This also gave me the opportunity to correct numerous typos in street names that would have foiled any geocoder. The results:
Using the route map, the street centerline GIS data available from Smith County, QGIS and a lot of patience, I was able to construct what is possibly the only digital map of Tyler’s bus routes. I then geocoded the bus stop list above and overlaid the stops on the routes, yielding:
Fun fact: A simple buffer computation on the stops will tell you that over 70% of all streets in the city of Tyler are within a half mile of a bus stop. (That’s less than the distance I walk to and from the L every day.)
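The route map was built by hand in QGIS. For anyone who would rather script part of that work, one possible approach (not the one described here) is to select the centerline segments a route runs along and chain them into a single line by matching endpoints. The sketch below is hypothetical: it assumes each segment is a list of (x, y) tuples, that adjacent segments share exactly identical endpoint coordinates, and that the route is a single path without loops or branches. Real centerline data may need a snapping tolerance, which this ignores.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = List[Point]


def chain_segments(segments: List[Segment]) -> List[Point]:
    """Join unordered, possibly reversed segments into one polyline by matching endpoints."""
    remaining = [list(s) for s in segments]
    route = remaining.pop(0)
    while remaining:
        for i, seg in enumerate(remaining):
            if seg[0] == route[-1]:        # seg continues the end of the route
                route.extend(seg[1:])
            elif seg[-1] == route[-1]:     # seg continues the end, but is reversed
                route.extend(reversed(seg[:-1]))
            elif seg[-1] == route[0]:      # seg leads into the start of the route
                route = seg[:-1] + route
            elif seg[0] == route[0]:       # seg leads into the start, but is reversed
                route = list(reversed(seg[1:])) + route
            else:
                continue                   # not adjacent yet; try the next segment
            remaining.pop(i)
            break
        else:
            # No remaining segment touches either end of the route.
            raise ValueError("segments do not form a single connected path")
    return route
```

Because it grows the line from both ends, the segments can be supplied in any order and either direction.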
This is good progress; however, it’s far from perfect. The geocodes for the bus stops are not their actual locations, but rather those of the next intersection following each stop. Worse, many of them didn’t geocode at all, forcing me into an arduous process of trying to locate them manually using Google Maps and Google Street View. Even then, I couldn’t determine even an approximate location for some of the stops.
I have long-term plans for dealing with this and the other data quality issues. Better stop locations can be crowd-sourced by users. The arrival times present a more audacious challenge, because I have to compute estimated times for all the stops that don’t have times in the official timetable. Fortunately, the street centerline data provides me with both distance and speed limit, so I should be able to make sound estimates and fine-tune those with user feedback.
Though much of it was painfully manual, most of the required data preparation is done at this point and I can now move on to prototyping the application. Interested coders can follow my progress at the hacktyler-transit repository on Github. Everyone else: speak your mind.
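Typos in street names are a common reason addresses fail to geocode, and the scanned stop list was full of them. A small normalization pass before geocoding can catch repeat offenders. This is a hypothetical sketch: the correction table entries, the suffix list and the intersection query format are illustrative assumptions, not the actual corrections made to the Tyler data or the input format of any particular geocoder.

```python
import re

# Hypothetical typo corrections: misspelling -> spelling the geocoder expects.
# Illustrative only; these are not the corrections made to the real stop list.
CORRECTIONS = {
    "BRODWAY": "BROADWAY",
    "GENTY": "GENTRY",
}

# Expand common suffix abbreviations so every address uses one convention.
SUFFIXES = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "BLVD": "BOULEVARD",
    "DR": "DRIVE",
    "RD": "ROAD",
    "PKWY": "PARKWAY",
}


def normalize_street(raw: str) -> str:
    """Uppercase, strip punctuation, fix known typos and expand a trailing suffix."""
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    fixed = [CORRECTIONS.get(t, t) for t in tokens]
    if fixed and fixed[-1] in SUFFIXES:
        fixed[-1] = SUFFIXES[fixed[-1]]
    return " ".join(fixed)


def build_query(street: str, cross_street: str, city: str = "Tyler, TX") -> str:
    """Format an intersection query string (the format itself is an assumption)."""
    return f"{normalize_street(street)} & {normalize_street(cross_street)}, {city}"
```

Keeping the corrections in a table means each typo only has to be fixed once, no matter how many stops repeat it.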
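One way to turn distances and speed limits into estimates is to compute a free-flow travel time for each stretch between stops, then scale those times so they add up to the published gap between two timed stops. The sketch below assumes segment lengths in feet, speed limits in miles per hour, times in minutes (for example, minutes after midnight), and that the first and last stops on a route have published times. The function name and inputs are hypothetical.

```python
from typing import List, Optional


def estimate_times(
    times: List[Optional[float]],
    segment_feet: List[float],
    speed_mph: List[float],
) -> List[float]:
    """Fill in missing arrival times (in minutes) between published timepoints.

    times[i] is the published time at stop i, or None if the timetable omits it.
    segment_feet[i] and speed_mph[i] describe the stretch from stop i to stop i + 1.
    """
    if times[0] is None or times[-1] is None:
        raise ValueError("first and last stops need published times")
    if not (len(segment_feet) == len(speed_mph) == len(times) - 1):
        raise ValueError("need one segment length and speed per pair of stops")

    # Free-flow minutes per stretch: feet -> miles, divide by mph for hours, -> minutes.
    minutes = [ft / 5280 / mph * 60 for ft, mph in zip(segment_feet, speed_mph)]

    result = list(times)
    known = [i for i, t in enumerate(times) if t is not None]
    for start, end in zip(known, known[1:]):
        free_flow = sum(minutes[start:end])
        scheduled = times[end] - times[start]
        # Stretch or shrink the free-flow times so they match the published gap.
        scale = scheduled / free_flow if free_flow > 0 else 0.0
        elapsed = times[start]
        for i in range(start + 1, end):
            elapsed += minutes[i - 1] * scale
            result[i] = elapsed
    return result  # every entry is now filled in
```

With timepoints at 0 and 10 minutes and stretches of one, one and two miles at 30 mph (2, 2 and 4 free-flow minutes), the 8 free-flow minutes are stretched by a factor of 1.25, placing the intermediate stops at 2.5 and 5.0 minutes. User feedback could later replace the speed-limit assumption with observed speeds.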
Everything begins with data
A week’s gone by since I sparked an unexpected ruckus with my inaugural Hack Tyler post. I had no idea it was going to find resonance with so many people. I’ve received comments from coders, journalists, and government wonks of all stripes. Even more exciting, I’ve heard from a diverse cast of current and former citizens of Tyler—some wild about my ideas and some… less so. I even heard from a local high school student who wants to become a coder, but isn’t sure where to start.
I’ve tried to respond to everyone as best I can; however, I’ve made a conscious decision not to try to correct every misconception about what I’ve written. If folks are concerned I might be about to embark on some carpetbagging idealistic crusade against the local government, I’m happy to try to sort those concerns out individually. I’m not about to turn this blog into a policy debate. I want to spend my time actually doing things.
To that end, I’ve spent the last week focusing on the data made available by the City of Tyler and its parent, Smith County. I’ve created a list of all data sources I’ve been able to identify. It’s been heartening to see how much data actually is available (albeit often in less than ideal formats). A few items are of particular note:
- Tyler and Smith County have a joint GIS repository that is quite extensive and (ostensibly) updated at regular intervals.
- Tyler has a real-time list of where its police officers are responding to calls. I’ve never seen this in any other city. (It seems to have been a student capstone project.)
- Smith County has put most, if not all, of its financial documentation online.
Since starting this project I’ve learned a little bit about Texas’ Public Information Law, which seems robust. Either because of the law or because of Texas culture, there is a greater amount of “public by default” data than I’m used to. In an On The Media interview, Texas Tribune reporter Matt Stiles attributed this transparency to the state’s conservative culture. I’m hoping I can take proper advantage of this openness to get even more data, such as some of the datasets from Max Ogden’s authoritative, crowd-sourced civic datasets list.
In addition to creating my list of links I’ve also started building out infrastructure. So far I’ve:
- Purchased a domain, hacktyler.com, to host apps at and migrated this blog over to it.
- Set up a hacktyler organization on Github, leaving open the possibility that someone might join me in my endeavors.
- Provisioned an EC2 instance to host all my projects. (Codename Hasufel.)
- Started to hack together a Boundary Service instance (source) for Tyler and Smith County.
There isn’t much data in the Boundary Service yet, but once populated it will allow me (and any other developers) to build apps using regional GIS data, without having to muck about with shapefiles and databases. I had a great deal of fun building the Boundary Service for Chicago so I’m excited to be able to repurpose it for Tyler. I believe strongly in building APIs like this one so that others can build on the things I make and I hope to create more of them as I go along.
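At its core, a boundary lookup answers one question: which boundaries contain this point? The sketch below answers that question with a plain ray-casting point-in-polygon test. It only illustrates the idea and is not the Boundary Service’s code, which would typically delegate this to a spatial database. The district names and square polygons are made up, and polygons are assumed to be simple lists of (x, y) vertices without holes.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def contains(polygon: List[Point], point: Point) -> bool:
    """Ray casting: the point is inside if a ray to its right crosses an odd number of edges."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]  # wrap around to close the polygon
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def boundaries_containing(boundaries: Dict[str, List[Point]], point: Point) -> List[str]:
    """Names of every boundary whose polygon contains the point."""
    return [name for name, polygon in boundaries.items() if contains(polygon, point)]
```

Returning every match rather than the first one mirrors the idea that a single location can sit inside several overlapping boundary sets at once (a city, a county, a school district).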
Preparing the data is a crucial step toward building any applications, but I hope to get started building more generally useful products soon. I’ve got a number of ideas queued up and I’m saving the first of them for my next blog post, by which time I hope to have gotten a response from the City of Tyler Transit Department on my first data request.