Research and [Barriers to] Development

I’m somewhat reluctant to admit that the pace of Hack Tyler development has slowed and will likely remain that way for a month or two. I spent the last week packing and cleaning. My wife and son have moved and transported the majority of my belongings with them. My things are now waiting for me in a storage unit in Tyler. As a consequence, I have only my netbook to hack on and no desk space to do even that.

I’m relatively used to limited accommodations, so I’m not particularly uncomfortable. However, it does take the edge off my capacity and encourages me to reach for other activities I haven’t found enough time for over the last year. I’ve also contracted some additional work to keep myself busy in the interim. In order not to completely lose momentum on this project, I’ve shifted my focus to research and communication tasks.

I’ve been in touch with Tyler Transit regarding Tyler On Time and learned a great many interesting things about their systems. Most notably, the current transit system is in the process of being completely overhauled and the existing bus routes will cease to exist sometime in August. The Transportation Operations Coordinator for the department has offered to provide me with updated shapefiles and timetable data in advance of the switchover, which will allow me to preemptively refactor Tyler On Time for the new routes. This opens up the possibility of Tyler On Time “launching” with the new routes, which seems eminently useful.

Unfortunately, this new data will not include timetables for all stops, but will continue to be “waypointed” as the current data is. This makes it very difficult to offer accurate intermediate stop times. I’ve yet to decide how to handle this, but I’m leaning toward a presentation solution rather than an algorithmic solution. Something like:

The nearest stop with scheduled departure times is 4 stops away, the next bus is scheduled to arrive at that stop in 5 minutes. The previous bus departed that stop 14 minutes ago.

Predicting stop times is likely not possible as Tyler is reputed to have significant traffic congestion problems, which would render estimates based on speed and distance inaccurate. I’m open to suggestions about how else I might handle this.
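As a rough illustration of the presentation approach, here is a minimal Python sketch that builds a message like the one above. The data layout (an ordered list of stops, each with a possibly empty list of scheduled departures) and the function names are assumptions made for this example, not the app's actual structure.

```python
from bisect import bisect_right
from datetime import datetime


def nearest_timed_stop(route, index):
    """Return (stops_away, stop) for the closest stop that has scheduled
    times, measured in number of stops along the route, or None."""
    candidates = [
        (abs(i - index), i)
        for i, stop in enumerate(route)
        if stop["departures"]
    ]
    if not candidates:
        return None
    stops_away, i = min(candidates)
    return stops_away, route[i]


def whole_minutes(earlier, later):
    """Whole minutes elapsed between two datetimes."""
    return int((later - earlier).total_seconds() // 60)


def describe(route, index, now):
    """Build the rider-facing message for the stop at `index`."""
    found = nearest_timed_stop(route, index)
    if found is None:
        return "No scheduled departure times are available for this route."
    stops_away, stop = found
    times = sorted(stop["departures"])
    pos = bisect_right(times, now)  # times[:pos] have already passed
    parts = [
        "The nearest stop with scheduled departure times is "
        f"{stops_away} stops away."
    ]
    if pos < len(times):
        parts.append(
            "The next bus is scheduled to arrive at that stop in "
            f"{whole_minutes(now, times[pos])} minutes."
        )
    if pos > 0:
        parts.append(
            "The previous bus departed that stop "
            f"{whole_minutes(times[pos - 1], now)} minutes ago."
        )
    return " ".join(parts)


# Hypothetical route: only the first stop is a timed waypoint.
example_route = [
    {"name": "Waypoint", "departures": [
        datetime(2011, 7, 1, 9, 46), datetime(2011, 7, 1, 10, 5)]},
    {"name": "Stop A", "departures": []},
    {"name": "Stop B", "departures": []},
    {"name": "Stop C", "departures": []},
    {"name": "Stop D", "departures": []},
]
print(describe(example_route, 4, datetime(2011, 7, 1, 10, 0)))
```

The sketch makes no attempt to predict anything. It only reports what the schedule says about the nearest timed stop and leaves the judgment to the rider.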

Learning about the details of Tyler’s changing transit system has also led me to a number of interesting documents related to Tyler’s municipal planning:

These documents present more information than I can possibly digest during the time I have left in Chicago, but I expect studying them to provide me with essential context for my own ideas. Additionally, the “Summary File 1” batch of census data for Texas will be released sometime in the next two months, providing further insight into the place and its people. I’ll be especially excited to write about this data, given how much time I’ve spent working with census data lately.

All in all, I expect I will write much less code this month than I did in June, but I will continue to inform myself and prepare for the things that come next. Best of all, I’ve now got my son’s comprehensive nightly reports:

It’s hot. We’re going to the pool again.

Delivering the beta

Under ordinary circumstances I would have released a first beta of this app weeks ago. I was dissuaded both by the shifting landscape of data as well as by my concern that someone in Tyler might actually try to use it to catch the bus and fail due to its incompleteness. I’m confident now that it sufficiently advertises its failures (lack of Saturday schedules, for example) to prevent this. Thus I present for commentary the first original Hack Tyler app:

Tyler On Time »

The application delivers the following features that I determined to be absolutely necessary in a transit app:

  • Tell me when the bus is coming.
  • Show me where the bus is going to be (maps).
  • Allow me to save my favorite stops.
  • Function acceptably on desktop, tablet, and mobile devices.
  • Be usable (via PhoneGap Build) as a native Android/iPhone app*.
  • Do not require an internet connection.

Of the items on this list, I’m perhaps most excited about having static maps for every stop. I owe their existence almost entirely to the fine folks at Development Seed who created TileMill.

Here is the map for the 1900 N Broadway Ave stop on the Red Line North:

1900 N Broadway

With these maps, I can provide a visual aid to navigation without compromising the app’s ability to run offline. The code for generating the maps can be found in the maps directory of the repository.

There are a number of worthwhile features that have not yet been developed, including a “Stops Near Me” geolocation feature, a crowd-sourcing mechanism for stop landmarks and a dynamic route/stop map for desktop and mobile users with internet access. You can see the complete list of issues and ideas on the project’s Github Issues page.

The most significant problem with the application is the relatively poor accuracy of the departure times. The coarse schedule information available from official sources requires that I estimate times for the vast majority of the stops. Although the estimations are likely good enough to be useful, the algorithm is crude. Consequently, my next step will be to ask Tyler Transit for more detailed timetable data. As I mentioned in my last blog post, it’s my belief that governments are much more likely to produce information if the utility of it is self-evident. Hopefully the existence of Tyler On Time justifies whatever investment would be required for them to release this data.

Though the basic functionality validates my time investment so far, this project also has a couple of significant stretch goals. First, I would like to build an SMS version of the app for users without smartphones. My friends at the awesome cloud-telephony service Tropo have expressed an interest in partnering on this project, which shouldn’t be particularly challenging to implement once better timetables are nailed down.

Second, I would like to convert the bus data into GTFS format and have Google Maps pick up the results. I suspect this would require an official endorsement from Tyler Transit, but the value of doing so would be very high. It would allow Tylerites and visitors to get directions that include public transit as a navigation option. It would also allow Tyler On Time to provide “walk, ride, walk” directions to users of the application, like this.
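To give a sense of what a GTFS conversion involves, here is a small Python sketch that writes the `stops.txt` file of a feed. The field names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` come from the GTFS specification; the input layout, the stop ID and the coordinates below are placeholders invented for the example, not real Tyler Transit data. A complete feed would also need other files, such as `routes.txt`, `trips.txt` and `stop_times.txt`.

```python
import csv
import io

GTFS_STOP_FIELDS = ["stop_id", "stop_name", "stop_lat", "stop_lon"]


def write_gtfs_stops(stops, out):
    """Write an iterable of stop dicts to `out` in GTFS stops.txt form.

    Each stop is assumed to have 'id', 'name', 'lat' and 'lon' keys
    (a layout made up for this sketch).
    """
    writer = csv.DictWriter(out, fieldnames=GTFS_STOP_FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    for stop in stops:
        writer.writerow({
            "stop_id": stop["id"],
            "stop_name": stop["name"],
            "stop_lat": f"{stop['lat']:.6f}",
            "stop_lon": f"{stop['lon']:.6f}",
        })


# Placeholder coordinates, not the surveyed location of this stop.
sample_stops = [
    {"id": "S1", "name": "1900 N Broadway Ave", "lat": 32.37, "lon": -95.3},
]

buffer = io.StringIO()
write_gtfs_stops(sample_stops, buffer)
print(buffer.getvalue())
```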

Finally, some notes about the technology being used in the app. The stack was heavily inspired by a very successful sprint the Tribapps team executed for the Chicago Breaking News Live application. Similar to that app, Tyler On Time’s logic is entirely client-side, backed by a small amount of Backbone.js (for url routing) and a tremendous amount of Underscore.js (for everything else). The static files themselves are hosted on Amazon S3. Basic styles and responsive switchy design come from the Skeleton framework. The markup is semantic HTML5. The data processing was scripted primarily with Python, GDAL and csvkit. Stop maps were produced using TileMill with a modified version of Development Seed's custom base layer for Washington D.C. and data from the Smith County Map Site and OpenStreetMap. The whole thing was developed on Ubuntu Linux. Everything is open source.

I expect to keep iterating this application for at least a month, so please leave your suggestions (especially those of you from Tyler). Hopefully by my next post I will have detailed timetable data and be ready to move forward with additional methods of delivering that information to users.

*The application has not yet been deployed to either the Android Market or the App Store, but those comfortable installing unsigned Android packages can download a beta here.

Data, suddenly available

Hack Tyler is an idea born out of pragmatism and self-exorcism, but underlying that are my beliefs about open governments, open data and the power of public service. One of the more persuasive statements of this ethos I’ve heard is “Public Equals Online”, the name of the Sunlight Foundation’s 2010 campaign for government transparency. It’s not enough that governments produce and warehouse data that is legally accessible to the public. This is the equivalent of building a park in the mountains and not telling anyone it exists. In order for data to be truly public it must be like the town square: open, accessible and obvious. The corollary benefit is, of course, that someone can come along and build useful things with it.

So it is with great pleasure that I note that the Smith County Mapsite (that also warehouses GIS data for the City of Tyler) now holds official shapefiles for bus routes and bus stops. This is proper survey data and supersedes the information I described aggregating in my last blog post. This raises a few important points:

  • I was wrong. I should have asked for the data first. My desire to get things done probably cost me more time than it would have taken to ask for the data. In addition, I made an ill-founded assumption about what data existed. (The Tyler GIS department clearly has good maps.)
  • Public equals online. This data is now public; it wasn’t before. This is a success. Now it’s time to learn from this and ask for better timetable data.
  • I wasn’t wasting my time. It has always been my belief that you don’t influence governments by explaining how awesome things could be. You influence them by proving something is useful and then explaining how much more awesome it could be. It’s clear that in some (perhaps indirect) sense Hack Tyler caused these files to become public. I’m putting that in the “win” category.
  • As far as hand-crafted shapefiles go, I didn’t do too bad:

Hack Tyler data:

Official data:

Using the official data, I can also revise another calculation: in fact, 72.7% of all streets in Tyler are within a half-mile of a bus stop. That’s not very far given that, according to a Tyler city planner, all buses in Tyler are equipped with bike racks.

Hacking Tyler Transit

Why bus schedules? In my first post I named them at the top of my list of datasets I would like to build on. I also mentioned that I intended to avoid buying a car once I moved, a statement that provoked significant eye-rolling. I’ve been told that no one rides the bus in Tyler or that only poor people do. A fellow hacker who grew up in Tyler told me he didn’t even know they had a bus system. This isn’t really a surprise: Tyler has low population density (1,982 people per square mile, according to Wolfram Alpha) and a food desert in its urban core. I was stunned to discover that a transit system even existed. So why do I think it’s a good idea to digitize the bus schedule? Five reasons:

  1. I need it. It’s not just that I don’t want to drive. It’s that I suck at driving. Having access to public transit is an immediately useful thing for me.
  2. Tyler has several colleges, but none of them even mention the bus system on their websites. If building this app means one student takes the bus instead of driving then it will be a success.
  3. It’s easy. (Mostly, more on this below.)
  4. It’s an excellent pilot project. The data is available (albeit in a terrible format) and the shape of the application I will build is relatively straightforward.
  5. Financial freedom, green living, world peace, etc.

The first thing I needed in order to build this app was data for routes, schedules and stop locations. The Tyler Transit agency publishes a route map as a PDF, though it only includes a very small number of stops. They publish schedule data for weekdays and Saturdays as PDFs. These PDFs only include estimated arrival times for five stops per route, fewer than ten percent of the total number of stops. Stop location data isn’t available anywhere online, so I emailed Tyler Transit and asked for a complete list. I requested an Excel document; they sent me a PDF of a scan of a printout of a web application.

I don’t raise these data quality issues as an affront to Tyler Transit. Through my own experiences and those of my many friends in the open government community I’ve learned that this is the state of public data in much of the US. I want to help change that, but right now I’m not trying to open governments, I’m just trying to build a transit app, so I did what a pragmatic geek has to do sometimes:

I keyed them.

Lacking an obvious way to extract the data I needed from a scanned PDF I took two hours and re-keyed the spreadsheet. This also gave me the opportunity to correct numerous typos in street names that would have foiled any geocoder. The results:

Short version: Tyler has 236 bus stops serving virtually all significant public and private institutions, including both large colleges: UT Tyler and Tyler Junior College.

Using the route map, the street centerline GIS data available from Smith County, QGIS and a lot of patience I was able to construct what is possibly the only digital map of Tyler’s bus routes. I then geocoded the above bus stops list and put those over the top, yielding:

The black outline is the Tyler city limits, the thin gray lines are streets, and the thick colored lines are the bus routes. The bus icons are individual stops.

Fun fact: A simple buffer computation on the stops will tell you that over 70% of all streets in the city of Tyler are within a half mile of a bus stop. (That’s less than the distance I walk to and from the L every day.)
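For the curious, the idea behind that buffer computation can be sketched in plain Python. This is only an illustration, not the QGIS workflow itself: it assumes streets are polylines and stops are points in a projected coordinate system measured in feet, and it counts a street as covered if any part of it falls within a half mile of a stop. Whether the figure above was computed by street count or by street length isn't stated here, so treat that choice as an assumption too.

```python
import math

HALF_MILE_FEET = 2640.0


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def street_is_covered(street, stops, radius):
    """True if any segment of the street passes within `radius` of a stop."""
    return any(
        point_segment_distance(stop, a, b) <= radius
        for a, b in zip(street, street[1:])
        for stop in stops
    )


def percent_covered(streets, stops, radius=HALF_MILE_FEET):
    """Percentage of streets that intersect some stop's buffer."""
    if not streets:
        return 0.0
    covered = sum(street_is_covered(s, stops, radius) for s in streets)
    return 100.0 * covered / len(streets)


# Made-up geometry: one stop and four short streets.
toy_stops = [(0.0, 0.0)]
toy_streets = [
    [(0.0, 1000.0), (5000.0, 1000.0)],   # passes 1000 ft from the stop
    [(0.0, 3000.0), (5000.0, 3000.0)],   # 3000 ft away at its closest
    [(3000.0, -100.0), (4000.0, -100.0)],
    [(-1000.0, 0.0), (1000.0, 0.0)],     # runs through the stop
]
print(percent_covered(toy_streets, toy_stops))
```

Checking every stop against every street segment is slow for a whole city. A GIS tool or a spatial index would do the same job far faster.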

This is good progress, but it’s far from perfect. The geocodes for the bus stops are not their actual location, but rather that of the next intersection following the stop. Worse, many of them didn’t geocode at all, forcing me into an arduous process of trying to manually locate them using Google Maps and Google Street View. Even then I wasn’t able to determine even an approximate location for some of the stops.

I have long-term plans for dealing with this and the other data quality issues. Better stop locations can be crowd-sourced by users. The arrival times present a more audacious challenge as I have to compute estimated times for all the stops which don’t have times in the official timetable. Fortunately, the street centerline data provides me with both distance and speed limit, so I should be able to make sound estimates and fine-tune those with user feedback.
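One way such estimates might be computed is to interpolate between two stops that do have official times, splitting the scheduled interval in proportion to the travel time each leg would take at its speed limit. The Python sketch below is a hypothetical illustration of that idea; the function name and the `(distance_miles, speed_limit_mph)` leg layout are assumptions, and it is not the algorithm the app uses.

```python
from datetime import datetime
from itertools import accumulate


def estimate_stop_times(start_time, end_time, legs):
    """Estimate arrival times between two officially timed stops.

    `legs` lists (distance_miles, speed_limit_mph) for each hop between
    consecutive stops, from the first timed stop to the next timed stop.
    Returns one estimated time per hop; the last equals `end_time`.
    """
    if not legs:
        return []
    # Travel time of each leg if driven at the speed limit (in hours).
    travel = [distance / speed for distance, speed in legs]
    cumulative = list(accumulate(travel))
    total = cumulative[-1]
    if total == 0:
        return [end_time for _ in legs]
    scheduled = end_time - start_time
    # Scale so the legs together fill exactly the scheduled interval.
    return [start_time + scheduled * (c / total) for c in cumulative]


# Made-up example: ten scheduled minutes spread over three legs.
start = datetime(2011, 6, 1, 10, 0)
end = datetime(2011, 6, 1, 10, 10)
estimates = estimate_stop_times(
    start, end, [(0.5, 30), (0.25, 30), (0.25, 30)])
for t in estimates:
    print(t.strftime("%H:%M:%S"))
```

Because the estimates are anchored to the official times at both ends, a slow or fast run shifts the intermediate stops proportionally instead of drifting further off with each stop.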

Though much of it was painfully manual, most of the required data preparation is done at this point and I can move on to prototyping the application. Interested coders can follow my progress at the hacktyler-transit repository on GitHub. Everyone else: speak your mind.