Urbana Crime
On Saturday night, I was browsing the City of Urbana website to look up information regarding the Farmers' Market that is held each Saturday morning in the summer. While I was there, I noticed that the city police department posted their crime reports on the website in PDF format. I had recently visited ChicagoCrime.org, and thought that it would be cool if something like that existed for Urbana. I went to bed, and thought it over a bit before I went to sleep. On Sunday, Holly had to work (silly patients), so I spent the day working on this project.
The application was basically split into two components:
- A service that periodically collects and catalogs the daily crime reports
- An interface to analyze and display the crime data
Crime Collector Service
The first challenge of making this service was to create something that could access the Urbana website, get a listing of the posted reports, and retrieve the ones that haven't yet been processed. This seems fairly straightforward, but the Urbana website is "best viewed with IE" and doesn't have the best markup.
As a first example of the less-than-helpful site, they discourage direct linking to the page that lists the files. It turns out that the page is actually a script that simply spits out a pretty directory listing given a path in the query string. Really, this isn't so bad. I understand them not wanting to give a direct directory listing. However, the script that generates the HTML doesn't know of the existence of a closing tag! That's right. Even though the directory listing had 3388 "<td>" tags, there wasn't a single closing tag among them. No wonder Firefox took up all of my CPU for 15 seconds while trying to render the page. Oh, and while the page does in fact include a CSS stylesheet, there are a lovely 3386 font tags included in the bloat as well. Ugh.
After writing some Regex patterns to banish the idiocy, I now had a nice file listing from which to choose files. It was now up to me to write some code to actually retrieve the files and parse them.
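To give a feel for that step, here is a minimal sketch in Python (not the .NET code the service actually used). The sample markup and file names are made up to mimic the unclosed "<td>" and font tags described above, and list_pdf_links is a hypothetical helper name. The idea is simply to ignore the broken table markup and pull out only the links that point at PDFs.

```python
import re

# Hypothetical fragment shaped like the listing described above:
# <td> and <font> tags are opened but never closed.
SAMPLE_LISTING = (
    '<table><tr><td><font size="2"><a href="reports/report-001.pdf">report-001.pdf</a>'
    '<td><font size="2">12 KB'
    '<tr><td><font size="2"><a href="reports/report-002.pdf">report-002.pdf</a>'
    '<td><font size="2">14 KB'
    '<tr><td><font size="2"><a href="style.css">style.css</a>'
)

# Capture only the href of anchors that point at PDF files; everything
# else in the page is ignored, so unclosed tags do not matter.
PDF_LINK = re.compile(r'<a\s+[^>]*href="([^"]+\.pdf)"', re.IGNORECASE)


def list_pdf_links(html):
    """Return the PDF hrefs in document order, without duplicates."""
    seen = set()
    links = []
    for href in PDF_LINK.findall(html):
        if href not in seen:
            seen.add(href)
            links.append(href)
    return links


if __name__ == "__main__":
    for link in list_pdf_links(SAMPLE_LISTING):
        print(link)
```

Matching the links directly sidesteps the need for a parser that tolerates thousands of unclosed tags.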
Retrieving the files had a small caveat though. Their web server is somehow misconfigured and sends back headers with spaces in the header names. This causes the .NET Framework to freak out, saying that the server "committed an http protocol violation". Fortunately, a simple configuration change tells the .NET Framework to use unsafe header parsing:
<system.net>
<settings>
<httpWebRequest useUnsafeHeaderParsing="true" />
</settings>
</system.net>
Anyhoo, now I could get PDFs. I tried looking into some open source PDF-to-text converters, but they kept barfing on the PDFs, saying that they were invalid. It figures. Adobe has a free online PDF-to-HTML and PDF-to-text conversion service, but relying on it seemed like a kludge: it would be another point of failure whenever Adobe decides to change their web forms. Instead, since the PDFs were basically text files (really, they are; there's no special formatting to them at all), I opened them in a text editor and realized that I could parse them just as easily.
So, now that I could open and read the PDFs with code, I whipped up some more regular expressions to parse out individual crime reports. I didn't go nuts, but I did manage to get out the report ID, the date, the location, the officer, and the description of the offense. This data gets stored in SQL Server for use later.
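As a rough illustration of that kind of parsing, here is a Python sketch. Everything about the record layout below is an assumption: the real Urbana reports are not reproduced in this post, so the labels, IDs, dates, and officer names in SAMPLE_TEXT are invented, and parse_reports is a hypothetical helper. Only the five extracted fields come from the description above.

```python
import re

# Invented sample text; the real report layout is not shown in the post.
SAMPLE_TEXT = """\
REPORT: X00-00001
DATE: 01/02/2000
LOCATION: 1600 BLOCK OF LINCOLN AVE N
OFFICER: DOE, J
OFFENSE: THEFT UNDER $300

REPORT: X00-00002
DATE: 01/03/2000
LOCATION: 400 BLOCK OF MAIN ST W
OFFICER: ROE, R
OFFENSE: BATTERY
"""

# One pattern per record, with a named group for each field of interest.
RECORD = re.compile(
    r"REPORT:\s*(?P<report_id>\S+)\s*\n"
    r"DATE:\s*(?P<date>\d{2}/\d{2}/\d{4})\s*\n"
    r"LOCATION:\s*(?P<location>[^\n]+?)\s*\n"
    r"OFFICER:\s*(?P<officer>[^\n]+?)\s*\n"
    r"OFFENSE:\s*(?P<description>[^\n]+?)\s*(?:\n|$)"
)


def parse_reports(text):
    """Return one dict per record found in the text, in order."""
    return [match.groupdict() for match in RECORD.finditer(text)]


if __name__ == "__main__":
    for report in parse_reports(SAMPLE_TEXT):
        print(report["report_id"], report["location"], report["description"])
```

Each dict maps directly onto one database row, which is why named groups are convenient here.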
Then I just linked up all of that code to a Windows service that polls the website every six hours, and installed it on my server.
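The polling logic boils down to something like the following Python sketch. This is not the Windows service itself: poll_once and run_forever are hypothetical names, the listing and processing steps are passed in as plain functions, and the set of already-processed reports is kept in memory here, whereas the real service would presumably check its database.

```python
import time

POLL_INTERVAL_SECONDS = 6 * 60 * 60  # six hours, as described above


def poll_once(list_reports, process_report, processed):
    """Process every listed report not seen before; return the new names."""
    new_names = [name for name in list_reports() if name not in processed]
    for name in new_names:
        process_report(name)
        processed.add(name)  # mark as done only after processing succeeded
    return new_names


def run_forever(list_reports, process_report, processed):
    """Poll, sleep for the interval, and repeat indefinitely."""
    while True:
        poll_once(list_reports, process_report, processed)
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    done = {"a.pdf"}
    print(poll_once(lambda: ["a.pdf", "b.pdf"], print, done))
```

Marking a report as processed only after process_report returns means a report that fails partway through gets retried on the next poll.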
Web Interface
I had basically finished the service by lunch time, so I went to Holly's work in Danville and met her for lunch. When I got back, I felt like creating the data retrieval interface.
At first, I just wanted to have a daily report of what crimes were committed. So, I fired up SQL Reporting Services, and I had what I was looking for in about 20 minutes.
Thinking back to ChicagoCrime.org, I decided that I should look into all of the Google Maps hacks that are floating around on the web. I came across this page, and knew I had what I was looking for. I had three tasks to do:
- Generate an XML file containing the location data
- Modify the XSL to transform the data into the format I wanted
- Integrate the sample page into an ASP.NET page
I actually started in reverse order. Using the provided XML files, I worked on embedding the right code into my ASP.NET page. I had it working properly in about 20 minutes. After that, I started modifying the XSL to add a few more elements to the "speech bubble" that shows up over the map icon when clicked. This again took about 15 minutes to get right.
The hairy part was actually generating the proper XML. The locations listed in the crime reports are in a different form than what Google wants (e.g. "1600 BLOCK OF LINCOLN AVE N" rather than "1600 N Lincoln Ave"). I applied some deft text manipulation to get the proper addresses. Then I saw that the XML file actually wanted latitude and longitude coordinates for each point, rather than an address. I found (by surfing through other web resources) that if one queries Google Maps with the address and adds "&output=js" to the existing query, Google returns XML data with the lat and long coordinates of the address. So, I modified the Windows service to store the lat and long along with each location so that I don't have to query Google for each location every time I want to view a map of the data.
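Here is a Python sketch of both ideas: rewriting the block-style address and caching coordinates so each address is looked up only once. The function names are hypothetical, the pattern only handles the one address shape quoted above, and the geocoder is passed in as a plain function because the actual lookup call is not shown in this post. A dict stands in for the database table.

```python
import re

# Matches strings shaped like "1600 BLOCK OF LINCOLN AVE N"; the trailing
# one-letter direction is optional.
BLOCK_ADDRESS = re.compile(
    r"^(?P<number>\d+)\s+BLOCK\s+OF\s+(?P<street>.+?)(?:\s+(?P<direction>[NSEW]))?$"
)


def normalize_address(raw):
    """Turn '1600 BLOCK OF LINCOLN AVE N' into '1600 N Lincoln Ave'."""
    cleaned = raw.strip()
    match = BLOCK_ADDRESS.match(cleaned.upper())
    if match is None:
        return cleaned  # unknown shape: pass it through unchanged
    parts = [match.group("number")]
    if match.group("direction"):
        parts.append(match.group("direction"))
    parts.append(match.group("street").title())
    return " ".join(parts)


def geocode_cached(address, cache, geocode):
    """Look up coordinates once per address; reuse the stored value after."""
    if address not in cache:
        cache[address] = geocode(address)  # geocode is supplied by the caller
    return cache[address]


if __name__ == "__main__":
    print(normalize_address("1600 BLOCK OF LINCOLN AVE N"))
```

Addresses that don't fit the pattern are returned untouched rather than guessed at, so odd entries can be spotted and handled separately.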
After that, it was just a matter of linking everything up, color-coding the crimes (red for violent crimes, blue for theft, yellow for non-violent, etc.), and I had a fairly functional site. In the future, I can add more reports in Reporting Services to look at trends if I feel so inclined.
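The color-coding can be as simple as a keyword lookup on the offense description. In this Python sketch the three colors come from the list above, but the keyword lists and the marker_color name are assumptions made up for illustration; the real categories would depend on the wording used in the reports.

```python
# Assumed keyword lists; the actual offense wording is not shown in the post.
VIOLENT_KEYWORDS = ("BATTERY", "ASSAULT", "ROBBERY")
THEFT_KEYWORDS = ("THEFT", "BURGLARY")


def marker_color(description):
    """Pick a map marker color from an offense description."""
    text = description.upper()
    # Violent offenses are checked first so they win over any theft keyword.
    if any(keyword in text for keyword in VIOLENT_KEYWORDS):
        return "red"
    if any(keyword in text for keyword in THEFT_KEYWORDS):
        return "blue"
    return "yellow"  # everything else is treated as non-violent


if __name__ == "__main__":
    for offense in ("BATTERY", "THEFT UNDER $300", "NOISE COMPLAINT"):
        print(offense, "->", marker_color(offense))
```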
All in all, it wasn't bad for a Sunday's worth of work. And it really wasn't work, since I actually wanted to do it.
11 comments:
This is pretty god damned pimp.
linkage?
Yes, with all of this technical wizardry, you'd think I could figure out how to have my Vonage router and Coyote Linux box be able to handle double-NAT and QOS at the same time. Unfortunately, I can either allow access to my web server, or I can have my calls not drop under high load. For now anyway, I choose the latter.
Come on Tim, I was seriously impressed until this last comment. I want access! And Indy data as well! Get moving!
Show me where Indy lists their incident reports online where they *don't* charge a fee for each report and we'll talk.
Well they do have this sucky version of your app here, but the text reports are all pay-only. Suxxon!
Would be really cool... I'd also like to see Champaign and U of I police stats added in. (Although I see a couple of yellow flags in Champaign, so are they still kept separate? At least when I was an undergrad, the U was notorious for suppressing rape reports.) And I'd also be interested in seeing one more color (pink?) for violent crimes against women. I can think of a few groups who might even be interested in supporting something....
The problem is a lack of normalized reports. I am able to parse the Urbana reports because they are all in the same format. The campus police have theirs in some sort of narrative format (http://www.dps.uiuc.edu/crimereports.aspx), while I can't even find any for Champaign. One would think this information would be more readily available.
Borrow some university bandwidth and go to town.
Ok, I'm not sure what I've stumbled on, but I have been trying to find crime reports for Urbana and I can't. In all your computer jargon and accomplishments is there a way for me to see the crime reports and I missed it? Let me know!
Sadly, my service provider has changed, and I am unable to host the mapping website anymore. You can still view the text version of the reports by going to city.urbana.il.us and clicking on Police-> Reports.