Monday, January 31, 2011

GeoCoding

So I'm doing some work on geocoding in PHP... Almost everything out there uses Perl and the old 2004 (or at most 2006) TIGER data, which is in RT1 & RT2 format, not dbf/shp. I want to make my own geocoder. I want it done in PHP so I can set it up as a web service, meter it, and charge our customers appropriately for its usage. And I want it done with the 2010 TIGER data... so what to do?

Well, first I went through the motions of geocoding in Perl with the 2006 data set, only for our existing addresses, just to see what kind of success rate I would get. Boy was I surprised when I got a 75% success rate! (For more info on the Perl solution visit http://www.developer.com/tech/article.php/3557171)

For the remaining 25% I just ran them through Google's geocoder, and now everything is geocoded. That takes the pressure off me to finish this part just to get some product out the door. Now I can take my time, and so begins my trek toward a modern PHP geocoding web service.

First stop, get the TIGER 2010 shapefiles from http://www.census.gov/cgi-bin/geo/shapefiles2010/main. Reading the PDF of technical notes (http://www.census.gov/geo/www/tiger/tgrshp2010/TGRSHP10.pdf) I see that "block" is a census block and not the kind of block that has the address numbers we need, so that's not the right one.

Now I'm not sure which one we need yet, but so far it looks like "Address Ranges", which has the starting and ending house numbers on a block and a zip code, but doesn't have the road name... We could team this up with "Roads", but correlating a road to its address range doesn't seem to work without "All Edges" as well. Well, it's moot unless we can import this data into a database, so let's jump over to that for a minute.

A Google search for shp2mysql brings us to http://kartoweb.itc.nl/rimapper/

Downloading the loader on CentOS x64 and trying to compile it gives us an error. OK, delete the *.o files in the src/ dir and download shapelib from http://shapelib.maptools.org/dl. Extract and compile that, then copy the resulting .o files into the shp2mysql src dir. Edit the makefile in shp2mysql to remove getopt.o, then compile. Run it on a test .shp file from the TIGER set and it starts spitting out SQL queries. Awesome.

Left to do:
1) Figure out which shapefiles I need to import
2) Figure out how to find the house number ranges for a given road from the shapefiles
3) Figure out how to find the spot on the line where a specific house number lies, given the line's range of house numbers. I'm thinking it's simple linear interpolation: the position along the line, as a fraction of its length measured from the low-address end, is pos = (ouraddr - lowaddr) / (highaddr - lowaddr). If we are searching for 1650 and a given line has 1600-1699 and is 100 meters long, that gives pos = 50 / 99, or about 0.505, so the marker goes about 50.5 meters along the line (pos * maxlength).
4) Split up a given address into components to do a search on tiger data to find house number, road, zip, etc.
5) Write the geocoder!
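
To make item 3 concrete, here is a minimal sketch of the interpolation idea. It's in Python rather than PHP just to keep it short, and it should port over almost line for line. The function names (address_fraction, point_along) are made up for illustration, and it assumes the road segment is already available as a plain list of (x, y) points. It treats the coordinates as planar, which is only an approximation for lat/lon, and it ignores which side of the street the number falls on.

```python
import math


def address_fraction(house, low, high):
    """Fraction (0..1) of the way from the 'low' end of the range to the 'high' end.

    'low' and 'high' are simply the two ends of the range as stored;
    the range is allowed to run in descending order.
    """
    if high == low:
        # Single-number range: no way to interpolate, so use the midpoint.
        return 0.5
    # Clamp the house number into the range so the result stays within 0..1.
    smallest, largest = min(low, high), max(low, high)
    house = max(smallest, min(largest, house))
    return (house - low) / (high - low)


def point_along(line, fraction):
    """Return the (x, y) point at 'fraction' of the polyline's total length."""
    pairs = list(zip(line, line[1:]))
    lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in pairs]
    remaining = fraction * sum(lengths)
    for (a, b), seg in zip(pairs, lengths):
        if seg > 0 and remaining <= seg:
            t = remaining / seg  # how far along this one segment
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= seg
    # Rounding can leave a tiny remainder; fall back to the last vertex.
    return line[-1]


if __name__ == "__main__":
    # The example from item 3: 1650 on a 1600-1699 segment 100 units long.
    frac = address_fraction(1650, 1600, 1699)
    print(frac, point_along([(0, 0), (100, 0)], frac))
```

Splitting this into two functions keeps the address math separate from the geometry, so the second half can be swapped for a proper geodesic calculation later without touching the first.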

More coming soon...
