It is convenient to have a geocoded dataset to work with when developing visualization tools. Although batch geocoding is better done by means other than web service calls, it is nice to have a command-line tool that we can call as needed. Below is a description of one such tool, along with the scripts used to prepare the dataset.
Let’s play with a dataset from NYC Open Data. One of the available datasets is SAT scores for NYC public schools. Let’s download the CSV from here: https://nycopendata.socrata.com/Education/SAT-College-Board-2010-School-Level-Results/zt9s-n5aj
The first line of the file contains the column names. Let’s use that information to create a database table to hold the data.
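As a rough illustration of turning the header line into a table definition, here is a minimal Python sketch. The table name `sat_scores`, the sample header, and the choice to type every column as VARCHAR(255) are assumptions made for the example; check them against the file you actually downloaded.

```python
import csv
import re


def build_create_table(header_line, table_name="sat_scores"):
    """Build a CREATE TABLE statement from a CSV header line.

    Every column is typed VARCHAR(255) to keep the sketch simple.
    """
    names = next(csv.reader([header_line]))
    columns = []
    for name in names:
        # "School Name" -> "school_name"
        col = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
        columns.append("  `%s` VARCHAR(255)" % col)
    return "CREATE TABLE `%s` (\n%s\n);" % (table_name, ",\n".join(columns))


# Sample header (an assumption; use the first line of the real file).
header = "DBN,School Name,Number of Test Takers,Critical Reading Mean"
print(build_create_table(header))
```

The printed statement can be pasted into the mysql client, after tightening the column types if needed.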
In order to load the file into this table, we will need to remove the first line (the column names) and put the file in the /tmp directory so the mysql user can access it.
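One way to do this from Python is sketched below. The function name `strip_header` and the file names are hypothetical; the demo runs on a throwaway file so it can be tried without the real download.

```python
import os
import tempfile


def strip_header(src_path, dst_path):
    """Copy src_path to dst_path without its first line; return rows written."""
    rows = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        next(src, None)  # discard the column-name line
        for line in src:
            dst.write(line)
            rows += 1
    return rows


# Demo on a temporary file. A real run would pass the downloaded CSV and
# a destination under /tmp, e.g. /tmp/sat_scores.csv (hypothetical name).
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "download.csv")
    dst = os.path.join(workdir, "sat_scores.csv")
    with open(src, "w", newline="") as f:
        f.write("DBN,School Name\n01M292,Example School\n")
    print(strip_header(src, dst), "data row(s) written")
```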
Let’s load it into the table:
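A sketch of the load step, written as a small Python helper that prints a MySQL LOAD DATA INFILE statement. The path `/tmp/sat_scores.csv` and the table name `sat_scores` are assumptions, as are the field and line terminators, which should match how the CSV was exported.

```python
def build_load_statement(csv_path, table_name="sat_scores"):
    """Return a LOAD DATA INFILE statement for a comma-separated file."""
    return (
        "LOAD DATA INFILE '%s'\n"
        "INTO TABLE `%s`\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n';"
    ) % (csv_path, table_name)


# Hypothetical path; use wherever the header-less copy was saved.
print(build_load_statement("/tmp/sat_scores.csv"))
```

The statement can be run from the mysql client. OPTIONALLY ENCLOSED BY is included so that quoted school names containing commas stay in one column.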
It just so happens that this dataset does not contain addresses for the schools. The addresses are available on school-specific pages like this one: http://schools.nyc.gov/SchoolPortals/01/M292/default.htm. However, the dataset does contain what they call a DBN, and that identifier happens to be contained in the URL. So we can read each DBN from our newly created table, build the page URL from it, and parse the contents of the page for the address. We will use the HTMLParser class in Python (http://docs.python.org/library/htmlparser.html).
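Here is a minimal sketch of that idea using Python 3, where the class lives in `html.parser`. Two things are assumptions: that a DBN splits into a two-character district plus a school code (inferred from the single example URL above), and that the address sits inside an element with a CSS class, here the hypothetical `address`. Inspect the real page markup and adjust before relying on it.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def school_url(dbn):
    """Build the portal URL, assuming DBN = 2-char district + school code."""
    return "http://schools.nyc.gov/SchoolPortals/%s/%s/default.htm" % (dbn[:2], dbn[2:])


class AddressParser(HTMLParser):
    """Collect the text inside the first element carrying a given class."""

    def __init__(self, css_class="address"):  # class name is hypothetical
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # > 0 while we are inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())

    @property
    def address(self):
        return ", ".join(self.parts)


# Made-up markup, used only to exercise the parser.
sample = '<div class="address"><span>123 Example St</span><br>New York, NY 10002</div><p>other</p>'
parser = AddressParser()
parser.feed(sample)
print(school_url("01M292"))
print(parser.address)
```

In a real run, the page would be fetched first (for example with `urllib.request.urlopen`) and its decoded text passed to `feed`.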
Once we have the addresses, we can fetch the geocodes. There’s a script at http://rbrundritt.wordpress.com/2010/01/23/accessing-bing-maps-web-services-with-perl-and-python/ that we can modify and use for that purpose. But we will need to install a few libraries before we can call the web service.
And here’s the script that has been modified for our purpose:
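For illustration only, the sketch below shows the geocoding step against the Bing Maps REST Locations endpoint using just the Python 3 standard library. This is a swapped-in approach, not the modified script from the linked post, and it needs no extra libraries. The function names are hypothetical, and you need your own Bing Maps key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BING_LOCATIONS_URL = "http://dev.virtualearth.net/REST/v1/Locations"


def build_geocode_url(address, api_key):
    """Build the request URL for a free-form address query."""
    return BING_LOCATIONS_URL + "?" + urlencode({"query": address, "key": api_key})


def parse_coordinates(payload):
    """Return (latitude, longitude) of the first result, or None if there is none."""
    for resource_set in payload.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            lat, lng = resource["point"]["coordinates"]
            return lat, lng
    return None


def geocode(address, api_key):
    """Call the web service; requires network access and a valid key."""
    with urlopen(build_geocode_url(address, api_key)) as response:
        return parse_coordinates(json.load(response))


# Offline check with a hand-written payload in the documented response shape.
sample = {"resourceSets": [{"resources": [{"point": {"coordinates": [40.7, -73.9]}}]}]}
print(parse_coordinates(sample))
```

Keeping URL building and response parsing separate from the network call makes both easy to check without hitting the service.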
Of course, we will need to save all this data to our table. In order to use the Python MySQL libraries, we need to install them.
And here’s the code for retrieving and saving the data to our table:
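A sketch of the retrieve-and-save step, written against any DB-API connection that uses the `%s` parameter style, as MySQLdb does. The column names `address`, `latitude`, and `longitude` are assumptions; they would have to be added to the table first.

```python
# Hypothetical table and column names; adjust to your schema.
SELECT_SQL = "SELECT dbn FROM sat_scores"
UPDATE_SQL = (
    "UPDATE sat_scores SET address = %s, latitude = %s, longitude = %s "
    "WHERE dbn = %s"
)


def fetch_dbns(conn):
    """Return every DBN stored in the table."""
    cursor = conn.cursor()
    try:
        cursor.execute(SELECT_SQL)
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def save_location(conn, dbn, address, lat, lng):
    """Store the address and coordinates for one school."""
    cursor = conn.cursor()
    try:
        # Parameters are passed separately so the driver handles escaping.
        cursor.execute(UPDATE_SQL, (address, lat, lng, dbn))
        conn.commit()
    finally:
        cursor.close()


# Usage, assuming MySQLdb is installed and the credentials are yours:
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="...", passwd="...", db="...")
#   for dbn in fetch_dbns(conn):
#       ...
```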
Now that we have all our tasks in place, we can call them from a driver:
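The driver could be structured roughly as follows. This sketch takes the three tasks as plain callables (all names are hypothetical) so it stays independent of how each task is implemented, and it runs here with stand-in functions instead of real network or database calls.

```python
def run(dbns, fetch_address, geocode, save):
    """Look up, geocode, and save each school; return (saved count, skipped DBNs)."""
    saved, skipped = 0, []
    for dbn in dbns:
        address = fetch_address(dbn)
        coords = geocode(address) if address else None
        if coords is None:
            # No address found or no geocoding result; leave for a later pass.
            skipped.append(dbn)
            continue
        save(dbn, address, coords[0], coords[1])
        saved += 1
    return saved, skipped


# Stand-in tasks with made-up data, just to show the flow.
addresses = {"01M292": "123 Example St, New York, NY 10002"}
stored = {}

result = run(
    ["01M292", "01M448"],
    fetch_address=addresses.get,
    geocode=lambda address: (40.7, -73.9),
    save=lambda dbn, address, lat, lng: stored.update({dbn: (address, lat, lng)}),
)
print(result)
print(stored)
```

Returning the skipped DBNs makes it easy to rerun the tool later for only the schools that failed.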
That’s it! Now we can go ahead and build the cool stuff…the tools that can make this data useful!