Geocoding is a process whereby an address is converted into a coordinate pair. While this may seem fairly straightforward—the prevalence of Google Maps, etc. tells us this—it is a very difficult technical problem, especially when many addresses in many localities in multiple languages are involved.
There are a huge number of geocoding services available and they change all the time—my current favorite is one called OpenCage. There are a number of reasons I like it:
So we like OpenCage. Go to the OpenCage Website and sign-up for an account. Once you have your account set-up you should see something called an ‘API key’ on your Dashboard. You will need this key to geocode your addresses. Copy your API key and store it somewhere. You will need it later!
We begin by creating a new virtual environment for our geocoding work. As a reminder, a virtual environment is a self-contained Python environment, generally built to accommodate a specific project. We use these so that we’re not endlessly installing pacakges into a single bloated environment; instead, we work only with those packages we need. In a terminal window type…
# Tells python 3 to run the venv module...
# ...and creating a new virtual environment...
# ...in our current directory.
python3 -m venv venv/
# On Windows...
python -m venv venv\
We can activate this virtual environment by running the following at the terminal:
# On macOS or Linux...
. venv/bin/activate
# On Windows...
venv\Scripts\activate.bat
Once we’ve activated our virtual environment, any packages we install
will be specific to this environment—this is very useful for managing
packages on a project-by-project basis. We’re going to install the
opencage
package, which is a fairly thin wrapper for the
OpenCage API (by ‘thin wrapper’, I just mean that it provides
convenience functions without contributing much in the way of added
functionality).
pip install opencage jupyter
Create a new Python file; at the top, we need to import the relevant
OpenCageGeocode
object from the library. We’ll also import
pprint
(pretty print) which will print, for example, JSON
responses nicely.
from opencage.geocoder import OpenCageGeocode
from pprint import pprint
We’ll then create a new instance of the OpenCageGeocode object; in order to do this, we need to use the API key that we were given when we created a new OpenCage account. But we want to do this in a way that prevents others from finding our key. If you’re going to be working on a public git repository (e.g., Github, Gitlab), your goal is to NEVER COMMIT YOUR KEY to your public repository. This means writing your code such that the key is read in from an environment variable.
The easiest way to store an environment variable is to create a new
file called simply .env
in your project directory and use a
nifty package called python-dotenv
to load your variables.
If you’re using a git repository, make sure you add .env
to
your .gitignore
file! You’ll have to install the package
like so (assuming that you’ve activated your virtual environment…)
pip install python-dotenv
In your .env
file, include this line:
OPENCAGE=yourapikey
Once you’ve set the environment variable, reading it in a Python script becomes fairly trivial. We’ll now do this and create a new OpenCageGeocode instance.
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv()
= os.getenv('OPENCAGE')
key = OpenCageGeocode(key) geocoder
Generally speaking, there are two primary ‘directions’ for a geocoding operation.
The former is the more common research task, but for completeness’
sake, let’s demonstrate a reverse geocode. The instance of the OpenCage
geocoder you created above has a method called
reverse_geocode
. Let’s query our geocoder using a
latitute/longitude pair in the fine city of Somerville, Massachusetts
using pprint()
to print the (pretty extensive!) results
readably.
= geocoder.reverse_geocode(42.385776, -71.094382)
results pprint(results)
[{'annotations': {'DMS': {'lat': "42° 23' 8.83716'' N",
'lng': "71° 5' 39.85944'' W"},
'FIPS': {'county': '25017', 'state': '25'},
'MGRS': '19TCG2759494735',
'Maidenhead': 'FN42kj82qo',
'Mercator': {'x': -7914193.002, 'y': 5190132.398},
'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?way=29794889#map=17/42.38579/-71.09441',
'note_url': 'https://www.openstreetmap.org/note/new#map=17/42.38579/-71.09441&layers=N',
'url': 'https://www.openstreetmap.org/?mlat=42.38579&mlon=-71.09441#map=17/42.38579/-71.09441'},
'UN_M49': {'regions': {'AMERICAS': '019',
'NORTHERN_AMERICA': '021',
'US': '840',
'WORLD': '001'},
'statistical_groupings': ['MEDC']},
'callingcode': 1,
'currency': {'alternate_symbols': ['US$'],
'decimal_mark': '.',
'disambiguate_symbol': 'US$',
'html_entity': '$',
'iso_code': 'USD',
'iso_numeric': '840',
'name': 'United States Dollar',
'smallest_denomination': 1,
'subunit': 'Cent',
'subunit_to_unit': 100,
'symbol': '$',
'symbol_first': 1,
'thousands_separator': ','},
'flag': '🇺🇸',
'geohash': 'drt3nhneh794edc7fzzj',
'qibla': 60.41,
'roadinfo': {'drive_on': 'right',
'road': 'Highland Avenue',
'speed_in': 'mph'},
'sun': {'rise': {'apparent': 1583320500,
'astronomical': 1583314920,
'civil': 1583318820,
'nautical': 1583316840},
'set': {'apparent': 1583361480,
'astronomical': 1583280660,
'civil': 1583363160,
'nautical': 1583365080}},
'timezone': {'name': 'America/New_York',
'now_in_dst': 0,
'offset_sec': -18000,
'offset_string': '-0500',
'short_name': 'EST'},
'what3words': {'words': 'beans.anyway.valve'}},
'bounds': {'northeast': {'lat': 42.3859639, 'lng': -71.0941313},
'southwest': {'lat': 42.3856123, 'lng': -71.0946794}},
'components': {'ISO_3166-1_alpha-2': 'US',
'ISO_3166-1_alpha-3': 'USA',
'_category': 'building',
'_type': 'building',
'continent': 'North America',
'country': 'United States of America',
'country_code': 'us',
'county': 'Middlesex County',
'house_number': '79',
'neighbourhood': 'Spring Hill',
'postcode': '02143',
'road': 'Highland Avenue',
'state': 'Massachusetts',
'state_code': 'MA',
'town': 'Somerville',
'unknown': 'Somerville Central Library'},
'confidence': 10,
'formatted': 'Somerville Central Library, 79 Highland Avenue, Somerville, MA '
'02143, United States of America',
'geometry': {'lat': 42.3857881, 'lng': -71.0944054}}]
We see that the results are extensive! Not only does it return an address, but it returns a placename—Somerville Central Library—administrative boundaries, current sunset and sunrise information, and local currency! The data is returned formatted in JavaScript Object Notation—or JSON, This is very, very common when working with web service APIs. JSON has become something like the lingua franca of application development.
We can parse it easily! Let’s say we wanted to retrieve only the
formatted placename, including the address. We want the first result
(i.e., the 0th index position in Python-ese), and we want the value of
the formatted
key. So…
= results[0]['formatted']
formatted print(formatted)
Somerville Central Library, 79 Highland Avenue, Somerville, MA 02143, United States of America
Similarly, if we wanted to get just the street name, the town and the state, we could concatenate those values like this…
= results[0]['components']['road'] + ', ' + results[0]['components']['town'] + ', ' + results[0]['components']['state_code']
street print(street)
# Or, using join...
= ', '.join([results[0]['components']['road'], results[0]['components']['town'], results[0]['components']['state_code']])
street print(street)
# Or, slightly cleaner yet...
= results[0]['components']
comp = ', '.join([comp['road'], comp['town'], comp['state_code']])
street print(street)
Highland Avenue, Somerville, MA
Highland Avenue, Somerville, MA
Highland Avenue, Somerville, MA
Parsing JSON like this is a very common task so it’s worth getting comfortable with it!
The more common task for a geocoder is to take an address, or a long list of addresses, and turn it into a mappable latitude/longitude pair. This, again, is called forward geocoding. The process is very similar to the above. Note that one of the great things about OpenCage is that your dataset does not need to be pre-parsed! No more meticulously combing your address fields for street numbers, street names, town names, state names, postal codes, etc. The geocoder will handle that for you.
Let’s do the reverse of the above: send the geocoder the address of the Somerville Public Library Central Branch in Winter Hill.
= u'79 Highland Avenue, Somerville, MA 02143'
query = geocoder.geocode(query)
results
pprint(results)
[{'annotations': {'DMS': {'lat': "42° 23' 8.83716'' N",
'lng': "71° 5' 39.85944'' W"},
'FIPS': {'county': '25017', 'state': '25'},
'MGRS': '19TCG2759494735',
'Maidenhead': 'FN42kj82qo',
'Mercator': {'x': -7914193.002, 'y': 5190132.398},
'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?way=29794889#map=17/42.38579/-71.09441',
'note_url': 'https://www.openstreetmap.org/note/new#map=17/42.38579/-71.09441&layers=N',
'url': 'https://www.openstreetmap.org/?mlat=42.38579&mlon=-71.09441#map=17/42.38579/-71.09441'},
'UN_M49': {'regions': {'AMERICAS': '019',
'NORTHERN_AMERICA': '021',
'US': '840',
'WORLD': '001'},
'statistical_groupings': ['MEDC']},
'callingcode': 1,
'currency': {'alternate_symbols': ['US$'],
'decimal_mark': '.',
'disambiguate_symbol': 'US$',
'html_entity': '$',
'iso_code': 'USD',
'iso_numeric': '840',
'name': 'United States Dollar',
'smallest_denomination': 1,
'subunit': 'Cent',
'subunit_to_unit': 100,
'symbol': '$',
'symbol_first': 1,
'thousands_separator': ','},
'flag': '🇺🇸',
'geohash': 'drt3nhneh794edc7fzzj',
'qibla': 60.41,
'roadinfo': {'drive_on': 'right',
'road': 'Highland Avenue',
'speed_in': 'mph'},
'sun': {'rise': {'apparent': 1583320500,
'astronomical': 1583314920,
'civil': 1583318820,
'nautical': 1583316840},
'set': {'apparent': 1583361480,
'astronomical': 1583280660,
'civil': 1583363160,
'nautical': 1583365080}},
'timezone': {'name': 'America/New_York',
'now_in_dst': 0,
'offset_sec': -18000,
'offset_string': '-0500',
'short_name': 'EST'},
'what3words': {'words': 'beans.anyway.valve'}},
'bounds': {'northeast': {'lat': 42.3859639, 'lng': -71.0941313},
'southwest': {'lat': 42.3856123, 'lng': -71.0946794}},
'components': {'ISO_3166-1_alpha-2': 'US',
'ISO_3166-1_alpha-3': 'USA',
'_category': 'education',
'_type': 'library',
'continent': 'North America',
'country': 'United States of America',
'country_code': 'us',
'county': 'Middlesex County',
'house_number': '79',
'library': 'Somerville Central Library',
'neighbourhood': 'Spring Hill',
'postcode': '02143',
'road': 'Highland Avenue',
'state': 'Massachusetts',
'state_code': 'MA',
'town': 'Somerville'},
'confidence': 9,
'formatted': 'Somerville Central Library, 79 Highland Avenue, Somerville, MA '
'02143, United States of America',
'geometry': {'lat': 42.3857881, 'lng': -71.0944054}},
{'annotations': {'DMS': {'lat': "42° 23' 11.51160'' N",
'lng': "71° 5' 52.17000'' W"},
'FIPS': {'state': '25'},
'MGRS': '19TCG2731494825',
'Maidenhead': 'FN42kj82gs',
'Mercator': {'x': -7914573.676, 'y': 5190243.952},
'OSM': {'note_url': 'https://www.openstreetmap.org/note/new#map=17/42.38653/-71.09783&layers=N',
'url': 'https://www.openstreetmap.org/?mlat=42.38653&mlon=-71.09783#map=17/42.38653/-71.09783'},
'UN_M49': {'regions': {'AMERICAS': '019',
'NORTHERN_AMERICA': '021',
'US': '840',
'WORLD': '001'},
'statistical_groupings': ['MEDC']},
'callingcode': 1,
'currency': {'alternate_symbols': ['US$'],
'decimal_mark': '.',
'disambiguate_symbol': 'US$',
'html_entity': '$',
'iso_code': 'USD',
'iso_numeric': '840',
'name': 'United States Dollar',
'smallest_denomination': 1,
'subunit': 'Cent',
'subunit_to_unit': 100,
'symbol': '$',
'symbol_first': 1,
'thousands_separator': ','},
'flag': '🇺🇸',
'geohash': 'drt3nhhpkw54pt5hzfm9',
'qibla': 60.4,
'roadinfo': {'drive_on': 'right',
'road': 'Highland Ave',
'speed_in': 'mph'},
'sun': {'rise': {'apparent': 1583320500,
'astronomical': 1583314920,
'civil': 1583318820,
'nautical': 1583316840},
'set': {'apparent': 1583361480,
'astronomical': 1583280660,
'civil': 1583363160,
'nautical': 1583365080}},
'timezone': {'name': 'America/New_York',
'now_in_dst': 0,
'offset_sec': -18000,
'offset_string': '-0500',
'short_name': 'EST'},
'what3words': {'words': 'zones.pardon.rainy'}},
'components': {'ISO_3166-1_alpha-2': 'US',
'ISO_3166-1_alpha-3': 'USA',
'_category': 'building',
'_type': 'building',
'continent': 'North America',
'country': 'United States of America',
'country_code': 'us',
'house_number': '79',
'road': 'Highland Ave',
'state': 'Massachusetts',
'state_code': 'MA',
'town': 'Somerville'},
'confidence': 10,
'formatted': '79 Highland Ave, Somerville, MA, United States of America',
'geometry': {'lat': 42.386531, 'lng': -71.097825}},
{'annotations': {'DMS': {'lat': "42° 22' 58.44000'' N",
'lng': "71° 6' 10.08000'' W"},
'FIPS': {'county': '25017', 'state': '25'},
'MGRS': '19TCG2689594432',
'Maidenhead': 'FN42kj71pv',
'Mercator': {'x': -7915127.49, 'y': 5189698.73},
'OSM': {'note_url': 'https://www.openstreetmap.org/note/new#map=17/42.38290/-71.10280&layers=N',
'url': 'https://www.openstreetmap.org/?mlat=42.38290&mlon=-71.10280#map=17/42.38290/-71.10280'},
'UN_M49': {'regions': {'AMERICAS': '019',
'NORTHERN_AMERICA': '021',
'US': '840',
'WORLD': '001'},
'statistical_groupings': ['MEDC']},
'callingcode': 1,
'currency': {'alternate_symbols': ['US$'],
'decimal_mark': '.',
'disambiguate_symbol': 'US$',
'html_entity': '$',
'iso_code': 'USD',
'iso_numeric': '840',
'name': 'United States Dollar',
'smallest_denomination': 1,
'subunit': 'Cent',
'subunit_to_unit': 100,
'symbol': '$',
'symbol_first': 1,
'thousands_separator': ','},
'flag': '🇺🇸',
'geohash': 'drt3n58d29jw7kz7cjf8',
'qibla': 60.4,
'roadinfo': {'drive_on': 'right', 'speed_in': 'mph'},
'sun': {'rise': {'apparent': 1583320500,
'astronomical': 1583314920,
'civil': 1583318820,
'nautical': 1583316840},
'set': {'apparent': 1583361480,
'astronomical': 1583280660,
'civil': 1583363160,
'nautical': 1583365080}},
'timezone': {'name': 'America/New_York',
'now_in_dst': 0,
'offset_sec': -18000,
'offset_string': '-0500',
'short_name': 'EST'},
'what3words': {'words': 'line.desks.plant'}},
'bounds': {'northeast': {'lat': 42.392238, 'lng': -71.072728},
'southwest': {'lat': 42.372724, 'lng': -71.118193}},
'components': {'ISO_3166-1_alpha-2': 'US',
'ISO_3166-1_alpha-3': 'USA',
'_category': 'place',
'_type': 'county',
'continent': 'North America',
'country': 'United States of America',
'country_code': 'us',
'county': 'Middlesex County',
'postcode': '02143',
'state': 'Massachusetts',
'state_code': 'MA'},
'confidence': 7,
'formatted': 'Middlesex County, MA 02143, United States of America',
'geometry': {'lat': 42.3829, 'lng': -71.1028}}]
Note that in this case, the geocoder has returned three results. Each slightly more general than the previous. The first matches the address to a specific place—the Somerville Central Library—the second matches it to an address, and the third simply places the address in Middlesex County with a zip code. In general, the first result is going to be the most precise location (and the one with the highest confidence score), so it’ll be the most useful for data analysis.
Just like above, we can parse the result! We could get the formatted address, the location’s identifying what3words words, or the latitude and longitude corresponding to the address.
= results[0]['formatted']
formatted print(formatted)
= results[0]['annotations']['what3words']['words']
what3words print(what3words)
= results[0]['geometry']['lat']
lat = results[0]['geometry']['lng']
lng print([lat, lng])
Somerville Central Library, 79 Highland Avenue, Somerville, MA 02143, United States of America
beans.anyway.valve
[42.3857881, -71.0944054]
“This has all been well and good,” you might say, “but I’m here because I have 2,304 addresses in a spreadsheet. I don’t really have to hand-code 2,304 queries and parse 2,304 geocoder results, do I?”
OF COURSE NOT!
We can write a few (relatively) simple lines of code to open a delimited text file and iterate over its addresses, parsing and returning the results returned for each row.
We’ll first need to write a few lines of code that open the CSV file and iterate over its rows. There numerous ways to do this. We’re going to use the leanest possible method that uses very little beyond base Python.
# Import the csv library.
import csv
# Provide the location of the som_lib.csv file.
= 'som_lib.csv'
addressfile # Open the file...
with open(addressfile, 'r') as file:
# ...and read its contents as a dictionary.
= csv.DictReader(file)
reader # Iterate over the CSV rows...
for line in reader:
# ...and print the value of the 'name' column.
print(line['name'])
Somerville Public Library East Branch
Somerville Public Library West Branch
Somerville Public Library Central Branch
But what if the file can’t be found? Let’s intentionally introduce a typo into the file name.
# Import the csv library.
import csv
# Provide the location of the som_lib.csv file.
= 'som_libs.csv'
addressfile # Open the file...
with open(addressfile, 'r') as file:
# ...and read its contents as a dictionary.
= csv.DictReader(file)
reader # Iterate over the CSV rows...
for line in reader:
# ...and print the value of the 'name' column.
print(line['name'])
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-33-c5a8f5894e0e> in <module>
5 addressfile = 'som_libs.csv'
6 # Open the file...
----> 7 with open(addressfile, 'r') as file:
8 # ...and read its contents as a dictionary.
9 reader = csv.DictReader(file)
FileNotFoundError: [Errno 2] No such file or directory: 'som_libs.csv'
Python returns an error that is not especially interpretable, though
it is nice enough to inform us that the error is of the
FileNotFoundError
type! Instead of throwing an error, let’s
add a try
/except
condition…
import csv
= 'som_libs.csv'
addressfile
# Try to load the file...
try:
with open(addressfile, 'r') as file:
= csv.DictReader(file)
reader for line in reader:
print(line['name'])
# ...and if reading it throws a FileNotFoundError,
# print a simple, legible message.
except FileNotFoundError:
print(f'{addressfile} does not exist.')
som_libs.csv does not exist.
Let’s add one more of these conditions—we’re using the OpenCage free
service, which has a rate limit of 2,500 requests per day. If we exceed
the rate limit, the service will return a
RateLimitExceededError
. Let’s accommodate this congingency
by adding another except
conditional.
import csv
= 'som_lib.csv'
addressfile
# Try to load the file...
try:
with open(addressfile, 'r') as file:
= csv.DictReader(file)
reader for line in reader:
print(line['name'])
# ...and if reading it throws a FileNotFoundError,
# print a simple, legible message.
except FileNotFoundError:
print(f'{addressfile} does not exist.')
except RateLimitExceededError as ex:
print(ex)
Somerville Public Library East Branch
Somerville Public Library West Branch
Somerville Public Library Central Branch
Now that we’re set up to deal with errors, we can write geocoder queries into our script. We want to keep rows that OpenCage cannot successfully geocode, so we insert an if/else statement that creates null latitude and longitude values if no result is returned.
import csv
= 'som_lib.csv'
addressfile
try:
with open(addressfile, 'r') as incsv:
= csv.DictReader(incsv)
reader for line in reader:
# clean up address, stripping leading and trailing charaters
= line['address'].strip()
address # send query to geocoder
= geocoder.geocode(address)
result # check if there are results...
if result and len(result):
# attach lat and long fields to the iterator line.
'lat'] = result[0]['geometry']['lat']
line['lng'] = result[0]['geometry']['lng']
line[else:
'lat'] = None
line['lng'] = None
line[print(line)
except FileNotFoundError:
print(f'{addressfile} does not exist.')
except RateLimitExceededError as ex:
print(ex)
OrderedDict([('name', 'Somerville Public Library East Branch'), ('address', '115 Broadway, Somerville, MA 02145'), ('lat', 42.3879326), ('lng', -71.0835948)])
OrderedDict([('name', 'Somerville Public Library West Branch'), ('address', '40 College Ave, Somerville, MA 02144'), ('lat', 42.3981483), ('lng', -71.1217138)])
OrderedDict([('name', 'Somerville Public Library Central Branch'), ('address', '79 Highland Avenue, Somerville, MA 02143'), ('lat', 42.3857881), ('lng', -71.0944054)])
Not too bad!
The only thing we’ll want our basic geocoder to do is write this same
information out to a file. Let’s add this functionality. We’ll do this
by naming a new CSV, opening it, and instantiating it as a DictWriter.
We’ll also need to specify field names for the output CSV; we do that by
simply adding ‘lat’ and ‘lng’ to the list of fieldnames that our input
file containts (accessed using reader.fieldnames
).
Note that we also included two ways to name the output file:
manually, or by using os.path.splitext()
which takes a path
to a file and splits it into the file path and the file extension.
import csv
= 'som_lib.csv'
addressfile = 'som_lib_geocode.csv'
outputfile # Or, if you want to be fancy...
= os.path.splitext(addressfile)[0] + '_geocode.csv'
outputfile
try:
with open(addressfile, 'r') as incsv:
# Open outputfile in write mode ('w')...
with open(outputfile, 'w') as outcsv:
= csv.DictReader(incsv)
reader # Add lat and lng to existing fieldnames
= reader.fieldnames + ['lat', 'lng']
field_names # Create the writer DictWriter - we'll use this to write rows.
= csv.DictWriter(outcsv, field_names)
writer for line in reader:
= line['address'].strip()
address = geocoder.geocode(address)
result if result and len(result):
'lat'] = result[0]['geometry']['lat']
line['lng'] = result[0]['geometry']['lng']
line[else:
'lat'] = None
line['lng'] = None
line[print(line)
# Write geocoded row to output file.
writer.writerow(line)except FileNotFoundError:
print(f'{addressfile} does not exist.')
except RateLimitExceededError as ex:
print(ex)
OrderedDict([('name', 'Somerville Public Library East Branch'), ('address', '115 Broadway, Somerville, MA 02145'), ('lat', 42.3879326), ('lng', -71.0835948)])
OrderedDict([('name', 'Somerville Public Library West Branch'), ('address', '40 College Ave, Somerville, MA 02144'), ('lat', 42.3981483), ('lng', -71.1217138)])
OrderedDict([('name', 'Somerville Public Library Central Branch'), ('address', '79 Highland Avenue, Somerville, MA 02143'), ('lat', 42.3857881), ('lng', -71.0944054)])
Excellent! We’ve written a geocoder that will take a CSV as input (with any number of columns), and output a CSV with additional columns storing latitude and longitude in WGS 84 coordinates. This can be easily imported into a desktop GIS. For simple purposes, this script is all you’ll need. There are certainly additions you could make! Foe example: how would you add a field to the output CSV that stores the what3words code?
But beyond these fairly simple ‘add more fields’ questions, we can also:
These possibilities are included below, though we will not explore them in depth.
One of the first ways you’ll want to extend any useful set of
instructions is to wrap them in a re-usable function. We use standard
python syntax here, providing parameters for the input file location and
name, the name of the field containing addresses, and a path to an
output file. Note that if the last isn’t provided, the function
automatically appends _geocode.csv
to end of the input
filename and uses that as the output.
def geocode(in_file, address_field, out_file = False):
if out_file:
= out_file
out_file else:
= os.path.splitext(in_file)[0] + '_geocode.csv'
out_file
try:
with open(in_file, 'r') as incsv:
# Open outputfile in write mode ('w')...
with open(out_file, 'w') as outcsv:
= csv.DictReader(incsv)
reader # Add lat and lng to existing fieldnames
= reader.fieldnames + ['lat', 'lng']
field_names # Create the writer DictWriter - we'll use this to write rows.
= csv.DictWriter(outcsv, field_names)
writer for line in reader:
= line['address'].strip()
address = geocoder.geocode(address)
result if result and len(result):
'lat'] = result[0]['geometry']['lat']
line['lng'] = result[0]['geometry']['lng']
line[else:
'lat'] = None
line['lng'] = None
line[print(line)
# Write geocoded row to output file.
writer.writerow(line)except FileNotFoundError:
print(f'{in_file} does not exist.')
except RateLimitExceededError as ex:
print(ex)
='som_lib.csv', address_field='address') geocode(in_file
OrderedDict([('name', 'Somerville Public Library East Branch'), ('address', '115 Broadway, Somerville, MA 02145'), ('lat', 42.3879326), ('lng', -71.0835948)])
OrderedDict([('name', 'Somerville Public Library West Branch'), ('address', '40 College Ave, Somerville, MA 02144'), ('lat', 42.3981483), ('lng', -71.1217138)])
OrderedDict([('name', 'Somerville Public Library Central Branch'), ('address', '79 Highland Avenue, Somerville, MA 02143'), ('lat', 42.3857881), ('lng', -71.0944054)])
Many contemporary data science workflows use a Python library called
pandas
; essentially, the role of pandas
is to
make Python behave more like R. It gives you access to dataframes object
classes and a whole range of vectorized data processing operations (a
fancier form of iteration that moves more quickly over rows or
columns).
Note the line that reads df.apply(...)
—this is a very
classic pandas
syntax. It’s also a very class R syntax!
This is a vectorized for loop. All it is doing is applying an operation
(in this case, the geocoding function) to each row
(axis=1
).
import pandas as pd
def geocode(row, query_field):
try:
= row[query_field].strip()
address = geocoder.geocode(address)
result if result and len(result):
'lat'] = result[0]['geometry']['lat']
row['lng'] = result[0]['geometry']['lng']
row[else:
'lat'] = None
row['lng'] = None
row[return(row)
except RateLimitExceededError as ex:
print(ex)
quit
def run(in_file, query_field, out_file=False):
if out_file:
= out_file
out_file else:
= os.path.splitext(in_file)[0] + '_geocode.csv'
out_file try:
= pd.read_csv(in_file)
df = df.apply(lambda x: geocode(x, query_field), axis=1)
df print(df)
df.to_csv(out_file)except FileNotFoundError:
print(f'{in_file} does not exist.')
='som_lib.csv', query_field='address') run(in_file
name Somerville Public Library East Branch
address 115 Broadway, Somerville, MA 02145
lat 42.3879
lng -71.0836
Name: 0, dtype: object
name Somerville Public Library East Branch
address 115 Broadway, Somerville, MA 02145
lat 42.3879
lng -71.0836
Name: 0, dtype: object
name Somerville Public Library West Branch
address 40 College Ave, Somerville, MA 02144
lat 42.3981
lng -71.1217
Name: 1, dtype: object
name Somerville Public Library Central Branch
address 79 Highland Avenue, Somerville, MA 02143
lat 42.3858
lng -71.0944
Name: 2, dtype: object
Finally, we can extend the geocoder to automatically write the file
to a GeoJSON, removing the intermediate step in which we interpret the
latitudes and longitudes using a desktop GIS. The key row here is
row['geometry'] = Point(...)
—we’re creating a Point object,
which Geopandas can interpret (the block beginning
`gpd.GeoDataFrame()..
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
def geocode(row, query_field):
try:
= row[query_field].strip()
address = geocoder.geocode(address)
result if result and len(result):
'geometry'] = Point(result[0]['geometry']['lng'], result[0]['geometry']['lat'])
row[else:
'geometry'] = None
row[return(row)
except RateLimitExceededError as ex:
print(ex)
quit
def run(in_file, query_field, out_file=False):
if out_file:
= out_file
out_file else:
= os.path.splitext(in_file)[0] + '_geocode.geojson'
out_file try:
= pd.read_csv(in_file)
df = gpd.GeoDataFrame(
gdf apply(lambda x: geocode(x, query_field), axis=1),
df.='geometry'
geometry
)print(gdf)
='GeoJSON')
gdf.to_file(out_file, driverexcept FileNotFoundError:
print(f'{in_file} does not exist.')
='som_lib.csv', query_field='address') run(in_file
name \
0 Somerville Public Library East Branch
1 Somerville Public Library West Branch
2 Somerville Public Library Central Branch
address geometry
0 115 Broadway, Somerville, MA 02145 POINT (-71.08359 42.38793)
1 40 College Ave, Somerville, MA 02144 POINT (-71.12171 42.39815)
2 79 Highland Avenue, Somerville, MA 02143 POINT (-71.09441 42.38579)