Mapping in Python: Folium

Eric Robsky Huntley, PhD (They/them)
Lecturer in Urban Science and Planning
ehuntley@mit.edu

Tutorial Information

Module 7 in Spatial Data Science.

Download Materials >

Methods

Spatial Analysis, Mapping

Tools

Python

In this workshop, we’ll be using Python’s folium package to produce interactive maps of ‘public’ tree densities in Cambridge and Somerville using data drawn from the former’s Public Works department and the latter’s Urban Forestry Division. Folium provides a neat and full-featured wrapper for the well-known and widely loved Leaflet.js web mapping library. With it, we can draw interactive, feature-rich Leaflet maps using straightforward Python without ever leaving the comfort of a familiar Pandas-based workflow. Note that if you’re more of an R afficianado (n.b., the author is just such a creature), you can find an almost identical tutorial written for R’s Leaflet wrapper earlier in this series.

Create a New Virtual Environment

We begin by creating a new virtual environment for our Python spatial work. As a reminder, a virtual environment is a self-contained Python environment, generally built to accommodate a specific project. We use these so that we’re not endlessly installing pacakges into a single bloated environment; instead, we work only with those packages we need.

First, we want to navigate to the folder to which we downloaded the exercise data. In a terminal (macOS/Linux) or command prompt (Windows) window type…

cd path/to/your/data/

Once you’ve changed your working directory, you’ll want to create a new virtual environment using the venv utility, which is a module (hence, -m) of Python.

# Tells python 3 to run the venv module...
# ...and create a new virtual environment...
# ...in our current directory.
python3 -m venv venv/

# On Windows...
python -m venv venv\

We can activate this virtual environment by running the following at the terminal:

# On macOS or Linux...
. venv/bin/activate
# On Windows...
venv\Scripts\activate.bat
# The next line of your prompt should begin (venv)

Once we’ve activated our virtual environment, any packages we install will be specific to this environment—this is very useful for managing packages on a project-by-project basis. We’re going folium, as well as the GeoPandas package, which extends the ever-popular Pandas package. We’ll also make sure that matplotlib is installed (these packages help us plot, in general, and plot polygons, specifically). We’ll also install jenkspy, which gives us a very simple utility function that produces Jenks Natural Breaks.

pip install folium geopandas matplotlib jenkspy

Draw a Map

We begin by importing the folium package and creating a new map object. We create the map using only a few properties: a latitude-longitude pair on which to center it, and a zoom level at which to render it. Zoom levels are standardized for webmaps, and range from 0-20, with 0 being the ‘largest scale’, and 20 being the ‘smallest scale’ (speaking in terms of cartographic scale). As a default, a new folium map will include an OpenStreetMap tile layer. While a full discussion of map tiles is outside the scope of this workshop, I would recommend reading the map tile introduction linked above—it’s a simple enough concept to grasp!

import folium

# Create new map.
m = folium.Map(
    location=[42.38172, -71.111408],
    zoom_start=13
)
# Draw map.
m

We can use a different tileset by providing a string argument to the tiles parameter. This can be either…

a string denoting a specific tileset included by the Folium developers (the documentation tells us this can be ‘OpenStreetMap’, ‘Mapbox Bright’, ‘Mapbox Control Room’, ‘Stamen Terrain’, ‘Stamen Toner’, ‘Stamen Watercolor’, ‘Cloudmade’, ‘Mapbox’, and ‘CartoDB positron’ or ‘CartoDB dark_matter’) or…
a custom tile server, e.g. http://{s}.yourtiles.com/{z}/{x}/{y}.png

Let’s first try adding a custom tile server! I’m a big fan of the Atlascope project from the Leventhal Map & Education Center at the Boston Public Library. Their collection of georeferenced and mosaic-ed atlases is available as a collection of hosted tilesets! (If you want to locate others, simply find an atlas you’re interested in and click ‘About This Map’ under the dropdown menu in the footer.) We can add the 1894 Bromley Atlas of the City of Cambridge like so. Note that we need to pass an attribution to the attr parameter when loading a custom tilset. This is interpreted as HTML, we can pass a hyperlink.

# Create new map with provider tileset.
m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='https://s3.us-east-2.wasabisys.com/urbanatlases/39999059015576/tiles/{z}/{x}/{-y}.png',
    attr='<a href="https://www.leventhalmap.org/">Leventhal Map & Education Center at the Boston Public Library</a>',
    zoom_start=13
)
# Draw map.
m

But, given that we might want our basemap to be slightly less interesting, we can add Stamen Toner instead. Note that we don’t require an attribution here—the attribution is stored alongside the tilset identifier and is automatically included in the bottom-right corner.

# Create new map with provider tileset.
m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)
# Draw map.
m

Drawing Spatial Data

Basemaps are well and good, but we’re in this to do exploratory data visualization! As such, we’ll need a way to add spatial data to our project, whether it is stored on disk or in a Pandas object. As a case study, let’s read a GeoJSON file from disk. This is the dataset at which we arrived at the conclusion of last week’s session—it contains ‘public’ tree counts and densities for the cities of Cambridge and Somerville.

…from a GeoJSON

Recall that Folium is built on top of Leaflet. Leaflet, as a JavaScript library, expects JSON (JavaScript Object Notation); it therefore provides a helpful GeoJson object that will read a geojson file from disk and add it to our map. This is remarkably simple. Assuming that we have a map m, we can simply create a new GeoJson object, passing the file path, and add the geometries to the map. Note that the geometry type (POINT, LINE, POLYGON, etc.) is automatically determined, as this information is included in the GeoJSON geometries.

folium.GeoJson(
    'data/trees_per_area.geojson'
).add_to(m)

m

…from a GeoPandas Object

However, in many cases, you will want to render spatial data that is stored in a non-GeoJSON format, or that is the result of spatial analysis in GeoPandas. Why introduce the intermediate step of writing the JSON and reading if you don’t have to?

As an example, let’s read the same data from a Shapefile instead of a GeoJSON, storing it first in a GeoDataFrame.

import geopandas as gpd
trees = gpd.read_file('data/trees_per_area.shp')

We can’t pass this object directly to folium, but we also don’t want to introduce the complexity of writing and re-reading a geojson file. Instead, we can simply convert the trees GeoDataFrame to JSON using the .to_json() method.

For example, let’s convert the first five rows of the dataframe using the .head() method…

from pprint import pprint
pprint(trees.head().to_json())

('{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", '
 '"properties": {"area": 1073506.546135969, "area_sqmi": 0.038504538957531, '
 '"id": 0, "land_area": 0.038504, "tree_count": null, "trees_p_sq": null, '
 '"water_area": 0.0}, "geometry": {"type": "Polygon", "coordinates": '
 '[[[-71.15932825278648, 42.419756678264065], [-71.15727140671973, '
 '42.418868638184634], [-71.15728097173223, 42.41710479634876], '
 '[-71.159347267742, 42.41622899425637], [-71.16140405707002, '
 '42.41711699781159], [-71.16139460712705, 42.4188808399828], '
 '[-71.15932825278648, 42.419756678264065]]]}}, {"id": "1", "type": "Feature", '
 '"properties": {"area": 1073506.546135969, "area_sqmi": 0.038504538957531, '
 '"id": 1, "land_area": 0.038504, "tree_count": null, "trees_p_sq": null, '
 '"water_area": 0.0}, "geometry": {"type": "Polygon", "coordinates": '
 '[[[-71.15520499564046, 42.41974440245647], [-71.15314820791414, '
 '42.41885628870503], [-71.15315788799609, 42.41709244720855], '
 '[-71.15522424073497, 42.416216719123526], [-71.15728097173223, '
 '42.41710479634876], [-71.15727140671973, 42.418868638184634], '
 '[-71.15520499564046, 42.41974440245647]]]}}, {"id": "2", "type": "Feature", '
 '"properties": {"area": 1073506.546135969, "area_sqmi": 0.038504538957531, '
 '"id": 2, "land_area": 0.038504, "tree_count": null, "trees_p_sq": null, '
 '"water_area": 0.0}, "geometry": {"type": "Polygon", "coordinates": '
 '[[[-71.1510817401059, 42.41973197896548], [-71.14902501072956, '
 '42.41884379154408], [-71.14903480588082, 42.417079950391056], '
 '[-71.15110121533912, 42.416204296315385], [-71.15315788799609, '
 '42.41709244720855], [-71.15314820791414, 42.41885628870503], '
 '[-71.1510817401059, 42.41973197896548]]]}}, {"id": "3", "type": "Feature", '
 '"properties": {"area": 1073506.546135969, "area_sqmi": 0.038504538957531, '
 '"id": 3, "land_area": 0.038504, "tree_count": null, "trees_p_sq": null, '
 '"water_area": 0.0}, "geometry": {"type": "Polygon", "coordinates": '
 '[[[-71.14695848620202, 42.41971940779114], [-71.14490181518528, '
 '42.41883114670187], [-71.14491172540569, 42.41706730589637], '
 '[-71.14697819157371, 42.41619172583203], [-71.14903480588082, '
 '42.417079950391056], [-71.14902501072956, 42.41884379154408], '
 '[-71.14695848620202, 42.41971940779114]]]}}, {"id": "4", "type": "Feature", '
 '"properties": {"area": 1073506.546135969, "area_sqmi": 0.038504538957531, '
 '"id": 4, "land_area": 0.034519, "tree_count": null, "trees_p_sq": null, '
 '"water_area": 111090.88590224039}, "geometry": {"type": "Polygon", '
 '"coordinates": [[[-71.14283523394816, 42.419706688933566], '
 '[-71.14077862130056, 42.41881835417849], [-71.14078864658997, '
 '42.417054513724565], [-71.14285516945803, 42.416179007673534], '
 '[-71.14491172540569, 42.41706730589637], [-71.14490181518528, '
 '42.41883114670187], [-71.14283523394816, 42.419706688933566]]]}}]}')

.to_json() creates a GeoJSON representation from a GeoDataFrame! We can pass the results to our GeoJson object constructor in the same was as a GeoJSON file read from disk.

m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)

folium.GeoJson(
    trees.to_json()
).add_to(m)

m

Drawing a Choropleth Map

But of course, geometries aren’t the matter of interest here. Hexagons are nice shapes, but they’re useful here primarily as symbols for our measured tree densities! We want to create a choropleth map, visualizing the density observed in each hexagon using a color ramp. This is, of course, an exceedingly common map type, and for that reason, folium provides a built-in object that simplifies choropleth mapping.

It looks like this: assuming that we have a map m we create a new Choropleth object using the following parameters.

geo_data = trees.to_json() This simply passes our JSON-ified spatial data to the choropleth constructor.
data = trees This passes our un-JSONified data frame that includes measured tree densities.
columns = ['id', 'trees_p_sqmi'] This identifies columns of interest in the data. Namely, the first is a unique identifier that will join the dataframe to our JSON-ified spatial data, and the second is the quantity to be visualized.
key_on = 'feature.properties.id' This identifies the property of our JSON which will be joined the first attribute passed to the columns parameter. We’re joining feature.properties.id to id: this makes sense! Look at the JSON representation—feature.properties.id is simply the id stored in the properties list for each feature.
fill_color = 'YlGn' This identifies one of Cindy Brewer’s ColorBrewer color ramps.
bins = 5 The data will be classed into five equal-interval bins.

The other parameters are simply stylistic, setting the line properties and fill properties.

m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)

folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = 5
).add_to(m)

m

Great! The only issue now is the many nan values. There are several ways of dealing with these. We could set a nan_fill_color in folium.Choropleth()… or we could just remove them. Because they’re artifacts of the approach taken in the previous workshop, let’s simply filter them using Pandas’s .dropna() method, specifying that we’d like to drop rows for which trees_p_sq is na.

trees = trees.dropna(subset=['trees_p_sq'])

m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)

folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = 5
).add_to(m)

m

Change Classification Schemes

The choropleth constructor default to an equal interval classification scheme. This is well and good, but often phenomena do not lend themselves to such tidy binning schemes. In cases where there are obvious clusters in the distribution, or unusually long tails, we may want to experiment with either quantile or natural breaks schemes. This is a relatively simple thing: the Choropleth constructor’s bins parameter accepts either an integer that indicates the number of bins, or an array that indicates the break points for a given classification scheme.

For example, we can calculate quantile breaks like so:

quantile_breaks = trees['trees_p_sq'].quantile([0, 0.2, 0.4, 0.6, 0.8, 1])

This locates break points such that the bottom quantile includes the lowest 20th percentile, the second includes the 20th to the 40th, and on and on. In other words, we get five class breaks that includes roughly equal numbers of observations. We can visualize this using a histogram by plotting our distribution and adding a vertical line for each class break.

import matplotlib.pyplot as plt

plt.hist(trees['trees_p_sq'], bins=30)
for a in quantile_breaks:
    plt.axvline(a, color='red')
plt.show()

Similarly, we can calculate Jenks natural breaks using the jenkspy package. Recall from your introduction to GIS class that Jenks natural breaks is an algorithm that attempts to produce classes that maximize difference between groups an minimize difference within groups.

from jenkspy import jenks_breaks
jenks_breaks = jenkspy.jenks_breaks(trees['trees_p_sq'], nb_class=5)

plt.hist(trees['trees_p_sq'], bins=30)
for a in jenks_breaks:
    plt.axvline(a, color='red')
plt.show()

I have a soft spot for Jenks, so for the remainder of the workshop, it will be the classification scheme of choice. However, as always, there is a good argument to be made for equal interval schemes: they tend to be the most interpretable of the available schemes.

Having satisfied ourselves that natural breaks scheme meets our needs, we can pass it to the Choropleth constructor like below. One of the things you should notice is that folium provides, by default, an improvement on the standard Leaflet legend. It dynamically resizes the legend bins to reflect the actual size of the class. This actually goes quite a long way towards making non-equal interval schemes more interpretable.

m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)

folium.Choropleth(
    geo_data = trees.to_json(),
    name = "Public Trees per Square Mile",
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks
).add_to(m)

m

Add Interactivity

Leaflet provides some fairly pre-baked means of adding interactivity to our maps. First, we can add a simple, effective highlighting behavior to our geometries that causes their line weight to increase when the user mouses over them. This is a binary switch, flipped on by setting the highlight parameter equal to True.

m = folium.Map(
    location=[42.38172, -71.111408],
    tiles='Stamen Toner',
    zoom_start=13
)

folium.Choropleth(
    geo_data = trees.to_json(),
    name = "Public Trees per Square Mile",
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks,
    highlight = True,
).add_to(m)

m

Tooltips

Tooltips are slightly more complicated for choropleth maps. The logic goes something like this: in the background, folium.Choropleth creates a geojson property, which allows us to access the GeoJSON features after the fact using choro.geojson. To these, we can add a tooltip that appears when the user mouses over the polygon. Namely, we can add a special GeoJsonToolTip that visualizes specified fields.

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        [...]
    )
)

We can also give this field a slightly more readable alias, and choose to localize it, which does some rounding and switches periods for commas as is dictated by the location of a user’s system.

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        aliases=['Trees per mile^2'],
        localize=True,
    )
)

Finally, we add this to our map using the .add_to(m) method.

m = folium.Map(
    location = [42.38172, -71.111408],
    tiles = 'Stamen Toner',
    zoom_start = 13
)

choro = folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks,
    highlight = True
).add_to(m)

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        aliases=['Trees per mile^2'],
        localize=True,
    )
).add_to(m)

m

Cleaning and Automating!

The above is a very functional interactive choropleth map! But there are some things that we could improve. For example, notice all of the things we’re passing manually to our map options: namely, the center location latitude and longitude and the zoom level. This code will be more generally useful if we build in simple ways of automating the centering process and the zoom level. We’ll start by using a geometric method to derive a centering point, based on the geometric centroid of the hexagon grid.

Programmatically Center the Map

Let’s center the map on the centroid of our input geometries. To do this, we can first calculate the unary_union of the input hexagons. This will join all polygons that share an edge, essentially melting our hexagons together.

trees.unary_union

Based on the unary union, we can calculate a centroid point and extract its x and y coordinates (its longitude, and latidue, respectively).

centroid = trees.unary_union.centroid
lat = centroid.y
lon = centroid.x

We can then simply pass these values to the location parameter of the folium.Map() constructor.

m = folium.Map(
    location = [lat, lon],
    tiles = 'Stamen Toner',
    zoom_start = 13
)

choro = folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks,
    highlight = True
).add_to(m)

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        aliases=['Trees per mile^2'],
        localize=True,
    )
).add_to(m)

m

Programatically Set the Zoom and Location

The obvious shortcoming of the above is that we are still manually providing the zoom (zoom_start = 13). The folium package provides a FitBounds object that simultaneously sets both the zoom and the location based on the maximum zoom size that includes the entire bounding box described by a lat-lng pair. Here, we first get the bounding box that encloses our hexagon grid….

min_lon, min_lat, max_lon, max_lat = trees.total_bounds
bbox = [(min_lat, min_lon), (max_lat, max_lon)]
print(bbox)

[(42.35514007312575, -71.16157406589225), (42.419667646262205, -71.07100786215695)]

We can use these coordinates to create a FitBounds object and add it to the map. The map will be automatically zoomed and its location automatically set such that our entire hexagon grid fits.

m = folium.Map(
    tiles = 'Stamen Toner',
)

folium.FitBounds(bbox).add_to(m)

choro = folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks,
    highlight = True
).add_to(m)

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        aliases=['Trees per mile^2'],
        localize=True,
    )
).add_to(m)

m

Setting Bounds

The final adjustment we will make will be to set a fence for our user interaction such that the user can’t accidentally pan their map to someplace far away and not find their way back. This is controlled using the min_lat, min_lon, max_lat, and max_lon parameters that are passed to our map object. max_bounds must also be set to True. Here, we can simply use the same coordinates we found above using total_bounds.

m = folium.Map(
    tiles = 'Stamen Toner',
    min_lat = min_lat,
    min_lon = min_lon,
    max_lat = max_lat,
    max_lon = max_lon,
    max_bounds = True
)

folium.FitBounds(bbox).add_to(m)

choro = folium.Choropleth(
    geo_data = trees.to_json(),
    data = trees,
    columns = ['id', 'trees_p_sq'],
    key_on = 'feature.properties.id',
    fill_color = 'YlGn',
    fill_opacity = 0.7,
    line_color = 'White',
    line_opaciy = 0.1,
    legend_name = 'Public Trees per Square Mile',
    bins = jenks_breaks,
    highlight = True
).add_to(m)

tooltips = choro.geojson.add_child(
    folium.GeoJsonTooltip(
        fields=['trees_p_sq'], 
        aliases=['Trees per mile^2'],
        localize=True,
    )
).add_to(m)

m

Export the Map

Once we’re satisfied with the map, we can export it to a self-contained html file.

m.save('trees_per_area.html')

VIA Learning Lab

Visualization, Inquiry, Analysis

MIT 9-545

77 Massachusetts Ave

Cambridge, MA 02139

`vialab@mit.edu`

Mapping in Python: Folium

Eric Robsky Huntley, PhD (They/them)

Tutorial Information

Methods

Tools

Create a New Virtual Environment

Draw a Map

Drawing Spatial Data

…from a GeoJSON

…from a GeoPandas Object

Drawing a Choropleth Map

Change Classification Schemes

Add Interactivity

Tooltips

Cleaning and Automating!

Programmatically Center the Map

Programatically Set the Zoom and Location

Setting Bounds

Export the Map

References

VIA Learning Lab

Visualization, Inquiry, Analysis

MIT 9-545 77 Massachusetts Ave Cambridge, MA 02139

vialab@mit.edu

Mapping in Python: Folium

Eric Robsky Huntley, PhD (They/them)

Tutorial Information

Methods

Tools

Create a New Virtual Environment

Draw a Map

Drawing Spatial Data

…from a GeoJSON

…from a GeoPandas Object

Drawing a Choropleth Map

Change Classification Schemes

Add Interactivity

Tooltips

Cleaning and Automating!

Programmatically Center the Map

Programatically Set the Zoom and Location

Setting Bounds

Export the Map

References

MIT 9-545

77 Massachusetts Ave

Cambridge, MA 02139

`vialab@mit.edu`