EuroPython 2018: Processing Geodata using Python and Open Source Modules

Martin Christen is a professor of Geoinformatics and Computer Graphics at the Institute of Geomatics at the University of Applied Sciences Northwestern Switzerland (FHNW).

What is geo data? There are some standards, but the most important thing is that it has associations with gographical data (on earth for now). There are popular GIS, for example ArcGIS (ESRI) and QGIS, both of which can be used via Python. But today we'll talk about how to manipulate, analyze, and present geodata using Python.

Important Libraries

The most important libraries in C++ for this are GDAL (/OGR) and GEOS, both with Python bindings. GDAL is used in the Python libraries rasterio by mapbox (raster data) and fiona (vector data), while GEOS is used in the Python library shapely (vector data). Python's geopandas (vector) uses both fiona and shapely.

GDAL supports a huge amount of raster (155) and vector (95) formats, among which GeoTIFF.

The further talk uses this notebook, using Python 3.6 and the required libraries.

Shapely

Shapely supports features like Point, LineString, LinearRing, Polygon, etc. You can create polygons by hand, for example, and will be presented with the polygon's area and length etc. (Note that for a closed polygon, you have to repeat the first point in the end.) Use this feature for debugging. Display is also possible via matplotlib. Polygons can be merged, and intersected, and the symmetric difference taken, etc, to generate new polygons.

Shapes can be esported to wkt (just access the wkt property), which is a readable, well-known text format. You can also shapely.wkt.loads to import wkt data.

There are also binary operations available, like contains, intersects, crosses, touches, within etc.

Fiona

Fiona can read vector data like .shp files. We can iterate through the dataset (or create a list, but in that case be aware of your memory consumption). You can access all sorts of fancy metadata properties on the datasets, like wikipedia links (if available, of course), and coordinates.

Use pypro to convert between reference systems (the standard being EPSG4326).

Rasterio

We usually have three raster bands (rgb) in files we read. We can transform pixel locations to our coordinate system (again EPSG4326) using the affine property and ~, and the other way around.

r = dataset.read(1)
g = dataset.read(2)
b = dataset.read(3)


rgb = np.dstack((r,g,b))
fig, ax = plt.subplots(figsize=(15,9))
ax.imshow(rgb, interpolation='nearest')
ax.plot(px,py, 'ro')

You can plot the data then using matplotlib and numpy!

Geopandas

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and descartes and matplotlib for plotting.

You can use the usual fancy pandas functions for querying and accessing data. (head, query, drop, etc) The main difference to Pandas is that there is a geometry column (instead of lat/lon), which stores shapely data. We can call a plot method directly on that data frame, which is pretty cool and versatile. You can either plot single columns or queries or complete datasets, naturally.

You can also access the shapely polygon directly (data.iloc[0].geometry), and use it in queries in turn.

Data can be exported to a shapefile, or geojson, etc.

Folium

Folium accesses leaflet.js. It's way awesome!

import folium
map_osm = folium.Map(location=[55.946167, -3.209626], zoom_start=17)
map_osm.save('edi.html')

Well, that was easy. But now we can also combine this with other geojson data!

import folium

center = [55.946167, -3.209626] # EICC
map_uk = folium.Map(center, zoom_start=3)   
folium.GeoJson(uk).add_to(map_uk)

Wooohooo!

Consider going to GeoPython 2019 in Muttenz, Switzerland, June 24-26!