Vector data and GeoPandas

Geospatial data primarily comes in two forms: raster data and vector data. This guide focuses on the latter.

Typical examples of file formats containing vector data are:

  • ESRI shapefile

  • GeoJSON

  • GeoPackage

Vector data consist of vertices (corner points), optionally connected by paths. The three primary categories of vector data are:

  • Points

  • Lines

  • Polygons

In groundwater modeling, typical examples of each are:

  • Pumping wells, observation wells, boreholes

  • Canals, ditches, waterways

  • Lakes, administrative boundaries, land use

These data consist of geospatial coordinates, which indicate the location in space, and a number of attributes: for a canal, these could be parameters such as its width, depth, and water level. In GIS software like QGIS, the geometry is visible in the map view, and the attributes can be inspected via, for example, the attribute table.
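To make the split between geometry and attributes concrete, here is a minimal sketch of a single canal stored as a GeoJSON feature, using only the standard library. The coordinates and attribute values are made up for illustration; only the Feature layout (a "geometry" member and a "properties" member) follows the GeoJSON format.

```python
import json

# Hypothetical canal: coordinates and attribute values are invented.
canal_feature = {
    "type": "Feature",
    # The geometry: a line made of three vertices (x, y pairs).
    "geometry": {
        "type": "LineString",
        "coordinates": [
            [100_000.0, 450_000.0],
            [100_500.0, 450_250.0],
            [101_000.0, 450_250.0],
        ],
    },
    # The attributes: one row of the attribute table.
    "properties": {"width": 5.0, "depth": 1.5, "water_level": -0.4},
}

# Serialize to GeoJSON text and parse it back again.
geojson_text = json.dumps(canal_feature, indent=2)
restored = json.loads(geojson_text)

n_vertices = len(restored["geometry"]["coordinates"])
print(restored["geometry"]["type"], "with", n_vertices, "vertices")
print("attributes:", restored["properties"])
```

A file containing many such features is what a GIS package shows as one layer: the geometries in the map view, the properties in the attribute table.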

In Python, such data can be represented by a geopandas.GeoDataFrame. Essentially, a GeoDataFrame is a pandas DataFrame that stores the tabular data (the attribute table) and adds a geometry column to store the geospatial coordinates.

import geopandas as gpd
import numpy as np

import imod

tempdir = imod.util.temporary_directory()
gdf = imod.data.lakes_shp(tempdir / "lake")
gdf.iloc[:5, -3:]  # first 5 rows, last 3 columns
SHAPE_Leng SHAPE_Area geometry
0 4689.155578 511471.386406 POLYGON ((108774.371 466627.43, 108774.375 466...
1 2050.843018 86091.854890 POLYGON ((115938.892 463013.165, 115926.865 46...
2 5023.190894 625149.040467 POLYGON ((111380.977 448065.855, 111378.115 44...
3 3724.550747 467233.192689 POLYGON ((117298.904 478782.595, 117297.744 47...
4 6834.063594 809445.407846 POLYGON ((112096.692 450851.187, 112098.812 45...


This GeoDataFrame contains all the data from the shapefile. Note the geometry column. The geometry can be plotted:

gdf.plot()
<Axes: >
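A GeoDataFrame does not have to come from a file. The sketch below assembles one by hand from attribute columns and shapely geometries, to show how the attribute table and the geometry column sit side by side. The canal names, widths, water levels, and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical canals: every value below is invented for illustration.
canals = gpd.GeoDataFrame(
    {
        "name": ["canal_a", "canal_b"],
        "width": [5.0, 3.0],
        "water_level": [-0.4, -0.6],
    },
    geometry=[
        LineString([(100_000.0, 450_000.0), (101_000.0, 450_250.0)]),
        LineString([(101_000.0, 450_250.0), (101_500.0, 451_000.0)]),
    ],
)

# Attribute columns behave like ordinary pandas columns.
wide = canals[canals["width"] > 4.0]

# The geometry column adds spatial properties, such as the line length.
lengths = canals.length

print(canals)
print(lengths)
```

Because the attributes are plain pandas columns, the usual selection and aggregation tools apply, while spatial properties such as length are derived from the geometry column.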

A GeoDataFrame of points can also be easily generated from pairs of x and y coordinates.

x = np.arange(90_000.0, 120_000.0, 1000.0)
y = np.arange(450_000.0, 480_000.0, 1000.0)

geometry = gpd.points_from_xy(x, y)
points_gdf = gpd.GeoDataFrame(geometry=geometry)

points_gdf.plot()
<Axes: >

An important feature of every geometry is its geometry type:

gdf.geom_type
0     Polygon
1     Polygon
2     Polygon
3     Polygon
4     Polygon
       ...
72    Polygon
73    Polygon
74    Polygon
75    Polygon
76    Polygon
Length: 77, dtype: object

As expected, the points are of the type … Point:

points_gdf.geom_type
0     Point
1     Point
2     Point
3     Point
4     Point
5     Point
6     Point
7     Point
8     Point
9     Point
10    Point
11    Point
12    Point
13    Point
14    Point
15    Point
16    Point
17    Point
18    Point
19    Point
20    Point
21    Point
22    Point
23    Point
24    Point
25    Point
26    Point
27    Point
28    Point
29    Point
dtype: object

Input and output

GeoPandas supports many vector file formats. For file input and output it relies on pyogrio (the default engine since GeoPandas 1.0) or fiona, both of which wrap OGR, which is a part of GDAL. For example, the lake polygons above are loaded from an ESRI Shapefile:

filenames = [path.name for path in (tempdir / "lake").iterdir()]
print("\n".join(filenames))
lakes.cpg
lakes.dbf
lakes.prj
lakes.shp
lakes.shx

A GeoDataFrame can easily be written to more modern formats as well, such as GeoPackage:

points_gdf.to_file(tempdir / "points.gpkg")
filenames = [path.name for path in tempdir.iterdir()]
print("\n".join(filenames))
C:\buildagent\work\4b9080cbb3354582\imod-python\.pixi\envs\default\Lib\site-packages\pyogrio\geopandas.py:662: UserWarning: 'crs' was not provided.  The output dataset will not have projection information defined and may not be usable in other systems.
  write(
lake
points.gpkg

The file can be read back in the same way:

back = gpd.read_file(tempdir / "points.gpkg")
back
geometry
0 POINT (90000 450000)
1 POINT (91000 451000)
2 POINT (92000 452000)
3 POINT (93000 453000)
4 POINT (94000 454000)
5 POINT (95000 455000)
6 POINT (96000 456000)
7 POINT (97000 457000)
8 POINT (98000 458000)
9 POINT (99000 459000)
10 POINT (100000 460000)
11 POINT (101000 461000)
12 POINT (102000 462000)
13 POINT (103000 463000)
14 POINT (104000 464000)
15 POINT (105000 465000)
16 POINT (106000 466000)
17 POINT (107000 467000)
18 POINT (108000 468000)
19 POINT (109000 469000)
20 POINT (110000 470000)
21 POINT (111000 471000)
22 POINT (112000 472000)
23 POINT (113000 473000)
24 POINT (114000 474000)
25 POINT (115000 475000)
26 POINT (116000 476000)
27 POINT (117000 477000)
28 POINT (118000 478000)
29 POINT (119000 479000)


Conversion to raster

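From the perspective of MODFLOW groundwater modeling, we are often interested in the properties of cells in specific polygons or zones. Finding out which cells belong to a polygon requires converting the vector geometry to a raster (rasterization). Refer to the examples or the API reference for imod.prepare.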
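To illustrate what converting a polygon to a raster means, the sketch below marks the grid cells whose centers fall inside a polygon, using a plain ray casting test and only the standard library. It is a conceptual illustration under simplified assumptions (one polygon without holes, a tiny hypothetical grid), not the method used by imod.prepare.

```python
def point_in_polygon(x, y, vertices):
    """Ray casting: toggle `inside` for every polygon edge that a
    horizontal ray starting at (x, y) and pointing right crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical zone polygon and a 4 x 4 grid of 1000 m cells.
zone = [(1000.0, 1000.0), (3000.0, 1000.0), (3000.0, 3000.0), (1000.0, 3000.0)]
x_centers = [500.0, 1500.0, 2500.0, 3500.0]
y_centers = [3500.0, 2500.0, 1500.0, 500.0]  # top row first

# True where the cell center lies inside the zone.
mask = [[point_in_polygon(x, y, zone) for x in x_centers] for y in y_centers]

for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

The resulting boolean mask has the same shape as the grid, so it can be used to select or assign cell properties for the zone.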

GeoPandas provides a full suite of vector-based GIS operations, such as intersections, spatial joins, and plotting.
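As a brief sketch of two of these operations, the example below joins hypothetical wells to hypothetical zones with a spatial join and then intersects the zones with a rectangular window. All names and coordinates are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical zones: two adjacent 10 x 10 squares.
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[
        Polygon([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]),
        Polygon([(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)]),
    ],
)

# Hypothetical wells: the third one lies outside both zones.
wells = gpd.GeoDataFrame(
    {"well": ["w1", "w2", "w3"]},
    geometry=[Point(2.0, 5.0), Point(15.0, 5.0), Point(30.0, 5.0)],
)

# Spatial join: attach the zone attribute to every well within a zone.
# With how="left", wells outside all zones are kept with a missing zone.
joined = gpd.sjoin(wells, zones, how="left", predicate="within")
print(joined[["well", "zone"]])

# Intersection: the part of each zone that overlaps a rectangular window.
window = Polygon([(5.0, 2.0), (15.0, 2.0), (15.0, 8.0), (5.0, 8.0)])
clipped = zones.intersection(window)
print(clipped.area)
```

The spatial join transfers attributes between layers based on location, while the intersection produces new geometries; both return objects that can be plotted or written to file like any other GeoDataFrame or GeoSeries.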
