Tip

For an interactive online version click here:

Example: Working with geodatasets#

[1]:

from hydromt import DataCatalog
from hydromt.log import setuplog

logger = setuplog("raster data", log_level=10)
dc = DataCatalog(logger=logger, data_libs=["artifact_data"])

2024-02-26 15:13:06,766 - raster data - log - INFO - HydroMT version: 0.9.4

2024-02-26 15:13:06,791 - raster data - data_catalog - INFO - Reading data catalog archive artifact_data v0.0.8

2024-02-26 15:13:06,791 - raster data - data_catalog - INFO - Parsing data catalog from /home/runner/.hydromt_data/artifact_data/v0.0.8/data_catalog.yml

Here, we illustrate some common GIS problems and how the functionality of the DataArray/Dataset vector accessor can be used. The data is accessed using the HydroMT DataCatalog. For more information see the Reading vector data example.

Geospatial attributes#

Some of the available geospatial attributes are listed here below. For all of attributes (and methods for that matter): check the HydroMT API reference

[2]:

# Get the waterlevel dataset from the datacatalog
# The waterlevels are series per location (points; here in lat lon)
ds = dc.get_geodataset("gtsmv3_eu_era5")

[3]:

# Coordinate reference system
ds.vector.crs

[3]:

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

[4]:

# Geospatial data as geometry objects (we show the first 5 points)
ds.vector.geometry.head()

[4]:

stations
13670    POINT (12.25342 45.25635)
2798     POINT (12.22412 45.27100)
2799     POINT (12.29736 45.22705)
13775    POINT (12.75879 45.24902)
13723    POINT (12.50244 45.25635)
dtype: geometry

[5]:

# names of x- and y coordinates

(ds.vector.x_name, ds.vector.y_name)

[5]:

('lon', 'lat')

[6]:

# Bounding box of the geospatial data
ds.vector.bounds

[6]:

array([12.22412, 45.22705, 12.99316, 45.62256])

Reprojection#

[7]:

# Reproject data to Pseudo Mercator (EPSG: 3857)
ds_pm = ds.vector.to_crs(3857)

# Coordinate reference system
ds_pm.vector.crs

[7]:

<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

[8]:

# Check that the coordinate values are indeed no longer in degrees
ds_pm.lon.values[0]

[8]:

1364044.4748761142

Conversion#

[9]:

# Create an ogr compliant dataset from ds
# When written in netcdf4 format, this can be read by ogr (osgeo; QGIS)
from numpy import mean

ds_ogr = ds.vector.ogr_compliant(reducer=mean)
ds_ogr

[9]:

<xarray.Dataset> Size: 388B
Dimensions:      (stations: 19)
Coordinates:
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    ogc_wkt      (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN...
    spatial_ref  int64 8B 0
Data variables:
    waterlevel   (stations) float64 152B dask.array<chunksize=(19,), meta=np.ndarray>
Attributes:
    Conventions:         CF-1.6
    GDAL:                GDAL 3.8.4
    ogr_geometry_field:  ogc_wkt
    ogr_layer_type:      Point

[10]:

# Convert the geospatial data to wkt strings
ds_wkt = ds.vector.to_wkt()
ds_wkt

[10]:

<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB
dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray>
Coordinates:
  * time         (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00
    spatial_ref  int32 4B ...
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    ogc_wkt      (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN...
Attributes:
    long_name:       sea_surface_height_above_mean_sea_level
    units:           m
    CDI_grid_type:   unstructured
    short_name:      waterlevel
    description:     Total water level resulting from the combination of baro...
    category:        ocean
    paper_doi:       10.24381/cds.8c59054f
    paper_ref:       Copernicus Climate Change Service 2019
    source_license:  https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc...
    source_url:      https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24...
    source_version:  GTSM v3.0

[11]:

# Convert the geospatial data to geometry objects
ds_geom = ds.vector.to_geom()
ds_geom

[11]:

<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB
dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray>
Coordinates:
  * time         (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00
    spatial_ref  int32 4B ...
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    geometry     (stations) object 152B POINT (12.25342 45.25635) ... POINT (...
Attributes:
    long_name:       sea_surface_height_above_mean_sea_level
    units:           m
    CDI_grid_type:   unstructured
    short_name:      waterlevel
    description:     Total water level resulting from the combination of baro...
    category:        ocean
    paper_doi:       10.24381/cds.8c59054f
    paper_ref:       Copernicus Climate Change Service 2019
    source_license:  https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc...
    source_url:      https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24...
    source_version:  GTSM v3.0

I/O (Internal and External)#

[12]:

# Create a dummy GeoDataFrame for this example
from geopandas import GeoDataFrame
from shapely.geometry import Point
from pyproj import CRS
from hydromt.vector import GeoDataset, GeoDataArray
import numpy as np

gdf = GeoDataFrame(
    [{"Loc": "I", "Stuff": 1}, {"Loc": "II", "Stuff": 2}],
    geometry=[Point(0, 0), Point(1, 1)],
    crs=CRS.from_epsg(4326),
)

ds_gdf = GeoDataArray.from_gdf(gdf, np.arange(gdf.index.size))
ds_gdf

[13]:

# Write the Dataset of the DataCatalog to a GeoDataFrame
# Waterlevel has besides stations a time dimension
# GeoDataFrames don't like vectors/ 2d array's, so if you want to keep the variable it can be reduced along the time dimension

gdf = ds.vector.to_gdf(reducer=mean)
gdf.head()

[13]:

	geometry	waterlevel
stations
13670	POINT (12.25342 45.25635)	0.138058
2798	POINT (12.22412 45.27100)	0.140681
2799	POINT (12.29736 45.22705)	0.133489
13775	POINT (12.75879 45.24902)	0.121096
13723	POINT (12.50244 45.25635)	0.126971

[14]:

# Write the Dataset to an ogr compliant netcdf
from os.path import join

ds.vector.to_netcdf(join("tmpdir", "ds_ogr.nc"), ogr_compliant=True)

# It is indeed ogr compliant as ogrinfo.exe is able to read it
!ogrinfo tmpdir/ds_ogr.nc

Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
INFO: Open of `tmpdir/ds_ogr.nc'
      using driver `netCDF' successful.
Metadata:
  NC_GLOBAL#Conventions=CF-1.6
  NC_GLOBAL#coordinates=ogc_wkt spatial_ref
  NC_GLOBAL#GDAL=GDAL 3.8.4
  NC_GLOBAL#ogr_geometry_field=ogc_wkt
  NC_GLOBAL#ogr_layer_type=Point