Tip

For an interactive online version click here: Binder badge

Example: Working with geodatasets#

[1]:
from hydromt import DataCatalog
from hydromt.log import setuplog

logger = setuplog("raster data", log_level=10)
dc = DataCatalog(logger=logger, data_libs=["artifact_data"])
2024-04-25 17:20:29,751 - raster data - log - INFO - HydroMT version: 0.9.5.dev0
2024-04-25 17:20:29,776 - raster data - data_catalog - INFO - Reading data catalog archive artifact_data v0.0.8
2024-04-25 17:20:29,777 - raster data - data_catalog - INFO - Parsing data catalog from /home/runner/.hydromt_data/artifact_data/v0.0.8/data_catalog.yml

Here, we illustrate some common GIS problems and how the functionality of the DataArray/Dataset vector accessor can be used. The data is accessed using the HydroMT DataCatalog. For more information see the Reading vector data example.

Geospatial attributes#

Some of the available geospatial attributes are listed here below. For all of attributes (and methods for that matter): check the HydroMT API reference

[2]:
# Get the waterlevel dataset from the datacatalog
# The waterlevels are series per location (points; here in lat lon)
ds = dc.get_geodataset("gtsmv3_eu_era5")
[3]:
# Coordinate reference system
ds.vector.crs
[3]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[4]:
# Geospatial data as geometry objects (we show the first 5 points)
ds.vector.geometry.head()
[4]:
stations
13670    POINT (12.25342 45.25635)
2798     POINT (12.22412 45.27100)
2799     POINT (12.29736 45.22705)
13775    POINT (12.75879 45.24902)
13723    POINT (12.50244 45.25635)
dtype: geometry
[5]:
# names of x- and y coordinates

(ds.vector.x_name, ds.vector.y_name)
[5]:
('lon', 'lat')
[6]:
# Bounding box of the geospatial data
ds.vector.bounds
[6]:
array([12.22412, 45.22705, 12.99316, 45.62256])

Reprojection#

[7]:
# Reproject data to Pseudo Mercator (EPSG: 3857)
ds_pm = ds.vector.to_crs(3857)

# Coordinate reference system
ds_pm.vector.crs
[7]:
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[8]:
# Check that the coordinate values are indeed no longer in degrees
ds_pm.lon.values[0]
[8]:
1364044.4748761142

Conversion#

[9]:
# Create an ogr compliant dataset from ds
# When written in netcdf4 format, this can be read by ogr (osgeo; QGIS)
from numpy import mean

ds_ogr = ds.vector.ogr_compliant(reducer=mean)
ds_ogr
[9]:
<xarray.Dataset> Size: 388B
Dimensions:      (stations: 19)
Coordinates:
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    ogc_wkt      (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN...
    spatial_ref  int64 8B 0
Data variables:
    waterlevel   (stations) float64 152B dask.array<chunksize=(19,), meta=np.ndarray>
Attributes:
    Conventions:         CF-1.6
    GDAL:                GDAL 3.8.4
    ogr_geometry_field:  ogc_wkt
    ogr_layer_type:      Point
[10]:
# Convert the geospatial data to wkt strings
ds_wkt = ds.vector.to_wkt()
ds_wkt
[10]:
<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB
dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray>
Coordinates:
  * time         (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00
    spatial_ref  int32 4B ...
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    ogc_wkt      (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN...
Attributes:
    long_name:       sea_surface_height_above_mean_sea_level
    units:           m
    CDI_grid_type:   unstructured
    short_name:      waterlevel
    description:     Total water level resulting from the combination of baro...
    category:        ocean
    paper_doi:       10.24381/cds.8c59054f
    paper_ref:       Copernicus Climate Change Service 2019
    source_license:  https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc...
    source_url:      https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24...
    source_version:  GTSM v3.0
[11]:
# Convert the geospatial data to geometry objects
ds_geom = ds.vector.to_geom()
ds_geom
[11]:
<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB
dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray>
Coordinates:
  * time         (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00
    spatial_ref  int32 4B ...
  * stations     (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789
    geometry     (stations) object 152B POINT (12.25342 45.25635) ... POINT (...
Attributes:
    long_name:       sea_surface_height_above_mean_sea_level
    units:           m
    CDI_grid_type:   unstructured
    short_name:      waterlevel
    description:     Total water level resulting from the combination of baro...
    category:        ocean
    paper_doi:       10.24381/cds.8c59054f
    paper_ref:       Copernicus Climate Change Service 2019
    source_license:  https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc...
    source_url:      https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24...
    source_version:  GTSM v3.0

I/O (Internal and External)#

[12]:
# Create a dummy GeoDataFrame for this example
from geopandas import GeoDataFrame
from shapely.geometry import Point
from pyproj import CRS
from hydromt.vector import GeoDataset, GeoDataArray
import numpy as np

gdf = GeoDataFrame(
    [{"Loc": "I", "Stuff": 1}, {"Loc": "II", "Stuff": 2}],
    geometry=[Point(0, 0), Point(1, 1)],
    crs=CRS.from_epsg(4326),
)

ds_gdf = GeoDataArray.from_gdf(gdf, np.arange(gdf.index.size))
ds_gdf
[12]:
<xarray.DataArray (index: 2)> Size: 16B
array([0, 1])
Coordinates:
  * index        (index) int64 16B 0 1
    Loc          (index) object 16B 'I' 'II'
    Stuff        (index) int64 16B 1 2
    geometry     (index) object 16B POINT (0 0) POINT (1 1)
    spatial_ref  int64 8B 0
[13]:
# Write the Dataset of the DataCatalog to a GeoDataFrame
# Waterlevel has besides stations a time dimension
# GeoDataFrames don't like vectors/ 2d array's, so if you want to keep the variable it can be reduced along the time dimension

gdf = ds.vector.to_gdf(reducer=mean)
gdf.head()
[13]:
geometry waterlevel
stations
13670 POINT (12.25342 45.25635) 0.138058
2798 POINT (12.22412 45.27100) 0.140681
2799 POINT (12.29736 45.22705) 0.133489
13775 POINT (12.75879 45.24902) 0.121096
13723 POINT (12.50244 45.25635) 0.126971
[14]:
# Write the Dataset to an ogr compliant netcdf
from os.path import join

ds.vector.to_netcdf(join("tmpdir", "ds_ogr.nc"), ogr_compliant=True)

# It is indeed ogr compliant as ogrinfo.exe is able to read it
!ogrinfo tmpdir/ds_ogr.nc
Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
INFO: Open of `tmpdir/ds_ogr.nc'
      using driver `netCDF' successful.
Metadata:
  NC_GLOBAL#Conventions=CF-1.6
  NC_GLOBAL#coordinates=ogc_wkt spatial_ref
  NC_GLOBAL#GDAL=GDAL 3.8.4
  NC_GLOBAL#ogr_geometry_field=ogc_wkt
  NC_GLOBAL#ogr_layer_type=Point