Example: Working with geodatasets#
[1]:
from hydromt import DataCatalog
from hydromt.log import setuplog
logger = setuplog("raster data", log_level=10)
dc = DataCatalog(logger=logger, data_libs=["artifact_data"])
2024-02-26 15:13:06,766 - raster data - log - INFO - HydroMT version: 0.9.4
2024-02-26 15:13:06,791 - raster data - data_catalog - INFO - Reading data catalog archive artifact_data v0.0.8
2024-02-26 15:13:06,791 - raster data - data_catalog - INFO - Parsing data catalog from /home/runner/.hydromt_data/artifact_data/v0.0.8/data_catalog.yml
Here, we illustrate some common GIS problems and how the functionality of the DataArray/Dataset vector accessor can be used. The data is accessed using the HydroMT DataCatalog. For more information see the Reading vector data example.
Geospatial attributes#
Some of the available geospatial attributes are listed here below. For all of attributes (and methods for that matter): check the HydroMT API reference
[2]:
# Get the waterlevel dataset from the datacatalog
# The waterlevels are series per location (points; here in lat lon)
ds = dc.get_geodataset("gtsmv3_eu_era5")
[3]:
# Coordinate reference system
ds.vector.crs
[3]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[4]:
# Geospatial data as geometry objects (we show the first 5 points)
ds.vector.geometry.head()
[4]:
stations
13670 POINT (12.25342 45.25635)
2798 POINT (12.22412 45.27100)
2799 POINT (12.29736 45.22705)
13775 POINT (12.75879 45.24902)
13723 POINT (12.50244 45.25635)
dtype: geometry
[5]:
# names of x- and y coordinates
(ds.vector.x_name, ds.vector.y_name)
[5]:
('lon', 'lat')
[6]:
# Bounding box of the geospatial data
ds.vector.bounds
[6]:
array([12.22412, 45.22705, 12.99316, 45.62256])
Reprojection#
[7]:
# Reproject data to Pseudo Mercator (EPSG: 3857)
ds_pm = ds.vector.to_crs(3857)
# Coordinate reference system
ds_pm.vector.crs
[7]:
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[8]:
# Check that the coordinate values are indeed no longer in degrees
ds_pm.lon.values[0]
[8]:
1364044.4748761142
Conversion#
[9]:
# Create an ogr compliant dataset from ds
# When written in netcdf4 format, this can be read by ogr (osgeo; QGIS)
from numpy import mean
ds_ogr = ds.vector.ogr_compliant(reducer=mean)
ds_ogr
[9]:
<xarray.Dataset> Size: 388B Dimensions: (stations: 19) Coordinates: * stations (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789 ogc_wkt (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN... spatial_ref int64 8B 0 Data variables: waterlevel (stations) float64 152B dask.array<chunksize=(19,), meta=np.ndarray> Attributes: Conventions: CF-1.6 GDAL: GDAL 3.8.4 ogr_geometry_field: ogc_wkt ogr_layer_type: Point
[10]:
# Convert the geospatial data to wkt strings
ds_wkt = ds.vector.to_wkt()
ds_wkt
[10]:
<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00 spatial_ref int32 4B ... * stations (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789 ogc_wkt (stations) object 152B 'POINT (12.25342 45.25635)' ... 'POIN... Attributes: long_name: sea_surface_height_above_mean_sea_level units: m CDI_grid_type: unstructured short_name: waterlevel description: Total water level resulting from the combination of baro... category: ocean paper_doi: 10.24381/cds.8c59054f paper_ref: Copernicus Climate Change Service 2019 source_license: https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc... source_url: https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24... source_version: GTSM v3.0
[11]:
# Convert the geospatial data to geometry objects
ds_geom = ds.vector.to_geom()
ds_geom
[11]:
<xarray.DataArray 'waterlevel' (time: 2016, stations: 19)> Size: 306kB dask.array<open_dataset-waterlevel, shape=(2016, 19), dtype=float64, chunksize=(2016, 19), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 16kB 2010-02-01 ... 2010-02-14T23:50:00 spatial_ref int32 4B ... * stations (stations) int32 76B 13670 2798 2799 13775 ... 2791 2790 2789 geometry (stations) object 152B POINT (12.25342 45.25635) ... POINT (... Attributes: long_name: sea_surface_height_above_mean_sea_level units: m CDI_grid_type: unstructured short_name: waterlevel description: Total water level resulting from the combination of baro... category: ocean paper_doi: 10.24381/cds.8c59054f paper_ref: Copernicus Climate Change Service 2019 source_license: https://cds.climate.copernicus.eu/cdsapp/#!/terms/licenc... source_url: https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24... source_version: GTSM v3.0
I/O (Internal and External)#
[12]:
# Create a dummy GeoDataFrame for this example
from geopandas import GeoDataFrame
from shapely.geometry import Point
from pyproj import CRS
from hydromt.vector import GeoDataset, GeoDataArray
import numpy as np
gdf = GeoDataFrame(
[{"Loc": "I", "Stuff": 1}, {"Loc": "II", "Stuff": 2}],
geometry=[Point(0, 0), Point(1, 1)],
crs=CRS.from_epsg(4326),
)
ds_gdf = GeoDataArray.from_gdf(gdf, np.arange(gdf.index.size))
ds_gdf
[12]:
<xarray.DataArray (index: 2)> Size: 16B array([0, 1]) Coordinates: * index (index) int64 16B 0 1 Loc (index) object 16B 'I' 'II' Stuff (index) int64 16B 1 2 geometry (index) object 16B POINT (0 0) POINT (1 1) spatial_ref int64 8B 0
[13]:
# Write the Dataset of the DataCatalog to a GeoDataFrame
# Waterlevel has besides stations a time dimension
# GeoDataFrames don't like vectors/ 2d array's, so if you want to keep the variable it can be reduced along the time dimension
gdf = ds.vector.to_gdf(reducer=mean)
gdf.head()
[13]:
geometry | waterlevel | |
---|---|---|
stations | ||
13670 | POINT (12.25342 45.25635) | 0.138058 |
2798 | POINT (12.22412 45.27100) | 0.140681 |
2799 | POINT (12.29736 45.22705) | 0.133489 |
13775 | POINT (12.75879 45.24902) | 0.121096 |
13723 | POINT (12.50244 45.25635) | 0.126971 |
[14]:
# Write the Dataset to an ogr compliant netcdf
from os.path import join
ds.vector.to_netcdf(join("tmpdir", "ds_ogr.nc"), ogr_compliant=True)
# It is indeed ogr compliant as ogrinfo.exe is able to read it
!ogrinfo tmpdir/ds_ogr.nc
Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
Warning 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
INFO: Open of `tmpdir/ds_ogr.nc'
using driver `netCDF' successful.
Metadata:
NC_GLOBAL#Conventions=CF-1.6
NC_GLOBAL#coordinates=ogc_wkt spatial_ref
NC_GLOBAL#GDAL=GDAL 3.8.4
NC_GLOBAL#ogr_geometry_field=ogc_wkt
NC_GLOBAL#ogr_layer_type=Point