Tip

For an interactive online version click here: Binder badge

Example: Working with raster data#

[1]:
# import hydromt and initiate the test artifact_data catalog
import hydromt
from hydromt.log import setuplog

logger = setuplog("raster data", log_level=10)
data_catalog = hydromt.DataCatalog(logger=logger, data_libs=["artifact_data"])
2024-02-26 15:14:01,993 - raster data - log - INFO - HydroMT version: 0.9.4
2024-02-26 15:14:02,017 - raster data - data_catalog - INFO - Reading data catalog archive artifact_data v0.0.8
2024-02-26 15:14:02,017 - raster data - data_catalog - INFO - Parsing data catalog from /home/runner/.hydromt_data/artifact_data/v0.0.8/data_catalog.yml

Here, we illustrate some common GIS problems and how the functionality of the DataArray/Dataset raster accessor can be used. The data is accessed using the HydroMT DataCatalog. For more information see the Reading raster data example.

Note that this implementation is based on a very earlier version of rioxarray and in fact we are using some of rioxarray functionality and working towards replacing our duplicated functionality with that of rioxarray and will try to contribute new functionality to rioxarray. The original reason for the raster data accessor was that we needed some changes in the internals of the writing methods for PCRaster data which is not fully supported by its GDAL driver. Currently the key difference between both packages, besides the naming of the accessor and some methods, are in new methods that have been added over time by both packages and the way that the raster attribute data is stored. In HydroMT this attribute data is always stored in the spatial_ref coordinate of the DataArray/Dataset whereas rioxarray uses additional attributes of the RasterDataset class.

Geospatial attributes#

Using the raster accessor we can get (and set) many geospatial properties of regular raster datasets.

[2]:
# get the GHS population raster dataset from the catalog
da = data_catalog.get_rasterdataset("ghs_pop_2015").rename("population")
da.raster.mask_nodata().reset_coords(drop=True).plot(vmax=200)
2024-02-26 15:14:02,069 - raster data - rasterdataset - INFO - Reading ghs_pop_2015 raster data from /home/runner/.hydromt_data/artifact_data/v0.0.8/ghs_pop_2015.tif
[2]:
<matplotlib.collections.QuadMesh at 0x7fe3fcada290>
../_images/_examples_working_with_raster_6_2.png
[3]:
# coordinate reference system
da.raster.crs
[3]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[4]:
# geospatial transform, see https://www.perrygeo.com/python-affine-transforms.html
da.raster.transform
[4]:
Affine(0.0024999999900981278, 0.0, 11.600416643117885,
       0.0, -0.0024999999899385983, 46.80041668121458)
[5]:
# names of x- and y dimensions
(da.raster.x_dim, da.raster.y_dim)
[5]:
('x', 'y')
[6]:
# nodata value (or fillvalue) of a specific variable
da.raster.nodata
[6]:
-200.0

Reproject (warp) raster data#

A common preprocessing step to generate model data is to make sure all data is in the same CRS. The .raster.reproject() are build on rasterio and GDAL and make this really easy for you.

In this example we demonstrate how to reproject a population count grid. This grid should not be reprojected directly to other resolutions in order to conserve the total number of people. Therefore, we first derive the population density using .raster.density_grid() which can be reprojected and combined with the project grid cell area using .raster.area_grid() to calculate the final population count. Using this approach we only make a small error which we correct to preserve the total count.

[7]:
from hydromt.gis_utils import utm_crs

utm = utm_crs(da.raster.bounds)
print(f"Destination CRS: {utm}")
da_pop = da.raster.mask_nodata()
da_pop_dens = da_pop.raster.density_grid().rename("population_dens")  # pop.m-2
da_pop_dens_utm = da_pop_dens.raster.reproject(
    method="bilinear", dst_crs=utm, dst_res=250
)
da_pop_utm = da_pop_dens_utm * da_pop_dens_utm.raster.area_grid()  # pop
bias = (da_pop.sum() / da_pop_utm.sum()).compute().item()
print(f"Error: {(1-bias)*100:.3f}%")
da_pop_utm_adj = da_pop_utm * bias  # bias correct
da_pop_utm_adj.name = "population"
da_pop_utm_adj.reset_coords(drop=True).plot(vmax=200)
Destination CRS: EPSG:32633
Error: 0.036%
[7]:
<matplotlib.collections.QuadMesh at 0x7fe3fc762550>
../_images/_examples_working_with_raster_13_2.png

Zonal statistics#

For many vector models, zonal statistics are required to derive model parameters. The .raster.zonal_stats() method implements a range of statistics, but also allows for user defined statistics passed as a callable. Here we provide an example to get the population count per admin 3 level. HydroMT takes care that the vector data is reprojected to the raster data CRS if necessary.

[8]:
import xarray as xr
import numpy as np

gdf = data_catalog.get_geodataframe("gadm_level3", variables=["NAME_3"])
ds = xr.merge(
    [
        da_pop.raster.zonal_stats(gdf, stats=["sum"]) / 1e3,  # count [pop x1000]
        da_pop_dens.raster.zonal_stats(gdf, stats=["max", np.nanmax])
        * 1e6,  # density [pop km-2]
    ]
)
for var in ds.data_vars:
    gdf[var] = ds[var]

gdf.sort_values("population_sum", ascending=False).head()
2024-02-26 15:14:03,512 - raster data - geodataframe - INFO - Reading gadm_level3 vector data from /home/runner/.hydromt_data/artifact_data/v0.0.8/gadm_level3.gpkg
[8]:
NAME_3 geometry population_sum population_dens_max population_dens_nanmax
339 Venezia MULTIPOLYGON (((12.17899 45.46872, 12.17908 45... 241.430775 19783.234302 19783.234302
186 Padova MULTIPOLYGON (((11.97606 45.40236, 11.97544 45... 204.654669 14919.109912 14919.109912
324 Treviso MULTIPOLYGON (((12.18336 45.66973, 12.18387 45... 81.099755 10378.027434 10378.027434
278 Pordenone MULTIPOLYGON (((12.62392 45.98614, 12.62745 46... 51.308482 11382.017847 11382.017847
232 Bassano Del Grappa MULTIPOLYGON (((11.68664 45.73788, 11.68658 45... 44.323634 8070.610833 8070.610833
[9]:
gdf.plot(
    "population_sum",
    scheme="NaturalBreaks",
    legend=True,
    legend_kwds=dict(fmt="{:.0f}", title="population [x1000]"),
    figsize=(6, 6),
)
[9]:
<Axes: >
../_images/_examples_working_with_raster_17_1.png

Interpolate nodata values#

To create a continuos grid with values gaps in the data can be filled trough interpolation. HydroMT has the .raster.interpolate_na() method with several interpolation options available. For the nearest, linear and cubic interpolation the scipy.interpolate.griddata() method is used. First, a mesh of all valid values surrounding the data gaps is build using voronoi triangulation. Then, values are calculated for each grid cell with a nodata value. Note that nearest method also extrapolates, while the other methods only interpolate gaps. A final method is based on rasterio.fill.fillnodata() method.

[10]:
da_soil = data_catalog.get_rasterdataset(
    "soilgrids", bbox=[12.7, 45.6, 13, 45.8], variables=["ph_sl1"]
)
da_soil.raster.mask_nodata().plot()
2024-02-26 15:14:17,474 - raster data - rasterdataset - INFO - Reading soilgrids raster data from /home/runner/.hydromt_data/artifact_data/v0.0.8/soilgrids/{variable}.tif
2024-02-26 15:14:17,491 - raster data - rasterdataset - DEBUG - Clip to [12.700, 45.600, 13.000, 45.800] (epsg:4326))
2024-02-26 15:14:17,492 - raster data - rasterdataset - DEBUG - Convert units for 1 variables.
[10]:
<matplotlib.collections.QuadMesh at 0x7fe3d7e01c90>
../_images/_examples_working_with_raster_20_4.png
[11]:
# Note that data is only interpolated leaving a small nodata gap in the lower right corner. This can be extrapolated with the 'nearest' method.
da_soil.raster.interpolate_na(method="linear").raster.mask_nodata().plot()
[11]:
<matplotlib.collections.QuadMesh at 0x7fe3d548e950>
../_images/_examples_working_with_raster_21_1.png

Reproject and merge#

This example shows how to use the .raster.reproject_like() method to align different datasets such that these are at identical grids and can be merged.

[12]:
# step 1: read the data and mask nodata values
bbox = [12.2, 45.3, 13, 45.8]
da_dem = data_catalog.get_rasterdataset("merit_hydro", variables=["elevtn"], bbox=bbox)
da_dem = da_dem.raster.mask_nodata()
da_dem.attrs.update(units="m+EGM96")
print(f"resolution MERIT Hydro: {da_dem.raster.res[0]:.05f}")
da_bath = data_catalog.get_rasterdataset("gebco").raster.mask_nodata()
print(f"resolution GEBCO: {da_bath.raster.res[0]:.05f}")
da_mdt = data_catalog.get_rasterdataset("mdt_cnes_cls18").raster.mask_nodata()
print(f"resolution MDT: {da_mdt.raster.res[0]:.05f}")

plot_kwargs = dict(
    vmin=-50,
    vmax=50,
    cmap="coolwarm",
)
da_dem.plot(**plot_kwargs)
2024-02-26 15:14:17,905 - raster data - rasterdataset - INFO - Reading merit_hydro raster data from /home/runner/.hydromt_data/artifact_data/v0.0.8/merit_hydro/{variable}.tif
2024-02-26 15:14:17,921 - raster data - rasterdataset - DEBUG - Clip to [12.200, 45.300, 13.000, 45.800] (epsg:4326))
resolution MERIT Hydro: 0.00083
2024-02-26 15:14:17,925 - raster data - rasterdataset - INFO - Reading gebco raster data from /home/runner/.hydromt_data/artifact_data/v0.0.8/gebco.tif
resolution GEBCO: 0.00417
2024-02-26 15:14:17,943 - raster data - rasterdataset - INFO - Reading mdt_cnes_cls18 raster data from /home/runner/.hydromt_data/artifact_data/v0.0.8/mdt_cnes_cls18.tif
resolution MDT: 0.12500
[12]:
<matplotlib.collections.QuadMesh at 0x7fe3d537a310>
../_images/_examples_working_with_raster_24_6.png
[13]:
# step 2: convert GEBCO to EGM96 ref and reproject to MERIT Hydro grid
da_bath_egm = da_bath + da_mdt.raster.reproject_like(da_bath)
da_bath_reproj = da_bath_egm.raster.reproject_like(da_dem, method="cubic")
print(f"resolution reprojected GEBCO: {da_bath_reproj.raster.res[0]:.05f}")
da_dem_merged = da_dem.where(da_dem.notnull(), da_bath_reproj)
da_dem_merged.plot(**plot_kwargs)
resolution reprojected GEBCO: 0.00083
[13]:
<matplotlib.collections.QuadMesh at 0x7fe3d524f9d0>
../_images/_examples_working_with_raster_25_2.png

Write raster to file#

To write a dataset to a raster file the .raster.to_raster() can be used. By default the file is written in GeoTiff format. Each band is written as a layer of the raster file. A xarray.Dataset with multiple variables can be written to multiple files in a single folder, each with the name of the variable as basename with .raster.to_mapstack().

Here, we use the merged DEM output of the previous example to write to file. To ensure the CRS and nodata metadata are written we first update these attributes of the data based on the original DEM data.

[14]:
da_dem_merged.raster.set_crs(da_dem.raster.crs)
da_dem_merged.raster.set_nodata(-9999)
da_dem_merged.raster.to_raster(
    "tmpdir/dem_merged.tif", tags={"history": "produced with HydroMT"}
)
[15]:
# use gdalinfo to check output. Note the CRS (GEOGCRS), Metadata and NoData Value
!gdalinfo tmpdir/dem_merged.tif
Driver: GTiff/GeoTIFF
Files: tmpdir/dem_merged.tif
Size is 960, 600
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (12.200000000000014,45.799999999999997)
Pixel Size = (0.000833333333333,-0.000833333333333)
Metadata:
  AREA_OR_POINT=Area
  history=produced with HydroMT
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  12.2000000,  45.8000000) ( 12d12' 0.00"E, 45d48' 0.00"N)
Lower Left  (  12.2000000,  45.3000000) ( 12d12' 0.00"E, 45d18' 0.00"N)
Upper Right (  13.0000000,  45.8000000) ( 13d 0' 0.00"E, 45d48' 0.00"N)
Lower Right (  13.0000000,  45.3000000) ( 13d 0' 0.00"E, 45d18' 0.00"N)
Center      (  12.6000000,  45.5500000) ( 12d36' 0.00"E, 45d33' 0.00"N)
Band 1 Block=960x1 Type=Float64, ColorInterp=Gray
  NoData Value=-9999