Overview data
The best way to provide data to HydroMT is by using a data catalog. The goal of this data catalog is to provide simple and standardized access to (large) datasets. It supports many drivers to read different data formats and contains several pre-processing steps to unify the datasets. A data catalog can be initialized from one or more yaml files, which contain all information required to read and pre-process a dataset, as well as metadata for reproducibility.
You can explore and make use of pre-defined data catalogs (primarily global data), prepare your own data catalog (e.g. to include local data) or use a combination of both.
Tip
If no yaml file is provided to the CLI build or update methods or to DataCatalog, HydroMT will use the data stored in the artifact_data catalog, which contains an extract of global data for a small region around the Piave river in Northern Italy.
Tip
Tiles of tiled raster datasets that are described by a .vrt file can be cached locally (starting from v0.7.0). By default, the requested data tiles are stored in ~/.hydromt_data. To use this option from the command line, add --cache to the hydromt build or hydromt update commands. In Python, the cache is a property of the DataCatalog and can be set at initialization.
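As a rough sketch of the Python side, the snippet below wraps catalog creation in a small helper. The helper name open_cached_catalog is hypothetical, and the cache keyword argument is an assumption based on the description above; check the DataCatalog signature of your installed HydroMT version (v0.7.0 or later) before relying on it.

```python
def open_cached_catalog(catalog_yaml):
    """Return a DataCatalog that caches requested .vrt tiles locally.

    Hypothetical helper; assumes DataCatalog accepts a `cache` keyword
    argument (HydroMT >= 0.7.0).
    """
    # Imported lazily so the helper can be defined without HydroMT installed.
    import hydromt

    # Assumption: with cache=True, requested tiles are stored in the
    # default cache directory (~/.hydromt_data).
    return hydromt.DataCatalog(data_libs=[catalog_yaml], cache=True)
```

Calling open_cached_catalog('/path/to/data_catalog.yml') would then give a catalog whose tiled raster sources are cached on first access, under the assumptions stated above.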
From CLI
When using the HydroMT command line interface (CLI), one can provide a data catalog by specifying the path to the yaml file with the -d (--data) option.
Multiple yaml files can be added by reusing the -d option. If the yaml files have data sources with the same name, the source from the last catalog in the list is used.
For example, when using the build CLI method:
hydromt build MODEL -r REGION -d /path/to/data_catalog1.yml -d /path/to/data_catalog2.yml
Alternatively, you can also use names and versions of the predefined data catalogs. If no version is specified, the latest version available is used.
hydromt build MODEL -r REGION -d deltares_data=v2022.5 -d artifact_data
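To make the name=version notation concrete, here is a minimal sketch of how such an argument can be split into its two parts. The function split_catalog_spec is purely illustrative and is not HydroMT's own parser; it only shows that a missing version leaves the choice of version open (HydroMT then uses the latest one available).

```python
def split_catalog_spec(spec):
    """Split 'name=version' into (name, version).

    Illustrative helper, not part of HydroMT. The version is None when
    the argument contains no '='.
    """
    name, sep, version = spec.partition("=")
    return name, (version if sep else None)


print(split_catalog_spec("deltares_data=v2022.5"))  # ('deltares_data', 'v2022.5')
print(split_catalog_spec("artifact_data"))          # ('artifact_data', None)
```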
A special exception is made for the Deltares data catalog, which can be accessed with the --dd (--deltares-data) flag (requires access to the Deltares P-drive).
hydromt build MODEL -r REGION --dd
Note
When using several data catalogs, the order in which they are listed is important! If several catalogs contain data sources with the same names, the sources from the last catalog in the list are used. If the --dd (--deltares-data) flag is used, the deltares_data catalog is read first.
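The ordering rule can be pictured with plain dictionaries. The sketch below is not HydroMT code: the function resolve_sources, the source names and the file paths are all made up for illustration. It only demonstrates the "last catalog wins" behaviour described in the note.

```python
def resolve_sources(catalogs):
    """Merge catalogs in the order given; later catalogs override earlier ones."""
    merged = {}
    for catalog in catalogs:
        merged.update(catalog)
    return merged


# Hypothetical catalogs mapping source names to file paths.
deltares_data = {"dem": "p_drive/dem.tif", "landuse": "p_drive/landuse.tif"}
local_data = {"dem": "local/dem.tif"}

# With --dd the deltares_data catalog is read first, so a catalog passed
# with -d can override its sources.
sources = resolve_sources([deltares_data, local_data])
print(sources["dem"])      # local/dem.tif
print(sources["landuse"])  # p_drive/landuse.tif
```

Sources that appear in only one catalog are unaffected by the order; only name clashes are resolved in favour of the catalog listed last.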
From Python
Reading a dataset in Python using HydroMT requires two steps:
Initialize a DataCatalog with references to user- or pre-defined data catalog yaml files.
Use one of the DataCatalog.get_* methods to access (a temporal or spatial region of) the data.
For example, to retrieve a raster dataset use get_rasterdataset():
import hydromt
data_cat = hydromt.DataCatalog(data_libs=r'/path/to/data-catalog.yml')
# xmin, ymin, xmax, ymax are placeholders for the bounding box of interest
ds = data_cat.get_rasterdataset('source_name', bbox=[xmin, ymin, xmax, ymax])  # returns an xarray.Dataset or xarray.DataArray
More details about reading raster data or vector data are provided in the linked examples.
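The same two steps apply to vector data. The following is a minimal sketch, assuming HydroMT is installed and that the yaml files define a vector source with the given name; the helper name read_vector_from_catalogs and its arguments are hypothetical, while DataCatalog and get_geodataframe() are the HydroMT calls being illustrated.

```python
def read_vector_from_catalogs(catalog_yamls, source_name, bbox):
    """Read a vector source, clipped to a bounding box, from one or more catalogs.

    Hypothetical helper: `catalog_yamls` is a list of paths to data catalog
    yaml files and `bbox` is [xmin, ymin, xmax, ymax].
    """
    # Imported lazily so the helper can be defined without HydroMT installed.
    import hydromt

    # Step 1: initialize the catalog from the yaml files.
    data_cat = hydromt.DataCatalog(data_libs=list(catalog_yamls))
    # Step 2: read the requested spatial subset as a geopandas.GeoDataFrame.
    return data_cat.get_geodataframe(source_name, bbox=bbox)
```

A call such as read_vector_from_catalogs(['/path/to/data_catalog1.yml', '/path/to/data_catalog2.yml'], 'source_name', [xmin, ymin, xmax, ymax]) mirrors the raster example above, with placeholder paths and bounds.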