Overview data

The best way to provide data to HydroMT is through a data catalog, which offers simple and standardized access to (large) datasets. It supports many drivers to read different data formats and contains several pre-processing steps to unify the datasets. A data catalog can be initialized from one or more yaml file(s), which contain all the information required to read and pre-process a dataset, as well as metadata for reproducibility.
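The sketch below is a minimal, stdlib-only illustration of what a single catalog entry could hold and do when HydroMT reads and pre-processes a dataset. The `CatalogEntry` class and its fields (`path`, `driver`, `rename`, `unit_mult`, `unit_add`) are hypothetical and only loosely modeled on a catalog entry. They are not HydroMT's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical, simplified stand-in for one data catalog entry."""
    path: str                                      # where the data lives (placeholder)
    driver: str                                    # which reader to use (placeholder)
    rename: dict = field(default_factory=dict)     # native name -> standard name
    unit_mult: dict = field(default_factory=dict)  # standard name -> multiplier
    unit_add: dict = field(default_factory=dict)   # standard name -> offset

    def preprocess(self, variables: dict) -> dict:
        """Rename variables, then convert units as value * mult + add."""
        out = {}
        for name, values in variables.items():
            std = self.rename.get(name, name)
            mult = self.unit_mult.get(std, 1.0)
            add = self.unit_add.get(std, 0.0)
            out[std] = [v * mult + add for v in values]
        return out


# A toy catalog keyed by source name; path and driver are placeholders.
catalog = {
    "era5_like": CatalogEntry(
        path="/path/to/era5.nc",
        driver="netcdf",
        rename={"t2m": "temp"},
        unit_add={"temp": -273.15},  # Kelvin -> degrees Celsius
    )
}

raw = {"t2m": [273.15, 283.15]}
unified = catalog["era5_like"].preprocess(raw)
```

Keeping the read information and the unifying steps together in one entry is what lets different sources expose the same variable names and units to the rest of the workflow.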

You can explore and make use of pre-defined data catalogs (primarily global data), prepare your own data catalog (e.g. to include local data) or use a combination of both.

Tip

If no yaml file is provided to the CLI build or update commands or to the DataCatalog, HydroMT uses the artifact_data catalog, which contains an extract of global data for a small region around the Piave river in Northern Italy.
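The fallback described in this tip can be sketched as a small helper. `resolve_data_libs` is a hypothetical name used only for illustration. It is not part of HydroMT, and the real fallback logic may differ in detail.

```python
def resolve_data_libs(data_libs=None, fallback="artifact_data"):
    """Hypothetical helper: return the catalogs to load, using a fallback if none are given."""
    if not data_libs:
        # no yaml file (or catalog name) provided: use the fallback catalog
        return [fallback]
    if isinstance(data_libs, str):
        # a single path or catalog name
        return [data_libs]
    # one or more paths or catalog names
    return list(data_libs)
```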

Tip

Tiles of tiled raster datasets that are described by a .vrt file can be cached locally. By default, the requested data tiles are stored in ~/.hydromt_data. To use this option from the command line, add --cache to the hydromt build or hydromt update commands. In Python, the cache is a property of the DataCatalog and can be set at initialization.
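A .vrt file is an XML file that lists its tiles in `<SourceFilename>` elements. The hypothetical helpers below give a rough sketch of the caching idea, not HydroMT's implementation. They parse those elements with the standard library and copy each tile into a cache folder only if it is not already there. For brevity, the sketch copies every tile rather than only the tiles that intersect the requested region, and it assumes tile file names are unique.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def vrt_tiles(vrt_path):
    """List the tile files referenced by <SourceFilename> elements in a VRT."""
    vrt_path = Path(vrt_path)
    tiles = []
    for elem in ET.parse(vrt_path).getroot().iter("SourceFilename"):
        name = elem.text.strip()
        # relativeToVRT="1" means the path is relative to the VRT's folder
        if elem.get("relativeToVRT") == "1":
            tiles.append(vrt_path.parent / name)
        else:
            tiles.append(Path(name))
    return tiles


def cache_tiles(vrt_path, cache_root=None):
    """Copy the tiles referenced by a VRT to cache_root once; reuse existing copies."""
    cache_root = Path(cache_root) if cache_root else Path.home() / ".hydromt_data"
    cache_root.mkdir(parents=True, exist_ok=True)
    cached = []
    for tile in vrt_tiles(vrt_path):
        dst = cache_root / tile.name
        if not dst.exists():  # only copy tiles that are not cached yet
            shutil.copy2(tile, dst)
        cached.append(dst)
    return cached
```

On a second call, tiles that are already in the cache folder are not copied again. This is the main benefit of a local cache for remote or slow storage.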

From CLI

When using the HydroMT command line interface (CLI), one can provide a data catalog by specifying the path to the yaml file with the -d (--data) option. Alternatively, you can also use names and versions of the predefined data catalogs. If no version is specified, the latest version available is used.

hydromt build MODEL -d artifact_data
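Pre-defined and user-defined catalogs can be combined. The sketch below only assembles the argument list, following the abbreviated form of the command above; any other arguments the command may require are omitted. It assumes that -d can be repeated once per catalog, and `build_command` is a hypothetical helper, not part of HydroMT.

```python
import shlex


def build_command(model, data_libs):
    """Hypothetical helper: assemble a 'hydromt build' call with one -d per catalog."""
    cmd = ["hydromt", "build", model]
    for lib in data_libs:
        cmd += ["-d", lib]  # predefined catalog name or path to a yaml file
    return cmd


cmd = build_command("MODEL", ["artifact_data", "/path/to/data-catalog.yml"])
print(shlex.join(cmd))
# If HydroMT is installed, the command could be run with subprocess.run(cmd, check=True)
```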

From Python

Reading a dataset in Python with HydroMT requires two steps:

  1. Initialize a DataCatalog with references to user- or pre-defined data catalog yaml files

  2. Use one of the DataCatalog.get_* methods to access the data, or a temporal or spatial subset of it.

For example, to retrieve a raster dataset, use get_rasterdataset():

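import hydromt
# initialize the catalog from one or more data catalog yaml files
data_cat = hydromt.DataCatalog(data_libs=r'/path/to/data-catalog.yml')
# bounding box [xmin, ymin, xmax, ymax] in WGS84 lon/lat; placeholder values, replace with your region
bbox = [11.6, 45.2, 13.0, 46.8]
ds = data_cat.get_rasterdataset('source_name', bbox=bbox)  # returns an xarray.Dataset or xarray.DataArray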
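The bbox argument selects a spatial region. On a regular north-up grid, this essentially amounts to converting the bounding box into row and column index ranges, as in the simplified sketch below. `bbox_to_slices` is a hypothetical helper that assumes the bbox and the grid share the same CRS and that cells are square. It does not reproduce HydroMT's own clipping logic.

```python
import math


def bbox_to_slices(bbox, x0, y0, res, ncols, nrows):
    """Return (row_slice, col_slice) of the cells that intersect bbox.

    bbox: [xmin, ymin, xmax, ymax]; x0, y0: upper-left corner of the grid;
    res: cell size in the same units; rows count downward from y0.
    """
    xmin, ymin, xmax, ymax = bbox
    col0 = max(0, math.floor((xmin - x0) / res))
    col1 = min(ncols, math.ceil((xmax - x0) / res))
    row0 = max(0, math.floor((y0 - ymax) / res))
    row1 = min(nrows, math.ceil((y0 - ymin) / res))
    return slice(row0, row1), slice(col0, col1)


# Example: a global 1-degree grid with its upper-left corner at (-180, 90)
rows, cols = bbox_to_slices([11.6, 45.2, 13.0, 46.8], -180.0, 90.0, 1.0, 360, 180)
```

Using floor for the lower bound and ceil for the upper bound keeps every cell that is even partially covered by the bounding box.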

More details about reading raster or vector data are provided in the linked examples.