Migrating the Data Catalog

Overview

The data catalog structure has been refactored to introduce a more modular design and clearer separation of responsibilities across several new classes (DataSource, Driver, URIResolver, and DataAdapter):

URIResolver is in charge of parsing the path or URI of the file (e.g. if you use keywords such as {year} or {month} in your paths, or if you want to read a tiled raster).
Driver is in charge of reading the data from the source (e.g. reading a NetCDF file from a local disk or from cloud storage).
DataAdapter is in charge of harmonizing the data to standard HydroMT data structures (e.g. renaming variables, setting attributes, converting units, etc.).
DataSource is the main class that ties everything together and is used by the DataCatalog to load data.
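
The division of labour above can be illustrated with a toy pipeline. This is only a sketch of the idea, not HydroMT's code: the function names (resolve_uri, read, adapt, get_data), their signatures, and the stubbed reader are all hypothetical, and the choice to key unit_mult by the renamed variable is an assumption made for this example.

```python
from itertools import product

# Toy stand-ins for the four roles. These are NOT HydroMT's classes or
# signatures; every name below is hypothetical and for illustration only.


def resolve_uri(template, years, months):
    """URIResolver role: expand {year}/{month} keywords into concrete paths."""
    return [template.format(year=y, month=m) for y, m in product(years, months)]


def read(paths):
    """Driver role: read each path. Stubbed here to return one fake record per path."""
    return [{"path": p, "tp": 3} for p in paths]


def adapt(records, rename, unit_mult):
    """DataAdapter role: rename variables, then apply unit multipliers."""
    adapted = []
    for record in records:
        new_record = {}
        for key, value in record.items():
            name = rename.get(key, key)
            # Assumption for this sketch: multipliers are keyed by the new name.
            new_record[name] = value * unit_mult[name] if name in unit_mult else value
        adapted.append(new_record)
    return adapted


def get_data(template, years, months, rename, unit_mult):
    """DataSource role: tie resolver, driver and adapter together."""
    paths = resolve_uri(template, years, months)
    return adapt(read(paths), rename, unit_mult)


result = get_data(
    "meteo/{year}_{month}.nc",
    years=[2020],
    months=[1, 2],
    rename={"tp": "precip"},
    unit_mult={"precip": 1000},
)
```

Each step only knows about its own concern, so in this toy the reader could be swapped (local disk versus cloud) without touching the path expansion or the harmonization.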

Key format changes:

path is renamed to uri
driver: filesystem and driver_kwargs are moved under driver. driver can be a single string or a dictionary with name and options (the options are passed to the underlying function that reads the data, e.g. xarray.open_mfdataset).
data_adapter: unit_add, unit_mult, rename, etc. are moved under data_adapter
uri_resolver: can be specified, mostly in the case of tiled rasters, to pass the required options.
metadata: meta is renamed to metadata, and crs and nodata are moved under it
A single catalog entry can now reference multiple data variants or versions
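
To make the key moves listed above concrete, here is a minimal sketch that rearranges one old-style entry, represented as a plain Python dict, into the new layout. It is not the logic behind hydromt check --upgrade, which remains the supported way to convert a catalog. The function name sketch_upgrade and the example values are hypothetical, and mapping driver_kwargs onto options is an assumption based on the description of options above. The sketch also keeps the driver name unchanged and carries any other keys over as-is, which the real upgrade may not do.

```python
def sketch_upgrade(old_entry):
    """Rearrange a v0-style entry (dict) into the layout described above.

    Hypothetical helper for illustration; not part of HydroMT.
    """
    old = dict(old_entry)  # shallow copy so the caller's dict is left untouched
    new = {}

    # path is renamed to uri
    new["uri"] = old.pop("path")

    # driver becomes a mapping; filesystem and driver_kwargs move under it
    driver = {"name": old.pop("driver")}
    if "filesystem" in old:
        driver["filesystem"] = old.pop("filesystem")
    if "driver_kwargs" in old:
        # Assumption: the old driver_kwargs correspond to the new options
        driver["options"] = old.pop("driver_kwargs")
    new["driver"] = driver

    # rename, unit_mult and unit_add move under data_adapter
    adapter = {k: old.pop(k) for k in ("rename", "unit_mult", "unit_add") if k in old}
    if adapter:
        new["data_adapter"] = adapter

    # meta is renamed to metadata; crs and nodata move under it
    metadata = dict(old.pop("meta", {}))
    for key in ("crs", "nodata"):
        if key in old:
            metadata[key] = old.pop(key)
    if metadata:
        new["metadata"] = metadata

    # Assumption: any remaining keys are carried over unchanged
    new.update(old)
    return new


# Illustrative entry; the values are made up for this example.
old_entry = {
    "path": "meteo/{year}_{month}.nc",
    "driver": "netcdf",
    "filesystem": "local",
    "driver_kwargs": {"chunks": "auto"},
    "rename": {"tp": "precip"},
    "unit_mult": {"precip": 1000},
    "crs": 4326,
    "meta": {"category": "meteo"},
}
new_entry = sketch_upgrade(old_entry)
```

Printing new_entry shows the nested driver, data_adapter and metadata blocks side by side with the flat old_entry, which can help when reviewing what the upgrade command changed in your own catalog.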

See more information about the current format in the data catalog documentation.

How to upgrade

All existing pre-defined catalogs have been updated to the new format. For your own catalogs, you can upgrade easily with the HydroMT check command:

hydromt check -d /path/to/data_catalog.yml --format v0 --upgrade -v