hydromt.data_catalog.DataCatalog

class hydromt.data_catalog.DataCatalog(data_libs: List | str | None = None, fallback_lib: str | None = 'artifact_data', logger=<Logger hydromt.data_catalog (WARNING)>, cache: bool | None = False, cache_dir: str | None = None, **artifact_keys)

Base class for the data catalog object.

Catalog of DataAdapter sources.

Helps to easily read from different files and keep track of files which have been accessed.

Parameters:
  • data_libs ((list of) str, Path, optional) – One or more paths to data catalog configuration files or names of predefined data catalogs. By default the data catalog is initialized without data entries. See from_yml() for the accepted YAML format.

  • fallback_lib (str, optional) – Name of the predefined data catalog to read if no data_libs are provided, by default ‘artifact_data’. If None, no default data catalog is used.

  • cache (bool, optional) – Set to True to cache data locally before reading, by default False. Caching is currently only implemented for tiled raster datasets.

  • cache_dir (str, Path, optional) – Root folder to cache data to, by default ~/.hydromt_data

  • artifact_keys – Deprecated since v0.5.

  • logger (logger object, optional) – The logger object used for logging messages. If not provided, the default logger will be used.
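The constructor parameters above can be combined as in the following minimal sketch. The helper names normalize_data_libs and open_catalog, and the use_fallback flag, are hypothetical and not part of hydromt; only the DataCatalog keyword arguments come from the signature documented here. The import is done lazily so the helper module loads even when hydromt is not installed.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Union

PathOrName = Union[str, Path]


def normalize_data_libs(
    data_libs: Union[PathOrName, Sequence[PathOrName], None],
) -> Optional[List[str]]:
    """Turn a single path/name or a sequence of them into a list of strings."""
    if data_libs is None:
        return None
    if isinstance(data_libs, (str, Path)):
        data_libs = [data_libs]
    return [str(lib) for lib in data_libs]


def open_catalog(data_libs=None, use_fallback=False, cache=False, cache_dir=None):
    """Create a DataCatalog from the documented constructor parameters.

    `use_fallback` is a hypothetical convenience flag of this sketch.
    """
    # Imported lazily so this module can be loaded without hydromt installed.
    from hydromt.data_catalog import DataCatalog

    return DataCatalog(
        data_libs=normalize_data_libs(data_libs),
        # None disables the default catalog, as described for fallback_lib.
        fallback_lib="artifact_data" if use_fallback else None,
        cache=cache,
        cache_dir=str(cache_dir) if cache_dir is not None else None,
    )
```

With hydromt installed, a call such as open_catalog("my_catalog.yml", cache=True) would build a catalog from a (hypothetical) local file my_catalog.yml without loading the default catalog.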

__init__(data_libs: List | str | None = None, fallback_lib: str | None = 'artifact_data', logger=<Logger hydromt.data_catalog (WARNING)>, cache: bool | None = False, cache_dir: str | None = None, **artifact_keys) -> None


Methods

__init__([data_libs, fallback_lib, logger, ...])

Catalog of DataAdapter sources.

add_source(source, adapter)

Add a new data source to the data catalog.

contains_source(source[, provider, version, ...])

Check if source is in catalog.

export_data(data_root[, bbox, time_tuple, ...])

Export a data slice of each dataset and a data_catalog.yml file to disk.

from_archive(urlpath[, version, name])

Read and cache a data archive including a data_catalog.yml file.

from_artifacts([name, version])

Parse artifacts.

from_dict(data_dict[, catalog_name, root, ...])

Add data sources based on dictionary.

from_predefined_catalogs(name[, version])

Add data sources from a predefined data catalog.

from_stac_catalog(stac_like[, on_error])

Add data sources from a STAC catalog.

from_yml(urlpath[, root, catalog_name, ...])

Add data sources based on yaml file.

get_dataframe(data_like[, variables, ...])

Return a unified and sliced DataFrame.

get_dataset(data_like[, variables, ...])

Return a clipped, sliced and unified Dataset.

get_geodataframe(data_like[, bbox, geom, ...])

Return a clipped and unified GeoDataFrame (vector).

get_geodataset(data_like[, bbox, geom, ...])

Return a clipped, sliced and unified GeoDataset.

get_rasterdataset(data_like[, bbox, geom, ...])

Return a clipped, sliced and unified RasterDataset.

get_source(source[, provider, version])

Return a data source.

get_source_bbox(source[, provider, version, ...])

Retrieve the bounding box and crs of the source.

get_source_names()

Return a list of all available data source names.

get_source_time_range(source[, provider, ...])

Detect the temporal range of the dataset.

iter_sources([used_only])

Return a flat list of all available data sources.

set_predefined_catalogs([urlpath])

Initialise the predefined catalogs.

to_dataframe([source_names])

Return data catalog summary as DataFrame.

to_dict([source_names, root, meta, used_only])

Export the data catalog to a dictionary.

to_stac_catalog(root[, source_names, meta, ...])

Write data catalog to STAC format.

to_yml(path[, root, source_names, ...])

Write data catalog to yaml format.

update(**kwargs)

Add data sources to library or update them.

update_sources(**kwargs)

Add data sources to library or update them.

Attributes

keys

Returns list of data source names.

predefined_catalogs

Return all predefined catalogs.

sources

Returns dictionary of DataAdapter sources.