.. _get_data: Overview data ============= The best way to provide data to HydroMT is by using a **data catalog**. The goal of this data catalog is to provide simple and standardized access to (large) datasets. It supports many drivers to read different data formats and contains several pre-processing steps to unify the datasets. A data catalog can be initialized from one or more **yaml file(s)**, which contain all required information to read and pre-process a dataset, as well as meta data for reproducibility. You can :ref:`explore and make use of pre-defined data catalogs <existing_catalog>` (primarily global data), :ref:`prepare your own data catalog <own_catalog>` (e.g. to include local data) or use a combination of both. .. TIP:: If no yaml file is provided to the CLI build or update methods or to :py:class:`~hydromt.data_catalog.DataCatalog`, HydroMT will use the data stored in the :ref:`artifact_data <existing_catalog>` which contains an extract of global data for a small region around the Piave river in Northern Italy. .. TIP:: Tiles of tiled rasterdatasets which are described by a .vrt file can be cached locally. The requested data tiles will by default be stored to ~/.hydromt_data. To use this option from command line add `--cache` to the `hydromt build` or `hydromt update` commands In Python the cache is a property of the DataCatalog and can be set at Initialization. .. _get_data_cli: From CLI -------- When using the HydroMT command line interface (CLI), one can provide a data catalog by specifying the path to the yaml file with the ``-d (--data)`` option. Alternatively, you can also use names and versions of the :ref:`predefined data catalogs <existing_catalog>`. If no version is specified, the latest version available is used. .. code-block:: console hydromt build MODEL -d artifact_data From Python ----------- To read a dataset in Python using the HydroMT requires two steps: 1) Initialize a :py:class:`~hydromt.data_catalog.DataCatalog` with references to user- or pre-defined data catalog yaml files 2) Use :ref: `one of the DataCatalog.get_* methods` to access (a temporal or spatial region of) the data. For example to retrieve a raster dataset use :py:func:`~hydromt.DataCatalog.get_rasterdataset`: .. code-block:: python import hydromt data_cat = hydromt.DataCatalog(data_libs=r'/path/to/data-catalog.yml') ds = data_cat.get_rasterdataset('source_name', bbox=[xmin, ymin, xmax, ymax]) # returns xarray.dataset More details about reading `raster data <../_examples/reading_raster_data.ipynb>`_ or `vector data <../_examples/reading_vector_data.ipynb>`_ is provided in the linked examples. Related API references ---------------------- For related functions see: - :ref: `DataCatalog API <api_data_catalog>` - :ref: `DataCatalog.get_* methods` - :ref: `data reading-methods <open_methods>`