{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Example: Exporting data from a data catalog" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "This example illustrates the how to read and export data for a specific region / dates using the HydroMT [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst) and the [export_data()](../_generated/hydromt.data_catalog.DataCatalog.export_data.rst) method." ] }, { "attachments": {}, "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## Explore the current data catalog\n", "For this exercise, we will use the pre-defined catalog `artifact_data` which contains a global data extracts for the Piave basin in Northern Italy. This data catalog and the actual data linked to it are for a small geographic extent as it is intended for documentation and testing purposes only. If you have another data catalog available (and the linked data), you can use it instead.\n", "\n", "To read your own data catalog (as well as a predefined catalog), you can use the **data_libs** argument of the [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst) which accepts either a absolute/relative path to a data catalog yaml file or a name of a pre-defined catalog.\n", "First let's read the pre-defined artifact data catalog:" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "import hydromt\n", "\n", "# Download and read artifacts for the Piave basin to `~/.hydromt_data/`.\n", "data_catalog = hydromt.DataCatalog(data_libs=[\"artifact_data=v1.0.0\"])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "The `artifact_data` catalog is one of the pre-defined available DataCatalog of HydroMT. You can find an overview of [pre-defined data catalogs](../guides/user_guide/data_existing_cat.rst) in the online user guide. You can also get an overview of the pre-defined catalogs with their version number from HydroMT." ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", "\n", "pprint(data_catalog.predefined_catalogs)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "Let's now check which data sources are available in the catalog:" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "# For a list of sources including attributes\n", "data_catalog.sources.keys()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "And let's now open a plot one of the available datasets to check extent and available dates:" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "ds = data_catalog.get_rasterdataset(\"era5\", time_range=(\"2010-02-02\", \"2010-02-15\"))\n", "print(\"\")\n", "print(f\"Available extent: {ds.raster.bounds}\")\n", "print(f\"Available dates: {ds.time.values[0]} to {ds.time.values[-1]}\")\n", "ds" ] }, { "attachments": {}, "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "## Export an extract of the data" ] }, { "attachments": {}, "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "Now we will export a subset of the data in our `artifact_data` catalog using the [DataCatalog.export_data()](../_generated/hydromt.data_catalog.DataCatalog.export_data.rst) method. Let's check the method's docstring:" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "?data_catalog.export_data" ] }, { "attachments": {}, "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "Let's select which data source and the extent we want (based on the exploration above): " ] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "# List of data sources to export\n", "# NOTE that for ERA5 we only export the precip variable and for merit_hydro we only export the elevtn variable\n", "source_list = [\"merit_hydro[elevtn,flwdir]\", \"era5[precip]\", \"vito_2015\"]\n", "# Geographic extent\n", "bbox = [12.0, 46.0, 13.0, 46.5]\n", "# Time extent\n", "time_range = (\"2010-02-10\", \"2010-02-15\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "And let's export the *tmp_data_export* folder:" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "folder_name = \"tmp_data_export\"\n", "data_catalog.export_data(\n", " new_root=folder_name,\n", " bbox=bbox,\n", " time_range=time_range,\n", " source_names=source_list,\n", " metadata={\"version\": \"1\"},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "17", "metadata": {}, "source": [ "## Open and explore the exported data\n", "\n", "Now we have our new extracted data and HydroMT saved as well a new data catalog file that goes with it:" ] }, { "cell_type": "code", "execution_count": null, "id": "18", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "for path, _, files in os.walk(folder_name):\n", " print(path)\n", " for name in files:\n", " print(f\" - {name}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "19", "metadata": {}, "outputs": [], "source": [ "with open(os.path.join(folder_name, \"data_catalog.yml\"), \"r\") as f:\n", " print(f.read())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "20", "metadata": {}, "source": [ "Let's open the extracted data catalog:" ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "data_catalog_extract = hydromt.DataCatalog(\n", " data_libs=os.path.join(folder_name, \"data_catalog.yml\")\n", ")\n", "data_catalog_extract.sources.keys()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "22", "metadata": {}, "source": [ "And now let's open the extracted data again and do a nice plot." ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "# Get both the extracted and original merit_hydro_1k DEM\n", "dem = data_catalog.get_rasterdataset(\n", " \"merit_hydro\", variables=[\"elevtn\"], bbox=[11.6, 45.2, 13.0, 46.8]\n", ")\n", "dem_extract = data_catalog_extract.get_rasterdataset(\n", " \"merit_hydro\", variables=[\"elevtn\"]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "24", "metadata": {}, "outputs": [], "source": [ "import cartopy.crs as ccrs\n", "import geopandas as gpd\n", "import matplotlib.pyplot as plt\n", "from shapely.geometry import box\n", "\n", "proj = ccrs.PlateCarree() # plot projection\n", "\n", "\n", "# get bounding box of each data catalog using merit_hydro_1k\n", "bbox = gpd.GeoDataFrame(geometry=[box(*dem.raster.bounds)], crs=4326)\n", "bbox_extract = gpd.GeoDataFrame(geometry=[box(*dem_extract.raster.bounds)], crs=4326)\n", "\n", "# Initialise plot\n", "fig = plt.figure(figsize=(7, 5))\n", "ax = fig.add_subplot(projection=proj)\n", "\n", "# Plot the bounding box\n", "bbox.boundary.plot(ax=ax, color=\"k\", linewidth=0.8)\n", "bbox_extract.boundary.plot(ax=ax, color=\"red\", linewidth=0.8)\n", "\n", "# Plot elevation\n", "dem.raster.mask_nodata().plot(ax=ax, cmap=\"gray\")\n", "dem_extract.raster.mask_nodata().plot(ax=ax, cmap=\"terrain\")\n", "ax.set_title(\"exported and original DEMs\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" }, "vscode": { "interpreter": { "hash": "3808d5b5b54949c7a0a707a38b0a689040fa9c90ab139a050e41373880719ab1" } } }, "nbformat": 4, "nbformat_minor": 5 }