{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Example: Reading geospatial point time-series"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "This example illustrates the how to read geospatial point time-series data here called `GeoDataset`, using the HydroMT [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst) with the `vector` and `netcdf` or `zarr` drivers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "import hydromt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download artifacts for the Piave basin\n",
    "data_catalog = hydromt.DataCatalog(data_libs=[\"artifact_data=v1.0.0\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## Xarray Driver"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "To read geospatial point time-series data and parse it into a [xarray Dataset or DataArray](https://docs.xarray.dev/en/stable/user-guide/data-structures.html) we use the [open_mfdataset()](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html#xarray.open_mfdataset) or the [open_zarr()](https://docs.xarray.dev/en/stable/generated/xarray.open_zarr.html#xarray.open_zarr) method. All `options` in the data catalog yaml file will be passed to to these methods. \n",
    "\n",
    "As an example we will use the [GTSM data](hhttps://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24381/cds.8c59054f?tab=overview) dataset which is stored in Netcdf format. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# inspect data source entry in data catalog yaml file\n",
    "data_catalog.get_source(\"gtsmv3_eu_era5\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "We can load any geospatial point time-series data using [DataCatalog.get_geodataset()](../_generated/hydromt.data_catalog.DataCatalog.get_geodataset.rst). Note that if we don't provide any arguments it returns the full dataset with all data variables and for the full spatial domain.  The result is per default returned as DataArray as the dataset consists of a single variable. To return a dataset  use the `single_var_as_array=False` argument. Only the data coordinates and the time are actually read, the data variables are still lazy [Dask arrays](https://docs.dask.org/en/stable/array.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = data_catalog.get_geodataset(\"gtsmv3_eu_era5\", single_var_as_array=False)\n",
    "ds"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "The data can be visualized with the [DataArray.plot()](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.plot.html) xarray method. We show the evolution of the water level over time for a specific point location (station). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds.sel(stations=2791)[\"waterlevel\"].plot()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "We can request a (spatial) subset data by providing additional `variables` and `bbox` / `geom` arguments. Note that these return less stations and less variables. In this example only spatial arguments are applied as only a single variable is available. The variables argument is especially useful if each variable of the dataset is saved in a separate file and the `{variable}` key is used in the path argument of the data source to limit which files are actually read. If a single variable is requested a DataArray instead of a Dataset is returned unless the `single_var_as_array` argument is set to False (True by default)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "bbox = [12.50, 45.20, 12.80, 45.40]\n",
    "ds_bbox = data_catalog.get_geodataset(\"gtsmv3_eu_era5\", bbox=bbox)\n",
    "ds_bbox"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "With a Geodataset, you can also directly access the associated point geometries using its [to_gdf()](../_generated/hydromt.gis.DataArray.vector.to_gdf.rst) method. Multi-dimensional data (e.g. the time series) can be reduced using a statistical method such as the max."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "pprint(ds_bbox.vector.to_gdf(reducer=np.nanmax))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot\n",
    "import cartopy.crs as ccrs\n",
    "import cartopy.io.img_tiles as cimgt\n",
    "import geopandas as gpd\n",
    "import matplotlib.pyplot as plt\n",
    "from shapely.geometry import box\n",
    "\n",
    "proj = ccrs.PlateCarree()\n",
    "\n",
    "fig = plt.figure(figsize=(6, 10))\n",
    "ax = plt.subplot(projection=proj)\n",
    "\n",
    "bbox = gpd.GeoDataFrame(geometry=[box(12.50, 45.20, 12.80, 45.40)], crs=4326)\n",
    "\n",
    "ax.add_image(cimgt.QuadtreeTiles(), 12)\n",
    "# Plot the points\n",
    "ds.vector.to_gdf().plot(ax=ax, markersize=30, c=\"blue\", zorder=2)\n",
    "ds_bbox.vector.to_gdf().plot(ax=ax, markersize=40, c=\"red\", zorder=2)\n",
    "# Plot the bounding box\n",
    "bbox.boundary.plot(ax=ax, color=\"red\", linewidth=0.8)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "## Vector driver"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "To read vector data and parse it into a [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html#xarray.Dataset) object we use the [open_geodataset()](../_generated/hydromt.data_catalog.DataCatalog.get_geodataset.rst) method. Combined point locations (e.g. CSV or GeoJSON) data as well as text delimited time series (e.g. CSV) data are supported as file formats (see [DataCatalog documentation](../guides/advanced_user/data_types.rst#csv-point-time-series-data)). Both formats must contain an index and a crs has to be indicated in the .yml file. For demonstration we use dummy example data from the *examples/data* folder. \n",
    "\n",
    "First load the data catalog of the corresponding example data *geodataset_catalog.yml*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "geodata_catalog = hydromt.DataCatalog(\"data/geodataset_catalog.yml\")\n",
    "geodata_catalog.get_source(\"waterlevels_txt\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "We see here that our locations are defined in a *data/stations.csv* file and a *data/stations_data.csv*. Let's check the content of these files before loading them with the `get_geodataset` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "print(\"Stations locations:\")\n",
    "df_stations = pd.read_csv(\"data/stations.csv\")\n",
    "pprint(df_stations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Stations data:\")\n",
    "df_stations_data = pd.read_csv(\"data/stations_data.csv\")\n",
    "pprint(df_stations_data.head(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = geodata_catalog.get_geodataset(\"waterlevels_txt\", single_var_as_array=False)\n",
    "ds"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "default",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.21"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}