.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "user-guide\01-raster-data.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_user-guide_01-raster-data.py: Raster data and xarray ====================== Geospatial data primarily comes in two forms: raster data and vector data. This guide focuses on the first. Raster data consists of rows and columns of rectangular cells. Their location in space is defined by the number of rows, the number of columns, a cell size along the rows, a cell size along the columns, the origin (x, y), and optionally rotation (x, y) -- an `affine`_ matrix. Typical examples of file formats containing raster data are: * GeoTIFF * ESRII ASCII * netCDF * IDF In groundwater modeling, data commonly stored in raster format are: * Layer topology: the tops and bottoms of geohydrological layers * Layer properties: conductivity of aquifers and aquitards * Model output: heads or cell by cell flows These data consist of values for every single cell. Xarray provides many conveniences for such data, such as plotting or selecting. To demonstrate, we'll get some sample data provided with the imod package. .. GENERATED FROM PYTHON SOURCE LINES 30-33 .. code-block:: Python import xarray as xr .. GENERATED FROM PYTHON SOURCE LINES 34-39 .. code-block:: Python import imod elevation = imod.data.ahn()["ahn"] elevation .. raw:: html
<xarray.DataArray 'ahn' (y: 218, x: 248)> Size: 216kB
    [54064 values with dtype=float32]
    Coordinates:
      * x        (x) float64 2kB 9.095e+04 9.105e+04 ... 1.156e+05 1.156e+05
      * y        (y) float64 2kB 4.676e+05 4.674e+05 ... 4.46e+05 4.458e+05
        dx       float64 8B ...
        dy       float64 8B ...


.. GENERATED FROM PYTHON SOURCE LINES 40-48 Two dimensions: x, y -------------------- This dataset represents some surface elevation in the west of the Netherlands, in the form of an xarray DataArray. Xarray provides a "rich" representation of this data, note the x and y coordinates are shown above. We can use these coordinates for plotting, selecting, etc. .. GENERATED FROM PYTHON SOURCE LINES 48-51 .. code-block:: Python elevation.plot() .. image-sg:: /user-guide/images/sphx_glr_01-raster-data_001.png :alt: dx = 100.0, dy = -100.0 :srcset: /user-guide/images/sphx_glr_01-raster-data_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 52-56 This creates an informative plot. We can also easily make a selection of a 10 by 10 km square, and plot the result: .. GENERATED FROM PYTHON SOURCE LINES 56-60 .. code-block:: Python selection = elevation.sel(x=slice(100_000.0, 110_000.0), y=slice(460_000.0, 450_000.0)) selection.plot() .. image-sg:: /user-guide/images/sphx_glr_01-raster-data_002.png :alt: dx = 100.0, dy = -100.0 :srcset: /user-guide/images/sphx_glr_01-raster-data_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 61-69 More dimensions --------------- Raster data can also be "stacked" to represent additional dimensions, such as height or time. Xarray is fully N-dimensional, and can directly represent these data. Let's start with a three dimensional example: a geohydrological layer model. .. GENERATED FROM PYTHON SOURCE LINES 69-73 .. code-block:: Python layermodel = imod.data.hondsrug_layermodel() layermodel .. raw:: html
<xarray.Dataset> Size: 21MB
    Dimensions:  (x: 500, y: 200, layer: 13)
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B ...
        dy       float64 8B ...
      * layer    (layer) int32 52B 1 2 3 4 5 6 7 8 9 10 11 12 13
    Data variables:
        top      (layer, y, x) float32 5MB ...
        bottom   (layer, y, x) float32 5MB ...
        k        (layer, y, x) float32 5MB ...
        idomain  (layer, y, x) float32 5MB ...


.. GENERATED FROM PYTHON SOURCE LINES 74-76 This dataset contains multiple variables. We can take a closer look at the the "top" variable, which represents the top of every layer. .. GENERATED FROM PYTHON SOURCE LINES 76-80 .. code-block:: Python top = layermodel["top"] top .. raw:: html
<xarray.DataArray 'top' (layer: 13, y: 200, x: 500)> Size: 5MB
    [1300000 values with dtype=float32]
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B ...
        dy       float64 8B ...
      * layer    (layer) int32 52B 1 2 3 4 5 6 7 8 9 10 11 12 13


.. GENERATED FROM PYTHON SOURCE LINES 81-84 This DataArray has three dimensions: layer, y, x. We can't make a planview plot of this entire dataset: every (x, y) locations has as many values as layers. A single layer can be selected and shown as follows: .. GENERATED FROM PYTHON SOURCE LINES 84-87 .. code-block:: Python top.sel(layer=1).plot() .. image-sg:: /user-guide/images/sphx_glr_01-raster-data_003.png :alt: dx = 25.0, dy = -25.0, layer = 1 :srcset: /user-guide/images/sphx_glr_01-raster-data_003.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 88-90 Xarray doesn't favor specific dimensions. We can select a value along the y-axis just as easily, to create a cross-section. .. GENERATED FROM PYTHON SOURCE LINES 90-94 .. code-block:: Python section = top.sel(y=560_000.0, method="nearest") section.plot.line(x="x") .. image-sg:: /user-guide/images/sphx_glr_01-raster-data_004.png :alt: y = 5.6e+05, dx = 25.0, dy = -25.0 :srcset: /user-guide/images/sphx_glr_01-raster-data_004.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none [, , , , , , , , , , , , ] .. GENERATED FROM PYTHON SOURCE LINES 95-100 Xarray provides us a lot of convenience compared to working with traditional two dimensional rasters: rather than continuously loop over the data of single timesteps or layers, we can process them in a single command. For example, to compute the thickness of every layer: .. GENERATED FROM PYTHON SOURCE LINES 100-104 .. code-block:: Python thickness = layermodel["top"] - layermodel["bottom"] thickness .. raw:: html
<xarray.DataArray (layer: 13, y: 200, x: 500)> Size: 5MB
    array([[[ 0.        ,  0.        ,  0.        , ...,  3.8964    ,
              3.8964    ,  3.8964    ],
            [ 0.        ,  0.        ,  0.        , ...,  3.8964    ,
              3.8964    ,  3.8964    ],
            [ 0.        ,  0.        ,  0.        , ...,  3.8964    ,
              3.8964    ,  3.8964    ],
            ...,
            [ 1.302     ,  1.302     ,  1.302     , ...,  0.        ,
              0.        ,  0.        ],
            [ 1.302     ,  1.302     ,  1.302     , ...,  0.        ,
              0.        ,  0.        ],
            [ 1.302     ,  1.302     ,  1.302     , ...,  0.        ,
              0.        ,  0.        ]],

           [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ],
    ...
            [ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        , ...,  0.        ,
              0.        ,  0.        ]],

           [[63.152397  , 63.152397  , 63.152397  , ..., 94.64      ,
             94.64      , 94.64      ],
            [63.152397  , 63.152397  , 63.152397  , ..., 94.64      ,
             94.64      , 94.64      ],
            [63.152397  , 63.152397  , 63.152397  , ..., 94.64      ,
             94.64      , 94.64      ],
            ...,
            [74.6828    , 74.6828    , 74.6828    , ..., 78.1022    ,
             78.1022    , 78.1022    ],
            [74.6828    , 74.6828    , 74.6828    , ..., 78.1022    ,
             78.1022    , 78.1022    ],
            [74.6828    , 74.6828    , 74.6828    , ..., 78.1022    ,
             78.1022    , 78.1022    ]]], dtype=float32)
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B 25.0
        dy       float64 8B -25.0
      * layer    (layer) int32 52B 1 2 3 4 5 6 7 8 9 10 11 12 13


.. GENERATED FROM PYTHON SOURCE LINES 105-107 This is easily multiplied, then summed over the layer dimensions to provide us a map of the total transmissivity: .. GENERATED FROM PYTHON SOURCE LINES 107-112 .. code-block:: Python transmissivity = layermodel["k"] * thickness total = transmissivity.sum("layer") total.plot() .. image-sg:: /user-guide/images/sphx_glr_01-raster-data_005.png :alt: dx = 25.0, dy = -25.0 :srcset: /user-guide/images/sphx_glr_01-raster-data_005.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 113-124 Input and output ---------------- The imod package started as a collection of functions to read IDF files into xarray DataArrays. By convention, IDF files store the coordinates of the extra dimensions (layer, time) in the file name. :py:func:`imod.idf.save` will automatically generate these names from a DataArray. Let's demonstrate by writing the transmissivity computed above to IDF. (We'll do this in a temporary directory to keep things tidy.) .. GENERATED FROM PYTHON SOURCE LINES 124-128 .. code-block:: Python tempdir = imod.util.temporary_directory() imod.idf.save(tempdir / "transmissivity", transmissivity) .. GENERATED FROM PYTHON SOURCE LINES 129-136 .. note:: ``tempdir`` is a Python ``pathlib.Path`` object. These objects are very convenient for working with paths; we can easily check if paths exists, join paths with ``/``, etc. Let's check which files have been written in the temporary directory: .. GENERATED FROM PYTHON SOURCE LINES 136-140 .. code-block:: Python filenames = [path.name for path in tempdir.iterdir()] print("\n".join(filenames)) .. rst-class:: sphx-glr-script-out .. code-block:: none transmissivity_l1.idf transmissivity_l10.idf transmissivity_l11.idf transmissivity_l12.idf transmissivity_l13.idf transmissivity_l2.idf transmissivity_l3.idf transmissivity_l4.idf transmissivity_l5.idf transmissivity_l6.idf transmissivity_l7.idf transmissivity_l8.idf transmissivity_l9.idf .. GENERATED FROM PYTHON SOURCE LINES 141-147 Just as easily, we can read all IDFs back into a single DataArray. We can do so by using a wildcard. These wildcards function according to the rules of `Glob`_ via the `python glob module`_. Note that every IDF has to have identical x-y coordinates: files with different cell sizes or extents will not be combined automatically. .. GENERATED FROM PYTHON SOURCE LINES 147-151 .. code-block:: Python back = imod.idf.open(tempdir / "*.idf") back .. raw:: html
<xarray.DataArray 'transmissivity' (layer: 13, y: 200, x: 500)> Size: 5MB
    dask.array<stack, shape=(13, 200, 500), dtype=float32, chunksize=(1, 200, 500), chunktype=numpy.ndarray>
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B 25.0
        dy       float64 8B -25.0
      * layer    (layer) int64 104B 1 2 3 4 5 6 7 8 9 10 11 12 13


.. GENERATED FROM PYTHON SOURCE LINES 152-153 These glob patterns are quite versatile, and may be used to filter as well. .. GENERATED FROM PYTHON SOURCE LINES 153-157 .. code-block:: Python selection = imod.idf.open(tempdir / "transmissivity_l[1-5].idf") selection .. raw:: html
<xarray.DataArray 'transmissivity' (layer: 5, y: 200, x: 500)> Size: 2MB
    dask.array<stack, shape=(5, 200, 500), dtype=float32, chunksize=(1, 200, 500), chunktype=numpy.ndarray>
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B 25.0
        dy       float64 8B -25.0
      * layer    (layer) int64 40B 1 2 3 4 5


.. GENERATED FROM PYTHON SOURCE LINES 158-161 Rather commonly, the paths of the IDFs are not named according to consistent rules. In such cases, we can manually specify how the name should be interpreted via the ``pattern`` argument. .. GENERATED FROM PYTHON SOURCE LINES 161-165 .. code-block:: Python back = imod.idf.open(tempdir / "*.idf", pattern="{name}_l{layer}") back .. raw:: html
<xarray.DataArray 'transmissivity' (layer: 13, y: 200, x: 500)> Size: 5MB
    dask.array<stack, shape=(13, 200, 500), dtype=float32, chunksize=(1, 200, 500), chunktype=numpy.ndarray>
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B 25.0
        dy       float64 8B -25.0
      * layer    (layer) int64 104B 1 2 3 4 5 6 7 8 9 10 11 12 13


.. GENERATED FROM PYTHON SOURCE LINES 166-179 See the documenation of :py:func:`imod.idf.open` for more details. Other raster formats -------------------- IDF is one raster format, but there are many more. :py:func:`imod.rasterio.open` wraps the `rasterio`_ Python package (which in turn wraps `GDAL`_) to provide access to many GIS raster formats. :py:func:`imod.rasterio.open` and :py:func:`imod.rasterio.save` work exactly the same as the respective IDF functions, except they write to a different format. For example, to write the transsmisivity to GeoTIFF: .. GENERATED FROM PYTHON SOURCE LINES 179-184 .. code-block:: Python imod.rasterio.save(tempdir / "transmissivity.tif", transmissivity) filenames = [path.name for path in tempdir.iterdir()] print("\n".join(filenames)) .. rst-class:: sphx-glr-script-out .. code-block:: none transmissivity_l1.idf transmissivity_l1.tif transmissivity_l10.idf transmissivity_l10.tif transmissivity_l11.idf transmissivity_l11.tif transmissivity_l12.idf transmissivity_l12.tif transmissivity_l13.idf transmissivity_l13.tif transmissivity_l2.idf transmissivity_l2.tif transmissivity_l3.idf transmissivity_l3.tif transmissivity_l4.idf transmissivity_l4.tif transmissivity_l5.idf transmissivity_l5.tif transmissivity_l6.idf transmissivity_l6.tif transmissivity_l7.idf transmissivity_l7.tif transmissivity_l8.idf transmissivity_l8.tif transmissivity_l9.idf transmissivity_l9.tif .. GENERATED FROM PYTHON SOURCE LINES 185-205 Note :py:func:`imod.rasterio.save` will split the extension off the path, infer the `GDAL driver`_, attach the additional coordinates to the file name, and re-attach the extension. netCDF ------ The final format to discuss here is `netCDF`_. Two dimensional raster files are convenient for viewing, as every file corresponds with a single "map view" in a GIS viewer. However, they are not convenient for storing many timesteps or many layers: especially long running simulations will generate hundreds, thousands, or even millions of files. netCDF is a file format specifically designed for multi-dimensional data, and allows us to conveniently bundle all data in a single file. Xarray objects can directly be written to netCDF, as the data model of xarray itself is based on the netCDF data model. With netCDF, there is no need to encode the different dimension labels in the the name: they are stored directly in the file instead. .. GENERATED FROM PYTHON SOURCE LINES 205-210 .. code-block:: Python layermodel.to_netcdf(tempdir / "layermodel.nc") back = xr.open_dataset(tempdir / "layermodel.nc") back .. raw:: html
<xarray.Dataset> Size: 21MB
    Dimensions:  (x: 500, y: 200, layer: 13)
    Coordinates:
      * x        (x) float64 4kB 2.375e+05 2.375e+05 2.376e+05 ... 2.5e+05 2.5e+05
      * y        (y) float64 2kB 5.64e+05 5.64e+05 5.639e+05 ... 5.59e+05 5.59e+05
        dx       float64 8B ...
        dy       float64 8B ...
      * layer    (layer) int32 52B 1 2 3 4 5 6 7 8 9 10 11 12 13
    Data variables:
        top      (layer, y, x) float32 5MB ...
        bottom   (layer, y, x) float32 5MB ...
        k        (layer, y, x) float32 5MB ...
        idomain  (layer, y, x) float32 5MB ...


.. GENERATED FROM PYTHON SOURCE LINES 211-225 Coordinate reference systems (CRS) ---------------------------------- Reprojection from one CRS to another is a common frustration. Since the data in an xarray DataArray is always accompanied by its x and y coordinates, we can easily reproject the data. See the examples. .. _affine: https://www.perrygeo.com/python-affine-transforms.html .. _Glob: https://en.wikipedia.org/wiki/Glob_(programming) .. _python glob module: https://docs.python.org/3/library/glob.html .. _rasterio: https://rasterio.readthedocs.io/en/latest/ .. _GDAL: https://gdal.org/ .. _GDAL driver: https://gdal.org/drivers/raster/index.html .. _netCDF: https://www.unidata.ucar.edu/software/netcdf/ .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 8.600 seconds) .. _sphx_glr_download_user-guide_01-raster-data.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 01-raster-data.ipynb <01-raster-data.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 01-raster-data.py <01-raster-data.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 01-raster-data.zip <01-raster-data.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_