How do I …#
Data In/Out#
Import IDF file#
With imod.idf.open()
:
da = imod.idf.open("bottom_l1.idf")
Import multiple IDF files#
With imod.idf.open()
:
da = imod.idf.open("bottom_l*.idf")
Import IPF file#
With imod.ipf.read()
:
df = imod.ipf.read("timeseries.ipf")
Import netCDF file as Dataset#
ds = xr.open_dataset("dataset.nc")
Import a single netCDF variable as DataArray#
da = xr.open_dataarray("variable.nc")
Convert structured data to UGRID netCDF#
With imod.util.to_ugrid2d()
:
ugrid_ds = imod.util.to_ugrid2d(da)
ugrid_ds.to_netcdf("ds_ugrid.nc")
Make data readable by QGIS#
This requires the data to be compliant with GDAL for structured data, or MDAL for unstructured data respectively.
For structured data:
da_gdal = imod.util.spatial.gdal_compliant_grid(da)
da_gdal.to_netcdf("path/to/file.nc")
You can open this data as raster data in QGIS.
For unstructured data:
uda_mdal = imod.util.spatial.mdal_compliant_ugrid2d(uda)
uda_mdal.ugrid.to_netcdf("path/to/file.nc")
You you can open this data as mesh data in QGIS.
Data modification#
If-then-else#
Remove all values over 5.0:
da = da.where(da < 5.0)
Let’s say we want to replace all values < 5
by a 1
, and all values >=
5
by a 2
. The easiest way to do this is by using imod.util.where()
:
condition = old < 5.0
new = imod.util.where(condition, if_true=1, if_false=2)
This can also be done with xarray directly, but is less convenient:
condition = old < 5.0
new = xr.full_like(old, 2.0)
new = new.where(condition, other=1)
Alternatively:
condition = old < 5
new = xr.where(condition, x=1.0, y=2.0)
Note
When condition
does not have the same dimension as x
or y
, you
may end up with an unexpected dimension order; da = da.where(...)
always preserves the dimension order of da
. Using
imod.util.where()
avoids this.
Note
Xarray uses NaN (Not-A-Number) for nodata / fill values. NaN values have
special properties in inequality operations: (np.nan < 5) is False
, but also
(np.nan >= 5) is False
as well.
For this reason, imod.util.where()
defaults to keeping NaNs by default:
imod.util.where(old < 5.0, old, 5.0)
will preserve the NaN values, while
xr.where(old < 5.0, old, 5.0)
will fill the NaN values with 5.0
Conditional evaluation#
Let’s say we want to select values between 5 and 10.
Avoid:
condition1 = da > 5
condition2 = da < 10
condition = condition1 and condition2
Do instead:
condition1 = da > 5
condition2 = da < 10
condition = condition1 & condition2
The reason is that and
does work on the individual values, but expects
something like a boolean (True
or False
). To do element-wise
conditional evaluation on the individual values, use:
and:
&
or:
|
not:
~
xor:
^
To check there are no NaNs anywhere, use “reductions” such as .all()
or
.any()
to reduce the array to a single boolean:
has_nodata = da.notnull().any()
Arithmetic#
da3 = da1 * da2 + 5.0
Make sure the grids have the same spatial coordinates.
Change cellsize (and extent)#
Nearest neighbor:
regridder = imod.prepare.Regridder(source, destination, method="nearest")
out = regridder.regrid(source)
Area weighted mean:
regridder = imod.prepare.Regridder(source, destination, method="mean")
out = regridder.regrid(source)
Change time resolution#
From e.g. hourly data to daily average:
new = da.resample(time="1D").mean()
Select along a single layer#
sel()
is “key” selection, this selects the layer named “1”:
da_layer1 = da.sel(layer=1)
isel()
is “index” selection, this selects the first layer:
da_firstlayer = da.isel(layer=0)
Select part of the data#
Generally, raster data is y-descending, so ymax
comes before ymin
:
da_selection = da.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
Get a single value (e.g. a summary statistic)#
When computing an aggregation such as a mean, xarray will return a single
valued DataArray. To get an ordinary Python scalar (for example a single
integer or float), use .item()
:
single_value = da.mean().item()
Increase the extent of a raster#
Use another raster with appropriate extent, and use align
:
small_aligned, big_aligned = xr.align(small, big, join="outer")
Make sure the cell size is the same, or the result will be non-equidistant.
Create an empty raster#
For just a two dimensional x-y grid:
da = imod.util.empty_2d(dx, xmin, xmax, dy, ymin, ymax)
For a three dimensional version:
da = imod.util.empty_3d(dx, xmin, xmax, dy, ymin, ymax, layer=[1, 2, 3])
For a time varying 2d grid:
da = imod.util.empty_2d_transient(
dx, xmin, xmax, dy, ymin, ymax, time=pd.date_range("2020-01-01", "2020-02-01")
)
For a time varying 3d grid:
da = imod.util.empty_3d_transient(
dx,
xmin,
xmax,
dy,
ymin,
ymax,
layer=[1, 2, 3],
time=pd.date_range("2020-01-01", "2020-02-01")
)
Fill/Interpolate nodata#
To do nearest neighbor interpolation:
new = imod.prepare.fill(da_with_gaps)
To do interpolation along a single dimension:
new = da_with_gaps.interpolate_na(dim="x")
See the xarray documentation on interpolation of NaN values.
To do interpolation in time, see Change time resolution.
To do Laplace interplation (using a linear equation, similar to a groundwater model with constant conductivity):
da = imod.prepare.laplace_interpolate(with_gaps)
Rasterize polygon data#
A geopandas GeoDataFrame can be rasterized by providing a sample DataArray for
like
in imod.prepare.rasterize()
:
rasterized = imod.prepare.rasterize(
geodataframe,
column="column_name_in_geodataframe",
like=like,
)
For large vector datasets, reading the files into a geodataframe can take
longer dan the rasterization step. To avoid this, it’s possible to skip loading
the data altogether with imod.prepare.gdal_rasterize()
rasterized = imod.prepare.gdal_rasterize(
path="path-to-vector-data.shp",
column="column_name_in_vector_data",
like=like,
)
Smooth data#
We can use a convolution to smooth:
kernel = np.ones((1, 10, 10))
kernel /= kernel.sum()
da.values = scipy.ndimage.convolve(da.values, kernel)
Zonal statistics#
To compute a mean:
mean = da.groupby(zones).mean("stacked_y_x")
To compute a sum:
sum = da.groupby(zones).sum("stacked_y_x")
Note
This is not the most efficient way of computing zonal statistics. If it takes a long time or consumes a lot of memory, see e.g. xarray-spatial’s zonal stats function.
Force loading into memory / dask array to numpy array#
da = da.compute()
Alternatively:
da = da.load()
Select a single variable from a dataset#
Select "kd"
from dataset ds
:
da_kd = ds["kd"]
Select points (from a vector dataset)#
geometry = geodataframe.geometry
x = geometry.x
y = geometry.y
selection = imod.select.points_values(da, x=x, y=y)
For time series analysis, converting into a pandas DataFrame may be useful:
df = selection.to_dataframe()
Sum properties over layers#
total_kd = da_kd.sum("layer")
Plot a timeseries for a single cell#
transient_da.sel(x=x, y=y).plot()
Plot head of one layer at one time#
transient_da.sel(layer=1, time="2020-01-01").plot()
Unstructured#
imod uses xugrid to represent data on unstructured grids. Where possible,
the functions of methods of xugrid match those xarray. However, working with
the unstructured topology requires slightly different commands: any operation
dealing the unstructured (x, y) dimensions requires the .ugrid
accessor.
Generate a mesh#
We’ve developed a separate package called pandamesh to help generated (x, y) unstructured meshes based on vector data in the form of geopandas GeoDataFrames.
import geopandas as gpd
import pandamesh as pm
import xugrid as xu
gdf = gpd.read_file("my-study-area.shp")
gdf["cellsize"] = 100.0
mesher = pm.TriangleMesher(gdf)
vertices, cells = mesher.generate()
grid = xugrid.Ugrid2d(vertices, -1, cells)
It can be installed with:
pip install pandamesh
Note
One of the dependencies of pandamesh, the Python bindings to triangle, does not have the (binary) wheels for Python 3.9 and higher yet.
Plot a timeseries for a single cell#
transient_uda.ugrid.sel(x=x, y=y).plot()
Plot head of one layer at one time#
transient_uda.sel(layer=1, time="2020-01-01").ugrid.plot()
Note
Since layer and time do not depend on the unstructured topology, they may
be indexed using the standard xarray methods, without the .ugrid
accessor.
Fill/Interpolate nodata#
To do Laplace interplation (using a linear equation, similar to a groundwater model with constant conductivity):
uda = with_gaps.ugrid.laplace_interpolate()
Select points (from a vector dataset)#
geometry = geodataframe.geometry
x = geometry.x
y = geometry.y
selection = uda.ugrid.select_points(x=x, y=y)
For time series analysis, converting into a pandas DataFrame may be useful:
df = selection.to_dataframe()