xugrid.open_mfdataset

xugrid.open_mfdataset(*args, **kwargs)
Open multiple files as a single dataset.
If combine='by_coords' then the function combine_by_coords is used to combine the datasets into one before returning the result, and if combine='nested' then combine_nested is used. The filepaths must be structured according to which combining function is used, the details of which are given in the documentation for combine_by_coords and combine_nested. By default combine='by_coords' will be used. Requires dask to be installed. See documentation for details on dask [1]. Global attributes from the attrs_file are used for the combined dataset.
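For instance, a minimal call might rely entirely on these defaults; the glob pattern below is hypothetical:

>>> import xarray as xr
>>> ds = xr.open_mfdataset("path/to/my/files/*.nc")  # combine="by_coords" by default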
Parameters:

paths (str or nested sequence of paths) – Either a string glob in the form "path/to/my/files/*.nc" or an explicit list of files to open. Paths can be given as strings or as pathlib Paths. If concatenation along more than one dimension is desired, then paths must be a nested list-of-lists (see combine_nested for details, and the nested sketch in the Examples below). (A string glob will be expanded to a 1-dimensional list.)

chunks (int, dict, 'auto' or None, optional) – Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset. If int, chunk each dimension by chunks. By default, chunks will be chosen to load entire input files into memory at once. This has a major impact on performance: please see the full documentation for more details [2]. This argument is evaluated on a per-file basis, so chunk sizes that span multiple files will be ignored.

concat_dim (str, DataArray, Index or a Sequence of these or None, optional) – Dimensions to concatenate files along. You only need to provide this argument if combine='nested', and if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. Set concat_dim=[..., None, ...] explicitly to disable concatenation along a particular dimension. Default is None, which for a 1D list of filepaths is equivalent to opening the files separately and then merging them with xarray.merge.
combine ({"by_coords", "nested"}, optional) – Whether xarray.combine_by_coords or xarray.combine_nested is used to combine all the data. Default is to use xarray.combine_by_coords.

compat ({"identical", "equals", "broadcast_equals", "no_conflicts", "override"}, default: "no_conflicts") – String indicating how to compare variables of the same name for potential conflicts when merging:

"broadcast_equals": all values must be equal when variables are broadcast against each other to ensure common dimensions.
"equals": all values and dimensions must be the same.
"identical": all values, dimensions and attributes must be the same.
"no_conflicts": only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.
"override": skip comparing and pick the variable from the first dataset.
preprocess (callable, optional) – If provided, call this function on each dataset prior to concatenation. You can find the file-name from which each dataset was loaded in ds.encoding["source"].

engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "zarr", None}, installed backend or subclass of xarray.backends.BackendEntrypoint, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for "netcdf4".
data_vars ({"minimal", "different", "all"} or list of str, default: "all") – These data variables will be concatenated together:

"minimal": Only data variables in which the dimension already appears are included.
"different": Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which the dimension already appears). Beware: this option may load the data payload of data variables into memory if they are not already loaded.
"all": All data variables will be concatenated.
list of str: The listed data variables will be concatenated, in addition to the "minimal" data variables.
coords ({"minimal", "different", "all"} or list of str, optional) – These coordinate variables will be concatenated together:

"minimal": Only coordinates in which the dimension already appears are included.
"different": Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which the dimension already appears). Beware: this option may load the data payload of coordinate variables into memory if they are not already loaded.
"all": All coordinate variables will be concatenated, except those corresponding to other dimensions.
list of str: The listed coordinate variables will be concatenated, in addition to the "minimal" coordinates.
parallel (bool, default: False) – If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

join ({"outer", "inner", "left", "right", "exact", "override"}, default: "outer") – String indicating how to combine differing indexes (excluding concat_dim) in objects:

"outer": use the union of object indexes.
"inner": use the intersection of object indexes.
"left": use indexes from the first object with each dimension.
"right": use indexes from the last object with each dimension.
"exact": instead of aligning, raise ValueError when indexes to be aligned are not equal.
"override": if indexes are of the same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
attrs_file (str or path-like, optional) – Path of the file used to read global attributes from. By default global attributes are read from the first file provided, with wildcard matches sorted by filename.
combine_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or callable, default: "override") – A callable or a string indicating how to combine attrs of the objects being merged:

"drop": empty attrs on returned Dataset.
"identical": all attrs must be the same on every object.
"no_conflicts": attrs from all objects are combined, any that have the same name must also have the same value.
"drop_conflicts": attrs from all objects are combined, any that have the same name but different values are dropped.
"override": skip comparing and copy attrs from the first dataset to the result.
If a callable, it must expect a sequence of attrs dicts and a context object as its only parameters; a sketch is shown just after this parameter list.

**kwargs (optional) – Additional arguments passed on to xarray.open_dataset(). For an overview of some of the possible options, see the documentation of xarray.open_dataset().
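As a sketch of the callable form of combine_attrs described above (the function name and merging rule here are illustrative, not prescribed by the library):

>>> def keep_first_attrs(dicts, context):
...     # Receives a sequence of attrs dicts and a context object;
...     # this illustrative rule simply keeps the attrs of the first object.
...     return dict(dicts[0])
...
>>> ds = xr.open_mfdataset("file_*.nc", combine_attrs=keep_first_attrs)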
Return type:
xarray.Dataset
Notes
open_mfdataset opens files with read-only access. When you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.

See also

combine_by_coords, combine_nested, open_dataset
Examples
A user might want to pass additional arguments into preprocess when applying some operation to many individual files that are being opened. One route to do this is through the use of functools.partial.

>>> from functools import partial
>>> def _preprocess(x, lon_bnds, lat_bnds):
...     return x.sel(lon=slice(*lon_bnds), lat=slice(*lat_bnds))
...
>>> lon_bnds, lat_bnds = (-110, -105), (40, 45)
>>> partial_func = partial(_preprocess, lon_bnds=lon_bnds, lat_bnds=lat_bnds)
>>> ds = xr.open_mfdataset(
...     "file_*.nc", concat_dim="time", preprocess=partial_func
... )
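When combine="nested", the nesting of paths determines how files are joined. A sketch for a 2x2 grid of files concatenated along two dimensions (the file and dimension names are hypothetical):

>>> ds = xr.open_mfdataset(
...     [["t0_x0.nc", "t0_x1.nc"], ["t1_x0.nc", "t1_x1.nc"]],
...     combine="nested",
...     concat_dim=["time", "x"],
... )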
It is also possible to use any argument to open_dataset together with open_mfdataset, such as for example drop_variables:

>>> ds = xr.open_mfdataset(
...     "file.nc", drop_variables=["varname_1", "varname_2"]  # any list of vars
... )
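Similarly, chunks and parallel can be combined to open many files lazily, with the open and preprocess steps dispatched via dask.delayed; dask must be installed, and the chunk sizes here are only illustrative:

>>> ds = xr.open_mfdataset(
...     "file_*.nc",
...     chunks={"time": 10},
...     parallel=True,
... )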
References