Modules

ddlpy.cli module

Console script for ddlpy.
  • ddlpy --help

  • ddlpy locations --help

  • ddlpy measurements --help

ddlpy module

Top-level package for the Data Distributie Laag (DDL), a Rijkswaterstaat service for distributing water quantity data.

ddlpy.locations(catalog_filter: list = None) → DataFrame

Get station information from the DDL (metadata from the Catalogue). It contains all metadata regarding stations. The catalog is cached locally for a maximum of 4 hours, corresponding to the update frequency of the Waterwebservices catalog. To avoid using the cache, pass a valid catalog_filter or delete the cache file manually.
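The 4-hour cache lifetime can be pictured as a simple file-age check. The sketch below is a hypothetical illustration, not ddlpy's actual caching code: the helper name cache_is_fresh and the use of the file modification time are assumptions, and the real location of ddlpy's cache file is not documented here.

```python
import os
import time

# Documented maximum cache lifetime: 4 hours, expressed in seconds
MAX_CACHE_AGE_S = 4 * 60 * 60


def cache_is_fresh(path, max_age_s=MAX_CACHE_AGE_S, now=None):
    """Hypothetical helper (not part of ddlpy): return True if the file at
    `path` exists and was modified less than `max_age_s` seconds ago."""
    if not os.path.exists(path):
        # No cache file yet, so the catalog would have to be downloaded
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age_s
```

A stale or missing file would trigger a fresh catalog request; deleting the file has the same effect, which matches the manual workaround described above.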

Parameters

catalog_filter : list, optional

List of catalogs to pass on to the CatalogusFilter of the OphalenCatalogus request. If None, the list from endpoints.json is retrieved. The cache cannot be used when passing anything other than None. The default is None.

Returns

pd.DataFrame

DataFrame with a combination of available locations and measurements.
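The functions below take a single row of this DataFrame as a pd.Series, not a one-row DataFrame. The toy example shows the difference in plain pandas; the column names and values are made-up placeholders, not the real columns of the ddlpy.locations() output (inspect locations.columns for those).

```python
import pandas as pd

# Toy stand-in for the ddlpy.locations() DataFrame (placeholder columns/values)
locations = pd.DataFrame(
    {"station": ["A", "A", "B"], "quantity": ["q1", "q2", "q1"]}
)

# Boolean filtering returns a DataFrame, even when only one row matches
selected = locations[(locations["station"] == "B") & (locations["quantity"] == "q1")]

# .iloc[0] turns the first matching row into a pd.Series,
# which is the form a `location` argument expects
location = selected.iloc[0]

# .iloc[[0]] (list indexer) would instead return a one-row DataFrame
one_row_frame = selected.iloc[[0]]
```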

ddlpy.measurements(location: Series, start_date: (str, pandas.Timestamp), end_date: (str, pandas.Timestamp), freq: int = 1, clean_df: bool = True)

Returns measurements for the given location and requested period.

Parameters

location : pd.Series

Single row of the ddlpy.locations() DataFrame.

start_date : str, pd.Timestamp

Start of the retrieval period.

end_date : str, pd.Timestamp

End of the retrieval period.

freq : int, dateutil.rrule.MONTHLY, dateutil.rrule.YEARLY, etc., optional

The frequency in which to divide the requested period into chunks (e.g. yearly or monthly). Can also be None, in which case the entire dataset is retrieved at once. Note that 10-minute measurements often cannot be downloaded in yearly (or larger) chunks, since the DDL limits each response to 157681 values and several stations have duplicated timesteps. In that case the query fails with an error or a timeout, or simply returns an empty result (as if there were no data), and the user should fall back to monthly chunks, which is significantly slower but much more robust. The default is dateutil.rrule.MONTHLY (which equals 1).

clean_df : bool, optional

Whether to sort the DataFrame and remove duplicate rows. The default is True.

Returns

measurements : pd.DataFrame

DataFrame with measurements.
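The effect of freq can be illustrated with dateutil.rrule, which provides the MONTHLY and YEARLY constants mentioned above. The helper split_period below is a hypothetical sketch of dividing a period into chunks; it is not ddlpy's internal implementation, and ddlpy may handle chunk boundaries differently.

```python
from dateutil import rrule


def split_period(start, end, freq=rrule.MONTHLY):
    """Hypothetical helper: split [start, end] into consecutive
    (chunk_start, chunk_end) pairs at every `freq` step from `start`."""
    if end <= start:
        raise ValueError("end must be after start")
    if freq is None:
        # No chunking: request the whole period at once
        return [(start, end)]
    # Boundaries at each occurrence of the rule, starting at `start`
    boundaries = list(rrule.rrule(freq, dtstart=start, until=end))
    if boundaries[-1] < end:
        # Close the last (partial) chunk at the requested end date
        boundaries.append(end)
    return list(zip(boundaries[:-1], boundaries[1:]))
```

For 2020-01-01 to 2020-04-15 this yields four monthly chunks but a single yearly chunk. Note that rrule skips months that lack the start day (e.g. a start on the 31st), so this sketch is only reliable for start dates on days 1 to 28.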

ddlpy.measurements_latest(location: Series) → DataFrame

Returns the latest available measurement for the given location.

Parameters

location : pd.Series

Single row of the ddlpy.locations() DataFrame.

Returns

df : pd.DataFrame

DataFrame with measurements.

ddlpy.measurements_available(location: Series, start_date: (str, pandas.Timestamp), end_date: (str, pandas.Timestamp)) → bool

Checks if there are measurements available for a location in the requested period.

Parameters

location : pd.Series

Single row of the ddlpy.locations() DataFrame.

start_date : str, pd.Timestamp

The start date of the requested period.

end_date : str, pd.Timestamp

The end date of the requested period.

Returns

bool

Whether there are measurements available or not.

ddlpy.measurements_amount(location: Series, start_date: (str, pandas.Timestamp), end_date: (str, pandas.Timestamp), period: str = 'Jaar') → DataFrame

Retrieves the number of measurements available for a location in the requested period.

Parameters

location : pd.Series

Single row of the ddlpy.locations() DataFrame.

start_date : str, pd.Timestamp

The start date of the requested period.

end_date : str, pd.Timestamp

The end date of the requested period.

period : str, optional

“Jaar”, “Maand” or “Dag” (year, month or day), the period per which the measurements are counted. The default is “Jaar”.

Returns

df_amount : pd.DataFrame

A DataFrame with the number of measurements (AantalMetingen) per period (Groeperingsperiode).

ddlpy.simplify_dataframe(df: DataFrame, always_preserve=[])

Drops columns with constant values from the DataFrame and collects them in a dictionary, which is stored in the attrs of the DataFrame. The column Meetwaarde.Waarde_Alfanumeriek is also dropped if it duplicates Meetwaarde.Waarde_Numeriek. Columns whose names are passed in always_preserve are kept even if they are constant.
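The behaviour described above can be approximated in plain pandas. The function below is a hypothetical re-implementation written for illustration, not ddlpy's source; in particular, the way the alphanumeric column is compared with the numeric one (parsing it as float) is an assumption, and the "station" and "unit" columns in the usage are placeholders.

```python
import pandas as pd

NUM_COL = "Meetwaarde.Waarde_Numeriek"
ALF_COL = "Meetwaarde.Waarde_Alfanumeriek"


def simplify_dataframe_sketch(df, always_preserve=()):
    """Hypothetical approximation of ddlpy.simplify_dataframe()."""
    df = df.copy()

    # Drop the alphanumeric value column if it merely repeats the numeric one
    if NUM_COL in df.columns and ALF_COL in df.columns:
        alf_as_num = pd.to_numeric(df[ALF_COL], errors="coerce").astype(float)
        if alf_as_num.equals(df[NUM_COL].astype(float)):
            df = df.drop(columns=ALF_COL)

    # Columns with exactly one unique value (NaN counted) are constant,
    # unless the caller asked to preserve them
    constant_cols = [
        col
        for col in df.columns
        if col not in always_preserve and df[col].nunique(dropna=False) == 1
    ]
    # Store the constant values as metadata, then drop those columns
    constants = {col: df[col].iloc[0] for col in constant_cols}
    df = df.drop(columns=constant_cols)
    df.attrs.update(constants)
    return df
```

Usage with toy data: for a frame with a varying numeric column, a duplicated alphanumeric column, and constant "station" and "unit" columns, calling simplify_dataframe_sketch(raw, always_preserve=["unit"]) keeps only the numeric and "unit" columns and leaves {"station": ...} in attrs.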

ddlpy.dataframe_to_xarray(df: DataFrame, always_preserve=[])

Converts the measurement DataFrame to an xarray Dataset. The DataFrame is first simplified with simplify_dataframe() to minimize the size of the netCDF file on disk.

The timestamps are converted to UTC, since xarray does not support timezone-aware timestamps. After loading the netCDF file and converting it to a pandas DataFrame, the (tz-naive, UTC) index can be converted to a different timezone with df.index.tz_localize("UTC").tz_convert().
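Assuming the index comes back tz-naive (in UTC) after reading the netCDF file and converting it to pandas, it has to be localized to UTC before tz_convert() can be applied, because pandas refuses to convert naive timestamps. The pandas-only sketch below simulates that round trip with placeholder timestamps and a fixed UTC+1 offset; xarray itself is not involved.

```python
import datetime as dt

import pandas as pd

# Tz-aware index with a fixed +01:00 offset (placeholder timestamps)
idx = pd.DatetimeIndex(["2020-01-01 01:00:00+01:00", "2020-01-01 01:10:00+01:00"])

# Convert to UTC and drop the timezone, giving naive UTC timestamps
naive_utc = idx.tz_convert(None)

# Restoring: localize the naive index to UTC first, then convert it to the
# desired timezone (here a fixed UTC+1 offset)
utc_plus_1 = dt.timezone(dt.timedelta(hours=1))
restored = naive_utc.tz_localize("UTC").tz_convert(utc_plus_1)

# Calling tz_convert() directly on the naive index raises a TypeError
try:
    naive_utc.tz_convert(utc_plus_1)
except TypeError as exc:
    print("tz_convert on naive index failed:", exc)
```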

Furthermore, all “.Omschrijving” (description) variables are dropped and their information is added as attributes to the corresponding “.Code” variables.

When writing the dataset to disk with ds.to_netcdf(), it is recommended to use format="NETCDF3_CLASSIC" or format="NETCDF4_CLASSIC", since this automatically converts variables of dtype <U to |S, which saves a lot of disk space for DDL data.
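The saving comes from the width of the two NumPy string dtypes: <U stores every character in 4 bytes, while |S stores 1 byte per character. The sketch below only compares in-memory sizes for placeholder strings; the actual on-disk size depends on the netCDF format and backend, and astype("S") only works for text that can be encoded as ASCII.

```python
import numpy as np

# Placeholder string values; "<U" stores each character in 4 bytes (UCS-4)
codes_u = np.array(["abc", "de", "fgh"])

# "|S" stores fixed-width byte strings: 1 byte per character (ASCII only)
codes_s = codes_u.astype("S")

print(codes_u.dtype, codes_u.nbytes)
print(codes_s.dtype, codes_s.nbytes)
```

With an xarray Dataset ds, the recommended call would look like ds.to_netcdf("measurements.nc", format="NETCDF4_CLASSIC"), where the file name is a placeholder.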