Modules
ddlpy.cli module
- Console script for ddlpy.
ddlpy --help
ddlpy locations --help
ddlpy measurements --help
ddlpy module
Top-level package for Data Distributie Laag, a service from Rijkswaterstaat for distributing water quantity data.
- ddlpy.locations(catalog_filter: list = None) → DataFrame
Get station information from DDL (metadata from Catalogue). All metadata regarding stations.
Parameters
- catalog_filter : list, optional
List of catalogs to pass on to OphalenCatalogus CatalogusFilter. If None, the list from endpoints.json is retrieved. The default is None.
Returns
- pd.DataFrame
DataFrame with a combination of available locations and measurements.
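A common next step is to narrow the catalogue down to one station and one quantity. The sketch below shows one way to do that. It rests on assumptions not stated on this page: that the DataFrame is indexed by station code and has a `Grootheid.Code` column. The helper `select_locations` and the codes used are illustrative, not part of ddlpy.

```python
import pandas as pd


def select_locations(locations: pd.DataFrame, station_codes, grootheid_codes) -> pd.DataFrame:
    """Return the rows matching the given station codes and quantity codes.

    Assumes (not confirmed by this page) that the index holds station codes
    and that a 'Grootheid.Code' column holds the quantity code.
    """
    is_station = locations.index.isin(station_codes)
    is_quantity = locations["Grootheid.Code"].isin(grootheid_codes).to_numpy()
    return locations.loc[is_station & is_quantity]


# Small stand-in for the result of ddlpy.locations(), so the sketch runs offline.
example = pd.DataFrame(
    {"Grootheid.Code": ["WATHTE", "WINDSHD", "WATHTE"]},
    index=pd.Index(["HOEKVHLD", "HOEKVHLD", "IJMDBTHVN"], name="Code"),
)
selected = select_locations(example, ["HOEKVHLD"], ["WATHTE"])
```

With real data, `selected.iloc[0]` would give the single row (a pd.Series) that the measurement functions take as `location`.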
- ddlpy.measurements(location: pd.Series, start_date: (str, pd.Timestamp), end_date: (str, pd.Timestamp), freq: int = 1, clean_df: bool = True)
Returns measurements for the given location and requested period.
Parameters
- location : pd.Series
Single row of the ddlpy.locations() DataFrame.
- start_date : str, pd.Timestamp
Start of the retrieval period.
- end_date : str, pd.Timestamp
End of the retrieval period.
- freq : int, dateutil.rrule.MONTHLY, dateutil.rrule.YEARLY, etc., optional
The frequency with which to divide the requested period into chunks (e.g. yearly or monthly). Can also be None, in which case the entire dataset is retrieved at once. Note that 10-minute measurements often cannot be downloaded in yearly (or larger) chunks, since the DDL limits responses to 157681 values and several stations have duplicated timesteps. In that case the query fails with an error or timeout, or returns an empty result (as if there were no data), and the user should fall back to monthly chunks. This is significantly slower but much more robust. The default is dateutil.rrule.MONTHLY (which has the integer value 1).
- clean_df : bool, optional
Whether to sort the dataframe and remove duplicate rows. The default is True.
Returns
- measurements : pd.DataFrame
DataFrame with measurements.
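The fallback from yearly to monthly chunks described for `freq` can be sketched as a small wrapper. This is a minimal illustration, not part of ddlpy: `fetch_with_fallback` is a hypothetical helper, and `fetch` stands in for any callable with the same call shape as `ddlpy.measurements`. The exception handling is deliberately broad because the failure mode (error, timeout, or empty result) varies.

```python
def fetch_with_fallback(fetch, location, start_date, end_date, freqs=(0, 1)):
    """Try each chunk frequency in turn until one yields data.

    `fetch` is expected to behave like ddlpy.measurements. The default
    `freqs` are the integer values of dateutil.rrule.YEARLY (0) and
    dateutil.rrule.MONTHLY (1).
    """
    last_error = None
    result = None
    for freq in freqs:
        try:
            result = fetch(location, start_date, end_date, freq=freq)
        except Exception as error:  # broad on purpose: the failure mode varies
            last_error = error
            continue
        if result is not None and len(result) > 0:
            return result
    # Nothing usable came back: re-raise the last error if no call succeeded.
    if result is None and last_error is not None:
        raise last_error
    return result


# Offline stand-in that fails for yearly chunks and succeeds for monthly ones.
def fake_fetch(location, start_date, end_date, freq):
    if freq == 0:
        raise TimeoutError("simulated timeout for yearly chunks")
    return ["value-1", "value-2"]


fallback_result = fetch_with_fallback(fake_fetch, "station", "2020-01-01", "2020-12-31")
```

In practice one would pass `ddlpy.measurements` as `fetch` and a row of the locations DataFrame as `location`.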
- ddlpy.measurements_latest(location: pd.Series) → DataFrame
Returns the latest available measurement for the given location.
Parameters
- location : pd.Series
Single row of the ddlpy.locations() DataFrame.
Returns
- df : pd.DataFrame
DataFrame with measurements.
- ddlpy.measurements_available(location: pd.Series, start_date: (str, pd.Timestamp), end_date: (str, pd.Timestamp)) → bool
Checks if there are measurements available for a location in the requested period.
Parameters
- location : pd.Series
Single row of the ddlpy.locations() DataFrame.
- start_date : str, pd.Timestamp
The start date of the requested period.
- end_date : str, pd.Timestamp
The end date of the requested period.
Returns
- bool
Whether there are measurements available or not.
- ddlpy.measurements_amount(location: pd.Series, start_date: (str, pd.Timestamp), end_date: (str, pd.Timestamp), period: str = 'Jaar') → DataFrame
Retrieves the amount of measurements available for a location for the requested period.
Parameters
- location : pd.Series
Single row of the ddlpy.locations() DataFrame.
- start_date : str, pd.Timestamp
The start date of the requested period.
- end_date : str, pd.Timestamp
The end date of the requested period.
- period : str, optional
"Jaar", "Maand" or "Dag". The default is "Jaar".
Returns
- df_amount : pd.DataFrame
A DataFrame with the number of measurements (AantalMetingen) per period (Groeperingsperiode).
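One possible use of these counts is to spot periods that would exceed the response limit of 157681 values quoted for ddlpy.measurements(), before choosing a chunk frequency. The sketch below is a heuristic under stated assumptions: `periods_over_limit` is a hypothetical helper, the counts are invented for illustration, and it takes a plain mapping of Groeperingsperiode to AantalMetingen rather than the DataFrame itself.

```python
RESPONSE_LIMIT = 157681  # value limit per response, as quoted for measurements()


def periods_over_limit(amount_per_period, limit=RESPONSE_LIMIT):
    """Return the periods whose measurement count exceeds the response limit.

    `amount_per_period` maps Groeperingsperiode -> AantalMetingen.
    """
    return [period for period, count in amount_per_period.items() if count > limit]


# Invented counts, for illustration only.
example_amounts = {"2019": 52560, "2020": 210240}
too_large = periods_over_limit(example_amounts)
```

If both names are columns of `df_amount` (an assumption), the mapping could be built with `dict(zip(df_amount["Groeperingsperiode"], df_amount["AantalMetingen"]))`. A non-empty result would suggest requesting those periods in smaller chunks.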
- ddlpy.simplify_dataframe(df: DataFrame)
Drop columns with constant values from the dataframe and collect them in a dictionary which is added as attrs of the dataframe.
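The idea of moving constant columns into `attrs` can be illustrated with plain pandas. This is a sketch of the technique, not ddlpy's implementation: `drop_constant_columns` is a hypothetical helper, and the `Eenheid.Code` column and its values are illustrative.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns holding a single distinct value and keep them in attrs."""
    constant = {}
    for column in df.columns:
        # dropna=False counts NaN as a value, so an all-NaN column is constant too.
        if df[column].nunique(dropna=False) == 1:
            constant[column] = df[column].iloc[0]
    simplified = df.drop(columns=list(constant))
    simplified.attrs = constant
    return simplified


example = pd.DataFrame(
    {
        "Meetwaarde.Waarde_Numeriek": [1.0, 2.0, 3.0],
        "Eenheid.Code": ["cm", "cm", "cm"],
    }
)
simplified = drop_constant_columns(example)
```

After this call, `simplified` keeps only the varying column and `simplified.attrs` holds the constant value, which keeps the table narrow without losing the metadata.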
- ddlpy.dataframe_to_xarray(df: DataFrame, drop_if_constant=[])
Converts the measurement dataframe to an xarray dataset, including several cleanups to minimize the size of the netcdf dataset on disk:
- The column ‘Parameter_Wat_Omschrijving’ is dropped (it combines information from other columns).
- The column ‘Meetwaarde.Waarde_Alfanumeriek’ is dropped if ‘Meetwaarde.Waarde_Numeriek’ is present (it contains duplicate values in that case).
- All Omschrijving columns are dropped and added as attributes to the Code variables.
- All NVT-only Code columns are dropped and added as ds attributes.
- All location columns are dropped and added as ds attributes.
- All drop_if_constant columns are dropped and added as ds attributes (if the values are indeed constant).
The timestamps are converted to UTC since xarray does not support non-UTC timestamps. These can be converted to different timezones after loading the netcdf and converting to a pandas dataframe with df.index.tz_convert().
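Converting the UTC index back to another timezone with `tz_convert()` could look like the sketch below. The DataFrame here is a small stand-in for data loaded from the netcdf file, and the fixed UTC+1 offset is only an illustrative target timezone.

```python
from datetime import timedelta, timezone

import pandas as pd

# Stand-in for a measurement DataFrame with a UTC DatetimeIndex.
index = pd.date_range("2023-01-01 00:00", periods=3, freq="10min", tz="UTC")
df = pd.DataFrame({"Meetwaarde.Waarde_Numeriek": [10.0, 12.0, 11.0]}, index=index)

# Illustrative target: a fixed offset of UTC+1.
target_tz = timezone(timedelta(hours=1))

df_local = df.copy()
df_local.index = df.index.tz_convert(target_tz)
```

The conversion changes only how the timestamps are displayed; each row still refers to the same instant in time.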
When writing the dataset to disk with ds.to_netcdf(), it is recommended to use format="NETCDF3_CLASSIC" or format="NETCDF4_CLASSIC", since this automatically converts variables of dtype <U to |S, which saves a lot of disk space for DDL data.