Download this notebook: 04_raven_elbow_discharge_ensemble.ipynb
04 - Raven Discharge Ensemble Forecasts#
Learning goals of this module#
Learn how to do a basic analysis, comparing observed discharge with forecast discharge, produced by Raven and forced by GEPS, GEFS and IFS.
Learn about ensemble forecast verification.
Learn how to calculate and visualize simple error metrics, such as the rank_histogram and the continuous_ranked_probability_score.
Assumptions#
We assume you are familiar with the concept of a probabilistic forecast.
We assume you are familiar with the concept of a hydrological model.
Reference to ensemble forecast products#
Reference to Raven#
Run imports and set-up logging#
[1]:
import logging
import sys
import warnings
from pathlib import Path
from dotenv import load_dotenv
from veriflow import run_pipeline
from veriflow.constants import VERSION
# add project root (parent of notebook folder) to path
sys.path.append(str(Path("..").resolve()))
from verification_plots import (
crps_plot,
forecast_timeseries_plot,
rank_histogram_3d_plot,
rank_histogram_plot,
scatter_plot,
)
# Reload automatically
%load_ext autoreload
%autoreload 2
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
load_dotenv(dotenv_path="tutorial.env", override=True)
base_config = Path("config")
base_config.exists()
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
handlers=[logging.StreamHandler()],
)
logging.info(f"Running Veriflow version {VERSION}")
2026-06-18 22:49:17,077 - INFO - Running Veriflow version 0.1.0
Inspecting the veriflow pipeline configuration#
Open the config file in the “config” directory. The name of the file is identical to the name of the notebook.
Inspect each of the sections to gain an understanding of what this configuration is about.
Running the veriflow pipeline#
[2]:
ods = run_pipeline((base_config / "04_raven_elbow_discharge_ensemble.yaml", "yaml"))
2026-06-18 22:49:17,331 - INFO - Successfully initialized the configuration.
verification_period_start = 2025-05-20 00:00:00
verification_period_end = 2026-06-01 00:00:00
2026-06-18 22:49:17,340 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:49:18,007 - INFO - Successfully got observed data from FewsWebservice.
2026-06-18 22:49:18,008 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:49:18,693 - INFO - Successfully got simulated_GEPS data from FewsWebservice.
2026-06-18 22:49:18,694 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:49:19,598 - INFO - Successfully got simulated_GEFS data from FewsWebservice.
2026-06-18 22:49:19,600 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:49:20,632 - INFO - Successfully got simulated_IFS data from FewsWebservice.
2026-06-18 22:49:20,637 - INFO - Successfully loaded all data from sources.
2026-06-18 22:49:20,703 - INFO - Successfully computed CrpsForEnsemble for verification pair QR_GEPS.
2026-06-18 22:49:20,774 - INFO - Successfully computed CrpsForEnsemble for verification pair QR_GEFS.
2026-06-18 22:49:20,852 - INFO - Successfully computed CrpsForEnsemble for verification pair QR_IFS.
2026-06-18 22:49:20,949 - INFO - Successfully computed RankHistogram for verification pair QR_GEPS.
2026-06-18 22:49:20,991 - INFO - Successfully computed RankHistogram for verification pair QR_GEFS.
2026-06-18 22:49:21,038 - INFO - Successfully computed RankHistogram for verification pair QR_IFS.
2026-06-18 22:49:21,040 - INFO - Verification pipeline completed successfully.
[3]:
ods
[3]:
OutputDataset with 3 verification pairs: [VerificationPair(id='QR_GEPS', obs='observed', sim='simulated_GEPS', variable='discharge'), VerificationPair(id='QR_GEFS', obs='observed', sim='simulated_GEFS', variable='discharge'), VerificationPair(id='QR_IFS', obs='observed', sim='simulated_IFS', variable='discharge')]
Evaluating the results in the veriflow OutputDataset#
Verification metrics and results can contain a level of abstraction. Although these abstractions can reveal important information about forecast quality, a basic “eyeball verification” is often the best and intuitive way to start your verification exercise. You’ll likely find strengths and weaknesses in your forecasts early on, without directly diving into levels of abstraction. In addition, a solid visual inspection may help you later on in understanding or explaining the more abstract results.
1 - Visual inspection of observed and forecast data#
A good starting point for “eyeball” verification is simple: just looking at your observations and forecasts in a visual way. Use the interactive elements in the plots below to zoom, pan and compare the results of our 3 NWP products.
[4]:
stations = ods.get(ods.verification_pairs[0]).coords["station"].values
lead_times = ods.get(ods.verification_pairs[0]).coords["lead_time"].values
[5]:
forecast_timeseries_plot(ods)
2 - Visual inspection with a scatter plot per lead time#
Another great tool for “eyeball” verification is the scatter plot. The scatter plot is relatively easy to understand, but is slightly more abstract than the visualization above.
For all forecasts in our OutputDataset, we collect all realizations of the ensemble at a specific lead time. You can use the lead_times variable (a list of np.timedelta64 instances) to slice the data at a specific lead time. For that given slice, we can make the scatter plot.
[6]:
scatter_plot(ods, lead_time=lead_times[1])
3 - Looking into the Continuous Ranked Probability Score#
Next, we’ll look into the results of the continuous ranked probability score. Before you continue to the next section, we’ll look into the documentation and do a tutorial on the CRPS for ensemble forecasts. The veriflow package relies on scores (developed by the Bureau of Meteorology, Australia) for computation of various scores.
Read the API documentation the crps_for_ensemble function, which is used under the hood in veriflow.
Run through the scores tutorial on the crps_for_ensemble function. You can run it in Binder (link on top of the page), or view the static view.
Q1: what attribute(s) of forecast quality can be measured by the CRPS?
Show suggested answers for Q1
The CRPS captures multiple attributes: accuracy, reliability and sharpness. Can you reason why?
Q2: what is the best possible CRPS score?
Show suggested answers for Q2
The best possible outcome of CRPS is 0. In this case the forecast has maximum sharpness: all ensemble members exactly predict the observed outcome.
Q3: computing the CRPS over just one realization (i.e. a deterministic forecast) yields the exact same result as computing the … for a deterministic score?
Show suggested answers for Q3
The absolute error. The CRPS is a probabilistic generalization of the absolute error. When taking the mean of the CRPS over all forecasts, it is equal to the mean absolute error when the number of realizations is 1.
[7]:
crps_plot(ods)
4 - Looking into the Rank Histogram#
Next, we’ll look into the results of the rank histogram. Before you continue to the next section, we’ll look into the documentation and do a tutorial on the rank histogram for ensemble forecasts. The veriflow package relies on scores (developed by the Bureau of Meteorology, Australia) for computation of various scores.
Read the API documentation the rank_histogram function, which is used under the hood in veriflow.
Run through the scores tutorial on the rank_histogram function. You can run it in Binder (link on top of the page), or view the static view.
Q1: what attribute(s) of forecast quality can be measured by the Rank Histogram?
Show suggested answers for Q1
The rank histogram primarily measures reliability, although bias in forecasts will show up in the rank histogram as well.
Q2: what does a perfect Rank Histogram look like?
Show suggested answers for Q2
The rank histogram shows a uniform (flat) distribution, indicating forecast and observed frequencies are equal.
Q3: does a perfectly uniform/flat rank histogram always correspond to a good forecast? Can you come up with a hypothetical forecast that has a perfect rank histogram, but is still bad?
Show suggested answers for Q3
No, although a perfect rank histogram indicates statistical reliability, the forecast may still have poor resolution (the ability to discriminate between situations that have different observed outcomes).
Imagine an ensemble forecast that ignores current conditions and simply samples from the historical distribution of streamflow for that day of year.
For example, every day in June the ensemble consists of random draws from the last 30 years of June observations.
This forecast can also produce a nearly uniform rank histogram because the observations come from the same climatological distribution. However, it has zero ability to distinguish today’s conditions from any other June day.
[8]:
rank_histogram_plot(ods, station=stations[0], lead_time=lead_times[4])
5 - Looking into the Rank Histogram in 3D (advanced)#
This might seem relatively abstract or advanced, but the 3D rank histogram shows the rank histogram for all lead times. In the current notebook, the number of samples is too low to make a proper analysis of the results, but with many samples this can be an interesting and useful visualization.
[9]:
rank_histogram_3d_plot(ods, station=stations[0])