Download this notebook: 02_elbow_deterministic_forecast.ipynb

02 - Precipitation Deterministic Forecasts

Learning goals of this module

  • Learn how to do a basic analysis that compares observed precipitation with forecast precipitation from the Global Deterministic Prediction System (GDPS)

  • Learn how to calculate and visualize simple error metrics

Assumptions

  • We assume you are familiar with the concepts of a forecast and a lead time (lead_time).
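
As a quick refresher on these two terms, the sketch below uses only the Python standard library to show how a lead time links the moment a forecast is issued to the moment it is valid for. The issue time and the lead times are made-up illustration values, not data from this tutorial.

```python
from datetime import datetime, timedelta

# Hypothetical forecast issue time (illustration only, not tutorial data)
issue_time = datetime(2026, 5, 15, 0, 0)

# Lead times in hours: how far ahead of the issue time each forecast value looks
lead_times_hours = [24, 48, 72]

# Valid time = issue time + lead time
valid_times = [issue_time + timedelta(hours=h) for h in lead_times_hours]

for hours, valid in zip(lead_times_hours, valid_times):
    print(f"lead_time={hours:>3} h -> valid at {valid:%Y-%m-%d %H:%M}")
```

An observation is compared with the forecast value whose valid time matches the observation time, so one observation can be paired with several forecasts, one per lead time.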

Reference to data products

Run imports and set up logging

[1]:
import logging
import sys
import warnings
from pathlib import Path

from dotenv import load_dotenv

from veriflow import run_pipeline
from veriflow.constants import VERSION

# add project root (parent of notebook folder) to path
sys.path.append(str(Path("..").resolve()))


from verification_plots import (
    crps_plot,
    forecast_timeseries_plot,
)

# Reload automatically
%load_ext autoreload
%autoreload 2


warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

load_dotenv(dotenv_path="tutorial.env", override=True)

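base_config = Path("config")
# Fail early if the configuration directory is missing
if not base_config.exists():
    raise FileNotFoundError(f"Config directory not found: {base_config.resolve()}")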

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)
logging.info(f"Running Veriflow version {VERSION}")
2026-06-18 22:43:42,381 - INFO - Running Veriflow version 0.1.0

Inspecting the veriflow pipeline configuration

  1. Open the config file in the “config” directory. It has the same name as the notebook, with a .yaml extension: 02_elbow_deterministic_forecast.yaml.

  2. Inspect each section of the file to understand what it configures.
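
If you prefer a quick overview before reading the file in full, the sketch below lists the top-level section names of a YAML document. It is a simple line scanner written for this illustration, not a YAML parser and not part of veriflow, and the section names in the sample text are placeholders rather than veriflow's actual configuration schema.

```python
def top_level_sections(yaml_text: str) -> list[str]:
    """Return the names of unindented `key:` lines in a YAML document."""
    sections = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and document markers
        if not stripped or stripped.startswith("#") or stripped.startswith("---"):
            continue
        # Top-level keys start in column 0, are not list items, and contain a colon
        if not line[0].isspace() and not stripped.startswith("-") and ":" in stripped:
            sections.append(stripped.split(":", 1)[0])
    return sections


# Placeholder text: these section names are invented for illustration and
# are NOT veriflow's actual configuration schema.
sample = """\
# example
section_a:
  option: 1
section_b:
  - item
section_c: value
"""

print(top_level_sections(sample))
```

To try it on the tutorial configuration, pass the file contents instead of the sample, for example the text returned by (base_config / "02_elbow_deterministic_forecast.yaml").read_text().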

Running the veriflow pipeline

[8]:
ods = run_pipeline((base_config / "02_elbow_deterministic_forecast.yaml", "yaml"))
2026-06-18 22:46:37,568 - INFO - Successfully initialized the configuration.
         verification_period_start = 2026-05-15 00:00:00
         verification_period_end = 2026-06-01 00:00:00
2026-06-18 22:46:37,570 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:46:37,991 - INFO - Successfully got observed data from FewsWebservice.
2026-06-18 22:46:37,993 - INFO - Start getting data from FewsWebservice.
2026-06-18 22:46:38,511 - INFO - Successfully got simulated data from FewsWebservice.
2026-06-18 22:46:38,514 - INFO - Successfully loaded all data from sources.
2026-06-18 22:46:38,544 - INFO - Successfully computed ContinuousScores for verification pair PC.
2026-06-18 22:46:38,545 - INFO - Verification pipeline completed successfully.

Evaluating the results in the veriflow OutputDataset

Verification metrics summarize forecast performance at some level of abstraction. These abstractions can reveal important information about forecast quality, but a basic “eyeball verification” is often the best and most intuitive way to start a verification exercise. You will likely find strengths and weaknesses in your forecasts early on, before diving into the more abstract scores. A solid visual inspection can also help you later when you need to understand or explain the more abstract results.

1 - Visual inspection of observed and forecast data

A good starting point for “eyeball” verification is simply to plot your observations and forecasts together. Use the interactive elements in the plot below to zoom, pan, and compare the observations with the GDPS forecasts.

[3]:
stations = ods.get(ods.verification_pairs[0]).coords["station"].values

lead_times = ods.get(ods.verification_pairs[0]).coords["lead_time"].values
# Convert the timedelta64 lead times to plain Python ints (whole hours)
lead_times_hours = [int(lt.astype("timedelta64[h]").astype(int)) for lt in lead_times]

print(f"Stations: {stations}")
print(f"GDPS Lead times (hours): {lead_times_hours}")
Stations: ['3031092' '3050778' 'MSC-005' '05BL813' '05BJ804' '05BL809' 'FIRES-B4'
 '05BJ805' '05BL812' 'FIRES-B5' '05BJ806' '05BH803' '05BF825' '05BH802'
 '05BL810' '05BF827']
GDPS Lead times (hours): [24, 48, 72, 96, 120, 144, 168]

[4]:
forecast_timeseries_plot(ods, station=stations[1])

2 - Looking into the Mean Error per lead time

[5]:
# Plot the first 5 stations to avoid overcrowding the plot
crps_plot(ods, score_var="mean_error", stations=stations[:5])
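
To make the metric in this plot concrete, here is a minimal sketch of a mean error calculation per lead time, using only the Python standard library. It assumes the common convention that the error is forecast minus observation, so a positive mean error indicates over-forecasting. Veriflow's own sign convention and implementation may differ, so check its documentation. The precipitation values are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean

# (lead_time_hours, forecast_mm, observed_mm): invented values for illustration
pairs = [
    (24, 5.0, 4.0),
    (24, 2.0, 3.0),
    (24, 0.0, 0.0),
    (48, 6.0, 3.0),
    (48, 1.0, 2.0),
]

# Group the individual errors by lead time
errors_by_lead = defaultdict(list)
for lead_hours, forecast, observed in pairs:
    # Assumed convention: error = forecast - observation
    errors_by_lead[lead_hours].append(forecast - observed)

# Average the errors within each lead time
mean_error = {lead: fmean(errs) for lead, errs in sorted(errors_by_lead.items())}
print(mean_error)
```

In this toy data the 24-hour mean error is zero even though two of the three forecasts were off, because errors of opposite sign cancel. Mean error therefore measures bias, not accuracy, which is one reason to look at the time series plot as well.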