veriflow.datasources.zarr

Read Zarr stores from local disk or S3.

Classes

Zarr(config)

A datasource for reading Zarr stores compatible with the internal datamodel.

ZarrConfig(*, import_adapter, source, ...[, ...])

A Zarr config element.

S3AuthConfig([_case_sensitive, ...])

Get S3 credentials and connection info safely from environment variables.

class veriflow.datasources.zarr.Zarr(config)

A datasource for reading Zarr stores compatible with the internal datamodel.

Wraps xarray.open_zarr() and supports both local filesystem paths and remote URLs (currently only s3:// is exercised). For S3 stores, credentials are taken from an S3AuthConfig instance loaded from environment variables prefixed with S3_. Any storage_options configured on the ZarrConfig are merged on top and forwarded to xarray.open_zarr().

Note

The dataset must carry a data_type attribute that matches one of the supported data types. If the attribute is missing, it is set from the configuration.
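The rule in the note above can be sketched with the standard library alone. This is a minimal illustration, not the veriflow implementation: the helper resolve_data_type is hypothetical, the DataType enum here is a local stand-in whose string values are assumed, and raising ValueError on an unsupported value is an assumption about the failure mode.

```python
from enum import Enum


class DataType(str, Enum):
    # Local stand-in: member names mirror supported_data_types,
    # the string values are assumed for illustration.
    observed_historical = "observed_historical"
    simulated_forecast_ensemble = "simulated_forecast_ensemble"
    simulated_forecast_probabilistic = "simulated_forecast_probabilistic"
    simulated_forecast_single = "simulated_forecast_single"
    threshold = "threshold"


def resolve_data_type(attrs: dict, configured: DataType) -> dict:
    """Return a copy of attrs carrying a validated data_type entry (hypothetical helper)."""
    resolved = dict(attrs)
    # Fill the attribute from the configuration only when it is missing.
    resolved.setdefault("data_type", configured.value)
    try:
        DataType(resolved["data_type"])
    except ValueError as exc:
        # Assumed behaviour: an unsupported value is rejected.
        raise ValueError(f"Unsupported data_type: {resolved['data_type']!r}") from exc
    return resolved
```

An existing, supported attribute is left untouched; the configured value is only a fallback.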

Parameters:

config (ZarrConfig)

kind: str = 'zarr'
config_class

alias of ZarrConfig

supported_data_types: ClassVar[set[DataType]] = {DataType.observed_historical, DataType.simulated_forecast_ensemble, DataType.simulated_forecast_probabilistic, DataType.simulated_forecast_single, DataType.threshold}
fetch_data()

Retrieve the configured Zarr store as an xarray Dataset.

Return type:

Self

class veriflow.datasources.zarr.ZarrConfig(*, import_adapter, source, data_type, general, id_mapping=None, path, auth_config=None, storage_options=None, consolidated=None, **extra_data)

A Zarr config element.

Reads a single Zarr store via xarray.open_zarr(). The path may point to a local directory or a remote location (e.g. s3://bucket/key/store.zarr). When the path uses an s3:// URL, credentials and connection details are taken from auth_config (an S3AuthConfig), which is populated from environment variables prefixed with S3_. Additional storage_options are merged on top of the ones derived from auth_config and forwarded to xr.open_zarr.
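The merge order described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: build_storage_options is a hypothetical helper, and whether user-supplied storage_options are also forwarded for local paths is assumed here rather than confirmed by the documentation.

```python
def build_storage_options(path, auth_options=None, extra_options=None):
    """Merge auth-derived and user-supplied options for a store path (hypothetical helper)."""
    options = {}
    # Auth-derived options are only consulted for s3:// locations.
    if path.startswith("s3://") and auth_options:
        options.update(auth_options)
    # User-supplied storage_options are merged on top, so they win on conflicts.
    # Assumption: they are applied regardless of the path scheme.
    if extra_options:
        options.update(extra_options)
    return options
```

Because the user-supplied dict is applied last, a key such as endpoint_url set in storage_options overrides the value derived from auth_config.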

Parameters:
import_adapter: Literal[DataSourceKind.ZARR]

path: str

Path to a single Zarr store: a local filesystem path (absolute or relative) or a remote URL such as 's3://bucket/key/store.zarr'. Required; minimum length 1.

auth_config: S3AuthConfig | None = None

Authentication configuration for remote stores. Only consulted when 'path' points to an 's3://' location. Credentials are loaded from S3_-prefixed environment variables; instantiate as 'auth_config: {}' in YAML to enable env-based loading.

storage_options: dict[str, str] | None = None

Additional storage_options forwarded to xr.open_zarr. Merged on top of the options derived from 'auth_config'. Use this for advanced fsspec / s3fs settings not exposed by S3AuthConfig.

consolidated: bool | None = None

Whether to use consolidated metadata when opening the store. Forwarded to xr.open_zarr. The default (None) lets xarray auto-detect.
class veriflow.datasources.zarr.S3AuthConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, endpoint_url=None, region_name=None, access_key_id=None, secret_access_key=None, session_token=None, anon=False)

Get S3 credentials and connection info safely from environment variables.

This config class inherits from pydantic_settings.BaseSettings, which tries to populate field values from environment variables.

Environment variables (all optional):

  • S3_ENDPOINT_URL: Custom S3 endpoint (e.g. for MinIO or non-AWS S3).

  • S3_REGION_NAME: AWS region.

  • S3_ACCESS_KEY_ID: Access key id.

  • S3_SECRET_ACCESS_KEY: Secret access key.

  • S3_SESSION_TOKEN: Session token (for temporary credentials).

  • S3_ANON: Set to true for anonymous access to public buckets.

Fields default to None (or False for anon) so that callers can rely on s3fs / botocore falling back to standard AWS credential discovery (e.g. ~/.aws/credentials, instance metadata) when a value is not explicitly set.
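The mapping from S3_-prefixed variables to fields can be imitated with the standard library. This sketch only illustrates the naming scheme and the defaults; it does not use pydantic-settings, the helper read_s3_env is hypothetical, and the set of strings treated as true for S3_ANON is a simplification of pydantic's boolean parsing.

```python
import os

# Field names taken from the S3AuthConfig signature (all optional strings here).
_FIELDS = (
    "endpoint_url",
    "region_name",
    "access_key_id",
    "secret_access_key",
    "session_token",
)
# Simplified truthy set; pydantic's own boolean parsing accepts more spellings.
_TRUTHY = {"1", "true", "yes", "on"}


def read_s3_env(environ=None, prefix="S3_"):
    """Collect S3 settings from a mapping of environment variables (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Unset variables become None, leaving s3fs / botocore free to fall back
    # to standard AWS credential discovery.
    values = {name: environ.get(prefix + name.upper()) for name in _FIELDS}
    values["anon"] = environ.get(prefix + "ANON", "").strip().lower() in _TRUTHY
    return values
```

Passing an explicit mapping instead of os.environ keeps the lookup easy to test without touching the process environment.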

See: https://docs.pydantic.dev/latest/concepts/pydantic_settings/#usage

Parameters:
  • _case_sensitive (bool | None)

  • _nested_model_default_partial_update (bool | None)

  • _env_prefix (str | None)

  • _env_prefix_target (EnvPrefixTarget | None)

  • _env_file (DotenvType | None)

  • _env_file_encoding (str | None)

  • _env_ignore_empty (bool | None)

  • _env_nested_delimiter (str | None)

  • _env_nested_max_split (int | None)

  • _env_parse_none_str (str | None)

  • _env_parse_enums (bool | None)

  • _cli_prog_name (str | None)

  • _cli_parse_args (bool | list[str] | tuple[str, ...] | None)

  • _cli_settings_source (CliSettingsSource[Any] | None)

  • _cli_parse_none_str (str | None)

  • _cli_hide_none_type (bool | None)

  • _cli_avoid_json (bool | None)

  • _cli_enforce_required (bool | None)

  • _cli_use_class_docs_for_groups (bool | None)

  • _cli_exit_on_error (bool | None)

  • _cli_prefix (str | None)

  • _cli_flag_prefix_char (str | None)

  • _cli_implicit_flags (bool | Literal['dual', 'toggle'] | None)

  • _cli_ignore_unknown_args (bool | None)

  • _cli_kebab_case (bool | Literal['all', 'no_enums'] | None)

  • _cli_shortcuts (Mapping[str, str | list[str]] | None)

  • _secrets_dir (PathType | None)

  • _build_sources (tuple[tuple[PydanticBaseSettingsSource, ...], dict[str, Any]] | None)

  • endpoint_url (AnyUrl | None)

  • region_name (str | None)

  • access_key_id (SecretStr | None)

  • secret_access_key (SecretStr | None)

  • session_token (SecretStr | None)

  • anon (bool)

endpoint_url: AnyUrl | None
region_name: str | None
access_key_id: SecretStr | None
secret_access_key: SecretStr | None
session_token: SecretStr | None
anon: bool
to_storage_options()

Build a storage_options dict for xr.open_zarr / s3fs.

Only keys with non-None values are included. SecretStr values are unwrapped to their plain string form so that s3fs can use them.

Return type:

dict[str, object]
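A rough sketch of the behaviour described for to_storage_options(): drop None values and unwrap secrets. The output key names are an assumption, chosen to match the keyword arguments of s3fs.S3FileSystem (key, secret, token, endpoint_url, anon, client_kwargs); the actual mapping used by veriflow is not documented here. Secret is a hypothetical stand-in for pydantic's SecretStr, and storage_options_from is a hypothetical helper.

```python
class Secret:
    """Hypothetical stand-in for pydantic's SecretStr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value


# Assumed mapping from S3AuthConfig field names to s3fs keyword arguments.
_KEY_MAP = {
    "access_key_id": "key",
    "secret_access_key": "secret",
    "session_token": "token",
    "endpoint_url": "endpoint_url",
    "anon": "anon",
}


def storage_options_from(fields: dict) -> dict:
    """Build a storage_options dict from field values (hypothetical helper)."""
    options = {}
    for field, target in _KEY_MAP.items():
        value = fields.get(field)
        if value is None:
            continue  # only non-None values are included
        if isinstance(value, Secret):
            value = value.get_secret_value()  # unwrap to a plain string
        options[target] = value
    # Assumption: the region is passed through s3fs's client_kwargs.
    if fields.get("region_name") is not None:
        options["client_kwargs"] = {"region_name": fields["region_name"]}
    return options
```

Note that anon=False is not None, so under this reading it is always present in the result.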