dpyverification.pipeline#

Specification of a pipeline that will collect data and run verification functions on the data.

Functions

run_pipeline(config[, user_datasources, ...])

Execute a verification pipeline as defined in the configuration.

dpyverification.pipeline.run_pipeline(config, user_datasources=None, user_scores=None, user_datasinks=None)[source]#

Execute a verification pipeline as defined in the configuration.

Parameters:
  • config (tuple[Path, ConfigKind] | Config) – Either a Config object or, when using a configuration file, a tuple with the path and the kind of configuration file. For now, only ‘yaml’ is supported.

  • user_datasources (list[type[BaseDatasource]] | None, optional) – Option to plug in a user implementation of a DataSource (see the plug-in example under Examples), by default None

  • user_scores (list[type[BaseScore | BaseCategoricalScore]] | None, optional) – Option to plug in a user implementation of a Score, by default None

  • user_datasinks (list[type[BaseDatasink]] | None, optional) – Option to plug in a user implementation of a DataSink, by default None

Returns:

The output dataset containing the results of the verification pipeline. Besides being written to a file or service, the output can also be assigned to a Python variable for further inspection in an interactive Python environment.

Return type:

OutputDataset

Examples

Using a YAML file:

from dpyverification import run_pipeline
from pathlib import Path

path_to_config = Path("./config.yaml")
output_dataset = run_pipeline((path_to_config, "yaml"))

Using Python objects directly:

from dpyverification import run_pipeline
from dpyverification.configuration import Config, GeneralInfoConfig

config = Config(
    general=GeneralInfoConfig(log_level="INFO"),
    # ... other sub-models here ...
)

output_dataset = run_pipeline(config)
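
Plugging in a user implementation (a minimal sketch; the import path dpyverification.datasources and the body of the subclass are assumptions, since the BaseDatasource interface is not shown here):

from pathlib import Path

from dpyverification import run_pipeline
from dpyverification.datasources import BaseDatasource  # import path is an assumption

class MyDatasource(BaseDatasource):
    # Implement the abstract methods required by BaseDatasource here;
    # their names and signatures depend on the BaseDatasource interface.
    ...

path_to_config = Path("./config.yaml")
output_dataset = run_pipeline(
    (path_to_config, "yaml"),
    user_datasources=[MyDatasource],
)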