dpyverification.pipeline
Specification of a pipeline that will collect data and run verification functions on the data.
Functions

run_pipeline(config[, user_datasources, ...])    Execute a verification pipeline as defined in the configuration.
dpyverification.pipeline.run_pipeline(config, user_datasources=None, user_scores=None, user_datasinks=None)
Execute a verification pipeline as defined in the configuration.
- Parameters:
config (tuple[Path, ConfigKind] | Config) – When using a configuration file, provide a tuple with the path and the kind of configuration file. Currently, only 'yaml' is supported. Alternatively, provide a Config object directly.
user_datasources (list[type[BaseDatasource]] | None, optional) – Option to plug in a user implementation of a DataSource. Defaults to None.
user_scores (list[type[BaseScore | BaseCategoricalScore]] | None, optional) – Option to plug in a user implementation of a Score. Defaults to None.
user_datasinks (list[type[BaseDatasink]] | None, optional) – Option to plug in a user implementation of a DataSink. Defaults to None.
- Returns:
The output dataset containing the results of the verification pipeline. In addition to the option of writing the output to a file or service, the output of the verification pipeline can also be assigned back to a Python variable for further inspection in an interactive Python environment.
- Return type:
Examples
Using a YAML file:
```python
from pathlib import Path

from dpyverification import run_pipeline

path_to_config = Path("./config.yaml")
output_dataset = run_pipeline((path_to_config, "yaml"))
```
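A minimal `config.yaml` for this call could look like the sketch below. This is an assumption, not the documented schema: only the `general` section with `log_level` is implied by the Python example further down, and the remaining section names are guesses based on the `user_datasources`, `user_scores`, and `user_datasinks` parameters.

```yaml
# Hypothetical sketch; the actual keys are defined by dpyverification's Config model.
general:
  log_level: INFO
# Section names below are assumptions mirroring the plug-in parameters:
# datasources: ...
# scores: ...
# datasinks: ...
```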
Using Python objects directly:
```python
from dpyverification import run_pipeline
from dpyverification.configuration import Config, GeneralInfoConfig

config = Config(
    general=GeneralInfoConfig(log_level="INFO"),
    # ... other sub-models here ...
)
output_dataset = run_pipeline(config)
```
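The `user_datasources`, `user_scores`, and `user_datasinks` parameters accept lists of classes, so a plug-in is registered by passing the class itself, not an instance. The sketch below illustrates only that calling shape; the stub base class and the `load` method are hypothetical stand-ins, since the real `BaseDatasource` interface lives in `dpyverification` and is not specified here.

```python
# Hypothetical sketch of the plug-in pattern. The stub below stands in for
# dpyverification's real BaseDatasource, whose actual interface is not shown
# in this documentation.
class BaseDatasource:
    """Stand-in for dpyverification's BaseDatasource."""


class CSVDatasource(BaseDatasource):
    """Example user implementation (hypothetical name and method)."""

    def load(self):
        # A real implementation would read data from its source here.
        return [{"value": 1}, {"value": 2}]


# The class (not an instance) is handed to the pipeline, e.g.:
#   run_pipeline(config, user_datasources=[CSVDatasource])
user_datasources = [CSVDatasource]
```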