whylogs.mlflow.patcher

Module Contents

Classes

WhyLogsRun

Functions

_new_mlflow_conda_env(path=None, additional_conda_deps=None, additional_pip_deps=None, additional_conda_channels=None, install_mlflow=True)

_new_add_to_model(model, loader_module, data=None, code=None, env=None, **kwargs)

Replaces MLflow’s original add_to_model.

new_model_log(**kwargs)

Hijack the mlflow.models.Model.log method and upload the .whylogs.yaml configuration to the model path

enable_mlflow() → bool

Enable whylogs in mlflow module via mlflow.whylogs.

disable_mlflow()

Attributes

logger

_mlflow

_original_end_run

_active_whylogs

_is_patched

_original_mlflow_conda_env

_original_add_to_model

_original_model_log

WHYLOG_YAML

whylogs.mlflow.patcher.logger
whylogs.mlflow.patcher._mlflow
whylogs.mlflow.patcher._original_end_run
whylogs.mlflow.patcher._active_whylogs = []
whylogs.mlflow.patcher._is_patched = False
whylogs.mlflow.patcher._original_mlflow_conda_env
whylogs.mlflow.patcher._original_add_to_model
whylogs.mlflow.patcher._original_model_log
class whylogs.mlflow.patcher.WhyLogsRun

Bases: object

_session
_active_run_id
_loggers :Dict[str, whylogs.app.logger.Logger]
_create_logger(self, dataset_name: Optional[str] = None)
log_pandas(self, df: pandas.DataFrame, dataset_name: Optional[str] = None)

Log the statistics of a Pandas dataframe. Note that this method is additive within a run: calling it again with the same dataset name does not generate a new profile; instead, the data is aggregated into the existing profile.

To create a separate profile, specify a different dataset_name.

Parameters
  • df – the Pandas dataframe to log

  • dataset_name – the name of the dataset (Optional). If not specified, the experiment name is used
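The additive behaviour described above can be pictured with a small stand-in. This is only a sketch: ToyRun, log_rows and profiles are hypothetical names, not part of whylogs, and a "profile" is reduced to a row count. It assumes that profiles are keyed by dataset name and that the experiment name is the fallback key, as the parameter description states.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyRun:
    """Hypothetical stand-in for WhyLogsRun; it only counts rows per profile."""

    def __init__(self, experiment_name: str) -> None:
        self._experiment_name = experiment_name
        # One "profile" (here just a row count) per dataset name.
        self._profiles: Dict[str, int] = defaultdict(int)

    def log_rows(self, rows: List[dict], dataset_name: Optional[str] = None) -> None:
        # Fall back to the experiment name when no dataset name is given.
        key = dataset_name or self._experiment_name
        # Repeated calls with the same key add to the existing profile.
        self._profiles[key] += len(rows)

    def profiles(self) -> Dict[str, int]:
        return dict(self._profiles)


run = ToyRun("default-experiment")
run.log_rows([{"a": 1}, {"a": 2}])                        # default profile
run.log_rows([{"a": 3}])                                  # same profile, aggregated
run.log_rows([{"a": 4}], dataset_name="another dataset")  # separate profile
summary = run.profiles()
```

Two calls without a dataset name end up in one profile, while the call with a distinct name gets its own.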

log(self, features: Dict[str, any] = None, feature_name: str = None, value: any = None, dataset_name: Optional[str] = None)

Logs a collection of features or a single feature (must specify one or the other).

Parameters
  • features – a dictionary of key->value for multiple features. Each entry represents a single columnar feature

  • feature_name – name of a single feature. Cannot be specified if ‘features’ is specified

  • value – value of a single feature. Cannot be specified if ‘features’ is specified

  • dataset_name – the name of the dataset. If not specified, we fall back to using the experiment name
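The "one or the other" contract for these parameters can be sketched as a small validation helper. normalize_log_args is a hypothetical function written for illustration; it is not part of whylogs, and the real method may handle invalid combinations differently (for example by ignoring them rather than raising).

```python
from typing import Any, Dict, Optional


def normalize_log_args(
    features: Optional[Dict[str, Any]] = None,
    feature_name: Optional[str] = None,
    value: Any = None,
) -> Dict[str, Any]:
    """Hypothetical helper: reduce both calling styles to one feature dictionary."""
    single_given = feature_name is not None or value is not None
    if features is not None and single_given:
        # Assumed behaviour: mixing the two styles is rejected.
        raise ValueError("Specify either 'features' or 'feature_name'/'value', not both")
    if features is not None:
        return dict(features)
    if feature_name is None:
        raise ValueError("Specify 'features' or 'feature_name'")
    # A single feature becomes a one-entry dictionary.
    return {feature_name: value}


many = normalize_log_args(features={"age": 42, "country": "NZ"})
single = normalize_log_args(feature_name="age", value=42)
```

Both calling styles reduce to the same shape, a dictionary mapping feature names to values, which is why only one of them can be used per call.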

_get_or_create_logger(self, dataset_name: Optional[str] = None)
_close(self)
whylogs.mlflow.patcher._new_mlflow_conda_env(path=None, additional_conda_deps=None, additional_pip_deps=None, additional_conda_channels=None, install_mlflow=True)
whylogs.mlflow.patcher._new_add_to_model(model, loader_module, data=None, code=None, env=None, **kwargs)

Replaces MLflow’s original add_to_model: https://github.com/mlflow/mlflow/blob/4e68f960d4520ade6b64a28c297816f622adc83e/mlflow/pyfunc/__init__.py#L242

Accepts the same signature as MLflow’s original add_to_model call. We inject our loader module.

We also inject whylogs into the Conda environment by patching _mlflow_conda_env.

Parameters
  • model – Existing model.

  • loader_module – The module to be used to load the model.

  • data – Path to the model data.

  • code – Path to the code dependencies.

  • env – Conda environment.

  • kwargs – Additional key-value pairs to include in the pyfunc flavor specification. Values must be YAML-serializable.

Returns

Updated model configuration.

whylogs.mlflow.patcher.WHYLOG_YAML = '.whylogs.yaml'
whylogs.mlflow.patcher.new_model_log(**kwargs)

Hijack the mlflow.models.Model.log method and upload the .whylogs.yaml configuration to the model path. This allows the configuration to be picked up later under the /opt/ml/model/.whylogs.yaml path.

whylogs.mlflow.patcher.enable_mlflow() → bool

Enable whylogs in mlflow module via mlflow.whylogs.

Returns

True if MLflow has been patched, False otherwise.

Example of whylogs and MLflow
import mlflow
import pandas as pd
import whylogs

# Patch MLflow so that mlflow.whylogs becomes available
whylogs.enable_mlflow()

# Use the builtin object dtype; the np.object alias was removed in NumPy 1.24
pdf = pd.DataFrame(
    data=[[1, 2, 3, 4, True, "x", bytes([1])]],
    columns=["b", "d", "a", "c", "e", "g", "f"],
    dtype=object,
)

active_run = mlflow.start_run()

# log a Pandas dataframe under default name
mlflow.whylogs.log_pandas(pdf)

# log a Pandas dataframe with custom name
mlflow.whylogs.log_pandas(pdf, "another dataset")

# Finish the MLflow run
mlflow.end_run()
whylogs.mlflow.patcher.disable_mlflow()