whylogs.viz.drift.column_drift_algorithms

Module Contents

Classes

DriftAlgorithmScore

Dataclass for storing a drift algorithm score.

ColumnDriftAlgorithm

Abstract class for column drift algorithms.

Hellinger

Hellinger distance algorithm for column drift detection.

ChiSquare

Chi-Squared test algorithm for column drift detection.

KS

Kolmogorov-Smirnov test algorithm for column drift detection.

Functions

calculate_drift_scores(...) → Dict[str, Optional[Dict[str, ...]]]

Calculate drift scores for all columns in the target dataset profile.

class whylogs.viz.drift.column_drift_algorithms.DriftAlgorithmScore

Dataclass for storing a drift algorithm score.

algorithm: str
pvalue: Optional[float]
statistic: Optional[float]
thresholds: Optional[whylogs.viz.drift.configs.DriftThresholds]
drift_category: Optional[str]
to_dict()
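To make the field layout concrete, here is a minimal stand-in dataclass with the same five fields. It is a sketch, not the whylogs class: the name ScoreSketch is hypothetical, thresholds is typed loosely instead of as DriftThresholds, and to_dict() is assumed to behave like dataclasses.asdict.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ScoreSketch:
    """Hypothetical stand-in mirroring the documented DriftAlgorithmScore fields."""

    algorithm: str
    pvalue: Optional[float] = None
    statistic: Optional[float] = None
    thresholds: Optional[Any] = None  # a DriftThresholds object in whylogs
    drift_category: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: the real to_dict() may differ in key names or nesting.
        return asdict(self)


score = ScoreSketch(algorithm="ks", pvalue=0.42, statistic=0.08)
print(score.to_dict())
# {'algorithm': 'ks', 'pvalue': 0.42, 'statistic': 0.08, 'thresholds': None, 'drift_category': None}
```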
class whylogs.viz.drift.column_drift_algorithms.ColumnDriftAlgorithm(parameter_config: Optional[Any] = None)

Bases: abc.ABC

Abstract class for column drift algorithms.

Parameters

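parameter_config (Optional[Any]) – Parameter configuration for the algorithm, including the thresholds used to assign a drift category.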

abstract calculate(target_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, reference_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, with_thresholds: bool) → Optional[DriftAlgorithmScore]

Calculates drift score for a given column.

If with_thresholds is True, the thresholds defined in the parameter config are also returned, along with the final drift category.

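Parameters
  • target_column_view (ColumnProfileView) – Column view of the target profile

  • reference_column_view (ColumnProfileView) – Column view of the reference profile

  • with_thresholds (bool) – If True, the thresholds defined in the parameter config and the final drift category are included in the returned DriftAlgorithmScore.

Return type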

Optional[DriftAlgorithmScore]

abstract set_parameters(parameter_config: Any)
Parameters

parameter_config (Any) – Parameter configuration to set for the algorithm.
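The contract above (calculate() returns a score or None, set_parameters() swaps the configuration) can be sketched with Python's abc module. Everything here is hypothetical: DriftAlgorithmSketch, MeanShift and SimpleScore are illustrative names, and plain lists of floats replace ColumnProfileView objects so the example runs without whylogs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SimpleScore:
    algorithm: str
    statistic: Optional[float] = None
    pvalue: Optional[float] = None


class DriftAlgorithmSketch(ABC):
    """Hypothetical mirror of the ColumnDriftAlgorithm interface (lists stand in for column views)."""

    def __init__(self, parameter_config: Optional[Any] = None):
        # Sketch design choice: route the initial config through set_parameters().
        self.set_parameters(parameter_config)

    @abstractmethod
    def calculate(
        self, target: List[float], reference: List[float], with_thresholds: bool
    ) -> Optional[SimpleScore]:
        ...

    @abstractmethod
    def set_parameters(self, parameter_config: Any) -> None:
        ...


class MeanShift(DriftAlgorithmSketch):
    """Toy algorithm: absolute difference of means (not a whylogs algorithm)."""

    def set_parameters(self, parameter_config: Any) -> None:
        self.parameter_config = parameter_config

    def calculate(
        self, target: List[float], reference: List[float], with_thresholds: bool = False
    ) -> Optional[SimpleScore]:
        # Mirror the documented contract: return None when the inputs are unusable.
        if not target or not reference:
            return None
        diff = abs(sum(target) / len(target) - sum(reference) / len(reference))
        return SimpleScore(algorithm="mean_shift", statistic=diff)


print(MeanShift().calculate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]).statistic)  # 1.0
```

Because both methods are abstract, instantiating DriftAlgorithmSketch directly raises TypeError; only subclasses that implement both methods can be created.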

class whylogs.viz.drift.column_drift_algorithms.Hellinger(parameter_config: Optional[whylogs.viz.drift.configs.HellingerConfig] = None)

Bases: ColumnDriftAlgorithm

Hellinger distance algorithm for column drift detection.

Requires the target and reference columns to have non-empty distribution metrics. The statistic is the Hellinger distance between the two distributions, which takes values between 0 (identical distributions) and 1.

Parameters

parameter_config (Optional[whylogs.viz.drift.configs.HellingerConfig]) – Hellinger parameter configuration, including the thresholds used to assign a drift category.
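For discrete distributions P and Q over the same bins, the Hellinger distance is H(P, Q) = sqrt(1 - Σ sqrt(p_i q_i)), which is 0 for identical distributions and 1 for distributions with disjoint support. The sketch below computes it from raw samples binned into shared equal-width bins. whylogs itself works from the distribution metrics in the column profiles, so the binning here is an assumption for illustration, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def shared_edges(a: Sequence[float], b: Sequence[float], n_bins: int = 10) -> List[float]:
    """Equal-width bin edges spanning both samples (hypothetical helper)."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / n_bins
    # Use hi itself as the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(n_bins)] + [hi]


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions over the same bins."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp small negative rounding errors


target = [0.1, 0.2, 0.8, 0.9]
reference = [0.1, 0.2, 0.3, 0.9]
edges = shared_edges(target, reference, n_bins=2)
print(hellinger(histogram(target, edges), histogram(reference, edges)))
```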

calculate(target_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, reference_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, with_thresholds=False) → Optional[DriftAlgorithmScore]

Calculates drift score for a given column.

Parameters
  • target_column_view (ColumnProfileView) – Column view of the target profile

  • reference_column_view (ColumnProfileView) – Column view of the reference profile

  • with_thresholds (bool, optional) – By default False. If True, the thresholds defined in the parameter config are also returned in the DriftAlgorithmScore object, along with the final drift category.

Returns

Returns a DriftAlgorithmScore object whose statistic is the Hellinger distance. If with_thresholds is True, it also contains the thresholds defined in the parameter config and the final drift category, which is determined by comparing the distance against those thresholds.

Return type

Optional[DriftAlgorithmScore]

abstract set_parameters(parameter_config: Any)
Parameters

parameter_config (Any) – Parameter configuration to set for the algorithm.

class whylogs.viz.drift.column_drift_algorithms.ChiSquare(parameter_config: Optional[whylogs.viz.drift.configs.ChiSquareConfig] = None)

Bases: ColumnDriftAlgorithm

Chi-Squared test algorithm for column drift detection.

Parameters

parameter_config (Optional[whylogs.viz.drift.configs.ChiSquareConfig]) – Chi-squared parameter configuration, including the thresholds used to assign a drift category.
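As a rough illustration of a chi-squared comparison, the sketch below computes Pearson's statistic Σ (O - E)² / E. It takes the observed counts O from the target and derives the expected counts E from the reference proportions scaled to the target size. How whylogs builds the counts is not described here, so treat the count construction, the handling of target-only categories, and the function name as assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def chi_square_statistic(target: Sequence[str], reference: Sequence[str]) -> Tuple[float, int]:
    """Pearson chi-squared statistic of target counts against reference proportions.

    Returns (statistic, degrees_of_freedom). Categories that appear only in the
    target are ignored here, since their expected count would be zero; this is a
    simplification for illustration.
    """
    observed = Counter(target)
    ref_counts = Counter(reference)
    n_target, n_ref = len(target), len(reference)
    stat = 0.0
    for category, ref_count in ref_counts.items():
        expected = ref_count / n_ref * n_target  # reference share scaled to target size
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    return stat, len(ref_counts) - 1


stat, dof = chi_square_statistic(["a"] * 30 + ["b"] * 70, ["a"] * 50 + ["b"] * 50)
print(stat, dof)  # 16.0 1
```

A p-value would then come from the chi-squared survival function with the returned degrees of freedom, for example scipy.stats.chi2.sf(stat, dof).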

calculate(target_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, reference_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, with_thresholds=False) → Optional[DriftAlgorithmScore]

Calculates drift score for a given column.

If with_thresholds is True, the thresholds defined in the parameter config are also returned, along with the final drift category.

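Parameters
  • target_column_view (ColumnProfileView) – Column view of the target profile

  • reference_column_view (ColumnProfileView) – Column view of the reference profile

  • with_thresholds (bool, optional) – By default False. If True, the thresholds defined in the parameter config are also returned in the DriftAlgorithmScore object, along with the final drift category.

Return type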

Optional[DriftAlgorithmScore]

abstract set_parameters(parameter_config: Any)
Parameters

parameter_config (Any) – Parameter configuration to set for the algorithm.

class whylogs.viz.drift.column_drift_algorithms.KS(parameter_config: Optional[whylogs.viz.drift.configs.KSTestConfig] = None)

Bases: ColumnDriftAlgorithm

Kolmogorov-Smirnov test algorithm for column drift detection.

Parameters

parameter_config (Optional[whylogs.viz.drift.configs.KSTestConfig]) – KS test parameter configuration, including the thresholds used to assign a drift category.
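The two-sample KS statistic is the largest vertical gap between the two empirical CDFs, D = max_x |F_target(x) - F_reference(x)|. The sketch below computes D from raw samples. whylogs instead works from the distribution metrics in the column views, and ks_statistic is a hypothetical helper name. Because both empirical CDFs jump only at observed values, checking those values is enough to find the maximum.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(target: Sequence[float], reference: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    t, r = sorted(target), sorted(reference)
    d = 0.0
    for x in t + r:
        f_t = bisect_right(t, x) / len(t)  # share of target values <= x
        f_r = bisect_right(r, x) / len(r)  # share of reference values <= x
        d = max(d, abs(f_t - f_r))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

For a p-value on raw samples, scipy.stats.ks_2samp returns both the statistic and the p-value.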

calculate(target_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, reference_column_view: whylogs.core.view.column_profile_view.ColumnProfileView, with_thresholds=False) → Optional[DriftAlgorithmScore]

Computes the Kolmogorov-Smirnov test for the two distributions. Requires the target and reference column views to have a distribution metric.

Parameters
  • target_column_view (ColumnProfileView) – Column view of the target profile

  • reference_column_view (ColumnProfileView) – Column view of the reference profile

  • with_thresholds (bool, optional) – By default False. If True, the thresholds defined in the parameter config are also returned in the DriftAlgorithmScore object, along with the final drift category.

Returns

Returns a DriftAlgorithmScore object containing the p-value and the KS statistic. If with_thresholds is True, it also contains the thresholds defined in the parameter config and the final drift category. The drift category is determined by the p-value and the thresholds defined in the parameter config.

Return type

Optional[DriftAlgorithmScore]
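The drift category described above comes from checking which threshold range the p-value falls into. The sketch below assumes three hypothetical categories with made-up p-value ranges; the real DriftThresholds fields and default values are not documented here and may differ. For a distance-based statistic such as the Hellinger distance, larger values mean more drift, so the ranges would apply to the distance and be read the other way round.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThresholdsSketch:
    """Hypothetical p-value ranges per category; not the whylogs DriftThresholds defaults."""

    DRIFT: Tuple[float, float] = (0.0, 0.05)
    POSSIBLE_DRIFT: Tuple[float, float] = (0.05, 0.15)
    NO_DRIFT: Tuple[float, float] = (0.15, 1.0)


def drift_category(pvalue: float, thresholds: ThresholdsSketch) -> str:
    """Return the name of the range containing pvalue (lower bound inclusive)."""
    for name in ("DRIFT", "POSSIBLE_DRIFT", "NO_DRIFT"):
        low, high = getattr(thresholds, name)
        # Upper bounds are exclusive, except that a p-value of exactly 1.0 is allowed.
        if low <= pvalue < high or pvalue == high == 1.0:
            return name
    raise ValueError(f"p-value {pvalue} is not covered by any threshold range")


print(drift_category(0.01, ThresholdsSketch()))  # DRIFT
```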

set_parameters(parameter_config: Any)
Parameters

parameter_config (Any) – Parameter configuration to set for the algorithm.

whylogs.viz.drift.column_drift_algorithms.calculate_drift_scores(target_view: whylogs.core.view.dataset_profile_view.DatasetProfileView, reference_view: whylogs.core.view.dataset_profile_view.DatasetProfileView, drift_map: Optional[Dict[str, ColumnDriftAlgorithm]] = None, with_thresholds=False) → Dict[str, Optional[Dict[str, Any]]]

Calculate drift scores for all columns in the target dataset profile.

If a drift map is provided, the drift algorithm for each column in the map is taken from the map. Columns not in the map, or all columns when no map is provided, use the default drift algorithm selection logic. If a column lacks the metrics required by the selected algorithm, None is returned for that column. For example, selecting KS or Hellinger for a column with string values yields None.

If with_thresholds is True, each drift score also includes the configured algorithm’s thresholds and the resulting drift category.

Returns a dictionary mapping each column name to its drift score (as a dictionary), or to None when no score could be computed.

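Parameters
  • target_view (DatasetProfileView) – Profile view of the target dataset

  • reference_view (DatasetProfileView) – Profile view of the reference dataset

  • drift_map (Optional[Dict[str, ColumnDriftAlgorithm]]) – Optional mapping of column names to the drift algorithm to use for each column

  • with_thresholds (bool, optional) – By default False. If True, the thresholds of the configured algorithms and the resulting drift categories are included in the scores.

Return type

Dict[str, Optional[Dict[str, Any]]]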
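The per-column selection described for calculate_drift_scores can be sketched as follows. This is a hypothetical model, not the whylogs implementation. Columns are represented only by the names of the metrics they carry, and algorithms only by the metric they need. The default rule (KS for columns with a distribution metric, chi-squared otherwise) is an assumption. The sketch returns the chosen algorithm name per column, or None when the column lacks the required metric.

```python
from typing import Dict, Optional, Set

# Hypothetical: the metric each algorithm needs (chi-squared's requirement is an assumption).
REQUIRED_METRIC = {
    "ks": "distribution",
    "hellinger": "distribution",
    "chi_square": "frequent_items",
}


def default_algorithm(metrics: Set[str]) -> str:
    # Assumed default rule for illustration: KS if a distribution metric exists, else chi-squared.
    return "ks" if "distribution" in metrics else "chi_square"


def select_algorithms(
    columns: Dict[str, Set[str]], drift_map: Optional[Dict[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Pick an algorithm per column: drift_map first, then the default rule."""
    drift_map = drift_map or {}
    chosen: Dict[str, Optional[str]] = {}
    for name, metrics in columns.items():
        algo = drift_map.get(name) or default_algorithm(metrics)
        # None when the column lacks the metric the selected algorithm requires.
        chosen[name] = algo if REQUIRED_METRIC[algo] in metrics else None
    return chosen


columns = {"age": {"counts", "distribution"}, "city": {"counts", "frequent_items"}}
print(select_algorithms(columns, drift_map={"city": "hellinger"}))
# {'age': 'ks', 'city': None}
```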