whylogs.viz

Package Contents

Classes

SummaryDriftReport

NotebookProfileVisualizer

Visualize and compare profiles for drift detection, data quality, distribution comparison and feature statistics.

class whylogs.viz.SummaryDriftReport(ref_view: whylogs.DatasetProfileView, target_view: whylogs.DatasetProfileView, height: Optional[str] = None)

Bases: whylogs.viz.extensions.reports.html_report.HTMLReport

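Parameters
  • ref_view (whylogs.DatasetProfileView) – Reference, or baseline, profile view.

  • target_view (whylogs.DatasetProfileView) – Target profile view to compare against the reference.

  • height (str, optional) – Preferred height for the rendered report. Example: "1000px". (Default is None)

report() → str

Return type

str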

add_drift_config(column_names: List[str], algorithm: whylogs.viz.drift.column_drift_algorithms.ColumnDriftAlgorithm) → None

Add drift configuration. The algorithms and thresholds added through this method are used to calculate drift scores in the summary_drift_report() method. When a configuration applies to a column, it overrides the standard drift behavior for that column. If a column has multiple configurations defined, the last one defined is used.

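Parameters
  • column_names (List[str]) – Names of the columns the configuration applies to.

  • algorithm (ColumnDriftAlgorithm) – Drift algorithm, with its thresholds, to use for those columns.

Return type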

None
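The override rule above can be illustrated with a small, library-independent model. In this sketch, DriftConfigRegistry and algorithm_for are hypothetical names, and plain strings stand in for ColumnDriftAlgorithm instances; it mirrors only the documented "last configuration wins" behavior, not the whylogs internals.

```python
from typing import Dict, List


class DriftConfigRegistry:
    """Hypothetical model of per-column drift configuration (not whylogs code)."""

    def __init__(self) -> None:
        # Column name -> algorithm label configured for that column.
        self._by_column: Dict[str, str] = {}

    def add_drift_config(self, column_names: List[str], algorithm: str) -> None:
        for name in column_names:
            # A later call for the same column overwrites the earlier entry.
            self._by_column[name] = algorithm

    def algorithm_for(self, column: str, default: str = "default") -> str:
        # Columns with no custom configuration keep the standard behavior.
        return self._by_column.get(column, default)


registry = DriftConfigRegistry()
registry.add_drift_config(["weight", "legs"], "hellinger")
registry.add_drift_config(["weight"], "ks")

print(registry.algorithm_for("weight"))  # ks
print(registry.algorithm_for("legs"))    # hellinger
print(registry.algorithm_for("animal"))  # default
```

A dictionary keyed by column name gives the overwrite semantics for free: each new configuration for a column simply replaces the previous value.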

class whylogs.viz.NotebookProfileVisualizer

Visualize and compare profiles for drift detection, data quality, distribution comparison and feature statistics.

NotebookProfileVisualizer provides visualizations for Jupyter Notebook environments and can also write the generated reports to HTML files.

Examples

Create target and reference dataframes:

import pandas as pd

data_target = {
    "animal": ["cat", "hawk", "snake", "cat", "snake", "cat", "cat", "snake", "hawk", "cat"],
    "legs": [4, 2, 0, 4, 0, 4, 4, 0, 2, 4],
    "weight": [4.3, None, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3],
}

data_reference = {
    "animal": ["hawk", "hawk", "snake", "hawk", "snake", "snake", "cat", "snake", "hawk", "snake"],
    "legs": [2, 2, 0, 2, 0, 0, 4, 0, 2, 0],
    "weight": [2.7, None, 1.2, 10.5, 2.2, 4.6, 3.8, 4.7, 0.6, 11.2],
}

target_df = pd.DataFrame(data_target)
reference_df = pd.DataFrame(data_reference)

Log data and create profile views:

import whylogs as why

results = why.log(pandas=target_df)
prof_view = results.view()

results_ref = why.log(pandas=reference_df)
prof_view_ref = results_ref.view()

Instantiate and set profile views:

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=prof_view, reference_profile_view=prof_view_ref)
add_drift_config(column_names: List[str], algorithm: whylogs.viz.drift.column_drift_algorithms.ColumnDriftAlgorithm) → None

Add drift configuration. The algorithms and thresholds added through this method are used to calculate drift scores in the summary_drift_report() method. When a configuration applies to a column, it overrides the standard drift behavior for that column. If a column has multiple configurations defined, the last one defined is used.

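Parameters
  • column_names (List[str]) – Names of the columns the configuration applies to.

  • algorithm (ColumnDriftAlgorithm) – Drift algorithm, with its thresholds, to use for those columns.

Return type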

None

set_profiles(target_profile_view: whylogs.core.view.dataset_profile_view.DatasetProfileView, reference_profile_view: Optional[whylogs.core.view.dataset_profile_view.DatasetProfileView] = None) → None

Set profiles for visualization and comparison.

Drift is calculated only if both target_profile_view and reference_profile_view are passed.

Parameters
  • target_profile_view (DatasetProfileView, required) – Target profile to visualize.

  • reference_profile_view (DatasetProfileView, optional) – Reference, or baseline, profile to be compared against the target profile.

Return type

None

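profile_summary(cell_height: Optional[str] = None) → IPython.core.display.HTML
Parameters
  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: cell_height="1000px". (Default is None)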

Return type

IPython.core.display.HTML

summary_drift_report(height: Optional[str] = None) → IPython.core.display.HTML

Generate a drift report between the target and reference profiles.

KS is calculated if distribution metrics exist for a given column. If not, Chi2 is calculated if the frequent items, cardinality and count metrics exist. Otherwise, no drift value is associated with the column. A feature missing from either profile is not included in the report. Both target_profile_view and reference_profile_view must first be set with set_profiles. If custom drift behavior is desired, call add_drift_config before calling this method.
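The selection order described above can be sketched in plain Python. The metric names ("distribution", "frequent_items", "cardinality", "counts") and the helper names select_drift_test and plan_report are assumptions for illustration, as is the rule that a metric must be present in both profiles; the sketch only decides which test would run and does not compute KS or Chi2.

```python
from typing import Dict, Optional, Set

# Assumed metric names; the real whylogs identifiers may differ.
DISTRIBUTION = "distribution"
CHI2_METRICS = {"frequent_items", "cardinality", "counts"}


def select_drift_test(metrics: Set[str]) -> Optional[str]:
    """Return the drift test to run for a column, or None if none applies."""
    if DISTRIBUTION in metrics:
        return "ks"
    if CHI2_METRICS <= metrics:  # all three metrics must be present
        return "chi2"
    return None  # the column gets no drift value


def plan_report(
    target: Dict[str, Set[str]], reference: Dict[str, Set[str]]
) -> Dict[str, Optional[str]]:
    """Map each shared column to its drift test; unshared columns are dropped."""
    shared = sorted(target.keys() & reference.keys())
    # Assumption: a metric is usable only if both profiles carry it.
    return {col: select_drift_test(target[col] & reference[col]) for col in shared}


target = {
    "weight": {"distribution", "counts"},
    "animal": {"frequent_items", "cardinality", "counts"},
    "id": {"counts"},
    "legs": {"distribution", "counts"},
}
reference = {
    "weight": {"distribution", "counts"},
    "animal": {"frequent_items", "cardinality", "counts"},
    "id": {"counts"},
}
print(plan_report(target, reference))
# {'animal': 'chi2', 'id': None, 'weight': 'ks'}
```

Here "legs" is absent from the reference profile, so it is left out of the plan, while "id" is kept but receives no drift value.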

Parameters

height (str, optional) – Preferred height, in pixels, for in-notebook visualization. Example: "1000px". (Default is None)

Returns

HTML page of the drift report.

Return type

HTML

Examples

Generate Summary Drift Report (after setting profiles with set_profiles):

visualization.summary_drift_report()

double_histogram(feature_name: Union[str, List[str]], cell_height: Optional[str] = None) → IPython.core.display.HTML

Plot overlaid histograms for the specified feature, present in both target_profile and reference_profile.

Applicable to numerical features only. If the reference profile was not set, double_histogram plots a single histogram for the target profile.
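To show what overlaying two histograms involves, the sketch below bins raw target and reference values on common, equal-width bin edges so the two histograms line up. The function name overlay_histograms and the equal-width binning are assumptions for illustration; whylogs builds its histograms from profile sketches rather than raw values, so its bins and counts may differ. Missing values are dropped, and when no reference is given the reference counts are None, matching the single-histogram case.

```python
from typing import List, Optional, Sequence, Tuple


def overlay_histograms(
    target: Sequence[Optional[float]],
    reference: Optional[Sequence[Optional[float]]] = None,
    bins: int = 4,
) -> Tuple[List[float], List[int], Optional[List[int]]]:
    """Count target (and optional reference) values on shared equal-width bins."""
    t = [v for v in target if v is not None]  # missing values are not plotted
    r = [v for v in reference if v is not None] if reference is not None else []
    lo, hi = min(t + r), max(t + r)
    width = (hi - lo) / bins or 1.0  # avoid zero-width bins if all values are equal
    edges = [lo + i * width for i in range(bins + 1)]

    def count(values: List[float]) -> List[int]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return counts

    return edges, count(t), (count(r) if reference is not None else None)


target_weight = [4.3, None, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3]
reference_weight = [2.7, None, 1.2, 10.5, 2.2, 4.6, 3.8, 4.7, 0.6, 11.2]
edges, t_counts, r_counts = overlay_histograms(target_weight, reference_weight)
print(t_counts, r_counts)  # [5, 2, 1, 1] [4, 3, 0, 2]
```

Computing the edges from the pooled values is what makes the two sets of bars directly comparable; binning each profile separately would produce misaligned bars.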

Parameters
  • feature_name (str) – Name of the feature for which to generate histograms.

  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: "1000px". (Default is None)

Return type

IPython.core.display.HTML

Examples

Generate a double histogram plot for the feature named weight (after setting profiles with set_profiles):

visualization.double_histogram(feature_name="weight")
distribution_chart(feature_name: Union[str, List[str]], cell_height: Optional[str] = None) → IPython.core.display.HTML

Plot overlaid distribution charts for the specified feature across the two profiles.

Applicable to categorical features. If the reference profile was not set, distribution_chart plots a single chart for the target profile.
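As a rough illustration of the data behind a distribution chart, the sketch below tallies per-category counts for a target column and an optional reference column, over the union of their categories. The function name distribution_chart_data is hypothetical, and it counts raw values, whereas whylogs works from the frequent-items information stored in each profile.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def distribution_chart_data(
    target: Sequence[str], reference: Optional[Sequence[str]] = None
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each category to (target count, reference count or None)."""
    t = Counter(target)
    r = Counter(reference) if reference is not None else None
    categories = sorted(set(t) | (set(r) if r is not None else set()))
    # A Counter returns 0 for a category that one profile never saw.
    return {c: (t[c], r[c] if r is not None else None) for c in categories}


target_animal = ["cat", "hawk", "snake", "cat", "snake", "cat", "cat", "snake", "hawk", "cat"]
reference_animal = ["hawk", "hawk", "snake", "hawk", "snake", "snake", "cat", "snake", "hawk", "snake"]
print(distribution_chart_data(target_animal, reference_animal))
# {'cat': (5, 1), 'hawk': (2, 4), 'snake': (3, 5)}
```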

Parameters
  • feature_name (str) – Name of the feature to chart.

  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: cell_height="1000px". (Default is None)

Returns

HTML Page of the given plot.

Return type

HTML

Examples

Generate distribution chart for animal feature (after setting profiles with set_profiles):

visualization.distribution_chart(feature_name="animal")
difference_distribution_chart(feature_name: Union[str, List[str]], cell_height: Optional[str] = None) → IPython.core.display.HTML

Plot overlaid distribution charts of the differences between the categories of both profiles.

Applicable to categorical features.
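The sketch below shows one way to compute per-category differences between two profiles. It assumes the difference is the target count minus the reference count, which is an illustrative choice rather than the documented whylogs definition; the function name category_differences is hypothetical, and it counts raw values instead of reading profile metrics.

```python
from collections import Counter
from typing import Dict, Sequence


def category_differences(target: Sequence[str], reference: Sequence[str]) -> Dict[str, int]:
    """Per-category count difference, target minus reference (assumed convention)."""
    t, r = Counter(target), Counter(reference)
    # Categories seen in only one profile count as 0 in the other.
    return {c: t[c] - r[c] for c in sorted(set(t) | set(r))}


target_animal = ["cat", "hawk", "snake", "cat", "snake", "cat", "cat", "snake", "hawk", "cat"]
reference_animal = ["hawk", "hawk", "snake", "hawk", "snake", "snake", "cat", "snake", "hawk", "snake"]
print(category_differences(target_animal, reference_animal))
# {'cat': 4, 'hawk': -2, 'snake': -2}
```

Positive values mark categories that are more frequent in the target than in the reference; negative values mark the opposite.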

Parameters
  • feature_name (str) – Name of the feature to chart.

  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: cell_height="1000px". (Default is None)

Returns

HTML Page of the given plot.

Return type

HTML

Examples

Generate a difference distribution chart for the feature named "animal":

visualization.difference_distribution_chart(feature_name="animal")
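constraints_report(constraints: whylogs.core.constraints.Constraints, cell_height: Optional[str] = None) → IPython.core.display.HTML
Parameters
  • constraints (whylogs.core.constraints.Constraints) – Constraints to include in the report.

  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: cell_height="1000px". (Default is None)

Return type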

IPython.core.display.HTML

feature_statistics(feature_name: Union[str, List[str]], profile: str = 'reference', cell_height: Optional[str] = None) → IPython.core.display.HTML

Generate a report for the main statistics of specified feature, for a given profile (target or reference).

Statistics include overall metrics such as distinct and missing values, as well as quantile and descriptive statistics. If profile is not passed, the default is the reference profile.
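The kinds of statistics listed above can be illustrated on raw values with Python's statistics module. This is a sketch under stated assumptions: feature_stats and its output keys are hypothetical, None marks a missing value, and whylogs computes these figures from profile sketches, so its quantiles are approximate and its labels differ.

```python
import statistics
from typing import Dict, Optional, Sequence


def feature_stats(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Overall, quantile, and descriptive statistics for one numeric feature."""
    present = [v for v in values if v is not None]  # None marks a missing value
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    return {
        "n_missing": len(values) - len(present),
        "n_distinct": len(set(present)),
        "min": min(present),
        "q1": q1,
        "median": statistics.median(present),
        "q3": q3,
        "max": max(present),
        "mean": statistics.mean(present),
        "stddev": statistics.stdev(present),  # sample standard deviation
    }


target_weight = [4.3, None, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3]
stats = feature_stats(target_weight)
print(stats["n_missing"], stats["n_distinct"], stats["median"])  # 1 9 3.7
```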

Parameters
  • feature_name (str) – Name of the feature to generate statistics for.

  • profile (str) – Profile to be used to generate the report, either "target" or "reference". (Default is "reference")

  • cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. Example: cell_height="1000px". (Default is None)

Return type

IPython.core.display.HTML

Examples

Generate feature statistics for the feature named "weight", for the target profile:

visualization.feature_statistics(feature_name="weight", profile="target")
static write(rendered_html: IPython.core.display.HTML, preferred_path: Optional[str] = None, html_file_name: Optional[str] = None) → None

Create HTML file for a given report.

Parameters
  • rendered_html (HTML) – Rendered HTML returned by a given report.

  • preferred_path (str, optional) – Preferred path to write the HTML file.

  • html_file_name (str, optional) – Name for the created HTML file. If None is passed, the file is named ProfileVisualizer.html.

Return type

None

Examples

Writes an HTML page named test.html into the current working directory, containing the feature statistics of the weight feature for the target profile.

import os

visualization.write(
    rendered_html=visualization.feature_statistics(feature_name="weight", profile="target"),
    # The .html extension is appended to the given name, producing test.html.
    html_file_name=os.path.join(os.getcwd(), "test"),
)