whylogs.viz.notebook_profile_viz
Module Contents
Classes
NotebookProfileVisualizer | Visualize and compare profiles for drift detection, data quality, distribution comparison and feature statistics.
Attributes
- whylogs.viz.notebook_profile_viz.logger
- class whylogs.viz.notebook_profile_viz.NotebookProfileVisualizer
Visualize and compare profiles for drift detection, data quality, distribution comparison and feature statistics.
NotebookProfileVisualizer provides visualizations for Jupyter Notebook environments and can also save the generated reports as HTML files.
Examples
Create target and reference dataframes:
import pandas as pd

# Target data: the dataset to be inspected.
data_target = {
    "animal": ["cat", "hawk", "snake", "cat", "snake", "cat", "cat", "snake", "hawk", "cat"],
    "legs": [4, 2, 0, 4, 0, 4, 4, 0, 2, 4],
    "weight": [4.3, None, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3],
}

# Reference data: the baseline to compare against.
data_reference = {
    "animal": ["hawk", "hawk", "snake", "hawk", "snake", "snake", "cat", "snake", "hawk", "snake"],
    "legs": [2, 2, 0, 2, 0, 0, 4, 0, 2, 0],
    "weight": [2.7, None, 1.2, 10.5, 2.2, 4.6, 3.8, 4.7, 0.6, 11.2],
}

target_df = pd.DataFrame(data_target)
reference_df = pd.DataFrame(data_reference)
Log data and create profile views:
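import whylogs as why

# Profile the target data and get its profile view.
results = why.log(pandas=target_df)
prof_view = results.view()

# Profile the reference data and get its profile view.
results_ref = why.log(pandas=reference_df)
prof_view_ref = results_ref.view()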
Instantiate and set profile views:
from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=prof_view, reference_profile_view=prof_view_ref)
- add_drift_config(column_names: List[str], algorithm: whylogs.viz.drift.column_drift_algorithms.ColumnDriftAlgorithm) -> None
Add drift configuration. The algorithms and thresholds added through this method are used to calculate drift scores in the summary_drift_report() method. A configuration added here overrides the default behavior for the columns it applies to. If a column has multiple configurations defined, the last one defined is used.
- Parameters
column_names (List[str]) – Names of the columns the configuration applies to.
algorithm (whylogs.viz.drift.column_drift_algorithms.ColumnDriftAlgorithm) – Drift algorithm used to calculate drift scores for those columns.
- Return type
None
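The override rule described above (the last configuration defined for a column wins, and columns without a configuration keep the default behavior) can be sketched in plain Python. This only illustrates the semantics and is not whylogs code: `DriftConfigRegistry`, `resolve`, and the string labels standing in for algorithms are hypothetical names.

```python
from typing import Dict, List


class DriftConfigRegistry:
    """Hypothetical stand-in that mimics the 'last configuration wins' rule."""

    def __init__(self, default_algorithm: str = "default") -> None:
        self._default = default_algorithm
        self._per_column: Dict[str, str] = {}

    def add_drift_config(self, column_names: List[str], algorithm: str) -> None:
        # A later call for the same column replaces the earlier entry.
        for name in column_names:
            self._per_column[name] = algorithm

    def resolve(self, column_name: str) -> str:
        # Columns without an explicit configuration keep the default behavior.
        return self._per_column.get(column_name, self._default)


registry = DriftConfigRegistry()
registry.add_drift_config(["weight", "legs"], algorithm="ks")
registry.add_drift_config(["weight"], algorithm="hellinger")  # overrides "ks" for weight

resolved = {name: registry.resolve(name) for name in ["weight", "legs", "animal"]}
print(resolved)
```

In this sketch, `weight` ends up with the second configuration, `legs` keeps the first, and `animal` falls back to the default.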
- set_profiles(target_profile_view: whylogs.core.view.dataset_profile_view.DatasetProfileView, reference_profile_view: Optional[whylogs.core.view.dataset_profile_view.DatasetProfileView] = None) -> None
Set profiles for visualization and comparison.
Drift is calculated only if both target_profile_view and reference_profile_view are passed.
- Parameters
target_profile_view (DatasetProfileView, required) – Target profile to visualize.
reference_profile_view (DatasetProfileView, optional) – Reference, or baseline, profile to be compared against the target profile.
- Return type
None
- profile_summary(cell_height: Optional[str] = None) -> IPython.core.display.HTML
- Parameters
cell_height (Optional[str]) –
- Return type
IPython.core.display.HTML
- summary_drift_report(height: Optional[str] = None) -> IPython.core.display.HTML
Generate drift report between target and reference profiles.
KS is calculated if a distribution metric exists for the column. Otherwise, Chi2 is calculated if the frequent items, cardinality and count metrics exist. Otherwise, no drift value is associated with the column. A feature that is missing from either profile is not included in the report. Both target_profile_view and reference_profile_view must be set beforehand with set_profiles. For custom drift behavior, call add_drift_config before calling this method.
- Parameters
height (str, optional) – Preferred height, in pixels, for in-notebook visualization. Example: “1000px”. (Default is None)
- Returns
HTML Page of the given plot.
- Return type
HTML
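The default test selection described for summary_drift_report (KS first, Chi2 as a fallback, no score otherwise, and features missing from either profile left out) can be sketched as a small decision function. This is an illustration under assumptions, not the whylogs implementation: the metric labels, `pick_drift_test`, and `plan_report` are hypothetical, and the sketch assumes a metric must be present in both profiles to count.

```python
from typing import Dict, Optional, Set

# Hypothetical metric labels; the real metric names in whylogs may differ.
DISTRIBUTION = "distribution"
CHI2_REQUIRED = {"frequent_items", "cardinality", "counts"}


def pick_drift_test(metrics: Set[str]) -> Optional[str]:
    """Return the test to use for one column, or None if no score is produced."""
    if DISTRIBUTION in metrics:
        return "ks"
    if CHI2_REQUIRED <= metrics:
        return "chi2"
    return None


def plan_report(
    target: Dict[str, Set[str]], reference: Dict[str, Set[str]]
) -> Dict[str, Optional[str]]:
    """Map each column present in both profiles to its drift test."""
    plan: Dict[str, Optional[str]] = {}
    for column in sorted(target.keys() & reference.keys()):
        # Assumption: only metrics available in both profiles are usable.
        plan[column] = pick_drift_test(target[column] & reference[column])
    return plan


target_metrics = {
    "weight": {"distribution", "counts", "cardinality"},
    "animal": {"frequent_items", "cardinality", "counts"},
    "notes": {"counts"},
    "extra": {"distribution"},  # not in the reference, so it is skipped
}
reference_metrics = {
    "weight": {"distribution", "counts", "cardinality"},
    "animal": {"frequent_items", "cardinality", "counts"},
    "notes": {"counts"},
}

plan = plan_report(target_metrics, reference_metrics)
print(plan)
```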
Examples
Generate Summary Drift Report (after setting profiles with set_profiles):
visualization.summary_drift_report()
- double_histogram(feature_name: str, cell_height: Optional[str] = None) -> IPython.core.display.HTML
Plot overlaid histograms for the specified feature, present in both the target and reference profiles.
Applicable to numerical features only. If the reference profile was not set, double_histogram plots a single histogram for the target profile.
- Parameters
feature_name (str) – Name of the feature to plot.
cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. (Default is None)
- Return type
IPython.core.display.HTML
Examples
Generate double histogram plot for feature named weight (after setting profiles with set_profiles):
visualization.double_histogram(feature_name="weight")
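To make the idea of overlaid histograms concrete, the sketch below bins the two weight columns from the example data onto one shared set of bin edges, which is what makes the two histograms comparable. It is a plain-Python illustration only: whylogs builds its histograms from profile sketches rather than raw values, and `shared_bin_histograms` and the choice of five bins are assumptions made here.

```python
from typing import List, Sequence, Tuple


def shared_bin_histograms(
    target: Sequence[float], reference: Sequence[float], bins: int = 5
) -> Tuple[List[float], List[int], List[int]]:
    """Count both samples against the same equal-width bin edges."""
    lo = min(min(target), min(reference))
    hi = max(max(target), max(reference))
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal

    def bin_counts(values: Sequence[float]) -> List[int]:
        counts = [0] * bins
        for value in values:
            # The maximum value is clamped into the last bin.
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
        return counts

    edges = [lo + i * width for i in range(bins + 1)]
    return edges, bin_counts(target), bin_counts(reference)


# Weight values from the example data, with missing entries dropped.
target_weights = [4.3, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3]
reference_weights = [2.7, 1.2, 10.5, 2.2, 4.6, 3.8, 4.7, 0.6, 11.2]

edges, target_hist, reference_hist = shared_bin_histograms(target_weights, reference_weights)
print(edges)
print(target_hist)
print(reference_hist)
```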
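- distribution_chart(feature_name: str, cell_height: Optional[str] = None) -> IPython.core.display.HTML
Plot overlaid distribution charts for the specified feature between two profiles.
Applicable to categorical features. If the reference profile was not set, distribution_chart plots a single chart for the target profile.
- Parameters
feature_name (str) – Name of the feature to plot.
cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. (Default is None)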
- Returns
HTML Page of the given plot.
- Return type
HTML
Examples
Generate distribution chart for animal feature (after setting profiles with set_profiles):
visualization.distribution_chart(feature_name="animal")
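- difference_distribution_chart(feature_name: str, cell_height: Optional[str] = None) -> IPython.core.display.HTML
Plot overlaid distribution charts of the differences between the categories of both profiles.
Applicable to categorical features.
- Parameters
feature_name (str) – Name of the feature to plot.
cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. (Default is None)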
- Returns
HTML Page of the given plot.
- Return type
HTML
Examples
Generate Difference Distribution Chart for feature named “animal”:
visualization.difference_distribution_chart(feature_name="animal")
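As a rough picture of what a per-category difference looks like, the sketch below counts the animal values from the example data and subtracts the reference counts from the target counts. It is not how whylogs computes the chart: the library works from frequent-item estimates stored in the profiles, and the sign convention (target minus reference) is an assumption made here.

```python
from collections import Counter

target_animals = ["cat", "hawk", "snake", "cat", "snake", "cat", "cat", "snake", "hawk", "cat"]
reference_animals = ["hawk", "hawk", "snake", "hawk", "snake", "snake", "cat", "snake", "hawk", "snake"]

target_counts = Counter(target_animals)
reference_counts = Counter(reference_animals)

# Use every category seen in either dataset; Counter returns 0 for absent keys.
categories = sorted(set(target_counts) | set(reference_counts))
differences = {name: target_counts[name] - reference_counts[name] for name in categories}
print(differences)
```

A positive value means the category is more frequent in the target data, a negative value that it is more frequent in the reference data.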
- constraints_report(constraints: whylogs.core.constraints.Constraints, cell_height: Optional[str] = None) -> IPython.core.display.HTML
- Parameters
constraints (whylogs.core.constraints.Constraints) –
cell_height (Optional[str]) –
- Return type
IPython.core.display.HTML
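- feature_statistics(feature_name: str, profile: str = 'reference', cell_height: Optional[str] = None) -> IPython.core.display.HTML
Generate a report of the main statistics of the specified feature for a given profile (target or reference).
Statistics include overall metrics such as distinct and missing values, as well as quantile and descriptive statistics. If profile is not passed, the default is the reference profile.
- Parameters
feature_name (str) – Name of the feature to generate statistics for.
profile (str) – Profile to report on, either "target" or "reference". (Default is "reference")
cell_height (str, optional) – Preferred cell height, in pixels, for in-notebook visualization. (Default is None)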
- Return type
IPython.core.display.HTML
Examples
Generate feature statistics for feature named “weight”, for target profile:
visualization.feature_statistics(feature_name="weight", profile="target")
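For a sense of the kinds of figures such a report contains, the sketch below computes comparable statistics for the target weight column directly from the raw example values using only the standard library. The values in an actual report are derived from the profile and may be approximations, so they need not match these exactly; the dictionary keys are names chosen for this illustration.

```python
import statistics

# Target "weight" column from the example data; None marks a missing value.
weights = [4.3, None, 2.3, 7.8, 3.7, 2.5, 5.5, 3.3, 0.6, 13.3]
present = [w for w in weights if w is not None]

summary = {
    "count": len(weights),
    "missing": len(weights) - len(present),
    "distinct": len(set(present)),
    "min": min(present),
    "max": max(present),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "stddev": statistics.stdev(present),
    "quartiles": statistics.quantiles(present, n=4),  # three cut points
}

for name, value in summary.items():
    print(f"{name}: {value}")
```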
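- static write(rendered_html: IPython.core.display.HTML, preferred_path: Optional[str] = None, html_file_name: Optional[str] = None) -> None
Create HTML file for a given report.
- Parameters
rendered_html (IPython.core.display.HTML) – Rendered report, as returned by one of the visualization methods.
preferred_path (str, optional) – Directory in which to create the HTML file. (Default is None)
html_file_name (str, optional) – Name for the created HTML file. (Default is None)
- Return type
None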
Examples
Write an HTML page named test.html to the current working directory, containing the feature statistics of the weight feature for the target profile:
import os

visualization.write(
    rendered_html=visualization.feature_statistics(feature_name="weight", profile="target"),
    html_file_name=os.path.join(os.getcwd(), "test"),
)
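The example above suggests that the .html extension is appended to html_file_name, since passing ".../test" yields test.html. The sketch below shows that naming pattern with a standalone helper that writes an HTML string to disk. It is an inference from the example rather than the library's code: `save_report` and its parameters are hypothetical, and it takes a plain string instead of an IPython HTML object.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_report(html_text: str, directory: Optional[str] = None, file_name: str = "report") -> Path:
    """Hypothetical helper: write html_text to <directory>/<file_name>.html."""
    target_dir = Path(directory) if directory else Path.cwd()
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{file_name}.html"  # extension appended, as the example implies
    path.write_text(html_text, encoding="utf-8")
    return path


# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_report("<html><body>report</body></html>", directory=tmp_dir, file_name="test")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name)
```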