`whylogs`#

whylogs is an open source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to:

Track changes in their dataset
Create data constraints to know whether their data looks the way it should
Quickly visualize key summary statistics about their datasets

These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers:

Detecting data drift (and resultant ML model performance degradation)
Data quality validation
Exploratory data analysis via data profiling
Tracking data for ML experiments
And many more…

Subpackages#

Package Contents#

Classes#

`ResultSet`	A holder object for profiling results.
`DatasetProfileView`	Helper class that provides a standard way to create an ABC using inheritance.

Functions#

`log`(→ result_set.ResultSet)
`log_classification_metrics`(→ result_set.ResultSet)	Function to track metrics based on validation data.
`log_regression_metrics`(→ result_set.ResultSet)	Function to track regression metrics based on validation data.
`profiling`(*[, schema])
`read`(→ result_set.ResultSet)
`write`(→ None)
`init`(→ whylogs.api.whylabs.session.session.Session)	Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.
`v0_to_v1_view`(→ whylogs.core.DatasetProfileView)
`package_version`(→ str)	Calculate version number based on pyproject.toml

class whylogs.ResultSet#

Bases: abc.ABC

A holder object for profiling results.

A whylogs.log call can result in more than one profile. This wrapper class simplifies the navigation among these profiles.

Note that currently we only hold one profile but we’re planning to add other kinds of profiles such as segmented profiles here.

property metadata: Optional[Dict[str, str]]#

Return type: Optional[Dict[str, str]]

property count: int#

Return type: int

property performance_metrics: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]#

Return type: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]

static read(multi_profile_file: str) → ResultSet#

Parameters: multi_profile_file (str) –
Return type: ResultSet

static reader(name: str = 'local') → ResultSetReader#

Parameters: name (str) –
Return type: ResultSetReader

writer(name: str = 'local') → ResultSetWriter#

Parameters: name (str) –
Return type: ResultSetWriter

abstract view() → Optional[whylogs.core.DatasetProfileView]#

Return type: Optional[whylogs.core.DatasetProfileView]

abstract profile() → Optional[whylogs.core.DatasetProfile]#

Return type: Optional[whylogs.core.DatasetProfile]

get_writables() → Optional[List[whylogs.api.writer.writer.Writable]]#

Return type: Optional[List[whylogs.api.writer.writer.Writable]]

set_dataset_timestamp(dataset_timestamp: datetime.datetime) → None#

Parameters: dataset_timestamp (datetime.datetime) –
Return type: None

add_model_performance_metrics(metrics: whylogs.core.model_performance_metrics.ModelPerformanceMetrics) → None#

Parameters: metrics (whylogs.core.model_performance_metrics.ModelPerformanceMetrics) –
Return type: None

add_metric(name: str, metric: whylogs.core.metrics.metrics.Metric) → None#

Parameters

name (str) –
metric (whylogs.core.metrics.metrics.Metric) –

Return type

None

abstract merge(other: ResultSet) → ResultSet#

Parameters: other (ResultSet) –
Return type: ResultSet

whylogs.log(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None, schema: Optional[whylogs.core.DatasetSchema] = None, name: Optional[str] = None, multiple: Optional[Dict[str, Loggable]] = None, dataset_timestamp: Optional[datetime.datetime] = None, trace_id: Optional[str] = None, tags: Optional[List[str]] = None, segment_key_values: Optional[Dict[str, str]] = None, debug_event: Optional[Dict[str, Any]] = None) → result_set.ResultSet#

Parameters

obj (Any) –
pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –
row (Optional[Dict[str, Any]]) –
schema (Optional[whylogs.core.DatasetSchema]) –
name (Optional[str]) –
multiple (Optional[Dict[str, Loggable]]) –
dataset_timestamp (Optional[datetime.datetime]) –
trace_id (Optional[str]) –
tags (Optional[List[str]]) –
segment_key_values (Optional[Dict[str, str]]) –
debug_event (Optional[Dict[str, Any]]) –

Return type

result_set.ResultSet
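As a minimal sketch of how `log` connects to the classes documented on this page, the helper below profiles a pandas DataFrame and returns the profile summary. It uses only calls listed here (`whylogs.log`, `ResultSet.view`, `DatasetProfileView.to_pandas`); the helper name `summarize_dataframe` is hypothetical, and the lazy import is an illustrative choice so the module loads even where whylogs is not installed.

```python
from typing import Any


def summarize_dataframe(df: Any) -> Any:
    """Profile a pandas DataFrame and return the profile summary as a DataFrame."""
    # Imported lazily (an assumption of this sketch, not a whylogs requirement).
    import whylogs as why

    results = why.log(pandas=df)  # returns a result_set.ResultSet
    view = results.view()         # Optional[DatasetProfileView]
    if view is None:
        raise ValueError("whylogs returned no profile view")
    return view.to_pandas()       # summary metrics as a pandas DataFrame
```

With whylogs and pandas installed, `summarize_dataframe(pd.DataFrame({"price": [1.2, 0.4]}))` should return the summary for the `price` column.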

whylogs.log_classification_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, score_column: Optional[str] = None, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet#

Function to track classification metrics based on validation data. The user may also pass the column names associated with the target, prediction, and/or score.

Parameters

data (pd.DataFrame) – Dataframe with the data to log.
target_column (str) – Column name for the actual validated values.
prediction_column (str) – Column name for the predicted values.
score_column (Optional[str], optional) – Column name for the score associated with each prediction. If None, all scores are set to 1. By default None
schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None
log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DF will be logged in addition to the classification metrics. If False, only the calculated classification metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. To avoid profiling the input features twice, consider leaving this as False. By default False.
dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None

Return type

result_set.ResultSet

Examples

import pandas as pd
import whylogs as why

data = {
    "product": ["milk", "carrot", "cheese", "broccoli"],
    "category": ["dairies", "vegetables", "dairies", "vegetables"],
    "output_discount": [0, 0, 1, 1],
    "output_prediction": [0, 0, 0, 1],
}
df = pd.DataFrame(data)

# log_full_data=True also profiles the input columns, not only the metrics.
results = why.log_classification_metrics(
    df,
    target_column="output_discount",
    prediction_column="output_prediction",
    log_full_data=True,
)

whylogs.log_regression_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet#

Function to track regression metrics based on validation data. The user passes the column names associated with the target and prediction values.

Parameters

data (pd.DataFrame) – Dataframe with the data to log.
target_column (str) – Column name for the target values.
prediction_column (str) – Column name for the predicted values.
schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None
log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DF will be logged in addition to the regression metrics. If False, only the calculated regression metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. To avoid profiling the input features twice, consider leaving this as False. By default False.
dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None

Return type

result_set.ResultSet

Examples

import pandas as pd
import whylogs as why

# One row per observation; the column names must match the arguments below.
df = pd.DataFrame({"target_temperature": [10.5, 24.3, 15.6], "predicted_temperature": [9.12, 26.42, 13.12]})
results = why.log_regression_metrics(df, target_column="target_temperature", prediction_column="predicted_temperature")

whylogs.profiling(*, schema: Optional[whylogs.core.DatasetSchema] = None)#

Parameters: schema (Optional[whylogs.core.DatasetSchema]) –

whylogs.read(path: str) → result_set.ResultSet#

Parameters: path (str) –
Return type: result_set.ResultSet

whylogs.write(profile: whylogs.core.DatasetProfile, base_dir: str) → None#

Parameters

profile (whylogs.core.DatasetProfile) –
base_dir (str) –

Return type

None

whylogs.init(reinit: bool = False, allow_anonymous: bool = True, allow_local: bool = False, whylabs_api_key: Optional[str] = None, default_dataset_id: Optional[str] = None, config_path: Optional[str] = None, **kwargs: bool) → whylogs.api.whylabs.session.session.Session#

Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.

WHYLABS: Data is sent to WhyLabs and is associated with a specific WhyLabs account. You can get a WhyLabs API key from the WhyLabs Settings page after logging in.
WHYLABS_ANONYMOUS: Data is sent to WhyLabs, but no authentication happens and no WhyLabs account is required. Sessions can be claimed into an account later on the WhyLabs website.
LOCAL: No authentication. No data is automatically sent anywhere. Use this if you want to explore profiles locally or manually upload them somewhere.

Typically, you should only have to call why.init() with no arguments at the start of your application/notebook/script. The arguments allow for some customization of the logic that determines the session type. Here is the priority order:

If there is an api key directly supplied to init, then use it and authenticate session as WHYLABS.
If there is an api key in the environment variable WHYLABS_API_KEY, then use it and authenticate session as WHYLABS.
If there is an api key in the whylogs config file, then use it and authenticate session as WHYLABS.
If we’re in an interactive environment (notebook, colab, etc.) then prompt the user to pick a method explicitly. The options are determined by the allow* argument values to init().
If allow_anonymous is True, then authenticate session as WHYLABS_ANONYMOUS.
If allow_local is True, then authenticate session as LOCAL.
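The priority order above can be sketched as a small pure-Python function. This is an illustration of the documented rules, not whylogs' actual implementation: the `SessionType` enum, the function name `resolve_session_type`, and all of its parameter names are hypothetical. The interactive prompt is modelled as an already-made choice, and returning `None` when nothing applies is an assumption, since the text does not say what happens in that case.

```python
from enum import Enum
from typing import Optional


class SessionType(Enum):
    """Hypothetical stand-in for the three documented modes."""
    WHYLABS = "whylabs"
    WHYLABS_ANONYMOUS = "whylabs_anonymous"
    LOCAL = "local"


def resolve_session_type(
    api_key_arg: Optional[str] = None,     # key passed directly to init
    api_key_env: Optional[str] = None,     # value of WHYLABS_API_KEY, if set
    api_key_config: Optional[str] = None,  # key found in the config file
    interactive_choice: Optional[SessionType] = None,  # user's pick at a prompt
    allow_anonymous: bool = True,
    allow_local: bool = False,
) -> Optional[SessionType]:
    # Rules 1-3: an API key from any source yields an authenticated session.
    if api_key_arg or api_key_env or api_key_config:
        return SessionType.WHYLABS
    # Rule 4: in an interactive environment the user picks explicitly.
    # (This sketch does not check the pick against the allow_* flags.)
    if interactive_choice is not None:
        return interactive_choice
    # Rules 5-6: fall back to whichever unauthenticated mode is allowed.
    if allow_anonymous:
        return SessionType.WHYLABS_ANONYMOUS
    if allow_local:
        return SessionType.LOCAL
    # Assumption: the documentation does not describe this case.
    return None
```

With the documented defaults (`allow_anonymous=True`, `allow_local=False`) and no API key, the sketch resolves to `WHYLABS_ANONYMOUS`, which matches the order in which the two fallbacks are listed.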

Parameters

session_type – Deprecated, use allow_anonymous and allow_local instead
reinit (bool) – Normally, init() is idempotent, so you can run it over and over again in a notebook without any issues, for example. If reinit=True then it will run the initialization logic again, so you can switch authentication methods without restarting.
allow_anonymous (bool) – If True, then the user will be able to choose WHYLABS_ANONYMOUS if no other authentication method is found.
allow_local (bool) – If True, then the user will be able to choose LOCAL if no other authentication method is found.
whylabs_api_key (Optional[str]) – A WhyLabs API key to use for uploading profiles. There are other ways to set an API key that don’t require directly embedding it in code, like setting the WHYLABS_API_KEY environment variable or supplying the API key interactively via the init() prompt in a notebook.
default_dataset_id (Optional[str]) – The default dataset id to use for uploading profiles. This is only used if the session is authenticated. This is a convenience argument so that you don’t have to supply the dataset id every time you upload a profile if you’re only using a single dataset id.
config_path (Optional[str]) –
kwargs (bool) –

Return type

whylogs.api.whylabs.session.session.Session

class whylogs.DatasetProfileView(*, columns: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView], dataset_timestamp: Optional[datetime.datetime], creation_timestamp: Optional[datetime.datetime], metrics: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, str]] = None)#

Bases: whylogs.api.writer.writer.Writable

Helper class that provides a standard way to create an ABC using inheritance.

Parameters

columns (Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]) –
dataset_timestamp (Optional[datetime.datetime]) –
creation_timestamp (Optional[datetime.datetime]) –
metrics (Optional[Dict[str, Any]]) –
metadata (Optional[Dict[str, str]]) –

property dataset_timestamp: Optional[datetime.datetime]#

Return type: Optional[datetime.datetime]

property creation_timestamp: Optional[datetime.datetime]#

Return type: Optional[datetime.datetime]

property metadata: Dict[str, str]#

Return type: Dict[str, str]

property model_performance_metrics: Any#

Return type: Any

add_model_performance_metrics(metric: Any) → None#

Parameters: metric (Any) –
Return type: None

merge(other: DatasetProfileView) → DatasetProfileView#

Parameters: other (DatasetProfileView) –
Return type: DatasetProfileView

get_column(col_name: str) → Optional[whylogs.core.view.column_profile_view.ColumnProfileView]#

Parameters: col_name (str) –
Return type: Optional[whylogs.core.view.column_profile_view.ColumnProfileView]

get_columns(col_names: Optional[List[str]] = None) → Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]#

Parameters: col_names (Optional[List[str]]) –
Return type: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]

get_default_path() → str#

Return type: str

write(path: Optional[str] = None, **kwargs: Any) → Tuple[bool, str]#

Parameters

path (Optional[str]) –
kwargs (Any) –

Return type

Tuple[bool, str]

serialize() → bytes#

Return type: bytes

classmethod zero() → DatasetProfileView#

Return type: DatasetProfileView

classmethod deserialize(data: bytes) → DatasetProfileView#

Parameters: data (bytes) –
Return type: DatasetProfileView
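The `serialize` and `deserialize` signatures suggest a byte-level round trip, for example to keep a profile view in a cache or message queue. The sketch below assumes that `deserialize` accepts exactly what `serialize` produces, which the signatures imply but this page does not state. The helper name `roundtrip_view` is hypothetical, and the import is deferred so the module loads without whylogs installed.

```python
from typing import Any


def roundtrip_view(view: Any) -> Any:
    """Serialize a DatasetProfileView to bytes and rebuild a view from them."""
    # Deferred import: an illustrative choice, not a whylogs requirement.
    from whylogs import DatasetProfileView

    payload = view.serialize()  # bytes, per the signature above
    return DatasetProfileView.deserialize(payload)
```

One way to check the result is to compare `get_columns().keys()` on the original and the restored view.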

classmethod read(path: str) → DatasetProfileView#

Parameters: path (str) –
Return type: DatasetProfileView

to_pandas(column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) → whylogs.core.stubs.pd.DataFrame#

Parameters

column_metric (Optional[str]) –
cfg (Optional[whylogs.core.configs.SummaryConfig]) –

Return type

whylogs.core.stubs.pd.DataFrame

whylogs.v0_to_v1_view(msg: whylogs.core.proto.v0.DatasetProfileMessageV0, allow_partial: bool = False) → whylogs.core.DatasetProfileView#

Parameters

msg (whylogs.core.proto.v0.DatasetProfileMessageV0) –
allow_partial (bool) –

Return type

whylogs.core.DatasetProfileView

whylogs.package_version(package: str = __package__) → str#

Calculate version number based on pyproject.toml

Parameters: package (str) –
Return type: str
