whylogs#

whylogs is an open source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to:

  • Track changes in their dataset

  • Create data constraints to know whether their data looks the way it should

  • Quickly visualize key summary statistics about their datasets

These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers:

  • Detecting data drift (and resultant ML model performance degradation)

  • Data quality validation

  • Exploratory data analysis via data profiling

  • Tracking data for ML experiments

  • And many more…

Package Contents#

Classes#

ResultSet

A holder object for profiling results.

DatasetProfileView

Helper class that provides a standard way to create an ABC using inheritance.

Functions#

log(→ result_set.ResultSet)

log_classification_metrics(→ result_set.ResultSet)

Function to track metrics based on validation data.

log_regression_metrics(→ result_set.ResultSet)

Function to track regression metrics based on validation data.

profiling(*[, schema])

read(→ result_set.ResultSet)

write(→ None)

init(→ whylogs.api.whylabs.session.session.Session)

Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.

v0_to_v1_view(→ whylogs.core.DatasetProfileView)

package_version(→ str)

Calculate version number based on pyproject.toml

class whylogs.ResultSet#

Bases: abc.ABC

A holder object for profiling results.

A whylogs.log call can result in more than one profile. This wrapper class simplifies the navigation among these profiles.

Note that currently we only hold one profile but we’re planning to add other kinds of profiles such as segmented profiles here.

property metadata: Optional[Dict[str, str]]#
Return type

Optional[Dict[str, str]]

property count: int#
Return type

int

property performance_metrics: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]#
Return type

Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]

static read(multi_profile_file: str) ResultSet#
Parameters

multi_profile_file (str) –

Return type

ResultSet

static reader(name: str = 'local') ResultSetReader#
Parameters

name (str) –

Return type

ResultSetReader

writer(name: str = 'local') ResultSetWriter#
Parameters

name (str) –

Return type

ResultSetWriter

abstract view() Optional[whylogs.core.DatasetProfileView]#
Return type

Optional[whylogs.core.DatasetProfileView]

abstract profile() Optional[whylogs.core.DatasetProfile]#
Return type

Optional[whylogs.core.DatasetProfile]

get_writables() Optional[List[whylogs.api.writer.writer.Writable]]#
Return type

Optional[List[whylogs.api.writer.writer.Writable]]

set_dataset_timestamp(dataset_timestamp: datetime.datetime) None#
Parameters

dataset_timestamp (datetime.datetime) –

Return type

None

add_model_performance_metrics(metrics: whylogs.core.model_performance_metrics.ModelPerformanceMetrics) None#
Parameters

metrics (whylogs.core.model_performance_metrics.ModelPerformanceMetrics) –

Return type

None

add_metric(name: str, metric: whylogs.core.metrics.metrics.Metric) None#
Parameters
  • name (str) –

  • metric (whylogs.core.metrics.metrics.Metric) –

Return type

None

abstract merge(other: ResultSet) ResultSet#
Parameters

other (ResultSet) –

Return type

ResultSet

whylogs.log(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None, schema: Optional[whylogs.core.DatasetSchema] = None, name: Optional[str] = None, multiple: Optional[Dict[str, Loggable]] = None, dataset_timestamp: Optional[datetime.datetime] = None, trace_id: Optional[str] = None, tags: Optional[List[str]] = None, segment_key_values: Optional[Dict[str, str]] = None, debug_event: Optional[Dict[str, Any]] = None) result_set.ResultSet#
Parameters
  • obj (Any) –

  • pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –

  • row (Optional[Dict[str, Any]]) –

  • schema (Optional[whylogs.core.DatasetSchema]) –

  • name (Optional[str]) –

  • multiple (Optional[Dict[str, Loggable]]) –

  • dataset_timestamp (Optional[datetime.datetime]) –

  • trace_id (Optional[str]) –

  • tags (Optional[List[str]]) –

  • segment_key_values (Optional[Dict[str, str]]) –

  • debug_event (Optional[Dict[str, Any]]) –

Return type

result_set.ResultSet
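A minimal usage sketch, based only on the signatures listed on this page: log accepts a pandas DataFrame through the pandas keyword and returns a ResultSet, whose view() yields a DatasetProfileView that to_pandas() converts into a summary DataFrame. The helper name summarize_dataframe is hypothetical, and the whylogs import is deferred into the function body so the definition loads even where whylogs is not installed.

```python
from typing import Any


def summarize_dataframe(df: Any) -> Any:
    """Profile a pandas DataFrame and return the profile summary as a DataFrame.

    Hypothetical helper: it only chains calls whose signatures are documented here.
    """
    import whylogs as why  # deferred import; requires whylogs to be installed

    results = why.log(pandas=df)  # -> ResultSet
    view = results.view()         # -> Optional[DatasetProfileView]
    if view is None:
        raise ValueError("whylogs returned no profile view for this input")
    return view.to_pandas()       # -> pandas DataFrame of summary metrics


# Example call (needs pandas and whylogs installed):
#   import pandas as pd
#   summary = summarize_dataframe(pd.DataFrame({"age": [31, 45, 27]}))
```

The explicit None check is there because view() is annotated as returning an Optional value.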

whylogs.log_classification_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, score_column: Optional[str] = None, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) result_set.ResultSet#

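Function to track metrics based on validation data. The user may also pass the attribute names associated with the target, prediction, and/or score. The underlying values are described as:

  • targets (List[Union[str, bool, float, int]]) – actual validated values

  • predictions (List[Union[str, bool, float, int]]) – inferred/predicted values

  • scores – associated scores for each inferred value; all values are set to 1 if not passed

Parameters
  • data (whylogs.core.stubs.pd.DataFrame) –

  • target_column (str) –

  • prediction_column (str) –

  • score_column (Optional[str]) –

  • schema (Optional[whylogs.core.DatasetSchema]) –

  • log_full_data (bool) –

  • dataset_timestamp (Optional[datetime.datetime]) –

Return type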

result_set.ResultSet
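As a hedged illustration of the signature above, the sketch below passes a validation DataFrame together with the names of its target, prediction, and score columns. The helper name log_validation_metrics and the column names "target", "prediction", and "score" are assumptions made for the example, not names defined by whylogs.

```python
from typing import Any


def log_validation_metrics(validation_df: Any) -> Any:
    """Track classification metrics for a validation DataFrame.

    Hypothetical helper; the column names below are assumptions about how
    the caller's DataFrame is laid out.
    """
    import whylogs as why  # deferred import; requires whylogs to be installed

    return why.log_classification_metrics(
        validation_df,
        target_column="target",
        prediction_column="prediction",
        score_column="score",  # optional in the signature; may be omitted
        log_full_data=False,   # the documented default
    )
```

The call returns a ResultSet, so the recorded metrics should be reachable through its performance_metrics property.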

whylogs.log_regression_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) result_set.ResultSet#

Function to track regression metrics based on validation data. The user may also pass the attribute names associated with the target, prediction, and/or score. The underlying values are described as:

  • targets (List[Union[str, bool, float, int]]) – actual validated values

  • predictions (List[Union[str, bool, float, int]]) – inferred/predicted values

  • scores – associated scores for each inferred value; all values are set to 1 if not passed

Parameters
  • data (whylogs.core.stubs.pd.DataFrame) –

  • target_column (str) –

  • prediction_column (str) –

  • schema (Optional[whylogs.core.DatasetSchema]) –

  • log_full_data (bool) –

  • dataset_timestamp (Optional[datetime.datetime]) –

Return type

result_set.ResultSet

whylogs.profiling(*, schema: Optional[whylogs.core.DatasetSchema] = None)#
Parameters

schema (Optional[whylogs.core.DatasetSchema]) –

whylogs.read(path: str) result_set.ResultSet#
Parameters

path (str) –

Return type

result_set.ResultSet

whylogs.write(profile: whylogs.core.DatasetProfile, base_dir: str) None#
Parameters
  • profile (whylogs.core.DatasetProfile) –

  • base_dir (str) –

Return type

None

whylogs.init(reinit: bool = False, allow_anonymous: bool = True, allow_local: bool = False, whylabs_api_key: Optional[str] = None, default_dataset_id: Optional[str] = None, config_path: Optional[str] = None, **kwargs: bool) whylogs.api.whylabs.session.session.Session#

Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.

  1. WHYLABS: Data is sent to WhyLabs and is associated with a specific WhyLabs account. You can get a WhyLabs api key from the WhyLabs Settings page after logging in.

  2. WHYLABS_ANONYMOUS: Data is sent to WhyLabs, but no authentication happens and no WhyLabs account is required. Sessions can be claimed into an account later on the WhyLabs website.

  3. LOCAL: No authentication. No data is automatically sent anywhere. Use this if you want to explore profiles locally or manually upload them somewhere.

Typically, you should only have to put why.init() with no arguments at the start of your application/notebook/script. The arguments allow for some customization of the logic that determines the session type. Here is the priority order:

  • If there is an api key directly supplied to init, then use it and authenticate session as WHYLABS.

  • If there is an api key in the environment variable WHYLABS_API_KEY, then use it and authenticate session as WHYLABS.

  • If there is an api key in the whylogs config file, then use it and authenticate session as WHYLABS.

  • If we’re in an interactive environment (notebook, Colab, etc.) then prompt the user to pick a method explicitly. The options are determined by the allow* argument values to init().

  • If allow_anonymous is True, then authenticate session as WHYLABS_ANONYMOUS.

  • If allow_local is True, then authenticate session as LOCAL.
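The priority order above can be sketched as a small pure-Python function. This is an illustration of the documented rules only, not the library's implementation: the function name resolve_session_type, the config_api_key and interactive arguments, the "PROMPT" return value, and the None result when nothing applies are all assumptions introduced for the example.

```python
import os
from typing import Mapping, Optional


def resolve_session_type(
    whylabs_api_key: Optional[str] = None,
    config_api_key: Optional[str] = None,
    interactive: bool = False,
    allow_anonymous: bool = True,
    allow_local: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Mirror the documented priority order (hypothetical, not whylogs code)."""
    env = os.environ if env is None else env

    # Rules 1-3: an API key from init(), the environment, or the config file
    # leads to an authenticated WHYLABS session.
    if whylabs_api_key or env.get("WHYLABS_API_KEY") or config_api_key:
        return "WHYLABS"

    # Rule 4: interactive environments ask the user to choose explicitly.
    if interactive:
        return "PROMPT"

    # Rules 5-6: fallbacks controlled by the allow_* flags, in that order.
    if allow_anonymous:
        return "WHYLABS_ANONYMOUS"
    if allow_local:
        return "LOCAL"

    # Assumption: the page does not say what happens when nothing applies.
    return None
```

With the default flag values and no API key, this sketch resolves to WHYLABS_ANONYMOUS outside interactive environments, which matches the defaults in the init signature (allow_anonymous=True, allow_local=False).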

Parameters
  • session_type – Deprecated, use allow_anonymous and allow_local instead

  • reinit (bool) – Normally, init() is idempotent, so you can run it over and over again in a notebook without any issues, for example. If reinit=True then it will run the initialization logic again, so you can switch authentication methods without restarting.

  • allow_anonymous (bool) – If True, then the user will be able to choose WHYLABS_ANONYMOUS if no other authentication method is found.

  • allow_local (bool) – If True, then the user will be able to choose LOCAL if no other authentication method is found.

  • whylabs_api_key (Optional[str]) – A WhyLabs api key to use for uploading profiles. There are other ways that you can set an api key that don’t require directly embedding it in code, like setting the WHYLABS_API_KEY env variable or supplying the api key interactively via the init() prompt in a notebook.

  • default_dataset_id (Optional[str]) – The default dataset id to use for uploading profiles. This is only used if the session is authenticated. This is a convenience argument so that you don’t have to supply the dataset id every time you upload a profile if you’re only using a single dataset id.

  • config_path (Optional[str]) –

  • kwargs (bool) –

Return type

whylogs.api.whylabs.session.session.Session

class whylogs.DatasetProfileView(*, columns: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView], dataset_timestamp: Optional[datetime.datetime], creation_timestamp: Optional[datetime.datetime], metrics: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, str]] = None)#

Bases: whylogs.api.writer.writer.Writable

Helper class that provides a standard way to create an ABC using inheritance.

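Parameters
  • columns (Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]) –

  • dataset_timestamp (Optional[datetime.datetime]) –

  • creation_timestamp (Optional[datetime.datetime]) –

  • metrics (Optional[Dict[str, Any]]) –

  • metadata (Optional[Dict[str, str]]) –

property dataset_timestamp: Optional[datetime.datetime]#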
Return type

Optional[datetime.datetime]

property creation_timestamp: Optional[datetime.datetime]#
Return type

Optional[datetime.datetime]

property metadata: Dict[str, str]#
Return type

Dict[str, str]

property model_performance_metrics: Any#
Return type

Any

add_model_performance_metrics(metric: Any) None#
Parameters

metric (Any) –

Return type

None

merge(other: DatasetProfileView) DatasetProfileView#
Parameters

other (DatasetProfileView) –

Return type

DatasetProfileView

get_column(col_name: str) Optional[whylogs.core.view.column_profile_view.ColumnProfileView]#
Parameters

col_name (str) –

Return type

Optional[whylogs.core.view.column_profile_view.ColumnProfileView]

get_columns(col_names: Optional[List[str]] = None) Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]#
Parameters

col_names (Optional[List[str]]) –

Return type

Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]

get_default_path() str#
Return type

str

write(path: Optional[str] = None, **kwargs: Any) Tuple[bool, str]#
Parameters
  • path (Optional[str]) –

  • kwargs (Any) –

Return type

Tuple[bool, str]

serialize() bytes#
Return type

bytes

classmethod zero() DatasetProfileView#
Return type

DatasetProfileView

classmethod deserialize(data: bytes) DatasetProfileView#
Parameters

data (bytes) –

Return type

DatasetProfileView
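A short sketch of how serialize, deserialize, and merge might be combined, using only the signatures documented for this class. The helper name roundtrip_and_merge is hypothetical, and both arguments are assumed to be DatasetProfileView instances, for example obtained from ResultSet.view().

```python
from typing import Any


def roundtrip_and_merge(view_a: Any, view_b: Any) -> Any:
    """Serialize view_a, restore it from bytes, then merge the copy with view_b.

    Hypothetical helper; both arguments are assumed to be DatasetProfileView
    instances.
    """
    from whylogs import DatasetProfileView  # deferred import; requires whylogs

    payload = view_a.serialize()                         # -> bytes
    restored = DatasetProfileView.deserialize(payload)   # -> DatasetProfileView
    return restored.merge(view_b)                        # -> DatasetProfileView
```

The bytes payload is the part that could be stored or sent between processes before the view is restored and merged.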

classmethod read(path: str) DatasetProfileView#
Parameters

path (str) –

Return type

DatasetProfileView

to_pandas(column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) whylogs.core.stubs.pd.DataFrame#
Parameters
  • column_metric (Optional[str]) –

  • cfg (Optional[whylogs.core.configs.SummaryConfig]) –

Return type

whylogs.core.stubs.pd.DataFrame

whylogs.v0_to_v1_view(msg: whylogs.core.proto.v0.DatasetProfileMessageV0, allow_partial: bool = False) whylogs.core.DatasetProfileView#
Parameters
  • msg (whylogs.core.proto.v0.DatasetProfileMessageV0) –

  • allow_partial (bool) –

Return type

whylogs.core.DatasetProfileView

whylogs.package_version(package: str = __package__) str#

Calculate version number based on pyproject.toml

Parameters

package (str) –

Return type

str