whylogs
whylogs is an open source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to:
Track changes in their dataset
Create data constraints to know whether their data looks the way it should
Quickly visualize key summary statistics about their datasets
These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers:
Detecting data drift (and resultant ML model performance degradation)
Data quality validation
Exploratory data analysis via data profiling
Tracking data for ML experiments
And many more…
Subpackages
whylogs.api
whylogs.core
whylogs.core.constraints
whylogs.core.metrics
whylogs.core.metrics.aggregators
whylogs.core.metrics.column_metrics
whylogs.core.metrics.compound_metric
whylogs.core.metrics.condition_count_metric
whylogs.core.metrics.decorators
whylogs.core.metrics.deserializers
whylogs.core.metrics.maths
whylogs.core.metrics.metric_components
whylogs.core.metrics.metrics
whylogs.core.metrics.multimetric
whylogs.core.metrics.serializers
whylogs.core.metrics.unicode_range
whylogs.core.model_performance_metrics
whylogs.core.proto
whylogs.core.validators
whylogs.core.view
whylogs.core.column_profile
whylogs.core.common
whylogs.core.configs
whylogs.core.dataset_profile
whylogs.core.datatypes
whylogs.core.errors
whylogs.core.feature_weights
whylogs.core.input_resolver
whylogs.core.metadata
whylogs.core.metric_getters
whylogs.core.predicate_parser
whylogs.core.preprocessing
whylogs.core.projectors
whylogs.core.relations
whylogs.core.resolvers
whylogs.core.schema
whylogs.core.segment
whylogs.core.segmentation_partition
whylogs.core.specialized_resolvers
whylogs.datasets
whylogs.experimental
whylogs.experimental.constraints_generation
whylogs.experimental.constraints_generation.condition_counts
whylogs.experimental.constraints_generation.count_metrics
whylogs.experimental.constraints_generation.distribution_metrics
whylogs.experimental.constraints_generation.frequent_items
whylogs.experimental.constraints_generation.multi_metrics
whylogs.experimental.constraints_generation.types_metrics
whylogs.experimental.performance_estimation
whylogs.migration
whylogs.viz
Package Contents
Classes
ResultSet | A holder object for profiling results.
DatasetProfileView | Helper class that provides a standard way to create an ABC using inheritance.
Functions
log_classification_metrics | Function to track metrics based on validation data.
log_regression_metrics | Function to track regression metrics based on validation data.
init | Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.
| Calculate version number based on pyproject.toml
- class whylogs.ResultSet
Bases: abc.ABC
A holder object for profiling results.
A whylogs.log call can result in more than one profile. This wrapper class simplifies the navigation among these profiles.
Note that currently we only hold one profile, but we’re planning to add other kinds of profiles, such as segmented profiles, here.
- property performance_metrics: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]
- Return type
Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]
- abstract view() → Optional[whylogs.core.DatasetProfileView]
- Return type
Optional[whylogs.core.DatasetProfileView]
- abstract profile() → Optional[whylogs.core.DatasetProfile]
- Return type
Optional[whylogs.core.DatasetProfile]
- get_writables() → Optional[List[whylogs.api.writer.writer.Writable]]
- Return type
Optional[List[whylogs.api.writer.writer.Writable]]
- set_dataset_timestamp(dataset_timestamp: datetime.datetime) → None
- Parameters
dataset_timestamp (datetime.datetime) –
- Return type
None
- add_model_performance_metrics(metrics: whylogs.core.model_performance_metrics.ModelPerformanceMetrics) → None
- Parameters
metrics (whylogs.core.model_performance_metrics.ModelPerformanceMetrics) –
- Return type
None
- add_metric(name: str, metric: whylogs.core.metrics.metrics.Metric) → None
- Parameters
name (str) –
metric (whylogs.core.metrics.metrics.Metric) –
- Return type
None
- whylogs.log(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None, schema: Optional[whylogs.core.DatasetSchema] = None, name: Optional[str] = None, multiple: Optional[Dict[str, Loggable]] = None, dataset_timestamp: Optional[datetime.datetime] = None, trace_id: Optional[str] = None, tags: Optional[List[str]] = None, segment_key_values: Optional[List[Dict[str, str]]] = None) → result_set.ResultSet
- Parameters
obj (Any) –
pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –
row (Optional[Dict[str, Any]]) –
schema (Optional[whylogs.core.DatasetSchema]) –
name (Optional[str]) –
multiple (Optional[Dict[str, Loggable]]) –
dataset_timestamp (Optional[datetime.datetime]) –
trace_id (Optional[str]) –
tags (Optional[List[str]]) –
segment_key_values (Optional[List[Dict[str, str]]]) –
- Return type
result_set.ResultSet
- whylogs.log_classification_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, score_column: Optional[str] = None, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet
Function to track classification metrics based on validation data. The caller passes the names of the columns in data that hold the target, prediction, and (optionally) score values. Target and prediction values may be str, bool, float, or int.
- Parameters
data (whylogs.core.stubs.pd.DataFrame) – the validation data.
target_column (str) – name of the column holding the actual (validated) values.
prediction_column (str) – name of the column holding the inferred (predicted) values.
score_column (Optional[str]) – name of the column holding the associated score for each inference; if not passed, all scores are set to 1.
schema (Optional[whylogs.core.DatasetSchema]) –
log_full_data (bool) –
dataset_timestamp (Optional[datetime.datetime]) –
- Return type
result_set.ResultSet
- whylogs.log_regression_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet
Function to track regression metrics based on validation data. The caller passes the names of the columns in data that hold the target and prediction values.
- Parameters
data (whylogs.core.stubs.pd.DataFrame) – the validation data.
target_column (str) – name of the column holding the actual (validated) values.
prediction_column (str) – name of the column holding the inferred (predicted) values.
schema (Optional[whylogs.core.DatasetSchema]) –
log_full_data (bool) –
dataset_timestamp (Optional[datetime.datetime]) –
- Return type
result_set.ResultSet
- whylogs.profiling(*, schema: Optional[whylogs.core.DatasetSchema] = None)
- Parameters
schema (Optional[whylogs.core.DatasetSchema]) –
- whylogs.write(profile: whylogs.core.DatasetProfile, base_dir: str) → None
- Parameters
profile (whylogs.core.DatasetProfile) –
base_dir (str) –
- Return type
None
- whylogs.init(reinit: bool = False, allow_anonymous: bool = True, allow_local: bool = False, whylabs_api_key: Optional[str] = None, default_dataset_id: Optional[str] = None, config_path: Optional[str] = None, **kwargs) → whylogs.api.whylabs.session.session.Session
Set up authentication for this whylogs logging session. There are three modes that you can authenticate in:
- WHYLABS: Data is sent to WhyLabs and is associated with a specific WhyLabs account. You can get a WhyLabs API key from the WhyLabs Settings page after logging in.
- WHYLABS_ANONYMOUS: Data is sent to WhyLabs, but no authentication happens and no WhyLabs account is required. Sessions can be claimed into an account later on the WhyLabs website.
- LOCAL: No authentication. No data is automatically sent anywhere. Use this if you want to explore profiles locally or manually upload them somewhere.
Typically, you should only need to call why.init() with no arguments at the start of your application, notebook, or script. The arguments allow some customization of the logic that determines the session type. Here is the priority order:
- If an API key is supplied directly to init(), use it and authenticate the session as WHYLABS.
- If there is an API key in the WHYLABS_API_KEY environment variable, use it and authenticate the session as WHYLABS.
- If there is an API key in the whylogs config file, use it and authenticate the session as WHYLABS.
- If we’re in an interactive environment (notebook, Colab, etc.), prompt the user to pick a method explicitly. The options are determined by the values of the allow_* arguments passed to init().
- If allow_anonymous is True, authenticate the session as WHYLABS_ANONYMOUS.
- If allow_local is True, authenticate the session as LOCAL.
- Parameters
session_type – Deprecated; use allow_anonymous and allow_local instead.
reinit (bool) – Normally, init() is idempotent, so you can run it over and over again in a notebook without any issues, for example. If reinit=True then it will run the initialization logic again, so you can switch authentication methods without restarting.
allow_anonymous (bool) – If True, then the user will be able to choose WHYLABS_ANONYMOUS if no other authentication method is found.
allow_local (bool) – If True, then the user will be able to choose LOCAL if no other authentication method is found.
whylabs_api_key (Optional[str]) – A WhyLabs API key to use for uploading profiles. There are other ways to set an API key that don’t require directly embedding it in code, such as setting the WHYLABS_API_KEY environment variable or supplying the API key interactively via the init() prompt in a notebook.
default_dataset_id (Optional[str]) – The default dataset id to use for uploading profiles. This is only used if the session is authenticated. This is a convenience argument so that you don’t have to supply the dataset id every time you upload a profile if you’re only using a single dataset id.
config_path (Optional[str]) –
- Return type
whylogs.api.whylabs.session.session.Session
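The session-type priority order described for init() can be sketched as a plain function. This is a hypothetical illustration, not whylogs’ implementation: the names SessionType and resolve_session_type, the prompt callback standing in for the notebook prompt, and the ValueError raised when no mode is allowed are all assumptions made for the example.

```python
import os
from enum import Enum
from typing import Callable, List, Mapping, Optional


class SessionType(Enum):
    # Hypothetical stand-in for the three modes described above.
    WHYLABS = "WHYLABS"
    WHYLABS_ANONYMOUS = "WHYLABS_ANONYMOUS"
    LOCAL = "LOCAL"


def resolve_session_type(
    api_key: Optional[str] = None,
    config_api_key: Optional[str] = None,
    allow_anonymous: bool = True,
    allow_local: bool = False,
    interactive: bool = False,
    prompt: Optional[Callable[[List[SessionType]], SessionType]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> SessionType:
    """Hypothetical sketch of the init() priority order (not whylogs code)."""
    env = os.environ if env is None else env

    # Steps 1-3: an explicit key, then WHYLABS_API_KEY, then the config file.
    for key in (api_key, env.get("WHYLABS_API_KEY"), config_api_key):
        if key:
            return SessionType.WHYLABS

    # The remaining choices are gated by the allow_* flags.
    allowed: List[SessionType] = []
    if allow_anonymous:
        allowed.append(SessionType.WHYLABS_ANONYMOUS)
    if allow_local:
        allowed.append(SessionType.LOCAL)

    # Step 4: in an interactive environment, let the user pick a mode.
    if interactive and prompt is not None and allowed:
        return prompt(allowed)

    # Steps 5-6: otherwise prefer anonymous, then local.
    if allowed:
        return allowed[0]
    # Assumption: the sketch fails loudly when no mode is permitted.
    raise ValueError("no API key found and no fallback session type allowed")
```

Passing env explicitly keeps the sketch testable without touching the real environment; calling it with no env argument reads os.environ, mirroring the WHYLABS_API_KEY lookup described above.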
- class whylogs.DatasetProfileView(*, columns: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView], dataset_timestamp: Optional[datetime.datetime], creation_timestamp: Optional[datetime.datetime], metrics: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, str]] = None)
Bases: whylogs.api.writer.writer.Writable
Helper class that provides a standard way to create an ABC using inheritance.
- Parameters
columns (Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]) –
dataset_timestamp (Optional[datetime.datetime]) –
creation_timestamp (Optional[datetime.datetime]) –
metrics (Optional[Dict[str, Any]]) –
metadata (Optional[Dict[str, str]]) –
- property dataset_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property creation_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property model_performance_metrics: Any
- Return type
Any
- merge(other: DatasetProfileView) → DatasetProfileView
- Parameters
other (DatasetProfileView) –
- Return type
DatasetProfileView
- get_column(col_name: str) → Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_name (str) –
- Return type
Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- get_columns(col_names: Optional[List[str]] = None) → Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_names (Optional[List[str]]) –
- Return type
Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- classmethod zero() → DatasetProfileView
- Return type
DatasetProfileView
- classmethod deserialize(data: bytes) → DatasetProfileView
- Parameters
data (bytes) –
- Return type
DatasetProfileView
- classmethod read(path: str) → DatasetProfileView
- Parameters
path (str) –
- Return type
DatasetProfileView
- to_pandas(column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) → whylogs.core.stubs.pd.DataFrame
- Parameters
column_metric (Optional[str]) –
cfg (Optional[whylogs.core.configs.SummaryConfig]) –
- Return type
whylogs.core.stubs.pd.DataFrame
- whylogs.v0_to_v1_view(msg: whylogs.core.proto.v0.DatasetProfileMessageV0, allow_partial: bool = False) → whylogs.core.DatasetProfileView
- Parameters
msg (whylogs.core.proto.v0.DatasetProfileMessageV0) –
allow_partial (bool) –
- Return type
whylogs.core.DatasetProfileView