whylogs.api

Subpackages

Submodules

Package Contents

Classes

ResultSet

A holder object for profiling results.

Functions

profiling(*[, schema])

log(→ result_set.ResultSet)

log_classification_metrics(→ result_set.ResultSet)

Function to track classification metrics based on validation data.

log_regression_metrics(→ result_set.ResultSet)

Function to track regression metrics based on validation data.

read(→ result_set.ResultSet)

reader(→ result_set.ResultSetReader)

write(→ None)

whylogs.api.profiling(*, schema: Optional[whylogs.core.DatasetSchema] = None)
Parameters

schema (Optional[whylogs.core.DatasetSchema]) –

class whylogs.api.ResultSet

Bases: abc.ABC

A holder object for profiling results.

A whylogs.log call can result in more than one profile. This wrapper class simplifies the navigation among these profiles.

Note that this class currently holds only one profile, but we plan to add other kinds of profiles here, such as segmented profiles.

property metadata: Optional[Dict[str, str]]
Return type

Optional[Dict[str, str]]

property count: int
Return type

int

property performance_metrics: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]
Return type

Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]

static read(multi_profile_file: str) → ResultSet
Parameters

multi_profile_file (str) –

Return type

ResultSet

static reader(name: str = 'local') → ResultSetReader
Parameters

name (str) –

Return type

ResultSetReader

writer(name: str = 'local') → ResultSetWriter
Parameters

name (str) –

Return type

ResultSetWriter

abstract view() → Optional[whylogs.core.DatasetProfileView]
Return type

Optional[whylogs.core.DatasetProfileView]

abstract profile() → Optional[whylogs.core.DatasetProfile]
Return type

Optional[whylogs.core.DatasetProfile]

get_writables() → Optional[List[whylogs.api.writer.writer.Writable]]
Return type

Optional[List[whylogs.api.writer.writer.Writable]]

set_dataset_timestamp(dataset_timestamp: datetime.datetime) → None
Parameters

dataset_timestamp (datetime.datetime) –

Return type

None

add_model_performance_metrics(metrics: whylogs.core.model_performance_metrics.ModelPerformanceMetrics) → None
Parameters

metrics (whylogs.core.model_performance_metrics.ModelPerformanceMetrics) –

Return type

None

add_metric(name: str, metric: whylogs.core.metrics.metrics.Metric) → None
Parameters
  • name (str) –

  • metric (whylogs.core.metrics.metrics.Metric) –

Return type

None

abstract merge(other: ResultSet) → ResultSet
Parameters

other (ResultSet) –

Return type

ResultSet

whylogs.api.log(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None, schema: Optional[whylogs.core.DatasetSchema] = None, name: Optional[str] = None, multiple: Optional[Dict[str, Loggable]] = None, dataset_timestamp: Optional[datetime.datetime] = None, trace_id: Optional[str] = None, tags: Optional[List[str]] = None, segment_key_values: Optional[Dict[str, str]] = None, debug_event: Optional[Dict[str, Any]] = None) → result_set.ResultSet
Parameters
  • obj (Any) –

  • pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –

  • row (Optional[Dict[str, Any]]) –

  • schema (Optional[whylogs.core.DatasetSchema]) –

  • name (Optional[str]) –

  • multiple (Optional[Dict[str, Loggable]]) –

  • dataset_timestamp (Optional[datetime.datetime]) –

  • trace_id (Optional[str]) –

  • tags (Optional[List[str]]) –

  • segment_key_values (Optional[Dict[str, str]]) –

  • debug_event (Optional[Dict[str, Any]]) –

Return type

result_set.ResultSet

whylogs.api.log_classification_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, score_column: Optional[str] = None, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet

Function to track classification metrics based on validation data. The user may also pass the names of the columns associated with the target, prediction, and/or score.

Parameters
  • data (pd.DataFrame) – Dataframe with the data to log.

  • target_column (str) – Column name for the actual validated values.

  • prediction_column (str) – Column name for the predicted values.

  • score_column (Optional[str], optional) – Column name for the scores associated with each inference; if None, all scores are set to 1. By default None.

  • schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None

  • log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DF will be logged in addition to the classification metrics. If False, only the calculated classification metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. To avoid profiling the input features twice, consider leaving this as False. By default False.

  • dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None

Return type

result_set.ResultSet

Examples

import pandas as pd
import whylogs as why

data = {
    "product": ["milk", "carrot", "cheese", "broccoli"],
    "category": ["dairies", "vegetables", "dairies", "vegetables"],
    "output_discount": [0, 0, 1, 1],
    "output_prediction": [0, 0, 0, 1],
}
df = pd.DataFrame(data)

# Log the classification metrics together with a profile of the full DataFrame
results = why.log_classification_metrics(
    df,
    target_column="output_discount",
    prediction_column="output_prediction",
    log_full_data=True,
)

whylogs.api.log_regression_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) → result_set.ResultSet

Function to track regression metrics based on validation data. The user may also pass the names of the columns associated with the target and prediction.

Parameters
  • data (pd.DataFrame) – Dataframe with the data to log.

  • target_column (str) – Column name for the target values.

  • prediction_column (str) – Column name for the predicted values.

  • schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None

  • log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DF will be logged in addition to the regression metrics. If False, only the calculated regression metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. To avoid profiling the input features twice, consider leaving this as False. By default False.

  • dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None

Return type

result_set.ResultSet

Examples

import pandas as pd
import whylogs as why

# One numeric value per row; the column names must match target_column and prediction_column
df = pd.DataFrame({"target_temperature": [10.5, 24.3, 15.6], "predicted_temperature": [9.12, 26.42, 13.12]})
results = why.log_regression_metrics(df, target_column="target_temperature", prediction_column="predicted_temperature")

whylogs.api.read(path: str) → result_set.ResultSet
Parameters

path (str) –

Return type

result_set.ResultSet

whylogs.api.reader(name: str) → result_set.ResultSetReader
Parameters

name (str) –

Return type

result_set.ResultSetReader

whylogs.api.write(profile: whylogs.core.DatasetProfile, base_dir: str) → None
Parameters
  • profile (whylogs.core.DatasetProfile) –

  • base_dir (str) –

Return type

None