whylogs.datasets.base

Module Contents

Classes

Batch

Batch object that encapsulates data and information for a given batch.

Dataset

Abstract class representing a dataset.

Attributes

whylogs.datasets.base.logger

class whylogs.datasets.base.Batch(timestamp: datetime.date, data: pandas.DataFrame, dataset_config: whylogs.datasets.configs.DatasetConfig, version: str)

Batch object that encapsulates data and information for a given batch.

  • timestamp: the batch’s timestamp (at the start)

  • data: the complete dataframe

  • features: input features

  • target: output feature(s)

  • prediction: output prediction and, possibly, features such as uncertainty, confidence, probability

  • misc: metadata features that do not belong to any of the previous categories but still contain relevant information about the data (exposed through the extra property).

A batch can represent either a baseline or an inference batch. The complete data is the union of the columns held by the remaining dataframe properties: features, target, prediction, and misc.

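Parameters
  • timestamp (datetime.date) – The batch’s timestamp (at the start of the batch).

  • data (pandas.DataFrame) – The complete dataframe.

  • dataset_config (whylogs.datasets.configs.DatasetConfig) – Configuration of the dataset the batch belongs to.

  • version (str) – The dataset version.
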
property data: pandas.DataFrame

The complete dataframe for all available features.

Return type

pandas.DataFrame

property timestamp: datetime.date

The batch’s timestamp (at the start of the batch).

Return type

datetime.date

property target: pandas.DataFrame

Output feature(s).

Return type

pandas.DataFrame

property prediction: pandas.DataFrame

Output prediction and, possibly, related features such as uncertainty, confidence, or probability scores.

Return type

pandas.DataFrame

property extra: pandas.DataFrame

Metadata features that do not belong to any of the previous categories but still contain relevant information about the data.

Return type

pandas.DataFrame

property features: pandas.DataFrame

Input features.

Return type

pandas.DataFrame
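
As a rough illustration of how the data property relates to the four column subsets, the following pandas sketch partitions one dataframe by column. The column names and the split_batch_frame helper are hypothetical; this page does not document how a Batch assigns columns to each category (presumably via its dataset_config), so the sketch passes the column lists in explicitly and treats every remaining column as a feature.

```python
import pandas as pd

# Hypothetical batch: each column belongs to exactly one category.
data = pd.DataFrame(
    {
        "temperature": [21.5, 19.0],  # input feature
        "humidity": [0.40, 0.55],  # input feature
        "label": [1, 0],  # target
        "pred": [1, 1],  # prediction
        "proba": [0.91, 0.62],  # prediction-related score
        "station_id": ["a", "b"],  # misc metadata (the extra property)
    }
)


def split_batch_frame(df, target_cols, prediction_cols, extra_cols):
    """Hypothetical helper: partition df's columns into the Batch categories.

    Columns not listed as target, prediction, or extra are treated as features.
    """
    non_feature = set(target_cols) | set(prediction_cols) | set(extra_cols)
    feature_cols = [c for c in df.columns if c not in non_feature]
    return {
        "features": df[feature_cols],
        "target": df[list(target_cols)],
        "prediction": df[list(prediction_cols)],
        "extra": df[list(extra_cols)],
    }


parts = split_batch_frame(data, ["label"], ["pred", "proba"], ["station_id"])

# Concatenating the parts side by side recovers the complete dataframe
# (after restoring the original column order).
rebuilt = pd.concat(list(parts.values()), axis=1)[data.columns]
print(rebuilt.equals(data))  # True
```

Here "union" means a column-wise split: the rows stay the same and each column lands in exactly one subset.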

class whylogs.datasets.base.Dataset

Bases: abc.ABC

Abstract class representing a dataset.
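
Because Dataset derives from abc.ABC, it cannot be instantiated directly, and a concrete dataset must override every abstract method. The minimal sketch below mirrors only the two abstract classmethods (describe_versions and describe) on a hypothetical DatasetSketch base with a hypothetical ToyDataset subclass; the real Dataset also declares set_parameters, get_baseline, and get_inference_data as abstract.

```python
from abc import ABC, abstractmethod
from typing import List


class DatasetSketch(ABC):
    """Hypothetical stand-in mirroring two of Dataset's abstract classmethods."""

    # abstractmethod must be the innermost decorator when combined with classmethod.
    @classmethod
    @abstractmethod
    def describe_versions(cls) -> List[str]:
        ...

    @classmethod
    @abstractmethod
    def describe(cls) -> str:
        ...


class ToyDataset(DatasetSketch):
    """Hypothetical concrete subclass that implements both abstract classmethods."""

    @classmethod
    def describe_versions(cls) -> List[str]:
        return ["v1"]

    @classmethod
    def describe(cls) -> str:
        return "A toy dataset with a single version."


try:
    DatasetSketch()  # fails: abstract methods are not implemented
except TypeError as exc:
    print("cannot instantiate:", exc)

# Classmethods can be called on the class itself, without an instance.
print(ToyDataset.describe_versions())  # ['v1']
print(ToyDataset.describe())
```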

abstract classmethod describe_versions() → List[str]

List the available versions of the dataset.

Return type

List[str]

abstract classmethod describe() → str

Return an overall description of the dataset.

Return type

str

abstract set_parameters(inference_interval: str, baseline_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, inference_start_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, original: Optional[bool] = None) → None

Set interval and timestamp parameters for the dataset object.

Parameters
  • inference_interval (str) – Time period covered by each batch retrieved from the inference dataset. For example, daily batches are set with “1d”.

  • baseline_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the baseline dataset. Set to the dataset’s original timestamp if original=True. By default None.

  • inference_start_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the start of the inference dataset. Set to the dataset’s original timestamp if original=True. By default None.

  • original (Optional[bool], optional) – If True, sets both baseline_timestamp and inference_start_timestamp to their original values. By default None.

Return type

None
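
The documentation only shows “1d” as an example of inference_interval. Assuming an interval is a positive integer followed by a unit letter (d for days and h for hours in this sketch), the code below parses such a string and slices the inference period into consecutive batch windows that start at the inference start timestamp. parse_interval and batch_windows are hypothetical helpers, not whylogs functions, and the units whylogs actually accepts may differ.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed units; the documentation above only shows "1d".
_UNITS = {"d": "days", "h": "hours"}


def parse_interval(interval: str) -> timedelta:
    """Hypothetical parser: '1d' -> 1 day, '6h' -> 6 hours."""
    match = re.fullmatch(r"(\d+)([dh])", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if amount <= 0:
        raise ValueError("interval must be positive")
    return timedelta(**{_UNITS[unit]: amount})


def batch_windows(start: datetime, interval: str, n: int) -> List[Tuple[datetime, datetime]]:
    """Return n consecutive [start, end) windows, each one interval long."""
    step = parse_interval(interval)
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]


windows = batch_windows(datetime(2023, 1, 1), "1d", 3)
for lo, hi in windows:
    print(lo.date(), "->", hi.date())
# 2023-01-01 -> 2023-01-02
# 2023-01-02 -> 2023-01-03
# 2023-01-03 -> 2023-01-04
```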

abstract get_baseline() → Batch

Get the baseline Batch object.

Returns

A batch object representing the complete baseline data.

Return type

Batch

abstract get_inference_data(target_date: Optional[Union[datetime.date, datetime.datetime]] = None, number_batches: Optional[int] = None) → Union[Batch, Iterable[Batch]]

Get inference batch(es).

Parameters
  • target_date (Optional[Union[date, datetime]], optional) – If target_date is set, a single batch is returned for the given date (or datetime). If both target_date and number_batches are defined, an error is raised.

  • number_batches (Optional[int], optional) – If number_batches is set to n, an iterator of n inference batches will be returned, starting from inference_start_timestamp. If both target_date and number_batches are defined, an error will be raised.

Returns

Either a single Batch or an iterable of Batch objects, depending on the parameters passed.

Return type

Union[Batch, Iterable[Batch]]
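
The mutually exclusive target_date and number_batches arguments can be sketched with plain dates, using hypothetical (start, end) tuples in place of Batch objects. The sketch assumes that a target_date batch starts at that date and that a call with neither argument is an error; the page above documents neither detail. get_inference_windows is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Tuple, Union

# Hypothetical stand-in for a Batch: a [start, end) window.
Window = Tuple[date, date]


def get_inference_windows(
    inference_start: date,
    interval: timedelta,
    target_date: Optional[date] = None,
    number_batches: Optional[int] = None,
) -> Union[Window, Iterator[Window]]:
    """Hypothetical sketch of the documented get_inference_data contract."""
    if target_date is not None and number_batches is not None:
        raise ValueError("set either target_date or number_batches, not both")
    if target_date is not None:
        # A single window for the requested date (assumed to start at that date).
        return (target_date, target_date + interval)
    if number_batches is not None:
        # A lazy iterator of n consecutive windows from inference_start.
        return (
            (inference_start + i * interval, inference_start + (i + 1) * interval)
            for i in range(number_batches)
        )
    # Assumption: the page does not say what happens when neither is set.
    raise ValueError("set target_date or number_batches")


start = date(2023, 1, 1)
day = timedelta(days=1)

single = get_inference_windows(start, day, target_date=date(2023, 1, 5))
several = list(get_inference_windows(start, day, number_batches=3))
print(single)
print(several)

try:
    get_inference_windows(start, day, target_date=start, number_batches=3)
except ValueError as exc:
    print("error:", exc)
```

Returning a generator for the number_batches case is one way to honor the Iterable return type while producing batches only as they are consumed.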