`whylogs.datasets.base`#

Module Contents#

Classes#

`Batch`	Batch object that encapsulates data and information for a given batch.
`Dataset`	Abstract class representing a dataset.

Attributes#

logger

whylogs.datasets.base.logger#

class whylogs.datasets.base.Batch(timestamp: datetime.date, data: pandas.DataFrame, dataset_config: whylogs.datasets.configs.DatasetConfig, version: str)#

Batch object that encapsulates data and information for a given batch.

timestamp: the batch’s timestamp (at the start)
data: the complete dataframe
features: input features
target: output feature(s)
prediction: output prediction and, possibly, features such as uncertainty, confidence, probability
extra: metadata features that do not fall into any of the previous categories but still contain relevant information about the data.

A batch can represent either a baseline batch or an inference batch. The complete data combines the remaining dataframe properties: features, target, prediction, and extra.
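
To make the relationship between the complete data and its column groups concrete, here is a minimal, pandas-free sketch. `ColumnGroups` and `split_columns` are hypothetical names invented for this illustration; they are not part of whylogs, and the real `Batch` works on `pandas.DataFrame` objects driven by a `DatasetConfig` whose fields are not shown on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnGroups:
    """Hypothetical stand-in for the column groupings a dataset config might hold."""
    features: List[str]
    target: List[str]
    prediction: List[str]
    extra: List[str]


def split_columns(data: Dict[str, list], groups: ColumnGroups) -> Dict[str, Dict[str, list]]:
    """Partition a column-oriented table into one view per column group."""
    return {
        name: {col: data[col] for col in cols}
        for name, cols in vars(groups).items()
    }


# Made-up example data: one column per group.
data = {"age": [31, 45], "label": [0, 1], "score": [0.2, 0.9], "row_id": ["a", "b"]}
groups = ColumnGroups(features=["age"], target=["label"], prediction=["score"], extra=["row_id"])

views = split_columns(data, groups)

# Putting the four views back together recovers the complete table.
recombined = {col: values for view in views.values() for col, values in view.items()}
```

In this sketch every column belongs to exactly one group, so recombining the views reproduces the full table. Whether the real class enforces that is not stated here.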

Parameters

timestamp (datetime.date) –
data (pandas.DataFrame) –
dataset_config (whylogs.datasets.configs.DatasetConfig) –
version (str) –

property data: pandas.DataFrame#

The complete dataframe for all available features.

Return type: pandas.DataFrame

property timestamp: datetime.date#

The batch’s timestamp (at the start)

Return type: datetime.date

property target: pandas.DataFrame#

Output feature(s)

Return type: pandas.DataFrame

property prediction: pandas.DataFrame#

Output prediction and, possibly, features such as uncertainty, confidence, probability scores

Return type: pandas.DataFrame

property extra: pandas.DataFrame#

Metadata features that do not fall into any of the previous categories but still contain relevant information about the data.

Return type: pandas.DataFrame

property features: pandas.DataFrame#

Input features

Return type: pandas.DataFrame

class whylogs.datasets.base.Dataset#

Bases: abc.ABC

Abstract class representing a dataset.

abstract classmethod describe_versions() → List[str]#

Describe available versions for the given dataset.

Return type: List[str]

abstract classmethod describe() → str#

Display overall dataset description.

Return type: str
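
Because `describe_versions` and `describe` are class methods, they can be called on a dataset class before any instance exists. The sketch below illustrates this with a local stand-in: `DatasetLike` and `ToyDataset` are hypothetical classes written for this example, not whylogs classes, and the version names and descriptions are made up.

```python
from abc import ABC, abstractmethod
from typing import List


class DatasetLike(ABC):
    """Local stand-in mirroring the two documented abstract class methods."""

    @classmethod
    @abstractmethod
    def describe_versions(cls) -> List[str]:
        ...

    @classmethod
    @abstractmethod
    def describe(cls) -> str:
        ...


class ToyDataset(DatasetLike):
    # Made-up version catalogue, for illustration only.
    _VERSIONS = {"v1": "Toy dataset, first release."}

    @classmethod
    def describe_versions(cls) -> List[str]:
        return list(cls._VERSIONS)

    @classmethod
    def describe(cls) -> str:
        return "Toy dataset used only for illustration."


# Both calls go through the class itself; no instance is needed.
versions = ToyDataset.describe_versions()
description = ToyDataset.describe()
```

Trying to instantiate `DatasetLike` directly raises `TypeError`, which is the standard behaviour of `abc.ABC` subclasses that still have unimplemented abstract methods.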

abstract set_parameters(inference_interval: str, baseline_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, inference_start_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, original: Optional[bool] = None) → None#

Set interval and timestamp parameters for the dataset object.

Parameters

inference_interval (str) – Time period for each batch retrieved from the inference dataset. E.g. daily batches would be set as “1d”
baseline_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the baseline dataset. Will be set to the dataset’s original timestamp if original=True. By default None
inference_start_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the start of the inference dataset. Will be set to the dataset’s original timestamp if original=True. By default None
original (Optional[bool], optional) – If True, sets both baseline_timestamp and inference_start_timestamp to their original values.

Return type

None
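
As a rough illustration of how an interval such as “1d” and a start timestamp together determine batch boundaries, consider the sketch below. The helpers `parse_daily_interval` and `batch_start_dates` are hypothetical and not part of whylogs. The sketch assumes the interval has the form `<integer>d`; this page only documents “1d” as an example, so other formats are not covered.

```python
import re
from datetime import date, timedelta
from typing import List


def parse_daily_interval(interval: str) -> timedelta:
    """Turn a string such as "1d" into a timedelta (assumed format: <integer>d)."""
    match = re.fullmatch(r"(\d+)d", interval)
    if match is None:
        raise ValueError(f"Unsupported interval in this sketch: {interval!r}")
    return timedelta(days=int(match.group(1)))


def batch_start_dates(start: date, interval: str, number_batches: int) -> List[date]:
    """List the start date of each consecutive batch, beginning at `start`."""
    step = parse_daily_interval(interval)
    return [start + i * step for i in range(number_batches)]


# Three daily batches starting on a made-up inference start date.
starts = batch_start_dates(date(2023, 1, 1), "1d", 3)
```

Here `starts` holds 2023-01-01, 2023-01-02, and 2023-01-03, one start date per daily batch.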

abstract get_baseline() → Batch#

Get baseline Batch object.

Returns: A batch object representing the complete baseline data.
Return type: Batch

abstract get_inference_data(target_date: Optional[Union[datetime.date, datetime.datetime]] = None, number_batches: Optional[int] = None) → Union[Batch, Iterable[Batch]]#

Get inference batch(es)

Parameters

target_date (Optional[Union[date, datetime]], optional) – If target_date is set, a single batch will be returned for the given date (or datetime). If both target_date and number_batches are defined, an error will be raised.
number_batches (Optional[int], optional) – If number_batches is set to n, an iterator of n inference batches will be returned, starting from inference_start_timestamp. If both target_date and number_batches are defined, an error will be raised.

Returns

Can return either a single or multiple batches, according to the parameters passed.

Return type

Union[Batch, Iterable[Batch]]
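
The argument handling described above can be sketched with a small in-memory class. `ToyInferenceSource` is a hypothetical stand-in, not a whylogs class: it returns plain lists of rows instead of `Batch` objects, assumes daily batches, and uses `ValueError` as the error type, which this page does not specify. The behaviour when neither argument is given is also an assumption.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, List, Optional, Union

Rows = List[dict]  # stand-in for the data carried by one batch


class ToyInferenceSource:
    def __init__(self, rows_by_day: Dict[date, Rows], inference_start: date) -> None:
        self._rows_by_day = rows_by_day
        self._start = inference_start

    def get_inference_data(
        self,
        target_date: Optional[date] = None,
        number_batches: Optional[int] = None,
    ) -> Union[Rows, Iterator[Rows]]:
        # The two arguments are mutually exclusive.
        if target_date is not None and number_batches is not None:
            raise ValueError("Pass either target_date or number_batches, not both.")
        if target_date is not None:
            # Single batch for the requested day.
            return self._rows_by_day.get(target_date, [])
        if number_batches is not None:
            # Iterator of n consecutive daily batches from the start date.
            return self._iter_batches(number_batches)
        # Assumption: the documented behaviour for this case is not stated.
        raise ValueError("Pass target_date or number_batches.")

    def _iter_batches(self, n: int) -> Iterator[Rows]:
        for offset in range(n):
            day = self._start + timedelta(days=offset)
            yield self._rows_by_day.get(day, [])


# Made-up data: one row per day.
source = ToyInferenceSource(
    {date(2023, 1, 1): [{"x": 1}], date(2023, 1, 2): [{"x": 2}]},
    inference_start=date(2023, 1, 1),
)
single = source.get_inference_data(target_date=date(2023, 1, 2))
many = list(source.get_inference_data(number_batches=2))
```

Callers therefore need to know which argument they passed in order to tell whether the result is a single batch or an iterable of batches.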
