whylogs.datasets.base
Module Contents
Classes
Attributes
- whylogs.datasets.base.logger
- class whylogs.datasets.base.Batch(timestamp: datetime.date, data: pandas.DataFrame, dataset_config: whylogs.datasets.configs.DatasetConfig, version: str)
Batch object that encapsulates the data and information for a given batch.
timestamp: the batch’s timestamp (at the start)
data: the complete dataframe
features: input features
target: output feature(s)
prediction: output prediction and, possibly, features such as uncertainty, confidence, probability
extra: metadata features that do not fall into any of the previous categories but still contain relevant information about the data.
A batch can represent either a baseline or an inference batch. The complete data is the combination of the remaining dataframe properties: features, target, prediction, and extra.
- Parameters
timestamp (datetime.date) – The batch’s timestamp (at the start).
data (pandas.DataFrame) – The complete dataframe.
dataset_config (whylogs.datasets.configs.DatasetConfig)
version (str)
- property data: pandas.DataFrame
The complete dataframe for all available features.
- Return type
pandas.DataFrame
- property timestamp: datetime.date
The batch’s timestamp (at the start).
- Return type
datetime.date
- property target: pandas.DataFrame
Output feature(s).
- Return type
pandas.DataFrame
- property prediction: pandas.DataFrame
Output prediction and, possibly, related features such as uncertainty, confidence, or probability scores.
- Return type
pandas.DataFrame
- property extra: pandas.DataFrame
Metadata features that do not fall into any of the previous categories but still contain relevant information about the data.
- Return type
pandas.DataFrame
- property features: pandas.DataFrame
Input features.
- Return type
pandas.DataFrame
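The relationship between data and the four column-group properties can be illustrated with a small stand-in, written without whylogs or pandas so that it runs anywhere. ToyBatch, its column_groups mapping, and the column names are hypothetical; the real Batch returns pandas DataFrames and presumably derives its groups from the DatasetConfig it receives.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ToyBatch:
    """Hypothetical stand-in for Batch; rows are plain dicts, not DataFrames."""

    timestamp: date
    data: List[Row]
    # Assumed mapping from group name to column names (not the real DatasetConfig).
    column_groups: Dict[str, List[str]]

    def _select(self, group: str) -> List[Row]:
        # Keep only the columns that belong to the requested group.
        columns = self.column_groups.get(group, [])
        return [{name: row[name] for name in columns} for row in self.data]

    @property
    def features(self) -> List[Row]:
        return self._select("features")

    @property
    def target(self) -> List[Row]:
        return self._select("target")

    @property
    def prediction(self) -> List[Row]:
        return self._select("prediction")

    @property
    def extra(self) -> List[Row]:
        return self._select("extra")


toy_batch = ToyBatch(
    timestamp=date(2024, 1, 1),
    data=[
        {"temperature": 21.5, "label": 1, "predicted": 1, "confidence": 0.9, "station_id": "a1"},
        {"temperature": 18.0, "label": 0, "predicted": 1, "confidence": 0.6, "station_id": "b2"},
    ],
    column_groups={
        "features": ["temperature"],
        "target": ["label"],
        "prediction": ["predicted", "confidence"],
        "extra": ["station_id"],
    },
)

# Merging the four groups row by row reproduces the complete data.
recombined = [
    {**f, **t, **p, **e}
    for f, t, p, e in zip(
        toy_batch.features, toy_batch.target, toy_batch.prediction, toy_batch.extra
    )
]
```

In this sketch every column belongs to exactly one group, which is why the merged rows equal the original rows.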
- class whylogs.datasets.base.Dataset
Bases: abc.ABC
Abstract class representing a dataset.
- abstract classmethod describe_versions() -> List[str]
Describe available versions for the given dataset.
- Return type
List[str]
- abstract set_parameters(inference_interval: str, baseline_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, inference_start_timestamp: Optional[Union[datetime.date, datetime.datetime]] = None, original: Optional[bool] = None) -> None
Set interval and timestamp parameters for the dataset object.
- Parameters
inference_interval (str) – Time period covered by each batch retrieved from the inference dataset. For example, daily batches are set with “1d”.
baseline_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the baseline dataset. Set to the dataset’s original timestamp if original=True. Defaults to None.
inference_start_timestamp (Optional[Union[date, datetime]], optional) – The timestamp for the start of the inference dataset. Set to the dataset’s original timestamp if original=True. Defaults to None.
original (Optional[bool], optional) – If True, sets both baseline_timestamp and inference_start_timestamp to their original values.
- Return type
None
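How inference_interval and inference_start_timestamp together determine batch boundaries can be sketched as follows. This is an illustration only: parse_interval and batch_start_dates are hypothetical helpers, not part of whylogs, and the sketch assumes whole-day intervals written like the “1d” example above. Which interval strings the library actually accepts is not stated here.

```python
from datetime import date, timedelta
from typing import List


def parse_interval(inference_interval: str) -> timedelta:
    """Hypothetical parser: accepts only whole days such as "1d" or "7d"."""
    if not inference_interval.endswith("d") or not inference_interval[:-1].isdigit():
        raise ValueError(f"unsupported interval: {inference_interval!r}")
    days = int(inference_interval[:-1])
    if days < 1:
        raise ValueError("interval must be at least one day")
    return timedelta(days=days)


def batch_start_dates(
    inference_start: date, inference_interval: str, number_batches: int
) -> List[date]:
    """Start date of each consecutive batch, beginning at inference_start."""
    step = parse_interval(inference_interval)
    return [inference_start + i * step for i in range(number_batches)]


# Three daily batches starting on 2024-01-01.
starts = batch_start_dates(date(2024, 1, 1), "1d", 3)
```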
- abstract get_baseline() -> Batch
Get the baseline Batch object.
- Returns
A Batch object representing the complete baseline data.
- Return type
Batch
- abstract get_inference_data(target_date: Optional[Union[datetime.date, datetime.datetime]] = None, number_batches: Optional[int] = None) -> Union[Batch, Iterable[Batch]]
Get inference batch(es).
- Parameters
target_date (Optional[Union[date, datetime]], optional) – If target_date is set, a single batch is returned for the given date (or datetime). If both target_date and number_batches are defined, an error is raised.
number_batches (Optional[int], optional) – If number_batches is set to n, an iterator of n inference batches will be returned, starting from inference_start_timestamp. If both target_date and number_batches are defined, an error will be raised.
- Returns
Either a single batch or multiple batches, depending on the parameters passed.
- Return type
Union[Batch, Iterable[Batch]]
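The argument contract described for get_inference_data can be sketched with a stand-in function that returns dates in place of Batch objects. Several details are assumptions made for the illustration: the function name get_inference_dates is hypothetical, ValueError is chosen because the text above only says that an error is raised, batches are assumed to be daily, and rejecting a call with neither argument is this sketch's own choice rather than documented behaviour.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Union


def get_inference_dates(
    inference_start: date,
    target_date: Optional[date] = None,
    number_batches: Optional[int] = None,
) -> Union[date, Iterator[date]]:
    """Hypothetical stand-in that returns batch dates instead of Batch objects."""
    if target_date is not None and number_batches is not None:
        # The documentation only says "an error"; ValueError is an assumption.
        raise ValueError("pass either target_date or number_batches, not both")
    if target_date is not None:
        # A single batch for the requested date.
        return target_date
    if number_batches is not None:
        # Daily batches are assumed; the real interval comes from set_parameters.
        return (inference_start + timedelta(days=i) for i in range(number_batches))
    # Behaviour with neither argument is not documented above; this sketch rejects it.
    raise ValueError("pass either target_date or number_batches")


start = date(2024, 1, 1)
single = get_inference_dates(start, target_date=date(2024, 1, 5))
several = list(get_inference_dates(start, number_batches=3))
```

Because the multi-batch branch returns an iterator, callers that need to traverse the batches more than once would collect them into a list first, as done for several above.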