whylogs.app

The whylogs client application API

Package Contents

Classes

Logger

Class for logging whylogs statistics.

Session

Parameter project: the project name. We will default to the project name when logging a dataset if the dataset name is not specified.

SessionConfig

Config for a whylogs session.

WriterConfig

Config for whylogs writers

Functions

load_config(path_to_config: str = None)

Load logging configuration, from disk and from the environment.

Attributes

__ALL__

whylogs.app.load_config(path_to_config: str = None)

Load logging configuration, from disk and from the environment.

Config is loaded by attempting to load files in the following order. The first valid file is used:

  1. Path set in WHYLOGS_CONFIG environment variable

  2. Current directory’s .whylogs.yaml file

  3. ~/.whylogs.yaml (home directory)

  4. /opt/whylogs/.whylogs.yaml path

Returns

config – Config for the logger, if a valid config file is found, else returns None.

Return type

SessionConfig, None
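The search order above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `candidate_config_paths` and `first_existing` are hypothetical, and the sketch only checks that a file exists, whereas `load_config` uses the first file that is also a valid config.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return candidate config paths in the documented search order (hypothetical helper)."""
    env = os.environ if env is None else env
    candidates: List[Path] = []
    # 1. Path set in the WHYLOGS_CONFIG environment variable, if any
    env_path = env.get("WHYLOGS_CONFIG")
    if env_path:
        candidates.append(Path(env_path))
    # 2. .whylogs.yaml in the current directory
    candidates.append(Path.cwd() / ".whylogs.yaml")
    # 3. .whylogs.yaml in the home directory
    candidates.append(Path.home() / ".whylogs.yaml")
    # 4. The /opt/whylogs location
    candidates.append(Path("/opt/whylogs/.whylogs.yaml"))
    return candidates


def first_existing(paths: List[Path]) -> Optional[Path]:
    """Return the first candidate that exists on disk, else None."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

Calling `first_existing(candidate_config_paths())` returns `None` when no candidate exists, mirroring the documented `None` return when no valid config file is found.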

class whylogs.app.Logger(session_id: str, dataset_name: str, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = {}, metadata: Dict[str, str] = None, writers=List[Writer], verbose: bool = False, with_rotation_time: Optional[str] = None, interval: int = 1, cache_size: int = 1, segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None)

Class for logging whylogs statistics.

Parameters
  • session_id – The session ID value. Should be set by the Session object

  • dataset_name – The name of the dataset. Gets included in the DatasetProfile metadata and can be used in generated filenames.

  • dataset_timestamp – Optional. The timestamp that the logger represents

  • session_timestamp – Optional. The time the session was created

  • tags – Optional. Dictionary of key, value for aggregating data upstream

  • metadata – Optional. Dictionary of key, value. Useful for debugging (associated with every single dataset profile)

  • writers – Optional. List of Writer objects used to write out the data

  • with_rotation_time – Optional. Log rotation interval, consisting of digits with a unit specification, e.g. 30s, 2h, d. Units are seconds (“s”), minutes (“m”), hours (“h”), or days (“d”). Output filenames will have a suffix reflecting the rotation interval.

  • interval – Deprecated: Interval multiplier for with_rotation_time, defaults to 1.

  • verbose – enable debug logging

  • cache_size – number of dataprofiles to cache

  • segments – define either a list of segment keys or a list of segments tags: [ {“key”: <featurename>, “value”: <featurevalue>}, … ]

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset.

  • constraints – static assertions to be applied to streams and summaries.
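To make the with_rotation_time format concrete, here is a minimal sketch of parsing such a string into a `timedelta`. The function name `parse_rotation_interval` is hypothetical and this is not the library's parser; it also assumes that a bare unit such as "d" means one unit, which the "d" example above suggests but does not state.

```python
import re
from datetime import timedelta

# Map the documented unit letters to timedelta keyword arguments
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_rotation_interval(spec: str) -> timedelta:
    """Parse strings such as '30s', '2h' or 'd' (hypothetical helper)."""
    match = re.fullmatch(r"(\d*)([smhd])", spec.strip().lower())
    if match is None:
        raise ValueError(f"invalid rotation interval: {spec!r}")
    digits, unit = match.groups()
    # Assumption: missing digits mean a count of 1
    count = int(digits) if digits else 1
    return timedelta(**{_UNITS[unit]: count})
```

For example, `parse_rotation_interval("2h")` yields a two-hour interval, while a string with an unknown unit raises `ValueError`.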

__enter__(self)
__exit__(self, exc_type, exc_val, exc_tb)
property profile(self) → whylogs.core.DatasetProfile
Returns

the last backing dataset profile

Return type

DatasetProfile

tracking_checks(self)
property segmented_profiles(self) → Dict[str, whylogs.core.DatasetProfile]
Returns

the last backing dataset profile for each segment

Return type

Dict[str, DatasetProfile]

get_segment(self, segment: Segment) → Optional[whylogs.core.DatasetProfile]
set_segments(self, segments: Union[List[Segment], List[str]]) → None
_intialize_profiles(self, dataset_timestamp: Optional[datetime.datetime] = datetime.datetime.now(datetime.timezone.utc)) → None
_set_rotation(self, with_rotation_time: str = None)
rotate_when(self, time)
should_rotate(self)
_rotate_time(self)

Rotate with time and add a suffix.

flush(self, rotation_suffix: str = None)

Synchronously perform all remaining write tasks

full_profile_check(self) → bool

Returns a bool indicating whether the unsegmented dataset should be profiled.

close(self) → Optional[whylogs.core.DatasetProfile]

Flush and close out the logger, and output the last profile.

Returns

the result dataset profile. None if the logger is closed

log(self, features: Optional[Dict[str, any]] = None, feature_name: str = None, value: any = None)

Logs a collection of features or a single feature (must specify one or the other).

Parameters
  • features – a dictionary of key->value for multiple features (a map of feature name to value, e.g. for model input). Each entry represents a single columnar feature

  • feature_name – name of a single feature. Cannot be specified if ‘features’ is specified

  • value – value of a single feature. Cannot be specified if ‘features’ is specified
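The "one or the other" contract for these parameters can be illustrated with a small stand-alone helper. This is a sketch only: `normalize_log_args` is a hypothetical function, and raising `ValueError` is an assumption, since the text above does not say how the logger reacts to an invalid combination.

```python
from typing import Any, Dict, Optional


def normalize_log_args(
    features: Optional[Dict[str, Any]] = None,
    feature_name: Optional[str] = None,
    value: Any = None,
) -> Dict[str, Any]:
    """Return a feature dictionary from either calling style (hypothetical helper)."""
    if features is not None:
        # 'feature_name' and 'value' cannot be combined with 'features'
        if feature_name is not None or value is not None:
            raise ValueError("pass either 'features' or 'feature_name'/'value', not both")
        return dict(features)
    if feature_name is None:
        raise ValueError("pass 'features' or a 'feature_name' with its 'value'")
    # Single-feature style becomes a one-entry dictionary
    return {feature_name: value}
```

Both `normalize_log_args(features={"age": 42})` and `normalize_log_args(feature_name="age", value=42)` describe the same single observation.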

log_segment_datum(self, feature_name, value)
log_metrics(self, targets, predictions, scores=None, model_type: whylogs.proto.ModelType = None, target_field=None, prediction_field=None, score_field=None)
log_image(self, image, feature_transforms: Optional[List[Callable]] = None, metadata_attributes: Optional[List[str]] = METADATA_DEFAULT_ATTRIBUTES, feature_name: str = '')

API to track an image, either in PIL format or as an input path

Parameters
  • feature_name – name of the feature

  • metadata_attributes – metadata attributes to extract for the images

  • feature_transforms – a list of callables to transform the input into metrics

log_local_dataset(self, root_dir, folder_feature_name='folder_feature', image_feature_transforms=None, show_progress=False)

Log a local folder dataset. It will log data from the files, along with file structure data such as metadata and magic numbers. If the folder has a single layer of child folders, this will pick up folder names as a segmented feature.

Parameters
  • root_dir (str) – directory where dataset is located.

  • folder_feature_name (str, optional) – Name for the subfolder features, i.e. class, store etc.

  • image_feature_transforms (None, optional) – image transform that you would like to use with the image log

Raises

NotImplementedError – Description

log_annotation(self, annotation_data)

Log structured annotation data, i.e. JSON-like structures.

Parameters

annotation_data (Dict or List) – the structured annotation data to log

log_csv(self, filepath_or_buffer: Union[str, pathlib.Path, io.IO[AnyStr]], segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False, **kwargs)

Log a CSV file. This supports the same parameters as the pandas.read_csv function.

Parameters
  • filepath_or_buffer (FilePathOrBuffer) – the path to the CSV or a CSV buffer

  • kwargs – additional keyword arguments, as accepted by pandas.read_csv

  • segments – define either a list of segment keys or a list of segments tags: [ {“key”: <featurename>, “value”: <featurevalue>}, … ]

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset.

log_dataframe(self, df, segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False)

Generate and log a whylogs DatasetProfile from a pandas dataframe.

Parameters
  • df – the Pandas dataframe to log

  • segments – specify the tag key value pairs for segments

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset.

log_segments(self, data)
log_segments_keys(self, data)
log_fixed_segments(self, data)
log_df_segment(self, df, segment: Segment)
is_active(self)

Return the boolean state of the logger

class whylogs.app.Session(project: str, pipeline: str, writers: List[whylogs.app.writers.Writer], verbose: bool = False, with_rotation_time: str = None, cache_size: int = None, report_progress: bool = False)
Parameters
  • project (str) – The project name. We will default to the project name when logging a dataset if the dataset name is not specified

  • pipeline (str) – Name of the pipeline associated with this session

  • writers (list) – configuration for the output writers. This is where the log data will go

  • verbose (bool) – enable verbose logging or not. Default is False

__enter__(self)
__exit__(self, tpe, value, traceback)
__repr__(self)

Return repr(self).

get_config(self)
is_active(self)
logger(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[List[whylogs.app.logger.Segment]] = None, profile_full_dataset: bool = False, with_rotation_time: str = None, cache_size: int = 1, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → whylogs.app.logger.Logger

Create a new logger or return an existing one for a given dataset name. If no dataset_name is specified, the project name is used.

Parameters

args (_LoggerKey) – The properties of the logger if they’re anything but the defaults.

Returns

ylog – whylogs logger

Return type

whylogs.app.logger.Logger

get_logger(self, dataset_name: str = None)
log_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str]]] = None, profile_full_dataset: bool = False, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → Optional[whylogs.core.DatasetProfile]

Perform statistics calculations and log a pandas dataframe

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

  • segments – can be either:

      • a list of tag key value pairs for marking the segment of the data

      • a list of tag keys to group the data by

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset

Returns

a dataset profile if the session is active
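The two accepted shapes of segments can be pictured with plain Python records instead of a dataframe. This is a sketch under assumptions: the helper names are hypothetical, and the {"key": ..., "value": ...} tag shape is taken from the Logger parameter description rather than from the library's segmenting code.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
SegmentTags = List[Dict[str, Any]]


def group_by_segment_keys(
    records: List[Record], keys: List[str]
) -> List[Tuple[SegmentTags, List[Record]]]:
    """Segment keys: group records by the values found under each key."""
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    # Describe each group with tags in the {"key": ..., "value": ...} shape
    return [
        ([{"key": k, "value": v} for k, v in zip(keys, values)], rows)
        for values, rows in groups.items()
    ]


def select_fixed_segment(records: List[Record], segment: SegmentTags) -> List[Record]:
    """Segment tags: keep only records matching every key/value pair."""
    return [
        r for r in records
        if all(r.get(tag["key"]) == tag["value"] for tag in segment)
    ]
```

With a list of keys, every distinct combination of values becomes its own segment; with explicit tags, only the named segment is selected.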

profile_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]

Profile a Pandas dataframe without actually writing data to disk. This is useful when you just want to quickly capture and explore a dataset profile.

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

new_profile(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]

Create an empty dataset profile with the metadata from the session.

Parameters
  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

close(self)

Deactivate this session and flush all associated loggers

remove_logger(self, dataset_name: str)

Remove a logger from the dataset. This is called by the logger when it’s being closed

Parameters

dataset_name – the name of the dataset. Used to identify the logger

Returns

None

class whylogs.app.SessionConfig(project: str, pipeline: str, writers: List[WriterConfig], verbose: bool = False, with_rotation_time: str = None, cache_size: int = 1, report_progress: bool = False)

Config for a whylogs session.

See also SessionConfigSchema

Parameters
  • project (str) – Project associated with this whylogs session

  • pipeline (str) – Name of the associated data pipeline

  • writers (list) – A list of WriterConfig objects defining writer outputs

  • verbose (bool, default=False) – Output verbosity

  • with_rotation_time (str, default=None) – rotate profiles with time. Takes the overall rotation interval, with units “s” for seconds, “m” for minutes, “h” for hours, “d” for days

  • cache_size (int, default=1) – sets how many dataprofiles to cache in logger during rotation

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream)

Load config from yaml

Parameters

stream (str, file-obj) – String or file-like object to load yaml from

Returns

config – Generated config

Return type

SessionConfig

class whylogs.app.WriterConfig(type: str, formats: List[str], output_path: str, path_template: Optional[str] = None, filename_template: Optional[str] = None, data_collection_consent: Optional[bool] = False)

Config for whylogs writers

Parameters
  • type (str) – Destination for the writer output, e.g. ‘local’ or ‘s3’

  • formats (list) – All output formats. See ALL_SUPPORTED_FORMATS

  • output_path (str) – Prefix of where to output files. A directory for type = ‘local’, or key prefix for type = ‘s3’

  • path_template (str, optional) – Templatized path output using standard python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_PATH_TEMPLATE

  • filename_template (str, optional) – Templatized output filename using standardized python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_FILENAME_TEMPLATE
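The $identifier and ${identifier} syntax is that of Python's `string.Template`, so the substitution can be previewed with the standard library alone. In this sketch the identifiers `project`, `dataset_name` and `session_id` and their values are hypothetical placeholders; the real set comes from `whylogs.app.writers.Writer.template_params()`, and `render_template` is not a whylogs function.

```python
from string import Template
from typing import Mapping


def render_template(template: str, params: Mapping[str, str]) -> str:
    """Substitute $identifier / ${identifier} placeholders (hypothetical helper)."""
    # substitute() raises KeyError if the template names an unknown identifier
    return Template(template).substitute(params)


# Hypothetical identifiers and values, for illustration only
params = {
    "project": "demo-project",
    "dataset_name": "payments",
    "session_id": "abc123",
}

path = render_template("$project/$dataset_name", params)
# Braces are needed when the identifier is directly followed by word characters
filename = render_template("${dataset_name}_profile_${session_id}", params)
```

The braced form matters in the filename: without braces, `$dataset_name_profile_` would be read as a single, unknown identifier.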

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream, **kwargs)

Load config from yaml

Parameters
  • stream (str, file-obj) – String or file-like object to load yaml from

  • kwargs – ignored

Returns

config – Generated config

Return type

WriterConfig

whylogs.app.__ALL__