whylogs.app

The whylogs client application API

Package Contents

Classes

Logger

Class for logging whylogs statistics.

Session

Manages loggers for a project. The project name is used as the default dataset name when logging if no dataset name is specified.

SessionConfig

Config for a whylogs session.

WriterConfig

Config for whylogs writers

Functions

load_config(path_to_config: str = None)

Load logging configuration, from disk and from the environment.

Attributes

__ALL__

whylogs.app.load_config(path_to_config: str = None)

Load logging configuration, from disk and from the environment.

Config is loaded by attempting to load files in the following order. The first valid file found is used.

  1. Path set in WHYLOGS_CONFIG environment variable

  2. Current directory’s .whylogs.yaml file

  3. ~/.whylogs.yaml (home directory)

  4. /opt/whylogs/.whylogs.yaml path

Returns

config – Config for the logger, if a valid config file is found, else returns None.

Return type

SessionConfig, None
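
The search order above can be sketched with the standard library alone. This is an illustration, not the whylogs implementation: the helper names candidate_config_paths and first_existing are hypothetical, and the sketch only checks that a file exists, whereas load_config is documented as using the first valid file.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[str] = None,
    home: Optional[str] = None,
) -> List[Path]:
    """Return config locations in the documented search order (hypothetical helper)."""
    env = os.environ if env is None else env
    cwd_path = Path.cwd() if cwd is None else Path(cwd)
    home_path = Path.home() if home is None else Path(home)

    paths: List[Path] = []
    env_path = env.get("WHYLOGS_CONFIG")
    if env_path:
        # 1. Path set in the WHYLOGS_CONFIG environment variable
        paths.append(Path(env_path))
    # 2. Current directory, 3. home directory, 4. /opt/whylogs
    paths.append(cwd_path / ".whylogs.yaml")
    paths.append(home_path / ".whylogs.yaml")
    paths.append(Path("/opt/whylogs/.whylogs.yaml"))
    return paths


def first_existing(paths: List[Path]) -> Optional[Path]:
    """Return the first path that is a file, or None (existence check only)."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

Passing env, cwd and home explicitly keeps the function easy to exercise without touching the real environment.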

class whylogs.app.Logger(session_id: str, dataset_name: str, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Optional[Dict[str, str]] = None, metadata: Optional[Dict[str, str]] = None, writers: Optional[List[whylogs.app.writers.Writer]] = None, metadata_writer: Optional[whylogs.app.metadata_writer.MetadataWriter] = None, verbose: bool = False, with_rotation_time: Optional[str] = None, interval: int = 1, cache_size: int = 1, segments: Optional[Union[List[Segment], List[str], str]] = None, profile_full_dataset: bool = False, constraints: Optional[whylogs.core.statistics.constraints.DatasetConstraints] = None)

Class for logging whylogs statistics.

Parameters
  • session_id – The session ID value. Should be set by the Session object

  • dataset_name – The name of the dataset. Gets included in the DatasetProfile metadata and can be used in generated filenames.

  • dataset_timestamp – Optional. The timestamp that the logger represents

  • session_timestamp – Optional. The time the session was created

  • tags – Optional. Dictionary of key, value for aggregating data upstream

  • metadata – Optional. Dictionary of key, value. Useful for debugging (associated with every single dataset profile)

  • writers – Optional. List of Writer objects used to write out the data

  • metadata_writer – Optional. MetadataWriter object used to write non-profile information

  • with_rotation_time – Optional. Log rotation interval, consisting of digits with a unit specification, e.g. 30s, 2h, d. Units are seconds (“s”), minutes (“m”), hours (“h”), or days (“d”). Output filenames will have a suffix reflecting the rotation interval.

  • interval – Deprecated: Interval multiplier for with_rotation_time, defaults to 1.

  • verbose – enable debug logging

  • cache_size – number of dataset profiles to cache

  • segments

    Can be either:
    • Autosegmentation source, one of [“auto”, “local”]

    • List of tag key value pairs for tracking data segments

    • List of tag keys for which we will track every value

    • None, no segments will be used

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset.

  • constraints – static assertions to be applied to streams and summaries.
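
The with_rotation_time format described above (optional digits followed by a unit) could be parsed along these lines. This is a minimal sketch under stated assumptions: rotation_interval_seconds is a hypothetical helper, not a whylogs function, and it treats a bare unit such as "d" as a count of 1.

```python
import re

# Seconds per documented unit: seconds, minutes, hours, days
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def rotation_interval_seconds(spec: str) -> int:
    """Convert a spec like '30s', '2h' or 'd' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d*)([smhd])", spec.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised rotation spec: {spec!r}")
    digits, unit = match.groups()
    count = int(digits) if digits else 1  # assumption: a bare unit means 1
    return count * _UNIT_SECONDS[unit]
```

For example, rotation_interval_seconds("2h") gives 7200 and rotation_interval_seconds("d") gives 86400.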

__enter__(self)
__exit__(self, exc_type, exc_val, exc_tb)
property profile(self) → whylogs.core.DatasetProfile
Returns

the last backing dataset profile

Return type

DatasetProfile

tracking_checks(self)
property segmented_profiles(self) → Dict[str, whylogs.core.DatasetProfile]
Returns

the last backing segmented dataset profiles

Return type

Dict[str, DatasetProfile]

get_segment(self, segment: Segment) → Optional[whylogs.core.DatasetProfile]
set_segments(self, segments: Union[List[Segment], List[str], str]) → None
_retrieve_local_segments(self) → Union[List[Segment], List[str], str]

Retrieves local segments

_intialize_profiles(self, dataset_timestamp: Optional[datetime.datetime] = datetime.datetime.now(datetime.timezone.utc)) → None
_set_rotation(self, with_rotation_time: str = None)
rotate_when(self, time)
should_rotate(self)
_rotate_time(self)

Rotate with time and add a suffix.

flush(self, rotation_suffix: Optional[str] = None)

Synchronously perform all remaining write tasks

full_profile_check(self) → bool

Returns a bool indicating whether the unsegmented dataset should be profiled.

close(self) → Optional[whylogs.core.DatasetProfile]

Flush and close out the logger, and output the last profile.

Returns

the resulting dataset profile, or None if the logger is closed

log(self, features: Optional[Dict[str, any]] = None, feature_name: Optional[str] = None, value: any = None, character_list: Optional[str] = None, token_method: Optional[Callable] = None)

Logs a collection of features or a single feature (must specify one or the other).

Parameters
  • features – a map of key value feature for model input

  • feature_name – name of a single feature. Cannot be specified if ‘features’ is specified

  • value – value of a single feature. Cannot be specified if ‘features’ is specified
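
The "one or the other" rule for log() can be made concrete with a small validation sketch. It is an illustration only: normalize_log_args is a hypothetical helper, and how whylogs itself reacts to an invalid combination is not stated here.

```python
from typing import Any, Dict, Optional


def normalize_log_args(
    features: Optional[Dict[str, Any]] = None,
    feature_name: Optional[str] = None,
    value: Any = None,
) -> Dict[str, Any]:
    """Return a feature map from either calling style (hypothetical helper)."""
    if features is not None and (feature_name is not None or value is not None):
        # 'feature_name' and 'value' cannot be combined with 'features'
        raise ValueError("specify either 'features' or 'feature_name'/'value', not both")
    if features is None and feature_name is None:
        raise ValueError("specify 'features' or 'feature_name'")
    if features is not None:
        return dict(features)
    return {feature_name: value}
```

Both calling styles end up as the same kind of mapping: normalize_log_args(features={"age": 42}) and normalize_log_args(feature_name="age", value=42) each return {"age": 42}.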

log_segment_datum(self, feature_name, value, character_list: str = None, token_method: Optional[Callable] = None)
log_metrics(self, targets, predictions, scores=None, model_type: whylogs.proto.ModelType = None, target_field=None, prediction_field=None, score_field=None)
log_image(self, image, feature_transforms: Optional[List[Callable]] = None, metadata_attributes: Optional[List[str]] = METADATA_DEFAULT_ATTRIBUTES, feature_name: str = '')

API to track an image, either in PIL format or as an input path

Parameters
  • feature_name – name of the feature

  • metadata_attributes – metadata attributes to extract for the images

  • feature_transforms – a list of callables to transform the input into metrics

log_local_dataset(self, root_dir, folder_feature_name='folder_feature', image_feature_transforms=None, show_progress=False)

Log a local folder dataset. This logs data from the files, along with structured file data such as metadata and magic numbers. If the folder has a single layer of child folders, the folder names are picked up as a segmented feature.

Parameters
  • show_progress – showing the progress bar

  • image_feature_transforms – image transform that you would like to use with the image log

  • root_dir (str) – directory where dataset is located.

  • folder_feature_name (str, optional) – Name for the subfolder features, i.e. class, store etc.
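
The folder layout that log_local_dataset expects (a root directory with a single layer of child folders) can be pictured with the sketch below. It only shows how folder names might be paired with files; iter_folder_features is a hypothetical helper, and the file names are placeholders rather than real data.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_folder_features(root_dir: str) -> Iterator[Tuple[str, Path]]:
    """Yield (child folder name, file path) for a one-level folder dataset."""
    root = Path(root_dir)
    for child in sorted(root.iterdir()):
        if child.is_dir():
            for file_path in sorted(child.iterdir()):
                if file_path.is_file():
                    # The folder name plays the role of folder_feature_name's value
                    yield child.name, file_path


# Build a throwaway dataset with two class folders and walk it
with tempfile.TemporaryDirectory() as tmp:
    for label in ("cats", "dogs"):
        (Path(tmp) / label).mkdir()
        (Path(tmp) / label / "img0.txt").write_text("placeholder")
    pairs = [(label, path.name) for label, path in iter_folder_features(tmp)]

print(pairs)  # [('cats', 'img0.txt'), ('dogs', 'img0.txt')]
```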

log_annotation(self, annotation_data)

Log structured annotation data, i.e. JSON-like structures.

Parameters

annotation_data (Dict or List) – the structured annotation data to log

log_csv(self, filepath_or_buffer: Union[str, pathlib.Path, IO[AnyStr]], segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False, **kwargs)

Log a CSV file. This supports the same parameters as the pandas.read_csv() function.

Parameters
  • filepath_or_buffer – the path to the CSV or a CSV buffer

  • segments – define either a list of segment keys or a list of segment tags: [ {“key”: <featurename>, “value”: <featurevalue>}, … ]

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset

  • **kwargs – additional keyword arguments passed through to pandas.read_csv
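
To make the segment tag format more concrete, the sketch below filters CSV rows against one segment using only the standard library. It assumes a segment is a list of key/value tags that must all match a row. The helper row_in_segment and the sample columns are hypothetical and do not reflect how whylogs performs segmentation internally.

```python
import csv
import io
from typing import Dict, List


def row_in_segment(row: Dict[str, str], segment: List[Dict[str, str]]) -> bool:
    """True when every tag in the segment matches the row (assumed semantics)."""
    return all(row.get(tag["key"]) == tag["value"] for tag in segment)


csv_text = "state,amount\nCA,10\nWA,20\nCA,30\n"
segment = [{"key": "state", "value": "CA"}]

# Keep only the rows that belong to the segment
matching = [
    row for row in csv.DictReader(io.StringIO(csv_text))
    if row_in_segment(row, segment)
]
amounts = [row["amount"] for row in matching]
print(amounts)  # ['10', '30']
```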

log_dataframe(self, df, segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False)

Generate and log a whylogs DatasetProfile from a pandas dataframe.

Parameters
  • df – the Pandas dataframe to log

  • segments – specify the tag key value pairs for segments

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset

log_segments(self, data)
log_segments_keys(self, data)
log_fixed_segments(self, data)
log_df_segment(self, df, segment: Segment)
is_active(self)

Return the boolean state of the logger

static _prefix_segment_tags(segment_key_values)
class whylogs.app.Session(project: Optional[str] = None, pipeline: Optional[str] = None, writers: Optional[List[whylogs.app.writers.Writer]] = None, metadata_writer: Optional[whylogs.app.metadata_writer.MetadataWriter] = None, verbose: bool = False, with_rotation_time: str = None, cache_size: int = None, report_progress: bool = False)
Parameters
  • project (str) – The project name. We will default to the project name when logging a dataset if the dataset name is not specified

  • pipeline (str) – Name of the pipeline associated with this session

  • writers (list) – configuration for the output writers. This is where the log data will go

  • verbose (bool) – enable verbose logging or not. Default is False

__enter__(self)
__exit__(self, tpe, value, traceback)
__repr__(self)

Return repr(self).

get_config(self)
is_active(self)
logger(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, with_rotation_time: str = None, cache_size: int = 1, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → whylogs.app.logger.Logger

Create a new logger or return an existing one for a given dataset name. If no dataset_name is specified, we default to project name

Parameters
  • dataset_name – name of the dataset

  • dataset_timestamp – timestamp of the dataset. Default to now

  • session_timestamp – timestamp of the session. Inherits from the session

  • tags – metadata associated with the profile

  • metadata – same as tags. Will be deprecated

  • segments – slice of data that the profile belongs to

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset

  • with_rotation_time – rotation time in minutes or hours (“1m”, “1h”)

  • cache_size – size of the segment cache

  • constraints – whylogs constraints to monitor against
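
The "create a new logger or return an existing one" behaviour, with the project name as the fallback dataset name, follows a get-or-create pattern. The sketch below shows that pattern with stand-in classes. LoggerRegistry and StubLogger are hypothetical and are not part of whylogs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StubLogger:
    """Stand-in for a real logger; only records its dataset name."""
    dataset_name: str


class LoggerRegistry:
    """Get-or-create lookup keyed by dataset name (illustrative only)."""

    def __init__(self, project: str) -> None:
        self.project = project
        self._loggers: Dict[str, StubLogger] = {}

    def logger(self, dataset_name: Optional[str] = None) -> StubLogger:
        # Fall back to the project name when no dataset name is given
        name = dataset_name if dataset_name is not None else self.project
        if name not in self._loggers:
            self._loggers[name] = StubLogger(name)
        return self._loggers[name]

    def remove_logger(self, dataset_name: str) -> None:
        # Forget the logger so a later call creates a fresh one
        self._loggers.pop(dataset_name, None)
```

Calling logger() twice with the same name returns the same object until remove_logger() is called for that name.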

get_logger(self, dataset_name: str = None)
log_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → Optional[whylogs.core.DatasetProfile]

Perform statistics calculations and log a pandas dataframe

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

  • segments

    Can be either:
    • Autosegmentation source, one of [“auto”, “local”]

    • List of tag key value pairs for tracking data segments

    • List of tag keys for which we will track every value

    • None, no segments will be used

  • profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset

Returns

a dataset profile if the session is active

profile_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]

Profile a Pandas dataframe without actually writing data to disk. This is useful when you just want to quickly capture and explore a dataset profile.

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

new_profile(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]

Create an empty dataset profile with the metadata from the session.

Parameters
  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Override the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

estimate_segments(self, df: pandas.DataFrame, name: str, target_field: str = None, max_segments: int = 30, dry_run: bool = False) → Optional[Union[List[Dict], List[str]]]

Estimates the most important features and values on which to segment data profiling using entropy-based methods.

Parameters
  • df – the dataframe of data to profile

  • name – name for discovery in the logger, automatically applied to loggers with same dataset_name

  • target_field – target field (optional)

  • max_segments – upper threshold for total combinations of segments, default 30

  • dry_run – run calculation but do not write results to metadata

Returns

a list of segmentation feature names
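
The description only says that estimate_segments uses entropy-based methods, so the exact scoring is not specified here. As background, the sketch below computes the Shannon entropy of each column's value distribution, which is the usual building block for this kind of estimate. The helper names and the sample columns are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


def column_entropies(columns: Dict[str, List[Hashable]]) -> Dict[str, float]:
    """Entropy per column; how these scores would be ranked is left open."""
    return {name: shannon_entropy(values) for name, values in columns.items()}


entropies = column_entropies({
    "state": ["CA", "CA", "WA", "WA"],   # two equally likely values
    "currency": ["USD"] * 4,             # constant column
})
```

A column with two equally likely values scores 1 bit, while a constant column scores 0 and carries no information for segmentation.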

close(self)

Deactivate this session and flush all associated loggers

remove_logger(self, dataset_name: str)

Remove a logger from the session. This is called by the logger when it’s being closed.

Parameters

dataset_name – the name of the dataset, used to identify the logger

Returns

None

class whylogs.app.SessionConfig(project: str, pipeline: str, writers: List[WriterConfig], metadata: Optional[MetadataConfig] = None, verbose: bool = False, with_rotation_time: str = None, cache_size: int = 1, report_progress: bool = False)

Config for a whylogs session.

See also SessionConfigSchema

Parameters
  • project (str) – Project associated with this whylogs session

  • pipeline (str) – Name of the associated data pipeline

  • writers (list) – A list of WriterConfig objects defining writer outputs

  • metadata (MetadataConfig) – A MetadataConfig object. If None, a default is used.

  • verbose (bool, default=False) – Output verbosity

  • with_rotation_time (str, default=None) – Rotates profiles with time. Takes the overall rotation interval with a unit: “s” for seconds, “m” for minutes, “h” for hours, “d” for days

  • cache_size (int, default=1) – Sets how many dataset profiles to cache in the logger during rotation

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream)

Load config from yaml

Parameters

stream (str, file-obj) – String or file-like object to load yaml from

Returns

config – Generated config

Return type

SessionConfig

class whylogs.app.WriterConfig(type: str, formats: Optional[List[str]] = None, output_path: Optional[str] = None, path_template: Optional[str] = None, filename_template: Optional[str] = None, data_collection_consent: Optional[bool] = None, transport_parameters: Optional[TransportParameterConfig] = None)

Config for whylogs writers

Parameters
  • type (str) – Destination for the writer output, e.g. ‘local’ or ‘s3’

  • formats (list) – All output formats. See ALL_SUPPORTED_FORMATS

  • output_path (str) – Prefix of where to output files. A directory for type = ‘local’, or key prefix for type = ‘s3’

  • path_template (str, optional) – Templatized path output using standard python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_PATH_TEMPLATE

  • filename_template (str, optional) – Templatized output filename using standardized python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_FILENAME_TEMPLATE
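
The $identifier and ${identifier} syntax is that of Python's string.Template. The sketch below shows how such a template is rendered. The identifiers project and dataset_name are placeholders chosen for illustration, since the actual list comes from whylogs.app.writers.Writer.template_params().

```python
from string import Template

# Hypothetical template; real identifiers come from Writer.template_params()
path_template = Template("$project/${dataset_name}/profiles")

# substitute() raises KeyError if an identifier is missing
rendered = path_template.substitute(project="demo", dataset_name="payments")
print(rendered)  # demo/payments/profiles

# safe_substitute() leaves unknown identifiers untouched instead of raising
partial = Template("$project/$unknown").safe_substitute(project="demo")
print(partial)  # demo/$unknown
```

The ${identifier} form is useful when the placeholder is followed directly by characters that could otherwise be read as part of the name.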

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream, **kwargs)

Load config from yaml

Parameters
  • stream (str, file-obj) – String or file-like object to load yaml from

  • kwargs – ignored

Returns

config – Generated config

Return type

WriterConfig

whylogs.app.__ALL__