whylogs.app

The WhyLogs client application API

Package Contents

Classes

SessionConfig

Config for a WhyLogs session.

WriterConfig

Config for WhyLogs writers

Logger

Class for logging WhyLogs statistics.

Session

Takes a project name, a pipeline name, and a list of writers. The project name is the default dataset name when logging

Functions

load_config()

Load logging configuration, from disk and from the environment.

whylogs.app.load_config()

Load logging configuration, from disk and from the environment.

Config is loaded by trying the following files in order. The first valid file is used.

  1. Path set in WHYLOGS_CONFIG environment variable

  2. Current directory’s .whylogs.yaml file

  3. ~/.whylogs.yaml (home directory)

  4. /opt/whylogs/.whylogs.yaml path

Returns

config – Config for the logger if a valid config file is found, otherwise None.

Return type

SessionConfig, None
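
The search order above can be sketched with the standard library alone. This is an illustration of the lookup order, not the whylogs implementation: the helper names `candidate_config_paths` and `first_existing` are hypothetical, and "valid" is simplified here to "exists as a regular file", whereas the real loader presumably also has to parse the file.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> List[Path]:
    """Return config locations in the documented lookup order (hypothetical helper)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    paths: List[Path] = []
    env_path = env.get("WHYLOGS_CONFIG")
    if env_path:
        # 1. Path set in the WHYLOGS_CONFIG environment variable
        paths.append(Path(env_path))
    # 2. Current directory, 3. home directory, 4. /opt/whylogs
    paths.append(cwd / ".whylogs.yaml")
    paths.append(home / ".whylogs.yaml")
    paths.append(Path("/opt/whylogs/.whylogs.yaml"))
    return paths


def first_existing(paths: List[Path]) -> Optional[Path]:
    """Return the first path that exists as a file, else None (simplified validity check)."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

Passing `env`, `cwd` and `home` explicitly keeps the sketch easy to test without touching the real environment.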

class whylogs.app.SessionConfig(project: str, pipeline: str, writers: List[WriterConfig], verbose: bool = False)

Config for a WhyLogs session.

See also SessionConfigSchema

Parameters
  • project (str) – Project associated with this WhyLogs session

  • pipeline (str) – Name of the associated data pipeline

  • writers (list) – A list of WriterConfig objects defining writer outputs

  • verbose (bool, default=False) – Output verbosity

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream)

Load config from yaml

Parameters

stream (str, file-obj) – String or file-like object to load yaml from

Returns

config – Generated config

Return type

SessionConfig

class whylogs.app.WriterConfig(type: str, formats: List[str], output_path: str, path_template: typing.Optional[str] = None, filename_template: typing.Optional[str] = None)

Config for WhyLogs writers

Parameters
  • type (str) – Destination for the writer output, e.g. ‘local’ or ‘s3’

  • formats (list) – All output formats. See ALL_SUPPORTED_FORMATS

  • output_path (str) – Prefix of where to output files. A directory for type = ‘local’, or key prefix for type = ‘s3’

  • path_template (str, optional) – Templatized output path using standard Python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_PATH_TEMPLATE

  • filename_template (str, optional) – Templatized output filename using standard Python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_FILENAME_TEMPLATE
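
The `$identifier` and `${identifier}` syntax described above is that of Python's `string.Template`. The sketch below shows how such templates expand. The identifiers `name` and `session_id` and the template strings are assumptions chosen for illustration; the identifiers actually available are the ones returned by whylogs.app.writers.Writer.template_params().

```python
from string import Template

# Hypothetical template parameters, for illustration only.
params = {"name": "demo-dataset", "session_id": "abc123"}

# $identifier works when the next character cannot be part of an identifier.
path_template = Template("$name/${session_id}")

# ${identifier} is required here: "$name_summary" would look up "name_summary".
filename_template = Template("${name}_summary")

output_path = path_template.substitute(params)          # "demo-dataset/abc123"
output_filename = filename_template.substitute(params)  # "demo-dataset_summary"
```

`Template.substitute` raises `KeyError` for an unknown identifier, while `Template.safe_substitute` leaves the placeholder in place.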

to_yaml(self, stream=None)

Serialize this config to YAML

Parameters

stream – If None (default) return a string, else dump the yaml into this stream.

static from_yaml(stream, **kwargs)

Load config from yaml

Parameters
  • stream (str, file-obj) – String or file-like object to load yaml from

  • kwargs – ignored

Returns

config – Generated config

Return type

WriterConfig

class whylogs.app.Logger(session_id: str, dataset_name: str, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: typing.Dict[str, str] = None, metadata: typing.Dict[str, str] = None, writers=List[Writer], verbose: bool = False)

Class for logging WhyLogs statistics.

Parameters
  • session_id – The session ID value. Should be set by the Session object

  • dataset_name – The name of the dataset. Gets included in the DatasetProfile metadata and can be used in generated filenames.

  • dataset_timestamp – Optional. The timestamp that the logger represents

  • session_timestamp – Optional. The time the session was created

  • tags – Optional. Dictionary of key, value for aggregating data upstream

  • metadata – Optional. Dictionary of key, value. Useful for debugging (associated with every single dataset profile)

  • writers – List of Writer objects used to write out the data

  • verbose – Whether to enable debug logging

__enter__(self)
__exit__(self, exc_type, exc_val, exc_tb)
property profile(self)
Returns

the backing dataset profile

Return type

DatasetProfile

flush(self)

Synchronously perform all remaining write tasks

close(self) → Optional[DatasetProfile]

Flush and close out the logger.

Returns

the resulting dataset profile, or None if the logger is already closed

log(self, features: typing.Dict[str, any] = None, feature_name: str = None, value: any = None)

Logs a collection of features or a single feature (must specify one or the other).

Parameters
  • features – a dictionary of key->value for multiple features, such as a model input. Each entry represents a single columnar feature. Cannot be specified if ‘feature_name’ is specified

  • feature_name – name of a single feature. Cannot be specified if ‘features’ is specified

  • value – value of a single feature. Cannot be specified if ‘features’ is specified
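
The "one or the other" rule for `log` can be illustrated with a small argument check. This is a sketch of the documented contract, not the whylogs code: `normalize_log_args` is a hypothetical helper, and raising when neither form is given is an assumption about how "must specify one or the other" is enforced.

```python
from typing import Any, Dict, Optional


def normalize_log_args(
    features: Optional[Dict[str, Any]] = None,
    feature_name: Optional[str] = None,
    value: Any = None,
) -> Dict[str, Any]:
    """Turn either calling style into one {feature: value} mapping (hypothetical helper)."""
    single_given = feature_name is not None or value is not None
    if features is not None and single_given:
        raise ValueError("Specify either 'features' or 'feature_name'/'value', not both")
    if features is not None:
        # Multiple features: each entry is one columnar feature.
        return dict(features)
    if feature_name is None:
        # Assumption: treat a call with neither form as an error.
        raise ValueError("Specify 'features' or 'feature_name'")
    # Single feature: wrap it so both styles share one code path.
    return {feature_name: value}
```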

log_csv(self, filepath_or_buffer: FilePathOrBuffer, **kwargs)

Log a CSV file. This supports the same parameters as the pandas.read_csv function.

Parameters
  • filepath_or_buffer (FilePathOrBuffer) – the path to the CSV or a CSV buffer

  • kwargs – keyword arguments passed through to pandas.read_csv

log_dataframe(self, df)

Generate and log a WhyLogs DatasetProfile from a pandas dataframe

Parameters

df – the Pandas dataframe to log

is_active(self) → bool

Return True if the logger is active, False otherwise

class whylogs.app.Session(project: str, pipeline: str, writers: List[Writer], verbose: bool = False)
Parameters
  • project (str) – The project name. We will default to the project name when logging a dataset if the dataset name is not specified

  • pipeline (str) – Name of the pipeline associated with this session

  • writers (list) – configuration for the output writers. This is where the log data will go

  • verbose (bool) – enable verbose logging or not. Default is False

__enter__(self)
__exit__(self, tpe, value, traceback)
is_active(self)
logger(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Logger

Create a new logger or return an existing one for a given dataset name. If no dataset_name is specified, the project name is used.

Parameters
  • dataset_name (str) – Name of the dataset. Default is the project name

  • dataset_timestamp (datetime.datetime, optional) – The timestamp associated with the dataset. Could be the timestamp for the batch, or the timestamp for the window that you are tracking

  • tags (dict) – Tag the data with groupable information. For example, you might want to tag your data with the stage information (development, testing, production etc…)

  • metadata (dict) – Useful for debugging the data source. You can associate non-groupable information in this field, such as the hostname

  • session_timestamp (datetime.datetime, optional) – Override the timestamp associated with the session. Normally you shouldn’t need to override this value

Returns

ylog – WhyLogs logger

Return type

whylogs.app.logger.Logger
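
The "create a new logger or return an existing one" behaviour, together with remove_logger being called by a logger when it closes, amounts to a small registry keyed by dataset name. The sketch below illustrates that pattern with hypothetical stand-in classes `SketchSession` and `SketchLogger`; it does not profile any data and is not the whylogs implementation.

```python
from typing import Dict, Optional


class SketchLogger:
    """Hypothetical stand-in for Logger; holds no profile."""

    def __init__(self, session: "SketchSession", dataset_name: str) -> None:
        self._session = session
        self.dataset_name = dataset_name
        self._active = True

    def is_active(self) -> bool:
        return self._active

    def close(self) -> None:
        if not self._active:
            return
        self._active = False
        # The logger tells its session to forget it when it closes.
        self._session.remove_logger(self.dataset_name)

    def __enter__(self) -> "SketchLogger":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.close()


class SketchSession:
    """Hypothetical stand-in for Session; keeps one logger per dataset name."""

    def __init__(self, project: str) -> None:
        self.project = project
        self._loggers: Dict[str, SketchLogger] = {}

    def logger(self, dataset_name: Optional[str] = None) -> SketchLogger:
        # Fall back to the project name when no dataset name is given.
        name = self.project if dataset_name is None else dataset_name
        existing = self._loggers.get(name)
        if existing is None:
            existing = SketchLogger(self, name)
            self._loggers[name] = existing
        return existing

    def remove_logger(self, dataset_name: str) -> None:
        self._loggers.pop(dataset_name, None)
```

Used in a `with` block, the sketch logger closes itself on exit, so a later call to `logger()` with the same name creates a fresh one.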

log_dataframe(self, df: pd.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[DatasetProfile]

Perform statistics calculations and log a pandas dataframe

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Overrides the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

profile_dataframe(self, df: pd.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[DatasetProfile]

Profile a Pandas dataframe without actually writing data to disk. This is useful when you just want to quickly capture and explore a dataset profile.

Parameters
  • df – the dataframe to profile

  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Overrides the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

new_profile(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[DatasetProfile]

Create an empty dataset profile with the metadata from the session.

Parameters
  • dataset_name – name of the dataset

  • dataset_timestamp – the timestamp for the dataset

  • session_timestamp – the timestamp for the session. Overrides the default one

  • tags – the tags for the profile. Useful when merging

  • metadata – information about this current profile. Can be discarded when merging

Returns

a dataset profile if the session is active

close(self)

Deactivate this session and flush all associated loggers

remove_logger(self, dataset_name)

Remove the logger for a dataset from this session. This is called by the logger when it’s being closed

Parameters
  • dataset_name – the name of the dataset. Used to identify the logger

Returns

None