whylogs.app
The whylogs client application API
Submodules
Package Contents
Classes
Logger | Class for logging whylogs statistics.
Session |
SessionConfig | Config for a whylogs session.
WriterConfig | Config for whylogs writers.
Functions
load_config(path_to_config: str = None) | Load logging configuration, from disk and from the environment.
Attributes
- whylogs.app.load_config(path_to_config: str = None)
Load logging configuration, from disk and from the environment.
Config is loaded by attempting to load files in the following order. The first valid file will be used:
1. Path set in the WHYLOGS_CONFIG environment variable
2. The current directory's .whylogs.yaml file
3. ~/.whylogs.yaml (home directory)
4. The /opt/whylogs/.whylogs.yaml path
- Returns
config – Config for the logger, if a valid config file is found, else returns None.
- Return type
SessionConfig, None
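The search order above can be illustrated with a small standalone sketch. `find_config_path` is a hypothetical helper, not part of whylogs: it only mirrors the documented order of candidates and checks that a file exists, whereas `load_config` also requires the file to be a valid config.

```python
from pathlib import Path
from typing import List, Mapping, Optional

CONFIG_FILENAME = ".whylogs.yaml"


def find_config_path(
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    opt_dir: Path = Path("/opt/whylogs"),
) -> Optional[Path]:
    """Return the first existing config file, following the documented search order.

    Hypothetical helper: it does not parse or validate the file contents.
    """
    candidates: List[Path] = []
    # 1. Path set in the WHYLOGS_CONFIG environment variable (if set)
    env_path = env.get("WHYLOGS_CONFIG")
    if env_path:
        candidates.append(Path(env_path))
    # 2. .whylogs.yaml in the current directory
    candidates.append(cwd / CONFIG_FILENAME)
    # 3. ~/.whylogs.yaml in the home directory
    candidates.append(home / CONFIG_FILENAME)
    # 4. /opt/whylogs/.whylogs.yaml
    candidates.append(opt_dir / CONFIG_FILENAME)
    for path in candidates:
        if path.is_file():
            return path
    return None
```

A typical call would pass the real environment, for example `find_config_path(os.environ, Path.cwd(), Path.home())`; the directories are parameters so the lookup can be exercised against temporary folders.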
- class whylogs.app.Logger(session_id: str, dataset_name: str, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Optional[Dict[str, str]] = None, metadata: Optional[Dict[str, str]] = None, writers: Optional[List[whylogs.app.writers.Writer]] = None, metadata_writer: Optional[whylogs.app.metadata_writer.MetadataWriter] = None, verbose: bool = False, with_rotation_time: Optional[str] = None, interval: int = 1, cache_size: int = 1, segments: Optional[Union[List[Segment], List[str], str]] = None, profile_full_dataset: bool = False, constraints: Optional[whylogs.core.statistics.constraints.DatasetConstraints] = None)
Class for logging whylogs statistics.
- Parameters
session_id – The session ID value. Should be set by the Session object.
dataset_name – The name of the dataset. Gets included in the DatasetProfile metadata and can be used in generated filenames.
dataset_timestamp – Optional. The timestamp that the logger represents.
session_timestamp – Optional. The time the session was created.
tags – Optional. Dictionary of key-value pairs for aggregating data upstream.
metadata – Optional. Dictionary of key-value pairs associated with every dataset profile. Useful for debugging.
writers – Optional. List of Writer objects used to write out the data
metadata_writer – Optional. MetadataWriter object used to write non-profile information
with_rotation_time – Optional. Log rotation interval, consisting of digits with a unit specification, e.g. 30s, 2h, or d. Units are seconds (“s”), minutes (“m”), hours (“h”), or days (“d”). Output filenames will have a suffix reflecting the rotation interval.
interval – Deprecated: Interval multiplier for with_rotation_time, defaults to 1.
verbose – Enable debug logging.
cache_size – Number of dataset profiles to cache.
segments – Can be either:
  - Autosegmentation source, one of [“auto”, “local”]
  - List of tag key-value pairs for tracking data segments
  - List of tag keys for which every value will be tracked
  - None, in which case no segments are used
profile_full_dataset – When segmenting the dataset, an option to keep the full unsegmented profile of the dataset.
constraints – Static assertions to be applied to streams and summaries.
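The `with_rotation_time` format (optional digits followed by one of the units s, m, h, d) can be made concrete with a small parser. This is a hypothetical sketch, `parse_rotation_interval`, that converts such a string to seconds; whylogs' own parsing may differ in details such as case handling and error reporting.

```python
import re

# Seconds per unit for the documented units: s, m, h, d
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 60 * 60, "d": 24 * 60 * 60}
_INTERVAL_RE = re.compile(r"^(\d*)([smhd])$")


def parse_rotation_interval(spec: str) -> int:
    """Convert e.g. '30s', '2h' or 'd' into seconds (hypothetical helper)."""
    match = _INTERVAL_RE.match(spec.strip().lower())
    if match is None:
        raise ValueError(f"invalid rotation interval: {spec!r}")
    digits, unit = match.groups()
    # A bare unit such as 'd' means one of that unit
    multiplier = int(digits) if digits else 1
    return multiplier * _UNIT_SECONDS[unit]
```

For example, `parse_rotation_interval("2h")` gives 7200 and `parse_rotation_interval("d")` gives 86400.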
- __enter__(self)
- __exit__(self, exc_type, exc_val, exc_tb)
- property profile(self) → whylogs.core.DatasetProfile
- Returns
the last backing dataset profile
- Return type
DatasetProfile
- tracking_checks(self)
- property segmented_profiles(self) → Dict[str, whylogs.core.DatasetProfile]
- Returns
the last backing dataset profiles, one per segment
- Return type
Dict[str, DatasetProfile]
- get_segment(self, segment: Segment) → Optional[whylogs.core.DatasetProfile]
- set_segments(self, segments: Union[List[Segment], List[str], str]) → None
- _retrieve_local_segments(self) → Union[List[Segment], List[str], str]
Retrieves local segments
- _intialize_profiles(self, dataset_timestamp: Optional[datetime.datetime] = datetime.datetime.now(datetime.timezone.utc)) → None
- _set_rotation(self, with_rotation_time: str = None)
- rotate_when(self, time)
- should_rotate(self)
- _rotate_time(self)
Rotate based on time, adding a rotation suffix to the output.
- flush(self, rotation_suffix: Optional[str] = None)
Synchronously perform all remaining write tasks.
- full_profile_check(self) → bool
Return a bool indicating whether the unsegmented dataset should be profiled.
- close(self) → Optional[whylogs.core.DatasetProfile]
Flush and close out the logger, and output the last profile.
- Returns
the resulting dataset profile, or None if the logger is already closed
- log(self, features: Optional[Dict[str, any]] = None, feature_name: Optional[str] = None, value: any = None, character_list: Optional[str] = None, token_method: Optional[Callable] = None)
Logs a collection of features or a single feature (must specify one or the other).
- Parameters
features – a map of feature names to values, e.g. model inputs
feature_name – name of a single feature. Cannot be specified if ‘features’ is specified
value – value of a single feature. Cannot be specified if ‘features’ is specified
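The two calling conventions of `log` are mutually exclusive. The sketch below shows one way such arguments can be normalized into a single mapping; `normalize_log_args` is hypothetical and not part of whylogs, and raising an error when neither form is given is a choice made for this sketch.

```python
from typing import Any, Dict, Optional


def normalize_log_args(
    features: Optional[Dict[str, Any]] = None,
    feature_name: Optional[str] = None,
    value: Any = None,
) -> Dict[str, Any]:
    """Turn either calling convention into a {feature_name: value} mapping (hypothetical)."""
    if features is not None:
        # A features map excludes the single-feature arguments
        if feature_name is not None or value is not None:
            raise ValueError("Cannot combine 'features' with 'feature_name' or 'value'")
        return dict(features)
    if feature_name is None:
        raise ValueError("Specify either 'features' or 'feature_name'")
    return {feature_name: value}
```

With an actual Logger, the two forms correspond to `logger.log(features={"age": 42})` and `logger.log(feature_name="age", value=42)`.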
- log_segment_datum(self, feature_name, value, character_list: str = None, token_method: Optional[Callable] = None)
- log_metrics(self, targets, predictions, scores=None, model_type: whylogs.proto.ModelType = None, target_field=None, prediction_field=None, score_field=None)
- log_image(self, image, feature_transforms: Optional[List[Callable]] = None, metadata_attributes: Optional[List[str]] = METADATA_DEFAULT_ATTRIBUTES, feature_name: str = '')
API to track an image, either in PIL format or as an input path
- Parameters
feature_name – name of the feature
metadata_attributes – metadata attributes to extract for the images
feature_transforms – a list of callables to transform the input into metrics
- log_local_dataset(self, root_dir, folder_feature_name='folder_feature', image_feature_transforms=None, show_progress=False)
Log a local folder dataset. This logs data from the files, along with structural file data such as metadata and magic numbers. If the folder has a single layer of child folders, the folder names are picked up as a segmented feature.
- Parameters
show_progress – whether to show a progress bar
image_feature_transforms – image transforms to use when logging images
root_dir (str) – directory where dataset is located.
folder_feature_name (str, optional) – Name for the subfolder features, i.e. class, store etc.
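The folder-name feature described for `log_local_dataset` can be sketched with `pathlib`: for a root directory with a single layer of child folders, each file is tagged with the name of its parent folder. `folder_features` is a hypothetical helper that only illustrates this mapping; it does not read file contents, metadata, or magic numbers as whylogs does.

```python
from pathlib import Path
from typing import Dict


def folder_features(
    root_dir: str, folder_feature_name: str = "folder_feature"
) -> Dict[str, Dict[str, str]]:
    """Map each file in root_dir/<subfolder>/ to {folder_feature_name: <subfolder>}.

    Hypothetical helper: files directly under root_dir are ignored.
    """
    root = Path(root_dir)
    features: Dict[str, Dict[str, str]] = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        for file in sorted(f for f in sub.iterdir() if f.is_file()):
            # Key by the path relative to the root, e.g. "cat/001.jpg"
            features[file.relative_to(root).as_posix()] = {folder_feature_name: sub.name}
    return features
```

Passing `folder_feature_name="class"` mirrors the parameter's suggested use for image-classification folders.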
- log_annotation(self, annotation_data)
Log structured annotation data, i.e. JSON-like structures.
- Parameters
annotation_data (Dict or List) – the annotation data to log
- log_csv(self, filepath_or_buffer: Union[str, pathlib.Path, IO[AnyStr]], segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False, **kwargs)
Log a CSV file. This supports the same parameters as the pandas.read_csv() function.
- Parameters
filepath_or_buffer – the path to the CSV or a CSV buffer
segments – either a list of segment keys or a list of segment tags: [{“key”: <featurename>, “value”: <featurevalue>}, …]
profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset
**kwargs – keyword arguments passed through to pandas.read_csv
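When segments are given as a list of keys, each distinct combination of values in those columns defines a segment, which can be written in the `{"key": ..., "value": ...}` tag form shown above. The sketch below derives those tag lists from CSV text using only the standard library; `segments_from_keys` is hypothetical, and how whylogs enumerates and nests segments internally may differ.

```python
import csv
import io
from typing import Dict, List


def segments_from_keys(csv_text: str, keys: List[str]) -> List[List[Dict[str, str]]]:
    """Return one tag list per distinct combination of values in the key columns (hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # dict.fromkeys de-duplicates the value combinations while keeping first-seen order
    combos = dict.fromkeys(tuple(row[k] for k in keys) for row in reader)
    return [[{"key": k, "value": v} for k, v in zip(keys, combo)] for combo in combos]
```

Each inner list describes one segment; segmenting by `["store"]` on data with stores A and B yields two segments, while adding `"product"` splits them further by product.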
- log_dataframe(self, df, segments: Optional[Union[List[Segment], List[str]]] = None, profile_full_dataset: bool = False)
Generate and log a whylogs DatasetProfile from a pandas dataframe.
- Parameters
df – the Pandas dataframe to log
segments – the segments to track: either a list of segment keys or a list of tag key-value pairs
profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset.
- log_segments(self, data)
- log_segments_keys(self, data)
- log_fixed_segments(self, data)
- log_df_segment(self, df, segment: Segment)
- is_active(self)
Return the boolean state of the logger
- static _prefix_segment_tags(segment_key_values)
- class whylogs.app.Session(project: Optional[str] = None, pipeline: Optional[str] = None, writers: Optional[List[whylogs.app.writers.Writer]] = None, metadata_writer: Optional[whylogs.app.metadata_writer.MetadataWriter] = None, verbose: bool = False, with_rotation_time: str = None, cache_size: int = None, report_progress: bool = False)
- Parameters
project (str) – The project name. We will default to the project name when logging a dataset if the dataset name is not specified
pipeline (str) – Name of the pipeline associated with this session
writers (list) – configuration for the output writers. This is where the log data will go
verbose (bool) – enable verbose logging or not. Default is False
- __enter__(self)
- __exit__(self, tpe, value, traceback)
- __repr__(self)
Return repr(self).
- get_config(self)
- is_active(self)
- logger(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, with_rotation_time: str = None, cache_size: int = 1, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → whylogs.app.logger.Logger
Create a new logger or return an existing one for a given dataset name. If no dataset_name is specified, the project name is used.
- Parameters
dataset_name – name of the dataset
dataset_timestamp – timestamp of the dataset. Defaults to now
session_timestamp – timestamp of the session. Inherits from the session
tags – metadata associated with the profile
metadata – same as tags. Will be deprecated
segments – slice of data that the profile belongs to
profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset
with_rotation_time – rotation time in minutes or hours (“1m”, “1h”)
cache_size – size of the segment cache
constraints – whylogs constraints to monitor against
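`Session.logger` returns an existing logger for a dataset name or creates one, falling back to the project name, and `remove_logger` drops the entry when a logger closes. That get-or-create bookkeeping can be sketched with a plain dictionary. `LoggerRegistry` is hypothetical and uses placeholder objects instead of real loggers; it illustrates the pattern, not the whylogs `Session` internals.

```python
from typing import Dict, Optional


class LoggerRegistry:
    """Hypothetical sketch of per-dataset logger bookkeeping."""

    def __init__(self, project: str):
        self.project = project
        self._loggers: Dict[str, object] = {}

    def logger(self, dataset_name: Optional[str] = None) -> object:
        # Fall back to the project name when no dataset name is given
        name = dataset_name if dataset_name is not None else self.project
        if name not in self._loggers:
            self._loggers[name] = object()  # placeholder for a real Logger
        return self._loggers[name]

    def remove_logger(self, dataset_name: str) -> None:
        # This sketch silently ignores names that are not registered
        self._loggers.pop(dataset_name, None)
```

Repeated calls with the same dataset name return the same object until `remove_logger` is called, after which a fresh one is created.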
- get_logger(self, dataset_name: str = None)
- log_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, constraints: whylogs.core.statistics.constraints.DatasetConstraints = None) → Optional[whylogs.core.DatasetProfile]
Perform statistics calculations and log a pandas dataframe.
- Parameters
df – the dataframe to profile
dataset_name – name of the dataset
dataset_timestamp – the timestamp for the dataset
session_timestamp – the timestamp for the session. Override the default one
tags – the tags for the profile. Useful when merging
metadata – information about this current profile. Can be discarded when merging
segments – Can be either:
  - Autosegmentation source, one of [“auto”, “local”]
  - List of tag key-value pairs for tracking data segments
  - List of tag keys for which every value will be tracked
  - None, in which case no segments are used
profile_full_dataset – when segmenting dataset, an option to keep the full unsegmented profile of the dataset
- Returns
a dataset profile if the session is active
- profile_dataframe(self, df: pandas.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]
Profile a Pandas dataframe without actually writing data to disk. This is useful when you just want to quickly capture and explore a dataset profile.
- Parameters
df – the dataframe to profile
dataset_name – name of the dataset
dataset_timestamp – the timestamp for the dataset
session_timestamp – the timestamp for the session. Override the default one
tags – the tags for the profile. Useful when merging
metadata – information about this current profile. Can be discarded when merging
- Returns
a dataset profile if the session is active
- new_profile(self, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) → Optional[whylogs.core.DatasetProfile]
Create an empty dataset profile with the metadata from the session.
- Parameters
dataset_name – name of the dataset
dataset_timestamp – the timestamp for the dataset
session_timestamp – the timestamp for the session. Override the default one
tags – the tags for the profile. Useful when merging
metadata – information about this current profile. Can be discarded when merging
- Returns
a dataset profile if the session is active
- estimate_segments(self, df: pandas.DataFrame, name: str, target_field: str = None, max_segments: int = 30, dry_run: bool = False) → Optional[Union[List[Dict], List[str]]]
Estimates the most important features and values on which to segment data profiling using entropy-based methods.
- Parameters
df – the dataframe of data to profile
name – name for discovery in the logger, automatically applied to loggers with the same dataset_name
target_field – target field (optional)
max_segments – upper threshold for total combinations of segments, default 30
dry_run – run calculation but do not write results to metadata
- Returns
a list of segmentation feature names
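`estimate_segments` is described as entropy-based. To illustrate the quantity involved, the sketch below computes the Shannon entropy H(X) = -Σ p(x) log₂ p(x) of each candidate column and ranks columns whose number of distinct values fits within a segment budget. `rank_segment_candidates` is a simplified, hypothetical heuristic, not the whylogs algorithm, which may also take `target_field` and combinations of features into account.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence[object]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_segment_candidates(
    columns: Dict[str, Sequence[object]], max_segments: int = 30
) -> List[str]:
    """Rank columns by entropy (highest first), skipping empty columns and
    columns with more than max_segments distinct values (hypothetical heuristic)."""
    eligible = {
        name: entropy(vals)
        for name, vals in columns.items()
        if len(vals) > 0 and len(set(vals)) <= max_segments
    }
    return sorted(eligible, key=eligible.get, reverse=True)
```

A constant column has entropy 0 and so offers nothing to split on, while the distinct-value limit plays a role similar to the `max_segments` budget.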
- close(self)
Deactivate this session and flush all associated loggers
- remove_logger(self, dataset_name: str)
Remove a logger from the session. This is called by the logger when it’s being closed.
- Parameters
dataset_name – the name of the dataset, used to identify the logger
- Returns
None
- class whylogs.app.SessionConfig(project: str, pipeline: str, writers: List[WriterConfig], metadata: Optional[MetadataConfig] = None, verbose: bool = False, with_rotation_time: str = None, cache_size: int = 1, report_progress: bool = False)
Config for a whylogs session.
See also
SessionConfigSchema
- Parameters
project (str) – Project associated with this whylogs session
pipeline (str) – Name of the associated data pipeline
writers (list) – A list of WriterConfig objects defining writer outputs
metadata (MetadataConfig) – A MetadataConfig object. If None, a default is used.
verbose (bool, default=False) – Output verbosity
with_rotation_time (str, default=None) – Interval at which to rotate profiles with time, using the units “s” for seconds, “m” for minutes, “h” for hours, or “d” for days
cache_size (int, default=1) – How many dataset profiles to cache in the logger during rotation
- to_yaml(self, stream=None)
Serialize this config to YAML
- Parameters
stream – If None (default) return a string, else dump the yaml into this stream.
- static from_yaml(stream)
Load config from yaml
- Parameters
stream (str, file-obj) – String or file-like object to load yaml from
- Returns
config – Generated config
- Return type
SessionConfig
- class whylogs.app.WriterConfig(type: str, formats: Optional[List[str]] = None, output_path: Optional[str] = None, path_template: Optional[str] = None, filename_template: Optional[str] = None, data_collection_consent: Optional[bool] = None, transport_parameters: Optional[TransportParameterConfig] = None)
Config for whylogs writers
See also:
WriterConfigSchema
- Parameters
type (str) – Destination for the writer output, e.g. ‘local’ or ‘s3’
formats (list) – All output formats. See ALL_SUPPORTED_FORMATS
output_path (str) – Prefix of where to output files. A directory for type = ‘local’, or key prefix for type = ‘s3’
path_template (str, optional) – Templatized path output using standard python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_PATH_TEMPLATE
filename_template (str, optional) – Templatized output filename using standard python string templates. Variables are accessed via $identifier or ${identifier}. See whylogs.app.writers.Writer.template_params() for a list of available identifiers. Default = whylogs.app.writers.DEFAULT_FILENAME_TEMPLATE
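Because `path_template` and `filename_template` follow standard Python `string.Template` syntax, both `$identifier` and `${identifier}` forms are substituted. The identifiers and values below (`project`, `pipeline`, `dataset_name`, `session_timestamp`) are assumptions for illustration only; the actual names are those returned by `whylogs.app.writers.Writer.template_params()`, and the default templates may look different.

```python
from string import Template

# Hypothetical template values; see Writer.template_params() for the real identifiers
params = {
    "project": "my-project",
    "pipeline": "training",
    "dataset_name": "customers",
    "session_timestamp": "1700000000000",
}

path_template = Template("$project/$pipeline/${dataset_name}")
filename_template = Template("${dataset_name}_$session_timestamp")

output_path = path_template.substitute(params)  # "my-project/training/customers"
filename = filename_template.substitute(params)  # "customers_1700000000000"

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
partial = Template("$project/$unknown").safe_substitute(params)  # "my-project/$unknown"
```

The braced form is needed when an identifier is directly followed by characters that could otherwise be read as part of its name, as in `${dataset_name}_...`.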
- to_yaml(self, stream=None)
Serialize this config to YAML
- Parameters
stream – If None (default) return a string, else dump the yaml into this stream.
- static from_yaml(stream, **kwargs)
Load config from yaml
- Parameters
stream (str, file-obj) – String or file-like object to load yaml from
kwargs – ignored
- Returns
config – Generated config
- Return type
WriterConfig
- whylogs.app.__ALL__