Getting Started

The whylogs library comes with a quickstart CLI that helps you initialize its configuration. You can also use the API directly, without going through the CLI.

Quick Start

Install the library in a Python 3.6+ environment:

pip install whylogs

To get started, you can generate a simple configuration file with the whylogs CLI:

whylogs init

A whylogs config file contains the following parameters:

project sets the name of the project.
pipeline specifies the pipeline to be used.
verbose sets output verbosity. Its default value is false.
writers specifies how and where output is stored, using path and filename templates that take the following variables:
- project
- pipeline
- dataset_name
- dataset_timestamp
- session_timestamp
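
Since only the five variables listed above are available to the templates, it can help to check a template before putting it in a config file. The following is a minimal sketch, not part of whylogs: it assumes the templates use Python's `string.Template` syntax (`$name` or `${name}`), and `unknown_variables` is a hypothetical helper name.

```python
from string import Template

# Variables the list above names as available to writer templates.
SUPPORTED = {
    "project",
    "pipeline",
    "dataset_name",
    "dataset_timestamp",
    "session_timestamp",
}


def unknown_variables(template: str) -> set:
    """Return placeholder names used in `template` that are not in SUPPORTED.

    Hypothetical helper: relies on the regular expression that
    string.Template itself uses to find placeholders.
    """
    names = set()
    for match in Template.pattern.finditer(template):
        # "named" matches $name, "braced" matches ${name};
        # escaped "$$" sequences match neither group and are skipped.
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names - SUPPORTED


print(sorted(unknown_variables("$project/$pipeline/${dataset_name}")))  # []
print(sorted(unknown_variables("$project/$run_id")))  # ['run_id']
```

An empty result means every placeholder in the template is one of the documented variables.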

An example config file can be found here.

whylogs.app.config.load_config() loads your config file. It attempts to load files at the following paths, in order:
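
The search paths themselves are not reproduced here, but the general "try each location in order" lookup can be sketched as follows. This is an illustration of the pattern only, assuming the first existing file wins; it is not the whylogs implementation, and `first_existing` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def first_existing(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate path that is an existing file, else None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


# Demonstration with throwaway files; these names are placeholders,
# not the locations whylogs actually searches.
with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "missing.yaml"
    present = Path(tmp) / "present.yaml"
    present.write_text("project: demo\n")

    found = first_existing([str(missing), str(present)])
    found_name = found.name if found else None
    none_found = first_existing([str(missing)])

print(found_name)
print(none_found)
```

Under this pattern, a file that appears earlier in the search order takes precedence over any later one.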

An example script for creating a logging session can be found here.

Loggers log statistical information about your data. They have the following parameters:

dataset_name sets the name of the dataset, to be used in DatasetProfile metadata and generated filenames.
dataset_timestamp sets a timestamp for the data.
session_timestamp sets a timestamp for the creation of the session.
writers provides a list of writers that will be used to create the DatasetProfile.
verbose sets the verbosity of the output.

For more information, see the documentation for the logger class.

This example code uses logger options to control the output location.

Writers write the statistics gathered by the logger into an output file. They use the following parameters to create output file paths:

output_path sets the location where output files will be stored. Use a directory path if your writer type is 'local', or a key prefix if it is 's3'.
formats lists all supported output formats.
path_template optionally sets an output path using Python string templates.
filename_template optionally sets output filenames using Python string templates.
dataset_timestamp sets a timestamp for the data.
session_timestamp sets a timestamp for the creation of the session.
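
To make the interaction between these parameters concrete, the sketch below combines an output_path with rendered path and filename templates. It is an illustration under stated assumptions rather than whylogs code: the templates are assumed to follow `string.Template` syntax, and `build_output_path`, the `extension` argument, and the sample values are all hypothetical.

```python
import posixpath
from string import Template


def build_output_path(output_path, path_template, filename_template,
                      extension, **variables):
    """Join a base location with rendered path and filename templates.

    Hypothetical helper. Forward slashes are used throughout, which suits
    both a local directory path and an S3-style key prefix.
    """
    # substitute() raises KeyError if a template uses a missing variable.
    rel_dir = Template(path_template).substitute(variables)
    filename = Template(filename_template).substitute(variables)
    return posixpath.join(output_path, rel_dir, filename + extension)


example = build_output_path(
    "output",                      # assumed local directory
    "$project/$pipeline",          # assumed path_template
    "${dataset_name}_profile",     # assumed filename_template
    ".json",                       # assumed extension for the JSON format
    project="demo-project",
    pipeline="default",
    dataset_name="orders",
)
print(example)  # output/demo-project/default/orders_profile.json
```

The braced form `${dataset_name}` is needed here because the placeholder is followed directly by `_profile`, which would otherwise be read as part of the variable name.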

For more information, see the documentation for the writer class.

whylogs supports the following output formats:

Protobuf is a lightweight binary format that maps one-to-one with the memory representation of a whylogs object. Use this format if you plan to apply advanced transformations to whylogs output.
JSON displays the protobuf data in JSON format.
Flat outputs multiple files with both CSV and JSON content to represent different views of the data, including histograms, upperbound, lowerbound, and frequent values.

Check out the WhyLabs Platform Sandbox to see how whylogs can be used for large-scale data monitoring and visualization in enterprise settings.