🚩 Create a free WhyLabs account to get more value out of whylogs!

Did you know you can store, visualize, and monitor whylogs profiles with the WhyLabs Observability Platform? Sign up for a free WhyLabs account to leverage the power of whylogs and WhyLabs together!

Getting Started

whylogs provides a standard to log any kind of data.

In this example, we will show how to log data with whylogs, which generates statistical summaries called profiles. These profiles can be used in a number of ways, such as:

Data Visualization
Data Validation
Tracking changes in your datasets

Table of Contents

In this example, we’ll explore the basics of logging data with whylogs:

Installing whylogs
Profiling data
Interacting with the profile
Writing/Reading profiles to/from disk

Installing whylogs

whylogs is made available as a Python package. You can get the latest version from PyPI with pip install whylogs:

[1]:

# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs

Minimal requirements:

Python 3.7 through 3.10
Windows, Linux x86_64, and macOS 10+
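
If you want to confirm that your interpreter falls inside the range listed above before installing, a minimal sketch using only the standard library could look like this. The helper name `python_version_supported` is hypothetical, and the bounds are copied from the requirements on this page, so they may not match newer whylogs releases.

```python
import sys

# Supported range copied from the requirements list above (an assumption
# for any whylogs release other than the one this page describes).
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def python_version_supported(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


current = "{}.{}".format(*sys.version_info[:2])
if python_version_supported():
    print(f"Python {current} is within the supported range.")
else:
    print(f"Python {current} is outside the supported range (3.7 to 3.10).")
```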

Loading a Pandas DataFrame

Before showing how we can log data, we first need the data itself. Let’s create a simple Pandas DataFrame:

[2]:

import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

df = pd.DataFrame(data)
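
Before profiling, it can help to check what kind of data each column holds, since the profile summary reports per-column type counts (integral, fractional, boolean, string, object). The sketch below is a pandas-side check only, not how whylogs infers types internally; the helper `describe_column_kinds` is a hypothetical name introduced for illustration.

```python
import pandas as pd
from pandas.api import types as pdt

data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}
df = pd.DataFrame(data)


def describe_column_kinds(frame):
    """Map each column name to a coarse kind label based on its pandas dtype."""
    kinds = {}
    for name in frame.columns:
        col = frame[name]
        if pdt.is_bool_dtype(col):
            kind = "boolean"
        elif pdt.is_integer_dtype(col):
            kind = "integral"
        elif pdt.is_float_dtype(col):
            kind = "fractional"
        elif pdt.is_string_dtype(col):
            kind = "string"
        else:
            kind = "object"
        kinds[name] = kind
    return kinds


column_kinds = describe_column_kinds(df)
print(column_kinds)
```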

Profiling with whylogs

To profile your data, call whylogs’ log function, then retrieve the profile from the result with profile():

[3]:

import whylogs as why

results = why.log(df)
profile = results.profile()

Analyzing Profiles

Once you’re done logging the data, you can generate a Profile View and inspect it as a Pandas DataFrame:

[6]:

prof_view = profile.view()
prof_df = prof_view.to_pandas()

prof_df

[6]:

column	counts/n	counts/null	types/integral	types/fractional	types/boolean	types/string	types/object	cardinality/est	cardinality/upper_1	cardinality/lower_1	...	distribution/n	distribution/max	distribution/min	distribution/q_10	distribution/q_25	distribution/median	distribution/q_75	distribution/q_90	ints/max	ints/min
animal	8	0	0	0	0	8	0	6.0	6.00030	6.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
weight	8	0	0	8	0	0	0	7.0	7.00035	7.0	...	8.0	30.1	1.3	1.3	4.1	4.3	14.3	30.1	NaN	NaN
legs	8	0	8	0	0	0	0	3.0	3.00015	3.0	...	8.0	4.0	0.0	0.0	2.0	4.0	4.0	4.0	4.0	0.0

3 rows × 24 columns

Note: the output above was captured from a profile that had tracked 8 rows, so its counts and distribution values differ from what the 4-row DataFrame defined earlier produces on its own.

The resulting DataFrame contains statistics for each column (feature), such as:

Counters, such as number of samples and null values
Inferred types, such as integral, fractional and boolean
Estimated Cardinality
Frequent Items
Distribution metrics: min, max, median, and quantile values

Writing to Disk

You can also store your profile on disk for further inspection:

[7]:

why.write(profile, "profile.bin")

This will create a profile binary file in your local filesystem.

Reading from Disk

You can read the profile back into memory with:

[8]:

n_prof = why.read("profile.bin")

Note: write expects a profile as its parameter, while read returns a Profile View. This means you can use the loaded profile for visualization and merging, but not for further tracking or updates.

What’s Next?

There’s a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!

Basic
- Visualizing Profiles - Compare profiles to detect distribution shifts, visualize histograms and bar charts, and explore your data
- Logging Data - See the different ways you can log your data with whylogs
- Inspecting Profiles - A deeper dive on the metrics generated by whylogs
- Schema Configuration for Tracking Metrics - Configure tracking metrics according to data type or column features
- Data Constraints - Define constraints on your data to ensure its quality
- Merging Profiles - Merge your profiles logged across different computing instances, time periods or data segments
Integrations
- WhyLabs - Monitor your profiles continuously with the WhyLabs Observability Platform
- PySpark - Use whylogs with PySpark
- Writing Profiles - See different ways and locations to output your profiles
- Flask - See how you can create a Flask app with whylogs and WhyLabs integration
- Feature Stores - Learn how to log features from your Feature Store with feast and whylogs
- BigQuery - Profile data queried from a Google BigQuery table
- MLflow - Log your whylogs profiles to an MLflow environment

Or go to the examples page for the complete list of examples!