Getting Started

whylogs provides a standard to log any kind of data.

This example shows how to log data with whylogs, which generates statistical summaries called profiles. These profiles can be used in several ways, such as:

  • Data Visualization

  • Data Validation

  • Tracking changes in your datasets

Table of Contents

In this example, we’ll explore the basics of logging data with whylogs:

  • Installing whylogs

  • Profiling data

  • Interacting with the profile

  • Writing/Reading profiles to/from disk

Installing whylogs

whylogs is made available as a Python package. You can get the latest version from PyPI with pip install whylogs:

[1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs

Minimal requirements:

  • Python 3.7 through 3.10

  • Windows, Linux x86_64, and macOS 10+
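As a quick sanity check before installing, the Python requirement above can be tested from the interpreter itself. This is a minimal sketch using only the standard library; the helper name `python_supported` is made up for illustration and is not part of whylogs.

```python
import sys

# Supported range copied from the requirements list above.
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def python_supported(version=None):
    """Return True if the (major, minor) version is within the listed range.

    Hypothetical helper for illustration; whylogs does not ship this function.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


if not python_supported():
    print("Warning: this Python version is outside the 3.7 to 3.10 range listed above.")
```

Calling `python_supported()` with no argument checks the running interpreter; passing a tuple such as `(3, 8)` checks an arbitrary version.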

Loading a Pandas DataFrame

Before showing how we can log data, we first need the data itself. Let’s create a simple Pandas DataFrame:

[2]:
import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

df = pd.DataFrame(data)

Profiling with whylogs

To profile your data, pass the DataFrame to whylogs’ log call, then retrieve the profile from the returned result with profile():

[3]:
import whylogs as why

results = why.log(df)
profile = results.profile()

Analyzing Profiles

Once you’re done logging the data, you can generate a Profile View and inspect it as a Pandas DataFrame:

[6]:
prof_view = profile.view()
prof_df = prof_view.to_pandas()

prof_df
[6]:
column  counts/n  counts/null  types/integral  types/fractional  types/boolean  types/string  types/object  cardinality/est  cardinality/upper_1  cardinality/lower_1  ...  distribution/n  distribution/max  distribution/min  distribution/q_10  distribution/q_25  distribution/median  distribution/q_75  distribution/q_90  ints/max  ints/min
animal         8            0               0                 0              0             8             0              6.0              6.00030                  6.0  ...             NaN               NaN               NaN                NaN                NaN                  NaN                NaN                NaN       NaN       NaN
weight         8            0               0                 8              0             0             0              7.0              7.00035                  7.0  ...             8.0              30.1               1.3                1.3                4.1                  4.3               14.3               30.1       NaN       NaN
legs           8            0               8                 0              0             0             0              3.0              3.00015                  3.0  ...             8.0               4.0               0.0                0.0                2.0                  4.0                4.0                4.0       4.0       0.0

3 rows × 24 columns

The resulting DataFrame provides statistics for each column (feature), such as:

  • Counters, such as number of samples and null values

  • Inferred types, such as integral, fractional and boolean

  • Estimated Cardinality

  • Frequent Items

  • Distribution Metrics: min, max, median, quantile values
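To make these statistics concrete, here is a rough pure-Python sketch that computes exact versions of them for the example data from the DataFrame section. It is only an illustration of what each metric means: the function `summarize_column` and the key `cardinality/exact` are hypothetical, and whylogs itself reports estimates (note the `cardinality/est` column), so its numbers and internals will differ from this toy version.

```python
from collections import Counter
from statistics import median


def summarize_column(values):
    """Toy, exact per-column summary (hypothetical helper, not the whylogs API)."""
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it from the numeric counts.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    summary = {
        # Counters
        "counts/n": len(values),
        "counts/null": len(values) - len(non_null),
        # Inferred types
        "types/integral": sum(isinstance(v, int) and not isinstance(v, bool) for v in non_null),
        "types/fractional": sum(isinstance(v, float) for v in non_null),
        "types/boolean": sum(isinstance(v, bool) for v in non_null),
        "types/string": sum(isinstance(v, str) for v in non_null),
        # Exact distinct count; key name is made up for this sketch.
        "cardinality/exact": len(set(non_null)),
        # Most common values with their counts
        "frequent_items": Counter(non_null).most_common(3),
    }
    if numeric:
        # Distribution metrics only make sense for numeric columns.
        summary["distribution/min"] = min(numeric)
        summary["distribution/max"] = max(numeric)
        summary["distribution/median"] = median(numeric)
    return summary


# Same example data as in the DataFrame section.
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

summaries = {name: summarize_column(column) for name, column in data.items()}

for name, summary in summaries.items():
    print(name, summary)
```

String columns such as `animal` get counters, type counts, cardinality, and frequent items but no distribution metrics, which matches the NaN entries in the profile DataFrame above.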

Writing to Disk

You can also store your profile on disk for further inspection:

[7]:
why.write(profile, "profile.bin")

This creates a binary profile file on your local filesystem.

Reading from Disk

You can read the profile back into memory with:

[8]:
n_prof = why.read("profile.bin")

Note: write expects a profile as a parameter, while read returns a Profile View. This means you can use the loaded profile for visualization and merging, but not for further tracking and updates.
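The distinction in the note can be pictured with a small toy model. The sketch below is not how whylogs is implemented: `ToyProfile`, `ToyView`, `toy_write`, and `toy_read` are invented names, and JSON stands in for the binary format. It only illustrates the idea that a profile keeps accepting new data, while what comes back from disk is a snapshot that can be inspected and merged but not updated.

```python
import json
import os
import tempfile


class ToyView:
    """Read-only snapshot: can be inspected and merged, but has no track()."""

    def __init__(self, count, total):
        self.count = count
        self.total = total

    def merge(self, other):
        # Merging two snapshots yields a new snapshot covering both.
        return ToyView(self.count + other.count, self.total + other.total)


class ToyProfile:
    """Mutable accumulator that keeps accepting new values."""

    def __init__(self):
        self._count = 0
        self._total = 0.0

    def track(self, value):
        self._count += 1
        self._total += value

    def view(self):
        return ToyView(self._count, self._total)


def toy_write(profile, path):
    """Takes a profile, stores a snapshot of its current state."""
    view = profile.view()
    with open(path, "w") as f:
        json.dump({"count": view.count, "total": view.total}, f)


def toy_read(path):
    """Returns a view, not a profile, mirroring the note above."""
    with open(path) as f:
        state = json.load(f)
    return ToyView(state["count"], state["total"])


profile = ToyProfile()
for weight in [4.3, 1.8, 1.3, 4.1]:
    profile.track(weight)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_profile.json")
    toy_write(profile, path)
    loaded = toy_read(path)

# The loaded snapshot can still be combined with another snapshot...
merged = loaded.merge(profile.view())
# ...but it offers no way to track additional values.
print(hasattr(loaded, "track"))
```

In this toy model, continuing to track data requires the original `ToyProfile` object; the loaded `ToyView` is only good for reading and merging.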

What’s Next?

There’s a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!

  • Basic

  • Integrations

    • WhyLabs - Monitor your profiles continuously with the WhyLabs Observability Platform

    • Pyspark - Use whylogs with pyspark

    • Writing Profiles - See different ways and locations to output your profiles

    • Flask - See how you can create a Flask app with whylogs and WhyLabs integration

    • Feature Stores - Learn how to log features from your Feature Store with feast and whylogs

    • BigQuery - Profile data queried from a Google BigQuery table

    • MLflow - Log your whylogs profiles to an MLflow environment

Or go to the examples page for the complete list of examples!