Getting Started#
whylogs provides a standard to log any kind of data.
In this example, we show how to log data with whylogs, generating statistical summaries called profiles. These profiles can be used in a number of ways, such as:
Data Visualization
Data Validation
Tracking changes in your datasets
Table of Contents#
In this example, we’ll explore the basics of logging data with whylogs:
Installing whylogs
Profiling data
Interacting with the profile
Writing/Reading profiles to/from disk
Installing whylogs#
whylogs is available as a Python package. You can install the latest version from PyPI with pip install whylogs:
[1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
Minimal requirements:
Python 3.7 through Python 3.10
Windows, Linux x86_64, and macOS 10+
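Before installing, it can help to confirm that the running interpreter falls inside the range listed above. The following is a minimal sketch using only the standard library. The helper name is_supported_python is hypothetical, and the bounds are copied from the list above as an assumption; other whylogs releases may support a different range.

```python
import sys

# Bounds taken from the requirements list above; treat them as assumptions
# that may not hold for other whylogs releases.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def is_supported_python(version=None):
    """Return True if the (major, minor) version is within the listed range.

    Hypothetical helper for illustration; it is not part of whylogs.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


current = ".".join(str(part) for part in sys.version_info[:2])
status = "within" if is_supported_python() else "outside"
print(f"Python {current} is {status} the listed range")
```

The check compares only the major and minor components, so any patch release of a listed version passes.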
Loading a Pandas DataFrame#
Before showing how we can log data, we first need the data itself. Let’s create a simple Pandas DataFrame:
[2]:
import pandas as pd
data = {
"animal": ["cat", "hawk", "snake", "cat"],
"legs": [4, 2, 0, 4],
"weight": [4.3, 1.8, 1.3, 4.1],
}
df = pd.DataFrame(data)
Profiling with whylogs#
To obtain a profile of your data, you can simply use whylogs’ log call, and navigate through the result to a specific profile with profile():
[3]:
import whylogs as why
results = why.log(df)
profile = results.profile()
Analyzing Profiles#
Once you’re done logging the data, you can generate a Profile View and inspect it in a Pandas DataFrame format:
[6]:
prof_view = profile.view()
prof_df = prof_view.to_pandas()
prof_df
[6]:
column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/n | distribution/max | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | ints/max | ints/min
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
animal | 8 | 0 | 0 | 0 | 0 | 8 | 0 | 6.0 | 6.00030 | 6.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
weight | 8 | 0 | 0 | 8 | 0 | 0 | 0 | 7.0 | 7.00035 | 7.0 | ... | 8.0 | 30.1 | 1.3 | 1.3 | 4.1 | 4.3 | 14.3 | 30.1 | NaN | NaN
legs | 8 | 0 | 8 | 0 | 0 | 0 | 0 | 3.0 | 3.00015 | 3.0 | ... | 8.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 4.0 | 4.0 | 4.0 | 0.0
3 rows × 24 columns
The resulting DataFrame contains statistics for each column (feature), such as:
Counters, such as number of samples and null values
Inferred types, such as integral, fractional and boolean
Estimated Cardinality
Frequent Items
Distribution Metrics: min, max, median, quantile values
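To make these metric families concrete, the sketch below computes exact versions of a few of them for the four-row example data, using only the Python standard library. It is an illustration under stated assumptions and not how whylogs works internally: summarize_column is a hypothetical helper, the dictionary keys are borrowed from the column names above for readability, and whylogs reports estimates for some metrics (note the cardinality/est column with its upper and lower bounds), whereas this sketch counts exactly.

```python
import statistics

# Same example data as the DataFrame above, as plain Python lists.
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}


def is_number(value):
    # bool is a subclass of int, so exclude it from the numeric checks.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize_column(values):
    """Compute exact per-column statistics (hypothetical helper, not whylogs API)."""
    non_null = [v for v in values if v is not None]
    summary = {
        "counts/n": len(values),
        "counts/null": len(values) - len(non_null),
        "types/boolean": sum(isinstance(v, bool) for v in non_null),
        "types/integral": sum(is_number(v) and isinstance(v, int) for v in non_null),
        "types/fractional": sum(isinstance(v, float) for v in non_null),
        "types/string": sum(isinstance(v, str) for v in non_null),
        # Exact distinct count; whylogs reports an estimate instead.
        "cardinality": len(set(non_null)),
    }
    numeric = [v for v in non_null if is_number(v)]
    if numeric:
        # Distribution metrics only make sense for numeric columns.
        summary["distribution/min"] = min(numeric)
        summary["distribution/max"] = max(numeric)
        summary["distribution/median"] = statistics.median(numeric)
    return summary


summaries = {name: summarize_column(values) for name, values in data.items()}
for name, summary in summaries.items():
    print(name, summary)
```

As in the table above, the string column gets counters, types, and cardinality but no distribution metrics.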
Writing to Disk#
You can also store your profile on disk for further inspection:
[7]:
why.write(profile, "profile.bin")
This will create a profile binary file in your local filesystem.
Reading from Disk#
You can read the profile back into memory with:
[8]:
n_prof = why.read("profile.bin")
Note: write expects a profile as a parameter, while read returns a Profile View. That means you can use the loaded profile for visualization and merging, but not for further tracking and updates.
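The distinction between a profile that can keep tracking data and a view that can only be inspected and merged can be illustrated with a toy model. The sketch below is purely conceptual: ToyProfile and ToyView are hypothetical classes written for this illustration, not whylogs classes, and they summarize a single numeric column with only a count, minimum, and maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyView:
    """Read-only snapshot of a numeric column summary (hypothetical class)."""

    count: int
    minimum: float
    maximum: float

    def merge(self, other):
        # Merging two snapshots yields the summary of the combined data.
        return ToyView(
            count=self.count + other.count,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


class ToyProfile:
    """Mutable accumulator that can keep tracking values (hypothetical class)."""

    def __init__(self):
        self.count = 0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def track(self, value):
        # Update the running summary with one more observation.
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def view(self):
        # Freeze the current state into an immutable snapshot.
        return ToyView(self.count, self.minimum, self.maximum)


# Two accumulators, each seeing half of the example weights.
first = ToyProfile()
for weight in [4.3, 1.8]:
    first.track(weight)

second = ToyProfile()
for weight in [1.3, 4.1]:
    second.track(weight)

# Views can be combined, but they expose no track method.
merged = first.view().merge(second.view())
print(merged)
```

In this toy model, the frozen snapshot carries enough information to be merged with other snapshots, but it offers no way to add new observations, which mirrors the limitation described in the note.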
What’s Next?#
There’s a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!
Basic
Visualizing Profiles - Compare profiles to detect distribution shifts, visualize histograms and bar charts, and explore your data
Logging Data - See the different ways you can log your data with whylogs
Inspecting Profiles - A deeper dive on the metrics generated by whylogs
Schema Configuration for Tracking Metrics - Configure tracking metrics according to data type or column features
Data Constraints - Set constraints on your data to ensure its quality
Merging Profiles - Merge your profiles logged across different computing instances, time periods or data segments
Integrations
WhyLabs - Monitor your profiles continuously with the WhyLabs Observability Platform
PySpark - Use whylogs with PySpark
Writing Profiles - See different ways and locations to output your profiles
Flask - See how you can create a Flask app with whylogs and WhyLabs integration
Feature Stores - Learn how to log features from your Feature Store with feast and whylogs
BigQuery - Profile data queried from a Google BigQuery table
MLflow - Log your whylogs profiles to an MLflow environment
Or go to the examples page for the complete list of examples!