Logging data from Feature Stores with Feast and whylogs#

This is a whylogs v1 example. For the analogous example in v0, please refer to this example.

Context#

In this walkthrough, we’ll see how you can use Feast and whylogs together at different parts of your ML pipeline. We’ll use Feast to set up an online feature store and then use it to enrich our serving data with additional features. After assembling our feature vector, we’ll log it with whylogs. As prediction requests arrive, the logged input features will be statistically profiled. We will then explore these profiles to see what insights they offer.

To do so, we’ll use a sample dataset of daily taxi rides in NYC, extracted from here. Our final goal could be a prediction requested at the start of a given ride. This prediction could be whether the customer will give the driver a high tip, or whether the customer will give the driver a good review. As input to the prediction model, in addition to the ride information (like number of passengers, day of the week, or trip distance), we might want to enrich our feature vector with information about the driver, like the driver’s average speed, average rating, or average number of trips in the last 24 hours, in the hope of improving the model’s performance.

The information about the specific ride will be known at inference time. However, the driver statistics might be available to us in a different data source, updated at specific time intervals. We will join this information into a single feature vector by using Feast to set up an online feature store. Feast will materialize the features into the online store from a data source file. This data source contains driver statistics, keyed by each driver’s ID and updated on an hourly basis.
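To make the join concrete, here is a minimal sketch in plain Python (no Feast involved) of what the online store conceptually provides: the most recent driver row as of a given time, which is then merged with the ride features known at request time. The helper names `latest_per_driver` and `assemble_feature_vector` and the sample rows are hypothetical and only serve as an illustration.

```python
from datetime import datetime

# Hypothetical hourly driver statistics, shaped like rows of the data source.
driver_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2020, 2, 10, 0), "avg_speed": 16.87},
    {"driver_id": 1001, "event_timestamp": datetime(2020, 2, 10, 1), "avg_speed": 20.77},
    {"driver_id": 1002, "event_timestamp": datetime(2020, 2, 10, 0), "avg_speed": 20.21},
]


def latest_per_driver(rows, as_of):
    """Keep, for each driver, the most recent row with event_timestamp <= as_of."""
    latest = {}
    for row in rows:
        if row["event_timestamp"] > as_of:
            continue  # not yet available at this point in time
        current = latest.get(row["driver_id"])
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            latest[row["driver_id"]] = row
    return latest


def assemble_feature_vector(ride, online_store):
    """Merge ride features known at request time with the stored driver features."""
    driver = online_store[ride["driver_id"]]
    return {
        "trip_distance": ride["trip_distance"],
        "passenger_count": ride["passenger_count"],
        "driver_avg_speed": driver["avg_speed"],
    }


online_store = latest_per_driver(driver_rows, as_of=datetime(2020, 2, 10, 1))
ride = {"driver_id": 1001, "trip_distance": 1.2, "passenger_count": 1}
feature_vector = assemble_feature_vector(ride, online_store)
print(feature_vector)
```

In the actual walkthrough, Feast takes care of the "latest row per driver" part through materialization and online retrieval; the sketch only illustrates the idea.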

We will simulate a production pipeline in which prediction requests are made at different timestamps. We’ll log the feature vectors for each request into daily profiles over a period of 7 days, and then compare the resulting profiles to look for data issues or drift between days.

Changes in Data#

Let’s consider some scenarios in which logging and visualizing features would be helpful.

Data Freshness#

In this example, information about drivers is updated on an hourly basis. Let’s simulate a scenario in which this frequency is disrupted for some reason, so that for a particular period new information becomes available only in 2-hour cycles.
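One way to spot this kind of freshness problem directly in the source data is to look at the gaps between consecutive updates. The sketch below is a minimal, dependency-free illustration; `max_update_gap` and the sample timestamps are hypothetical and not part of Feast or whylogs.

```python
from datetime import datetime, timedelta


def max_update_gap(timestamps):
    """Return the largest gap between consecutive update timestamps."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


# Hypothetical update times: hourly at first, then every other update is missing.
updates = [datetime(2020, 2, 11, hour) for hour in (0, 1, 2, 4, 6)]

gap = max_update_gap(updates)
is_stale = gap > timedelta(hours=1)  # expected cadence assumed to be hourly
print(gap, is_stale)
```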

Changes in Customer Behavior#

Let’s consider a scenario where people’s behavior changes: maybe people are riding less. For example, when COVID-19 started, the number of rides plummeted. The criteria people use to rate a driver could also change. For example, the ratings or reviews given to each driver could now be affected by specific services provided, like the presence of alcohol and/or physical barriers to ensure social distancing.

The Feature Repository#

First of all, let’s install the required packages for this tutorial:

[ ]:
# Note: you may need to restart the kernel to use updated packages.
%pip install --upgrade pip -qq
%pip install feast==0.22.4 -qq
%pip install Pygments -qq
%pip install whylogs[viz] -U

Boilerplate - Registering feature definitions and deploying your feature store#

In order to deploy our feature store, we need to create a feature repository. In Feast’s quickstart example, this is done with the feast init command. This example is based on the quickstart, but with some changes to the Python and configuration files.

For this reason, let’s quickly create a folder with the required files to create a feature repository adapted to our use case.

[ ]:
%%sh
# -p creates parent folders as needed and does not fail if they already exist
mkdir -p feature_repo/data feature_repo/whylogs_output
touch feature_repo/__init__.py

Next, we write our feature definitions to example.py inside the feature_repo folder:

[4]:
%%writefile feature_repo/example.py
# This is an example feature definition file

from datetime import timedelta

from feast import Entity, FeatureView, Field, FeatureService, FileSource, ValueType
from feast.types import Float32, Int64

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
# An entity has a name, used for later reference (e.g. in a feature view),
# and join_keys, which identify the physical field names used in storage.
driver = Entity(name="driver", value_type=ValueType.INT64, join_keys=["driver_id"], description="driver id",)

# Our parquet file contains sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver"],  # reference entity by name
    ttl=timedelta(days=1),
    schema=[
        Field(name="rate_1m", dtype=Int64),
        Field(name="avg_daily_trips", dtype=Int64),
        Field(name="avg_speed", dtype=Float32),

    ],
    online=True,
    source=driver_hourly_stats,
    tags={},
)

driver_stats_fs = FeatureService(
    name="driver_activity",
    features=[driver_hourly_stats_view]
)

Overwriting feature_repo/example.py

Writing the feature_store.yaml configuration file:

[5]:
%%writefile feature_repo/feature_store.yaml
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
Overwriting feature_repo/feature_store.yaml

Downloading the Data Source#

Let’s first navigate to our feature repository folder:

[6]:
%cd feature_repo
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/examples/integrations/feature_repo

Make sure you’re in the right folder. You should see an empty data folder (we’ll populate it with our data source later), the example.py Python script, which contains our feature definitions, and the feature_store.yaml configuration file.

[7]:
%ls -R
.:
__init__.py*  data/  example.py*  feature_store.yaml*  whylogs_output/

./data:
driver_stats.parquet*  registry.db*

./whylogs_output:

Now, let’s download our data source and store it locally in our feature repository:

[8]:
import feast

feast.__version__
[8]:
'0.22.4'
[9]:
import pandas as pd

path = "https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/feast_integration/driver_stats.parquet"
print(f"Loading data from {path}")
driver_stats = pd.read_parquet(path)
print("Saving file source locally")

driver_stats.to_parquet("data/driver_stats.parquet")
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/feast_integration/driver_stats.parquet
Saving file source locally

The data source contains each driver’s statistics on an hourly basis, such as the average number of trips in the last 24 hours, the average rating over the last month, and the average driving speed. You can find more information on how this data was created in the Appendix at the end of this notebook.

[10]:
driver_stats.head()
[10]:
   index      event_timestamp  driver_id                     created  avg_daily_trips  rate_1m  avg_speed
0    0.0  2020-02-10 00:00:00       1001  2022-02-16 16:17:56.446774               25        3      16.87
1    0.0  2020-02-10 00:00:00       1002  2022-02-16 16:17:56.446774               35        1      20.21
2    1.0  2020-02-10 01:00:00       1001  2022-02-16 16:17:56.446774               19        4      20.77
3    1.0  2020-02-10 01:00:00       1002  2022-02-16 16:17:56.446774               29        3      19.20
4    2.0  2020-02-10 02:00:00       1001  2022-02-16 16:17:56.446774               31        3      17.41
[11]:
driver_stats.dtypes
[11]:
index                     float64
event_timestamp    datetime64[ns]
driver_id                   int64
created            datetime64[ns]
avg_daily_trips             int64
rate_1m                     int64
avg_speed                 float64
dtype: object

Deploying the Feature Store#

Now, we will scan the Python files in our feature repository for feature view and entity definitions, register the objects, and deploy the infrastructure with the feast apply command.

[12]:
!feast apply
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/.venv/lib/python3.8/site-packages/feast/entity.py:110: DeprecationWarning: The `value_type` parameter is being deprecated. Instead, the type of an entity should be specified as a Field in the schema of a feature view. Feast 0.24 and onwards will not support the `value_type` parameter. The `entities` parameter of feature views should also be changed to a List[Entity] instead of a List[str]; if this is not done, entity columns will be mistakenly interpreted as feature columns.
  warnings.warn(
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/.venv/lib/python3.8/site-packages/feast/feature_view.py:180: DeprecationWarning: The `entities` parameter should be a list of `Entity` objects. Feast 0.24 and onwards will not support passing in a list of strings to define entities.
  warnings.warn(
Created entity driver
Created feature view driver_hourly_stats
Created feature service driver_activity

Created sqlite table feature_repo_driver_hourly_stats

Let’s also load our rides dataframe. It contains features about rides made from 10-Feb to 16-Feb (2020), such as the number of passengers, trip distance, and pickup date and time.

[13]:
import pandas as pd

path = "https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/nyc_taxi_rides_feb_2020_changed.parquet"
print(f"Loading data from {path}")
rides_df = pd.read_parquet(path)

rides_df.head()
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/nyc_taxi_rides_feb_2020_changed.parquet
[13]:
        pickup_weekday  passenger_count  trip_distance  PULocationID  tpep_pickup_datetime  pickup_date
225897               0              1.0           1.20           249   2020-02-10 00:23:21   2020-02-10
108301               0              5.0          19.03           132   2020-02-10 01:19:01   2020-02-10
196729               0              6.0           0.38            68   2020-02-10 01:29:23   2020-02-10
239495               0              1.0           2.90           263   2020-02-10 02:44:20   2020-02-10
 72014               0              6.0          16.05           233   2020-02-10 04:12:22   2020-02-10
[14]:
rides_df['passenger_count'] = rides_df['passenger_count'].fillna(0).astype('int64')

Additional Transformations#

The real dataset doesn’t contain information about the taxi driver who conducted the ride. Since our goal is to enrich the dataset with driver features from an external data source, we will create a driver_id column. For simplicity, let’s assume that this dataset contains ride information for only two drivers (IDs 1001 and 1002).

[15]:
import numpy as np
rides_df['driver_id'] = np.random.randint(1001, 1003, rides_df.shape[0])
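
Note that the upper bound in the call above is exclusive, so only the IDs 1001 and 1002 are drawn, and that without a fixed seed the assignment differs between runs. The following is a dependency-free sketch of the same idea with a seed for reproducibility; `assign_driver_ids` is a hypothetical helper that uses the standard library as a stand-in for the NumPy call.

```python
import random


def assign_driver_ids(n_rides, seed=42):
    """Draw one driver ID per ride from {1001, 1002}, reproducibly."""
    rng = random.Random(seed)
    # randrange excludes the upper bound, like np.random.randint(1001, 1003, ...)
    return [rng.randrange(1001, 1003) for _ in range(n_rides)]


ids = assign_driver_ids(1000)
print(sorted(set(ids)))
```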

Features: Load, Fetch and Log#

We will iterate over rides_df, where each row represents a point in time at which we request a prediction. For each request, we will:

  • Materialize latest features into our online feature store

  • Get features from the online feature store

  • Join the features from the online store (driver features) with ride features

  • Log features with whylogs into a profile

We’ll consider that the materialization job is run hourly. To simulate that, we will call materialize for the last rounded hour, based on the request’s timestamp tpep_pickup_datetime.
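
As a minimal sketch of the rounding described above, the materialization window for a request can be computed with the standard library alone. The helper name `materialization_window` is hypothetical; the actual loop in this example derives the same two timestamps inline.

```python
from datetime import datetime, timedelta


def materialization_window(request_timestamp):
    """Return (start, end) of the last full hour before the request."""
    # Round the request time down to the start of its hour
    end = request_timestamp.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return start, end


start, end = materialization_window(datetime(2020, 2, 10, 4, 12, 22))
print(start, end)
```

For a request at 04:12:22, the window spans 03:00 to 04:00, which would then be passed as start_date and end_date to materialize.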

We will iterate through all the requests on the dataset, generate profiles for daily batches of data, and then write the profiles to disk in a binary file for each of the seven days:
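
The daily batching can be sketched independently of Feast and whylogs: requests are grouped by calendar date, and each date maps to one output file. The helper `profile_path` and the sample timestamps are hypothetical; the file name pattern mirrors the one used in this example.

```python
from collections import defaultdict
from datetime import datetime
import os


def profile_path(day, output_dir="whylogs_output"):
    """Build the per-day file name: profile_<day>_<month>_<year>.bin."""
    name = "profile_{}_{}_{}.bin".format(day.day, day.month, day.year)
    return os.path.join(output_dir, name)


# Hypothetical request timestamps spanning two days.
requests = [
    datetime(2020, 2, 10, 0, 23, 21),
    datetime(2020, 2, 10, 23, 59, 0),
    datetime(2020, 2, 11, 0, 5, 0),
]

batches = defaultdict(list)
for ts in requests:
    # Grouping by the full date also works across month boundaries
    batches[ts.date()].append(ts)

paths = [profile_path(day) for day in sorted(batches)]
print(paths)
```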

[16]:
from datetime import datetime, timedelta
import os

from feast import FeatureStore
import whylogs as why

store = FeatureStore(repo_path=".")

prev_time = datetime(2020, 2, 10, 00, 00)
target_time = datetime(2020, 2, 10, 1, 00)
store.materialize(start_date=prev_time,end_date=target_time)

# Initializing logger for the first day
day_to_log = datetime(2020, 2, 10)

profile = None
for index,row in rides_df.iterrows():

    request_timestamp = row['tpep_pickup_datetime']

    # If the new request is from a later day, write the current day's profile to disk and start a new profile for the next day.
    # Comparing full dates (rather than only the day of the month) also works across month boundaries.
    if request_timestamp.date() > day_to_log.date():
        # let's write our profiles to whylogs_output folder
        why.write(profile,os.path.join("whylogs_output","profile_{}_{}_{}.bin".format(day_to_log.day,day_to_log.month,day_to_log.year)))
        day_to_log = request_timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        print("Starting logger for day {}....".format(day_to_log))
        profile = None
    if request_timestamp>target_time + timedelta(hours=1):
        target_time = datetime(request_timestamp.year,request_timestamp.month,request_timestamp.day,request_timestamp.hour)
        prev_time = target_time - timedelta(hours=1)
        store.materialize(start_date=prev_time,end_date=target_time)

    driver_feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:rate_1m",
        "driver_hourly_stats:avg_daily_trips",
        "driver_hourly_stats:avg_speed"
    ],
    entity_rows=[{"driver_id": row['driver_id']},],
    ).to_dict()

    # Get features from both ride and driver
    assembled_feature_vector = {
        "pickup_weekday": row["pickup_weekday"],
        "passenger_count": row["passenger_count"],
        "trip_distance": row["trip_distance"],
        "PULocationID": row["PULocationID"],
        "driver_avg_daily_trips": driver_feature_vector["avg_daily_trips"][0],
        "driver_rate_1m": driver_feature_vector["rate_1m"][0],
        "driver_avg_speed": driver_feature_vector["avg_speed"][0],

    }

    # Now that we have the complete set of features, model prediction could go here.

    # The first time data is logged to a profile, we call log(). For subsequent data to be logged in the same profile, let's use track(), until the daily batch is finished.
    if profile is None:
        profile = why.log(row=assembled_feature_vector).profile()
    else:
        profile.track(assembled_feature_vector)

why.write(profile,os.path.join("whylogs_output","profile_{}_{}_{}.bin".format(day_to_log.day,day_to_log.month,day_to_log.year)))
Materializing 1 feature views from 2020-02-10 00:00:00-03:00 to 2020-02-10 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.08it/s]
Materializing 1 feature views from 2020-02-10 01:00:00-03:00 to 2020-02-10 02:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.23it/s]
Materializing 1 feature views from 2020-02-10 03:00:00-03:00 to 2020-02-10 04:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.07it/s]
Materializing 1 feature views from 2020-02-10 04:00:00-03:00 to 2020-02-10 05:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 135.54it/s]
Materializing 1 feature views from 2020-02-10 05:00:00-03:00 to 2020-02-10 06:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.93it/s]
Materializing 1 feature views from 2020-02-10 06:00:00-03:00 to 2020-02-10 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 154.83it/s]
Materializing 1 feature views from 2020-02-10 07:00:00-03:00 to 2020-02-10 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 159.83it/s]
Materializing 1 feature views from 2020-02-10 08:00:00-03:00 to 2020-02-10 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.47it/s]
Materializing 1 feature views from 2020-02-10 09:00:00-03:00 to 2020-02-10 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 95.67it/s]
Materializing 1 feature views from 2020-02-10 10:00:00-03:00 to 2020-02-10 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.11it/s]
Materializing 1 feature views from 2020-02-10 11:00:00-03:00 to 2020-02-10 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 78.79it/s]
Materializing 1 feature views from 2020-02-10 12:00:00-03:00 to 2020-02-10 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.50it/s]
Materializing 1 feature views from 2020-02-10 13:00:00-03:00 to 2020-02-10 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.23it/s]
Materializing 1 feature views from 2020-02-10 14:00:00-03:00 to 2020-02-10 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.21it/s]
Materializing 1 feature views from 2020-02-10 15:00:00-03:00 to 2020-02-10 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 114.58it/s]
Materializing 1 feature views from 2020-02-10 16:00:00-03:00 to 2020-02-10 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.24it/s]
Materializing 1 feature views from 2020-02-10 17:00:00-03:00 to 2020-02-10 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.09it/s]
Materializing 1 feature views from 2020-02-10 18:00:00-03:00 to 2020-02-10 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.12it/s]
Materializing 1 feature views from 2020-02-10 19:00:00-03:00 to 2020-02-10 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.93it/s]
Materializing 1 feature views from 2020-02-10 20:00:00-03:00 to 2020-02-10 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.86it/s]
Materializing 1 feature views from 2020-02-10 21:00:00-03:00 to 2020-02-10 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 168.66it/s]
Materializing 1 feature views from 2020-02-10 22:00:00-03:00 to 2020-02-10 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 88.05it/s]
Starting logger for day 2020-02-11 00:00:00....
Materializing 1 feature views from 2020-02-10 23:00:00-03:00 to 2020-02-11 00:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 158.66it/s]
Materializing 1 feature views from 2020-02-11 00:00:00-03:00 to 2020-02-11 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 157.59it/s]
Materializing 1 feature views from 2020-02-11 04:00:00-03:00 to 2020-02-11 05:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 149.97it/s]
Materializing 1 feature views from 2020-02-11 05:00:00-03:00 to 2020-02-11 06:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 06:00:00-03:00 to 2020-02-11 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 148.77it/s]
Materializing 1 feature views from 2020-02-11 07:00:00-03:00 to 2020-02-11 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 08:00:00-03:00 to 2020-02-11 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.71it/s]
Materializing 1 feature views from 2020-02-11 09:00:00-03:00 to 2020-02-11 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 10:00:00-03:00 to 2020-02-11 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.11it/s]
Materializing 1 feature views from 2020-02-11 11:00:00-03:00 to 2020-02-11 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 12:00:00-03:00 to 2020-02-11 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.29it/s]
Materializing 1 feature views from 2020-02-11 13:00:00-03:00 to 2020-02-11 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 14:00:00-03:00 to 2020-02-11 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.01it/s]
Materializing 1 feature views from 2020-02-11 15:00:00-03:00 to 2020-02-11 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 16:00:00-03:00 to 2020-02-11 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 94.35it/s]
Materializing 1 feature views from 2020-02-11 17:00:00-03:00 to 2020-02-11 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 18:00:00-03:00 to 2020-02-11 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 110.94it/s]
Materializing 1 feature views from 2020-02-11 19:00:00-03:00 to 2020-02-11 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 20:00:00-03:00 to 2020-02-11 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 119.28it/s]
Materializing 1 feature views from 2020-02-11 21:00:00-03:00 to 2020-02-11 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 22:00:00-03:00 to 2020-02-11 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 98.93it/s]
Starting logger for day 2020-02-12 00:00:00....
Materializing 1 feature views from 2020-02-11 23:00:00-03:00 to 2020-02-12 00:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-12 00:00:00-03:00 to 2020-02-12 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.18it/s]
Materializing 1 feature views from 2020-02-12 03:00:00-03:00 to 2020-02-12 04:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 142.89it/s]
Materializing 1 feature views from 2020-02-12 06:00:00-03:00 to 2020-02-12 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 83.19it/s]
Materializing 1 feature views from 2020-02-12 07:00:00-03:00 to 2020-02-12 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.58it/s]
Materializing 1 feature views from 2020-02-12 08:00:00-03:00 to 2020-02-12 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 139.98it/s]
Materializing 1 feature views from 2020-02-12 09:00:00-03:00 to 2020-02-12 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 105.49it/s]
Materializing 1 feature views from 2020-02-12 10:00:00-03:00 to 2020-02-12 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.73it/s]
Materializing 1 feature views from 2020-02-12 11:00:00-03:00 to 2020-02-12 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 128.29it/s]
Materializing 1 feature views from 2020-02-12 12:00:00-03:00 to 2020-02-12 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 105.62it/s]
Materializing 1 feature views from 2020-02-12 13:00:00-03:00 to 2020-02-12 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.11it/s]
Materializing 1 feature views from 2020-02-12 14:00:00-03:00 to 2020-02-12 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.69it/s]
Materializing 1 feature views from 2020-02-12 15:00:00-03:00 to 2020-02-12 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 84.25it/s]
Materializing 1 feature views from 2020-02-12 16:00:00-03:00 to 2020-02-12 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.68it/s]
Materializing 1 feature views from 2020-02-12 17:00:00-03:00 to 2020-02-12 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.89it/s]
Materializing 1 feature views from 2020-02-12 18:00:00-03:00 to 2020-02-12 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.54it/s]
Materializing 1 feature views from 2020-02-12 19:00:00-03:00 to 2020-02-12 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.51it/s]
Materializing 1 feature views from 2020-02-12 20:00:00-03:00 to 2020-02-12 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.55it/s]
Materializing 1 feature views from 2020-02-12 21:00:00-03:00 to 2020-02-12 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.28it/s]
Materializing 1 feature views from 2020-02-12 22:00:00-03:00 to 2020-02-12 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 118.25it/s]
Starting logger for day 2020-02-13 00:00:00....
Materializing 1 feature views from 2020-02-13 01:00:00-03:00 to 2020-02-13 02:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 122.30it/s]
Materializing 1 feature views from 2020-02-13 03:00:00-03:00 to 2020-02-13 04:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.17it/s]
Materializing 1 feature views from 2020-02-13 04:00:00-03:00 to 2020-02-13 05:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.48it/s]
Materializing 1 feature views from 2020-02-13 05:00:00-03:00 to 2020-02-13 06:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 120.50it/s]
Materializing 1 feature views from 2020-02-13 06:00:00-03:00 to 2020-02-13 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 125.49it/s]
Materializing 1 feature views from 2020-02-13 07:00:00-03:00 to 2020-02-13 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 90.89it/s]
Materializing 1 feature views from 2020-02-13 08:00:00-03:00 to 2020-02-13 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.53it/s]
Materializing 1 feature views from 2020-02-13 09:00:00-03:00 to 2020-02-13 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.20it/s]
Materializing 1 feature views from 2020-02-13 10:00:00-03:00 to 2020-02-13 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 81.72it/s]
Materializing 1 feature views from 2020-02-13 11:00:00-03:00 to 2020-02-13 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 88.51it/s]
Materializing 1 feature views from 2020-02-13 12:00:00-03:00 to 2020-02-13 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 112.87it/s]
Materializing 1 feature views from 2020-02-13 13:00:00-03:00 to 2020-02-13 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 126.05it/s]
Materializing 1 feature views from 2020-02-13 14:00:00-03:00 to 2020-02-13 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.44it/s]
Materializing 1 feature views from 2020-02-13 15:00:00-03:00 to 2020-02-13 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 140.34it/s]
Materializing 1 feature views from 2020-02-13 16:00:00-03:00 to 2020-02-13 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 104.42it/s]
Materializing 1 feature views from 2020-02-13 17:00:00-03:00 to 2020-02-13 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 148.64it/s]
Materializing 1 feature views from 2020-02-13 18:00:00-03:00 to 2020-02-13 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.96it/s]
Materializing 1 feature views from 2020-02-13 19:00:00-03:00 to 2020-02-13 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 129.22it/s]
Materializing 1 feature views from 2020-02-13 20:00:00-03:00 to 2020-02-13 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.97it/s]
Materializing 1 feature views from 2020-02-13 21:00:00-03:00 to 2020-02-13 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.21it/s]
Materializing 1 feature views from 2020-02-13 22:00:00-03:00 to 2020-02-13 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.89it/s]
Starting logger for day 2020-02-14 00:00:00....
Materializing 1 feature views from 2020-02-13 23:00:00-03:00 to 2020-02-14 00:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.16it/s]
Materializing 1 feature views from 2020-02-14 00:00:00-03:00 to 2020-02-14 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.66it/s]
Materializing 1 feature views from 2020-02-14 01:00:00-03:00 to 2020-02-14 02:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 138.51it/s]
Materializing 1 feature views from 2020-02-14 05:00:00-03:00 to 2020-02-14 06:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.10it/s]
Materializing 1 feature views from 2020-02-14 06:00:00-03:00 to 2020-02-14 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 146.52it/s]
Materializing 1 feature views from 2020-02-14 07:00:00-03:00 to 2020-02-14 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.95it/s]
Materializing 1 feature views from 2020-02-14 08:00:00-03:00 to 2020-02-14 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.94it/s]
Materializing 1 feature views from 2020-02-14 09:00:00-03:00 to 2020-02-14 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.47it/s]
Materializing 1 feature views from 2020-02-14 10:00:00-03:00 to 2020-02-14 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.00it/s]
Materializing 1 feature views from 2020-02-14 11:00:00-03:00 to 2020-02-14 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 104.38it/s]
Materializing 1 feature views from 2020-02-14 12:00:00-03:00 to 2020-02-14 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 144.85it/s]
Materializing 1 feature views from 2020-02-14 13:00:00-03:00 to 2020-02-14 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 50.76it/s]
Materializing 1 feature views from 2020-02-14 14:00:00-03:00 to 2020-02-14 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 80.35it/s]
Materializing 1 feature views from 2020-02-14 15:00:00-03:00 to 2020-02-14 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.47it/s]
Materializing 1 feature views from 2020-02-14 16:00:00-03:00 to 2020-02-14 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.62it/s]
Materializing 1 feature views from 2020-02-14 17:00:00-03:00 to 2020-02-14 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.90it/s]
Materializing 1 feature views from 2020-02-14 18:00:00-03:00 to 2020-02-14 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 128.72it/s]
Materializing 1 feature views from 2020-02-14 19:00:00-03:00 to 2020-02-14 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 68.97it/s]
Materializing 1 feature views from 2020-02-14 20:00:00-03:00 to 2020-02-14 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.13it/s]
Materializing 1 feature views from 2020-02-14 21:00:00-03:00 to 2020-02-14 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 134.36it/s]
Materializing 1 feature views from 2020-02-14 22:00:00-03:00 to 2020-02-14 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.09it/s]
Starting logger for day 2020-02-15 00:00:00....
Materializing 1 feature views from 2020-02-14 23:00:00-03:00 to 2020-02-15 00:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.28it/s]
Materializing 1 feature views from 2020-02-15 00:00:00-03:00 to 2020-02-15 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 134.45it/s]
Materializing 1 feature views from 2020-02-15 01:00:00-03:00 to 2020-02-15 02:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.67it/s]
Materializing 1 feature views from 2020-02-15 02:00:00-03:00 to 2020-02-15 03:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 82.45it/s]
Materializing 1 feature views from 2020-02-15 06:00:00-03:00 to 2020-02-15 07:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.26it/s]
Materializing 1 feature views from 2020-02-15 07:00:00-03:00 to 2020-02-15 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.35it/s]
Materializing 1 feature views from 2020-02-15 08:00:00-03:00 to 2020-02-15 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 106.04it/s]
Materializing 1 feature views from 2020-02-15 09:00:00-03:00 to 2020-02-15 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.26it/s]
Materializing 1 feature views from 2020-02-15 10:00:00-03:00 to 2020-02-15 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.92it/s]
Materializing 1 feature views from 2020-02-15 11:00:00-03:00 to 2020-02-15 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.89it/s]
Materializing 1 feature views from 2020-02-15 12:00:00-03:00 to 2020-02-15 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 107.71it/s]
Materializing 1 feature views from 2020-02-15 13:00:00-03:00 to 2020-02-15 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.57it/s]
Materializing 1 feature views from 2020-02-15 14:00:00-03:00 to 2020-02-15 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.74it/s]
Materializing 1 feature views from 2020-02-15 15:00:00-03:00 to 2020-02-15 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 115.71it/s]
Materializing 1 feature views from 2020-02-15 16:00:00-03:00 to 2020-02-15 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.62it/s]
Materializing 1 feature views from 2020-02-15 17:00:00-03:00 to 2020-02-15 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.34it/s]
Materializing 1 feature views from 2020-02-15 18:00:00-03:00 to 2020-02-15 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.39it/s]
Materializing 1 feature views from 2020-02-15 19:00:00-03:00 to 2020-02-15 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.70it/s]
Materializing 1 feature views from 2020-02-15 21:00:00-03:00 to 2020-02-15 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.01it/s]
Materializing 1 feature views from 2020-02-15 22:00:00-03:00 to 2020-02-15 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.55it/s]
Starting logger for day 2020-02-16 00:00:00....
Materializing 1 feature views from 2020-02-15 23:00:00-03:00 to 2020-02-16 00:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 115.39it/s]
Materializing 1 feature views from 2020-02-16 00:00:00-03:00 to 2020-02-16 01:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 120.52it/s]
Materializing 1 feature views from 2020-02-16 01:00:00-03:00 to 2020-02-16 02:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.76it/s]
Materializing 1 feature views from 2020-02-16 04:00:00-03:00 to 2020-02-16 05:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 125.81it/s]
Materializing 1 feature views from 2020-02-16 07:00:00-03:00 to 2020-02-16 08:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 86.17it/s]
Materializing 1 feature views from 2020-02-16 08:00:00-03:00 to 2020-02-16 09:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 107.42it/s]
Materializing 1 feature views from 2020-02-16 09:00:00-03:00 to 2020-02-16 10:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.10it/s]
Materializing 1 feature views from 2020-02-16 10:00:00-03:00 to 2020-02-16 11:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.85it/s]
Materializing 1 feature views from 2020-02-16 11:00:00-03:00 to 2020-02-16 12:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.44it/s]
Materializing 1 feature views from 2020-02-16 12:00:00-03:00 to 2020-02-16 13:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.69it/s]
Materializing 1 feature views from 2020-02-16 13:00:00-03:00 to 2020-02-16 14:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 99.13it/s]
Materializing 1 feature views from 2020-02-16 14:00:00-03:00 to 2020-02-16 15:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 98.62it/s]
Materializing 1 feature views from 2020-02-16 15:00:00-03:00 to 2020-02-16 16:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 149.06it/s]
Materializing 1 feature views from 2020-02-16 16:00:00-03:00 to 2020-02-16 17:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 112.62it/s]
Materializing 1 feature views from 2020-02-16 17:00:00-03:00 to 2020-02-16 18:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.72it/s]
Materializing 1 feature views from 2020-02-16 18:00:00-03:00 to 2020-02-16 19:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.09it/s]
Materializing 1 feature views from 2020-02-16 19:00:00-03:00 to 2020-02-16 20:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.01it/s]
Materializing 1 feature views from 2020-02-16 20:00:00-03:00 to 2020-02-16 21:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 136.77it/s]
Materializing 1 feature views from 2020-02-16 21:00:00-03:00 to 2020-02-16 22:00:00-03:00 into the sqlite online store.

driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.03it/s]
Materializing 1 feature views from 2020-02-16 22:00:00-03:00 to 2020-02-16 23:00:00-03:00 into the sqlite online store.

driver_hourly_stats:

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 135.85it/s]

Let’s confirm that the profiles for each day were indeed written to disk:

[17]:
%%sh
ls whylogs_output
profile_10_2_2020.bin
profile_11_2_2020.bin
profile_12_2_2020.bin
profile_13_2_2020.bin
profile_14_2_2020.bin
profile_15_2_2020.bin
profile_16_2_2020.bin
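
The file names in the listing follow a day, month, year pattern without zero padding. A minimal sketch of a helper that builds such a path is shown here; the function name `profile_path` is hypothetical and the naming pattern is inferred only from the listing above.

```python
import os
from datetime import date


def profile_path(day: date, output_dir: str = "whylogs_output") -> str:
    """Build the path of a daily profile, assuming the naming seen in the listing."""
    return os.path.join(output_dir, f"profile_{day.day}_{day.month}_{day.year}.bin")


# Path of the profile for Feb 10, 2020
print(profile_path(date(2020, 2, 10)))
```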

We can rehydrate each of those profiles to check some of the metrics provided in the profile. Let’s take the first day as our reference profile:

[18]:
reference_profile = why.read(os.path.join("whylogs_output","profile_10_2_2020.bin"))
# we generate a profile view, and then call to_pandas() to have a dataframe with the metrics to be inspected
reference_metrics = reference_profile.view().to_pandas()
reference_metrics
[18]:
cardinality/est cardinality/lower_1 cardinality/upper_1 counts/n counts/null distribution/max distribution/mean distribution/median distribution/min distribution/n ... distribution/stddev frequent_items/frequent_strings ints/max ints/min type types/boolean types/fractional types/integral types/object types/string
column
PULocationID 39.000004 39.0 39.001951 98 0 264.000000 161.663265 161.00 41.00 98 ... 64.712997 [FrequentItem(value='161.000000', est=6, upper... 264.0 41.0 SummaryType.COLUMN 0 0 98 0 0
driver_avg_daily_trips 17.000001 17.0 17.000849 98 0 44.000000 29.428571 30.00 9.00 98 ... 7.729979 [FrequentItem(value='36.000000', est=14, upper... 44.0 9.0 SummaryType.COLUMN 0 0 98 0 0
driver_avg_speed 35.000003 35.0 35.001750 98 0 31.139999 21.182551 20.98 13.21 98 ... 3.748177 NaN NaN NaN SummaryType.COLUMN 0 98 0 0 0
driver_rate_1m 4.000000 4.0 4.000200 98 0 4.000000 2.540816 3.00 1.00 98 ... 0.801653 [FrequentItem(value='3.000000', est=45, upper=... 4.0 1.0 SummaryType.COLUMN 0 0 98 0 0
passenger_count 6.000000 6.0 6.000300 98 0 6.000000 1.408163 1.00 0.00 98 ... 1.199972 [FrequentItem(value='1.000000', est=77, upper=... 6.0 0.0 SummaryType.COLUMN 0 0 98 0 0
pickup_weekday 1.000000 1.0 1.000050 98 0 0.000000 0.000000 0.00 0.00 98 ... 0.000000 [FrequentItem(value='0.000000', est=98, upper=... 0.0 0.0 SummaryType.COLUMN 0 0 98 0 0
trip_distance 83.000017 83.0 83.004161 98 0 20.220000 2.791531 1.62 0.24 98 ... 3.606351 NaN NaN NaN SummaryType.COLUMN 0 98 0 0 0

7 rows × 28 columns

If you want to know more about inspecting profiles and the metrics contained in them, check the example on Inspecting Profiles!

Injecting data issues and comparing profiles#

Now, let’s inject some data issues into the dataset and see how we could visually inspect them with some of whylogs’ functionalities. Some of the changes applied are as follows:

  • Feb 10: No changes

  • Feb 11: (Data update error) New driver features are available only in 2-hour cycles. This simulates a scenario in which the sampling frequency is affected by changes upstream.

  • Feb 16: (Feature drift) Based on the considerations made in the section Changes in Data, we will: a) reduce the average number of daily trips (avg_daily_trips) and b) increase the standard deviation of rate_1m’s distribution. For more information on how that was done, please see Appendix - Changing the Dataset.

We already have our reference profile, so let’s load from disk two other profiles that contain the data update and feature drift issues, respectively.

[19]:
target_profile_1 = why.read(os.path.join("whylogs_output","profile_11_2_2020.bin"))
target_profile_2 = why.read(os.path.join("whylogs_output","profile_16_2_2020.bin"))

Data update#

The data update issue is a subtle one, since we still have data available with the expected shape and values. The only difference is that the values are being updated less often. Ideally, this would be checked elsewhere in our pipeline, but with the information available in our assembled feature vector, we can get signals of this issue indirectly by inspecting the cardinality of the features collected from the driver source.

Let’s check the cardinality of the average speed for our reference profile, which is a float variable:

[20]:
card = reference_metrics.loc['driver_avg_speed']['cardinality/est']
print("Cardinality for driver average speed for Reference dataset:",card)
Cardinality for driver average speed for Reference dataset: 35.000002955397264

For the same update frequency, we can expect cardinality estimates around the value seen in our baseline.

Let’s now compare the cardinality estimations in the other two profiles:

[21]:
profile_1_metrics = target_profile_1.view().to_pandas()
profile_2_metrics = target_profile_2.view().to_pandas()

print("Cardinality for driver average speed for profile #1:")
print(profile_1_metrics.loc['driver_avg_speed']['cardinality/est'])
print("Cardinality for driver average speed for profile #2:")
print(profile_2_metrics.loc['driver_avg_speed']['cardinality/est'])
Cardinality for driver average speed for profile #1:
24.00000137090692
Cardinality for driver average speed for profile #2:
35.000002955397264

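We can see there’s a significant difference for profile #1, whose driver features are updated less frequently: its cardinality estimate drops from about 35 to about 24, while profile #2 matches the reference.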

You could automate this type of assertion by using Constraints to validate your data. If you want to know more, please see the example on Building Metric Constraints!
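
As a lightweight illustration of the kind of check such a constraint could encode, the sketch below compares a target cardinality estimate against the reference one and flags a relative drop beyond a tolerance. It is plain Python rather than the whylogs Constraints API; the function names `relative_change` and `cardinality_dropped` and the 20% tolerance are assumptions made for this example, and the numbers are the rounded estimates printed above.

```python
def relative_change(reference: float, target: float) -> float:
    """Relative change of target with respect to reference (negative means a drop)."""
    if reference == 0:
        raise ValueError("reference cardinality must be non-zero")
    return (target - reference) / reference


def cardinality_dropped(reference: float, target: float, tolerance: float = 0.2) -> bool:
    """True when the target cardinality fell by more than `tolerance` (hypothetical threshold)."""
    return relative_change(reference, target) < -tolerance


# Rounded cardinality estimates for driver_avg_speed, copied from the outputs above
reference_card = 35.0
profile_1_card = 24.0  # Feb 11, driver stats updated every 2 hours
profile_2_card = 35.0  # Feb 16

for name, card in [("profile #1", profile_1_card), ("profile #2", profile_2_card)]:
    change = relative_change(reference_card, card)
    print(f"{name}: change={change:+.1%}, flagged={cardinality_dropped(reference_card, card)}")
```

With these numbers, profile #1 shows a drop of roughly 31% and is flagged, while profile #2 is unchanged and passes.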

Feature Drift#

For February 16, we have a change in the average daily trips and also in the driver’s monthly rating.

We can compare both profiles to detect data drift and generate a report for every feature in the profiles by using the NotebookProfileVisualizer.

[22]:
from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=target_profile_2.view(), reference_profile_view=reference_profile.view())
[23]:
visualization.summary_drift_report()
[23]:

The report warns us of 3 possible drifts in the following features:

  • driver_avg_daily_trips

  • driver_rate_1m

  • pickup_weekday

Indeed, we artificially changed the first two features, so we might expect to see a drift alert for those. The third one is also a drift, but probably not a relevant one, since it’s pretty obvious that a feature that reflects the day of the week will be different for daily batches (unless both of them are from the same day of the week!)

We can inspect these features further by using distribution_chart() for the driver’s rating and double_histogram() for the daily trips:

Note: Even though both features are integers, each is better served by a different type of visualization. That happens because integers can be treated either as proper numbers or as an encoding of categorical variables. Since the driver’s rating takes only a few distinct values, and it wouldn’t make sense to group different ratings into a single bin, this feature is better visualized as a categorical variable, using the distribution_chart() visualization.
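
The reasoning in the note can be sketched as a simple rule of thumb. The helper below is hypothetical and not part of whylogs: the function name `pick_chart`, the type labels, and the cut-off of 10 distinct values are all assumptions chosen for illustration, and the cardinalities are rounded from the reference profile shown earlier.

```python
def pick_chart(inferred_type: str, cardinality: float, max_categories: int = 10) -> str:
    """Suggest a visualization name for a feature.

    inferred_type: "integral", "fractional" or "string" (hypothetical labels)
    cardinality: estimated number of distinct values
    max_categories: assumed cut-off below which integers are treated as categories
    """
    if inferred_type == "string":
        return "distribution_chart"
    if inferred_type == "integral" and cardinality <= max_categories:
        return "distribution_chart"
    return "double_histogram"


# Cardinality estimates rounded from the reference profile shown earlier
features = {
    "driver_rate_1m": ("integral", 4),
    "driver_avg_daily_trips": ("integral", 17),
    "trip_distance": ("fractional", 83),
}
for name, (kind, card) in features.items():
    print(name, "->", pick_chart(kind, card))
```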

[24]:
visualization.distribution_chart(feature_name="driver_rate_1m")
[24]:

There are a lot of ratings that don’t even show up in the reference profile, so it really looks like there’s a significant drift here.

[25]:
visualization.double_histogram(feature_name="driver_avg_daily_trips")

[25]:

Likewise, these histograms barely overlap, so it’s pretty clear that these distributions are different.
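
To put a rough number on "barely overlap", one option is an overlap coefficient between the two histograms. The sketch below is not what the visualizer computes; it is a standalone illustration that regenerates synthetic samples with the means and standard deviation used in the appendix (30 for the baseline, 3 for Feb 16, standard deviation 6). The function names and the sample size of 2000 are assumptions.

```python
import random
from collections import Counter


def sample_trips(mu: float, sigma: float, n: int, rng: random.Random) -> list:
    """Draw rounded, non-negative trip counts, similar to the appendix recipe."""
    return [max(0, int(round(rng.gauss(mu, sigma)))) for _ in range(n)]


def histogram_overlap(a: list, b: list) -> float:
    """Overlap coefficient of two integer samples: sum over bins of min(share_a, share_b)."""
    freq_a, freq_b = Counter(a), Counter(b)
    bins = set(freq_a) | set(freq_b)
    return sum(min(freq_a[v] / len(a), freq_b[v] / len(b)) for v in bins)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
reference = sample_trips(30, 6, 2000, rng)  # baseline: mean 30
drifted = sample_trips(3, 6, 2000, rng)     # Feb 16: mean lowered to 3

print("reference vs reference:", histogram_overlap(reference, reference))
print("reference vs drifted:  ", round(histogram_overlap(reference, drifted), 3))
```

An overlap close to 1 means the two histograms are nearly identical, while a value close to 0 means they share almost no mass.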

The NotebookProfileVisualizer offers several other features and visualizations. If you’d like to know more, be sure to check the example on the Notebook Profile Visualizer.

Appendix - Changing the Dataset#

This section is not really a part of the demonstration. It just shows the changes made to the dataset that produced the driver_stats_changed.parquet file used at the beginning of the notebook.

Driver Statistics#

The NYC taxi dataset provides only information about rides, but in this example we want to show how an online feature store can enrich ride information with driver statistics. So, we’ll fabricate some driver statistics and link them with the rides dataset (nyc_taxi_rides_feb_2020.parquet) through the Driver_ID key.

[ ]:
import pandas as pd


# pandas >= 1.4: `inclusive` replaces the `closed` argument, which was removed in pandas 2.0
dstats = pd.DataFrame(
        {'event_timestamp': pd.date_range('2020-02-10', '2020-02-17', freq='1H', inclusive='left')}
     )
dstats['driver_id'] = '1001'

dstats2 = pd.DataFrame(
        {'event_timestamp': pd.date_range('2020-02-10', '2020-02-17', freq='1H', inclusive='left')}
     )
dstats2['driver_id'] = '1002'

dstats_tot = pd.concat([dstats, dstats2])
[27]:
dstats_tot = dstats_tot.sort_values(by=["event_timestamp","driver_id"])
[28]:
import datetime
dstats_tot['created'] = datetime.datetime.now()
[29]:
import numpy as np

mu, sigma = 30, 6 # mean and standard deviation
s = np.random.normal(mu, sigma, len(dstats_tot))
daily_trips = np.round(s)
daily_trips = [int(x) for x in daily_trips]
dstats_tot['avg_daily_trips'] = daily_trips
[30]:
from scipy.stats import truncnorm

def get_truncated_normal(mean=3, sd=0.75, low=1, upp=11):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)

X = get_truncated_normal()

dstats_tot['rate_1m'] = [int(x) for x in X.rvs(len(dstats_tot))]
[31]:
import numpy as np

mu, sigma = 20, 4 # mean and standard deviation
s = np.random.normal(mu, sigma, len(dstats_tot))
avg_speed = np.round(s,2)
avg_speed
dstats_tot['avg_speed'] = avg_speed

Adding changes - Stats Update Frequency#

[32]:
dstats_tot = dstats_tot.reset_index()
# Select the Feb 11 rows that fall on odd hours
cond = (dstats_tot['event_timestamp'].dt.day==11) & (dstats_tot['event_timestamp'].dt.month==2) & ((dstats_tot['event_timestamp'].dt.hour%2)!=0)
df2 = dstats_tot.loc[cond]
# Mask those rows to NaN and drop them, leaving one update every 2 hours on Feb 11
dstats_tot = dstats_tot[~dstats_tot.isin(df2)].dropna()
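
The effect of this filter on the number of available rows can be checked with a small standalone sketch. It uses only the standard library and rebuilds the hourly timestamps for Feb 11 for the two driver IDs used in this appendix; the variable names are chosen for this example.

```python
from datetime import datetime, timedelta

drivers = ["1001", "1002"]
start = datetime(2020, 2, 11)

# One row per driver per hour for Feb 11, as in the hourly driver statistics
rows = [(start + timedelta(hours=h), d) for h in range(24) for d in drivers]

# Same idea as the filter above: drop rows whose hour is odd
kept = [(ts, d) for ts, d in rows if ts.hour % 2 == 0]

print(len(rows), "rows before,", len(kept), "rows after")
```

With two drivers, the 48 hourly rows become 24. Ignoring edge effects at the day boundary, about 24 distinct values per driver feature are available that day, which is consistent with the cardinality estimate of about 24 seen for profile #1 in the Data update section.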

Adding changes - Rate_1m#

We’re assuming that this change in customer behaviour would not change the mean of the distribution, but would increase its standard deviation, making the ratings more spread out and increasing the frequency of extreme ratings (positive or negative).

[33]:
# Gradually widen the rating distribution on Feb 14, 15 and 16 (sd = 2, 3, 4)
for day, sd in [(14, 2), (15, 3), (16, 4)]:
    cond = (dstats_tot['event_timestamp'].dt.day==day) & (dstats_tot['event_timestamp'].dt.month==2)
    size = len(dstats_tot.loc[cond])

    X = get_truncated_normal(mean=3, sd=sd, low=1, upp=11)
    rate_1m = [int(x) for x in X.rvs(size)]
    dstats_tot.loc[cond, 'rate_1m'] = rate_1m
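
To see how a larger standard deviation changes the share of extreme ratings, the following sketch repeats the sampling with the standard library only. It swaps scipy’s truncnorm for plain rejection sampling, and it defines "extreme" as a rating of 1 or of 5 and above; both the definition and the sample size of 5000 are assumptions made for this illustration.

```python
import random


def truncated_normal_sample(mean, sd, low, upp, n, rng):
    """Rejection sampling: draw from N(mean, sd) and keep values inside [low, upp]."""
    out = []
    while len(out) < n:
        x = rng.gauss(mean, sd)
        if low <= x <= upp:
            out.append(x)
    return out


def extreme_share(ratings):
    """Share of ratings equal to 1 or at least 5 (hypothetical definition of 'extreme')."""
    return sum(1 for r in ratings if r == 1 or r >= 5) / len(ratings)


rng = random.Random(0)  # fixed seed for repeatability
for sd in (0.75, 2, 3, 4):
    # int() truncates toward zero, matching the conversion used in this appendix
    ratings = [int(x) for x in truncated_normal_sample(3, sd, 1, 11, 5000, rng)]
    print(f"sd={sd}: extreme share = {extreme_share(ratings):.2f}")
```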

Adding Changes - Avg Daily Trips#

[34]:
# Gradually lower the mean of avg_daily_trips on Feb 14, 15 and 16 (mu = 24, 12, 3)
sigma = 6 # standard deviation, unchanged
for day, mu in [(14, 24), (15, 12), (16, 3)]:
    cond = (dstats_tot['event_timestamp'].dt.day==day) & (dstats_tot['event_timestamp'].dt.month==2)
    size = len(dstats_tot.loc[cond])

    s = np.random.normal(mu, sigma, size)
    daily_trips = np.round(s)
    # negative draws are clipped to 0
    daily_trips = [int(x) if x>0 else 0 for x in daily_trips]

    dstats_tot.loc[cond, 'avg_daily_trips'] = daily_trips

dstats_tot = dstats_tot.astype({'driver_id': 'int64','avg_daily_trips':'int64','rate_1m':'int64'})
# dstats_tot.to_parquet("driver_stats.parquet")

Rides Dataset#

The nyc_taxi_rides_feb_2020.parquet file was extracted from the TLC trip record data. We randomly sampled the data and selected a few features to reduce the dataset for this demonstration.

In addition, one feature was created: the day of the week, derived from tpep_pickup_datetime.

The original features are described in this data dictionary.