Overview

Introduction

WhyLogs (https://github.com/whylabs/whylogs) is an open source data quality library that uses statistical summaries to log and monitor the data flowing through your AI/ML application. WhyLogs is designed to scale with your MLOps workflow, from local development to terabyte-scale production datasets.

Whether you are running an experimental or a production pipeline, understanding the properties of the data that flows through your application is critical to the success of your ML project. WhyLogs collects statistics with lightweight techniques, such as building sketches of the data, which make complex monitoring and data quality checks practical for your pipeline.

Key Features

  • Data Insight: WhyLogs provides complex statistics across different stages of your ML/AI pipelines and applications.

  • Scalability: WhyLogs scales with your system, from local development mode to live production systems in multi-node clusters, and works well with batch and streaming architectures.

  • Lightweight: WhyLogs produces small, mergeable outputs in a variety of formats, using sketching algorithms and summary statistics.

  • Unified data instrumentation: The WhyLogs library supports multiple languages and integrations, so that data engineering pipelines and ML pipelines can share a common framework for tracking data quality and drift.

  • Observability: In addition to supporting traditional monitoring approaches, WhyLogs data can support advanced ML-focused analytics, error analysis, and data quality and data drift detection.
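
To illustrate what "mergeable" means in the Lightweight feature above, here is a minimal sketch in plain Python. It is not WhyLogs code and does not use the WhyLogs API: the ColumnSummary class and the summarize helper are hypothetical names introduced only for this example, and it tracks simple summary statistics (count, mean, variance, min, max) rather than the sketching algorithms WhyLogs relies on. The point is only that two batches can be summarized independently and combined later without revisiting the raw data.

```python
from dataclasses import dataclass
import math


@dataclass
class ColumnSummary:
    """Hypothetical mergeable summary of one numeric column (not a WhyLogs class)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        # Welford's online update: one pass, constant memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Pairwise combination of two summaries (Chan et al. formula),
        # so batches or cluster nodes can be summarized independently.
        total = self.count + other.count
        if total == 0:
            return ColumnSummary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return ColumnSummary(
            count=total,
            mean=mean,
            m2=m2,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.count if self.count else 0.0


def summarize(values) -> ColumnSummary:
    """Build a summary from an iterable of numbers."""
    summary = ColumnSummary()
    for value in values:
        summary.update(value)
    return summary


if __name__ == "__main__":
    # Two batches summarized separately, then merged.
    batch_a = summarize([1.0, 2.0, 3.0])
    batch_b = summarize([4.0, 5.0])
    print(batch_a.merge(batch_b))
```

Merging the two batch summaries gives the same count, mean, variance, minimum, and maximum as summarizing all five values in one pass, which is the property that lets this style of output work for both batch and streaming pipelines.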