whylogs.core.schema#

Module Contents#

Classes#

DatasetSchema

Defines the schema for tracking metrics in whylogs.

ColumnSchema

Schema of a column.

DeclarativeSchema

The DeclarativeSchema allows one to customize the set of metrics tracked for each column in a data set.

Attributes#

whylogs.core.schema.logger#
whylogs.core.schema.LARGE_CACHE_SIZE_LIMIT#
whylogs.core.schema.T#
class whylogs.core.schema.DatasetSchema(types: Optional[Dict[str, Any]] = None, default_configs: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, resolvers: Optional[whylogs.core.resolvers.Resolver] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None, metadata: Optional[Dict[str, str]] = None)#

Defines the schema for tracking metrics in whylogs.

To customize tracking, extend this class and specify your own column schema or type resolution. Otherwise, use the default DatasetSchema object.

Schema objects are also used to group datasets together.

Parameters
types#

Optional. A dictionary mapping each column name to its Python type.

default_configs#

Optional. Options that configure various whylogs behaviors.

type_mapper#

Optional. A mapper that translates a Python type to a standardized whylogs DataType object.

resolvers#

Optional. An object that defines how to map a column name, a whylogs DataType, and a schema to metrics.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from whylogs.core import DatasetSchema, DatasetProfile
>>> from whylogs.core.resolvers import StandardResolver
>>>
>>> # Subclass StandardResolver to customize how columns map to metrics.
>>> class MyResolver(StandardResolver):
...     pass
>>>
>>> # Declare the Python type of each column up front.
>>> schema = DatasetSchema(
...     types={
...         "col1": str,
...         "col2": np.int32,
...         "col3": pd.CategoricalDtype(categories=('foo', 'bar'), ordered=True)
...     },
...     resolvers=MyResolver()
... )
>>> # Profile a DataFrame using the schema.
>>> prof = DatasetProfile(schema)
>>> df = pd.DataFrame({"col1": ['foo'], "col2": np.array([1], dtype=np.int32), "col3": ['bar']})
>>> prof.track(pandas=df)
copy() -> DatasetSchema#

Returns a new instance of the same underlying schema

Return type

DatasetSchema

resolve(*, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None) -> bool#
Parameters
  • pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –

  • row (Optional[Mapping[str, Any]]) –

Return type

bool

get_col_names() -> tuple#
Return type

tuple

get(name: str) -> Optional[ColumnSchema]#
Parameters

name (str) –

Return type

Optional[ColumnSchema]

class whylogs.core.schema.ColumnSchema#

Schema of a column.

The main goal is to specify the data type. On top of that, users can configure their own tracker resolution logic (mapping a type to a list of tracker factories) and any additional trackers here.

dtype: Any#
cfg: whylogs.core.metrics.metrics.MetricConfig#
type_mapper: whylogs.core.datatypes.TypeMapper#
resolver: whylogs.core.resolvers.Resolver#
validators: Dict[str, List[whylogs.core.validators.validator.Validator]]#
get_metrics(name: str) -> Dict[str, whylogs.core.metrics.metrics.Metric]#
Parameters

name (str) –

Return type

Dict[str, whylogs.core.metrics.metrics.Metric]

get_validators(name: str) -> List[Optional[whylogs.core.validators.validator.Validator]]#
Parameters

name (str) –

Return type

List[Optional[whylogs.core.validators.validator.Validator]]

class whylogs.core.schema.DeclarativeSchema(resolvers: Optional[List[whylogs.core.resolvers.ResolverSpec]] = None, types: Optional[Dict[str, Any]] = None, default_config: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None, metadata: Optional[Dict[str, str]] = None)#

Bases: DatasetSchema

The DeclarativeSchema allows one to customize the set of metrics tracked for each column in a data set. Pass its constructor a list of ResolverSpecs, each of which specifies the column name or data type to match and the list of MetricSpecs to instantiate for matching columns. Each MetricSpec specifies the Metric class and the MetricConfig to instantiate it with. Omit MetricSpec.config to use the default MetricConfig.

For example, DeclarativeSchema(resolvers=STANDARD_RESOLVER) implements the same schema as DatasetSchema(), i.e., it uses the default MetricConfig, StandardTypeMapper, StandardResolver, etc. STANDARD_RESOLVER is defined in whylogs/python/whylogs/core/resolvers.py.

add_resolver(resolver_spec: whylogs.core.resolvers.ResolverSpec)#
Parameters

resolver_spec (whylogs.core.resolvers.ResolverSpec) –

add_resolver_spec(column_name: Optional[str] = None, column_type: Optional[Any] = None, metrics: Optional[List[whylogs.core.resolvers.MetricSpec]] = None)#
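Parameters
  • column_name (Optional[str]) –

  • column_type (Optional[Any]) –

  • metrics (Optional[List[whylogs.core.resolvers.MetricSpec]]) –
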
copy() -> DeclarativeSchema#

Returns a new instance of the same underlying schema

Return type

DeclarativeSchema

resolve(*, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None) -> bool#
Parameters
  • pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –

  • row (Optional[Mapping[str, Any]]) –

Return type

bool

get_col_names() -> tuple#
Return type

tuple

get(name: str) -> Optional[ColumnSchema]#
Parameters

name (str) –

Return type

Optional[ColumnSchema]