whylogs.core.schema
Module Contents
Classes
- DatasetSchema: Defines the schema for tracking metrics in whylogs.
- ColumnSchema: Schema of a column.
- DeclarativeSchema: The DeclarativeSchema allows one to customize the set of metrics tracked for each column in a data set.
Attributes
- whylogs.core.schema.logger
- whylogs.core.schema.LARGE_CACHE_SIZE_LIMIT
- whylogs.core.schema.T
- class whylogs.core.schema.DatasetSchema(types: Optional[Dict[str, Any]] = None, default_configs: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, resolvers: Optional[whylogs.core.resolvers.Resolver] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None)
Defines the schema for tracking metrics in whylogs.
In order to customize your tracking, you can extend this class to specify your own column schema or your own type resolution. Otherwise, you can just use the default DatasetSchema object.
Schema objects are also used to group datasets together.
- Parameters
types (Optional[Dict[str, Any]]) –
default_configs (Optional[whylogs.core.metrics.metrics.MetricConfig]) –
type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –
resolvers (Optional[whylogs.core.resolvers.Resolver]) –
cache_size (int) –
schema_based_automerge (bool) –
segments (Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]]) –
validators (Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]]) –
- types
Required. A dictionary mapping column names to Python types.
- default_configs
Optional. Options that configure various whylogs behaviors.
- type_mapper
Optional. A mapper that translates a Python type to a standardized whylogs DataType object.
- resolvers
Optional. An object that defines how to map a column name, a whylogs DataType, and a schema to metrics.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from whylogs.core import DatasetSchema, DatasetProfile
>>> from whylogs.core.resolvers import Resolver, StandardResolver
>>>
>>> class MyResolver(StandardResolver):
...     pass
>>>
>>> schema = DatasetSchema(
...     types={
...         "col1": str,
...         "col2": np.int32,
...         "col3": pd.CategoricalDtype(categories=('foo', 'bar'), ordered=True)
...     },
...     resolvers=MyResolver()
... )
>>> prof = DatasetProfile(schema)
>>> df = pd.DataFrame({"col1": ['foo'], "col2": np.array([1], dtype=np.int32), "col3": ['bar']})
>>> prof.track(pandas=df)
- copy() -> DatasetSchema
Returns a new instance of the same underlying schema.
- Return type
DatasetSchema
- resolve(*, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None) -> bool
- get(name: str) -> Optional[ColumnSchema]
- Parameters
name (str) –
- Return type
Optional[ColumnSchema]
- class whylogs.core.schema.ColumnSchema
Schema of a column.
The main goal is to specify the data type. On top of that, users can configure their own tracker resolution logic (mapping a type to a list of tracker factories) and any additional trackers here.
- dtype: Any
- type_mapper: whylogs.core.datatypes.TypeMapper
- resolver: whylogs.core.resolvers.Resolver
- validators: Dict[str, List[whylogs.core.validators.validator.Validator]]
- get_metrics(name: str) -> Dict[str, whylogs.core.metrics.metrics.Metric]
- Parameters
name (str) –
- Return type
Dict[str, whylogs.core.metrics.metrics.Metric]
- get_validators(name: str) -> List[Optional[whylogs.core.validators.validator.Validator]]
- Parameters
name (str) –
- Return type
List[Optional[whylogs.core.validators.validator.Validator]]
- class whylogs.core.schema.DeclarativeSchema(resolvers: Optional[List[whylogs.core.resolvers.ResolverSpec]] = None, types: Optional[Dict[str, Any]] = None, default_config: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None)
Bases: DatasetSchema
The DeclarativeSchema allows one to customize the set of metrics tracked for each column in a data set. Pass its constructor a list of ResolverSpecs, each of which specifies the column name or data type to match and the list of MetricSpecs to instantiate for matching columns. Each MetricSpec specifies the Metric class and MetricConfig to instantiate. Omit the MetricSpec's config to use the default MetricConfig.
For example, DeclarativeSchema(resolvers=STANDARD_RESOLVER) implements the same schema as DatasetSchema(), i.e., using the default MetricConfig, StandardTypeMapper, StandardResolver, etc. STANDARD_RESOLVER is defined in whylogs/python/whylogs/core/resolvers.py.
- Parameters
resolvers (Optional[List[whylogs.core.resolvers.ResolverSpec]]) –
types (Optional[Dict[str, Any]]) –
default_config (Optional[whylogs.core.metrics.metrics.MetricConfig]) –
type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –
cache_size (int) –
schema_based_automerge (bool) –
segments (Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]]) –
validators (Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]]) –
- add_resolver(resolver_spec: whylogs.core.resolvers.ResolverSpec)
- Parameters
resolver_spec (whylogs.core.resolvers.ResolverSpec) –
- add_resolver_spec(column_name: Optional[str] = None, column_type: Optional[Any] = None, metrics: Optional[List[whylogs.core.resolvers.MetricSpec]] = None)
- Parameters
column_name (Optional[str]) –
column_type (Optional[Any]) –
metrics (Optional[List[whylogs.core.resolvers.MetricSpec]]) –
- copy() -> DeclarativeSchema
Returns a new instance of the same underlying schema.
- Return type
DeclarativeSchema
- resolve(*, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None) -> bool
- get(name: str) -> Optional[ColumnSchema]
- Parameters
name (str) –
- Return type
Optional[ColumnSchema]