whylogs.core
Subpackages
whylogs.core.constraints
whylogs.core.constraints.factories
whylogs.core.constraints.factories.cardinality_metrics
whylogs.core.constraints.factories.condition_counts
whylogs.core.constraints.factories.count_metrics
whylogs.core.constraints.factories.distribution_metrics
whylogs.core.constraints.factories.frequent_items
whylogs.core.constraints.factories.multi_metrics
whylogs.core.constraints.factories.types_metrics
whylogs.core.constraints.metric_constraints
whylogs.core.metrics
whylogs.core.metrics.aggregators
whylogs.core.metrics.column_metrics
whylogs.core.metrics.compound_metric
whylogs.core.metrics.condition_count_metric
whylogs.core.metrics.decorators
whylogs.core.metrics.deserializers
whylogs.core.metrics.maths
whylogs.core.metrics.metric_components
whylogs.core.metrics.metrics
whylogs.core.metrics.multimetric
whylogs.core.metrics.serializers
whylogs.core.metrics.unicode_range
whylogs.core.model_performance_metrics
whylogs.core.proto
whylogs.core.validators
whylogs.core.view
Submodules
whylogs.core.column_profile
whylogs.core.common
whylogs.core.configs
whylogs.core.dataset_profile
whylogs.core.datatypes
whylogs.core.errors
whylogs.core.feature_weights
whylogs.core.input_resolver
whylogs.core.metadata
whylogs.core.metric_getters
whylogs.core.predicate_parser
whylogs.core.preprocessing
whylogs.core.projectors
whylogs.core.relations
whylogs.core.resolvers
whylogs.core.schema
whylogs.core.segment
whylogs.core.segmentation_partition
whylogs.core.specialized_resolvers
Package Contents
Classes
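- ColumnProfile
- DatasetProfile: Dataset profile represents a collection of in-memory profiling stats for a dataset.
- TypeMapper: Helper class that provides a standard way to create an ABC using inheritance.
- MetricGetter
- ProfileGetter
- MetricConfig
- ModelPerformanceMetrics: Container class for various model-related performance metrics.
- Predicate
- Resolver: A resolver maps from a column name and a data type to trackers.
- ColumnSchema: Schema of a column.
- DatasetSchema: Defines the schema for tracking metrics in whylogs.
- SegmentationPartition
- ColumnProfileView
- DatasetProfileView: A Writable is an object that contains data to write to a file or files.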
Functions
Attributes
- class whylogs.core.ColumnProfile(name: str, schema: whylogs.core.schema.ColumnSchema, cache_size: int)
Bases: object
- Parameters
name (str) –
schema (whylogs.core.schema.ColumnSchema) –
cache_size (int) –
- add_metric(metric: whylogs.core.metrics.Metric) → None
- Parameters
metric (whylogs.core.metrics.Metric) –
- Return type
None
- track_column(series: Any, identity_values: Optional[Any] = None) → None
- Parameters
series (Any) –
identity_values (Optional[Any]) –
- Return type
None
- to_protobuf() → whylogs.core.proto.ColumnMessage
- Return type
whylogs.core.proto.ColumnMessage
- view() → whylogs.core.view.ColumnProfileView
- Return type
whylogs.core.view.ColumnProfileView
- class whylogs.core.DatasetProfile(schema: Optional[whylogs.core.schema.DatasetSchema] = None, dataset_timestamp: Optional[datetime.datetime] = None, creation_timestamp: Optional[datetime.datetime] = None, metrics: Optional[Dict[str, Union[whylogs.core.metrics.Metric, Any]]] = None, metadata: Optional[Dict[str, str]] = None)
Bases: whylogs.api.writer.writer._Writable
Dataset profile represents a collection of in-memory profiling stats for a dataset.
- Parameters
schema (Optional[whylogs.core.schema.DatasetSchema]) – DatasetSchema, optional. An object that represents the data column names and types.
dataset_timestamp (Optional[datetime.datetime]) – datetime, optional. A timestamp that best represents the date tied to the dataset generation. For example, a January 1st, 2019 sales dataset would use 2019-01-01 00:00 UTC, which is 1546300800000 in milliseconds since the Unix epoch. If None is provided, the current timestamp is used as the default.
creation_timestamp (Optional[datetime.datetime]) – datetime, optional. The timestamp tied to the exact moment when the DatasetProfile is created. If None is provided, the current timestamp is used as the default.
metrics (Optional[Dict[str, Union[whylogs.core.metrics.Metric, Any]]]) –
metadata (Optional[Dict[str, str]]) –
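The `dataset_timestamp` description quotes its example in epoch milliseconds, while the signature types the parameter as `datetime.datetime`. The standard-library snippet below converts between the two forms to confirm the figure. It assumes a timezone-aware UTC `datetime`, because this section does not say how naive datetimes are interpreted.

```python
from datetime import datetime, timezone

# 2019-01-01 00:00 UTC, the example dataset date from the parameter description
# (timezone-aware on purpose: naive-datetime handling is not documented here)
dataset_ts = datetime(2019, 1, 1, tzinfo=timezone.utc)

# Convert to integer milliseconds since the Unix epoch
epoch_ms = int(dataset_ts.timestamp() * 1000)
print(epoch_ms)  # 1546300800000

# And back again, e.g. when a source system only provides epoch milliseconds
restored = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
print(restored.isoformat())  # 2019-01-01T00:00:00+00:00
```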
- property creation_timestamp: datetime.datetime
- Return type
datetime.datetime
- property dataset_timestamp: datetime.datetime
- Return type
datetime.datetime
- property is_active: bool
Returns True if the profile tracking code is currently running.
- Return type
bool
- property is_empty: bool
Returns True if the profile has not tracked any data yet.
- Return type
bool
- property model_performance_metrics: whylogs.core.model_performance_metrics.model_performance_metrics.ModelPerformanceMetrics
- set_dataset_timestamp(dataset_timestamp: datetime.datetime) → None
- Parameters
dataset_timestamp (datetime.datetime) –
- Return type
None
- add_metric(col_name: str, metric: whylogs.core.metrics.Metric) → None
- Parameters
col_name (str) –
metric (whylogs.core.metrics.Metric) –
- Return type
None
- add_dataset_metric(name: str, metric: whylogs.core.metrics.Metric) → None
- Parameters
name (str) –
metric (whylogs.core.metrics.Metric) –
- Return type
None
- add_model_performance_metrics(metric: whylogs.core.model_performance_metrics.model_performance_metrics.ModelPerformanceMetrics) → None
- Parameters
metric (whylogs.core.model_performance_metrics.model_performance_metrics.ModelPerformanceMetrics) –
- Return type
None
- track(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None, execute_udfs: bool = True) → None
- view() → whylogs.core.view.DatasetProfileView
- Return type
whylogs.core.view.DatasetProfileView
- classmethod read(input_path: str) → whylogs.core.view.DatasetProfileView
- Parameters
input_path (str) –
- Return type
whylogs.core.view.DatasetProfileView
- class whylogs.core.TypeMapper
Bases: abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
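`TypeMapper` is abstract. According to the `DatasetSchema.type_mapper` description, its job is to translate a Python type into a standardized whylogs `DataType`. The stand-alone sketch below models that idea and is not the whylogs implementation. `ToyTypeMapper`, `SimpleTypeMapper`, the use of `__call__`, and the category names ("Integral", "Fractional", "String", "Boolean", "Any") are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class ToyTypeMapper(ABC):
    """Hypothetical stand-in for a type mapper: Python type -> type category name."""

    @abstractmethod
    def __call__(self, dtype: Any) -> str:
        ...


class SimpleTypeMapper(ToyTypeMapper):
    def __call__(self, dtype: Any) -> str:
        # Non-class dtypes (e.g. dtype instances) fall back to a catch-all category
        if not isinstance(dtype, type):
            return "Any"
        # bool is a subclass of int in Python, so it must be checked first
        if issubclass(dtype, bool):
            return "Boolean"
        if issubclass(dtype, int):
            return "Integral"
        if issubclass(dtype, float):
            return "Fractional"
        if issubclass(dtype, str):
            return "String"
        return "Any"


mapper = SimpleTypeMapper()
print(mapper(int), mapper(float), mapper(str))  # Integral Fractional String
```

Normalizing types up front means downstream logic, such as a resolver, only has to reason about a handful of categories rather than every concrete Python or NumPy type.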
- class whylogs.core.MetricGetter(metric: whylogs.core.metrics.metrics.Metric, path: str)
Bases: whylogs.core.relations.ValueGetter
- Parameters
metric (whylogs.core.metrics.metrics.Metric) –
path (str) –
- class whylogs.core.ProfileGetter(profile: Union[whylogs.core.dataset_profile.DatasetProfile, whylogs.core.view.dataset_profile_view.DatasetProfileView], column_name: str, path: str)
Bases: whylogs.core.relations.ValueGetter
- Parameters
profile (Union[whylogs.core.dataset_profile.DatasetProfile, whylogs.core.view.dataset_profile_view.DatasetProfileView]) –
column_name (str) –
path (str) –
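Both getters subclass `ValueGetter`, and `Predicate` accepts a `ValueGetter` wherever it accepts a constant. Judging from their parameters, they look up a value at `path` inside a metric (`MetricGetter`) or inside one column of a profile (`ProfileGetter`), so a predicate can compare against a value computed from data. The sketch below models that idea with plain dictionaries. The slash-separated path format (`"distribution/max"`), `ToyValueGetter`, `ToyProfileGetter`, and `get_value` are assumptions for illustration, not whylogs' API.

```python
from typing import Any, Dict


class ToyValueGetter:
    """Hypothetical base: anything that can produce a value on demand."""

    def get_value(self) -> Any:
        raise NotImplementedError


class ToyProfileGetter(ToyValueGetter):
    """Looks up a hypothetical "metric/component" path in one column of a toy profile."""

    def __init__(self, profile: Dict[str, Dict[str, Dict[str, Any]]], column_name: str, path: str) -> None:
        self._profile = profile
        self._column_name = column_name
        self._path = path

    def get_value(self) -> Any:
        # Split "distribution/max" into the metric namespace and its component
        metric_name, component = self._path.split("/", 1)
        return self._profile[self._column_name][metric_name][component]


# A toy "profile": column -> metric namespace -> component -> value
profile = {"price": {"distribution": {"mean": 12.5, "max": 40.0}}}
getter = ToyProfileGetter(profile, "price", "distribution/max")
print(getter.get_value())  # 40.0
```

Because the lookup happens when `get_value()` is called rather than at construction, the getter reflects whatever the profile holds at evaluation time.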
- class whylogs.core.MetricConfig
- class whylogs.core.ModelPerformanceMetrics(confusion_matrix: Optional[whylogs.core.model_performance_metrics.confusion_matrix.ConfusionMatrix] = None, regression_metrics: Optional[whylogs.core.model_performance_metrics.regression_metrics.RegressionMetrics] = None, metrics: Optional[Dict[str, whylogs.core.metrics.metrics.Metric]] = None, field_metadata: Optional[Dict[str, Set[str]]] = None)
Container class for various model-related performance metrics.
- Parameters
confusion_matrix (Optional[whylogs.core.model_performance_metrics.confusion_matrix.ConfusionMatrix]) –
regression_metrics (Optional[whylogs.core.model_performance_metrics.regression_metrics.RegressionMetrics]) –
metrics (Optional[Dict[str, whylogs.core.metrics.metrics.Metric]]) –
field_metadata (Optional[Dict[str, Set[str]]]) –
- confusion_matrix
ConfusionMatrix, which keeps track of counts with NumberTracker.
- Type
whylogs.core.model_performance_metrics.confusion_matrix.ConfusionMatrix
- regression_metrics
RegressionMetrics, which keeps track of common regression metrics when the targets are continuous.
- Type
whylogs.core.model_performance_metrics.regression_metrics.RegressionMetrics
- to_protobuf() → whylogs.core.proto.v0.ModelProfileMessage
- Return type
whylogs.core.proto.v0.ModelProfileMessage
- classmethod from_protobuf(message: whylogs.core.proto.v0.ModelProfileMessage) → ModelPerformanceMetrics
- Parameters
message (whylogs.core.proto.v0.ModelProfileMessage) –
- Return type
ModelPerformanceMetrics
- compute_confusion_matrix(predictions: List[Union[str, int, bool, float]], targets: List[Union[str, int, bool, float]], scores: Optional[List[float]] = None)
Computes the confusion matrix. If one is already present, the new counts are merged into the existing matrix.
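The compute-then-merge behavior of `compute_confusion_matrix` means repeated calls accumulate counts instead of overwriting them. The sketch below is a toy model of that behavior, not the whylogs implementation. `ToyConfusionMatrix`, `ToyModelMetrics`, and the `(target, prediction)` keying are hypothetical. A plain `collections.Counter` stands in for the `NumberTracker`-based counts, and the optional `scores` argument is omitted.

```python
from collections import Counter
from typing import Hashable, List, Optional


class ToyConfusionMatrix:
    """Hypothetical stand-in: counts (target, prediction) pairs."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, predictions: List[Hashable], targets: List[Hashable]) -> None:
        if len(predictions) != len(targets):
            raise ValueError("predictions and targets must have the same length")
        self.counts.update(zip(targets, predictions))

    def merge(self, other: "ToyConfusionMatrix") -> "ToyConfusionMatrix":
        merged = ToyConfusionMatrix()
        merged.counts = self.counts + other.counts
        return merged


class ToyModelMetrics:
    """Hypothetical container mirroring the compute-then-merge behavior described above."""

    def __init__(self) -> None:
        self.confusion_matrix: Optional[ToyConfusionMatrix] = None

    def compute_confusion_matrix(self, predictions: List[Hashable], targets: List[Hashable]) -> None:
        new_matrix = ToyConfusionMatrix()
        new_matrix.add(predictions, targets)
        # First call stores the matrix; later calls fold the new counts into it
        if self.confusion_matrix is None:
            self.confusion_matrix = new_matrix
        else:
            self.confusion_matrix = self.confusion_matrix.merge(new_matrix)


metrics = ToyModelMetrics()
metrics.compute_confusion_matrix(predictions=["cat", "dog"], targets=["cat", "cat"])
metrics.compute_confusion_matrix(predictions=["cat"], targets=["cat"])
print(metrics.confusion_matrix.counts[("cat", "cat")])  # 2
print(metrics.confusion_matrix.counts[("cat", "dog")])  # 1
```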
- merge(other) → ModelPerformanceMetrics
- Return type
ModelPerformanceMetrics
- class whylogs.core.Predicate(op: Relation = Relation.no_op, value: Union[str, int, float, ValueGetter] = 0, udf: Optional[Callable[[Any], bool]] = None, left: Optional[Predicate] = None, right: Optional[Predicate] = None, component: Optional[str] = None)
- Parameters
op (Relation) –
value (Union[str, int, float, ValueGetter]) –
udf (Optional[Callable[[Any], bool]]) –
left (Optional[Predicate]) –
right (Optional[Predicate]) –
component (Optional[str]) –
- matches(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- fullmatch(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- search(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- equals(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- less_than(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- less_or_equals(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- greater_than(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- greater_or_equals(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
- not_equal(value: Union[str, int, float, ValueGetter]) → Predicate
- Parameters
value (Union[str, int, float, ValueGetter]) –
- Return type
Predicate
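Every comparison method on `Predicate` returns another `Predicate` rather than a boolean, which is the builder pattern. The `left`/`right` constructor parameters suggest that predicates are composed into a tree, but this section does not document how. The stand-alone sketch below models the builder idea and is not the whylogs class. `ToyPredicate`, its `__call__` evaluation, and the `and_` combinator are hypothetical. Mapping `fullmatch` onto Python's `re.fullmatch` is an assumption based only on the method name.

```python
import operator
import re
from typing import Any, Callable


class ToyPredicate:
    """Hypothetical predicate builder: every method returns a new predicate."""

    def __init__(self, test: Callable[[Any], bool] = lambda _: True) -> None:
        self._test = test

    def __call__(self, x: Any) -> bool:
        return self._test(x)

    def _compare(self, op: Callable[[Any, Any], bool], value: Any) -> "ToyPredicate":
        # Capture the comparison and its right-hand value in a closure
        return ToyPredicate(lambda x: op(x, value))

    def equals(self, value: Any) -> "ToyPredicate":
        return self._compare(operator.eq, value)

    def not_equal(self, value: Any) -> "ToyPredicate":
        return self._compare(operator.ne, value)

    def less_than(self, value: Any) -> "ToyPredicate":
        return self._compare(operator.lt, value)

    def greater_or_equals(self, value: Any) -> "ToyPredicate":
        return self._compare(operator.ge, value)

    def fullmatch(self, pattern: str) -> "ToyPredicate":
        # Assumption: mirrors Python's re.fullmatch (the whole string must match)
        return ToyPredicate(lambda x: re.fullmatch(pattern, x) is not None)

    def and_(self, other: "ToyPredicate") -> "ToyPredicate":
        # Hypothetical combinator: both sub-predicates must hold
        return ToyPredicate(lambda x: self(x) and other(x))


X = ToyPredicate()
in_range = X.greater_or_equals(0).and_(X.less_than(10))
is_code = X.fullmatch(r"[A-Z]{3}")
print(in_range(5), in_range(12))  # True False
print(is_code("ABC"), is_code("ABCD"))  # True False
```

Deferring evaluation this way lets a constraint be declared once and then checked against many values later.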
- class whylogs.core.Resolver
Bases: abc.ABC
A resolver maps from a column name and a data type to trackers.
Note that the keys of the returned dictionary define the namespaces of the metrics in the serialized form.
- abstract resolve(name: str, why_type: whylogs.core.datatypes.DataType, column_schema: ColumnSchema) → Dict[str, whylogs.core.metrics.metrics.Metric]
- Parameters
name (str) –
why_type (whylogs.core.datatypes.DataType) –
column_schema (ColumnSchema) –
- Return type
Dict[str, whylogs.core.metrics.metrics.Metric]
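The sketch below is a toy model of the resolver contract, not a whylogs resolver. `ToyResolver`, `NumericAwareResolver`, the string type names, and the two metric dataclasses are hypothetical, and the real `resolve` also receives a `column_schema` argument that is omitted here. It illustrates the namespace note: the keys of the returned dictionary (`"counts"`, `"distribution"`) are the namespaces under which each column's metrics would be stored.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyCountMetric:
    n: int = 0


@dataclass
class ToyDistributionMetric:
    values: List[float] = field(default_factory=list)


class ToyResolver(ABC):
    @abstractmethod
    def resolve(self, name: str, why_type: str) -> Dict[str, Any]:
        """Map a column name and a type name to {namespace: metric}."""


class NumericAwareResolver(ToyResolver):
    def resolve(self, name: str, why_type: str) -> Dict[str, Any]:
        # Every column gets counts; the key "counts" is that metric's namespace
        metrics: Dict[str, Any] = {"counts": ToyCountMetric()}
        # Only numeric columns also get a distribution metric
        if why_type in ("Integral", "Fractional"):
            metrics["distribution"] = ToyDistributionMetric()
        return metrics


resolver = NumericAwareResolver()
print(sorted(resolver.resolve("price", "Fractional")))  # ['counts', 'distribution']
print(sorted(resolver.resolve("city", "String")))  # ['counts']
```

Each call returns fresh metric instances, so columns never share tracker state.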
- class whylogs.core.ColumnSchema
Schema of a column.
The main goal is to specify the data type. On top of that, users can configure their own tracker resolution logic (mapping a type to a list of tracker factories) and any additional trackers here.
- dtype: Any
- type_mapper: whylogs.core.datatypes.TypeMapper
- resolver: whylogs.core.resolvers.Resolver
- validators: Dict[str, List[whylogs.core.validators.validator.Validator]]
- get_metrics(name: str) → Dict[str, whylogs.core.metrics.metrics.Metric]
- Parameters
name (str) –
- Return type
Dict[str, whylogs.core.metrics.metrics.Metric]
- get_validators(name: str) → List[Optional[whylogs.core.validators.validator.Validator]]
- Parameters
name (str) –
- Return type
List[Optional[whylogs.core.validators.validator.Validator]]
- class whylogs.core.DatasetSchema(types: Optional[Dict[str, Any]] = None, default_configs: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, resolvers: Optional[whylogs.core.resolvers.Resolver] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None, metadata: Optional[Dict[str, str]] = None)
Defines the schema for tracking metrics in whylogs.
In order to customize your tracking, you can extend this class to specify your own column schema or your own type resolution. Otherwise, you can just use the default DatasetSchema object.
Schema objects are also used to group datasets together.
- Parameters
types (Optional[Dict[str, Any]]) –
default_configs (Optional[whylogs.core.metrics.metrics.MetricConfig]) –
type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –
resolvers (Optional[whylogs.core.resolvers.Resolver]) –
cache_size (int) –
schema_based_automerge (bool) –
segments (Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]]) –
validators (Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]]) –
metadata (Optional[Dict[str, str]]) –
- types
Optional. A dictionary mapping each column name to its Python type.
- default_configs
Optional. Options to configure various behaviors of whylogs.
- type_mapper
Optional. A mapper that translates a Python type into a standardized whylogs DataType object.
- resolvers
Optional. An object that defines how to map a column name, a whylogs DataType, and a schema to metrics.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from whylogs.core import DatasetSchema, DatasetProfile
>>> from whylogs.core.resolvers import Resolver, StandardResolver
>>>
>>> class MyResolver(StandardResolver):
...     pass
>>>
>>> schema = DatasetSchema(
...     types={
...         "col1": str,
...         "col2": np.int32,
...         "col3": pd.CategoricalDtype(categories=('foo', 'bar'), ordered=True)
...     },
...     resolvers=MyResolver()
... )
>>> prof = DatasetProfile(schema)
>>> df = pd.DataFrame({"col1": ['foo'], "col2": np.array([1], dtype=np.int32), "col3": ['bar']})
>>> prof.track(pandas=df)
- copy() → DatasetSchema
Returns a new instance of the same underlying schema.
- Return type
DatasetSchema
- resolve(*, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Mapping[str, Any]] = None) → bool
- get(name: str) → Optional[ColumnSchema]
- Parameters
name (str) –
- Return type
Optional[ColumnSchema]
- class whylogs.core.SegmentationPartition
- mapper: Optional[ColumnMapperFunction]
- filter: Optional[SegmentFilter]
- whylogs.core.WHYLOGS_MAGIC_HEADER = 'WHY1'
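A magic header is a fixed byte prefix that lets a reader recognize a file format before parsing the rest of it. The check below assumes that serialized whylogs profiles begin with this `'WHY1'` string encoded as ASCII. That is an assumption: this section only lists the constant, not the file layout. `looks_like_whylogs_profile` is a hypothetical helper, not part of whylogs.

```python
import os
import tempfile

WHYLOGS_MAGIC_HEADER = "WHY1"  # constant as listed in this section


def looks_like_whylogs_profile(path: str) -> bool:
    """Hypothetical helper: True if the file begins with the magic header bytes."""
    expected = WHYLOGS_MAGIC_HEADER.encode("ascii")
    with open(path, "rb") as f:
        # Read only as many bytes as the header is long
        return f.read(len(expected)) == expected


# Demo with a throwaway file whose first bytes are the header
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"WHY1" + b"\x00rest-of-payload")
result = looks_like_whylogs_profile(tmp.name)
os.remove(tmp.name)
print(result)  # True
```

A prefix check like this is cheap and catches wrong-file mistakes early, but it does not validate the rest of the payload.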
- class whylogs.core.ColumnProfileView(metrics: Dict[str, METRIC], success_count: int = 0, failure_count: int = 0)
Bases: object
- merge(other: ColumnProfileView) → ColumnProfileView
- Parameters
other (ColumnProfileView) –
- Return type
ColumnProfileView
- classmethod deserialize(serialized_profile: bytes) → ColumnProfileView
- Parameters
serialized_profile (bytes) –
- Return type
ColumnProfileView
- to_protobuf() → whylogs.core.proto.ColumnMessage
- Return type
whylogs.core.proto.ColumnMessage
- get_metrics() → List[whylogs.core.metrics.metrics.Metric]
- Return type
List[whylogs.core.metrics.metrics.Metric]
- to_summary_dict(*, column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) → Dict[str, Any]
- Parameters
column_metric (Optional[str]) –
cfg (Optional[whylogs.core.configs.SummaryConfig]) –
- Return type
Dict[str, Any]
- classmethod zero(msg: whylogs.core.proto.ColumnMessage) → ColumnProfileView
- Parameters
msg (whylogs.core.proto.ColumnMessage) –
- Return type
ColumnProfileView
- classmethod from_protobuf(msg: whylogs.core.proto.ColumnMessage) → ColumnProfileView
- Parameters
msg (whylogs.core.proto.ColumnMessage) –
- Return type
ColumnProfileView
- classmethod from_bytes(data: bytes) → ColumnProfileView
- Parameters
data (bytes) –
- Return type
ColumnProfileView
- class whylogs.core.DatasetProfileView(*, columns: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView], dataset_timestamp: Optional[datetime.datetime], creation_timestamp: Optional[datetime.datetime], metrics: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, str]] = None)
Bases: whylogs.api.writer.writer._Writable
A Writable is an object that contains data to write to a file or files. These might be temporary files intended to be passed on to another consumer (e.g., WhyLabs servers) via a Writer.
- Parameters
columns (Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]) –
dataset_timestamp (Optional[datetime.datetime]) –
creation_timestamp (Optional[datetime.datetime]) –
metrics (Optional[Dict[str, Any]]) –
metadata (Optional[Dict[str, str]]) –
- property dataset_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property creation_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property model_performance_metrics: Any
- Return type
Any
- set_dataset_timestamp(dataset_timestamp: datetime.datetime) → None
- Parameters
dataset_timestamp (datetime.datetime) –
- Return type
None
- merge(other: DatasetProfileView) → DatasetProfileView
- Parameters
other (DatasetProfileView) –
- Return type
DatasetProfileView
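Profile views are combined pairwise with `merge()`, together with a `zero()` constructor for the empty view. This is what lets profiles of separate data batches be folded into one. The sketch below is a toy model of that property, not the whylogs implementation. `ToyColumnView` and its count/min/max fields are hypothetical, and real whylogs metrics track far more than these three fields. It shows that merging the summaries of two batches gives the same result as summarizing the concatenated data, with the empty view acting as the identity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ToyColumnView:
    """Hypothetical mergeable column summary; ToyColumnView() is the empty (zero) view."""

    count: int = 0
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "ToyColumnView":
        vals = list(values)
        if not vals:
            return cls()
        return cls(count=len(vals), min_value=min(vals), max_value=max(vals))

    def merge(self, other: "ToyColumnView") -> "ToyColumnView":
        # Merging with an empty view returns the other side unchanged
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        return ToyColumnView(
            count=self.count + other.count,
            min_value=min(self.min_value, other.min_value),
            max_value=max(self.max_value, other.max_value),
        )


batch_a = [3.0, 7.5, 1.0]
batch_b = [9.0, 2.0]
merged = ToyColumnView.from_values(batch_a).merge(ToyColumnView.from_values(batch_b))
print(merged)  # ToyColumnView(count=5, min_value=1.0, max_value=9.0)
```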
- get_column(col_name: str) → Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_name (str) –
- Return type
Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- get_columns(col_names: Optional[List[str]] = None) → Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_names (Optional[List[str]]) –
- Return type
Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- classmethod zero() → DatasetProfileView
- Return type
DatasetProfileView
- classmethod deserialize(data: bytes) → DatasetProfileView
- Parameters
data (bytes) –
- Return type
DatasetProfileView
- classmethod read(path: str) → DatasetProfileView
- Parameters
path (str) –
- Return type
DatasetProfileView
- to_pandas(column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) → whylogs.core.stubs.pd.DataFrame
- Parameters
column_metric (Optional[str]) –
cfg (Optional[whylogs.core.configs.SummaryConfig]) –
- Return type
whylogs.core.stubs.pd.DataFrame