whylogs.core.columnprofile
Defines the ColumnProfile class for tracking per-column statistics
Module Contents
Classes
ColumnProfile – Statistics tracking for a column (i.e. a feature)
MultiColumnProfile – Statistics tracking for multiple columns (i.e. multiple features)
Attributes
- whylogs.core.columnprofile._TYPES
- whylogs.core.columnprofile._NUMERIC_TYPES
- whylogs.core.columnprofile._UNIQUE_COUNT_BOUNDS_STD = 1
- class whylogs.core.columnprofile.ColumnProfile(name: str, number_tracker: whylogs.core.statistics.NumberTracker = None, string_tracker: whylogs.core.statistics.StringTracker = None, schema_tracker: whylogs.core.statistics.SchemaTracker = None, counters: whylogs.core.statistics.CountersTracker = None, frequent_items: whylogs.util.dsketch.FrequentItemsSketch = None, cardinality_tracker: whylogs.core.statistics.hllsketch.HllSketch = None, constraints: whylogs.core.statistics.constraints.ValueConstraints = None)
Statistics tracking for a column (i.e. a feature)
The primary method for
- Parameters
name (str (required)) – Name of the column profile
number_tracker (NumberTracker) – Implements numeric data statistics tracking
string_tracker (StringTracker) – Implements string data-type statistics tracking
schema_tracker (SchemaTracker) – Implements tracking of schema-related information
counters (CountersTracker) – Keep count of various things
frequent_items (FrequentItemsSketch) – Keep track of all frequent items, even for mixed datatype features
cardinality_tracker (HllSketch) – Track feature cardinality (even for mixed data types)
constraints (ValueConstraints) – Static assertions to be applied to numeric data tracked in this column
TODO –
Proper TypedDataConverter type checking
Multi-threading/parallelism
- track(self, value, character_list=None, token_method=None)
Add value to tracking statistics.
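As a rough illustration of what adding a value to tracking statistics can involve, the sketch below updates several per-column statistics for each incoming value. `ToyColumnProfile` and all of its attributes are hypothetical stand-ins written for this example. It is not the whylogs implementation, and it uses exact counters where the real class relies on sketches and dedicated trackers.

```python
from collections import Counter


class ToyColumnProfile:
    """Hypothetical stand-in for illustration only; not the whylogs class."""

    def __init__(self, name):
        self.name = name
        self.count = 0                   # every tracked value, including nulls
        self.null_count = 0
        self.type_counts = Counter()     # rough analogue of schema tracking
        self.frequent_items = Counter()  # exact counts; a real sketch would approximate
        self.numeric_min = None
        self.numeric_max = None

    def track(self, value):
        """Add one value to the tracked statistics."""
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.type_counts[type(value).__name__] += 1
        self.frequent_items[str(value)] += 1
        # bool is a subclass of int, so exclude it from the numeric statistics.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if self.numeric_min is None or value < self.numeric_min:
                self.numeric_min = value
            if self.numeric_max is None or value > self.numeric_max:
                self.numeric_max = value


profile = ToyColumnProfile("age")
for value in [31, 45.5, "unknown", None, 31]:
    profile.track(value)
```

Mixed types in one column are counted rather than rejected, which mirrors the parameter descriptions above: frequent items and cardinality are tracked even for mixed data types.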
- _unique_count_summary(self) → whylogs.proto.UniqueCountSummary
- to_summary(self)
Generate a summary of the statistics
- Returns
summary – Protobuf summary message.
- Return type
ColumnSummary
- generate_constraints(self) → whylogs.core.statistics.constraints.SummaryConstraints
- merge(self, other)
Merge this column profile with another.
- Parameters
other (ColumnProfile) –
- Returns
merged – A new, merged column profile.
- Return type
ColumnProfile
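The merge contract described above (combine two profiles and return a new one) can be sketched with a toy class. `ToyProfile`, its fields, and the name check are assumptions made for this example, not the whylogs implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyProfile:
    """Hypothetical stand-in for illustration only; not the whylogs class."""

    name: str
    count: int = 0
    items: Counter = field(default_factory=Counter)

    def track(self, value):
        self.count += 1
        self.items[value] += 1

    def merge(self, other):
        """Return a new profile; neither input is modified."""
        # Assumption for this sketch: only profiles of the same column merge.
        if self.name != other.name:
            raise ValueError("cannot merge profiles of different columns")
        return ToyProfile(
            name=self.name,
            count=self.count + other.count,
            items=self.items + other.items,  # Counter addition sums the counts
        )


# Profile two batches of the same column separately, then combine them.
batch_a = ToyProfile("color")
for value in ["red", "blue", "red"]:
    batch_a.track(value)

batch_b = ToyProfile("color")
for value in ["red", "green"]:
    batch_b.track(value)

merged = batch_a.merge(batch_b)
```

Returning a new object rather than mutating `self` lets the per-batch profiles be kept or merged again in a different grouping.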
- to_protobuf(self)
Return the object serialized as a protobuf message
- Returns
message
- Return type
ColumnMessage
- static from_protobuf(message)
Load from a protobuf message
- Returns
column_profile
- Return type
ColumnProfile
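The serialize-then-load pattern behind `to_protobuf` and the static `from_protobuf` can be sketched as a round trip. To keep the example self-contained, JSON bytes stand in for the protobuf message. `ToyProfile`, `to_message`, and `from_message` are hypothetical names for this example and do not reflect the actual `ColumnMessage` layout.

```python
import json


class ToyProfile:
    """Hypothetical stand-in for illustration only; not the whylogs class."""

    def __init__(self, name, count=0):
        self.name = name
        self.count = count

    def to_message(self):
        """Serialize the profile state (JSON bytes stand in for protobuf)."""
        return json.dumps({"name": self.name, "count": self.count}).encode("utf-8")

    @staticmethod
    def from_message(message):
        """Rebuild a profile from a serialized message."""
        fields = json.loads(message.decode("utf-8"))
        return ToyProfile(fields["name"], fields["count"])


original = ToyProfile("age", count=3)
restored = ToyProfile.from_message(original.to_message())
```

Making the loader a static method means a caller needs only the message, not an existing instance, to reconstruct a profile.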
- class whylogs.core.columnprofile.MultiColumnProfile(constraints: whylogs.core.statistics.constraints.MultiColumnValueConstraints = None)
Statistics tracking for multiple columns (i.e. multiple features)
The primary method for
- Parameters
constraints (MultiColumnValueConstraints) – Static assertions to be applied to data tracked between all columns
- track(self, column_dict, character_list=None, token_method=None)
TODO: Add column_dict to tracking statistics.
- abstract to_summary(self)
Generate a summary of the statistics
- Returns
summary – Protobuf summary message.
- Return type
(Multi)ColumnSummary
- merge(self, other) → MultiColumnProfile
Merge this multi-column profile with another.
- Parameters
other (MultiColumnProfile) –
- Returns
merged – A new, merged multi column profile.
- Return type
MultiColumnProfile
- abstract to_protobuf(self)
Return the object serialized as a protobuf message
- Returns
message
- Return type
ColumnMessage
- abstract static from_protobuf(message)
Load from a protobuf message
- Returns
column_profile
- Return type
MultiColumnProfile