whylogs.core.statistics
Define classes for tracking statistics
Package Contents
Classes
CountersTracker | Class to keep track of the counts of various data types
NumberTracker | Class to track statistics for numeric data.
SchemaTracker | Track information about a column's schema and present datatypes
StringTracker | Track statistics for strings
ThetaSketch | A sketch for approximate cardinality tracking.
Attributes¶
- class whylogs.core.statistics.CountersTracker(count=0, true_count=0)¶
Class to keep track of the counts of various data types
- Parameters
count (int, optional) – Current number of objects
true_count (int, optional) – Number of boolean values
null_count (int, optional) – Number of nulls encountered
- increment_count(self)¶
Add one to the count of total objects
- increment_bool(self)¶
Add one to the boolean count
- increment_null(self)¶
Add one to the null count
- merge(self, other)¶
Merge another counter tracker with this one
- Returns
new_tracker – The merged tracker
- Return type
CountersTracker
- to_protobuf(self, null_count=0)¶
Return the object serialized as a protobuf message
- static from_protobuf(message: whylogs.proto.Counters)¶
Load from a protobuf message
- Returns
counters
- Return type
CountersTracker
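The counting and merging behaviour described above can be illustrated with a minimal, self-contained sketch. `SimpleCounters` is a hypothetical stand-in written for this example, not the whylogs class; it only mirrors the documented method names and assumes that merging sums each counter and returns a new object.

```python
from dataclasses import dataclass


@dataclass
class SimpleCounters:
    """Hypothetical stand-in for a counters tracker (not the whylogs class)."""

    count: int = 0
    true_count: int = 0
    null_count: int = 0

    def increment_count(self) -> None:
        # One more object seen.
        self.count += 1

    def increment_bool(self) -> None:
        # One more boolean value seen.
        self.true_count += 1

    def increment_null(self) -> None:
        # One more null seen.
        self.null_count += 1

    def merge(self, other: "SimpleCounters") -> "SimpleCounters":
        # Assumption: merging adds the counters and leaves both inputs untouched.
        return SimpleCounters(
            count=self.count + other.count,
            true_count=self.true_count + other.true_count,
            null_count=self.null_count + other.null_count,
        )


a = SimpleCounters(count=3, true_count=1)
b = SimpleCounters(count=2, null_count=1)
merged = a.merge(b)
```

Returning a new object from `merge` keeps per-batch counters reusable, which is convenient when partial profiles are combined in more than one way.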
- class whylogs.core.statistics.NumberTracker(variance: whylogs.core.statistics.datatypes.VarianceTracker = None, floats: whylogs.core.statistics.datatypes.FloatTracker = None, ints: whylogs.core.statistics.datatypes.IntTracker = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, histogram: datasketches.kll_floats_sketch = None)¶
Class to track statistics for numeric data.
- Parameters
variance – Tracker to follow the variance
floats – Float tracker for tracking all floats
ints – Integer tracker
- variance¶
See above
- floats¶
See above
- ints¶
See above
- theta_sketch¶
Sketch which tracks approximate cardinality
- Type
whylogs.core.statistics.thetasketch.ThetaSketch
- property count(self)¶
- track(self, number)¶
Add a number to statistics tracking
- Parameters
number (int, float) – A numeric value
- merge(self, other)¶
- to_protobuf(self)¶
Return the object serialized as a protobuf message
- static from_protobuf(message: whylogs.proto.NumbersMessage)¶
Load from a protobuf message
- Returns
number_tracker
- Return type
NumberTracker
- to_summary(self)¶
Construct a NumberSummary message
- Returns
summary – Summary of the tracker statistics
- Return type
NumberSummary
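A numeric tracker that supports `merge` needs statistics that can be combined without revisiting the raw data. The sketch below shows one standard way to do this for variance (Welford's online update plus the pairwise merge formula of Chan et al.). `RunningVariance` is a hypothetical class for illustration; it is not the whylogs `VarianceTracker`, whose internals are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class RunningVariance:
    """Hypothetical mergeable variance tracker (not the whylogs class)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def track(self, number: float) -> None:
        # Welford's online update: one pass, no stored values.
        self.count += 1
        delta = number - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (number - self.mean)

    def merge(self, other: "RunningVariance") -> "RunningVariance":
        # Pairwise combination of two partial results into a new tracker.
        total = self.count + other.count
        if total == 0:
            return RunningVariance()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return RunningVariance(total, mean, m2)

    def variance(self) -> float:
        # Sample variance; undefined for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


left, right = RunningVariance(), RunningVariance()
for x in (1, 2, 3):
    left.track(x)
for x in (4, 5, 6, 7):
    right.track(x)
merged_var = left.merge(right)
```

Merging the two halves gives the same mean and variance as tracking all seven values in a single tracker, up to floating-point rounding.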
- class whylogs.core.statistics.SchemaTracker(type_counts: dict = None, legacy_null_count=0)¶
Track information about a column’s schema and present datatypes
- Parameters
type_counts (dict) – If specified, a dictionary containing information about the counts of all data types.
- UNKNOWN_TYPE¶
- NULL_TYPE¶
- CANDIDATE_MIN_FRAC = 0.7¶
- _non_null_type_counts(self)¶
- track(self, item_type)¶
Track an item type
- get_count(self, item_type)¶
Return the count of a given item type
- infer_type(self)¶
Generate a guess at what type the tracked values are.
- Returns
type_guess – The guessed type. See InferredType.Type for candidates
- Return type
object
- _get_most_popular_type(self, total_count)¶
- merge(self, other)¶
Merge another schema tracker with this and return a new one. Does not alter this object.
- Parameters
other (SchemaTracker) –
- Returns
merged – Merged tracker
- Return type
SchemaTracker
- copy(self)¶
Return a copy of this tracker
- to_protobuf(self)¶
Return the object serialized as a protobuf message
- Returns
message
- Return type
SchemaMessage
- static from_protobuf(message, legacy_null_count=0)¶
Load from a protobuf message
- Returns
schema_tracker
- Return type
SchemaTracker
- to_summary(self)¶
Generate a summary of the statistics
- Returns
summary – Protobuf summary message.
- Return type
SchemaSummary
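To make the role of a threshold such as `CANDIDATE_MIN_FRAC` concrete, here is a simplified sketch of threshold-based type inference. It assumes the rule "pick the most common non-null type if it covers at least 70% of non-null values, otherwise report unknown". The actual `infer_type` logic may handle more cases; the type labels and the `infer_type_sketch` function are hypothetical.

```python
from collections import Counter

CANDIDATE_MIN_FRAC = 0.7  # same value as the class attribute listed above
UNKNOWN, NULL = "UNKNOWN", "NULL"  # hypothetical type labels


def infer_type_sketch(type_counts: Counter) -> str:
    """Guess a column type from per-type counts (simplified assumption)."""
    # Ignore nulls when deciding which concrete type dominates.
    non_null = {t: c for t, c in type_counts.items() if t != NULL and c > 0}
    total = sum(non_null.values())
    if total == 0:
        return NULL if type_counts.get(NULL, 0) > 0 else UNKNOWN
    best_type, best_count = max(non_null.items(), key=lambda kv: kv[1])
    # Accept the most popular type only if it clears the threshold.
    if best_count / total >= CANDIDATE_MIN_FRAC:
        return best_type
    return UNKNOWN


mostly_ints = Counter({"INTEGRAL": 8, "STRING": 2, NULL: 5})
mixed = Counter({"INTEGRAL": 5, "STRING": 5})
only_nulls = Counter({NULL: 3})
```

With these inputs, `mostly_ints` resolves to the integral label because 8 of 10 non-null values agree, while `mixed` falls below the threshold and stays unknown.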
- class whylogs.core.statistics.StringTracker(count: int = None, items: datasketches.frequent_strings_sketch = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, length: whylogs.core.statistics.numbertracker.NumberTracker = None, token_length: whylogs.core.statistics.numbertracker.NumberTracker = None, char_pos_tracker: CharPosTracker = None, token_method: Callable[[], List[str]] = None)¶
Track statistics for strings
- Parameters
count (int) – Total number of processed values
items (frequent_strings_sketch) – Sketch for tracking string counts
theta_sketch (ThetaSketch) – Sketch for approximate cardinality tracking
length (NumberTracker) – Tracks the distribution of string lengths
token_length (NumberTracker) – Tracks the number of tokens per sentence
token_method (function) – Method used to split a string into tokens
char_pos_tracker (CharPosTracker) –
- update(self, value: str, character_list=None, token_method=None)¶
Add a string to the tracking statistics.
If value is None, nothing will be done
- merge(self, other)¶
Merge the values of this string tracker with another
- Parameters
other (StringTracker) – The other StringTracker
- Returns
new – Merged values
- Return type
StringTracker
- to_protobuf(self)¶
Return the object serialized as a protobuf message
- Returns
message
- Return type
StringsMessage
- static from_protobuf(message: whylogs.proto.StringsMessage)¶
Load from a protobuf message
- Returns
string_tracker
- Return type
StringTracker
- to_summary(self)¶
Generate a summary of the statistics
- Returns
summary – Protobuf summary message.
- Return type
StringsSummary
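The relationship between `count`, `items`, `length`, `token_length`, and `token_method` can be sketched with plain Python containers. `SimpleStringStats` and `whitespace_tokens` are hypothetical names; this example uses exact counters and lists where the real tracker uses sketches and `NumberTracker` objects, and it assumes whitespace splitting as the tokenizer.

```python
from collections import Counter
from typing import Callable, List, Optional


def whitespace_tokens(value: str) -> List[str]:
    """Assumed default tokenizer: split on whitespace."""
    return value.split()


class SimpleStringStats:
    """Hypothetical stand-in for a string tracker (not the whylogs class)."""

    def __init__(self, token_method: Callable[[str], List[str]] = whitespace_tokens):
        self.count = 0
        self.items = Counter()    # exact counts instead of a frequency sketch
        self.lengths = []         # character length of each string
        self.token_lengths = []   # number of tokens in each string
        self.token_method = token_method

    def update(self, value: Optional[str]) -> None:
        # As documented above, None is ignored.
        if value is None:
            return
        self.count += 1
        self.items[value] += 1
        self.lengths.append(len(value))
        self.token_lengths.append(len(self.token_method(value)))


stats = SimpleStringStats()
for text in ("hello world", None, "hi", "hello world"):
    stats.update(text)
```

A custom tokenizer can be supplied through `token_method`, for example `SimpleStringStats(token_method=lambda s: s.split(","))` for comma-separated values.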
- class whylogs.core.statistics.ThetaSketch(theta_sketch=None, union=None, compact_theta=None)¶
A sketch for approximate cardinality tracking.
A wrapper class for datasketches.update_theta_sketch which implements merging for updatable theta sketches.
Currently, datasketches only implements merging for compact (read-only) theta sketches.
- update(self, value)¶
Update the statistics tracking
- Parameters
value (object) – Value to follow
- merge(self, other)¶
Merge another ThetaSketch with this one, returning a new object
- Parameters
other (ThetaSketch) – Other theta sketch
- Returns
new – New theta sketch with merged statistics
- Return type
ThetaSketch
- get_result(self)¶
Generate a theta sketch
- Returns
compact_sketch – Read-only compact theta sketch with full statistics.
- Return type
datasketches.compact_theta_sketch
- serialize(self)¶
Serialize this object.
Note that serialization only preserves the object approximately.
- Returns
msg – Serialized to bytes
- Return type
bytes
- static deserialize(msg: bytes)¶
Deserialize from a serialized message.
- Parameters
msg (bytes) – Serialized object. Can be a serialized version of ThetaSketch, datasketches.update_theta_sketch, or datasketches.compact_theta_sketch.
- Returns
sketch – ThetaSketch object
- Return type
ThetaSketch
- to_summary(self, num_std_devs=1)¶
Generate a summary protobuf message
- Parameters
num_std_devs (float) – For estimating bounds
- Returns
summary – Summary protobuf message
- Return type
UniqueCountSummary
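For intuition about mergeable approximate cardinality, the following pure-Python sketch implements a K-Minimum-Values (KMV) estimator, a simpler relative of theta sketches. It is not the datasketches implementation and does not use the `ThetaSketch` wrapper; `KMVSketch`, `_hash01`, and the choice of `k` are assumptions made for this example.

```python
import hashlib


def _hash01(value) -> float:
    """Map a value to a deterministic pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch for approximate distinct counts."""

    def __init__(self, k: int = 256, hashes=None):
        self.k = k
        self.hashes = set(hashes or ())

    def update(self, value) -> None:
        # Keep only the k smallest hash values seen so far.
        self.hashes.add(_hash01(value))
        if len(self.hashes) > self.k:
            self.hashes.remove(max(self.hashes))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # The k smallest hashes of the union summarize the combined stream.
        k = min(self.k, other.k)
        smallest = sorted(self.hashes | other.hashes)[:k]
        return KMVSketch(k, smallest)

    def estimate(self) -> float:
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact while below capacity
        # Standard KMV estimator based on the k-th smallest hash.
        return (self.k - 1) / max(self.hashes)


first, second = KMVSketch(), KMVSketch()
for i in range(0, 5000):
    first.update(i)
for i in range(2500, 7500):
    second.update(i)
combined = first.merge(second)  # the two streams hold 7500 distinct values
```

Because duplicates hash to the same value, the overlap between the two streams is counted once, and the merged estimate should land near 7500 with a relative error on the order of 1/sqrt(k).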
- whylogs.core.statistics.__ALL__¶