whylogs.core.statistics.stringtracker
Module Contents
Classes
CharPosTracker | Track statistics for character positions within a string
StringTracker | Track statistics for strings
Attributes
- whylogs.core.statistics.stringtracker.MAX_ITEMS_SIZE = 128
- whylogs.core.statistics.stringtracker.MAX_SUMMARY_ITEMS = 100
- whylogs.core.statistics.stringtracker.logger
- class whylogs.core.statistics.stringtracker.CharPosTracker(character_list: str = None)
Track statistics for character positions within a string
- Parameters
character_list (str) – string containing all characters to be tracked. This list can include specific Unicode characters to track.
- update(self, value: str, character_list: str = None) → None
Update the tracked character positions with a string value.
- Parameters
value (str) – utf-16 string
character_list (str, optional) – use a specific character_list for the tracked string. Note that modifying it from a previously saved choice will reset the character position map, since NITL no longer has the same context.
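The reset behavior described for character_list can be illustrated with a minimal pure-Python sketch. This is not the whylogs implementation: the class name `SimpleCharPosTracker`, the `positions` dictionary, the lowercase ASCII default, and the choice to ignore untracked characters are all assumptions made for illustration.

```python
from collections import defaultdict


class SimpleCharPosTracker:
    """Hypothetical, simplified stand-in for a character position tracker."""

    def __init__(self, character_list=None):
        # Assumption: default to lowercase ASCII letters when no list is given.
        self.character_list = set(character_list or "abcdefghijklmnopqrstuvwxyz")
        # Maps each tracked character to the list of indices where it was seen.
        self.positions = defaultdict(list)

    def update(self, value, character_list=None):
        if character_list is not None and set(character_list) != self.character_list:
            # A different character set invalidates the old statistics,
            # so the position map starts over.
            self.character_list = set(character_list)
            self.positions = defaultdict(list)
        for index, char in enumerate(value):
            if char in self.character_list:
                self.positions[char].append(index)


tracker = SimpleCharPosTracker("ab")
tracker.update("abba")
first_map = dict(tracker.positions)

# Passing a different character_list discards the positions recorded so far.
tracker.update("cab", character_list="abc")
second_map = dict(tracker.positions)
```

After the second call, `second_map` only reflects the string "cab"; the positions gathered from "abba" are gone because they were collected under a different character set.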
- merge(self, other: CharPosTracker) → CharPosTracker
Merges two character position frequency maps
- Parameters
other (CharPosTracker) – to be merged
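As a rough illustration of what merging two character position maps can involve, the sketch below combines per-character summary statistics. `PositionStats` and `merge_position_maps` are hypothetical names, and the real tracker may store different statistics; the point is only that counts and sums add, while minima and maxima combine with `min` and `max`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionStats:
    """Hypothetical per-character summary of observed positions."""

    count: int
    total: int  # sum of positions, so the mean can be recomputed after merging
    minimum: int
    maximum: int

    def merge(self, other):
        return PositionStats(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def merge_position_maps(left, right):
    """Return a new map; neither input is modified."""
    merged = dict(left)
    for char, stats in right.items():
        merged[char] = merged[char].merge(stats) if char in merged else stats
    return merged


left = {"a": PositionStats(2, 3, 0, 3), "b": PositionStats(1, 1, 1, 1)}
right = {"a": PositionStats(1, 2, 2, 2), "c": PositionStats(1, 0, 0, 0)}
merged = merge_position_maps(left, right)
```

Characters present in only one map are carried over unchanged, and characters present in both have their statistics combined.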
- to_protobuf(self)
Return the object serialized as a protobuf message
- static from_protobuf(message: whylogs.proto.CharPosMessage)
Load from a CharPosMessage protobuf message
- Return type
CharPosTracker
- to_summary(self)
- class whylogs.core.statistics.stringtracker.StringTracker(count: int = None, items: datasketches.frequent_strings_sketch = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, length: whylogs.core.statistics.numbertracker.NumberTracker = None, token_length: whylogs.core.statistics.numbertracker.NumberTracker = None, char_pos_tracker: CharPosTracker = None, token_method: Callable[[], List[str]] = None)
Track statistics for strings
- Parameters
count (int) – Total number of processed values
items (frequent_strings_sketch) – Sketch for tracking string counts
theta_sketch (ThetaSketch) – Sketch for approximate cardinality tracking
length (NumberTracker) – tracks the distribution of length of strings
token_length (NumberTracker) – counts tokens per sentence
token_method (function) – method used to turn a string into tokens
char_pos_tracker (CharPosTracker) – tracks statistics for character positions within strings
- update(self, value: str, character_list=None, token_method=None)
Add a string to the tracking statistics.
If value is None, nothing will be done.
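The kind of bookkeeping that update performs can be sketched in plain Python. The example below is an assumption-laden stand-in, not the library code: `SimpleStringTracker` is a hypothetical class that uses an exact `Counter` and `set` where the real tracker uses a frequent-strings sketch and a theta sketch, and it assumes whitespace splitting as the default token method.

```python
from collections import Counter


class SimpleStringTracker:
    """Hypothetical stand-in that uses exact structures instead of sketches."""

    def __init__(self, token_method=None):
        self.count = 0
        self.items = Counter()   # exact counts in place of a frequent-strings sketch
        self.distinct = set()    # exact set in place of a theta sketch
        self.lengths = []        # length of each string
        self.token_lengths = []  # number of tokens in each string
        # Assumption: whitespace splitting is the default tokenizer.
        self.token_method = token_method or str.split

    def update(self, value, token_method=None):
        if value is None:
            return  # mirrors the documented behavior: None is ignored
        if token_method is not None:
            self.token_method = token_method
        self.count += 1
        self.items[value] += 1
        self.distinct.add(value)
        self.lengths.append(len(value))
        self.token_lengths.append(len(self.token_method(value)))


tracker = SimpleStringTracker()
for value in ["hello world", None, "hello world", "whylogs"]:
    tracker.update(value)
```

The `None` entry leaves every statistic untouched, so only three values are counted.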
- merge(self, other)
Merge the values of this string tracker with another
- Parameters
other (StringTracker) – The other StringTracker
- Returns
new – Merged values
- Return type
StringTracker
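A merge that returns a new object, as described above, can be pictured with the following sketch. It uses plain dictionaries with hypothetical keys (`count`, `items`, `distinct`) and exact data structures; the actual tracker merges sketches and number trackers, so treat this only as an analogy.

```python
from collections import Counter


def merge_string_stats(left, right):
    """Combine two hypothetical per-batch summaries into a new one."""
    return {
        "count": left["count"] + right["count"],
        "items": left["items"] + right["items"],           # Counter addition sums counts
        "distinct": left["distinct"] | right["distinct"],  # set union
    }


batch_a = {"count": 3, "items": Counter({"red": 2, "blue": 1}), "distinct": {"red", "blue"}}
batch_b = {"count": 2, "items": Counter({"red": 1, "green": 1}), "distinct": {"red", "green"}}
merged = merge_string_stats(batch_a, batch_b)
```

Because the result is built from fresh objects, both inputs remain usable after the merge, which is convenient when the same batch statistics feed several aggregates.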
- to_protobuf(self)
Return the object serialized as a protobuf message
- Returns
message
- Return type
StringsMessage
- static from_protobuf(message: whylogs.proto.StringsMessage)
Load from a protobuf message
- Returns
string_tracker
- Return type
StringTracker
- to_summary(self)
Generate a summary of the statistics
- Returns
summary – Protobuf summary message.
- Return type
StringsSummary