whylogs.core.statistics.stringtracker

Module Contents

Classes

CharPosTracker

Track statistics for character positions within a string

StringTracker

Track statistics for strings

Attributes

MAX_ITEMS_SIZE

MAX_SUMMARY_ITEMS

logger

whylogs.core.statistics.stringtracker.MAX_ITEMS_SIZE = 128
whylogs.core.statistics.stringtracker.MAX_SUMMARY_ITEMS = 100
whylogs.core.statistics.stringtracker.logger
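MAX_ITEMS_SIZE presumably bounds how many entries the frequent-strings sketch keeps. This page does not say how it is used, so that is an assumption. The sketch below illustrates how a bounded frequent-items summary behaves by implementing the classic Misra-Gries algorithm with k = MAX_ITEMS_SIZE. Misra-Gries is a simpler relative of the sketch datasketches provides, not the library's implementation, and misra_gries is a hypothetical helper.

```python
from typing import Dict, Iterable

MAX_ITEMS_SIZE = 128  # value taken from the module attribute above


def misra_gries(stream: Iterable[str], k: int = MAX_ITEMS_SIZE) -> Dict[str, int]:
    """Approximate counts using at most k - 1 counters (hypothetical helper)."""
    counters: Dict[str, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Table is full: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With k - 1 counters, any string that occurs more than n/k times in a stream of n values is guaranteed to survive. Each reported count undercounts the true count by at most n/k.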
class whylogs.core.statistics.stringtracker.CharPosTracker(character_list: str = None)

Track statistics for character positions within a string

Parameters

character_list (str) – string containing all characters to be tracked. This list can include specific Unicode characters.

update(self, value: str, character_list: str = None) → None

Update the character position statistics with a new string.

Parameters
  • value (str) – utf-16 string

  • character_list (str, optional) – use a specific character_list for the tracked string. Note that changing it from a previously saved choice will reset the character position map, since NITL no longer has the same context.
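The following minimal sketch of a character position tracker makes the reset rule concrete. SimpleCharPosTracker, DEFAULT_CHARS and the raw position lists are hypothetical. The sketch also assumes that NITL names a catch-all bucket for characters outside character_list, which would explain why switching lists invalidates the existing map.

```python
from collections import defaultdict
from typing import Dict, List, Optional

DEFAULT_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"  # hypothetical default alphabet
OTHER_KEY = "NITL"  # assumed meaning: bucket for characters "not in the list"


class SimpleCharPosTracker:
    """Hypothetical stand-in: records every index at which each character occurs."""

    def __init__(self, character_list: Optional[str] = None) -> None:
        self.character_list = set(character_list or DEFAULT_CHARS)
        self.char_pos_map: Dict[str, List[int]] = defaultdict(list)

    def update(self, value: str, character_list: Optional[str] = None) -> None:
        if character_list is not None and set(character_list) != self.character_list:
            # A different alphabet changes what falls into the catch-all bucket,
            # so positions gathered under the old alphabet are discarded.
            self.character_list = set(character_list)
            self.char_pos_map = defaultdict(list)
        for index, char in enumerate(value):
            key = char if char in self.character_list else OTHER_KEY
            self.char_pos_map[key].append(index)
```

Raw lists keep the example readable. A production tracker would summarize the positions as a distribution instead of storing every index.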

merge(self, other: CharPosTracker) → CharPosTracker

Merge two character position frequency maps.

Parameters

other (CharPosTracker) – the tracker to be merged
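If each character maps to a frequency table of the positions where it occurred, merging two trackers reduces to adding those tables key by key. The following minimal sketch assumes that representation. merge_char_pos_maps is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def merge_char_pos_maps(
    left: Dict[str, Counter], right: Dict[str, Counter]
) -> Dict[str, Counter]:
    """Combine two char -> (position -> count) maps without mutating either input."""
    # Copy each table from the left side so the inputs stay untouched.
    merged: Dict[str, Counter] = {char: Counter(table) for char, table in left.items()}
    for char, table in right.items():
        # Counter.update adds counts position by position.
        merged.setdefault(char, Counter()).update(table)
    return merged
```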

to_protobuf(self)

Return the object serialized as a protobuf message

static from_protobuf(message: whylogs.proto.CharPosMessage)

Load from a CharPosMessage protobuf message

Return type

CharPosTracker

to_summary(self)
class whylogs.core.statistics.stringtracker.StringTracker(count: int = None, items: datasketches.frequent_strings_sketch = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, length: whylogs.core.statistics.numbertracker.NumberTracker = None, token_length: whylogs.core.statistics.numbertracker.NumberTracker = None, char_pos_tracker: CharPosTracker = None, token_method: Callable[[], List[str]] = None)

Track statistics for strings

Parameters
  • count (int) – Total number of processed values

  • items (frequent_strings_sketch) – Sketch for tracking string counts

  • theta_sketch (ThetaSketch) – Sketch for approximate cardinality tracking

  • length (NumberTracker) – tracks the distribution of length of strings

  • token_length (NumberTracker) – tracks the distribution of the number of tokens per string

  • token_method (function) – method used to split a string into tokens

  • char_pos_tracker (CharPosTracker) – tracks statistics for character positions within each string
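theta_sketch estimates how many distinct strings have been seen while using bounded memory. Theta sketches generalize the K-minimum-values (KMV) estimator, so a plain KMV sketch illustrates the idea. Each string is hashed to [0, 1), the k smallest distinct hashes are kept, and the cardinality is estimated as (k - 1) / θ, where θ is the k-th smallest hash. This is a hypothetical illustration, not the datasketches ThetaSketch implementation, and both function names are invented for the example.

```python
import hashlib
import heapq
from typing import Iterable, List, Set


def unit_hash(value: str) -> float:
    """Map a string to a pseudo-uniform float in [0, 1) using 53 bits of SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def kmv_distinct_estimate(values: Iterable[str], k: int = 256) -> float:
    """Estimate the number of distinct strings with a K-minimum-values sketch."""
    heap: List[float] = []  # negated hashes, so -heap[0] is the largest kept hash
    kept: Set[float] = set()  # hashes currently in the heap, used to skip duplicates
    for value in values:
        h = unit_hash(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest of the k kept hashes with the smaller new one.
            evicted = -heapq.heapreplace(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: the count is exact
    theta = -heap[0]  # k-th smallest hash
    return (k - 1) / theta
```

Repeated strings hash to the same value, so duplicates never change the result. This property is what makes the estimate a distinct count rather than a total count.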

update(self, value: str, character_list=None, token_method=None)

Add a string to the tracking statistics.

If value is None, it is ignored and no statistics are updated.
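The following minimal sketch shows the update path. It covers the count, length and token_length fields and the per-call token_method override. SimpleStringTracker is hypothetical and stores raw values in lists where the real class uses NumberTracker. The whitespace fallback tokenizer is also an assumption.

```python
from typing import Callable, List, Optional

Tokenizer = Callable[[str], List[str]]


class SimpleStringTracker:
    """Hypothetical stand-in for a subset of StringTracker's fields."""

    def __init__(self, token_method: Optional[Tokenizer] = None) -> None:
        self.count = 0
        self.lengths: List[int] = []
        self.token_lengths: List[int] = []
        # Assumption: fall back to whitespace splitting when no tokenizer is given.
        self.token_method: Tokenizer = token_method or str.split

    def update(self, value: Optional[str], token_method: Optional[Tokenizer] = None) -> None:
        if value is None:
            return  # mirrors the documented behaviour: None is ignored
        tokenize = token_method or self.token_method  # per-call override wins
        self.count += 1
        self.lengths.append(len(value))
        self.token_lengths.append(len(tokenize(value)))
```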

merge(self, other)

Merge the values of this string tracker with another

Parameters

other (StringTracker) – The other StringTracker

Returns

new – Merged values

Return type

StringTracker
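Merging works because every component is itself mergeable, but the components do not all merge the same way. Counts and frequency tables add. Distinct values must be unioned instead, because a string seen by both trackers is still only one distinct value. The sketch below uses exact Python containers as hypothetical stand-ins for the sketches. ExactStringStats is not part of whylogs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ExactStringStats:
    """Hypothetical exact counterpart of the sketches a StringTracker holds."""

    count: int = 0
    items: Counter = field(default_factory=Counter)  # stands in for frequent_strings_sketch
    distinct: Set[str] = field(default_factory=set)  # stands in for ThetaSketch
    lengths: List[int] = field(default_factory=list)  # stands in for the length NumberTracker

    def update(self, value: str) -> None:
        self.count += 1
        self.items[value] += 1
        self.distinct.add(value)
        self.lengths.append(len(value))

    def merge(self, other: "ExactStringStats") -> "ExactStringStats":
        """Return a new object; neither input is modified."""
        return ExactStringStats(
            count=self.count + other.count,  # totals add
            items=self.items + other.items,  # per-string counts add
            distinct=self.distinct | other.distinct,  # distinct values union, never add
            lengths=self.lengths + other.lengths,  # distributions pool
        )
```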

to_protobuf(self)

Return the object serialized as a protobuf message

Returns

message

Return type

StringsMessage

static from_protobuf(message: whylogs.proto.StringsMessage)

Load from a protobuf message

Returns

string_tracker

Return type

StringTracker

to_summary(self)

Generate a summary of the statistics

Returns

summary – Protobuf summary message.

Return type

StringsSummary
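MAX_SUMMARY_ITEMS = 100 suggests that the summary reports a bounded list of frequent strings. Assuming that is its role, the capping step amounts to taking the top entries by count. summarize_items and its value/estimate field names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

MAX_SUMMARY_ITEMS = 100  # value taken from the module attribute above


def summarize_items(items: Counter, limit: int = MAX_SUMMARY_ITEMS) -> List[Dict[str, object]]:
    """Return at most `limit` {value, estimate} entries, most frequent first."""
    return [{"value": value, "estimate": count} for value, count in items.most_common(limit)]
```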