whylogs.util.dsketch
Define functions and classes for interfacing with datasketches
Module Contents
Classes
FrequentItemsSketch | A class to implement frequent item counting for mixed data types.
Functions
deserialize_kll_floats_sketch(x: bytes, kind: str = 'float') | Deserialize a KLL floats sketch. Compatible with whylogs-java
deserialize_frequent_strings_sketch(x: bytes) | Deserialize a frequent strings sketch. Compatible with whylogs-java
- whylogs.util.dsketch.deserialize_kll_floats_sketch(x: bytes, kind: str = 'float')
Deserialize a KLL floats sketch. Compatible with whylogs-java.
whylogs histograms are serialized as KLL floats sketches.
- Parameters
x (bytes) – Serialized sketch
kind (str, optional) – Specify type of sketch: ‘float’ or ‘int’
- Returns
sketch – If x is an empty sketch, return None, else return the deserialized sketch.
- Return type
kll_floats_sketch, kll_ints_sketch, or None
- whylogs.util.dsketch.deserialize_frequent_strings_sketch(x: bytes)
Deserialize a frequent strings sketch. Compatible with whylogs-java
Wrapper for datasketches.frequent_strings_sketch.deserialize
- Parameters
x (bytes) – Serialized sketch
- Returns
sketch – If x is an empty string sketch, returns None, else returns the deserialized string sketch
- Return type
datasketches.frequent_strings_sketch, None
- class whylogs.util.dsketch.FrequentItemsSketch(lg_max_k: int = None, sketch: datasketches.frequent_strings_sketch = None)
A class to implement frequent item counting for mixed data types.
Wraps datasketches.frequent_strings_sketch and encodes numbers as strings, because the datasketches Python implementation does not support frequent number tracking.
- Parameters
lg_max_k (int, optional) – Parameter controlling the size and accuracy of the sketch. A larger value increases both the accuracy and the memory requirements of the sketch.
sketch (datasketches.frequent_strings_sketch, optional) – Initialize with an existing frequent strings sketch
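The string-encoding idea described above can be illustrated with a minimal sketch. This is not the whylogs implementation: `encode_item` and `decode_item` are hypothetical helpers, JSON is an assumed encoding chosen only because it round-trips numbers and strings distinctly, and `collections.Counter` stands in for a string-only frequent items sketch.

```python
import json
from collections import Counter


def encode_item(x):
    """Hypothetical encoder: turn any JSON-compatible item into a string key."""
    return json.dumps(x)


def decode_item(s):
    """Inverse of encode_item: recover the original value and its type."""
    return json.loads(s)


# Counter is only a stand-in for a string-only frequent-items sketch.
counts = Counter()
for item in [1, "1", 2.5, 1, "a", "a", "a"]:
    counts[encode_item(item)] += 1

# Decode on the way out, most frequent first.
decoded = [(decode_item(key), n) for key, n in counts.most_common()]
```

Because the integer `1` and the string `"1"` encode to different keys, they are counted separately and come back with their original types.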
- DEFAULT_MAX_ITEMS_SIZE = 128
- DEFAULT_ERROR_TYPE
- get_apriori_error(self, lg_max_map_size: int, estimated_total_weight: int)
Return an a priori estimate of the uncertainty for the given parameters.
- Parameters
lg_max_map_size (int) – The lg_max_k value
estimated_total_weight (int) – Total weight (see FrequentItems.get_total_weight())
- Returns
error – Approximate uncertainty
- Return type
float
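As a rough illustration of how such an a priori bound can be computed, the sketch below assumes the relation used by the Apache DataSketches frequent items sketch, where epsilon is 3.5 divided by the maximum map size and the error is epsilon times the total weight. The constant and the standalone function names are assumptions for illustration; this page does not state the formula whylogs uses.

```python
EPSILON_FACTOR = 3.5  # assumed constant, taken from Apache DataSketches


def epsilon_for_lg_size(lg_max_map_size: int) -> float:
    """Relative error for a map holding up to 2**lg_max_map_size entries."""
    return EPSILON_FACTOR / (1 << lg_max_map_size)


def apriori_error(lg_max_map_size: int, estimated_total_weight: int) -> float:
    """Absolute error bound: relative error scaled by the total stream weight."""
    return epsilon_for_lg_size(lg_max_map_size) * estimated_total_weight


# With lg_max_map_size=10 (1024 entries) and one million updates:
error = apriori_error(10, 1_000_000)
```

Under these assumptions, each increment of `lg_max_map_size` halves the error bound while doubling the map size.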
- get_epsilon_for_lg_size(self, lg_max_map_size: int)
- get_estimate(self, item)
- get_lower_bound(self, item)
- get_upper_bound(self, item)
- get_frequent_items(self, err_type: datasketches.frequent_items_error_type = None, threshold: int = 0, decode: bool = True)
Retrieve the frequent items.
- Parameters
err_type (datasketches.frequent_items_error_type) – Override default error type
threshold (int) – Minimum count for returned items
decode (bool (default=True)) – Decode the returned values. Internally, all items are encoded as strings.
- Returns
items – A list of tuples of items:
[(item, count)]
- Return type
list
- get_num_active_items(self)
- get_serialized_size_bytes(self)
- get_sketch_epsilon(self)
- get_total_weight(self)
- is_empty(self)
- merge(self, other)
Merge the item counts of this sketch with another.
This object will not be modified. This operation is commutative.
- Parameters
other (FrequentItemsSketch) – The other sketch
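The two properties stated for merge (the receiver is not modified, and the operation is commutative) can be sketched with a toy exact counter. `ToyCounts` is a hypothetical stand-in rather than the real class, and it assumes that merge returns a new object, which this page implies but does not state.

```python
from collections import Counter


class ToyCounts:
    """Exact counter used only to illustrate merge semantics."""

    def __init__(self, counts=None):
        self._counts = Counter(counts or {})

    def update(self, x, weight=1):
        self._counts[x] += weight

    def copy(self):
        return ToyCounts(self._counts)

    def merge(self, other):
        # Work on a copy so that neither input is modified.
        merged = self.copy()
        merged._counts.update(other._counts)
        return merged

    def get_estimate(self, item):
        return self._counts[item]


a = ToyCounts()
a.update("x", 2)
b = ToyCounts()
b.update("x")
b.update("y", 5)

ab = a.merge(b)
ba = b.merge(a)
```

Here `ab` and `ba` hold the same counts, while `a` and `b` keep their original contents.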
- copy(self)
- Returns
sketch – A copy of this sketch
- Return type
FrequentItemsSketch
- serialize(self)
Serialize this sketch as a bytes string.
See also
FrequentItemsSketch.deserialize()
- Returns
data – Serialized object.
- Return type
bytes
- to_string(self, print_items=False)
- update(self, x, weight=1)
Track an item.
- Parameters
x (object) – Item to track
weight (int) – Number of times the item appears
- to_summary(self, max_items=30, min_count=1)
Generate a protobuf summary. Returns None if there are no frequent items.
- Parameters
max_items (int) – Maximum number of items to return. The most frequent items will be returned
min_count (int) – Minimum count required for an item to be returned
- Returns
summary – Protobuf summary message
- Return type
FrequentItemsSummary
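The selection rule described by max_items and min_count can be sketched as follows. The `summarize` function is hypothetical: it returns a plain list of (item, count) tuples instead of a protobuf FrequentItemsSummary, and it uses `collections.Counter` in place of the sketch.

```python
from collections import Counter


def summarize(counts: Counter, max_items: int = 30, min_count: int = 1):
    """Keep the most frequent items that meet min_count, at most max_items."""
    kept = [
        (item, n)
        for item, n in counts.most_common()  # sorted by descending count
        if n >= min_count
    ]
    kept = kept[:max_items]
    # Mirror the documented behaviour: None when nothing qualifies.
    return kept or None


counts = Counter({"a": 10, "b": 4, "c": 1})
top_two = summarize(counts, max_items=2)
at_least_four = summarize(counts, min_count=4)
nothing = summarize(counts, min_count=100)
```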
- to_protobuf(self)
Generate a protobuf representation of this object
- static from_protobuf(message: whylogs.proto.FrequentItemsSketchMessage)
Initialize a FrequentItemsSketch from a protobuf FrequentItemsSketchMessage
- static _encode_item(x)
- static deserialize(x: bytes)
Deserialize a frequent numbers sketch.
If x is an empty sketch, None is returned
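The "empty input yields None" convention shared by the deserializers on this page can be sketched with a stand-in format. The functions below are hypothetical and use JSON bytes over a `collections.Counter`; the real sketches use the datasketches binary format, which is not reproduced here.

```python
import json
from collections import Counter
from typing import Optional


def serialize(counts: Counter) -> bytes:
    """Stand-in serializer: JSON-encode the counts as UTF-8 bytes."""
    return json.dumps(counts).encode("utf-8")


def deserialize(x: bytes) -> Optional[Counter]:
    """Return the stored counts, or None if the payload holds no items."""
    if not x:
        return None
    counts = Counter(json.loads(x.decode("utf-8")))
    return counts if counts else None


original = Counter({"a": 3, "1": 2})
restored = deserialize(serialize(original))
empty = deserialize(serialize(Counter()))
```

Callers therefore need to handle a None result before calling methods on the returned object.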