whylogs.core.statistics

Define classes for tracking statistics

Package Contents

Classes

CountersTracker

Class to keep track of the counts of various data types

NumberTracker

Class to track statistics for numeric data.

SchemaTracker

Track information about a column's schema and present datatypes

StringTracker

Track statistics for strings

ThetaSketch

A sketch for approximate cardinality tracking.

Attributes

__ALL__

class whylogs.core.statistics.CountersTracker(count=0, true_count=0)

Class to keep track of the counts of various data types

Parameters
  • count (int, optional) – Current number of objects

  • true_count (int, optional) – Number of boolean values

  • null_count (int, optional) – Number of nulls encountered

increment_count(self)

Add one to the count of total objects

increment_bool(self)

Add one to the boolean count

increment_null(self)

Add one to the null count

merge(self, other)

Merge another counter tracker with this one

Returns

new_tracker – The merged tracker

Return type

CountersTracker
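
The counting and merge behaviour described above can be illustrated with a small stand-in class. `SimpleCounters` is a hypothetical name, not the whylogs class, and the sketch assumes that merging adds each counter field-wise and returns a new object.

```python
from dataclasses import dataclass


@dataclass
class SimpleCounters:
    """Hypothetical stand-in for a counters tracker (not the whylogs class)."""

    count: int = 0
    true_count: int = 0
    null_count: int = 0

    def increment_count(self) -> None:
        self.count += 1

    def increment_bool(self) -> None:
        self.true_count += 1

    def increment_null(self) -> None:
        self.null_count += 1

    def merge(self, other: "SimpleCounters") -> "SimpleCounters":
        # Assumption: merging adds each counter and leaves both inputs untouched.
        return SimpleCounters(
            count=self.count + other.count,
            true_count=self.true_count + other.true_count,
            null_count=self.null_count + other.null_count,
        )


a = SimpleCounters()
for _ in range(3):
    a.increment_count()
a.increment_bool()

b = SimpleCounters(count=2, null_count=1)
merged = a.merge(b)
```

Returning a new object rather than mutating `a` keeps partial results reusable, which is convenient when the same tracker is merged into several aggregates.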

to_protobuf(self, null_count=0)

Return the object serialized as a protobuf message

static from_protobuf(message: whylogs.proto.Counters)

Load from a protobuf message

Returns

counters

Return type

CountersTracker

class whylogs.core.statistics.NumberTracker(variance: whylogs.core.statistics.datatypes.VarianceTracker = None, floats: whylogs.core.statistics.datatypes.FloatTracker = None, ints: whylogs.core.statistics.datatypes.IntTracker = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, histogram: datasketches.kll_floats_sketch = None)

Class to track statistics for numeric data.

Parameters
  • variance – Tracker to follow the variance

  • floats – Float tracker for tracking all floats

  • ints – Integer tracker

variance

See above

floats

See above

ints

See above

theta_sketch

Sketch which tracks approximate cardinality

Type

whylogs.core.statistics.thetasketch.ThetaSketch

property count(self)

track(self, number)

Add a number to statistics tracking

Parameters

number (int, float) – A numeric value

merge(self, other)

to_protobuf(self)

Return the object serialized as a protobuf message

static from_protobuf(message: whylogs.proto.NumbersMessage)

Load from a protobuf message

Returns

number_tracker

Return type

NumberTracker

to_summary(self)

Construct a NumberSummary message

Returns

summary – Summary of the tracker statistics

Return type

NumberSummary
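
NumberTracker holds a variance tracker and supports merge, which requires the running variance itself to be mergeable. One common way to achieve this is Welford's single-pass update combined with a pairwise combination of partial results. The sketch below assumes that approach; `RunningVariance` is a hypothetical class and is not presented as the VarianceTracker implementation.

```python
from dataclasses import dataclass


@dataclass
class RunningVariance:
    """Hypothetical mergeable variance accumulator (not the whylogs class)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def track(self, number: float) -> None:
        # Welford's single-pass update.
        self.count += 1
        delta = number - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (number - self.mean)

    def variance(self) -> float:
        # Sample variance; undefined for fewer than two values.
        if self.count < 2:
            return float("nan")
        return self.m2 / (self.count - 1)

    def merge(self, other: "RunningVariance") -> "RunningVariance":
        # Combine two partial results into a new object; inputs are unchanged.
        total = self.count + other.count
        if total == 0:
            return RunningVariance()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return RunningVariance(total, mean, m2)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
left, right, full = RunningVariance(), RunningVariance(), RunningVariance()
for x in data[:2]:
    left.track(x)
for x in data[2:]:
    right.track(x)
for x in data:
    full.track(x)

merged = left.merge(right)
```

Merging the two halves gives the same count, mean, and variance as tracking all six values in one pass, which is the property a distributed profile merge relies on.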

class whylogs.core.statistics.SchemaTracker(type_counts: dict = None, legacy_null_count=0)

Track information about a column’s schema and present datatypes

type_counts (dict)

If specified, a dictionary containing information about the counts of all data types.

UNKNOWN_TYPE

NULL_TYPE

CANDIDATE_MIN_FRAC = 0.7

_non_null_type_counts(self)

track(self, item_type)

Track an item type

get_count(self, item_type)

Return the count of a given item type

infer_type(self)

Generate a guess at what type the tracked values are.

Returns

type_guess – The inferred type. See InferredType.Type for candidates

Return type

object
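
A simplified sketch of threshold-based type inference follows. It assumes that CANDIDATE_MIN_FRAC is compared against the share of the most frequent non-null type; the real method may apply additional fallback rules. The function name, the string type labels, and the `(type, ratio)` return shape are all hypothetical.

```python
from collections import Counter

CANDIDATE_MIN_FRAC = 0.7  # same value as the class attribute documented above


def infer_type_sketch(type_counts, null_type="NULL", unknown_type="UNKNOWN"):
    """Return (type, ratio) for the dominant non-null type, or unknown_type."""
    # Nulls say nothing about the column's type, so leave them out of the vote.
    non_null = {t: c for t, c in type_counts.items() if t != null_type and c > 0}
    total = sum(non_null.values())
    if total == 0:
        return unknown_type, 0.0
    candidate, count = Counter(non_null).most_common(1)[0]
    ratio = count / total
    # Assumption: accept the candidate only if it clearly dominates.
    if ratio > CANDIDATE_MIN_FRAC:
        return candidate, ratio
    return unknown_type, ratio


mostly_ints = infer_type_sketch({"INTEGRAL": 8, "STRING": 1, "NULL": 3})
mixed = infer_type_sketch({"INTEGRAL": 5, "STRING": 5})
empty = infer_type_sketch({})
```

In this sketch a column with eight integers and one string is still reported as integral, while an even split falls back to the unknown type.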

merge(self, other)

Merge another schema tracker with this and return a new one. Does not alter this object.

Parameters

other (SchemaTracker) –

Returns

merged – Merged tracker

Return type

SchemaTracker

copy(self)

Return a copy of this tracker

to_protobuf(self)

Return the object serialized as a protobuf message

Returns

message

Return type

SchemaMessage

static from_protobuf(message, legacy_null_count=0)

Load from a protobuf message

Returns

schema_tracker

Return type

SchemaTracker

to_summary(self)

Generate a summary of the statistics

Returns

summary – Protobuf summary message.

Return type

SchemaSummary

class whylogs.core.statistics.StringTracker(count: int = None, items: datasketches.frequent_strings_sketch = None, theta_sketch: whylogs.core.statistics.thetasketch.ThetaSketch = None, length: whylogs.core.statistics.numbertracker.NumberTracker = None, token_length: whylogs.core.statistics.numbertracker.NumberTracker = None, char_pos_tracker: CharPosTracker = None, token_method: Callable[[], List[str]] = None)

Track statistics for strings

Parameters
  • count (int) – Total number of processed values

  • items (frequent_strings_sketch) – Sketch for tracking string counts

  • theta_sketch (ThetaSketch) – Sketch for approximate cardinality tracking

  • length (NumberTracker) – Tracks the distribution of string lengths

  • token_length (NumberTracker) – Tracks the number of tokens per string

  • token_method (function) – Method used to split a string into tokens

  • char_pos_tracker (CharPosTracker) –

update(self, value: str, character_list=None, token_method=None)

Add a string to the tracking statistics.

If value is None, nothing will be done
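
The update behaviour can be sketched with exact Python containers in place of the sketches the real tracker uses. `SimpleStringStats` is a hypothetical stand-in, and whitespace splitting as the default tokenizer is an assumption made for this example only.

```python
from collections import Counter
from typing import Callable, List, Optional


class SimpleStringStats:
    """Hypothetical stand-in for a string tracker (not the whylogs class)."""

    def __init__(self, token_method: Optional[Callable[[str], List[str]]] = None):
        self.count = 0
        self.items = Counter()   # exact counts; a real tracker would use a sketch
        self.lengths = []        # string lengths, one entry per tracked value
        self.token_lengths = []  # token counts, one entry per tracked value
        # Assumption: fall back to whitespace tokenization.
        self.token_method = token_method or (lambda s: s.split())

    def update(self, value: Optional[str]) -> None:
        # None is ignored, matching the documented behaviour.
        if value is None:
            return
        self.count += 1
        self.items[value] += 1
        self.lengths.append(len(value))
        self.token_lengths.append(len(self.token_method(value)))


stats = SimpleStringStats()
stats.update("hello world")
stats.update(None)
stats.update("a,b,c")

comma_stats = SimpleStringStats(token_method=lambda s: s.split(","))
comma_stats.update("a,b,c")
```

Passing a different `token_method` changes only the token counts; the length and frequency statistics are unaffected.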

merge(self, other)

Merge the values of this string tracker with another

Parameters

other (StringTracker) – The other StringTracker

Returns

new – Merged values

Return type

StringTracker

to_protobuf(self)

Return the object serialized as a protobuf message

Returns

message

Return type

StringsMessage

static from_protobuf(message: whylogs.proto.StringsMessage)

Load from a protobuf message

Returns

string_tracker

Return type

StringTracker

to_summary(self)

Generate a summary of the statistics

Returns

summary – Protobuf summary message.

Return type

StringsSummary

class whylogs.core.statistics.ThetaSketch(theta_sketch=None, union=None, compact_theta=None)

A sketch for approximate cardinality tracking.

A wrapper class for datasketches.update_theta_sketch which implements merging for updatable theta sketches.

Currently, datasketches only implements merging for compact (read-only) theta sketches.

update(self, value)

Update the statistics tracking

Parameters

value (object) – Value to add to the sketch

merge(self, other)

Merge another ThetaSketch with this one, returning a new object

Parameters

other (ThetaSketch) – Other theta sketch

Returns

new – New theta sketch with merged statistics

Return type

ThetaSketch
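
To show why a cardinality sketch can be merged without double counting shared values, here is a toy k-minimum-values (KMV) sketch in pure Python. It is a deliberately simplified relative of a theta sketch, not the datasketches algorithm or its API; `KMVSketch`, its parameter `k`, and the SHA-256 based hash are all assumptions made for illustration.

```python
import hashlib


class KMVSketch:
    """Toy k-minimum-values sketch (hypothetical; not the datasketches API)."""

    def __init__(self, k=256, hashes=None):
        self.k = k
        self.hashes = set(hashes or ())

    @staticmethod
    def _hash(value) -> float:
        # Map the value to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _trim(self) -> None:
        # Keep only the k smallest hash values.
        if len(self.hashes) > self.k:
            self.hashes = set(sorted(self.hashes)[: self.k])

    def update(self, value) -> None:
        self.hashes.add(self._hash(value))
        self._trim()

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # Union the retained hashes and keep the k smallest; inputs are unchanged.
        merged = KMVSketch(self.k, self.hashes | other.hashes)
        merged._trim()
        return merged

    def estimate(self) -> float:
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact while under capacity
        # The k-th smallest hash estimates the density of distinct values.
        return (self.k - 1) / max(self.hashes)


a, b = KMVSketch(), KMVSketch()
for i in range(0, 3000):
    a.update(i)
for i in range(2000, 5000):
    b.update(i)

merged = a.merge(b)
estimate = merged.estimate()  # approximates the 5000 distinct values in the union
```

Because identical values hash to the same number, the 1000 values seen by both sketches collapse in the union, so the merged estimate tracks the distinct count of the combined stream rather than the sum of the two inputs.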

get_result(self)

Generate a theta sketch

Returns

compact_sketch – Read-only compact theta sketch with full statistics.

Return type

datasketches.compact_theta_sketch

serialize(self)

Serialize this object.

Note that serialization only preserves the object approximately.

Returns

msg – Serialized to bytes

Return type

bytes

static deserialize(msg: bytes)

Deserialize from a serialized message.

Parameters

msg (bytes) –

Serialized object. Can be a serialized version of:
  • ThetaSketch

  • datasketches.update_theta_sketch

  • datasketches.compact_theta_sketch

Returns

sketch – ThetaSketch object

Return type

ThetaSketch

to_summary(self, num_std_devs=1)

Generate a summary protobuf message

Parameters

num_std_devs (float) – For estimating bounds

Returns

summary – Summary protobuf message

Return type

UniqueCountSummary

whylogs.core.statistics.__ALL__