whylogs.core.columnprofile

Defines the ColumnProfile class for tracking per-column statistics

Module Contents

Classes

ColumnProfile

Statistics tracking for a column (i.e. a feature)

MultiColumnProfile

Statistics tracking for multiple columns (i.e. multiple features)

Attributes

_TYPES

_NUMERIC_TYPES

_UNIQUE_COUNT_BOUNDS_STD

whylogs.core.columnprofile._TYPES
whylogs.core.columnprofile._NUMERIC_TYPES
whylogs.core.columnprofile._UNIQUE_COUNT_BOUNDS_STD = 1
class whylogs.core.columnprofile.ColumnProfile(name: str, number_tracker: whylogs.core.statistics.NumberTracker = None, string_tracker: whylogs.core.statistics.StringTracker = None, schema_tracker: whylogs.core.statistics.SchemaTracker = None, counters: whylogs.core.statistics.CountersTracker = None, frequent_items: whylogs.util.dsketch.FrequentItemsSketch = None, cardinality_tracker: whylogs.core.statistics.hllsketch.HllSketch = None, constraints: whylogs.core.statistics.constraints.ValueConstraints = None)

Statistics tracking for a column (i.e. a feature)

The primary method for

Parameters
  • name (str (required)) – Name of the column profile

  • number_tracker (NumberTracker) – Implements numeric data statistics tracking

  • string_tracker (StringTracker) – Implements string data-type statistics tracking

  • schema_tracker (SchemaTracker) – Implements tracking of schema-related information

  • counters (CountersTracker) – Keep count of various things

  • frequent_items (FrequentItemsSketch) – Keep track of all frequent items, even for mixed datatype features

  • cardinality_tracker (HllSketch) – Track feature cardinality (even for mixed data types)

  • constraints (ValueConstraints) – Static assertions to be applied to numeric data tracked in this column

TODO

  • Proper TypedDataConverter type checking

  • Multi-threading/parallelism

track(self, value, character_list=None, token_method=None)

Add value to tracking statistics.
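To make the role of the per-type trackers listed above more concrete, here is a minimal sketch of how a single track call might fan a value out to counters, numeric statistics, and string statistics. ToyColumnProfile and all of its fields are hypothetical stand-ins written for this illustration. They are not the whylogs implementation, which relies on sketch data structures and a typed data converter instead of exact counts.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyColumnProfile:
    """Hypothetical stand-in for illustration; NOT the whylogs ColumnProfile."""

    name: str
    count: int = 0
    null_count: int = 0
    type_counts: Counter = field(default_factory=Counter)
    string_counts: Counter = field(default_factory=Counter)
    numeric_min: float = float("inf")
    numeric_max: float = float("-inf")

    def track(self, value: Any) -> None:
        # Every value, including nulls, increments the overall counter.
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        # Schema-style tracking: record the observed Python type name.
        self.type_counts[type(value).__name__] += 1
        if isinstance(value, bool):
            # bool is a subclass of int; keep it out of the numeric statistics.
            return
        if isinstance(value, (int, float)):
            self.numeric_min = min(self.numeric_min, value)
            self.numeric_max = max(self.numeric_max, value)
        elif isinstance(value, str):
            self.string_counts[value] += 1


# A mixed-type feature: two numbers, one null, and a repeated string.
toy_profile = ToyColumnProfile(name="mixed_feature")
for item in [1, 2.5, "a", None, "a"]:
    toy_profile.track(item)
```

The sketch keeps exact counts for readability. A production tracker would normally use bounded-memory approximations for frequent items and cardinality.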

_unique_count_summary(self) → whylogs.proto.UniqueCountSummary

to_summary(self)

Generate a summary of the statistics

Returns

summary – Protobuf summary message.

Return type

ColumnSummary

generate_constraints(self) → whylogs.core.statistics.constraints.SummaryConstraints

merge(self, other)

Merge this column profile with another.

Parameters

other (ColumnProfile) – The column profile to merge with this one

Returns

merged – A new, merged column profile.

Return type

ColumnProfile
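The merge contract described above (the result is a new profile, not an in-place update) can be sketched with simple mergeable statistics. ToyNumericStats and build_stats are hypothetical names introduced for this illustration. The example assumes that merging leaves both inputs unchanged, and it only mirrors the general idea that two independently built profiles combine into the same statistics as one profile built over all the data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToyNumericStats:
    """Hypothetical mergeable statistics; NOT a whylogs class."""

    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def track(self, value: float) -> "ToyNumericStats":
        # Return updated statistics instead of mutating this frozen instance.
        return ToyNumericStats(
            count=self.count + 1,
            total=self.total + value,
            minimum=min(self.minimum, value),
            maximum=max(self.maximum, value),
        )

    def merge(self, other: "ToyNumericStats") -> "ToyNumericStats":
        # Build a new object; neither input is modified.
        return ToyNumericStats(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def build_stats(values: Iterable[float]) -> ToyNumericStats:
    """Track every value in order and return the resulting statistics."""
    stats = ToyNumericStats()
    for value in values:
        stats = stats.track(value)
    return stats


# Two partitions of the same column, profiled separately and then merged.
left = build_stats([1.0, 2.0, 3.0])
right = build_stats([10.0, 20.0])
merged = left.merge(right)
```

Count, sum, minimum, and maximum were chosen because each one merges exactly. Sketch-based trackers merge in a comparable way but return approximate results.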

to_protobuf(self)

Return the object serialized as a protobuf message

Returns

message

Return type

ColumnMessage

static from_protobuf(message)

Load from a protobuf message

Returns

column_profile

Return type

ColumnProfile
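The to_protobuf and from_protobuf pair forms a serialization round trip: an instance method produces a message, and a static method rebuilds an equivalent object from it. The sketch below shows that shape only. It swaps in JSON-encoded bytes in place of a protobuf ColumnMessage so that it runs without generated protobuf classes. ToyColumnState, to_message, and from_message are hypothetical names and are not part of whylogs.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyColumnState:
    """Hypothetical serializable state; NOT the whylogs ColumnProfile."""

    name: str
    count: int
    null_count: int

    def to_message(self) -> bytes:
        # Stand-in for to_protobuf: JSON bytes instead of a protobuf message.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_message(message: bytes) -> "ToyColumnState":
        # Stand-in for from_protobuf: rebuild a fresh object from the payload.
        return ToyColumnState(**json.loads(message.decode("utf-8")))


original = ToyColumnState(name="age", count=100, null_count=3)
restored = ToyColumnState.from_message(original.to_message())
```

Making the loader a static method means a caller can rebuild a profile from a stored message without first constructing an empty instance.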

class whylogs.core.columnprofile.MultiColumnProfile(constraints: whylogs.core.statistics.constraints.MultiColumnValueConstraints = None)

Statistics tracking for multiple columns (i.e. multiple features)

The primary method for

Parameters

constraints (MultiColumnValueConstraints) – Static assertions to be applied to data tracked between all columns

track(self, column_dict, character_list=None, token_method=None)

TODO: Add column_dict to tracking statistics.

abstract to_summary(self)

Generate a summary of the statistics

Returns

summary – Protobuf summary message.

Return type

(Multi)ColumnSummary

merge(self, other) → MultiColumnProfile

Merge this multi-column profile with another.

Parameters

other (MultiColumnProfile) – The multi-column profile to merge with this one

Returns

merged – A new, merged multi-column profile.

Return type

MultiColumnProfile

abstract to_protobuf(self)

Return the object serialized as a protobuf message

Returns

message

Return type

ColumnMessage

abstract static from_protobuf(message)

Load from a protobuf message

Returns

column_profile

Return type

MultiColumnProfile