whylogs.core.statistics.constraints

Module Contents

Classes

ValueConstraint

ValueConstraints express a binary boolean relationship between an implied numeric value and a literal.

SummaryConstraint

Summary constraints specify a relationship between a summary field and a static value, or between two summary fields.

ValueConstraints

SummaryConstraints

MultiColumnValueConstraint

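Multi-column value constraints express a binary boolean relationship between the values of one or more dependent columns and either the values of reference columns or a literal, evaluated for every incoming row.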

MultiColumnValueConstraints

DatasetConstraints

Functions

_try_parse_strftime_format(strftime_val: str, format: str) → Optional[datetime.datetime]

Return the parsed datetime if the string matches the given strftime format, otherwise None.

_try_parse_dateutil(dateutil_val: str, ref_val=None) → Optional[datetime.datetime]

Return the parsed datetime if the string can be interpreted as a date, otherwise None.

_try_parse_json(json_string: str, ref_val=None) → Optional[dict]

Return the parsed JSON object if the string can be interpreted as JSON, otherwise None.

_matches_json_schema(json_data: Union[str, dict], json_schema: Union[str, dict]) → bool

Return whether the provided json matches the provided schema.

_check_between_constraint_valid_initialization(lower_value, upper_value, lower_field, upper_field)

_set_between_constraint_default_name(field, lower_value, upper_value, lower_field, upper_field)

_format_set_values_for_display(reference_set)

stddevBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the standard deviation of a feature. The standard deviation can be defined to be between two values, or between the values of two other summary fields of the same feature.

meanBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the mean (average) of a feature. The mean can be defined to be between two values, or between the values of two other summary fields of the same feature.

minBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the minimum value of a feature. The minimum can be defined to be between two values, or between the values of two other summary fields of the same feature.

minGreaterThanEqualConstraint(value=None, field=None, name=None, verbose=False)

Defines a summary constraint on the minimum value of a feature. The minimum can be defined to be greater than or equal to some value, or to the value of another summary field of the same feature.

maxBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the maximum value of a feature. The maximum can be defined to be between two values, or between the values of two other summary fields of the same feature.

maxLessThanEqualConstraint(value=None, field=None, name=None, verbose=False)

Defines a summary constraint on the maximum value of a feature. The maximum can be defined to be less than or equal to some value, or to the value of another summary field of the same feature.

distinctValuesInSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. All of the distinct values should belong to the user-provided set of reference values, reference_set.

distinctValuesEqualSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. The set of the distinct values should be equal to the user-provided set of reference values, reference_set.

distinctValuesContainSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. The set of user-supplied reference values, reference_set, should be a subset of the set of distinct values for the current feature.

columnValuesInSetConstraint(value_set: Set[Any], name=None, verbose=False)

Defines a value constraint with set operations on the values of a single feature.

containsEmailConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with email regex matching operations on the values of a single feature.

containsCreditCardConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with credit card number regex matching operations on the values of a single feature.

dateUtilParseableConstraint(name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature can be interpreted as dates by dateutil.

jsonParseableConstraint(name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature can be parsed as JSON.

matchesJsonSchemaConstraint(json_schema, name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature match the user-provided JSON schema, json_schema.

strftimeFormatConstraint(format, name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature can be parsed using the user-provided strftime format, format.

containsSSNConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with social security number (SSN) regex matching operations on the values of a single feature.

containsURLConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with URL regex matching operations on the values of a single feature.

stringLengthEqualConstraint(length: int, name=None, verbose=False)

Defines a value constraint which checks if the string values of a single feature have a length equal to length.

stringLengthBetweenConstraint(lower_value: int, upper_value: int, name=None, verbose=False)

Defines a value constraint which checks if the length of the string values of a single feature is between lower_value and upper_value.

quantileBetweenConstraint(quantile_value: Union[int, float], lower_value: Union[int, float], upper_value: Union[int, float], name=None, verbose: bool = False)

Defines a summary constraint on the n-th quantile value of a numeric feature.

columnUniqueValueCountBetweenConstraint(lower_value: int, upper_value: int, name=None, verbose: bool = False)

Defines a summary constraint on the cardinality of a specific feature.

columnUniqueValueProportionBetweenConstraint(lower_fraction: float, upper_fraction: float, name=None, verbose: bool = False)

Defines a summary constraint on the proportion of unique values of a specific feature.

columnExistsConstraint(column: str, name=None, verbose=False)

Defines a constraint on the data set schema.

numberOfRowsConstraint(n_rows: int, name=None, verbose=False)

Defines a constraint on the data set schema.

columnsMatchSetConstraint(reference_set: Set[str], name=None, verbose=False)

Defines a constraint on the data set schema.

columnMostCommonValueInSetConstraint(value_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the most common value of a feature.

columnValuesNotNullConstraint(name=None, verbose=False)

Defines a non-null summary constraint on the value of a feature.

missingValuesProportionBetweenConstraint(lower_fraction: float, upper_fraction: float, name: str = None, verbose: bool = False)

Defines a summary constraint on the proportion of missing values of a specific feature.

columnValuesTypeEqualsConstraint(expected_type: Union[whylogs.proto.InferredType, int], name=None, verbose: bool = False)

Defines a summary constraint on the type of the feature values.

columnValuesTypeInSetConstraint(type_set: Set[int], name=None, verbose: bool = False)

Defines a summary constraint on the type of the feature values.

approximateEntropyBetweenConstraint(lower_value: Union[int, float], upper_value: float, name=None, verbose=False)

Defines a summary constraint specifying the expected interval of the feature's estimated entropy.

parametrizedKSTestPValueGreaterThanConstraint(reference_distribution: Union[List[float], numpy.ndarray], p_value=0.05, name=None, verbose=False)

Defines a summary constraint specifying the expected lower limit of the p-value of a Kolmogorov-Smirnov (KS) test against reference_distribution.

columnKLDivergenceLessThanConstraint(reference_distribution: Union[List[Any], numpy.ndarray], threshold: float = 0.5, name=None, verbose: bool = False)

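Defines a summary constraint specifying the expected upper limit of the Kullback-Leibler (KL) divergence between the feature's distribution and reference_distribution.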

columnChiSquaredTestPValueGreaterThanConstraint(reference_distribution: Union[List[Any], numpy.ndarray, Mapping[str, int]], p_value: float = 0.05, name=None, verbose: bool = False)

Defines a summary constraint specifying the expected lower limit of the p-value of a Chi-Squared test against reference_distribution.

columnValuesAGreaterThanBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is greater than the value in column B in the same row.

columnValuesAGreaterThanEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is greater than or equal to the value in column B in the same row.

columnValuesALessThanBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is less than the value in column B in the same row.

columnValuesALessThanEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is less than or equal to the value in column B in the same row.

columnValuesAEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is equal to the value in column B in the same row.

columnValuesANotEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A is not equal to the value in column B in the same row.

sumOfRowValuesOfMultipleColumnsEqualsConstraint(columns: Union[List[str], Set[str], numpy.array], value: Union[float, int, str], name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the sum of the values in each row of the given columns equals the user-provided value.

columnPairValuesInSetConstraint(column_A: str, column_B: str, value_set: Set[Tuple[Any, Any]], name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the pair of values of columns A and B, in each row, is in the user-provided set of pairs, value_set.

columnValuesUniqueWithinRow(column_A: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the values of column A are unique within each row.

Attributes

TYPES

logger

MAX_SET_DISPLAY_MESSAGE_LENGTH

_value_funcs

Dict indexed by constraint operator.

_summary_funcs1

_summary_funcs2

_multi_column_value_funcs

whylogs.core.statistics.constraints.TYPES
whylogs.core.statistics.constraints.logger
whylogs.core.statistics.constraints._try_parse_strftime_format(strftime_val: str, format: str) → Optional[datetime.datetime]

Return the parsed datetime if the string matches the given strftime format, otherwise None.

Parameters
  • strftime_val (str) – String to check for a date.

  • format – Format used to check whether strftime_val can be parsed.

Returns

None if not parseable, otherwise the parsed datetime.datetime object
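A minimal sketch of the documented contract (parsed datetime on success, None otherwise). It assumes the helper relies on the standard library's datetime.strptime, which may not match the actual implementation; parse_strftime is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def parse_strftime(strftime_val: str, fmt: str) -> Optional[datetime]:
    """Return the parsed datetime, or None if strftime_val does not match fmt.

    Hypothetical stand-in for _try_parse_strftime_format; assumes strptime semantics.
    """
    try:
        return datetime.strptime(strftime_val, fmt)
    except (ValueError, TypeError):
        # ValueError: the string does not match the format
        # TypeError: the input is not a string (e.g. None)
        return None


print(parse_strftime("2021-10-05", "%Y-%m-%d"))  # 2021-10-05 00:00:00
print(parse_strftime("05/10/2021", "%Y-%m-%d"))  # None
```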

whylogs.core.statistics.constraints._try_parse_dateutil(dateutil_val: str, ref_val=None) → Optional[datetime.datetime]

Return the parsed datetime if the string can be interpreted as a date, otherwise None.

Parameters
  • dateutil_val (str) – String to check for a date.

  • ref_val (any) – Not used; required by the interface design.

Returns

None if not parseable, otherwise the parsed datetime.datetime object

whylogs.core.statistics.constraints._try_parse_json(json_string: str, ref_val=None) → Optional[dict]

Return the parsed JSON object if the string can be interpreted as JSON, otherwise None.

Parameters
  • json_string (str) – String to check for JSON.

  • ref_val (any) – Not used; required by the interface design.

Returns

None if not parseable, otherwise the parsed JSON object
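A comparable sketch for the JSON helper, using the standard library json module. parse_json is a hypothetical name. Note that json.loads also accepts top-level scalars and arrays, so this sketch can return non-dict values even though the documented return type is Optional[dict].

```python
import json
from typing import Any, Optional


def parse_json(json_string: str, ref_val: Any = None) -> Optional[Any]:
    """Return the parsed JSON value, or None if json_string is not valid JSON.

    Hypothetical stand-in for _try_parse_json; ref_val is accepted but unused,
    mirroring the documented interface requirement.
    """
    try:
        return json.loads(json_string)
    except (ValueError, TypeError):
        # json.JSONDecodeError subclasses ValueError; TypeError covers non-str input
        return None


print(parse_json('{"a": 1}'))    # {'a': 1}
print(parse_json("{not json}"))  # None
```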

whylogs.core.statistics.constraints._matches_json_schema(json_data: Union[str, dict], json_schema: Union[str, dict]) → bool

Return whether the provided JSON matches the provided schema.

Parameters
  • json_data – JSON object to check.

  • json_schema – Schema that the JSON object is checked against.

Returns

True if the JSON data matches the schema, False otherwise

whylogs.core.statistics.constraints.MAX_SET_DISPLAY_MESSAGE_LENGTH = 20

whylogs.core.statistics.constraints._value_funcs
whylogs.core.statistics.constraints._summary_funcs1
whylogs.core.statistics.constraints._summary_funcs2
whylogs.core.statistics.constraints._multi_column_value_funcs

Dicts indexed by constraint operator.

These help translate from constraint schema to language-specific functions that are faster to evaluate. This is just a form of currying, and I chose to bind the boolean comparison operator first.
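The currying described above can be illustrated with plain lambdas. This is a sketch under assumptions: the string keys stand in for whylogs.proto.Op members, and value_funcs is a hypothetical name, not the module's _value_funcs.

```python
from typing import Any, Callable, Dict

# Hypothetical operator keys; the real dicts are keyed by whylogs.proto.Op values.
LT, LE, EQ, NE, GE, GT = "LT", "LE", "EQ", "NE", "GE", "GT"

# The comparison operator is bound first (the dict key); calling the entry with
# a fixed value x returns a predicate over an incoming value v.
value_funcs: Dict[str, Callable[[Any], Callable[[Any], bool]]] = {
    LT: lambda x: lambda v: v < x,
    LE: lambda x: lambda v: v <= x,
    EQ: lambda x: lambda v: v == x,
    NE: lambda x: lambda v: v != x,
    GE: lambda x: lambda v: v >= x,
    GT: lambda x: lambda v: v > x,
}

less_than_10 = value_funcs[LT](10)             # bind the literal once...
print([less_than_10(v) for v in (3, 10, 12)])  # ...apply per value: [True, False, False]
```

Binding the operator and literal up front means the per-value work is a single function call, which is the speed argument the description makes.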
class whylogs.core.statistics.constraints.ValueConstraint(op: whylogs.proto.Op, value=None, regex_pattern: str = None, apply_function=None, name: str = None, verbose=False)

ValueConstraints express a binary boolean relationship between an implied numeric value and a literal. When associated with a ColumnProfile, the relation is evaluated for every incoming value that is processed by whylogs.

Parameters
  • op (whylogs.proto.Op (required)) – Enumeration of binary comparison operator applied between static value and incoming stream. Enum values are mapped to operators such as ‘==’, ‘<’, and ‘<=’.

  • value (one-of) – Static value to compare against the incoming stream using the operator specified in op. When value is provided, regex_pattern must be None.

  • regex_pattern (one-of) – Regex pattern to use with MATCH or NOMATCH operations. When regex_pattern is provided, value must be None.

  • apply_function – Function to apply; supply only with the APPLY_FUNC operation. If apply_function requires an argument, supply it through the value parameter.

  • name (str) – Name of the constraint used for reporting.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.
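To make the per-value evaluation concrete, the following simplified model applies a predicate to each incoming value and counts totals and failures. SimpleValueConstraint and its (name, total, failures) report shape are assumptions for illustration, not the whylogs classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SimpleValueConstraint:
    """Hypothetical, simplified model of a value constraint (not the whylogs class).

    It applies a predicate to every incoming value and counts totals and failures.
    """

    name: str
    predicate: Callable[[Any], bool]
    verbose: bool = False
    total: int = 0
    failures: int = 0

    def update(self, v: Any) -> bool:
        self.total += 1
        ok = self.predicate(v)
        if not ok:
            self.failures += 1
            if self.verbose:
                # Mirrors the documented verbose behavior: report each failing value
                print(f"value constraint {self.name} failed on value {v!r}")
        return ok

    def report(self) -> tuple:
        # (name, total values seen, failed values): an assumed report shape
        return (self.name, self.total, self.failures)


c = SimpleValueConstraint(name="value < 10", predicate=lambda v: v < 10)
for v in [1, 5, 12, 9, 30]:
    c.update(v)
print(c.report())  # ('value < 10', 5, 2)
```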

property name(self)
update(self, v) → bool
apply_func_validate(self, value) → str
merge(self, other) → ValueConstraint
static from_protobuf(msg: whylogs.proto.ValueConstraintMsg) → ValueConstraint
to_protobuf(self) → whylogs.proto.ValueConstraintMsg
report(self)
class whylogs.core.statistics.constraints.SummaryConstraint(first_field: str, op: whylogs.proto.Op, value=None, upper_value=None, quantile_value: Union[int, float] = None, second_field: str = None, third_field: str = None, reference_set: Union[List[Any], Set[Any], datasketches.kll_floats_sketch, whylogs.proto.ReferenceDistributionDiscreteMessage] = None, name: str = None, verbose=False)

Summary constraints specify a relationship between a summary field and a static value, or between two summary fields, e.g. ‘min’ < 6, ‘std_dev’ < 2.17, or ‘min’ > ‘avg’.

Parameters
  • first_field (str) – Name of field in NumberSummary that will be compared against either a second field or a static value.

  • op (whylogs.proto.Op (required)) – Enumeration of binary comparison operator applied between summary values. Enum values are mapped to operators such as ‘==’, ‘<’, and ‘<=’.

  • value (one-of) – Static value to be compared against the summary field specified in first_field. Only one of value or second_field should be supplied.

  • upper_value (one-of) – Only to be supplied when using Op.BTWN. Static upper boundary value to be compared against the summary field specified in first_field. Only one of upper_value or third_field should be supplied.

  • second_field (one-of) – Name of second field in NumberSummary to be compared against the summary field specified in first_field. Only one of value or second_field should be supplied.

  • third_field (one-of) – Only to be supplied when using Op.BTWN. Name of third field in NumberSummary, used as an upper boundary, to be compared against the summary field specified in first_field. Only one of upper_value or third_field should be supplied.

  • reference_set (one-of) – Only to be supplied when using set operations or distributional measures. For set operations, it is the reference set compared against the column's distinct values. For constraints on distributional measures, such as the KS test, KL divergence, and Chi-Squared test, it can instead be a datasketches.kll_floats_sketch or a ReferenceDistributionDiscreteMessage.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.
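The two comparison forms above (field against a static value, field against another field) can be sketched over a plain dict of summary statistics. check_summary and the string operator keys are hypothetical. The field names follow the examples in the description and may not match the actual NumberSummary field names.

```python
import operator
from typing import Mapping, Optional

# Hypothetical mapping from operator names to Python comparison functions.
OPS = {"LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
       "GE": operator.ge, "GT": operator.gt}


def check_summary(summary: Mapping[str, float], first_field: str, op: str,
                  value: Optional[float] = None,
                  second_field: Optional[str] = None) -> bool:
    """Compare summary[first_field] against a static value or another summary field.

    Hypothetical sketch: exactly one of value or second_field must be supplied.
    """
    if (value is None) == (second_field is None):
        raise ValueError("supply exactly one of value or second_field")
    rhs = value if second_field is None else summary[second_field]
    return OPS[op](summary[first_field], rhs)


summary = {"min": 2.0, "avg": 4.5, "std_dev": 1.3}
print(check_summary(summary, "min", "LT", value=6))             # True
print(check_summary(summary, "std_dev", "LT", value=2.17))      # True
print(check_summary(summary, "min", "GT", second_field="avg"))  # False
```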

property name(self)
_get_field_name(self)
_get_value_or_field(self)
_get_constraint_type(self)
_check_and_init_table_shape_constraint(self, reference_set)
_check_and_init_valid_set_constraint(self, reference_set)
_check_and_init_distributional_measure_constraint(self, reference_set)
_check_and_init_between_constraint(self)
_get_str_from_ref_set(self) → str
_try_cast_set(self) → Set[Any]
_get_string_and_numbers_sets(self)
_create_theta_sketch(self, ref_set: set = None)
update(self, update_summary: object) → bool
merge(self, other) → SummaryConstraint
_check_if_summary_constraint_message_is_valid(msg: whylogs.proto.SummaryConstraintMsg)
static from_protobuf(msg: whylogs.proto.SummaryConstraintMsg) → SummaryConstraint
to_protobuf(self) → whylogs.proto.SummaryConstraintMsg
report(self)
class whylogs.core.statistics.constraints.ValueConstraints(constraints: Mapping[str, ValueConstraint] = None)
static from_protobuf(msg: whylogs.proto.ValueConstraintMsgs) → ValueConstraints
__getitem__(self, name: str) → Optional[ValueConstraint]
to_protobuf(self) → whylogs.proto.ValueConstraintMsgs
update(self, v)
update_typed(self, v)
merge(self, other) → ValueConstraints
report(self) → List[tuple]
class whylogs.core.statistics.constraints.SummaryConstraints(constraints: Mapping[str, SummaryConstraint] = None)
static from_protobuf(msg: whylogs.proto.SummaryConstraintMsgs) → SummaryConstraints
__getitem__(self, name: str) → Optional[SummaryConstraint]
to_protobuf(self) → whylogs.proto.SummaryConstraintMsgs
update(self, v)
merge(self, other) → SummaryConstraints
report(self) → List[tuple]
class whylogs.core.statistics.constraints.MultiColumnValueConstraint(dependent_columns: Union[str, List[str], Tuple[str], numpy.ndarray], op: whylogs.proto.Op, reference_columns: Union[str, List[str], Tuple[str], numpy.ndarray] = None, internal_dependent_cols_op: whylogs.proto.Op = None, value=None, name: str = None, verbose: bool = False)

Bases: ValueConstraint

Multi-column value constraints express a binary boolean relationship between the values of one or more dependent columns and either the values of one or more reference columns or a literal. When included in a DatasetConstraints object, the relation is evaluated for every incoming row that is processed by whylogs.

Parameters
  • dependent_columns (str, List[str], Tuple[str] or numpy.ndarray (required)) – Name or names of the columns whose values are checked in each row.

  • op (whylogs.proto.Op (required)) – Enumeration of binary comparison operator applied between the dependent column values and the reference column values or static value. Enum values are mapped to operators such as ‘==’, ‘<’, and ‘<=’.

  • reference_columns (str, List[str], Tuple[str] or numpy.ndarray) – Name or names of the columns whose values, from the same row, are compared against the dependent columns, as an alternative to a static value.

  • internal_dependent_cols_op (whylogs.proto.Op) – Optional operator applied across the dependent columns before the comparison, for example summing their values.

  • value – Static value compared against the dependent column values using the operator specified in op, as an alternative to reference_columns.

  • name (str) – Name of the constraint used for reporting.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.
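A row-wise sketch of the same idea, assuming each row arrives as a dict mapping column names to values. check_rows is hypothetical. Its combine argument is only meant to mirror what internal_dependent_cols_op appears to do (e.g. summing the dependent columns), not to reproduce the library's behavior.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List, Optional


def check_rows(rows: Iterable[Dict[str, Any]],
               dependent_columns: List[str],
               op: Callable[[Any, Any], bool],
               reference_column: Optional[str] = None,
               value: Any = None,
               combine: Optional[Callable[[List[Any]], Any]] = None) -> List[bool]:
    """Hypothetical row-by-row check, loosely modelled on MultiColumnValueConstraint.

    combine plays the role of internal_dependent_cols_op; without it, only the
    first dependent column is compared. The right-hand side is the reference
    column's value in the same row if given, otherwise the static value.
    """
    results = []
    for row in rows:
        dep_values = [row[c] for c in dependent_columns]
        lhs = combine(dep_values) if combine else dep_values[0]
        rhs = row[reference_column] if reference_column is not None else value
        results.append(op(lhs, rhs))
    return results


rows = [{"A": 5, "B": 3, "C": 2}, {"A": 1, "B": 4, "C": 5}]
# Each value in column A greater than the value in column B of the same row
print(check_rows(rows, ["A"], operator.gt, reference_column="B"))       # [True, False]
# Sum of columns B and C in each row equal to 5
print(check_rows(rows, ["B", "C"], operator.eq, value=5, combine=sum))  # [True, False]
```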

property name(self)
update(self, column_values_dictionary)
merge(self, other) → MultiColumnValueConstraint
static from_protobuf(msg: whylogs.proto.MultiColumnValueConstraintMsg) → MultiColumnValueConstraint
to_protobuf(self) → whylogs.proto.MultiColumnValueConstraintMsg
class whylogs.core.statistics.constraints.MultiColumnValueConstraints(constraints: Mapping[str, MultiColumnValueConstraint] = None)

Bases: ValueConstraints

static from_protobuf(msg: whylogs.proto.ValueConstraintMsgs) → MultiColumnValueConstraints
to_protobuf(self) → whylogs.proto.ValueConstraintMsgs
class whylogs.core.statistics.constraints.DatasetConstraints(props: whylogs.proto.DatasetProperties, value_constraints: Mapping[str, ValueConstraints] = None, summary_constraints: Mapping[str, SummaryConstraints] = None, table_shape_constraints: Mapping[str, SummaryConstraints] = None, multi_column_value_constraints: Optional[MultiColumnValueConstraints] = None)
__getitem__(self, key)
static from_protobuf(msg: whylogs.proto.DatasetConstraintMsg) → DatasetConstraints
static from_json(data: str) → DatasetConstraints
to_protobuf(self) → whylogs.proto.DatasetConstraintMsg
to_json(self) → str
report(self)
whylogs.core.statistics.constraints._check_between_constraint_valid_initialization(lower_value, upper_value, lower_field, upper_field)
whylogs.core.statistics.constraints._set_between_constraint_default_name(field, lower_value, upper_value, lower_field, upper_field)
whylogs.core.statistics.constraints._format_set_values_for_display(reference_set)
whylogs.core.statistics.constraints.stddevBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the standard deviation of a feature. The standard deviation can be defined to be between two values, or between the values of two other summary fields of the same feature, such as the minimum and the maximum. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • lower_value (numeric (one-of)) – Represents the lower value limit of the interval for the standard deviation. If lower_value is supplied, then upper_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • upper_value (numeric (one-of)) – Represents the upper value limit of the interval for the standard deviation. If upper_value is supplied, then lower_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • lower_field (str (one-of)) – Represents the lower field limit of the interval for the standard deviation. The lower field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as a lower bound. If lower_field is supplied, then upper_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • upper_field (str (one-of)) – Represents the upper field limit of the interval for the standard deviation. The upper field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as an upper bound. If upper_field is supplied, then lower_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining an interval of values for the standard deviation of a feature
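The initialization rules and the closed-interval semantics shared by the *BetweenConstraint helpers can be sketched as follows. check_between is a hypothetical function working on a plain dict of summary statistics, not the whylogs API; the field names in the example are illustrative.

```python
from typing import Mapping, Optional


def check_between(summary: Mapping[str, float], field: str,
                  lower_value: Optional[float] = None, upper_value: Optional[float] = None,
                  lower_field: Optional[str] = None, upper_field: Optional[str] = None) -> bool:
    """Hypothetical closed-interval check on one summary field.

    Bounds come either from two static values or from two other summary fields,
    never a mix and never only one side.
    """
    values_given = lower_value is not None and upper_value is not None
    fields_given = lower_field is not None and upper_field is not None
    values_partial = (lower_value is None) != (upper_value is None)
    fields_partial = (lower_field is None) != (upper_field is None)
    # Reject half-specified pairs, and reject "both pairs" or "no pair at all"
    if values_partial or fields_partial or values_given == fields_given:
        raise ValueError("supply both lower/upper values or both lower/upper fields")
    if values_given:
        lo, hi = lower_value, upper_value
    else:
        lo, hi = summary[lower_field], summary[upper_field]
    # Closed interval: both limit points are included
    return lo <= summary[field] <= hi


summary = {"min": 0.0, "mean": 4.0, "max": 10.0, "stddev": 2.5}
print(check_between(summary, "stddev", lower_value=2.5, upper_value=3.0))  # True
print(check_between(summary, "stddev", lower_value=3.0, upper_value=4.0))  # False
print(check_between(summary, "mean", lower_field="min", upper_field="max"))  # True
```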

whylogs.core.statistics.constraints.meanBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the mean (average) of a feature. The mean can be defined to be between two values, or between the values of two other summary fields of the same feature, such as the minimum and the maximum. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • lower_value (numeric (one-of)) – Represents the lower value limit of the interval for the mean. If lower_value is supplied, then upper_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • upper_value (numeric (one-of)) – Represents the upper value limit of the interval for the mean. If upper_value is supplied, then lower_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • lower_field (str (one-of)) – Represents the lower field limit of the interval for the mean. The lower field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as a lower bound. If lower_field is supplied, then upper_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • upper_field (str (one-of)) – Represents the upper field limit of the interval for the mean. The upper field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as an upper bound. If upper_field is supplied, then lower_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining an interval of values for the mean of a feature

whylogs.core.statistics.constraints.minBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the minimum value of a feature. The minimum can be defined to be between two values, or between the values of two other summary fields of the same feature, such as the minimum and the maximum. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • lower_value (numeric (one-of)) – Represents the lower value limit of the interval for the minimum. If lower_value is supplied, then upper_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • upper_value (numeric (one-of)) – Represents the upper value limit of the interval for the minimum. If upper_value is supplied, then lower_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • lower_field (str (one-of)) – Represents the lower field limit of the interval for the minimum. The lower field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as a lower bound. If lower_field is supplied, then upper_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • upper_field (str (one-of)) – Represents the upper field limit of the interval for the minimum. The upper field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as an upper bound. If upper_field is supplied, then lower_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining an interval of values for the minimum value of a feature

whylogs.core.statistics.constraints.minGreaterThanEqualConstraint(value=None, field=None, name=None, verbose=False)

Defines a summary constraint on the minimum value of a feature. The minimum can be defined to be greater than or equal to some value, or greater than or equal to the value of another summary field of the same feature, such as the mean (average).

Parameters
  • value (numeric (one-of)) – Represents the value which should be compared to the minimum value of the specified feature, for checking the greater than or equal to constraint. Only one of value and field should be supplied.

  • field (str (one-of)) – The field is a string representing a summary field e.g. min, mean, max, stddev, etc., for which the value will be used for checking the greater than or equal to constraint. Only one of field and value should be supplied.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

  • SummaryConstraint - a summary constraint defining a constraint on the minimum value to be greater than or equal to some value / summary field
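A minimal sketch of the greater-than-or-equal check, showing that equality passes. min_greater_than_equal is hypothetical and operates on a plain dict of summary statistics rather than a whylogs profile.

```python
from typing import Mapping, Optional


def min_greater_than_equal(summary: Mapping[str, float], value: Optional[float] = None,
                           field: Optional[str] = None) -> bool:
    """Hypothetical check that summary["min"] >= a static value or another summary field."""
    if (value is None) == (field is None):
        raise ValueError("supply exactly one of value or field")
    bound = value if field is None else summary[field]
    return summary["min"] >= bound


summary = {"min": 3.0, "mean": 7.5, "max": 12.0}
print(min_greater_than_equal(summary, value=3.0))     # True: equality passes
print(min_greater_than_equal(summary, field="mean"))  # False: 3.0 < 7.5
```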

whylogs.core.statistics.constraints.maxBetweenConstraint(lower_value=None, upper_value=None, lower_field=None, upper_field=None, name=None, verbose=False)

Defines a summary constraint on the maximum value of a feature. The maximum can be defined to be between two values, or between the values of two other summary fields of the same feature, such as the minimum and the maximum. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • lower_value (numeric (one-of)) – Represents the lower value limit of the interval for the maximum. If lower_value is supplied, then upper_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • upper_value (numeric (one-of)) – Represents the upper value limit of the interval for the maximum. If upper_value is supplied, then lower_value must also be supplied, and neither lower_field nor upper_field should be provided.

  • lower_field (str (one-of)) – Represents the lower field limit of the interval for the maximum. The lower field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as a lower bound. If lower_field is supplied, then upper_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • upper_field (str (one-of)) – Represents the upper field limit of the interval for the maximum. The upper field is a string representing a summary field, e.g. min, mean, max, stddev, etc., whose value will be used as an upper bound. If upper_field is supplied, then lower_field must also be supplied, and neither lower_value nor upper_value should be provided.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining an interval of values for the maximum value of a feature

whylogs.core.statistics.constraints.maxLessThanEqualConstraint(value=None, field=None, name=None, verbose=False)

Defines a summary constraint on the maximum value of a feature. The maximum can be defined to be less than or equal to some value, or less than or equal to the value of another summary field of the same feature, such as the mean (average).

Parameters
  • value (numeric (one-of)) – Represents the value which should be compared to the maximum value of the specified feature, for checking the less than or equal to constraint. Only one of value and field should be supplied.

  • field (str (one-of)) – The field is a string representing a summary field e.g. min, mean, max, stddev, etc., for which the value will be used for checking the less than or equal to constraint. Only one of field and value should be supplied.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

  • SummaryConstraint - a summary constraint defining a constraint on the maximum value to be less than or equal to some value / summary field

whylogs.core.statistics.constraints.distinctValuesInSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. All of the distinct values should belong to the user-provided set of reference values, reference_set. Useful for categorical features, for checking if the set of values present in a feature is contained in the set of expected categories.

Parameters
  • reference_set (Set[Any] (required)) – Represents the set of reference (expected) values for a feature. The provided values can be of any type. If at least one of the distinct values of the feature is not in the user specified set reference_set, then the constraint will fail.

  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

  • SummaryConstraint - a summary constraint defining a constraint on the distinct values of a feature to belong in a user supplied set of values
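The subset semantics can be sketched with Python sets. distinct_values_in_set is hypothetical and works on raw values for simplicity, whereas the library evaluates summary constraints against profile summaries.

```python
from typing import Any, Iterable, Set


def distinct_values_in_set(values: Iterable[Any], reference_set: Set[Any]) -> bool:
    """Hypothetical check: every distinct value of the feature belongs to reference_set."""
    distinct = set(values)
    return distinct.issubset(reference_set)  # equivalent to distinct <= reference_set


observed = ["red", "green", "red", "blue"]
print(distinct_values_in_set(observed, {"red", "green", "blue", "yellow"}))  # True
print(distinct_values_in_set(observed, {"red", "green"}))  # False: "blue" is unexpected
```

Note that a reference value that never appears (here "yellow") does not cause a failure; only unexpected observed values do.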

whylogs.core.statistics.constraints.distinctValuesEqualSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. The set of the distinct values should be equal to the user-provided set of reference values, reference_set. Useful for categorical features, for checking if the set of values present in a feature is the same as the set of expected categories.

Parameters
  • reference_set (Set[Any] (required)) – Represents the set of reference (expected) values for a feature. The provided values can be of any type. If the distinct values of the feature are not equal to the user specified set reference_set, then the constraint will fail.

  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a constraint on the distinct values of a feature to be equal to a user-supplied set of values

whylogs.core.statistics.constraints.distinctValuesContainSetConstraint(reference_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the distinct values of a feature. The user-supplied set of reference values, reference_set, should be a subset of the set of distinct values of the feature. Useful for categorical features, for checking whether the set of values present in a feature is a superset of the set of expected categories.

Parameters
  • reference_set (Set[Any] (required)) – Represents the set of reference (expected) values for a feature. The provided values can be of any type. If at least one of the values of the reference set, specified in reference_set, is not contained in the set of distinct values of the feature, then the constraint will fail.

  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a constraint on the distinct values of a feature to be a superset of the user-supplied set of values
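The three distinct-value constraints above differ only in the set relation they test between the feature's distinct values and reference_set. The following is a minimal pure-Python sketch of those three relations. It assumes the distinct values are available as a Python set, whereas whylogs derives them from a profile. The helper names are hypothetical and are not part of the whylogs API.

```python
from typing import Any, Iterable, Set


def values_in_set(values: Iterable[Any], reference_set: Set[Any]) -> bool:
    # distinctValuesInSetConstraint: every distinct value must be expected
    return set(values) <= reference_set


def values_equal_set(values: Iterable[Any], reference_set: Set[Any]) -> bool:
    # distinctValuesEqualSetConstraint: exactly the expected values, no more and no fewer
    return set(values) == reference_set


def values_contain_set(values: Iterable[Any], reference_set: Set[Any]) -> bool:
    # distinctValuesContainSetConstraint: every expected value appears at least once
    return set(values) >= reference_set


feature = ["red", "green", "red", "blue"]
print(values_in_set(feature, {"red", "green", "blue", "yellow"}))  # True
print(values_equal_set(feature, {"red", "green", "blue", "yellow"}))  # False
print(values_contain_set(feature, {"red", "blue"}))  # True
```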

whylogs.core.statistics.constraints.columnValuesInSetConstraint(value_set: Set[Any], name=None, verbose=False)

Defines a value constraint with set operations on the values of a single feature. The values of the feature should all be in the set of user-supplied values, specified in value_set. Useful for categorical features, for checking whether the values in a feature belong to a predefined set.

Parameters
  • value_set (Set[Any] (required)) – Represents the set of expected values for a feature. The provided values can be of any type. Each value in the feature is checked against the constraint. The total number of failures equals the number of values not in the provided set value_set.

  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

ValueConstraint - a value constraint specifying a constraint on the values of a feature to be drawn from a predefined set of values.
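Unlike distinctValuesInSetConstraint, this value constraint is applied to every value, so a repeated out-of-set value counts once per occurrence. The sketch below illustrates that failure count. The helper name is hypothetical and not part of whylogs.

```python
from typing import Any, Iterable, Set


def count_values_not_in_set(values: Iterable[Any], value_set: Set[Any]) -> int:
    """Apply the membership check to every value and count the failures."""
    failures = 0
    for value in values:
        if value not in value_set:
            failures += 1
    return failures


# "XL" appears twice and is outside the expected sizes, so it fails twice.
print(count_values_not_in_set(["S", "M", "XL", "L", "XL"], {"S", "M", "L"}))  # 2
```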

whylogs.core.statistics.constraints.containsEmailConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with email regex matching operations on the values of a single feature. The constraint defines a default email regex pattern, but a user-defined pattern can be supplied to override it. Useful for checking the validity of features with values representing email addresses.

Parameters
  • regex_pattern (str (optional)) – User-defined email regex pattern. If supplied, will override the default email regex pattern provided by whylogs.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for email regex matching of the values of a single feature
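The regex-based constraints (email, credit card, SSN, URL) share one idea: each value is checked against a default pattern, and a user-supplied regex_pattern replaces that default. The following is a hedged sketch of the override behaviour. The pattern below is a hypothetical, simplified email pattern and is not the one whylogs ships. The sketch also uses full-string matching, whereas this section does not state how whylogs anchors its patterns.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical simplified pattern for illustration only, NOT the whylogs default.
SIMPLE_EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"


def regex_failures(values: Iterable[str], regex_pattern: Optional[str] = None) -> List[str]:
    """Return the values that do not fully match the active pattern.

    A user-supplied regex_pattern takes precedence over the default pattern.
    """
    pattern = re.compile(regex_pattern if regex_pattern is not None else SIMPLE_EMAIL_PATTERN)
    return [v for v in values if pattern.fullmatch(v) is None]


print(regex_failures(["ana@example.com", "not-an-email", "bob@mail.example.org"]))
# ['not-an-email']
```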

whylogs.core.statistics.constraints.containsCreditCardConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with credit card number regex matching operations on the values of a single feature. The constraint defines a default credit card number regex pattern, but a user-defined pattern can be supplied to override it. Useful for checking the validity of features with values representing credit card numbers.

Parameters
  • regex_pattern (str (optional)) – User-defined credit card number regex pattern. If supplied, will override the default credit card number regex pattern provided by whylogs.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for credit card number regex matching of the values of a single feature

whylogs.core.statistics.constraints.dateUtilParseableConstraint(name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature can be parsed by the dateutil parser. Useful for checking if the date time values of a feature are compatible with dateutil.

Parameters
  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for checking if a feature’s values are dateutil parseable

whylogs.core.statistics.constraints.jsonParseableConstraint(name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature are JSON parseable. Useful for checking whether the values of a feature are valid JSON strings that can be deserialized.

Parameters
  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for checking if a feature’s values are JSON parseable
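A minimal sketch of the per-value check follows, using the standard-library `json` module. It treats any valid JSON document as parseable. The private helper listed in the module summary returns `Optional[dict]`, so whylogs may be stricter about non-object documents. The helper name below is hypothetical. It returns a boolean rather than the parsed value, so that the valid JSON literal `null` is not mistaken for a failure.

```python
import json
from typing import Any


def is_json_parseable(value: Any) -> bool:
    """True if value is a string (or bytes) holding a valid JSON document."""
    try:
        json.loads(value)
    except (TypeError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return False
    return True


print([is_json_parseable(v) for v in ['{"a": 1}', "[1, 2, 3]", "{'a': 1}", "null"]])
# [True, True, False, True]
```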

whylogs.core.statistics.constraints.matchesJsonSchemaConstraint(json_schema, name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature match a user-provided JSON schema. Useful for checking whether the values of a feature conform to a predefined JSON schema.

Parameters
  • json_schema (Union[str, dict] (required)) – A string or dictionary of key-value pairs representing the expected JSON schema.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for checking if a feature’s values match a user-provided JSON schema

whylogs.core.statistics.constraints.strftimeFormatConstraint(format, name=None, verbose=False)

Defines a value constraint which checks if the values of a single feature can be parsed with the given strftime format.

Parameters
  • format (str (required)) – A string representing the expected strftime format for parsing the values.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for checking if a feature’s values are strftime parseable
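A value satisfies this constraint when it can be parsed with the given format directives. The sketch below uses the standard library's `datetime.strptime`, which accepts the same directives as strftime. The helper name is hypothetical, and its `fmt` argument plays the role of the `format` parameter above.

```python
from datetime import datetime
from typing import Optional


def try_parse_strftime(value: str, fmt: str) -> Optional[datetime]:
    """Parse value with a strftime-style format; return None if it does not fit."""
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        return None


print(try_parse_strftime("2021-12-31", "%Y-%m-%d"))  # 2021-12-31 00:00:00
print(try_parse_strftime("31/12/2021", "%Y-%m-%d"))  # None
```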

whylogs.core.statistics.constraints.containsSSNConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with social security number (SSN) matching operations on the values of a single feature. The constraint defines a default SSN regex pattern, but a user-defined pattern can be supplied to override it. Useful for checking the validity of features with values representing SSNs.

Parameters
  • regex_pattern (str (optional)) – User-defined SSN regex pattern. If supplied, will override the default SSN regex pattern provided by whylogs.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for SSN regex matching of the values of a single feature

whylogs.core.statistics.constraints.containsURLConstraint(regex_pattern: str = None, name=None, verbose=False)

Defines a value constraint with URL regex matching operations on the values of a single feature. The constraint defines a default URL regex pattern, but a user-defined pattern can be supplied to override it. Useful for checking the validity of features with values representing URL addresses.

Parameters
  • regex_pattern (str (optional)) – User-defined URL regex pattern. If supplied, will override the default URL regex pattern provided by whylogs.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for URL regex matching of the values of a single feature

whylogs.core.statistics.constraints.stringLengthEqualConstraint(length: int, name=None, verbose=False)

Defines a value constraint which checks if the string values of a single feature have a predefined length.

Parameters
  • length (int (required)) – A numeric value which represents the expected length of the string values in the specified feature.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

ValueConstraint - a value constraint for checking if a feature’s string values have a predefined length

whylogs.core.statistics.constraints.stringLengthBetweenConstraint(lower_value: int, upper_value: int, name=None, verbose=False)

Defines a value constraint which checks if the length of each string value of a single feature falls within a predefined interval.

Parameters
  • lower_value (int (required)) – A numeric value which represents the expected lower bound of the length of the string values in the specified feature.

  • upper_value (int (required)) – A numeric value which represents the expected upper bound of the length of the string values in the specified feature.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool (optional)) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

ValueConstraint - a value constraint for checking if the length of a feature’s string values is in a predefined interval
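Both string-length constraints check each value separately. The following is a minimal sketch that collects the failing values. It assumes that both bounds of stringLengthBetweenConstraint are inclusive, which this section does not state. The helper names are hypothetical.

```python
from typing import Iterable, List


def length_equal_failures(values: Iterable[str], length: int) -> List[str]:
    # stringLengthEqualConstraint: values whose length differs from `length`
    return [v for v in values if len(v) != length]


def length_between_failures(values: Iterable[str], lower_value: int, upper_value: int) -> List[str]:
    # stringLengthBetweenConstraint, assuming both bounds are inclusive
    return [v for v in values if not lower_value <= len(v) <= upper_value]


codes = ["AB12", "XY9", "LONGCODE"]
print(length_equal_failures(codes, 4))  # ['XY9', 'LONGCODE']
print(length_between_failures(codes, 3, 5))  # ['LONGCODE']
```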

whylogs.core.statistics.constraints.quantileBetweenConstraint(quantile_value: Union[int, float], lower_value: Union[int, float], upper_value: Union[int, float], name=None, verbose: bool = False)

Defines a summary constraint on the n-th quantile value of a numeric feature. The n-th quantile can be defined to be between two values. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • quantile_value (numeric (required)) – The n-th quantile for which the constraint will be evaluated

  • lower_value (numeric (required)) – Represents the lower value limit of the interval for the n-th quantile.

  • upper_value (numeric (required)) – Represents the upper value limit of the interval for the n-th quantile.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a closed interval of valid values for the n-th quantile value of a specific feature
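The closed-interval check can be sketched with an exact nearest-rank quantile. This sketch assumes quantile_value is a fraction between 0 and 1 (0.5 for the median). whylogs works from profile summaries rather than the raw values, so its quantile estimate need not match this exact computation. The helper names are hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Exact nearest-rank quantile for 0 <= q <= 1."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def quantile_between(values: Sequence[float], quantile_value: float,
                     lower_value: float, upper_value: float) -> bool:
    # Closed interval: both limit points are accepted.
    return lower_value <= nearest_rank_quantile(values, quantile_value) <= upper_value


latencies = [12, 15, 11, 40, 13, 14, 90, 16, 12, 15]
print(nearest_rank_quantile(latencies, 0.5))  # 14
print(quantile_between(latencies, 0.9, 0, 50))  # True
```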

whylogs.core.statistics.constraints.columnUniqueValueCountBetweenConstraint(lower_value: int, upper_value: int, name=None, verbose: bool = False)

Defines a summary constraint on the cardinality of a specific feature. The cardinality can be defined to be between two values. The defined interval is a closed interval, which includes both of its limit points. Useful for checking the unique count of values for discrete features.

Parameters
  • lower_value (numeric (required)) – Represents the lower value limit of the interval for the feature cardinality.

  • upper_value (numeric (required)) – Represents the upper value limit of the interval for the feature cardinality.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a closed interval for the valid cardinality of a specific feature

whylogs.core.statistics.constraints.columnUniqueValueProportionBetweenConstraint(lower_fraction: float, upper_fraction: float, name=None, verbose: bool = False)

Defines a summary constraint on the proportion of unique values of a specific feature. The proportion of unique values can be defined to be between two values. The defined interval is a closed interval, which includes both of its limit points. Useful for checking the frequency of unique values for discrete features.

Parameters
  • lower_fraction (fraction between 0 and 1 (required)) – Represents the lower fraction limit of the interval for the feature unique value proportion.

  • upper_fraction (fraction between 0 and 1 (required)) – Represents the upper fraction limit of the interval for the feature unique value proportion.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a closed interval for the valid proportion of unique values of a specific feature
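The cardinality and unique-proportion constraints both check a closed interval. The following is a minimal sketch using exact counts. It assumes that the proportion is the number of distinct values divided by the total number of values. whylogs estimates cardinality from a sketch, so its numbers are approximate. The helper names are hypothetical.

```python
from typing import Hashable, Sequence


def unique_count_between(values: Sequence[Hashable], lower_value: int, upper_value: int) -> bool:
    # Exact cardinality of the feature, checked against a closed interval.
    return lower_value <= len(set(values)) <= upper_value


def unique_proportion_between(values: Sequence[Hashable],
                              lower_fraction: float, upper_fraction: float) -> bool:
    # Assumption: proportion = number of distinct values / number of values.
    if not values:
        raise ValueError("values must be non-empty")
    proportion = len(set(values)) / len(values)
    return lower_fraction <= proportion <= upper_fraction


countries = ["US", "DE", "US", "FR", "US"]  # 3 distinct values out of 5
print(unique_count_between(countries, 2, 5))  # True
print(unique_proportion_between(countries, 0.0, 0.5))  # False, since 3 / 5 = 0.6
```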

whylogs.core.statistics.constraints.columnExistsConstraint(column: str, name=None, verbose=False)

Defines a constraint on the data set schema. Checks if the user-supplied column, identified by column, is present in the data set schema.

Parameters
  • column (str (required)) – Represents the name of the column to be checked for existence in the data set.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint which checks the existence of a column in the current data set.

whylogs.core.statistics.constraints.numberOfRowsConstraint(n_rows: int, name=None, verbose=False)

Defines a constraint on the data set schema. Checks if the number of rows in the data set equals the user-supplied number of rows.

Parameters
  • n_rows (int (required)) – Represents the user-supplied expected number of rows.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint which checks the number of rows in the data set

whylogs.core.statistics.constraints.columnsMatchSetConstraint(reference_set: Set[str], name=None, verbose=False)

Defines a constraint on the data set schema. Checks if the set of columns in the data set is equal to the user-supplied set of expected columns.

Parameters
  • reference_set (Set[str] (required)) – Represents the expected columns in the current data set.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint which checks if the column set of the current data set matches the expected column set

whylogs.core.statistics.constraints.columnMostCommonValueInSetConstraint(value_set: Set[Any], name=None, verbose=False)

Defines a summary constraint on the most common value of a feature. The most common value of the feature should be in the set of user-supplied values, value_set. Useful for categorical features, for checking if the most common value of a feature belongs in an expected set of common categories.

Parameters
  • value_set (Set[Any] (required)) – Represents the set of expected values for a feature. The provided values can be of any type. If the most common value of the feature is not in the values of the user-specified value_set, the constraint will fail.

  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a constraint on the most common value of a feature to belong to a set of user-specified expected values
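The following is a minimal sketch of the most-common-value check using `collections.Counter`. `Counter.most_common` breaks ties by first-seen order. This section does not state how whylogs breaks ties, and treating an empty feature as a failure is a hypothetical choice. The helper name is also hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def most_common_value_in_set(values: Iterable[Hashable], value_set: Set[Hashable]) -> bool:
    counts = Counter(values)
    if not counts:
        return False  # hypothetical choice: an empty feature fails the check
    most_common, _ = counts.most_common(1)[0]
    return most_common in value_set


print(most_common_value_in_set(["card", "cash", "card", "card", "voucher"], {"card", "cash"}))  # True
print(most_common_value_in_set(["voucher", "voucher", "card"], {"card", "cash"}))  # False
```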

whylogs.core.statistics.constraints.columnValuesNotNullConstraint(name=None, verbose=False)

Defines a non-null summary constraint on the value of a feature. Useful for features for which there is no tolerance for missing values. The constraint will fail if there is at least one missing value in the specified feature.

Parameters
  • name (str) – The name of the constraint.

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining that no missing values are allowed for the specified feature

whylogs.core.statistics.constraints.missingValuesProportionBetweenConstraint(lower_fraction: float, upper_fraction: float, name: str = None, verbose: bool = False)

Defines a summary constraint on the proportion of missing values of a specific feature. The proportion of missing values can be defined to be between two frequency values. The defined interval is a closed interval, which includes both of its limit points. Useful for checking features with expected amounts of missing values.

Parameters
  • lower_fraction (fraction between 0 and 1 (required)) – Represents the lower fraction limit of the interval for the feature missing value proportion.

  • upper_fraction (fraction between 0 and 1 (required)) – Represents the upper fraction limit of the interval for the feature missing value proportion.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint defining a closed interval for the valid proportion of missing values of a specific feature
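The non-null constraint and the missing-proportion constraint both depend on what counts as missing. This sketch assumes that `None` and float NaN are missing, which is an assumption rather than whylogs' documented definition. The helper names are hypothetical.

```python
import math
from typing import Any, Sequence


def is_missing(value: Any) -> bool:
    # Assumption for this sketch: None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def values_not_null(values: Sequence[Any]) -> bool:
    # columnValuesNotNullConstraint: a single missing value fails the check
    return not any(is_missing(v) for v in values)


def missing_proportion(values: Sequence[Any]) -> float:
    if not values:
        raise ValueError("values must be non-empty")
    return sum(is_missing(v) for v in values) / len(values)


def missing_proportion_between(values: Sequence[Any], lower_fraction: float, upper_fraction: float) -> bool:
    # Closed interval, as in missingValuesProportionBetweenConstraint
    return lower_fraction <= missing_proportion(values) <= upper_fraction


ages = [34, None, 27, float("nan"), 51]
print(values_not_null(ages))  # False
print(missing_proportion(ages))  # 0.4
print(missing_proportion_between(ages, 0.0, 0.5))  # True
```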

whylogs.core.statistics.constraints.columnValuesTypeEqualsConstraint(expected_type: Union[whylogs.proto.InferredType, int], name=None, verbose: bool = False)

Defines a summary constraint on the type of the feature values. The type of values should be equal to the user-provided expected type.

Parameters
  • expected_type (Union[InferredType, int]) – A whylogs.proto.InferredType.Type value, the enumeration of allowed inferred data types. If supplied as an integer value, it should be one of: UNKNOWN = 0, NULL = 1, FRACTIONAL = 2, INTEGRAL = 3, BOOLEAN = 4, STRING = 5.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining that the feature values type should be equal to a user-provided expected type

whylogs.core.statistics.constraints.columnValuesTypeInSetConstraint(type_set: Set[int], name=None, verbose: bool = False)

Defines a summary constraint on the type of the feature values. The type of values should be in the set of user-provided expected types.

Parameters
  • type_set (Set[int]) – A set of whylogs.proto.InferredType.Type values, the enumeration of allowed inferred data types. If supplied as integer values, each should be one of: UNKNOWN = 0, NULL = 1, FRACTIONAL = 2, INTEGRAL = 3, BOOLEAN = 4, STRING = 5.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

SummaryConstraint - a summary constraint defining that the feature values type should be in the set of user-provided expected types

whylogs.core.statistics.constraints.approximateEntropyBetweenConstraint(lower_value: Union[int, float], upper_value: float, name=None, verbose=False)

Defines a summary constraint specifying the expected interval of the feature’s estimated entropy. The defined interval is a closed interval, which includes both of its limit points.

Parameters
  • lower_value (numeric (required)) – Represents the lower value limit of the interval for the feature’s estimated entropy.

  • upper_value (numeric (required)) – Represents the upper value limit of the interval for the feature’s estimated entropy.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

  • SummaryConstraint - a summary constraint defining the interval of valid values

  • of the feature’s estimated entropy
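For intuition about sensible bounds, the sketch below computes the exact Shannon entropy of a feature's empirical distribution. Measuring entropy in bits (base 2) is an assumption: this section does not state the base whylogs uses, and whylogs works from an estimate rather than the raw values. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Entropy, in bits (assumed base 2), of the empirical distribution of the values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_between(values: Iterable[Hashable], lower_value: float, upper_value: float) -> bool:
    # Closed interval, as in the description above.
    return lower_value <= shannon_entropy(values) <= upper_value


print(shannon_entropy(["a", "b", "a", "b"]))  # 1.0
print(entropy_between(["a", "b", "c", "d"], 1.5, 2.5))  # True
```

A feature with k equally frequent categories has entropy log2(k), so bounds around that value flag features that have become more concentrated or more spread out.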

whylogs.core.statistics.constraints.parametrizedKSTestPValueGreaterThanConstraint(reference_distribution: Union[List[float], numpy.ndarray], p_value=0.05, name=None, verbose=False)

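Defines a summary constraint on the p-value of the Kolmogorov-Smirnov (KS) test comparing the feature’s distribution with reference_distribution. The constraint expects the test’s p-value to be greater than p_value, so p_value acts as the upper limit below which the null hypothesis (both distributions are the same) is rejected. Can be used only for continuous data.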

Parameters
  • reference_distribution (Array-like) – Represents the reference distribution for calculating the KS test p-value of the column. Should be an array-like object with floating-point numbers; only numeric distributions are accepted.

  • p_value (float) – The reference p-value to compare with the p-value of the test. Should be between 0 and 1, inclusive.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint specifying the upper limit of the KS test p-value for rejecting the null hypothesis
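To make the test concrete, the sketch below computes the two-sample KS statistic from raw samples, using the asymptotic p-value approximation given in Numerical Recipes. This is a swapped-in textbook method, not necessarily the computation whylogs performs: whylogs evaluates summary constraints against profile statistics rather than raw values, so its p-value can differ. The helper names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    # The supremum is attained at an observed point, so checking those points suffices.
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def ks_p_value(d: float, n1: int, n2: int) -> float:
    """Asymptotic p-value from the Kolmogorov distribution (Numerical Recipes approximation)."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam == 0:
        return 1.0
    total, prev_abs, sign = 0.0, 0.0, 1.0
    for k in range(1, 101):
        term = 2.0 * sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) <= 1e-3 * prev_abs or abs(term) <= 1e-8 * total:
            return total
        sign, prev_abs = -sign, abs(term)
    return 1.0  # series did not converge: lam is tiny, so the p-value is essentially 1


def ks_p_value_greater_than(sample: Sequence[float], reference: Sequence[float],
                            p_value: float = 0.05) -> bool:
    d = ks_statistic(sample, reference)
    return ks_p_value(d, len(sample), len(reference)) > p_value


reference = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ks_p_value_greater_than([1.5, 2.5, 3.5, 4.5, 5.5], reference))  # True
print(ks_p_value_greater_than([10.0, 11.0, 12.0, 13.0, 14.0], reference))  # False
```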

whylogs.core.statistics.constraints.columnKLDivergenceLessThanConstraint(reference_distribution: Union[List[Any], numpy.ndarray], threshold: float = 0.5, name=None, verbose: bool = False)

Defines a summary constraint specifying the upper threshold for the KL divergence of the specified feature relative to the reference distribution.

Parameters
  • reference_distribution (Array-like) – Represents the reference distribution for calculating the KL divergence of the column. Should be an array-like object containing either floating-point numbers, or integers, strings, and booleans, but not both. Both numeric and categorical distributions are accepted.

  • threshold (float) – The threshold value for the KL divergence; if the divergence exceeds it, the constraint fails.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint specifying the upper threshold of the feature’s KL divergence
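For a categorical feature, the KL divergence compares category frequencies. The sketch below computes D_KL(target || reference) in nats over raw samples. Several details are assumptions that this section does not specify: the direction of the divergence, the natural-log base, the small `eps` substituted for categories missing from the reference, and the strict comparison (which follows the function's name). The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def kl_divergence(target: Sequence[Hashable], reference: Sequence[Hashable],
                  eps: float = 1e-10) -> float:
    """D_KL(target || reference) in nats over the categories seen in the target."""
    t_counts, r_counts = Counter(target), Counter(reference)
    divergence = 0.0
    for category, count in t_counts.items():
        p = count / len(target)
        # Hypothetical smoothing: a category absent from the reference gets probability eps.
        q = r_counts[category] / len(reference) or eps
        divergence += p * math.log(p / q)
    return divergence


def kl_less_than(target: Sequence[Hashable], reference: Sequence[Hashable],
                 threshold: float = 0.5) -> bool:
    return kl_divergence(target, reference) < threshold


reference = ["a"] * 50 + ["b"] * 50
print(kl_less_than(["a"] * 90 + ["b"] * 10, reference))  # True  (about 0.368)
print(kl_less_than(["a"] * 10, reference))  # False (ln 2, about 0.693)
```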

whylogs.core.statistics.constraints.columnChiSquaredTestPValueGreaterThanConstraint(reference_distribution: Union[List[Any], numpy.ndarray, Mapping[str, int]], p_value: float = 0.05, name=None, verbose: bool = False)

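Defines a summary constraint on the p-value of the Chi-Squared test comparing the feature’s distribution with reference_distribution. The constraint expects the test’s p-value to be greater than p_value, so p_value acts as the upper limit below which the null hypothesis (both distributions are the same) is rejected. Can be used only for discrete data.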

Parameters
  • reference_distribution (Array-like) – Represents the reference distribution for calculating the Chi-Squared test. Should be an array-like object with integer, string, or boolean values, or a mapping where the keys are the items and the values are the per-item counts. Only categorical distributions are accepted.

  • p_value (float) – The reference p-value to compare with the p-value of the test. Should be between 0 and 1, inclusive.

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

SummaryConstraint - a summary constraint specifying the upper limit of the Chi-Squared test p-value for rejecting the null hypothesis
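The sketch below shows the textbook Pearson chi-squared goodness-of-fit test on raw categorical samples. It computes the upper-tail p-value with a regularized incomplete gamma series. It accepts the reference either as a list of items or as an item-to-count mapping, mirroring the parameter description. whylogs works from profile summaries, so its numbers may differ. Rejecting observed categories that are absent from the reference is a hypothetical choice, and all helper names are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Mapping
from typing import Hashable, Iterable, Tuple, Union


def chi_squared_statistic(observed: Iterable[Hashable],
                          reference: Union[Iterable[Hashable], Mapping]) -> Tuple[float, int]:
    """Pearson statistic and degrees of freedom against the reference frequencies."""
    raw = reference if isinstance(reference, Mapping) else Counter(reference)
    ref_counts = {k: v for k, v in raw.items() if v > 0}
    obs_counts = Counter(observed)
    if set(obs_counts) - set(ref_counts):
        # Hypothetical choice: categories unseen in the reference are rejected.
        raise ValueError("observed categories missing from the reference")
    n_obs, n_ref = sum(obs_counts.values()), sum(ref_counts.values())
    stat = 0.0
    for category, ref_count in ref_counts.items():
        expected = ref_count * n_obs / n_ref  # reference frequencies scaled to the observed total
        stat += (obs_counts[category] - expected) ** 2 / expected
    return stat, len(ref_counts) - 1


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of chi-squared(df): 1 - P(df/2, x/2), via the series for P."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= z / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_squared_p_value_greater_than(observed, reference, p_value: float = 0.05) -> bool:
    stat, df = chi_squared_statistic(observed, reference)
    return chi2_sf(stat, df) > p_value


reference = {"a": 50, "b": 50}
print(chi_squared_p_value_greater_than(["a"] * 50 + ["b"] * 50, reference))  # True
print(chi_squared_p_value_greater_than(["a"] * 90 + ["b"] * 10, reference))  # False
```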

whylogs.core.statistics.constraints.columnValuesAGreaterThanBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is greater than the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be greater than the corresponding values of column B

whylogs.core.statistics.constraints.columnValuesAGreaterThanEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is greater than or equal to the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be greater than or equal to the corresponding values of column B

whylogs.core.statistics.constraints.columnValuesALessThanBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is less than the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be less than the corresponding values of column B

whylogs.core.statistics.constraints.columnValuesALessThanEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is less than or equal to the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be less than or equal to the corresponding values of column B

whylogs.core.statistics.constraints.columnValuesAEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is equal to the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be equal to the corresponding values of column B

whylogs.core.statistics.constraints.columnValuesANotEqualBConstraint(column_A: str, column_B: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that each value in column A, specified in column_A, is different from the corresponding value of column B, specified in column_B in the same row.

Parameters
  • column_A (str) – The name of column A

  • column_B (str) – The name of column B

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Returns

MultiColumnValueConstraint - multi-column value constraint specifying that values from column A should always be different from the corresponding values of column B
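The six column A versus column B constraints above differ only in the row-wise comparison they apply. The sketch below maps each factory name to the matching function from the standard `operator` module. It then reports the rows that fail, with rows represented as plain dicts. The row representation and helper names are hypothetical, not how whylogs stores data.

```python
import operator
from typing import Any, Callable, Dict, List, Mapping

# Comparison applied by each factory function documented above.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "columnValuesAGreaterThanBConstraint": operator.gt,
    "columnValuesAGreaterThanEqualBConstraint": operator.ge,
    "columnValuesALessThanBConstraint": operator.lt,
    "columnValuesALessThanEqualBConstraint": operator.le,
    "columnValuesAEqualBConstraint": operator.eq,
    "columnValuesANotEqualBConstraint": operator.ne,
}


def failing_rows(rows: List[Mapping[str, Any]], column_A: str, column_B: str,
                 compare: Callable[[Any, Any], bool]) -> List[int]:
    """Indices of the rows where compare(row[column_A], row[column_B]) is False."""
    return [i for i, row in enumerate(rows) if not compare(row[column_A], row[column_B])]


orders = [
    {"delivered_day": 5, "ordered_day": 3},
    {"delivered_day": 2, "ordered_day": 2},
    {"delivered_day": 1, "ordered_day": 4},
]
check = COMPARISONS["columnValuesAGreaterThanEqualBConstraint"]
print(failing_rows(orders, "delivered_day", "ordered_day", check))  # [2]
```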

whylogs.core.statistics.constraints.sumOfRowValuesOfMultipleColumnsEqualsConstraint(columns: Union[List[str], Set[str], numpy.array], value: Union[float, int, str], name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the sum of the values in each row of the provided columns, specified in columns, should be equal to a user-defined value, specified in value, or to the corresponding value of another column, whose name is given in the value parameter.

Parameters
  • columns (List[str]) – List of columns for which the sum of row values should equal the provided value

  • value (Union[float, int, str]) – Numeric value to compare with the sum of the column row values, or a string indicating a column name for which the row value will be compared with the sum

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

MultiColumnValueConstraint - specifying the expected value of the sum of the values in multiple columns
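The `value` parameter has two modes: a number is a fixed target, and a string names another column whose value in the same row is the target. The following minimal sketch uses rows represented as dicts and exact equality. A float-valued feature would likely need a tolerance in practice. The row representation and helper name are hypothetical.

```python
from typing import Any, List, Mapping, Sequence, Union


def sum_equals_failures(rows: List[Mapping[str, Any]], columns: Sequence[str],
                        value: Union[float, int, str]) -> List[int]:
    """Indices of the rows whose column sum differs from the target."""
    failures = []
    for i, row in enumerate(rows):
        total = sum(row[c] for c in columns)
        # A string `value` names another column; a number is a fixed target.
        target = row[value] if isinstance(value, str) else value
        if total != target:
            failures.append(i)
    return failures


rows = [
    {"net": 1, "tax": 2, "gross": 3},
    {"net": 2, "tax": 2, "gross": 5},
]
print(sum_equals_failures(rows, ["net", "tax"], "gross"))  # [1]
print(sum_equals_failures(rows, ["net", "tax"], 4))  # [0]
```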

whylogs.core.statistics.constraints.columnPairValuesInSetConstraint(column_A: str, column_B: str, value_set: Set[Tuple[Any, Any]], name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the pair of values of columns A and B should be in a user-predefined set of expected pairs of values.

Parameters
  • column_A (str) – The name of the first column

  • column_B (str) – The name of the second column

  • value_set (Set[Tuple[Any, Any]]) – A set of expected pairs of values for the columns A and B, in that order

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

MultiColumnValueConstraint - specifying the expected set of value pairs of two columns in the data set
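Each row contributes the tuple (value of column A, value of column B), and that tuple must be one of the expected pairs. Order matters, as the parameter description notes. The following minimal sketch uses rows represented as dicts, and the helper name is hypothetical.

```python
from typing import Any, List, Mapping, Set, Tuple


def pairs_not_in_set(rows: List[Mapping[str, Any]], column_A: str, column_B: str,
                     value_set: Set[Tuple[Any, Any]]) -> List[int]:
    """Indices of the rows whose (column_A, column_B) pair is not expected."""
    return [i for i, row in enumerate(rows) if (row[column_A], row[column_B]) not in value_set]


rows = [
    {"country": "DE", "currency": "EUR"},
    {"country": "US", "currency": "USD"},
    {"country": "US", "currency": "EUR"},
]
allowed = {("DE", "EUR"), ("FR", "EUR"), ("US", "USD")}
print(pairs_not_in_set(rows, "country", "currency", allowed))  # [2]
```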

whylogs.core.statistics.constraints.columnValuesUniqueWithinRow(column_A: str, name=None, verbose: bool = False)

Defines a multi-column value constraint which specifies that the values of column A should be unique within each row of the data set.

Parameters
  • column_A (str) – The name of the column for which it is expected that the values are unique within each row

  • name (str) – Name of the constraint used for reporting

  • verbose (bool) – If true, log every application of this constraint that fails. Useful to identify specific streaming values that fail the constraint.

Return type

MultiColumnValueConstraint - specifying that the provided column’s values are unique within each row