whylogs.core.constraints.factories

Submodules

Package Contents

Functions

emit_usage(→ None)

distinct_number_in_range(...)

Number of distinct categories must be between lower and upper values (inclusive).

condition_count_below(...)

condition_meets(...)

Checks that all values in the column match the predicate.

condition_never_meets(...)

Checks that no values in the column match the predicate.

count_below_number(...)

The number of elements in the column must be below the given number.

no_missing_values(...)

Checks that there are no missing values in the column.

null_percentage_below_number(...)

The percentage of null values must be below the given number.

null_values_below_number(...)

The number of null values must be below the given number.

greater_than_number(...)

The minimum value of the given column must be above the defined number.

is_in_range(→ whylogs.core.constraints.MetricConstraint)

Checks that all of column's values are in defined range (inclusive).

is_non_negative(...)

Checks that a column is non-negative.

mean_between_range(...)

The estimated mean must be within the range defined by the lower and upper bounds.

quantile_between_range(...)

The q-th quantile value must be within the range defined by the lower and upper bounds.

smaller_than_number(...)

The maximum value of the given column must be below the defined number.

stddev_between_range(column_name, lower, upper[, ...])

The estimated standard deviation must be within the range defined by the lower and upper bounds.

frequent_strings_in_reference_set(...)

Checks that every frequent string of a string column appears in a reference set.

n_most_common_items_in_set(...)

Validates that the top n most common items of a column appear in a reference set.

column_is_probably_unique(...)

column_has_non_zero_types(...)

column_has_zero_count_types(...)

column_is_nullable_boolean(...)

column_is_nullable_datatype(...)

Checks that the column contains only records of a specific datatype.

column_is_nullable_fractional(...)

column_is_nullable_integral(...)

column_is_nullable_object(...)

column_is_nullable_string(...)

Attributes

ALL

whylogs.core.constraints.factories.emit_usage(event: str) → None
Parameters

event (str) –

Return type

None

whylogs.core.constraints.factories.distinct_number_in_range(column_name: str, lower: Union[int, float], upper: Union[int, float]) → whylogs.core.constraints.metric_constraints.MetricConstraint

Number of distinct categories must be between lower and upper values (inclusive).

Parameters
  • column_name (str) – Column the constraint is applied to

  • lower (Union[int, float]) – Lower bound of the defined range

  • upper (Union[int, float]) – Upper bound of the defined range

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
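
The check can be pictured on raw values with the plain-Python sketch below. It is a hypothetical model, not whylogs code: `distinct_in_range` counts distinct values exactly, whereas the constraint reads the distinct count from a dataset profile, where it may be an estimate.

```python
from typing import Hashable, Iterable, Union

Number = Union[int, float]


def distinct_in_range(values: Iterable[Hashable], lower: Number, upper: Number) -> bool:
    """Hypothetical model: pass when lower <= number of distinct values <= upper."""
    distinct_count = len(set(values))  # exact count over in-memory values
    return lower <= distinct_count <= upper


colors = ["red", "blue", "red", "green"]  # three distinct categories
print(distinct_in_range(colors, lower=2, upper=3))   # True
print(distinct_in_range(colors, lower=4, upper=10))  # False
```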

whylogs.core.constraints.factories.condition_count_below(column_name: str, condition_name: str, max_count: int) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters
  • column_name (str) – Column the constraint is applied to

  • condition_name (str) – Name of the condition that will be applied to each value of the column

  • max_count (int) –

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
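
The reference gives no description for `condition_count_below`. Judging from its name and parameters, it appears to count the values that match the named condition and compare that count with `max_count`. The sketch below models that reading in plain Python; the predicate dictionary, the helper name, and the inclusive `<=` comparison are all assumptions to check against the whylogs source.

```python
from typing import Any, Callable, Dict, Iterable

Predicate = Callable[[Any], bool]


def condition_count_below_sketch(
    values: Iterable[Any],
    conditions: Dict[str, Predicate],
    condition_name: str,
    max_count: int,
) -> bool:
    """Hypothetical model: count matches of a named predicate and compare with max_count."""
    if condition_name not in conditions:
        raise KeyError(f"unknown condition: {condition_name}")
    predicate = conditions[condition_name]
    match_count = sum(1 for value in values if predicate(value))
    return match_count <= max_count  # inclusive comparison is an assumption


conditions = {"is_negative": lambda x: x < 0}
readings = [3, -1, 5, -2]  # two negative values
print(condition_count_below_sketch(readings, conditions, "is_negative", max_count=2))  # True
print(condition_count_below_sketch(readings, conditions, "is_negative", max_count=1))  # False
```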

whylogs.core.constraints.factories.condition_meets(column_name: str, condition_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint

Checks that all values in the column match the predicate.

Parameters
  • column_name (str) – Name of the column to apply the constraint to

  • condition_name (str) – Name of the condition that will be applied to each value of the column

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.condition_never_meets(column_name: str, condition_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint

Checks that no values in the column match the predicate.

Parameters
  • column_name (str) – Name of the column to apply the constraint to

  • condition_name (str) – Name of the condition that will be applied to each value of the column

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
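
The two constraints are complements over the same named condition: `condition_meets` requires every value to match, while `condition_never_meets` requires that none do. The plain-Python sketch below passes a predicate directly instead of referring to a condition by name; `meets` and `never_meets` are hypothetical helpers, not whylogs functions.

```python
from typing import Any, Callable, Iterable


def meets(values: Iterable[Any], predicate: Callable[[Any], bool]) -> bool:
    """Model of condition_meets: every value satisfies the predicate."""
    return all(predicate(value) for value in values)


def never_meets(values: Iterable[Any], predicate: Callable[[Any], bool]) -> bool:
    """Model of condition_never_meets: no value satisfies the predicate."""
    return not any(predicate(value) for value in values)


def contains_at(text: str) -> bool:
    return "@" in text


print(meets(["a@example.com", "b@example.org"], contains_at))  # True
print(never_meets(["a@example.com", "plain"], contains_at))    # False
```

In this sketch both helpers return True for an empty column, because `all` and `any` treat an empty iterable as vacuously satisfied.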

whylogs.core.constraints.factories.count_below_number(column_name: str, number: int) → whylogs.core.constraints.metric_constraints.MetricConstraint

The number of elements in the column must be below the given number.

Parameters
  • column_name (str) – Column the constraint is applied to

  • number (int) – Reference value for applying the constraint

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.no_missing_values(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint

Checks that there are no missing values in the column.

Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.null_percentage_below_number(column_name: str, number: float) → whylogs.core.constraints.metric_constraints.MetricConstraint

The percentage of null values must be below the given number.

Parameters
  • column_name (str) – Column the constraint is applied to

  • number (float) – Reference value for applying the constraint

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.null_values_below_number(column_name: str, number: int) → whylogs.core.constraints.metric_constraints.MetricConstraint

The number of null values must be below the given number.

Parameters
  • column_name (str) – Column the constraint is applied to

  • number (int) – Reference value for applying the constraint

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
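
`count_below_number`, `no_missing_values`, `null_values_below_number`, and `null_percentage_below_number` all compare simple counts. The hypothetical helpers below model them on an in-memory list, with `None` standing in for a null. Two details are assumptions rather than documented behavior: "below" is read as a strict `<`, and `number` for the percentage check is treated as a fraction (0.3 meaning 30%).

```python
from typing import Any, Optional, Sequence

Column = Sequence[Optional[Any]]


def null_count(values: Column) -> int:
    return sum(1 for value in values if value is None)


def count_below(values: Column, number: int) -> bool:  # count_below_number
    return len(values) < number


def no_missing(values: Column) -> bool:  # no_missing_values
    return null_count(values) == 0


def nulls_below(values: Column, number: int) -> bool:  # null_values_below_number
    return null_count(values) < number


def null_fraction_below(values: Column, number: float) -> bool:  # null_percentage_below_number
    if not values:
        return True  # an empty column has no nulls; this choice is an assumption
    return null_count(values) / len(values) < number


column = [4, None, 7, 1]  # 4 elements, 1 null (25%)
print(count_below(column, 5))            # True
print(no_missing(column))                # False
print(nulls_below(column, 2))            # True
print(null_fraction_below(column, 0.3))  # True
print(null_fraction_below(column, 0.2))  # False
```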

whylogs.core.constraints.factories.greater_than_number(column_name: str, number: Union[float, int], skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

The minimum value of the given column must be above the defined number.

Parameters
  • column_name (str) – Column the constraint is applied to

  • number (Union[float, int]) – Reference value for applying the constraint

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
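
The `skip_missing` rule is easiest to see with the column minimum passed in explicitly. In this hypothetical sketch, `None` plays the role of a missing distribution metric (for example, on an empty dataset), and "above" is read as a strict `>`; both are assumptions for illustration.

```python
from typing import Optional, Union

Number = Union[int, float]


def min_greater_than(column_min: Optional[Number], number: Number, skip_missing: bool = True) -> bool:
    """Hypothetical model of greater_than_number on a precomputed minimum."""
    if column_min is None:  # no distribution metric to check
        return skip_missing
    return column_min > number


values = [3, 8, 5]
print(min_greater_than(min(values), 2))               # True
print(min_greater_than(None, 2))                      # True: missing metric is skipped
print(min_greater_than(None, 2, skip_missing=False))  # False: missing metric fails
```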

whylogs.core.constraints.factories.is_in_range(column_name: str, lower: Union[float, int], upper: Union[float, int], skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

Checks that all of the column’s values are in the defined range (inclusive).

For the constraint to pass, the column’s minimum value must be greater than or equal to lower, and its maximum value must be less than or equal to upper.

Parameters
  • column_name (str) – Column the constraint is applied to

  • lower (Union[float, int]) – Lower bound of the defined range

  • upper (Union[float, int]) – Upper bound of the defined range

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
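
Because only the minimum and maximum are compared, the check reduces to two inclusive inequalities. The plain-Python sketch below uses the hypothetical name `in_range` and treats an empty column as the "missing metrics" case, which is an assumption.

```python
from typing import Sequence, Union

Number = Union[int, float]


def in_range(values: Sequence[Number], lower: Number, upper: Number, skip_missing: bool = True) -> bool:
    """Hypothetical model of is_in_range: lower <= min and max <= upper."""
    if not values:  # nothing to summarize, so the distribution metrics are missing
        return skip_missing
    return lower <= min(values) and max(values) <= upper


print(in_range([1, 5, 10], lower=1, upper=10))              # True: both bounds are inclusive
print(in_range([0, 5], lower=1, upper=10))                  # False: minimum is below lower
print(in_range([], lower=1, upper=10, skip_missing=False))  # False
```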

whylogs.core.constraints.factories.is_non_negative(column_name: str, skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

Checks that a column is non-negative.

Parameters
  • column_name (str) – Column the constraint is applied to

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
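
Non-negative means zero is allowed, so the natural reading is "the column minimum is at least 0", which differs from a strict "greater than 0" check. A hypothetical plain-Python sketch of that reading:

```python
from typing import Sequence, Union

Number = Union[int, float]


def non_negative(values: Sequence[Number], skip_missing: bool = True) -> bool:
    """Hypothetical model of is_non_negative: the column minimum is >= 0."""
    if not values:  # an empty column stands in for missing metrics
        return skip_missing
    return min(values) >= 0


print(non_negative([0, 3, 12]))  # True: zero counts as non-negative
print(non_negative([-1, 3]))     # False
```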

whylogs.core.constraints.factories.mean_between_range(column_name: str, lower: float, upper: float, skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

The estimated mean must be within the range defined by the lower and upper bounds.

Parameters
  • column_name (str) – Column the constraint is applied to

  • lower (float) – Lower bound of the defined range

  • upper (float) – Upper bound of the defined range

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
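
A plain-Python sketch of the same comparison on raw values, using `statistics.mean` where whylogs works from an estimated mean. Treating the bounds as inclusive is an assumption, and `mean_in_range` is a hypothetical name.

```python
import statistics
from typing import Sequence, Union

Number = Union[int, float]


def mean_in_range(values: Sequence[Number], lower: float, upper: float, skip_missing: bool = True) -> bool:
    """Hypothetical model of mean_between_range with inclusive bounds."""
    if not values:
        return skip_missing
    return lower <= statistics.mean(values) <= upper


print(mean_in_range([2, 4, 9], lower=4.0, upper=6.0))  # True: the mean is 5
print(mean_in_range([2, 4, 9], lower=5.5, upper=6.0))  # False
```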

whylogs.core.constraints.factories.quantile_between_range(column_name: str, quantile: float, lower: float, upper: float, skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

The q-th quantile value must be within the range defined by the lower and upper bounds.

Parameters
  • column_name (str) – Column the constraint is applied to

  • quantile (float) – Quantile to evaluate. For example, the median corresponds to quantile=0.5

  • lower (float) – Lower bound of the defined range

  • upper (float) – Upper bound of the defined range

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
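
whylogs derives quantiles from a data sketch, so the values it compares are approximate. To show what the constraint checks, the hypothetical helpers below substitute an exact nearest-rank quantile over in-memory values; the choice of method and the inclusive bounds are assumptions.

```python
import math
from typing import Sequence, Union

Number = Union[int, float]


def nearest_rank_quantile(values: Sequence[Number], quantile: float) -> Number:
    """Exact nearest-rank quantile; a stand-in for the sketch-based estimate."""
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be in [0, 1]")
    ordered = sorted(values)
    rank = max(1, math.ceil(quantile * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def quantile_in_range(
    values: Sequence[Number], quantile: float, lower: float, upper: float, skip_missing: bool = True
) -> bool:
    """Hypothetical model of quantile_between_range with inclusive bounds."""
    if not values:
        return skip_missing
    return lower <= nearest_rank_quantile(values, quantile) <= upper


values = [7, 1, 3, 9, 5]
print(quantile_in_range(values, 0.5, lower=4, upper=6))  # True: the median is 5
print(quantile_in_range(values, 0.9, lower=0, upper=8))  # False: the 0.9 quantile is 9
```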

whylogs.core.constraints.factories.smaller_than_number(column_name: str, number: float, skip_missing: bool = True) → whylogs.core.constraints.MetricConstraint

The maximum value of the given column must be below the defined number.

Parameters
  • column_name (str) – Column the constraint is applied to

  • number (float) – Reference value for applying the constraint

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.

Return type

whylogs.core.constraints.MetricConstraint
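
This check is the mirror image of `greater_than_number`: the column maximum is compared against `number`. A hypothetical plain-Python sketch, reading "below" as a strict `<`:

```python
from typing import Sequence, Union

Number = Union[int, float]


def max_smaller_than(values: Sequence[Number], number: float, skip_missing: bool = True) -> bool:
    """Hypothetical model of smaller_than_number: max(values) < number."""
    if not values:
        return skip_missing
    return max(values) < number


print(max_smaller_than([1, 4, 7], number=10))  # True
print(max_smaller_than([1, 4, 7], number=7))   # False under the strict reading
```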

whylogs.core.constraints.factories.stddev_between_range(column_name: str, lower: float, upper: float, skip_missing: bool = True)

The estimated standard deviation must be within the range defined by the lower and upper bounds.

Parameters
  • column_name (str) – Column the constraint is applied to

  • lower (float) – Lower bound of the defined range

  • upper (float) – Upper bound of the defined range

  • skip_missing (bool) – If skip_missing is True, missing distribution metrics will make the check pass. If False, the check will fail on missing metrics, such as on an empty dataset.
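
The sketch below computes the standard deviation exactly with `statistics.stdev`. Whether whylogs uses the sample (n - 1) or the population (n) formula is not stated here, so using the sample formula is an assumption. Treating columns with fewer than two values as missing metrics is also an assumption.

```python
import statistics
from typing import Sequence, Union

Number = Union[int, float]


def stddev_in_range(values: Sequence[Number], lower: float, upper: float, skip_missing: bool = True) -> bool:
    """Hypothetical model of stddev_between_range using the sample standard deviation."""
    if len(values) < 2:  # sample stddev is undefined, so treat the metric as missing
        return skip_missing
    return lower <= statistics.stdev(values) <= upper


data = [2, 4, 4, 4, 5, 5, 7, 9]  # sample stddev is about 2.14
print(stddev_in_range(data, lower=1.5, upper=2.5))  # True
print(stddev_in_range(data, lower=2.2, upper=3.0))  # False
```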

whylogs.core.constraints.factories.frequent_strings_in_reference_set(column_name: str, reference_set: dict) → whylogs.core.constraints.metric_constraints.MetricConstraint

Checks the frequent strings of a string column against a reference set: every item among the frequent strings must be in the defined reference set.

Parameters
  • column_name (str) – Column the constraint is applied to.

  • reference_set (dict) – Reference set for applying the constraint

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.n_most_common_items_in_set(column_name: str, n: int, reference_set: dict) → whylogs.core.constraints.metric_constraints.MetricConstraint

Validates that the top n most common items of a column appear in the reference set.

Parameters
  • column_name (str) – Column the constraint is applied to.

  • n (int) – Number of most common items (or strings) to check.

  • reference_set (dict) – Reference set for applying the constraint

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
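
The two reference-set checks differ only in how many frequent items they examine: `frequent_strings_in_reference_set` checks every frequent string, while `n_most_common_items_in_set` checks only the top `n`. In the hypothetical sketch below, `collections.Counter` stands in for the frequent-strings sketch and the reference set is a Python `set`. The signatures above type `reference_set` as `dict`, but only membership tests are needed here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def most_common(values: Iterable[str], n: Optional[int] = None) -> List[str]:
    """Items ordered by frequency; n=None returns every distinct item."""
    return [item for item, _ in Counter(values).most_common(n)]


def frequent_strings_in_set(values: Iterable[str], reference_set: Set[str]) -> bool:
    return all(item in reference_set for item in most_common(values))


def top_n_in_set(values: Iterable[str], n: int, reference_set: Set[str]) -> bool:
    return all(item in reference_set for item in most_common(values, n))


pets = ["cat", "dog", "cat", "dog", "cat", "bird"]  # cat: 3, dog: 2, bird: 1
allowed = {"cat", "dog"}
print(frequent_strings_in_set(pets, allowed))  # False: "bird" is not allowed
print(top_n_in_set(pets, 2, allowed))          # True: only "cat" and "dog" are checked
```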

whylogs.core.constraints.factories.column_is_probably_unique(column_name: str, hll_stddev: int = 3) → whylogs.core.constraints.MetricConstraint
Parameters
  • column_name (str) – Column the constraint is applied to

  • hll_stddev (int) –

Return type

whylogs.core.constraints.MetricConstraint
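
The reference does not describe this check. Its name and the `hll_stddev` parameter suggest it compares the column's value count with a HyperLogLog distinct-count estimate, allowing `hll_stddev` standard deviations of estimation error; that reading is an assumption. The exact, in-memory property it would approximate, that every non-null value is distinct, can be sketched as follows (ignoring nulls is also an assumption):

```python
from typing import Hashable, Optional, Sequence


def is_unique(values: Sequence[Optional[Hashable]]) -> bool:
    """Hypothetical exact model: every non-null value appears once."""
    non_null = [value for value in values if value is not None]
    return len(set(non_null)) == len(non_null)


print(is_unique(["id-1", "id-2", None, "id-3"]))  # True
print(is_unique(["id-1", "id-2", "id-1"]))        # False
```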

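whylogs.core.constraints.factories.column_has_non_zero_types(column_name: str, types_list: List[str]) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters
  • column_name (str) – Column the constraint is applied to

  • types_list (List[str]) – Names of the types to check, for example integral, fractional, boolean, string, object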

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

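whylogs.core.constraints.factories.column_has_zero_count_types(column_name: str, types_list: List[str]) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters
  • column_name (str) – Column the constraint is applied to

  • types_list (List[str]) – Names of the types to check, for example integral, fractional, boolean, string, object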

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.column_is_nullable_boolean(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.column_is_nullable_datatype(column_name: str, datatype: str) → whylogs.core.constraints.metric_constraints.MetricConstraint

Checks that the column contains only records of a specific datatype. Datatypes can be: integral, fractional, boolean, string, object.

The constraint passes if there is at least one record of type datatype and there are no records of any other type.

Parameters
  • column_name (str) – Column the constraint is applied to

  • datatype (str) – Datatype to check for: one of integral, fractional, boolean, string, object

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint
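
The type-based constraints (`column_has_non_zero_types`, `column_has_zero_count_types`, and `column_is_nullable_datatype`) all work from per-type counts. The sketch below builds those counts from raw Python values using the five datatype names listed above. The classification rules, the reading of `types_list` as "every listed type", and the helper names are assumptions, not whylogs behavior. The `column_is_nullable_*` functions that follow appear to fix `datatype` to a single value.

```python
from collections import Counter
from typing import Any, Iterable, List


def type_bucket(value: Any) -> str:
    """Map a Python value to one of the datatype names above (assumed rules)."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integral"
    if isinstance(value, float):
        return "fractional"
    if isinstance(value, str):
        return "string"
    return "object"


def type_counts(values: Iterable[Any]) -> Counter:
    return Counter(type_bucket(v) for v in values if v is not None)  # nulls are not typed


def has_non_zero_types(values: Iterable[Any], types_list: List[str]) -> bool:
    counts = type_counts(values)
    return all(counts[t] > 0 for t in types_list)


def has_zero_count_types(values: Iterable[Any], types_list: List[str]) -> bool:
    counts = type_counts(values)
    return all(counts[t] == 0 for t in types_list)


def is_nullable_datatype(values: Iterable[Any], datatype: str) -> bool:
    counts = type_counts(values)
    others = sum(c for t, c in counts.items() if t != datatype)
    return counts[datatype] > 0 and others == 0


mixed = [1, 2.5, "x", None]
print(has_non_zero_types(mixed, ["integral", "string"]))   # True
print(has_zero_count_types(mixed, ["boolean", "object"]))  # True
print(is_nullable_datatype([None, 1, 2], "integral"))      # True
print(is_nullable_datatype(["a", 1], "string"))            # False
```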

whylogs.core.constraints.factories.column_is_nullable_fractional(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.column_is_nullable_integral(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.column_is_nullable_object(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.column_is_nullable_string(column_name: str) → whylogs.core.constraints.metric_constraints.MetricConstraint
Parameters

column_name (str) – Column the constraint is applied to

Return type

whylogs.core.constraints.metric_constraints.MetricConstraint

whylogs.core.constraints.factories.ALL