`whylogs.features.autosegmentation`

Module Contents

Functions

`_entropy`(series: pandas.Series, normalized: bool = True) → numpy.float64	Entropy calculation. If normalized, use log cardinality.
`_weighted_entropy`(df: pandas.DataFrame, split_columns: List[Optional[str]], target_column_name: str, normalized: bool = True)	Weighted entropy calculation. If normalized, use log cardinality.
`_information_gain_ratio`(df: pandas.DataFrame, prev_split_columns: List[Optional[str]], column_name: str, target_column_name: str, normalized: bool = True)	Information gain ratio calculation. If normalized, use log cardinality.
`_find_best_split`(df: pandas.DataFrame, prev_split_columns: List[str], valid_column_names: List[str], target_column_name: str)
`_estimate_segments`(df: pandas.DataFrame, target_field: str = None, max_segments: int = 30) → Optional[Union[List[Dict], List[str]]]	Estimates the most important features and values on which to segment

whylogs.features.autosegmentation._entropy(series: pandas.Series, normalized: bool = True) → numpy.float64: Entropy calculation. If normalized, use log cardinality.
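A minimal sketch of what a normalized entropy calculation can look like, assuming that "use log cardinality" means dividing the Shannon entropy by the log of the number of distinct values. The name `entropy_sketch` is hypothetical, and this is not necessarily how whylogs implements `_entropy`.

```python
import numpy as np
import pandas as pd


def entropy_sketch(series: pd.Series, normalized: bool = True) -> np.float64:
    """Shannon entropy (natural log) of the value distribution in `series`."""
    # Relative frequency of each distinct value; NaN is counted as its own value.
    probs = series.value_counts(normalize=True, dropna=False).to_numpy()
    # Guard against zero-frequency categories (possible with categorical dtypes).
    probs = probs[probs > 0]
    entropy = -np.sum(probs * np.log(probs))
    if normalized and len(probs) > 1:
        # Assumption: normalize by log(cardinality), the largest entropy
        # attainable with this many distinct values, so the result is in [0, 1].
        entropy = entropy / np.log(len(probs))
    return np.float64(entropy)


uniform = entropy_sketch(pd.Series(["a", "b", "c", "d"]))   # evenly spread values
constant = entropy_sketch(pd.Series(["a", "a", "a"]))       # a single value
skewed = entropy_sketch(pd.Series(["a", "a", "a", "b"]))    # uneven distribution
```

Under this assumption an evenly distributed column scores close to 1, a constant column scores 0, and skewed columns fall in between.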

whylogs.features.autosegmentation._weighted_entropy(df: pandas.DataFrame, split_columns: List[Optional[str]], target_column_name: str, normalized: bool = True): Weighted entropy calculation. If normalized, use log cardinality.
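The following sketch illustrates one common reading of weighted entropy: the entropy of the target column within each group defined by the split columns, averaged with weights proportional to group size. The helper names are hypothetical, and the handling of `None` entries in `split_columns` (they are simply skipped) is an assumption rather than documented whylogs behavior.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def entropy_sketch(series: pd.Series, normalized: bool = True) -> float:
    """Shannon entropy of `series`, optionally divided by log(cardinality)."""
    probs = series.value_counts(normalize=True, dropna=False).to_numpy()
    probs = probs[probs > 0]
    entropy = float(-np.sum(probs * np.log(probs)))
    if normalized and len(probs) > 1:
        entropy /= float(np.log(len(probs)))
    return entropy


def weighted_entropy_sketch(
    df: pd.DataFrame,
    split_columns: List[Optional[str]],
    target_column_name: str,
    normalized: bool = True,
) -> float:
    """Size-weighted average of the target's entropy within each split group."""
    # Assumption: None entries mean "no split column" and are ignored.
    columns = [c for c in split_columns if c is not None]
    if not columns:
        return entropy_sketch(df[target_column_name], normalized)
    total = len(df)
    result = 0.0
    for _, group in df.groupby(columns, dropna=False):
        weight = len(group) / total
        result += weight * entropy_sketch(group[target_column_name], normalized)
    return result


df = pd.DataFrame(
    {
        "region": ["eu", "eu", "us", "us"],
        "device": ["m", "d", "m", "d"],
        "label": ["x", "x", "y", "y"],
    }
)
pure = weighted_entropy_sketch(df, ["region"], "label")           # region determines label
uninformative = weighted_entropy_sketch(df, ["device"], "label")  # device says nothing
```

A split that fully determines the target drives the weighted entropy to 0, while a split unrelated to the target leaves it unchanged.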

whylogs.features.autosegmentation._information_gain_ratio(df: pandas.DataFrame, prev_split_columns: List[Optional[str]], column_name: str, target_column_name: str, normalized: bool = True): Information gain ratio calculation. If normalized, use log cardinality.
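As an illustration, the sketch below uses the textbook definition of information gain ratio: the drop in the target's conditional entropy when a candidate column is added to the previous splits, divided by the entropy of the candidate column itself. All function names here are hypothetical, the sketch omits the `normalized` option, and whylogs may compute the ratio differently.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def shannon_entropy(series: pd.Series) -> float:
    """Unnormalized Shannon entropy (natural log) of a column."""
    probs = series.value_counts(normalize=True, dropna=False).to_numpy()
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log(probs)))


def conditional_entropy(
    df: pd.DataFrame, split_columns: List[Optional[str]], target: str
) -> float:
    """Entropy of `target` averaged over groups, weighted by group size."""
    columns = [c for c in split_columns if c is not None]
    if not columns:
        return shannon_entropy(df[target])
    groups = df.groupby(columns, dropna=False)
    return sum(len(g) / len(df) * shannon_entropy(g[target]) for _, g in groups)


def gain_ratio_sketch(
    df: pd.DataFrame,
    prev_split_columns: List[Optional[str]],
    column_name: str,
    target_column_name: str,
) -> float:
    """Information gain from adding `column_name`, divided by its own entropy."""
    before = conditional_entropy(df, prev_split_columns, target_column_name)
    after = conditional_entropy(
        df, list(prev_split_columns) + [column_name], target_column_name
    )
    split_info = shannon_entropy(df[column_name])
    if split_info == 0:
        # A constant column cannot split the data, so it carries no gain.
        return 0.0
    return (before - after) / split_info


df = pd.DataFrame(
    {
        "region": ["eu", "eu", "us", "us"],
        "device": ["m", "d", "m", "d"],
        "label": ["x", "x", "y", "y"],
    }
)
region_ratio = gain_ratio_sketch(df, [], "region", "label")
device_ratio = gain_ratio_sketch(df, [], "device", "label")
```

Dividing by the candidate column's entropy penalizes high-cardinality columns, which would otherwise look attractive simply because they fragment the data into many small groups.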

whylogs.features.autosegmentation._find_best_split(df: pandas.DataFrame, prev_split_columns: List[str], valid_column_names: List[str], target_column_name: str)

whylogs.features.autosegmentation._estimate_segments(df: pandas.DataFrame, target_field: str = None, max_segments: int = 30) → Optional[Union[List[Dict], List[str]]]

Estimates the most important features and values on which to segment data profiling using entropy-based methods.

If no target column is provided, the column with the maximum entropy is used as the target.

Parameters

df – the dataframe of data to profile
target_field – target field (optional)
max_segments – upper threshold for total combinations of segments, default 30

Returns

a list of segmentation feature names
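The two ideas in this description, falling back to the maximum-entropy column as the target and capping the total number of segment combinations, can be sketched as follows. This is an illustration under stated assumptions, not the whylogs implementation: the helper names are hypothetical, plain (unnormalized) Shannon entropy is assumed for the fallback, and "combinations of segments" is assumed to mean the number of distinct value tuples across the chosen columns.

```python
from typing import List

import numpy as np
import pandas as pd


def shannon_entropy(series: pd.Series) -> float:
    """Unnormalized Shannon entropy (natural log) of a column."""
    probs = series.value_counts(normalize=True, dropna=False).to_numpy()
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log(probs)))


def pick_fallback_target(df: pd.DataFrame) -> str:
    """Return the name of the column with the highest entropy."""
    entropies = df.apply(shannon_entropy)  # one entropy value per column
    return str(entropies.idxmax())


def count_segment_combinations(df: pd.DataFrame, columns: List[str]) -> int:
    """Number of distinct value combinations across the given columns."""
    return len(df[columns].drop_duplicates())


df = pd.DataFrame(
    {
        "region": ["eu", "eu", "eu", "us", "us", "us"],
        "plan": ["free", "free", "pro", "free", "free", "free"],
        "tier": ["a", "a", "b", "b", "c", "c"],
    }
)

target = pick_fallback_target(df)
n_combinations = count_segment_combinations(df, ["region", "plan"])
max_segments = 30  # the documented default
within_budget = n_combinations <= max_segments
```

Note that under this reading a unique identifier column would always have the highest entropy, so in practice it may be worth passing `target_field` explicitly.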
