whylogs.features.autosegmentation
Module Contents
Functions
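| _entropy(series, normalized=True) | Entropy calculation. If normalized, use log cardinality. |
| _weighted_entropy(df, split_columns, target_column_name, normalized=True) | Weighted entropy of the target column across the segments defined by the split columns. |
| _information_gain_ratio(df, prev_split_columns, column_name, target_column_name, normalized=True) | Information gain ratio of splitting on an additional column, measured against the target column. |
| _find_best_split(df, prev_split_columns, valid_column_names, target_column_name) | |
| _estimate_segments(df, target_field=None, max_segments=30) | Estimates the most important features and values on which to segment data profiling using entropy-based methods. |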
- whylogs.features.autosegmentation._entropy(series: pandas.Series, normalized: bool = True) → numpy.float64
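Entropy of the distribution of values in series. If normalized, the entropy is divided by the log of the cardinality (the number of distinct values), which scales it to the range [0, 1].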
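The following is a minimal sketch, not the whylogs source, of what normalizing by log cardinality means. It assumes natural-log Shannon entropy over value frequencies and treats a single-valued series as having entropy 0; the function name `entropy` is hypothetical.

```python
import numpy as np
import pandas as pd


def entropy(series: pd.Series, normalized: bool = True) -> float:
    """Shannon entropy (natural log) of the value frequencies in `series`.

    Hypothetical illustration; not the whylogs implementation.
    """
    # Relative frequency of each distinct value (NaN is excluded by value_counts).
    probs = series.value_counts(normalize=True).to_numpy()
    h = float(-np.sum(probs * np.log(probs)))
    if normalized:
        k = len(probs)
        # Dividing by log(k) maps the result into [0, 1]. A constant series
        # (k == 1) has zero entropy, so avoid dividing by log(1) == 0.
        return h / np.log(k) if k > 1 else 0.0
    return h
```

For a balanced two-value series the normalized entropy is 1.0, and any constant series gives 0.0. Because of the division by the log cardinality, columns with different numbers of distinct values can be compared on the same [0, 1] scale.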
- whylogs.features.autosegmentation._weighted_entropy(df: pandas.DataFrame, split_columns: List[Optional[str]], target_column_name: str, normalized: bool = True)
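Weighted entropy of target_column_name across the segments defined by split_columns, with each segment weighted by its share of rows. If normalized, use log cardinality.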
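A hedged sketch of a size-weighted (conditional) entropy, assuming that each combination of split-column values forms one segment and that segments are weighted by their share of rows. `entropy` and `weighted_entropy` are hypothetical helpers, and treating `None` entries as "no split" is an assumption suggested only by the `List[Optional[str]]` type.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def entropy(series: pd.Series, normalized: bool = True) -> float:
    """Shannon entropy (natural log) of the value frequencies in `series`."""
    probs = series.value_counts(normalize=True).to_numpy()
    h = float(-np.sum(probs * np.log(probs)))
    if normalized:
        # Divide by log(cardinality); a single-valued series has entropy 0.
        return h / np.log(len(probs)) if len(probs) > 1 else 0.0
    return h


def weighted_entropy(
    df: pd.DataFrame,
    split_columns: List[Optional[str]],
    target_column_name: str,
    normalized: bool = True,
) -> float:
    """Entropy of the target inside each segment, weighted by segment size."""
    # Assumed convention: None entries mean "no split" and are ignored.
    columns = [c for c in split_columns if c is not None]
    if not columns:
        return entropy(df[target_column_name], normalized)
    total = len(df)
    result = 0.0
    # Each group is one segment, i.e. one combination of split-column values;
    # dropna=False keeps rows with missing split values as their own segment.
    for _, segment in df.groupby(columns, dropna=False):
        result += len(segment) / total * entropy(segment[target_column_name], normalized)
    return result
```

A split column that fully determines the target yields a weighted entropy of 0, while a split column unrelated to the target leaves it unchanged.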
- whylogs.features.autosegmentation._information_gain_ratio(df: pandas.DataFrame, prev_split_columns: List[Optional[str]], column_name: str, target_column_name: str, normalized: bool = True)
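Information gain ratio of additionally splitting on column_name, given the existing split on prev_split_columns, measured against target_column_name. If normalized, use log cardinality.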
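One way to compute an information gain ratio, assuming the standard C4.5-style definition: the drop in weighted target entropy from adding `column_name` to the existing split, divided by the entropy of `column_name` itself. The helpers below are hypothetical re-implementations, repeated so the block runs on its own; they are not the whylogs code.

```python
from typing import List

import numpy as np
import pandas as pd


def entropy(series: pd.Series, normalized: bool = True) -> float:
    probs = series.value_counts(normalize=True).to_numpy()
    h = float(-np.sum(probs * np.log(probs)))
    if normalized:
        return h / np.log(len(probs)) if len(probs) > 1 else 0.0
    return h


def weighted_entropy(df: pd.DataFrame, split_columns: List[str], target: str, normalized: bool = True) -> float:
    # Size-weighted target entropy over the segments formed by split_columns.
    if not split_columns:
        return entropy(df[target], normalized)
    groups = df.groupby(split_columns, dropna=False)
    return sum(len(g) / len(df) * entropy(g[target], normalized) for _, g in groups)


def information_gain_ratio(
    df: pd.DataFrame,
    prev_split_columns: List[str],
    column_name: str,
    target_column_name: str,
    normalized: bool = True,
) -> float:
    before = weighted_entropy(df, prev_split_columns, target_column_name, normalized)
    # Build a new list instead of appending, so the caller's list is not mutated.
    after = weighted_entropy(df, [*prev_split_columns, column_name], target_column_name, normalized)
    gain = before - after
    # Split information: the entropy of the candidate column itself.
    split_info = entropy(df[column_name], normalized)
    return gain / split_info if split_info > 0 else 0.0
```

Dividing by the candidate's own entropy penalizes columns with many distinct values, which would otherwise look attractive simply because they fragment the data into tiny, pure segments.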
- whylogs.features.autosegmentation._find_best_split(df: pandas.DataFrame, prev_split_columns: List[str], valid_column_names: List[str], target_column_name: str)
- whylogs.features.autosegmentation._estimate_segments(df: pandas.DataFrame, target_field: str = None, max_segments: int = 30) → Optional[Union[List[Dict], List[str]]]
Estimates the most important features and values on which to segment data profiling using entropy-based methods.
If no target column is provided, the column with maximum entropy is substituted.
- Parameters
df – the dataframe of data to profile
target_field – name of the target column (optional; if omitted, the maximum-entropy column is used)
max_segments – upper bound on the total number of segment combinations, default 30
- Returns
a list of segmentation feature names
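The greedy, entropy-based procedure described above can be sketched as follows. This is an illustrative reconstruction, not the whylogs implementation: the candidate-column filter, the stopping rule (no positive gain ratio, or more than `max_segments` observed value combinations), and all function names are assumptions.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def entropy(series: pd.Series, normalized: bool = True) -> float:
    probs = series.value_counts(normalize=True).to_numpy()
    h = float(-np.sum(probs * np.log(probs)))
    if normalized:
        return h / np.log(len(probs)) if len(probs) > 1 else 0.0
    return h


def weighted_entropy(df, split_columns, target, normalized=True) -> float:
    if not split_columns:
        return entropy(df[target], normalized)
    groups = df.groupby(split_columns, dropna=False)
    return sum(len(g) / len(df) * entropy(g[target], normalized) for _, g in groups)


def information_gain_ratio(df, prev, column, target, normalized=True) -> float:
    gain = weighted_entropy(df, prev, target, normalized) - weighted_entropy(
        df, [*prev, column], target, normalized
    )
    split_info = entropy(df[column], normalized)
    return gain / split_info if split_info > 0 else 0.0


def find_best_split(df, prev_split_columns, valid_column_names, target) -> Optional[str]:
    """Hypothetical: the candidate with the highest positive gain ratio, or None."""
    scores = {
        c: information_gain_ratio(df, prev_split_columns, c, target)
        for c in valid_column_names
        if c not in prev_split_columns
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Treat gain ratios within floating-point noise of zero as "no gain".
    return best if scores[best] > 1e-12 else None


def estimate_segments(
    df: pd.DataFrame, target_field: Optional[str] = None, max_segments: int = 30
) -> List[str]:
    # Assumed filter: only non-constant columns with at most max_segments values.
    candidates = [c for c in df.columns if 1 < df[c].nunique() <= max_segments]
    if not candidates:
        return []
    if target_field is None:
        # No target given: substitute the maximum-entropy candidate column.
        target_field = max(candidates, key=lambda c: entropy(df[c]))
    valid = [c for c in candidates if c != target_field]
    chosen: List[str] = []
    while valid:
        best = find_best_split(df, chosen, valid, target_field)
        if best is None:
            break
        # Stop once adding the best column would exceed max_segments combinations.
        if df.groupby([*chosen, best], dropna=False).ngroups > max_segments:
            break
        chosen.append(best)
        valid.remove(best)
    return chosen
```

For example, with `region = ["n", "n", "s", "s"] * 5`, `noise = ["p", "q", "p", "q"] * 5` and `churn = [0, 0, 1, 1] * 5`, calling `estimate_segments(df, target_field="churn", max_segments=10)` returns `["region"]`: `region` fully determines `churn`, and `noise` adds no information once `region` is chosen, so the loop stops.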