whylogs.core.flatten_datasetprofile

Module Contents

Functions

flatten_summary(dataset_summary: whylogs.proto.DatasetSummary) → dict

Flatten a DatasetSummary

_quantile_strings(quantiles: list)

flatten_dataset_quantiles(dataset_summary: whylogs.proto.DatasetSummary)

Flatten quantiles from a dataset summary

flatten_dataset_string_quantiles(dataset_summary: whylogs.proto.DatasetSummary)

Flatten string quantiles from a dataset summary

flatten_dataset_histograms(dataset_summary: whylogs.proto.DatasetSummary)

Flatten histograms from a dataset summary

flatten_dataset_frequent_strings(dataset_summary: whylogs.proto.DatasetSummary)

Flatten frequent strings summaries from a dataset summary

get_dataset_frame(dataset_summary: whylogs.proto.DatasetSummary, mapping: dict = None)

Get a dataframe from scalar values flattened from a dataset summary

Attributes

TYPENUM_COLUMN_NAMES

SCALAR_NAME_MAPPING

whylogs.core.flatten_datasetprofile.TYPENUM_COLUMN_NAMES
whylogs.core.flatten_datasetprofile.SCALAR_NAME_MAPPING
whylogs.core.flatten_datasetprofile.flatten_summary(dataset_summary: whylogs.proto.DatasetSummary) → dict

Flatten a DatasetSummary

Parameters

dataset_summary (DatasetSummary) – Summary to flatten

Returns

data

A dictionary with the following keys:

summary (pandas.DataFrame)

Per-column summary statistics.

hist (pandas.Series)

Series keyed by column name. Each value is that column's histogram, itself a pandas.Series.

frequent_strings (pandas.Series)

Series keyed by column name. Each value is that column's frequent string counts, itself a pandas.Series.

Return type

dict

Notes

To inspect the default mapping used to name the summary's scalar values:

>>> from whylogs.core.flatten_datasetprofile import SCALAR_NAME_MAPPING
>>> import json
>>> print(json.dumps(SCALAR_NAME_MAPPING, indent=2))
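
The shape of the returned dictionary can be easier to see with a small stand-in. The sketch below does not call whylogs. It builds a dictionary by hand with the three documented keys and then reads from it the way a caller might read the result of flatten_summary. All column names, index values, and counts are made up for illustration, and the exact layout of the real histogram and frequent-string Series is an assumption.

```python
import pandas as pd

# Hand-built stand-in for the dict described above; every value is invented.
flat = {
    # One row per column, one field per scalar statistic (assumed layout).
    "summary": pd.DataFrame({"column": ["age", "name"], "count": [3, 3]}),
    # Outer Series keyed by column name; each element is itself a Series.
    "hist": pd.Series(
        [pd.Series([1, 2], index=[0.0, 50.0])],
        index=["age"],
        dtype=object,
    ),
    "frequent_strings": pd.Series(
        [pd.Series({"alice": 2, "bob": 1})],
        index=["name"],
        dtype=object,
    ),
}

summary_df = flat["summary"]                     # pandas.DataFrame
age_hist = flat["hist"]["age"]                   # pandas.Series for one column
name_counts = flat["frequent_strings"]["name"]   # pandas.Series for one column
top_name = name_counts.idxmax()                  # most frequent string
```

Indexing the outer Series by column name yields the inner Series for that column, so the usual pandas methods apply to each histogram or count table directly.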
whylogs.core.flatten_datasetprofile._quantile_strings(quantiles: list)
whylogs.core.flatten_datasetprofile.flatten_dataset_quantiles(dataset_summary: whylogs.proto.DatasetSummary)

Flatten quantiles from a dataset summary

whylogs.core.flatten_datasetprofile.flatten_dataset_string_quantiles(dataset_summary: whylogs.proto.DatasetSummary)

Flatten string quantiles from a dataset summary

whylogs.core.flatten_datasetprofile.flatten_dataset_histograms(dataset_summary: whylogs.proto.DatasetSummary)

Flatten histograms from a dataset summary

whylogs.core.flatten_datasetprofile.flatten_dataset_frequent_strings(dataset_summary: whylogs.proto.DatasetSummary)

Flatten frequent strings summaries from a dataset summary

whylogs.core.flatten_datasetprofile.get_dataset_frame(dataset_summary: whylogs.proto.DatasetSummary, mapping: dict = None)

Get a dataframe from scalar values flattened from a dataset summary

Parameters
  • dataset_summary (DatasetSummary) – The dataset summary.

  • mapping (dict, optional) – Override the default variable mapping.

Returns

summary – Scalar values, flattened and renamed according to mapping

Return type

pandas.DataFrame
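
As a rough illustration of what "flattened and renamed according to mapping" can mean, the sketch below walks a nested mapping and pulls the matching scalars out of a nested summary, using the mapping's leaf strings as the output names. It is not the whylogs implementation. The helper flatten_scalars, the nested-dict form of the mapping, and all sample values are hypothetical; print SCALAR_NAME_MAPPING to see the real default structure.

```python
def flatten_scalars(summary, mapping):
    """Collect scalars from `summary`, named by the leaves of `mapping`.

    Hypothetical helper: assumes both arguments are nested dicts and that
    a string leaf in `mapping` is the output name for the matching value.
    """
    flat = {}
    for key, target in mapping.items():
        if key not in summary:
            continue  # nothing recorded for this field
        value = summary[key]
        if isinstance(target, dict):
            # Descend in parallel through the summary and the mapping.
            flat.update(flatten_scalars(value, target))
        else:
            flat[target] = value
    return flat


# Invented per-column summary and mappings, for illustration only.
example_summary = {
    "counters": {"count": 3, "null_count": {"value": 0}},
    "number_summary": {"min": 1.0, "max": 9.0},
}
example_mapping = {
    "counters": {"count": "count", "null_count": {"value": "null_count"}},
    "number_summary": {"min": "min", "max": "max"},
}
override_mapping = {"counters": {"count": "n_rows"}}

default_row = flatten_scalars(example_summary, example_mapping)
custom_row = flatten_scalars(example_summary, override_mapping)
```

Under these assumptions, an override mapping both selects which scalars appear and decides what they are called. One such flat row per column could then be assembled into a pandas.DataFrame.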