whylogs.experimental.core.udf_schema#

Module Contents#

Classes#

UdfSpec

Defines UDFs to apply to matching input columns.

UdfSchema

Subclass of DeclarativeSchema that runs the UDFs specified in udf_specs to create new columns before resolving metrics.

Functions#

register_multioutput_udf(→ Callable[[Any], Any])

Decorator to easily configure UDFs for your data set.

register_dataset_udf(→ Callable[[Any], Any])

Decorator to easily configure UDFs for your data set.

unregister_udf(→ None)

register_type_udf(→ Callable[[Any], Any])

Decorator to easily configure UDFs for your data set.

generate_udf_specs(→ List[UdfSpec])

Generates a list of UdfSpecs that implement the UDFs specified by the @register_dataset_udf, @register_type_udf, and @register_metric_udf decorators.

udf_schema(→ UdfSchema)

Returns a UdfSchema that implements any registered UDFs, along with any other_udf_specs or resolvers passed in.

Attributes#

whylogs.experimental.core.udf_schema.logger#
class whylogs.experimental.core.udf_schema.UdfSpec#

Defines UDFs to apply to matching input columns.

For UDFs matched by column_name(s), the function is passed a dictionary or dataframe with the named columns available (the UDF will not be called unless all the named columns are available). The output column name is the key in the udfs dictionary.

For UDFs matched by column_type, the function is passed the value or Pandas series. The output column name is the key in the udfs dictionary prefixed by the input column name.

You must specify exactly one of column_names or column_type.
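The name-matched rule above can be illustrated with a small pure-Python sketch. It does not call whylogs; `apply_named_udfs` is a hypothetical helper that only models the documented behavior for a single row passed as a dictionary: the UDFs run only when every named column is present, and each output column takes its name from the key in the udfs dictionary.

```python
from typing import Any, Callable, Dict, List


def apply_named_udfs(
    row: Dict[str, Any],
    column_names: List[str],
    udfs: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Hypothetical stand-in for name-matched UDF application on one row."""
    out = dict(row)
    # The UDFs are not called unless all the named columns are available.
    if not all(name in row for name in column_names):
        return out
    for output_name, udf in udfs.items():
        # The output column name is the key in the udfs dictionary.
        out[output_name] = udf(row)
    return out


udfs = {"add5": lambda x: x["col1"] + 5}
full = apply_named_udfs({"col1": 1, "col2": 2}, ["col1"], udfs)
missing = apply_named_udfs({"col2": 2}, ["col1"], udfs)
```

Here `full` gains an `add5` column, while `missing` is returned unchanged because `col1` is absent.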

column_names: Optional[List[str]]#
udfs: Dict[str, Callable[[Any], Any]]#
column_type: Optional[whylogs.core.datatypes.DataType]#
prefix: Optional[str]#
udf: Optional[Callable[[Any], Any]]#
name: Optional[str]#
class whylogs.experimental.core.udf_schema.UdfSchema(resolvers: Optional[List[whylogs.core.resolvers.ResolverSpec]] = None, types: Optional[Dict[str, Any]] = None, default_config: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None, udf_specs: Optional[List[UdfSpec]] = None)#

Bases: whylogs.core.schema.DeclarativeSchema

Subclass of DeclarativeSchema that runs the UDFs specified in udf_specs to create new columns before resolving metrics.

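Parameters
  • resolvers (Optional[List[whylogs.core.resolvers.ResolverSpec]]) –

  • types (Optional[Dict[str, Any]]) –

  • default_config (Optional[whylogs.core.metrics.metrics.MetricConfig]) –

  • type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –

  • cache_size (int) –

  • schema_based_automerge (bool) –

  • segments (Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]]) –

  • validators (Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]]) –

  • udf_specs (Optional[List[UdfSpec]]) –

copy() UdfSchema#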
Return type

UdfSchema

apply_udfs(pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None) Tuple[Optional[whylogs.core.stubs.pd.DataFrame], Optional[Mapping[str, Any]]]#
Parameters
  • pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –

  • row (Optional[Dict[str, Any]]) –

Return type

Tuple[Optional[whylogs.core.stubs.pd.DataFrame], Optional[Mapping[str, Any]]]

whylogs.experimental.core.udf_schema.register_multioutput_udf(col_names: List[str], udf_name: Optional[str] = None, prefix: Optional[str] = None, namespace: Optional[str] = None, schema_name: str = '', no_prefix: bool = False) Callable[[Any], Any]#

Decorator to easily configure UDFs for your data set. Decorate your UDF functions, then call udf_schema() to create a UdfSchema that includes the UDFs configured by your decorator parameters. The decorated function will automatically be a UDF in the UdfSchema.

Specify udf_name to give the output of the UDF a name. udf_name defaults to the name of the decorated function. Note that all lambdas are named “lambda”, so omitting udf_name on more than one lambda will result in name collisions. If you pass a namespace, it will be prepended to the UDF name. Specifying schema_name will register the UDF in a particular schema. If omitted, it will be registered to the default schema.

For multiple output column UDFs, the udf_name is prepended to the column name supplied by the UDF. The signature for multiple output column UDFs is f(Union[Dict[str, List], pd.DataFrame]) -> Union[Dict[str, List], pd.DataFrame]
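As a rough illustration of the multiple-output signature, the sketch below defines a UDF that takes a `Dict[str, List]` and returns a `Dict[str, List]` with two output columns. It is plain Python and does not use whylogs. `prefixed_outputs` is a hypothetical helper that models how udf_name is prepended to each returned column name, and the `"."` separator is an assumption, since the text above does not state one.

```python
from typing import Callable, Dict, List


def stats(data: Dict[str, List]) -> Dict[str, List]:
    """Example multi-output UDF: one input column, two output columns."""
    values = data["col1"]
    return {
        "doubled": [v * 2 for v in values],
        "squared": [v * v for v in values],
    }


def prefixed_outputs(
    udf: Callable[[Dict[str, List]], Dict[str, List]],
    udf_name: str,
    data: Dict[str, List],
    sep: str = ".",  # assumed separator, not taken from the documentation
) -> Dict[str, List]:
    """Hypothetical helper: prepend udf_name to each column the UDF returns."""
    return {f"{udf_name}{sep}{col}": vals for col, vals in udf(data).items()}


result = prefixed_outputs(stats, "stats", {"col1": [1, 2, 3]})
```

Under these assumptions the output columns are named `stats.doubled` and `stats.squared`.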

Parameters
  • col_names (List[str]) –

  • udf_name (Optional[str]) –

  • prefix (Optional[str]) –

  • namespace (Optional[str]) –

  • schema_name (str) –

  • no_prefix (bool) –

Return type

Callable[[Any], Any]

whylogs.experimental.core.udf_schema.register_dataset_udf(col_names: List[str], udf_name: Optional[str] = None, metrics: Optional[List[whylogs.core.resolvers.MetricSpec]] = None, namespace: Optional[str] = None, schema_name: str = '', anti_metrics: Optional[List[whylogs.core.metrics.metrics.Metric]] = None) Callable[[Any], Any]#

Decorator to easily configure UDFs for your data set. Decorate your UDF functions, then call udf_schema() to create a UdfSchema that includes the UDFs configured by your decorator parameters. The decorated function will automatically be a UDF in the UdfSchema.

Specify udf_name to give the output of the UDF a name. udf_name defaults to the name of the decorated function. Note that all lambdas are named “lambda”, so omitting udf_name on more than one lambda will result in name collisions. If you pass a namespace, it will be prepended to the UDF name. Specifying schema_name will register the UDF in a particular schema. If omitted, it will be registered to the default schema.
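The naming rules in the paragraph above can be sketched with a toy registry. This is not whylogs' internal registry. `resolve_name` and `register` are hypothetical helpers, and the `"."` separator between namespace and UDF name is an assumption. The one firm fact used here is that Python reports `<lambda>` as the `__name__` of every lambda, which is why unnamed lambdas collide.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Toy registry keyed by resolved UDF name (illustration only).
registry: Dict[str, Callable[[Any], Any]] = {}


def resolve_name(
    func: Callable[[Any], Any],
    udf_name: Optional[str] = None,
    namespace: Optional[str] = None,
    sep: str = ".",  # assumed separator
) -> str:
    # udf_name defaults to the name of the decorated function.
    name = udf_name or func.__name__
    # A namespace, if given, is prepended to the UDF name.
    return f"{namespace}{sep}{name}" if namespace else name


def register(
    func: Callable[[Any], Any],
    udf_name: Optional[str] = None,
    namespace: Optional[str] = None,
) -> Tuple[str, bool]:
    """Store func under its resolved name; report whether the name was taken."""
    name = resolve_name(func, udf_name, namespace)
    collided = name in registry
    registry[name] = func
    return name, collided


first = register(lambda x: x["a"] + 1)
second = register(lambda x: x["a"] + 2)  # same default name as `first`
named = register(lambda x: x["a"] + 3, udf_name="add3", namespace="demo")
```

The second unnamed lambda overwrites the first in this toy registry, while passing udf_name avoids the clash.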

If any metrics are passed via the metrics argument, they will be attached to the column produced by the UDF via the schema returned by udf_schema(). If metrics is None, the UDF output column will get the metrics determined by the other resolvers passed to udf_schema(), or the STANDARD_UDF_RESOLVER by default. Any anti_metrics will be excluded from the metrics attached to the UDF output.
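The selection rule for metrics and anti_metrics might be pictured as follows. This is a simplified model, not the library's resolver logic: plain strings stand in for metric objects, and `resolve_udf_metrics` is a hypothetical helper.

```python
from typing import List, Optional


def resolve_udf_metrics(
    metrics: Optional[List[str]],
    default_metrics: List[str],
    anti_metrics: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical model of which metrics end up on a UDF output column."""
    # Explicit metrics win; otherwise fall back to the resolver defaults.
    chosen = default_metrics if metrics is None else metrics
    # Anything listed in anti_metrics is excluded from the result.
    excluded = set(anti_metrics or [])
    return [m for m in chosen if m not in excluded]


defaults = ["counts", "types", "distribution"]  # placeholder metric names
explicit = resolve_udf_metrics(["distribution"], defaults)
fallback = resolve_udf_metrics(None, defaults)
trimmed = resolve_udf_metrics(None, defaults, anti_metrics=["types"])
```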

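Parameters
  • col_names (List[str]) –

  • udf_name (Optional[str]) –

  • metrics (Optional[List[whylogs.core.resolvers.MetricSpec]]) –

  • namespace (Optional[str]) –

  • schema_name (str) –

  • anti_metrics (Optional[List[whylogs.core.metrics.metrics.Metric]]) –

Return type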

Callable[[Any], Any]

whylogs.experimental.core.udf_schema.unregister_udf(udf_name: str, namespace: Optional[str] = None, schema_name: str = '') None#
Parameters
  • udf_name (str) –

  • namespace (Optional[str]) –

  • schema_name (str) –

Return type

None

whylogs.experimental.core.udf_schema.register_type_udf(col_type: Type, udf_name: Optional[str] = None, namespace: Optional[str] = None, schema_name: str = '', type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None) Callable[[Any], Any]#

Decorator to easily configure UDFs for your data set. Decorate your UDF functions, then call udf_schema() to create a UdfSchema that includes the UDFs configured by your decorator parameters. The decorated function will automatically be a UDF in the UdfSchema.

The registered function will be applied to any columns of the specified type. Specify udf_name to give the output of the UDF a name. udf_name defaults to the name of the decorated function. The output column name is the UDF name prefixed with the input column name. Note that all lambdas are named “lambda”, so omitting udf_name on more than one lambda will result in name collisions. If you pass a namespace, it will be prepended to the UDF name. Specifying schema_name will register the UDF in a particular schema. If omitted, it will be registered to the default schema.
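A minimal sketch of type-based matching follows, assuming a single row passed as a dictionary. It does not use whylogs: `apply_type_udfs` is a hypothetical helper, `isinstance` stands in for the library's type mapping, and the `"."` separator between input column name and UDF name is an assumption.

```python
from typing import Any, Callable, Dict


def apply_type_udfs(
    row: Dict[str, Any],
    col_type: type,
    udfs: Dict[str, Callable[[Any], Any]],
    sep: str = ".",  # assumed separator
) -> Dict[str, Any]:
    """Hypothetical model: run each UDF on every value of the given type."""
    out = dict(row)
    for column, value in row.items():
        if isinstance(value, col_type):
            for udf_name, udf in udfs.items():
                # Type-matched UDFs receive the value itself, not the row.
                # The output name is the UDF name prefixed with the column name.
                out[f"{column}{sep}{udf_name}"] = udf(value)
    return out


result = apply_type_udfs({"name": "whylogs", "count": 3}, str, {"length": len})
```

Only the string column produces a new column here; the integer column is left alone.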

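Parameters
  • col_type (Type) –

  • udf_name (Optional[str]) –

  • namespace (Optional[str]) –

  • schema_name (str) –

  • type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –

Return type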

Callable[[Any], Any]

whylogs.experimental.core.udf_schema.generate_udf_specs(other_udf_specs: Optional[List[UdfSpec]] = None, schema_name: Union[str, List[str]] = '', include_default_schema: bool = True) List[UdfSpec]#

Generates a list of UdfSpecs that implement the UDFs specified by the @register_dataset_udf, @register_type_udf, and @register_metric_udf decorators. You can provide a list of other_udf_specs to include in addition to those UDFs registered via the decorator.

For example:

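    import whylogs as why
    from whylogs.core.resolvers import STANDARD_RESOLVER
    from whylogs.experimental.core.udf_schema import (
        UdfSchema, generate_udf_specs, register_dataset_udf,
    )

    @register_dataset_udf(col_names=["col1"])
    def add5(x):
        # x is a dictionary or DataFrame containing the named columns
        return x["col1"] + 5

    schema = UdfSchema(STANDARD_RESOLVER, udf_specs=generate_udf_specs())
    why.log(data, schema=schema)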

This will attach a UDF to column “col1” that will generate a new column named “add5” containing the values in “col1” incremented by 5. Since these are appended to the STANDARD_UDF_RESOLVER, the default metrics are also tracked for every column.
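How schema_name, include_default_schema, and other_udf_specs might interact can be sketched with a toy decorator registry. This is a pure-Python model and an assumption about the behavior, not the library's code: `register`, `generate_specs`, and the tuple-based `Spec` are hypothetical stand-ins for the decorators, generate_udf_specs(), and UdfSpec.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

# (input columns, output name, function): a stand-in for UdfSpec.
Spec = Tuple[List[str], str, Callable[[Any], Any]]
_registry: Dict[str, List[Spec]] = defaultdict(list)


def register(col_names: List[str], schema_name: str = ""):
    """Toy decorator: record the function under the given schema name."""
    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _registry[schema_name].append((col_names, func.__name__, func))
        return func
    return decorator


def generate_specs(
    other_specs: Optional[List[Spec]] = None,
    schema_name: Union[str, List[str]] = "",
    include_default_schema: bool = True,
) -> List[Spec]:
    """Collect caller-supplied specs plus those registered per schema."""
    names = [schema_name] if isinstance(schema_name, str) else list(schema_name)
    # The default schema is the empty string; include it unless opted out.
    if include_default_schema and "" not in names:
        names = [""] + names
    specs = list(other_specs or [])
    for name in names:
        specs.extend(_registry[name])
    return specs


@register(col_names=["col1"])
def add5(x):
    return x["col1"] + 5


@register(col_names=["col1"], schema_name="extra")
def double(x):
    return x["col1"] * 2


default_only = [s[1] for s in generate_specs()]
both = [s[1] for s in generate_specs(schema_name="extra")]
extra_only = [
    s[1] for s in generate_specs(schema_name="extra", include_default_schema=False)
]
```

In this model, asking for the `extra` schema also returns the default-schema UDFs unless include_default_schema is set to False.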

Parameters
  • other_udf_specs (Optional[List[UdfSpec]]) –

  • schema_name (Union[str, List[str]]) –

  • include_default_schema (bool) –

Return type

List[UdfSpec]

whylogs.experimental.core.udf_schema.DEFAULT_UDF_SCHEMA_RESOLVER#
whylogs.experimental.core.udf_schema.udf_schema(other_udf_specs: Optional[List[UdfSpec]] = None, resolvers: Optional[List[whylogs.core.resolvers.ResolverSpec]] = None, types: Optional[Dict[str, Any]] = None, default_config: Optional[whylogs.core.metrics.metrics.MetricConfig] = None, type_mapper: Optional[whylogs.core.datatypes.TypeMapper] = None, cache_size: int = 1024, schema_based_automerge: bool = False, segments: Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]] = None, validators: Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]] = None, schema_name: Union[str, List[str]] = '', include_default_schema: bool = True) UdfSchema#

Returns a UdfSchema that implements any registered UDFs, along with any other_udf_specs or resolvers passed in.

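Parameters
  • other_udf_specs (Optional[List[UdfSpec]]) –

  • resolvers (Optional[List[whylogs.core.resolvers.ResolverSpec]]) –

  • types (Optional[Dict[str, Any]]) –

  • default_config (Optional[whylogs.core.metrics.metrics.MetricConfig]) –

  • type_mapper (Optional[whylogs.core.datatypes.TypeMapper]) –

  • cache_size (int) –

  • schema_based_automerge (bool) –

  • segments (Optional[Dict[str, whylogs.core.segmentation_partition.SegmentationPartition]]) –

  • validators (Optional[Dict[str, List[whylogs.core.validators.validator.Validator]]]) –

  • schema_name (Union[str, List[str]]) –

  • include_default_schema (bool) –

Return type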

UdfSchema