Outlier detection and handling
HandleUnivariateOutliers
HandleUnivariateOutliers(
    self,
    inputs,
    *,
    method='z-score',
    treatment='capping',
    deviation_factor=3,
)

A step for detecting and treating univariate outliers in numeric columns.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| inputs | SelectionType | A selection of columns to analyze for outliers. All columns must be numeric. | required |
| method | str | The method to use for detecting outliers. “z-score” flags values by their distance from the mean, measured in standard deviations, and suits normally distributed data. “IQR” flags values using the interquartile range and suits skewed data. | 'z-score' |
| treatment | str | The treatment to apply to the outliers. “capping” replaces outlier values with the upper or lower bound, while “trimming” removes outlier rows from the dataset. | 'capping' |
| deviation_factor | int \| float | The multiplier that sets how far from the center the upper and lower bounds for outlier detection lie. For “z-score”: Upper Bound = mean + deviation_factor * standard deviation; Lower Bound = mean - deviation_factor * standard deviation. For normally distributed data, about 68% of the data lies within 1 standard deviation, 95% within 2, and 99.7% within 3. For “IQR”: IQR = Q3 - Q1; Upper Bound = Q3 + deviation_factor * IQR; Lower Bound = Q1 - deviation_factor * IQR. | 3 |
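The bound formulas in the `deviation_factor` row can be sketched in plain Python. This is an illustration of the arithmetic only, not the library's implementation: the helper names `zscore_bounds` and `iqr_bounds` are hypothetical, and the choice of population standard deviation and of the "inclusive" quartile method are assumptions that may differ from what the step computes on a real backend.

```python
import statistics


def zscore_bounds(values, deviation_factor=3):
    """Return (lower, upper) as mean -/+ deviation_factor * standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; a backend may use the sample one.
    std = statistics.pstdev(values)
    return mean - deviation_factor * std, mean + deviation_factor * std


def iqr_bounds(values, deviation_factor=3):
    """Return (lower, upper) as Q1 - k * IQR and Q3 + k * IQR."""
    # Assumption: "inclusive" quartiles; other quantile definitions shift the bounds.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - deviation_factor * iqr, q3 + deviation_factor * iqr


data = [10, 12, 11, 13, 12, 11, 10, 95]
print("z-score bounds:", zscore_bounds(data))
print("IQR bounds:", iqr_bounds(data))
```

On this small sample the single extreme value inflates the standard deviation enough that 95 still falls inside the z-score bounds at a factor of 3, while the IQR bounds exclude it.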
Examples
>>> import ibis_ml as ml

Capping outliers in all numeric columns using the z-score method.

>>> step = ml.HandleUnivariateOutliers(ml.numeric())

Trimming outliers in a specific set of columns using the IQR method.

>>> step = ml.HandleUnivariateOutliers(
...     ["x", "y"],
...     method="IQR",
...     deviation_factor=2.0,
...     treatment="trimming",
... )
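
To make the difference between the two `treatment` options concrete, here is a minimal plain-Python sketch of what capping and trimming do once the bounds are known. The helpers `cap` and `trim` are hypothetical and are not part of `ibis_ml`; treating the bounds as inclusive is an assumption, and null handling is left out.

```python
def cap(values, lower, upper):
    """Capping: replace each value outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def trim(rows, column, lower, upper):
    """Trimming: drop every row whose value in `column` is outside [lower, upper]."""
    # Assumption: values equal to a bound are kept.
    return [row for row in rows if lower <= row[column] <= upper]


rows = [{"x": 1}, {"x": 5}, {"x": 100}]
capped = cap([row["x"] for row in rows], lower=2, upper=10)
trimmed = trim(rows, "x", lower=2, upper=10)
print(capped)   # same length as the input, extremes pulled to the bounds
print(trimmed)  # only rows inside the bounds remain
```

Capping preserves the row count, whereas trimming shrinks the dataset, which matters when several columns are treated at once and a row is an outlier in only one of them.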