Outlier handling
Outlier detection and handling
HandleUnivariateOutliers
HandleUnivariateOutliers(
    self,
    inputs,
    *,
    method='z-score',
    treatment='capping',
    deviation_factor=3,
)
A step for detecting and treating univariate outliers in numeric columns.
Parameters
Name | Type | Description | Default |
---|---|---|---|
inputs | SelectionType | A selection of columns to analyze for outliers. All columns must be numeric. | required |
method | str | The method to use for detecting outliers. “z-score” detects outliers by their distance from the mean, measured in standard deviations, and suits normally distributed data. “IQR” detects outliers using the interquartile range and suits skewed data. | 'z-score' |
treatment | str | The treatment to apply to outliers. “capping” replaces each outlier value with the upper or lower bound, while “trimming” removes rows containing outliers from the dataset. | 'capping' |
deviation_factor | int \| float | The magnitude of deviation from the center used to calculate the upper and lower bounds for outlier detection. For “z-score”, Upper Bound = mean + deviation_factor * standard deviation and Lower Bound = mean - deviation_factor * standard deviation. For normally distributed data, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. For “IQR”, IQR = Q3 - Q1, Upper Bound = Q3 + deviation_factor * IQR, and Lower Bound = Q1 - deviation_factor * IQR. | 3 |
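The bound formulas and the two treatments described in the table can be sketched in plain Python on a single list of numbers. This is not the ibis-ml implementation, which works on ibis table columns; the helper names `outlier_bounds` and `treat` are hypothetical. It also assumes the sample standard deviation (`statistics.stdev`) and Python's default "exclusive" quartile method (`statistics.quantiles`), either of which may differ from the estimators the library actually uses.

```python
import statistics
from typing import Literal


def outlier_bounds(
    values: list[float],
    method: Literal["z-score", "IQR"] = "z-score",
    deviation_factor: float = 3,
) -> tuple[float, float]:
    """Return (lower, upper) outlier bounds for one numeric column (hypothetical helper)."""
    if method == "z-score":
        # Upper/Lower Bound = mean +/- deviation_factor * standard deviation.
        # Assumption: sample standard deviation (statistics.stdev).
        center = statistics.mean(values)
        spread = statistics.stdev(values)
        return center - deviation_factor * spread, center + deviation_factor * spread
    if method == "IQR":
        # quantiles(n=4) returns the three cut points [Q1, median, Q3].
        # Assumption: Python's default "exclusive" quartile method.
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        return q1 - deviation_factor * iqr, q3 + deviation_factor * iqr
    raise ValueError(f"unknown method: {method!r}")


def treat(
    values: list[float],
    lower: float,
    upper: float,
    treatment: Literal["capping", "trimming"] = "capping",
) -> list[float]:
    """Apply capping or trimming to one column (hypothetical helper)."""
    if treatment == "capping":
        # Replace each outlier with the nearest bound.
        return [min(max(v, lower), upper) for v in values]
    if treatment == "trimming":
        # Drop outliers; in a table, the whole row would be removed.
        return [v for v in values if lower <= v <= upper]
    raise ValueError(f"unknown treatment: {treatment!r}")


data = [10, 12, 11, 13, 12, 11, 100]
low, high = outlier_bounds(data, method="IQR", deviation_factor=2.0)
print(low, high)  # 7.0 17.0
print(treat(data, low, high, "capping"))  # [10, 12, 11, 13, 12, 11, 17.0]
print(treat(data, low, high, "trimming"))  # [10, 12, 11, 13, 12, 11]
```

With the z-score method and deviation_factor=3, the same data gives bounds of roughly -76 and 125, so the value 100 is not flagged: in a small sample, a single extreme value inflates the standard deviation itself, while the quartile-based IQR bounds are much less affected by it.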
Examples
>>> import ibis_ml as ml
Capping outliers in all numeric columns using the z-score method.
>>> step = ml.HandleUnivariateOutliers(ml.numeric())
Trimming outliers in a specific set of columns using the IQR method.
>>> step = ml.HandleUnivariateOutliers(
...     ["x", "y"],
...     method="IQR",
...     deviation_factor=2.0,
...     treatment="trimming",
... )