Outlier handling

Outlier detection and handling

HandleUnivariateOutliers

HandleUnivariateOutliers(
    self,
    inputs,
    *,
    method='z-score',
    treatment='capping',
    deviation_factor=3,
)

A step for detecting and treating univariate outliers in numeric columns.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | SelectionType | A selection of columns to analyze for outliers. All columns must be numeric. | required |
| method | str | The method used to detect outliers. "z-score" flags values by their distance from the mean in standard deviations and suits normally distributed data. "IQR" flags values using the interquartile range and suits skewed data. | 'z-score' |
| treatment | str | The treatment applied to outliers. "capping" replaces outlier values with the upper or lower bound, while "trimming" removes outlier rows from the dataset. | 'capping' |
| deviation_factor | int \| float | The multiplier used to calculate the upper and lower bounds for outlier detection. For "z-score": Upper Bound = mean + deviation_factor * standard deviation; Lower Bound = mean - deviation_factor * standard deviation. For normally distributed data, about 68% of values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. For "IQR": IQR = Q3 - Q1; Upper Bound = Q3 + deviation_factor * IQR; Lower Bound = Q1 - deviation_factor * IQR. | 3 |
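The bound formulas and the two treatments can be illustrated with a minimal plain-Python sketch. This is not the library's implementation: the helper names (`zscore_bounds`, `iqr_bounds`, `cap`, `trim`) are hypothetical, and the use of the sample standard deviation and the "inclusive" quartile method are assumptions made only for illustration.

```python
import statistics


def zscore_bounds(values, deviation_factor=3):
    """Return (lower, upper) as mean -/+ deviation_factor * standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation
    return mean - deviation_factor * std, mean + deviation_factor * std


def iqr_bounds(values, deviation_factor=3):
    """Return (lower, upper) as Q1 - k * IQR and Q3 + k * IQR."""
    # assumption: quartiles via linear interpolation over the observed data
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - deviation_factor * iqr, q3 + deviation_factor * iqr


def cap(values, lower, upper):
    """Capping: replace values outside the bounds with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def trim(values, lower, upper):
    """Trimming: drop values (rows) that fall outside the bounds."""
    return [v for v in values if lower <= v <= upper]


data = [10, 12, 11, 13, 12, 11, 10, 13, 12, 100]

# A factor of 1.5 is used here only to keep the arithmetic easy to follow.
iqr_lower, iqr_upper = iqr_bounds(data, deviation_factor=1.5)
capped = cap(data, iqr_lower, iqr_upper)
trimmed = trim(data, iqr_lower, iqr_upper)

z_lower, z_upper = zscore_bounds(data, deviation_factor=3)

print(iqr_lower, iqr_upper)  # 8.375 15.375
print(capped[-1], len(trimmed))  # 15.375 9
```

In this sample the single extreme value inflates the mean and standard deviation enough that it stays inside the z-score bounds at a factor of 3, while the IQR bounds still flag it. That is one reason to prefer "IQR" for skewed data.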

Examples

>>> import ibis_ml as ml

Capping outliers in all numeric columns using the z-score method (the default).

>>> step = ml.HandleUnivariateOutliers(ml.numeric())

Trimming outliers in a specific set of columns using the IQR method.

>>> step = ml.HandleUnivariateOutliers(
...     ["x", "y"],
...     method="IQR",
...     deviation_factor=2.0,
...     treatment="trimming",
... )