We can use custom values to remove or cap outliers. To do this, we choose a lower limit and an upper limit for the data. If a value is less than the lower limit, we can either remove that row or replace the value with the lower limit. Similarly, if a value is greater than the upper limit, we can either remove that row or replace the value with the upper limit.
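The two options described above can be sketched on a small table. This is only an illustration: the sample ages are made up, and the use of pandas `between()` and `clip()` is one possible way to express the idea, not necessarily the approach used in the rest of this article.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
ages = pd.DataFrame({"age": [2, 25, 40, 71, 33, 4, 60]})

# Custom limits chosen by hand.
lower_limit = 5
upper_limit = 60

# Option 1: remove the rows whose value falls outside the limits.
# between() is inclusive on both ends by default.
inside = ages["age"].between(lower_limit, upper_limit)
removed = ages[inside]

# Option 2: keep every row, but cap the values at the limits.
capped = ages.assign(age=ages["age"].clip(lower=lower_limit, upper=upper_limit))

print(removed)
print(capped)
```

Removing rows shrinks the dataset, while capping keeps its size and only changes the extreme values. Which one is appropriate depends on the analysis.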
Let’s look at an example using the Titanic dataset. The dataset contains a column “age” that indicates the age of the passengers. In this example, we will use custom values to set the upper limit and lower limit of age, and then cap the outliers using those values.
We can use the following Python code for that purpose:
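import seaborn
import numpy

# Load the Titanic example dataset through seaborn
df = seaborn.load_dataset("titanic")

# Custom limits for the "age" column
lower_limit = 5
upper_limit = 60

# Show the rows whose age falls outside the limits
print("Outliers: \n", df[(df["age"] > upper_limit) | (df["age"] < lower_limit)])

# Cap ages above the upper limit and below the lower limit; leave the rest unchanged
df["age"] = numpy.where(
    df["age"] > upper_limit,
    upper_limit,
    numpy.where(df["age"] < lower_limit, lower_limit, df["age"]),
)
print(df.head())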
Here, we are using two custom values, 5 and 60, as the lower limit and the upper limit, respectively. We first print the rows whose age falls outside these limits. After that, we use the numpy.where() function to replace any value that is more than the upper limit with the upper limit. Otherwise, if a value is less than the lower limit, the value is replaced with the lower limit. And if neither of these two conditions is met, then the value remains as it is...
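To make the nested numpy.where() logic easier to follow, here is a minimal sketch on a small array. The sample values are made up for illustration, and the comparison with numpy.clip() is an addition here, not part of the example above. A missing value (NaN) is included because comparisons with NaN evaluate to False, so such a value falls through both conditions and stays missing.

```python
import numpy as np

# Hypothetical sample ages, made up for illustration; one value is missing.
ages = np.array([2.0, 25.0, np.nan, 71.0, 60.0, 4.0])

lower_limit = 5
upper_limit = 60

# Same nested pattern as above: check the upper limit first,
# then the lower limit, otherwise keep the original value.
capped = np.where(
    ages > upper_limit,
    upper_limit,
    np.where(ages < lower_limit, lower_limit, ages),
)

# numpy.clip() expresses the same capping in a single call.
clipped = np.clip(ages, lower_limit, upper_limit)

print(capped)
print(clipped)
```

Both approaches leave the missing value untouched, so any handling of missing ages has to happen in a separate step.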





