Lower cutoff of age: -6.6875
Upper cutoff of age: 64.8125
     survived  pclass   sex   age  ...  deck  embark_town  alive  alone
33          0       2  male  66.0  ...   NaN  Southampton     no   True
54          0       1  male  65.0  ...     B    Cherbourg     no  False
96          0       1  male  71.0  ...     A    Cherbourg     no   True
116         0       3  male  70.5  ...   NaN   Queenstown     no   True
280         0       3  male  65.0  ...   NaN   Queenstown     no   True
456         0       1  male  65.0  ...     E  Southampton     no   True
493         0       1  male  71.0  ...   NaN    Cherbourg     no   True
630         1       1  male  80.0  ...     A  Southampton    yes   True
672         0       2  male  70.0  ...   NaN  Southampton     no   True
745         0       1  male  70.0  ...     B  Southampton     no  False
851         0       3  male  74.0  ...   NaN  Southampton     no   True

[11 rows x 15 columns]
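As a quick sanity check on the printed cutoffs, the quartiles behind them can be recovered with a little arithmetic. The sketch below assumes the cutoffs were computed as Q1 - 1.5 * IQR and Q3 + 1.5 * IQR; the two numbers are copied from the output above, and nothing is loaded from the dataset.

```python
# Cutoffs copied from the printed output above.
lower_cutoff = -6.6875
upper_cutoff = 64.8125

# Assumption: lower = q1 - 1.5 * iqr and upper = q3 + 1.5 * iqr, so
# upper - lower = (q3 - q1) + 3 * iqr = 4 * iqr.
iqr = (upper_cutoff - lower_cutoff) / 4
q1 = lower_cutoff + 1.5 * iqr
q3 = upper_cutoff - 1.5 * iqr

print("IQR:", iqr)
print("Q1:", q1)
print("Q3:", q3)
```

Because the lower cutoff is negative, no age can fall below it, which is why all 11 rows in the output are passengers older than the upper cutoff.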
How to perform outlier trimming using the Interquartile Range (IQR)?
So far, we have detected the outliers in the dataset. Next, we will trim them based on the interquartile range (IQR): every row whose age lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is removed. We can use the following Python code for that purpose:
import seaborn
from matplotlib import pyplot
df = seaborn.load_dataset("titanic")
seaborn.boxplot(data=df, x="age")
pyplot.savefig("titanic-age-outliers.png")
pyplot.close()
q3 = df["age"].quantile(q=0.75)
q1 = df["age"].quantile(q=0.25)
iqr = q3 - q1
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr
print("Lower cutoff of age: ", lower_cutoff)
print("Upper cutoff of age: ", upper_cutoff)
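# Show the outlier rows before removing them
print(df[(df["age"] > upper_cutoff) | (df["age"] < lower_cutoff)])
# Keep only the rows inside the cutoffs; rows with a missing age are dropped too, because comparisons with NaN are False
df = df[(df["age"] >= lower_cutoff) & (df["age"] <= upper_cutoff)]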
print(df.head())
seaborn.boxplot(data=df, x="age")
pyplot.savefig("titanic-age-without-outliers.png")
pyplot.close()
Here, we are using the following Python statement to select only those rows from the dataset where the value in the age column …
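The same selection can also be wrapped in a small helper so that it works for any numeric column. This is only a minimal sketch: the function name trim_outliers_iqr, the parameter k, and the toy ages are made up for illustration and are not part of the original code.

```python
import pandas as pd


def trim_outliers_iqr(df, column, k=1.5):
    """Return the rows of df whose value in column lies within the IQR cutoffs."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower_cutoff = q1 - k * iqr
    upper_cutoff = q3 + k * iqr
    # Series.between includes both endpoints by default and is False for NaN,
    # so rows with a missing value are dropped as well.
    mask = df[column].between(lower_cutoff, upper_cutoff)
    return df[mask]


# Made-up ages for illustration only (not taken from the Titanic dataset).
toy = pd.DataFrame({"age": [22.0, 25.0, 28.0, 30.0, 31.0, 35.0, 38.0, 90.0, None]})
trimmed = trim_outliers_iqr(toy, "age")
print(len(toy), "rows before,", len(trimmed), "rows after")
```

In this toy data, the value 90.0 lies above the upper cutoff and is removed, and the row with the missing age is removed along with it. If rows with missing values should be kept, the mask would need to be combined with df[column].isna().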







































