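# Cap outliers: ages above upper_limit become upper_limit, ages below lower_limit become lower_limit
df["age"] = numpy.where(df["age"] > upper_limit, upper_limit, numpy.where(df["age"] < lower_limit, lower_limit, df["age"]))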
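The nested numpy.where call replaces every age above upper_limit with upper_limit and every age below lower_limit with lower_limit, leaving the rest unchanged. pandas provides the same capping through Series.clip. The minimal sketch below checks that the two approaches agree on a made-up age column; the data and the limit values are hypothetical, chosen only for illustration. Missing ages stay NaN in both cases, because comparisons with NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Hypothetical data and limits, for illustration only
df = pd.DataFrame({"age": [0.83, 2.0, 22.0, 38.0, 74.0, np.nan]})
lower_limit, upper_limit = 3.0, 60.0

# Nested np.where: cap values above upper_limit, then values below lower_limit
capped_where = np.where(
    df["age"] > upper_limit,
    upper_limit,
    np.where(df["age"] < lower_limit, lower_limit, df["age"]),
)

# Series.clip performs the same capping; NaN stays NaN in both results
capped_clip = df["age"].clip(lower=lower_limit, upper=upper_limit)
```

With clip, the whole assignment can be written as df["age"] = df["age"].clip(lower=lower_limit, upper=upper_limit), which avoids the nested calls.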
Alternatively, instead of capping the outliers, we can remove them once they have been detected, using the following Python statement:
# Remove outliers: keep only rows whose age lies within [lower_limit, upper_limit]
df = df[(df["age"] >= lower_limit) & (df["age"] <= upper_limit)]
The output of the outlier-capping program is shown below: first the detected outliers, then the first five rows of the DataFrame.
Outliers:
     survived  pclass     sex    age  ...  deck  embark_town  alive  alone
7           0       3    male   2.00  ...   NaN  Southampton     no  False
10          1       3  female   4.00  ...     G  Southampton    yes  False
16          0       3    male   2.00  ...   NaN   Queenstown     no  False
33          0       2    male  66.00  ...   NaN  Southampton     no   True
43          1       2  female   3.00  ...   NaN    Cherbourg    yes  False
..        ...     ...     ...    ...  ...   ...          ...    ...    ...
829         1       1  female  62.00  ...     B          NaN    yes   True
831         1       2    male   0.83  ...   NaN  Southampton    yes  False
850         0       3    male   4.00  ...   NaN  Southampton     no  False
851         0       3    male  74.00  ...   NaN  Southampton     no   True
869         1       3    male   4.00  ...   NaN  Southampton    yes  False
[62 rows x 15 columns]

   survived  pclass     sex   age  ...  deck  embark_town  alive  alone
0         0       3    male  22.0  ...   NaN  Southampton     no  False
1         1       1  female  38.0  ...     C    Cherbourg    yes  False
2         1       3  female  26.0  ...   NaN  Southampton    yes   True
3         1       1  female  35.0  ...     C  Southampton    yes  False
4         0       3    male  35.0  ...   NaN  Southampton     no   True
[5 rows x 15 columns]
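Capping and removal also differ in how they treat missing values. The minimal sketch below uses hypothetical ages and assumes the common 1.5 × IQR rule for computing lower_limit and upper_limit (this section does not show how the limits were obtained). It then compares the boolean removal filter, a variant that drops only the flagged rows, and capping with Series.clip, which caps the same way as the nested numpy.where. Because comparisons with NaN evaluate to False, the boolean filter silently drops the row with a missing age as well.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value
df = pd.DataFrame({"age": [1, 20, 22, 25, 28, 30, 35, 38, 90, np.nan]})

# Assumed detection rule: 1.5 * IQR; quantile() skips NaN by default
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Flag rows outside the limits; NaN compares False, so it is never flagged
outliers = df[(df["age"] < lower_limit) | (df["age"] > upper_limit)]

# Removal with the boolean filter: the NaN row fails both comparisons and is dropped
trimmed = df[(df["age"] >= lower_limit) & (df["age"] <= upper_limit)]

# Alternative that keeps missing ages: drop only the flagged rows
trimmed_keep_nan = df.drop(outliers.index)

# Capping keeps every row; NaN stays NaN
capped = df["age"].clip(lower=lower_limit, upper=upper_limit)

print("Outliers:", outliers["age"].tolist())
print(len(df), len(trimmed), len(trimmed_keep_nan), len(capped))
```

If missing ages should be handled separately (for example, imputed later), dropping only the flagged rows is the safer choice.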