parts. Q0 is the minimum value, Q4 is the maximum value, and Q2, the second quartile, is the median of the data. Q1 is the median of the values between Q0 and Q2 (the lower half), and Q3 is the median of the values between Q2 and Q4 (the upper half). The difference between Q3 and Q1, that is Q3 - Q1, is called the Interquartile Range or IQR.

A value is called an outlier if it is less than (Q1 - 1.5 x IQR) or more than (Q3 + 1.5 x IQR).
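To make the rule concrete, here is a minimal sketch on a small made-up list of numbers, using only the Python standard library. The sample values and variable names are assumptions chosen for illustration. `statistics.quantiles` with `method="inclusive"` interpolates linearly between data points, which should match the default behaviour of the pandas `quantile` method.

```python
import statistics

# Hypothetical sample data; 40 is deliberately far from the rest
data = [2, 4, 5, 7, 8, 9, 11, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                      # interquartile range
lower_cutoff = q1 - 1.5 * iqr      # values below this are outliers
upper_cutoff = q3 + 1.5 * iqr      # values above this are outliers

outliers = [x for x in data if x < lower_cutoff or x > upper_cutoff]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Cutoffs:", lower_cutoff, upper_cutoff)
print("Outliers:", outliers)
```

For this sample, Q1 is 4.75 and Q3 is 9.5, so the IQR is 4.75 and the cutoffs are -2.375 and 16.625. Only the value 40 falls outside them.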
We can use the following Python code to detect outliers in the age column of the Titanic dataset.
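import seaborn
from matplotlib import pyplot
# Load the Titanic example dataset that ships with seaborn
df = seaborn.load_dataset("titanic")
# Draw a box plot of age and save it; points beyond the whiskers are outliers
seaborn.boxplot(data=df, x="age")
pyplot.savefig("titanic-age-outliers.png")
pyplot.close()
# Third and first quartiles of age (missing values are skipped)
q3 = df["age"].quantile(q=0.75)
q1 = df["age"].quantile(q=0.25)
iqr = q3 - q1
# Cutoffs: anything outside [lower_cutoff, upper_cutoff] is an outlier
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr
print("Lower cutoff of age: ", lower_cutoff)
print("Upper cutoff of age: ", upper_cutoff)
# Print the rows whose age lies outside the cutoffs
print(df[(df["age"] > upper_cutoff) | (df["age"] < lower_cutoff)])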
Here, we first draw a box plot of the age column and save it as titanic-age-outliers.png. We then calculate Q1, Q3, and the IQR, and from them the lower and upper cutoffs of age. Any value in the age column that is less than lower_cutoff or more than upper_cutoff is an outlier, and the last line prints the rows that contain such values.
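The same steps can be wrapped in a small reusable helper. The sketch below is one possible way to do it, not part of the original program. The function name `iqr_outlier_mask`, its parameter `k`, and the sample ages are all hypothetical. It assumes pandas is installed and uses a small hand-made Series, so it runs without downloading the Titanic dataset.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR cutoffs."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_cutoff = q1 - k * iqr
    upper_cutoff = q3 + k * iqr
    # Comparisons with NaN evaluate to False, so missing values are never flagged
    return (values < lower_cutoff) | (values > upper_cutoff)

# Hypothetical ages, including one missing value
ages = pd.Series([22.0, 38.0, 26.0, 35.0, None, 54.0, 2.0, 27.0, 80.0], name="age")

mask = iqr_outlier_mask(ages)
outlier_ages = ages[mask].tolist()
print(outlier_ages)
```

With these sample ages, Q1 is 25 and Q3 is 42, so the cutoffs are -0.5 and 67.5 and only the age 80 is flagged. Passing a different `k` changes how far from the quartiles a value has to be before it counts as an outlier.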
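Running the Titanic program above saves the box plot to titanic-age-outliers.png, prints the lower and upper cutoffs of age, and then prints every row whose age falls outside those cutoffs.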