Let’s say we are reading the Titanic dataset. As we learned from this article, we can use the following code to find the percentage of missing values in each column of the dataset.
import seaborn
df = seaborn.load_dataset("titanic")
print(df.isnull().mean() * 100)
The output will show the following:
survived        0.000000
pclass          0.000000
sex             0.000000
age            19.865320
sibsp           0.000000
parch           0.000000
fare            0.000000
embarked        0.224467
class           0.000000
who             0.000000
adult_male      0.000000
deck           77.216611
embark_town     0.224467
alive           0.000000
alone           0.000000
dtype: float64
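If we only want to see the columns that actually contain missing values, one possible approach is to filter the percentages and sort them. The following is a minimal sketch, not the only way to do it. It assumes a small hand-made DataFrame (the values are made up for illustration) in place of the titanic dataset, so that it runs without downloading anything; the names missing_pct and missing_only are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in data with made-up values; replace with the real DataFrame as needed.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": [np.nan, "C", np.nan, np.nan],
})

# Percentage of missing values per column, as in the code above.
missing_pct = df.isnull().mean() * 100

# Keep only the columns with at least one missing value, largest share first.
missing_only = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_only)
```

On this toy DataFrame, only deck (75%) and age (25%) remain, and survived is dropped because it has no missing values. The same two lines should work unchanged on the DataFrame loaded with seaborn.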
So, four columns in the dataset (age, embarked, deck and embark_town) have missing values, and deck is missing in about 77% of the rows. Now, these missing values can introduce biases and affect …





