In our previous articles, we discussed mean or median imputation, end-of-distribution imputation, and frequent category imputation. Sometimes, however, we may want to fill in missing numerical values with numbers that themselves indicate a missing value. For example, when imputing missing values in the age column, we can put the number 9999 or -1 in place of the missing entries. As a result, the missing entries are handled while remaining clearly distinguishable from real observations.
To summarize, in arbitrary value imputation, we replace missing values in a numerical column with arbitrarily chosen numbers. We pick numbers that do not occur in the dataset, so that they signal a missing value rather than an actual measurement.
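The requirement that the arbitrary number must not already occur in the column can be checked before imputing. The following is a minimal sketch; the helper name is_safe_arbitrary_value and the sample ages are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def is_safe_arbitrary_value(series, candidate):
    """Return True if `candidate` does not occur among the observed values."""
    observed = series.dropna()  # ignore the missing entries themselves
    return not (observed == candidate).any()


# Made-up ages with two missing entries (illustrative data only).
ages = pd.Series([22.0, np.nan, 38.0, 0.42, np.nan])

print(is_safe_arbitrary_value(ages, -1))  # -1 is not an observed age
print(is_safe_arbitrary_value(ages, 38))  # 38 is an observed age
```

A value such as -1 passes this check for an age column, while a value that already appears in the data would be indistinguishable from a real observation after imputation.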
Let’s read the Titanic dataset and see the percentage of missing values in each column of the dataset.
import seaborn
df = seaborn.load_dataset("titanic")
print(df.isnull().mean()*100)
As we discussed previously, print(df.isnull().mean()*100) prints the percentage of missing values in each column of the dataset. The output will look like the following:
survived        0.000000
pclass          0.000000
sex             0.000000
age            19.865320
sibsp           0.000000
parch           0.000000
fare            0.000000
embarked        0.224467
class           0.000000
who             0.000000
adult_male      0.000000
deck           77.216611
embark_town     0.224467
alive           0.000000
alone           0.000000
dtype: float64
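To see why isnull().mean()*100 gives a percentage, it can help to run it on a tiny frame. This is a minimal sketch; the frame toy and its values are made up for illustration and are not part of the Titanic data.

```python
import numpy as np
import pandas as pd

# Illustrative frame: "age" has 2 of 4 values missing, "fare" has none.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 38.0, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

missing_mask = toy.isnull()              # True wherever a value is missing
missing_pct = missing_mask.mean() * 100  # mean of booleans = fraction of True

print(missing_pct)
```

Because True counts as 1 and False as 0, the column-wise mean of the boolean mask is the fraction of missing entries, and multiplying by 100 turns it into a percentage.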
So, about 19.87% of the values in the age column are missing. Let’s fill in the missing ages with the number -1, a value that no real age can take.
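One way to perform this step is with the pandas fillna method. The following is a minimal sketch under stated assumptions: the helper impute_arbitrary and the small sample frame are hypothetical and stand in for the Titanic data so that the example runs without downloading anything.

```python
import numpy as np
import pandas as pd


def impute_arbitrary(df, column, value):
    """Return a copy of `df` with missing entries in `column` replaced by `value`."""
    out = df.copy()  # leave the caller's frame untouched
    out[column] = out[column].fillna(value)
    return out


# Illustrative stand-in for the Titanic "age" column (made-up values).
sample = pd.DataFrame({"age": [22.0, np.nan, 38.0, np.nan, 35.0]})

imputed = impute_arbitrary(sample, "age", -1)

print(imputed)
print(imputed.isnull().mean() * 100)  # no missing values should remain
```

With the Titanic frame loaded earlier, the same idea would be applied as df["age"] = df["age"].fillna(-1), after which the missing percentage for age drops to zero.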





