In missing category imputation, we fill in the missing values in a categorical column with a new category, the word “Missing.” For example, let’s load the Titanic dataset. As the following program shows, the dataset has some missing values in the embark_town column.
import seaborn

df = seaborn.load_dataset("titanic")
print(df.embark_town.isnull().mean()*100)
The expression df.embark_town.isnull().mean()*100 gives us the percentage of missing values in the embark_town column: isnull() marks each missing entry as True, and the mean of those Boolean values is the fraction that is missing. The output will show the following:
0.22446689113355783
So, about 0.22% of the values in the embark_town column are missing. Now, let’s fill in the missing values with the value “Missing.”
import seaborn
from matplotlib import pyplot

df = seaborn.load_dataset("titanic")
print(df.embark_town.isnull().mean()*100)

# Assign the result back to the column. With pandas Copy-on-Write,
# fillna(..., inplace=True) on a selected column does not update df.
df["embark_town"] = df["embark_town"].fillna(value="Missing")
print(df.embark_town.isnull().mean()*100)

# Count passengers per category, including the new "Missing" category.
series1 = df.embark_town.value_counts()
pyplot.bar(series1.index, series1.values)
pyplot.xlabel("Embark Town")
pyplot.ylabel("Number of Passengers")
pyplot.savefig("sklearn-missing-value-imputation.png")
pyplot.close()
Here, we use the fillna() method to fill the missing values in the embark_town column with the value “Missing.” Now, if we …
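To see the same two steps without downloading the Titanic data, here is a minimal sketch on a small hand-made Series. The toy values and the variable names (towns, filled, counts) are assumptions made for illustration only and are not part of the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the embark_town column (1 of 4 values missing).
towns = pd.Series(
    ["Southampton", None, "Cherbourg", "Queenstown"], name="embark_town"
)

# isnull() gives a Boolean mask; its mean is the fraction of missing values.
missing_pct_before = towns.isnull().mean() * 100

# fillna() returns a new Series in which every missing value is replaced by "Missing".
filled = towns.fillna("Missing")
missing_pct_after = filled.isnull().mean() * 100

# "Missing" now appears as its own category in the counts.
counts = filled.value_counts()

print(missing_pct_before)
print(missing_pct_after)
print(counts["Missing"])
```

Because “Missing” becomes a regular category, it shows up in value_counts() and in any bar chart built from those counts, just like the real town names.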





