Suppose a column in a dataset holds categorical values and some of them are missing. With the pandas Python library, there are two common ways to handle these missing categorical values: mode imputation and “missing” value imputation. Let’s look at each in detail.
In the case of mode imputation, the missing values in a column are replaced with the most frequent non-null value appearing in the column. Let’s look at an example.
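Before turning to a real dataset, here is a minimal sketch of the idea on a toy column. The values below are made up for illustration and are not taken from the Titanic file. The sketch assumes that taking the first entry of `mode()` is acceptable when several values tie for most frequent.

```python
import numpy as np
import pandas

# Toy categorical column with two missing entries (hypothetical data).
towns = pandas.Series(
    ["Southampton", "Cherbourg", np.nan, "Southampton", np.nan, "Queenstown"],
    name="embark_town",
)

# mode() skips NaN by default and returns a Series, because several
# values can tie for most frequent; here we simply take the first one.
most_frequent = towns.mode()[0]

# Replace every missing entry with the most frequent non-null value.
filled = towns.fillna(most_frequent)

print(most_frequent)
print(filled.tolist())
```

Note that `fillna` returns a new Series, so the original column stays unchanged unless the result is assigned back to it.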
Let’s read the Titanic dataset and look at its first few rows to spot missing values.
import pandas
df = pandas.read_csv("titanic.csv")
print(df.head())
The output looks like this:
   survived  pclass     sex   age  ...  deck  embark_town  alive  alone
0         0       3    male  22.0  ...   NaN  Southampton     no  False
1         1       1  female  38.0  ...     C    Cherbourg    yes  False
2         1       3  female  26.0  ...   NaN  Southampton    yes   True
3         1       1  female  35.0  ...     C  Southampton    yes  False
4         0       3    male  35.0  ...   NaN  Southampton     no   True

[5 rows x 15 columns]
So, the dataset has 15 columns. Out of them, we can see the embark_town column contains categorical values. So, let’s create …





