Most machine learning algorithms work only with numbers. So, if a column in a dataset contains categorical values, we need to encode those values as numbers. Label encoding is one way to do that: each distinct label is replaced by an integer, which is why it is also called integer encoding.
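To make the idea concrete, here is a minimal sketch of label encoding in plain Python. The list of towns is a made-up toy column, and assigning integers in sorted order is an assumption chosen for illustration (any consistent assignment would count as label encoding).

```python
# Toy column of categorical labels (hypothetical values for illustration).
towns = ["Southampton", "Cherbourg", "Queenstown", "Southampton", "Cherbourg"]

# Give each distinct label an integer, here in alphabetical order.
mapping = {label: code for code, label in enumerate(sorted(set(towns)))}

# Replace every label in the column with its integer code.
encoded = [mapping[town] for town in towns]

print(mapping)
print(encoded)
```

Note that the integers are arbitrary identifiers: the fact that one town gets a larger number than another carries no meaning by itself.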
Let’s read the Titanic dataset. The dataset contains information about the passengers, such as age, gender, embark town, and whether the passenger survived. Let’s see which values the embark town column has.
import seaborn
from sklearn.preprocessing import LabelEncoder

df = seaborn.load_dataset("titanic")
print(df.embark_town.value_counts())
print(df.isnull().sum())
The df.embark_town.value_counts() call returns the number of passengers from each embark town, and df.isnull().sum() returns the number of null values in each column of the dataset.
The output will look like the following:
Southampton    644
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
So, there are three embark towns – Southampton, Cherbourg, and Queenstown. And there are two missing values in the embark town column.
Before we do label encoding, we need to fill in the missing values. Let’s fill in the missing values with “Southampton”, which …
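The fill-then-encode step can be sketched on a small hand-made column. This is only an illustration under stated assumptions: the Series below is a hypothetical stand-in for df.embark_town (not the real data), and the variable names are chosen for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for df.embark_town, with one missing value.
embark_town = pd.Series(
    ["Southampton", "Cherbourg", None, "Queenstown", "Southampton"]
)

# Fill the missing entry with "Southampton", as described in the text.
filled = embark_town.fillna("Southampton")

# Fit the encoder on the filled column and turn the labels into integers.
encoder = LabelEncoder()
codes = encoder.fit_transform(filled)

print(list(encoder.classes_))  # the distinct labels, in the order of their codes
print(codes.tolist())          # the integer code for each row
```

LabelEncoder stores the distinct labels in sorted order in classes_, and the code for each label is its position in that array, so encoder.inverse_transform(codes) recovers the original town names.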





