What is One-Hot Encoding?
Let’s say a column in a dataset contains categorical values, and that it takes three distinct values: “A”, “B”, and “C”. If we perform One-Hot Encoding on this column, three new columns are added to the dataset, one for each distinct value. Let’s call these columns “A”, “B”, and “C”. Rows that have “A” in the original column will have 1 in the “A” column and 0s in the “B” and “C” columns.
Similarly, rows that have “B” in the original column will have 1 in the “B” column of the One-Hot encoded table, and 0s in the “A” and “C” columns.
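To make the mapping concrete, here is a minimal sketch in plain Python. The helper name `one_hot` and the sample column are assumptions made up for illustration; they are not part of any library.

```python
def one_hot(values):
    """Return the sorted distinct categories and one 0/1 row per input value.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    # For each value, put 1 under its own category and 0 under the others.
    rows = [{cat: int(value == cat) for cat in categories} for value in values]
    return categories, rows


# Made-up sample column with the three values from the text.
column = ["A", "B", "C", "A"]
categories, encoded = one_hot(column)

for original, row in zip(column, encoded):
    print(original, row)
```

Each printed row has exactly one 1, in the column that matches the original value.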
If we look closely, we do not need all three columns in the One-Hot encoded table. Two columns are sufficient: as long as every row holds one of the three values, a row that has 0s in both the “A” and “B” columns must have “C” in the original column. So, adding only columns “A” and “B” is enough in this case.
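The two-column idea can be sketched the same way. This assumes a made-up sample column with no missing values; the variable names are illustrative.

```python
# Made-up sample column; every row holds one of "A", "B", or "C".
column = ["A", "B", "C", "A"]

# Keep only two of the three indicator columns.
kept = ["A", "B"]
reduced = [[int(value == cat) for cat in kept] for value in column]

# The dropped "C" indicator is implied: it is 1 exactly when both kept columns are 0.
recovered_c = [1 - sum(row) for row in reduced]

print(reduced)      # indicator rows for "A" and "B"
print(recovered_c)  # reconstructed "C" column
```

Nothing is lost by dropping the third column, since it can always be rebuilt from the other two.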
How to perform One-Hot Encoding using sklearn?
Let’s read the Titanic dataset. The dataset contains various information about the passengers, such as age, gender, embark town, …





