What is ordinal encoding?
A categorical variable contains categorical data, such as names, genders, or addresses. There are different types of categorical variables. A nominal categorical variable contains categories that have no natural order, so they cannot be ranked relative to each other. Names, addresses, and genders are examples: there is no meaningful way to say that one value comes before another.
But suppose a categorical variable contains categories that can be ranked relative to each other. For example, the weather can be described as cold, medium, or hot, and these values have a natural order. Similarly, a categorical variable that indicates the height of something, such as low, medium, or high, can be ranked. This type of categorical variable is called an ordinal variable.
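To make the difference concrete, here is a small sketch that uses a pandas ordered categorical to represent the weather example. The values and the variable name `weather` are made up for illustration; the point is that once the order cold < medium < hot is declared, sorting and comparisons follow that rank rather than alphabetical order.

```python
import pandas as pd

# Hypothetical ordinal variable: the categories have a meaningful order.
weather = pd.Series(
    pd.Categorical(
        ["hot", "cold", "medium", "cold"],
        categories=["cold", "medium", "hot"],  # listed from lowest to highest
        ordered=True,
    )
)

# Because the categorical is ordered, sorting follows the declared rank.
print(weather.sort_values().tolist())  # ['cold', 'cold', 'medium', 'hot']

# Comparisons also respect the rank: everything above "cold" is True.
print((weather > "cold").tolist())     # [True, False, True, False]
```

A nominal variable such as an address has no such order, so declaring `ordered=True` for it would not make sense.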
Most machine learning algorithms work with numbers and cannot process categorical values directly. So, if a column in a dataset contains categorical values, we need to encode those values, that is, convert them into numbers.
When a column in a dataset contains ordinal values, we use ordinal encoding, which maps each category to an integer so that the numbers preserve the order of the categories.
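Before turning to a library, the idea can be sketched in plain Python. This is a hand-written illustration, not how sklearn implements it: the `order` list is an assumed ranking for the height example, and each category is encoded as its position in that list.

```python
# Hypothetical ranking for the height example, from lowest to highest.
order = ["low", "medium", "high"]

# Ordinal encoding: each category becomes its position in the ranking.
mapping = {category: rank for rank, category in enumerate(order)}

heights = ["medium", "high", "low", "medium"]
encoded = [mapping[h] for h in heights]

print(mapping)  # {'low': 0, 'medium': 1, 'high': 2}
print(encoded)  # [1, 2, 0, 1]
```

Because "low" < "medium" < "high" becomes 0 < 1 < 2, a model can use the numeric order as a stand-in for the original ranking.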
How to perform ordinal encoding using sklearn?
We can use the OrdinalEncoder class from the sklearn.preprocessing module to perform ordinal encoding. For example, let’s load the “exercise” dataset, which is available through seaborn’s load_dataset() function. The dataset contains various information, such as diet, pulse, and time of exercise.
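A minimal sketch of OrdinalEncoder on a toy weather column (the data is hypothetical). It shows one detail worth knowing: with the default `categories="auto"`, the encoder determines the categories from the sorted unique values, so strings end up in alphabetical order, which may not match their real ranking.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# OrdinalEncoder expects 2-D input: one row per sample, one column per feature.
X = np.array([["cold"], ["hot"], ["medium"], ["cold"]])

encoder = OrdinalEncoder()          # categories="auto" by default
encoded = encoder.fit_transform(X)  # returns a float array by default

# Categories are sorted alphabetically, so "hot" lands between "cold" and "medium".
print(encoder.categories_[0].tolist())  # ['cold', 'hot', 'medium']
print(encoded.ravel().tolist())         # [0.0, 1.0, 2.0, 0.0]
```

Here "hot" is encoded as 1 and "medium" as 2, which reverses their real order. To keep the true ranking, pass the order explicitly through the `categories` parameter.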
If we look at the diet column, we will see that it contains two different categorical values: no fat and low fat. These values can be ranked in ascending or descending order (for example, by fat content), so the data is ordinal and we can use an ordinal encoder here.
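The following sketch passes an explicit order to OrdinalEncoder for a diet column. The small DataFrame is a hypothetical stand-in, not the real exercise data, and ranking "no fat" below "low fat" (ascending fat content) is an assumption; the reverse order would be equally valid.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the diet column (a few rows, not the real dataset).
df = pd.DataFrame({"diet": ["low fat", "no fat", "no fat", "low fat"]})

# Assumed ranking by fat content: "no fat" -> 0, "low fat" -> 1.
# One list of categories is given per input column.
encoder = OrdinalEncoder(categories=[["no fat", "low fat"]])

# df[["diet"]] keeps the input 2-D; ravel() flattens the (n, 1) result.
df["diet_encoded"] = encoder.fit_transform(df[["diet"]]).ravel()

print(df["diet_encoded"].tolist())  # [1.0, 0.0, 0.0, 1.0]
```

Listing the categories explicitly makes the encoding independent of alphabetical order, so the numbers reflect the ranking we chose.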
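import seaborn
from sklearn.preprocessing import OrdinalEncoder

# Load the "exercise" sample dataset (seaborn downloads it on first use)
df = seaborn.load_dataset("exercise")
print(df.head())

# Count how many rows contain each diet value
print(df.diet.value_counts())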
As we can see, 45 rows contain “no fat” and 45 rows contain “low fat” in the diet column…