In frequency encoding, each value in a categorical column is replaced with the number of times it occurs in that column (its count) or with its relative frequency. For example, let’s say a categorical column has 10 rows. The value “A” occurs 4 times, “B” occurs 3 times, “C” occurs 2 times, and “D” occurs 1 time. So, we can replace A with 4, B with 3, C with 2, and D with 1. We can also divide these counts by the total number of rows and replace A, B, C, and D with 0.4, 0.3, 0.2, and 0.1, respectively.
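The arithmetic in this example can be checked with a few lines of plain Python. This is only a minimal sketch: the list `column` is a made-up stand-in for the 10-row column described above, and the variable names are chosen for illustration.

```python
from collections import Counter

# Hypothetical 10-row column: A occurs 4 times, B 3 times, C 2 times, D once.
column = ["A", "B", "A", "C", "A", "B", "D", "A", "C", "B"]

counts = Counter(column)   # maps each category to its count
total = len(column)        # total number of rows

# Count encoding: replace each value with how often it occurs.
count_encoded = [counts[value] for value in column]

# Frequency encoding: replace each value with count / total rows.
freq_encoded = [counts[value] / total for value in column]

print(count_encoded)
print(freq_encoded)
```

Every “A” becomes 4 (or 0.4), every “B” becomes 3 (or 0.3), and so on, which matches the numbers in the example.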
Let’s apply this to the Titanic dataset by encoding its embark_town column with frequency encoding. We can use the following Python code to replace each categorical value with its count.
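import seaborn

df = seaborn.load_dataset("titanic")
# Drop every row that has a missing value in any column
df.dropna(inplace=True)
# Map each embark_town category to the number of rows it appears in
value_counts = df["embark_town"].value_counts().to_dict()
print(value_counts)
df["embark_town"] = df["embark_town"].map(value_counts)
print(df.head())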
Here, we first compute the count of each category in the embark_town column and convert the result to a dictionary. We then replace each value in the column with its count using the pandas map() method.
The output of the above program will be:
{'Southampton': 115, 'Cherbourg': 65, 'Queenstown': 2}
    survived  pclass     sex   age  ...  deck  embark_town  alive  alone
1          1       1  female  38.0  ...     C           65    yes  False
3          1       1  female  35.0  ...     C          115    yes  False
6          0       1    male  54.0  ...     E          115     no   True
10         1       3  female   4.0  ...     G          115    yes  False
11         1       1  female  58.0  ...     C          115    yes   True

[5 rows x 15 columns]
We can also divide these value counts by the total number of rows of the dataset and use the frequency to replace each value of …
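One way to sketch this relative-frequency variant is with the normalize=True option of value_counts(), which returns proportions instead of raw counts. The small DataFrame below is made-up data standing in for the embark_town column, and the new column name embark_town_freq is an assumption for illustration, not something taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the embark_town column.
df = pd.DataFrame(
    {
        "embark_town": [
            "Southampton",
            "Cherbourg",
            "Southampton",
            "Queenstown",
            "Southampton",
        ]
    }
)

# normalize=True divides each count by the number of non-missing rows.
frequencies = df["embark_town"].value_counts(normalize=True).to_dict()

# Write the encoded values to a new column so the original labels are kept.
df["embark_town_freq"] = df["embark_town"].map(frequencies)

print(frequencies)
print(df)
```

In this toy data, Cherbourg and Queenstown each appear once, so both are encoded as the same number. Categories that share a frequency can no longer be told apart after encoding.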





