What is equal-width discretization?
Suppose a column in a dataset contains continuous numerical values, such as age, weight, or price, and we want to group those values into a small number of discrete intervals. This process of converting continuous numerical values into discrete intervals is known as discretization.
There are different methods for discretization. In this article, we will discuss equal-width discretization.
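In equal-width discretization, we split the range of the values into intervals that all have the same width. With k intervals, each one has a width of (max - min) / k, where min and max are the smallest and largest values in the column. The intervals are also known as bins.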
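The idea can be sketched by hand before reaching for a library. The snippet below is a minimal illustration, not a reference implementation: the helper name equal_width_edges and the sample ages are made up for this example, and NumPy is assumed to be installed.

```python
import numpy as np

def equal_width_edges(values, n_bins):
    """Return n_bins + 1 evenly spaced edges from min(values) to max(values)."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_bins + 1)

# Made-up sample data for illustration only.
ages = [18, 22, 25, 31, 40, 47, 52, 60, 78]

edges = equal_width_edges(ages, 3)  # each bin is (78 - 18) / 3 = 20 wide

# Assign every value a bin index using only the inner edges.
# right=True makes each bin closed on its right edge.
bin_index = np.digitize(ages, edges[1:-1], right=True)

print(edges)
print(bin_index)
```

Every bin covers the same span of values, but nothing forces the bins to hold the same number of values.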
How to perform equal-width discretization using Python pandas?
We can use the pandas.cut() function to discretize a numerical variable into equal-width bins. For example, let’s load the diamonds dataset and discretize the values in its price column. We can use the following Python code for that purpose:
import seaborn
import pandas

df = seaborn.load_dataset("diamonds")
df["price"] = pandas.cut(x=df["price"], bins=3, labels=["low", "medium", "high"])
print(df.price.value_counts())
Here, the x parameter of the cut() function is the data to discretize, in this case the price column. The bins=3 parameter splits the range of the prices into 3 equal-width bins. Note that the bins have equal width, not an equal number of values. The labels parameter gives a name to each bin, in order from the lowest interval to the highest.
The value_counts() method counts the number of values in each interval. The output of the above program will be:
low       43591
medium     7347
high       3002
Name: price, dtype: int64
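To see which value ranges the labels stand for, cut() can also return the bin edges it computed when retbins=True is passed. The sketch below uses a small made-up Series instead of the diamonds data so that it runs without downloading anything; the variable names are illustrative.

```python
import pandas as pd

# Made-up prices for illustration only (not taken from the diamonds dataset).
prices = pd.Series([326, 950, 2400, 5300, 9800, 18823])

# retbins=True returns the computed bin edges alongside the binned values.
binned, edges = pd.cut(
    prices,
    bins=3,
    labels=["low", "medium", "high"],
    retbins=True,
)

print(binned.tolist())
print(edges)  # 4 edges delimit 3 bins
```

With an integer bins argument, pandas places the first edge slightly below the minimum so that the smallest value falls inside the first bin, which is why that bin comes out marginally wider than the others.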





