What is standardization in machine learning?
Standardization is a feature scaling technique in machine learning. Let’s say a dataset includes two columns, one for salary and one for age. The salary column stores numbers in the thousands, while the age column holds two-digit numbers. If we fit a linear model, particularly one that uses regularization or gradient descent, the larger range of the salary column may dominate the smaller range of the age column. Feature scaling is used to address this problem.
When we standardize a column of a dataset, the transformed values have a mean of 0 and a standard deviation of 1. In other words, the data is centered at zero and rescaled to unit standard deviation.
How to standardize data in a column of a dataset?
Let’s say x is a number in a column of the dataset, μ is the mean of the data in that column, and σ is its standard deviation. After standardization, x becomes x1, where
x1 = (x − μ) / σ
In other words, we take each number in a column and subtract the column’s mean from it. Then we divide the result by the column’s standard deviation. After standardization, the new data will have a mean of 0 and a standard deviation of 1.
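To make the formula concrete, here is a minimal sketch in plain Python. The function name standardize and the sample ages are made up for illustration, and the sketch assumes the population standard deviation (dividing by n), which is also what sklearn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return (x - mean) / std for every x in values."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        # A constant column has no spread, so the division is undefined.
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in values]


# Hypothetical ages, chosen only for illustration.
ages = [22, 38, 26, 35, 54]
scaled = standardize(ages)

print(scaled)
print(fmean(scaled), pstdev(scaled))  # close to 0 and 1, up to rounding
```

The value 35 equals the mean of the sample, so it maps exactly to 0. Values below the mean become negative and values above it become positive.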
How to standardize data using sklearn?
Let’s read the Titanic dataset. Let’s say we want to standardize the age column of the dataset. We can use the following Python code for that purpose:
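A minimal sketch of this step with sklearn's StandardScaler might look like the following. The small DataFrame stands in for the Titanic data so that the example runs on its own. The column name age, the new column name age_scaled, and the file path in the comment are assumptions, not details taken from the dataset itself.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for the Titanic data: a few made-up rows with an "age" column.
# With a local copy of the dataset you might instead read it with
# pd.read_csv("titanic.csv")  (hypothetical path).
df = pd.DataFrame({"age": [22.0, 38.0, 26.0, 35.0, 54.0, np.nan]})

scaler = StandardScaler()

# fit_transform expects 2-D input, hence the double brackets.
# Missing ages are ignored when fitting and stay NaN in the output.
df["age_scaled"] = scaler.fit_transform(df[["age"]]).ravel()

print(scaler.mean_, scaler.scale_)  # learned mean and standard deviation
print(df)
```

The fitted scaler keeps the mean and standard deviation it learned, so the same scaling can later be applied to new data with scaler.transform. Missing ages are not filled in by the scaler. They remain missing and have to be handled separately if the model requires complete data.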





