In vector unit length scaling, each sample (a row of feature values) is divided by its Manhattan length (the l1 norm) or by its Euclidean length (the l2 norm), so that the scaled row has a norm of 1. We can use the Normalizer class from the sklearn.preprocessing module to perform vector unit length scaling.
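To make the arithmetic concrete, here is a minimal sketch in plain Python that scales a single sample by hand. The values in sample are made up for illustration, and the variable names are not part of scikit-learn.

```python
# Hand-rolled unit length scaling for one sample (illustrative values only).
sample = [3.0, -4.0]

# l1 norm (Manhattan length): sum of the absolute values.
l1_norm = sum(abs(x) for x in sample)

# l2 norm (Euclidean length): square root of the sum of squares.
l2_norm = sum(x * x for x in sample) ** 0.5

# Divide every entry by the chosen norm.
l1_scaled = [x / l1_norm for x in sample]
l2_scaled = [x / l2_norm for x in sample]

print(l1_scaled)
print(l2_scaled)
```

After l1 scaling the absolute values of the entries add up to 1, and after l2 scaling the squares of the entries add up to 1.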
Let’s look at an example. Let’s load the titanic dataset through seaborn. Its age, fare, and pclass columns hold each passenger’s age, ticket fare, and passenger class. We can use the following Python code to perform vector unit length scaling on these three columns.
import seaborn
from sklearn.preprocessing import Normalizer

# Load the titanic dataset and drop every row that has a missing value
df = seaborn.load_dataset("titanic")
df.dropna(inplace=True)

# Scale each row of the selected columns to unit l1 norm
normalizer = Normalizer(norm="l1")
df[["age", "fare", "pclass"]] = normalizer.fit_transform(df[["age", "fare", "pclass"]])
print(df.head())
The norm="l1" parameter in the Normalizer() constructor indicates that the l1 norm is used to normalize each non-zero sample, that is, each row is divided by the sum of the absolute values of its entries. Please note that we need to remove or fill in the missing values before performing vector unit length scaling. For simplicity, we use the dropna() method here to drop every row that contains a missing value in any column.
The output of the above program will be:
    survived    pclass     sex       age  ... deck  embark_town alive  alone
1          1  0.009068  female  0.344567  ...    C    Cherbourg   yes  False
3          1  0.011223  female  0.392817  ...    C  Southampton   yes  False
6          0  0.009358    male  0.505322  ...    E  Southampton    no   True
10         1  0.126582  female  0.168776  ...    G  Southampton   yes  False
11         1  0.011689  female  0.677966  ...    C  Southampton   yes   True

[5 rows x 15 columns]
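Missing values can also be filled in rather than dropped. The following is a minimal sketch of that option, assuming a median fill is acceptable for the data at hand. The toy DataFrame and its values are made up as a small stand-in for the titanic columns, so the snippet runs without downloading the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

# Made-up stand-in for the titanic columns; one age is missing.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 8.05],
    "pclass": [3, 1, 3],
})

# Fill the gap with the column median instead of dropping the row.
toy["age"] = toy["age"].fillna(toy["age"].median())

# Scale each row of the selected columns to unit l1 norm.
scaled = Normalizer(norm="l1").fit_transform(toy[["age", "fare", "pclass"]])

# Sanity check: the absolute values in every row should now sum to 1.
row_sums = np.abs(scaled).sum(axis=1)
print(row_sums)
```

Filling keeps all three rows, whereas dropna() would have discarded the second one. Which choice is better depends on how many values are missing and how the scaled data will be used.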
We can use the following Python code to use the l2 norm to normalize the data…
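As a minimal sketch of the l2 case, assuming only the norm argument changes from the l1 example, the snippet below uses a small made-up array X instead of the titanic data:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Made-up samples; each row is one sample. The last row is all zeros.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 0.0]])

# Scale each row to unit l2 (Euclidean) norm.
X_l2 = Normalizer(norm="l2").fit_transform(X)

# Euclidean length of each scaled row.
row_lengths = np.linalg.norm(X_l2, axis=1)

print(X_l2)
print(row_lengths)
```

Each non-zero row ends up with a Euclidean length of 1, while the all-zero row is left unchanged because it has no length to divide by.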





