['Cherbourg' 'Queenstown' 'Southampton']
Here we add three columns, named “Cherbourg”, “Queenstown”, and “Southampton”, to the dataset. The output of the above program is:
<class 'scipy.sparse._csr.csr_matrix'>
[[0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 ...
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]]
['Cherbourg' 'Queenstown' 'Southampton']
   survived  pclass     sex   age  ...  alone  Cherbourg  Queenstown  Southampton
0         0       3    male  22.0  ...  False        0.0         0.0          1.0
1         1       1  female  38.0  ...  False        1.0         0.0          0.0
2         1       3  female  26.0  ...   True        0.0         0.0          1.0
3         1       1  female  35.0  ...  False        0.0         0.0          1.0
4         0       3    male  35.0  ...   True        0.0         0.0          1.0
[5 rows x 18 columns]
As the output of the print(df.head()) statement shows, the three columns mentioned above have been added to the original dataset.
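To make the mapping between categories and columns easier to inspect, here is a minimal sketch on a small hand-made column. The `towns` DataFrame and its four rows are hypothetical stand-ins for `df["embark_town"]`, not data from the Titanic dataset, and the variable names `dense`, `labels`, and `encoded` are chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column standing in for df["embark_town"].
towns = pd.DataFrame(
    {"embark_town": ["Southampton", "Cherbourg", "Southampton", "Queenstown"]}
)

one_hot_encoder = OneHotEncoder()
transformed = one_hot_encoder.fit_transform(towns[["embark_town"]])

# The encoder returns a sparse matrix by default; convert it to a dense array.
dense = transformed.toarray()

# categories_[0] holds the categories of the first (and only) input column,
# in sorted order; they serve as the names of the new columns.
labels = list(one_hot_encoder.categories_[0])

encoded = pd.DataFrame(dense, columns=labels, index=towns.index)
print(encoded)

# Each row has exactly one 1.0, in the column of that row's category.
print(encoded.sum(axis=1).tolist())
```

Because the categories are sorted, the column order does not depend on the order in which the towns appear in the data.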
Note that if we want to drop the first column from the transformed dataset, we need to initialize the one-hot encoder in the following way:
one_hot_encoder = OneHotEncoder(drop="first")
The complete Python code, in that case, is given below:
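import numpy
import seaborn
from sklearn.preprocessing import OneHotEncoder

df = seaborn.load_dataset("titanic")
# Assign the filled column back; fillna(..., inplace=True) on a selected column warns in recent pandas versions.
df["embark_town"] = df["embark_town"].fillna("Southampton")

# drop="first" omits the column for the first category.
one_hot_encoder = OneHotEncoder(drop="first")
transformed = one_hot_encoder.fit_transform(df[["embark_town"]])
print(one_hot_encoder.categories_[0])

# Name the two remaining columns after the second and third categories.
df[numpy.array([one_hot_encoder.categories_[0][1], one_hot_encoder.categories_[0][2]])] = transformed.toarray()
print(df.head())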
Note that this time we add only two columns to the dataset. The names of the columns are given by …
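The effect of drop="first" can be checked on a small example. The sketch below assumes a hypothetical four-row `toy` DataFrame in place of the Titanic data, and the names `kept_columns` and `result` are introduced only for illustration. It shows that `categories_` still lists all three towns, while the encoded matrix has one column fewer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the Titanic "embark_town" column.
toy = pd.DataFrame(
    {"embark_town": ["Southampton", "Cherbourg", "Queenstown", "Southampton"]}
)

encoder = OneHotEncoder(drop="first")
encoded = encoder.fit_transform(toy[["embark_town"]])  # sparse matrix by default

# categories_ still contains every category seen during fitting.
all_categories = encoder.categories_[0]

# With drop="first", the category at index 0 has no column of its own,
# so the remaining columns are named after the categories from index 1 on.
kept_columns = list(all_categories[1:])

result = pd.concat(
    [toy, pd.DataFrame(encoded.toarray(), columns=kept_columns, index=toy.index)],
    axis=1,
)
print(result)
```

A row whose remaining columns are all 0.0 belongs to the dropped category, so no information is lost by removing that column.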







































