The fit_transform() method learns the categories from the data and then encodes the data, all in a single call.
Please note that fit_transform() returns a sparse matrix of type <class 'scipy.sparse._csr.csr_matrix'>. We can convert this matrix into a dense array with toarray() and print it.
So, the output of the above program will be:
Value counts before filling missing values:
Southampton    644
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64
Value counts after filling missing values:
Southampton    646
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64
<class 'scipy.sparse._csr.csr_matrix'>
[[0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 ...
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]]
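To make the two points above concrete, here is a minimal sketch on a tiny hand-made column. The `towns` array is an illustrative stand-in for `df[["embark_town"]]`, not the real Titanic data. It shows that calling fit() and then transform() should give the same encoding as a single fit_transform() call, and that the sparse result can be converted to a dense array.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Toy single-column input (assumed values, standing in for df[["embark_town"]]).
towns = np.array([["Southampton"], ["Cherbourg"], ["Southampton"], ["Queenstown"]])

# One call: learn the categories and encode the data.
combined = OneHotEncoder().fit_transform(towns)

# Two calls: fit() learns the categories, transform() does the encoding.
encoder = OneHotEncoder()
encoder.fit(towns)
separate = encoder.transform(towns)

# With default settings the result is a SciPy sparse matrix.
print(sparse.issparse(combined))

# toarray() converts it to a dense NumPy array, one column per category.
dense = combined.toarray()
print(dense)

# Both routes produce the same encoding.
print(np.array_equal(dense, separate.toarray()))
```

The columns follow the order of `encoder.categories_[0]`, which is sorted, so each row has a 1 in the position of its own town.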
Now, let's say we want to add these values as three different columns in the dataset. We can do so in the following way:
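import seaborn
from sklearn.preprocessing import OneHotEncoder
# Load the Titanic dataset bundled with seaborn
df = seaborn.load_dataset("titanic")
# Fill missing values; assigning back avoids calling inplace=True on a column selection
df["embark_town"] = df["embark_town"].fillna("Southampton")
# Learn the categories and one-hot encode the column
one_hot_encoder = OneHotEncoder()
transformed_df = one_hot_encoder.fit_transform(df[["embark_town"]])
print(type(transformed_df))
print(transformed_df.toarray())
# The learned category names, in the same order as the encoded columns
print(one_hot_encoder.categories_[0])
# Add one new column per category to the original DataFrame
df[one_hot_encoder.categories_[0]] = transformed_df.toarray()
print(df.head())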
Now, print(one_hot_encoder.categories_[0]) will print the following: …







































