whether they survived, etc. Let’s perform One-Hot Encoding on the embark town column.
Let’s first read the dataset, fill in the missing values in the embark town column with the most frequent value of the column, and then print the number of passengers from each embark town.
import seaborn
# Load the Titanic dataset that ships with seaborn.
df = seaborn.load_dataset("titanic")
print("Value counts before filling missing values: \n", df.embark_town.value_counts())
# Assign the result back to the column; calling fillna(inplace=True) on a
# selected column may not update df in recent pandas versions.
df["embark_town"] = df["embark_town"].fillna("Southampton")
print("Value counts after filling missing values: \n", df.embark_town.value_counts())
Here, we fill in the missing values in the embark town column with “Southampton”, which is the most frequent value in the column. The output of value_counts() before and after filling in the missing values looks like the following:
Value counts before filling missing values:
Southampton    644
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64
Value counts after filling missing values:
Southampton    646
Cherbourg      168
Queenstown      77
Name: embark_town, dtype: int64
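The snippet above hardcodes “Southampton” as the fill value. As a minimal sketch of an alternative, the most frequent value can be looked up with mode() instead. The small `towns` Series below is a made-up stand-in for the embark town column, not the real Titanic data.

```python
import pandas as pd

# Hypothetical stand-in for the embark_town column (not the real dataset).
towns = pd.Series(
    ["Southampton", "Cherbourg", None, "Southampton", "Queenstown", None],
    name="embark_town",
)

# mode() ignores missing values and returns the most frequent value(s);
# [0] picks the first one.
most_frequent = towns.mode()[0]

# fillna() returns a new Series, so keep the result instead of using inplace=True.
filled = towns.fillna(most_frequent)

# value_counts() also skips missing values by default.
counts_before = towns.value_counts()
counts_after = filled.value_counts()

print("Most frequent value:", most_frequent)
print(counts_after)
```

Looking the value up this way keeps the code correct if the data changes, at the cost of one extra line.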
Now, let’s do the One-Hot Encoding.
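import seaborn
from sklearn.preprocessing import OneHotEncoder
df = seaborn.load_dataset("titanic")
print("Value counts before filling missing values: \n", df.embark_town.value_counts())
# Assign the result back to the column instead of using inplace=True.
df["embark_town"] = df["embark_town"].fillna("Southampton")
print("Value counts after filling missing values: \n", df.embark_town.value_counts())
one_hot_encoder = OneHotEncoder()
# The encoder expects 2D input, hence the double brackets around the column name.
transformed_df = one_hot_encoder.fit_transform(df[["embark_town"]])
# By default the result is a SciPy sparse matrix, not a DataFrame.
print(type(transformed_df))
# Convert to a dense NumPy array to see the 0/1 columns.
print(transformed_df.toarray())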
Here, we are using the OneHotEncoder class from the sklearn.preprocessing module to perform the One-Hot Encoding. The …







































