  species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1  ...              181.0       3750.0    Male
1  Adelie  Torgersen            39.5  ...              186.0       3800.0  Female
2  Adelie  Torgersen            40.3  ...              195.0       3250.0  Female
3  Adelie  Torgersen             NaN  ...                NaN          NaN     NaN
4  Adelie  Torgersen            36.7  ...              193.0       3450.0  Female

[5 rows x 7 columns]
species               0
island                0
bill_length_mm        2
bill_depth_mm         2
flipper_length_mm     2
body_mass_g           2
sex                  11
dtype: int64
bill_length_mm       0
bill_depth_mm        0
flipper_length_mm    0
body_mass_g          0
species              0
dtype: int64
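As we can see, the raw dataset has missing values: two in each of the four numeric columns and 11 in the sex column. After dropping the island and sex columns and filling each numeric column with its median, no column has any missing values. A machine learning model works with numbers, so we also need to label encode the categorical species column.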
import seaborn
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import LabelEncoder
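# Load the Palmer Penguins example dataset through seaborn
df = seaborn.load_dataset("penguins")
print(df.head())
# Count the missing values in each column
print(df.isnull().sum())
# Drop the categorical columns that are not used as features
df.drop(labels=["island", "sex"], axis=1, inplace=True)
# Keep the four numeric features, put the target (species) last, and make an explicit copy
df = df[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "species"]].copy()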
# Replace the missing values in each numeric feature with that column's median.
# Assigning the result back avoids chained fillna(..., inplace=True), which recent
# pandas versions warn about and which does not update df under Copy-on-Write.
for column in ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]:
    df[column] = df[column].fillna(df[column].median())
# Confirm that no missing values remain
print(df.isnull().sum())
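# Encode the species names as integers; classes are sorted alphabetically,
# so Adelie=0, Chinstrap=1, Gentoo=2
label_encoder = LabelEncoder()
df["species"] = label_encoder.fit_transform(df["species"])
print(df.head())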
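To make the two preprocessing steps concrete, here is a minimal sketch on a tiny hypothetical DataFrame (the values are made up for illustration, not taken from the penguins data). It shows that each NaN is replaced by the median of the observed values in its column, and that LabelEncoder assigns integer codes in sorted order of the class names, which `classes_` and `inverse_transform` let you map back.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data (not the penguins dataset): one NaN in each numeric column
toy = pd.DataFrame({
    "bill_length_mm": [39.1, np.nan, 46.5, 50.0],
    "body_mass_g": [3750.0, 3800.0, np.nan, 5700.0],
    "species": ["Gentoo", "Adelie", "Chinstrap", "Gentoo"],
})

# Median imputation: each NaN becomes the median of the observed values in its column
for column in ["bill_length_mm", "body_mass_g"]:
    toy[column] = toy[column].fillna(toy[column].median())

# LabelEncoder sorts the distinct class names and maps them to 0..n_classes-1
encoder = LabelEncoder()
toy["species"] = encoder.fit_transform(toy["species"])

print(toy)
print(encoder.classes_.tolist())                  # ['Adelie', 'Chinstrap', 'Gentoo']
print(encoder.inverse_transform([0, 2]).tolist())  # ['Adelie', 'Gentoo']
```

Keeping the fitted encoder around is useful later: predictions made on the encoded target can be turned back into species names with `inverse_transform`.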
Now, we will select the most relevant features using Recursive Feature Elimination (RFE). We can use the following Python code for that purpose: …
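As a hedged illustration of what RFE does (not the code referred to above), here is a minimal sketch using scikit-learn's `RFE` wrapped around a `LinearRegression` estimator, matching the imports in the earlier code. RFE fits the estimator, ranks the features by the magnitude of their coefficients, removes the weakest one (`step=1`), and repeats until `n_features_to_select` remain. The data is synthetic, and the feature names `feature_a` to `feature_d` and the choice of keeping two features are assumptions for illustration; by construction, only `feature_a` and `feature_c` influence the target.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: four unit-variance features, target driven by a and c only
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["feature_a", "feature_b", "feature_c", "feature_d"],
)
y = 3.0 * X["feature_a"] - 2.0 * X["feature_c"] + rng.normal(scale=0.1, size=200)

# Repeatedly fit, drop the feature with the smallest |coefficient|, and refit
# until two features remain
selector = RFE(estimator=LinearRegression(), n_features_to_select=2, step=1)
selector.fit(X, y)

# support_ marks the kept features; ranking_ is 1 for kept ones, higher for earlier drops
selected = X.columns[selector.support_].tolist()
print(selected)  # ['feature_a', 'feature_c']
print(dict(zip(X.columns, selector.ranking_.tolist())))
```

Because RFE with a linear model ranks features by coefficient magnitude, features measured on very different scales (body mass in grams versus bill depth in millimetres, for example) are usually standardized first. For a categorical target such as the encoded species column, a classifier that exposes `coef_`, such as `LogisticRegression`, can be wrapped in the same way.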