```python
import seaborn
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import LabelEncoder

# Load the penguins dataset and check for missing values
df = seaborn.load_dataset("penguins")
print(df.head())
print(df.isnull().sum())

# Keep the four numeric measurements, with the species label as the last column
df.drop(labels=["island", "sex"], axis=1, inplace=True)
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
df = df[features + ["species"]]

# Fill missing measurements with each column's median
# (assigning the result back avoids chained in-place fillna, which newer pandas warns about)
for column in features:
    df[column] = df[column].fillna(df[column].median())
print(df.isnull().sum())

# Encode the species names as integers
label_encoder = LabelEncoder()
df["species"] = label_encoder.fit_transform(df["species"])
print(df.head())

# Select 3 of the 4 features, eliminating one feature per iteration
linear_regressor = LinearRegression()
rfe = RFE(estimator=linear_regressor, n_features_to_select=3, step=1)
rfe.fit(df[features], df["species"])
selected_features = rfe.get_support(indices=True)
print("Selected Features: ", selected_features)

# The indices refer to positions in `features`, which are also df's first four columns;
# .copy() makes df2 independent, so adding the label column does not trigger SettingWithCopyWarning
df2 = df[df.columns[selected_features]].copy()
df2["species"] = df["species"]
print(df2.head())
```
Please note that after the label encoding, we are using the RFE class from the sklearn.feature_selection module, and we pass a linear regressor to the RFE() constructor. RFE fits this regressor on the current set of features, ranks the features by the magnitude of the fitted coefficients (the regressor's coef_ attribute), removes the lowest-ranked feature, and repeats until the requested number of features remains. The n_features_to_select parameter sets how many features to keep (three here), and step=1 means that one feature is eliminated at each iteration.
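The elimination loop that RFE performs with step=1 can be sketched by hand. The function manual_rfe below is a hypothetical helper written only for illustration, not part of scikit-learn: it refits a LinearRegression on the remaining columns and drops the column with the smallest absolute coefficient until n_features_to_select columns remain. It runs on synthetic data (an assumption, not the penguins data), where column 1 has no effect on the target, and compares its choice with scikit-learn's RFE.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def manual_rfe(X, y, n_features_to_select):
    """Hypothetical helper: recursive feature elimination with step=1, ranking by |coef_|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        model = LinearRegression().fit(X[:, remaining], y)
        # Position (within `remaining`) of the feature with the smallest |coefficient|
        weakest = int(np.argmin(np.abs(model.coef_)))
        remaining.pop(weakest)
    return np.array(sorted(remaining))


# Synthetic data (assumption): column 1 does not influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

manual = manual_rfe(X, y, n_features_to_select=3)
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1).fit(X, y)
print(manual)                         # [0 2 3]
print(rfe.get_support(indices=True))  # [0 2 3]
```

With a larger step, scikit-learn removes that many of the weakest features per refit instead of one, which is faster but coarser.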
The following Python statement gives us the indices of the selected features.
```python
selected_features = rfe.get_support(indices=True)
```
We are then creating another DataFrame with the selected features and the column with the output labels.
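```python
# .copy() makes df2 an independent DataFrame, so adding a column does not trigger SettingWithCopyWarning
df2 = df[df.columns[selected_features]].copy()
df2["species"] = df["species"]
```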
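Which three columns end up in df2 also depends on the units of the measurements. Multiplying a feature by a constant divides its linear-regression coefficient by that constant, so RFE's coefficient-based ranking favors features recorded on small numeric scales. In the penguins data, body_mass_g (grams, in the thousands) is on a much larger scale than the millimetre columns. The sketch below uses synthetic data (an assumption; it does not reproduce the penguins results) to show the effect. It also shows standardizing with StandardScaler before RFE as one common remedy, which the program above does not do. kept() is a hypothetical helper for this illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): four standard-normal signals with different
# true effects on y; feature 2 has the weakest effect (0.2)
rng = np.random.default_rng(42)
Z = rng.normal(size=(200, 4))
y = 2 * Z[:, 0] + 1 * Z[:, 1] + 0.2 * Z[:, 2] + 1 * Z[:, 3] + rng.normal(scale=0.1, size=200)

# Record feature 3 on a 1000x larger numeric scale, as grams would be relative to kilograms
X_raw = Z.copy()
X_raw[:, 3] *= 1000


def kept(X):
    """Hypothetical helper: indices RFE keeps when selecting 3 of 4 features."""
    rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1).fit(X, y)
    return rfe.get_support(indices=True).tolist()


print(kept(X_raw))                                  # [0, 1, 2]: feature 3 dropped
print(kept(StandardScaler().fit_transform(X_raw)))  # [0, 1, 3]: feature 2 dropped
```

On the raw data, feature 3 is eliminated only because its coefficient is tiny (about 0.001). After standardization, the coefficients reflect effect sizes per standard deviation, and the genuinely weakest feature is removed instead.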
The output of the above program will be: …