threshold value is 0.8.
Next, we create a new DataFrame that contains only the selected features from the original DataFrame.
The output of the above program will be:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   species            344 non-null    object
 1   island             344 non-null    object
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
Selected Features: {'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm'}
   bill_length_mm  bill_depth_mm  flipper_length_mm
0            39.1           18.7              181.0
1            39.5           17.4              186.0
2            40.3           18.0              195.0
3             NaN            NaN                NaN
4            36.7           19.3              193.0
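As we can see, body_mass_g is missing from the selected features. Its correlation with a feature that was already selected exceeds the 0.8 threshold (in this dataset, flipper length and body mass are the strongly correlated pair), so only one feature of that pair is kept.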
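The selection step described above can be sketched as follows. This is a minimal illustration and not necessarily the program that produced the output above: the helper name `select_uncorrelated` and the small synthetic DataFrame `toy` are assumptions made for the example. The helper walks the numeric columns in order and keeps a column only if its absolute correlation with every column already kept does not exceed the threshold.

```python
import numpy as np
import pandas as pd


def select_uncorrelated(df, threshold=0.8):
    """Hypothetical helper: greedily keep numeric columns whose absolute
    Pearson correlation with every already-kept column is <= threshold."""
    # Non-numeric columns (e.g. species, sex) are ignored for correlation.
    corr = df.select_dtypes(include="number").corr().abs()
    selected = []
    for col in corr.columns:
        if all(corr.loc[col, kept] <= threshold for kept in selected):
            selected.append(col)
    return selected


# Synthetic data (an assumption for illustration, not the penguins dataset):
# "a" and "b" are independent, "a_copy" is almost a rescaled copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": b,
    "a_copy": 2.0 * a + 0.01 * rng.normal(size=200),
    "label": ["x"] * 200,
})

selected = select_uncorrelated(toy, threshold=0.8)

# Build the reduced DataFrame from the selected column names only.
reduced = toy[selected]
print("Selected Features:", selected)
print(reduced.head())
```

Because the loop keeps the first column of each highly correlated pair, the result depends on column order; with a different order, the other member of the pair would be kept instead.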