threshold value is 0.8.
After that, we create a new DataFrame that contains only the selected features from the original DataFrame.
The output of the above program will be:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   species            344 non-null    object
 1   island             344 non-null    object
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
Selected Features:
{'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm'}
   bill_length_mm  bill_depth_mm  flipper_length_mm
0            39.1           18.7              181.0
1            39.5           17.4              186.0
2            40.3           18.0              195.0
3             NaN            NaN                NaN
4            36.7           19.3              193.0
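As we can see, body_mass_g is missing from the selected features. Flipper length and body mass are strongly correlated (their correlation is above the 0.8 threshold), so only one of the two, flipper_length_mm, is kept.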
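
The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the exact program used in this tutorial: the helper name `select_uncorrelated` is hypothetical, and the data is a small synthetic DataFrame rather than the penguins dataset. A column is kept unless its absolute correlation with an already kept column exceeds the threshold.

```python
import numpy as np
import pandas as pd


def select_uncorrelated(df, threshold=0.8):
    """Hypothetical helper: keep each numeric column unless it is
    strongly correlated with a column that was already kept."""
    # Absolute pairwise Pearson correlations; NaN values are skipped pairwise
    corr = df.corr(numeric_only=True).abs()
    kept = []
    for col in corr.columns:
        # Keep the column only if it stays at or below the threshold
        # against every column kept so far
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept


# Synthetic stand-in data (assumption: NOT the penguins measurements)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": b,
    "a_copy": 2.0 * a + 0.01 * rng.normal(size=200),  # almost a linear function of "a"
})

selected = select_uncorrelated(toy, threshold=0.8)
print("Selected Features:", selected)

# New DataFrame that contains only the selected columns
reduced = toy[selected]
print(reduced.head())
```

Here `a_copy` is dropped because it is nearly a rescaled copy of `a`, while `a` and `b` are independent and both survive. Which column of a correlated pair is kept depends on column order in this sketch; other strategies (for example, keeping the column more correlated with the target) are equally valid.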







































