Here, get_support() returns the indices of the selected features. (By default this method returns a boolean mask; passing indices=True returns integer positions instead.) The following Python statements create another DataFrame that contains only the selected features and the output label.
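# Keep only the selected feature columns; .copy() avoids pandas' SettingWithCopyWarning
df2 = df[df.columns[selected_features]].copy()
# Append the output label
df2["MedHouseVal"] = df["MedHouseVal"]
print(df2.head())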
The output of the above program will be:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
None
   MedInc  HouseAge  AveRooms  ...  Latitude  Longitude  MedHouseVal
0  8.3252      41.0  6.984127  ...     37.88    -122.23        4.526
1  8.3014      21.0  6.238137  ...     37.86    -122.22        3.585
2  7.2574      52.0  8.288136  ...     37.85    -122.24        3.521
3  5.6431      52.0  5.817352  ...     37.85    -122.25        3.413
4  3.8462      52.0  6.281853  ...     37.85    -122.25        3.422

[5 rows x 9 columns]
   MedInc  HouseAge  AveRooms  ...  AveOccup  Latitude  Longitude
0  8.3252      41.0  6.984127  ...  2.555556     37.88    -122.23
1  8.3014      21.0  6.238137  ...  2.109842     37.86    -122.22
2  7.2574      52.0  8.288136  ...  2.802260     37.85    -122.24
3  5.6431      52.0  5.817352  ...  2.547945     37.85    -122.25
4  3.8462      52.0  6.281853  ...  2.181467     37.85    -122.25

[5 rows x 8 columns]
   MedHouseVal
0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
Selected Features: [0 5]
   MedInc  AveOccup  MedHouseVal
0  8.3252  2.555556        4.526
1  8.3014  2.109842        3.585
2  7.2574  2.802260        3.521
3  5.6431  2.547945        3.413
4  3.8462  2.181467        3.422
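To make the role of get_support() more concrete, here is a minimal, self-contained sketch of the same idea. It is an illustration under stated assumptions, not the program used above: the data is synthetic, the column names (a, b, c, d, target) are made up, and SelectKBest with f_regression is only a stand-in for whichever selector was fitted earlier. It shows both return forms of get_support() and how either one can be used to build a DataFrame of the selected features plus the label.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the housing data (hypothetical column names).
# The target depends strongly on "a" and "c" and not at all on "b" or "d".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
y = pd.Series(
    3.0 * X["a"] - 2.0 * X["c"] + rng.normal(scale=0.1, size=200),
    name="target",
)

# Assumed selector: keep the 2 features with the highest F-scores.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)

mask = selector.get_support()                 # boolean mask, one entry per column
indices = selector.get_support(indices=True)  # integer positions of the kept columns

# Either form selects the same columns. .copy() gives an independent
# DataFrame, so adding the label column does not write into a slice of X.
selected = X.loc[:, mask].copy()
selected[y.name] = y

print("Selected Features:", indices)
print(selected.head())
```

Selecting with the boolean mask (X.loc[:, mask]) or with the integer positions (X.iloc[:, indices]) gives the same columns; the integer form is what df.columns[selected_features] relies on in the statements above.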







































