learn from the dataset and predict whether the patient has diabetes based on these predictor variables.
features = data.drop(labels="Outcome", axis=1)
labels = data.filter(items=["Outcome"], axis=1)
The last column of the dataset, Outcome, is the target variable. So, we drop that column to get the feature matrix, and then filter for just that column to get the output labels.
selected_features = SelectKBest(score_func=f_classif, k=5).fit_transform(features, labels["Outcome"])
print(selected_features)
Now, we use the SelectKBest() function to select the k=5 best features from the dataset. Here, score_func=f_classif applies the ANOVA F-test to score each feature against the target. Other scoring functions, such as chi2 (the chi-square test, for non-negative features) or f_regression (the F-test for regression tasks), can be used instead to pick the k features most related to the target variable.
The output of the above program will be:
[[  6.    148.     33.6    0.627  50.   ]
 [  1.     85.     26.6    0.351  31.   ]
 [  8.    183.     23.3    0.672  32.   ]
 ...
 [  5.    121.     26.2    0.245  30.   ]
 [  1.    126.     30.1    0.349  47.   ]
 [  1.     93.     30.4    0.315  23.   ]]
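Note that fit_transform() returns a bare NumPy array, so the column names are lost. To find out which columns were actually kept, one can fit the selector first and read its get_support() mask. Below is a minimal self-contained sketch; it uses random stand-in data and illustrative column names rather than the actual diabetes dataset:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the diabetes data (values and column names are
# illustrative only, not the real Pima dataset)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((100, 8)),
                 columns=["Pregnancies", "Glucose", "BloodPressure",
                          "SkinThickness", "Insulin", "BMI",
                          "DiabetesPedigreeFunction", "Age"])
y = rng.integers(0, 2, size=100)  # binary outcome labels

# Fit the selector (instead of fit_transform) so we can inspect it afterwards
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# get_support() is a boolean mask over the original columns
kept = X.columns[selector.get_support()].tolist()
print(kept)  # names of the 5 selected columns
```

On the real dataset, this prints the five column names whose F-scores are highest, which is usually more useful for reporting than the anonymous array above.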