Let’s say a dataset has n features. We may not need all of them to train a machine learning model. Using more features than necessary can slow down inference, increase training time, and may even lead to overfitting. So, we need to select the best features from all the available features for our machine learning model.
Now, we can use various statistical tests, such as the ANOVA F-test (f_classif) or the chi-squared test (chi2), to select the k best features for a machine learning model. The SelectKBest class in sklearn serves exactly that purpose.
We can use the following Python code to select the k best features from all the available features for a machine learning model.
from sklearn.feature_selection import SelectKBest, f_classif
import pandas

# Read the dataset
data = pandas.read_csv("diabetes.csv")

# Separate the predictor variables from the Outcome label
features = data.drop(axis=1, labels="Outcome")
labels = data.filter(items=["Outcome"], axis=1)

# Keep the 5 features with the highest ANOVA F-scores
selected_features = SelectKBest(score_func=f_classif, k=5).fit_transform(features, labels["Outcome"])
print(selected_features)
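Note that fit_transform() returns a plain NumPy array, so the column names are lost. If we also want to know which features were kept, we can fit the selector first and then call its get_support() method. Here is a minimal sketch, assuming the same diabetes.csv file as above:

from sklearn.feature_selection import SelectKBest, f_classif
import pandas

data = pandas.read_csv("diabetes.csv")
features = data.drop(axis=1, labels="Outcome")
labels = data["Outcome"]

# Fit the selector separately so we can inspect it afterwards
selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(features, labels)

# get_support() returns a boolean mask over the original columns,
# so indexing the column names with it gives the kept features
print(features.columns[selector.get_support()])

# scores_ holds the ANOVA F-score computed for every feature
print(selector.scores_)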
Here, we are reading the Pima Indians Diabetes dataset using the pandas Python library. The dataset contains various predictor variables, such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can use these predictor variables to predict the Outcome column, i.e., whether or not a patient has diabetes.
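To see how the selected features feed into an actual model, here is a minimal sketch that chains SelectKBest with a classifier in a sklearn pipeline. The LogisticRegression estimator and the 5-fold cross-validation are illustrative choices, not part of the code above:

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
import pandas

data = pandas.read_csv("diabetes.csv")
features = data.drop(axis=1, labels="Outcome")
labels = data["Outcome"]

# Chain feature selection and the classifier so that
# SelectKBest is re-fitted on every training fold
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validated accuracy of the whole pipeline
scores = cross_val_score(model, features, labels, cv=5)
print(scores.mean())

Putting the selector inside the pipeline also avoids leaking information from the test folds, since the feature scores are recomputed on each training split.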