We can use Extra Trees to get the feature importances of the features in a dataset. These are impurity-based feature importances. Please note that impurity-based feature importances are strongly biased: they favor numerical and high-cardinality features over binary or categorical features with a small number of categories.
We can use the following Python code to compute the feature importances for the Pima Indians Diabetes dataset.
from sklearn.ensemble import ExtraTreesClassifier
import pandas

# Read the dataset; the last column is the target variable.
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]  # features
y = D[:, -1]   # target

# Fit an Extra Trees classifier and print the impurity-based importances.
classifier = ExtraTreesClassifier(n_estimators=100)
classifier.fit(X, y)
print(classifier.feature_importances_)
Here, we first read the Pima Indians Diabetes dataset. The last column of the dataset specifies the target variable, so after reading the dataset we split its columns into the features and the target…
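The array printed by `feature_importances_` is easier to interpret when each score is paired with its column name. The sketch below shows one way to do this; it uses synthetic stand-in data (the column names and the data-generating rule are assumptions for illustration, not part of the diabetes dataset), with the label driven mostly by the first feature so that feature should rank highest.

from sklearn.ensemble import ExtraTreesClassifier
import numpy as np

# Hypothetical stand-in data; in the article, X and y come from diabetes.csv.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# The label depends almost entirely on feature 0 (plus a little noise).
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
names = ["feat0", "feat1", "feat2", "feat3"]

classifier = ExtraTreesClassifier(n_estimators=100, random_state=0)
classifier.fit(X, y)

# Pair each feature name with its importance and sort, most important first.
ranked = sorted(zip(names, classifier.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")

The importances always sum to 1, so each score can be read as that feature's share of the total impurity reduction across the forest.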