Let’s say we want to solve a classification problem. We can use logistic regression, a KNN classifier, a decision tree classifier, a Gaussian Naive Bayes classifier, a Support Vector Machine classifier, and so on. We can also create a voting ensemble that combines the predictions of several different models and selects the class that receives the most votes from those models.
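To make the idea of majority voting concrete, here is a minimal sketch (not part of the scikit-learn example that follows) of how a hard-voting ensemble picks a class from three hypothetical model predictions:

from collections import Counter

# Hypothetical predictions from three models for a single sample
predictions = ["diabetic", "not diabetic", "diabetic"]

# The ensemble picks the class predicted by the majority of the models
majority_class = Counter(predictions).most_common(1)[0][0]
print(majority_class)  # diabetic

The VotingClassifier class we use below applies exactly this kind of majority rule, just across an entire dataset at once.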
For example, suppose we are working with the Pima Indians Diabetes dataset. The dataset contains predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can learn from this dataset and predict whether a patient has diabetes based on these predictor variables.
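As an optional first step, we could load and inspect the dataset before modeling. This is a hedged sketch that assumes a file named diabetes.csv sits in the working directory with the standard Pima Indians Diabetes layout (eight predictor columns followed by an Outcome column):

import pandas

# Assumes diabetes.csv uses the standard layout: eight predictor
# columns (Pregnancies, Glucose, etc.) plus the Outcome target
data = pandas.read_csv("diabetes.csv")
print(data.shape)    # number of rows and columns
print(data.columns)  # column names
print(data.head())   # first five rows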
We can use the following Python code to create a voting ensemble. We will use five different models. Each model will solve the classification problem and predict an outcome. We will then use the VotingClassifier class to select the class that receives the most votes from the five models.
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
import pandas

# Read the dataset and split it into features (X) and target (y)
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]

# 10-fold cross-validation with a fixed seed for reproducibility
kfold = KFold(n_splits=10, shuffle=True, random_state=1)

# Build the list of (name, model) pairs for the ensemble
models = list()
model1 = LogisticRegression(solver="liblinear")
models.append(("Logistic Regression", model1))
model2 = KNeighborsClassifier(n_neighbors=5)
models.append(("KNN Classifier", model2))
model3 = DecisionTreeClassifier()
models.append(("Decision Tree Classifier", model3))
model4 = GaussianNB()
models.append(("Gaussian Naive Bayes Classifier", model4))
model5 = SVC()
models.append(("Support Vector Machine Classifier", model5))

# Combine the models with majority voting and report the mean accuracy
classifier = VotingClassifier(estimators=models)
result = cross_val_score(classifier, X, y, scoring="accuracy", cv=kfold)
print(result.mean())
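By default, VotingClassifier uses hard (majority) voting. As a variation on the code above, we could also try soft voting, which averages the predicted class probabilities across the models instead of counting votes; note that SVC needs probability=True to expose predict_proba. This sketch reuses X, y, and kfold from the code above:

# Soft voting averages class probabilities instead of counting votes
soft_models = [
    ("Logistic Regression", LogisticRegression(solver="liblinear")),
    ("KNN Classifier", KNeighborsClassifier(n_neighbors=5)),
    ("Decision Tree Classifier", DecisionTreeClassifier()),
    ("Gaussian Naive Bayes Classifier", GaussianNB()),
    # probability=True makes SVC provide predict_proba for soft voting
    ("Support Vector Machine Classifier", SVC(probability=True)),
]
classifier_soft = VotingClassifier(estimators=soft_models, voting="soft")
result_soft = cross_val_score(classifier_soft, X, y, scoring="accuracy", cv=kfold)
print(result_soft.mean())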
Here, we first use the pandas library to read the Pima Indians Diabetes dataset. After that, we split the columns of the dataset into features and the target variable. The last column of the dataset contains the target variable. So, X here contains all the columns except the last one (the predictor variables), and y contains the last column (the target).
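Once we are satisfied with the cross-validated accuracy, we could fit the ensemble on the full dataset and classify a new patient. This is only a sketch: the feature values below are purely hypothetical placeholders, and the column order is assumed to match the dataset (Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age):

# Fit the voting ensemble on the full dataset
classifier.fit(X, y)

# Hypothetical patient: values are placeholders, in the assumed column order
new_patient = [[2, 120, 70, 30, 80, 32.0, 0.5, 28]]
print(classifier.predict(new_patient))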