First, we are loading the dataset and splitting it into X and y, where X contains all the features and y contains the target variable.
import pandas

# Load the dataset and separate the features from the target column
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]
Now, we are initializing the k-fold cross-validation.
from sklearn.model_selection import KFold

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
Please note that n_splits is the number of splits, shuffle indicates whether to shuffle the data before splitting, and random_state seeds the pseudo-random number generator used for shuffling, making the splits reproducible.
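To make the splitting behavior concrete, here is a minimal, self-contained sketch; the ten-sample array is illustrative, not the diabetes data:

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten dummy samples, just to show how KFold partitions indices
X_demo = np.arange(10).reshape(10, 1)

kf = KFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, test_idx) in enumerate(kf.split(X_demo)):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

Each sample lands in a test set exactly once across the five folds, and because random_state is fixed, the same partition is produced on every run.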
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

models = list()
model1 = LogisticRegression(solver="liblinear")
models.append(("Logistic Regression", model1))
model2 = KNeighborsClassifier(n_neighbors=5)
models.append(("KNN Classifier", model2))
model3 = DecisionTreeClassifier()
models.append(("Decision Tree Classifier", model3))
model4 = GaussianNB()
models.append(("Gaussian Naive Bayes Classifier", model4))
model5 = SVC()
models.append(("Support Vector Machine Classifier", model5))
Here, we are building a list of tuples, each consisting of a model's name and the model itself.
from sklearn.ensemble import VotingClassifier

classifier = VotingClassifier(estimators=models)
Now, we are creating a voting ensemble using the VotingClassifier class. The estimators argument takes the list of (name, model) tuples we just created.
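By default, VotingClassifier uses hard voting, i.e., a majority vote over the predicted class labels; passing voting="soft" averages the predicted class probabilities instead. A minimal sketch of the two modes, using a synthetic dataset rather than diabetes.csv:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in data, just to demonstrate the two voting modes
X_demo, y_demo = make_classification(n_samples=200, random_state=1)

estimators = [
    ("lr", LogisticRegression(solver="liblinear")),
    ("nb", GaussianNB()),
    # Soft voting averages class probabilities, so SVC needs probability=True
    ("svm", SVC(probability=True, random_state=1)),
]

hard = VotingClassifier(estimators=estimators)                  # voting="hard" is the default
soft = VotingClassifier(estimators=estimators, voting="soft")

hard.fit(X_demo, y_demo)
soft.fit(X_demo, y_demo)
print(hard.score(X_demo, y_demo), soft.score(X_demo, y_demo))
```

Soft voting requires every estimator to implement predict_proba, which is why the SVC here is created with probability=True; the SVC in our tutorial code above uses the default hard voting, so it does not need it.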
from sklearn.model_selection import cross_val_score

result = cross_val_score(classifier, X, y, scoring="accuracy", cv=kfold)
print(result.mean())
Now, we are using the cross_val_score() function to evaluate our ensemble model. Please note that we are using accuracy as the scoring metric. Cross-validation produces one accuracy score per fold, so we take the mean of the ten scores and print the average accuracy.
The above program will give the following output:
0.7603554340396446
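As a follow-up, the same cross_val_score call can be applied to each base model individually, to check whether the ensemble actually beats its members. A sketch, with a synthetic dataset standing in for diabetes.csv:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; with the real CSV you would reuse X, y from above
X_demo, y_demo = make_classification(n_samples=300, random_state=1)
kfold = KFold(n_splits=10, shuffle=True, random_state=1)

models = [
    ("Logistic Regression", LogisticRegression(solver="liblinear")),
    ("KNN Classifier", KNeighborsClassifier(n_neighbors=5)),
    ("Decision Tree Classifier", DecisionTreeClassifier(random_state=1)),
    ("Gaussian Naive Bayes Classifier", GaussianNB()),
]

# Mean cross-validated accuracy of each base model
for name, model in models:
    scores = cross_val_score(model, X_demo, y_demo, scoring="accuracy", cv=kfold)
    print(f"{name}: {scores.mean():.4f}")

# Mean cross-validated accuracy of the voting ensemble
ensemble = VotingClassifier(estimators=models)
scores = cross_val_score(ensemble, X_demo, y_demo, scoring="accuracy", cv=kfold)
print(f"Voting ensemble: {scores.mean():.4f}")
```

The ensemble typically lands near or above the best individual score, though this is not guaranteed on every dataset.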