Let’s say we are reading the Pima Indians Diabetes dataset. The dataset contains predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can learn from this dataset and predict whether a patient has diabetes based on these predictor variables.
So this is a classification problem, and we can solve it with logistic regression, a KNN classifier, a decision tree classifier, a Gaussian Naive Bayes classifier, a Support Vector Machine classifier, and so on. But before settling on one of these algorithms, we need to compare how they perform on the Pima Indians Diabetes dataset. In this article, we will discuss how to compare the performance of different machine learning algorithms before finally selecting one.
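The comparison below relies on k-fold cross-validation, so it helps to first see what scikit-learn's KFold actually does. The sketch below uses a tiny toy array (the sample count and fold count are arbitrary choices for illustration): each of the 5 folds holds out a different slice of the data as a test set while the rest is used for training.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each, so each of the
# 5 folds holds out exactly 2 samples as a test set.
X = np.arange(20).reshape(10, 2)

k_fold = KFold(n_splits=5, shuffle=True, random_state=1)
splits = list(k_fold.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```

Every sample appears in exactly one test set across the 5 folds, which is why averaging the per-fold scores gives an estimate that uses all of the data.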
We can use the following Python code to compare the performance of different machine learning algorithms for a classification problem.
```python
import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Read the dataset; the last column is the target variable.
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]

# 10-fold cross-validation with a fixed random state for reproducibility.
k_fold = KFold(n_splits=10, shuffle=True, random_state=1)

# The candidate classifiers to compare.
models = list()
model1 = LogisticRegression(solver="liblinear")
models.append(("Logistic Regression", model1))
model2 = KNeighborsClassifier(n_neighbors=5)
models.append(("KNN Classifier", model2))
model3 = DecisionTreeClassifier()
models.append(("Decision Tree Classifier", model3))
model4 = GaussianNB()
models.append(("Gaussian Naive Bayes Classifier", model4))
model5 = SVC()
models.append(("Support Vector Machine Classifier", model5))

# Evaluate each model on the same folds and print its mean accuracy.
results = list()
for name, model in models:
    result = cross_val_score(model, X, y, cv=k_fold, scoring="accuracy")
    results.append((name, result))

for name, result in results:
    print(name, result.mean())
```
Here, we are first reading the Pima Indians Diabetes dataset and splitting the columns of the dataset into the features and the target variable. We then define a 10-fold cross-validation scheme, build a list of candidate classifiers, evaluate each one with cross_val_score() using accuracy as the scoring metric, and finally print the mean accuracy of every model. Because every model is scored on the same folds, the mean accuracies are directly comparable.
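If the diabetes.csv file is not at hand, the same comparison can still be tried out on synthetic data. The sketch below is an illustration, not the real dataset: make_classification generates stand-in data with 8 numeric features and a binary target (mimicking the shape of the diabetes data), and it also reports the standard deviation of the fold scores, which is useful for judging whether two models' mean accuracies differ meaningfully.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Stand-in data: 300 samples, 8 numeric features, binary target.
# This only mimics the shape of the diabetes dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

k_fold = KFold(n_splits=10, shuffle=True, random_state=1)

models = [
    ("Logistic Regression", LogisticRegression(solver="liblinear")),
    ("KNN Classifier", KNeighborsClassifier(n_neighbors=5)),
    ("Decision Tree Classifier", DecisionTreeClassifier(random_state=1)),
    ("Gaussian Naive Bayes Classifier", GaussianNB()),
    ("Support Vector Machine Classifier", SVC()),
]

# Score every model on the same folds; keep mean and spread.
results = {}
for name, model in models:
    scores = cross_val_score(model, X, y, cv=k_fold, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

for name, (mean, std) in results.items():
    print(f"{name}: {mean:.3f} (+/- {std:.3f})")
```

A model whose mean accuracy is higher but whose standard deviation overlaps heavily with a competitor's may not be a genuinely better choice.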