number of pregnancies the patient has had, the BMI, insulin level, age, etc. A machine learning model can learn from the dataset and predict whether the patient has diabetes based on these predictor variables.
D = data.values
X = D[:, :-1]
y = D[:, -1]
After reading the dataset, we first split the columns into the features and the target variable. The last column of the dataset contains the target variable, so X holds every column except the last, and y holds the last column.
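The slicing above can be seen on a tiny stand-in table. The column names below are assumptions for illustration; the real dataset has more rows and columns, but the last-column convention is the same:

```python
import pandas as pd

# Tiny stand-in for the real dataset (column names are assumptions);
# the target ("Outcome") sits in the last column, as in the article.
data = pd.DataFrame({
    "Pregnancies": [6, 1, 8],
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
    "Outcome": [1, 0, 1],
})
D = data.values
X = D[:, :-1]  # every column except the last -> features
y = D[:, -1]   # last column -> target
print(X.shape, y.shape)  # (3, 3) (3,)
```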
k_fold = KFold(n_splits=10, shuffle=True, random_state=1)
Now, we use the KFold class to split the dataset into 10 folds. The shuffle=True parameter shuffles the data before splitting, and the random_state argument seeds the pseudo-random number generator used for shuffling, so the splits are reproducible.
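A quick sketch of what KFold produces, on toy data with 5 folds instead of 10 just to keep it small:

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
k_fold = KFold(n_splits=5, shuffle=True, random_state=1)

# Each iteration yields disjoint train/test index arrays;
# with 10 samples and 5 folds, every test fold has 2 samples.
for train_idx, test_idx in k_fold.split(X_toy):
    print(len(train_idx), len(test_idx))  # 8 2
```

Every sample lands in the test fold exactly once across the 5 iterations, which is what makes the averaged score an honest estimate.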
classifier = LogisticRegression(solver="liblinear")
Now, we initialize our classifier using the LogisticRegression class. By default, LogisticRegression() uses the lbfgs solver (Limited-memory Broyden–Fletcher–Goldfarb–Shanno). On some datasets, lbfgs may fail to converge within the default iteration limit, so we use the liblinear solver here, which works well for small datasets like this one.
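A one-line check of the default (on scikit-learn 0.22 and later, where lbfgs became the default solver):

```python
from sklearn.linear_model import LogisticRegression

# The default solver is "lbfgs"; the article overrides it with "liblinear".
print(LogisticRegression().solver)                      # lbfgs
print(LogisticRegression(solver="liblinear").solver)    # liblinear
```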
results = cross_val_score(classifier, X, y, cv=k_fold, scoring="accuracy")
Now, we use the cross_val_score() function to evaluate the performance of the machine learning model, with accuracy as the scoring metric.
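Under the hood, cross_val_score fits the classifier on each training fold and scores it on the held-out fold. The loop below makes that explicit; it uses a synthetic dataset from make_classification as a stand-in for the real data, and sets random_state on the classifier so both paths are deterministic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the diabetes data (the real CSV is not loaded here).
X, y = make_classification(n_samples=100, n_features=8, random_state=1)
k_fold = KFold(n_splits=10, shuffle=True, random_state=1)
clf = LogisticRegression(solver="liblinear", random_state=1)

# What cross_val_score does internally, written out as a loop:
scores = []
for train_idx, test_idx in k_fold.split(X):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # accuracy per fold

auto = cross_val_score(clf, X, y, cv=k_fold, scoring="accuracy")
# Should print True: identical splits plus a deterministic solver
# give the same per-fold accuracies either way.
print(np.allclose(scores, auto))
```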
mean_score = results.mean()
print("Accuracy: ", mean_score)
Finally, we average the accuracy scores across all 10 iterations of the k-fold cross-validation and print the result. The output of the above program will be:
Accuracy: 0.7681818181818182





