In one of our previous articles, we discussed k-fold cross-validation. Stratified k-fold cross-validation is a variation of k-fold cross-validation in which stratified folds are returned. In other words, each fold contains approximately the same proportion of each target class as the complete dataset.
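To see what stratification means in practice, here is a minimal sketch using a synthetic, imbalanced target (80 samples of class 0 and 20 of class 1, chosen purely for illustration). It prints the class-1 proportion in each test fold; every fold mirrors the 80/20 split of the full array.

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced target: 80% class 0, 20% class 1
y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature matrix

sk_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, test_idx) in enumerate(sk_fold.split(X, y), start=1):
    # Proportion of class 1 in this test fold; each fold prints roughly 0.20
    print(f"Fold {fold}: class-1 ratio = {y[test_idx].mean():.2f}")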
We can use the following Python code to implement stratified k-fold cross-validation on a real dataset.
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import pandas

# Load the Pima Indians Diabetes dataset
data = pandas.read_csv("diabetes.csv")

# Split the columns into features (all but the last column) and the target (last column)
D = data.values
X = D[:, :-1]
y = D[:, -1]

# 10-fold stratified cross-validation with shuffling for reproducible folds
sk_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Evaluate a logistic regression classifier across the stratified folds
classifier = LogisticRegression(solver="liblinear")
results = cross_val_score(classifier, X, y, cv=sk_fold, scoring="accuracy")

mean_score = results.mean()
print("Accuracy: ", mean_score)
Here, we first use pandas to read the Pima Indians Diabetes dataset. The dataset contains various predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can learn from this dataset and predict whether a patient has diabetes based on these predictor variables.
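Before modeling, it can help to take a quick look at the data. The sketch below assumes the CSV uses the standard Pima column names, with the target column named "Outcome"; if your copy of the file labels the columns differently, adjust accordingly.

import pandas

data = pandas.read_csv("diabetes.csv")
print(data.head())                      # first few rows: predictor columns plus the target column
print(data["Outcome"].value_counts())   # class balance: diabetic (1) vs. non-diabetic (0) patients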
D = data.values
X = D[:, :-1]
y = D[:, -1]
Now, we split the columns of the dataset into features and the target variable. Please note that the last column holds the target variable (whether the patient has diabetes), so it is assigned to y, while all remaining columns form the feature matrix X.
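A quick sanity check, continuing from the snippet above, confirms the split: X should have one column fewer than the original data, and y should contain only the two class labels (the exact shape depends on your copy of the file).

print(X.shape, y.shape)   # e.g. (768, 8) and (768,) for the standard Pima file
print(set(y))             # the two class labels, typically {0.0, 1.0}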