In one of our previous articles, we discussed k-fold cross-validation. Stratified k-fold cross-validation is a variation of k-fold cross-validation that returns stratified folds: each fold contains approximately the same proportion of each target class as the complete dataset.
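To make the stratification property concrete, here is a minimal sketch on synthetic labels with a 1:3 class imbalance (illustrative data, not the diabetes dataset used below): each test fold preserves the overall 25% positive rate.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels with a 1:3 class imbalance (illustrative, made-up data).
y = np.array([1] * 25 + [0] * 75)
X = np.zeros((100, 1))  # dummy features; the stratified split depends only on y

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the overall 25% positive rate.
    ratio = y[test_idx].mean()
    print(f"Fold {fold}: positive ratio in test set = {ratio:.2f}")
```

With 25 positives split across 5 folds, each fold gets exactly 5 positives, so every printed ratio is 0.25; a plain `KFold` on the same shuffled data would generally not preserve this balance.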
We can use the following Python code to implement stratified k-fold cross-validation.
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import pandas

# Load the Pima Indians Diabetes dataset.
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]  # predictor variables
y = D[:, -1]   # target variable

# 10 stratified folds; shuffling with a fixed seed for reproducibility.
sk_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
classifier = LogisticRegression(solver="liblinear")

# Evaluate the classifier with stratified 10-fold cross-validation.
results = cross_val_score(classifier, X, y, cv=sk_fold, scoring="accuracy")
mean_score = results.mean()
print("Accuracy: ", mean_score)
Here, we first use pandas to read the Pima Indians Diabetes dataset. The dataset contains various predictor variables such as the number of pregnancies the patient has had, the BMI, insulin level, age, etc. A machine learning model can learn from the dataset and predict whether the patient has diabetes based on these predictor variables.
D = data.values
X = D[:, :-1]
y = D[:, -1]
Now, we split the columns of the dataset into features and the target variable. Please note that the last column holds the target variable, so every column before it is treated as a feature.
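Since `cross_val_score` returns one accuracy per fold, it can be useful to look at the spread of the fold scores, not just the mean. The sketch below does this with scikit-learn's built-in breast cancer dataset as a stand-in (an assumption, since diabetes.csv may not be on hand); the cross-validation setup is otherwise the same as above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in dataset so the example runs without diabetes.csv (an assumption;
# substitute your own features X and labels y).
X, y = load_breast_cancer(return_X_y=True)

sk_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
classifier = LogisticRegression(solver="liblinear")

# One accuracy score per fold; report both the mean and the spread.
results = cross_val_score(classifier, X, y, cv=sk_fold, scoring="accuracy")
print("Per-fold accuracies:", results.round(3))
print(f"Mean accuracy: {results.mean():.3f} (+/- {results.std():.3f})")
```

A small standard deviation across folds suggests the accuracy estimate is stable; a large one suggests the model is sensitive to which samples land in each fold.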