dataset contains the target variable. So, X here contains the features and y contains the target variable.
shuffle_cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)
Now, we are initializing the shuffle split using the ShuffleSplit class. Here, n_splits is 10, so the data will be split 10 times. In each iteration, 30% of the samples (test_size=0.3) form the test set and the remaining 70% form the train set. The random_state parameter controls the randomness of the training and testing indices that are produced, so the same splits are generated on every run.
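To get a feel for what ShuffleSplit actually produces, here is a minimal sketch (using a small made-up array in place of X) that prints the train and test sizes for each of the 10 iterations:

import numpy as np
from sklearn.model_selection import ShuffleSplit

# Dummy data, just to illustrate the splitting behaviour
X_demo = np.arange(20).reshape(10, 2)

shuffle_cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)

for i, (train_idx, test_idx) in enumerate(shuffle_cv.split(X_demo)):
    # Each iteration is an independent random split: 70% train, 30% test
    print(f"Iteration {i}: train size = {len(train_idx)}, test size = {len(test_idx)}")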
classifier = LogisticRegression(solver="liblinear")
Now, we are initializing the classifier using the LogisticRegression class. Please note that LogisticRegression(), by default, uses the lbfgs solver (Limited-memory Broyden–Fletcher–Goldfarb–Shanno). This solver works well for smaller datasets, but on larger datasets lbfgs may fail to converge within the default number of iterations. So, we are using the liblinear solver here.
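If you would rather keep the default lbfgs solver, a common workaround is to standardize the features and raise max_iter. The sketch below shows one way to do that; it is an alternative setup, not what the rest of this example uses:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Feature scaling usually helps lbfgs converge,
# and max_iter can be raised from its default of 100.
classifier = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="lbfgs", max_iter=1000),
)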
results = cross_val_score(classifier, X, y, cv=shuffle_cv, scoring="accuracy")
mean_score = results.mean()
print("Accuracy: ", mean_score)
Now, we are using the cross_val_score() function to calculate the accuracy score (What is the accuracy score in machine learning?). As we know, cross_val_score() returns one accuracy score for each iteration, so we take the mean of those 10 scores and print it. The output of the above program will look like the following:
Accuracy: 0.7571428571428571
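For intuition, cross_val_score() behaves roughly like the manual loop below: fit the classifier on the train indices of each iteration, score it on the corresponding test indices, and average the 10 accuracy scores. This is a sketch that assumes the same X, y, classifier, and shuffle_cv as above, with X and y as NumPy arrays (use .iloc instead of plain indexing for pandas DataFrames):

import numpy as np
from sklearn.metrics import accuracy_score

scores = []
for train_idx, test_idx in shuffle_cv.split(X):
    # Fit on the ~70% train portion of this iteration
    classifier.fit(X[train_idx], y[train_idx])
    # Score on the ~30% held-out portion
    predictions = classifier.predict(X[test_idx])
    scores.append(accuracy_score(y[test_idx], predictions))

print("Accuracy: ", np.mean(scores))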