Repeated Stratified K-Fold Cross-Validation using sklearn in Python

by Amrita Mitra | Apr 11, 2023 | AI, Machine Learning and Deep Learning, Featured, Machine Learning Using Python, Python Scikit-learn

What is repeated stratified k-fold cross-validation?

In our previous articles, we discussed k-fold cross-validation, repeated k-fold cross-validation, and stratified k-fold cross-validation. We discussed that stratified k-fold cross-validation returns stratified folds. In other words, the dataset is split into k folds in such a way that each fold contains approximately the same proportion of each target class as the complete dataset.

In repeated stratified k-fold cross-validation, the stratified k-fold cross-validation is repeated a specific number of times. Each repetition shuffles the data differently before splitting, so each repetition produces different folds and therefore different scores. We can then take the average of all the scores. Because this average is not tied to one particular random split, it is usually a more stable estimate of the model's performance than a single run of stratified k-fold cross-validation.
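The idea can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy dataset with 80 samples of class 0 and 20 samples of class 1 (it is not the dataset used later in this article). It counts the splits that are generated and records the fraction of class 1 in every test fold.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical imbalanced toy data: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

positive_fractions = []
for train_idx, test_idx in cv.split(X, y):
    # Fraction of class 1 samples in this test fold.
    positive_fractions.append(y[test_idx].mean())

# One entry per (repetition, fold) pair: n_splits * n_repeats in total.
print("Number of splits:", len(positive_fractions))
print("Fraction of class 1 per test fold:", positive_fractions)
```

Because the class counts here divide evenly by the number of folds, every test fold keeps the same 20% share of class 1 as the complete dataset. With counts that do not divide evenly, the share is only approximately preserved.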

Repeated Stratified K-Fold Cross-Validation using sklearn in Python

We can use the following Python code to implement repeated stratified k-fold cross-validation.

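import pandas
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression


# Load the dataset; the last column holds the target variable.
dataset = pandas.read_csv("diabetes.csv")
D = dataset.values
X = D[:, :-1]  # all columns except the last: the features
y = D[:, -1]   # the last column: the target

# 10 stratified folds, repeated 5 times with different shuffles.
model = LogisticRegression(solver="liblinear")
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Mean accuracy over all 10 x 5 = 50 evaluations.
print("Accuracy: ", scores.mean())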

Here, we first read the Pima Indians Diabetes dataset using the pandas Python library. The dataset contains features such as plasma glucose concentration, blood pressure, and serum insulin. Based on these features, a machine learning model can predict whether a patient has diabetes.

dataset = pandas.read_csv("diabetes.csv")
D = dataset.values
X = D[:, :-1]
y = D[:, -1]

The last column of the dataset contains the target variable. So, X contains all the features and y contains the target variable.

model = LogisticRegression(solver="liblinear")
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

Now, we initialize the model. We are using logistic regression to solve this problem. Then, we initialize repeated stratified k-fold cross-validation. Here, n_splits specifies the number of folds in each repetition, and n_repeats specifies how many times the stratified k-fold cross-validation is repeated. The random_state argument seeds the pseudo-random number generator that is used for shuffling, which makes the splits reproducible.
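The effect of these arguments can be illustrated with a minimal sketch. The data below is a hypothetical placeholder (40 samples, two balanced classes), and the variable names are chosen only for this illustration. The sketch checks how many splits are produced in total and whether two splitters created with the same random_state generate identical folds.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical placeholder data: only the labels matter for stratification.
X = np.zeros((40, 1))
y = np.array([0] * 20 + [1] * 20)

cv_a = RepeatedStratifiedKFold(n_splits=4, n_repeats=3, random_state=1)
cv_b = RepeatedStratifiedKFold(n_splits=4, n_repeats=3, random_state=1)

# Total number of train/test splits: n_splits * n_repeats.
total_splits = cv_a.get_n_splits(X, y)

# Collect the test indices of every split from both splitters.
test_folds_a = [test.tolist() for _, test in cv_a.split(X, y)]
test_folds_b = [test.tolist() for _, test in cv_b.split(X, y)]

# The same integer seed should reproduce exactly the same folds.
same_seed_same_folds = test_folds_a == test_folds_b

print("Total splits:", total_splits)
print("Same seed gives same folds:", same_seed_same_folds)
```

Fixing random_state to an integer is what makes a cross-validation run repeatable. Leaving it as None would produce different folds on every run.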

Now, we use the cross_val_score() function to estimate the performance of the model, with accuracy as the scoring metric. The function returns one accuracy score for each fold of each repetition, so with n_splits=10 and n_repeats=5 we get 50 scores. We print the average of these scores.

The program prints the mean accuracy. The exact value depends on the dataset file and the scikit-learn version; the run reported here gave:

Accuracy:  0.6466302118933698
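Beyond the overall mean, it can be useful to look at how the scores vary between repetitions. The following is a minimal sketch that uses synthetic data from make_classification as a stand-in for diabetes.csv, so its numbers are not comparable to the output above. It assumes that cross_val_score() returns the scores in split order, with all folds of the first repetition first, which is how the splitter yields them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption; not the diabetes dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

n_splits, n_repeats = 10, 5
model = LogisticRegression(solver="liblinear")
cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# One row per repetition, one column per fold.
per_repetition = scores.reshape(n_repeats, n_splits).mean(axis=1)

print("Scores returned:", scores.shape[0])
print("Mean accuracy per repetition:", per_repetition)
print("Overall mean: %.3f, standard deviation: %.3f" % (scores.mean(), scores.std()))
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the estimate depends on the particular split.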
Amrita Mitra

Author

Ms. Amrita Mitra is the author of the books “Cryptography And Public Key Infrastructure”, “Web Application Vulnerabilities And Prevention”, “A Guide To Cyber Security” and “Phishing: Detection, Analysis And Prevention”. She is also the founder of Asigosec Technologies, the company that owns The Security Buddy.
