In the repeated random train-test split, also called shuffle split, the dataset is randomly split into train and test sets a chosen number of times. In each split, the machine learning model uses the train set to learn from the data and the test set to evaluate what it has learned. The scores from all the splits are then averaged.
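As a rough illustration of the idea above, the sketch below runs scikit-learn's ShuffleSplit on a toy array of 10 samples and prints the row indices assigned to the train and test sets in each split. The toy array and the choice of three splits are assumptions made for demonstration only.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data (assumed for illustration): 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)

# Three independent random splits, each holding out 30% of the samples.
splitter = ShuffleSplit(n_splits=3, test_size=0.3, random_state=1)

# Each item is a (train_indices, test_indices) pair of NumPy arrays.
splits = list(splitter.split(X))
for i, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Split {i}: train={train_idx}, test={test_idx}")
```

Because every split is drawn independently, the same row can land in the test set of more than one split, which is not the case in k-fold cross-validation.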
We can use the following Python code to implement the repeated random train-test split or shuffle split.
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import pandas

data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]

shuffle_cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)
classifier = LogisticRegression(solver="liblinear")
results = cross_val_score(classifier, X, y, cv=shuffle_cv, scoring="accuracy")
mean_score = results.mean()
print("Accuracy: ", mean_score)
Here, we first use the pandas Python library to read the Pima Indians Diabetes dataset. The dataset contains predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can learn from these predictor variables to predict whether the patient has diabetes.
D = data.values
X = D[:, :-1]
y = D[:, -1]
Now, we are splitting the columns of the dataset into features and the target variable. Please note that the last column of the …
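The column slicing used above can be tried on a small array. This is a minimal sketch with made-up values. It assumes, as the slicing itself implies, that the target sits in the last column and the features in all the columns before it.

```python
import numpy as np

# Hypothetical 3-row array: two feature columns followed by a 0/1 target column.
D = np.array([
    [1.0, 85.0, 0.0],
    [8.0, 183.0, 1.0],
    [0.0, 137.0, 1.0],
])

X = D[:, :-1]  # every row, all columns except the last (features)
y = D[:, -1]   # every row, last column only (target)

print(X.shape)
print(y.shape)
```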





