number of pregnancies the patient has had, the BMI, insulin level, age, etc. A machine learning model can learn from the dataset and predict whether the patient has diabetes based on these predictor variables.
import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]
Then, we split the columns of the dataset into features and the target variable. The last column of the dataset contains the target variable, so X holds all the feature columns while y holds the target.
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
Now, we are initializing the k-fold cross-validation. The argument n_splits refers to the total number of splits. The argument shuffle=True indicates that data are shuffled before splitting. And random_state is used to initialize the pseudo-random number generator that is used to shuffle the data.
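To see what KFold actually produces, here is a minimal sketch on a 10-row toy matrix (the data here is hypothetical, not the diabetes dataset):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy feature matrix with 10 rows, just to illustrate the splits (hypothetical data).
X_demo = np.arange(20).reshape(10, 2)

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for train_idx, test_idx in kfold.split(X_demo):
    # Each row index lands in exactly one test fold across the 5 iterations.
    print("train:", train_idx, "test:", test_idx)
```

Each iteration yields a pair of train/test index arrays; with shuffle=True and a fixed random_state the assignment is random but reproducible.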
model = GradientBoostingClassifier(n_estimators=100, random_state=1)
Here, we are initializing the gradient boosting classifier using the GradientBoostingClassifier class. The argument n_estimators indicates the number of estimators used by the model. By default, decision trees are used with a maximum depth of 3. And the argument random_state is used to initialize the pseudo-random number generator that is used by the algorithm.
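As a rough illustration of how this classifier behaves on its own, outside cross-validation, here is a sketch on synthetic data; the make_classification dataset and the train/test split are hypothetical stand-ins for diabetes.csv:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a hypothetical stand-in for diabetes.csv).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

# Spelling out the default base learner: 100 boosting stages of depth-3 trees.
model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=1)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # holdout accuracy
```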
result = cross_val_score(model, X, y, scoring="accuracy", cv=kfold)
print(result.mean())
Now, we use the function cross_val_score() to evaluate the model. Please note that we are using accuracy as the scoring metric here, i.e., the fraction of predictions that match the true labels.
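Accuracy is simply the proportion of correct predictions. A minimal sketch with made-up labels:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

# 4 of the 5 predictions match the true labels, so accuracy is 4/5 = 0.8.
print(accuracy_score(y_true, y_pred))  # 0.8
```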
We get one accuracy score for each iteration of the k-fold cross-validation, so we take the average of all ten scores and print the result. The output of the above program will look like the following:
0.7603041695146958
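Since cross_val_score returns one score per fold, it can also be useful to report the spread alongside the mean. A sketch with hypothetical per-fold accuracies (not the actual scores from the run above):

```python
import numpy as np

# Hypothetical per-fold accuracies, shaped like the array cross_val_score returns.
result = np.array([0.74, 0.79, 0.71, 0.78, 0.75, 0.77, 0.73, 0.80, 0.76, 0.77])
print("%.3f +/- %.3f" % (result.mean(), result.std()))  # 0.760 +/- 0.026
```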