dataset and predict whether the patient has diabetes based on these predictor variables.
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]
Here, we are splitting the columns of the dataset into features and the target variable. The last column of the dataset contains the target variable. So, X here contains all the features, and y contains the target variable.
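To make the slicing concrete, here is a minimal sketch on a made-up three-row table. The column names and values are placeholders chosen for illustration and are not taken from diabetes.csv.

```python
import pandas

# Hypothetical toy table; the real diabetes.csv has different columns and values.
toy = pandas.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [10.0, 20.0, 30.0],
    "target": [0.0, 1.0, 0.0],
})

D = toy.values   # convert the DataFrame to a 2-D NumPy array
X = D[:, :-1]    # every row, every column except the last one
y = D[:, -1]     # every row, only the last column

print(X.shape)   # (3, 2)
print(y)         # [0. 1. 0.]
```

The same two slices work for any table whose last column holds the target, regardless of how many feature columns come before it.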
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=1)
classifier = LogisticRegression(solver="liblinear")
classifier.fit(X_train, y_train)
Now, we are splitting the dataset into a training set and a test set. With test_size=0.3, 30% of the rows go to the test set and the remaining 70% to the training set. The argument shuffle specifies that we are shuffling the data before splitting. And the argument random_state is used to seed the pseudo-random number generator that shuffles the data, so the same split is produced on every run.
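The effect of test_size and random_state can be checked on a small synthetic array. This is only a sketch: the ten-row array below is made up for illustration and stands in for the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 rows, 2 feature columns, labels 0..9.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same random_state should shuffle identically.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=1)
_, _, _, y_test_again = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=1)

print(len(X_train), len(X_test))             # 7 3
print(np.array_equal(y_test, y_test_again))  # True
```

Changing random_state to another integer would generally select different rows for the test set, while the 70/30 proportions stay the same.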
Now, we are initializing the classifier. We are using logistic regression here. After that, we are training the model using the training set.
dump(classifier, "diabetes_classification_joblib.sav")
At this point, we can use the dump() function from the joblib Python module to save the model in a file.
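As a quick sanity check, a saved model can be loaded back immediately and compared with the original. The sketch below assumes a tiny synthetic dataset and a hypothetical file name inside a temporary directory, so it does not touch the file written above.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Tiny synthetic data, standing in for the diabetes features.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

classifier = LogisticRegression(solver="liblinear")
classifier.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.sav")  # hypothetical file name
    dump(classifier, path)   # serialize the fitted model to disk
    restored = load(path)    # read it back into a new object

# The restored model should behave exactly like the original one.
same_predictions = np.array_equal(classifier.predict(X), restored.predict(X))
print(same_predictions)
```

Note that a file written by dump() should only be loaded from a trusted source, and preferably with the same scikit-learn version that produced it.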
Later, we can use the following Python code to load the model from the saved file.
from joblib import load
import pandas
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = pandas.read_csv("diabetes.csv")
D = data.values
X = D[:, :-1]
y = D[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=1)
model = load("diabetes_classification_joblib.sav")
y_pred = model.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
score = model.score(X_test, y_test)
print("Score: ", score)
Here, we are using the load() function to load the model from the saved file. After that, we can use the predict() function to predict values. We can also use the score() function to evaluate the model.
Please note that by default, the score() function uses the accuracy score to evaluate a machine learning model for a classification problem. That is why accuracy_score() and score() report the same value here.
The given program will give the following output:
Accuracy: 0.7748917748917749
Score: 0.7748917748917749
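The two numbers match because, for a scikit-learn classifier, score() computes the accuracy of predict() on the given data. A minimal sketch on synthetic data (the random features and the labelling rule below are assumptions made purely for illustration) shows the same equality:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic data: 100 rows, 3 features, label derived from the first two features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression(solver="liblinear").fit(X, y)

accuracy = accuracy_score(y, model.predict(X))  # explicit accuracy computation
score = model.score(X, y)                       # classifier's default metric

print(accuracy == score)
```

For regressors, score() returns the R² value instead, so this equality holds only for classification models.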







































