from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import pandas

data = pandas.read_csv("diabetes.csv")

D = data.values
X = D[:, :-1]
y = D[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=1)

classifier = LogisticRegression(solver="liblinear")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)
Here, we are first using the pandas Python library to read the Pima Indians Diabetes dataset. The dataset contains various predictor variables, such as the number of pregnancies the patient has had, BMI, insulin level, and age, along with an outcome column. A machine learning model can learn from the dataset and predict, based on these predictor variables, whether a patient has diabetes.
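To see what read_csv gives us, here is a minimal sketch using a tiny hand-made DataFrame with a Pima-style column layout (the column names and rows below are made up for illustration, not taken from the real file):

```python
import pandas as pd

# Toy stand-in for the Pima dataset: same shape of layout, made-up rows.
# Last column ("Outcome") is the label: 1 = diabetic, 0 = not diabetic.
rows = [
    [6, 148, 33.6, 50, 1],
    [1, 85, 26.6, 31, 0],
    [8, 183, 23.3, 32, 1],
]
data = pd.DataFrame(rows, columns=["Pregnancies", "Glucose", "BMI", "Age", "Outcome"])

print(data.shape)             # (3, 5): 3 samples, 4 predictors + 1 outcome
print(data.columns.tolist())  # the predictor names plus "Outcome"
```

With the real diabetes.csv, the same calls reveal how many rows and which predictor columns you are working with before any modeling starts.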
D = data.values
X = D[:, :-1]
y = D[:, -1]
Now, we are splitting the columns of the dataset into features and the target variable. X here contains every column except the last one (the features), and y contains only the last column (the target variable).
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=1)
Now, we are splitting the dataset into training and test sets. The shuffle=True argument specifies that the data is shuffled before splitting, and the random_state argument seeds the pseudo-random number generator used for shuffling, so the same split is reproduced every time the code runs.
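A small sketch on synthetic data shows two details worth knowing: when no test_size is given, train_test_split holds out 25% of the samples by default, and a fixed random_state makes the split repeatable:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # 20 toy samples, 2 features each
y = np.arange(20)

# Default test_size is 0.25, so 15 train / 5 test samples here.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=True, random_state=1)
print(len(X_tr), len(X_te))  # 15 5

# Same random_state -> identical split on a re-run.
X_tr2, X_te2, _, _ = train_test_split(X, y, shuffle=True, random_state=1)
print((X_te == X_te2).all())  # True
```

Reproducible splits matter when comparing models: without a fixed random_state, each run evaluates on a different test set.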
classifier = LogisticRegression(solver="liblinear")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
We are here using logistic regression for this classification problem. Please note that LogisticRegression(), by default, uses the lbfgs solver, or Limited-memory Broyden–Fletcher–Goldfarb–Shanno. We instead pass solver="liblinear", which may be a good choice for smaller datasets like this one. On larger datasets, …
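To make the solver choice concrete, here is a small sketch fitting both solvers on a synthetic binary-classification problem (make_classification stands in for the real data; the sizes and scores are illustrative, not from the Pima dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem as a stand-in for the real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Both solvers fit the same model family; they differ in the
# optimization algorithm used to find the coefficients.
for solver in ["liblinear", "lbfgs"]:
    clf = LogisticRegression(solver=solver, max_iter=1000)
    clf.fit(X, y)
    print(solver, round(clf.score(X, y), 3))
```

On a problem this small the two solvers land on essentially the same coefficients; the practical differences (supported penalties, multiclass handling, speed at scale) only start to matter on larger or more constrained problems.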





