from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset as a pandas DataFrame
data = load_breast_cancer(as_frame=True)
df = data.frame

# Separate the feature columns from the "target" column
df_features = df.drop(labels=["target"], axis=1)
df_target = df.filter(items=["target"])

# Split into training and test sets (shuffled, with a fixed seed)
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["target"], shuffle=True, random_state=1)

# Train a Gaussian Naive Bayes classifier and predict on the test set
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# Measure the accuracy of the predictions
accuracy = accuracy_score(y_test, y_test_pred)
print("Accuracy Score: ", accuracy)
Here, we first load the dataset using the sklearn library. data.frame returns a pandas DataFrame, which we store in df. We then split the DataFrame into features and target: df_features contains all the feature columns, and df_target contains only the “target” column.
df_features = df.drop(labels=["target"], axis=1)
df_target = df.filter(items=["target"])
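To make the feature/target split concrete, the following minimal sketch prints the shapes of the resulting objects. The name target_series is introduced here only for illustration and is not part of the original program. It also shows why the program later indexes df_target["target"]: filter() returns a one-column DataFrame, not a Series.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame

# Same split as in the article
df_features = df.drop(labels=["target"], axis=1)
df_target = df.filter(items=["target"])

# filter() keeps the result as a DataFrame with a single column;
# selecting the column by name gives a pandas Series instead.
target_series = df_target["target"]  # illustrative name, not in the original

print(df.shape)           # all columns: 30 features plus the target
print(df_features.shape)  # feature columns only
print(df_target.shape)    # one-column DataFrame
print(target_series.value_counts())  # number of samples per class
```

Passing a Series as the target is the usual choice because most scikit-learn estimators expect a one-dimensional y.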
Next, we split the dataset into a training set and a test set. The shuffle=True parameter indicates that the dataset is shuffled before the split, and random_state=1 seeds the random number generator used for shuffling, so the split is reproducible. Because no test_size is given, train_test_split uses its default and holds out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["target"], shuffle=True, random_state=1)
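The effect of random_state can be checked with a small sketch like the one below. It loads the data with return_X_y=True for brevity (a shortcut not used in the original program), and the names X_train2, X_test2, and same_split are illustrative. Splitting twice with the same seed should select exactly the same rows.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Shortcut: get features and target directly as a DataFrame and a Series
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Two independent calls with the same seed
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=1)
X_train2, X_test2, _, _ = train_test_split(X, y, shuffle=True, random_state=1)

# With the default test_size, about a quarter of the rows go to the test set
print(len(X_train), len(X_test))

# The same seed selects the same rows in the same order
same_split = X_test.index.equals(X_test2.index)
print(same_split)
```

Changing random_state to another integer would give a different, but equally reproducible, split.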
Next, we initialize the classifier using the GaussianNB class. The fit() method trains the model on the training data, and the predict() method predicts the target variable for the test features.
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
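As a rough illustration of what fit() learns and how predict() uses it, the sketch below inspects a few fitted attributes of GaussianNB and compares predict() with predict_proba(). The variable names proba and argmax_labels are illustrative, and the data loading uses the return_X_y=True shortcut instead of the DataFrame steps above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=1)

classifier = GaussianNB()
classifier.fit(X_train, y_train)

print(classifier.classes_)      # class labels seen during fit
print(classifier.class_prior_)  # fraction of training samples in each class
print(classifier.theta_.shape)  # per-class mean of every feature

y_test_pred = classifier.predict(X_test)

# predict_proba() returns one posterior probability per class and sample;
# predict() corresponds to the class with the highest probability.
proba = classifier.predict_proba(X_test)
argmax_labels = classifier.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(argmax_labels, y_test_pred))
```

Looking at predict_proba() can be useful when a hard label is not enough, for example when predictions near 0.5 should be reviewed manually.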
Now, we can compare y_test_pred with y_test to measure the performance of the model. Here we use the accuracy score, the fraction of test samples that were classified correctly (What is an accuracy score in machine learning?).
accuracy = accuracy_score(y_test, y_test_pred)
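To show what accuracy_score computes, here is a minimal sketch that recomputes the same value by hand as the share of matching predictions. The names manual_accuracy and correct are illustrative and do not appear in the original program.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=1)

classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# Accuracy as reported by scikit-learn
accuracy = accuracy_score(y_test, y_test_pred)

# The same value by hand: fraction of positions where prediction == truth
manual_accuracy = (y_test_pred == y_test.to_numpy()).mean()

# normalize=False returns the number of correct predictions instead
correct = accuracy_score(y_test, y_test_pred, normalize=False)

print(accuracy, manual_accuracy)
print(correct, "of", len(y_test), "test samples classified correctly")
```

Accuracy alone can hide per-class errors on imbalanced data, so it is often read together with other metrics.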
The output of the above program will look like the following:
Accuracy Score: 0.9440559440559441