import seaborn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# print(seaborn.get_dataset_names())
# Load the iris dataset (seaborn needs an internet connection the first time).
df = seaborn.load_dataset("iris")
print(df.head())
df.info()  # info() prints its summary itself and returns None
print(df['species'].unique())

# Drop the "virginica" rows so that only two species remain.
df.drop(df.loc[df['species'] == 'virginica'].index, inplace=True)
print(df['species'].unique())

# Hold out 20% of the rows as a test set.
feature_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_columns], df["species"],
    train_size=0.8, shuffle=True, random_state=1)

# Fit the classifier and predict the held-out rows.
logistic_regressor = LogisticRegression()
logistic_regressor.fit(X_train, y_train)
y_test_predicted = logistic_regressor.predict(X_test)

# Evaluate the predictions.
acc_score = accuracy_score(y_test, y_test_predicted)
conf_matrix = confusion_matrix(y_test, y_test_predicted)
print(acc_score)
print(conf_matrix)
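To make the two metrics at the end of the listing concrete, here is a minimal sketch of what an accuracy score and a two-class confusion matrix count. The label lists and the helper names `accuracy` and `confusion_counts` are made up for illustration; this is not the scikit-learn implementation and the numbers are not results from the iris run.

```python
# Hypothetical labels, chosen by hand for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "versicolor"]
y_pred = ["setosa", "versicolor", "versicolor", "versicolor", "setosa"]


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def confusion_counts(true_labels, predicted_labels, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


classes = sorted(set(y_true))  # class order used for rows and columns
acc = accuracy(y_true, y_pred)
matrix = confusion_counts(y_true, y_pred, classes)
print(acc)
print(matrix)
```

The diagonal of the matrix holds the correct predictions, so the accuracy is the diagonal sum divided by the total number of samples.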
Here, we use a modified version of the “iris” dataset. The original iris dataset contains three classes (species). We remove one class, “virginica”, completely and work with the remaining two, “setosa” and “versicolor”.
First, we load the “iris” dataset using the seaborn Python library. After that, we print the first few rows and summary information about the dataset. The output looks like the following:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
Next, we remove the rows whose species value is “virginica”…
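As a side note, the same row removal can be written as a boolean-mask filter that returns a new DataFrame instead of modifying the original in place. The sketch below uses a tiny hand-made frame named `toy` (the values are illustrative, not real iris rows) so that it runs without downloading anything.

```python
import pandas as pd

# Tiny stand-in frame; the values are made up for illustration.
toy = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0, 1.3, 5.1],
    "species": ["setosa", "versicolor", "virginica", "setosa", "virginica"],
})

# Keep every row whose species is not "virginica".
# The mask produces a new frame; `toy` itself is left unchanged.
two_class = toy[toy["species"] != "virginica"].reset_index(drop=True)

print(two_class["species"].unique())
print(len(two_class), "of", len(toy), "rows kept")
```

Whether to filter with a mask or call `drop(..., inplace=True)` is mostly a matter of taste here; the mask version simply keeps the unfiltered data available for later comparison.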







































