print(df['species'].unique())
df.drop(df.loc[df['species'] == 'virginica'].index, inplace=True)
print(df['species'].unique())
The output will be like the following:
['setosa' 'versicolor' 'virginica']
['setosa' 'versicolor']
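The same filtering can also be written with a boolean mask instead of an in-place drop. The following is a minimal sketch, assuming a tiny hand-made DataFrame (the values and the name `filtered` are hypothetical stand-ins, not the real iris data):

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris DataFrame used above.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0, 1.3, 4.5],
    "species": ["setosa", "versicolor", "virginica", "setosa", "versicolor"],
})

# Keep every row whose species is not 'virginica'; the original df is untouched.
filtered = df[df["species"] != "virginica"].reset_index(drop=True)

print(filtered["species"].unique().tolist())  # ['setosa', 'versicolor']
print(len(df), len(filtered))                 # 5 4
```

Unlike `drop(..., inplace=True)`, this version returns a new DataFrame, which can be easier to reason about when the original data is needed later.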
Now, we split the dataset into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    df[["sepal_length", "sepal_width", "petal_length", "petal_width"]],
    df["species"],
    train_size=0.8,
    shuffle=True,
    random_state=1,
)
Please note that shuffle=True shuffles the dataset before it is split, and random_state=1 seeds the random number generator used for the shuffling, so the same split is produced on every run. We keep 80% of the data as the training set and the remaining 20% as the test set.
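To see the effect of random_state, here is a small sketch that splits the same data twice with the same seed. The toy arrays `X` and `y` are hypothetical and only serve to illustrate the behavior:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature each, alternating labels.
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 1] * 5)

# The same random_state gives the same shuffle, so both splits are identical.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=1
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=1
)

same_split = np.array_equal(X_test_a, X_test_b)
print(len(X_train_a), len(X_test_a))  # 8 2
print(same_split)                     # True
```

Changing random_state to another integer would generally produce a different, but equally reproducible, split.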
Now, we execute the following Python statements:
logistic_regressor = LogisticRegression()
logistic_regressor.fit(X_train, y_train)
We first initialize the logistic regression model and then use the training dataset to train the model.
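To get a feel for what fitting produces, the sketch below trains a model and inspects a few of its attributes. It assumes scikit-learn's bundled iris data (with integer class labels) as a stand-in for the DataFrame used in this article, and the name `model` is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: scikit-learn's bundled iris data stands in for the article's df.
iris = load_iris()
mask = iris.target != 2            # drop class 2 (virginica), keeping two classes
X, y = iris.data[mask], iris.target[mask]

model = LogisticRegression()
model.fit(X, y)

print(model.classes_)              # the two class labels seen during training
print(model.coef_.shape)           # (1, 4): one weight per feature
proba = model.predict_proba(X[:1]) # class probabilities for the first sample
print(proba)
```

For a binary problem, the fitted model holds a single row of weights, and predict_proba returns one probability per class for each sample.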
Now, we can use the trained model on the test set and record the results. The predicted output can then be compared with the actual data and the performance of the model can be measured.
y_test_predicted = logistic_regressor.predict(X_test)
acc_score = accuracy_score(y_test, y_test_predicted)
conf_matrix = confusion_matrix(y_test, y_test_predicted)
print(acc_score)
print(conf_matrix)
Here, we calculate the accuracy score and the confusion matrix of the model's predictions. Please note that if TP is the number of True Positives, TN the number of True Negatives, FP the number of False Positives, and FN the number of False Negatives, then the accuracy score can be written as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
And the confusion matrix can be represented as: …
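As a rough illustration of how the four counts relate to the accuracy score, the sketch below tallies them by hand in plain Python. The label lists are hypothetical, and treating "versicolor" as the positive class is an assumption made only for this example:

```python
# Hypothetical true and predicted labels; "versicolor" is treated as positive.
y_true = ["setosa", "versicolor", "versicolor", "setosa", "versicolor", "setosa"]
y_pred = ["setosa", "versicolor", "setosa", "setosa", "versicolor", "versicolor"]
positive = "versicolor"

pairs = list(zip(y_true, y_pred))
tp = sum(t == positive and p == positive for t, p in pairs)  # correctly predicted positives
tn = sum(t != positive and p != positive for t, p in pairs)  # correctly predicted negatives
fp = sum(t != positive and p == positive for t, p in pairs)  # negatives predicted as positive
fn = sum(t == positive and p != positive for t, p in pairs)  # positives predicted as negative

# Accuracy is the share of correct predictions among all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn)      # 2 2 1 1
print(round(accuracy, 3))  # 0.667
```

Four of the six predictions are correct here, which gives an accuracy of about 0.667.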







































