label_encoder = LabelEncoder()
df["species"] = label_encoder.fit_transform(df["species"])
Next, we split the dataset into features and target. df_features contains the four feature columns of the df DataFrame, and df_target contains the species column.
df_features = df.drop(labels=["species"], axis=1)
df_target = df.filter(items=["species"])
Next, we split the dataset into a training set and a test set. The shuffle=True parameter shuffles the dataset before the split, and random_state=1 seeds the random number generator used for shuffling, so the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["species"], shuffle=True, random_state=1)
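To illustrate what shuffling with a fixed seed does, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: shuffled_split is a hypothetical helper written for this example, and its test_fraction of 0.25 is chosen only to mirror the default test size that train_test_split uses when none is given.

```python
import random


def shuffled_split(rows, test_fraction=0.25, seed=None):
    """Hypothetical helper: shuffle a copy of rows with a seeded generator, then cut it in two."""
    rng = random.Random(seed)      # the seed plays the role of random_state
    shuffled = list(rows)          # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)          # the shuffle=True step
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


rows = list(range(20))
train_a, test_a = shuffled_split(rows, seed=1)
train_b, test_b = shuffled_split(rows, seed=1)

print(test_a == test_b)                  # True: same seed, same split
print(sorted(train_a + test_a) == rows)  # True: every row lands in exactly one set
```

Fixing the seed is what makes the reported accuracy reproducible. With a different seed, other rows would end up in the test set and the score could change.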
Now we initialize the classifier using the KNeighborsClassifier class. The n_neighbors parameter is k, the number of nearest neighbors that the algorithm considers. A low value of k makes the predictions sensitive to noise in the training data, while a high value of k can blur the boundaries between the classes.
After that, we use the fit() method to learn from the training data and the predict() method to predict the target variable for the test set.
knn_classifier = KNeighborsClassifier(n_neighbors=5)
knn_classifier.fit(X_train, y_train)
y_test_pred = knn_classifier.predict(X_test)
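The effect of k described above can be seen in a small from-scratch sketch. This is not how KNeighborsClassifier is implemented internally. knn_predict is a hypothetical helper that uses Euclidean distance and a simple majority vote, and the toy points and labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up data: a cluster of "a" points, a cluster of "b" points,
# and one "b" point (the fourth entry) sitting inside the "a" cluster as noise.
train_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.05),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b", "b"]

query = (1.1, 1.0)  # lies in the middle of the "a" cluster
for k in (1, 3, 7):
    print(k, knn_predict(train_points, train_labels, query, k))
# 1 b  (the single nearest point is the noisy one)
# 3 a  (two "a" neighbors outvote the noisy point)
# 7 b  (every point votes, so the larger class wins regardless of location)
```

With k=1 the prediction follows the single noisy neighbor, and with k equal to the size of the training set the location of the query no longer matters. A moderate value sits between the two extremes.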
Now we can compare y_test_pred with y_test to measure the performance of the model. Here we measure the accuracy score, which is the fraction of test samples that the model classifies correctly.
accuracy = accuracy_score(y_test, y_test_pred)
print("Accuracy Score:", accuracy)
The output of the above program will be:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None
   sepal_length  sepal_width  petal_length  petal_width  species
0           5.1          3.5           1.4          0.2        0
1           4.9          3.0           1.4          0.2        0
2           4.7          3.2           1.3          0.2        0
3           4.6          3.1           1.5          0.2        0
4           5.0          3.6           1.4          0.2        0
Accuracy Score: 1.0
As we can see, the model reaches an accuracy score of 1.0 in this example, which means that every sample in the test set was classified correctly.
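To make the accuracy score concrete, the following sketch computes it by hand as the number of correct predictions divided by the total number of predictions. The accuracy function and the label lists are made up for illustration. With its default settings, scikit-learn's accuracy_score returns the same kind of fraction.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: the fourth prediction is wrong, the other seven are right.
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2]

print(accuracy(y_true, y_pred))  # 0.875 (7 of 8 correct)
print(accuracy(y_true, y_true))  # 1.0 (every prediction matches)
```

A score of 1.0 therefore says only that no test sample was misclassified for this particular split.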







































