What is the K-Nearest Neighbors (KNN) algorithm?
The k-nearest neighbors algorithm is a supervised learning method that can solve both regression and classification problems: the k-nearest neighbors regressor handles regression tasks, while the k-nearest neighbors classifier handles classification tasks.
The algorithm works by measuring the distance between data points, most often the Euclidean distance. In the case of the k-nearest neighbors regressor, the algorithm first finds the k training points closest to the query point, and the prediction is then the mean (or median) of those neighbors' target values.
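The idea can be sketched in a few lines of plain NumPy. This is a minimal illustration, not scikit-learn's implementation, and the training data below is made up for the example:

```python
import numpy as np

# Toy 1-D training data (illustrative values only)
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.5, 2.0, 2.5, 4.0, 5.5])

def knn_predict(x, k=3):
    # Euclidean distance from the query point to every training point
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # The prediction is the mean of the neighbors' target values
    return y_train[nearest].mean()

print(knn_predict(np.array([2.2])))  # mean of y for the 3 closest points: 2.0
```

For the query point 2.2, the three nearest training points are 2.0, 3.0, and 1.0, so the prediction is the mean of their targets, (2.0 + 2.5 + 1.5) / 3 = 2.0.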
How to solve regression problems using the k-nearest neighbors regressor in sklearn?
Let’s read the “tips” dataset and try to predict the tip amount from the total bill amount using the KNN regressor. We can use the following Python code for that purpose:
import seaborn
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
df = seaborn.load_dataset("tips")  # load the built-in "tips" dataset
# Hold out a test set; the feature must be 2-D, hence the double brackets
X_train, X_test, y_train, y_test = train_test_split(df[["total_bill"]], df["tip"], shuffle=True, random_state=1)
knn_regressor = KNeighborsRegressor(n_neighbors=5)  # predict from the 5 nearest neighbors
knn_regressor.fit(X_train, y_train)
y_test_pred = knn_regressor.predict(X_test)
mae = mean_absolute_error(y_test, y_test_pred)
rmse = mean_squared_error(y_test, y_test_pred) ** 0.5  # take the square root for RMSE
print("Mean Absolute Error: ", mae)
print("Root Mean Square Error: ", rmse)
Here, we first read the dataset using the seaborn Python library. After that, the dataset is split into training and test sets, the regressor is fitted on the training set, and its predictions on the test set are evaluated using the mean absolute error and the root mean square error.