If we plot tip against total bill as a scatterplot, the graph looks like the following:
From the above graph, the relationship looks approximately linear. We are also considering only one predictor variable, the total bill amount. So, the problem can be solved using simple linear regression.
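Before turning to scikit-learn, it may help to see what "simple linear regression" computes. The sketch below fits a line tip = intercept + slope × total_bill using the standard least-squares formulas. The function name and the four bill/tip pairs are made up for illustration and are not taken from the "tips" dataset.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # slope = sum of co-deviations / sum of squared deviations of x
    co_deviation = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    x_deviation_sq = sum((xi - x_mean) ** 2 for xi in x)
    slope = co_deviation / x_deviation_sq
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical bill/tip pairs, for illustration only
bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.0, 4.5, 5.5]

slope, intercept = fit_simple_linear_regression(bills, tips)
print("slope:", slope)
print("intercept:", intercept)
```

With a single predictor, the model has only these two numbers to learn: the slope says how much the tip changes per unit of total bill, and the intercept is the predicted tip for a bill of zero.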
We can use the following Python code to solve the simple linear regression problem.
import seaborn
from matplotlib import pyplot
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the "tips" dataset and inspect it
df = seaborn.load_dataset("tips")
print(df.head())
df.info()  # info() prints its summary itself and returns None

# Scatterplot of tip against total bill
pyplot.scatter(df["total_bill"], df["tip"])
pyplot.xlabel("Total Bill")
pyplot.ylabel("Tip")
pyplot.savefig("tips-scatter.png")
pyplot.close()

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df[["total_bill"]], df["tip"], train_size=0.8, shuffle=True, random_state=1
)

# Fit the model on the training data and predict tips for the test data
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
y_test_predicted = linear_regressor.predict(X_test)

# Plot the fitted regression line over the data points
pyplot.scatter(df["total_bill"], df["tip"])
pyplot.plot(X_test, y_test_predicted, color="green")
pyplot.xlabel("Total Bill")
pyplot.ylabel("Tip")
pyplot.savefig("tips-regression.png")
pyplot.close()

# Evaluate on the test set. RMSE is computed as the square root of the MSE,
# because the squared=False argument is deprecated and has been removed in
# recent scikit-learn releases.
r2 = r2_score(y_test, y_test_predicted)
rmse = mean_squared_error(y_test, y_test_predicted) ** 0.5
print("R2: ", r2)
print("RMSE: ", rmse)
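To make the two scores printed at the end more concrete, here is a minimal sketch that computes R2 and RMSE by hand from their usual definitions. The helper name and the sample values are hypothetical and are not output from the "tips" dataset.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Compute R2 and RMSE from actual and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: the error left after using the model
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: the error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical actual and predicted tips, for illustration only
actual_tips = [2.0, 3.0, 4.5, 5.5]
predicted_tips = [1.95, 3.15, 4.35, 5.55]

r2, rmse = r2_and_rmse(actual_tips, predicted_tips)
print("R2: ", r2)
print("RMSE: ", rmse)
```

An R2 close to 1 means the line explains most of the variation in the tips, while RMSE is expressed in the same unit as the tip itself, so it can be read as a typical prediction error.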
The script begins by loading the “tips” dataset using the seaborn library. The df.head() method shows the first five rows of the dataset. The df.info() method shows a summary of the dataset, including its columns, data types, and non-null counts. The output of the two calls looks like the following: …





