Now, if we draw a scatterplot of tip against total bill, the graph looks like the following:

From the above graph, the relationship looks roughly linear. We are also considering only one predictor variable, the total bill amount, so the problem can be solved using simple linear regression.
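As a rough sketch of what simple linear regression does, the model fits a straight line tip ≈ slope × total_bill + intercept by least squares. The function name and the sample numbers below are made up for illustration and are not taken from the tips dataset.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy values (assumption), NOT rows from the tips dataset
bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_simple_linear_regression(bills, tips)
print("slope:", slope, "intercept:", intercept)
```

The slope estimates how much the tip changes per unit increase in the total bill, and the intercept is the predicted tip for a bill of zero.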
We can use the following Python code to solve the simple linear regression problem.
import seaborn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot
from sklearn.metrics import r2_score, mean_squared_error

# Load the dataset and inspect it
df = seaborn.load_dataset("tips")
print(df.head())
df.info()  # info() prints its summary directly and returns None

# Scatterplot of tip against total bill
pyplot.scatter(df["total_bill"], df["tip"])
pyplot.xlabel("Total Bill")
pyplot.ylabel("Tip")
pyplot.savefig("tips-scatter.png")
pyplot.close()

# Split into 80% training data and 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    df[["total_bill"]], df["tip"], train_size=0.8, shuffle=True, random_state=1
)

# Fit the model on the training data and predict on the test data
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
y_test_predicted = linear_regressor.predict(X_test)

# Plot the fitted regression line over the scatterplot
pyplot.scatter(df["total_bill"], df["tip"])
pyplot.plot(X_test, y_test_predicted, color="green")
pyplot.xlabel("Total Bill")
pyplot.ylabel("Tip")
pyplot.savefig("tips-regression.png")
pyplot.close()

# Evaluate the model on the test data
r2 = r2_score(y_test, y_test_predicted)
# Take the square root of the MSE; the squared=False argument is deprecated in newer scikit-learn releases
rmse = mean_squared_error(y_test, y_test_predicted) ** 0.5
print("R2: ", r2)
print("RMSE: ", rmse)
First, we load the “tips” dataset using the seaborn library. The df.head() method shows the first five rows of the dataset, and the df.info() method prints a summary of its columns, including their data types and non-null counts. The output of the two calls will look like the following: …







































