df = df.dropna()
Next, we split the dataset into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(df[["horsepower", "weight", "acceleration"]], df["mpg"], train_size=0.8, shuffle=True, random_state=1)
Note that the shuffle=True parameter shuffles the dataset before splitting, and random_state=1 seeds the random number generator so the split is reproducible. With train_size=0.8, 80% of the rows go to the training set and the remaining 20% to the test set.
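As a quick illustration of the 80/20 split, here is a minimal sketch using a small hypothetical frame (toy) in place of the mpg data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the mpg dataset
toy = pd.DataFrame({"horsepower": range(10), "mpg": range(10)})

X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["horsepower"]], toy["mpg"],
    train_size=0.8, shuffle=True, random_state=1,
)

# 10 rows split 80/20: 8 for training, 2 for testing
print(len(X_tr), len(X_te))  # → 8 2
```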
Now, we execute the following Python statements:
lasso_regressor = Lasso()
lasso_regressor.fit(X_train, y_train)
We first initialize the Lasso regressor (with its default regularization strength, alpha=1.0) and then train the model on the training set.
Now we can run the model on the test set and compare its predictions with the actual target values. This lets us measure the performance of the Lasso regression model.
y_test_predicted = lasso_regressor.predict(X_test)
r2 = r2_score(y_test, y_test_predicted)
rmse = mean_squared_error(y_test, y_test_predicted, squared=False)
print("R2: ", r2)
print("RMSE: ", rmse)
The R-squared score and the root mean square error of the model are as follows:
R2: 0.7216621600100153
RMSE: 4.392608833440981
As we can see, the model has performed better than the Ridge regression model.
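For reference, the whole workflow above can be assembled into one self-contained sketch. Since the Auto MPG CSV is not loaded here, a synthetic stand-in DataFrame (an assumption, not the real data) is generated with the same column names; the metric values will therefore differ from those shown above. The RMSE is computed with np.sqrt() rather than squared=False, since that parameter has been removed from recent scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Auto MPG data (the tutorial uses the real CSV)
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "horsepower": rng.uniform(50, 230, n),
    "weight": rng.uniform(1600, 5000, n),
    "acceleration": rng.uniform(8, 25, n),
})
# mpg falls with horsepower and weight, plus Gaussian noise
df["mpg"] = (45 - 0.05 * df["horsepower"] - 0.005 * df["weight"]
             + rng.normal(0, 2, n))
df = df.dropna()

# 80/20 train/test split, shuffled with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    df[["horsepower", "weight", "acceleration"]], df["mpg"],
    train_size=0.8, shuffle=True, random_state=1,
)

# Fit Lasso with the default regularization strength (alpha=1.0)
lasso_regressor = Lasso()
lasso_regressor.fit(X_train, y_train)

# Evaluate on the held-out test set
y_test_predicted = lasso_regressor.predict(X_test)
r2 = r2_score(y_test, y_test_predicted)
rmse = float(np.sqrt(mean_squared_error(y_test, y_test_predicted)))
print("R2: ", r2)
print("RMSE: ", rmse)
```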