regression model with which we can predict the mpg of a car from features such as weight, horsepower, and acceleration.
import seaborn

data = seaborn.load_dataset("mpg")
data.dropna(inplace=True)
D = data.values
X = data.drop(labels=["origin", "name", "mpg"], axis=1).values
y = data.filter(items=["mpg"], axis=1).values
In this example, we are first dropping the rows that have missing values. Then, we are splitting the columns of the dataset into features and the target variable. Please note that the first column of the dataset, “mpg”, contains the target variable. And we will not consider the origin and name columns in this example. So, we are dropping the origin, name, and mpg columns to get the features, and we are filtering only the mpg column as the target variable. As a result, X contains all the features, and y contains the target variable.
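If you want to double-check what ended up in X and y, a quick sanity check like the following can help. This is optional and simply reuses the variables defined above:

print(data.drop(labels=["origin", "name", "mpg"], axis=1).columns.tolist())  # remaining feature columns
print(X.shape, y.shape)  # number of rows and columns in the features and the target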
import numpy

alphas = numpy.linspace(0.01, 1, num=100, endpoint=True)
params = dict(alpha=alphas)
Now, we are creating an array of 100 evenly spaced values from 0.01 to 1. These are the alpha values we will try while performing the grid search. We are also creating a dictionary named “params” that maps the parameter name alpha to these values; the key must match the name of the Lasso parameter we want to tune.
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

regressor = Lasso()
grid_search_cv = GridSearchCV(estimator=regressor, param_grid=params, cv=10, scoring="r2")
grid_search_cv.fit(X, y)
Now, we are initializing the regressor using the Lasso class. Then, we are using GridSearchCV to try out all these alpha values and find the alpha value that gives the optimal performance.
Please note that cv=10 specifies that we are using k-fold cross-validation with 10 folds. And the param_grid argument specifies the alpha values on which we want to perform an exhaustive search.
After that, we fit the model with the data. Because we are using 10-fold cross-validation, the dataset is split into 10 folds; in each iteration, nine folds are used for training and the remaining fold is used for testing. This is repeated for every alpha value, and the mean r2 score across the folds is recorded for each one.
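Conceptually, the grid search is doing something like the following sketch, which evaluates each alpha value with the same 10-fold cross-validation and r2 scoring using cross_val_score. This is only an illustration of the idea, not how GridSearchCV is actually implemented:

from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

mean_scores = []
for alpha in alphas:
    # mean r2 score over the 10 folds for this alpha value
    scores = cross_val_score(Lasso(alpha=alpha), X, y, cv=10, scoring="r2")
    mean_scores.append(scores.mean())

best_index = int(numpy.argmax(mean_scores))
print(alphas[best_index], mean_scores[best_index])

The alpha value with the highest mean score is what GridSearchCV reports as the best estimator.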
print(grid_search_cv.best_estimator_)
print(grid_search_cv.best_score_)
Now, the program prints the estimator with the best alpha value and the corresponding r2 score (What is the r-squared score in machine learning?).
The output of the program will be like the following:
Lasso(alpha=0.47000000000000003)
0.6218982117944125
So, in our case, the model gives the optimal performance using Lasso regression with an alpha value of about 0.47, and the corresponding r2 score is 0.6218982117944125.
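Since refit=True by default, GridSearchCV refits the best estimator on the entire dataset after the search, so we can use it directly for predictions. As a small usage sketch (note that the in-sample r2 computed here is not the same as the cross-validated best_score_ above):

from sklearn.metrics import r2_score

# predictions from the refitted best estimator found by the grid search
y_pred = grid_search_cv.predict(X)
print(r2_score(y, y_pred))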