models along with their horsepower, weight, acceleration, mpg (miles driven per gallon of gasoline), etc. We want to build a Lasso regression model that predicts the mpg of a car from features such as weight, horsepower, and acceleration.
import numpy
import seaborn
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV

data = seaborn.load_dataset("mpg")
data.dropna(inplace=True)
X = data.drop(labels=["origin", "name", "mpg"], axis=1).values
y = data.filter(items=["mpg"], axis=1).values
We do not want to consider the columns origin and name in this example. The first column, "mpg", is the target variable here. So, we are dropping the columns origin, name, and mpg to get the features, and we are filtering the mpg column to get the target variable. X contains all the features, and y contains the target variable.
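To see what drop() and filter() produce here, the following sketch builds a tiny stand-in DataFrame with the same column layout as the mpg dataset (the values are made up for illustration) and checks the resulting shapes:

```python
import pandas as pd

# A tiny stand-in frame with the same columns as the seaborn mpg dataset
# (values are invented for illustration only).
data = pd.DataFrame({
    "mpg": [18.0, 15.0], "cylinders": [8, 8], "displacement": [307.0, 350.0],
    "horsepower": [130.0, 165.0], "weight": [3504, 3693],
    "acceleration": [12.0, 11.5], "model_year": [70, 70],
    "origin": ["usa", "usa"], "name": ["chevrolet", "buick"],
})

# Dropping origin, name, and mpg leaves six feature columns;
# filtering mpg keeps a single target column.
X = data.drop(labels=["origin", "name", "mpg"], axis=1).values
y = data.filter(items=["mpg"], axis=1).values
print(X.shape, y.shape)  # two rows by six features, and two rows by one target
```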
alphas = [numpy.random.uniform() for _ in range(100)]
params = dict(alpha=alphas)
Here, we are randomly generating the values of alpha from the uniform distribution. Then, we are creating a dictionary containing the generated values of alpha.
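As an aside, instead of pre-generating a fixed list of candidates, the param_distributions argument also accepts a scipy.stats distribution object, from which RandomizedSearchCV draws a fresh value for each sampled setting. A minimal sketch:

```python
from scipy.stats import uniform

# A continuous distribution over [0, 1); RandomizedSearchCV calls its
# rvs() method to draw a new alpha for each of the n_iter samples.
params = dict(alpha=uniform(loc=0.0, scale=1.0))
```

With a distribution, repeated searches can explore different alpha values each run (subject to random_state), rather than being limited to the 100 pre-generated candidates.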
regressor = Lasso()
randomized_search_cv = RandomizedSearchCV(estimator=regressor, param_distributions=params, n_iter=50, cv=10, scoring="r2", random_state=1)
randomized_search_cv.fit(X, y)
Now, we are using the RandomizedSearchCV() function to perform a randomized search for tuning the hyperparameter alpha. Please note that we are using the Lasso regression here.
The param_distributions argument specifies a dictionary that contains the parameter name and a list of values or a distribution to try out. The argument n_iter specifies the number of parameter settings that are sampled.
The argument cv=10 specifies that we are using k-fold cross-validation with 10 splits. We are using r2 scoring here. And the random_state argument is used to initialize the pseudo-random number generator that is used for randomization.
After that, we fit the search on the dataset. Please note that because we are using k-fold cross-validation here, each of the 50 sampled values of alpha is evaluated on 10 folds, and the randomized search records the mean score for every sampled setting.
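What the search does internally can be sketched by hand: sample candidate alphas, score each with 10-fold cross-validated r2, and keep the best. The sketch below uses synthetic regression data from make_regression (rather than the mpg dataset, which requires a download), so the numbers are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 200 samples, 5 features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

rng = np.random.default_rng(1)
best_alpha, best_score = None, -np.inf

# Roughly what RandomizedSearchCV does: sample candidate alphas,
# score each with 10-fold cross-validated r2, keep the best mean score.
for alpha in rng.uniform(size=50):
    scores = cross_val_score(Lasso(alpha=alpha), X, y, cv=10, scoring="r2")
    if scores.mean() > best_score:
        best_alpha, best_score = alpha, scores.mean()

print(best_alpha, best_score)
```

RandomizedSearchCV wraps this loop and additionally handles parallelism, result bookkeeping in cv_results_, and refitting the best estimator.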
print(randomized_search_cv.best_estimator_)
print(randomized_search_cv.best_score_)
Finally, we print the best estimator, i.e., the Lasso model with the value of alpha that gives the best r2 score. We also print that best r2 score for the model.
The output of the given program will be the following:
Lasso(alpha=0.45683643739445123) 0.621879411184066
From the given output, we can see that the best value of alpha for this model is 0.45683643739445123, which gives the best r2 score of 0.621879411184066.
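One more useful point: because RandomizedSearchCV defaults to refit=True, the fitted search object retrains the winning estimator on the full dataset and can be used directly for prediction. The sketch below demonstrates this end to end on synthetic data (again using make_regression instead of the mpg dataset, so the winning alpha and score will differ from the article's):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data for a self-contained demonstration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

params = dict(alpha=[np.random.uniform() for _ in range(100)])
search = RandomizedSearchCV(estimator=Lasso(), param_distributions=params,
                            n_iter=50, cv=10, scoring="r2", random_state=1)
search.fit(X, y)

# best_params_ holds the winning setting as a dictionary, and the search
# object itself predicts with the refitted best estimator.
print(search.best_params_)
preds = search.predict(X[:5])
```

This means there is no need to manually re-instantiate Lasso with the best alpha; the search object already carries the refitted model in best_estimator_.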