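import seaborn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score, mean_squared_error

# Load the "mpg" dataset and inspect it
df = seaborn.load_dataset("mpg")
print(df.head())
df.info()  # info() prints its summary itself and returns None

# Drop the rows that contain null values
df = df.dropna()

# Use three numeric features to predict mpg; keep 80% of the rows for training
X = df[["horsepower", "weight", "acceleration"]]
y = df["mpg"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, shuffle=True, random_state=1)

# Fit an ElasticNet model with default hyperparameters and predict on the test set
ElasticNet_regressor = ElasticNet()
ElasticNet_regressor.fit(X_train, y_train)
y_test_predicted = ElasticNet_regressor.predict(X_test)

# Evaluate with R2 and RMSE. RMSE is computed as the square root of the MSE,
# because the squared=False argument is deprecated in newer scikit-learn releases.
r2 = r2_score(y_test, y_test_predicted)
rmse = mean_squared_error(y_test, y_test_predicted) ** 0.5
print("R2: ", r2)
print("RMSE: ", rmse)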
First, we load the “mpg” dataset using the seaborn Python library. We then print the first few rows and summary information about the dataset. The output looks like this:
    mpg  cylinders  displacement  ...  model_year  origin                       name
0  18.0          8         307.0  ...          70     usa  chevrolet chevelle malibu
1  15.0          8         350.0  ...          70     usa          buick skylark 320
2  18.0          8         318.0  ...          70     usa         plymouth satellite
3  16.0          8         304.0  ...          70     usa              amc rebel sst
4  17.0          8         302.0  ...          70     usa                ford torino

[5 rows x 9 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64
 7   origin        398 non-null    object
 8   name          398 non-null    object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB
As we can see, the horsepower column contains null values (only 392 of its 398 rows are non-null), so we use the dropna() function to drop the rows containing null values…
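The effect of dropna() is easy to check on a small hand-made table. The following is a minimal sketch that assumes a hypothetical DataFrame named toy with made-up values (it is not the real mpg data) and one missing horsepower entry. It counts the nulls per column, drops the affected row, and counts again.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the mpg dataset (values are made up)
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": [130.0, np.nan, 95.0, 110.0],
    "weight": [3504, 3693, 2372, 2833],
})

# Count the missing values in each column before dropping anything
missing_before = toy.isna().sum()

# dropna() returns a new DataFrame without the rows that contain any null value
toy_clean = toy.dropna()
missing_after = toy_clean.isna().sum()

print(missing_before)
print(len(toy), "rows before,", len(toy_clean), "rows after")
```

On the real dataset the same call should leave 392 of the 398 rows, since the info() output shows horsepower as the only column with missing values.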







































