import seaborn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, mean_squared_error

# Load the mpg dataset and inspect it
df = seaborn.load_dataset("mpg")
print(df.head())
print(df.info())

# Drop rows with missing values (the horsepower column contains nulls)
df = df.dropna()

# 80/20 train/test split on three numeric features
X_train, X_test, y_train, y_test = train_test_split(
    df[["horsepower", "weight", "acceleration"]], df["mpg"],
    train_size=0.8, shuffle=True, random_state=1)

# Fit a ridge regressor with the default regularization strength
ridge_regressor = Ridge()
ridge_regressor.fit(X_train, y_train)

# Evaluate on the held-out test set
y_test_predicted = ridge_regressor.predict(X_test)
r2 = r2_score(y_test, y_test_predicted)
# Take the square root of the MSE ourselves; the squared=False argument
# is deprecated in recent scikit-learn releases
rmse = mean_squared_error(y_test, y_test_predicted) ** 0.5
print("R2:", r2)
print("RMSE:", rmse)
Here, we first load the "mpg" dataset using the seaborn library. We then print the first few rows of the dataset along with a summary of its columns. We get the following output:
    mpg  cylinders  displacement  ...  model_year  origin                       name
0  18.0          8         307.0  ...          70     usa  chevrolet chevelle malibu
1  15.0          8         350.0  ...          70     usa          buick skylark 320
2  18.0          8         318.0  ...          70     usa         plymouth satellite
3  16.0          8         304.0  ...          70     usa              amc rebel sst
4  17.0          8         302.0  ...          70     usa                ford torino

[5 rows x 9 columns]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64
 7   origin        398 non-null    object
 8   name          398 non-null    object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB
From this summary, we can see that the horsepower column contains only 392 non-null values out of 398 rows, so the dataset has a few missing entries. We therefore use the dropna() function to remove the rows that contain null values before fitting the model.
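To see what dropna() actually does here, the pattern can be sketched on a small hand-made frame with a null in the horsepower column (the values below are made up for illustration, not taken from the real mpg data):

```python
import pandas as pd

# Toy frame mimicking the mpg pattern: one missing horsepower value
df = pd.DataFrame({
    "horsepower": [130.0, None, 150.0],
    "weight": [3504, 3693, 3436],
    "mpg": [18.0, 15.0, 18.0],
})

# Count missing values per column; only horsepower has any
print(df.isna().sum())

# dropna() removes every row that contains at least one null
df_clean = df.dropna()
print(len(df), "->", len(df_clean))  # 3 -> 2
```

On the real dataset the same call removes the 6 rows with a null horsepower, which is why 392 rows remain for training and testing.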