What are Extra Trees?
The Extra Trees (Extremely Randomized Trees) algorithm builds a large number of unpruned decision trees and combines their predictions. For a regression problem, it averages the predictions of all the decision trees. For a classification problem, it selects the class that receives the most votes from the decision trees.
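To make the idea concrete, here is a minimal sketch (not the library's internal implementation) of how an ensemble of trees might combine its members' outputs. The per-tree predictions and class votes below are made-up values used only for illustration:

from collections import Counter

# Regression: the ensemble prediction is the average of the trees' predictions
tree_predictions = [142.0, 150.5, 138.0, 147.5]       # hypothetical per-tree outputs
ensemble_regression = sum(tree_predictions) / len(tree_predictions)
print("Regression prediction:", ensemble_regression)   # 144.5

# Classification: the ensemble prediction is the class with the most votes
tree_votes = ["spam", "ham", "spam", "spam"]            # hypothetical per-tree votes
ensemble_class = Counter(tree_votes).most_common(1)[0][0]
print("Classification prediction:", ensemble_class)     # spam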
At first glance, Extra Trees look similar to Random Forests, but there are important differences. For example, Random Forests fit each tree on a bootstrap sample of the training data (bagging), whereas Extra Trees, by default, fit each tree on the whole training dataset.
Moreover, a Random Forest greedily searches for the optimal split point at each node of a tree, whereas Extra Trees draw split points at random for the candidate features and keep the best of those random splits.
Because they skip the exhaustive search for the best split, Extra Trees usually train faster than Random Forests, and they often perform as well as or better than Random Forests.
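The bagging difference is visible directly in sklearn's defaults: RandomForestRegressor bootstraps the training data, while ExtraTreesRegressor fits every tree on the full training set. The short sketch below simply constructs both estimators and inspects that default; the hyperparameter values are arbitrary:

from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Hypothetical side-by-side construction; only the bootstrap default is inspected here.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
extra_trees = ExtraTreesRegressor(n_estimators=100, random_state=1)

print(forest.bootstrap)       # True  -> each tree is fit on a bootstrap sample (bagging)
print(extra_trees.bootstrap)  # False -> each tree is fit on the whole training set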
Extra Trees Regressor using sklearn
We can use the ExtraTreesRegressor class to solve regression problems in sklearn. We can use the following Python code for that purpose:
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.datasets import load_diabetes

# Load the diabetes dataset as a pandas DataFrame
data = load_diabetes(as_frame=True)
df = data.frame

# Separate the features from the target column
df_features = df.drop(labels=["target"], axis=1)
df_target = df.filter(items=["target"])

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["target"], shuffle=True, random_state=1)

# Fit an Extra Trees regressor with 100 trees
regressor = ExtraTreesRegressor(n_estimators=100)
regressor.fit(X_train, y_train)

# Evaluate the model on the test set
y_test_pred = regressor.predict(X_test)
mae = mean_absolute_error(y_test, y_test_pred)
rmse = mean_squared_error(y_test, y_test_pred) ** 0.5  # square root of MSE gives RMSE

print("Mean Absolute Error: ", mae)
print("Root Mean Square Error: ", rmse)
Here, we first load the diabetes dataset using sklearn. Please note that df is a DataFrame. …