df_features = df.drop(labels=["Outcome"], axis=1)
df_target = df.filter(items=["Outcome"])
Next, we split the dataset into a training set and a test set. The test set holds 20% of the samples, and the data is shuffled before splitting. The random_state parameter controls the random number generator used for the shuffle, so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["Outcome"], test_size=0.2, shuffle=True, random_state=1)
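To make the shuffling and slicing concrete, here is a minimal sketch of what an 80/20 shuffled split does, written with plain Python lists instead of the DataFrame above. The function name simple_train_test_split is illustrative only; it is not part of scikit-learn.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=1):
    """Shuffle the row indices reproducibly, then slice off test_size of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # seed plays the role of random_state
    n_test = int(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test

rows = list(range(10))
train, test = simple_train_test_split(rows)
print(len(train), len(test))  # 8 2
```

With 10 rows and test_size=0.2, two rows end up in the test set and eight in the training set, just as train_test_split puts 20% of the 768 diabetes samples aside for testing.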
Next, we initialize the random forest classifier. The random_state parameter of the RandomForestClassifier() constructor controls the randomness of the bootstrap sampling of the training examples and the sampling of the features considered at each split.
The fit() method learns from the training set, and the predict() method predicts the target values for the test set.
classifier = RandomForestClassifier(random_state=1)
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
Now, we can compare y_test_pred with y_test to measure the performance of the random forest classifier. We will use the accuracy score here, i.e. the fraction of test samples whose predicted label matches the true label.
accuracy = accuracy_score(y_test, y_test_pred)
The output of the program will look like the following:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   Pregnancies  Glucose  BloodPressure  ...  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72  ...                     0.627   50        1
1            1       85             66  ...                     0.351   31        0
2            8      183             64  ...                     0.672   32        1
3            1       89             66  ...                     0.167   21        0
4            0      137             40  ...                     2.288   33        1

[5 rows x 9 columns]
   Pregnancies  Glucose  BloodPressure  ...   BMI  DiabetesPedigreeFunction  Age
0            6      148             72  ...  33.6                     0.627   50
1            1       85             66  ...  26.6                     0.351   31
2            8      183             64  ...  23.3                     0.672   32
3            1       89             66  ...  28.1                     0.167   21
4            0      137             40  ...  43.1                     2.288   33

[5 rows x 8 columns]
   Outcome
0        1
1        0
2        1
3        0
4        1
Accuracy Score: 0.8051948051948052
As we can see, the accuracy score has improved: we previously solved the same problem with a single classification tree, and switching to a random forest classifier yields a higher accuracy score.
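To see the tree-versus-forest comparison end to end, here is a self-contained sketch that trains both models on synthetic data generated with make_classification, rather than the diabetes dataset used above, so the exact scores will differ from the tutorial's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=1)

# A single classification tree versus an ensemble of trees.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=1).fit(X_train, y_train)

print("tree:  ", accuracy_score(y_test, tree.predict(X_test)))
print("forest:", accuracy_score(y_test, forest.predict(X_test)))
```

On noisy data the forest's averaging over many bootstrapped trees typically reduces variance, which is why it tends to score higher than a single tree, though the gap depends on the dataset.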