In our previous articles, we discussed how to solve regression problems using regression trees and classification problems using classification trees. A fitted tree can also be drawn with the plot_tree() function from the sklearn.tree module. In this article, we will discuss how to plot decision trees using the sklearn Python library.
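As a quick preview of plot_tree(), here is a minimal sketch. It assumes the iris dataset bundled with sklearn (not the dataset used in the rest of this article) so that it runs without any external file, and it selects matplotlib's non-interactive Agg backend so that it also runs on a machine without a display. The max_depth and figsize values are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use

from matplotlib import pyplot
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Assumption: iris stands in for real data so the sketch is self-contained.
iris = load_iris()

# Fit a shallow tree; a small max_depth keeps the drawing readable.
classifier = DecisionTreeClassifier(random_state=1, max_depth=3)
classifier.fit(iris.data, iris.target)

# Draw the fitted tree on an explicit axes object.
fig, ax = pyplot.subplots(figsize=(12, 6))
annotations = plot_tree(
    classifier,
    feature_names=iris.feature_names,      # label each split with its feature name
    class_names=list(iris.target_names),   # label each node with its majority class
    filled=True,                           # colour nodes by majority class
    ax=ax,
)
```

In an interactive session, drop the matplotlib.use("Agg") line and call pyplot.show() to view the figure, or call fig.savefig() with a file name to write it to disk.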
Let’s read the Pima diabetes dataset. The dataset has 9 columns in total. Eight of them are features, such as glucose, blood pressure, insulin, and BMI, and the Outcome column records whether the patient has diabetes. In our previous article, we discussed how to solve this classification problem using a classification tree. Let’s look at the Python code once more:
import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

# Load the dataset and inspect it
df = pandas.read_csv("diabetes.csv")
df.info()  # info() prints its report itself and returns None
print(df.head())

# Separate the 8 feature columns from the Outcome column
df_features = df.drop(labels=["Outcome"], axis=1)
df_target = df.filter(items=["Outcome"])
print(df_features.head())
print(df_target.head())

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df_features, df_target["Outcome"],
    test_size=0.2, shuffle=True, random_state=1)

# Train a classification tree limited to depth 3
classifier = DecisionTreeClassifier(random_state=1, max_depth=3)
classifier.fit(X_train, y_train)

# Evaluate on the held-out rows
y_test_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_test_pred)
print("Accuracy Score: ", accuracy)
Let’s summarize what we did here…