In our previous articles, we discussed how to solve regression problems using regression trees and classification problems using classification trees. A fitted decision tree can also be drawn using the plot_tree() function from the sklearn.tree module, which renders the tree with matplotlib. In this article, we will discuss how to plot decision trees using the sklearn Python library.
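Before turning to the dataset, here is a minimal sketch of what a plot_tree() call can look like. It is only an illustration under a few assumptions that are not part of this article: it uses the iris dataset bundled with scikit-learn as stand-in data, selects the non-interactive Agg backend so it runs without a display, and uses the hypothetical names demo_classifier and demo_tree.png.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is required
from matplotlib import pyplot
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Stand-in data (assumption): iris ships with scikit-learn, so no CSV file is needed.
iris = load_iris()

# Fit a shallow tree so the drawing stays small and readable.
demo_classifier = DecisionTreeClassifier(random_state=1, max_depth=2)
demo_classifier.fit(iris.data, iris.target)

# plot_tree() draws onto a matplotlib Axes and returns the node annotations.
figure, axes = pyplot.subplots(figsize=(8, 5))
annotations = plot_tree(
    demo_classifier,
    feature_names=iris.feature_names,      # labels used in the split conditions
    class_names=list(iris.target_names),   # labels shown for the majority class
    filled=True,                           # color each node by its majority class
    ax=axes,
)

# Save the drawing to a file (hypothetical name); pyplot.show() would display it instead.
figure.savefig("demo_tree.png")
print("Nodes drawn:", len(annotations))
```

Limiting max_depth keeps the figure legible, and passing feature_names and class_names replaces the generic column and class indices with readable labels.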
Let’s read the Pima diabetes dataset. The dataset has 9 columns in total. Eight of these columns are features, such as glucose, blood pressure, insulin, and BMI, and the Outcome column indicates whether the patient has diabetes. In our previous article, we discussed how to solve this classification problem using a classification tree. Let’s look at the Python code once more:
import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from matplotlib import pyplot
# Load the dataset and inspect its structure
df = pandas.read_csv("diabetes.csv")
df.info()  # info() prints its report directly and returns None
print(df.head())
# Separate the 8 feature columns from the Outcome (target) column
df_features = df.drop(labels=["Outcome"], axis=1)
df_target = df.filter(items=["Outcome"])
print(df_features.head())
print(df_target.head())
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["Outcome"], test_size=0.2, shuffle=True, random_state=1)
# Train a classification tree limited to depth 3
classifier = DecisionTreeClassifier(random_state=1, max_depth=3)
classifier.fit(X_train, y_train)
# Predict on the test set and measure accuracy
y_test_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_test_pred)
print("Accuracy Score: ", accuracy)
Let’s summarize what we did here…







































