Decision trees are a supervised learning method that can be used for both regression and classification problems. A regression tree predicts a continuous value, and a classification tree predicts a class label.
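To make the difference between the two tree types concrete, here is a minimal sketch of how a single leaf could produce its prediction. It assumes the common convention that a classification leaf returns the majority class of the training samples that reached it, while a regression leaf returns their mean target. The function names classification_leaf and regression_leaf are hypothetical and are not part of sklearn.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Return the most common class among the training labels in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Return the mean target value among the training samples in a leaf."""
    return mean(values)


# Toy leaf contents, invented for illustration only.
print(classification_leaf([1, 0, 1, 1]))
print(regression_leaf([2.0, 4.0, 6.0]))
```

The only difference between the two functions is how the samples in the leaf are summarized: a vote for classification, an average for regression.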
In a classification tree, the leaves represent class labels, and the branches represent the combinations of feature values that lead to those class labels. Readers who want to know more about how classification trees work can refer to this YouTube video: https://www.youtube.com/watch?v=_L39rN6gz7Y
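The leaf-and-branch structure can be sketched with a small hand-built tree. This is only an illustration: the feature names Glucose and BMI are borrowed from the diabetes example, the thresholds 130 and 30 are invented, and the nested-dictionary layout and the predict function are hypothetical rather than how sklearn stores a tree internally.

```python
# Hypothetical hand-built tree. Internal nodes test one feature against a
# threshold; leaves carry a class label (1 = diabetes, 0 = no diabetes).
# The thresholds are made up for illustration, not learned from data.
tree = {
    "feature": "Glucose",
    "threshold": 130,
    "left": {"label": 0},
    "right": {
        "feature": "BMI",
        "threshold": 30,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}


def predict(node, sample):
    """Follow branches from the root until a leaf is reached."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict(tree, {"Glucose": 150, "BMI": 35}))
```

Each path from the root to a leaf corresponds to one combination of feature tests, for example "Glucose above 130 and BMI above 30" for the rightmost leaf.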
How to solve classification problems using classification trees in sklearn?
Here, we will read the Pima diabetes dataset. The dataset has a total of 9 columns. Of these, 8 columns represent features such as glucose, blood pressure, insulin, and BMI, and the Outcome column indicates whether the patient has diabetes.
We will use the following Python code to read the dataset and use a classification tree to solve the classification problem.
import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pandas.read_csv("diabetes.csv")
print(df.info())
print(df.head())

df_features = df.drop(labels=["Outcome"], axis=1)
df_target = df.filter(items=["Outcome"])
print(df_features.head())
print(df_target.head())

X_train, X_test, y_train, y_test = train_test_split(df_features, df_target["Outcome"], test_size=0.2, shuffle=True, random_state=1)

classifier = DecisionTreeClassifier(random_state=1)
classifier.fit(X_train, y_train)

y_test_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_test_pred)
print("Accuracy Score: ", accuracy)
First, we read the dataset and split it into features and target. The df_features DataFrame contains the features, and the df_target DataFrame contains the Outcome column…