What is bagging in machine learning?
Suppose we want to solve a classification problem using machine learning, for example on the Pima Indians Diabetes dataset. The dataset contains various predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. We want to build a machine learning model that learns from these predictors and predicts whether a patient has diabetes.
The dataset has numerous records, so we can draw random samples from it with replacement to create several new datasets. The datasets thus created are called bootstrapped samples. We then train a separate decision tree on each bootstrapped sample; if there are n bootstrapped samples, we get n predictions. Finally, we take a majority vote: the class predicted by most of the decision trees becomes the final prediction.
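The two steps above, drawing bootstrapped samples and majority-voting the trees' predictions, can be sketched by hand. This is a minimal illustration, not the full pipeline: it uses a synthetic dataset from `make_classification` as a stand-in for the diabetes data, and the sample sizes and number of trees are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data (assumption: the real
# dataset would be loaded from a CSV file instead).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrapped sample: draw rows with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Majority vote: each tree predicts, and the class chosen by
# most trees becomes the final prediction.
votes = np.stack([tree.predict(X) for tree in trees])   # (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
```

Each tree sees a slightly different view of the data, so the trees make different mistakes; averaging them out via the vote is what reduces variance.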
This method of combining several decision trees is called ensembling, and it can improve the accuracy of a machine learning model. The specific technique of ensembling decision trees trained on bootstrapped samples is called bagging or …
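In practice, scikit-learn packages this whole procedure as `BaggingClassifier`, whose default base learner is a decision tree. A minimal sketch, again using a synthetic dataset as a stand-in for the diabetes data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes features (assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 25 decision trees, each fit on a bootstrapped sample
# (bootstrap=True draws rows with replacement).
bag = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)

accuracy = bag.score(X_test, y_test)
```

`predict` on the fitted ensemble performs the majority vote across the 25 trees automatically.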