Sometimes we need to binarize the data in a dataset. That is, we transform the data so that every value above a specific threshold is marked 1, and every value at or below the threshold is marked 0.
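At its core, this rule is an element-wise comparison against the threshold. The minimal sketch below uses plain NumPy to make the rule concrete. The function name `binarize_array` is hypothetical, chosen for illustration. It follows the same convention as scikit-learn's `Binarizer`: values strictly greater than the threshold become 1, and everything else becomes 0.

```python
import numpy as np

def binarize_array(values, threshold):
    """Hypothetical helper: 1.0 where a value is strictly greater than threshold, else 0.0."""
    values = np.asarray(values, dtype=float)
    # The comparison yields a boolean array; casting maps True/False to 1.0/0.0.
    return (values > threshold).astype(float)

result = binarize_array([[0.5, 1.0, 3.0], [2.0, 0.0, 1.5]], threshold=1.0)
print(result)
# [[0. 0. 1.]
#  [1. 0. 1.]]
```

A value exactly equal to the threshold (the `1.0` in the first row) is mapped to 0, because the comparison is strict.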
We can use the following Python code to perform binarization with the Binarizer class from scikit-learn (sklearn).
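from sklearn.preprocessing import Binarizer
import pandas
# Load the Pima Indians Diabetes dataset and convert it to a NumPy array
data = pandas.read_csv("diabetes.csv")
D = data.values
# Keep every column except the last one (the outcome label)
X = D[:,:-1]
# Values greater than 1 become 1; all other values become 0
binarizer = Binarizer(threshold=1)
X_transformed = binarizer.fit_transform(X)
print(X_transformed)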
Here, we use pandas to read the Pima Indians Diabetes dataset from diabetes.csv, and data.values converts the resulting DataFrame into a NumPy array D. The dataset contains predictor variables such as the number of pregnancies the patient has had, BMI, insulin level, and age. A machine learning model can learn from these predictor variables to predict whether a patient has diabetes.
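If diabetes.csv is not at hand, the loading step can be tried on a small in-memory stand-in. In the sketch below, the CSV text is hypothetical. It has only a subset of the columns, the column names are assumed to follow the commonly distributed Kaggle layout (with the label last), and the values are made up. The sketch shows that `read_csv` returns a DataFrame and that `.values` turns it into a single NumPy array.

```python
import io
import pandas as pd

# Hypothetical stand-in for diabetes.csv: made-up values, subset of columns,
# label column assumed to be last.
csv_text = """Pregnancies,Glucose,BMI,DiabetesPedigreeFunction,Age,Outcome
3,120,30.1,0.45,40,1
0,95,24.8,0.21,25,0
1,140,35.2,0.80,52,1
"""

data = pd.read_csv(io.StringIO(csv_text))
D = data.values  # DataFrame -> NumPy array

print(data.shape)  # (3, 6)
print(D.dtype)     # float64, the common type of the int and float columns
```

Because the columns mix integers and floats, `.values` upcasts everything to `float64`. This is why the binarized output later prints as `1.` and `0.`.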
X = D[:,:-1]
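Here, X contains all rows and every column except the last one, which holds the outcome label. In other words, X holds all the features of the dataset, and this is the data we will binarize.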
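The slice `D[:, :-1]` reads as "all rows, all columns up to but not including the last". A minimal sketch on a small illustrative array (made-up values, with the label assumed to be in the last column) shows how the features and the target separate:

```python
import numpy as np

# Illustrative stand-in for D: each row is a patient,
# the last column is the class label (1 = diabetes, 0 = no diabetes).
D = np.array([
    [6.0, 148.0, 0.627, 1.0],
    [1.0,  85.0, 0.351, 0.0],
    [8.0, 183.0, 0.672, 1.0],
])

X = D[:, :-1]  # all rows, every column except the last -> features
y = D[:, -1]   # all rows, only the last column -> target labels

print(X.shape)  # (3, 3)
print(y)        # [1. 0. 1.]
```

Only `X` is binarized. The label column is left out so that the target is not transformed along with the features.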
binarizer = Binarizer(threshold=1)
X_transformed = binarizer.fit_transform(X)
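First, we create a Binarizer object with a threshold of 1. Then fit_transform() binarizes the data X. Every value greater than 1 is transformed to 1, and every value less than or equal to 1 is transformed to 0. So a value of exactly 1, such as a patient with one pregnancy, becomes 0. The same threshold is applied to every column. Binarizer is stateless, so fitting learns nothing from the data, and fit_transform() simply applies the threshold.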
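To see the boundary behavior of `Binarizer` directly, the sketch below applies it to a small array whose values are chosen only to straddle the threshold, including 1 itself. It also checks that the result matches a plain strict comparison `X > 1`.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative values around the threshold, including exactly 1.0.
X = np.array([[0.0, 0.5, 1.0],
              [1.5, 6.0, 0.351]])

binarizer = Binarizer(threshold=1)
X_transformed = binarizer.fit_transform(X)
print(X_transformed)
# [[0. 0. 0.]
#  [1. 1. 0.]]

# Binarizer agrees with a strict element-wise comparison.
print(np.array_equal(X_transformed, (X > 1).astype(float)))  # True
```

The `1.0` in the first row maps to 0, which confirms that only values strictly above the threshold become 1.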
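Running the full program on diabetes.csv prints the following output. NumPy abbreviates large arrays, so "..." stands for the omitted rows and columns: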
[[1. 1. 1. ... 1. 0. 1.]
 [0. 1. 1. ... 1. 0. 1.]
 [1. 1. 1. ... 1. 0. 1.]
 ...
 [1. 1. 1. ... 1. 0. 1.]
 [0. 1. 1. ... 1. 0. 1.]
 [0. 1. 1. ... 1. 0. 1.]]





