The make_classification() function from scikit-learn generates a synthetic dataset for a classification problem. It returns two ndarrays: one contains the features, and the other contains the target variable.
The following Python code creates these two ndarrays with make_classification().
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=1, n_repeated=0, n_classes=3,
                           shuffle=True, random_state=1)
print(X.shape)
print(y.shape)
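Here, n_samples is the total number of samples or records in the dataset. n_features is the total number of features. This total is made up of the informative features (n_informative), the redundant features (n_redundant), which are linear combinations of the informative ones, and the repeated features (n_repeated), which are duplicates of existing features. Any remaining features are filled with random noise. In this example, 4 + 1 + 0 = 5, so there are no noise features.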
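To see how these feature counts fit together, the following sketch turns shuffling off so that the columns keep their generation order: informative features first, then redundant ones, then repeated ones, then noise. The parameter values here are illustrative assumptions and differ from the example above so that a repeated feature and a noise feature are present.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative values (not from the example above): 3 informative + 1 redundant
# + 1 repeated feature, which leaves 6 - 5 = 1 column of random noise.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=1, n_repeated=1, n_classes=2,
                           shuffle=False, random_state=1)

# Columns 0-3 hold the informative and redundant features. The redundant one
# is a linear combination of the informative ones, so it adds no rank.
rank = np.linalg.matrix_rank(X[:, :4])

# Column 4 is the repeated feature: a copy of one of columns 0-3.
copied_from = [j for j in range(4) if np.array_equal(X[:, 4], X[:, j])]

print(X.shape)      # total number of columns equals n_features
print(rank)         # rank of the informative + redundant block
print(copied_from)  # index of the column that column 4 duplicates
```

With shuffle=True, the same kinds of columns are present but appear in a random order, so they cannot be identified by position.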
The argument n_classes indicates the total number of classes in the target variable. This number will be more than two for a multiclass classification problem.
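As a quick check of n_classes, the sketch below counts how many samples fall into each class. It assumes the same parameter values as the example above and uses numpy.unique; the labels are integers starting at 0, and with the default settings the classes are roughly balanced.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=1, n_repeated=0, n_classes=3,
                           shuffle=True, random_state=1)

# Distinct class labels in y and the number of samples carrying each label.
labels, counts = np.unique(y, return_counts=True)
print(labels)
print(counts)
```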
The argument shuffle=True shuffles both the samples and the features when the data is generated. The argument random_state seeds the pseudo-random number generator, so passing the same value reproduces the same dataset.
The function returns two ndarrays, X and y. X contains the features, and y contains the integer class label of each sample.
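The effect of random_state can be checked with a small sketch: generating the data twice with the same seed should give identical arrays, while a different seed gives different data. The helper make_data is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

def make_data(seed):
    # Hypothetical helper: same settings as the example above, variable seed.
    return make_classification(n_samples=200, n_features=5, n_informative=4,
                               n_redundant=1, n_repeated=0, n_classes=3,
                               shuffle=True, random_state=seed)

X1, y1 = make_data(1)
X2, y2 = make_data(1)
X3, y3 = make_data(2)

# Same seed: both the features and the labels match exactly.
same = np.array_equal(X1, X2) and np.array_equal(y1, y2)
# Different seed: the generated features differ.
different = not np.array_equal(X1, X3)

print(same)
print(different)
```

Fixing random_state in this way is useful when results need to be repeatable across runs.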
The output of the given program will be:
(200, 5)
(200,)
Here, we are creating a dataset of 200 samples, each with 5 features and one target value. So, the shape of the ndarray X is (200, 5), and the shape of the ndarray y is (200,).







































