Before we train a machine learning model on a dataset, we need to split the dataset into a training set and a test set. The model is trained on the training set, and the test set is then used to measure how well the trained model performs on data it has not seen.
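To make the idea concrete, here is a minimal sketch of a split done by hand with only the standard library. The helper name `manual_train_test_split` and its parameters are hypothetical, chosen for illustration, and the "dataset" is just a list of ten integers standing in for rows.

```python
import random


def manual_train_test_split(rows, train_fraction=0.8, seed=1):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten rows of a dataset
train, test = manual_train_test_split(rows)

print(len(train), len(test))  # 8 2
```

Every row ends up in exactly one of the two lists, so the model is never evaluated on a row it was trained on.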
We can use the train_test_split() function from the sklearn.model_selection module to perform a train test split on the dataset. Let's load the "tips" dataset. It contains several features from which the tip amount can be predicted.
So, we will first divide the dataset into features and output labels. After that, we will pass the features and output labels as the first two arguments of the train_test_split() function and perform the train test split. We can use the following Python code for that purpose:
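import seaborn
from sklearn.model_selection import train_test_split

# Load the "tips" dataset as a pandas DataFrame
df = seaborn.load_dataset("tips")
print(df.head())
df.info()  # info() prints its summary itself and returns None

# Separate the features from the output label ("tip")
df_features = df.drop(labels="tip", axis=1)
df_labels = df.filter(items=["tip"], axis=1)
print(df_features.head())
print(df_labels.head())

# Keep 80% of the rows for training and the remaining 20% for testing
X_train, X_test, y_train, y_test = train_test_split(df_features, df_labels["tip"], train_size=0.8, shuffle=True, random_state=1)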
Here, df_features is a DataFrame that contains all the features from the dataset, and df_labels contains the output labels. The first parameter of the train_test_split() function is the set of features and the second parameter is the output label…
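As a quick check of what the call returns, the sketch below runs the same kind of split on a small hand-made DataFrame and inspects the shapes of the four results. The column names are borrowed from the "tips" dataset, but the ten rows of values are invented for illustration, so the example runs without downloading anything.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the "tips" data: the values below are invented.
df = pd.DataFrame({
    "total_bill": [10.0, 12.5, 20.0, 8.0, 15.0, 30.0, 22.0, 18.0, 9.5, 25.0],
    "size": [2, 2, 4, 1, 3, 5, 4, 2, 1, 3],
    "tip": [1.5, 2.0, 3.0, 1.0, 2.5, 5.0, 3.5, 3.0, 1.5, 4.0],
})

features = df.drop(columns="tip")  # every column except the label
labels = df["tip"]                 # the label as a Series

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.8, shuffle=True, random_state=1
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)

# Features and labels are shuffled together, so their row indices still match.
print(X_train.index.equals(y_train.index))  # True
```

With train_size=0.8 and ten rows, eight rows go to the training set and two to the test set. Fixing random_state makes the shuffle repeatable between runs.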





