df = seaborn.load_dataset("iris")
df_features = df.drop(labels=["species"], axis=1)
df_target = df.filter(items=["species"])
Note that the target labels are strings, while many machine learning models work only with numeric values. So, we label-encode the species column of df_target, which replaces each species name with an integer.
encoder = LabelEncoder()
df_target["species"] = encoder.fit_transform(df_target["species"])
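To make the encoding step concrete, here is a minimal sketch of what LabelEncoder does to a handful of species names. The short species list and the mapping dictionary are illustrative assumptions, not part of the original program.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of string labels (not the full iris column).
species = ["setosa", "versicolor", "virginica", "setosa", "virginica"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(species)

# classes_ holds the distinct labels in sorted order; the position of a
# label in classes_ is the integer it is encoded as.
mapping = {name: code for code, name in enumerate(encoder.classes_)}

# inverse_transform recovers the original strings from the integers.
decoded = encoder.inverse_transform(encoded)

print(mapping)
print(encoded.tolist())
print(list(decoded))
```

Because the encoder keeps the mapping in classes_, the integer labels can always be converted back to species names when reporting results.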
Now, we perform k-means clustering on the dataset. The number of clusters is 3, and the random_state=1 parameter seeds the random number generator used to initialize the k cluster centroids, which makes the result reproducible (How does k-means clustering work?).
The fit_predict() method fits the model to the dataset and returns, for each data point, the label of the cluster to which that data point is assigned.
kmeans = KMeans(n_clusters=3, random_state=1)
y_pred = kmeans.fit_predict(df_features)
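The behavior of fit_predict() can be sketched on a tiny dataset. The six two-dimensional points below are made up for illustration, and n_init=10 is set explicitly only as an assumption to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

kmeans = KMeans(n_clusters=2, random_state=1, n_init=10)

# fit_predict() fits the model and returns one cluster label per row of X.
labels = kmeans.fit_predict(X)

# The same labels are stored on the fitted model, along with the centroids.
print(labels)
print(kmeans.labels_)
print(kmeans.cluster_centers_)

# A fitted model can also assign new points to the nearest centroid.
new_label = kmeans.predict(np.array([[1.1, 0.9]]))[0]
print(new_label)
```

Note that the cluster numbers themselves are arbitrary: which group is called 0 and which is called 1 depends on the initialization, so only the grouping is meaningful.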
Now, we can compare y_pred with the actual labels in df_target["species"] and measure the clustering performance using the V-measure score.
score = v_measure_score(df_target["species"], y_pred)
The output of the above program will be:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
V-measure Score: 0.7581756800057786
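To see what the V-measure score reports, the following sketch computes it on small hand-made label lists. The label lists are illustrative assumptions. With the default beta of 1, the V-measure is the harmonic mean of homogeneity and completeness, and it does not depend on how the clusters are numbered.

```python
from sklearn.metrics import homogeneity_completeness_v_measure, v_measure_score

# Hypothetical ground-truth classes and an imperfect clustering of them.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]

# Homogeneity, completeness, and their harmonic mean (the V-measure).
h, c, v = homogeneity_completeness_v_measure(y_true, y_pred)
harmonic_mean = 2 * h * c / (h + c)

# Renaming the clusters (0 -> 2, 1 -> 0, 2 -> 1) leaves the score unchanged.
y_pred_renamed = [2, 2, 0, 0, 1, 1]
v_renamed = v_measure_score(y_true, y_pred_renamed)

# A clustering that matches the classes up to renaming scores 1.0.
v_perfect = v_measure_score([0, 0, 1, 1], [1, 1, 0, 0])

print(h, c, v)
print(harmonic_mean)
print(v_renamed)
print(v_perfect)
```

This invariance to cluster numbering is why the k-means labels can be compared with the encoded species labels directly, even though cluster 0 need not correspond to class 0.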