What is the completeness score in clustering?
The completeness score of a clustering measures the extent to which all the data points that are members of a given class are assigned to the same cluster. It is a floating-point number between 0 and 1. A completeness score of 1 indicates that all samples with the same true label are assigned to the same cluster. A score close to 0 indicates that samples of the same class are spread across different clusters.
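To make the definition concrete, here is a minimal sketch that computes completeness by hand as 1 - H(K|C) / H(K), where C denotes the true classes and K the clusters. The function name `completeness` is hypothetical and only for illustration. It is not the sklearn implementation, although it follows the same convention of returning 1.0 when there is only one cluster.

```python
from collections import Counter
from math import log


def completeness(labels_true, labels_pred):
    """Illustrative completeness: 1 - H(K|C) / H(K) (hypothetical helper)."""
    n = len(labels_true)
    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))

    # Entropy of the cluster assignment, H(K).
    h_k = -sum((m / n) * log(m / n) for m in cluster_counts.values())
    if h_k == 0:
        # A single cluster trivially keeps every class together.
        return 1.0

    # Conditional entropy of the clusters given the classes, H(K|C).
    h_k_given_c = -sum(
        (m / n) * log(m / class_counts[c])
        for (c, _k), m in joint_counts.items()
    )
    return 1.0 - h_k_given_c / h_k


# Each class lands in exactly one cluster: complete.
perfect = completeness([0, 0, 1, 1], [1, 1, 0, 0])
# Each class is split evenly over both clusters: not complete at all.
split = completeness([0, 0, 1, 1], [0, 1, 0, 1])
# Everything in one cluster: complete, even though the clustering is useless.
one_cluster = completeness([0, 1, 2, 3], [0, 0, 0, 0])
print(perfect, split, one_cluster)
```

The last case shows why completeness alone can be misleading: merging all points into a single cluster still yields a score of 1.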
Please note that the completeness score is independent of the absolute values of the labels. If we permute the classes or the cluster labels, that will not change the completeness score.
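The permutation property can be checked with a small sketch like the one below. The toy label lists and the renaming dictionaries are made up for illustration; only `completeness_score` comes from sklearn.

```python
from sklearn.metrics import completeness_score

# Toy labels, invented for this illustration.
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

# Rename the cluster ids without changing which points are grouped together.
cluster_renaming = {0: 2, 1: 0, 2: 1}
renamed_pred = [cluster_renaming[k] for k in labels_pred]

# Rename the class ids in the same way.
class_renaming = {0: 5, 1: 3, 2: 9}
renamed_true = [class_renaming[c] for c in labels_true]

base = completeness_score(labels_true, labels_pred)
after_cluster_renaming = completeness_score(labels_true, renamed_pred)
after_class_renaming = completeness_score(renamed_true, labels_pred)

# All three values should agree up to floating-point rounding.
print(base, after_cluster_renaming, after_class_renaming)
```

Only the grouping of the samples matters, not the particular integers used to name the groups.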
How to measure clustering performance using completeness score in sklearn?
We can use the completeness_score() function from the sklearn.metrics module to calculate the completeness score of clustering. In this article, we will read the iris dataset and cluster the data points. After that, we will compare the true labels with the computed labels and compute the completeness score.
We can use the following Python code for that purpose:
    from sklearn.cluster import KMeans
    from sklearn.metrics import completeness_score
    from sklearn.preprocessing import LabelEncoder
    import seaborn

    # Read the iris dataset and split it into features and target.
    df = seaborn.load_dataset("iris")
    df_features = df.drop(labels=["species"], axis=1)
    df_target = df.filter(items=["species"])
    print(df.head())

    # Encode the species names as integer labels.
    encoder = LabelEncoder()
    df_target["species"] = encoder.fit_transform(df_target["species"])

    # Cluster the samples into three groups.
    kmeans = KMeans(n_clusters=3, random_state=1)
    y_pred = kmeans.fit_predict(df_features)

    # Compare the true labels with the computed cluster labels.
    score = completeness_score(df_target["species"], y_pred)
    print("Completeness Score: ", score)
Here, we first read the iris dataset from the seaborn library. After that, we split the dataset into features and target. df_features contains all the features of the dataset, and df_target contains all the target labels…
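As a variation on the same workflow, the sketch below loads iris from `sklearn.datasets.load_iris` instead of seaborn. This copy of the dataset ships with sklearn and already has integer targets, so no label encoding is needed. Using `load_iris` and setting `n_init=10` explicitly are assumptions of this sketch, not part of the code above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import completeness_score

# Load the copy of iris bundled with sklearn (targets are already integers).
iris = load_iris()
features, target = iris.data, iris.target

# Cluster into three groups; n_init is set explicitly because its default
# has changed between sklearn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
y_pred = kmeans.fit_predict(features)

# Compare the true species labels with the cluster assignments.
score = completeness_score(target, y_pred)
print("Completeness Score: ", score)
```

The exact value may differ slightly from the seaborn-based version depending on the sklearn version and KMeans settings.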