What is the silhouette score?
The silhouette coefficient of a sample is calculated as follows:
s = (b - a) / max(a, b)
Here, a is the mean distance between the sample and all other samples in the cluster to which it belongs. And b is the mean distance between the sample and all samples in the nearest cluster to which it does not belong, that is, the other cluster with the smallest such mean distance.
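To make the definition concrete, here is a minimal from-scratch sketch in NumPy. It follows the definition scikit-learn uses (a is the mean intra-cluster distance, b is the mean nearest-cluster distance). The function name silhouette_per_sample is hypothetical and not part of any library. The sketch assumes Euclidean distance and at least two distinct cluster labels, and it follows scikit-learn's convention of scoring a sample in a single-member cluster as 0. scikit-learn's own silhouette_samples() in sklearn.metrics computes these per-sample values for you.

```python
import numpy as np


def silhouette_per_sample(X, labels):
    """Return s(i) = (b - a) / max(a, b) for every sample (hypothetical helper).

    a: mean Euclidean distance from sample i to the other members of its cluster.
    b: smallest mean distance from sample i to the members of any other cluster.
    Assumes at least two distinct labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n_samples, n_samples)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) stays 0, as in scikit-learn
        # dist[i, i] is 0, so dividing by n_same - 1 averages over the other members only
        a = dist[i, same].sum() / (n_same - 1)
        # b: mean distance to the closest of the other clusters
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


# Two well-separated groups on a line: {0, 1} and {4, 5}
X = [[0.0], [1.0], [4.0], [5.0]]
labels = [0, 0, 1, 1]
s = silhouette_per_sample(X, labels)
# Sample 0: a = 1, b = (4 + 5) / 2 = 4.5, so s = 3.5 / 4.5 ≈ 0.778
print(s)         # values ≈ [0.778, 0.714, 0.714, 0.778]
print(s.mean())  # ≈ 0.746, the mean silhouette coefficient
```

Because every sample is much closer to its own group than to the other one, all four coefficients are well above 0.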
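We can calculate the mean silhouette coefficient of all samples; this mean is the silhouette score. Both the per-sample coefficient and the mean lie between -1 and +1. A negative value for a sample indicates that it has probably been assigned to the wrong cluster, because another cluster is, on average, closer to it. A value near 0 indicates that the sample lies on or near the boundary between two clusters, so a mean near 0 suggests overlapping clusters. A value near +1 means the sample is far from neighboring clusters, and a mean near +1 is considered a good score.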
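The interpretation above can be checked on a toy example with scikit-learn's silhouette_samples(), which returns the per-sample coefficients whose mean is the silhouette score. The four 1-D points and both labelings are made-up illustration data; in the second labeling, the point at 5.0 is deliberately placed in the wrong cluster.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Made-up 1-D data with two natural groups: {0, 1} and {4, 5}
X = np.array([[0.0], [1.0], [4.0], [5.0]])
good_labels = np.array([0, 0, 1, 1])
bad_labels = np.array([0, 0, 1, 0])  # the point at 5.0 is assigned to the wrong cluster

good = silhouette_samples(X, good_labels)
bad = silhouette_samples(X, bad_labels)

print(good)  # values ≈ [0.778, 0.714, 0.714, 0.778]: all clearly positive
print(bad)   # values ≈ [0.25, 0.167, 0.0, -0.778]: the misassigned point is negative
# (the point at 4.0 is now alone in its cluster, and scikit-learn scores singletons as 0)

# silhouette_score() is the mean of the per-sample values
print(silhouette_score(X, good_labels))  # ≈ 0.746
print(silhouette_score(X, bad_labels))   # ≈ -0.090
```

A single bad assignment pulls the mean from a clearly positive value to slightly below 0, which is why the score is a useful quick check of a clustering.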
How to measure clustering performance using silhouette score in sklearn?
We can use the silhouette_score() function from the sklearn.metrics module to calculate the mean silhouette coefficient of all samples. In this example, we read the iris dataset and divide the samples into three clusters using k-means clustering. After that, we use the silhouette_score() function to measure the clustering performance.
We can use the following Python code for that purpose:
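from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import seaborn

# Load the iris dataset and keep only the four numeric feature columns
df = seaborn.load_dataset("iris")
df_features = df.drop(labels=["species"], axis=1)

# Divide the samples into three clusters with k-means
# (n_init is set explicitly because its default changed in newer scikit-learn releases)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
labels = kmeans.fit_predict(df_features)

# Mean silhouette coefficient of all samples
score = silhouette_score(df_features, labels)
print("Silhouette Score: ", score)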
First, we read the iris dataset from the seaborn library. Then, we extract the features from the dataset. df_features is a DataFrame that contains all the features of the samples…
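The silhouette score is a single mean, so it can hide a cluster that fits poorly. As a sketch, silhouette_samples() from sklearn.metrics gives the per-sample coefficients, which can be summarized per cluster. This version assumes scikit-learn's bundled load_iris() in place of seaborn.load_dataset() so that no download is needed; the bundled copy may differ slightly from seaborn's CSV, so the printed numbers may not match the seaborn version exactly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_samples, silhouette_score

# The four numeric iris features, from scikit-learn's bundled copy (no download)
X = load_iris().data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
labels = kmeans.fit_predict(X)

# One silhouette coefficient per sample
per_sample = silhouette_samples(X, labels)

# The overall score is the plain mean of the per-sample coefficients
overall = silhouette_score(X, labels)
print("Silhouette Score:", overall)

# Per-cluster summary: a low mean or many negative values flag a weak cluster
for k in np.unique(labels):
    values = per_sample[labels == k]
    print(f"cluster {k}: size={values.size}, mean={values.mean():.3f}, "
          f"negative samples={(values < 0).sum()}")
```

Comparing the per-cluster means with the overall score shows which of the three clusters contributes most to a lower (or higher) silhouette score.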