We have already discussed what k-means clustering is and how it works, and how to perform it using the sklearn Python library. We saw that k-means clustering is a simple algorithm that can be applied to large datasets. Its disadvantage is that we need to choose the value of k manually, and the quality of the resulting clusters depends on that value. In this article, we will discuss how we can choose the optimal value of k in k-means clustering using the sklearn Python library.
We will first generate a dataset with a specific number of centers. Then we will use the sklearn Python library to find the optimal number of centers, or k, for the k-means clustering algorithm. We can use the following Python code for that purpose.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from matplotlib import pyplot

# Generate 1000 two-dimensional samples grouped around 5 centers.
X, y = make_blobs(n_samples=1000, n_features=2, centers=5, random_state=1)

# Save a scatter plot of the raw data.
pyplot.scatter(x=X[:, 0], y=X[:, 1])
pyplot.savefig("clusters-2.png")
pyplot.close()

# Fit k-means for k = 1..9 and record the inertia of each fit.
inertia = list()
for i in range(1, 10):
    kmeans = KMeans(n_clusters=i, random_state=1)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

# Plot inertia against the number of clusters.
pyplot.plot(range(1, 10), inertia)
pyplot.xlabel("Number of Clusters")
pyplot.ylabel("Inertia")
pyplot.savefig("inertia.png")
Here, we are generating a dataset with 1000 samples and two features in each sample. There are 5 cluster centers. And the …
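The inertia values collected in the loop can also be inspected numerically instead of only by eye. The following is a minimal sketch of one possible heuristic, not a method taken from this article or from sklearn: it scales both axes to the range 0 to 1 and picks the k whose point lies farthest below the straight line joining the first and last points of the curve. The function name `elbow_k` and the sample inertia values are hypothetical and are used only for illustration.

```python
def elbow_k(ks, inertias):
    """Return the k whose (k, inertia) point lies farthest below the chord
    joining the first and last points of the curve.

    Assumption: inertia decreases as k grows, so the first value is the
    largest and the last value is the smallest.
    """
    k_min, k_max = ks[0], ks[-1]
    i_min, i_max = min(inertias), max(inertias)

    # A flat curve or a single k has no elbow; fall back to the first k.
    if k_max == k_min or i_max == i_min:
        return ks[0]

    best_k, best_gap = ks[0], 0.0
    for k, value in zip(ks, inertias):
        # Scale both axes to [0, 1] so neither dominates the comparison.
        x = (k - k_min) / (k_max - k_min)
        y = (value - i_min) / (i_max - i_min)
        # After scaling, the chord runs from (0, 1) to (1, 0), i.e. y = 1 - x.
        gap = (1.0 - x) - y
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Made-up inertia values for illustration only (not results from the data above).
example_ks = list(range(1, 10))
example_inertias = [1000, 700, 450, 250, 60, 50, 42, 36, 30]
print(elbow_k(example_ks, example_inertias))
```

With the code from this article, the same function could be called as `elbow_k(list(range(1, 10)), inertia)`. A heuristic like this is only a starting point, so it is still worth checking its answer against the inertia plot.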