We have already discussed what k-means clustering is, how it works, and how to perform k-means clustering using the sklearn Python library. We saw that k-means clustering is a simple algorithm that can be applied to large datasets. Its disadvantage, however, is that we need to choose the value of k manually, and the clusters the algorithm converges to depend heavily on that choice. In this article, we will discuss how to choose the optimal value of k in k-means clustering using the sklearn Python library.
We will first generate a dataset with a known number of centers. Then we will use the sklearn Python library to find the optimal number of centers, that is, the optimal value of k for the k-means clustering algorithm. We can use the following Python code for that purpose.
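To make the role of k concrete, here is a minimal NumPy sketch of the standard assign-and-update loop (Lloyd's algorithm). It is a simplified illustration, not sklearn's implementation: the function name `simple_kmeans` and its `n_iter` and `seed` parameters are hypothetical, initialization is plain random sampling rather than k-means++, and only a single run is made. It also returns the inertia, the sum of squared distances from each sample to its nearest center, which is the quantity sklearn exposes as `inertia_`.

```python
import numpy as np


def simple_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal Lloyd's k-means: returns (labels, centers, inertia).

    Simplified for illustration: random-sample initialization and a single run,
    unlike sklearn's KMeans (k-means++ initialization, optional restarts).
    """
    rng = np.random.default_rng(seed)
    # Initialize the k centers by picking k distinct samples at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every sample to every center.
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned samples
        # (keep the old center if a cluster happens to become empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers, then the inertia:
    # the sum of squared distances from each sample to its nearest center.
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    inertia = sq_dists.min(axis=1).sum()
    return labels, centers, inertia
```

Note that k has to be passed in; nothing in the loop can infer it from the data. And since the best achievable inertia never increases as k grows, minimizing inertia alone would always favor the largest k, which is why we look for a bend in the inertia curve instead.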
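from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from matplotlib import pyplot

# Synthetic data: 1000 two-dimensional samples drawn around 5 centers.
X, y = make_blobs(n_samples=1000, n_features=2, centers=5, random_state=1)

# Scatter plot of the raw data.
pyplot.scatter(x=X[:, 0], y=X[:, 1])
pyplot.savefig("clusters-2.png")
pyplot.close()

# Fit k-means for k = 1..9 and record the inertia (sum of squared
# distances of samples to their closest cluster center) of each fit.
inertia = list()
for i in range(1, 10):
    # n_init is set explicitly because its default changed across sklearn versions.
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=1)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

# Elbow plot: inertia as a function of the number of clusters.
pyplot.plot(range(1, 10), inertia, marker="o")
pyplot.xlabel("Number of Clusters")
pyplot.ylabel("Inertia")
pyplot.savefig("inertia.png")
pyplot.close()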
Here, we generate a dataset of 1000 samples with two features each, grouped around 5 cluster centers. And the …
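Reading the elbow off the inertia plot is a judgment call. One common way to automate it, assumed here rather than taken from this article or provided by sklearn, is to pick the k whose point lies farthest from the straight line joining the first and last points of the curve, an idea similar in spirit to the Kneedle method. A minimal sketch; the function name `elbow_k` is hypothetical:

```python
import numpy as np


def elbow_k(ks, inertias):
    """Hypothetical elbow heuristic: return the k whose (k, inertia) point lies
    farthest from the chord joining the first and last points of the curve.

    Assumes at least two distinct k values and inertias that are not all equal.
    """
    ks = np.asarray(ks, dtype=float)
    inertias = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so neither dominates the distance.
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (inertias - inertias.min()) / (inertias.max() - inertias.min())
    # Unit vector along the chord from the first point to the last one.
    chord = np.array([x[-1] - x[0], y[-1] - y[0]])
    chord /= np.linalg.norm(chord)
    # Perpendicular distance of every point from the chord (2-D cross product).
    rel = np.column_stack([x - x[0], y - y[0]])
    dist = np.abs(rel[:, 0] * chord[1] - rel[:, 1] * chord[0])
    return int(ks[np.argmax(dist)])
```

With the `inertia` list computed by the code above, `elbow_k(range(1, 10), inertia)` returns a suggested k. Treat it as a hint to check against the plot: a curve without a clear bend yields a fairly arbitrary answer.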