K-means clustering is an unsupervised learning algorithm for solving clustering problems in machine learning. It takes unlabeled data and groups it into k clusters so that each point belongs to the cluster whose center is nearest. The distance between a point and the cluster centers is usually the Euclidean distance, which is what sklearn's KMeans uses; the Manhattan distance appears in related variants such as k-medians.
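The idea of assigning each point to its nearest cluster center can be sketched in a few lines of plain Python. This is only an illustration of the assignment step, not how sklearn implements it; the function name assign_to_nearest and the toy points and centers are hypothetical and chosen just for this example.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center (Euclidean distance)."""
    labels = []
    for p in points:
        # Distance from this point to every candidate center.
        distances = [math.dist(p, c) for c in centers]
        # The point belongs to the center with the smallest distance.
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical toy data: four 2-D points and two cluster centers.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest(points, centers)
print(labels)  # [0, 0, 1, 1]
```

The first two points land in cluster 0 and the last two in cluster 1. The full algorithm repeats this assignment step, moving each center to the mean of its assigned points, until the assignments stop changing.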
In our previous article, we discussed how k-means clustering works. In this article, we will discuss how to do k-means clustering using sklearn in Python.
We can use the following code to solve the k-means clustering problem using the sklearn Python library.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from matplotlib import pyplot

# Generate 1000 two-dimensional samples grouped around 4 centers.
X, y = make_blobs(n_samples=1000, n_features=2, centers=4, random_state=1)

# Plot the raw data and save the figure.
pyplot.scatter(x=X[:, 0], y=X[:, 1])
pyplot.savefig("clusters.png")
pyplot.close()

# Fit k-means with 4 clusters.
kmeans = KMeans(n_clusters=4, random_state=1)
kmeans.fit(X)
print("Cluster Centers: \n", kmeans.cluster_centers_)

# Plot the data again, with the learned cluster centers in black.
pyplot.scatter(x=X[:, 0], y=X[:, 1])
pyplot.scatter(x=kmeans.cluster_centers_[:, 0], y=kmeans.cluster_centers_[:, 1], color="black")
pyplot.savefig("cluster-centers.png")
pyplot.close()
First, we generate the data, using the make_blobs() function to create 1000 samples. The number of features in …