df = seaborn.load_dataset("iris")
df_features = df.drop(labels=["species"], axis=1)
df_target = df.filter(items=["species"])
Please note that most machine learning models work on numeric data, but the target labels here are strings. So, we label encode the target column of the dataset, which replaces each species name with an integer.
encoder = LabelEncoder()
df_target["species"] = encoder.fit_transform(df_target["species"])
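To make the encoding step concrete, here is a minimal pure-Python sketch of the mapping. The helper `label_encode` is hypothetical and is not part of scikit-learn; it only mimics how LabelEncoder numbers the classes in sorted order.

```python
def label_encode(labels):
    """Map each distinct label to an integer, numbering the labels in sorted order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# A few species names in arbitrary order (illustrative data, not the full dataset)
species = ["setosa", "virginica", "versicolor", "setosa"]
codes, classes = label_encode(species)

print(codes)    # [0, 2, 1, 0]
print(classes)  # ['setosa', 'versicolor', 'virginica']
```

The integers carry no meaning beyond identifying the class, which is all a clustering metric needs.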
Now, we use the k-means clustering algorithm to cluster the data points. Please note that we are dividing the data points into three clusters, one for each species in the dataset. The random_state=1 parameter seeds the random number generator that is used to initialize the k cluster centroids, so the result is reproducible from run to run (How does k-means clustering work?).
The fit_predict() method here learns the cluster centroids from the dataset and then returns, for each data point, the index of the cluster to which that point is assigned.
kmeans = KMeans(n_clusters=3, random_state=1)
y_pred = kmeans.fit_predict(df_features)
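The idea behind fit_predict() and random_state can be sketched with a toy k-means in plain Python. This is only an illustration under simplifying assumptions: the function `kmeans_fit_predict` is hypothetical, it picks the initial centroids by plain random sampling and runs a fixed number of iterations, whereas scikit-learn's KMeans uses k-means++ initialization by default and checks for convergence. The points have a single feature to keep the example readable.

```python
import random


def kmeans_fit_predict(points, n_clusters, random_state, n_iter=10):
    """Toy k-means: return one cluster index per point."""
    rng = random.Random(random_state)  # the seed makes the initialization reproducible
    # Initialize the centroids by picking distinct data points at random
    centroids = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Two well-separated groups of one-feature points (made-up values)
points = [(1.0,), (1.2,), (0.9,), (8.0,), (8.2,), (7.9,)]
labels = kmeans_fit_predict(points, n_clusters=2, random_state=1)
print(labels)
```

The first three points end up sharing one label and the last three the other. Which group is called 0 and which is called 1 depends on the random initialization, and that is why a metric used to evaluate the clustering must not depend on the label values themselves.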
Now, we can compare y_pred with the actual labels of the target and measure the completeness score.
score = completeness_score(df_target["species"], y_pred)
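To see what the score measures, here is a hedged pure-Python sketch of the usual entropy-based definition, completeness = 1 - H(K|C) / H(K), where C denotes the true classes and K the predicted clusters. The functions `entropy` and `completeness` are hypothetical helpers written for this illustration; scikit-learn's completeness_score may differ in numerical details.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def completeness(labels_true, labels_pred):
    """1 - H(K|C) / H(K); defined as 1.0 when there is only one cluster."""
    n = len(labels_true)
    h_k = entropy(labels_pred)
    if h_k == 0:
        return 1.0  # a single cluster trivially keeps every class together
    class_sizes = Counter(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    # H(K|C) = -sum over (class, cluster) pairs of n_ck/n * log(n_ck / n_c)
    h_k_given_c = -sum(
        (n_ck / n) * log(n_ck / class_sizes[c]) for (c, _), n_ck in joint.items()
    )
    return 1.0 - h_k_given_c / h_k


y_true = [0, 0, 1, 1]
print(completeness(y_true, [1, 1, 0, 0]))  # 1.0: each class sits in one cluster
print(completeness(y_true, [0, 0, 0, 0]))  # 1.0: everything in a single cluster
print(completeness(y_true, [0, 1, 0, 1]))  # each class is split evenly in two
```

The first call shows that swapping the cluster numbers does not change the score, so it does not matter which integer k-means assigns to which species. The last call evaluates to zero, up to floating-point rounding, because every class is spread evenly over both clusters.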
The output of the above program will be:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Completeness Score: 0.7649861514489816