df = seaborn.load_dataset("iris")
df_features = df.drop(labels=["species"], axis=1)
df_target = df.filter(items=["species"])
The species column contains strings, but most machine learning models work only with numbers. So, we label-encode the species column of df_target, which replaces each species name with an integer.
encoder = LabelEncoder()
df_target["species"] = encoder.fit_transform(df_target["species"])
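To make the encoding step concrete, here is a minimal pure-Python sketch of the same idea: sort the distinct class names and number them from 0. It is an illustration only, not scikit-learn's implementation, and the sample list `species` is made up for the example.

```python
# Hypothetical sample of species names (not the full iris column).
species = ["setosa", "virginica", "versicolor", "setosa"]

# Sort the distinct names so that the integer codes follow alphabetical order.
classes = sorted(set(species))

# Map each name to its position in the sorted list.
to_code = {name: code for code, name in enumerate(classes)}

# Replace every name with its integer code.
encoded = [to_code[name] for name in species]
print(classes)
print(encoded)
```

With this ordering, setosa becomes 0, versicolor becomes 1, and virginica becomes 2, which is also how the rows of the contingency matrix later in this example are ordered.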
Now, we perform k-means clustering with three clusters (n_clusters=3). The random_state=1 parameter seeds the random number generator used to initialize the cluster centroids, so the result is reproducible from run to run.
The fit_predict() method fits the model to the dataset and returns the cluster label assigned to each data point.
kmeans = KMeans(n_clusters=3, random_state=1)
y_pred = kmeans.fit_predict(df_features)
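For intuition about what fit_predict() does, the following is a rough sketch of the basic k-means loop (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation: KMeans uses k-means++ initialization by default and is far more optimized. The function name `simple_kmeans`, its parameters, and the toy points are assumptions made for this illustration.

```python
import random


def simple_kmeans(points, k, seed=1, n_iter=100):
    """Cluster a list of equal-length tuples with plain Lloyd's algorithm."""
    rng = random.Random(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Two well-separated toy groups (made-up data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = simple_kmeans(toy_points, k=2)
print(toy_labels)
```

The cluster numbers themselves are arbitrary: which group is called 0 and which is called 1 depends on the random initialization. This is why the clusters have to be matched against the true classes with a contingency matrix rather than compared label by label.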
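Next, we use the contingency_matrix() function from the sklearn.metrics.cluster module to compute the contingency matrix. The true labels are passed first and the predicted cluster labels second, so each row of the result corresponds to a true class and each column to a cluster.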
matrix = contingency_matrix(df_target["species"], y_pred)
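As a sketch of what this function computes, the snippet below builds the same kind of table by hand: entry (i, j) counts the samples whose true class is i and whose predicted cluster is j. The helper name `contingency` and the toy labels are hypothetical and used only for illustration.

```python
def contingency(true_labels, pred_labels):
    """Count how many samples fall into each (true class, cluster) pair."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    row = {c: i for i, c in enumerate(classes)}
    col = {c: j for j, c in enumerate(clusters)}
    # One row per true class, one column per cluster, all counts start at 0.
    table = [[0] * len(clusters) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        table[row[t]][col[p]] += 1
    return table


# Made-up labels for six samples.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [1, 1, 0, 2, 0, 0]
toy_matrix = contingency(toy_true, toy_pred)
for line in toy_matrix:
    print(line)
```

Here both samples of class 0 land in cluster 1, class 1 is split between clusters 0 and 2, and both samples of class 2 land in cluster 0.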
The output of the above program will be:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Contingency Matrix:
[[ 0 50  0]
 [48  0  2]
 [14  0 36]]
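One way to read this matrix: the rows are the true species in encoded order (setosa, versicolor, virginica) and the columns are the three clusters. All 50 setosa samples fall into a single cluster, while versicolor and virginica are partly mixed. The short sketch below summarizes this with purity, the fraction of samples that belong to the majority class of their cluster. Purity is not part of the original program; it is added here only as an illustration, with the matrix values copied from the output above.

```python
# Contingency matrix copied from the program output above.
matrix = [
    [0, 50, 0],
    [48, 0, 2],
    [14, 0, 36],
]

# Total number of samples in the table.
n_samples = sum(sum(row) for row in matrix)

# For each cluster (column), the size of its largest true class.
column_max = [max(col) for col in zip(*matrix)]

# Purity: share of samples that match their cluster's majority class.
purity = sum(column_max) / n_samples
print(n_samples, column_max, round(purity, 4))
```

For this matrix, 134 of the 150 samples sit in the majority class of their cluster, so the purity is about 0.89.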