From the heat map, it is clear that body mass and flipper length are strongly correlated: the correlation coefficient between these two features is 0.87. Because the two features carry largely redundant information, we can keep just one of them.
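Assuming the heat map was built from `DataFrame.corr()`, which uses the Pearson method by default, each cell is the Pearson coefficient r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal sketch, assuming only NumPy, computes it directly from that definition; the helper name `pearson_r` and the input values are hypothetical toy choices, not data from the penguins dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    # Covariance divided by the product of the standard deviations
    # (the 1/n factors cancel, so plain sums are enough)
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Made-up toy values, not taken from the penguins dataset
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
r = pearson_r(x, y)
print(round(r, 3))  # 0.822
# Agrees with NumPy's built-in implementation
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```

On the penguins data itself, `df["body_mass_g"].corr(df["flipper_length_mm"])` computes the same kind of coefficient for the two columns, skipping rows with missing values.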
We can use the following Python code to select features based on a correlation coefficient threshold.
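```python
import numpy
import seaborn

# Load the penguins dataset bundled with seaborn
df = seaborn.load_dataset("penguins")
df.info()  # info() prints its summary itself and returns None

# Keep only the numerical features
features = df.drop(["species", "island", "sex"], axis=1)
corr_matrix = features.corr()
print(corr_matrix)

threshold = 0.8
selected_features = set(corr_matrix.columns)

# Compare each feature with every feature before it and drop the
# later one when the absolute correlation exceeds the threshold
for i in range(len(corr_matrix.columns)):
    for j in range(i):
        if numpy.abs(corr_matrix.iloc[i, j]) > threshold:
            # discard() (unlike remove()) does not fail if the feature was already dropped
            selected_features.discard(corr_matrix.columns[i])

print("Selected Features: \n", selected_features)

# Keep the original column order instead of the arbitrary set order
df2 = df[[col for col in corr_matrix.columns if col in selected_features]]
print(df2.head())
```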
Here, we first select the numerical features and then compute the correlation matrix between them. Next, we select features using a threshold on the absolute value of the correlation coefficient: whenever a feature's absolute correlation with an earlier feature exceeds the threshold, the later feature is dropped. In this example, the …
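The same thresholding rule can also be written without explicit loops. The sketch below is an alternative formulation, assuming pandas and NumPy: it masks the absolute correlation matrix to its strict upper triangle, so each column is compared only with the columns before it, and drops any column whose absolute correlation with an earlier one exceeds the threshold. The function `drop_correlated` and the toy DataFrame are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.8):
    """Drop the later column of every pair whose |r| exceeds threshold."""
    corr = df.corr().abs()
    # Strict upper triangle: entry (row, col) with row before col, i.e.
    # each column's correlation with the columns that precede it
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)  # cells outside the mask become NaN
    # NaN > threshold is False, so masked cells never trigger a drop
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Hypothetical toy data: one length in two units plus an unrelated column
toy = pd.DataFrame({
    "length_cm": [1, 2, 3, 4, 5, 6],
    "length_mm": [10, 20, 30, 40, 50, 60],
    "weight": [3, 1, 4, 1, 5, 9],
})
print(drop_correlated(toy).columns.tolist())  # ['length_cm', 'weight']
```

Like the loop version, this keeps the earlier column of a correlated pair, so the result depends on column order; which feature of a pair to keep is a modelling choice that the correlation alone does not decide.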