A Kernel Density Estimation (KDE) plot shows a smoothed, non-parametric estimate of the probability density function of a continuous variable. For example, let’s look at the “titanic” dataset. The dataset contains the age of each passenger, but some of the values are missing.
So, before we fill in the missing values, we may want to see the distribution of the data in the age column. If the distribution is approximately normal, we can fill in the missing values with the mean age. On the other hand, if the distribution is skewed, the median age is usually the better choice, because the mean is pulled toward the long tail. Using a KDE plot, we can see the distribution of the data in the age column.
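The mean-versus-median decision described above can also be sketched in code. The example below is only an illustration under stated assumptions: the helper name `fill_missing_by_shape`, the skewness cutoff of 0.5, and the sample ages are all made up for this sketch and do not come from the titanic dataset. The cutoff is a rough rule of thumb, not a fixed standard.

```python
import numpy as np
import pandas as pd


def fill_missing_by_shape(series, skew_threshold=0.5):
    """Fill NaNs with the mean if the data look roughly symmetric, else the median.

    `skew_threshold` is a hypothetical cutoff chosen for illustration.
    """
    skewness = series.skew()  # sample skewness; missing values are ignored
    if abs(skewness) < skew_threshold:
        strategy, fill_value = "mean", series.mean()
    else:
        strategy, fill_value = "median", series.median()
    return series.fillna(fill_value), strategy


# Made-up ages with a long right tail and two missing values
ages = pd.Series([22.0, 24.0, 25.0, 26.0, np.nan, 27.0, 28.0, 30.0, np.nan, 71.0])
filled, strategy = fill_missing_by_shape(ages)
print(strategy, int(filled.isna().sum()))
```

A numeric check like this complements the KDE plot rather than replacing it. The plot can reveal features such as multiple peaks that a single skewness number hides.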
We can use the following Python code to draw the KDE plot of the data in the age column of the titanic dataset. Note that pandas relies on SciPy to compute the density estimate, so SciPy must be installed.
import pandas
from matplotlib import pyplot

df1 = pandas.read_csv("titanic.csv")
df1["age"].plot.kde(color="blue")
pyplot.savefig("pandas-kde.png")
pyplot.close()
Here, we call the df1["age"].plot.kde(color="blue") method to draw the KDE plot. The KDE plot looks like the following:






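To make it clearer what the KDE curve represents, here is a minimal sketch of a Gaussian kernel density estimate written directly in NumPy. It is not the code pandas runs internally. The function name `gaussian_kde_1d`, the sample ages, and the evaluation grid are assumptions made for illustration. The default bandwidth uses Scott's rule of thumb, which to my knowledge is also the default in SciPy's `gaussian_kde`.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a one-dimensional Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]  # drop missing values
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every sample
    z = (np.asarray(grid, dtype=float)[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth
    return kernels.sum(axis=1) / (n * bandwidth)


# Made-up ages, including one missing value
ages = np.array([22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0, 58.0])
grid = np.linspace(-60.0, 140.0, 2001)
density = gaussian_kde_1d(ages, grid)

# Riemann-sum approximation of the area under the curve; should be close to 1
area = float(density.sum() * (grid[1] - grid[0]))
print(round(area, 3))
```

Each observation contributes one small bell curve, and the KDE is their average. This is why the curve extends slightly beyond the smallest and largest observed ages, and why a larger bandwidth gives a smoother plot.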