Let's put a nice pile of sand on it. Our model for this pile of sand is called the Epanechnikov kernel function: \(K(x) = \frac{3}{4}(1 - x^2)\) for \(|x| \le 1\), and \(K(x) = 0\) otherwise. The Epanechnikov kernel is a probability density function, which means that it is non-negative and the area under its graph is equal to one. The function \(f\) is the kernel density estimator (KDE). A density estimate or density estimator is just a fancy word for a guess: we are trying to guess the density function that best describes the randomness of the data.

In this blog post, we learned about histograms and kernel density estimators. In this article, we explore practical techniques that are extremely useful in your initial data analysis and plotting. A KDE plot shows an estimate of the probability density function that generated the data. Unlike a histogram, which bins and counts observations, a KDE smooths the observations with a kernel. Now let's try a non-normal sample data set. We could also partition the data range into intervals with length 1, or even use intervals with varying length (this is not so common). However, it would be great if one could control how distplot normalizes the KDE so that it integrates to a value other than 1.

Building upon the histogram example, I will explain how to construct a KDE. In this blog post, we are going to explore the basic properties of histograms and kernel density estimators (KDEs) and show how they can be used to draw insights from the data. What if, instead of using rectangles, we could pour a "pile of sand" on each data point and see how the sand stacks? Six Sigma utilizes a variety of chart aids to evaluate the presence of data variation. To plot a 2D histogram, one only needs two vectors of the same length, one for each axis of the histogram. Another popular choice of kernel is the Gaussian bell curve (the density of the standard normal distribution). For example, the first observation in the data set is 50.389.

The Python source code used to generate all the plots in this blog post is available here:
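The statement above that a kernel is a probability density (non-negative, with area one under its graph) can be checked numerically. The snippet below is a minimal sketch, not the author's code: it assumes the standard textbook form of the Epanechnikov kernel, \(K(x) = \frac{3}{4}(1 - x^2)\) on \([-1, 1]\), and the helper names `epanechnikov`, `gaussian`, and `area_under` are made up for illustration.

```python
import numpy as np


def epanechnikov(x):
    """Standard Epanechnikov kernel: 0.75 * (1 - x^2) on [-1, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)


def gaussian(x):
    """Density of the standard normal distribution."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def area_under(kernel, lo=-10.0, hi=10.0, num=200_001):
    """Approximate the area under `kernel` on [lo, hi] with the trapezoidal rule."""
    grid = np.linspace(lo, hi, num)
    values = kernel(grid)
    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * (values[:-1] + values[1:]) * dx))


# Both kernels should have an area very close to one.
epan_area = area_under(epanechnikov)
gauss_area = area_under(gaussian)
print(epan_area, gauss_area)

# Subtracting a constant from the argument moves the "pile of sand"
# along the x-axis without changing its area.
shifted_area = area_under(lambda x: epanechnikov(x - 3.0))
print(shifted_area)
```

The integration range \([-10, 10]\) is an arbitrary choice that is wide enough for both kernels; the Gaussian has tails that extend forever, so its computed area is only approximately one.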
To illustrate the concepts, I will use a small data set I collected over the last few months. Sometimes, we are interested in calculating a smoother estimate, which may be closer to reality. The following code loads the meditation data and saves both plots as PNG files. I would like to know more about this data and my meditation tendencies. Densities are handy because they can be used to calculate probabilities.

The function \(K\) is centered at zero, but we can easily move it along the x-axis by subtracting a constant from its argument \(x\). The KDE is the function
\[ \widehat{p}_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right), \tag{6.5} \]
where \(K(x)\) is called the kernel function, which is generally a smooth, symmetric function such as a Gaussian, and \(h > 0\) is called the smoothing bandwidth, which controls the amount of smoothing. In practice, it often makes sense to try out a few kernels and compare the resulting KDEs.
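The KDE formula above, with kernel \(K\) and bandwidth \(h\), translates almost directly into NumPy. The following is a rough sketch under stated assumptions rather than the code behind the post's plots: the function names `kde` and `probability` are hypothetical, and `sample` holds made-up placeholder values, not the meditation data. It also shows how a density can be turned into a probability by integrating it over an interval.

```python
import numpy as np


def gaussian_kernel(u):
    """Density of the standard normal distribution."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def epanechnikov_kernel(u):
    """Standard Epanechnikov kernel, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate 1/(n*h) * sum_i K((X_i - x) / h) at each point in x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Rows index evaluation points, columns index observations.
    u = (data[None, :] - x[:, None]) / h
    return kernel(u).sum(axis=1) / (data.size * h)


def probability(a, b, data, h, kernel=gaussian_kernel, num=2001):
    """Estimate P(a <= X <= b) by integrating the KDE with the trapezoidal rule."""
    grid = np.linspace(a, b, num)
    values = kde(grid, data, h, kernel)
    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * (values[:-1] + values[1:]) * dx))


# Placeholder observations for illustration only (NOT the author's data).
sample = np.array([1.0, 2.0, 2.5, 4.0])

# Try two kernels with the same bandwidth and compare the estimates.
for name, kern in [("gaussian", gaussian_kernel), ("epanechnikov", epanechnikov_kernel)]:
    p = probability(2.0, 3.0, sample, h=1.0, kernel=kern)
    print(f"{name}: estimated P(2 <= X <= 3) = {p:.4f}")
```

A larger `h` spreads each observation's contribution over a wider range and gives a smoother curve; a smaller `h` follows the individual data points more closely. Because both kernels here are symmetric, using \((X_i - x)/h\) or \((x - X_i)/h\) as the argument gives the same result.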