Interactive scatter plot for understanding correlation
Usage
- Click on the scatter plot area, or type in the coordinates to add a new point, for example type
-3 4
to add a point at \((-3,4)\).
- [To be implemented] Right-click on a point to remove it
- Drag a point around to change its coordinates
- Observe the correlation measures to see how they change with your data points
Correlation measures
Pearson correlation
The Pearson correlation coefficient between a pair of variables \((X,Y)\) is defined as [1]
\[ \rho(X,Y) = \frac{\text{cov}(X,Y)}{\sigma_X\sigma_Y} \]
If \(X\) and \(Y\) are normally distributed and uncorrelated, the statistic \(t\) computed from the sample correlation \(\rho(X,Y)\) follows Student’s \(t\)-distribution with \(n - 2\) degrees of freedom, where \(n\) is the number of pairs. The \(t\)-statistic is calculated by the formula
\[ t = \rho \sqrt{\frac{n - 2}{1 - \rho^2}} \]
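The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the app's actual code (the app itself is written with d3js and Vue.js); the function names `pearson` and `t_statistic` are hypothetical, and the sample points are made up for the example.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        # Perfectly linear data: the statistic diverges.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up sample points, as if clicked on the scatter plot.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson(xs, ys)
t = t_statistic(r, len(xs))
print(r, t)
```

For these five points the correlation is 0.8 and the \(t\)-statistic is about 2.31 with 3 degrees of freedom. Turning \(t\) into a p-value requires the CDF of the \(t\)-distribution, which is left out of this sketch.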
Spearman’s correlation coefficient
The Spearman correlation coefficient between two variables \(X, Y\) is defined as the Pearson correlation between the rank variables \(r_X\) and \(r_Y\) [2]
\[ \begin{align} r_X &= \text{rank}(X) \\ r_Y &= \text{rank}(Y) \\ s(X,Y) &= \frac{\text{cov}(r_X,r_Y)}{\sigma_{r_X}\sigma_{r_Y}} \end{align} \]
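As a rough sketch of this definition, the Python snippet below ranks each variable and then applies the Pearson formula to the ranks. It is not the app's code; the names `average_ranks`, `pearson` and `spearman` are hypothetical, and giving tied values the average of their ranks is an assumption of this example (a common convention, but not stated in the text above).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: y = x^3 is monotone but not linear in x.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(pearson(xs, ys))   # below 1, the relation is not linear
print(spearman(xs, ys))  # 1.0, the ranks agree exactly
```

Because Spearman's coefficient depends only on ranks, any strictly increasing relation gives 1 even when the Pearson correlation of the raw values is smaller.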
Kendall correlation
Not yet implemented.
About the app
- The app was developed using d3js and Vue.js
- The CDF of the \(t\)-distribution is implemented using the script from https://www.math.ucla.edu/~tom/distributions/tDist.html
- For feedback and questions, please shoot me an email.
References
[1] “Pearson correlation coefficient,” Wikipedia, Oct. 2019.
[2] “Spearman’s rank correlation coefficient,” Wikipedia, Nov. 2019.