Correlation

Correlation is a measure of the relationship between a pair of variables. It expresses how strongly, and in which direction, two variables change together. Correlation does not imply causation.

Summary

Correlation is a statistical measure often used in observational research that quantifies the relationship between two variables, indicating the degree to which changes in one variable are related to changes in another variable. This measure provides information on both the direction and strength of the relationship between two variables.

Is there a difference between “association” and “correlation”?

Although “correlation” and “association” are often used interchangeably in casual communication, there is an important technical difference between the two terms.

Association is a broad term that refers to any relationship between two variables in which changes in one variable are related to changes in the other. In contrast, correlation is frequently used as a technical term for a statistic that mathematically quantifies the strength and direction of the relationship between two variables, provided the data meet the conditions required for this type of analysis.

How is correlation measured?

Depending on the type of data and the nature of the relationship being examined, correlation is usually measured with one of two main types of correlation coefficients:

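  • Pearson's product-moment correlation coefficient (r), which measures the strength of a linear relationship
  • Spearman's rank correlation coefficient (ρ), which measures the strength of a monotonic relationship using the ranks of the values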
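To make the two coefficients concrete, here is a minimal from-scratch sketch in Python. The function names (pearson_r, average_ranks, spearman_rho) are illustrative, not a library API. Pearson's r is the covariance divided by the product of the standard deviations; Spearman's ρ is Pearson's r applied to the ranks, with tied values given the average of the ranks they occupy.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors in covariance and variance cancel, so plain sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]        # monotonic but not linear
print(pearson_r(x, y))         # a little below 1 (about 0.98)
print(spearman_rho(x, y))      # 1.0: the ranks agree perfectly
print(pearson_r(x, x[::-1]))   # -1.0: a perfect negative relationship
```

The y = x² example shows the practical difference: the relationship is perfectly monotonic, so Spearman's ρ is exactly 1, but it is not a straight line, so Pearson's r falls slightly short of 1. For real analyses, a tested library routine (for example, `statistics.correlation` in Python 3.10+) is preferable to hand-rolled code.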

A correlation coefficient can take any value from −1 to +1. A value of −1 or +1 indicates a perfect relationship, and a value of 0 indicates no relationship between the two variables (strictly, no linear relationship for Pearson's coefficient and no monotonic relationship for Spearman's). If the coefficient is positive (e.g., +0.4), the variables are positively related (i.e., as one variable goes up, the other also tends to go up); if it is negative (e.g., −0.4), the variables are negatively related (i.e., as one variable goes up, the other tends to go down).

General guidelines for evaluating the strength of the relationship are as follows:

  • A correlation coefficient of 0.1 to 0.29 (or −0.1 to −0.29) indicates a weak correlation.
  • A correlation coefficient of 0.3 to 0.49 (or −0.3 to −0.49) indicates a medium correlation.
  • A correlation coefficient of 0.5 to 1.0 (or −0.5 to −1.0) indicates a strong correlation.
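The guidelines above can be sketched as a small labeling function. Two assumptions are ours, not the guidelines': the bands are treated as half-open intervals on |r| so that values such as 0.295 are not left unlabeled, and anything below 0.1 in magnitude is called "negligible", a label the guidelines do not use. The function name describe_strength is hypothetical.

```python
def describe_strength(r):
    """Label a correlation coefficient with rough (strength, direction) guidelines.

    Assumptions: bands are half-open on |r| ([0.1, 0.3), [0.3, 0.5), [0.5, 1.0]),
    and |r| < 0.1 is labeled "negligible" (our own label, not from the guidelines).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength depends only on magnitude; the sign gives direction
    if size >= 0.5:
        strength = "strong"
    elif size >= 0.3:
        strength = "medium"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "negligible"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_strength(-0.4))  # ('medium', 'negative')
print(describe_strength(0.62))  # ('strong', 'positive')
```

Because the cutoffs are only rough conventions, a label like this is best read alongside the sample size rather than on its own.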

Correlation coefficients, like any other statistic, are less precise in small samples, so a “strong” correlation from a small sample should be taken with a grain of salt. Also keep in mind that these are rough guidelines, not fixed cutoffs.
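A quick simulation illustrates the small-sample caveat. It assumes two normally distributed variables generated completely independently of each other (so the true correlation is 0) and counts how often a sample still shows |r| of at least 0.5, the "strong" cutoff above. The sample sizes, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def share_of_spurious_strong_r(n, trials=2000, threshold=0.5, seed=1):
    """Fraction of size-n samples of two independent variables whose |r| reaches threshold."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
        if abs(pearson_r(x, y)) >= threshold:
            hits += 1
    return hits / trials

for n in (5, 20, 100):
    print(n, share_of_spurious_strong_r(n))
```

With only 5 observations per sample, a sizable fraction of samples look "strongly" correlated purely by chance; by 100 observations this essentially never happens.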

Does correlation imply causation?

In a word, no. Just because two variables are correlated does not mean that one causes changes in the other. Correlation only tells us that a relationship exists; it doesn't explain why or how that relationship occurs. For instance, one study[1] found a strong correlation between hype in academic press releases (call it A) and hype in the news reports about the same research (call it B). Assuming the observed correlation is real, there are three general explanations for why it could occur:

A causes B: It could be that press releases containing exaggerations are picked up by the media and repeated.

B causes A: It could be that exaggerated news stories about a piece of research lead to exaggerated press releases. However, because press releases are generally written before news stories, this possibility is unlikely.

Some third factor causes both A and B: It could be that both journalists and press-release writers are getting their information directly from the journal articles and/or interviews with the researchers. In that case, the source of the hype is the journal articles and/or the researchers themselves.
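The third explanation can be illustrated with a toy simulation. This is a hypothetical model, not data from the study: a hidden factor Z (hype in the shared source) feeds into both A (press-release hype) and B (news hype), each with its own independent noise, and neither A nor B appears in the other's formula. A and B still end up clearly correlated; in this model, where each variable is exactly Z plus noise, subtracting Z makes the correlation vanish.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical common cause Z: hype in the shared source (journal article or researcher).
z = [rng.gauss(0, 1) for _ in range(n)]
# A and B each inherit Z plus independent noise; neither depends on the other.
a = [zi + rng.gauss(0, 1) for zi in z]  # press-release hype
b = [zi + rng.gauss(0, 1) for zi in z]  # news hype

r_observed = pearson_r(a, b)
# Removing the shared factor leaves only the independent noise terms.
r_residual = pearson_r([ai - zi for ai, zi in zip(a, z)],
                       [bi - zi for bi, zi in zip(b, z)])

print(round(r_observed, 2))  # about 0.5 in this model
print(round(r_residual, 2))  # about 0
```

The value of roughly 0.5 follows from the model: A and B share one unit of variance (from Z) out of two units each. Real data rarely let us subtract a confounder this cleanly, which is one reason the controlled experiments described next are preferred.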

Although a correlation coefficient doesn’t provide the information needed to identify which of the three explanations above is true, one can often narrow down the possibilities through independent reasoning, as with the timing argument against “B causes A”. However, where possible, the best way to establish causation is not an observational study but a carefully controlled experiment, in which researchers actively intervene by changing only one variable in an intervention group and then compare the outcome with that of a control group. This is partly why randomized, double-blind, placebo-controlled trials are the gold standard in the biomedical sciences.
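A sketch of why randomized intervention helps, under a hypothetical model: a hidden factor Z affects the outcome B, and a coin flip decides whether each unit receives the intervention (A = 1) or is a control (A = 0). Because the assignment ignores Z, Z can no longer link A and B, and the difference in mean B between the two groups estimates only the causal effect of A. The function name run_experiment and the effect sizes are illustrative assumptions.

```python
import random
from statistics import fmean

def run_experiment(effect, n=10_000, seed=7):
    """Return mean(B | intervention) - mean(B | control) under random assignment.

    `effect` is the hypothetical causal effect of A on B; Z is a hidden third factor.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden third factor
        a = rng.random() < 0.5           # coin flip: assignment ignores z entirely
        b = z + effect * a + rng.gauss(0, 1)
        (treated if a else control).append(b)
    return fmean(treated) - fmean(control)

print(run_experiment(effect=0.0))  # near 0: no causal effect, and Z no longer links A and B
print(run_experiment(effect=0.3))  # near 0.3: the causal effect is recovered
```

In an observational setting where Z also drove A, the same comparison would mix Z's influence into the estimate; randomization is what removes it.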

References

1. Sumner P, Vivian-Griffiths S, Boivin J, Williams A, Venetis CA, Davies A, Ogden J, Whelan L, Hughes B, Dalton B, Boy F, Chambers CD. The association between exaggeration in health related science news and academic press releases: retrospective observational study. BMJ. 2014 Dec 9.