In a meta-analysis, heterogeneity generally describes the degree to which the included studies vary. The term is often used synonymously with “between-study heterogeneity”, which refers to variation in the studies’ outcomes above and beyond sampling error. The most common way to quantify between-study heterogeneity is with I², which ranges from 0% (no heterogeneity) to 100% (considerable heterogeneity).


In a meta-analysis, heterogeneity is the term used to describe variation between studies. “Heterogeneity” can mean several different things: it can refer to how studies differ in their participants, interventions, or outcomes (called “clinical heterogeneity”), in their study design and risk of bias (called “methodological heterogeneity”), or in an intervention’s effects (called “statistical heterogeneity”). Ideally, a high-quality meta-analysis will minimize clinical and methodological heterogeneity by choosing appropriate inclusion and exclusion criteria, so that the studies don’t differ too greatly in their participants, interventions, study design, and so on. Even in the ideal case where a study is repeated exactly, each repetition will get a somewhat different value for the outcome, simply because only a sample of the population is observed rather than the whole population. This is known as “sampling error”. As a simple example, if you estimate the average height of a room of 10 people by measuring 4 of them, you’ll likely get a different average if you randomly measure another 4.
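
To make the height example concrete, the sketch below uses ten made-up heights (hypothetical values, not data from any study) and enumerates every possible way of measuring 4 of the 10 people. The sample averages spread out around the room’s true average; that spread is the sampling error described above.

```python
from itertools import combinations
from statistics import mean

# Hypothetical heights (cm) for a room of 10 people; illustrative values only.
room = [158, 162, 165, 168, 170, 172, 175, 178, 181, 186]
true_mean = mean(room)  # the "population" average we would like to know

# Average height for every possible 4-person sample from the room.
sample_means = [mean(sample) for sample in combinations(room, 4)]

print(f"True average: {true_mean} cm")
print(f"Possible 4-person samples: {len(sample_means)}")
print(f"Sample averages range from {min(sample_means)} to {max(sample_means)} cm")
```

Every one of these samples is an equally valid “study” of the same room, yet they disagree, purely because different people happened to be measured.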

However, things are almost never ideal in nutrition research. Studies are not exact replicas of each other, and they don’t sample from exactly the same population. As a result, study results vary more than sampling error alone would predict. This extra variation is called between-study heterogeneity, often shortened to just “heterogeneity”. The most common measure of heterogeneity is I², which ranges from 0% to 100% and can be interpreted as the percentage of the variation across studies that is due to heterogeneity rather than chance. The I² of the studies in a meta-analysis can be roughly interpreted as follows:

  • 0%–40%: low heterogeneity
  • 30%–60%: moderate heterogeneity
  • 50%–90%: substantial heterogeneity
  • 75%–100%: considerable heterogeneity

Note: the ranges above overlap because they are rough guidelines suggested by the Cochrane Collaboration, which advises that factors other than the I² value also be taken into account to fully assess heterogeneity.
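
As a rough sketch of where the number comes from: I² is usually computed from Cochran’s Q, a weighted sum of squared differences between each study’s effect and the pooled effect. If studies differed only because of sampling error, Q would be expected to be about equal to its degrees of freedom (the number of studies minus one), so I² = (Q - df) / Q expresses the share of Q beyond that, set to 0% when Q is smaller than df. The Python below assumes each study reports an effect estimate and its variance, uses inverse-variance weights, and then maps the result onto the overlapping bands listed above; the function names and the two-study input are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance-weighted squared deviations from the pooled (fixed-effect) mean."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a variance")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """I^2 in percent: (Q - df) / Q, set to 0 when Q <= df (df = number of studies - 1)."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no more variation than sampling error alone would predict
    return 100.0 * (q - df) / q


# Rough Cochrane bands from the list above; they overlap on purpose.
COCHRANE_BANDS = [
    (0, 40, "low"),
    (30, 60, "moderate"),
    (50, 90, "substantial"),
    (75, 100, "considerable"),
]


def heterogeneity_labels(i2_percent):
    """Return every band an I^2 value falls into (values in an overlap get two labels)."""
    return [label for lo, hi, label in COCHRANE_BANDS if lo <= i2_percent <= hi]


# Two hypothetical studies with effects 0.0 and 2.0, each with variance 1.0.
i2 = i_squared([0.0, 2.0], [1.0, 1.0])
print(i2, heterogeneity_labels(i2))  # 50.0 ['moderate', 'substantial']
```

An I² of exactly 50% lands in both the moderate and substantial bands, which illustrates why these cut-offs are only a starting point.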

A large I² value can be interpreted in two ways:

  • An unobserved factor is causing variability in the magnitude of an intervention’s effect.
  • There may be an identifiable difference between studies that should be explored. For example, heterogeneity might be reduced by accounting for clinical or methodological differences between studies, such as by performing separate meta-analyses for low doses and high doses.
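
The dose example in the second bullet can be sketched with made-up numbers. Below, six hypothetical studies fall into a low-dose and a high-dose group; the effect sizes and the common variance of 0.01 are invented for illustration. The helper recomputes I² from Cochran’s Q with inverse-variance weights, written here as a self-contained assumption rather than any particular software’s implementation.

```python
def i_squared(effects, variances):
    """I^2 (%) from Cochran's Q with inverse-variance weights, set to 0 when Q <= df."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return 0.0 if q <= df else 100.0 * (q - df) / q


# Hypothetical effect sizes: each dose group is internally consistent,
# but the two groups differ a lot from each other.
low_dose = [0.10, 0.20, 0.15]
high_dose = [0.80, 0.90, 0.85]
var = 0.01  # same hypothetical variance for every study

all_i2 = i_squared(low_dose + high_dose, [var] * 6)
low_i2 = i_squared(low_dose, [var] * 3)
high_i2 = i_squared(high_dose, [var] * 3)

print(f"All studies pooled: I^2 = {all_i2:.0f}%")
print(f"Low-dose subgroup:  I^2 = {low_i2:.0f}%")
print(f"High-dose subgroup: I^2 = {high_i2:.0f}%")
```

With these invented numbers, pooling all six studies gives an I² of roughly 93%, while each dose subgroup on its own has an I² of 0%, so dose accounts for all of the heterogeneity. Splitting studies this way is usually called a subgroup analysis; with real data the split is rarely this clean.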

A high I² can be a warning sign that the authors may have inappropriately combined studies in a meta-analysis: if the studies differ too much from each other, pooling them amounts to averaging apples and oranges, and the meta-analysis’s conclusions may not be reliable or generalizable.

Adapted from Siebert, M., 2018[1]