Can the Food Compass accurately gauge how “healthy” a food is?

The Food Compass is a food classification system that scores foods and beverages on a scale of 1–100, from least healthy to most healthy. The system is well-intentioned and can be helpful, but it also has several flaws.

This Study Summary was published on December 2, 2021.

Background

Most people have a fairly easy time identifying a “healthy” food when they see one, like fruits, vegetables, and whole grains. However, consumers often seek an objective assessment of a food’s “healthfulness” to better inform their own food choices. Nutrient profiling systems (NPSs) are used to evaluate the nutritional quality of foods and beverages based on their nutritional composition and their effects on health. They’re often used to guide consumer food choices and food policy decisions. While several NPSs exist, they are all based on different criteria, and none are perfect.

The study

The authors of this study developed and validated a new NPS, the Food Compass. Development involved four steps:

1. Assessing existing NPSs, dietary guidelines, health claims, and diet-health relationships
To determine what constituted “healthy” and “unhealthy” food and beverage attributes, the authors reviewed seven widely-used NPSs from various countries, a recent systematic review of dietary guidelines,^[1] and the U.S. FDA nutrient content requirements for health claims.

2. Selecting attributes
Based on the assessments above, the authors selected nutrients, ingredients, and other food attributes related to health for inclusion. They ultimately included 54 attributes across nine health-relevant domains:

Fiber and protein (both nutrients were scored positively).
Phytochemicals: total flavonoids and carotenoids (both were scored positively).
Vitamins: vitamins A, B₁, B₂, B₃, B₆, B₉ (folate), B₁₂, C, D, E, K, and choline (foods with higher amounts of each vitamin were scored positively).
Minerals: calcium, phosphorus, magnesium, iron, zinc, copper, selenium, sodium, potassium, and iodine (higher amounts of each mineral were scored positively, except for sodium, which was scored negatively).
Nutrient ratios: unsaturated:saturated fat ratio, fiber:carbohydrate ratio, and potassium:sodium ratio (higher ratios were scored positively).
Specific lipids: cholesterol, medium-chain fatty acids, alpha-linolenic acid, EPA + DHA, and trans fats (all were scored positively except for cholesterol and trans fat, which were scored negatively).
Food ingredients: fruits, non-starchy vegetables, beans and legumes, whole grains, nuts and seeds, seafood, yogurt, plant oils, refined grains, and red/processed meat (each category was scored positively, except for refined grains and red/processed meat, which were scored negatively).
Additives: added sugar, nitrates, artificial sweeteners/flavors/colors, partially hydrogenated oils, interesterified or hydrogenated oils, high fructose corn syrup, and monosodium glutamate (all items were scored negatively and in a binary manner, meaning foods lost 10 points per contained additive).
Processing: NOVA level (a 4-level classification of how processed a food is: −10 points for ultra-processed foods, −2 for processed foods, −1 for processed culinary ingredients, zero points for unprocessed/minimally processed foods), fermentation (10 points for fermented foods, zero points for non-fermented foods) and frying (−10 points for fried foods, zero points for non-fried foods).
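
The additive and processing rules above are simple enough to sketch in code. The Python snippet below is only an illustration: the function names and additive labels are hypothetical, treating each comma-separated additive item as one group is an assumption, and the code reports raw points per attribute without assuming how the authors combined them within a domain.

```python
# Points per NOVA level, as described in the summary above.
NOVA_POINTS = {1: 0, 2: -1, 3: -2, 4: -10}

# Hypothetical labels for the additive groups listed above
# (assumption: each comma-separated item counts as one group).
ADDITIVES = {
    "added sugar",
    "nitrates",
    "artificial sweeteners/flavors/colors",
    "partially hydrogenated oils",
    "interesterified or hydrogenated oils",
    "high fructose corn syrup",
    "monosodium glutamate",
}


def additive_points(present):
    """Return -10 for each listed additive group found in the food."""
    return -10 * len(set(present) & ADDITIVES)


def processing_points(nova_level, fermented, fried):
    """Return the points for each processing attribute separately."""
    return {
        "nova": NOVA_POINTS[nova_level],
        "fermentation": 10 if fermented else 0,
        "frying": -10 if fried else 0,
    }


# A made-up ultra-processed, fried snack containing two listed additives.
print(additive_points(["added sugar", "monosodium glutamate"]))
print(processing_points(nova_level=4, fermented=False, fried=True))
```

Under these rules, the made-up snack loses 20 points for its two additives, 10 for being ultra-processed, and 10 for being fried.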

3. Scoring principles and algorithm
The authors scored attributes with positive health effects on a scale of 0 to 10 and attributes with adverse health effects on a scale of −10 to 0. For attributes with dietary reference intakes (DRIs), scoring was based on the percentage of the DRI. Attributes without DRIs were given a de facto DRI based on the 95th percentile of their consumption in the U.S. population. For attributes with binary characteristics (e.g., the presence or absence of preservatives), scoring was binary (−10 or 0). The researchers scored all foods and beverages per 100 kcal.
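
To make the percentage-of-DRI idea concrete, here is a minimal Python sketch. The summary does not give the exact mapping from percentage to points, so the linear scale and the cap at 100% of the reference value are assumptions made for illustration, as are the function name, the parameter names, and the example numbers.

```python
def attribute_score(amount_per_100kcal, reference_amount, harmful=False):
    """Map an attribute amount to 0..10 (beneficial) or -10..0 (harmful).

    Assumption: points scale linearly with the fraction of the reference
    amount (a DRI or de facto DRI) and are capped at 100% of it.
    """
    if reference_amount <= 0:
        raise ValueError("reference_amount must be positive")
    fraction = min(max(amount_per_100kcal / reference_amount, 0.0), 1.0)
    points = 10.0 * fraction
    return -points if harmful else points


# Illustrative numbers only, not taken from the paper.
print(attribute_score(5.0, 10.0))                # half the reference amount
print(attribute_score(30.0, 10.0))               # capped at the maximum
print(attribute_score(5.0, 10.0, harmful=True))  # harmful attribute
```

The actual Food Compass algorithm may use a different curve or cap; the sketch only shows how a per-100-kcal amount and a reference value could yield a bounded score.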

The domain scores were summed into a Food Compass Score (FCS), ranging from 1 (least healthy) to 100 (most healthy) for all foods and beverages.

4. Testing and validation
The authors assessed content validity by reviewing nutrients, food ingredients, and other characteristics deemed to be of public health concern; face validity by assigning an FCS to 8,032 foods and beverages reported in the 2015–2016 National Health and Nutrition Examination Survey and Food and Nutrient Database for Dietary Studies; and convergent and discriminant validity by comparing the Food Compass to other NPSs, including the NOVA food processing classification, the Health Star Rating (HSR), and the Nutri-Score NPS.

🔍 Digging Deeper: Types of validity

The researchers in this study sought to create a system for ranking the nutritional value of foods. This new ranking system needs to be valid, which might sound obvious, but what does “valid” actually mean?

Put simply, “validity” is how well a test measures the thing it’s supposed to be measuring.^[2] In the case of an NPS, this means that the score a food earns should correspond to how healthy the food actually is.

The term “construct validity” is often used to describe validity overall — it is concerned with how well an observer can make inferences about something using the test in question. For example, if an intelligent person takes an IQ test, then their score should reflect their true intelligence, and an observer should be able to correctly infer that they are intelligent. While validity can be thought of as a single concept, it can be helpful to consider some “subtypes” of validity to better understand what this concept actually is:

Convergent validity describes how closely the test in question correlates with tests of similar things. For example, if someone takes a test that measures anxiety, their results should be correlated with results derived from tests of worrying, stress, and negative emotion.
Divergent validity (also called discriminant validity) describes how the test in question relates to tests of dissimilar things, with which it should not correlate strongly in the same direction. Using the same example as above, someone who scores high on a new anxiety test should probably score low on tests of carefreeness or spontaneity. If they don’t, the new anxiety test may be questionable.
Content validity describes if a test includes all the elements required to accurately measure something. For example, if a test is designed to measure the healthiness of a person’s lifestyle, it should include a multitude of factors including, but not limited to, activity, diet, sleep, and stress.
Face validity is a concept similar to saying “you know it when you see it.” Face validity describes how obvious it is to observers that the test measures the subject in question. For example, if a test measures how well someone sleeps, the degree to which observers view the test as measuring sleep, as opposed to something else that isn’t necessarily related, like whether they have chronic fatigue, is a description of its face validity.

The results

Across the 8,032 foods tested, the mean FCS was 43.2 ± 28.5. Among 12 major food categories, the lowest mean FCS was 16 ± 17.7 for savory snacks and sweet desserts. The highest mean FCS was 78.6 ± 17.4 for legumes, nuts, and seeds. Based on the observed ranges, the authors determined the following cut-offs:

FCS ≥ 70. These foods or beverages should be encouraged.
FCS = 31–69. These foods should be consumed in moderation.
FCS ≤ 30. These foods should be minimized.
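
These cut-offs translate directly into a small lookup. The sketch below is illustrative Python: the function name and category labels are hypothetical, and placing fractional scores between 30 and 31 in the "moderate" band is an assumption, since the cut-offs are stated for whole numbers.

```python
def fcs_category(fcs):
    """Classify a Food Compass Score using the cut-offs listed above."""
    if not 1 <= fcs <= 100:
        raise ValueError("FCS is expected to lie between 1 and 100")
    if fcs >= 70:
        return "encourage"
    if fcs > 30:
        return "moderate"
    return "minimize"


# Category means reported in the results above.
print(fcs_category(78.6))  # legumes, nuts, and seeds
print(fcs_category(43.2))  # mean across all foods
print(fcs_category(16))    # savory snacks and sweet desserts
```

Applied to the reported means, legumes, nuts, and seeds fall in the "encourage" band, the overall mean falls in "moderate", and savory snacks and sweet desserts fall in "minimize".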

The mean FCS was 81.7 for NOVA 1 foods (unprocessed or minimally processed), 38.5 for NOVA 2 foods (processed culinary ingredients), 61.9 for NOVA 3 foods (processed), and 36.8 for NOVA 4 foods (ultra-processed).

The Food Compass correlated “moderately” with the HSR and “modestly” with Nutri-Score. While the FCS tended to rise with the HSR or Nutri-Score overall, a wide range of FCS variation was evident within any HSR or Nutri-Score category.

Food Compass Scores for Select Foods and Beverages

Note

While well-intentioned, the Food Compass has several flaws. For example, Honey Nut Cheerios (FCS = 76), Frosted Cheerios (FCS = 73), and Reese’s Puffs (FCS = 72) ranked higher than raw corn (FCS = 70), olives (FCS = 70), plantains (FCS = 67), sweet potatoes (FCS = 55), and chicken breast (FCS = 52). If foods that seem obviously healthy rank lower than ones that are debatably healthy, it’s possible that this ranking system isn’t valid.

As described previously in Study Summaries, the classification of nutrients and food groups as “good” or “bad” is overly simplistic. The authors categorized additives such as high-fructose corn syrup, artificial sweeteners and flavors, and MSG as unequivocally “bad” (−10 FCS points), even though the research behind them is nuanced. Similarly, the system scored sodium negatively, but sodium isn’t inherently unhealthy, as it’s an essential electrolyte.

The big picture

A 2016 systematic review of 83 studies^[3] assessed the methods researchers used to determine the validity of various NPSs and the ability of these NPSs to predict future health outcomes. Most of the studies assessing NPS validity were judged to be at high risk of bias, largely because there is no gold standard for classifying the “healthfulness” of a food that could serve as a reference. Construct validity (here, how well an NPS’s ranking of foods agrees with other measures of healthiness) was often assessed by comparing one NPS model to another, even though comparing two models of unknown accuracy is likely problematic. The authors concluded that the evidence for the validity of NPSs was “very low to moderate.”

References

1. Anna Herforth, Mary Arimond, Cristina Álvarez-Sánchez, Jennifer Coates, Karin Christianson, Ellen Muehlhoff. A Global Review of Food-Based Dietary Guidelines. Adv Nutr. (2019 Jul 1)
2. Jerry A Colliver, Melinda J Conlee, Steven J Verhulst. From test validity to construct validity … and back? Med Educ. (2012 Apr)
3. Sheri L Cooper, Fiona E Pelly, John B Lowe. Construct and criterion-related validation of nutrient profiling models: A systematic review of the literature. Appetite. (2016 May 1)