
The use of crowdsourcing for dietary self-monitoring: crowdsourced ratings of food pictures are comparable to ratings by trained observers

Gabrielle M Turner-McGrievy , Elina E Helander , Kirsikka Kaipainen , Jose Maria Perez-Macias , Ilkka Korhonen
DOI: http://dx.doi.org/10.1136/amiajnl-2014-002636 First published online: 4 August 2014

Abstract

Objective Crowdsourcing dietary ratings for food photographs, which uses the input of several users to provide feedback, has potential to assist with dietary self-monitoring.

Materials and methods This study assessed how closely crowdsourced ratings of the foods and beverages in 450 pictures from the Eatery mobile app, made by peer users (fellow Eatery app users; n=5006 peers, mean 18.4 peer ratings/photo) on a simple ‘healthiness’ scale, were related to ratings of the same pictures by trained observers (raters). In addition, the foods and beverages present in each picture were categorized and the effect of food/beverage category on the peer rating was examined. Raters were trained to provide a ‘healthiness’ score using criteria from the 2010 US Dietary Guidelines.

Results The average of all three raters’ scores was highly correlated with the peer healthiness score for all photos (r=0.88, p<0.001). Using a multivariate linear model (R²=0.73) to examine the association of peer healthiness scores with foods and beverages present in photos, peer ratings were in the hypothesized direction for both foods/beverages to increase and ones to limit. Photos with fruit, vegetables, whole grains, and legumes, nuts, and seeds (borderline at p=0.06) were all associated with higher peer healthiness scores, and processed foods (borderline at p=0.06), food from fast food restaurants, refined grains, red meat, cheese, savory snacks, sweets/desserts, and sugar-sweetened beverages were associated with lower peer healthiness scores.

Conclusions The findings suggest that crowdsourcing holds potential to provide basic feedback on overall diet quality to users utilizing a low burden approach.

  • diet
  • self-monitoring
  • mobile health
  • technology
  • crowdsourcing

Introduction

Behavioral weight loss interventions are an effective way to help people lose weight and decrease chronic disease risk.1 Diet self-monitoring assists with weight management2 and is considered the cornerstone of behavioral treatment for weight loss.3 Adherence to self-monitoring4 and receiving personalized feedback on diet5 ,6 are associated with improved weight loss, but diet self-monitoring tends to decline over time.7 ,8 Mobile health (mHealth) technologies hold promise as a way to provide individuals with the ability to self-monitor diet and receive feedback wherever they are. Generally, studies requiring participants to self-monitor diets have utilized paper journal methods,2 which can be time consuming and tedious for participants.9

Smartphone cameras have recently made it possible to photograph foods, enabling just-in-time food recording.10 Recording food through photographs may be one way to reduce the participant burden of recording foods. In one study in which users recorded dietary intake via phone cameras, users rated the camera phone highly for satisfaction and almost all preferred the camera method to traditional pen and paper recording methods.11 Finding ways to provide quick and low-cost feedback to users based on food photographs has been a challenge. One approach to providing feedback on photos of diet is to utilize crowdsourcing, which uses the input of several users to provide feedback and information,12 such as in the Eatery application (http://www.massivehealth.com/). Users take pictures of their foods with the Eatery app, rate their meals using a sliding scale from fit (healthy) to fat (unhealthy), and are then prompted to rate the photographs of foods and beverages from other users. In addition, users receive peer feedback as an average healthiness score for their own foods and beverages. Figure 1 provides a screenshot of the Eatery app interface for rating and feedback.

Figure 1

Screenshot of the Eatery application with the rating interface on the left and the feedback interface on the right.

The Eatery application represents a potential use of mHealth technology to reduce the burden of self-monitoring. However, little is known about the validity of peer feedback using a crowdsourcing model. The goal of the study was to assess how closely crowdsourced ratings of foods and beverages were related to the ratings of the same pictures by trained raters. A secondary goal of the study was to examine if foods and beverages that should be increased in the diet (according to the US Dietary Guidelines) were associated with higher crowdsourced ratings, and if foods and beverages that should be limited in the diet were associated with lower crowdsourced ratings. We hypothesized that crowdsourced ratings of foods would be similar to those of trained raters comparing a basic rating for crowdsource users (scale of fit to fat) to a more complex rating system based on the US Dietary Guidelines. In addition, we hypothesized that foods and beverages that should be increased in the diet (based on the US Dietary Guidelines) would be associated with higher peer user ratings than those foods and beverages that should be decreased.

Methods

This paper details the results of an observational study which used data collected from the Eatery app from 2012 to 2013. The Eatery app is designed to allow users to take pictures of the foods they eat and then post the pictures for others to view on the app. Users of the app can then rate the healthiness of the foods pictured. These crowdsourced ratings of each photo are then provided back to the original user, who receives feedback on his or her diet and can then modify it based on the feedback. The goal of the app is to help users improve the quality of their diet.

Pictures (n=429 288 photos) from the Eatery app were provided to the researchers by the Eatery app creators. From these photos, pictures were selected which had at least 10 peer ratings and had some textual description (which users could enter describing the food, such as ‘sandwich’) but did not contain text about ‘not eating’ something (n=167 787). Figure 1 shows the rating scale used by Eatery app users, which has a series of stars between ‘fat’ (ie, unhealthy) and ‘fit’ (ie, healthy). Eatery peer rating scores were provided to the researchers on a corresponding scale from 0 to 1. From these photos, 500 pictures, equally distributed across the range of Eatery scores, from both the USA and Europe, and representing 333 unique Eatery users, were selected. These photos were then manually inspected to ensure that they met the inclusion criteria (ie, they represented actual food and/or beverage items). This resulted in 450 pictures in total (300 from the USA and 150 from Europe) with a mean of 18.4 peer user ratings per photo. The total number of ratings was 8265 from 5006 unique Eatery users.
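The selection steps described above can be illustrated with a short sketch. It is not the authors' code: the field names (`description`, `n_ratings`, `score`), the substring match on ‘not eating’, the use of 10 equal-width score bins, and the random seed are all assumptions made for illustration, since the paper does not state how these steps were implemented.

```python
import random

def eligible(photo, min_ratings=10):
    """Inclusion filter: enough peer ratings, some text, no 'not eating' text."""
    text = (photo.get("description") or "").strip().lower()
    return (
        photo["n_ratings"] >= min_ratings
        and text != ""
        and "not eating" not in text  # assumed matching rule
    )

def sample_evenly(photos, total, n_bins=10, seed=0):
    """Draw up to total // n_bins photos from each equal-width bin of the 0-1 score."""
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for photo in photos:
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(photo["score"] * n_bins), n_bins - 1)
        bins[idx].append(photo)
    per_bin = total // n_bins
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return chosen
```

A pool would first be filtered with `eligible` and then passed to `sample_evenly`; the subsequent manual inspection of each photo is not something a sketch like this can replace.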

Next, a rating system was developed in order to compare the peer user ratings to a set of nutrition standards. The 2010 US Dietary Guidelines foods and food components to reduce (foods high in sodium,13 saturated fat,13–15 cholesterol,13 ,14 trans fat,16 and added sugars15; refined grains; and alcohol in moderation) and foods to increase (fruit, vegetables, whole grains, fat-free/low-fat (unsweetened) dairy, and low-cholesterol protein sources) were chosen as the comparison rating framework.17 Three expert raters (graduate students in public health) were trained in the rating system and instructed to categorize the pictured food and beverage items as well as provide a rating of healthiness. Expert raters were trained in and knowledgeable about foods which were common sources of the examined nutrients. In addition, expert raters all had completed coursework in nutrition and were knowledgeable about a variety of foods and beverages. Expert raters were only able to score the photos based on what was pictured along with any text descriptions. Therefore, like the users of the Eatery app, they may not have known preparation methods or if a food was a reduced fat version. Pictures were scored on a scale of 1 to 5 with all photos starting off at a 3 (neutral). Points were subtracted (down through 1) for each food category that the Dietary Guidelines specify should be consumed in moderation and points were added (up through 5) for each food that the Dietary Guidelines specify should be increased. Photos could have points deducted (eg, −2 points for a high cholesterol food and a food made from refined grains) and added (eg, 1 point added for a fruit present) to give a total value (eg, 3–2+1=score of 2). A single food or beverage item could have more than 1 point deducted or added if it represented more than one scoring category. For example, a pastry item would have points deducted for trans fat, added sugar, and refined grains. 
Table 1 shows the foods and beverages used to assess the rating of each photo. Institutional Review Board approval was not necessary for the study as all data used were de-identified and there was no interaction with human subjects.

Table 1

Scoring system used by expert raters of Eatery pictures including foods and beverages which represent food groups or predominant sources of nutrients which the Dietary Guidelines recommend should be consumed in moderation and foods and beverages which should be encouraged

| Food or food component | Examples |
| --- | --- |
| Minus 1 point: foods to reduce | |
| Sodium | Cheese, processed foods, canned foods, etc |
| Saturated fat | Beef, pork, processed meat, high-fat dairy (cheese, ice cream, whole milk), etc |
| Cholesterol | Beef, pork, poultry, eggs, liver, etc |
| Trans fat | Baked goods, French fries, yeast breads, etc |
| Added sugars | Sodas, energy drinks, sweet teas, sweetened coffees, desserts (cookies, cakes, pies, ice cream, etc), pastries, etc |
| Refined grains | White bread, white pasta, white rice, cold cereals made with refined grains (eg, corn flakes), etc |
| Alcohol | Alcohol in moderation (if >2 alcoholic drinks in picture, then 1 point was deducted) |
| Plus 1 point: foods to increase | |
| Fruits | All fruits (except fruit desserts) |
| Vegetables | All vegetables (except fried potatoes) |
| Whole grains | Whole wheat pasta, whole wheat bread, brown rice, whole grain cereal, oatmeal, etc |
| Fat-free, low-fat (unsweetened) dairy | Low-fat or skim milk, yogurt, etc |
| Low-fat/low-cholesterol protein | Seafood, beans, peas, soy products, and nuts/seeds |
| Vegetable oils | Olive oil, canola oil, etc |
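As an illustration of the scoring rule, the following sketch starts each photo at 3 and moves the score by one point per scoring category. The category labels are shortened from Table 1 and the function name is hypothetical. Two points are assumptions rather than statements from the paper: each category is counted once per photo, and totals that fall outside the scale are clamped to the range 1 to 5.

```python
# Category labels shortened from Table 1; names are illustrative only.
FOODS_TO_REDUCE = {
    "sodium", "saturated fat", "cholesterol", "trans fat",
    "added sugars", "refined grains", "alcohol",
}
FOODS_TO_INCREASE = {
    "fruits", "vegetables", "whole grains", "low-fat dairy",
    "low-fat protein", "vegetable oils",
}

def expert_score(categories):
    """Return a 1-5 healthiness score for one photo.

    `categories` holds every scoring category triggered by the pictured
    items; a single item may trigger several of them.
    """
    score = 3  # every photo starts at neutral
    for category in set(categories):  # assumption: one point per category
        if category in FOODS_TO_REDUCE:
            score -= 1
        elif category in FOODS_TO_INCREASE:
            score += 1
        else:
            raise ValueError(f"unknown category: {category}")
    # Assumption: totals outside the scale are clamped to 1..5.
    return max(1, min(5, score))

# Example from the text: high cholesterol food + refined grains + a fruit.
print(expert_score(["cholesterol", "refined grains", "fruits"]))  # 3 - 2 + 1 = 2
```

Under the clamping assumption, the pastry example from the text (trans fat, added sugars, and refined grains) would total 0 and be recorded as 1.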

In addition to calculating a score, expert raters also categorized what foods and beverages were present from a list of common food and beverage groups (as outlined in table 2). An ‘other’ category where a response could be inserted was also included. This brief list of food/beverage groups was selected based on MyPlate food groups18 and foods and beverages which are frequent sources of the nutrients examined in the Dietary Guidelines scoring (such as saturated fat). This was used to assess how well the expert raters correlated with one another based on the food categories and to examine how foods/beverages present in photos predicted the peer rater healthiness scale. A password protected website was created with individual log-ins for each rater, which allowed raters to view each Eatery photo, rate the photo using the Dietary Guidelines rating system, and categorize the foods and beverages present. Raters were instructed to complete ratings and food/beverage categorization for every photo viewed on the system. Expert raters received a 1.5 h training session on the scoring system and 10 sample pictures were rated together by the group to establish consensus in the scoring methods and consensus in identifying foods and beverages present.

Table 2

Results of a multivariate linear model with regression coefficients (B), SEs, and p values for different food categories identified by expert raters as predicting the healthiness score crowdsourced by peer raters

| Food categories as predicting healthiness score provided by peer raters | B coefficient | SE | p Value |
| --- | --- | --- | --- |
| Foods and beverages to increase | | | |
| Fruits | 0.153 | 0.019 | <0.001 |
| Vegetables | 0.065 | 0.016 | <0.001 |
| Whole grains | 0.021 | 0.025 | 0.39 |
| Fat-free and low-fat dairy products | 0.030 | 0.031 | 0.34 |
| Seafood | −0.001 | 0.033 | 0.96 |
| Beans, peas, lentils, nuts, or seeds | 0.039 | 0.021 | 0.06 |
| Water or unsweetened beverage | 0.013 | 0.034 | 0.70 |
| Foods and beverages to consume in moderation | | | |
| Processed food | −0.047 | 0.025 | 0.06 |
| Fast food | −0.218 | 0.026 | <0.001 |
| Refined grains | −0.071 | 0.015 | <0.001 |
| Red meat (beef, pork, lamb) | −0.120 | 0.020 | <0.001 |
| Cheese | −0.083 | 0.017 | <0.001 |
| Savory snacks | −0.207 | 0.036 | <0.001 |
| Sweets/dessert | −0.328 | 0.021 | <0.001 |
| Chicken or chicken mixed dishes | −0.030 | 0.023 | 0.19 |
| Eggs and egg mixed dishes | 0.001 | 0.029 | 0.97 |
| Sugar-sweetened beverages | −0.070 | 0.036 | 0.05 |
| Alcohol | −0.131 | 0.057 | 0.02 |

Data from the expert raters were then compared to the Eatery peer user ratings (peer raters). The ratings from the expert raters (on a scale of 1–5) were compared with the healthiness scores from the Eatery peer raters (on a scale of 0–1). In addition, food and beverage groups identified in each picture by the expert raters were used to examine the relationship of the presence of these foods and beverages with the healthiness score generated by the peer Eatery app raters. Data from the Eatery app were provided to the researchers free of charge by Massive Health (owners and creators of the Eatery app). Massive Health was not involved in the design or conduct of the research and did not influence the reporting of the results in any way. The researchers did not receive funding from Massive Health.

Statistical analysis

Food categories and location

If an expert rater had not marked any food components for the picture or provided any text (‘other’), the picture was declared to have missing content information for the expert rater. To assess how expert raters agreed on the presence of food categories, the number of times all expert raters, two expert raters, or only one expert rater marked the presence of a certain food category was calculated. Pictures that did not have information from all three expert raters were excluded from this step. The number of times all expert raters agreed on the presence of a food category was calculated and divided by the total number of cases where one, two, or all three raters marked a food category (agreement percentage). Because all expert raters were from the USA (and therefore may have been more knowledgeable about US than European foods and brands), the effect of image location on rater agreement and the content categorized was calculated. A χ² test was used to assess whether agreement percentages were different between pictures from the USA and Europe.
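The agreement percentage and the comparison between locations can be sketched as follows. This is a minimal illustration on made-up inputs, not the authors' MATLAB analysis. The function names and data layout are hypothetical, and the test shown is a Pearson χ² on a 2×2 table without continuity correction, which is an assumption because the paper does not say which variant was used.

```python
import math

def agreement_percentage(photos):
    """photos: list of 3-tuples of sets, one set of marked categories per rater.

    Returns (cases where all three agree, cases marked by at least one
    rater, agreement percentage).
    """
    agreed = 0
    marked = 0
    for rater_sets in photos:
        marked += len(set().union(*rater_sets))       # marked by 1, 2, or 3 raters
        agreed += len(set.intersection(*rater_sets))  # marked by all 3 raters
    pct = 100.0 * agreed / marked if marked else float("nan")
    return agreed, marked, pct

def chi_square_2x2(agree_a, total_a, agree_b, total_b):
    """Pearson chi-square for two proportions (df=1, no continuity correction)."""
    table = [[agree_a, total_a - agree_a], [agree_b, total_b - agree_b]]
    rows = [total_a, total_b]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For df=1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical toy data: two photos, three raters each.
toy = [
    ({"fruits", "cheese"}, {"fruits"}, {"fruits", "cheese"}),
    ({"vegetables"}, {"vegetables"}, {"vegetables"}),
]
print(agreement_percentage(toy))
```

In the toy data, ‘cheese’ in the first photo counts as a marked case but not as an agreed case, because only two of the three raters listed it.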

Healthiness scores

The average of all peer ratings obtained from Eatery users for a single photograph represents the peer healthiness score for the photograph in question. The higher the healthiness score, the healthier (or more ‘fit’) the food or beverage items in the image were perceived to be by peer raters. The intraclass correlation coefficient (ICC) was calculated among the three individual raters to assess their consistency. Because the scales used by expert raters and peer raters differed (categorical and continuous, respectively), a Pearson correlation coefficient for the healthiness score was calculated (a) between individual expert raters, (b) between each expert rater and peer raters, and (c) between the average healthiness score given by expert raters and peer raters. If an expert rater had not provided healthiness score information, the picture was not used when calculating the correlation coefficient between that expert rater and other expert raters or peer raters. When the average healthiness score given to an image by all expert raters was calculated, only expert raters who had given a healthiness score for the image were included in the average. Correlation coefficients were also calculated and compared separately for pictures from the USA and Europe. A one-tailed test was used to assess whether agreement on the healthiness scores was higher for pictures from the USA. In this case, the correlation coefficients can be assumed to be independent because they are based on different images, and the Fisher z-transformation was applied to the correlation coefficients to obtain z values for assessing the difference.
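The calculations in this subsection can be sketched in a few lines. The code below is illustrative only: the function names and the toy ratings are hypothetical, `None` is used here to mark a missing expert rating, and the standard error used for the difference of two Fisher-transformed correlations is the usual textbook form for independent samples, which the paper does not spell out.

```python
import math

def mean_ignoring_missing(scores):
    """Average the expert scores that are present (None marks a missing rating)."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present) if present else None

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_one_tailed(r1, n1, r2, n2):
    """One-tailed test of rho1 > rho2 for two independent samples."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se  # atanh is the Fisher transform
    p = 0.5 * math.erfc(z / math.sqrt(2.0))     # upper-tail normal probability
    return z, p

# Hypothetical toy data: expert ratings (1-5) per photo and peer scores (0-1).
expert = [(5, 4, 5), (3, None, 3), (1, 2, 1), (2, 2, 3)]
peer = [0.90, 0.55, 0.10, 0.35]
expert_avg = [mean_ignoring_missing(scores) for scores in expert]
r = pearson_r(expert_avg, peer)
```

Calling `fisher_z_one_tailed(0.90, 300, 0.87, 150)` shows the mechanics of the USA versus Europe comparison, but it should not be expected to reproduce the reported p value exactly, because it uses the rounded correlations and assumes every photo contributed to each coefficient.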

Modeling peer rater healthiness score from food categories

A multivariate linear model was used to examine the association between peer rater healthiness scores and the food categories present in each photo. Food categories were included as independent variables and the peer rater healthiness score was included as the dependent variable. The food categories were entered as binary variables (1=present, 0=absent). For example, for the photo of chicken salad in figure 1, expert raters might have marked the following food categories: vegetable (cherry tomatoes, lettuce) and chicken or chicken mixed dishes. A food category was declared to be present if at least two expert raters agreed on its presence. For images that had one expert rater's information missing, a single expert rater's vote was enough to declare the presence of a food category. All statistical analyses were conducted using MATLAB (V.8.0.0.783; The MathWorks Inc).
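The rule for turning three raters' categorizations into the binary predictors can be sketched as below. The function names and data layout are hypothetical, and `None` is used here to represent a rater whose content information is missing for a photo. The sketch covers only the encoding step, not the fitting of the linear model.

```python
def category_present(votes):
    """votes: one entry per expert rater, True/False, or None if that
    rater's content information is missing for the photo."""
    available = [v for v in votes if v is not None]
    yes = sum(1 for v in available if v)
    # Two votes are needed when all three raters contributed; one vote
    # is enough when a rater's information is missing.
    required = 2 if len(available) == 3 else 1
    return yes >= required

def encode_photo(rater_marks, categories):
    """rater_marks: three sets of marked categories (None if missing).

    Returns a list of 1/0 indicators in the order of `categories`.
    """
    row = []
    for category in categories:
        votes = [None if marks is None else (category in marks)
                 for marks in rater_marks]
        row.append(1 if category_present(votes) else 0)
    return row

# Chicken salad example: two raters mark each of 'vegetables' and 'chicken'.
marks = [{"vegetables", "chicken"}, {"vegetables"}, {"chicken", "cheese"}]
print(encode_photo(marks, ["vegetables", "chicken", "cheese"]))  # [1, 1, 0]
```

Each encoded row would then serve as one observation of the independent variables, with the photo's peer healthiness score as the dependent variable.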

Results

Completion of the ratings and categorization by each expert rater was high. Expert rater 1 provided the healthiness score and food/beverage categorization for 100% of the photos, expert rater 2 rated and categorized 98% of the photos, and expert rater 3 rated 99% of the photos and categorized 98% of the photos. None of the pictures had more than one rater's information missing.

Raters’ agreement on the foods and beverages present and healthiness score

The agreement among the three expert raters for the presence of food was examined. This was analyzed in order to explore how frequently all three reviewers agreed that a food or beverage was present (eg, all three raters listed the same food or beverage as present in the photo). The agreement percentage was 45.8% for photos from the USA and 40.9% for photos from Europe. The agreement percentage for photos from the USA was significantly higher than for photos from Europe (χ²(n=1592)=4.60, p=0.032).

The correlations of healthiness scores between all possible pairs of expert raters, and between each expert rater and the peer raters, were all highly significant (p<0.001). Correlations between expert rater pairs were r=0.75 (rater 1 and rater 2; ICC=0.75), r=0.73 (rater 1 and rater 3; ICC=0.73), and r=0.78 (rater 2 and rater 3; ICC=0.77), and among all expert raters ICC=0.75 (single measures). The average healthiness score of all three expert raters combined was highly correlated with the peer healthiness score (r=0.88, p<0.001). The correlation of expert raters’ average score with peer user ratings was high for both photos from the USA (r=0.90, p<0.001) and Europe (r=0.87, p<0.001) and did not differ (p=0.12) from one another.

Peer healthiness score prediction from food categories

A multivariate linear model was used to examine the relationship between peer raters’ healthiness scores and the food and beverages (to increase and to reduce) present in the images. The overall model was significant and the food categories were associated with the peers’ healthiness score (R²=0.73, F(19, 431)=63.91, p<0.001). Table 2 shows the individual regression coefficients for the food components of the model, with p values indicating the significance of each food/beverage component's association with the peer user score. The peer user ratings were in the hypothesized direction for both foods/beverages to increase and ones to limit such that photos with fruit, vegetables, whole grains, and legumes, nuts, and seeds (borderline at p=0.06) were all associated with higher peer user healthiness scores and processed foods (borderline at p=0.06), food from fast food restaurants, refined grains, red meat, cheese, savory snacks, sweets/desserts, and sugar-sweetened beverages were associated with lower peer healthiness scores.

Discussion

The study examined how a crowdsourced method of assessing and providing minimal feedback on food and beverage intake by untrained users (peer raters) compared with ratings by trained observers (expert raters). The findings suggest that a large group of untrained peers using a basic rating scale can provide feedback comparable to that of trained raters who are familiar with the US Dietary Guidelines. In addition, the ratings of peers were in the expected direction for foods and beverages which should be included and increased in the diet, and foods and beverages which should be limited or consumed in moderation (based on the Dietary Guidelines recommendations). It proved difficult to achieve complete agreement among expert raters in identifying every food and beverage in each photo, mainly because missing just one item (such as a slice of orange identified as a fruit) resulted in non-agreement among expert raters. Although identifying foods from Europe proved to be slightly more challenging for the US expert raters than identifying foods from US photos, the correlation of healthiness scores derived from peer user ratings with the expert raters was high for both European and US photos and the correlations did not significantly differ from one another. The study found that the average of the three expert raters was more highly correlated with the peer healthiness score than any expert rater alone or individual expert raters with each other, demonstrating the benefit of using data from several users versus a single user. The findings of this paper suggest that crowdsourcing has potential to provide basic feedback on overall diet quality to users utilizing a low burden approach.

Self-monitoring energy intake is one of the key components of behavioral weight loss programs.19 Dietary self-monitoring requires daily recording of foods and the energy content (and sometimes other macronutrients, such as fat grams) for each food item, which can be burdensome,19 time-consuming, and tedious for participants.9 Using mobile devices for self-monitoring holds promise for making self-monitoring easier (through automatic calculation of energy intake) and can create the opportunity to self-monitor ‘in-the-moment’ (versus recording after a bout of eating). In our previous weight loss trial, however, participants who used traditional mobile apps for diet self-monitoring (which require entering each food and beverage consumed) did not self-monitor their diets significantly more than participants who recorded their dietary intake using a paper journal.8 Recording food and beverage intake through photographs, versus searching for each food consumed and adding that food to a daily intake list, may reduce participant burden and, in turn, help to increase self-monitoring behavior.

Research shows that accuracy is not as important as frequency of and adherence to self-monitoring for weight loss20; therefore, finding ways to increase the frequency of self-monitoring may be more important than focusing on highly accurate and detailed methods. A potential method of reducing participant burden when tracking diet is photographing the foods and beverages consumed. Several research projects are underway to create systems which utilize user food and beverage photographs to estimate the nutrient content of meals.10 ,21 These projects have focused more on dietary assessment, which requires a high degree of accuracy, than on dietary self-monitoring. In addition, nutrient analysis of meal photographs relies on image processing by computers to determine what foods and beverages are present, as well as the portion sizes.21 Other technologies have also been explored as a way to capture dietary data, such as interactive web sites,22 digital audio recorders,22 scanning or sensor-based technologies,22 or using social media.23

Crowdsourcing has already been used in other areas outside dietary assessment. One example is Amazon's Mechanical Turk, which relies on human users to perform web-based tasks in return for money.24 In addition, crowdsourcing has been used in the health arena to share information on health conditions,25 participate in genome studies,25 and test health promotion messages.25 Few studies, however, have been conducted examining crowdsourcing as a diet self-monitoring method. Those studies which have examined crowdsourcing have primarily examined apps which attempt to estimate the energy and macronutrient content of meals.10 The results of these studies demonstrate potential for using crowdsourcing; however, the accuracy of estimating the energy content of foods from photographs, whether by expert raters (registered dietitians) or non-expert raters, is still fairly low.10 ,26 A potential benefit in using a general rating scale to provide basic feedback on dietary intake (from ‘fit’ or healthy to ‘fat’ or unhealthy—such as the one the Eatery employs) is the low burden approach for the user while still providing accurate feedback on overall diet quality. Weight loss interventions generally prescribe a caloric goal, so it is not known if using a rating scale would be useful in a weight loss setting. While crowdsourcing energy intake may be challenging, using other simple categorization methods, such as the traffic light approach, which has been successfully used in other weight loss interventions,27 ,28 may be a better method.

The study has several strengths. The study represents real world data from over 300 Eatery app users with 450 food photographs used. An objective scale was used, based on the Dietary Guidelines, as a way to validate the ratings provided by peers. In addition, three trained expert raters used the US Dietary Guidelines rating scale to rate each photograph. There are also several limitations. Because the Eatery app only provided a simple rating scale, we had to devise an objective rating method while being constrained to having a similar, limited scale. Comparing the peer healthiness ratings with a more comprehensive gold standard in dietary assessment, such as diet quality (eg, using the Healthy Eating Index29) or the nutrient content of the foods and beverages in the photos, may provide a more accurate interpretation of the utility and accuracy of the Eatery app. While expert raters could use the ‘other’ category if foods or beverages were not on the possible list of categories, a comprehensive list of foods and beverages for categorization was not used. Even if Eatery peer ratings are accurate, it is not known what impact receiving this feedback from the app has on users’ eating behavior and whether it impacts dietary change. Future studies should examine if this simple rating system impacts dietary intake or if more sophisticated feedback (energy, macronutrients, etc) is needed and can be crowdsourced.

Conclusions

Self-monitoring energy intake is one of the key components of behavioral weight loss programs.19 Diet tracking mobile apps have held promise as a way to increase the frequency of diet self-monitoring, but these apps still require that participants enter foods and beverages consumed (through searching for the food or scanning a barcode on a packaged item). Crowdsourcing has potential as a way to improve adherence to dietary self-monitoring over a longer period of time. This study represents the first step in assessing the utility and accuracy of using crowdsourcing to provide very general diet feedback. This study found that when basic feedback on diet quality is crowdsourced from peer raters, it is comparable to feedback from expert raters, and that peers rate both healthy and unhealthy foods in the expected direction. Future studies should examine the impact of this type of rating on dietary intake and examine long-term adherence to self-monitoring using this type of approach.

Acknowledgments

The authors thank Sylvia Cheng and the Massive Health team, as well as Max Utter at Jawbone for providing the Eatery dataset. Massive Health was acquired by Jawbone in February 2013 and Jawbone continues to support the Massive Health application and associated research.

Footnotes

  • Contributors IK, EEH, and KK conceptualized the study and acquired the data. KK conducted statistical analysis. JMP-M created the database for analysis and designed the data collection tools. GMTM designed the nutrition protocol for rating the pictures and trained the raters. GMTM drafted the manuscript. All authors provided critical review and revisions of the manuscript.

  • Funding This work was partially supported by the SalWe Research Program for Mind and Body (Tekes—the Finnish Funding Agency for Technology and Innovation grant 1104/10).

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

References
