This repository stores the code and supplementary material for a study on the quality of food-related data in consumer nutrition applications.
You can find the datasets at the following links:
- OpenFoodFacts: https://world.openfoodfacts.org/data
- Nutritional values for common foods and products: https://www.kaggle.com/datasets/trolukovich/nutritional-values-for-common-foods-and-products
- Food101: https://www.kaggle.com/datasets/kmader/food41
- MAFood121: https://www.kaggle.com/datasets/theviz/mafood121
- Epicurious scraped: https://www.kaggle.com/datasets/pes12017000148/food-ingredients-and-recipe-dataset-with-images
Developers of consumer nutrition applications need to accumulate food-related data. However, studies on methods for assessing the quality of such data are scarce. This study lays a foundation for further research into the quality of food-related data. Its central part is a solution to the merge problem, which arises when multiple datasets are combined into one and the number of duplicates must be kept to a minimum. The study addresses this problem with a lexical similarity function. It also proposes a range of metrics for gauging the quality of a dataset and demonstrates their results.
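
To illustrate the merge problem, here is a minimal sketch of duplicate-aware merging driven by a lexical similarity function. It is not the implementation used in the study: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the `0.85` threshold, and the helper names `normalize`, `similarity`, and `merge_names` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1], here difflib's ratio (an assumed choice)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_names(datasets, threshold=0.85):
    """Merge lists of food names, skipping any name that is a near-duplicate
    of a name already kept. The threshold is a hypothetical value."""
    merged = []
    for names in datasets:
        for name in names:
            # Keep the name only if it is sufficiently different from all kept names.
            if all(similarity(name, kept) < threshold for kept in merged):
                merged.append(name)
    return merged


# Hypothetical sample data, not taken from the datasets listed above.
first = ["Apple juice", "Whole milk"]
second = ["apple  juice", "Cheddar cheese"]
print(merge_names([first, second]))
# ['Apple juice', 'Whole milk', 'Cheddar cheese']
```

This sketch compares every incoming name against every kept name, so its cost grows quadratically with the number of entries; merging datasets of realistic size would likely need blocking or indexing on top of it.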