ferrisoxide / brocade.io

Open GTIN / barcode & product database

Home Page: https://www.brocade.io

License: GNU Affero General Public License v3.0

Ruby 67.68% JavaScript 3.15% CSS 0.54% HTML 28.53% Shell 0.09%

barcode api-server products

brocade.io's Introduction

brocade.io

Project Background

Over the years there have been several attempts at creating a freely accessible database of GTIN/barcodes and associated product data.

Many of these projects have either stalled or disappeared: the Outpan API vanished without warning, Open Product Data doesn't appear to have been updated since 2014 and recently Datakick announced they will be shutting down as of March 31, 2020.

There are still numerous commercial providers, but the number of freely accessible product data sources seems to be severely limited, with the Internet UPC Database being one of the few open services able to sustain itself. For a universal dataset, UPC and related data isn't universally accessible.

Project Goals

Open Access

The project aims to present as few barriers as possible to accessing GTIN and product data. The code for providing the service will be released as open source (see LICENSE for details) and wherever possible the data will be made available under open licenses.

There may be a need to apply some constraints - e.g. rate limiting or requiring authorisation for certain tasks - but these will only be introduced for the sake of performance, security or similar concerns.

Federated/Distributed Data

The project is not intended to be a single source of truth for product data. Instead the goal is to provide a framework for sharing product data between otherwise autonomous sources.

It's anticipated that the project will require novel protocols for federating data between disparate systems, and will be looking for inspiration from other distributed systems (e.g. DNS, Open Social, etc).

Migration Path

The initial goal for this project is to establish a migration path for users of existing APIs - notably Datakick. One aim is to provide a Datakick-compatible endpoint for developers of Datakick-related libraries to target.

The database will be seeded using data sourced from Datakick, at least initially.

Getting Started

This section will have to be added to as the project proceeds, but for now assume a technology stack based around Ruby on Rails and Postgres.

Assuming you're able to get the app installed, there is a basic seed file available using a recent download from datakick.org. Install the data via the rake task:

[bundle exec] rails db:seed:datakick

NB There are about 6000 entries in the seed data, but it doesn't include any images.

Accessing Brocade.io

You are welcome to use the Brocade instance found at https://brocade.io. To register a user account, click the "Sign In" link, switch to the "Sign Up" tab, and enter your email address and password. Once your account is confirmed you will be able to access other features.

API Access

Read access to the API is unrestricted, but if you want to add or edit product items you will need an access token.

Use the following curl command to fetch an access token, replacing <your email address> and <your password> as appropriate. NB We will be providing an easier mechanism to generate tokens in the UI. Please bear with us while we get the basics sorted out.

curl -X POST -d "grant_type=password&email=<your email address>&password=<your password>" https://www.brocade.io/oauth/token

The API will respond with a JSON payload containing your token, along the lines of:

{"access_token":"29bd3f1b-76ad-45c4-867f-179803f5246d","token_type":"Bearer","expires_in":7200,"created_at":1600087628}

Take note of the value of the access_token key (the 29bd3f1b... GUID in the example). You'll use this to authenticate against the API.

CAUTION: Tokens are currently set to expire after 2 hours. We will be addressing this limitation in the future, once we've established this is a secure enough model. It should be sufficient for testing, but we can extend this if it's causing issues.

Future plans include rolling out PKCE to provide security for mobile apps or other implementations where a long-running token could be decompiled out of the app or otherwise compromised.
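As a rough illustration of handling the token response shown above, the following Ruby sketch parses the payload and works out when the token will expire. The helper names (parse_token_response, authorization_header) are hypothetical, and it assumes that created_at is a Unix timestamp and expires_in is a number of seconds, which the sample response suggests but this README does not state.

```ruby
require "json"

# Parse the token response and work out when the token stops being valid.
# Assumption: created_at is a Unix timestamp and expires_in is in seconds.
def parse_token_response(body)
  data = JSON.parse(body)
  {
    token: data.fetch("access_token"),
    expires_at: Time.at(data.fetch("created_at") + data.fetch("expires_in")).utc
  }
end

# Build the Authorization header value used when writing to the API.
def authorization_header(token)
  "Bearer #{token}"
end

sample = '{"access_token":"29bd3f1b-76ad-45c4-867f-179803f5246d",' \
         '"token_type":"Bearer","expires_in":7200,"created_at":1600087628}'
info = parse_token_response(sample)
puts authorization_header(info[:token])
puts info[:expires_at]
```

A client could compare expires_at with the current time and request a fresh token shortly before the two-hour window closes.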

Retrieving Items

Read access works without authentication, so a simple curl to the API endpoint with the GTIN of the product you are after will suffice. For example, retrieving the 'test' GTIN '00000000000000':

curl -v https://www.brocade.io/api/items/00000000000000

returns a JSON payload containing the sample data:

{"gtin14":"00000000000000","brand_name":"ayam","name":"testname","size":"081216382297","ingredients":"Chocolate","serving_size":"34g","servings_per_container":"10","calories":5,"fat_calories":5,"fat":0.5,"saturated_fat":0.5,"trans_fat":0.5,"polyunsaturated_fat":0.5,"monounsaturated_fat":0.5,"cholesterol":0,"sodium":0,"potassium":0,"carbohydrate":0,"fiber":0,"sugars":0,"protein":0,"author":"MyAuthor","publisher":"MyPublisher","pages":0,"alcohol_by_volume":40.0}

Creating and Updating Items

To create a new item, send a POST to the API:

curl -i -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your token>" -d '{"gtin":"00000000000002", "name":"test", "brand_name":"my brand", "properties": {"size":"11 inches"}}' https://www.brocade.io/api/items

Updating requires a PUT, adding the GTIN to the end of the URL:

curl -i -X PUT -H "Content-Type: application/json" -H "Authorization: Bearer <your token>" -d '{"name":"new test", "brand_name":"new brand", "properties": {"size":"12 inches"}}' https://www.brocade.io/api/items/00000000000002

NB The JSON payload for creating/updating is slightly different to the data returned via read access. Where the data retrieved is a simple flat list of key/value pairs, when pushing data to the API you will need to nest property values (anything other than gtin, name or brand_name) as a set of key/value pairs assigned to the properties key:

{ 
  "gtin": "GTIN / barcode id",
  "name": "product name",
  "brand_name": "product brand",
  "properties": {
    "serving_size": "..",
    "ingredients": "..",
    ...
  }
}

Right now you can put pretty much any keys in here, but we're thinking of adding the idea of property "sets" - common attributes for similar products (e.g. for books, allow the keys author, number_of_pages, etc.). The UI has something like this now, but it's not currently supported in the API.
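To make the difference between the read and write shapes concrete, here is a minimal Ruby sketch that turns a flat item (as returned by a read) into the nested form expected for a create or update. The function name to_write_payload is hypothetical, and mapping the gtin14 key of the read payload to the gtin key of the write payload is an assumption based on the examples above.

```ruby
require "json"

# Keys that stay at the top level of a write payload, per the text above.
TOP_LEVEL_KEYS = %w[gtin name brand_name].freeze

# Convert the flat hash returned by a read into the nested shape expected
# when creating or updating. Mapping "gtin14" to "gtin" is an assumption.
def to_write_payload(item)
  flat = item.dup
  flat["gtin"] = flat.delete("gtin14") if flat.key?("gtin14")
  top = flat.select { |key, _| TOP_LEVEL_KEYS.include?(key) }
  properties = flat.reject { |key, _| TOP_LEVEL_KEYS.include?(key) }
  top.merge("properties" => properties)
end

item = {
  "gtin14" => "00000000000002",
  "name" => "test",
  "brand_name" => "my brand",
  "size" => "11 inches"
}
puts JSON.generate(to_write_payload(item))
```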

TODO

Import image data from Datakick
Source more open product data
Improve front-end UI, add capacity to manage tokens
New /products endpoint with more features (leaving /items in place to avoid breaking the API for early adopters)
Implement PKCE
Introduce 'property sets'
Clean up database, remove / fix non-GTIN14 records

brocade.io's Issues

Set up CI

Need to have a CI solution in place. Investigate use of GitHub Actions as a first port of call.

CI solution will need to:

Run all tests
Run bundle audit
Run zeitwerk checks

and report on failure if any of these fail.

Stretch target is to merge to master and deploy on success, but that's not super critical at this stage.

Ensure UI follows accessibility guidelines

Check the UI against WCAG and similar standards. Remedy as necessary.

Investigate Google Taxonomy

https://www.google.com/basepages/producttype/taxonomy-with-ids.en-US.txt

Assess impact of "2D" barcodes coming in 2027

Is this something we need to be aware of?

https://www.prnewswire.com/news-releases/retail-industry-to-transition-from-upc-to-two-dimensional-barcodes-on-product-packaging-by-2027-301458969.html

Investigate Open Beauty Facts as another potential data source

NB This is affiliated with Open Food Facts, using similar technology throughout: https://world.openbeautyfacts.org/data

API Consumption Helpful Information

Is this accurate? I kept looping through records to try to understand the API behavior and data structure/types.

This information might be helpful for those consuming the API.

Header Pagination Info

Header - `links` - First Header

https://www.brocade.io/api/items?page=20&query=peanut+butter; rel="last", https://www.brocade.io/api/items?page=2&query=peanut+butter; rel="next"

Header - `links` - Following Headers include the previous and first links

https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="first", https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="prev",

Header - `per-page` & `total`

per-page - Results per page = 100
total - Total number records = 1912
total / per-page = Total Pages 19.12 rounded up = 20. Total is also found in Header - links last
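The pagination arithmetic and header parsing described above could be sketched in Ruby as follows. The function names are hypothetical, and the parser assumes the `links` value is a comma-separated list of `url; rel="name"` pairs exactly as quoted in this issue, with no commas inside the URLs.

```ruby
# Split a `links` header value into a { rel => url } hash.
# Angle brackets around URLs are tolerated in case the server sends them.
def parse_links(header)
  header.split(",").each_with_object({}) do |part, links|
    url, rel = part.split(";").map(&:strip)
    next if url.nil? || rel.nil?
    name = rel[/rel="([^"]+)"/, 1]
    links[name] = url.delete("<>") if name
  end
end

# Number of pages: total / per-page, rounded up.
def total_pages(total, per_page)
  (total.to_f / per_page).ceil
end

header = 'https://www.brocade.io/api/items?page=20&query=peanut+butter; rel="last", ' \
         'https://www.brocade.io/api/items?page=2&query=peanut+butter; rel="next"'
puts parse_links(header)["next"]
puts total_pages(1912, 100)
```

A client can keep following the "next" link until the header no longer contains one, rather than computing page numbers itself.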

Data Types

Key					DataType
ProductID				int
gtin14					string
brand_name				string
name					string
fat					string
size					string
fiber					string
sodium					string
sugars					string
protein					string
calories				string
potassium				string
cholesterol				string
ingredients				string
carbohydrate				string
fat_calories				string
serving_size				string
saturated_fat				string
trans_fat				string
monounsaturated_fat			string
polyunsaturated_fat			string
servings_per_container			string
pages					string
author					string
format					string
publisher				string
alcohol_by_volume			double
alcohol_by_weight			double
volume_fluid_ounce			double
volume_ml				double
weight_g				double
weight_ounce				double
unit_count				double

Example JS Class of Data Structure

class clsProducts
{
	constructor()
	{
		this.gtin14 = '';
		this.brand_name = '';
		this.name = '';
		this.fat = '';
		this.size = '';
		this.fiber = '';
		this.sodium = '';
		this.sugars = '';
		this.protein = '';
		this.calories = '';
		this.potassium = '';
		this.cholesterol = '';
		this.ingredients = '';
		this.carbohydrate = '';
		this.fat_calories = '';
		this.serving_size = '';
		this.saturated_fat = '';
		this.trans_fat = '';
		this.monounsaturated_fat = '';
		this.polyunsaturated_fat = '';
		this.servings_per_container = '';
		this.pages = '';
		this.author = '';
		this.format = '';
		this.publisher = '';
		this.alcohol_by_volume = null; //Double
		this.alcohol_by_weight = null; //Double
		this.volume_fluid_ounce = null; //Double
		this.volume_ml = null; //Double
		this.weight_g = null; //Double
		this.weight_ounce = null; //Double
		this.unit_count = null; //Double
	}
}

Example c# Class of Data Structure

public class Products
    {
        public string gtin14 { get; set; }
        public string brand_name { get; set; }
        public string name { get; set; }
        public string fat { get; set; }
        public string size { get; set; }
        public string fiber { get; set; }
        public string sodium { get; set; }
        public string sugars { get; set; }
        public string protein { get; set; }
        public string calories { get; set; }
        public string potassium { get; set; }
        public string cholesterol { get; set; }
        public string ingredients { get; set; }
        public string carbohydrate { get; set; }
        public string fat_calories { get; set; }
        public string serving_size { get; set; }
        public string saturated_fat { get; set; }
        public string trans_fat { get; set; }
        public string monounsaturated_fat { get; set; }
        public string polyunsaturated_fat { get; set; }
        public string servings_per_container { get; set; }
        public string pages { get; set; }
        public string author { get; set; }
        public string format { get; set; }
        public string publisher { get; set; }
        public double? alcohol_by_volume { get; set; }
        public double? alcohol_by_weight { get; set; }
        public double? volume_fluid_ounce { get; set; }
        public double? volume_ml { get; set; }
        public double? weight_g { get; set; }
        public double? weight_ounce { get; set; }
        public double? unit_count { get; set; }
    }

Investigate OpenFoodFacts as another potential data source

https://world.openfoodfacts.org/

Audit log for changes

Add support for logging who changed what in the database (à la PaperTrail or similar)

Investigate UNSPSC as a possible alternative taxonomy

Investigate https://store.unspsc.org/ as a possible alternative taxonomy - though this does appear to require a subscription or purchase of the dataset

I wrote a simple mobile app that consumes brocade.io API

I threw this together this weekend. Can't wait for the ability to edit and add new records from the API. This is the initial working version.
https://github.com/MrRedBeard/brocade.io-Mobile-App

No emails can be sent

Hi,

Emails cannot be sent; I repeatedly get the following error:

Because of this, email addresses cannot be verified and new users, e.g. me, cannot sign up.

Regards,
Syndesi

Improve search

Currently search works on whole words. Searching using "?query=gree" under Datakick will return everything with "gree" in the data (e.g. _Gree_n Tea, _Gree_k Yogurt). The same search on Brocade.io doesn't return anything, as it's looking for the literal word 'gree'.
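The difference between the two behaviours can be illustrated with a small in-memory Ruby sketch. This is not how either service implements search (the real query presumably runs in the database); the sample names and function names are made up for the example.

```ruby
NAMES = ["Green Tea", "Greek Yogurt", "Black Tea"].freeze

# Whole-word matching: the query must equal one of the words in the name.
def whole_word_search(names, query)
  names.select { |name| name.downcase.split.include?(query.downcase) }
end

# Substring matching: the query may appear anywhere in the name.
def substring_search(names, query)
  names.select { |name| name.downcase.include?(query.downcase) }
end

p whole_word_search(NAMES, "gree")
p substring_search(NAMES, "gree")
```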

Task: Add data from UhttBarcodeReference

Import data from https://github.com/papyrussolution/UhttBarcodeReference

There is a lot of data here, so we'll probably have to process files individually
Will need to translate some of the Russian text, often used for general categories (e.g. Продукты питания = Food)
Still not clear what license this data is provided under

Upgrade to Rails 7

Rails 7 alpha 2 is out now. Edgy, but let's upgrade anyway. By the time Rails 7 hits release-candidate (or release) we'll be looking at building out our own formal release.

Feature: Nutrition Facts

Present nutrition-related properties in a sensible collection, as per the original Datakick style:

NB "Percent of daily values" text at the bottom is derived / assumed and not part of the stored properties values.

NB Default units are not present in the data (and have to be assumed) - e.g. 'Trans Fat' (or trans_fat) is measured in g, whereas Cholesterol is measured in mg. In neither case is the unit stored in the data, with the properties stored as { "trans_fat":0.0, "cholesterol":0 }.

NB Some values are grouped and tallied, e.g. Saturated Fat, Trans Fat, etc. are grouped and tallied as 'Total Fats'. The total value is present in the stored properties (as fat) - see Example JSON Data below.

NB Total calories and Calories from Fat are reported separately. This data is present in the stored properties (as calories and fat_calories) - see Example JSON Data below.

Example JSON Data

{"gtin14":"00000000079983","brand_name":"Trader Joe's","name":"Pistachios - Dry Roasted & Salted","size":"16 oz (1 lb) 454 g","ingredients":"Pistachios, Salt\r\n\r\nFACILITY PROCESSES OTHER TREE NUTS","serving_size":"1/4 cup nuts without shells (30 g / about 1/2 cup with shells)","servings_per_container":"about 8","calories":180,"fat_calories":120,"fat":14.0,"saturated_fat":1.5,"trans_fat":0.0,"polyunsaturated_fat":2.5,"monounsaturated_fat":10.0,"cholesterol":0,"sodium":160,"carbohydrate":9,"fiber":3,"sugars":3,"protein":6,"images":[]}
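Since units are not stored with the values, the presentation layer has to supply them. A minimal Ruby sketch of that lookup follows; DEFAULT_UNITS and with_unit are hypothetical names, and only the two units mentioned in this issue (g for trans_fat, mg for cholesterol) are included. Any other key is shown without a unit rather than guessed.

```ruby
# Units assumed for display; only these two come from the issue text.
DEFAULT_UNITS = { "trans_fat" => "g", "cholesterol" => "mg" }.freeze

# Append the assumed unit to a stored value, if one is known for the key.
def with_unit(key, value)
  unit = DEFAULT_UNITS[key]
  unit ? "#{value} #{unit}" : value.to_s
end

puts with_unit("trans_fat", 0.0)
puts with_unit("cholesterol", 0)
puts with_unit("calories", 180)
```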

Task: Add data from Product Open Data

Import data from http://product-open-data.com/download

NB This could be interesting, as the dump file is for MySQL

Introduce GPC Bricks

Background

Already baked into the app is the notion of a property set, basic collections of facts about a product.

Each property set defines the name, type and potentially the title of product information elements:

  NUTRITION_FACTS = {
    serving_size: { type: :text },
    servings_per_container: { type: :text },
    calories: { type: :number },
    fat_calories: { type: :number, title: 'Calories from Fat' },
   ...

This scheme is a bit clumsy and likely to become more complicated as more complex types are added. There is also no way currently in the system to validate data against property sets.

Proposal

1. Use JSON Schema

Rather than build out a complete type system, use the existing standards and tooling around JSON Schema. Under this model, a property set definition would look something like this:

{
  "$id": "https://brocade.io/nutrition-facts.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Nutrition Facts",
  "type": "object",
  "properties": {
    "servingSize": {
      "type": "string",
      "description": "Details of typical serving portion"
    },
    ....
    "calories": {
      "type": "integer",
      "description": "Number of calories per serving"
    },
    ....
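To give a feel for what validating against such a schema involves, here is a deliberately tiny Ruby sketch that checks only the "type" keyword for string and integer properties. It is not a JSON Schema implementation, and the names SCHEMA, TYPE_CLASSES and type_errors are made up for this example; a real system would rely on an existing validator library.

```ruby
require "json"

# A cut-down version of the schema above, limited to two properties.
SCHEMA = JSON.parse(<<~SCHEMA_JSON)
  {
    "title": "Nutrition Facts",
    "type": "object",
    "properties": {
      "servingSize": { "type": "string" },
      "calories": { "type": "integer" }
    }
  }
SCHEMA_JSON

# Ruby classes matching the two JSON Schema types handled here.
TYPE_CLASSES = { "string" => String, "integer" => Integer }.freeze

# Check only the "type" keyword of each declared property that is present.
def type_errors(data, schema)
  schema.fetch("properties").each_with_object([]) do |(name, rule), errors|
    next unless data.key?(name)
    expected = TYPE_CLASSES[rule["type"]]
    next if expected.nil? || data[name].is_a?(expected)
    errors << "#{name} must be of type #{rule['type']}"
  end
end

p type_errors({ "servingSize" => "28 g", "calories" => 214 }, SCHEMA)
p type_errors({ "calories" => "214 kcal" }, SCHEMA)
```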

There are numerous resources for validating JSON data, including multiple ruby gems:

We can probably push a lot of this down to the database level, validating / typing data within PostgreSQL itself:

2. Use Property Definitions From GS1

GS1 publishes standard property definitions for different product types and industries. Rather than invent our own, we should defer to the GS1 standards for property name and type information.

For example, the published Australasian_Liquor_Industry attributes includes a liquorAge integer type attribute.

Attribute Name	Definition	Example	Rationale	Data Type
liquorAge	How old in years - 1 to 100 for premium spirits, some fortified & liqueurs	15	Required by the Australasian Liquor Industry for Local Data Synchronisation requirements	integer

Other attributes include enums (alcoholicStrengthDescription), floats (standardDrinks) and a number of attributes that are constrained by various rules - potentially mappable to JSON Schema regex type definitions.

If we were to record data for alcoholic products - at least for the Australian market - we would map all these into a specific JSON Schema document for validating data.

3. Move Title Translations Into I18N

This may be an edge case, but there might be reasons to provide translations of property names. Rather than bake this into the schemas, we can push this off to Rails' I18N mechanism, translating based on keys, e.g.

  calories: 'calorías'

!! ADVANCE NOTICE - SERVICE WILL BECOME READ-ONLY FOR A WHILE !!

Hi folks

To address some data quality concerns - some dirty data that was already in the system, some that has been added by people testing - I'm going to be turning off write access for a while. The system will remain readable.

I'm considering implementing a protocol that allows changes to be posted but not immediately available - e.g. a log of "changes" that can either be reviewed manually or otherwise quality assured before being integrated into the main data. This isn't clear in my head just yet, but if anyone has any thoughts on this I'd appreciate if we can share ideas.

My goal is to have more people involved in the decision making - at least in the long term, as there are many problems to solve on the way to building out a distributed system.

Apologies for not responding to questions and issues in a timely manner. I'll try to be more on the ball from here on in.

-Tom

Add a security.txt file

Add a file to guide security reporting, etc. See https://krebsonsecurity.com/2021/09/does-your-organization-have-a-security-txt-file/

Investigate 'consensualknowledge'

Review data captured by consensualknowledge.net, in particular the resources they have gathered here:

https://consensualknowledge.net/shopping-advisor-and-other-uses-of-knowledge-base-about-products/

Maybe look to reach out to them, see if there are ways or reasons to collaborate.

Replace PaperTrail with Logidze

Logidze might be a better solution for maintaining an audit trail than the existing PaperTrail gem. It uses triggers to maintain the delta, and stores the data inside the affected record. It should be faster and simpler than PaperTrail.

Not an urgent issue, as we only have ~200 actual changes recorded for products. But it might become important going forward.

Investigate OAI-PMH

Investigate OAI-PMH as a potential mechanism for distributing requests.

http://www.openarchives.org/OAI/openarchivesprotocol.html#Introduction

It's probably not a solution, but will at least help surface meta data concerns (e.g. use of Dublin Core, etc).

Investigate GDSN standards

It looks like the concept of federated product data pools is already covered in the GS1 standards, under the general banner of GDSN (Global Data Synchronization Network).

It'd be worth investing some time into looking at the standards and seeing how they fit into the overall goals of Brocade. See:

https://www.gs1.org/services/gdsn
https://en.wikipedia.org/wiki/Global_Data_Synchronization_Network

Update UI

Following the update to Rails 7, we have more options for serving up JS, CSS, etc. The sign-on screen could do with some work, as could the general "below the fold" content.

Swap to css bundling w/ tailwind for simplicity.

Search - Error when no results found

Occasionally a search returns a server-side error when no results are found, or it times out. This seems to happen mainly when a new search is submitted.

Query
https://www.brocade.io/api/items?query=peanux

Automatically pad barcodes shorter than 14 digits with leading zeros

I think that it would make sense to add leading zeros to the barcode if it is shorter than 14 digits. Otherwise searching for the more common variants of EAN-8, EAN-13 or the GTIN counterparts doesn't lead to a result...
As far as I remember this was also the default behavior of datakick's API.

e.g. the following doesn't work as expected, as it's an EAN-13 barcode:
https://www.brocade.io/api/items/4012200328002
but padded with a leading zero to form an EAN-14 the correct item is returned:
https://www.brocade.io/api/items/04012200328002

If you don't think that this addition makes sense in this repository, I can as well change it in my downstream library 😄
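The padding suggested here is straightforward to sketch in Ruby. normalize_gtin is a hypothetical helper, not part of the Brocade codebase; it only left-pads with zeros and does not verify the check digit.

```ruby
# Left-pad a numeric barcode to 14 digits; return nil for anything that
# is not 1 to 14 digits long.
def normalize_gtin(code)
  digits = code.to_s.strip
  return nil unless digits.match?(/\A\d{1,14}\z/)
  digits.rjust(14, "0")
end

puts normalize_gtin("4012200328002")
```

Applying this to the GTIN in the request path before the lookup would let EAN-8, EAN-13 and 14-digit codes all resolve to the same record.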

Add Swagger / OpenAPI documentation

Partially addressed in #37, we need to document the API formally. The OpenAPI / Swagger model is an accepted set of processes and standards, so we'll use this as a starting point.

NB The rswag gem has already been added to the project, for the purposes of documenting and testing the API. We'll continue to expand use of this.

Acceptance Criteria

Documentation of API at the /documents endpoint
RSpec tests for the API

Legal notices

Will need to provide text for defining the legal side of things, including:

Do not guarantee the completeness of any of the data.
Giving credit wherever it is due

Explain the intent of brocade.io

Testing ground for GS1, barcode, supply chain management services
Not intended as a production system

MUST include "credits" page
MUST include "overview" page, detailing the legal use
SHOULD create new tickets for setting up process for resolving credit disputes, setting credit

Access to all products database

Hi, thanks for this repo.
Can I get access to the whole product database without going through the API, e.g. as a simple download in CSV format?
Thanks

Feature: Edit Product

Allow logged in users to edit product details.

Will have to provide some default property names (e.g. Trans Fat) and present assumed / default units.

Design Decision: Item Change Management

Background

The original Datakick API allowed any client - authenticated or not - to add or update items.

This creates risk, as bad-faith third parties could easily pollute the database with poor quality - or incorrect - data.

We don't want to block read access for anyone, but we probably need to have some control over incoming data.

Proposal 1: Writing to the database requires authentication

In this model reading is open but writing back to the database requires some kind of authentication - either as a logged in user in the HTML interface or using some kind of token system when accessing via the API.

This requires a decision on type of token. The standards-based JSON Web Tokens are probably the preferred option. An alternative is a simple access key / secret pair. These are less secure, but possibly easier for library authors to use.

Proposal 2: Changes to the database require validation

In this model any user - authenticated or not - can make changes, but those changes remain in an "unvalidated" state until an authenticated user (or some automated process) marks them as valid.

The review process for changes would require community involvement, though we could use feedback from the manual process to train an AI system (TensorFlow or the like) - moving towards a more automated review process.

Either proposal probably also requires a mechanism to report existing bad data. Again, this could be used to train an AI system and help it spot bad data.
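As a rough sketch of how Proposal 2 might behave, the following Ruby example keeps submitted changes in an "unvalidated" state and only folds them into an item once they have been marked valid. The Change struct, the ChangeLog class and the status names are all hypothetical and in-memory only; nothing here reflects an actual Brocade data model.

```ruby
# A proposed change to an item, held until someone reviews it.
Change = Struct.new(:gtin, :attributes, :status)

class ChangeLog
  def initialize
    @changes = []
  end

  # Record a change without applying it.
  def submit(gtin, attributes)
    change = Change.new(gtin, attributes, :unvalidated)
    @changes << change
    change
  end

  # Mark a change as reviewed and accepted.
  def validate!(change)
    change.status = :valid
  end

  # Fold validated changes for one GTIN over the current item data.
  def apply(gtin, item)
    @changes
      .select { |c| c.gtin == gtin && c.status == :valid }
      .reduce(item) { |merged, c| merged.merge(c.attributes) }
  end
end

log = ChangeLog.new
change = log.submit("00000000000002", { "name" => "new test" })
p log.apply("00000000000002", { "name" => "test" })
log.validate!(change)
p log.apply("00000000000002", { "name" => "test" })
```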

Add OmniAuth for API access

Having investigated various approaches to secure access to the API, I've opted to go with OAuth2. This may make integration more complex than a simple token-based solution, but I'm erring on the side of a more secure mechanism. There are numerous implementations of OAuth2 clients in the languages people seem to be using to integrate.

I'll also be opting for GitHub as the first OAuth ID provider, as most users - at least initially - will also have GitHub accounts.

NOTE Related to #33

Use JSON-LD schemas from schema.org

The lack of any formal structure in the data has always bothered me, as we basically store everything in one blob of JSON data. In the general case this is fine, but when products/items have very definite properties (e.g. a Book has an author, food items have calories, etc) it gets harder to ensure that the data makes sense - or is consumable in a repeatable way.

Proposal

Brocade.io to start using JSON-LD as the base model for all data presented via the API, using schemas published by third-parties like https://schema.org.

For instance, if we adopt the "Product" type from schema.org, we can structure product information that is both human-readable and easily processed by applications:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "gtin": "00073007107096",
  "countryOfAssembly": "USA",
  "brand": {
    "@type": "Brand",
    "name": "Columbus"
  },
  "material": "processed meat"
}
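A minimal Ruby sketch of producing such a document from the flat item data the API returns today might look like this. The function name to_json_ld is hypothetical, and only the fields with an obvious schema.org counterpart (name, gtin14, brand_name) are mapped; how the remaining properties should map is exactly the open question in this proposal.

```ruby
require "json"

# Map a flat item hash onto a schema.org Product document.
# Only fields with an obvious counterpart are mapped in this sketch.
def to_json_ld(item)
  doc = {
    "@context" => "https://schema.org/",
    "@type" => "Product",
    "name" => item["name"],
    "gtin" => item["gtin14"]
  }
  if item["brand_name"]
    doc["brand"] = { "@type" => "Brand", "name" => item["brand_name"] }
  end
  doc
end

item = {
  "gtin14" => "00073007107096",
  "name" => "Lite Italian Dry Salami",
  "brand_name" => "Columbus"
}
puts JSON.pretty_generate(to_json_ld(item))
```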

JSON-LD also enables us to add a graph for extended attributes, e.g. if nutritional information is available for a product we can use the NutritionInformation type to present these attributes in a structured manner:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "@graph": [
    {
      "@type": "NutritionInformation",
      "calories": "214 kcal",
      "servingSize": "28 g",
      ...
    },
    ...
  ]
}

Other types from schema.org can be used as applicable (e.g. Book, Movie), etc. We can also make use of schemas published by other third parties - or our own custom types - as required and potentially "future proof" the underlying data model.

We can also use the type information in the frontend, using schema types to determine the best way to present data like nutritional information in a table-like format (see #11). We can also use it to insert Microdata into the HTML, nesting metadata suitable for search engines, web scrapers and the like to consume.

Benefits

Leverage existing data structures and tools
Consistent results from the API
Helps inform the presentation in the UI

Risks / Possible Problems

Harder to parse incoming data
Large amount of existing data that needs to be processed

We can mitigate the second problem by processing individual products on demand and progressively update data. The problem of parsing data requires a bit more thought and investigation, but it looks like a solvable problem.

As part of all this we'll need to make a call on validating EAN. As per #27, assuming EAN-14 (and padding EAN-13 codes) is probably the way to go.
