
ferrisoxide / brocade.io


Open GTIN / barcode & product database

Home Page: https://www.brocade.io

License: GNU Affero General Public License v3.0

Ruby 67.68% JavaScript 3.15% CSS 0.54% HTML 28.53% Shell 0.09%
barcode api-server products

brocade.io's Introduction

brocade.io


Project Background

Over the years there have been several attempts at creating a freely accessible database of GTIN/barcodes and associated product data.

Many of these projects have either stalled or disappeared: the Outpan API vanished without warning, Open Product Data doesn't appear to have been updated since 2014 and recently Datakick announced they will be shutting down as of March 31, 2020.

There are still numerous commercial providers, but the number of freely accessible product data sources seems to be severely limited, with the Internet UPC Database being one of the few open services able to sustain itself. For what is meant to be a universal dataset, UPC and related product data is far from universally accessible.

Project Goals

Open Access

The project aims to present as few barriers as possible to accessing GTIN and product data. The code for providing the service will be released as open source (see LICENSE for details) and wherever possible the data will be made available under open licenses.

There may be a need to apply some constraints - e.g. rate limiting or requiring authorisation for certain tasks - but these will only be introduced for the sake of performance, security or similar concerns.

Federated/Distributed Data

The project is not intended to be a single source of truth for product data. Instead the goal is to provide a framework for sharing product data between otherwise autonomous sources.

It's anticipated that the project will require novel protocols for federating data between disparate systems, and will be looking for inspiration from other distributed systems (e.g. DNS, OpenSocial, etc.).

Migration Path

The initial goal for this project is to establish a migration path for users of existing APIs - notably Datakick. One aim is to provide a Datakick-compatible endpoint for developers of Datakick-related libraries to target.

The database will be seeded using data sourced from Datakick, at least initially.

Getting Started

This section will have to be added to as the project proceeds, but for now assume a technology stack based around Ruby on Rails and Postgres.

Assuming you're able to get the app installed, you can load a basic seed file built from a recent download from datakick.org. Install the data via the rake task:

[bundle exec] rails db:seed:datakick

NB There are about 6000 entries in the seed data, but it doesn't include any images.

Accessing Brocade.io

You are welcome to use the Brocade instance found at https://brocade.io. Register a user account by clicking on the "Sign In" link, then switch to the "Sign Up" tab before entering your email address and password. Once your account is confirmed you will be able to access other features.

API Access

Read access to the API is unrestricted, but if you want to add or edit product items you will need an access token.

Use the following curl command to fetch an access token, replacing <your email address> and <your password> as appropriate. NB We will be providing an easier mechanism to generate tokens in the UI. Please bear with us while we get the basics sorted out.

curl -X POST -d "grant_type=password&email=<your email address>&password=<your password>" https://www.brocade.io/oauth/token

The API will respond with a JSON payload containing your token, along the lines of:

{"access_token":"29bd3f1b-76ad-45c4-867f-179803f5246d","token_type":"Bearer","expires_in":7200,"created_at":1600087628}

Take note of the value of the access_token key (the 29bd3f1b... GUID in the example). You'll use this to authenticate against the API.

CAUTION: Tokens are currently set to expire after 2 hours. We will be addressing this limitation in the future, once we've established this is a secure enough model. It should be sufficient for testing, but we can extend this if it's causing issues.
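For scripted clients, the token request and the 2-hour expiry can be handled along these lines. This is a minimal Ruby sketch using only the standard library (net/http, json); the `Token` struct and the helper names are hypothetical, and it assumes the response carries exactly the fields shown in the sample payload above.

```ruby
require "json"
require "net/http"
require "uri"

# Token endpoint from the curl example above.
TOKEN_URI = URI("https://www.brocade.io/oauth/token")

# Builds (but does not send) the form-encoded password-grant request.
def build_token_request(email, password)
  request = Net::HTTP::Post.new(TOKEN_URI)
  request.set_form_data("grant_type" => "password", "email" => email, "password" => password)
  request
end

# Hypothetical holder for the token and its absolute expiry time.
Token = Struct.new(:access_token, :expires_at) do
  def expired?(now = Time.now)
    now >= expires_at
  end
end

# Parses the JSON payload; expiry = created_at + expires_in (both in seconds).
def parse_token(json)
  data = JSON.parse(json)
  Token.new(data.fetch("access_token"),
            Time.at(data.fetch("created_at") + data.fetch("expires_in")))
end

# Sending the request for real (needs network access):
#   response = Net::HTTP.start(TOKEN_URI.host, TOKEN_URI.port, use_ssl: true) do |http|
#     http.request(build_token_request("you@example.com", "your password"))
#   end
#   token = parse_token(response.body)
```

Checking `token.expired?` before each write lets a client fetch a fresh token instead of failing mid-batch.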

Future plans include rolling out PKCE to provide security for mobile apps or other implementations where a long-running token could be decompiled out of the app or otherwise compromised.

Retrieving Items

Read access will work without authentication, so a simple curl to the API endpoint, passing the GTIN of the product you are after will suffice. Retrieving the 'test' GTIN '00000000000000':

curl -v https://www.brocade.io/api/items/00000000000000

returns a JSON payload containing the sample data:

{"gtin14":"00000000000000","brand_name":"ayam","name":"testname","size":"081216382297","ingredients":"Chocolate","serving_size":"34g","servings_per_container":"10","calories":5,"fat_calories":5,"fat":0.5,"saturated_fat":0.5,"trans_fat":0.5,"polyunsaturated_fat":0.5,"monounsaturated_fat":0.5,"cholesterol":0,"sodium":0,"potassium":0,"carbohydrate":0,"fiber":0,"sugars":0,"protein":0,"author":"MyAuthor","publisher":"MyPublisher","pages":0,"alcohol_by_volume":40.0}
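A minimal Ruby equivalent of the curl call above, using only the standard library. `item_uri` and `fetch_item` are hypothetical helper names; the sketch simply raises on any non-2xx response rather than assuming a particular error format from the API.

```ruby
require "json"
require "net/http"
require "uri"

API_BASE = "https://www.brocade.io/api/items"

# URL for a single item, keyed by its 14-digit GTIN.
def item_uri(gtin14)
  URI("#{API_BASE}/#{gtin14}")
end

# GETs an item and returns the flat key/value hash; reads need no token.
def fetch_item(gtin14)
  response = Net::HTTP.get_response(item_uri(gtin14))
  response.value # raises an exception for any non-2xx response
  JSON.parse(response.body)
end

# Usage (needs network access):
#   item = fetch_item("00000000000000")
#   item["brand_name"]
```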

Creating and Updating Items

To create a new item, send a POST to the API:

curl -i -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your token>" -d '{"gtin":"00000000000002", "name":"test", "brand_name":"my brand", "properties": {"size":"11 inches"}}' https://www.brocade.io/api/items

Updating requires a PUT, adding the GTIN to the end of the URL:

curl -i -X PUT -H "Content-Type: application/json" -H "Authorization: Bearer <your token>" -d '{"name":"new test", "brand_name":"new brand", "properties": {"size":"12 inches"}}' https://www.brocade.io/api/items/00000000000002
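The same two write requests, sketched with Ruby's standard library. The builder names are hypothetical; the sketch only assembles each request (URL, headers, JSON body) so it can be inspected before being sent with a valid token.

```ruby
require "json"
require "net/http"
require "uri"

API_BASE = "https://www.brocade.io/api/items"

# Headers shared by both write requests: JSON body plus the OAuth bearer token.
def write_headers(token)
  { "Content-Type" => "application/json", "Authorization" => "Bearer #{token}" }
end

# POST /api/items creates an item; the GTIN travels in the JSON body.
def build_create_request(token, payload)
  request = Net::HTTP::Post.new(URI(API_BASE), write_headers(token))
  request.body = JSON.generate(payload)
  request
end

# PUT /api/items/<gtin> updates an item; the GTIN travels in the URL.
def build_update_request(token, gtin, payload)
  request = Net::HTTP::Put.new(URI("#{API_BASE}/#{gtin}"), write_headers(token))
  request.body = JSON.generate(payload)
  request
end

# Sending either request (needs network access and a valid token):
#   uri = URI(API_BASE)
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```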

NB The JSON payload for creating/updating is slightly different to the data returned via read access. Where the data retrieved is a simple flat list of key/value pairs, when pushing data to the API you will need to nest property values (anything other than gtin, name or brand_name) as a set of key/value pairs assigned to the properties key:

{ 
  "gtin": "GTIN / barcode id",
  "name": "product name",
  "brand_name": "product brand",
  "properties": {
    "serving_size": "..",
    "ingredients": "..",
    ...
  }
}
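Because reads return a flat hash while writes expect nested `properties`, a client that edits a fetched item needs to reshape it first. A minimal sketch, assuming that every key other than gtin, name and brand_name belongs under `properties`, and that the read key `gtin14` maps to the write key `gtin` (as in the POST example above); `to_write_payload` is a hypothetical helper.

```ruby
# Keys that stay at the top level of a create/update payload.
TOP_LEVEL_KEYS = %w[gtin name brand_name].freeze

# Converts a flat item, as returned by GET /api/items/<gtin>, into the nested
# shape expected by POST/PUT. Reads use "gtin14"; writes use "gtin".
def to_write_payload(item)
  flat = item.transform_keys(&:to_s)
  gtin14 = flat.delete("gtin14")
  flat["gtin"] ||= gtin14
  top_level = flat.slice(*TOP_LEVEL_KEYS).compact
  properties = flat.reject { |key, _| TOP_LEVEL_KEYS.include?(key) }
  top_level.merge("properties" => properties)
end
```

The PUT example above leaves gtin out of the body (it is already in the URL), so you may want to drop it before sending an update.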

Right now you can put pretty much any keys in here, but we're thinking of adding the idea of property "sets" - common attributes for similar products (e.g. for books, allow the keys author, number_of_pages, etc.). The UI has something like this now, but it's not currently supported in the API.

TODO

  • Import image data from Datakick
  • Source more open product data
  • Improve front-end UI, add capacity to manage tokens
  • New /products endpoint with more features (leaving /items in place to avoid breaking the API for early adopters)
  • Implement PKCE
  • Introduce 'property sets'
  • Clean up database, remove / fix non-GTIN14 records

brocade.io's People

Contributors

dependabot[bot], ferrisoxide

brocade.io's Issues

Replace PaperTrail with Logidze

Logidze might be a better solution for maintaining an audit trail than the existing PaperTrail gem. It uses triggers to maintain the delta, and stores the data inside the affected record. It should be faster and simpler than PaperTrail.

Not an urgent issue, as we only have ~200 actual changes recorded for products. But it might become important going forward.

Feature: Edit Product

Allow logged in users to edit product details.

Will have to provide some default property names (e.g. Trans Fat) and present assumed / default units.

Add OmniAuth for API access

Having investigated various approaches to secure access to the API, I've opted to go with OAuth2. This may make integration more complex than a simple token-based solution, but I'm erring on the side of a more secure mechanism. There are numerous implementations of OAuth2 clients in the languages people seem to be using to integrate.

I'll also be opting for GitHub as the first OAuth ID provider, as most users - at least initially - will also have GitHub accounts.

NOTE Related to #33

Feature: Nutrition Facts

Present nutrition-related properties in a sensible collection, as per the original Datakick style:

[Screenshot, 2020-03-25: original Datakick nutrition facts panel]

NB The "Percent of daily values" text at the bottom is derived/assumed and not part of the stored property values.

NB Default units are not present in the data (and have to be assumed) - e.g. 'Trans Fat' (or trans_fat) is measured in g, whereas Cholesterol is measured in mg. In neither case is the unit stored in the data, with the properties stored as { "trans_fat":0.0, "cholesterol":0 }.

NB Some values are grouped and tallied, e.g. Saturated Fat, Trans Fat, etc. are grouped and tallied as 'Total Fats'. The total value is present in the stored properties (as fat) - see Example JSON Data below.

NB Total calories and Calories from Fat are reported separately. This data is present in the stored properties (as calories and fat_calories) - see Example JSON Data below.

Example JSON Data

{"gtin14":"00000000079983","brand_name":"Trader Joe's","name":"Pistachios - Dry Roasted & Salted","size":"16 oz (1 lb) 454 g","ingredients":"Pistachios, Salt\r\n\r\nFACILITY PROCESSES OTHER TREE NUTS","serving_size":"1/4 cup nuts without shells (30 g / about 1/2 cup with shells)","servings_per_container":"about 8","calories":180,"fat_calories":120,"fat":14.0,"saturated_fat":1.5,"trans_fat":0.0,"polyunsaturated_fat":2.5,"monounsaturated_fat":10.0,"cholesterol":0,"sodium":160,"carbohydrate":9,"fiber":3,"sugars":3,"protein":6,"images":[]}
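Since units are not stored, the front end has to supply them. A minimal Ruby sketch of rendering label lines from the stored properties; the `NUTRITION_ROWS` table and `nutrition_lines` helper are hypothetical, and the units (g for fats, carbohydrate, fiber, sugars and protein; mg for cholesterol, sodium and potassium) are assumptions following the usual US Nutrition Facts conventions, not anything recorded in the data.

```ruby
# Hypothetical display table: stored key => [label, assumed unit].
# Units are NOT stored in the data; these are assumptions (g / mg).
NUTRITION_ROWS = {
  "calories" => ["Calories", nil],
  "fat_calories" => ["Calories from Fat", nil],
  "fat" => ["Total Fat", "g"],
  "saturated_fat" => ["Saturated Fat", "g"],
  "trans_fat" => ["Trans Fat", "g"],
  "polyunsaturated_fat" => ["Polyunsaturated Fat", "g"],
  "monounsaturated_fat" => ["Monounsaturated Fat", "g"],
  "cholesterol" => ["Cholesterol", "mg"],
  "sodium" => ["Sodium", "mg"],
  "potassium" => ["Potassium", "mg"],
  "carbohydrate" => ["Total Carbohydrate", "g"],
  "fiber" => ["Dietary Fiber", "g"],
  "sugars" => ["Sugars", "g"],
  "protein" => ["Protein", "g"]
}.freeze

# Returns "Label value unit" lines, in panel order, for the keys the item has.
def nutrition_lines(item)
  NUTRITION_ROWS.select { |key, _| item.key?(key) }.map do |key, (label, unit)|
    [label, item[key], unit].compact.join(" ")
  end
end
```

For the example JSON above, the first three lines would be "Calories 180", "Calories from Fat 120" and "Total Fat 14.0 g".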

Feature: Display country of origin

Per original Datakick style, present country of origin - possibly with a flag symbol as well (see "US" mark):

[Screenshot, 2020-03-25: original Datakick country of origin display]

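NB The country of origin can be derived from the GTIN itself - under some conditions at least. Strictly, the GS1 prefix (the first three digits of a GTIN-13) identifies the GS1 member organisation that issued the number, which is not always the country where the product was made.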
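A sketch of the "derived from the GTIN" idea: look up the three-digit GS1 prefix that follows the indicator digit of a GTIN-14. The range table is a small, hypothetical excerpt of GS1's published prefix list (not the full list), and the result names the issuing GS1 member organisation, which is only a hint at the country of origin.

```ruby
# Hypothetical, deliberately partial excerpt of GS1 prefix ranges -> issuing
# GS1 member organisation. Restricted/GTIN-8 ranges (e.g. 000) are omitted.
GS1_PREFIXES = [
  [1..19, "GS1 US"],
  [60..139, "GS1 US"],
  [300..379, "GS1 France"],
  [400..440, "GS1 Germany"],
  [450..459, "GS1 Japan"],
  [490..499, "GS1 Japan"],
  [500..509, "GS1 UK"],
  [930..939, "GS1 Australia"],
  [940..949, "GS1 New Zealand"]
].freeze

# In a GTIN-14, digit 1 is the indicator digit; the GS1 prefix is digits 2-4.
# Returns nil when the prefix is not in the excerpt above.
def gs1_organisation(gtin14)
  raise ArgumentError, "expected 14 digits" unless gtin14.match?(/\A\d{14}\z/)

  prefix = gtin14[1, 3].to_i
  GS1_PREFIXES.find { |range, _| range.cover?(prefix) }&.last
end
```

For instance, the padded EAN-13 from the barcode-padding issue, 04012200328002, has prefix 401 and maps to "GS1 Germany".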

Legal notices

Will need to provide text for defining the legal side of things, including:

  • Do not guarantee the completeness of any of the data.
  • Giving credit wherever it is due

Explain the intent of brocade.io

  • Testing ground for GS1, barcode, supply chain management services
  • Not intended as a production system
  • MUST include "credits" page
  • MUST include "overview" page, detailing the legal use
  • SHOULD create new tickets for setting up process for resolving credit disputes, setting credit

Improve search

Currently search works on the whole word(s). Searching using "?query=gree" under Datakick will return everything with "gree" in the data (e.g. Green Tea, Greek Yogurt). The same search on Brocade.io doesn't return anything, as it's looking for the literal word 'gree'.
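The difference between the two behaviours can be sketched in plain Ruby. The helper names are hypothetical; `ilike_pattern` shows one way to build a PostgreSQL `ILIKE` pattern for substring search while escaping the `%` and `_` wildcards (and the backslash escape character) in user input.

```ruby
# Whole-word matching: "gree" only matches a standalone word "gree".
def whole_word_match?(text, query)
  text.match?(/\b#{Regexp.escape(query)}\b/i)
end

# Substring matching (Datakick-style): "gree" matches "Green" and "Greek".
def substring_match?(text, query)
  text.downcase.include?(query.downcase)
end

# Wraps the query in % wildcards for ILIKE, escaping \, % and _ so that
# they are matched literally (backslash is PostgreSQL's default LIKE escape).
def ilike_pattern(query)
  escaped = query.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  "%#{escaped}%"
end
```

In a Rails app this might be used as `Item.where("name ILIKE ?", ilike_pattern(query))`, assuming a model called `Item` with a `name` column; ActiveRecord's `sanitize_sql_like` can also handle the escaping step.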

Audit log for changes

Add support for logging who/what against changes in the database (à la PaperTrail or similar)

Access to all products database

Hi, thanks for this repo.
Can I get access to the whole products database without the API, e.g. as a simple CSV download?
Thanks

API Consumption Helpful Information

Is this accurate? I kept looping through records to try to understand the API behavior and data structure/types.

This information might be helpful for those consuming the API.

Header Pagination Info

Header - Link - first page

<https://www.brocade.io/api/items?page=20&query=peanut+butter>; rel="last", <https://www.brocade.io/api/items?page=2&query=peanut+butter>; rel="next"

Header - Link - subsequent pages also include the first and prev links (example from page 2)

<https://www.brocade.io/api/items?page=1&query=peanut+butter>; rel="first", <https://www.brocade.io/api/items?page=1&query=peanut+butter>; rel="prev", ...

Header - Per-Page & Total

Per-Page - results per page = 100
Total - total number of records = 1912
Total pages = 1912 / 100 = 19.12, rounded up = 20. The page count also appears in the rel="last" link (page=20).
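If the headers above are accurate, a client can page through results by following `rel="next"` until it disappears. A minimal Ruby sketch, assuming the standard RFC 8288 form `<url>; rel="..."` for the Link header and the Total / Per-Page headers described above; the helper names are hypothetical.

```ruby
require "uri"

# Parses a Link header such as '<url>; rel="next", <url>; rel="last"'
# into { "next" => url, "last" => url }.
# Simplification: assumes there are no commas inside the URLs themselves.
def parse_link_header(header)
  header.split(",").each_with_object({}) do |part, links|
    match = part.match(/<([^>]+)>\s*;\s*rel="([^"]+)"/)
    links[match[2]] = match[1] if match
  end
end

# Extracts the page=N query parameter from a pagination URL (nil if absent).
def page_number(url)
  page = URI.decode_www_form(URI(url).query.to_s).to_h["page"]
  page&.to_i
end

# Total pages from the Total and Per-Page headers (integer ceiling division).
def total_pages(total, per_page)
  (total + per_page - 1) / per_page
end

# Following rel="next" until it disappears (needs network, net/http and json):
#   url = "https://www.brocade.io/api/items?query=peanut+butter"
#   while url
#     response = Net::HTTP.get_response(URI(url))
#     items = JSON.parse(response.body)
#     url = parse_link_header(response["Link"].to_s)["next"]
#   end
```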

Data Types

Key                       Data type
ProductID                 int
gtin14                    string
brand_name                string
name                      string
fat                       string
size                      string
fiber                     string
sodium                    string
sugars                    string
protein                   string
calories                  string
potassium                 string
cholesterol               string
ingredients               string
carbohydrate              string
fat_calories              string
serving_size              string
saturated_fat             string
trans_fat                 string
monounsaturated_fat       string
polyunsaturated_fat       string
servings_per_container    string
pages                     string
author                    string
format                    string
publisher                 string
alcohol_by_volume         double
alcohol_by_weight         double
volume_fluid_ounce        double
volume_ml                 double
weight_g                  double
weight_ounce              double
unit_count                double

Example JS Class of Data Structure

class clsProducts
{
	constructor()
	{
		this.gtin14 = '';
		this.brand_name = '';
		this.name = '';
		this.fat = '';
		this.size = '';
		this.fiber = '';
		this.sodium = '';
		this.sugars = '';
		this.protein = '';
		this.calories = '';
		this.potassium = '';
		this.cholesterol = '';
		this.ingredients = '';
		this.carbohydrate = '';
		this.fat_calories = '';
		this.serving_size = '';
		this.saturated_fat = '';
		this.trans_fat = '';
		this.monounsaturated_fat = '';
		this.polyunsaturated_fat = '';
		this.servings_per_container = '';
		this.pages = '';
		this.author = '';
		this.format = '';
		this.publisher = '';
		this.alcohol_by_volume = null; //Double
		this.alcohol_by_weight = null; //Double
		this.volume_fluid_ounce = null; //Double
		this.volume_ml = null; //Double
		this.weight_g = null; //Double
		this.weight_ounce = null; //Double
		this.unit_count = null; //Double
	}
}

Example c# Class of Data Structure

public class Products
    {
        public string gtin14 { get; set; }
        public string brand_name { get; set; }
        public string name { get; set; }
        public string fat { get; set; }
        public string size { get; set; }
        public string fiber { get; set; }
        public string sodium { get; set; }
        public string sugars { get; set; }
        public string protein { get; set; }
        public string calories { get; set; }
        public string potassium { get; set; }
        public string cholesterol { get; set; }
        public string ingredients { get; set; }
        public string carbohydrate { get; set; }
        public string fat_calories { get; set; }
        public string serving_size { get; set; }
        public string saturated_fat { get; set; }
        public string trans_fat { get; set; }
        public string monounsaturated_fat { get; set; }
        public string polyunsaturated_fat { get; set; }
        public string servings_per_container { get; set; }
        public string pages { get; set; }
        public string author { get; set; }
        public string format { get; set; }
        public string publisher { get; set; }
        public double? alcohol_by_volume { get; set; }
        public double? alcohol_by_weight { get; set; }
        public double? volume_fluid_ounce { get; set; }
        public double? volume_ml { get; set; }
        public double? weight_g { get; set; }
        public double? weight_ounce { get; set; }
        public double? unit_count { get; set; }
    }

Automatically pad barcodes shorter than 14 digits with leading zeros

I think that it would make sense to add leading zeros to the barcode if it is shorter than 14 digits. Otherwise searching for the more common variants of EAN-8, EAN-13 or the GTIN counterparts doesn't lead to a result...
As far as I remember this was also the default behavior of datakick's API.

e.g. the following doesn't work as expected, as it's an EAN-13 barcode:
https://www.brocade.io/api/items/4012200328002
but padded with a leading zero to form an EAN-14, the correct item is returned:
https://www.brocade.io/api/items/04012200328002

If you don't think that this addition makes sense in this repository, I can as well change it in my downstream library 😄
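A minimal Ruby sketch of the padding suggested above, with a GS1 check-digit test so that malformed input is rejected rather than padded. The helper names are hypothetical and this is not code from the brocade.io repository. The check digit uses the standard GS1 mod-10 scheme, where weights of 3 and 1 alternate starting from the rightmost data digit.

```ruby
# Hypothetical helpers for accepting GTIN-8/12/13/14 input and normalising it
# to the 14-digit form used in /api/items/<gtin14>.
VALID_GTIN_LENGTHS = [8, 12, 13, 14].freeze

# GS1 mod-10 check digit: weights alternate 3, 1, 3, ... starting from the
# rightmost digit of the body (the digits before the check digit).
def gtin_check_digit(body)
  sum = body.reverse.chars.each_with_index.sum { |ch, i| ch.to_i * (i.even? ? 3 : 1) }
  (10 - sum % 10) % 10
end

def valid_gtin?(code)
  return false unless code.match?(/\A\d+\z/) && VALID_GTIN_LENGTHS.include?(code.length)

  gtin_check_digit(code[0...-1]) == code[-1].to_i
end

# Left-pads a valid GTIN with zeros to 14 digits. Leading zeros add nothing to
# the weighted sum, so the padded code keeps the same check digit.
def normalize_gtin14(code)
  code = code.strip
  raise ArgumentError, "invalid GTIN: #{code.inspect}" unless valid_gtin?(code)

  code.rjust(14, "0")
end
```

With this, `normalize_gtin14("4012200328002")` returns "04012200328002", the form that the API currently resolves.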

Use JSON-LD schemas from schema.org

The lack of any formal structure in the data has always bothered me, as we basically store everything in one blob of JSON data. In the general case this is fine, but when products/items have very definite properties (e.g. a Book has an author, food items have calories, etc) it gets harder to ensure that the data makes sense - or is consumable in a repeatable way.

Proposal

Brocade.io to start using JSON-LD as the base model for all data presented via the API, using schemas published by third-parties like https://schema.org.

For instance, if we adopt the "Product" type from schema.org, we can structure product information that is both human-readable and easily processed by applications:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "gtin": "00073007107096",
  "countryOfAssembly": "USA",
  "brand": {
    "@type": "Brand",
    "name": "Columbus"
  },
  "material": "processed meat"
}
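To get a feel for the mapping, here is a minimal Ruby sketch that turns a flat item (as returned by /api/items today) into a schema.org Product node shaped like the example above. Only name, gtin and brand are mapped; `to_schema_org_product` is a hypothetical helper, and fields such as countryOfAssembly or material are left out because they are not part of the current flat data.

```ruby
require "json"

# Hypothetical mapping from a flat brocade.io item to a schema.org Product.
def to_schema_org_product(item)
  product = {
    "@context" => "https://schema.org/",
    "@type" => "Product",
    "name" => item["name"],
    "gtin" => item["gtin14"]
  }
  # Brand becomes a nested Brand node, as in the example above.
  if item["brand_name"]
    product["brand"] = { "@type" => "Brand", "name" => item["brand_name"] }
  end
  product.compact # drop keys whose source value was missing
end

# Usage:
#   puts JSON.pretty_generate(to_schema_org_product(item))
```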

JSON-LD also enables us to add a graph for extended attributes, e.g. if nutritional information is available for a product we can use the NutritionInformation type to present these attributes in a structured manner:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "@graph": [
    {
      "@type": "NutritionInformation",
      "calories": "214 kcal",
      "servingSize": "28 g",
      ...
    },
    ...
  ]
}

Other types from schema.org can be used as applicable (e.g. Book, Movie, etc.). We can also make use of schemas published by other third parties - or our own custom types - as required, and potentially "future proof" the underlying data model.

We can also use the type information in the frontend, using schema types to determine the best way to present data, e.g. showing nutritional information in a table-like format (see #11). We can also use it to insert Microdata into the HTML, embedding metadata for search engines, web scrapers and the like to consume.

Benefits

  • Leverage existing data structures and tools
  • Consistent results from the API
  • Helps inform the presentation in the UI

Risks / Possible Problems

  • Harder to parse incoming data
  • Large amount of existing data that needs to be processed

We can mitigate the second problem by processing individual products on demand and progressively updating the data. The problem of parsing data requires a bit more thought and investigation, but it looks like a solvable problem.

Introduce GPC Bricks

Background

Already baked into the app is the notion of a property set, basic collections of facts about a product.

Each property set defines the name, type and potentially the title of product information elements:

  NUTRITION_FACTS = {
    serving_size: { type: :text },
    servings_per_container: { type: :text },
    calories: { type: :number },
    fat_calories: { type: :number, title: 'Calories from Fat' },
    # ... (remaining nutrition properties elided)
  }

This scheme is a bit clumsy and likely to become more complicated as more complex types are added. There is also no way currently in the system to validate data against property sets.

Proposal

1. Use JSON Schema

Rather than build out a complete type system, use the existing standards and tooling around JSON Schema. Under this model, a property set definition would look something like this:

{
  "$id": "https://brocade.io/nutrition-facts.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Nutrition Facts",
  "type": "object",
  "properties": {
    "servingSize": {
      "type": "string",
      "description": "Details of typical serving portion"
    },
    ....
    "calories": {
      "type": "integer",
      "description": "Number of calories per serving"
    },
    ....
  }
}
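To make the validation idea concrete, the following Ruby sketch checks only the `type` keyword of a schema's `properties` against incoming data. It is a hand-rolled illustration, not a JSON Schema validator (no `required`, `pattern`, `$ref`, etc., and 1.0 is not treated as an integer); in practice a proper JSON Schema gem would do this job. The schema excerpt and helper names are hypothetical.

```ruby
# Hypothetical excerpt of the Nutrition Facts schema above, as a Ruby hash.
NUTRITION_FACTS_SCHEMA = {
  "type" => "object",
  "properties" => {
    "servingSize" => { "type" => "string" },
    "calories" => { "type" => "integer" }
  }
}.freeze

# Simplified type predicates (JSON Schema also accepts 1.0 as an integer; these do not).
TYPE_CHECKS = {
  "string" => ->(value) { value.is_a?(String) },
  "integer" => ->(value) { value.is_a?(Integer) },
  "number" => ->(value) { value.is_a?(Numeric) },
  "boolean" => ->(value) { value == true || value == false }
}.freeze

# Returns error messages for properties whose values have the wrong type.
# Keys missing from the data, or not listed in the schema, are ignored.
def type_errors(schema, data)
  schema.fetch("properties", {}).each_with_object([]) do |(key, rule), errors|
    next unless data.key?(key)

    check = TYPE_CHECKS.fetch(rule["type"]) # KeyError for types not covered here
    errors << "#{key}: expected #{rule['type']}, got #{data[key].inspect}" unless check.call(data[key])
  end
end
```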

There are numerous resources for validating JSON data, including multiple ruby gems:

We can probably push a lot of this down to the database level, validating / typing data within PostgreSQL itself:

2. Use Property Definitions From GS1

GS1 publishes standard property definitions for different product types and industries. Rather than invent our own, we should defer to the GS1 standards for property name and type information.

For example, the published Australasian_Liquor_Industry attributes includes a liquorAge integer type attribute.

Attribute name: liquorAge
Definition: How old in years - 1 to 100 for premium spirits, some fortified & liqueurs
Example: 15
Rationale: Required by the Australasian Liquor Industry for Local Data Synchronisation requirements
Data type: integer

Other attributes include enums (alcoholicStrengthDescription), floats (standardDrinks) and a number of attributes that are constrained by various rules - potentially mappable to JSON Schema pattern (regular expression) constraints.

If we were to record data for alcoholic products - at least for the Australian market - we would map all these into a specific JSON Schema document for validating data.

3. Move Title Translations Into I18N

This may be an edge case, but there might be reasons to provide translations of property names. Rather than bake this into the schemas, we can push this off to Rails' I18N mechanism, translating based on keys, e.g.

  calories: 'calorías'

Set up CI

Need to have a CI solution in place. Investigate use of GitHub Actions as a first call.

CI solution will need to:

  • Run all tests
  • Run bundle audit
  • Run zeitwerk checks

and report on failure if any of these fail.

Stretch target is to merge to master and deploy on success, but that's not super critical at this stage.

Bearer token for API access

In preparation for write-access to the API, add support for bearer tokens - to be passed in the header when authenticating.

Read-only access will not require a token - at least not for now.

As part of all this we'll need to make a call on validating EAN. As per #27, assuming EAN-14 (and padding EAN-13 codes) is probably the way to go.

Update UI

Following the upgrade to Rails 7, we have more options for serving up JS, CSS, etc. The sign-on screen could do with some work, as could the general "below the fold" content.

Switch to CSS bundling with Tailwind for simplicity.

No emails can be sent

Hi,

Emails cannot be sent; I repeatedly get the following error:
[Screenshot, 2022-01-08: "We're sorry, but something went wrong (500)"]
Because of this, email addresses cannot be verified and new users, e.g. me, cannot sign up.

Regards,
Syndesi

Add Swagger / OpenAPI documentation

Partially addressed in #37, we need to document the API formally. The OpenAPI / Swagger model is an accepted set of processes and standards, so we'll use this as a starting point.

NB The rswag gem has already been added to the project, for the purposes of documenting and testing the API. We'll continue to expand use of this.

Acceptance Criteria

  • Documentation of API at the /documents endpoint
  • RSpec tests for the API

Upgrade to Rails 7

Rails 7 alpha 2 is out now. Edgy, but let's upgrade anyway. By the time Rails 7 hits release-candidate (or release) we'll be looking at building out our own formal release.

Design Decision: Item Change Management

Background

The original Datakick API allowed any client - authenticated or not - to add or update items.

This creates risk, as bad-faith third parties could easily pollute the database with poor quality - or incorrect - data.

We don't want to block read access for anyone, but we probably need to have some control over incoming data.

Proposal 1: Writing to the database requires authentication

In this model reading is open but writing back to the database requires some kind of authentication - either as a logged in user in the HTML interface or using some kind of token system when accessing via the API.

This requires a decision on type of token. The standards-based JSON Web Tokens are probably the preferred option. An alternative is a simple access key / secret pair. These are less secure, but possibly easier for library authors to use.

Proposal 2: Changes to the database require validation

In this model any user - authenticated or not - can make changes, but those changes remain in an "unvalidated" state until an authenticated user (or some automated process) marks them as valid.

The review process for changes would require community involvement, though we could use feedback from the manual process to train an AI system (TensorFlow or the like) - moving towards a more automated review process.

Either proposal probably also requires a mechanism to report existing bad data. Again, this could be used to train an AI system and help it spot bad data.
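Proposal 2 boils down to a small state machine: a change starts as unvalidated and only reaches the item data once it is approved. A minimal in-memory Ruby sketch; `PendingChange` and its methods are hypothetical names, and a real implementation would persist changes and record who approved them.

```ruby
# Hypothetical model of Proposal 2: queued changes stay "unvalidated" until a
# reviewer (or an automated process) approves or rejects them.
class PendingChange
  attr_reader :gtin, :attributes, :state

  def initialize(gtin, attributes)
    @gtin = gtin
    @attributes = attributes
    @state = :unvalidated
  end

  # Merges the change into the item store (a Hash keyed by GTIN) and marks it validated.
  def approve!(items)
    raise "change already #{state}" unless state == :unvalidated

    items[gtin] = items.fetch(gtin, {}).merge(attributes)
    @state = :validated
  end

  # Discards the change without touching the item store.
  def reject!
    raise "change already #{state}" unless state == :unvalidated

    @state = :rejected
  end
end
```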

Remove Google Analytics

Google Analytics - at least via MS Edge - reports a net::ERR_BLOCKED_BY_CLIENT error and doesn't appear to work anymore.

We don't really need this, as we're not actively tracking use of the app. We may as well remove the Google Analytics script - if we need to provide any kind of analytics in the future, I'd rather use ones that are useful to end users directly.

!! ADVANCE NOTICE - SERVICE WILL BECOME READ-ONLY FOR A WHILE !!

Hi folks

To address some data quality concerns - some dirty data that was already in the system, some that has been added by people testing - I'm going to be turning off write access for a while. The system will remain readable.

I'm considering implementing a protocol that allows changes to be posted but not immediately available - e.g. a log of "changes" that can either be reviewed manually or otherwise quality assured before being integrated into the main data. This isn't clear in my head just yet, but if anyone has any thoughts on this I'd appreciate it if we could share ideas.

My goal is to have more people involved in the decision making - at least in the long term, as there are many problems to solve on the way to building out a distributed system.

Apologies for not responding to questions and issues in a timely manner. I'll try to be more on the ball from here on in.

-Tom
