ferrisoxide / brocade.io

Open GTIN / barcode & product database

Home Page: https://www.brocade.io

License: GNU Affero General Public License v3.0

Ruby 67.68% JavaScript 3.15% CSS 0.54% HTML 28.53% Shell 0.09%

barcode api-server products

brocade.io's Issues

Replace PaperTrail with Logidze

Logidze might be a better solution for maintaining an audit trail than the existing PaperTrail gem. It uses triggers to maintain the delta, and stores the data inside the affected record. It should be faster and simpler than PaperTrail.

Not an urgent issue, as we only have ~200 actual changes recorded for products. But it might become important going forward.

Add a security.txt file

Add a file to guide security reporting, etc. See https://krebsonsecurity.com/2021/09/does-your-organization-have-a-security-txt-file/

Feature: Edit Product

Allow logged in users to edit product details.

Will have to provide some default property names (e.g. Trans Fat) and present assumed / default units.

Investigate GDSN standards

It looks like the concept of federated product data pools is already covered in the GS1 standards, under the general banner of GDSN (Global Data Synchronization Network).

It'd be worth investing some time into looking at the standards and seeing how they fit into the overall goals of Brocade. See:

https://www.gs1.org/services/gdsn
https://en.wikipedia.org/wiki/Global_Data_Synchronization_Network

Assess impact of "2D" barcodes coming in 2027

Is this something we need to be aware of?

https://www.prnewswire.com/news-releases/retail-industry-to-transition-from-upc-to-two-dimensional-barcodes-on-product-packaging-by-2027-301458969.html

No emails can be sent

Hi,

Emails cannot be sent; I repeatedly get the following error:

Because of this, email addresses cannot be verified and new users, e.g. me, cannot sign up.

Regards,
Syndesi

Bearer token for API access

In preparation for write-access to the API, add support for bearer tokens - to be passed in the header when authenticating.

Read-only access will not require a token - at least not for now.
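As a rough sketch of what a client might do, the snippet below builds (but does not send) a request that carries a token in the standard `Authorization: Bearer` header. The helper name `build_item_request` and the token value are hypothetical; how tokens would be issued is not decided in this issue.

```ruby
require 'net/http'
require 'uri'

# Build a GET request for a single item. The token is optional, mirroring the
# idea that read-only access does not need one. Nothing is sent over the network.
def build_item_request(gtin, token: nil)
  uri = URI("https://www.brocade.io/api/items/#{gtin}")
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json'
  # Assumption: the API would accept a standard bearer token header.
  request['Authorization'] = "Bearer #{token}" if token
  request
end

anonymous_request     = build_item_request('04012200328002')
authenticated_request = build_item_request('04012200328002', token: 'example-token')
```

Keeping the token optional means existing read-only clients would not need to change.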

As part of all this we'll need to make a call on validating EAN. As per #27, assuming EAN-14 (and padding EAN-13 codes) is probably the way to go.
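One possible shape for that validation is sketched below: pad the code to 14 digits, then verify the GS1 mod-10 check digit. The method name `gtin14_valid?` and the set of accepted lengths are assumptions, not a decision recorded in this issue.

```ruby
# Accepts GTIN-8, GTIN-12 (UPC-A), GTIN-13 (EAN-13) and GTIN-14 strings.
VALID_GTIN_LENGTHS = [8, 12, 13, 14].freeze

def gtin14_valid?(code)
  digits = code.to_s.strip
  return false unless digits.match?(/\A\d+\z/)
  return false unless VALID_GTIN_LENGTHS.include?(digits.length)

  # Pad to 14 digits so every format shares one weighting scheme.
  *payload, check = digits.rjust(14, '0').chars.map(&:to_i)

  # GS1 check digit: weights alternate 3, 1, 3, ... from the leftmost of the
  # 13 payload digits; the check digit brings the sum to a multiple of 10.
  sum = payload.each_with_index.sum { |digit, index| digit * (index.even? ? 3 : 1) }
  (10 - sum % 10) % 10 == check
end

gtin14_valid?('4012200328002')   # EAN-13 from #27, valid once padded
gtin14_valid?('04012200328003')  # wrong check digit
```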

Design Decision: Item Change Management

Background

The original Datakick API allowed any client - authenticated or not - to add or update items.

This creates risk, as bad-faith third parties could easily pollute the database with poor quality - or incorrect - data.

We don't want to block read access for anyone, but we probably need to have some control over incoming data.

Proposal 1: Writing to the database requires authentication

In this model reading is open but writing back to the database requires some kind of authentication - either as a logged in user in the HTML interface or using some kind of token system when accessing via the API.

This requires a decision on the type of token. The standards-based JSON Web Tokens are probably the preferred option. An alternative is a simple access key / secret pair. This is less secure, but possibly easier for library authors to use.

Proposal 2: Changes to the database require validation

In this model any user - authenticated or not - can make changes, but those changes remain in an "unvalidated" state until an authenticated user (or some automated process) marks them as valid.

The review process for changes would require community involvement, though we could use feedback from the manual process to train an AI system (TensorFlow or the like) - moving towards a more automated review process.
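To make the "unvalidated until reviewed" idea concrete, here is a minimal in-memory sketch. The names `ProposedChange` and `ChangeLog`, and the `:unvalidated` / `:valid` states, are hypothetical; a real version would live in the database and record who reviewed each change.

```ruby
# A proposed change to one item, held apart from the accepted data.
ProposedChange = Struct.new(:gtin14, :attributes, :status, keyword_init: true)

class ChangeLog
  def initialize
    @items = {}    # accepted data, keyed by GTIN-14
    @pending = []  # changes awaiting review
  end

  attr_reader :pending

  # Anyone may submit; the change is recorded but not yet visible.
  def submit(gtin14, attributes)
    change = ProposedChange.new(gtin14: gtin14, attributes: attributes, status: :unvalidated)
    @pending << change
    change
  end

  # A reviewer (or an automated process) accepts the change and merges it.
  def validate!(change)
    change.status = :valid
    @pending.delete(change)
    @items[change.gtin14] = (@items[change.gtin14] || {}).merge(change.attributes)
  end

  def item(gtin14)
    @items[gtin14]
  end
end

log = ChangeLog.new
change = log.submit('00000000079983', { 'name' => 'Pistachios - Dry Roasted & Salted' })
```

Until `validate!` is called, `log.item('00000000079983')` returns nothing, so readers only ever see reviewed data.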

Either proposal probably also requires a mechanism to report existing bad data. Again, this could be used to train an AI system and help it spot bad data.

Legal notices

Will need to provide text for defining the legal side of things, including:

Do not guarantee the completeness of any of the data.
Giving credit wherever it is due

Explain the intent of brocade.io

Testing ground for GS1, barcode, supply chain management services
Not intended as a production system

MUST include "credits" page
MUST include "overview" page, detailing the legal use
SHOULD create new tickets for setting up process for resolving credit disputes, setting credit

Remove Google Analytics

Google Analytics - at least via MS Edge - reports a net::ERR_BLOCKED_BY_CLIENT error and doesn't appear to work anymore.

We don't really need this as we're not actively tracking use of the app. We may as well remove the Google Analytics script - if we need to provide any kind of analytics in the future I'd rather ones that are useful to end users directly.

!! ADVANCE NOTICE - SERVICE WILL BECOME READ-ONLY FOR A WHILE !!

Hi folks

To address some data quality concerns - some dirty data that was already in the system, some that has been added by people testing - I'm going to be turning off write access for a while. The system will remain readable.

I'm considering implementing a protocol that allows changes to be posted but not immediately available - e.g. a log of "changes" that can either be reviewed manually or otherwise quality assured before being integrated into the main data. This isn't clear in my head just yet, but if anyone has any thoughts on this I'd appreciate if we can share ideas.

My goal is to have more people involved in the decision making - at least in the long term, as there are many problems to solve on the way to building out a distributed system.

Apologies for not responding to questions and issues in a timely manner. I'll try to be more on the ball from here on in.

-Tom

Automatically pad barcodes shorter than 14 digits with leading zeros

I think that it would make sense to add leading zeros to the barcode if it is shorter than 14 digits. Otherwise searching for the more common variants of EAN-8, EAN-13 or the GTIN counterparts doesn't lead to a result...
As far as I remember this was also the default behavior of datakick's API.

e.g. the following doesn't work as expected, as it's an EAN-13 barcode:
https://www.brocade.io/api/items/4012200328002
but padded with a leading zero to form an EAN-14 the correct item is returned:
https://www.brocade.io/api/items/04012200328002

If you don't think that this addition makes sense in this repository, I can as well change it in my downstream library 😄
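A minimal sketch of the padding described above, usable either server-side or in a client library. The helper names `normalize_gtin` and `item_url` are hypothetical.

```ruby
# Left-pad a barcode with zeros to 14 digits so EAN-8, EAN-13 and UPC codes
# all resolve to the same GTIN-14 key.
def normalize_gtin(code)
  digits = code.to_s.gsub(/\D/, '') # drop spaces, dashes, etc.
  if digits.empty? || digits.length > 14
    raise ArgumentError, "not a usable barcode: #{code.inspect}"
  end

  digits.rjust(14, '0')
end

def item_url(code)
  "https://www.brocade.io/api/items/#{normalize_gtin(code)}"
end

item_url('4012200328002')
# => "https://www.brocade.io/api/items/04012200328002"
```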

Task: Add data from Product Open Data

Import data from http://product-open-data.com/download

NB This could be interesting, as the dump file is for MySQL

Introduce GPC Bricks

Background

Already baked into the app is the notion of a property set, basic collections of facts about a product.

Each property set defines the name, type and potentially the title of product information elements:

  NUTRITION_FACTS = {
    serving_size: { type: :text },
    servings_per_container: { type: :text },
    calories: { type: :number },
    fat_calories: { type: :number, title: 'Calories from Fat' },
   ...

This scheme is a bit clumsy and likely to become more complicated as more complex types are added. There is also no way currently in the system to validate data against property sets.
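To illustrate the gap, a hand-rolled check against the current property set format might look like the sketch below. `TYPE_CHECKS` and `property_errors` are hypothetical names, and only the four properties quoted above are included; the proposal that follows would replace this kind of code with JSON Schema.

```ruby
NUTRITION_FACTS = {
  serving_size: { type: :text },
  servings_per_container: { type: :text },
  calories: { type: :number },
  fat_calories: { type: :number, title: 'Calories from Fat' }
}.freeze

# One predicate per property type used in the property sets.
TYPE_CHECKS = {
  text: ->(value) { value.is_a?(String) },
  number: ->(value) { value.is_a?(Numeric) }
}.freeze

# Returns a list of error messages; keys not in the property set are ignored.
def property_errors(property_set, data)
  data.each_with_object([]) do |(key, value), errors|
    definition = property_set[key.to_sym]
    next unless definition

    check = TYPE_CHECKS.fetch(definition[:type])
    errors << "#{key} should be #{definition[:type]}" unless check.call(value)
  end
end

property_errors(NUTRITION_FACTS, { 'calories' => 'lots' })
# => ["calories should be number"]
```

Every new type needs another entry in `TYPE_CHECKS`, which is the kind of growth the issue is worried about.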

Proposal

1. Use JSON Schema

Rather than build out a complete type system, use the existing standards and tooling around JSON Schema. Under this model, a property set definition would look something like this:

{
  "$id": "https://brocade.io/nutrition-facts.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Nutrition Facts",
  "type": "object",
  "properties": {
    "servingSize": {
      "type": "string",
      "description": "Details of typical serving portion"
    },
    ....
    "calories": {
      "type": "integer",
      "description": "Number of calories per serving"
    },
    ....

There are numerous resources for validating JSON data, including multiple ruby gems:

We can probably push a lot of this down to the database level, validating / typing data within PostgreSQL itself:

2. Use Property Definitions From GS1

GS1 publishes standard property definitions for different product types and industries. Rather than invent our own, we should defer to the GS1 standards for property name and type information.

For example, the published Australasian_Liquor_Industry attributes includes a liquorAge integer type attribute.

Attribute Name	Definition	Example	Rationale	Data Type
liquorAge	How old in years - 1 to 100 for premium spirits, some fortified & liqueursco	15	Required by the Australasian Liquor Industry for Local Data Synchronisation requirements	integer

Other attributes include enums (alcoholicStrengthDescription), floats (standardDrinks) and a number of attributes that are constrained by various rules - potentially mappable to JSON Schema regex type definitions.

If we were to record data for alcoholic products - at least for the Australian market - we would map all these into a specific JSON Schema document for validating data.

3. Move Title Translations Into I18N

This may be an edge case, but there might be reasons to provide translations of property names. Rather than bake this into the schemas, we can push this off to Rails' I18N mechanism, translating based on keys, e.g.

  calories: 'calorías'

Set up CI

Need to have a CI solution in place. Investigate use of GitHub Actions as a first call.

CI solution will need to:

Run all tests
Run bundle audit
Run zeitwerk checks

and report on failure if any of these fail.

Stretch target is to merge to master and deploy on success, but that's not super critical at this stage.

Investigate Google Taxonomy

https://www.google.com/basepages/producttype/taxonomy-with-ids.en-US.txt

Access to all products database

Hi, thanks for this repo.
Can I get access to the whole products database without going through the API, for example as a simple download in CSV format?
Thanks

Upgrade to Rails 7

Rails 7 alpha 2 is out now. Edgy, but let's upgrade anyway. By the time Rails 7 hits release-candidate (or release) we'll be looking at building out our own formal release.

Update UI

Post the update to Rails 7, we've more options re serving up JS, CSS, etc. The sign-on screen can do with some work, as can the general "below the fold" content.

Swap to css bundling w/ tailwind for simplicity.

Feature: Nutrition Facts

Present nutrition-related properties in a sensible collection, as per the original Datakick style:

NB "Percent of daily values" text at the bottom is derived / assumed and not part of the stored property values.

NB Default units are not present in the data (and have to be assumed) - e.g. 'Trans Fat' (or trans_fat) is measured in g, whereas Cholesterol is measured in mg. In neither case is the unit stored in the data, with the properties stored as { "trans_fat":0.0, "cholesterol":0 }.

NB Some values are grouped and tallied, e.g. Saturated Fat, Trans Fat, etc. are grouped and tallied as 'Total Fats'. The total value is present in the stored properties (as fat) - see Example JSON Data below.

NB Total calories and Calories from Fat are reported separately. Both are present in the stored properties (as calories and fat_calories) - see Example JSON Data below.

Example JSON Data

{"gtin14":"00000000079983","brand_name":"Trader Joe's","name":"Pistachios - Dry Roasted & Salted","size":"16 oz (1 lb) 454 g","ingredients":"Pistachios, Salt\r\n\r\nFACILITY PROCESSES OTHER TREE NUTS","serving_size":"1/4 cup nuts without shells (30 g / about 1/2 cup with shells)","servings_per_container":"about 8","calories":180,"fat_calories":120,"fat":14.0,"saturated_fat":1.5,"trans_fat":0.0,"polyunsaturated_fat":2.5,"monounsaturated_fat":10.0,"cholesterol":0,"sodium":160,"carbohydrate":9,"fiber":3,"sugars":3,"protein":6,"images":[]}
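Since the units are not stored, the presentation layer has to supply them. A minimal sketch, assuming a lookup table of default units: only the two units stated above (g for trans_fat, mg for cholesterol) are filled in, and the names `DEFAULT_UNITS` and `display_value` are hypothetical.

```ruby
require 'json'

# Default units are assumed, not stored. Only the two named in this issue
# are listed; the rest would need to be agreed on.
DEFAULT_UNITS = {
  'trans_fat' => 'g',
  'cholesterol' => 'mg'
}.freeze

# Format a stored property for display, appending the assumed unit if known.
def display_value(properties, key)
  value = properties[key]
  return nil if value.nil?

  unit = DEFAULT_UNITS[key]
  unit ? "#{value} #{unit}" : value.to_s
end

properties = JSON.parse('{"trans_fat":0.0,"cholesterol":0,"calories":180}')
display_value(properties, 'trans_fat')   # => "0.0 g"
display_value(properties, 'cholesterol') # => "0 mg"
```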

Debugger breakpoint left in properties_controller.js

Per email from @ajwtech, a debugger breakpoint has been left in properties_controller.js, causing the website to halt if viewed using developer tools.

This needs to be removed.

Add Swagger / OpenAPI documentation

Partially addressed in #37, we need to document the API formally. The OpenAPI / Swagger model is an accepted set of processes and standards, so we'll use this as a starting point.

NB The rswag gem has already been added to the project, for the purposes of documenting and testing the API. We'll continue to expand use of this.

Acceptance Criteria

Documentation of API at the /documents endpoint
RSpec tests for the API

Feature: Display country of origin

Per original Datakick style, present country of origin - possibly with a flag symbol as well (see "US" mark):

NB The country of origin can be derived from the GTIN itself - under some conditions at least.
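A sketch of the prefix lookup this would rely on. The GS1 prefix identifies the GS1 member organisation that issued the number, which is not necessarily where the product was made, hence the "under some conditions" caveat. The table below is a small illustrative excerpt that should be checked against GS1's published prefix list, and the method name is hypothetical. GTIN-8 codes use a different prefix scheme and are not handled.

```ruby
# Illustrative excerpt of GS1 prefix ranges; not a complete list.
GS1_PREFIX_RANGES = {
  (300..379) => 'GS1 France',
  (400..440) => 'GS1 Germany',
  (500..509) => 'GS1 UK',
  (930..939) => 'GS1 Australia'
}.freeze

# Returns the issuing GS1 member organisation, or nil if the prefix is not
# in the excerpt above.
def gs1_member_organisation(gtin)
  padded = gtin.to_s.rjust(14, '0')
  prefix = padded[1, 3].to_i # skip the GTIN-14 indicator digit
  GS1_PREFIX_RANGES.each do |range, name|
    return name if range.cover?(prefix)
  end
  nil
end

gs1_member_organisation('04012200328002') # => "GS1 Germany"
```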

Add static security checks

Time to add bundle-audit to the mix. Fix any reported static security issues.

Investigate OpenFoodFacts as another potential data source

https://world.openfoodfacts.org/

Search - Error when no results found

Occasionally a search returns a server-side error when no results are found or when the search times out. Seems to happen mainly when a new search is submitted.

Query
https://www.brocade.io/api/items?query=peanux

Investigate 'consensualknowledge'

Review data captured by consensualknowledge.net, in particular the resources they have gathered here:

https://consensualknowledge.net/shopping-advisor-and-other-uses-of-knowledge-base-about-products/

Maybe look to reach out to them, see if there are ways or reasons to collaborate.

Ensure UI follows accessibility guidelines

Check UI against WCAG, etc standards. Remedy as necessary.

Improve search

Currently search works on the whole word(s). Searching using "?query=gree" under Datakick will return everything with "gree" in the data (e.g. Green Tea, Greek Yogurt). The same search on Brocade.io doesn't return anything, as it's looking for the literal word 'gree'.
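The difference between the two behaviours, sketched in plain Ruby over a hypothetical list of names. In the real app the search runs in the database, where something like a PostgreSQL `ILIKE '%gree%'` condition would be the closest equivalent; that is an assumption about a possible fix, not the current implementation.

```ruby
PRODUCT_NAMES = ['Green Tea', 'Greek Yogurt', 'Peanut Butter'].freeze

# Current behaviour: the query must match a complete word.
def whole_word_search(names, query)
  names.select { |name| name.downcase.split(/\W+/).include?(query.downcase) }
end

# Datakick-style behaviour: the query may match any part of the text.
def substring_search(names, query)
  names.select { |name| name.downcase.include?(query.downcase) }
end

whole_word_search(PRODUCT_NAMES, 'gree') # => []
substring_search(PRODUCT_NAMES, 'gree')  # => ["Green Tea", "Greek Yogurt"]
```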

API Consumption Helpful Information

Is this accurate? I kept looping through records to try to understand the API behavior and data structure/types.

This information might be helpful for those consuming the API.

Header Pagination Info

Header - `links` - First Header

https://www.brocade.io/api/items?page=20&query=peanut+butter; rel="last", https://www.brocade.io/api/items?page=2&query=peanut+butter; rel="next"

Header - `links` - Following Headers include the previous and

https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="first", https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="prev",

Header - `per-page` & `total`

per-page - Results per page = 100
total - Total number records = 1912
Total pages = total / per-page = 19.12, rounded up to 20. The same page count appears as the page number in the rel="last" link.
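A small sketch of how a client might read these headers. The header value is copied from the example above; the regular expression also tolerates the angle brackets that a standard Link header puts around each URL. The method names are hypothetical.

```ruby
# Parse a links header into a { rel => url } hash.
def parse_links(header)
  header.scan(/<?([^<>;,\s]+)>?\s*;\s*rel="([^"]+)"/).to_h { |url, rel| [rel, url] }
end

# Number of pages needed to hold `total` records at `per_page` per page.
def total_pages(total, per_page)
  (total.to_f / per_page).ceil
end

links_header = 'https://www.brocade.io/api/items?page=20&query=peanut+butter; rel="last", ' \
               'https://www.brocade.io/api/items?page=2&query=peanut+butter; rel="next"'

links = parse_links(links_header)
links['next']          # => "https://www.brocade.io/api/items?page=2&query=peanut+butter"
total_pages(1912, 100) # => 20
```

A client can keep following `links['next']` until the key is absent, rather than computing page numbers itself.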

Data Types

Key					DataType
ProductID				int
gtin14					string
brand_name				string
name					string
fat					string
size					string
fiber					string
sodium					string
sugars					string
protein					string
calories				string
potassium				string
cholesterol				string
ingredients				string
carbohydrate				string
fat_calories				string
serving_size				string
saturated_fat				string
trans_fat				string
monounsaturated_fat			string
polyunsaturated_fat			string
servings_per_container			string
pages					string
author					string
format					string
publisher				string
alcohol_by_volume			double
alcohol_by_weight			double
volume_fluid_ounce			double
volume_ml				double
weight_g				double
weight_ounce				double
unit_count				double

Example JS Class of Data Structure

class clsProducts
{
	constructor()
	{
		this.gtin14 = '';
		this.brand_name = '';
		this.name = '';
		this.fat = '';
		this.size = '';
		this.fiber = '';
		this.sodium = '';
		this.sugars = '';
		this.protein = '';
		this.calories = '';
		this.potassium = '';
		this.cholesterol = '';
		this.ingredients = '';
		this.carbohydrate = '';
		this.fat_calories = '';
		this.serving_size = '';
		this.saturated_fat = '';
		this.trans_fat = '';
		this.monounsaturated_fat = '';
		this.polyunsaturated_fat = '';
		this.servings_per_container = '';
		this.pages = '';
		this.author = '';
		this.format = '';
		this.publisher = '';
		this.alcohol_by_volume = null; //Double
		this.alcohol_by_weight = null; //Double
		this.volume_fluid_ounce = null; //Double
		this.volume_ml = null; //Double
		this.weight_g = null; //Double
		this.weight_ounce = null; //Double
		this.unit_count = null; //Double
	}
}

Example c# Class of Data Structure

public class Products
    {
        public string gtin14 { get; set; }
        public string brand_name { get; set; }
        public string name { get; set; }
        public string fat { get; set; }
        public string size { get; set; }
        public string fiber { get; set; }
        public string sodium { get; set; }
        public string sugars { get; set; }
        public string protein { get; set; }
        public string calories { get; set; }
        public string potassium { get; set; }
        public string cholesterol { get; set; }
        public string ingredients { get; set; }
        public string carbohydrate { get; set; }
        public string fat_calories { get; set; }
        public string serving_size { get; set; }
        public string saturated_fat { get; set; }
        public string trans_fat { get; set; }
        public string monounsaturated_fat { get; set; }
        public string polyunsaturated_fat { get; set; }
        public string servings_per_container { get; set; }
        public string pages { get; set; }
        public string author { get; set; }
        public string format { get; set; }
        public string publisher { get; set; }
        public double? alcohol_by_volume { get; set; }
        public double? alcohol_by_weight { get; set; }
        public double? volume_fluid_ounce { get; set; }
        public double? volume_ml { get; set; }
        public double? weight_g { get; set; }
        public double? weight_ounce { get; set; }
        public double? unit_count { get; set; }
    }

Add OmniAuth for API access

Having investigated various approaches to secure access to the API, I've opted to go with OAuth2. This may make integration more complex than a simple token-based solution, but I'm erring on the side of a more secure mechanism. There are numerous implementations of OAuth2 clients in the languages people seem to be using to integrate.

I'll also be opting for GitHub as the first OAuth ID provider, as most users - at least initially - will also have GitHub accounts.

NOTE Related to #33

I wrote a simple mobile app that consumes brocade.io API

I threw this together this weekend. Can't wait for the ability to edit and add new records from the API. This is the initial working version.
https://github.com/MrRedBeard/brocade.io-Mobile-App

Investigate UNSPSC as a possible alternative taxonomy

Investigate https://store.unspsc.org/ as a possible alternative taxonomy - though this does appear to require a subscription or purchase of the dataset

Use JSON-LD schemas from schema.org

The lack of any formal structure in the data has always bothered me, as we basically store everything in one blob of JSON data. In the general case this is fine, but when products/items have very definite properties (e.g. a Book has an author, food items have calories, etc) it gets harder to ensure that the data makes sense - or is consumable in a repeatable way.

Proposal

Brocade.io to start using JSON-LD as the base model for all data presented via the API, using schemas published by third-parties like https://schema.org.

For instance, if we adopt the "Product" type from schema.org, we can structure product information that is both human-readable and easily processed by applications:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "gtin": "00073007107096",
  "countryOfAssembly": "USA",
  "brand": {
    "@type": "Brand",
    "name": "Columbus"
  },
  "material": "processed meat"
}
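As a sketch of the on-demand conversion, the function below maps a record in the current flat format onto the schema.org Product shape shown above. Only the three fields with an obvious counterpart are mapped; the function name `to_json_ld` and the mapping of `gtin14` to `gtin` and `brand_name` to a nested Brand are assumptions.

```ruby
require 'json'

# Convert a flat item hash (current API format) to a schema.org Product.
def to_json_ld(item)
  doc = {
    '@context' => 'https://schema.org/',
    '@type' => 'Product',
    'name' => item['name'],
    'gtin' => item['gtin14']
  }
  if item['brand_name']
    doc['brand'] = { '@type' => 'Brand', 'name' => item['brand_name'] }
  end
  doc.compact # drop keys whose source value was missing
end

item = {
  'gtin14' => '00073007107096',
  'brand_name' => 'Columbus',
  'name' => 'Lite Italian Dry Salami'
}

puts JSON.pretty_generate(to_json_ld(item))
```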

JSON-LD also enables us to add a graph for extended attributes, e.g. if nutritional information is available for a product we can use the NutritionInformation type to present these attributes in a structured manner:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "@graph": [
    {
      "@type": "NutritionInformation",
      "calories": "214 kcal",
      "servingSize": "28 g",
      ...
    },
    ...
  ]
}

Other types from schema.org can be used as applicable (e.g. Book, Movie), etc. We can also make use of schemas published by other third parties - or our own custom types - as required and potentially "future proof" the underlying data model.

We can also use the type information in the frontend, using schema types to determine the best way to present data like nutritional information in a table-like format (see #11). We can also use it to insert Microdata into the HTML, nesting metadata suitable for search engines, web scrapers and the like to consume.

Benefits

Leverage existing data structures and tools
Consistent results from the API
Helps inform the presentation in the UI

Risks / Possible Problems

Harder to parse incoming data
Large amount of existing data that needs to be processed

We can mitigate the second problem by processing individual products on demand and progressively update data. The problem of parsing data requires a bit more thought and investigation, but it looks like a solvable problem.

It's probably not a solution, but will at least help surface meta data concerns (e.g. use of Dublin Core, etc).

Task: Add data from UhttBarcodeReference

Import data from https://github.com/papyrussolution/UhttBarcodeReference

There is a lot of data here, so we'll probably have to process files individually
Will need to translate some of the Russian text, often used for general categories (e.g. Продукты питания = Food)
Still not clear what license this data is provided under
