ferrisoxide / brocade.io

Open GTIN / barcode & product database

Home Page: https://www.brocade.io

License: GNU Affero General Public License v3.0

Ruby 67.68% JavaScript 3.15% CSS 0.54% HTML 28.53% Shell 0.09%
barcode api-server products

brocade.io's Issues

Replace PaperTrail with Logidze

Logidze might be a better solution for maintaining an audit trail than the existing PaperTrail gem. It uses database triggers to maintain the delta and stores the change log inside the affected record, so it should be faster and simpler than PaperTrail.

Not an urgent issue, as we only have ~200 actual changes recorded for products. But it might become important going forward.

Feature: Edit Product

Allow logged in users to edit product details.

Will have to provide some default property names (e.g. Trans Fat) and present assumed / default units.

No emails can be sent

Hi,

Emails cannot be sent; I repeatedly get the following error:
[Screenshot 2022-01-08: "We're sorry, but something went wrong (500)"]
Because of this, email addresses cannot be verified and new users, e.g. me, cannot sign up.

Regards,
Syndesi

Bearer token for API access

In preparation for write-access to the API, add support for bearer tokens - to be passed in the header when authenticating.

Read-only access will not require a token - at least not for now.
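
To make the idea concrete, here is a minimal sketch of how a bearer token might be read from the Authorization header and how the "reads open, writes need a token" rule could be expressed. The method names (bearer_token, request_allowed?) are hypothetical and not taken from the brocade.io codebase, and the sketch only checks that a token is present, not that it is valid.

```ruby
# Extract the token from an "Authorization: Bearer <token>" header value.
# Returns nil when the header is missing, empty, or uses another scheme.
def bearer_token(authorization_header)
  return nil if authorization_header.nil?

  scheme, token = authorization_header.strip.split(/\s+/, 2)
  return nil unless scheme&.casecmp?("Bearer")
  return nil if token.nil? || token.empty?

  token
end

# Hypothetical access rule: read-only requests are open,
# anything else needs a bearer token to be present.
def request_allowed?(http_method, authorization_header)
  return true if %w[GET HEAD].include?(http_method.upcase)

  !bearer_token(authorization_header).nil?
end
```

A real implementation would also look the token up and check that it belongs to an active user before allowing the write.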

As part of all this we'll need to make a call on validating EAN. As per #27, assuming EAN-14 (and padding EAN-13 codes) is probably the way to go.

Design Decision: Item Change Management

Background

The original Datakick API allowed any client - authenticated or not - to add or update items.

This creates risk, as bad-faith third parties could easily pollute the database with poor quality - or incorrect - data.

We don't want to block read access for anyone, but we probably need to have some control over incoming data.

Proposal 1: Writing to the database requires authentication

In this model reading is open but writing back to the database requires some kind of authentication - either as a logged in user in the HTML interface or using some kind of token system when accessing via the API.

This requires a decision on type of token. The standards-based JSON Web Tokens are probably the preferred option. An alternative is a simple access key / secret pair. These are less secure, but possibly easier for library authors to use.

Proposal 2: Changes to the database require validation

In this model any user - authenticated or not - can make changes, but those changes remain in an "unvalidated" state until an authenticated user (or some automated process) marks them as valid.

The review process for changes would require community involvement, though we could use feedback from the manual process to train an AI system (TensorFlow or the like) - moving towards a more automated review process.
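
As a rough illustration of Proposal 2, the sketch below keeps submitted changes in a pending list and only merges them into the visible data once they are approved. The ChangeLog class, the Change struct and the status names are all hypothetical; it is an in-memory model to show the flow, not a proposed schema.

```ruby
# Hypothetical in-memory model of "unvalidated" changes.
Change = Struct.new(:gtin, :attributes, :status)

class ChangeLog
  attr_reader :pending

  def initialize
    @items = {}    # validated data, keyed by GTIN
    @pending = []  # submitted changes awaiting review
  end

  # Anyone may submit a change; it is not visible until approved.
  def submit(gtin, attributes)
    change = Change.new(gtin, attributes, :unvalidated)
    @pending << change
    change
  end

  # A reviewer (or an automated process) marks the change valid
  # and merges it into the item's validated data.
  def approve(change)
    change.status = :valid
    @pending.delete_if { |c| c.equal?(change) }
    @items[change.gtin] = (@items[change.gtin] || {}).merge(change.attributes)
  end

  # Readers only ever see validated data.
  def item(gtin)
    @items[gtin]
  end
end
```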

Either proposal probably also requires a mechanism to report existing bad data. Again, this could be used to train an AI system and help it spot bad data.

Legal notices

Will need to provide text for defining the legal side of things, including:

  • Do not guarantee the completeness of any of the data.
  • Giving credit wherever it is due

Explain the intent of brocade.io

  • Testing ground for GS1, barcode, supply chain management services
  • Not intended as a production system
  • MUST include "credits" page
  • MUST include "overview" page, detailing the legal use
  • SHOULD create new tickets for setting up process for resolving credit disputes, setting credit

Remove Google Analytics

Google Analytics - at least via MS Edge - reports a net::ERR_BLOCKED_BY_CLIENT error and doesn't appear to work anymore.

We don't really need this as we're not actively tracking use of the app. We may as well remove the Google Analytics script - if we need to provide any kind of analytics in the future I'd rather ones that are useful to end users directly.

!! ADVANCE NOTICE - SERVICE WILL BECOME READ-ONLY FOR A WHILE !!

Hi folks

To address some data quality concerns - some dirty data that was already in the system, some that has been added by people testing - I'm going to be turning off write access for a while. The system will remain readable.

I'm considering implementing a protocol that allows changes to be posted but not immediately available - e.g. a log of "changes" that can either be reviewed manually or otherwise quality assured before being integrated into the main data. This isn't clear in my head just yet, but if anyone has any thoughts on this I'd appreciate if we can share ideas.

My goal is to have more people involved in the decision making - at least in the long term, as there are many problems to solve on the way to building out a distributed system.

Apologies for not responding to questions and issues in a timely manner. I'll try to be more on the ball from here on in.

-Tom

Automatically pad barcodes shorter than 14 digits with leading zeros

I think that it would make sense to add leading zeros to the barcode if it is shorter than 14 digits. Otherwise searching for the more common variants of EAN-8, EAN-13 or the GTIN counterparts doesn't lead to a result...
As far as I remember this was also the default behavior of datakick's API.

e.g. the following doesn't work as expected, as it's an EAN-13 barcode:
https://www.brocade.io/api/items/4012200328002
but padded with a leading zero to form an EAN-14 the correct item is returned:
https://www.brocade.io/api/items/04012200328002

If you don't think that this addition makes sense in this repository, I can as well change it in my downstream library 😄
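
For reference, the padding described above could look something like this. It is only a sketch: normalize_gtin and valid_check_digit? are hypothetical names, and the check-digit test uses the standard GS1 weighting (3, 1, 3, ... counted from the digit next to the check digit), which may or may not be what the app ends up validating.

```ruby
# Pad a numeric barcode (EAN-8, UPC-A, EAN-13, ...) to 14 digits
# with leading zeros. Raises on anything that is not 1-14 digits.
def normalize_gtin(code)
  digits = code.to_s.strip
  unless digits.match?(/\A\d{1,14}\z/)
    raise ArgumentError, "not a numeric barcode: #{code.inspect}"
  end

  digits.rjust(14, "0")
end

# GS1 check digit test for a 14-digit code: weight the first 13 digits
# 3, 1, 3, ... starting from the right, then compare with the last digit.
def valid_check_digit?(gtin14)
  digits = gtin14.chars.map(&:to_i)
  check = digits.pop
  sum = digits.reverse.each_with_index.sum { |d, i| d * (i.even? ? 3 : 1) }
  (10 - sum % 10) % 10 == check
end

normalize_gtin("4012200328002") # => "04012200328002"
```

Because leading zeros do not change the weighted sum, the same check works for any shorter code once it has been padded.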

Introduce GPC Bricks

Background

Already baked into the app is the notion of a property set, basic collections of facts about a product.

Each property set defines the name, type and potentially the title of product information elements:

  NUTRITION_FACTS = {
    serving_size: { type: :text },
    servings_per_container: { type: :text },
    calories: { type: :number },
    fat_calories: { type: :number, title: 'Calories from Fat' },
    # ... (remaining properties elided)
  }

This scheme is a bit clumsy and likely to become more complicated as more complex types are added. There is also no way currently in the system to validate data against property sets.
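
To show what validating against a property set could mean with the current hash-based scheme, here is a small sketch. NUTRITION_FACTS_SAMPLE repeats only the four entries quoted above; TYPE_CHECKS and property_errors are hypothetical helpers, and mapping :text to String and :number to Numeric is an assumption.

```ruby
# The four property definitions quoted in the excerpt above.
NUTRITION_FACTS_SAMPLE = {
  serving_size: { type: :text },
  servings_per_container: { type: :text },
  calories: { type: :number },
  fat_calories: { type: :number, title: 'Calories from Fat' }
}.freeze

# Assumed mapping from property types to Ruby value checks.
TYPE_CHECKS = {
  text: ->(value) { value.is_a?(String) },
  number: ->(value) { value.is_a?(Numeric) }
}.freeze

# Returns a list of error messages; empty when the data fits the set.
# Keys that the property set does not define are ignored.
def property_errors(data, property_set)
  data.each_with_object([]) do |(key, value), errors|
    definition = property_set[key.to_sym]
    next if definition.nil?

    check = TYPE_CHECKS.fetch(definition[:type])
    errors << "#{key} should be #{definition[:type]}" unless check.call(value)
  end
end
```

Every new type needs another entry in TYPE_CHECKS, which is the kind of hand-rolled type system the JSON Schema proposal below is meant to avoid.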

Proposal

1. Use JSON Schema

Rather than build out a complete type system, use the existing standards and tooling around JSON Schema. Under this model, a property set definition would look something like this:

{
  "$id": "https://brocade.io/nutrition-facts.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Nutrition Facts",
  "type": "object",
  "properties": {
    "servingSize": {
      "type": "string",
      "description": "Details of typical serving portion"
    },
    ....
    "calories": {
      "type": "integer",
      "description": "Number of calories per serving"
    },
    ....
  }
}

There are numerous resources for validating JSON data, including multiple ruby gems:

We can probably push a lot of this down to the database level, validating / typing data within PostgreSQL itself:

2. Use Property Definitions From GS1

GS1 publishes standard property definitions for different product types and industries. Rather than invent our own, we should defer to the GS1 standards for property name and type information.

For example, the published Australasian_Liquor_Industry attributes includes a liquorAge integer type attribute.

Attribute Name: liquorAge
Definition: How old in years - 1 to 100 for premium spirits, some fortified & liqueurs
Example: 15
Rationale: Required by the Australasian Liquor Industry for Local Data Synchronisation requirements
Data Type: integer

Other attributes include enums (alcoholicStrengthDescription), floats (standardDrinks) and a number of attributes that are constrained by various rules - potentially mappable to JSON Schema regex type definitions.

If we were to record data for alcoholic products - at least for the Australian market - we would map all these into a specific JSON Schema document for validating data.

3. Move Title Translations Into I18N

This may be an edge case, but there might be reasons to provide translations of property names. Rather than bake this into the schemas, we can push this off to Rails' I18N mechanism, translating based on keys, e.g.

  calories: 'calorías'

Set up CI

Need to have a CI solution in place. Investigate GitHub Actions as a first port of call.

CI solution will need to:

  • Run all tests
  • Run bundle audit
  • Run zeitwerk checks

and report on failure if any of these fail.

Stretch target is to merge to master and deploy on success, but that's not super critical at this stage.

Access to all products database

Hi, thanks for this repo.
Can I get access to the whole products database without going through the API, e.g. as a simple download in CSV format?
Thanks

Upgrade to Rails 7

Rails 7 alpha 2 is out now. Edgy, but let's upgrade anyway. By the time Rails 7 hits release-candidate (or release) we'll be looking at building out our own formal release.

Update UI

Following the update to Rails 7, we have more options for serving up JS, CSS, etc. The sign-on screen could do with some work, as could the general "below the fold" content.

Swap to CSS bundling with Tailwind for simplicity.

Feature: Nutrition Facts

Present nutrition-related properties in a sensible collection, as per the original Datakick style:

[Screenshot 2020-03-25: Datakick nutrition facts panel]

NB The "Percent of daily values" text at the bottom is derived / assumed and is not part of the stored property values.

NB Default units are not present in the data (and have to be assumed) - e.g. 'Trans Fat' (or trans_fat) is measured in g, whereas Cholesterol is measured in mg. In neither case is the unit stored in the data, with the properties stored as { "trans_fat":0.0, "cholesterol":0 }.

NB Some values are grouped and tallied, e.g. Saturated Fat, Trans Fat, etc. are grouped and tallied as 'Total Fats'. The total value is present in the stored properties (as fat) - see Example JSON Data below.

NB Total calories and Calories from Fat are reported separately. This data is present in the stored properties (as calories and fat_calories) - see Example JSON Data below.

Example JSON Data

{"gtin14":"00000000079983","brand_name":"Trader Joe's","name":"Pistachios - Dry Roasted & Salted","size":"16 oz (1 lb) 454 g","ingredients":"Pistachios, Salt\r\n\r\nFACILITY PROCESSES OTHER TREE NUTS","serving_size":"1/4 cup nuts without shells (30 g / about 1/2 cup with shells)","servings_per_container":"about 8","calories":180,"fat_calories":120,"fat":14.0,"saturated_fat":1.5,"trans_fat":0.0,"polyunsaturated_fat":2.5,"monounsaturated_fat":10.0,"cholesterol":0,"sodium":160,"carbohydrate":9,"fiber":3,"sugars":3,"protein":6,"images":[]}
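
Since units are not stored, the display layer has to supply them. A minimal sketch of such a lookup follows. Only the units for trans_fat (g) and cholesterol (mg) come from the notes above; the other entries, and the names DEFAULT_UNITS and with_unit, are assumptions for illustration.

```ruby
# Default display units per property key.
DEFAULT_UNITS = {
  "trans_fat" => "g",      # stated in the notes above
  "cholesterol" => "mg",   # stated in the notes above
  "fat" => "g",            # assumed
  "saturated_fat" => "g",  # assumed
  "sodium" => "mg"         # assumed
}.freeze

# Format a stored value with its assumed unit, or as-is when no
# default unit is known for the key.
def with_unit(key, value)
  unit = DEFAULT_UNITS[key]
  unit ? "#{value} #{unit}" : value.to_s
end

with_unit("trans_fat", 0.0)  # => "0.0 g"
with_unit("cholesterol", 0)  # => "0 mg"
with_unit("calories", 180)   # => "180"
```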

Add Swagger / OpenAPI documentation

Partially addressed in #37, we need to document the API formally. The OpenAPI / Swagger model is an accepted set of processes and standards, so we'll use this as a starting point.

NB The rswag gem has already been added to the project, for the purposes of documenting and testing the API. We'll continue to expand use of this.

Acceptance Criteria

  • Documentation of API at the /documents endpoint
  • RSpec tests for the API

Feature: Display country of origin

Per original Datakick style, present country of origin - possibly with a flag symbol as well (see "US" mark):

[Screenshot 2020-03-25: Datakick product page showing the "US" country mark]

NB The country of origin can be derived from the GTIN itself - under some conditions at least.
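
One way to approximate this is a GS1 prefix lookup, sketched below. Note the caveat: the prefix identifies the GS1 member organisation that issued the company prefix, not necessarily where the product was made, so this is a hint at best. The table is deliberately partial, and GS1_PREFIX_RANGES and gs1_prefix_region are hypothetical names.

```ruby
# Partial table of GS1 prefix ranges (3-digit prefix => issuing region).
GS1_PREFIX_RANGES = {
  (0..19) => "United States",
  (300..379) => "France",
  (400..440) => "Germany",
  (500..509) => "United Kingdom",
  (930..939) => "Australia"
}.freeze

# For a 14-digit code, the GS1 prefix is the three digits that follow
# the leading indicator digit. Returns nil for prefixes not in the table.
def gs1_prefix_region(gtin14)
  prefix = gtin14[1, 3].to_i
  match = GS1_PREFIX_RANGES.find { |range, _region| range.cover?(prefix) }
  match&.last
end

gs1_prefix_region("04012200328002") # => "Germany"
```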

Improve search

Currently search works on whole words only. Searching using "?query=gree" under Datakick will return everything with "gree" in the data (e.g. Green Tea, Greek Yogurt). The same search on Brocade.io doesn't return anything, as it's looking for the literal word 'gree'.
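
The difference between the two behaviours can be shown on plain Ruby hashes. This is only an illustration of whole-word versus substring matching; the method names are hypothetical and the real search presumably runs in the database rather than in Ruby.

```ruby
# Whole-word matching: the query must equal one of the words in a value.
def whole_word_search(items, query)
  needle = query.downcase
  items.select do |item|
    item.values.any? { |v| v.to_s.downcase.split(/\W+/).include?(needle) }
  end
end

# Substring matching: the query may appear anywhere inside a value.
def substring_search(items, query)
  needle = query.downcase
  items.select do |item|
    item.values.any? { |v| v.to_s.downcase.include?(needle) }
  end
end

items = [
  { "name" => "Green Tea" },
  { "name" => "Greek Yogurt" },
  { "name" => "Peanut Butter" }
]

whole_word_search(items, "gree").size # => 0
substring_search(items, "gree").size  # => 2
```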

API Consumption Helpful Information

Is this accurate? I kept looping through records to try to understand the API behavior and data structure/types.

This information might be helpful for those consuming the API.

Header Pagination Info

Header - links - First Header

https://www.brocade.io/api/items?page=20&query=peanut+butter; rel="last", https://www.brocade.io/api/items?page=2&query=peanut+butter; rel="next"

Header - links - Following Headers also include the first and previous links

https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="first", https://www.brocade.io/api/items?page=1&query=peanut+butter; rel="prev",

Header - per-page & total

per-page - Results per page = 100
total - Total number records = 1912
Total pages = total / per-page = 19.12, rounded up to 20. The page count is also found in the rel="last" link in the links header.
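
A sketch of how a client might consume these headers follows. parse_links and total_pages are hypothetical helpers; the parser accepts the link format as quoted above and also tolerates angle brackets around the URL, and it assumes URLs contain no commas.

```ruby
# Parse a links header value into a { rel => url } hash.
# Naive split on commas, so URLs must not contain commas.
def parse_links(header)
  header.split(",").each_with_object({}) do |part, links|
    url, rel = part.split(";", 2).map(&:strip)
    next if url.nil? || url.empty? || rel.nil?

    name = rel[/rel="([^"]+)"/, 1]
    links[name] = url.delete_prefix("<").delete_suffix(">") if name
  end
end

# Page count from the total and per-page headers, rounded up.
def total_pages(total, per_page)
  (total.to_f / per_page).ceil
end

total_pages(1912, 100) # => 20
```

Following the rel="next" link until it disappears is usually more robust than computing page numbers by hand.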

Data Types

Key					DataType
ProductID				int
gtin14					string
brand_name				string
name					string
fat					string
size					string
fiber					string
sodium					string
sugars					string
protein					string
calories				string
potassium				string
cholesterol				string
ingredients				string
carbohydrate				string
fat_calories				string
serving_size				string
saturated_fat				string
trans_fat				string
monounsaturated_fat			string
polyunsaturated_fat			string
servings_per_container			string
pages					string
author					string
format					string
publisher				string
alcohol_by_volume			double
alcohol_by_weight			double
volume_fluid_ounce			double
volume_ml				double
weight_g				double
weight_ounce				double
unit_count				double

Example JS Class of Data Structure

class clsProducts
{
	constructor()
	{
		this.gtin14 = '';
		this.brand_name = '';
		this.name = '';
		this.fat = '';
		this.size = '';
		this.fiber = '';
		this.sodium = '';
		this.sugars = '';
		this.protein = '';
		this.calories = '';
		this.potassium = '';
		this.cholesterol = '';
		this.ingredients = '';
		this.carbohydrate = '';
		this.fat_calories = '';
		this.serving_size = '';
		this.saturated_fat = '';
		this.trans_fat = '';
		this.monounsaturated_fat = '';
		this.polyunsaturated_fat = '';
		this.servings_per_container = '';
		this.pages = '';
		this.author = '';
		this.format = '';
		this.publisher = '';
		this.alcohol_by_volume = null; //Double
		this.alcohol_by_weight = null; //Double
		this.volume_fluid_ounce = null; //Double
		this.volume_ml = null; //Double
		this.weight_g = null; //Double
		this.weight_ounce = null; //Double
		this.unit_count = null; //Double
	}
}

Example C# Class of Data Structure

public class Products
    {
        public string gtin14 { get; set; }
        public string brand_name { get; set; }
        public string name { get; set; }
        public string fat { get; set; }
        public string size { get; set; }
        public string fiber { get; set; }
        public string sodium { get; set; }
        public string sugars { get; set; }
        public string protein { get; set; }
        public string calories { get; set; }
        public string potassium { get; set; }
        public string cholesterol { get; set; }
        public string ingredients { get; set; }
        public string carbohydrate { get; set; }
        public string fat_calories { get; set; }
        public string serving_size { get; set; }
        public string saturated_fat { get; set; }
        public string trans_fat { get; set; }
        public string monounsaturated_fat { get; set; }
        public string polyunsaturated_fat { get; set; }
        public string servings_per_container { get; set; }
        public string pages { get; set; }
        public string author { get; set; }
        public string format { get; set; }
        public string publisher { get; set; }
        public double? alcohol_by_volume { get; set; }
        public double? alcohol_by_weight { get; set; }
        public double? volume_fluid_ounce { get; set; }
        public double? volume_ml { get; set; }
        public double? weight_g { get; set; }
        public double? weight_ounce { get; set; }
        public double? unit_count { get; set; }
    }

Add OmniAuth for API access

Having investigated various approaches to secure access to the API, I've opted to go with OAuth2. This may make integration more complex than a simple token-based solution, but I'm erring on the side of a more secure mechanism. There are numerous implementations of OAuth2 clients in the languages people seem to be using to integrate.

I'll also be opting for GitHub as the first OAuth ID provider, as most users - at least initially - will also have GitHub accounts.

NOTE Related to #33

Use JSON-LD schemas from schema.org

The lack of any formal structure in the data has always bothered me, as we basically store everything in one blob of JSON data. In the general case this is fine, but when products/items have very definite properties (e.g. a Book has an author, food items have calories, etc) it gets harder to ensure that the data makes sense - or is consumable in a repeatable way.

Proposal

Brocade.io to start using JSON-LD as the base model for all data presented via the API, using schemas published by third-parties like https://schema.org.

For instance, if we adopt the "Product" type from schema.org, we can structure product information that is both human-readable and easily processed by applications:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "gtin": "00073007107096",
  "countryOfAssembly": "USA",
  "brand": {
    "@type": "Brand",
    "name": "Columbus"
  },
  "material": "processed meat"
}

JSON-LD also enables us to add a graph for extended attributes, e.g. if nutritional information is available for a product we can use the NutritionInformation type to present these attributes in a structured manner:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Lite Italian Dry Salami",
  "@graph": [
    {
      "@type": "NutritionInformation",
      "calories": "214 kcal",
      "servingSize": "28 g",
      ...
    },
    ...
  ]
}

Other types from schema.org can be used as applicable (e.g. Book, Movie), etc. We can also make use of schemas published by other third parties - or our own custom types - as required and potentially "future proof" the underlying data model.

We can also use the type information in the frontend, using schema types to determine the best way to present data like nutritional information in a table-like format (see #11). We can also use it to insert Microdata into the HTML, nesting metadata suitable for search engines, web scrapers and the like to consume.

Benefits

  • Leverage existing data structures and tools
  • Consistent results from the API
  • Helps inform the presentation in the UI

Risks / Possible Problems

  • Harder to parse incoming data
  • Large amount of existing data that needs to be processed

We can mitigate the second problem by processing individual products on demand and progressively update data. The problem of parsing data requires a bit more thought and investigation, but it looks like a solvable problem.
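
As a sketch of the on-demand conversion, the method below maps one stored record (in the flat shape the API returns today) onto the Product structure shown earlier. to_json_ld is a hypothetical name and the field mapping (gtin14 to gtin, brand_name to a nested Brand) is an assumption; it only covers a few fields.

```ruby
require "json"

# Convert a flat stored record into a schema.org Product hash.
# Fields missing from the record are left out of the result.
def to_json_ld(record)
  doc = {
    "@context" => "https://schema.org/",
    "@type" => "Product",
    "name" => record["name"],
    "gtin" => record["gtin14"]
  }
  if record["brand_name"]
    doc["brand"] = { "@type" => "Brand", "name" => record["brand_name"] }
  end
  doc.compact
end

record = JSON.parse('{"gtin14":"00000000079983","brand_name":"Trader Joe\'s","name":"Pistachios - Dry Roasted & Salted"}')
puts JSON.pretty_generate(to_json_ld(record))
```

Converting records lazily like this means the stored blob can stay untouched until a product is actually requested.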

Audit log for changes

Add support for logging who/what against changes in the database (à la PaperTrail or similar).
