openbrewerydb / openbrewerydb

🍻 An open-source dataset of breweries, cideries, brewpubs, and bottleshops.

Home Page: https://www.openbrewerydb.org

License: MIT License

Jupyter Notebook 65.71% TypeScript 34.29%

breweries dataset hacktoberfest csv json sql typescript

openbrewerydb's Issues

Add Breweries with missing/not enough data

Is your feature request related to a problem? Please describe.
There may be some breweries where there isn't enough available data (like planning or nano types for example) and currently, these aren't being included. However, that does not make them invalid or breweries that should not be known (my opinion). Additionally, it may be helpful to see a list of breweries that need information, so in the future, when more information about a particular brewery becomes available, it can be updated.

Describe the solution you'd like
It would be great if there was a section where breweries with missing or not enough data can be placed for future use.

Describe alternatives you've considered
At the moment, I have my own helper repository to keep track of the breweries that are missing data: breweries_with_missing_data.json. It would be great to keep this information within the main repo though.

International translations

We want to handle international translations in the dataset because not everything is English.

From the Discord thread:

Resten — Today at 2:30 AM
this is my personal opinion.
How about separating English and foreign names by column?
Currently, Korean names are added in the form 맥파이(Magpie), so it would be better to manage the two names separately for future use.

@chrisjm — Today at 8:20 PM
@resten Thank you for the suggestion! This is a great idea and I'm still mulling it over. Perhaps a better solution is to have a separate linking translations table. I also think the names (and any other fields) in the DB should reflect the native language of the country. So in the Korean case, the English names should go in the translation table.
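
A minimal sketch of the linking-table idea, for discussion only. The thread does not define a schema, so every name here (`Brewery`, `Translation`, `localizedName`, the `locale` and `field` columns) is an assumption. The native-language name stays on the brewery record, and the English name lives in a separate translation row.

```typescript
// All names below are hypothetical; the thread does not define a schema.
interface Brewery {
  id: string;
  name: string; // stored in the native language of the brewery's country
}

interface Translation {
  breweryId: string; // links back to Brewery.id
  locale: string; // e.g. "en"
  field: "name"; // could be widened to other translatable columns
  value: string;
}

// Returns the translated name for a locale, falling back to the native name.
function localizedName(brewery: Brewery, translations: Translation[], locale: string): string {
  for (const t of translations) {
    if (t.breweryId === brewery.id && t.field === "name" && t.locale === locale) {
      return t.value;
    }
  }
  return brewery.name;
}

const magpie: Brewery = { id: "1", name: "맥파이" };
const magpieTranslations: Translation[] = [
  { breweryId: "1", locale: "en", field: "name", value: "Magpie" },
];
```

With this layout, `localizedName(magpie, magpieTranslations, "en")` yields the English name, and any locale without a translation row falls back to the native name.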

GitHub Action: CSV Schema

Validate CSV file

ArcGIS REST Service

Overview

This could be either an implementation of an ArcGIS REST Service (does this cost anything?) or just an endpoint that returns properly formatted data.

Per Paul Doherty on Twitter:

"Ideally an ArcGIS REST service, but anything with consistent headers and latitude/longitude (decimal degrees) would work (geojson, .csv). Obviously, with attribution and links to whatever can help your amazing project succeed!"

https://twitter.com/pjdohertygis/status/1374182569936232456
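
As a rough illustration of the "just an endpoint" option, the sketch below turns brewery rows into a GeoJSON FeatureCollection with an attribution property. The `BreweryRow` shape and the `toGeoJson` helper are assumptions made for this example, not part of the existing API. Coordinates are assumed to arrive as decimal-degree strings.

```typescript
// Hypothetical row shape; only the columns needed for the example are included.
interface BreweryRow {
  name: string;
  longitude: string; // decimal degrees, as text
  latitude: string; // decimal degrees, as text
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { name: string; attribution: string };
}

interface FeatureCollection {
  type: "FeatureCollection";
  features: PointFeature[];
}

function toGeoJson(rows: BreweryRow[]): FeatureCollection {
  const features: PointFeature[] = [];
  for (const row of rows) {
    const longitude = parseFloat(row.longitude);
    const latitude = parseFloat(row.latitude);
    // Skip rows without usable coordinates instead of emitting invalid points.
    if (!isFinite(longitude) || !isFinite(latitude)) {
      continue;
    }
    features.push({
      type: "Feature",
      // GeoJSON orders coordinates as [longitude, latitude].
      geometry: { type: "Point", coordinates: [longitude, latitude] },
      properties: {
        name: row.name,
        attribution: "Open Brewery DB (https://www.openbrewerydb.org)",
      },
    });
  }
  return { type: "FeatureCollection", features };
}
```

Rows with empty or non-numeric coordinates are dropped rather than mapped to a default location, which seems safer for a mapping consumer.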

Autocomplete endpoint does not match documentation

Describe the bug
The documentation for the /v1/breweries/autocomplete endpoint specifies:

The maximum number of breweries returned is 15.

This appears not to be the case.

To Reproduce
Steps to reproduce the behavior:

Issue a GET request to https://api.openbrewerydb.org/breweries/autocomplete?query=brew
Observe the >5000 entries generated

Expected behavior
A maximum of 15 entries is returned

System

Browser: Anything that can send an HTTP GET request

Add Brewery Data Change Manager

Continuing the conversation from #12.

Ideally, this would be a UI that opens up the main dataset file, allows you to directly edit/add/delete entries, saves and submits this as a Pull Request to the dataset repository.

Questions we'd like answered

Where should this UI live?
Use a third-party tool to manage VCS or build our own?

BREAKING CHANGES: Update database schema

Tasks

Rename obdb_id to slug
Update id to use UUID

Notes

This should be done after the versioning is migrated (it might already have been)
The API will also need to be updated

Export: CSV

Export /data to /breweries.csv

Dependencies

I chose Papaparse because of the small footprint (249kB) compared to csvtojson (8.69MB).

Fix how international phone numbers are handled

Describe the bug
Currently, international phone numbers do not seem to be handled properly. If they are too long, they are treated as numbers and rendered in scientific notation, which ends up looking like this: "phone":"3.53599E+11"

To Reproduce
Steps to reproduce the behavior:

Go to https://github.com/openbrewerydb/openbrewerydb/blob/master/data/ireland/laois.csv
See the phone number on the first line

Expected behavior
We would expect phone numbers to be handled appropriately. This value should be +353599107299

OS is irrelevant as this is an issue with the data itself.

Context
For Ireland, the format is country code + region number/mobile prefix + 7 digit number. Example: +353 21 123 4567. The region number is similar to the US area code. It can be 3 digits, with a 0 prefix if the country code is not included (local dialing).

This is also not limited to Ireland. For example, Germany has phone numbers with a max of 11 digits (not including country code).
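
One way to catch this before it reaches the dataset is a check on the phone column. This is a sketch only: `checkPhone` and its result type are hypothetical, and the 4 to 15 digit range is an assumption (15 is the E.164 maximum, while the lower bound is arbitrary). A value already in scientific notation has lost digits, so the check can only flag it, not repair it.

```typescript
// Matches values such as "3.53599E+11" that a spreadsheet produced from a long number.
const SCIENTIFIC_NOTATION = /^\d+(\.\d+)?[eE][+-]?\d+$/;

type PhoneCheck =
  | { kind: "valid"; value: string }
  | { kind: "invalid"; reason: string };

// Hypothetical validator: phone values are kept as text, never as numbers.
function checkPhone(raw: string): PhoneCheck {
  const trimmed = raw.trim();
  if (SCIENTIFIC_NOTATION.test(trimmed)) {
    // The trailing digits were rounded away, so the original number is gone.
    return { kind: "invalid", reason: "scientific notation; original digits lost" };
  }
  // Keep a leading "+" and strip spaces, dashes, dots, and parentheses.
  const normalized = trimmed.replace(/[\s\-().]/g, "");
  // Assumed range: 4 to 15 digits, optionally prefixed with "+".
  if (!/^\+?\d{4,15}$/.test(normalized)) {
    return { kind: "invalid", reason: "not a plain digit string" };
  }
  return { kind: "valid", value: normalized };
}
```

Here `checkPhone("3.53599E+11")` is reported as invalid, while `checkPhone("+353 21 123 4567")` normalizes to `+353211234567`.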

GitHub Action: Validate ESLint

npm run lint

CLI tool to create obdb_id

The obdb_id is essentially a kebab case combination of the brewery name and the city. This should handle most of the cases but let's see.

Notes

Might make use of the notebook
Perhaps there needs to be more additions to the ID for it to be unique?
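
A minimal sketch of the kebab-case combination described above. `toKebabCase` and `createObdbId` are hypothetical names, and the brewery in the example is made up. This version only keeps ASCII letters and digits, so accented or non-Latin names would need extra handling, and it does nothing yet about the uniqueness question.

```typescript
// Lowercases the input and joins runs of letters/digits with single hyphens.
function toKebabCase(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Hypothetical ID builder: brewery name followed by city.
function createObdbId(name: string, city: string): string {
  return [toKebabCase(name), toKebabCase(city)]
    .filter((part) => part.length > 0)
    .join("-");
}

// Made-up brewery, for illustration only.
const exampleId = createObdbId("Example Brewing Co.", "San Diego");
// exampleId === "example-brewing-co-san-diego"
```

Two breweries with the same name in the same city would collide under this scheme, which is where an extra component (state, postal code, or a counter) might come in.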

Import: CSV

Import /breweries.csv to /data

Discussion: Scraping for brewery data

While we want the community to help to keep brewery data up to date, it might be easier / more efficient to set up some scraping scripts. There are several things we want to take into consideration when doing this:

What to scrape? (brewery guilds? brewery associations? google places? yelp?)
Ethical scraping (i.e., getting permission, not overloading the server, using APIs when available, etc.)
How to automate this process? (Consul? Airflow? other?)

GitHub Action > Update Open Brewery DB API

Add Cloudflare Cache Cleaner GitHub Action

https://github.com/marketplace/actions/cloudflare-cache-cleaner

Add gzipped versions of datasets

Question: How to handle closed breweries?

Overview

I realized I'm not accounting for closed breweries. I think I'd like to keep them in the dataset for historical and analytical reasons, but I'm curious how best to do this.

Options

Add a "closed" brewery type
Add a new boolean field/column for "Active" or something similar

Other options?
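
To make the trade-off between the two options concrete, here is a small sketch. The interfaces, the `active` field name, and the list of type values are all assumptions for illustration. With a "closed" type the original brewery type is overwritten, while a separate flag keeps it available for historical analysis.

```typescript
// Example type values only; the real list of brewery types may differ.
type OpenType = "nano" | "planning" | "brewpub";

// Option 1: "closed" replaces the brewery type, so the original type is lost.
interface BreweryWithClosedType {
  name: string;
  brewery_type: OpenType | "closed";
}

// Option 2: a separate boolean keeps the original type alongside the status.
interface BreweryWithActiveFlag {
  name: string;
  brewery_type: OpenType;
  active: boolean;
}

function openByType(rows: BreweryWithClosedType[]): BreweryWithClosedType[] {
  return rows.filter((row) => row.brewery_type !== "closed");
}

function openByFlag(rows: BreweryWithActiveFlag[]): BreweryWithActiveFlag[] {
  return rows.filter((row) => row.active);
}
```

Both filters give the same list of open breweries, but only the second layout can still answer a question such as "how many brewpubs have closed?".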

🇬🇧 UK Schema

Is your feature request related to a problem? Please describe.
It would be great to have UK Breweries on the DB too!

Directory Structure
In the UK we have all the individual countries then individual county areas, so what would be the best directory tree for this?
Would something like:

openbrewerydb/data/uk/scotland/west-dunbartonshire.csv
openbrewerydb/data/uk/scotland/argyll-bute.csv

openbrewerydb/data/uk/west-dunbartonshire/west-dunbartonshire.csv   
openbrewerydb/data/uk/argyll-bute/argyll-bute.csv

openbrewerydb/data/uk/scotland/west-dunbartonshire/west-dunbartonshire.csv   
openbrewerydb/data/uk/scotland/argyll-bute/argyll-bute.csv

be best?

CSV Structure
Additionally, what would be the best layout for the CSV? Maybe:
id,name,brewery_type,street,city,**county**,postal_code,website_url,phone,created_at,updated_at,country,longitude,latitude,tags

Info
Unfortunately, SIBA don't provide a great data source, but I'm hoping to start the pull from there, then try to automate some further parts.

OpenStreetMap database?

Hi,
In OpenStreetMap we can find around 8000 objects tagged craft=brewery (https://taginfo.openstreetmap.org/tags/craft=brewery#map).

However, you can't import OSM data into your database because the MIT licence is not compatible with the Open Database License (ODbL).

Would you consider changing your licence?

Describe the bug
There seem to be some bad characters in the dataset. For example, Anheuser-Busch Inc establishments appear all across the US. I think the dataset was meant to have the © copyright character, but bad decoding changed it to â� or �.
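
A rough way to find affected rows is to scan for the replacement character and for the letter pairs that typically show up when UTF-8 bytes are decoded with a single-byte encoding. This is a heuristic sketch: the pattern and the helper names (`hasSuspectCharacters`, `findSuspectRows`) are assumptions, and matches should be reviewed by hand rather than fixed automatically.

```typescript
// Heuristic only: U+FFFD anywhere, or Â / Ã / â followed by a character that
// commonly appears in mis-decoded UTF-8 (U+0080-U+00BF, curly quotes, €, U+FFFD).
const MOJIBAKE = /\uFFFD|[\u00C2\u00C3\u00E2][\u0080-\u00BF\u2018-\u203A\u20AC\uFFFD]/;

function hasSuspectCharacters(value: string): boolean {
  return MOJIBAKE.test(value);
}

// Returns the indexes of values that should be reviewed by hand.
function findSuspectRows(values: string[]): number[] {
  const suspect: number[] = [];
  values.forEach((value, index) => {
    if (hasSuspectCharacters(value)) {
      suspect.push(index);
    }
  });
  return suspect;
}
```

Legitimate accented names are left alone as long as the accented letter is followed by an ordinary character, so a name like "Château" is not flagged.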
