Comments (8)
I wrote a script to check this. It looks like a few public bodies are missing, and a few sort of match…
What’s the best way to proceed? Update the GB CSV? How was the original CSV generated?
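The script itself isn't shown in the thread. As a rough illustration of this kind of check, here is a minimal Python sketch using the standard library's `difflib`. It assumes both lists are plain Python lists of body names. The function name `compare_names` and the 0.85 cutoff are hypothetical choices, not taken from the original script.

```python
import difflib


def compare_names(ours, theirs, cutoff=0.85):
    """Split the names in `theirs` into exact matches, near matches and missing.

    `ours` and `theirs` are lists of public body names. `cutoff` is a
    hypothetical similarity threshold (0..1) for counting a "near" match.
    """
    ours_set = set(ours)
    exact, near, missing = [], [], []
    for name in theirs:
        if name in ours_set:
            exact.append(name)
            continue
        # get_close_matches returns up to n candidates whose
        # SequenceMatcher ratio is >= cutoff, best first.
        candidates = difflib.get_close_matches(name, ours, n=1, cutoff=cutoff)
        if candidates:
            best = candidates[0]
            score = difflib.SequenceMatcher(None, name, best).ratio()
            near.append((score, name, best))
        else:
            missing.append(name)
    near.sort(reverse=True)  # highest-scoring near matches first
    return exact, near, missing
```

Anything that ends up in `near` still needs a person to decide whether the two names really refer to the same body.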
from publicbodies.
Ah, sorry! The data clearly comes from What Do They Know. It would still be good to see the scraper/converter code, so this could perhaps be altered or added to.
In general, I love the idea of putting data into version control. I don’t know if it’s common practice to store it alongside the tool used to collect it, but that seems like a prudent thing to do.
@andylolz, to be clear, the original source data came from What Do They Know, but we aren't confined to that: there are public bodies we would want to list that aren't in What Do They Know (because they may not be FOIable).
So I'd definitely suggest merging your changes to the GB CSV (adding the script to scripts and a note to the README).
Regarding pulling more regularly from WDTK, that's probably a new ticket!
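If the missing bodies are merged into the GB CSV by a script, a minimal sketch with Python's `csv` module might look like the following. It assumes, hypothetically, that each row has `id` and `title` columns and that ids are slugs of titles; the real GB CSV schema may use different columns. `slugify`, `merge_missing` and `to_csv_text` are illustrative names only.

```python
import csv
import io
import re


def slugify(name):
    """Lower-case `name` and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def merge_missing(rows, missing_names):
    """Return a copy of `rows` with one new row per missing name.

    Hypothetical schema: rows are dicts with "id" and "title" keys, and a
    name is skipped if its slug already exists as an id.
    """
    merged = [dict(row) for row in rows]  # don't mutate the caller's rows
    seen = {row["id"] for row in merged}
    for name in missing_names:
        new_id = slugify(name)
        if new_id not in seen:
            merged.append({"id": new_id, "title": name})
            seen.add(new_id)
    return merged


def to_csv_text(rows, fieldnames):
    """Serialise rows to CSV text; columns absent from a row are written empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the merge step in a script (rather than hand-editing the CSV) means it can be rerun whenever the source list changes.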
@davidread - would you be interested in contributing here?
This little script is good. It's humorous to see results like this:
(0.9215686274509803,
u'National Institute for Health and Care Excellence',
u'National Institute for Health and Clinical Excellence'),
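The score above is consistent with the ratio from Python's `difflib.SequenceMatcher`, which is 2 × (matched characters) / (total length of both strings). A python-Levenshtein-style ratio gives the same value for this pair, so this doesn't tell us which library the script used. A minimal check:

```python
from difflib import SequenceMatcher

a = "National Institute for Health and Care Excellence"     # 49 characters
b = "National Institute for Health and Clinical Excellence"  # 53 characters

matcher = SequenceMatcher(None, a, b)
# get_matching_blocks() ends with a dummy (len(a), len(b), 0) block,
# so summing the sizes counts only genuinely matched characters.
matched = sum(block.size for block in matcher.get_matching_blocks())
score = matcher.ratio()  # 2 * matched / (len(a) + len(b))

print(matched, len(a) + len(b), score)  # prints: 47 102 0.9215686274509803
```

That is, 2 × 47 / 102 = 47/51: the two names share everything except "Care" versus "Clinical".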
However, I think it would be better to use Nomenklatura to do the matching, rather than run a one-off Levenshtein comparison and then forget the manual decisions made. I'll take a look if I get a chance.
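To illustrate the point about not forgetting manual decisions, here is a minimal sketch of persisting reviewed name pairs so that a rerun only asks about new candidates. This is a hypothetical stand-in for the kind of state a reconciliation tool such as Nomenklatura keeps; the class name and JSON layout are assumptions, not Nomenklatura's API or format.

```python
import json
from pathlib import Path


class MatchDecisions:
    """Remember manual "same body / different body" decisions for name pairs.

    Hypothetical JSON layout: {"source<TAB>target": true|false}. Assumes
    names never contain a tab character.
    """

    def __init__(self, decisions=None):
        self.decisions = dict(decisions or {})

    @staticmethod
    def _key(source_name, target_name):
        # JSON object keys must be strings, so join the pair with a tab.
        return f"{source_name}\t{target_name}"

    def record(self, source_name, target_name, same_body):
        self.decisions[self._key(source_name, target_name)] = bool(same_body)

    def lookup(self, source_name, target_name):
        """Return True/False for a reviewed pair, or None if never reviewed."""
        return self.decisions.get(self._key(source_name, target_name))

    def dumps(self):
        return json.dumps(self.decisions, indent=2, sort_keys=True)

    @classmethod
    def loads(cls, text):
        return cls(json.loads(text))

    def save(self, path):
        Path(path).write_text(self.dumps(), encoding="utf-8")

    @classmethod
    def load(cls, path):
        path = Path(path)
        return cls.loads(path.read_text(encoding="utf-8")) if path.exists() else cls()
```

A matching run would call `lookup` before computing any fuzzy score and only show a person the pairs that return `None`.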
@davidread Agreed re Nomenklatura; good connection here with #2 (reconciliation support via nomenklatura ...)
Just noting another list of UK public bodies to reconcile with and track: http://data.gov.uk/dataset/iati-organisation-identifier-for-uk-government-bodies
@davidread @rgrp this list (created by DFID for IATI reporting) is symptomatic of the problem of having no globally consistent methodology for identifying public bodies. This is an issue that the Joined-up Data Alliance, which is currently being formed (https://docs.google.com/document/d/1ZcBkxKaY9x31t4LH76yJ7dFMA0uyqyJ9Q-tk3FIE7UE/edit), will be tackling. It would be good to link up the pragmatic approach of public-bodies with the standards approach of the JDA.
Related Issues (20)
- Nepal: add import scripts and schedule
- Bot trying to update 2 data sources simultaneously creates conflict
- Fix infinite redirect and get site back online
- Keep running update process even if step fails
- Convert website to Jekyll + Github Pages + Github Actions
- Implement retry in data import scripts
- Automatic parallel updates conflict with each other
- Rename `master` branch to `main`
- Greece: values in `id` field are not sluggable
- Github Pages build takes too long and times out
- Data update scripts are still using the master branch
- CSV download button still points to `master` branch
- Github Pages default Jekyll deploy does not render some pages properly
- Commit, push & rebase GH Action not working on `main` branch
- Site gives 404 error
- Replace broken URL for dados.gov.br CKAN API
- Add Switzerland to the list
- `import_br.py` works locally, but fails in Github Actions
- Upgrade Frictionless
- Upgrade deprecated Github Action scripts