
Comments (8)

andylolz commented on June 4, 2024

I wrote a script to check this. It looks like a few public bodies are missing, and a few sort of match…

What’s the best way to proceed? Update the GB csv? How was the original csv generated?
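The script itself is not shown in this thread. As a rough illustration of the kind of check described (exact matches, near matches, and missing bodies), here is a minimal sketch using only the standard library. The function name `compare_names`, the `cutoff` value, and the sample names are assumptions made for the example, not part of the original script.

```python
import difflib


def compare_names(ours, theirs, cutoff=0.85):
    """Bucket each name in `theirs` as exact, near, or missing relative to `ours`.

    `cutoff` is an arbitrary illustrative threshold, not a value from the thread.
    """
    ours_set = set(ours)
    exact, near, missing = [], [], []
    for name in theirs:
        if name in ours_set:
            exact.append(name)
            continue
        # Best fuzzy candidate from our list, if any scores above the cutoff.
        candidates = difflib.get_close_matches(name, ours, n=1, cutoff=cutoff)
        if candidates:
            near.append((name, candidates[0]))
        else:
            missing.append(name)
    return exact, near, missing


# Hypothetical sample data, for illustration only.
ours = [
    "Cabinet Office",
    "National Institute for Health and Clinical Excellence",
]
theirs = [
    "Cabinet Office",
    "National Institute for Health and Care Excellence",
    "Ordnance Survey",
]
exact, near, missing = compare_names(ours, theirs)
```

The "near" bucket is the one that needs a human decision; the other two can be handled mechanically.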

from publicbodies.

andylolz commented on June 4, 2024

Ah, sorry! The data clearly comes from What Do They Know. It would still be good to see the scraper/converter code, so this could perhaps be altered or added to.

In general, I love the idea of putting data into version control. I don’t know if it’s common practice to store it alongside the tool used to collect it, but that seems like a prudent thing to do.


rufuspollock commented on June 4, 2024

@andylolz to be clear, the original source data came from What Do They Know, but we aren't confined to that: there are public bodies we would want to list that aren't in What Do They Know (because they may not be FOIable).

So I'd definitely suggest merging your changes into the GB csv (adding the script to scripts and a note in the README).

Regarding pulling more regularly from WDTK, that's probably a new ticket!


rufuspollock commented on June 4, 2024

@davidread - would you be interested in contributing here?


davidread commented on June 4, 2024

This little script is good. Humorous to see results like this:

(0.9215686274509803,
u'National Institute for Health and Care Excellence',
u'National Institute for Health and Clinical Excellence'),

However, I think it would be better to use Nomenklatura to do the matching, rather than running a one-off Levenshtein comparison and then forgetting the manual decisions made. I'll take a look if I get a chance.
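For context on how a score like the one above can arise: the thread does not show how the script normalizes its Levenshtein comparison, so the following is only a sketch under one common assumption, namely an edit distance in which a substitution counts as a deletion plus an insertion, scaled by the combined length of both strings. The function names `indel_distance` and `similarity` are made up for this example.

```python
def indel_distance(a, b):
    """Edit distance using insertions and deletions only (a substitution costs 2)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Matching characters cost nothing.
                curr.append(prev[j - 1])
            else:
                # Either delete `ca` from a, or insert `cb` into a.
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for no shared characters."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total


score = similarity(
    "National Institute for Health and Care Excellence",
    "National Institute for Health and Clinical Excellence",
)
```

Under this assumed normalization the two names above differ by 8 single-character edits out of a combined length of 102, giving 94/102, or roughly 0.92. A score that high still needs a manual decision about whether the two names refer to the same body, which is the part a one-off script does not record.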


rufuspollock commented on June 4, 2024

@davidread agreed re nomenklatura - good connection here with #2 (reconciliation support via nomenklatura ...)


davidread commented on June 4, 2024

Just noting another list of UK public bodies to reconcile with and track: http://data.gov.uk/dataset/iati-organisation-identifier-for-uk-government-bodies


bill-anderson commented on June 4, 2024

@davidread @rgrp this list (created by DFID for IATI reporting) is symptomatic of the problem of having no globally consistent methodology for identifying public bodies. This is an issue that the currently-being-born Joined-up Data Alliance (https://docs.google.com/document/d/1ZcBkxKaY9x31t4LH76yJ7dFMA0uyqyJ9Q-tk3FIE7UE/edit) will be tackling. It would be good to link up the pragmatic approach of public-bodies with the standards approach of JDA.

