misclassified cases about bodegha (open)

tommens commented on July 20, 2024
misclassified cases

from bodegha.

Comments (8)

mehdigolzadeh commented on July 20, 2024

I didn't understand the second option, and the third option can only improve the result for that specific user. The first one, however, seems feasible. Could you please elaborate on this idea a bit more?

AlexandreDecan commented on July 20, 2024

I think it's better to ask users (or to have a semi-automatic way to do it) to report those misclassified cases, so that we can add them to the training set and release a new version of the tool with an improved model.

From a reusability point of view, it's better to improve the model than to maintain a list of "edge cases" whose target class overrides the model's prediction.

tommens commented on July 20, 2024

We already mention in the README that users can report misclassified cases to us if they find any. To have a semi-automatic way, would it be feasible to add support in the tool itself to report misclassified cases to us? I do not immediately see how to.

AlexandreDecan commented on July 20, 2024

Since an API key has to be provided, could we add a subcommand to report invalid cases (e.g., entering the usernames that are misclassified in a given repository) that automatically opens an issue in this repository with them?
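A minimal sketch of how the existing API key could be reused to open such an issue, assuming the GitHub REST API "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`) and only the Python standard library. The function names `build_issue_request` and `open_issue` are hypothetical and not part of bodegha; the target repository is passed in as `owner/name` rather than hard-coded.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_issue_request(token: str, repo: str, title: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that opens an issue in `repo`.

    `repo` is "owner/name"; `token` is the GitHub API key the user already
    passes to the tool for fetching comments (hypothetical reuse).
    """
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{repo}/issues",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def open_issue(request: urllib.request.Request) -> str:
    """Send the request and return the web URL of the created issue."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]
```

Separating request construction from sending keeps the payload testable offline; only `open_issue` touches the network.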

I'm not convinced we need something like this, since we can simply ask/expect/hope users to report misclassified cases "manually".

tommens commented on July 20, 2024

It is probably too optimistic to think that people will report misclassified cases manually just because it is mentioned in our README.
I think that any support that helps automate the process will reduce the workload, both for users who want to report a misclassification and for us to keep track of reported misclassifications. Therefore, if it is possible and not too difficult to implement such a reporting scheme as part of the tool, one that automatically opens an issue on the bodega GitHub repository, that could be a nice solution.

AlexandreDecan commented on July 20, 2024

Any built-in way to report misclassified cases as GitHub issues will require a second execution of the tool (since the tool is not interactive, and it won't be, given that we want to keep it a reusable CLI). Why is a second execution needed? Because we must be able to reproduce the example, so we need the exact set of comments that were considered by the model (or, at least, the exact set of features that were computed for that specific case).

One "easy" possibility would be to add an extra "--report" flag accepting a list of misclassified accounts. For example, if the tool was run with bodega request/request --key <my token> --start-date 01-01-2017 --verbose (example taken from the README), one could run bodega request/request --key <my token> --start-date 01-01-2017 --verbose --report greenkeeperio-bot hktalent to automatically report these two accounts as misclassified. This should create an issue in the bodega repository with enough information for each account so that we can check and confirm the misclassification. I believe we only need the version of bodega that was used and, for each account (accounts do NOT have to be provided), a list of the considered comments. That way, we can download them, compute the features, predict the class, add the example with the "opposite" class to the training set, rebuild the model, and release a new version of bodega.
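A rough sketch of the proposed "--report" flag, assuming the CLI is built with Python's `argparse`. The option set below only mirrors the README example quoted above; the `build_parser` and `build_report_body` helpers are hypothetical. Following the remark that account names need not be included, the issue body lists the bodega version and, for each reported case, only the comments that were considered.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the README example plus a --report flag."""
    parser = argparse.ArgumentParser(prog="bodega")
    parser.add_argument("repositories", nargs="+", help="owner/name of each repository")
    parser.add_argument("--key", required=True, help="GitHub API key")
    parser.add_argument("--start-date", help="only consider comments after this date")
    parser.add_argument("--verbose", action="store_true")
    # Proposed flag: accounts the user believes were misclassified.
    parser.add_argument("--report", nargs="+", metavar="ACCOUNT", default=[])
    return parser


def build_report_body(version: str, comments_per_case: list[list[str]]) -> str:
    """Format the issue body: tool version, then the comments used for each case.

    Cases are numbered instead of named, so account names are not disclosed.
    """
    lines = [f"bodega version: {version}", ""]
    for number, comments in enumerate(comments_per_case, start=1):
        lines.append(f"Case {number}:")
        lines.extend(f"- {comment}" for comment in comments)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

With this parser, the command from the README example followed by `--report greenkeeperio-bot hktalent` yields `args.report == ["greenkeeperio-bot", "hktalent"]`, while omitting the flag leaves it empty, so the normal run is unaffected.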

Btw, doing all of this manually could be very time-consuming for us. If that turns out to be the case, we can still try to implement all these steps as part of a CI pipeline (e.g., let's dream of a bot we would develop that downloads the comments, computes the features and the prediction, and posts all of this in the corresponding issue, so that one of us can "confirm" the misclassified case by putting a "confirmed" label on the issue; the CI then rebuilds the model and pushes it to the repository, with an incremented version of bodega and a tag for the new release). But honestly, given the work all of this represents, I think it's too much for a "research tool" ;)
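To make the "confirmed" step of that imagined CI concrete, here is a small sketch of what it could do once the label is present: add the example with the opposite class to the training set and bump the version. It assumes the two classes are named "Bot" and "Human" and that versions follow a `major.minor.patch` scheme; feature names, `process_confirmed_issue`, and the other helpers are hypothetical, and actually retraining and releasing the model is left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed class names; a confirmed misclassification flips the prediction.
OPPOSITE = {"Bot": "Human", "Human": "Bot"}


@dataclass
class TrainingExample:
    features: dict[str, float]
    label: str


def bump_patch(version: str) -> str:
    """Increment the last component of a major.minor.patch version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def process_confirmed_issue(
    labels: set[str],
    features: dict[str, float],
    predicted: str,
    training_set: list[TrainingExample],
    version: str,
) -> str:
    """Return the version to release; extend the training set only if confirmed."""
    if "confirmed" not in labels:
        return version  # nothing to do until a maintainer confirms the case
    training_set.append(TrainingExample(features=dict(features), label=OPPOSITE[predicted]))
    return bump_patch(version)  # the CI would then retrain, tag, and release
```

Gating everything on the label keeps a human in the loop, which matches the idea that one of the maintainers confirms each case before the model changes.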

AlexandreDecan commented on July 20, 2024

Note that we could ask a student to do this (e.g., as an M1 project).

tommens commented on July 20, 2024

Yes, this looks like an interesting master's student project to pursue. Let's try that. If you want, you can close this issue for now (or leave it open until we have a working implementation, but this can take quite a while).
