Comments (8)
I didn't understand the second option, and the third option can only improve the result for that specific user. The first one, however, seems feasible. Could you please elaborate on this idea a bit more?
from bodegha.
I think it's better to ask users (or to have a semi-automatic way to do it) to report those misclassified cases, so we can add them to the training set and release a new version of the tool with an improved model.
From a reusability point of view, it's better to improve the model than to maintain a list of "edge cases" whose target class overrides the one of the model.
We already mention in the README that users can report misclassified cases to us if they find any. As for a semi-automatic way: would it be feasible to add support in the tool itself for reporting misclassified cases to us? I do not immediately see how.
Since an API key has to be provided anyway, we could add a subcommand to report invalid cases (e.g., enter usernames that are misclassified in a given repository) that automatically opens an issue in this repository with them?
I'm not convinced we need something like this, since we can simply ask/expect/hope users to report misclassified cases "manually".
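The subcommand proposed above could be quite thin. Here is a minimal sketch using the GitHub REST API "create an issue" endpoint and only the standard library; the target repository slug and all function names are hypothetical, not BoDeGHa's actual code:

```python
import json
import urllib.request

# Hypothetical target repository that would receive the reports.
REPORT_REPO = "mehdigolzadeh/BoDeGHa"

def build_report_issue(version, repository, accounts):
    """Assemble the issue title and body for a misclassification report."""
    lines = [f"BoDeGHa version: {version}",
             f"Analyzed repository: {repository}",
             ""]
    lines += [f"- {account}" for account in accounts]
    return {"title": f"Misclassification report for {repository}",
            "body": "\n".join(lines)}

def open_report_issue(api_key, version, repository, accounts):
    """POST the issue to the GitHub REST API, reusing the user's API key."""
    payload = json.dumps(build_report_issue(version, repository, accounts)).encode()
    request = urllib.request.Request(
        f"https://api.github.com/repos/{REPORT_REPO}/issues",
        data=payload,
        headers={"Authorization": f"token {api_key}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]
```

Since the user's token is already on the command line, no extra authentication step would be needed; the open question is what to put in the issue body, which the later comments address.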
It is probably too optimistic to think that people will report misclassified cases manually just because it is mentioned in our README.
I think that any support that helps automate the process will reduce the workload, both for the user who wants to report the misclassification and for us when keeping track of reported misclassifications. Therefore, if it is possible and not too difficult to implement such a reporting scheme as part of the tool, one that automatically opens an issue on the bodega GitHub repository, that could be a nice solution.
Any built-in possibility to report misclassified cases as GitHub issues will require a second execution of the tool (since it is not interactive, and it won't be, given we want to keep it as a reusable CLI). Why is a second execution needed? Because we should be able to reproduce the example, so we need the exact set of comments that were considered by the model (or, at least, the exact set of features that were computed for that specific case).
One "easy" possibility would be to add an extra "--report" flag accepting a list of accounts that are misclassified. E.g., if the tool was run with
bodega request/request --key <my token> --start-date 01-01-2017 --verbose
(example taken from the README), one could use
bodega request/request --key <my token> --start-date 01-01-2017 --verbose --report greenkeeperio-bot hktalent
to automatically report these two accounts as misclassified. This should create an issue in the bodega repository with enough information for each account so that we can check and confirm the misclassification. I believe we only need the version of bodega that was used and, for each account (accounts do NOT have to be provided), a list of the considered comments. That way, we can download them, compute the features, predict the class, add this example with the "opposite" class to the training set, rebuild the model, and release a new version of bodega.
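Such a flag could be wired in with argparse. A minimal sketch, where the positional argument and option names mirror the README invocation quoted above but are otherwise assumptions about BoDeGHa's real CLI:

```python
import argparse

# Minimal parser mirroring the example invocation; BoDeGHa's actual
# option handling may differ (these names are assumptions).
parser = argparse.ArgumentParser(prog="bodega")
parser.add_argument("repository")
parser.add_argument("--key", required=True)
parser.add_argument("--start-date")
parser.add_argument("--verbose", action="store_true")
# New flag: one or more account names to report as misclassified.
parser.add_argument("--report", nargs="+", default=[], metavar="ACCOUNT")

args = parser.parse_args([
    "request/request", "--key", "<my token>", "--start-date", "01-01-2017",
    "--verbose", "--report", "greenkeeperio-bot", "hktalent",
])

if args.report:
    # Second execution: re-classify just these accounts, keep the exact
    # set of comments the model saw, and include them in the opened issue.
    print(f"Reporting {len(args.report)} account(s): {', '.join(args.report)}")
```

With nargs="+", everything after --report is collected into a list, so the flag composes with an otherwise unchanged invocation.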
Btw, doing all of this manually could be very time-consuming for us, but if that's the case, we can still try to implement all these steps as part of a CI (e.g., let's dream of a bot we would develop that downloads the comments, computes the features and the prediction, and posts all of this in the corresponding issue, so that one of us can "confirm" the misclassified case by putting a "confirmed" label on the issue; the CI then rebuilds the model and pushes it to the repository, with an incremented version of bodega and a tag for the new release). But honestly, given the work all of this represents, I think it's too much for a "research tool" ;)
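For what it's worth, the "confirm by label" gate of such a CI would be small: the GitHub REST API returns an issue's labels as objects with a "name" field, so the bot only needs to look for "confirmed" before retraining. A sketch under the same assumptions as before (repository slug and function names are hypothetical):

```python
import json
import urllib.request

REPORT_REPO = "mehdigolzadeh/BoDeGHa"  # hypothetical target repository

def is_confirmed(labels):
    """True once a maintainer has put the "confirmed" label on the report."""
    return any(label["name"] == "confirmed" for label in labels)

def should_retrain(api_key, issue_number):
    """Fetch a report issue's labels and tell the CI whether to retrain."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{REPORT_REPO}/issues/{issue_number}",
        headers={"Authorization": f"token {api_key}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return is_confirmed(json.load(response)["labels"])
```

The expensive parts (feature computation, retraining, tagging a release) would sit behind this check, so unconfirmed or spurious reports cost nothing.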
Notice that we can ask a student to do this (e.g., as an M1 project).
Yes, this looks like an interesting master student project to pursue. Let's try that. If you want, you can close this issue for now (or leave it open until we have a working implementation, but this can take quite a while).