Comments (6)

mehdigolzadeh commented on July 20, 2024

Dear Thomas,
Thanks for considering our bot identification tool. I will have a look at the reported bugs as soon as possible and provide you with updates.


tommens commented on July 20, 2024

@mehdigolzadeh, to start our analysis of this interesting and detailed issue report, could you share with the team internally (not through GitHub, obviously, for the usual privacy considerations that Thomas also raised) the output of running the latest version of BoDeGHa on the owncloud/core project? I suggest we arrange a group meeting sometime next week so that we can look at all of this in detail together. Let's take the rest of the discussion offline, so as not to clutter this GitHub thread, until we come up with clarifications.


mehdigolzadeh commented on July 20, 2024

Dear @bockthom,

We would like to thank you again for considering our tool for use in your project and for reporting these issues. We ran BoDeGHa on the GitHub project you mentioned (i.e., owncloud/core) and investigated the issues you raised. Our answers to the reported issues are as follows:

(1) Regarding the misclassifications: To start, it is expected that the tool cannot classify everything correctly, as it is based on a classification model that is not 100% accurate; for details about its accuracy on a ground-truth dataset, we refer to the accompanying research publication mentioned in the README file. Some projects may see more misclassifications than others, depending on the types of accounts and comments (the reasons for misclassifications are again discussed in the accompanying research article).
After running the tool on owncloud/core, we identified 18 bots, of which 9 indeed corresponded to misclassified humans. The main reasons for these misclassifications are a combination of a low number of comments and the use of comment templates. We do not see any short-term solution for this issue; in the longer term, it would require developing a template-aware classification model, which is less trivial than it seems. We are currently thinking about future solutions to reduce such misclassifications.

(2) Regarding the maxComments parameter: BoDeGHa uses a fixed cutoff of 100 comments because the underlying classification model was evaluated on a ground-truth dataset that was manually rated for no more than 100 comments per account. Since we did not study the model's performance beyond 100 comments, we preferred not to allow this in the tool. Similarly, we set a minimum threshold of 10 comments per account, since the classification model only starts to show good performance from 10 comments onwards. (For details, we again refer to the accompanying paper.)
We understand that, from a user's perspective, this may be confusing, and we realise that the README file of the BoDeGHa repository was not very specific about this. To address the issue, we will clarify the limitations imposed by the model in the README file.
At the same time, we will upload a new version of the tool that allows considering more than 100 comments per account. In that case, the user needs to be aware that there is no guarantee on performance or accuracy, given that the underlying classification model has never been evaluated on more than 100 comments. (Intuitively, we believe the tool should continue to work fine beyond 100 comments, but execution time may increase significantly as the number of comments to consider grows. It will be up to the user to decide whether this is acceptable.)
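
For illustration, the kind of opt-in we have in mind could look like the following sketch. The flag names, defaults, and warning are illustrative assumptions, not BoDeGHa's actual command-line interface:

```python
# Illustrative sketch only; --max-comments/--min-comments are assumed
# flag names, not BoDeGHa's actual interface.
import argparse

parser = argparse.ArgumentParser(description="Bot identification (sketch)")
parser.add_argument(
    "--max-comments", type=int, default=100,
    help="Maximum number of comments per account to analyse.")
parser.add_argument(
    "--min-comments", type=int, default=10,
    help="Accounts with fewer comments are left unclassified; the model "
         "only shows good performance from 10 comments onwards.")
args = parser.parse_args()

if args.max_comments > 100:
    # The classification model was only evaluated on up to 100 comments
    # per account, so results beyond that come without any guarantee.
    print("Warning: more than 100 comments per account requested; "
          "model accuracy has not been evaluated in this range.")
```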

(3) The third issue relates to restrictions imposed by GitHub's GraphQL API, which does not allow retrieving more than 100 results per request. This is not a limitation of our tool itself, but of that API. There is nothing we can do about it, which is why that value of 100 was hard-coded. (Note that this value of 100 is different from the value of 100 in point (2) above, which is probably one of the sources of your confusion.)
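
To illustrate the difference: the 100 in point (3) is GitHub's per-request page size, and it is worked around by cursor-based pagination rather than by a single larger request. A minimal, self-contained sketch of such pagination (not BoDeGHa's actual code; the function name is ours) could look as follows:

```python
# Sketch of cursor-based pagination against GitHub's GraphQL API; not
# BoDeGHa's actual code. GitHub caps `first` at 100 nodes per request,
# so retrieving more items requires repeated requests with a cursor.
import requests

QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { author { login } }
    }
  }
}
"""

def fetch_all_issues(owner, name, token):
    issues, cursor = [], None
    while True:
        resp = requests.post(
            "https://api.github.com/graphql",
            json={"query": QUERY,
                  "variables": {"owner": owner, "name": name,
                                "cursor": cursor}},
            headers={"Authorization": f"bearer {token}"},
        )
        resp.raise_for_status()
        page = resp.json()["data"]["repository"]["issues"]
        issues.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return issues
        cursor = page["pageInfo"]["endCursor"]
```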

We hope that we have clarified the issues. Stay tuned for an update of BoDeGHa coming up soon, especially to address point (2) above. We hope that our tool will play a beneficial role in your projects, and we always welcome feedback or specific usage scenarios describing why and how BoDeGHa is being used in practice. This may allow us to further adapt the tool (or its underlying classification model) to user needs.


bockthom commented on July 20, 2024

Dear @mehdigolzadeh, thanks for your fast replies and investigations. Let me just add a few comments:

(1) I am aware that the tool is not 100% accurate. However, since half of the detected bots in the investigated project are not bots, the tool's output (without any manual investigation afterwards) does not look reliable, which was the reason for trying other configuration parameters, just to see whether the output would then be closer to my expectations. No offense, I just wanted to tweak the results at my own pace. πŸ˜‰ I had already read your paper (to be precise, I found the tool through the paper, which I stumbled upon during literature research on bot detection), so I am aware of potential misclassifications and their potential reasons. Nevertheless, I think there is room for improvement, and considering comment templates (or simply ignoring comments that make use of such a template) could be very beneficial for your classification model. Just take my comments as motivation for future work πŸ˜‰
As there are currently that many misclassifications caused by comment templates, as the one example showed, it seems that one needs a semi-manual approach when using your results, i.e., manually checking for each classified bot whether it really is a bot and, if not, manually removing it from the list of bots.
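
For illustration, such a post-processing step could look like the following sketch (the function and the account names are made up):

```python
# Made-up example of the semi-manual filtering step described above:
# remove manually verified humans from the tool's list of predicted bots.
def filter_predictions(predicted_bots, confirmed_humans):
    return [login for login in predicted_bots
            if login not in confirmed_humans]

predicted_bots = ["alice", "ci-reporter", "dependabot[bot]"]  # tool output
confirmed_humans = {"alice"}                                  # manual review
print(filter_predictions(predicted_bots, confirmed_humans))
# -> ['ci-reporter', 'dependabot[bot]']
```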

(2) I know that you evaluated the performance on at most 100 comments (at least, from what you stated in the paper), but I would just give it a try with more comments πŸ˜„ If this reduces the number of misclassified humans, I would be happy with it. Even if you don't evaluate your tool on more than 100 comments and state this limitation in the README, I would be glad if you could provide the possibility to use more than 100 comments (at one's own risk), just to be able to try it, without any guarantees. So, I am looking forward to that. And yes, the execution time will increase, but I am aware of that and it is no bother for me.

(3) I had already assumed that this could be an API restriction, but, as you mentioned, I am/was confused that the hard-coded value of 100 is different from the 100 in point (2). So I hope it is nevertheless possible to use more than 100 comments in some way.

Thanks a lot for clarifying the issues. I am looking forward to your updates regarding point (2). (And since I think your tool could be very beneficial, I also hope that you can improve your approach regarding comment templates in the long run.)


tommens commented on July 20, 2024

Thanks @bockthom for your response. Concerning the need to take comment templates into account: one of the problems we have encountered, and consequently one of the reasons why we have not integrated this, is that it is very difficult to know whether a project is using templates and, if so, which templates are being used. Moreover, if templates are being used, there is no historical record of this. As a result, it becomes difficult to take templates into account during comment analysis. There are many project-specific ways in which templates can be used: GitHub's own template mechanism, an external service or tool, and so on; the use of templates is not standardised. I am not sure what the best way to deal with this problem is. A semi-manual approach could be an option, but it would be very effort-intensive. Allowing comment templates to be provided as input to the tool could be another option, but it is difficult for us to evaluate whether this would lead to better results for our classification model. In fact, we would need a kind of ground-truth dataset of templates used by GitHub projects for their issue and PR comments. Based on such a dataset, we could see what can be done. How to obtain such a dataset (which should be large enough) is another question.
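
To make the "templates as input" option more concrete, a sketch of what we have in mind could be the following: discard comments that closely resemble a user-supplied template before classification. The similarity measure and the 0.8 threshold here are arbitrary assumptions, not an evaluated design:

```python
# Hypothetical sketch: drop comments that closely match a user-supplied
# template before classification. The 0.8 threshold is an assumption.
from difflib import SequenceMatcher

def matches_template(comment, templates, threshold=0.8):
    return any(SequenceMatcher(None, comment, t).ratio() >= threshold
               for t in templates)

def strip_templated(comments, templates):
    return [c for c in comments if not matches_template(c, templates)]
```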


bockthom commented on July 20, 2024

Thanks for your insights; I see that taking comment templates into account is far more complex than I had expected. I had only thought about the templates stored in a ".github/" directory of the project; I was not aware of other, non-standardised templates, and I also disregarded template changes during a project's evolution. I completely agree that all of this has to be handled properly, which is not easy to achieve and needs a lot of further investigation. Maybe it would be easier to completely ignore the first message of a pull request or issue, as templates usually affect the initial comment (however, I am not sure whether this is actually true). Nevertheless, if I remember correctly, this would conflict with your performance evaluation regarding the consideration of empty comments, which would be neglected when ignoring the initial comment. Anyway, these are just my thoughts, without deeper knowledge of those templates. Thanks for the detailed responses.
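
For what it's worth, the "ignore the initial comment" heuristic I have in mind would be as simple as the following sketch (the (thread_id, body) input format is an assumption on my side):

```python
# Sketch of the heuristic described above: drop the opening message of
# each issue/PR thread, since that is where templates usually appear.
# The (thread_id, body) input format is an assumption.
def drop_initial_comments(comments):
    """comments: iterable of (thread_id, body) in chronological order."""
    seen, kept = set(), []
    for thread_id, body in comments:
        if thread_id in seen:
            kept.append((thread_id, body))
        else:
            seen.add(thread_id)  # skip each thread's opening message
    return kept
```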

