Comments (4)
https://github.com/natalymr/gcm/blob/master/naive_bayes/naive_bayes.ipynb
All tokens except separators:

| train data | test data | classification accuracy | BLEU |
|---|---|---|---|
| intellij | intellij | 0.97 | 0.03794679 |
| intellij | aurora | 0.91 | 0.01468671 |
| aurora | aurora | 0.95 | 0.01288976 |
| intellij + aurora | aurora | 0.95 | 0.01288976 |

Only identifiers:

| train data | test data | classification accuracy | BLEU |
|---|---|---|---|
| intellij | intellij | 0.98 | 0.0381872 |
| intellij | aurora | 0.94 | 0.0145752 |
| aurora | aurora | 0.96 | 0.01311591 |
| intellij + aurora | aurora | 0.96 | 0.01311591 |
from gcm.
Amount of data in each dataset:
intellij
- all tokens except separators
aurora
- all tokens except separators (`[`, `]`, `(`, `)`, `{`, `}`, `;`, `,`)
intellij
- identifiers only
aurora
- identifiers only
from gcm.
- "smth was changed" does not seem to be the best comment w.r.t. BLEU score. Maybe we should leave the message empty or construct a phrase that would be similar (in terms of BLEU) to as many commits as possible. The same idea applies to the other labels.
- Naive Bayes was supposed to be a simple baseline, but setting `min_df` and `max_df` to non-default values is still a good idea.
- What are the BLEU score values in brackets?
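
To make the `min_df` / `max_df` suggestion concrete, here is a minimal standard-library sketch of what document-frequency cutoffs do to a vocabulary. In scikit-learn's `CountVectorizer` these parameters default to `1` and `1.0` and accept either an absolute document count (int) or a fraction of the corpus (float). The helper name `filter_vocabulary` and the sample tokens are hypothetical, and the vectorizer settings actually used in the notebook are not shown in this thread.

```python
from collections import Counter

def filter_vocabulary(docs, min_df=1, max_df=1.0):
    """Keep tokens whose document frequency lies within [min_df, max_df].

    Integers are absolute document counts, floats are fractions of the corpus.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which a token occurs at least once.
    df = Counter(token for doc in docs for token in set(doc))
    lo = min_df if isinstance(min_df, int) else min_df * n_docs
    hi = max_df if isinstance(max_df, int) else max_df * n_docs
    return {tok for tok, count in df.items() if lo <= count <= hi}

# Hypothetical tokenized diffs, for illustration only.
docs = [
    ["public", "void", "testFoo"],
    ["public", "int", "getBar"],
    ["public", "void", "setBar"],
    ["public", "String", "testFoo"],
]
# Drop tokens seen in fewer than 2 documents or in more than 75% of them.
vocab = filter_vocabulary(docs, min_df=2, max_df=0.75)
```

With these cutoffs the ubiquitous token `public` and the tokens that occur only once are dropped, which is the kind of pruning that can help a bag-of-words Naive Bayes baseline.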
from gcm.
- Yes, understood, I will redo it.
- To be honest, I did not understand which values exactly you mean.
- Those are the values we get if only unigrams are taken into account when computing the BLEU score.
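
For reference, a unigram-only BLEU score can be sketched as clipped unigram precision multiplied by the brevity penalty. This is a sentence-level, single-reference sketch without smoothing; the function name `unigram_bleu` and the example messages are assumptions, and the thread does not say how the reported scores were actually computed.

```python
import math
from collections import Counter

def unigram_bleu(candidate, reference):
    """BLEU restricted to unigrams: clipped precision times brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate token's count by its count in the reference.
    overlap = sum(min(c, ref_counts[tok]) for tok, c in Counter(cand).items())
    precision = overlap / len(cand)
    # The brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# Hypothetical reference message, for illustration only.
score = unigram_bleu("smth was changed", "test was changed")
short_score = unigram_bleu("changed", "test was changed")
```

Averaging such a score over all reference messages for several fixed candidate phrases is one way to compare "smth was changed" against an empty message or another default label.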
from gcm.