Comments (8)
http://www.sciencedirect.com/science/article/pii/S0164121216300528
from pits_lda.
amrit... is the paper all done? like do that before moving on
t
from pits_lda.
I am on it prof
from pits_lda.
@timm Here is the result of using LDA to automatically label the documents and then using a learner.
We can't reproduce the results from the paper, because:
- Mylyn, Eclipse, Firefox, and NetBeans projects: neither the preprocessed datasets nor the exact preprocessing steps are available. They followed some naming conventions which they haven't described.
Experiment:
- Took this as an example. http://dl.acm.org/citation.cfm?id=2390074
- After doing LDA, they labeled each document with its top-weighted topic.
- Each document then has a label 1, 2, 3, ...
- Selected one target label as "yes" and treated the rest as "no", which converts this into a binary classification.
- 5 by 5 cross-validation, hashing trick with 10k features, SVM classifier.
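The labeling and feature steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for these results: the function names are hypothetical, the topic weights are assumed to come from an already-fitted LDA model, and `zlib.crc32` is only a stand-in hash function. The SVM and the cross-validation loop are left out.

```python
import zlib

N_FEATURES = 10_000  # "10k features" from the experiment description


def top_topic_labels(doc_topic_weights):
    """Label each document with its highest-weighted topic (1-based)."""
    return [max(range(len(w)), key=w.__getitem__) + 1 for w in doc_topic_weights]


def binarize(labels, target):
    """Turn multi-topic labels into a yes/no problem for one target topic."""
    return ["yes" if lab == target else "no" for lab in labels]


def hash_features(tokens, n_features=N_FEATURES):
    """Hashing trick: count tokens in a fixed-size sparse vector (index -> count)."""
    vec = {}
    for tok in tokens:
        # crc32 is an assumption here; libraries use other hash functions.
        idx = zlib.crc32(tok.encode("utf-8")) % n_features
        vec[idx] = vec.get(idx, 0) + 1
    return vec


# Hypothetical topic weights for three documents over three topics.
weights = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
labels = top_topic_labels(weights)
binary = binarize(labels, target=3)
features = hash_features("crash on startup crash".split())
```

The sparse dictionaries returned by `hash_features` would then be fed, together with the yes/no labels, to whichever learner is being evaluated.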
Conclusion
- The baseline SVM didn't perform well. This might be because of the tags we used to label the Stackexchange websites. It could affect all the previous results we showed to LN: the numbers will change, and the conclusions may or may not stay the same.
- LDA is able to correctly label the documents.
Results:
from pits_lda.
am now lost in the details.
please bust fscore into precision and recall
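For reference, the split being asked for can be computed as below. This is a small sketch using the standard definitions for a binary yes/no task; the function name and the example labels are hypothetical and are not taken from the experiment.

```python
def precision_recall_f1(actual, predicted, positive="yes"):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes"]
precision, recall, f1 = precision_recall_f1(actual, predicted)
```

Reporting the two numbers separately shows whether a given F-score comes from missed positives (low recall) or from false alarms (low precision).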
this looks like no win with tuning... right?
please write this up as a 2-4 page pdf doc. define all your terms. don't worry about the start-up sections (motivation, background)
but what is your justification for "baseline"? what papers use "baseline"?
t
from pits_lda.
Yes, no win with tuning, but the result numbers we showed to LN might change. The conclusions may or may not stay the same.
My baseline results are from our BIGDSE paper, where we just used the hashing trick with an SVM as the baseline.
I will compile all these terms and my thoughts into a white paper soon.
from pits_lda.
fyi- you may need to tune (1) the feature extraction (of the topics) AND (2) the learner to get improved performance.
right now ur just tuning (1) right?
without doing (2), what you could do is show conclusion instability (a venn diagram of documents classified XYZ via untuned feature extraction, repeated 10 times on 10 different data orderings).
with (2) you might get the kinds of improvements wei reported
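One possible reading of the conclusion-instability check is sketched below: relabel the same documents under 10 shuffled orderings and compare which documents land in a given class each time. Everything here is an assumption for illustration. `label_fn` stands for any order-sensitive labeler (for example, an untuned LDA run), and `toy_label_fn` is a made-up stand-in that only mimics order sensitivity.

```python
import random


def instability(doc_ids, label_fn, target, runs=10, seed=0):
    """Relabel under `runs` shuffled orderings and compare the target class."""
    rng = random.Random(seed)
    per_run = []
    for _ in range(runs):
        order = list(doc_ids)
        rng.shuffle(order)
        labels = label_fn(order)  # dict: doc id -> label
        per_run.append({d for d, lab in labels.items() if lab == target})
    always = set.intersection(*per_run)  # core region of the Venn diagram
    ever = set.union(*per_run)           # everything covered by any run
    stability = len(always) / len(ever) if ever else 1.0
    return always, ever, stability


def toy_label_fn(ordered_ids):
    """Hypothetical labeler: docs 0-5 are always X, the rest depend on position."""
    half = len(ordered_ids) // 2
    return {
        d: "X" if d < 6 or pos < half else "Y"
        for pos, d in enumerate(ordered_ids)
    }


always, ever, stability = instability(range(10), toy_label_fn, "X")
```

A stability near 1 means the orderings agree on which documents belong to the class. A low value means the labeling changes from run to run, which is the instability a Venn diagram would show.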
from pits_lda.
- I did the tuning for (1) and then tried labeling the documents with topics X, Y, Z. The original dataset (the Stackexchange websites, or the so-called manny dataset generator), on the other hand, was labeled with tags. Once I had labeled the documents using LDA, (2) the feature extraction used was the feature hasher (hashing trick), followed by a learner.
- My conclusion is that, with or without tuning, both performed better than the baseline results. So this has to do with the dataset (wrong data) which we used during LN times.
- Following your suggestion, I will try (1) feature extraction of the topics and then (2) a learner.
- I didn't understand what you meant about the Venn diagram. From the tuned results, I will have documents classified as X1, Y1, Z1, ... and from the untuned results I will have documents classified as X2, Y2, Z2. What do you mean by it?
from pits_lda.