Classification using LDA about pits_lda HOT 8 OPEN

amritbhanu commented on July 2, 2024

Classification using LDA

from pits_lda.

Comments (8)

amritbhanu commented on July 2, 2024

http://www.sciencedirect.com/science/article/pii/S0164121216300528

timm commented on July 2, 2024

amrit... is the paper all done? like do that before moving on

amritbhanu commented on July 2, 2024

I am on it prof

amritbhanu commented on July 2, 2024

@timm Here is the result of using LDA to automatically label the documents and then training a learner on those labels.

We can't reproduce the results from the paper, because:

They used the Mylyn, Eclipse, Firefox, and NetBeans projects. The preprocessed datasets are not available, nor are the exact preprocessing steps given. They followed some naming conventions which they haven't described.

Experiment:

Took this as an example: http://dl.acm.org/citation.cfm?id=2390074
After running LDA, they labeled each document with its top-weighted topic.
Each document gets a label 1, 2, 3, ...
Selected a target label (yes) and treated the rest as no, which converts it into binary classification.
5 by 5 cross-validation. Hashing trick with 10k features. SVM classifier.
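
The labeling steps above can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code used in the experiment: the `weights` matrix is made-up LDA output, the function names are hypothetical, and "5 by 5" is assumed here to mean 5 repeats of a shuffled 5-fold split.

```python
import random

def top_topic_labels(doc_topic_weights):
    """Label each document with the index of its highest-weighted topic."""
    return [max(range(len(w)), key=w.__getitem__) for w in doc_topic_weights]

def binarize(labels, target):
    """One-vs-rest: the target topic becomes 'yes', every other topic 'no'."""
    return ["yes" if lab == target else "no" for lab in labels]

def repeated_kfold(n_docs, repeats=5, folds=5, seed=0):
    """Yield (train_idx, test_idx) for `repeats` shuffles of a `folds`-way split."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_docs))
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]          # every folds-th document, offset f
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test

# Hypothetical LDA output: one row of topic weights per document.
weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
labels = top_topic_labels(weights)      # topic index per document
binary = binarize(labels, target=0)     # topic 0 is the assumed target
splits = list(repeated_kfold(len(weights)))
```

Each `(train, test)` pair would then feed the hashed features and binary labels to the SVM; that part is not shown here.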

Conclusion

The baseline SVM didn't perform well. This might be because of the tags which we used to label the Stack Exchange websites. This can affect all the previous results which we showed to LN: basically the numbers will change, and the conclusions may or may not stay the same.
LDA is able to correctly label the documents.

Results:

timm commented on July 2, 2024

am now lost in the details.

please bust fscore into precision and recall

this looks like no win with tuning... right?

please write this up as a 2-4 page pdf doc. define all your terms. don't worry about the start-up sections (motivation, background)

but what is your justification for "baseline"? what papers use "baseline"?
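
For reference, the F-score split asked for in this comment can be computed as in the sketch below. The `actual` and `predicted` lists are made-up toy data, not results from this thread, and the function name is hypothetical.

```python
def precision_recall_f1(actual, predicted, positive="yes"):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy data: the learner finds only one of three 'yes' documents,
# but never raises a false alarm.
actual    = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "no",  "no",  "no", "no", "no"]
p, r, f = precision_recall_f1(actual, predicted)
```

On this toy data precision is perfect while recall is one third, a gap that the single F-score of 0.5 hides.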

amritbhanu commented on July 2, 2024

Yes, no win with tuning, but the result numbers we showed to LN might change. The conclusions may or may not stay the same.

My baseline results are from our BIGDSE paper, where we just used the hashing trick with an SVM as the baseline.

I will compile all these terms and my thoughts into a white paper soon.
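
As a rough illustration of the hashing trick mentioned for the baseline, here is a stdlib-only sketch. It is an assumption-laden stand-in, not the BIGDSE implementation: `hash_features` is a hypothetical name, CRC32 is just one convenient stable hash, and the example sentence is made up.

```python
import zlib
from collections import Counter

def hash_features(tokens, n_features=10_000):
    """Map tokens to a sparse, fixed-width count vector via hashing.

    Each token is hashed to a column index in [0, n_features); colliding
    tokens simply share a column, so no vocabulary has to be stored.
    """
    counts = Counter()
    for tok in tokens:
        index = zlib.crc32(tok.encode("utf-8")) % n_features
        counts[index] += 1
    return dict(counts)

doc = "how do i tune lda topics for stack exchange posts".split()
vec = hash_features(doc)   # {column index: count}, at most 10k columns
```

The resulting sparse vectors always have the same width regardless of the corpus, which is what lets the same learner be reused across datasets.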

timm commented on July 2, 2024

fyi- you may need to tune (1) the feature extraction (of the topics) AND (2) the learner to get improved performance.

right now ur just tuning (1) right?

without doing (2), what you could do is show conclusion instability (a venn diagram of documents classified XYZ via untuned feature extraction, repeated 10 times on 10 different data orderings).

with (2) you might get the kinds of improvements wei reported
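
One possible reading of the instability check suggested above is sketched here: for each topic label, compare the set of documents that received it in every run against the set that received it in any run. The `runs` data is made up, `label_stability` is a hypothetical helper, and the sketch assumes topics X, Y, Z have already been matched across runs, which LDA does not guarantee by itself.

```python
def label_stability(runs):
    """Compare the document sets each label receives across repeated runs.

    runs: list of dicts mapping doc_id -> label, one dict per run.
    Returns {label: (core, union, jaccard)}, where core holds the documents
    given that label in every run and union those given it in any run.
    """
    labels = {lab for run in runs for lab in run.values()}
    report = {}
    for lab in sorted(labels):
        sets = [{d for d, l in run.items() if l == lab} for run in runs]
        core = set.intersection(*sets)
        union = set.union(*sets)
        report[lab] = (core, union, len(core) / len(union))
    return report

# Toy stand-in for three runs over differently ordered data.
runs = [
    {"d1": "X", "d2": "X", "d3": "Y", "d4": "Z"},
    {"d1": "X", "d2": "Y", "d3": "Y", "d4": "Z"},
    {"d1": "X", "d2": "X", "d3": "Z", "d4": "Z"},
]
report = label_stability(runs)
```

A ratio near 1 means the same documents land in a topic on every ordering; a ratio near 0 means the labeling for that topic is unstable, which is the region a Venn diagram would show as barely overlapping.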

amritbhanu commented on July 2, 2024

I did the tuning for (1) and then tried labeling the documents with topics X, Y, Z. The original dataset (the Stack Exchange websites, the so-called manny dataset generator), on the other hand, was labeled with tags. Once I had labeled the documents using LDA, for (2) the feature extraction used is the feature hasher (hashing trick), followed by a learner.
My conclusion is that, with or without tuning, both performed better than the baseline results. So this has to do with the dataset (wrong data) which we used during the LN times.
On your suggestion, I will try (1) feature extraction of topics and (2) then a learner.
I didn't understand your point about the Venn diagram. From the tuned results I will have documents classified as X1, Y1, Z1, ... and from the untuned results I will have documents classified as X2, Y2, Z2. What do you mean here?
