
Classification using LDA (pits_lda, OPEN)

amritbhanu commented on July 2, 2024
Classification using LDA

from pits_lda.

Comments (8)

amritbhanu commented on July 2, 2024

http://www.sciencedirect.com/science/article/pii/S0164121216300528


timm commented on July 2, 2024

amrit... is the paper all done? like do that before moving on

t


amritbhanu commented on July 2, 2024

I am on it prof


amritbhanu commented on July 2, 2024

@timm Here are the results of using LDA to automatically label the documents and then applying a learner.

We can't reproduce the results from the paper, because:

  • For the Mylyn, Eclipse, Firefox, and NetBeans projects, neither the preprocessed datasets nor the exact preprocessing steps are available. The authors also followed naming conventions that they haven't described.

Experiment:

  • Took this paper as an example: http://dl.acm.org/citation.cfm?id=2390074
  • After running LDA, they labeled each document with its top-weighted topic.
  • Each document then has a label 1, 2, 3, ...
  • Selected one target label as yes and all the rest as no, turning the task into binary classification.
  • 5 by 5 cross-validation, the hashing trick with 10k features, and an SVM classifier.
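The experiment steps above can be sketched in plain Python, assuming the LDA step has already produced a document-topic weight matrix (one row of topic weights per document). This is a hedged illustration, not the code behind the reported results. The function names are hypothetical, the hashing here produces unsigned counts (scikit-learn's `HashingVectorizer` signs hashed values by default), and the folds are shuffled but not stratified. The SVM itself is left out; one would fit a classifier such as scikit-learn's `LinearSVC` on each training fold.

```python
import hashlib
import random
from collections import Counter

def top_topic_labels(doc_topic):
    """Label each document with its highest-weighted topic, numbered 1, 2, 3, ..."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in doc_topic]

def binarize(labels, target):
    """One-vs-rest: the target topic becomes 'yes', every other topic becomes 'no'."""
    return ["yes" if label == target else "no" for label in labels]

def hash_features(tokens, n_features=10_000):
    """Hashing trick: map each token to one of n_features buckets and count hits."""
    vec = Counter()
    for tok in tokens:
        # md5 is stable across runs; Python's built-in hash() is salted per process
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % n_features] += 1
    return vec  # sparse vector: bucket index -> count

def five_by_five_folds(n_docs, repeats=5, k=5, seed=0):
    """Yield (train, test) index lists for `repeats` reshuffles of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_docs))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test
```

For example, with three documents whose topic weights are [0.7, 0.2, 0.1], [0.1, 0.3, 0.6] and [0.2, 0.5, 0.3], `top_topic_labels` gives [1, 3, 2], and `binarize` with `target=3` gives ["no", "yes", "no"].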

Conclusion:

  • The baseline SVM didn't perform well. This might be because of the tags we used to label the StackExchange websites, which could affect all the previous results we showed to LN: the numbers will change, and the conclusions may or may not stay the same.
  • LDA is able to label the documents correctly.

Results:

(results were attached as a file, not preserved here)


timm commented on July 2, 2024

am now lost in the details.

please bust fscore into precision and recall
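Breaking the F-score into precision and recall only needs the confusion counts for the target (yes) class. A minimal sketch, assuming "fscore" here means the balanced F1 and that labels are the yes/no strings from the binarized setup; the function name is hypothetical:

```python
def precision_recall_f1(actual, predicted, positive="yes"):
    """Split F1 into precision and recall for the `positive` (target) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # precision: of the documents predicted yes, the fraction that really are yes
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: of the documents that really are yes, the fraction that were found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With actual = ["yes", "yes", "no", "no", "yes"] and predicted = ["yes", "no", "yes", "no", "yes"], there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3. Reporting the two separately shows whether a low F-score comes from false alarms (low precision) or from missed target documents (low recall).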

this looks like no win with tuning... right?

please write this up as a 2-4 page pdf doc. define all your terms. don't worry about the start-up sections (motivation, background)

but what is your justification for "baseline"? what papers use "baseline"?

t


amritbhanu commented on July 2, 2024

Yes, there's no win with tuning, but the result numbers we showed to LN might change. The conclusions may or may not stay the same.

My baseline results are from our BIGDSE paper, where we used just the hashing trick with an SVM as the baseline.

I will compile all these terms and my thoughts into a white paper soon.


timm commented on July 2, 2024

fyi: you may need to tune (1) the feature extraction (of the topics) AND (2) the learner to get improved performance.

right now you're just tuning (1), right?
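Tuning (1) and (2) together means searching over extractor settings and learner settings jointly, scored by the same evaluation. The sketch below uses plain random search as a stand-in for whatever optimizer is actually used. The parameter names and value grids in `SPACE` are hypothetical, and `score` is assumed to be supplied by the caller (for example, mean F1 over the 5x5 cross-validation).

```python
import random

# Hypothetical joint search space: (1) LDA topic extraction and (2) the learner.
SPACE = {
    "k": [10, 20, 30, 40],           # (1) number of LDA topics
    "alpha": [0.01, 0.1, 0.5, 1.0],  # (1) document-topic prior
    "beta": [0.01, 0.1, 0.5, 1.0],   # (1) topic-word prior
    "C": [0.1, 1.0, 10.0],           # (2) SVM regularization strength
}

def random_search(score, space=SPACE, budget=30, seed=1):
    """Sample `budget` joint configs and return the best (config, score) pair.

    `score(config) -> float` is caller-supplied; higher is better.
    """
    rng = random.Random(seed)
    best_cfg, best = None, float("-inf")
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        s = score(cfg)
        if s > best:
            best_cfg, best = cfg, s
    return best_cfg, best
```

Tuning only (1), as in the current experiments, corresponds to giving `C` a single value in the space; letting `C` vary as well is the (1) plus (2) case.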

without doing (2), what you could do is show conclusion instability: a venn diagram of the documents classified as X, Y, Z via untuned feature extraction, repeated 10 times on 10 different data orderings.

with (2) you might get the kinds of improvements wei reported
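One way to put numbers on the conclusion instability described above: for each label, collect the set of documents assigned to it in each of the 10 runs, then report the documents every run agrees on (the centre of the venn diagram), the documents any run assigns (its outer boundary), and the mean pairwise Jaccard overlap. This sketch assumes the topic labels have already been aligned across runs. LDA topic numbers are arbitrary from run to run, so topic 1 in one run need not match topic 1 in another, and that matching step is not shown. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def label_sets(run_labels):
    """Invert one run's {doc_id: label} mapping into {label: set of doc_ids}."""
    groups = defaultdict(set)
    for doc, label in run_labels.items():
        groups[label].add(doc)
    return dict(groups)

def label_stability(runs, label):
    """Compare the documents given `label` across (label-aligned) runs.

    Returns (always, ever, mean_jaccard):
      always: docs assigned `label` in every run (centre of the venn diagram)
      ever: docs assigned `label` in at least one run (union of all circles)
      mean_jaccard: average |A & B| / |A | B| over all pairs of runs
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure instability")
    sets = [label_sets(run).get(label, set()) for run in runs]
    always = set.intersection(*sets)
    ever = set.union(*sets)
    jaccards = [len(a & b) / len(a | b) if a | b else 1.0
                for a, b in combinations(sets, 2)]
    return always, ever, sum(jaccards) / len(jaccards)

# Toy example with three runs instead of ten.
runs = [
    {"d1": "X", "d2": "X", "d3": "Y"},
    {"d1": "X", "d2": "Y", "d3": "Y"},
    {"d1": "X", "d2": "X", "d3": "X"},
]
always, ever, mean_j = label_stability(runs, "X")
# always == {"d1"}; ever == {"d1", "d2", "d3"}
```

In the toy example the pairwise overlaps for label X are 1/2, 2/3 and 1/3, so `mean_j` is about 0.5. A value near 1 would mean the runs agree; lower values indicate unstable labeling.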


amritbhanu commented on July 2, 2024
  • I did the (1) tuning and then labeled the documents with topics X, Y, Z. The original dataset (the StackExchange websites, or the so-called manny dataset generator), on the other hand, was labeled with tags. Once I had labeled the documents using LDA, (2) the feature extraction was the feature hasher (hashing trick), followed by a learner.
    • My conclusion: with or without tuning, both performed better than the baseline results. So the difference has to do with the dataset (wrong data) we used during the LN work.
  • Following your suggestion, I will try (1) feature extraction of topics and then (2) a learner.
  • I didn't understand the venn diagram suggestion. From the tuned results I will have documents classified as X1, Y1, Z1, ..., and from the untuned results, documents classified as X2, Y2, Z2, .... What do you mean exactly?
