
dynamic-nmf's People

Contributors

derekgreene


dynamic-nmf's Issues

document contribution to window topic

Hi:
In the paper, it says: "3) A ranking of every MEPs contributions relative to all window and dynamic topics in the corpus"

Is there a function that outputs this result, or do we need to calculate it ourselves?
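The repository does not appear to expose such a ranking directly, but in NMF the document-topic factor W already encodes each document's contribution to every topic, so a ranking can be computed from it. A minimal sketch with toy data (random values standing in for a real document-term matrix, not the paper's MEP corpus):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy stand-in for a document-term matrix
# (rows = documents/speakers, columns = terms)
rng = np.random.default_rng(0)
X = rng.random((6, 12))

model = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
W = model.fit_transform(X)  # W[d, t] = weight of document d on topic t

# Rank documents by their contribution to topic 0, strongest first
ranking = np.argsort(W[:, 0])[::-1]
print(ranking)
```

The same idea applies per window (rank against each window model's W) or against the dynamic topic model.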

results of track-dynamic-topics.py

In the results of track-dynamic-topics.py, as shown for Dynamic Topic D01, there are 3 topics in window 3.
I am confused. Does this result mean we got 3 topics in a single time window?
Also, how do we get the topic evolution? I mean, how do we know which topic in window 2 changes into which topic in window 3, if there is more than one topic in window 2?
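As I understand the two-layer approach, the topic-term vectors from all windows are stacked into one matrix and factorised again, so each window topic is assigned to whichever dynamic topic it loads on most strongly. That is why a single dynamic topic can contain several topics from the same window. A toy sketch of that assignment step (random data, hypothetical sizes):

```python
import numpy as np
from sklearn.decomposition import NMF

# 9 window topics (e.g. 3 windows x 3 topics), each a vector over 50 terms
rng = np.random.default_rng(1)
B = rng.random((9, 50))

# Second-layer factorisation into 3 dynamic topics
W = NMF(n_components=3, init="nndsvd", random_state=0,
        max_iter=500).fit_transform(B)

# Each window topic joins the dynamic topic with its largest weight;
# nothing forces topics from the same window into different dynamic topics
assignments = W.argmax(axis=1)
print(assignments)
```

So the "evolution" is not a one-to-one mapping between consecutive windows; it is the grouping of window topics under a shared dynamic topic.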

track-dynamic-topics KeyError

Hello everyone!
Thank you for providing such a wonderful tool! I am studying topic models now.
I followed the README instructions exactly, but when I run 'track-dynamic-topics.py out....' I get a KeyError.

Traceback (most recent call last):
  File "track-dynamic-topics.py", line 101, in <module>
    main()
  File "track-dynamic-topics.py", line 57, in main
    dynamic_topic_idx = assigned_window_map[window_topic_label]
KeyError: 'month1_06'

Could you please help me with it?
Thank you very much.

Regards,
Tong

Formula used to compute "model coherence"

Hello everyone! Thanks for the attention!

I'm using this library for a university project whose scope is to analyze topics in Twitter data. I also used the LDA algorithm to discover topics in tweets. Now I'm interested in using this dynamic topic modeling approach. So, my question is:

  1. When I execute steps 2 and 3 in "Advanced Usage", this library returns a model coherence value that I can't interpret, because I don't know which formula is used to compute it (e.g. model coherence = 0.5923).
    EXAMPLE:
    When the library prints strings like the one below, it also returns a model coherence value.
    e.g. "Top recommendations for number of topics for 'month1': 6,5,9"
    ==> So, what is the formula used for this purpose?
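I believe the value reported is a TC-W2V-style coherence: for each topic, the mean pairwise word2vec similarity of its top terms, averaged over all topics (higher means more semantically coherent). A minimal sketch with toy embeddings standing in for a trained word2vec model:

```python
import itertools
import numpy as np

def topic_coherence(topics, vectors):
    """Mean pairwise cosine similarity of each topic's top terms,
    averaged over all topics (a TC-W2V style measure)."""
    scores = []
    for terms in topics:
        pair_sims = []
        for t1, t2 in itertools.combinations(terms, 2):
            v1, v2 = vectors[t1], vectors[t2]
            sim = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
            pair_sims.append(sim)
        scores.append(np.mean(pair_sims))
    return float(np.mean(scores))

# Toy 2-d embeddings standing in for real word2vec vectors
vecs = {"tax": np.array([1.0, 0.1]), "budget": np.array([0.9, 0.2]),
        "fish": np.array([0.1, 1.0]), "quota": np.array([0.2, 0.9])}
print(topic_coherence([["tax", "budget"], ["fish", "quota"]], vecs))
```

The "top recommendations" for the number of topics would then simply be the k values whose models scored highest on this measure.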

P.S. If I have not been clear enough, I can rewrite this question more precisely.

Again, thanks a lot for the attention!

Edoardo, an Italian Computer Science student!

Run on streams?

Hi --

From the example in the docs, it doesn't look like this is really designed to be run on streaming data, but is it possible to do so? If not, are you aware of similar dynamic topic modeling packages that could work on a data stream?

Thanks
Ben

n-gram

I want to use n-grams to build my window topic model:
python prep-text.py data/sample/month1 data/sample/month2 data/sample/month3 -o data --tfidf --norm --ngram 3
python find-window-topics.py data/*.pkl -k 5 -o out
python display-topics.py out/month1_windowtopics_k05.pkl out/month2_windowtopics_k05.pkl out/month3_windowtopics_k05.pkl

When I display the window topics, why are the terms still unigrams?

topic relevance in document?

Please forgive me if this is a stupid question, as I'm new to topic modeling, coding, and DTM. I've found your tutorial extremely helpful and user friendly. I just have one question: I'm running multiple DTMs on thousands of documents, and I was wondering if there is a way to see which documents are the most relevant to a particular topic, instead of just an arbitrary top 50 (or 100, 1000, etc.)?

When I've run TM using MALLET I'm able to see the percentage of how relevant a topic is in any given document and can weed out the documents that have the highest relevance using that data. I'm wondering if there's a way to do that with your method.
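There is no built-in equivalent of MALLET's per-document percentages as far as I can tell, but the NMF document-topic matrix W can be row-normalised to obtain the same kind of proportions, and then thresholded. A sketch with hypothetical weights:

```python
import numpy as np

# Hypothetical W from an NMF fit: rows = documents, columns = topics
W = np.array([[0.8, 0.1],
              [0.2, 0.6],
              [0.0, 0.9]])

# Normalise each row so the weights read like MALLET's per-document
# topic proportions, then keep documents where topic 1 exceeds 50%
props = W / W.sum(axis=1, keepdims=True)
relevant = np.where(props[:, 1] > 0.5)[0]
print(props.round(2))
print(relevant)
```

This lets you filter by a relevance threshold rather than taking a fixed top-N of documents.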

Publishing to PyPI?

Hi,

first of all, many thanks for providing the code, and also for your excellent homepage documenting all of your analysis of the European Parliament. Are there any plans to make this model pip-installable?
In my opinion your approach is the only solid Python option for including time in topic models. Meanwhile, the R package stm has become an enormous contribution for social scientists working with textual data.
Structuring dynamic-nmf so that it is as easy to use as possible, e.g. like gensim, would really be awesome for workflows, teaching purposes, etc.

Best,
Carsten

Execution Problem

I ran into a problem when trying to follow your commands.

python prep-text.py data/sample/month1 data/sample/month2 data/sample/month3 -o data --tfidf --norm
/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
Loaded 347 stopwords

  • Processing 'month1' from data/sample/month1 ...
    Found 438 documents to parse
    Pre-processing documents (347 stopwords, tfidf=True, normalize=True, min_df=10, max_ngram=1) ...
    Traceback (most recent call last):
      File "prep-text.py", line 91, in <module>
        main()
      File "prep-text.py", line 81, in main
        apply_norm = options.apply_norm, ngram_range = (1,options.max_ngram), lemmatizer=lemmatizer )
      File "/home/jjc/桌面/dynamic-nmf-master/text/util.py", line 40, in preprocess
        X = tfidf.fit_transform(docs)
      File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1859, in fit_transform
        X = super().fit_transform(raw_documents)
      File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1220, in fit_transform
        self.fixed_vocabulary_)
      File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1131, in _count_vocab
        for feature in analyze(doc):
      File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 108, in _analyze
        doc = ngrams(doc, stop_words)
      File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 227, in _word_ngrams
        tokens = [w for w in tokens if w not in stop_words]
    TypeError: 'NoneType' object is not iterable

I don't know why... could you please help me to solve the problem?

Execution time

I was wondering what kinds of runtimes you have encountered in practical applications of this topic model (leaving aside the question of choosing K). In my limited experience, the scikit-learn NMF decomposition has been extremely fast for small corpora (a matter of seconds), but it slows down drastically at higher K and larger matrices. I have a model currently running with K=20 on a sparse matrix with 4.3 million cells, and it has been going for hours. Compared to standard LDA, this is significantly slower.

The scikit learn documentation mentions polynomial time complexity, which would explain the huge changes in execution time I experienced, and I would like to understand whether this is an issue for others as well.
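For what it's worth, a quick way to see the scaling yourself is to time scikit-learn's NMF at different k on a sparse matrix of roughly your shape (toy sizes here, and max_iter capped so the runs terminate quickly):

```python
import time
from scipy.sparse import random as sparse_random
from sklearn.decomposition import NMF

# Toy sparse "document-term" matrix: 500 docs x 2000 terms, 1% density
X = sparse_random(500, 2000, density=0.01, random_state=0, format="csr")

for k in (5, 20):
    start = time.perf_counter()
    NMF(n_components=k, init="nndsvd", random_state=0, max_iter=100).fit(X)
    print(f"k={k}: {time.perf_counter() - start:.2f}s")
```

Capping max_iter (and loosening tol) is also a practical lever when the default settings run for hours on a large corpus, at some cost in reconstruction quality.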

How is this code dynamic?

Hi I'm Semiha Makinist,
I'm a computer engineer working on topic detection, and I want to ask you one question about this. You talk about finding dynamic topics, but you supply a k value. How is this code dynamic? I don't understand this situation.

"python find-dynamic-topics.py out/month1_windowtopics_k05.pkl out/month2_windowtopics_k05.pkl out/month3_windowtopics_k05.pkl -k 5 -o out"

Thank you in advance for your help. Have a nice day.

Best regards,
Semiha Makinist
