Comments (7)

vinid avatar vinid commented on May 18, 2024

Hello @subho22!

You can look at this notebook: https://colab.research.google.com/drive/1euxW3ya3_PX6Kj1tnCNrIQ7pjZIODsB6?usp=sharing

It uses CombinedTM rather than ZeroShotTM, but it should just be a matter of switching the model names.

from contextualized-topic-models.

subho22 avatar subho22 commented on May 18, 2024

Thanks for your quick reply. Can we see the predicted topics for a single document along with their probability scores?
Is n_samples in the prediction function the number of times the results are sampled, so that the output is the topic that comes up most frequently?

vinid avatar vinid commented on May 18, 2024

Yes :)!

In the last part of the notebook you should be able to see that.

get_doc_topic_distribution returns the topic probabilities for each document. You should get a list of arrays; each array contains the topic probability distribution of one document in the testing_dataset.

So:

testing_dataset = tp.create_test_set(testing_contextual_documents, testing_bow_documents) # create dataset for the testset
predictions = ctm.get_doc_topic_distribution(testing_dataset, n_samples=10) 

Let's suppose we are interested in the topic of the first document, i.e., testing_contextual_documents[0]. Its topic distribution (the probabilities for each topic) is in predictions[0].

Then we can simply do this to see the topic:

import numpy as np

topic_index = np.argmax(predictions[0])  # index of the most probable topic
ctm.get_topic_lists(5)[topic_index]  # top 5 words of that topic
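
To also see the probability scores, and not only the single best topic, something like the following may help. It is a minimal sketch on synthetic data: the `predictions` array here is a made-up stand-in for the output of get_doc_topic_distribution (one row per document, one column per topic), and `top_topics` is a hypothetical helper, not part of the package.

```python
import numpy as np

# Synthetic stand-in for the output of get_doc_topic_distribution:
# one row per document, one column per topic, each row sums to 1.
predictions = np.array([
    [0.05, 0.70, 0.10, 0.15],
    [0.40, 0.10, 0.45, 0.05],
])

def top_topics(doc_distribution, k=3):
    """Return the k most probable (topic_index, probability) pairs, best first."""
    order = np.argsort(doc_distribution)[::-1][:k]
    return [(int(i), float(doc_distribution[i])) for i in order]

# Two most probable topics of the first document, with their scores.
first_doc = top_topics(predictions[0], k=2)
for topic_index, probability in first_doc:
    print(f"topic {topic_index}: {probability:.2f}")
```

Each returned index can then be used to look up the topic words, in the same way topic_index is used with get_topic_lists.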

Exactly, n_samples is used to do multiple sampling to get a better estimate of the distribution.

Let me know if this helps :)

subho22 avatar subho22 commented on May 18, 2024

Is there a range you would suggest trying for n_samples? Does it depend on the total number of documents in the training set, say more than 7000?

silviatti avatar silviatti commented on May 18, 2024

The more samples you draw, the more accurate your estimate of the probability distribution will be. However, if you have many documents and choose a high number of samples, it may take a considerable amount of time to get the results. In other words, you need to find the right trade-off between time and the accuracy of the results. If time is important to you, I suggest an n_samples value lower than 10.
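
The trade-off can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not call the package, the "true" distribution is invented, and Dirichlet draws are used as a stand-in for the model's stochastic predictions. It shows the general effect of averaging more samples, not the behaviour of any particular trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" topic distribution of one document over 5 topics.
true_distribution = np.array([0.50, 0.20, 0.15, 0.10, 0.05])

def estimate(n_samples):
    """Average n_samples noisy draws whose mean is true_distribution."""
    # Dirichlet draws are an assumed stand-in for one stochastic prediction each.
    draws = rng.dirichlet(true_distribution * 20, size=n_samples)
    return draws.mean(axis=0)

def mean_abs_error(n_samples, repeats=200):
    """Mean absolute error of the averaged estimate over repeated runs."""
    errors = [np.abs(estimate(n_samples) - true_distribution).mean()
              for _ in range(repeats)]
    return float(np.mean(errors))

errors = {n: mean_abs_error(n) for n in (1, 5, 20, 100)}
for n, err in errors.items():
    print(f"n_samples={n:>3}: mean absolute error ~ {err:.4f}")
```

In this toy setting the error shrinks as n_samples grows, but with diminishing returns, while the cost grows linearly with the number of samples.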

Silvia

subho22 avatar subho22 commented on May 18, 2024

Thanks!!

Elfilali-Taoufiq avatar Elfilali-Taoufiq commented on May 18, 2024

Hello, and thank you for this amazing work,

I'm trying to use my trained model for inference, and I found the notebook you suggested for this, but I have a problem:

  • I'm using a newer version that I installed locally, and I can't find some functions in the "TopicModelDataPreparation" class, such as create_training_set.

How can I do this with the recent version of the package, please?
