Notebook example for training, saving and loading the model file and predicting unseen

Notebook example for saving and loading the model file and predicting unseen documents about contextualized-topic-models HOT 7 CLOSED

milanlproc commented on May 18, 2024

Notebook example for saving and loading the model file and predicting unseen documents

from contextualized-topic-models.

Comments (7)

vinid commented on May 18, 2024

Hello @subho22!

You can look at this notebook: https://colab.research.google.com/drive/1euxW3ya3_PX6Kj1tnCNrIQ7pjZIODsB6?usp=sharing

It uses CombinedTM rather than ZeroShotTM, but it should be just a matter of switching the model names.

subho22 commented on May 18, 2024

Thanks for your quick reply. Can we see the predicted topics for a single document along with their probability scores?
Is n_samples in the prediction function the number of times the results are sampled, so that it returns the topic that came up most frequently?

vinid commented on May 18, 2024

Yes :)!

In the last part of the notebook you should be able to see that.

get_doc_topic_distribution returns the topic probabilities for each document. You should get a list of arrays; each array contains the probability distribution of one document in the testing_dataset.

So:

testing_dataset = tp.create_test_set(testing_contextual_documents, testing_bow_documents) # create dataset for the testset
predictions = ctm.get_doc_topic_distribution(testing_dataset, n_samples=10)

Let's suppose we are interested in the topic of the first document, i.e., testing_contextual_documents[0]. Its topic distribution (the probabilities for each topic) is in predictions[0].

Then, we can simply do this to see the topic:

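import numpy as np
topic_index = np.argmax(predictions[0]) # index of the most probable topic for the first document
ctm.get_topic_lists(5)[topic_index] # top 5 words of that topic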
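To see probability scores rather than only the single best topic, the distribution can be sorted. The following is a minimal sketch using NumPy alone: `doc_distribution` is a made-up stand-in for `predictions[0]`, and `top_topics` is a hypothetical helper, not part of the library.

```python
import numpy as np

# Hypothetical stand-in for predictions[0]: one document's
# probability distribution over 5 topics (values are made up).
doc_distribution = np.array([0.05, 0.60, 0.10, 0.20, 0.05])


def top_topics(distribution, k=3):
    """Return the k most probable (topic_index, probability) pairs, best first."""
    order = np.argsort(distribution)[::-1][:k]
    return [(int(i), float(distribution[i])) for i in order]


ranked = top_topics(doc_distribution, k=3)
for topic_index, probability in ranked:
    print(f"topic {topic_index}: {probability:.2f}")
```

With a real model, each `topic_index` could then be used to index into `ctm.get_topic_lists(5)` as shown above to read the words of that topic.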

Exactly, n_samples is used to do multiple sampling to get a better estimate of the distribution.

Let me know if this helps :)

subho22 commented on May 18, 2024

Is there a range of n_samples values you would suggest trying? Does it depend on the total number of documents in the training set, say more than 7000?

silviatti commented on May 18, 2024

The more samples you draw, the more accurate your estimate of the probability distribution will be. However, if you have many documents and select a high number of samples, it may take a considerable amount of time to get the results. In other words, you need to find the right trade-off between time and the accuracy of the results. If time is important to you, I suggest an n_samples value lower than 10.
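The trade-off can be illustrated with a toy simulation. This sketch does not call the library: it assumes the estimate is an average of `n_samples` noisy softmax draws, and `mu`, `sigma`, `sample_distribution` and `mean_error` are all hypothetical names and values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent parameters for one document over 4 topics (made-up values).
mu = np.array([0.5, 1.5, -0.5, 0.0])
sigma = np.array([0.8, 0.8, 0.8, 0.8])


def sample_distribution(n_samples):
    """Average n_samples noisy softmax draws into one topic distribution."""
    noise = rng.standard_normal((n_samples, mu.size))
    logits = mu + sigma * noise
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    thetas = exp / exp.sum(axis=1, keepdims=True)
    return thetas.mean(axis=0)


# A very large sample serves as the reference distribution.
reference = sample_distribution(200_000)


def mean_error(n_samples, trials=200):
    """Mean absolute deviation from the reference, averaged over repeated runs."""
    errors = [
        np.abs(sample_distribution(n_samples) - reference).mean()
        for _ in range(trials)
    ]
    return float(np.mean(errors))


errors = {n: mean_error(n) for n in (1, 5, 20, 100)}
for n, err in errors.items():
    print(f"n_samples={n:>3}: mean abs error = {err:.4f}")
```

In this toy setting the error shrinks as n_samples grows while the work per document grows linearly, which is why a small value can be a reasonable compromise when there are many documents.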

Silvia

subho22 commented on May 18, 2024

Thanks!!

Elfilali-Taoufiq commented on May 18, 2024

Hello, and thank you for this amazing work,

I'm trying to use my trained model for inference, and I found the notebook you suggested for this, but I have a problem:

I'm using a newer version that I installed locally, and I can't find some functions in the "TopicModelDataPreparation" class, such as create_training_set.

How can I do this with the recent version of the package?
