Hi, I have a considerably large corpus of transcribed phone calls th

Error when fitting lda with trigrams (textmineR issue, closed)

Frank5547 commented on July 18, 2024

Error when fitting lda with trigrams

Comments (4)

Frank5547 commented on July 18, 2024

I apologize; my code does not seem to be rendering correctly. Please let me know if I can clarify anything.

TommyJones commented on July 18, 2024

Hi @Frank5547. I can't reproduce your example without data, but I think I know what's going on. Your trigram document-term matrix is huge. Fortunately, you can get rid of many of its columns without materially affecting your end result. Because of Zipf's law, most of your trigrams will appear only once or twice in the entire corpus. You can drop them with either of these lines:

dtm <- dtm[, colSums(dtm) > 1]

dtm <- dtm[, colSums(dtm > 0) > 1]

The first drops trigrams that occur only once in the entire corpus. The second drops trigrams that occur in only one document, however many times they appear there.
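A minimal sketch of the two pruning rules on a toy matrix, to show how they differ. The counts and column names below are invented for illustration. A textmineR DTM is normally a sparse `dgCMatrix` from the Matrix package, where the same `colSums()` expressions should also work; a plain base-R matrix is used here so the sketch has no dependencies. It also adds `drop = FALSE`, which keeps the result a matrix even if only one column survives.

```r
# Toy document-term matrix: 3 documents (rows) x 4 trigrams (columns).
# Counts are made up purely for illustration.
dtm <- matrix(
  c(1, 0, 2, 0,
    0, 0, 1, 3,
    0, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3),
                  c("a_b_c", "b_c_d", "c_d_e", "d_e_f"))
)

# Rule 1: keep terms whose total count across the corpus exceeds 1.
# drop = FALSE stops R from collapsing a single surviving column to a vector.
by_count <- dtm[, colSums(dtm) > 1, drop = FALSE]

# Rule 2: keep terms that occur (count > 0) in more than one document.
by_docs <- dtm[, colSums(dtm > 0) > 1, drop = FALSE]

print(colnames(by_count))  # [1] "c_d_e" "d_e_f"
print(colnames(by_docs))   # [1] "c_d_e"
```

Note that `d_e_f` survives the first rule (3 occurrences) but not the second, because all of its occurrences are in `doc2`.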

But you may not need trigrams. I generally don't use them because the risk of building a model that does not generalize is too high. I noticed you set ngram_window = c(3,3), which only gives you trigrams and not unigrams or bigrams. Try ngram_window = c(1,2). That's what I usually use and there's usually lots of signal there. (You should still prune infrequent tokens as they will make training time much longer without adding anything worthwhile to the model.)
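To make the difference between the two windows concrete, here is a small sketch built on a hypothetical helper, `ngrams_in_window()`, which is not a textmineR function. It enumerates every n-gram length from `window[1]` to `window[2]` and joins tokens with an underscore; the underscore delimiter is an assumption chosen for readability.

```r
# Hypothetical helper (not part of textmineR): return the n-grams of a
# token vector for every n in window[1]:window[2], joined with "_".
ngrams_in_window <- function(tokens, window) {
  out <- character(0)
  for (n in seq(window[1], window[2])) {
    if (n > length(tokens)) next            # too few tokens for this n
    starts <- seq_len(length(tokens) - n + 1)
    grams <- vapply(starts,
                    function(i) paste(tokens[i:(i + n - 1)], collapse = "_"),
                    character(1))
    out <- c(out, grams)
  }
  out
}

tokens <- c("call", "center", "customer", "service")

tri_only <- ngrams_in_window(tokens, c(3, 3))  # trigrams only
uni_bi   <- ngrams_in_window(tokens, c(1, 2))  # unigrams and bigrams

print(tri_only)
print(uni_bi)
```

With `c(3, 3)` only the two trigrams come back; with `c(1, 2)` you get the four unigrams plus three bigrams. In textmineR itself the choice is made through the `ngram_window` argument (for example in `CreateDtm()`).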

I'm going to close as this isn't really an issue with textmineR. But please feel free to comment again if you have any questions. I'll help out where I can.

Frank5547 commented on July 18, 2024

You were right; it turns out I only needed bigrams after all. I know this reply comes in late, but I wanted to thank you for your comment on my post. It came in handy later on to be reminded of Zipf's Law and that I can delete all the terms with a frequency of 1 without losing value.

TommyJones commented on July 18, 2024

Awesome! Glad it was helpful.

(Sent by email in reply to Francisco Javier Carrera Arias, Mon, Oct 15, 2018 at 3:15 PM.)

-- I am responsible for the concept of this message. Unfortunately, autocorrect is responsible for the content
