Hi,
I have a fairly large corpus of transcribed phone calls from which I am trying to extract topics. I have tried fitting unigram and bigram LDA models, but so far the results have not been great, so I wanted to see whether trigrams would do better. However, when I try this, the `FitLdaModel` function fails with the following error:
```
2 nodes produced errors; first error: SpMat::init(): requested size is too large
```
My code is the following:
```r
prosCallTranscripts <- completeCalls %>%
  filter(Speaker != 'company') %>%
  group_by(Call_Name) %>%
  summarize(Call_Text = ChampTextclean(paste(Text, '', collapse = ' '), stops))

prosCallTranscripts$Call_Text <- lemmatize_strings(prosCallTranscripts$Call_Text)
prosCallTranscripts$Call_Text <- ChampTextclean(prosCallTranscripts$Call_Text, stops)

# drop the pre-call transcripts
PreCall <- grepl('Pre-Call', prosCallTranscripts$Call_Name)
prosCallTranscripts <- prosCallTranscripts[!PreCall, ]

dtm <- CreateDtm(doc_vec      = prosCallTranscripts$Call_Text,
                 doc_names    = prosCallTranscripts$Call_Name,
                 ngram_window = c(3, 3),
                 verbose      = TRUE)

latenAlloc <- FitLdaModel(dtm = dtm, k = 5, iterations = 100,
                          alpha = 0.1, beta = 0.05, cpus = 3)
```
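Since the error comes from a sparse-matrix allocation, I suspect the trigram vocabulary is simply too large for the matrix being requested. One workaround I have been considering is pruning rare trigram terms from the DTM before fitting; a minimal sketch (the frequency threshold of 5 is an arbitrary choice on my part, and I am assuming the DTM returned by `CreateDtm` can be column-subset like a standard `Matrix` sparse matrix):

```r
library(Matrix)

# Keep only trigrams that occur at least 5 times across the corpus;
# this should shrink the vocabulary, and with it the sparse matrix
# that FitLdaModel allocates internally.
term_totals <- Matrix::colSums(dtm)
dtm_pruned  <- dtm[, term_totals >= 5]

latenAlloc <- FitLdaModel(dtm = dtm_pruned, k = 5, iterations = 100,
                          alpha = 0.1, beta = 0.05, cpus = 3)
```

Is something like this the recommended approach for trigram models, or should the original call have worked as written?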
I am running RStudio on a Windows x64 machine and have the following packages loaded in my R session:
```r
library(tm); library(RWeka); library(qdap); library(dplyr); library(tidyr)
library(imputeMissings); library(textmineR); library(textstem)
library(ggplot2); library(ggsignif); library(radarchart)
```
Please let me know whether this is a bug. It looks like it could be an Rcpp/Armadillo issue (the `SpMat` in the error message is Armadillo's sparse matrix class), but it may also be something I am doing wrong.