Comments (11)

juliasilge commented on August 24, 2024

I just went through the example with the Associated Press data in the vignette here and it seems to be still working. What happens if you try this, with the example data?

library(dplyr)
library(tidytext)
data("AssociatedPress", package = "topicmodels")
ap_td <- tidy(AssociatedPress)
# ap_td should look very similar to your tibble above
ap_td %>% cast_dtm(document, term, count)

Do you get the same error as on your data?

One thought -- do you have the latest version of dplyr? It's possible we should require a later version of dplyr and don't right now.

from tidytext.

dsdesrosiers commented on August 24, 2024

Julia -

Thanks for the response and suggestion.

Your sample code works, my code still fails. I downloaded the latest dplyr... no help. I've done the unlikely dumb stuff like cast the integer column to a double and rename the column 'n' to 'count'. I wrote out the data and there is nothing obviously wrong with it in terms of unexpected characters. I also tested a couple subsets using the words for a single case and I get the same result.

The data I am working with can be a bit unpredictable in that it's free-form text entered by our support agents, which may contain software error codes. Are there any cleansing steps I should try before I run the unnest_tokens command?

Danielle

juliasilge commented on August 24, 2024

Could it be the column CaseNumber? When it tries to make a DocumentTermMatrix, it needs a column of documents (case numbers, for you) in integer form (like 1, 2, 3...) to use as matrix indices. What data type are the values in CaseNumber, and how are they distributed? Will the function be able to use those values as matrix indices to make a DocumentTermMatrix?

(Actually, now that I wrote this, I am not sure if it's right. Let me check...)

juliasilge commented on August 24, 2024

I don't think I am right that you need to pass an integer, but I am wondering if that is where the problem is, in converting the data you have in CaseNumber into row numbers for the matrix indices. (You can see where this is done right about here.) What kind of data is being kept in CaseNumber? What type is it? Factor? How is it distributed?
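One way to answer those questions is sketched below. The data frame subject_freq and its values are hypothetical stand-ins for the real data (only the CaseNumber column name comes from this thread), so treat it as a rough illustration rather than a diagnosis.

```r
library(dplyr)

# Hypothetical stand-in for the poster's word counts; only the
# CaseNumber column name is taken from the thread.
subject_freq <- tibble(
  CaseNumber = c(1001L, 1001L, 1002L),
  word = c("error", "login", "error"),
  n = c(2L, 1L, 3L)
)

# Storage type of the document column, e.g. "integer", "character" or "factor"
case_class <- class(subject_freq$CaseNumber)

# Number of distinct documents, which becomes the number of matrix rows
n_cases <- n_distinct(subject_freq$CaseNumber)

# Rows (terms) per document, to see how the values are distributed
rows_per_case <- count(subject_freq, CaseNumber)

print(case_class)
print(n_cases)
print(rows_per_case)
```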

dsdesrosiers commented on August 24, 2024

So, to expedite this, I just replaced the case numbers (which are literally the number of the case in our CRM) with 1, 2, 3 in the .CSV before reading it into the initial data frame. Same result.

But, then I wrote the data to a .CSV, re-read the same data and now it works. I have to set this aside for now and I don't want to take any more of your time at the moment since I have a workaround. It's very strange though - I am not doing anything particularly novel with the data processing.

Please let me know if you'd like me to supply some sample data/code. Thanks for your time and energy.

juliasilge commented on August 24, 2024

If you can make a small, reproducible example (i.e. not 40,000 lines) that errors, that would certainly be great. If not, best of luck! I'll let you close this issue if you feel like you are good for now.

dgrtwo commented on August 24, 2024

The problem is that subject.freq is grouped. I'm able to reproduce the error with:

mtcars %>% group_by(gear) %>% cast_dtm(cyl, am, mpg)

I think cast_dtm should ignore groups; if @juliasilge agrees I'll push a fix. (In the meantime, an easier workaround than writing to a CSV and reading it back in would be to call ungroup() first.)
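A minimal sketch of that ungroup() workaround follows. The word_counts tibble and its column names are made up for illustration, and cast_dtm needs the tm package installed. On a tidytext version where cast_dtm already ignores groups, the extra ungroup() should be harmless.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts, left grouped the way a
# group_by() %>% summarise()/count() pipeline can leave them.
word_counts <- tibble(
  document = c("case1", "case1", "case2"),
  term = c("error", "login", "error"),
  n = c(2L, 1L, 3L)
) %>%
  group_by(document)

# Confirm that grouping is still attached to the data frame
still_grouped <- is_grouped_df(word_counts)

# Drop the grouping before casting to a DocumentTermMatrix
dtm <- word_counts %>%
  ungroup() %>%
  cast_dtm(document, term, n)

print(still_grouped)
print(dim(dtm))
```

Checking is_grouped_df() first is a cheap way to tell whether leftover grouping is a plausible culprit before trying anything more drastic.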

Thanks for the report!

juliasilge commented on August 24, 2024

Ah, I didn't think of that! Yes, that sounds very sensible; I think cast_dtm (cast_sparse_?) should ignore groups.

dsdesrosiers commented on August 24, 2024

I had actually ungrouped and was still getting the issue, but I will try the updated version. If I am still getting the error I will supply some sample data and my R code. I just need to identify a set of data that reproduces the error and keeps me in line with my employer's data guidelines.

Thanks to both of you for helping with this issue.

dsdesrosiers commented on August 24, 2024

It worked!!! Happy weekend!

github-actions commented on August 24, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
