dsc_intro's People

Contributors

gtmaskall

dsc_intro's Issues

ValueError when reading review.json dataset (05_case_study_filter_reviews.ipynb)

After adding StringIO(...) to the reader creation code in 05_case_study_filter_reviews.ipynb, I am encountering a new error when trying to read the review file in chunks.

ValueError: Unexpected character found when decoding 'false'

N.B.

  • The error only occurs when trying to read the data from a file on disk
    • If the records are placed inline within a string or "here doc" (``` string), the error does not occur
  • The error only occurs when trying to read the file in chunks.
    • If the complete data file is read in, without using chunksize to create a reader, there is no error.
  • I have reduced the size of the data set for testing. I can reproduce the error with a single JSON record in the file.
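One way to check whether the data file itself is at fault is to read it in chunks without pandas. The following is a minimal sketch using only the standard library, not the notebook's code: `iter_json_chunks` is a hypothetical helper, and the temporary file stands in for the reduced one-record test file described above.

```python
import json
import os
import tempfile
from itertools import islice


def iter_json_chunks(path, chunksize):
    """Yield lists of parsed JSON Lines records, at most `chunksize` per list."""
    with open(path, encoding="utf-8") as handle:
        # Parse lazily, one record per non-blank line.
        records = (json.loads(line) for line in handle if line.strip())
        while True:
            chunk = list(islice(records, chunksize))
            if not chunk:
                return
            yield chunk


# One of the sample records from this issue, written to a temporary file
# (a stand-in for the reduced test file on disk).
sample = '{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}'

with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample + "\n")
    tmp_path = tmp.name

chunks = list(iter_json_chunks(tmp_path, chunksize=100000))
os.remove(tmp_path)

print(len(chunks), chunks[0][0]["review_id"])
```

If a check like this parses the file cleanly, the record contents are valid JSON and the problem more likely lies in what is being handed to the reader.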

Sample Data

Either or both of the following records can be used

{"review_id":"rEITo90tpyKmEfNDp3Ou3A","user_id":"6Fz_nus_OG4gar721OKgZA","business_id":"6lj2BJ4tJeu7db5asGHQ4w","stars":5.0,"useful":0,"funny":0,"cool":0,"text":"We've been a huge Slim's fan since they opened one up in Texas about two years ago when we used to live there. This place never disappoints. They even have great salads and grilled chicken. Plus they have fresh brewed sweet tea, it's the best!","date":"2017-05-26 01:23:19"}
{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}

Error Output
[Screenshot: Screen Shot 2019-06-23 at 19 27 42]

I have filed an issue with pandas-dev as well as a question on stackexchange, but haven't gotten very far.

TypeError when reading review.json dataset (05_case_study_filter_reviews.ipynb)

When I first set up the code to read in the review.json dataset, I encountered an error:

TypeError: sequence item 0: expected str instance, bytes found

Code

import pandas as pd

path = 'file://localhost/Users/.../DSC_Intro/'
filename = path + 'yelp_dataset/review_Test.json'
review_reader = pd.read_json(filename, lines=True, chunksize=100000)

for chunk in review_reader:
    print(chunk)
...

I changed the code to use

from io import StringIO
...
review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=100000)

instead and this error went away.

The pandas user guide indicates that StringIO is required.

You may want to update the worked notebook accordingly.
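A caveat worth checking before changing the notebook: StringIO wraps the text it is given, so StringIO(filename) holds the path string rather than the contents of the file. The sketch below uses only the standard library (not pandas) to show this, reusing the path exactly as written above, including the reporter's own "..." elision. Since the path begins with "file://", a JSON parser reading that buffer sees a leading "f", which may be what lies behind the "decoding 'false'" message in the other issue; that link is an assumption, not something confirmed here.

```python
import json
from io import StringIO

# Path pieces as written in the issue; the "..." is the reporter's elision.
path = 'file://localhost/Users/.../DSC_Intro/'
filename = path + 'yelp_dataset/review_Test.json'

# StringIO wraps the text it is given. Here that text is the path itself,
# so no file is opened and no review data is read.
buffer = StringIO(filename)
buffer_text = buffer.read()
print(buffer_text == filename)

# The buffer content starts with "file://", which is not valid JSON.
# The standard-library parser rejects it as well.
try:
    json.loads(buffer_text)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("not JSON:", exc)
```

If this is the cause, one option is to pass pd.read_json an open file object, or a plain local filesystem path, instead of wrapping the path string. pd.read_json accepts file-like objects, but whether that resolves the original TypeError in this environment has not been verified here.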
