dsc_intro's Introduction

Springboard data science taster course notebooks

worked_notebooks

Contains the worked example notebooks, which include the completed code.

source_notebooks

These are sourced from worked_notebooks but have had the assignment code deleted and output cleared. Students will load these notebooks and work from them. They can, if and when they choose, refer to the original notebooks in worked_notebooks to check their answers or get unstuck.

About me

Hi, I'm Guy. I'm a mentor with Springboard, and I really hope you find that this taste of learning data science is something you'd like more of. If you have any questions, or are thinking of signing up to take things further with Springboard's full courses, then drop me a line. I'll be happy to help.

Yelp dataset

Note: these exercises were put together using Round 12 of Yelp's challenge. Yelp is currently on Round 13, and I haven't revisited the exercises using the newer dataset. I hope you still get sensible answers, but let me know if this causes you any difficulties. If you'd like the original dataset (from Round 12), let me know. If this option is popular, I may devote some of my Google Drive to hosting it.

dsc_intro's People

Contributors

gtmaskall

dsc_intro's Issues

TypeError when reading review.json dataset (05_case_study_filter_reviews.ipynb)

When I first set up the code to read in the review.json dataset, I encountered an error:

TypeError: sequence item 0: expected str instance, bytes found

Code

path = 'file://localhost/Users/.../DSC_Intro/'
filename = path + 'yelp_dataset/review_Test.json'
review_reader = pd.read_json(filename, lines=True, chunksize=100000)

for chunk in review_reader:
    print(chunk)
...

I changed the code to use

from io import StringIO
...
review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=100000)

instead and this error went away.

The pandas user guide appears to indicate that StringIO is required.

You may want to update the worked notebook accordingly.
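
A note on what StringIO(filename) does may help here. As a minimal sketch (standard library only, with a hypothetical path that is never opened), StringIO wraps the text it is given, so the buffer holds the path string itself rather than the file's contents. That may be why the TypeError goes away without the file actually being read; this is an interpretation, not something confirmed in this thread.

```python
from io import StringIO
import json

# Hypothetical path, used only for illustration; no file is opened.
filename = "file://localhost/tmp/review_Test.json"

# StringIO wraps the text it is given, so the buffer holds the path string.
buffer = StringIO(filename)
buffer_text = buffer.read()
print(buffer_text == filename)

# A JSON parser reading from that buffer sees the path text, which is not JSON.
try:
    json.loads(buffer_text)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)

# Wrapping actual JSON text, by contrast, parses as expected.
record = json.loads(StringIO('{"stars": 5.0, "useful": 0}').read())
print(record["stars"])
```

StringIO is the right tool when the JSON records are already in a string; for a file on disk, the reader needs the path or an open file handle.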

ValueError when reading review.json dataset (05_case_study_filter_reviews.ipynb)

After adding StringIO(...) to the reader creation code in 05_case_study_filter_reviews.ipynb, I am encountering a new error when trying to read the review file in chunks.

ValueError: Unexpected character found when decoding 'false'

N.B.

  • The error only occurs when trying to read the data from a file on disk
    • If the records are placed inline within a string or "here doc" (triple-quoted string), the error does not occur
  • The error only occurs when trying to read the file in chunks.
    • If the complete data file is read in, without using chunksize to create a reader, there is no error.
  • I have reduced the size of the data set for testing. I can reproduce the error with a single JSON record in the file.
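
For comparison, here is a minimal sketch of a chunked read that passes a plain filesystem path straight to pd.read_json, with no file:// prefix and no StringIO wrapper. The temporary file, the three placeholder records, and chunksize=2 are assumptions for illustration, not the Yelp data or the notebook's code. Whether this avoids the errors above on a given pandas version has not been checked here.

```python
import json
import os
import tempfile

import pandas as pd

# Made-up placeholder records standing in for lines of review.json.
records = [
    {"review_id": "a1", "stars": 5.0, "useful": 0},
    {"review_id": "b2", "stars": 4.0, "useful": 1},
    {"review_id": "c3", "stars": 3.0, "useful": 2},
]

with tempfile.TemporaryDirectory() as tmpdir:
    # Write one JSON object per line, the layout that lines=True expects.
    filename = os.path.join(tmpdir, "review_sample.json")
    with open(filename, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    # Pass the path itself; chunksize is only valid together with lines=True.
    review_reader = pd.read_json(filename, lines=True, chunksize=2)

    # Each chunk is a DataFrame holding at most `chunksize` rows.
    chunk_sizes = []
    total_rows = 0
    for chunk in review_reader:
        chunk_sizes.append(len(chunk))
        total_rows += len(chunk)

print(chunk_sizes, total_rows)
```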

Sample Data

Either or both of the following records can be used to reproduce the error:

{"review_id":"rEITo90tpyKmEfNDp3Ou3A","user_id":"6Fz_nus_OG4gar721OKgZA","business_id":"6lj2BJ4tJeu7db5asGHQ4w","stars":5.0,"useful":0,"funny":0,"cool":0,"text":"We've been a huge Slim's fan since they opened one up in Texas about two years ago when we used to live there. This place never disappoints. They even have great salads and grilled chicken. Plus they have fresh brewed sweet tea, it's the best!","date":"2017-05-26 01:23:19"}
{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}

Error Output
[Screenshot: Screen Shot 2019-06-23 at 19 27 42]

I have filed an issue with pandas-dev and posted a question on Stack Exchange, but haven't gotten very far.
