
threegiantnoobs / chegg-scraper

Stars: 69 | Watchers: 3 | Forks: 23 | Size: 105 KB

Download Chegg homework-help questions to self-sufficient HTML files

License: The Unlicense

Python 36.27% HTML 63.73%
chegg chegg-answers chegg-downloader scraping

chegg-scraper's Introduction

NOTE

The original developers are no longer able to maintain this project, but we would still like to keep it alive, so any open-source contribution from the community is more than welcome.


Chegg-Scraper

Download Chegg homework-help questions to HTML files. These HTML files are self-sufficient: you don't need account access to load them.

Details
  • All files are saved as HTML documents.
  • You will not need your Chegg account to open these files later.
  • USE-CASES

    • In bots: you can share your Chegg subscription with your friends, e.g. by making a Discord bot
    • Saving Chegg questions locally

    Setup:

    • Download the latest release

    • Install the requirements: pip install -r requirements.txt

    • Save your cookie in a file named cookie.txt (preferred)

      Using Browser Console
      • Log in to Chegg in your browser and open the developer console (Cmd-Shift-C or Ctrl-Shift-I).
      • Grab your cookies by typing: document.cookie
      • Paste your cookie from the console into cookie.txt (without the surrounding quotes)

      Or

      Using Chrome Extension
      • Log in to Chegg in your web browser
      • Click Export and paste the result into cookie.txt
    • You may also need to change the user agent

      • Open conf.json and edit user_agent

        • Find your browser's user agent

    Usage:

    • If you are new to Python, go here

    • Run the Downloader.py script

      $ python Downloader.py
      
      Enter url of the homework-help:
    • Arguments

      ALL ARGUMENTS ARE OPTIONAL
      -u or --url      >   URL of the Chegg question
      -c or --cookie   >   Path of the cookie file (Default: cookie.txt)
      -s or --save     >   File path where you want to save the output; put it inside " "
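The optional flags listed above map naturally onto Python's standard argparse module. What follows is a minimal sketch, not the project's actual Downloader.py. The function names parse_args and get_url are hypothetical, and the real script may store options under different keys (the tracebacks in the issues below reference args['cookie_file'] and args['file_format'], for example).

```python
import argparse


def parse_args(argv=None):
    """Hypothetical re-creation of the documented CLI flags; the real
    Downloader.py may use different destination names or extra options."""
    parser = argparse.ArgumentParser(
        description="Download a Chegg homework-help question to an HTML file")
    # Every argument is optional, matching the README.
    parser.add_argument("-u", "--url", help="URL of the Chegg question")
    parser.add_argument("-c", "--cookie", default="cookie.txt",
                        help="path of the cookie file (default: cookie.txt)")
    parser.add_argument("-s", "--save",
                        help="file path where the HTML should be saved")
    return parser.parse_args(argv)


def get_url(args, ask=input):
    # Fall back to an interactive prompt when -u/--url is not given,
    # like the "Enter url of the homework-help:" prompt shown above.
    return args.url or ask("Enter url of the homework-help: ")
```

Quoting is handled by the shell, not by argparse. A -s value containing spaces must therefore be wrapped in quotes, as the README says, so that it arrives as a single argument. The same applies to URLs containing &, which a POSIX shell otherwise treats as a command separator.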
      

    chegg-scraper's People

    Contributors

    04anubhavv, deepsourcebot, harsh22222, jason-s-wu, k4anubhav, prathakgarg, ravi-akagra


    chegg-scraper's Issues

    Getting an AttributeError: 'NoneType' object has no attribute 'group'

    After creating the cookie.txt file and running the script, I get the following error message:

    Traceback (most recent call last):
      File "C:\Users\Julian\Desktop\chegg\Downloader.py", line 3, in <module>
        Downloader.main()
      File "C:\Users\xxxx\Desktop\chegg\cheggscraper\Downloader.py", line 40, in main
        print(Chegg.url_to_html(args['url'], file_name_format=args['file_format']))
      File "C:\Users\xxxxx\Desktop\chegg\cheggscraper\CheggScraper.py", line 521, in url_to_html
        headers, heading, question_div, answers__, question_uuid = self._parse(html_text=html_res_text,
      File "C:\Users\xxxxx\Desktop\chegg\cheggscraper\CheggScraper.py", line 437, in _parse
        re.search(r'C\.page\.homeworkhelp_question\((.*)?\);', html_text).group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    Please let me know if there is a solution for this or if this is now an outdated scraper. Thank you.
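The AttributeError in this traceback comes from calling .group(1) on the result of re.search(). That function returns None when the pattern does not occur anywhere in the page text. Checking the match first turns the failure into an error that names its cause.

The sketch below makes the following assumptions:
- extract_question_json is a hypothetical helper name.
- The pattern is copied from the traceback, with its backslash escapes intact.
- The helper only makes the failure clearer. It does not restore compatibility with whatever page markup the pattern no longer matches.

```python
import re

# Pattern as it appears in the traceback above. When the page no longer
# contains this JavaScript call, re.search() returns None.
QUESTION_PATTERN = re.compile(r'C\.page\.homeworkhelp_question\((.*)?\);')


def extract_question_json(html_text):
    """Hypothetical helper: return the captured call argument, or raise a
    descriptive error instead of "'NoneType' object has no attribute 'group'"."""
    match = QUESTION_PATTERN.search(html_text)
    if match is None:
        raise ValueError(
            "homeworkhelp_question data not found in the page; the page "
            "layout may have changed or a non-question page was returned")
    return match.group(1)
```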

    Captcha Error

    Chegg detects the script as a bot and asks me to solve a CAPTCHA after two links. Is there any solution? I have set the user agent correctly in the configuration file.

      File "Downloader.py", line 3, in <module>
        Downloader.main()
      File "C:\Users\nitin\Desktop\chegg-scraper\cheggscraper\Downloader.py", line 40, in main
        print(Chegg.url_to_html(args['url'], file_name_format=args['file_format']))
      File "C:\Users\nitin\Desktop\chegg-scraper\cheggscraper\CheggScraper.py", line 521, in url_to_html
        html_res_text = self._get_response_text(url=url)
      File "C:\Users\nitin\Desktop\chegg-scraper\cheggscraper\CheggScraper.py", line 311, in _get_response_text
        raise Exception(f'Expected status code {expected_status} but got {response.status_code}\n{error_note}')
    Exception: Expected status code (200,) but got 403
    Error in request
    PS C:\Users\nitin\Desktop\chegg-scraper>

    I get this error

    Error in request

    raise Exception(f'Expected status code {expected_status} but got {response.status_code}\n{error_note}')
    Exception: Expected status code (200,) but got 403
    Error in request
    PS C:\Users\nitin\Desktop\chegg-scraper>

    Chegg account keeps getting suspended

    I have my own subscription and I don't share it with anyone. I launched the bot from the same browser and the same IP as my Chegg account, and I still get that prompt a day or so after launching the bot. How can I fix this?

    Error

    I got this error while trying to scrape a URL:

    Traceback (most recent call last):
      File "/Users/sheheryartariq/Downloads/chegg-scraper-1.3/Downloader.py", line 3, in <module>
        Downloader.main()
      File "/Users/sheheryartariq/Downloads/chegg-scraper-1.3/cheggscraper/Downloader.py", line 40, in main
        print(Chegg.url_to_html(args['url'], file_name_format=args['file_format']))
      File "/Users/sheheryartariq/Downloads/chegg-scraper-1.3/cheggscraper/CheggScraper.py", line 521, in url_to_html
        headers, heading, question_div, answers__, question_uuid = self._parse(html_text=html_res_text,
      File "/Users/sheheryartariq/Downloads/chegg-scraper-1.3/cheggscraper/CheggScraper.py", line 437, in _parse
        re.search(r'C\.page\.homeworkhelp_question\((.*)?\);', html_text).group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    Help (CAN PAY FOR IT)

    Hey, can you help someone like me who has no idea how to use Python?
    I need to get the Chegg bot onto my server, and I'm willing to pay for this help! THANKS

    Bot Flag Error when deployed to Heroku

    The code works when running on my local computer. However, when deployed to Heroku or any other host (AWS, GCP, PythonAnywhere, etc.), I get a bot-flagged error even with the cookie file.

    Error downloading

    Traceback (most recent call last):
      File "C:\Users\tarek\OneDrive\Escritorio\chegg-scraper-main\Downloader.py", line 3, in <module>
        Downloader.main()
      File "C:\Users\tarek\OneDrive\Escritorio\chegg-scraper-main\cheggscraper\Downloader.py", line 40, in main
        print(Chegg.url_to_html(args['url'], file_name_format=args['file_format']))
      File "C:\Users\tarek\OneDrive\Escritorio\chegg-scraper-main\cheggscraper\CheggScraper.py", line 533, in url_to_html
        headers, heading, question_div, answers__ = self._parse(
      File "C:\Users\tarek\OneDrive\Escritorio\chegg-scraper-main\cheggscraper\CheggScraper.py", line 448, in _parse
        heading = self._parse_heading(soup)
      File "C:\Users\tarek\OneDrive\Escritorio\chegg-scraper-main\cheggscraper\CheggScraper.py", line 345, in _parse_heading
        heading = json.loads(heading_data)['query']['qnaSlug']
    KeyError: 'qnaSlug'

    Cookie file not found

    No matter where I put the cookie.txt file, it keeps telling me:

    Traceback (most recent call last):
      File "Downloader.py", line 1, in <module>
        from cheggscraper import Downloader
      File "/mnt/c/Users/Username/OneDrive/Desktop/solutions/cheggscraper/Downloader.py", line 48
        raise Exception(f'{args["cookie_file"]} does not exists')
    

    I also tried running Downloader.py from the root folder of the project, and using the -c argument. Nothing works.
    Any ideas?
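A relative path such as cookie.txt is resolved against the current working directory, not against the folder that contains the script. Running the script from a different folder can therefore produce a "does not exists" error even when the file is present next to the script.

This is a minimal sketch of a hypothetical helper, resolve_cookie_path. It is not part of the project. It tries the path as given, then relative to a base directory such as the script's folder, and reports every absolute path it checked.

```python
from pathlib import Path


def resolve_cookie_path(cookie_file, base_dir=None):
    """Hypothetical helper: try cookie_file relative to the current working
    directory, then relative to base_dir, and report every path tried."""
    candidates = [Path(cookie_file)]
    if base_dir is not None:
        candidates.append(Path(base_dir) / cookie_file)
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()
    tried = ", ".join(str(c.resolve()) for c in candidates)
    raise FileNotFoundError(f"{cookie_file} does not exist (tried: {tried})")
```

Inside a script, base_dir would typically be Path(__file__).parent. Printing the absolute paths shows immediately which folder the script was actually looking in.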

    Likes and Dislikes

    The script works very well, but could you add the likes and dislikes of the answers? That would be very useful, as it would help us know whether the answers are correct. Regards.

    How to change the saved HTML directory?

    Hi, after I enter the URL, the converted HTML is saved in the same directory as the chegg-scraper files. How do I change the output directory to another location? Thanks
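The README's -s/--save option is the documented way to choose where the file goes. When writing the output path yourself, building it from a target directory keeps the HTML out of the project folder.

This is a minimal sketch using pathlib. output_path is a hypothetical helper name, not the project's code, and the directory it returns is created if it does not exist yet.

```python
from pathlib import Path


def output_path(file_name, save_dir=None):
    """Hypothetical helper: place the generated HTML in save_dir (created if
    missing) instead of the current working directory."""
    directory = Path(save_dir) if save_dir else Path.cwd()
    directory.mkdir(parents=True, exist_ok=True)
    return directory / file_name
```

A caller would then write the page with something like output_path(name, "downloads").write_text(html, encoding="utf-8").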

    Bro, do you face the reCAPTCHA error too?

    Hi guys, the code works well, but some of the URLs don't work unless you open them first and pass the reCAPTCHA. Have you faced the same issue, or is the problem on my side? Thanks

    AttributeError: 'NoneType' object has no attribute 'group'

    Hi there,

    After giving the link: https://www.chegg.com/homework-help/questions-and-answers/us-government-recently-announced-started-implement-large-scale-fiscal-expansion-mitigate-n-q91031911?trackid=2e28bc4b1bcc&strackid=0708d012927e, the programme failed with the following error:

    root@78:/codingProject/chegg-scraper# python3 Downloader.py -u https://www.chegg.com/homework-help/questions-and-answers/us-government-recently-announced-started-implement-large-scale-fiscal-expansion-mitigate-n-q91031911?trackid=2e28bc4b1bcc&strackid=0708d012927e
    [16] 42052
    root@78:/codingProject/chegg-scraper# Traceback (most recent call last):
      File "Downloader.py", line 3, in <module>
        Downloader.main()
      File "/codingProject/chegg-scraper/cheggscraper/Downloader.py", line 40, in main
        print(Chegg.url_to_html(args['url'], file_name_format=args['file_format']))
      File "/codingProject/chegg-scraper/cheggscraper/CheggScraper.py", line 521, in url_to_html
        headers, heading, question_div, answers__, question_uuid = self._parse(html_text=html_res_text,
      File "/codingProject/chegg-scraper/cheggscraper/CheggScraper.py", line 437, in _parse
        re.search(r'C\.page\.homeworkhelp_question\((.*)?\);', html_text).group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    Some URLs don't work with it.

    Cookie not working

    Placing the cookie.txt file in the project folder does not work.

      File "C:\Users\user\Documents\GitHub\chegg-scraper-main\Downloader.py", line 3, in <module>
        Downloader.main()
      File "C:\Users\user\Documents\GitHub\chegg-scraper-main\cheggscraper\Downloader.py", line 34, in main
        raise Exception(f'{args["cookie_file"]} does not exists')
    Exception: cookie.txt does not exists

    run

    chegg-scraper>python Downloader.py
    Enter url of the homework-help: https://www.chegg.com/homework-help/questions-and-answers/bicycle-pedal-crank-subjected-1000-n-pedaling-force-determine-torque-nm-point-b--dimension-q30751129
    Traceback (most recent call last):
      File "chegg-scraper\Downloader.py", line 28, in <module>
        print(Chegg.url_to_html(args['url'], file_path=args['file_path']))
      File "chegg-scraper\CheggScraper.py", line 261, in url_to_html
        final_html, heading = self.parse(html_text=html_res_text)
      File "chegg-scraper\CheggScraper.py", line 243, in parse
        answers = self._parse_answer(soup, html_text)
      File "chegg-scraper\CheggScraper.py", line 196, in _parse_answer
        _, question_data = self.parse_json(re.search(r'C\.page\.homeworkhelp_question\((.*)?\);', html_text).group(1))
    AttributeError: 'NoneType' object has no attribute 'group'

    chegg-scraper>python Downloader.py
    Enter url of the homework-help: https://www.chegg.com/homework-help/questions-and-answers/bicycle-pedal-crank-subjected-1000-n-pedaling-force-determine-torque-nm-point-b--dimension-q30751129
    Traceback (most recent call last):
      File "chegg-scraper\Downloader.py", line 28, in <module>
        print(Chegg.url_to_html(args['url'], file_path=args['file_path']))
      File "chegg-scraper\CheggScraper.py", line 261, in url_to_html
        final_html, heading = self.parse(html_text=html_res_text)
      File "chegg-scraper\CheggScraper.py", line 243, in parse
        answers = self._parse_answer(soup, html_text)
      File "chegg-scraper\CheggScraper.py", line 208, in _parse_answer
        raise Exception
    Exception
