
nullifiers / hackerrank-solution-crawler

๐Ÿ Crawls solutions of hackerrank and stores as local files.

License: MIT License

Python 100.00%
crawler crawls-solutions hackerrank hackerrank-solution-crawler python python3

hackerrank-solution-crawler's People

Contributors: derac, rajat19, rajatgoyal715, rishabhsingh971, siddhant-soni

hackerrank-solution-crawler's Issues

Create root level and domain level README files

Currently, the script generates README files at the sub-domain level (e.g. Hackerrank/Algorithms/BitManipulation/). It would be good if the script could similarly generate README files at the root level (Hackerrank/) and at the domain level (e.g. Hackerrank/Algorithms/) as well.
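
A minimal sketch of how this could work by indexing the immediate sub-directories at each extra level; the helper names and the generated README layout are assumptions, not the script's actual code:

import os

def write_index_readme(directory):
    """Write a simple README listing the immediate sub-directories."""
    entries = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    lines = ["# " + os.path.basename(os.path.normpath(directory)), ""]
    lines += ["- [{0}](./{0})".format(d) for d in entries]
    with open(os.path.join(directory, "README.md"), "w") as fh:
        fh.write("\n".join(lines) + "\n")

def write_root_and_domain_readmes(root="Hackerrank"):
    write_index_readme(root)                    # root level: Hackerrank/README.md
    for domain in sorted(os.listdir(root)):     # domain level: Hackerrank/Algorithms/README.md, ...
        path = os.path.join(root, domain)
        if os.path.isdir(path):
            write_index_readme(path)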

Config file(s)

A config file could be added to store things such as the following (a sketch appears after the list):

  • username and password (only after informing the user and taking their permission)
  • user settings
  • last request time (so that an old submission can be overridden by a new one)
    Metadata about each downloaded file could also be stored in the same or a separate config.
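
A minimal sketch of such a config stored as a JSON file; the file location and key names are assumptions:

import json
import os
import time

CONFIG_PATH = os.path.expanduser("~/.hsc_config.json")  # assumed location

def load_config():
    if os.path.exists(CONFIG_PATH):
        with open(CONFIG_PATH) as fh:
            return json.load(fh)
    return {}

def save_config(username=None, settings=None):
    config = load_config()
    if username is not None:        # stored only with the user's consent
        config["username"] = username
    if settings is not None:
        config["settings"] = settings
    config["last_request_time"] = time.time()  # used to skip already-downloaded, older submissions
    with open(CONFIG_PATH, "w") as fh:
        json.dump(config, fh, indent=2)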

Sorting logic while creating/updating readme

Description:

Currently, sorting starts from the wrong line number. We split the README headers using \n as the delimiter, so if there are 4 newlines the resulting list has 5 elements, which should not be the case. That is why the first row is not considered while sorting all the rows.
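
For illustration, a minimal sketch of the fix: discard the empty strings produced by the trailing newline before sorting, so the header really is the first element (the function name is hypothetical):

def sort_readme_rows(readme_text):
    # Dropping empty strings means a trailing newline no longer adds a
    # bogus element, so lines[0] is really the header row.
    lines = [line for line in readme_text.split("\n") if line.strip()]
    header, rows = lines[0], lines[1:]
    return "\n".join([header] + sorted(rows)) + "\n"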


Overriding newest solution with oldest solution

Right now, if two or more solutions to the same problem are found, the existing solution is overridden by the newly crawled one. But the crawling order is from newest to oldest, so the newer solution ends up being overridden by the older one, which is not what we want.

Steps to reproduce the behavior:

  1. This is the query we are using to get the submissions: https://www.hackerrank.com/rest/contests/master/submissions/?offset=0&limit=100
  2. Here, we traverse through the list of submissions: https://github.com/Nullifiers/Hackerrank-Solution-Crawler/blob/master/hsc/crawler.py#L85

Solution:

  1. We can reverse the order of traversal.
  2. Or we can request the submissions in reverse order, perhaps by tweaking some query parameters.

We also need to handle the offset and limit behaviour in this case; a sketch of the first option follows.
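
A minimal sketch of option 1: iterate over each fetched page in reverse, so the oldest submission is written first and newer duplicates overwrite it (the save helper here is a placeholder, not the crawler's real function):

def save_solution(submission):
    """Placeholder for the crawler's existing write-to-disk step."""
    print("saving", submission)

def process_page(submissions):
    # The API returns submissions newest-first; reversing the iteration
    # writes the oldest first, so a newer duplicate overwrites the older file.
    for submission in reversed(submissions):
        save_solution(submission)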

Unable to download submissions

Description
When downloading my submissions, a JSONDecodeError is raised (see the traceback below).
The error is traced to the method login(username, password) at line 45 of crawler.py.
As far as I can figure out, the value of data in data = resp.json() is empty; resp.json() does not return anything.

Steps taken to try to fix
I could not figure out why resp.json() returns nothing, so I tried writing my own crawler based on yours to simplify things and understand what is going on.
My crawler worked well enough for providing the authentication details and for getting the models of all the submissions, but when it came to calling resp.json() for each challenge in the get_submission(self, submissions) method, it failed and returned the same JSONDecodeError.
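
For reference, a small guard around the decoding step can surface what the server actually returned instead of raising a bare JSONDecodeError; this helper is a sketch, not part of crawler.py:

def safe_json(resp):
    """Decode a requests response, failing with a readable message on empty or non-JSON bodies."""
    if resp.status_code != 200 or not resp.text.strip():
        raise RuntimeError(
            "Unexpected response ({}): {!r}".format(resp.status_code, resp.text[:200]))
    try:
        return resp.json()
    except ValueError:  # json and simplejson decode errors both subclass ValueError
        raise RuntimeError("Response body is not JSON: {!r}".format(resp.text[:200]))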

If you wish, I can email my modified crawler to you.

Desktop:

  • Kubuntu
  • pip 20.0.2
  • Python 3.8

Error message

user@pc:~$ hsc
Hackerrank Username: [email protected]
Hackerrank Password: 
Traceback (most recent call last):
  File "/home/parth/.local/bin/hsc", line 8, in <module>
    sys.exit(main())
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 224, in main
    if not crawler.authenticate():
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 65, in authenticate
    return self.login(username, password)
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 45, in login
    data = resp.json()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Error occurs after entering correct credentials

When I run the hsc command after installation, it asks for credentials. I enter the correct details, but it shows an error.

Error:

Traceback (most recent call last):
  File "C:\Python27\Scripts\hsc-script.py", line 11, in <module>
    load_entry_point('hsc==1.2.1', 'console_scripts', 'hsc')()
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 221, in main
    if not crawler.authenticate():
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 62, in authenticate
    return self.login(username, password)
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 46, in login
    self.get_number_of_submissions()
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 68, in get_number_of_submissions
    self.total_submissions = resp.json()['total']
  File "c:\python27\lib\site-packages\requests\models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I also tried the direct login command hsc -l 34 -p testpassword -u testuser, but that is not working either.

Please help me resolve it.

Setup testing suite

There should be a test suite for the script, which executes the script and then checks that all the tasks the script needs to do are completed.
Run this test as part of Continuous Integration.
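
A minimal pytest sketch of such a test, run against the installed hsc console script. Whether the crawler writes its Hackerrank/ tree into the current working directory is an assumption, and real credentials have to come from the environment:

import os
import subprocess

import pytest

needs_creds = pytest.mark.skipif(
    not (os.environ.get("HSC_USER") and os.environ.get("HSC_PASS")),
    reason="real HackerRank credentials are required to exercise the crawler",
)

@needs_creds
def test_crawl_writes_solutions(tmp_path):
    # Flags follow the usage shown elsewhere in this tracker: -l limit, -u username, -p password.
    subprocess.run(
        ["hsc", "-l", "5",
         "-u", os.environ["HSC_USER"], "-p", os.environ["HSC_PASS"]],
        cwd=str(tmp_path), check=True,
    )
    # Assumes the crawler writes its output tree relative to the working directory.
    out_dir = tmp_path / "Hackerrank"
    assert out_dir.is_dir()
    assert any(out_dir.rglob("README.md"))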

Allow crawler to fetch all solutions

Problem

Currently, there is no feature to directly fetch all the solutions a user has submitted.

Solution

Firstly, we already fetch the total number of submissions a user has made (to check whether the login was successful), so we could show that number first and let the user enter the same number as the limit input to fetch all solutions.

On second thought, we could instead provide limit as a parameter to the script, which would behave like this (a sketch follows below):

  • If the limit parameter is not passed, fetch all the submissions
  • If the limit parameter is passed, fetch only that many solutions

@rajat19 @rishabhsingh971 Please provide your opinions on this.
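
A small sketch of how the optional limit could be wired on the command line; the -u/-p/-l flag names mirror those used elsewhere in this tracker, while the default-to-all behaviour is the proposal itself, not existing behaviour:

import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="hsc")
    parser.add_argument("-u", "--username")
    parser.add_argument("-p", "--password")
    # No default: None means "fetch everything the user has submitted".
    parser.add_argument("-l", "--limit", type=int, default=None,
                        help="number of submissions to fetch (omit to fetch all)")
    return parser.parse_args(argv)

def resolve_limit(args, total_submissions):
    # total_submissions is the count the crawler already fetches to verify the login.
    return total_submissions if args.limit is None else args.limit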

Problems in sorted order in Readme

Currently, all problems listed in the README appear in the order in which they were crawled.
Change the code so that they are listed in ascending order of problem name.

Solution Link in Readme is Wrong

Bug
The generated README files do not contain valid solution links (the path is incorrect).
Wrong path: /abc.cpp -> points to hsc/abc.cpp

Expected behavior
Expected path: ./abc.cpp -> points to hsc/Hackerrank/Algorithms/Implementation/abc.cpp
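
A minimal sketch of building the link relative to the README that references it (the helper name is hypothetical):

import os

def solution_link(readme_dir, solution_path):
    """Return a link usable from readme_dir/README.md, e.g. './abc.cpp'."""
    rel = os.path.relpath(solution_path, start=readme_dir)
    # './abc.cpp' resolves next to the README (e.g. Hackerrank/Algorithms/Implementation/),
    # whereas the current leading '/abc.cpp' resolves against the repository root.
    return rel if rel.startswith(".") else "./" + rel

# solution_link("Hackerrank/Algorithms/Implementation",
#               "Hackerrank/Algorithms/Implementation/abc.cpp")  ->  "./abc.cpp"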

Add Code Language to Readme files

Currently, the generated README files do not have a language assigned to them.
This can cause confusion if a user has submitted successful solutions for a question in different languages; a sketch of the proposed table follows the list below.

What to do

  • Add a Language field in the heading of the README files
  • Add a respective language column for each solution
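
A short sketch of what a generated README table could look like with the language column added; the exact README layout used by the script is an assumption:

def readme_table(solutions):
    """solutions: iterable of (problem_name, file_name, language) tuples."""
    lines = [
        "| Problem | Solution | Language |",
        "| --- | --- | --- |",
    ]
    for name, file_name, language in sorted(solutions):
        lines.append("| {} | [{}](./{}) | {} |".format(name, file_name, file_name, language))
    return "\n".join(lines)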
