
nullifiers / hackerrank-solution-crawler

๐Ÿ Crawls solutions of hackerrank and stores as local files.

License: MIT License

Python 100.00%
crawler crawls-solutions hackerrank hackerrank-solution-crawler python python3

hackerrank-solution-crawler's People

Contributors: derac, rajat19, rajatgoyal715, rishabhsingh971, siddhant-soni

hackerrank-solution-crawler's Issues

Create root level and domain level README files

Currently, the script generates README files at the sub-domain level (e.g. Hackerrank/Algorithms/BitManipulation/). It would be good if the script could similarly generate README files at the root level (Hackerrank/) and at the domain level (e.g. Hackerrank/Algorithms/) as well.
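
A minimal sketch of how this could work by indexing the immediate sub-directories at each extra level; the helper names and the generated README layout are assumptions, not the script's actual code:

import os

def write_index_readme(directory):
    """Write a simple README listing the immediate sub-directories."""
    entries = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    lines = ["# " + os.path.basename(os.path.normpath(directory)), ""]
    lines += ["- [{0}](./{0})".format(d) for d in entries]
    with open(os.path.join(directory, "README.md"), "w") as fh:
        fh.write("\n".join(lines) + "\n")

def write_root_and_domain_readmes(root="Hackerrank"):
    write_index_readme(root)                    # root level: Hackerrank/README.md
    for domain in sorted(os.listdir(root)):     # domain level: Hackerrank/Algorithms/README.md, ...
        path = os.path.join(root, domain)
        if os.path.isdir(path):
            write_index_readme(path)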

Config file(s)

A config file could be added to store things such as the following (a sketch appears after the list):

  • username and password (only after informing the user and taking their permission)
  • user settings
  • last request time (so that an old submission can be overridden by a new one)
    Metadata about each downloaded file could also be stored in the same or a separate config.
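
A minimal sketch of such a config stored as a JSON file; the file location and key names are assumptions:

import json
import os
import time

CONFIG_PATH = os.path.expanduser("~/.hsc_config.json")  # assumed location

def load_config():
    if os.path.exists(CONFIG_PATH):
        with open(CONFIG_PATH) as fh:
            return json.load(fh)
    return {}

def save_config(username=None, settings=None):
    config = load_config()
    if username is not None:        # stored only with the user's consent
        config["username"] = username
    if settings is not None:
        config["settings"] = settings
    config["last_request_time"] = time.time()  # used to skip already-downloaded, older submissions
    with open(CONFIG_PATH, "w") as fh:
        json.dump(config, fh, indent=2)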

Sorting logic while creating/updating readme

Description:

Currently, sorting starts from the wrong line number. We split the README headers using \n as the delimiter, so if there are 4 newlines the resulting list has 5 elements, which should not be the case. That is why the first row is not considered while sorting all the rows.
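
For illustration, a minimal sketch of the fix: discard the empty strings produced by the trailing newline before sorting, so the header really is the first element (the function name is hypothetical):

def sort_readme_rows(readme_text):
    # Dropping empty strings means a trailing newline no longer adds a
    # bogus element, so lines[0] is really the header row.
    lines = [line for line in readme_text.split("\n") if line.strip()]
    header, rows = lines[0], lines[1:]
    return "\n".join([header] + sorted(rows)) + "\n"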


Overriding newest solution with oldest solution

Right now, if two or more solutions to the same problem are found, the existing solution is overridden by the newly crawled one. But the crawling order is from newest to oldest, so the newer solution ends up being overridden by the older one, which is not what we want.

Steps to reproduce the behavior:

  1. This is the query we are using to get the submissions: https://www.hackerrank.com/rest/contests/master/submissions/?offset=0&limit=100
  2. Here, we traverse through the list of submissions: https://github.com/Nullifiers/Hackerrank-Solution-Crawler/blob/master/hsc/crawler.py#L85

Solution:

  1. We can reverse the order of traversal.
  2. Or we can request the submissions in reverse order, perhaps by tweaking some query parameters.

We also need to handle the offset and limit behaviour in this case; a sketch of the first option follows.
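
A minimal sketch of option 1: iterate over each fetched page in reverse, so the oldest submission is written first and newer duplicates overwrite it (the save helper here is a placeholder, not the crawler's real function):

def save_solution(submission):
    """Placeholder for the crawler's existing write-to-disk step."""
    print("saving", submission)

def process_page(submissions):
    # The API returns submissions newest-first; reversing the iteration
    # writes the oldest first, so a newer duplicate overwrites the older file.
    for submission in reversed(submissions):
        save_solution(submission)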

Unable to download submissions

Description
When downloading my submissions, a JSONDecodeError is raised (see the traceback below).
The error is traced to the method login(username, password) at line 45 of crawler.py.
As far as I can figure out, the value of data in data = resp.json() is empty; resp.json() does not return anything.

Steps taken to try to fix
I could not figure out why resp.json() returns nothing, so I tried writing my own crawler based on yours to simplify things and understand what is going on.
My crawler worked well enough for providing the authentication details and for getting the models of all the submissions, but when it came to calling resp.json() for each challenge in the get_submission(self, submissions) method, it failed and returned the same JSONDecodeError.
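
For reference, a small guard around the decoding step can surface what the server actually returned instead of raising a bare JSONDecodeError; this helper is a sketch, not part of crawler.py:

def safe_json(resp):
    """Decode a requests response, failing with a readable message on empty or non-JSON bodies."""
    if resp.status_code != 200 or not resp.text.strip():
        raise RuntimeError(
            "Unexpected response ({}): {!r}".format(resp.status_code, resp.text[:200]))
    try:
        return resp.json()
    except ValueError:  # json and simplejson decode errors both subclass ValueError
        raise RuntimeError("Response body is not JSON: {!r}".format(resp.text[:200]))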

If you wish, I can email my modified crawler to you.

Desktop:

  • Kubuntu
  • pip 20.0.2
  • Python 3.8

Error message

user@pc:~$ hsc
Hackerrank Username: [email protected]
Hackerrank Password: 
Traceback (most recent call last):
  File "/home/parth/.local/bin/hsc", line 8, in <module>
    sys.exit(main())
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 224, in main
    if not crawler.authenticate():
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 65, in authenticate
    return self.login(username, password)
  File "/home/parth/.local/lib/python3.8/site-packages/hsc/crawler.py", line 45, in login
    data = resp.json()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Error occurs after entering correct credentials

When I run the hsc command after installation, it asks for credentials. I enter the correct details, but it shows an error.

Error:

Traceback (most recent call last):
  File "C:\Python27\Scripts\hsc-script.py", line 11, in <module>
    load_entry_point('hsc==1.2.1', 'console_scripts', 'hsc')()
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 221, in main
    if not crawler.authenticate():
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 62, in authenticate
    return self.login(username, password)
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 46, in login
    self.get_number_of_submissions()
  File "c:\python27\lib\site-packages\hsc\crawler.py", line 68, in get_number_of_submissions
    self.total_submissions = resp.json()['total']
  File "c:\python27\lib\site-packages\requests\models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I also tried the direct login command hsc -l 34 -p testpassword -u testuser, but that is not working either.

Please help me resolve it.

Setup testing suite

There should be a test suite for the script, which executes the script and then checks that all the tasks the script needs to do are completed.
Run this test as part of Continuous Integration.
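
A minimal pytest sketch of such a test, run against the installed hsc console script. Whether the crawler writes its Hackerrank/ tree into the current working directory is an assumption, and real credentials have to come from the environment:

import os
import subprocess

import pytest

needs_creds = pytest.mark.skipif(
    not (os.environ.get("HSC_USER") and os.environ.get("HSC_PASS")),
    reason="real HackerRank credentials are required to exercise the crawler",
)

@needs_creds
def test_crawl_writes_solutions(tmp_path):
    # Flags follow the usage shown elsewhere in this tracker: -l limit, -u username, -p password.
    subprocess.run(
        ["hsc", "-l", "5",
         "-u", os.environ["HSC_USER"], "-p", os.environ["HSC_PASS"]],
        cwd=str(tmp_path), check=True,
    )
    # Assumes the crawler writes its output tree relative to the working directory.
    out_dir = tmp_path / "Hackerrank"
    assert out_dir.is_dir()
    assert any(out_dir.rglob("README.md"))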

Allow crawler to fetch all solutions

Problem

Currently, there is no feature to directly fetch all the solutions a user has submitted.

Solution

Firstly, we already fetch the total number of submissions a user has made (to check whether the login was successful), so we could show that number first and let the user enter the same number as the limit input to fetch all solutions.

On second thought, we could instead provide limit as a parameter to the script, which would behave like this (a sketch follows below):

  • If the limit parameter is not passed, fetch all the submissions
  • If the limit parameter is passed, fetch only that many solutions

@rajat19 @rishabhsingh971 Please provide your opinions on this.
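
A small sketch of how the optional limit could be wired on the command line; the -u/-p/-l flag names mirror those used elsewhere in this tracker, while the default-to-all behaviour is the proposal itself, not existing behaviour:

import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="hsc")
    parser.add_argument("-u", "--username")
    parser.add_argument("-p", "--password")
    # No default: None means "fetch everything the user has submitted".
    parser.add_argument("-l", "--limit", type=int, default=None,
                        help="number of submissions to fetch (omit to fetch all)")
    return parser.parse_args(argv)

def resolve_limit(args, total_submissions):
    # total_submissions is the count the crawler already fetches to verify the login.
    return total_submissions if args.limit is None else args.limit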

Problems in sorted order in Readme

Currently, all problems listed in the README appear in the order in which they were crawled.
Change the code so that they are listed in ascending order of problem name.

Solution Link in Readme is Wrong

Bug
The generated README files do not contain valid solution links (the path is incorrect).
Wrong path: /abc.cpp -> points to hsc/abc.cpp

Expected behavior
Expected path: ./abc.cpp -> points to hsc/Hackerrank/Algorithms/Implementation/abc.cpp
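
A minimal sketch of building the link relative to the README that references it (the helper name is hypothetical):

import os

def solution_link(readme_dir, solution_path):
    """Return a link usable from readme_dir/README.md, e.g. './abc.cpp'."""
    rel = os.path.relpath(solution_path, start=readme_dir)
    # './abc.cpp' resolves next to the README (e.g. Hackerrank/Algorithms/Implementation/),
    # whereas the current leading '/abc.cpp' resolves against the repository root.
    return rel if rel.startswith(".") else "./" + rel

# solution_link("Hackerrank/Algorithms/Implementation",
#               "Hackerrank/Algorithms/Implementation/abc.cpp")  ->  "./abc.cpp"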

Add Code Language to Readme files

Currently, the generated README files do not have a language assigned to them.
This can cause confusion if a user has submitted successful solutions for a question in different languages; a sketch of the proposed table follows the list below.

What to do

  • Add a Language field in the heading of the README files
  • Add a respective language column for each solution
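
A short sketch of what a generated README table could look like with the language column added; the exact README layout used by the script is an assumption:

def readme_table(solutions):
    """solutions: iterable of (problem_name, file_name, language) tuples."""
    lines = [
        "| Problem | Solution | Language |",
        "| --- | --- | --- |",
    ]
    for name, file_name, language in sorted(solutions):
        lines.append("| {} | [{}](./{}) | {} |".format(name, file_name, file_name, language))
    return "\n".join(lines)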
