Comments (3)
Hey @jaikishantulswani,
I know this is a real pain. I have been discussing it with @lc. This is quite a tricky problem to solve because the links in a JavaScript file may not be referring to the same domain that the JS file is hosted on.
So, in your example above, imagine that the JavaScript at https://example.com/1.js looks something like this:
var baseURL = "https://api.example.com"
var url = baseURL + "/add/user"
In this case, hakrawler would currently output the following two URLs:
[linkfinder] https://api.example.com
[linkfinder] /add/user
In order to make the connection that the relative URL "/add/user" is actually https://api.example.com/add/user and not https://example.com/add/user, hakrawler would need to parse the JavaScript. The best way that I know of to parse JavaScript for a terminal application is by utilising a headless browser such as Chromium. However, this is extremely processor intensive and slow. One of the core concepts that I want to keep with hakrawler is its speed and scalability, so this is not an option, at least not by default. Furthermore, even if we were using a headless browser, it doesn't necessarily mean that a crawler would discover every endpoint in the JS file, only the ones that become visible within the application.
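To make the ambiguity concrete, here is a minimal sketch in Python (chosen for brevity; hakrawler itself is written in Go, and none of this is hakrawler's code). It resolves the relative link against the JS file's own URL, which is the only context a crawler has without parsing the script, and compares that with the URL the script would build at runtime. The variable names are made up for illustration.

```python
from urllib.parse import urljoin

js_file_url = "https://example.com/1.js"  # where the script is hosted
relative_link = "/add/user"               # what a regex-based extraction finds

# Without parsing the JavaScript, the hosting file's URL is the only base available.
naive_guess = urljoin(js_file_url, relative_link)

# What the script actually builds at runtime: baseURL + "/add/user".
actual_url = urljoin("https://api.example.com", relative_link)

print(naive_guess)  # https://example.com/add/user
print(actual_url)   # https://api.example.com/add/user
```

The two results differ only in the host, and nothing in the extracted strings alone says which host is the right one.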
I think we have two options to resolve this:
- We compromise, and just have special linkfinder output which prints relative URLs along with the JS file they were found in, so that they can be manually analysed later. Example output could look something like this:
[linkfinder] /add/user from https://example.com/1.js
- We implement full headless browser crawling as a non-default option in hakrawler which will enable more comprehensive crawls with full support for SPAs. If someone wants comprehensive crawling with a headless browser they can enable it, otherwise the current configuration is used.
My personal preference would be to go for option 1. I think that headless browser crawling could be saved for a different tool.
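As a rough illustration of option 1, the formatting could be sketched like this in Python (not hakrawler's actual Go code). The function name `format_linkfinder_line` is hypothetical, and printing absolute URLs unchanged is an assumption based on the current output shown above.

```python
from urllib.parse import urlparse


def format_linkfinder_line(link: str, source_js_url: str) -> str:
    """Format one extracted link in the style proposed for option 1."""
    # Assumption: a link that carries its own host is printed unchanged.
    if urlparse(link).netloc:
        return f"[linkfinder] {link}"
    # Relative links get the JS file they were found in appended,
    # so they can be analysed manually later.
    return f"[linkfinder] {link} from {source_js_url}"


print(format_linkfinder_line("/add/user", "https://example.com/1.js"))
# [linkfinder] /add/user from https://example.com/1.js
```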
from hakrawler.
Option 1 seems the most logical approach, perhaps with a "best guess" of the URL presented along with the currently proposed text, e.g.:
[linkfinder] https://api.example.com/add/user - /add/user from https://example.com/1.js
I know that's fairly simple to write the code for in the example given, but I'm sure it's going to be impossible to catch all scenarios.
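For the example given, a "best guess" could be sketched roughly as follows in Python. The heuristic is purely an assumption for illustration: it treats the most recent absolute URL seen in the same JS file as the base for any relative link that follows. The function name `best_guess_lines` is hypothetical, and the heuristic will guess wrong whenever the base is defined elsewhere, built dynamically, or one of several.

```python
from urllib.parse import urljoin, urlparse


def best_guess_lines(links, source_js_url):
    """Return output lines for links extracted from one JS file, in order."""
    lines = []
    last_absolute = None  # most recent absolute URL seen in this file
    for link in links:
        if urlparse(link).netloc:
            # Absolute URL: remember it as a candidate base and print it as-is.
            last_absolute = link
            lines.append(f"[linkfinder] {link}")
        elif last_absolute is not None:
            # Relative link: guess that it belongs to the last absolute URL.
            guess = urljoin(last_absolute, link)
            lines.append(f"[linkfinder] {guess} - {link} from {source_js_url}")
        else:
            # No candidate base seen yet, so print without a guess.
            lines.append(f"[linkfinder] {link} from {source_js_url}")
    return lines


for line in best_guess_lines(["https://api.example.com", "/add/user"],
                             "https://example.com/1.js"):
    print(line)
```

Keeping the original relative path and the source file next to the guess means a wrong guess can still be corrected by hand.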
Hey guys, this has been resolved in beta 6. I ended up just going with:
[linkfinder] /add/user from https://example.com/1.js
@lc thanks again for the linkfinder functionality