
hakluke commented on September 21, 2024

Hey @jaikishantulswani,
I know this is a real pain. I have been discussing it with @lc. This is quite a tricky problem to solve because the links in a JavaScript file may not refer to the same domain that the JS file is hosted on.

So, in your example above, imagine that the JavaScript at https://example.com/1.js looks something like this:

var baseURL = "https://api.example.com"
var url = baseURL + "/add/user"

In this case, hakrawler currently outputs the following two URLs:

[linkfinder] https://api.example.com
[linkfinder] /add/user
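As a rough illustration of why those two strings come out separately, here is a minimal JavaScript sketch of regex-based link extraction. It assumes a much simpler pattern than LinkFinder actually uses (`LINK_PATTERN` and `extractLinks` are hypothetical names), and, like any regex approach, it only sees string literals, never how they are combined at runtime:

```javascript
// Hypothetical, simplified extractor: LinkFinder's real regular expression is
// considerably more elaborate. This only matches quoted strings that are either
// absolute http(s) URLs or root-relative paths.
const LINK_PATTERN = /["'](https?:\/\/[^"'\s]+|\/[^"'\s]+)["']/g;

function extractLinks(jsSource) {
  const links = [];
  for (const match of jsSource.matchAll(LINK_PATTERN)) {
    links.push(match[1]); // capture group 1 is the string without its quotes
  }
  return links;
}

const source = `var baseURL = "https://api.example.com"
var url = baseURL + "/add/user"`;

for (const link of extractLinks(source)) {
  console.log(`[linkfinder] ${link}`);
}
```

Because the extractor never evaluates `baseURL + "/add/user"`, it has no way of knowing that the two strings belong together.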

To work out that the relative URL "/add/user" actually points to https://api.example.com/add/user and not https://example.com/add/user, hakrawler would need to parse the JavaScript. The best way I know of to do that from a terminal application is to use a headless browser such as Chromium, but this is extremely processor-intensive and slow. Speed and scalability are core concepts that I want to keep in hakrawler, so this is not an option, at least not by default. Furthermore, even with a headless browser, a crawler would not necessarily discover every endpoint in the JS file, only the ones that become visible within the application.
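The ambiguity can be shown with the standard WHATWG `URL` constructor, which is available globally in Node.js and browsers. This is only an illustration; the variable names are assumptions, not hakrawler code:

```javascript
// The same relative path resolves to different endpoints depending on which
// base URL is assumed. Without understanding the JavaScript, a crawler can only
// fall back to the URL of the file the string was found in.
const relative = "/add/user";

const jsFileURL = "https://example.com/1.js"; // where the JS file is hosted
const apiBaseURL = "https://api.example.com"; // what the code actually uses

const naive = new URL(relative, jsFileURL).href;
const intended = new URL(relative, apiBaseURL).href;

console.log(naive);    // https://example.com/add/user
console.log(intended); // https://api.example.com/add/user
```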

I think we have two options to resolve this:

  1. We compromise and add special linkfinder output that prints relative URLs along with the JS file they were found in, so that they can be analysed manually later. Example output could look something like this:
     [linkfinder] /add/user from https://example.com/1.js
  2. We implement full headless browser crawling as a non-default option in hakrawler, enabling more comprehensive crawls with full support for SPAs (single-page applications). Anyone who wants comprehensive crawling with a headless browser can enable it; otherwise the current configuration is used.
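Option 1 could be sketched roughly as follows. `isAbsolute` and `formatFinding` are hypothetical helpers for illustration, not hakrawler's actual implementation:

```javascript
// Sketch of option 1: absolute URLs are printed as-is, while relative ones are
// tagged with the JS file they came from so they can be analysed by hand.
function isAbsolute(link) {
  // Treat anything that starts with a URL scheme followed by "://" as absolute.
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(link);
}

function formatFinding(link, sourceFile) {
  if (isAbsolute(link)) {
    return `[linkfinder] ${link}`;
  }
  return `[linkfinder] ${link} from ${sourceFile}`;
}

console.log(formatFinding("https://api.example.com", "https://example.com/1.js"));
// [linkfinder] https://api.example.com
console.log(formatFinding("/add/user", "https://example.com/1.js"));
// [linkfinder] /add/user from https://example.com/1.js
```

Keeping absolute URLs unchanged means existing consumers of the output are unaffected; only relative findings gain the extra context.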

My personal preference is option 1; I think headless browser crawling could be saved for a different tool.


JeffreyShran commented on September 21, 2024

Option 1 seems the most logical approach, perhaps with a "best guess" of the full URL presented alongside the currently proposed text, e.g.:

[linkfinder] https://api.example.com/add/user - /add/user from https://example.com/1.js

I know that's fairly simple to code for the example given, but I'm sure it will be impossible to catch every scenario.
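One possible, deliberately naive way to produce such a guess is sketched below in JavaScript, assuming the extractor already returns every string it found in the file. `bestGuess` is a hypothetical helper: it picks the file's only absolute origin if there is exactly one, and otherwise falls back to the origin of the JS file itself.

```javascript
// Hypothetical "best guess" heuristic, not hakrawler code. If the JS file
// contains exactly one absolute origin, assume relative paths belong to it;
// otherwise fall back to the origin the JS file was served from.
function bestGuess(relativePath, sourceFile, linksInFile) {
  const origins = new Set(
    linksInFile
      .filter((link) => /^https?:\/\//i.test(link))
      .map((link) => new URL(link).origin)
  );
  const base = origins.size === 1 ? [...origins][0] : new URL(sourceFile).origin;
  return new URL(relativePath, base).href;
}

const links = ["https://api.example.com", "/add/user"];
const jsFile = "https://example.com/1.js";
const guess = bestGuess("/add/user", jsFile, links);
console.log(`[linkfinder] ${guess} - /add/user from ${jsFile}`);
// [linkfinder] https://api.example.com/add/user - /add/user from https://example.com/1.js
```

This breaks easily: several API hosts in one file, hosts assembled at runtime, or paths relative to something other than an origin all produce wrong guesses, which is why the raw path and its source file stay in the output.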


hakluke commented on September 21, 2024

Hey guys, this has been resolved in beta 6. I ended up just going with:

[linkfinder] /add/user from https://example.com/1.js

@lc, thanks again for the linkfinder functionality.
