
Comments (12)

k3mlol commented on July 18, 2024

Current version: v1.0.5

from katana.

dogancanbakir commented on July 18, 2024

Thanks for opening this issue. Our JSONL output option includes request information which you can use to extract your desired data. For example, for your use case, you can do the following:

$ katana -u yahoo.com -silent -j | jq -r '"\(.request.method) \(.request.endpoint)"'
GET https://yahoo.com
GET https://www.yahoo.com/
GET https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js
GET https://www.yahoo.com/news/sale-donald-trump-lightly-used-080735193.html
GET https://www.yahoo.com/news/ohio-toddler-died-her-mom-130612092.html
GET https://br.yahoo.com
GET https://autos.yahoo.com/michael-jordan-gets-personal-delivery-180611826.html
GET https://hk.yahoo.com
GET https://www.yahoo.com/autos/michael-jordan-gets-personal-delivery-180611826.html
...
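If jq isn't available, the same extraction can be sketched in Python. This assumes, as the jq filter implies, that each JSONL line is an object with a "request" object holding "method" and "endpoint"; the function name method_and_endpoint and the script name extract.py are hypothetical, not part of katana.

```python
import json
import sys
from typing import Iterable, Iterator


def method_and_endpoint(lines: Iterable[str]) -> Iterator[str]:
    """Yield "METHOD endpoint" for each katana JSONL record.

    Assumes each line is a JSON object with a "request" object holding
    "method" and "endpoint" (the fields the jq filter reads). Blank lines,
    non-JSON lines and records missing either field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSONL record
        request = record.get("request") if isinstance(record, dict) else None
        if not isinstance(request, dict):
            continue
        method, endpoint = request.get("method"), request.get("endpoint")
        if method and endpoint:
            yield f"{method} {endpoint}"


if __name__ == "__main__":
    # Hypothetical usage: katana -u yahoo.com -silent -j | python3 extract.py
    for out in method_and_endpoint(sys.stdin):
        print(out)
```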

Is this something that'll work for you?

k3mlol commented on July 18, 2024

Hi dogancanbakir, no, it doesn't work for me. I can't see any API endpoints in the output.
rad_yahoo.com.txt
I've uploaded this file to help you compare the results. With katana, I can't see any API endpoints in the output; all of the results are static files.

dogancanbakir commented on July 18, 2024

Could you please clarify your statement "I can't see any API data out."? The command we are running only extracts the request method and endpoint. This means that the results you obtain from running the command are not different from those you get when running Katana as katana -u yahoo.com -silent.

k3mlol commented on July 18, 2024

Hi dogancanbakir, for example, rad can get:

https://sg.yahoo.com/tdv2_fp/api/
https://udc.yahoo.com/v2/public/
https://c2shb-oao.ssp.yahoo.com/admax/
https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?formatted=true&lang=en-SG&region=SG&scrIds=all_cryptocurrencies_us&start=0&count=25&enableSectorIndustryLabelFix=true&corsDomain=sg.finance.yahoo.com

All of these are API URLs, but katana's results are static resources.
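One rough way to make the "API vs. static" distinction concrete is to sort URLs by file extension. This is a heuristic sketch, not how rad or katana classify URLs; the extension set STATIC_EXTENSIONS and the helpers is_static and split_urls are hypothetical.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Hypothetical set of extensions treated as static resources; adjust as needed.
STATIC_EXTENSIONS = {
    ".js", ".css", ".html", ".htm", ".png", ".jpg", ".jpeg",
    ".gif", ".svg", ".ico", ".woff", ".woff2", ".ttf",
}


def is_static(url: str) -> bool:
    """Heuristic: a URL is static if its path ends in a known asset extension."""
    path = urlsplit(url).path  # the query string and fragment are ignored
    return splitext(path)[1].lower() in STATIC_EXTENSIONS


def split_urls(urls):
    """Partition URLs into (static, candidate_api) lists, preserving order."""
    static, candidates = [], []
    for url in urls:
        (static if is_static(url) else candidates).append(url)
    return static, candidates


if __name__ == "__main__":
    urls = [
        "https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js",
        "https://udc.yahoo.com/v2/public/",
        "https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?formatted=true",
    ]
    static, candidates = split_urls(urls)
    print("static:", static)
    print("candidates:", candidates)
```

An extension check is cheap but coarse: extensionless pages (for example https://br.yahoo.com) land in the candidate list too, so it only narrows the set rather than proving a URL is an API.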

joelczk commented on July 18, 2024

Hi @dogancanbakir, I have the same feature request. I would like to be able to extract the API endpoints of websites (yahoo.com in this case). Currently, katana only extracts JavaScript files, HTML files, etc. For a target such as yahoo.com, I would like to extract all of its API endpoints, such as https://yahoo.com/api/login, and not just the JS and HTML files.

dogancanbakir commented on July 18, 2024

I see. How about using -headless mode for better coverage coupled with filters to obtain the desired output? For example:

katana -u yahoo.com -mr "(api\.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent
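To see which URLs the -mr expression above would keep, the same regular expression can be tried offline. This sketch assumes -mr performs an unanchored search over each crawled URL (an assumption about katana's behavior); looks_like_api is a hypothetical helper, and the sample URLs are the rad examples from earlier in the thread. Note that https://c2shb-oao.ssp.yahoo.com/admax/ does not match, so the pattern would need extending to catch it.

```python
import re

# The pattern passed to -mr above. In a raw string, "\/" is just an escaped "/",
# and "api\." requires a literal dot after "api".
API_PATTERN = re.compile(r"(api\.|\/api\/|\/v[0-9]\/)")


def looks_like_api(url: str) -> bool:
    """True if the pattern matches anywhere in the URL (unanchored search)."""
    return API_PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in [
        "https://sg.yahoo.com/tdv2_fp/api/",
        "https://udc.yahoo.com/v2/public/",
        "https://c2shb-oao.ssp.yahoo.com/admax/",
        "https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js",
    ]:
        print(looks_like_api(url), url)
```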

k3mlol commented on July 18, 2024

Hi dogancanbakir, do you know how to match URLs whose response has Content-Type: application/json?

k3mlol commented on July 18, 2024

katana -u https://yahoo.com -mr "(api.|/api/|/v[0-9]/)" -hl -xhr -silent

https://fr.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://fr.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://uk.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://video-api.yql.yahoo.com
https://uk.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://de.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://de.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x12849ff]

goroutine 103203 [running]:
github.com/projectdiscovery/retryablehttp-go.FromRequest(0x0)
        /home/runner/go/pkg/mod/github.com/projectdiscovery/[email protected]/request.go:176 +0x5f
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest.func1(0xc001ec4320)
        /home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:89 +0x66a
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func2(0x0?)
        /home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:52 +0x28
reflect.Value.call({0x13905e0?, 0xc00cfe2630?, 0x100c002757e40?}, {0x1572396, 0x4}, {0xc002757f58, 0x1, 0xc001ec4320?})
        /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0x13905e0?, 0xc00cfe2630?, 0xc001ec4320?}, {0xc002757f58?, 0xc004978420?, 0xc04156b558?})
        /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:380 +0xb9
github.com/go-rod/rod.(*Browser).eachEvent.func1()
        /home/runner/go/pkg/mod/github.com/go-rod/[email protected]/browser.go:401 +0x3d9
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func3()
        /home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:57 +0x22
created by github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest in goroutine 103165
        /home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:49 +0x51b

The result is still not better than rad's.

dogancanbakir commented on July 18, 2024

> do you know how to match the url which response is Content-Type: application/json?

katana -u target.com -mdc 'contains(headers, "application/json")'
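For output that was already saved with -j, a similar filter can be applied after the fact. This is a sketch that assumes each JSONL record carries the URL in request.endpoint and the headers as an object in response.headers; those response field names are assumptions about katana's JSON schema, so check them against your own output. It only loosely mirrors the -mdc expression: it does a case-insensitive substring test over every header value. The helper json_endpoints and the file name out.jsonl are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator


def json_endpoints(lines: Iterable[str]) -> Iterator[str]:
    """Yield request endpoints whose response headers mention application/json.

    Assumed record shape (verify against real katana -j output):
    {"request": {"endpoint": ...}, "response": {"headers": {name: value, ...}}}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSONL record
        if not isinstance(record, dict):
            continue
        request, response = record.get("request"), record.get("response")
        if not isinstance(request, dict) or not isinstance(response, dict):
            continue
        headers = response.get("headers")
        if not isinstance(headers, dict):
            continue
        # Case-insensitive substring check over every header value.
        if any("application/json" in str(v).lower() for v in headers.values()):
            endpoint = request.get("endpoint")
            if endpoint:
                yield endpoint


if __name__ == "__main__":
    # Hypothetical usage: python3 this_script.py < out.jsonl
    for url in json_endpoints(sys.stdin):
        print(url)
```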

Could you please let me know which version you are currently using?

dogancanbakir commented on July 18, 2024

The nil pointer dereference issue in your last command execution has been resolved in the dev branch.

dogancanbakir commented on July 18, 2024

It's worth noting that due to the web's non-deterministic nature, the results can vary. However, this is to be expected, and the suggestion is to refine filters/matchers (including DSL) if you have a clear idea of what you're looking for. I hope that helps!
