Comments (12)
Current version: v1.0.5
from katana.
Thanks for opening this issue. Our JSONL output option includes request information which you can use to extract your desired data. For example, for your use case, you can do the following:
$ katana -u yahoo.com -silent -j | jq -r '"\(.request.method) \(.request.endpoint)"'
GET https://yahoo.com
GET https://www.yahoo.com/
GET https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js
GET https://www.yahoo.com/news/sale-donald-trump-lightly-used-080735193.html
GET https://www.yahoo.com/news/ohio-toddler-died-her-mom-130612092.html
GET https://br.yahoo.com
GET https://autos.yahoo.com/michael-jordan-gets-personal-delivery-180611826.html
GET https://hk.yahoo.com
GET https://www.yahoo.com/autos/michael-jordan-gets-personal-delivery-180611826.html
...
Is this something that'll work for you?
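If jq is not available, the same extraction can be sketched in a few lines of Python. This is only a rough equivalent of the jq filter above: it assumes each JSONL line is an object with `request.method` and `request.endpoint` (the two fields the jq expression reads), and the function name and sample records are made up for illustration.

```python
import json


def method_and_endpoint(jsonl_lines):
    """Yield "METHOD endpoint" for every JSONL record that has both fields."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        request = record.get("request") if isinstance(record, dict) else None
        if not isinstance(request, dict):
            continue
        method = request.get("method")
        endpoint = request.get("endpoint")
        if method and endpoint:
            yield f"{method} {endpoint}"


# Hand-written records shaped like the fields used by the jq filter (assumption).
SAMPLE_LINES = [
    '{"request": {"method": "GET", "endpoint": "https://yahoo.com"}}',
    '{"request": {"method": "GET", "endpoint": "https://www.yahoo.com/"}}',
]

for row in method_and_endpoint(SAMPLE_LINES):
    print(row)
```

In practice the lines would come from a file or from standard input fed by `katana ... -j` rather than from a hard-coded list.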
Hi dogancanbakir, no, it doesn't work for me. I can't see any API data in the output.
rad_yahoo.com.txt
I've uploaded this file to help you compare. With katana, I can't see any API data in the output; all of the results are static files.
Could you please clarify your statement "I can't see any API data out."? The command we are running only extracts the request method and endpoint. This means that the results you obtain from running the command are not different from those you get when running Katana as katana -u yahoo.com -silent.
Hi dogancanbakir, for example, rad can get:
https://sg.yahoo.com/tdv2_fp/api/
https://udc.yahoo.com/v2/public/
https://c2shb-oao.ssp.yahoo.com/admax/
https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?formatted=true&lang=en-SG&region=SG&scrIds=all_cryptocurrencies_us&start=0&count=25&enableSectorIndustryLabelFix=true&corsDomain=sg.finance.yahoo.com
All of these are API URLs, but katana's results are static resources.
Hi @dogancanbakir, I have the same feature request. I would like to be able to extract the API endpoints of a website (yahoo.com in this case). Currently, katana only supports the extraction of JavaScript files, HTML files, etc. For a target such as yahoo.com, I would like to extract all of its API endpoints, such as https://yahoo.com/api/login, and not just the JS and HTML files.
I see. How about using -headless mode for better coverage, coupled with filters to obtain the desired output? For example:
katana -u yahoo.com -mr "(api\.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent
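To see what this pattern would and would not keep, here is a small Python sketch that applies the same regular expression to a few URLs quoted earlier in this thread. It assumes the pattern is tested against each output URL as a plain search; the helper name is hypothetical.

```python
import re

# Same pattern as in the katana command above.
API_PATTERN = re.compile(r"(api\.|\/api\/|\/v[0-9]\/)")

# URLs copied from earlier comments in this thread.
URLS = [
    "https://sg.yahoo.com/tdv2_fp/api/",
    "https://udc.yahoo.com/v2/public/",
    "https://c2shb-oao.ssp.yahoo.com/admax/",
    "https://video-api.yql.yahoo.com",
    "https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js",
]


def looks_like_api(url):
    """Return True if the URL contains 'api.', '/api/' or a '/vN/' path segment."""
    return API_PATTERN.search(url) is not None


for url in URLS:
    print("match" if looks_like_api(url) else "skip ", url)
```

Note that the admax URL from the rad list does not match any branch of this pattern, so the regex would need to be extended to keep endpoints like that one.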
Hi dogancanbakir, do you know how to match URLs whose response has Content-Type: application/json?
katana -u https://yahoo.com -mr "(api.|/api/|/v[0-9]/)" -hl -xhr -silent
https://fr.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://fr.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://uk.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://video-api.yql.yahoo.com
https://uk.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://de.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://de.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x12849ff]
goroutine 103203 [running]:
github.com/projectdiscovery/retryablehttp-go.FromRequest(0x0)
/home/runner/go/pkg/mod/github.com/projectdiscovery/[email protected]/request.go:176 +0x5f
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest.func1(0xc001ec4320)
/home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:89 +0x66a
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func2(0x0?)
/home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:52 +0x28
reflect.Value.call({0x13905e0?, 0xc00cfe2630?, 0x100c002757e40?}, {0x1572396, 0x4}, {0xc002757f58, 0x1, 0xc001ec4320?})
/opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0x13905e0?, 0xc00cfe2630?, 0xc001ec4320?}, {0xc002757f58?, 0xc004978420?, 0xc04156b558?})
/opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:380 +0xb9
github.com/go-rod/rod.(*Browser).eachEvent.func1()
/home/runner/go/pkg/mod/github.com/go-rod/[email protected]/browser.go:401 +0x3d9
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func3()
/home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:57 +0x22
created by github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest in goroutine 103165
/home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:49 +0x51b
The result is no better than rad's result.
do you know how to match URLs whose response has Content-Type: application/json?
katana -u target.com -mdc 'contains(headers, "application/json")'
Could you please let me know which version you are currently using?
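The same filtering could also be done after the crawl on saved JSONL output. The Python sketch below is only an illustration under stated assumptions: it assumes each record carries a `response.headers` object next to `request.endpoint`, and because the exact header key spelling may differ between versions, it compares header names case-insensitively and treats `_` and `-` alike. The function names and sample records are invented.

```python
import json


def is_json_response(record):
    """Return True if the record's Content-Type header mentions application/json."""
    response = record.get("response")
    headers = response.get("headers") if isinstance(response, dict) else None
    if not isinstance(headers, dict):
        return False
    for name, value in headers.items():
        # Accept "Content-Type", "content-type" and "content_type" alike.
        if name.lower().replace("_", "-") == "content-type":
            return "application/json" in str(value).lower()
    return False


def json_endpoints(jsonl_lines):
    """Yield request endpoints whose response looks like JSON."""
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict) or not is_json_response(record):
            continue
        request = record.get("request")
        if isinstance(request, dict) and request.get("endpoint"):
            yield request["endpoint"]


# Invented sample records; the field layout is an assumption.
SAMPLE_RECORDS = [
    '{"request": {"endpoint": "https://example.com/api/items"}, '
    '"response": {"headers": {"content_type": "application/json; charset=utf-8"}}}',
    '{"request": {"endpoint": "https://example.com/app.js"}, '
    '"response": {"headers": {"Content-Type": "text/javascript"}}}',
]

for endpoint in json_endpoints(SAMPLE_RECORDS):
    print(endpoint)
```

Filtering on the response header rather than on the URL shape avoids having to guess path conventions such as /api/ or /v1/.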
The nil pointer dereference issue in your last command execution has been resolved in the dev branch.
It's worth noting that due to the web's non-deterministic nature, the results can vary. However, this is to be expected, and the suggestion is to refine filters/matchers (including DSL) if you have a clear idea of what you're looking for. I hope that helps!
Related Issues (20)
- Binary doesn't build in Windows11 HOT 1
- iqp does not work in passive mode HOT 2
- Adding urlscan passive sources is recommended HOT 2
- DSL doesn't filter status codes. HOT 3
- Flag -crawl-scope not working as expected
- Host Header OVERRIDE Problem HOT 4
- Installation issues HOT 2
- Can auto crawl parent dir HOT 7
- -store-field Storage Location HOT 2
- Katana JSONL file Issue on `raw` request field HOT 3
- dedupe output files
- issue on -passive feature HOT 2
- remove passive crawling
- Default User Agent should match headless chrome on the system
- Scanning a large number of URLs hangs; chrome threads must be killed manually HOT 4
- Page loaded but waitload/waitidle reach generates an error => No XHR in outputs HOT 1
- normalizing the response headers causes confusion
- headless form filling should be reconsidered
- Cookies in CustomHeaders not correctly used & building altered headers (Hybrid) HOT 3
- Headless mode option has no effect in Library mode HOT 2