urlhunter's People

Contributors

itsignacioportal, rzhade3, utkusen

urlhunter's Issues

`panic: invalid line` when parsing one of the urls

Steps to reproduce:

echo "google" >key1.txt; urlhunter --keywords key1.txt --date 2023-03-15:2023-03-15 --output a.txt

Output:


	o  	 Utku Sen's
	\_/\o
	( Oo)                    \|/
	(_=-)  .===O-  ~~U~R~L~~ -O-
	/   \_/U'        hunter  /|\
	||  |_/
	\\  |    utkusen.com
	{K ||	twitter.com/utkusen

 
Search starting for: 2023-03-15
[+]: urlteam_2023-03-15-20-17-01 already exists locally. Skipping download..
[+]: Searching: "google" in urlteam_2023-03-15-20-17-01/ed-gr
[+]: Searching: "google" in urlteam_2023-03-15-20-17-01/goo-gl
panic: invalid line: yRjCQD|http://www.priceminister.com/offer/buy/853650105/google-chromecast-audio-noir.html?t=180110&ptnrid=pt%7C89084411603%7Cc%7C53388079763%7C853650105&gclid=CK_R94KCvswCFZadGwod1bQLdA#sort=0&bbaid=1919751435&filter=20&xtatc=PUB-%5Bggp%5D-%5BHifi%5D-%5Baccessoire-audio-video%5D-%5B853650105%5D-%5Boccasion%5D-%5BTop_Occasion%5D&t=&ptnrid=s16SUVAfu_dc|pcrid|53388079763|pkw||pmt|&ja1=tsid:67590|cid:285246443|agid:14445716363|tid:pla-89084411603|crid:53388079763|nw:g|rnd:3013699068986104353|dvc:c|adp:1o2

goroutine 1 [running]:
main.searchFile({0xc0003786c0, 0x36}, {0xc000599dc9, 0x6}, {0x7ffe85ef8071, 0x5})
	github.com/utkusen/urlhunter/main.go:276 +0x70e
main.getArchive({0xc000436000, 0x1cec1, 0x20000}, {0xc00013e010, 0xa}, {0x7ffe85ef8042, 0x8}, {0x7ffe85ef8071, 0x5})
	github.com/utkusen/urlhunter/main.go:222 +0x5e9
main.main()
	github.com/utkusen/urlhunter/main.go:114 +0x6c5
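The offending line has more than one `|` character, because the long URL itself contains literal pipes after the shortcode separator. A minimal sketch of a more tolerant parser, assuming the panic comes from splitting on every `|` and expecting exactly two fields (the actual code at main.go:276 is not shown in this report). `splitBeaconLine` is a hypothetical helper name, and `strings.Cut` needs Go 1.18 or later:

```go
package main

import (
	"fmt"
	"strings"
)

// splitBeaconLine splits a "shortcode|longurl" line at the FIRST pipe only,
// so any further '|' characters stay part of the long URL.
// Hypothetical helper for illustration; not taken from urlhunter's main.go.
func splitBeaconLine(line string) (short, long string, ok bool) {
	short, long, ok = strings.Cut(line, "|")
	if !ok || short == "" || long == "" {
		return "", "", false
	}
	return short, long, true
}

func main() {
	// Shortened stand-in for the URL from the report; it keeps the extra pipes.
	line := "yRjCQD|http://example.com/?ptnrid=s16|pcrid|533"
	short, long, ok := splitBeaconLine(line)
	fmt.Println(short, long, ok)
}
```

With this approach a malformed line can be skipped by the caller instead of aborting the whole search.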

Using a multiline keywords file breaks the output

When keywords.txt has a single word as its content:

code
C:\Users\REDACTED\Desktop>main.exe -k C:\Users\REDACTED\Documents\GitHub\urlhunter\keywords.txt -d 2022-01-01:2022-01-04 -o test.txt -a C:\Users\REDACTED\Documents\GitHub\urlhunter\archives

        o         Utku Sen's
         \_/\o
        ( Oo)                    \|/
        (_=-)  .===O-  ~~U~R~L~~ -O-
        /   \_/U'        hunter  /|\
        ||  |_/
        \\  |    utkusen.com
        {K ||   twitter.com/utkusen


Search starting for: 2022-01-01
[+]: Couldn't find an archive with that date.
Search starting for: 2022-01-02
[+]: urlteam_2022-01-02-11-17-02 already exists locally. Skipping download..
[+]: Searching: "code" in C:\Users\REDACTED\Documents\GitHub\urlhunter\archives\urlteam_2022-01-02-11-17-02\goo-gl\______.txt
^C

When keywords.txt has multiple lines as its content:

code
auth
token
C:\Users\REDACTED\Desktop>main.exe -k C:\Users\REDACTED\Documents\GitHub\urlhunter\keywords.txt -d 2022-01-01:2022-01-04 -o test.txt -a C:\Users\REDACTED\Documents\GitHub\urlhunter\archives

        o         Utku Sen's
         \_/\o
        ( Oo)                    \|/
        (_=-)  .===O-  ~~U~R~L~~ -O-
        /   \_/U'        hunter  /|\
        ||  |_/
        \\  |    utkusen.com
        {K ||   twitter.com/utkusen


Search starting for: 2022-01-01
[+]: Couldn't find an archive with that date.
Search starting for: 2022-01-02
[+]: urlteam_2022-01-02-11-17-02 already exists locally. Skipping download..
" in C:\Users\REDACTED\Documents\GitHub\urlhunter\archives\urlteam_2022-01-02-11-17-02\goo-gl\______.txt
^C

Return shortlink referring to a given longlink

Currently, we only search for and return the long link for a given resource. However, the archives also contain the shortlink in the same file, which uses the Beacon Link Dump format:

b9YiMs|https://www.google.com

It'd be helpful to also return the shortlink which refers to the longlink. This could be implemented by scanning the archive to find the entire line that contains a resource instead of just the specific string that was matched.

urlteam_2020-12-28-03-17-02 Archive already exists!

I downloaded this repo.
I downloaded xz-5.2.5-windows.zip and installed the files to c:\Windows\System32.
Running xz --help prints the expected help message.
The first run downloaded the archives to archives\urlteam_2020-12-28-03-17-02.
But then I get errors when running: go run main.go -date latest -keywords keywords.txt -o out.txt

Search starting for: latest
urlteam_2020-12-28-03-17-02 Archive already exists!
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
main.searchFile(0xc0001a5380, 0x37, 0xc0001adbc0, 0x14, 0xc000010118, 0x7)
        ../urlhunter/main.go:201 +0xc5c
main.getArchive(0xc000300000, 0x172e4, 0x1fe00, 0xc0000100a8, 0x6, 0xc0000100f0, 0xc, 0xc000010118, 0x7)
        ../urlhunter/main.go:194 +0xb2f
main.main()
        ../urlhunter/main.go:94 +0x387
exit status 2
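The message `index out of range [1] with length 1` means the code read element 1 of a slice that had only one element. One common way to end up there is indexing the result of a string split without checking that the separator was present; whether that is what happens at main.go:201 is an assumption, since that line is not shown. A minimal sketch of the guard, with `secondField` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// secondField returns the text after the first separator, or an error
// when the separator is missing, instead of indexing parts[1] blindly.
// Hypothetical helper for illustration; not taken from urlhunter's main.go.
func secondField(s, sep string) (string, error) {
	parts := strings.SplitN(s, sep, 2)
	if len(parts) < 2 {
		return "", fmt.Errorf("separator %q not found in %q", sep, s)
	}
	return parts[1], nil
}

func main() {
	for _, in := range []string{"b9YiMs|https://www.google.com", "no-separator-here"} {
		field, err := secondField(in, "|")
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println(field)
	}
}
```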

Containerize URLHunter

Would you be open to containerizing this service, as well as possibly publishing it to DockerHub (or GitHub Container Registry)? It would really help to automate running this service.

I did see that someone submitted this PR: #2, which you closed.

XZ executable file not found

Dear Creator,
Love this tool, and can't wait to see its full capabilities. I'm faced with the following errors, even after copying the xz files into the folders on my PATH.
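For anyone debugging the same symptom, a small Go check can confirm whether the `xz` binary is visible to a Go program at all. This is only a diagnostic sketch using the standard library's `exec.LookPath`; how urlhunter itself locates xz is not shown in this report, and `findTool` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTool reports where an executable resolves on the current PATH.
// On Windows, exec.LookPath also tries the extensions listed in PATHEXT.
func findTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findTool("xz")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using", path)
}
```

If this prints a path from a fresh terminal but the tool still fails, the shell that launches the tool may be using a different PATH than the one that was edited.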

Install via Go Get fails

go/src/golang.org/x/term/term_unix_linux.go:9:7: ioctlReadTermios redeclared in this block
        previous declaration at go/src/golang.org/x/term/term_unix_aix.go:9:26
go/src/golang.org/x/term/term_unix_linux.go:10:7: ioctlWriteTermios redeclared in this block
        previous declaration at go/src/golang.org/x/term/term_unix_aix.go:10:27

I'm using Ubuntu.
