photon's Introduction


Photon

Incredibly fast crawler designed for OSINT.

Photon Wiki • How To Use • Compatibility • Photon Library • Contribution • Roadmap

Key Features

Data Extraction

Photon can extract the following data while crawling:

  • URLs (in-scope & out-of-scope)
  • URLs with parameters (example.com/gallery.php?id=2)
  • Intel (emails, social media accounts, amazon buckets etc.)
  • Files (pdf, png, xml etc.)
  • Secret keys (auth/API keys & hashes)
  • JavaScript files & Endpoints present in them
  • Strings matching custom regex pattern
  • Subdomains & DNS related data
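Two of the extraction steps listed above can be sketched with the standard library: finding email addresses in page text and picking out URLs that carry parameters. The function names and the simplified email regex are illustrative assumptions, not Photon's own patterns.

```python
import re
from urllib.parse import urlparse

# Simplified email pattern for illustration; real address syntax is broader.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def find_emails(text):
    """Return unique email-like strings in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def urls_with_parameters(urls):
    """Keep URLs that carry a query string, e.g. example.com/gallery.php?id=2."""
    return [u for u in urls if urlparse(u).query]
```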

The extracted information is saved in an organized manner or can be exported as JSON.

Flexible

Control timeout, delay, add seeds, exclude URLs matching a regex pattern and other cool stuff. The extensive range of options provided by Photon lets you crawl the web exactly the way you want.

Genius

Photon's smart thread management & refined logic give you top-notch performance.

Still, crawling can be resource-intensive, but Photon has some tricks up its sleeve. You can fetch URLs archived by archive.org to use as seeds with the --wayback option.
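A minimal sketch of building a query for archive.org's CDX server API, which lists archived URLs for a host that could then serve as seeds. The helper name `wayback_cdx_url` is hypothetical, and this is not necessarily how Photon's --wayback option builds its request.

```python
from urllib.parse import urlencode


def wayback_cdx_url(host, limit=None):
    """Build a Wayback Machine CDX query listing archived URLs under `host`."""
    params = {
        'url': host + '/*',      # every archived path under the host
        'output': 'txt',         # plain-text rows instead of JSON
        'fl': 'original',        # return only the original URL column
        'collapse': 'urlkey',    # one row per unique URL
    }
    if limit is not None:
        params['limit'] = str(limit)
    return 'https://web.archive.org/cdx/search/cdx?' + urlencode(params)
```

Each line of the response body would then be one archived URL.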

Plugins

Docker

Photon can be launched using a lightweight Python-Alpine (103 MB) Docker image.

$ git clone https://github.com/s0md3v/Photon.git
$ cd Photon
$ docker build -t photon .
$ docker run -it --name photon photon:latest -u google.com

To view the results, either go to the local Docker volume (you can find it by running docker inspect photon) or mount the target's loot folder:

$ docker run -it --name photon -v "$PWD:/Photon/google.com" photon:latest -u google.com

Frequent & Seamless Updates

Photon is under heavy development, and updates fixing bugs, optimizing performance and adding new features are rolled out regularly.

If you would like to see the features and issues that are being worked on, you can do so on the Development project board.

Updates can be installed & checked for with the --update option. Photon has seamless update capabilities which means you can update Photon without losing any of your saved data.

Contribution & License

You can contribute in the following ways:

  • Report bugs
  • Develop plugins
  • Add more "APIs" for ninja mode
  • Give suggestions to make it better
  • Fix issues & submit a pull request

Please read the guidelines before submitting a pull request or issue.

Do you want to have a conversation in private? Hit me up on my twitter, inbox is open :)

Photon is licensed under the GPL v3.0 license.

photon's People

Contributors

0xd0ug, 0xinfection, alessaba, bberastegui, connorskees, fabaff, gil2abir, joeyliechty, marvzinc, milo2012, neutrinoguy, otherpirate, oxis, paulrosset, ropc, s0md3v, sirfoga, snehm


photon's Issues

just some ideas

Collecting /ads.txt and https:// certificate info may be useful.
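A minimal sketch of parsing a fetched /ads.txt body, assuming the IAB ads.txt record format: comma-separated advertising system domain, publisher account ID, relationship and an optional certification authority ID, with # comments and variable lines such as contact=.... `parse_ads_txt` is hypothetical; certificate details could be gathered separately, for example with ssl.get_server_certificate from the standard library.

```python
def parse_ads_txt(text):
    """Parse ads.txt content into (domain, account_id, relationship, cert_id) tuples."""
    records = []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments
        if not line or ('=' in line and ',' not in line):
            continue  # blank line or a variable such as contact=...
        fields = [field.strip() for field in line.split(',')]
        if len(fields) < 3:
            continue  # malformed record
        cert_id = fields[3] if len(fields) > 3 else None
        records.append((fields[0].lower(), fields[1], fields[2].upper(), cert_id))
    return records
```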

add newline after each output file

The output text files currently lack a trailing newline. This messes up the terminal when you cat them, and when two files are concatenated, the first line of the second file ends up on the same line as the last line of the first. Simple fix:

Add "f.write('\n')" after these two lines:
https://github.com/s0md3v/Photon/blob/master/photon.py#L494
https://github.com/s0md3v/Photon/blob/master/photon.py#L490

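import sys

# The snippet relies on a `python3` flag; define it from the running interpreter
python3 = sys.version_info[0] >= 3

def writer(datasets, dataset_names, output_dir):
    for dataset, dataset_name in zip(datasets, dataset_names):
        if dataset:
            filepath = output_dir + '/' + dataset_name + '.txt'
            if python3:
                with open(filepath, 'w+', encoding='utf8') as f:
                    f.write(str('\n'.join(dataset)))
                    f.write('\n')  # trailing newline so cat output stays aligned
            else:
                with open(filepath, 'w+') as f:
                    joined = '\n'.join(dataset)
                    f.write(str(joined.encode('utf-8')))
                    f.write('\n')  # same fix for the Python 2 branch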

thanks

Subdomains overwrite the same folder name.

Hello there, nice tool! Here's something I found.

If I enter:

python3 photon.py -u "https://aaa.example.com"
python3 photon.py -u "https://bbb.example.com"
python3 photon.py -u "https://ccc.example.com"

The results are saved in a folder named "example", but this folder is overwritten by each subdomain's results. I recommend including the full subdomain and domain name in the folder name, so the result folders look like:

aaa.example.com
bbb.example.com
ccc.example.com

This will help when testing multiple subdomains.

PS: An option to load multiple domains and subdomains, like -u "https://aaa.example.com,https://bbb.example.com,https://ccc.example.com", would be nice.
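A minimal sketch of both suggestions, assuming the results folder should simply be named after the full hostname and that -u may hold a comma-separated list. Both helper names are hypothetical, not part of Photon.

```python
from urllib.parse import urlparse


def output_dir_for(url):
    """Name the results folder after the full hostname, e.g. aaa.example.com."""
    if '://' not in url:
        url = 'http://' + url  # urlparse needs a scheme to populate the hostname
    return urlparse(url).hostname


def split_targets(raw):
    """Split a comma-separated -u value into individual, non-empty URLs."""
    return [part.strip() for part in raw.split(',') if part.strip()]
```

Each target from split_targets would then be crawled in turn, writing into its own output_dir_for folder.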

Encoding error

The app crashes and shows this error: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 17: ordinal not in range(128)"
[Screenshot, 2018-07-24 at 19:53:27]

Error while launching

Traceback (most recent call last):
File "photon.py", line 14, in <module>
from requests import get, post
ModuleNotFoundError: No module named 'requests'

error when scanning IP

Line 169 errors out if you run Photon against an IP. The easiest fix might be to just add a try/except, but there is probably a more elegant solution.

I'm pretty sure this was working before.

root@kali:/opt/Photon# python /opt/Photon/photon.py -u http://192.168.0.213:80
      ____  __          __
     / __ \/ /_  ____  / /_____  ____
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ v1.1.1

Traceback (most recent call last):
  File "/opt/Photon/photon.py", line 169, in <module>
    domain = get_fld(host, fix_protocol=True) # Extracts top level domain out of the host
  File "/usr/local/lib/python2.7/dist-packages/tld/utils.py", line 387, in get_fld
    search_private=search_private
  File "/usr/local/lib/python2.7/dist-packages/tld/utils.py", line 339, in process_url
    raise TldDomainNotFound(domain_name=domain_name)
tld.exceptions.TldDomainNotFound: Domain 192.168.0.213 didn't match any existing TLD name!
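One way to avoid the crash without a blanket try/except, sketched under the assumption that an IP target should simply use the IP itself as its "domain": detect IP hosts with the standard ipaddress module and skip the TLD lookup for them. `domain_for` is a hypothetical helper; `get_fld` can be injected for testing and otherwise comes from the tld package seen in the traceback.

```python
import ipaddress
from urllib.parse import urlparse


def host_of(url):
    """Return the hostname part of a URL, adding a scheme if it is missing."""
    if '://' not in url:
        url = 'http://' + url
    return urlparse(url).hostname  # strips the port and IPv6 brackets


def is_ip_address(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_for(url, get_fld=None):
    host = host_of(url)
    if is_ip_address(host):
        return host  # IPs have no TLD, so don't ask tld to find one
    if get_fld is None:
        from tld import get_fld
    return get_fld(host, fix_protocol=True)
```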

Option to save the redirection value instead of the request

I used python3 photon.py --url http://x.x.x.x --level 1 --only-url and I got a list of 103 internal URL.

All the URL are using the following pattern: http://x.x.x.x/?r=[redirection_token].

Having this list alone is pretty useless; what is interesting is the redirection value (for example, the one contained in the Location header after an HTTP 302 or 303 response).

There should be an option to store the redirection value instead of the raw URL when a redirection HTTP code is hit.

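This could be implemented with something like the following (Python, assuming a requests.Response obtained with allow_redirects=False, so that the 3xx response itself is returned instead of being followed):

def check_http_code_status(response, store):
    code = response.status_code
    if code == 200:
        store(response.url)  # keep the requested URL
    elif code in (301, 302, 303):
        store(response.headers.get('Location'))  # keep the redirection target
    # 404 and any other status: store nothing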
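Location headers can hold a relative path, so a stored redirection value may need resolving against the URL that was requested. A minimal sketch using urllib.parse.urljoin; the helper name is hypothetical, and 307/308 are included on the assumption that they should be treated like the other redirect codes.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status_code, headers):
    """Return the absolute redirect destination, or None if this is not a redirect."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get('Location')
    if not location:
        return None
    # A relative Location such as "/landing?id=7" is resolved against the request URL
    return urljoin(request_url, location)
```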

clear definition of -u

What are the differences among:

-u example.com
-u www.example.com
-u http://example.com
-u https://example.com
-u http://www.example.com
-u https://www.example.com

It looks like their output is all saved under the same example.com/ directory, but will the content be different in each case?
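Part of the difference shows up in how Python's urllib.parse splits each variant: without a scheme, the whole string is parsed as a path and no hostname is found, so a crawler has to add a default scheme itself, and the scheme and www prefix then decide which host is actually requested. This sketch only illustrates parsing; it does not show how Photon normalizes -u, and `describe` is a hypothetical name.

```python
from urllib.parse import urlparse

VARIANTS = ['example.com', 'www.example.com', 'http://example.com',
            'https://example.com', 'http://www.example.com', 'https://www.example.com']


def describe(target):
    """Return (scheme, hostname) as urlparse sees the raw -u value."""
    parsed = urlparse(target)
    return parsed.scheme, parsed.hostname


for target in VARIANTS:
    print(target, '->', describe(target))
```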

Non colored mode

Would it be possible to include a non colored mode?
I know it's such a strange edge case, but Pythonista on iOS doesn’t support colors in the output (yet), and it just displays this instead of the colored Photon text:

�[91m ____ __ __ / �[1;97m__�[91m \/ /_ ____ / /_____ ____ / �[1;97m/_/�[91m / __ \/ �[1;97m__�[91m \/ __/ �[1;97m__�[91m \/ __ \ / ____/ / / / �[1;97m/_/�[91m / /_/ �[1;97m/_/�[91m / / / / /_/ /_/ /_/\____/\__/\____/_/ /_/ �[1;m
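A minimal sketch of a non-colored mode, assuming the colors come from constants holding the ANSI codes visible in the output above (\033[91m, \033[1;97m, \033[1;m). It disables colors when the NO_COLOR environment variable is set or stdout is not a terminal; because Pythonista may still report a terminal, an explicit command-line switch (hypothetical here) would be the reliable option. `palette` and `colors_wanted` are hypothetical names.

```python
import os
import sys


def palette(enabled):
    """Return ANSI color codes, or empty strings when colors are disabled."""
    if not enabled:
        return {'red': '', 'white': '', 'end': ''}
    return {'red': '\033[91m', 'white': '\033[1;97m', 'end': '\033[1;m'}


def colors_wanted(stream=sys.stdout, environ=os.environ):
    # Respect the NO_COLOR convention, and skip colors when output is not a terminal
    if 'NO_COLOR' in environ:
        return False
    return hasattr(stream, 'isatty') and stream.isatty()


c = palette(colors_wanted())
print(c['red'] + 'Photon' + c['end'])
```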

requests.exceptions.SSLError

Traceback (most recent call last):
File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 849, in _validate_conn
conn.connect()
File "C:\Python37\lib\site-packages\urllib3\connection.py", line 356, in connect
ssl_context=context)
File "C:\Python37\lib\site-packages\urllib3\util\ssl_.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Python37\lib\ssl.py", line 412, in wrap_socket
session=session
File "C:\Python37\lib\ssl.py", line 850, in _create
self.do_handshake()
File "C:\Python37\lib\ssl.py", line 1108, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Python37\lib\site-packages\requests\adapters.py", line 445, in send
timeout=timeout
File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Python37\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.xxxx.com', port=443): Max retries exceeded with url: /robots.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "photon.py", line 413, in <module>
zap(main_url)
File "photon.py", line 235, in zap
response = get(url + '/robots.txt').text # makes request to robots.txt
File "C:\Python37\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\Python37\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python37\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python37\lib\site-packages\requests\sessions.py", line 644, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python37\lib\site-packages\requests\sessions.py", line 644, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python37\lib\site-packages\requests\sessions.py", line 222, in resolve_redirects
**adapter_kwargs
File "C:\Python37\lib\site-packages\requests\sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "C:\Python37\lib\site-packages\requests\adapters.py", line 511, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.xxxx.com', port=443): Max retries exceeded with url: /robots.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)')))

deleting output directory is dangerous

Right now, if the user specifies -o /root, won't the program recursively delete /root because it already exists (line 640)?

That is why, in my pull request, I changed w+ to w and only modified the files you wrote in the first place.

I suggest removing the code that recursively deletes the directory structure, and instead just overwriting your own files if the tool is run twice.
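A minimal sketch of the suggested behavior, assuming results go to one <name>.txt file per dataset: create the directory if needed and overwrite only the files being written, never deleting the directory. `write_results` is a hypothetical helper.

```python
import os


def write_results(output_dir, datasets):
    """Write each dataset to <output_dir>/<name>.txt without deleting anything else.

    `datasets` maps a file stem (e.g. 'links') to an iterable of strings.
    """
    os.makedirs(output_dir, exist_ok=True)  # create if missing; never rmtree
    written = []
    for name, lines in datasets.items():
        path = os.path.join(output_dir, name + '.txt')
        with open(path, 'w', encoding='utf-8') as f:  # 'w' truncates only this file
            for line in lines:
                f.write(line + '\n')
        written.append(path)
    return written
```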

A couple of minor issues

SSL issues

You're ignoring SSL verification:

/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)

You can fix this by adding the following to the top of the file (or wherever you feel like putting it):

import warnings
warnings.filterwarnings("ignore")

This is bad practice, for obvious reasons; however, it makes sense why you're doing it. The above is the quickest and simplest way to solve the problem and keep the warnings from being annoying.

An example with and without:

Without:
[screenshot of the output without the filter]

With:
[screenshot of the output with the filter]
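A narrower alternative to ignoring every warning: filter only messages that start with "Unverified HTTPS request", the text shown in the warning above. (urllib3 also provides urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) for the same purpose.) The function name is hypothetical.

```python
import warnings


def silence_insecure_request_warnings():
    """Ignore only the 'Unverified HTTPS request' warnings, not every warning."""
    # `message` is a regular expression matched against the start of the warning text
    warnings.filterwarnings('ignore', message=r'Unverified HTTPS request')
```

Other warnings, such as deprecation notices, keep showing up, which is the main advantage over filterwarnings("ignore").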


Dammit Unicode

If I supply a URL that is in, let's say, Russian and try to extract all the data:

      ____  __          __
     / __ \/ /_  ____  / /_____  ____
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ 

 URLs retrieved from robots.txt: 5
 Level 1: 6 URLs
 Progress: 6/6
 Level 2: 35 URLs
 Progress: 35/35
 Level 3: 7 URLs
 Progress: 7/7
 Crawling 7 JavaScript files
 Progress: 7/7
Traceback (most recent call last):
  File "photon.py", line 429, in <module>
    f.write(x + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 7: ordinal not in range(128)

The easiest solution would be to skip such characters and continue, with a warning to the user that they were ignored.
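The traceback comes from writing a non-ASCII character (U+200B, a zero-width space) through an ASCII codec under Python 2. A sketch, assuming the lines are text (unicode) strings, that writes every line as UTF-8 through io.open, which accepts an encoding on both Python 2 and 3, and follows the suggestion of skipping, with a warning, anything that still cannot be encoded. `write_lines_utf8` is a hypothetical name.

```python
import io
import warnings


def write_lines_utf8(path, lines):
    """Write lines as UTF-8, skipping (with a warning) any line that cannot be encoded."""
    skipped = 0
    with io.open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            try:
                line.encode('utf-8')  # only fails for oddities such as lone surrogates
            except UnicodeEncodeError:
                skipped += 1
                continue
            f.write(line + u'\n')
    if skipped:
        warnings.warn('%d line(s) could not be encoded and were skipped' % skipped)
    return skipped
```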

Export as csv

Hey great tool!

I was wondering if you could add an export-to-CSV option?
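A minimal sketch of such an export, assuming the results are held as a mapping from category (links, files, intel and so on) to a list of strings. The csv module takes care of quoting values that contain commas. `datasets_to_csv` is a hypothetical name.

```python
import csv
import io


def datasets_to_csv(datasets):
    """Flatten {category: [values]} into CSV text with a category,value header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['category', 'value'])
    for category, values in datasets.items():
        for value in values:
            writer.writerow([category, value])
    return buffer.getvalue()
```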

Lines incorrectly stored in text file.

OS: macOS High Sierra
Python version: Python 3.6.6

So, I tried photon-ing twitter and the output format wasn't what I expected. Or, did I miss something?

Actual output (the \n is not processed):
[Screenshot, 2018-07-29 at 12:02:57 AM]

Expected output:
[Screenshot, 2018-07-28 at 11:47:02 PM]
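One plausible cause, assumed here rather than confirmed by the report: under Python 3, calling str() on a bytes object returns its repr, so each newline becomes the two literal characters backslash and n. A writer branch that does str(joined.encode('utf-8')), like the Python 2 branch of the writer shown earlier, would produce exactly that output if it ran under Python 3.

```python
lines = ['https://twitter.com/a', 'https://twitter.com/b']
joined = '\n'.join(lines)

# str() of bytes gives the repr: "b'https://twitter.com/a\\nhttps://twitter.com/b'"
wrong = str(joined.encode('utf-8'))

# Writing the text itself (or decoding the bytes back) keeps real newlines
right = joined.encode('utf-8').decode('utf-8')
```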

How to Save Results

Dear Team,

The tool is awesome, but when I tried running your command like this:

Python photon.py http://site.com --delay=1.5

I got result like this one:

      ____  __          __            
     / __ \/ /_  ____  / /_____  ____ 
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ 

[!] Links to crawl: 108
[!] Time required: ~5 minutes
[+] Total URLs found: 780
[+] Fuzzable URLs found: 149
[+] JavaScript files found: 126
[~] Scanning JavaScript files for endpoints
[!] Time required: ~1 minute
[+] Enpoints found: 346
mkdir: missing operand
Try 'mkdir --help' for more information.
[+] Results saved in  directory

I don't know where the results are being saved.

Can you please update the README file?

Thanks.

setup.py

As already mentioned in #98, the setup.py file is missing from the repo.

If one is added, I would suggest including install_requires and entry_points.

With entry_points, one could launch Photon with photon without caring about the interpreter.

It looks like this would require some changes in photon.py, at least for accessing user-agents.txt.

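import os

# Resolve the path relative to this file instead of the current working directory,
# so user-agents.txt is still found when Photon is launched from elsewhere.
here = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(here, 'core', 'user-agents.txt'), 'r') as uas:
    user_agents = [agent.strip('\n') for agent in uas]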
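A minimal setup.py sketch of the suggestion. Everything in it is an assumption: the package layout, the dependency list (requests and tld, the libraries that appear in the tracebacks in these issues) and the photon:main entry point, which would require photon.py to expose a main() function.

```python
# setup.py (sketch); names, dependencies and the entry point are assumptions
from setuptools import setup

setup(
    name='photon',
    py_modules=['photon'],
    packages=['core'],                          # assumes core/ contains __init__.py
    package_data={'core': ['user-agents.txt']},  # ship the user-agent list
    install_requires=['requests', 'tld'],
    entry_points={
        # hypothetical: installs a `photon` command that calls photon.main()
        'console_scripts': ['photon=photon:main'],
    },
)
```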

Exception thrown while using photon as module

OS: Arch Linux
Python Version: 3.7

Python Script:

from photon import crawl

data = crawl("http://stackoverflow.com", timeout=5)
print(data)

Exception Thrown

Exception in thread Thread-38:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1049, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 908, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='stackoverflow.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 184, in extractor
    response = requester(url, delay, domain_name, user_agents, cookie, timeout) # make request to the url
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 56, in requester
    response = get(url, cookies=cookie, headers=headers, verify=False, timeout=timeout, stream=True)
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 526, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='stackoverflow.com', port=443): Read timed out. (read timeout=5)

Exception in thread Thread-43:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1049, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 908, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='stackoverflow.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 184, in extractor
    response = requester(url, delay, domain_name, user_agents, cookie, timeout) # make request to the url
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 56, in requester
    response = get(url, cookies=cookie, headers=headers, verify=False, timeout=timeout, stream=True)
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 526, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='stackoverflow.com', port=443): Read timed out. (read timeout=5)

^CTraceback (most recent call last):
  File "a.py", line 3, in <module>
    data = crawl("https://stackoverflow.com", timeout=5)
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 296, in crawl
    flash(extractor, links, threads, delay, domain_name, user_agents, cookie, timeout, regex, keys, only_urls, main_url)
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 255, in flash
    threader(function, delay, threads, domain_name, user_agents, cookie, timeout, regex, keys, only_urls, main_url, splitted)
  File "/home/$USER/.local/lib/python3.7/site-packages/photon/photon.py", line 242, in threader
    thread.join()
  File "/usr/lib/python3.7/threading.py", line 1032, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/usr/lib/python3.7/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 1273, in _shutdown
    t.join()
  File "/usr/lib/python3.7/threading.py", line 1032, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Unneeded f.close() instructions

That's a small one I've spotted browsing your code...

Considering the following block of code:

with open('%s/links.txt' % name, 'w+') as f:
    for x in storage:
        f.write(x + '\n')
f.close()

The f.close() instruction is redundant: when you use the with statement, the file is closed automatically as soon as the block exits, as shown in the Python documentation: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files

>>> with open('workfile') as f:
...     read_data = f.read()
>>> f.closed
True

Multithreading question

The approach of splitting the list is a good method as explained in your README, but wouldn't a Queue accomplish the same thing without all the list splitting stuff? You'd still only access unique items across workers.

Just wondering if there's a reason you didn't do this, as I was going to put in a PR for this exact thing but won't bother if you avoided it deliberately.

Thanks!
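A minimal sketch of the queue.Queue approach the question describes, assuming each worker receives a `fetch` callable (hypothetical) that requests one URL. Every URL is taken from the queue exactly once, so no list splitting is needed, and a failing request is recorded instead of killing its thread. `crawl_with_queue` is likewise a hypothetical name.

```python
import queue
import threading


def crawl_with_queue(urls, fetch, num_threads=4):
    """Run fetch(url) for every URL across worker threads; return {url: result}."""
    work = queue.Queue()
    for url in urls:
        work.put(url)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = work.get_nowait()  # each item is handed to exactly one worker
            except queue.Empty:
                return  # nothing left to do
            try:
                value = fetch(url)
            except Exception as exc:  # keep one failure from ending the thread
                value = exc
            with lock:
                results[url] = value

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```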

High False Positives

So I was just crawling my own website and noticed a high number of false positives in the results saved in links.txt.

Contents of links.txt from running:

$ python photon.py -u https://site.com -t 4
https://site.com/Facebook_Forum
https://site.com/about us
https://site.com/EmailAccount
https://site.com/YouTube channel
https://site.com/Facebook Forum
https://site.com
https://site.com/prev
https://site.com/GitHub
https://site.com/Youtube Channel
https://site.com/error_footer
https://site.com/
https://site.com/YouTube_Channel
https://site.com/Facebook_page
https://site.com/cdn-cgi/l/email-protection
https://site.com/next
https://site.com/Projects
https://site.com/facebook page
https://site.com/index.html
https://site.com/shit
https://site.com/contact
https://site.com/YouTube_channel

About 95% of these were non-existent URLs.

Is there any way to fix this, or an explanation of why it is happening?
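Several of the listed entries contain spaces ("about us", "YouTube channel"), which suggests link text being picked up as relative paths; that is an assumption, not a confirmed diagnosis. A minimal heuristic sketch that sets such entries aside; the remaining ones, like /Facebook_Forum, would still need an HTTP status check to be confirmed. `split_suspicious` is a hypothetical name.

```python
from urllib.parse import urlparse


def split_suspicious(links):
    """Separate links whose path contains whitespace (likely scraped text, not hrefs)."""
    keep, suspicious = [], []
    for link in links:
        path = urlparse(link).path
        if any(ch.isspace() for ch in path):
            suspicious.append(link)
        else:
            keep.append(link)
    return keep, suspicious
```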

TypeError: request() got an unexpected keyword argument 'stream'

py2 photon.py -u  http://xxxx/login.jsp --wayback
      ____  __          __
     / __ \/ /_  ____  / /_____  ____
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ v1.1.5

[~] Fetching URLs from archive.org
[+] Retrieved -1 URLs from archive.org
[~] Level 1: 1 URLs
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "photon.py", line 435, in extractor
    response = requester(url) # make request to the url
  File "photon.py", line 285, in requester
    return normal(url)
  File "photon.py", line 239, in normal
    response = get(url, cookies=cook, headers=finalHeaders, verify=False, timeout=timeout, stream=True)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 65, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/safe_mode.py", line 39, in wrapped
    return function(method, url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 51, in request
    return session.request(method=method, url=url, **kwargs)
TypeError: request() got an unexpected keyword argument 'stream'

Option for custom user agent string

The list of user agents can easily be edited in the source code, but a command-line option to specify one particular user agent string would be nice.
I use Photon on my own server and want to remove Photon's requests from my log files.
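A minimal argparse sketch of such an option. The --user-agent flag and both helper names are hypothetical; the sketch falls back to a random entry from the existing list when no user agent is given.

```python
import argparse
import random


def build_parser():
    parser = argparse.ArgumentParser(prog='photon.py')
    parser.add_argument('-u', '--url', help='root url')
    # Hypothetical option requested in this issue
    parser.add_argument('--user-agent', dest='user_agent',
                        help='send this User-Agent on every request')
    return parser


def pick_user_agent(args, user_agents):
    """Use the fixed user agent if one was given, otherwise a random one from the list."""
    return args.user_agent or random.choice(user_agents)
```

With a fixed value such as MyScanner/1.0, the crawler's requests are easy to grep out of a server log.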

A Docker img ?

Hi,

Would you like a Docker image for your project? I can do the PR if so.

Regards,
Stephen

get_fld

Hi
After using the --update option, I get this error:

Traceback (most recent call last):
File "photon.py", line 187, in <module>
domain = topLevel(main_url)
File "photon.py", line 182, in topLevel
toplevel = tld.get_fld(host, fix_protocol=True)
AttributeError: 'module' object has no attribute 'get_fld'

Any clue?
I also installed it on another system and got the same error.
Changing the target: same.
Deleting it completely and cloning again: same.

OS Kali3
Thanks

[Multiple Suggestions]

Okay so I was going through the Photon library and I found some areas of improvements.

  • Let Photon be just a powerful crawler. Adding an option like --plugin <name of plugin> / --plugin=<name of plugin> is a good idea. Why?
    • It will make it easier for users to run custom plugins which they might have developed on their own.
    • Think how bloated it would get if every plugin added its own custom arguments: with 100 plugins, would you serve 100 arguments, one for each plugin? Not a good idea, is it?
    • Make it something similar to the Nmap Scripting Engine. Your basic work is extensive crawling, but custom plugins help it to extend features.
  • Add guidelines for custom plugin development, so plugins can be used with Photon without any changes to the main code.
  • Include exporter as a part of main build, not as a plugin. Plugins will be strictly restricted to extending features and enrichment of capabilities only.
  • Change Photon's description to a more suitable one: "Reconnaissance" is way too broad for Photon given the plugins currently available.
  • Point the build badge URL to Travis CI, where the build is being maintained (it currently points to the commit history).
  • Add a verbose mode -v/--verbose for Photon. Sometimes the small output about what it has crawled isn't enough. I have already implemented it!

Error while using a cookie

argv: python photon.py -u https://.... -c PHPSESSID=.... -t 150 -l 10
Full output:

      ____  __          __
     / __ \/ /_  ____  / /_____  ____
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ v1.1.4

 Level 1: 1 URLs
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "photon.py", line 408, in extractor
    response = requester(url) # make request to the url
  File "photon.py", line 258, in requester
    return normal(url)
  File "photon.py", line 212, in normal
    response = get(url, cookies=cook, headers=headers, verify=False, timeout=timeout, stream=True)
  File "/Users/admin/bin/tools/Photon/venv3/lib/python2.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/admin/bin/tools/Photon/venv3/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/admin/bin/tools/Photon/venv3/lib/python2.7/site-packages/requests/sessions.py", line 498, in request
    prep = self.prepare_request(req)
  File "/Users/admin/bin/tools/Photon/venv3/lib/python2.7/site-packages/requests/sessions.py", line 419, in prepare_request
    cookies = cookiejar_from_dict(cookies)
  File "/Users/admin/bin/tools/Photon/venv3/lib/python2.7/site-packages/requests/cookies.py", line 522, in cookiejar_from_dict
    cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: string indices must be integers, not str

 Progress: 1/1
 Crawling 0 JavaScript files

--------------------------------------------------
 Internal: 1
--------------------------------------------------
 Total requests made: 1
 Total time taken: 0 minutes 0 seconds
 Requests per second: 1
 Results saved in ... directory

No matter which arguments I drop, be it -l or -t, it still does the same thing.
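The last frames show requests' cookiejar_from_dict() evaluating cookie_dict[name], and "string indices must be integers" means that value was a str. This suggests the -c value reached get() as the raw "PHPSESSID=..." string instead of a dict. Assuming that is the cause, here is a minimal sketch of a helper that converts the string first; parse_cookie_string is a hypothetical name, not part of Photon:

```python
def parse_cookie_string(raw):
    """Turn 'name=value; name2=value2' into a dict that requests accepts."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty chunks such as a trailing ';'
        # partition splits on the first '=' only, so '=' inside a value survives
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies
```

In normal() it would then be called as get(url, cookies=parse_cookie_string(cook), ...).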

Python is not well suited to multithreading (because of the GIL); I think you should use multiprocessing.

This function starts a separate thread for each URL, running the given function on it:

import threading

def threader(function, *urls):
    """Start one thread per URL, run function(url) in each, and wait for all of them."""
    threads = []  # list of threads
    urls = urls[0]  # urls arrives as a tuple wrapping the real list
    for url in urls:
        task = threading.Thread(target=function, args=(url,))
        threads.append(task)
    # start threads
    for thread in threads:
        thread.start()
    # wait for all threads to complete their work
    for thread in threads:
        thread.join()
    # clear the list so the Thread objects can be released
    del threads[:]
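The commenter suggests multiprocessing. Note that CPython releases the GIL while a thread blocks on network I/O, so threads can still overlap HTTP requests. A more immediate issue with threader is that it starts one thread per URL with no upper limit. Below is a minimal sketch of a bounded alternative using the standard library's concurrent.futures. It assumes the caller passes the URL list directly (a change from the *urls signature), and max_workers=10 is an arbitrary default:

```python
from concurrent.futures import ThreadPoolExecutor


def threader(function, urls, max_workers=10):
    """Run function(url) for every URL with at most max_workers threads alive."""
    # Leaving the with-block waits for every submitted task,
    # replacing the manual start()/join() loops.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the results in input order; an exception raised
        # inside a worker is re-raised here
        return list(pool.map(function, urls))
```

ProcessPoolExecutor offers the same map() interface if separate processes are preferred. In that case, the function and its arguments must be picklable, so a lambda or closure will not work.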

DNS Plaintext Output

Request for DNS plaintext output:

I was wondering whether Photon could be enhanced to output the DNS-enumerated hostnames as plain text to accompany the beautiful image it produces. It's great for subdomain mapping, but typing them out by hand is a pain...

Great product!
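Here is a minimal sketch of the requested plaintext export. It assumes the enumerated hostnames are already available as a list of strings; format_subdomains and save_subdomains are hypothetical names. Hostnames are lower-cased and de-duplicated so the file can be piped straight into other tools:

```python
def format_subdomains(subdomains):
    """Return unique hostnames as plain text, one per line, sorted."""
    hosts = sorted({h.strip().lower() for h in subdomains if h.strip()})
    return "".join(host + "\n" for host in hosts)


def save_subdomains(subdomains, path):
    # One hostname per line, readable by grep, xargs and similar tools
    with open(path, "w") as handle:
        handle.write(format_subdomains(subdomains))
```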

Add directory busting

An option to supply a dictionary of strings to brute force at each level of the directory tree would be great.

It's not uncommon to see juicy dirs that are not linked by the rest of the app.
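Here is one way the suggested option could generate candidates, sketched under a few assumptions. The wordlist is a plain list of strings. A final path segment containing a dot is treated as a file rather than a directory, a heuristic that misfires on directories such as v1.2. Sending the requests and deciding what counts as a hit are left out. candidate_urls is a hypothetical name:

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url, words):
    """Yield one URL per (directory level, word) pair along url's path."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Heuristic: a last segment with a dot is assumed to be a file, not a dir
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    seen = set()
    # depth 0 is the site root, then each deeper directory in turn
    for depth in range(len(segments) + 1):
        base = "/" + "/".join(segments[:depth])
        for word in words:
            path = base.rstrip("/") + "/" + word.strip("/")
            candidate = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
            if candidate not in seen:  # skip duplicates from repeated words
                seen.add(candidate)
                yield candidate
```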

Release

Hmm, PyPI tells me that the latest release is 1.1.9. The changelog says it's 1.1.4 (I assume the changelog wasn't updated), and the latest release on GitHub is 1.1.5.

Also, it seems that the setup.py file is missing.

Usually, I don't care where the package source comes from (the Fedora Packaging Guidelines don't say anything about that), but it would be nice if it didn't change too often.

Spelling

The -c flag is spelled COOK instead of COOKIE in the usage text.

Nothing else is abbreviated so it would seem that this is a typo.


Character handling

Traceback (most recent call last):
  File "photon.py", line 382, in
    f.write(x + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 137: ordinal not in range(128)

Fix:
change f.write(x + '\n')
to f.write(x.encode('utf-8') + '\n')
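The u'\u2019' in the error shows this ran on Python 2, where the suggested fix works. On Python 3, x.encode('utf-8') returns bytes, and adding '\n' to it raises a TypeError. A sketch that works on both versions is to open the output file with an explicit encoding. This assumes the collected items are text (unicode) strings; save_lines is a hypothetical name:

```python
import io


def save_lines(lines, path):
    """Write each item on its own line, encoded as UTF-8."""
    # io.open accepts encoding= on both Python 2 and 3; on Python 2 its
    # text mode expects unicode objects, which is assumed for these items
    with io.open(path, "w", encoding="utf-8") as f:
        for x in lines:
            f.write(x + u"\n")
```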

RuntimeError: Set changed size during iteration in Photon 1.0.7

[!] Progress: 1/1
[~] Level 2: 387 URLs
[!] Progress: 387/387
[~] Level 3: 18078 URLs
[!] Progress: 18078/18078
[~] Level 4: 90143 URLs
[!] Progress: 39750/90143^C
[~] Crawling 0 JavaScript files

Traceback (most recent call last):
  File "photon.py", line 454, in <module>
    for url in external:
RuntimeError: Set changed size during iteration
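The traceback points at a loop over external. Suppose crawler threads were still adding to that set when the run was interrupted with Ctrl-C (the ^C above). Then the set grows while the main thread iterates over it, which is exactly what raises this RuntimeError. Assuming that is the cause, here is a minimal sketch of the usual remedy: iterate over a copy taken under a lock. The lock and helper names are hypothetical:

```python
import threading

external = set()
external_lock = threading.Lock()  # hypothetical lock shared with worker threads


def add_external(url):
    # Workers mutate the set only while holding the lock
    with external_lock:
        external.add(url)


def snapshot_external():
    # Copy under the lock; the caller then loops over the copy, so later
    # additions by still-running threads cannot resize what it iterates
    with external_lock:
        return sorted(external)


add_external("https://cdn.example.net/a.js")
add_external("https://other.example.org/")
for url in snapshot_external():
    print(url)
```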

Generating invalid external links

if link.startswith(main_url):

If a website uses protocol-relative URLs (//example.com/foo), Photon mistakenly detects internal links as external. To fix this, you can compare links by netloc, for example:
change:

if link.startswith(main_url):

to:

if urlparse(link).netloc == urlparse(main_url).netloc:

Also consider that the same thing may happen with the www subdomain (e.g. www.example.com vs example.com).
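Here is a sketch combining both suggestions, assuming Python 3's urllib.parse. Links are first resolved against main_url with urljoin, so relative and protocol-relative forms become absolute. The hostnames are then compared with a leading "www." stripped. Comparing hostname rather than netloc also ignores ports, which may or may not be wanted; is_internal is a hypothetical name:

```python
from urllib.parse import urljoin, urlparse


def _host(url):
    # hostname is lower-cased and excludes the port; drop a leading "www."
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_internal(link, main_url):
    """True if link points at the same site as main_url."""
    # urljoin turns "/foo" and "//example.com/foo" into absolute URLs
    absolute = urljoin(main_url, link)
    return _host(absolute) == _host(main_url)
```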
