soxoj / maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

Home Page: https://t.me/osint_maigret_bot

License: MIT License

Python 65.07% Dockerfile 0.14% Smarty 3.69% CSS 0.18% HTML 28.98% Makefile 0.30% Jupyter Notebook 0.40% Shell 0.01% Batchfile 1.23%
osint social-network identification parsing detective socmint dossier username-checker nickname username-search

maigret's Issues

Error for kinogo.by

Hello,

I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/connector.py", line 999, in _create_direct_connection
    hosts = await asyncio.shield(host_resolved)
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/connector.py", line 865, in _resolve_host
    addrs = await self._resolver.resolve(host, port, family=self._family)
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/resolver.py", line 36, in resolve
    flags=socket.AI_ADDRCONFIG,
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.14-py3.6.egg/maigret/checking.py", line 49, in get_response
    response = await request_future
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/client.py", line 521, in _request
    req, traces=traces, timeout=real_timeout
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/connector.py", line 535, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/connector.py", line 892, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/usr/local/lib/python3.6/dist-packages/aiohttp-3.7.4-py3.6-linux-x86_64.egg/aiohttp/connector.py", line 1011, in _create_direct_connection
    raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host kinogo.by:443 ssl:default [Name or service not known]

I'm going to add this to the error handling: https://github.com/soxoj/maigret/blob/main/maigret/checking.py#L61
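
As a rough illustration of the kind of handling mentioned above, the sketch below walks the exception chain (`raise ... from ...`, as seen in the traceback) and reports a DNS failure separately from other connection errors. It uses only the standard library; `FakeConnectorError`, `describe_connection_error` and `simulate_failure` are hypothetical names for this example, not part of Maigret or aiohttp.

```python
import socket


def describe_connection_error(exc: BaseException) -> str:
    """Return a short, user-facing label for a connection failure."""
    # Walk the explicit exception chain looking for a DNS resolution
    # failure such as the socket.gaierror in the traceback above.
    current = exc
    seen = set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, socket.gaierror):
            return "DNS resolution failure"
        current = current.__cause__
    return "Connecting failure"


class FakeConnectorError(Exception):
    """Stand-in for a connector error class, for illustration only."""


def simulate_failure() -> str:
    """Reproduce the chained-exception shape and classify it."""
    try:
        try:
            raise socket.gaierror(-2, "Name or service not known")
        except OSError as dns_error:
            raise FakeConnectorError("Cannot connect to host") from dns_error
    except FakeConnectorError as error:
        return describe_connection_error(error)
```

With a label like this, an unreachable site could be reported as a per-site error instead of printing a full traceback.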

sites.md missing

I see that you deleted the sites.md file that lists all the supported sites, but the README still contains the line "Currently supported more than 1500 sites (full list)." with this link:
https://github.com/soxoj/maigret/blob/main/sites.md, which leads nowhere.

Proxy policy

Implement proxy usage logic with the ability to specify a policy for each site separately:

  • retry (retry failed requests through a proxy)
  • censorship (retry requests through a proxy if a censorship error was received)
  • always (always make requests through a proxy)
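
One possible reading of the three policies, as a minimal sketch. The `ProxyPolicy` enum, the `should_use_proxy` function, and the `"censorship"` error label are assumptions made for this example; nothing here reflects an existing Maigret API.

```python
from enum import Enum
from typing import Optional


class ProxyPolicy(Enum):
    RETRY = "retry"            # retry failed requests through the proxy
    CENSORSHIP = "censorship"  # retry through the proxy only after a censorship error
    ALWAYS = "always"          # send every request through the proxy


def should_use_proxy(
    policy: Optional[ProxyPolicy], attempt: int, last_error: Optional[str]
) -> bool:
    """Decide whether the given attempt (0 = first try) goes through the proxy."""
    if policy is None:
        return False
    if policy is ProxyPolicy.ALWAYS:
        return True
    if attempt == 0:
        return False  # the first request is direct for both retry policies
    if policy is ProxyPolicy.RETRY:
        return True
    # CENSORSHIP: only retry via the proxy when the previous error was censorship.
    return last_error == "censorship"
```

Keeping the decision in one pure function would make per-site policies easy to test without any network access.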

Publish the false positive fix as 0.3.1

Maigret currently generates false positives for non-existent usernames on Reddit and Facebook.

maigret asuidiausdiasndIOijasda --site Reddit
[-] Starting a search on top 3 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username asuidiausdiasndIOijasda on:
[+] Reddit: https://www.reddit.com/user/asuidiausdiasndIOijasda
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.98it/s]
[*] Short text report:
Search by username asuidiausdiasndIOijasda returned 1 accounts.
Extended info extracted from 0 accounts.
Interests (tags): discussion, news

new async exception

Hello o/

I now get this new exception with a fresh install of maigret:

[-] Starting a search on top 500 sites from the Maigret database...
[!] You can run search by full list of sites with flag `-a`
[*] Checking username someuser on:
Traceback (most recent call last):
  File "/usr/local/bin/maigret", line 11, in <module>
    load_entry_point('maigret==0.1.15', 'console_scripts', 'maigret')()
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/maigret.py", line 423, in run
    loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/maigret.py", line 355, in main
    max_connections=args.connections,
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/checking.py", line 559, in maigret
    results = await executor.run(coroutines)
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/checking.py", line 54, in run
    results = await self._run(tasks)
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/checking.py", line 123, in _run
    for _ in range(self.workers_count)]
  File "/usr/local/lib/python3.6/dist-packages/maigret-0.1.15-py3.6.egg/maigret/checking.py", line 123, in <listcomp>
    for _ in range(self.workers_count)]
AttributeError: module 'asyncio' has no attribute 'create_task'
[base_events.py:1285] ERROR  13:13:40 Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f2f9e880f28>
sys:1: RuntimeWarning: coroutine 'ClientSession._request' was never awaited
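
For context: `asyncio.create_task()` was added in Python 3.7, and the traceback above comes from Python 3.6. A minimal compatibility sketch, assuming Python 3.6 support is wanted, falls back to `asyncio.ensure_future()`. The `worker`, `run_workers` and `main` functions are illustrative only and are not Maigret code.

```python
import asyncio

# asyncio.create_task() exists only on Python 3.7+; on 3.6 fall back to
# asyncio.ensure_future(), which also schedules a coroutine as a Task.
create_task = getattr(asyncio, "create_task", asyncio.ensure_future)


async def worker(number: int) -> int:
    await asyncio.sleep(0)
    return number * 2


async def run_workers(count: int) -> list:
    tasks = [create_task(worker(n)) for n in range(count)]
    return await asyncio.gather(*tasks)


def main() -> list:
    # Explicit loop management works on both Python 3.6 and newer versions.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(run_workers(3))
    finally:
        loop.close()
```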

Updating tokens for scraping sites for win32 binary

Checklist

  • I'm reporting a bug in Maigret functionality
  • I've checked for similar bug reports including closed ones
  • I've checked for pull requests that attempt to fix this bug

Description

Info about the Maigret version you are running and your environment (--version, operating system, ISP provider):
win32 package

How to reproduce this bug (commandline options / conditions):
--site Twitter --print-errors several times

Tokens are not updated because of how the binary package works (the JSON file with sites is updated only in a temporary directory).

xhtml2pdf error with Win32 standalone binary

Crashes while generating a PDF report

Traceback (most recent call last):
  File "maigret_standalone.py", line 7, in <module>
  File "asyncio\runners.py", line 43, in run
  File "asyncio\base_events.py", line 579, in run_until_complete
  File "maigret\maigret.py", line 696, in main
  File "maigret\report.py", line 79, in save_pdf_report
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "xhtml2pdf\pisa.py", line 18, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "xhtml2pdf\default.py", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "xhtml2pdf\util.py", line 22, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "arabic_reshaper\__init__.py", line 3, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "arabic_reshaper\arabic_reshaper.py", line 242, in <module>
  File "arabic_reshaper\arabic_reshaper.py", line 64, in __init__
  File "arabic_reshaper\reshaper_config.py", line 44, in auto_config
Exception: Default configuration file C:\Users\User\AppData\Local\Temp\_MEI2216
\arabic_reshaper\default-config.ini not found, check the module installation.
[8872] Failed to execute script 'maigret_standalone' due to unhandled exception

Add support of mirrors and similar links

Often there are different links to the same profile with slightly changed URLs, e.g.:

https://flickr.com/people/blue
https://flickr.com/photos/blue

Also, there are mirrors of a site that copy its information on the fly. They are useless as separate sources, but they should be marked as known, e.g.:

https://vk.com/ivanov24
https://vk-look.com/user/ivanov24
https://findmerr.com/user/ivanov24

The proposal is to add a mirrors section to the site info and to take no action for these URLs except checking them during backward search of known sites (through Marple, for example).
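
A minimal sketch of how such a section might look and how a found URL could be mapped back to a known site. The `SITES` structure, the `"mirrors"` key and `find_known_site` are hypothetical and only illustrate the proposal; the real site database format may differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical site entries; the "mirrors" key is the proposed addition.
# Alternative profile links (as with Flickr) are listed the same way here.
SITES = {
    "VK": {
        "url": ["https://vk.com/{username}"],
        "mirrors": [
            "https://vk-look.com/user/{username}",
            "https://findmerr.com/user/{username}",
        ],
    },
    "Flickr": {
        "url": ["https://flickr.com/people/{username}"],
        "mirrors": ["https://flickr.com/photos/{username}"],
    },
}


def find_known_site(url: str) -> Optional[str]:
    """Return the name of the site whose main URL or mirror matches `url`."""
    parsed = urlparse(url)
    target = parsed.netloc.lower() + parsed.path
    for name, info in SITES.items():
        for template in info["url"] + info["mirrors"]:
            # Compare host and path up to the username placeholder.
            prefix = urlparse(template.split("{username}")[0])
            if target.startswith(prefix.netloc + prefix.path):
                return name
    return None
```

Mirror URLs are never requested in this sketch; they are only used to recognise links that belong to an already known site.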

Web interface

I am willing to write the front-end code for the application in order to create a website for easier use of the software. Is this something that interests you?

KeyError: 'followerCount'

This does not happen in all of my searches, but one in particular, which I will redact, dies while searching eBay.
[+] Ebay: https://www.ebay.com/usr/fluffybutt
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "maigret/__main__.py", line 14, in <module>
    maigret.main()
  File "maigret/maigret.py", line 752, in main
    debug=args.debug)
  File "maigret/maigret.py", line 404, in sherlock
    extracted_ids_data = extract(r.text)
  File "/home/bob/.local/lib/python3.6/site-packages/socid_extractor/main.py", line 662, in extract
    value = get_field(json_data)
  File "/home/bob/.local/lib/python3.6/site-packages/socid_extractor/main.py", line 565, in <lambda>
    'follower_count': lambda x: x['person-contacts-count-models'][0]['followerCount'],
KeyError: 'followerCount'

Any suggestions on how to deal with this?
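
One defensive pattern for this kind of failure is to treat a missing key or index as "no value" instead of letting the exception escape. The sketch below is a generic helper under that assumption; `safe_get` is a hypothetical name and is not how socid_extractor actually handles the field.

```python
from typing import Any, Optional


def safe_get(data: Any, *path: Any) -> Optional[Any]:
    """Follow keys/indexes in `path`; return None if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# The same lookup as in the traceback, without raising on missing data.
def follower_count(json_data: dict) -> Optional[Any]:
    return safe_get(json_data, "person-contacts-count-models", 0, "followerCount")
```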

Saving HTML report in home dir for Win32 package

Checklist

  • I'm reporting a bug in Maigret functionality
  • I've checked for similar bug reports including closed ones
  • I've checked for pull requests that attempt to fix this bug

Description

Info about the Maigret version you are running and your environment (--version, operating system, ISP provider):
win32 package

How to reproduce this bug (commandline options / conditions):
-H

Reports are saved to c:\Users\%USERNAME%\reports\; they should be saved to %workdir%\reports\

Full scan (-a) knocks out WiFi

Greetings, dear developers! I am seeing a problem when running the program in the mode that scans the full list of resources (with the -a parameter). Somewhere in the ninth hundred of the 2428 sites being checked, the internet connection drops and goes down. It is the access point itself that fails. The effect is reproduced consistently when running with default settings and the -a parameter.

Could you suggest what I could try? Perhaps introduce some delay between requests, or something else...
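
A general way to reduce load on a weak access point is to cap the number of simultaneous requests and optionally pause between them. The sketch below shows that idea with `asyncio.Semaphore`; `run_limited`, `demo` and `fake_request` are hypothetical names, the requests are simulated, and this is not a description of how Maigret schedules its checks.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_limited(
    jobs: List[Callable[[], Awaitable[int]]],
    max_concurrent: int,
    delay: float = 0.0,
) -> List[int]:
    """Run all jobs, but never more than `max_concurrent` at the same time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        async with semaphore:
            result = await job()
            if delay:
                await asyncio.sleep(delay)  # optional pause before freeing the slot
            return result

    return await asyncio.gather(*(guarded(job) for job in jobs))


def demo(total: int = 10, max_concurrent: int = 3) -> Tuple[int, int]:
    """Return (number of completed jobs, peak number of simultaneous jobs)."""
    state = {"active": 0, "peak": 0}

    async def fake_request() -> int:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # simulated network round trip
        state["active"] -= 1
        return 1

    results = asyncio.run(run_limited([fake_request] * total, max_concurrent))
    return sum(results), state["peak"]
```

A lower limit trades scan speed for fewer open connections at once, which is usually what a consumer router needs.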

Google Cloud Shell exceptions

[base_events.py:1608] ERROR  21:18:01 SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x7f916014df60>
transport: <_SelectorSocketTransport fd=191 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "/usr/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609
maigret.py 0.1.15
Socid-extractor:  0.0.13
Aiohttp:  3.7.4
Requests:  2.25.1
Python:  3.7.3

SyntaxError: EOL while scanning string literal

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.5/x64/bin/maigret", line 33, in <module>
    sys.exit(load_entry_point('maigret==0.2.3', 'console_scripts', 'maigret')())
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/maigret.py", line 656, in run
    loop.run_until_complete(main())
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/maigret.py", line 581, in main
    results = await maigret(
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/checking.py", line 575, in maigret
    cur_results = await executor.run(tasks_dict.values())
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/executors.py", line 25, in run
    results = await self._run(tasks)
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/executors.py", line 40, in _run
    return await asyncio.gather(*futures)
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/checking.py", line 428, in check_site_for_username
    query_notify.update(response_result['status'], site.similar_search)
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/notify.py", line 233, in update
    ids_data_text = get_dict_ascii_tree(self.result.ids_data.items(), " ")
  File "/opt/hostedtoolcache/Python/3.9.5/x64/lib/python3.9/site-packages/maigret/utils.py", line 78, in get_dict_ascii_tree
    field_value = get_dict_ascii_tree(eval(field_value), prepend_symbols)
  File "<string>", line 1
    ['maximousblk
                ^
SyntaxError: EOL while scanning string literal
Error: Process completed with exit code 1.
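
The traceback shows `eval()` being applied to a field value that turned out to be a truncated list literal (`['maximousblk`). A safer pattern, sketched below as an assumption rather than the project's actual fix, is `ast.literal_eval()` with a fallback to the raw string. `parse_field_value` is a hypothetical helper name.

```python
import ast
from typing import Any


def parse_field_value(field_value: str) -> Any:
    """Parse a stringified list/dict; return the original string on failure."""
    try:
        # literal_eval only accepts Python literals, so it cannot run code.
        return ast.literal_eval(field_value)
    except (ValueError, SyntaxError):
        # Truncated or non-literal input is kept as plain text.
        return field_value
```

Unlike `eval()`, this never executes text extracted from a remote page, and malformed input degrades to a string instead of aborting the whole run.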

TypeError: Object of type MaigretSite is not JSON serializable

error:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.4/x64/bin/maigret", line 33, in <module>
    sys.exit(load_entry_point('maigret==0.1.20', 'console_scripts', 'maigret')())
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/site-packages/maigret/maigret.py", line 617, in run
    loop.run_until_complete(main())
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/site-packages/maigret/maigret.py", line 588, in main
    save_json_report(filename, username, results, report_type=args.json)
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/site-packages/maigret/report.py", line 71, in save_json_report
    generate_json_report(username, results, f, report_type=report_type)
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/site-packages/maigret/report.py", line 280, in generate_json_report
    file.write(json.dumps(all_json))
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/opt/hostedtoolcache/Python/3.9.4/x64/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type MaigretSite is not JSON serializable
Error: Process completed with exit code 1.

using latest commit: 1afdda7
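
`json.dumps()` raises this `TypeError` for any object it does not know how to encode. One common workaround is to pass a `default` callback; the sketch below assumes that dumping an object's attributes is acceptable. `FakeSite`, `to_jsonable` and `dump_results` are hypothetical stand-ins, not Maigret's real classes or its actual fix.

```python
import json
from typing import Any


class FakeSite:
    """Hypothetical stand-in for an object such as MaigretSite."""

    def __init__(self, name: str, url: str) -> None:
        self.name = name
        self.url = url


def to_jsonable(obj: Any) -> Any:
    """Fallback for json.dumps: serialize plain objects via their attributes."""
    if hasattr(obj, "__dict__"):
        return vars(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dump_results(results: dict) -> str:
    # `default` is called only for values the encoder cannot handle itself.
    return json.dumps(results, default=to_jsonable)
```

An alternative is to remove or convert such objects before building the report dictionary, which keeps the JSON output free of internal fields.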

pyvis error with Win32 standalone binary

Crashing while generating graph report

Traceback (most recent call last):
  File "maigret_standalone.py", line 7, in <module>
  File "asyncio\runners.py", line 43, in run
  File "asyncio\base_events.py", line 579, in run_until_complete
  File "maigret\maigret.py", line 701, in main
  File "maigret\report.py", line 210, in save_graph_report
  File "pyvis\network.py", line 74, in __init__
  File "pyvis\network.py", line 498, in prep_notebook
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\User\\AppDat
a\\Local\\Temp\\_MEI83082\\pyvis/templates/template.html'
[6996] Failed to execute script 'maigret_standalone' due to unhandled exception!

NameError: name `print_ascii_tree` is not defined

When using a timeout of more than 30 s, I get this error every time:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.2/x64/bin/maigret", line 33, in <module>
    sys.exit(load_entry_point('maigret==0.1.18', 'console_scripts', 'maigret')())
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/maigret.py", line 428, in run
    loop.run_until_complete(main())
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/maigret.py", line 348, in main
    results = await maigret(username=username,
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/checking.py", line 576, in maigret
    results = await executor.run(coroutines)
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/checking.py", line 55, in run
    results = await self._run(tasks)
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/checking.py", line 70, in _run
    return await asyncio.gather(*futures)
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/checking.py", line 203, in update_site_dict_from_response
    return sitename, process_site_result(response, query_notify, logger, results_info, site_obj)
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/checking.py", line 385, in process_site_result
    query_notify.update(result, site.similar_search)
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/notify.py", line 199, in update
    ids_data_text = get_dict_ascii_tree(self.result.ids_data.items(), ' ')
  File "/opt/hostedtoolcache/Python/3.9.2/x64/lib/python3.9/site-packages/maigret/utils.py", line 70, in get_dict_ascii_tree
    field_value = print_ascii_tree(eval(field_value), prepend_symbols)
NameError: name 'print_ascii_tree' is not defined

Logs: Generate Report.txt

Incompatible version requirements

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flask 2.0.1 requires Jinja2>=3.0, but you have jinja2 2.11.3 which is incompatible.
However, requirements.txt pins Jinja2==2.11.3.

[nodename nor servname provided, or not known]

When I run the program with --print-error, many sites return this message:
[?] TheSite: Connecting failure error: Cannot connect to host www.thesite.com:443 ssl:default [nodename nor servname provided, or not known]

I'm running the program on Python 3.9.6, macOS 11.4.

pvpru

On the pvpru site a nickname will never be found, even if it exists there.

For example, the nickname Lexodey exists on the site, but Maigret does not find it, because the site deliberately blocked the page used for parsing users (several months ago already) after being "poisoned" by Sherlock.

Stuck on last two websites

When I run maigret on my username with the -a option, the CLI gets stuck on the last two websites.

This reduces to getting stuck on one website when I use only the top 50 sites, and it does not happen with the top 30 or fewer.

Feature Request: sort by number of data points

It would be better if the exported reports (at least PDF and HTML) were sorted by the number of data points from each source, so that the important results are at the top and the ones without any info are at the bottom.

Win32 binary warnings

#209

[1332] WARNING: file already exists but should not: C:\Users\User\AppData\Local\
Temp\_MEI13322\reportlab\graphics\_renderPM.cp37-win_amd64.pyd
[1332] WARNING: file already exists but should not: C:\Users\User\AppData\Local\
Temp\_MEI13322\reportlab\lib\_rl_accel.cp37-win_amd64.pyd
