ail-project / ail-framework

AIL framework - Analysis Information Leak framework

License: GNU Affero General Public License v3.0

Python 43.51% Shell 2.17% CSS 1.04% JavaScript 17.65% Dockerfile 0.04% HTML 35.59% YARA 0.01%
ail-framework information-extraction information-security data-mining leak

ail-framework's People

Contributors

adulau, alainfou, blackbern, cudeso, davidcruciani, fukusuket, gallypette, jhedden, kovacsbalu, kywoskylake, markus-lassfolk, mokaddem, ngsimon, nmd03, obilodeau, osagit, paulsec, rafiot, raggadhub, rommelfs, shadow2033, simonsigre, stamparm, starow, steveclement, sw-pschmied, terrtia, tonyjabbour, wimpyman, xme

ail-framework's Issues

Crawlers out of memory: Kill child process

I don't know if you experience this too, but I get a lot of Docker cgroup out-of-memory events that result in child processes of the Splash crawler being killed. I don't know whether there is an actual negative impact.
I changed --memory=2G to --memory=3G in
screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -d -p '$port_number':8050 --restart=always --cpus=1 --memory=3G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash --maxrss '$u'; read x'
in bin/torcrawler/launch_splash_crawler.sh, and that seems to have resolved it.
I am running 6 crawlers concurrently.
I don't know whether this should become the default setting in the source.

Bug?: YARA tracker notification not working

I am receiving notifications from my regex trackers without a problem. For my YARA trackers, however, I receive nothing, even though there was a hit (visible in the sparkline column in /trackers) and an email notification address is configured.
I don't know how to debug that. Can you confirm that email notification for YARA trackers is working properly?

Stops after one hour

When I start AIL, it stops after about an hour. The command I run is ./LAUNCH.sh inside the venv.
The interface works, but the feeders and pystemon do not report anything to the console. Is there any way to make them persistent?

Thanks and greetings

Crawling error

Hi,

when I start crawling onion sites, all of them appear as down on the dashboard.

After looking at the screen session running Crawler_AIL, I see this error:

Traceback (most recent call last):
  File "./Crawler.py", line 413, in <module>
    crawler_config = load_crawler_config(to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
  File "./Crawler.py", line 189, in load_crawler_config
    crawler_config['crawler_options'] = get_crawler_config(redis_crawler, 'auto', service_type, domain, url=url)
  File "./Crawler.py", line 173, in get_crawler_config
    crawler_options['time'] = int(config['time'])
KeyError: 'time'
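
The traceback shows int(config['time']) failing because the stored crawler config has no 'time' key. As an illustration only, and not the project's actual fix, a defensive read could fall back to a default. The helper name read_int_option and the default value used below are assumptions.

```python
def read_int_option(config, key, default):
    """Return config[key] as an int, or `default` if the key is
    missing or its value cannot be converted to an integer."""
    try:
        return int(config[key])
    except (KeyError, TypeError, ValueError):
        return default


# A stored config without a 'time' entry, as in the traceback above.
stored_config = {"depth_limit": "1"}

crawler_options = {
    # The default of 1 is a placeholder, not a value taken from AIL.
    "time": read_int_option(stored_config, "time", default=1),
}
print(crawler_options)
```

With a fallback like this the crawler would keep running on incomplete configs, at the cost of hiding the fact that the stored config is incomplete, so logging the fallback may be worthwhile.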

UI improvement

I am looking at the leakhunter page and I think that it would be super useful to have more informative links to items, for instance:

  • instead of twitter/2020/09/02/1300683477965209601.gz, we could have the tweet's content,
  • instead of urlextract/2020/09/02/blog.ardennes-developpement.com_actualites_european-green-deal-appel-projets-h2020-1-milliard-euros840d3dcc-1c31-4cd7-917a-3f9b9e3925b3.gz we could have the page title.

Detailed logs

I am having problems getting email notifications up and running. Is there a place where detailed logs are stored, or is there some kind of debug mode or test script?

Feature request: Add tag to tracker overview (or add some grouping functionality)

It would be great to add the tag (or description) of a tracker as a column in the tracker overview in /trackers. It would then be possible to sort by tag (or description), which makes it easier to get an overview of each tracker's purpose. Alternatively, allow trackers to be added to user-defined groups and allow those groups to be viewed.

Extracting Google Analytics ID for correlation

Extracting Google Analytics ID for correlation

    	<!-- Global site tag (gtag.js) - Google Analytics -->
    <!--OLD CODE<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>-->

    <!--OLD CODE<script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());

      gtag('config', 'UA-58643-34');
    </script>-->
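
A minimal sketch of how such IDs could be pulled out of crawled HTML for correlation. It only covers the classic "UA-<account>-<property>" form seen in the snippet above. The digit-length bounds in the pattern and the function name extract_ga_ids are assumptions, and this is not an existing AIL module.

```python
import re

# Classic Universal Analytics IDs look like UA-58643-34.
# The digit-count bounds are an assumption chosen to limit false positives.
UA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def extract_ga_ids(html):
    """Return the sorted, de-duplicated Google Analytics IDs found in `html`.

    The search is purely textual, so IDs inside HTML comments
    (as in the snippet above) are found as well.
    """
    return sorted(set(UA_ID_PATTERN.findall(html)))


sample = """
<!--OLD CODE<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>-->
<script>gtag('config', 'UA-58643-34');</script>
"""
print(extract_ga_ids(sample))
```

Two domains that yield the same ID are likely operated by the same party, which is what makes the ID useful as a correlation key.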

Question/Documentation/Usability: How does the search exactly work

In https://aildomain.tld:7000/search I sometimes find things and sometimes I don't. How exactly does the search work?

  • Is it possible to also search for the name of a paste? For example, for https://pastebin.com/a68jmkq9, would searching for 'a68jmkq9' return the paste ingested by AIL? This would be a great feature if it isn't already possible.

  • Does the search depend on the loaded index? If so, it would also be great to be able to search in all indices.

A bit more detail, or an overhauled search page, would improve usability.

Bug: Cannot delete user

I added a read-only test user and wanted to delete it afterwards, but it doesn't work.
(Screenshot: delete_user_fails)

Flask not running

When I start AIL, Flask does not start. Even when I force it with the -m option, it does not come up. Why would this happen?
Thank you and greetings

Add Source

Hi all,

would it be possible to add an ingest source for the site xup.in?

Thanks.

Bug?: Hostname is missing in notification email

When configuring ...

##### Notifications ######
[Notifications]
ail_domain = https://<sub>.<domain>.<tld>:7000 
...

to something other than localhost, the body of the notification email does not show the ail_domain; the URL has no host part at all, e.g. ...

item id: submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
url: /showsavedpaste/?paste=submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
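
For illustration, a sketch of how an absolute link could be built from the configured ail_domain. The function name build_item_url is hypothetical and the /showsavedpaste/?paste= path is copied from the email body quoted above; this is not AIL's actual notification code.

```python
from urllib.parse import quote


def build_item_url(ail_domain, item_id):
    """Join the configured base URL and an item id into an absolute link.

    Raises ValueError when no base URL is configured, so a missing
    ail_domain is noticed instead of silently producing a relative URL.
    """
    base = (ail_domain or "").strip().rstrip("/")
    if not base:
        raise ValueError("ail_domain is not configured")
    # Keep '/' readable in the query value, as in the original email body.
    return f"{base}/showsavedpaste/?paste={quote(item_id, safe='/')}"


print(build_item_url(
    "https://ail.example.org:7000",  # placeholder domain
    "submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz",
))
```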

Documentation: User roles

Some documentation of the differences and use-cases of the possible user roles that can be chosen in https://domain.tld:7000/settings/create_user would be great.

running pystemon.py has multiple indentation errors

(AILENV) ail@ail:~/AIL-framework$ cd pystemon/
(AILENV) ail@ail:~/AIL-framework/pystemon$ ./pystemon.py
  File "./pystemon.py", line 69
    exit('You need python version 2.7 or newer.')
    ^
TabError: inconsistent use of tabs and spaces in indentation
(AILENV) ail@ail:~/AIL-framework/pystemon$

My goal is to feed data to AIL, but somehow I am unable to. I have been trying for a while now.

Feature request - VT hunting integration

It'd be nice to have an integration with the VT hunting API as a source feed.
The integration would download the matched binaries/files, ingest them as input like anything else, and apply all the other magical AIL features such as pattern matching.

Feature request: Editing YARA rules for leak hunter in the WebUI

As much as I love the YARA rules for leak hunter, it would be really useful to be able to edit the rules of existing trackers in the WebUI. Especially when a lot of YARA trackers are in use, it is hard to identify the correct rule by its UUID on the filesystem.

ardb repo from yinqiwen doesn't seem to be maintained anymore

ARDB

test ! -d ardb/ && git clone https://github.com/yinqiwen/ardb.git
pushd ardb/
make
popd

When I try to compile ardb, I get this error:

error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]

This is my gcc version:

gcc (Debian 9.3.0-10) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Someone has already opened an issue at yinqiwen/ardb#484.
This issue seems to be related to a gcc-9 change:
https://gcc.gnu.org/gcc-9/changes.html
grpc/grpc#19570
GPUOpen-Drivers/AMDVLK#131

Are there any workarounds? Thanks

Error when installing ARDB

020-06-14 10:11:19 (694 KB/s) - ‘/home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2.tar.gz’ saved [4685894]

<<<<< Done dowloading RocksDB

Unpacking ROCKSDB
<<<<< Done unpacking ROCKSDB
Building ROCKSDB
make[2]: Entering directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
GEN util/build_version.cc
GEN util/build_version.cc
CC cache/clock_cache.o
CC cache/lru_cache.o
CC cache/sharded_cache.o
CC db/builder.o
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In constructor ‘rocksdb::FdWithKeyRange::FdWithKeyRange(rocksdb::FileDescriptor, rocksdb::Slice, rocksdb::Slice, rocksdb::FileMetaData*)’:
./db/version_edit.h:157:33: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
157 | largest_key(_largest_key) {}
| ^
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, _U2&&) [with _U1 = int&; _U2 = rocksdb::FileMetaData; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_MoveConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyMoveConvertiblePair<_U1, _U2>()), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:227:48: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:342:64: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(rocksdb::FileMetaData&&)’ first required here
342 | : first(std::forward<_U1>(__x)), second(std::forward<_U2>(__y)) { }
| ^
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, const _T2&) [with _U1 = int&; typename std::enable_if<std::_PCC<true, _T1, _T2>::_MoveCopyPair<true, _U1, _T2>(), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:232:37: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:312:51: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(const rocksdb::FileMetaData&)’ first required here
312 | : first(std::forward<_U1>(__x)), second(__y) { }
| ^
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1879: db/builder.o] Error 1
make[2]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
make[1]: *** [Makefile:401: /home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2/librocksdb.a] Error 2
make[1]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/src'
make: *** [Makefile:4: all] Error 2

Documentation: Module Manager

It would be really great if there was a bit more documentation on how to use the module manager. Or is it deprecated?
I am asking because I see completely different behaviour of the queues when running ail-framework with and without the module manager.

Feature-Request: Allow configuration of proxy for pystemon

It would be great if it was possible to configure a proxy for pystemon at system level, e.g. via http_proxy=http://proxy.domain.tld:8080 and https_proxy=http://proxy.domain.tld:8080. Even with the most recent version of the pystemon repo, where you can define a list of proxies in the pystemon config, it doesn't seem to work very well. And if a proxy is configured system-wide, the Splash crawler fails.
So it might be possible to read a variable from core.cfg and put it in front of the feeder launcher script in LAUNCH.sh.
A proxy can be necessary when using a Pastebin Pro account that is tied to a specific external IP.
This would be a great and useful addition, also for companies where paste sites are blocked and only reachable through a certain proxy.
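
One way to picture the request is to scope the proxy variables to the feeder process only, so the Splash crawler keeps its own environment. The sketch below is an assumption about how that could look; proxy_env is a hypothetical helper, and nothing here reflects how LAUNCH.sh currently starts pystemon.

```python
import os


def proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with http_proxy/https_proxy set.

    The caller's environment is not modified, so only a process started
    with the returned mapping sees the proxy.
    """
    env = dict(os.environ if base_env is None else base_env)
    if proxy_url:
        for name in ("http_proxy", "https_proxy"):
            env[name] = proxy_url
    return env


# Proxy URL as written in the request above; in practice it would be
# read from a config file.
feeder_env = proxy_env("http://proxy.domain.tld:8080", base_env={"PATH": "/usr/bin"})
print(feeder_env)
```

The returned mapping can be passed as the env argument of subprocess.Popen when launching the feeder, which leaves every other process untouched.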

Problems with YARA rules

I love the YARA rules feature for leak hunter, but there seem to be some problems. When choosing a default rule like this one
[screenshot]
an error is thrown ...
[screenshot]

Flask not working

I'm deploying a fresh AIL, but when I try to run the server, the Flask server won't start.

This is the error:

Misp not connected
The HIVE not connected
VT submission is disabled
Traceback (most recent call last):
  File "/usr/lib/python3.6/configparser.py", line 1138, in _unify_values
    sectiondict = self._sections[section]
KeyError: 'Splash_Manager'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./Flask_server.py", line 41, in <module>
    from blueprints.crawler_splash import crawler_splash
  File "/opt/ail-framework/var/www/blueprints/crawler_splash.py", line 28, in <module>
    import crawlers
  File "/opt/ail-framework//bin/lib/crawlers.py", line 41, in <module>
    splash_manager_url = config_loader.get_config_str('Splash_Manager', 'splash_url')
  File "/opt/ail-framework//bin/lib/ConfigLoader.py", line 45, in get_config_str
    return self.cfg.get(section, key_name)
  File "/usr/lib/python3.6/configparser.py", line 781, in get
    d = self._unify_values(section, vars)
  File "/usr/lib/python3.6/configparser.py", line 1141, in _unify_values
    raise NoSectionError(section)
configparser.NoSectionError: No section: 'Splash_Manager'
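
The error means the loaded config file has no [Splash_Manager] section, which can happen when an older core.cfg is used with newer code. As a hedged sketch, and not AIL's ConfigLoader, the standard library's fallback argument lets a lookup survive a missing section. The standalone get_config_str below only mirrors the name seen in the traceback.

```python
import configparser


def get_config_str(cfg, section, key, default=None):
    """Look up a string option, returning `default` instead of raising
    when the section or the key does not exist."""
    return cfg.get(section, key, fallback=default)


cfg = configparser.ConfigParser()
# A config that lacks the [Splash_Manager] section, as in the traceback.
cfg.read_string("[Notifications]\nail_domain = https://ail.example.org:7000\n")

print(get_config_str(cfg, "Notifications", "ail_domain"))
print(get_config_str(cfg, "Splash_Manager", "splash_url"))  # None, no exception
```

A fallback keeps the web interface importable, but the missing section still has to be added to the config file for the crawler manager to be usable.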

thirdparty update: no module named pytaxonomies

After the last pull we are getting this error when trying to run AIL.

Traceback (most recent call last):
  File "./Flask_server.py", line 24, in <module>
    from pytaxonomies import Taxonomies
ModuleNotFoundError: No module named 'pytaxonomies'

I cannot find the module that Flask is looking for. I tried to run the thirdparty update (-t), but without luck. Can anyone help? Thank you

Add original tweet ID reference

Twitter has a URL that lets you fetch a tweet by its ID alone:

https://mobile.twitter.com/user/status/{ID}

Adding this reference to a Twitter item would allow users to find the original tweet again.
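
A small sketch of the idea, assuming the file name of a Twitter item is the tweet ID, as item ids such as twitter/2020/09/02/1300683477965209601.gz elsewhere on this page suggest. The function name is hypothetical and the URL pattern is the one given above.

```python
import posixpath


def tweet_url_from_item_id(item_id):
    """Derive the original tweet URL from a Twitter item id.

    Returns None when the item is not under 'twitter/' or when the file
    name is not a purely numeric tweet ID.
    """
    file_name = posixpath.basename(item_id)
    tweet_id = file_name.split(".", 1)[0]  # drop the '.gz' suffix
    if not item_id.startswith("twitter/") or not tweet_id.isdigit():
        return None
    return f"https://mobile.twitter.com/user/status/{tweet_id}"


print(tweet_url_from_item_id("twitter/2020/09/02/1300683477965209601.gz"))
```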

Feature Request: Choose source for tracker hit

It would sometimes be convenient to restrict a tracker to certain sources. For example, I'd like to have a regex tracker that only hits if the source is pastebin_pro or crawler, or possibly a specific URL, or a combination of these.
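
The request can be sketched as a filter applied before a tracker match is recorded. The sketch assumes that the first path segment of an item id names its source, as in ids like twitter/... or submitted/... seen elsewhere on this page. Both function names are hypothetical and this is not how AIL's trackers are implemented.

```python
def item_source(item_id):
    """Return the first path segment of an item id, taken as its source."""
    return item_id.split("/", 1)[0]


def tracker_applies(item_id, allowed_sources):
    """Decide whether a tracker should be evaluated for this item.

    An empty or missing source list means 'all sources', which keeps the
    behaviour of trackers that have no restriction configured.
    """
    if not allowed_sources:
        return True
    return item_source(item_id) in allowed_sources


allowed = {"pastebin_pro", "crawler"}
print(tracker_applies("crawler/2020/09/02/example.gz", allowed))
print(tracker_applies("twitter/2020/09/02/1300683477965209601.gz", allowed))
```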

Installation output potential errors

ERROR: peepdf 0.4.2 has requirement colorama==0.3.7, but you'll have colorama 0.4.3 which is incompatible.
ERROR: peepdf 0.4.2 has requirement Pillow==3.2.0, but you'll have pillow 7.2.0 which is incompatible.
ERROR: sflock 0.3.10 has requirement click==6.6, but you'll have click 7.1.2 which is incompatible.
ERROR: sflock 0.3.10 has requirement python-magic==0.4.12, but you'll have python-magic 0.4.18 which is incompatible.

/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
[nltk_data] Downloading package vader_lexicon to
[nltk_data] /home/ail/nltk_data...
/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
.

Installation error

I get mv: Call of stat for 'temp/jquery.canvasjs.min.js' not possible: file or directory not found .
It is the error message translated from German, so it may not be completely correct.

Documentation: core.cfg

It would be really great to get some more insight into all the settings in core.cfg and their dependencies.

Enhancements for handling huge pastes (probably also a bug)

When a huge paste is opened in the web interface at https://<aildomain.tld>:7000/showsavedpaste/? it is not displayed/loaded properly ...
[screenshot]

A possible solution could be to initially display/load only the first 100 lines and to load more lines dynamically when scrolling down.

It would also be great to have a button in the web interface to download the raw content as a .txt file.
(I know this is possible via right click on [raw content] and "save target as". Nevertheless, a button doing just that would improve usability.)
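
The "first 100 lines, then more on scroll" idea could be backed by a helper that reads one window of lines from a gzip-compressed item. This is a sketch under the assumption that items are stored as .gz text files, which the item ids on this page suggest; read_lines is a hypothetical name and not part of AIL.

```python
import gzip
from itertools import islice


def read_lines(path, start=0, count=100):
    """Return up to `count` lines of a gzip text file, beginning at line
    index `start` (0-based), without keeping the whole file in memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        window = islice(handle, start, start + count)
        return [line.rstrip("\n") for line in window]
```

A web endpoint could call read_lines(path, start=0) for the initial view and request the next window with start=100, start=200 and so on as the user scrolls. Each call still decompresses from the beginning of the file, so very deep offsets get slower.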
