bayshorenetworks / yextend
Yara integrated software to handle archive file data.
License: BSD 3-Clause "New" or "Revised" License
The current stable version of yara is 4.4.0, but the current codebase cannot be compiled against that version. This makes it impossible to create a package, because the dependency is newer (see https://gitlab.alpinelinux.org/alpine/aports/-/issues/9625).
I am aware of the different branches for different versions of yara, but there is no branch for the 4.4.x version. Is there a valid reason for those branches? I cannot see the need for 3+ special yara-related branches in various states of divergence from the master branch.
./run_yextend -r /home/myfiles/yara/rules/email_index.yar -t /home/file/20200522141244/redis2.4.5.zip -j
JSON parse error : [json.exception.parse_error.101] parse error at 25: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara½'
JSON parse error : [json.exception.parse_error.101] parse error at 25: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara½'
JSON parse error : [json.exception.parse_error.101] parse error at 25: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 135: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 136: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 138: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 136: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 137: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 141: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 139: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 138: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 143: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 138: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 138: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 142: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 143: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 141: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 141: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 142: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 130: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 149: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 148: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 139: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yara¼'
JSON parse error : [json.exception.parse_error.101] parse error at 28: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"yextend¼'
JSON parse error : [json.exception.parse_error.101] parse error at 22: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"͡'
JSON parse error : [json.exception.parse_error.101] parse error at 21: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"³'
[
{
"file_name": "/home/file/20200522141244/redis2.4.5.zip",
"file_signature_MD5": "04616571230f01b5dd5cadf66e8b22ee",
"file_size": 236768,
"yara_ruleset_file_name": "/home/myfiles/yara/rules/email_index.yar"
}
]
I got the following errors when running make unittests:
make unittests
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/root/yextend-yara-3.8.1/test/test_files_yextend.py", line 1280, in test_content_yara_7z_multiple_embed_pdf
json_resp = json.loads(out)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/root/yextend-yara-3.8.1/test/test_files_yextend.py", line 1409, in test_content_yara_putty_multiple_embed_pdf
json_resp = json.loads(out)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/root/yextend-yara-3.8.1/test/test_files_yextend.py", line 1489, in test_content_yara_putty_zip_multiple_embed_pdf
json_resp = json.loads(out)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/root/yextend-yara-3.8.1/test/test_files_yextend.py", line 1191, in test_yara_archive_rar_multiple_embed_pdf
json_resp = json.loads(out)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Ran 57 tests in 11.787s
FAILED (errors=4)
Makefile:937: recipe for target 'unittests' failed
make: *** [unittests] Error 1
yextend misreads the yara version when the minor number is greater than 9, it seems. My current yara version is 3.10.0 and not 3.1:
Version issue: yextend version 1.7+ will not run with yara versions below 3.4
Your env has yextend version 1.7
Your env has yara version 3.1
$ yara -v
3.10.0
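One plausible explanation (an assumption, since the version check itself is not shown in this report) is that the version string is read as a decimal number, which collapses 3.10 into 3.1. Below is a minimal Python sketch of comparing versions component by component instead. The names parse_version and meets_minimum are hypothetical and not part of yextend.

```python
def parse_version(text):
    """Turn a dotted version such as '3.10.0' into a tuple of ints: (3, 10, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Compare versions as integer tuples so that 3.10 sorts after 3.4."""
    return parse_version(found) >= parse_version(minimum)


# A float-based comparison loses the trailing zero of the minor number:
# float("3.10") is 3.1, which is below 3.4 even though 3.10.0 is newer.
float_check_rejects = float("3.10") < float("3.4")
tuple_check_accepts = meets_minimum("3.10.0", "3.4")
```

With this approach "3.10.0" passes a minimum of "3.4", while a genuine "3.1" would still be rejected.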
make all-am
make[1]: Entering directory `/home/chris/Bayshore/yextend'
gcc -DHAVE_CONFIG_H -I. -I. -g -O2 -MT bayshore_yara_wrapper.o -MD -MP -MF .deps/bayshore_yara_wrapper.Tpo -c -o bayshore_yara_wrapper.o bayshore_yara_wrapper.c
bayshore_yara_wrapper.c: In function ‘bayshore_yara_preprocess_rules’:
bayshore_yara_wrapper.c:286:4: warning: passing argument 2 of ‘yr_compiler_set_callback’ from incompatible pointer type [enabled by default]
In file included from /usr/local/include/yara.h:22:0,
from bayshore_yara_wrapper.h:27,
from bayshore_yara_wrapper.c:28:
/usr/local/include/yara/compiler.h:138:13: note: expected ‘YR_COMPILER_CALLBACK_FUNC’ but argument is of type ‘void (*)(int, const char *, int, const char *)’
bayshore_yara_wrapper.c: In function ‘bayshore_yara_wrapper_api’:
bayshore_yara_wrapper.c:480:4: warning: passing argument 2 of ‘yr_compiler_set_callback’ from incompatible pointer type [enabled by default]
In file included from /usr/local/include/yara.h:22:0,
from bayshore_yara_wrapper.h:27,
from bayshore_yara_wrapper.c:28:
/usr/local/include/yara/compiler.h:138:13: note: expected ‘YR_COMPILER_CALLBACK_FUNC’ but argument is of type ‘void (*)(int, const char *, int, const char *)’
mv -f .deps/bayshore_yara_wrapper.Tpo .deps/bayshore_yara_wrapper.Po
g++ -g -O2 -o yextend filedissect.o wrapper.o bayshore_content_scan.o filedata.o main.o zl.o bayshore_yara_wrapper.o -lpcrecpp -lz -lyara -lcrypto -larchive
make[1]: Leaving directory `/home/chris/Bayshore/yextend'
Yextend thinks the YARA version is 0.0 instead of 3.7.0 when compiled on Amazon Linux:
$ ./run_yextend test.yara test.yara
Version issue: yextend version 1.4+ will not run with yara versions below 3.4
Your env has yextend version 1.4
Your env has yara version 0.0
This is in the official 1.5 release. The problem has been fixed as of the latest commit to master, but there hasn't been a new release in over a year.
Can we get a new stable release? Thanks!
Will the package work with yara-python 4.0.1?
It seems pdftotext is not mentioned in the readme and has to be installed on some systems.
Kernel version 3.11.0-15-generic.
Just installed Yara 3.4.0 per the docs here: http://yara.readthedocs.org/en/latest/gettingstarted.html
I can get through the yextend install up until make, then the following occurs:
~/yextend$ sudo make
make all-am
make[1]: Entering directory `/home/sansforensics/yextend'
g++ -g -O2 -o yextend filedissect.o wrapper.o bayshore_content_scan.o filedata.o main.o zl.o bayshore_yara_wrapper.o -lpcrecpp -lz -lyara -lcrypto -larchive
bayshore_content_scan.o: In function `scan_content(unsigned char const_, unsigned long, char const_, std::list<security_scan_results_t, std::allocator<security_scan_results_t> >, char const, void ()(void, std::list<security_scan_results_t, std::allocator<security_scan_results_t> >, char const), int)':
/home/sansforensics/yextend/bayshore_content_scan.cpp:586: undefined reference to `yr_rules_destroy'
main.o: In function `main':
/home/sansforensics/yextend/main.cpp:379: undefined reference to `yr_rules_destroy'
bayshore_yara_wrapper.o: In function `bayshore_yara_preprocess_rules':
/home/sansforensics/yextend/bayshore_yara_wrapper.c:271: undefined reference to `yr_rules_load'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:286: undefined reference to `yr_compiler_create'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:331: undefined reference to `yr_finalize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:288: undefined reference to `yr_compiler_set_callback'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:319: undefined reference to `yr_compiler_add_file'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:328: undefined reference to `yr_compiler_destroy'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:279: undefined reference to `yr_finalize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:322: undefined reference to `yr_compiler_get_rules'
bayshore_yara_wrapper.o: In function `bayshore_yara_wrapper_yrrules_api':
/home/sansforensics/yextend/bayshore_yara_wrapper.c:362: undefined reference to `yr_initialize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:364: undefined reference to `yr_rules_scan_mem'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:378: undefined reference to `yr_finalize'
bayshore_yara_wrapper.o: In function `bayshore_yara_wrapper_api':
/home/sansforensics/yextend/bayshore_yara_wrapper.c:430: undefined reference to `yr_initialize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:431: undefined reference to `yr_rules_load'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:443: undefined reference to `yr_compiler_create'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:458: undefined reference to `yr_compiler_define_integer_variable'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:465: undefined reference to `yr_compiler_define_boolean_variable'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:506: undefined reference to `yr_compiler_get_rules'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:507: undefined reference to `yr_compiler_destroy'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:517: undefined reference to `yr_rules_scan_mem'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:531: undefined reference to `yr_rules_destroy'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:532: undefined reference to `yr_finalize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:472: undefined reference to `yr_compiler_define_string_variable'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:482: undefined reference to `yr_compiler_set_callback'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:494: undefined reference to `yr_compiler_add_file'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:500: undefined reference to `yr_compiler_destroy'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:501: undefined reference to `yr_finalize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:436: undefined reference to `yr_finalize'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:488: undefined reference to `yr_compiler_destroy'
/home/sansforensics/yextend/bayshore_yara_wrapper.c:489: undefined reference to `yr_finalize'
collect2: ld returned 1 exit status
make[1]: *** [yextend] Error 1
make[1]: Leaving directory `/home/sansforensics/yextend'
make: *** [all] Error 2
Any ideas?
Add support for bzip2 compressed content
I got the following errors when running the build.sh script
libs/bayshore_yara_wrapper.c:73:27: error: ‘TRUE’ undeclared here (not in a function); did you mean ‘RE’?
static int show_strings = TRUE;
^~~~
RE
libs/bayshore_yara_wrapper.c:75:31: error: ‘FALSE’ undeclared here (not in a function); did you mean ‘FILE’?
static int show_module_data = FALSE;
^~~~~
FILE
libs/bayshore_yara_wrapper.c: In function ‘bayshore_yara_callback’:
libs/bayshore_yara_wrapper.c:299:21: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
mi->module_data = module_data->mapped_file.data;
^
Makefile:498: recipe for target 'libs/bayshore_yara_wrapper.o' failed
make[1]: *** [libs/bayshore_yara_wrapper.o] Error 1
make[1]: Leaving directory '/home/student/Downloads/stuff/yextend-1.6'
Makefile:367: recipe for target 'all' failed
make: *** [all] Error 2
Is this normal? Or was there anything missing?
YARA rules can include external variables, which yara allows you to define with a -d flag.
It would be great if yextend could support external variables (perhaps with the same -d command line switch) and pass them on to the underlying yara invocations.
I've been using yextend to identify strings in .docx files that are indicative of the presence of a sub document. While testing, I realized what appears to be a bug in the way yextend scans .docx files.
The strings defined in the yara rule are:
"rId4"
"w:subDoc"
When run against the test.docx file, the result is:
[
{
"scan_results": [],
"file_name": "/home/moretang/malware/test.docx",
"file_size": 19198,
"yara_ruleset_file_name": "/home/moretang/malware/subdoc_rules.yar",
"children": [
{
"file_name": "/home/moretang/malware/test.docx",
"file_size": 19198,
"yara_matches_found": false,
"file_signature_MD5": "978d6cdc38cbad918da526822a10aba0"
},
{
"yara_matches_found": false,
"file_size": 989,
"file_name": "word/document.xml",
"file_signature_MD5": "7a7c6fa0a200a4dbeda3e389068da2dc",
"scan_type": "Yara Scan (Office Open XML) "
}
],
"file_signature_MD5": "978d6cdc38cbad918da526822a10aba0"
}
]
When word/document.xml is manually extracted from test.docx and scanned, the output shows true matches:
[
{
"yara_matches_found": true,
"scan_results": [],
"file_name": "/home/moretang/malware/word/document.xml",
"yara_ruleset_file_name": "/home/moretang/malware/subdoc_rules.yar",
"file_size": 4356,
"children": [
{
"yara_matches_found": true,
"file_name": "/home/moretang/malware/word/document.xml",
"scan_type": "Yara Scan (XML Document)",
"yara_results": {
"embedded_doc": {
"description": ".docx subdoc identification",
"hit_count": "2",
"offsets": [
"0x8ae:$s1",
"0x89f:$s2"
]
}
},
"file_size": 4356,
"date": "2019-11-19",
"file_signature_MD5": "078e06af7c487a83d550b76a6c6fa56b"
}
],
"file_signature_MD5": "078e06af7c487a83d550b76a6c6fa56b"
}
]
Note that the hashes are different as well. Is anyone aware of the cause of this?
I do realize that the scan type is different between the two sets of results.
Thanks!
Is there an API for Python scripts?
Exception when using run_yextend to run a ruleset on a pdf file
Traceback (most recent call last):
File "./run_yextend", line 169, in
print_yextend_output(out=out)
File "./run_yextend", line 120, in print_yextend_output
rmdspl_offset, rmdspl_label = rmdspl.split(':')
ValueError: need more than 1 value to unpack
Fermilab.pdf
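The ValueError above means that split(':') returned a single element, so at least one entry handed to that line contained no ':' separator. Below is a minimal sketch of a more tolerant split, assuming entries normally look like "0x8ae:$s1" as in the JSON output elsewhere on this page. The helper name split_offset_label is hypothetical and is not taken from run_yextend.

```python
def split_offset_label(entry):
    """Split 'offset:label' on the first colon without raising.

    Returns (offset, label); label is None when the entry has no colon.
    """
    offset, separator, label = entry.partition(":")
    if not separator:
        # No colon present: keep the raw text as the offset, leave the label unset.
        return entry, None
    return offset, label
```

Whether an entry without a label should be kept, skipped, or reported is a design decision for the script; this sketch only avoids the crash.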
Yextend is designed to be compiled from source and invokes a couple of subprocesses (pdfdetach, pdftotext, yara). This makes it challenging to build and run in an isolated environment (e.g. AWS Lambda).
Ideally, yextend could be a pip package (or similar) which could be installed on any platform (much like yara itself).
For reference, the BinaryAlert documentation describes how we were able to get a portable yextend binary: we commented out the yara version check and copied the necessary .so files (but this likely doesn't solve the pdf parsing subprocess calls).
Thanks again for contributing such a useful tool!
Add support for command line arguments (via getopt) to run_yextend. This way the command line arguments can be handled more flexibly than by strict positional index. For instance, I want to be able to run it like this:
./run_yextend -r x.yara -t file1
or
./run_yextend --ruleset x.yara --target file1
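A minimal sketch of what this could look like with Python's standard getopt module. Only the -r/--ruleset and -t/--target options named above are handled; the function name parse_args and the returned dictionary layout are assumptions for illustration, not the actual run_yextend code.

```python
import getopt


def parse_args(argv):
    """Parse -r/--ruleset and -t/--target from a list of arguments.

    Raises getopt.GetoptError on unknown options or missing values.
    """
    options, _remaining = getopt.getopt(argv, "r:t:", ["ruleset=", "target="])
    parsed = {}
    for flag, value in options:
        if flag in ("-r", "--ruleset"):
            parsed["ruleset"] = value
        elif flag in ("-t", "--target"):
            parsed["target"] = value
    return parsed
```

In a script this would typically be called as parse_args(sys.argv[1:]), so the short and long forms shown above produce the same result regardless of argument order.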
First, let me say - what a great tool!
We are thinking of integrating yextend into BinaryAlert so that users can analyze archives with all of their YARA rules. Unfortunately, as far as I can tell (looking at v1.5), there is no machine-readable output, just text printed to stdout.
Would it be possible to get the output in JSON format?
Even better (but probably more difficult): if this could be a Python package (pip install yextend), that would be amazing, because then the match results would be available to a Python application (like BinaryAlert).
yextend will require Yara 3.6 and above.
Change code to check for: YR_MAJOR_VERSION == 3 && YR_MINOR_VERSION >= 6
Update README to reflect the requirements.
pdfparser::PdfDetach currently returns a vector of the discovered buffers: std::vector<std::vector<uint8_t>>. I need that changed into a set of key/value pairs where the key is the name of the embedded file and the value is the relevant buffer (what gets returned in the current code base). So what we need returned is something like:
std::map<std::string, std::vector<uint8_t>>
where the key is the file name (string), the value is the content buffer (a vector of uint8_t), and the map holds as many of these pairs as have been discovered.
Currently, the codebase is not compatible with musl.
I got the following errors when running the build.sh script
libs/bayshore_yara_wrapper.c:73:27: error: ‘TRUE’ undeclared here (not in a function)
static int show_strings = TRUE;
^
libs/bayshore_yara_wrapper.c:75:31: error: ‘FALSE’ undeclared here (not in a function)
static int show_module_data = FALSE;
Makefile:529: recipe for target 'libs/bayshore_yara_wrapper.o' failed
make[1]: *** [libs/bayshore_yara_wrapper.o] Error 1
make[1]: Leaving directory '/home/hmj/yextend-master'
Makefile:386: recipe for target 'all' failed
make: *** [all] Error 2
When running yextend against a file that is not an archive the output (when there is a hit) looks like this:
...
Scan Type: Yara Scan (PDF - Raw data)
Parent File Name: test_files/lipsum.txt.pdf
Child File Name: test_files/lipsum.txt.pdf
File Signature (MD5): ec650a3a287603d350718b74716aee1c
...
Since the filename is the same for parent and child it makes no sense to output them both so only output:
File Name: X
when the actual file name (and path) are identical values. Otherwise the output should stay the same.
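The requested rule can be sketched in a few lines. This is a Python illustration of the logic only (yextend itself produces this output from C/C++), and the function name file_name_lines is hypothetical.

```python
def file_name_lines(parent, child):
    """Return the file name lines to print for one scan result.

    A single 'File Name' line is used when parent and child are identical;
    otherwise the existing parent/child pair is kept.
    """
    if parent == child:
        return ["File Name: " + parent]
    return ["Parent File Name: " + parent, "Child File Name: " + child]
```

For the PDF example above this yields a single "File Name: test_files/lipsum.txt.pdf" line, while archive members keep both lines.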
yextend segfaults when run against an uncompiled ruleset. It should either support uncompiled rules, or make it clear to the end user (in the usage statement or somewhere obvious) that only compiled rules are supported.
btaub@B:~/Documents/GeneralEngineering/yextend$ LD_LIBRARY_PATH=/usr/local/lib ./yextend test_rulesets/xml.ruleset test_files/Lorem-winlogon.docx
Segmentation fault (core dumped)
btaub@B:~/Documents/GeneralEngineering/yextend$ LD_LIBRARY_PATH=/usr/local/lib ./yextend test_rulesets/xml.ruleset.bin test_files/Lorem-winlogon.docx
===============================ALPHA===================================
Filename: test_files/Lorem-winlogon.docx
File Size: 210193
File Signature (MD5): d8f0fab30eae91687c0d80f8dd08218f
=======================================================================
Yara Result(s): XML:[detected offsets=0x2:$a-0x40:$a-0x5a:$a-0xbf:$a-0x154:$a-0x175:$a-0x18f:$a-0x1ad:$a-0x1d6:$a-0x1fc:$a-0x235:$a-0x25c:$a-0x282:$a-0x2b4:$a-0x2dd:$a-0x303:$a-0x337:$a-0x363:$a-0x389:$a-0x3c0:$a-0x3ea:$a-0x410:$a-0x445:$a-0x472:$a-0x498:$a-0x4b8:$a-0x4e1:$a-0x507:$a-0x52a:$a-0x552:$a-0x578:$a-0x5a6:$a,hit_count=32]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: [Content_Types].xml
File Signature (MD5): 0ceb2b5a990b5dba8285144bf4465001
Yara Result(s): XML:[detected offsets=0x2:$a-0x48:$a-0x62:$a-0xbe:$a-0x119:$a-0x150:$a-0x1aa:$a-0x1e1:$a-0x238:$a,hit_count=9]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: _rels/.rels
File Signature (MD5): 77bf61733a633ea617a4db76ef769a4d
Yara Result(s): XML:[detected offsets=0x2:$a-0x48:$a-0x62:$a-0xbe:$a-0x10b:$a-0x142:$a-0x194:$a-0x1cb:$a-0x219:$a-0x250:$a-0x29c:$a-0x2d3:$a-0x31b:$a-0x352:$a-0x3f6:$a-0x487:$a,hit_count=16]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/_rels/document.xml.rels
File Signature (MD5): 208ee36e5f55bcf6a11163b7a06145b8
Yara Result(s): XML:[detected offsets=0x2:$a-0x45:$a-0x94:$a-0xb1:$a-0xdb:$a-0x10d:$a-0x129:$a-0x15b:$a-0x177:$a-0x1a0:$a-0x1c8:$a-0x219:$a-0x236:$a-0x26b:$a-0x29d:$a-0x2b9:$a-0x2e4:$a-0x325:$a-0x366:$a-0x3b4:$a-0x400:$a-0x441:$a-0xe3a:$a-0x10c4:$a-0x1658:$a-0x1852:$a-0x1aac:$a-0x1ccb:$a-0x2154:$a-0x23c7:$a-0x25ef:$a-0x28a6:$a-0x2ace:$a-0x2e49:$a-0x34b3:$a-0x36c0:$a-0x38f4:$a-0x3bf0:$a-0x3e74:$a-0x41f6:$a-0x4468:$a-0x468a:$a-0x48d5:$a-0x4b1b:$a-0x4da4:$a-0x500a:$a-0x54e4:$a-0x56dd:$a-0x5a32:$a-0x62bc:$a-0x64b5:$a-0x6713:$a-0x69c9:$a-0x6c13:$a-0x7109:$a-0x7338:$a-0x75b7:$a-0x77ea:$a-0x7b1b:$a-0x7d26:$a-0x7f4e:$a-0x8169:$a-0x83af:$a-0x888b:$a-0x8ada:$a-0x8d56:$a-0x8fae:$a-0x9225:$a-0x9439:$a-0x967a:$a-0x9953:$a-0x9ccb:$a-0xa165:$a-0xa367:$a-0xa594:$a-0xa856:$a-0xab3f:$a-0xadc3:$a-0xb002:$a-0xb226:$a-0xb4da:$a-0xb71c:$a-0xba5f:$a-0xbc60:$a-0xbf4f:$a-0xc174:$a-0xc448:$a-0xc6b9:$a-0xc9f3:$a-0xccb7:$a-0xcf80:$a-0xd2c1:$a-0xd3d8:$a-0xd67a:$a-0xd8e9:$a-0xdb17:$a-0xdd21:$a-0xdf85:$a-0xe203:$a-0xe41a:$a-0xe633:$a-0xe86b:$a-0xeaba:$a-0xed22:$a-0xf018:$a-0xf275:$a-0xf480:$a-0xf698:$a-0xf8ee:$a-0xfb34:$a-0xfdb5:$a-0x10010:$a-0x10273:$a-0x10526:$a-0x10739:$a-0x10ac2:$a-0x10cca:$a-0x10f72:$a-0x11180:$a-0x1138b:$a-0x115de:$a-0x1184f:$a-0x11a78:$a-0x11ca6:$a-0x11ed7:$a-0x120f0:$a-0x1234a:$a-0x12571:$a-0x12782:$a-0x12a16:$a-0x12c11:$a-0x12e98:$a-0x131e4:$a-0x134b3:$a-0x13705:$a-0x13a1a:$a-0x13c33:$a-0x13e80:$a-0x140d1:$a-0x143a3:$a-0x14829:$a-0x14a21:$a-0x14c35:$a-0x14ea2:$a-0x150c9:$a-0x1532b:$a-0x1580f:$a-0x15a79:$a-0x15cd1:$a-0x15de2:$a-0x16014:$a-0x1626a:$a-0x164c1:$a-0x1672a:$a-0x169df:$a-0x16cad:$a-0x16ecd:$a-0x170de:$a-0x17344:$a-0x1780f:$a-0x17a48:$a-0x17cee:$a-0x18033:$a-0x18241:$a-0x184bd:$a-0x1871e:$a-0x18a9b:$a-0x18cde:$a-0x18f08:$a-0x1910f:$a-0x19335:$a-0x19571:$a-0x197e2:$a-0x19a32:$a-0x19c38:$a-0x19e85:$a-0x1a0dd:$a-0x1a3be:$a-0x1a5dd:$a-0x1a8bd:$a-0x1ab59:$a-0x1ad81:$a-0x1b035:$a-0x1b2d4:$a-0x1b8ea:$a-0x1bb1b:$a-0x1bdd2:$a-0x1c04a:$a-0x1c2af:$a-0x1c4f4:$a-0x1c73f:$a-0
x1c939:$a-0x1cbb0:$a-0x1ce56:$a-0x1d181:$a-0x1d3a2:$a-0x1d645:$a-0x1d83a:$a-0x1da94:$a-0x1dca9:$a-0x1e292:$a-0x1e521:$a-0x1e7c1:$a-0x1ea04:$a-0x1ecf4:$a-0x1ef59:$a-0x1f194:$a-0x1f436:$a-0x1f653:$a-0x1f88c:$a-0x1fb07:$a-0x1fd90:$a-0x1ffb6:$a-0x2023a:$a-0x20498:$a-0x20714:$a-0x20992:$a-0x20bf1:$a-0x20e14:$a-0x21033:$a-0x2153c:$a-0x21774:$a-0x21a63:$a-0x21cc9:$a-0x21ed3:$a-0x221ac:$a-0x2240b:$a-0x2265f:$a-0x22857:$a-0x22dd8:$a-0x22fda:$a-0x2329f:$a-0x234fd:$a-0x237a6:$a-0x239cf:$a-0x23d46:$a-0x23f64:$a-0x241d1:$a-0x244f2:$a-0x24814:$a-0x24b11:$a-0x24d76:$a-0x24fc3:$a-0x251d6:$a-0x253fd:$a-0x25627:$a-0x25865:$a-0x25af4:$a-0x25d84:$a-0x2601d:$a-0x2621c:$a-0x2670e:$a-0x26963:$a,hit_count=253]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/document.xml
File Signature (MD5): 66bea1a9a998508e7ba3e25f8321c13f
Yara Result(s): XML:[detected offsets=0x2:$a-0x42:$a-0x5e:$a-0x19bc:$a,hit_count=4]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/theme/theme1.xml
File Signature (MD5): c0347b16cac6d6312c9e2c8154c808ea
Yara Result(s): XML:[detected offsets=0x2:$a-0x45:$a-0x62:$a-0x8c:$a-0xbe:$a-0xda:$a-0x10c:$a-0x128:$a-0x151:$a-0x179:$a-0x1ab:$a-0x1c7:$a-0x1f2:$a-0x233:$a-0x274:$a-0x291:$a,hit_count=16]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/settings.xml
File Signature (MD5): 82ac5c60a5e05100970f6acba87fabf6
Yara Result(s): XML:[detected offsets=0x2:$a-0x48:$a-0x65:$a-0x8f:$a-0xab:$a-0xdd:$a-0xf9:$a-0x124:$a-0x165:$a,hit_count=9]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/webSettings.xml
File Signature (MD5): 45849c9d3c04ebab3d3abcdb903bfc98
Yara Result(s): XML:[detected offsets=0x2:$a-0x4c:$a-0x69:$a-0x9f:$a-0xcb:$a-0xf5:$a-0x123:$a-0x145:$a,hit_count=8]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: docProps/core.xml
File Signature (MD5): 4cbe0d4c3ddad80f0a194ce5576717fe
Yara Result(s): XML:[detected offsets=0x2:$a-0x43:$a-0x60:$a-0x8a:$a-0xa6:$a-0xd8:$a-0xf4:$a-0x11f:$a-0x160:$a,hit_count=9]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/styles.xml
File Signature (MD5): 8d5d740a3dccba10a12451d938214034
Yara Result(s): XML:[detected offsets=0x2:$a-0x42:$a-0x5f:$a-0x89:$a-0xa5:$a-0xd7:$a-0xf3:$a-0x11e:$a-0x15f:$a,hit_count=9]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: word/fontTable.xml
File Signature (MD5): 4bc82d369ddef7b8e79bc742ad60863e
Yara Result(s): XML:[detected offsets=0x2:$a-0x45:$a-0x5f:$a-0x97:$a-0xb4:$a,hit_count=5]
Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
Parent File Name: test_files/Lorem-winlogon.docx
Child File Name: docProps/app.xml
File Signature (MD5): d5bc288357685722d06872f11eba3522
===============================OMEGA===================================
It would be helpful for some projects to have prebuilt binaries for releases.
For this we can probably use Travis CI which supports macOS, Linux and Windows for builds.
It would be great to be able to scan sensitive files entirely in-memory so that they never have to be saved to disk. This would likely be a feature added to a Python API or similar (see #15). For example, yara-python supports passing data in as an argument rather than a filename.
In yextend 1.4, the sample file was detected correctly (Audio Video Interleaved File (AVI)). But in 1.5, it is detected as an encrypted file.
Command to run:
./run_yextend test_rulesets/bayshore.yara.testing.ruleset test_files/sample.avi
Add support for ASCIIDecode encoding in the PDF parser. Example file for testing here:
asciihexdecode.pdf
This needs to be tested with the raw option as the encoding in PdfToText (const uint8_t* pdf_pointer, size_t pdf_size, pdfparser::TextEncoding encoding)
Hi:
Have you considered using the same license as yara proper, i.e. Apache 2.0?
Thanks for your kind consideration!
$ LD_LIBRARY_PATH=/usr/local/lib ./yextend test_rulesets/bayshore.yara.testing.ruleset test_files/Test.docx
===============================ALPHA===================================
File Name: test_files/Test.docx
File Size: 523136
File Signature (MD5): 50b62fecabe04e5cc23efe1c6b1ca891
Segmentation fault (core dumped)
I noticed that yextend can only be run from its own directory because it has to match the current working directory. The following two modes of operation did not work for me:
The reason is that the archive file type cannot be determined and therefore no scan is performed. The file type detection uses a YARA rule that resides in the libs directory of yextend, so the rule cannot be found when yextend is run from another directory.
https://github.com/BayshoreNetworks/yextend/blob/master/libs/bayshore_file_type_detect.c#L425-L431
char path[MAXPATHLEN];
if (NULL==getcwd(path, MAXPATHLEN)) {
// We either have no access or another error occured
return 65535; //-1;
}
strncat (path, "/libs/bayshore_file_type_detect.yara", sizeof(path)-strlen(path)-1);
Would be great to have the possibility to run it from anywhere on a system. Until then I would probably just mention it in the README file.
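One way to remove the dependency on the working directory is to build the rule path from the location of the executable instead of from getcwd. The snippet below is only a Python illustration of that idea; the real code is C and would need an equivalent mechanism (or a configured install path). It assumes the libs directory sits next to the executable, and detection_rule_path is a hypothetical name.

```python
import os

# Location of the detection rule relative to the install directory,
# matching the path appended in the C snippet above.
RULE_RELATIVE_PATH = os.path.join("libs", "bayshore_file_type_detect.yara")


def detection_rule_path(executable_path):
    """Build the rule path from the executable's directory, not the cwd."""
    install_dir = os.path.dirname(os.path.abspath(executable_path))
    return os.path.join(install_dir, RULE_RELATIVE_PATH)
```

Called with the path of the running program, this returns the same rule file no matter which directory the user started from.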
The logic to parse the output into JSON format presumably splits fields by comma, but YARA metadata can also include commas. The result is what looks like multiple different rule matches when there is actually only one.
Compile yextend from the develop branch.
test.yara:
rule dummy_true {
meta:
description = "One, Two, Three"
condition:
true
}
LD_LIBRARY_PATH=/usr/local/lib ./yextend -r test.yara -t test.yara -j
produces the following:
[
{
"file_name": "test.yara",
"file_signature_MD5": "8aa26a676c45edd7953962f9442abcfb",
"file_size": 103,
"scan_results": [
{
"file_signature_MD5": "8aa26a676c45edd7953962f9442abcfb",
"file_size": 103,
"non_archive_file_name": "test.yara",
"scan_type": "Yara Scan (Goodwill guess Encrypted file detected)",
"yara_matches_found": true,
"yara_rule_id": "dummy_true:[description=One"
},
{
"file_signature_MD5": "8aa26a676c45edd7953962f9442abcfb",
"file_size": 103,
"non_archive_file_name": "test.yara",
"scan_type": "Yara Scan (Goodwill guess Encrypted file detected)",
"yara_matches_found": true,
"yara_rule_id": "Two"
},
{
"file_signature_MD5": "8aa26a676c45edd7953962f9442abcfb",
"file_size": 103,
"non_archive_file_name": "test.yara",
"scan_type": "Yara Scan (Goodwill guess Encrypted file detected)",
"yara_matches_found": true,
"yara_rule_id": "Three]"
}
],
"yara_matches_found": true,
"yara_ruleset_file_name": "test.yara"
}
]
As you can see, it looks like 3 separate matches with 3 different rule IDs
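Judging from the fragments above, the raw result string appears to look like "dummy_true:[description=One, Two, Three]" (an inference from this output, not confirmed against the yextend source). Under that assumption, a minimal sketch of a split that only breaks on commas outside the square brackets; split_top_level is a hypothetical helper name.

```python
def split_top_level(text):
    """Split on commas that are not inside [...] so metadata stays intact."""
    parts = []
    current = []
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside any brackets separates two rule results.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts
```

This keeps the example rule as one match. Metadata values that themselves contain brackets would still confuse it, so emitting structured data directly from the scanner would be the more robust fix.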
If you perform this run:
./run_yextend -r test_rulesets/pdf_multiple_embed.yara -t test_files/pdf_with_multiple_embedded.pdf.tar.gz -j
you will get JSON output that for instance contains:
...
{
"child_file_name": "squld",
"file_signature_MD5": "368c8cbc67d3ce1ff7d2735cfe84f670",
"file_size": 1135000,
"parent_file_name": "test_files/pdf_with_multiple_embedded.pdf.tar",
"scan_type": "Yara Scan (ELF Executable)",
"yara_matches_found": true,
"yara_rule_id": "FILE_SIGS"
}
...
The ruleset at hand contains metadata that needs to show up in that JSON output. If you look at the ruleset you will see:
...
rule FILE_SIGS {
meta:
description = "Known malware signature"
...
So the expected output would be something like:
...
{
"child_file_name": "squld",
"file_signature_MD5": "368c8cbc67d3ce1ff7d2735cfe84f670",
"file_size": 1135000,
"parent_file_name": "test_files/pdf_with_multiple_embedded.pdf.tar",
"scan_type": "Yara Scan (ELF Executable)",
"yara_matches_found": true,
"yara_rule_id": "FILE_SIGS",
"description": "Known malware signature"
}
...
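A minimal sketch of the requested merge, written as a Python post-processing step purely for illustration. It assumes the rule metadata is already available as a dictionary, and attach_metadata is a hypothetical name. Existing result fields are never overwritten by metadata keys.

```python
def attach_metadata(result, rule_meta):
    """Return a copy of a scan result with rule metadata added.

    Keys already present in the result (for example 'yara_rule_id')
    take precedence over metadata keys with the same name.
    """
    merged = dict(result)
    for key, value in rule_meta.items():
        if key not in merged:
            merged[key] = value
    return merged
```

Applied to the FILE_SIGS entry above with {"description": "Known malware signature"}, this produces the expected object with the extra "description" field.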
https://travis-ci.com/DanielRuf/clamav-test/jobs/186282650#L441
Not sure why. What am I missing?
make all-am
make[1]: Entering directory `/home/cuckoo/yextend-1.2'
g++ -DHAVE_CONFIG_H -I. -I. -g -O2 -MT bayshore_content_scan.o -MD -MP -MF .deps/bayshore_content_scan.Tpo -c -o bayshore_content_scan.o bayshore_content_scan.cpp
bayshore_content_scan.cpp: In function ‘void scan_office_open_xml_api(void*, std::list<security_scan_results_t>*, const char*, const char*, bool, void (*)(void*, std::list<security_scan_results_t>*, const char*), int)’:
bayshore_content_scan.cpp:303:61: error: cannot convert ‘off_t* {aka long int*}’ to ‘int64_t* {aka long long int*}’ for argument ‘4’ to ‘int archive_read_data_block(archive*, const void**, size_t*, int64_t*)’
x = archive_read_data_block(a, &buff, &lsize, &offset);
^
bayshore_content_scan.cpp: In function ‘void scan_content2(const uint8_t*, size_t, YR_RULES*, std::list<security_scan_results_t>*, const char*, void (*)(void*, std::list<security_scan_results_t>*, const char*), int)’:
bayshore_content_scan.cpp:749:62: error: cannot convert ‘off_t* {aka long int*}’ to ‘int64_t* {aka long long int*}’ for argument ‘4’ to ‘int archive_read_data_block(archive*, const void**, size_t*, int64_t*)’
x = archive_read_data_block(a, &buff, &lsize, &offset);
^
make[1]: *** [bayshore_content_scan.o] Error 1
make[1]: Leaving directory `/home/cuckoo/yextend-1.2'
any help?
thank you