Comments (17)
for number 1 you can do: source:reversinglabs (case sensitive)
@cccs-sgaron Will investigate indexing all the text of the rule body in the future.
@cccs-rs for #2 :)
from assemblyline.
Regex seems wrong, should be .*\.yara$ ? I'll test it with the sources mentioned.
from assemblyline.
Refer to my recent messages re: 1. https://discord.com/channels/908084610158714900/908084610624274494/1086247172724510843
from assemblyline.
Sorry, I did have the dot in front: .*.yara$. I also tried adding the backslash, but got the same non-result.
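For reference, a minimal sketch of how the two patterns differ, assuming the source pattern is applied to file paths with Python's `re.match` (how the updater actually applies it is not shown in this thread). The paths below are hypothetical. Both patterns accept a real `.yara` file, which is consistent with the escaping not being the cause of the missing signatures.

```python
import re

# Pattern as first entered: the dot before "yara" is unescaped,
# so it matches any single character.
loose = re.compile(r".*.yara$")
# Pattern with the dot escaped: requires a literal ".yara" suffix.
strict = re.compile(r".*\.yara$")

# Hypothetical paths, for illustration only.
paths = [
    "rules/apt/sample.yara",
    "rules/apt/sample_yara",
    "rules/apt/sample.yar",
]

loose_hits = [p for p in paths if loose.match(p)]
strict_hits = [p for p in paths if strict.match(p)]

print(loose_hits)
print(strict_hits)
```

The unescaped version is only more permissive: it also accepts `sample_yara`, but it never rejects a path that the escaped version accepts.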
from assemblyline.
Source Configurations (typo on the second but still matched the file path):
Which yielded signatures being imported for both sources:
Is there anything in the yara-updates container to suggest what the issue could be?
from assemblyline.
No errors in the logs. For all purposes that I can tell, it completes successfully. There are just no signatures when I click on the fingerprint. Status: Skipped.
from assemblyline.
Issue partially resolved by clearing the cached entries in Redis. A ticket has been created to add a cache-less update mechanism.
from assemblyline.
Documenting what @cccs-rs and I have discussed in chat (and also as a reminder). Clearing the Redis cache resolved 2 of my 3 issues with the yara repositories. The last repo reported the error "'utf-8' codec can't decode byte 0xbb in position 908: invalid start byte" as it read the files.
The line causing the exception is here: https://github.com/CybercentreCanada/assemblyline-service-yara/blob/3061ac084004db4cb514d192ab88fd142ad9d09e/yara_/update_server.py#L96
The readlines() call raises an exception when it finds a character it can't decode. I did verify the offending files are utf-8 files that just have unknown characters in them. Here is an example.
What I guess I'd suggest is maybe a couple of things:
- It might make sense to include the reading of a file in the try / except statement, so that even if one file raises an error, the updater continues to read the other rules in the repo.
- It would probably be good to define errors in open(errors=) so that the rule can hopefully still be used as intended. I'm not sure whether it should be set to ignore, replace, or maybe surrogateescape? https://docs.python.org/3/library/functions.html#open
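To make the two suggestions concrete, here is a rough sketch, not the updater's actual code. The `read_lines` helper and the rule fragment are hypothetical. It wraps the read in try / except and shows what each `errors=` handler returns for a file containing a lone 0xBB byte.

```python
import os
import tempfile

# Hypothetical rule fragment containing a lone 0xBB byte (not valid UTF-8).
raw = b'$s = "data\xbb"\n'

with tempfile.NamedTemporaryFile(delete=False, suffix=".yara") as tmp:
    tmp.write(raw)
    path = tmp.name


def read_lines(path, errors):
    """Read a file as UTF-8 with the given error handler; return None on failure."""
    try:
        with open(path, "r", encoding="utf-8", errors=errors) as f:
            return f.readlines()
    except UnicodeDecodeError:
        # The read sits inside the try block, so one bad file
        # does not abort the caller's loop over the repo.
        return None


modes = ("strict", "ignore", "replace", "surrogateescape")
results = {mode: read_lines(path, mode) for mode in modes}
os.unlink(path)

for mode, lines in results.items():
    print(mode, lines)
```

With `strict` (the default) the read fails, `ignore` silently drops the byte, `replace` substitutes U+FFFD, and `surrogateescape` keeps the byte as the lone surrogate U+DCBB so it can be restored on encoding.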
from assemblyline.
Based on these comments, I recommend ignoring any encoding errors.
VirusTotal/yara#1770 (comment)
VirusTotal/yara-python#136 (comment)
I expect that the offending files are CP1252, but the invalid characters are in comments, which YARA ignores.
from assemblyline.
I did have one that appeared in the rule itself. Don't know if it was intentional or not, but it looked like this.
$str_05 = "TESTING"
$str_06 = "SOFTWARE\\Clients\\Mail"
$str_07 = "8.8.8.8"
$str_08 = "<(:<\\Documents and Settings\\all users\\Application Data\\�"
$str_09 = "C:\\ProgramData\\Microsoft\\RAC\\"
from assemblyline.
$str_08 = "<(:<\Documents and Settings\all users\Application Data\�"
Do you have a link to the source, or can you upload that portion of the rule here? Strings pasted inline won't retain the source encoding.
from assemblyline.
It's not public, but here is a modified file based on that rule. I have 4 yara files that fail on the reading due to a character encoding.
testgen.txt
from assemblyline.
The character in the file you uploaded includes the Unicode replacement character. It appears that the original character was lost.
I can replicate the issue by converting the file to CP-1252 and adding the 0xBB byte in that position, but any successful compilation and matches with YARA with rule files using non UTF8 encodings should not be relied upon.
from assemblyline.
Yeah, it was a BB hex for that one, which comes across in my hex editor as ».
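A quick way to check this is to decode the single byte under each encoding. This is only an illustration of the byte discussed here, not of the original rule file.

```python
byte = b"\xbb"

# In CP-1252 and ISO-8859-1 the byte 0xBB maps to U+00BB, the » character.
as_cp1252 = byte.decode("cp1252")
as_latin1 = byte.decode("iso-8859-1")

# In UTF-8, 0xBB is a continuation byte and cannot start a character.
try:
    byte.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# A file that is really UTF-8 would store » as two bytes.
as_utf8_bytes = "»".encode("utf-8")

print(repr(as_cp1252), repr(as_latin1), utf8_error, as_utf8_bytes)
```

The `invalid start byte` reason matches the error reported earlier in the thread.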
from assemblyline.
I'm not overly concerned if one rule in a thousand doesn't work - just wasn't sure what the best way to handle it is. It does need to be handled in some way though so it doesn't cause an exception and abort the entire repo read.
from assemblyline.
Looks like I was behind on the status of the rule encodings. But there appears to be a consistent desire from the development team to only support UTF-8 in the future.
VirusTotal/yara@68eb237
iso-8859-1 will accept any byte, but the values in [\x80-\xFF] may be incorrect if a different encoding was used at the source.
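try:
    # Try UTF-8 first, the encoding YARA rule files are expected to use.
    with open(file, 'r', encoding='utf-8') as f:
        f_lines = f.readlines()
except UnicodeDecodeError:
    # Fall back to iso-8859-1, which accepts any byte value.
    with open(file, 'r', encoding='iso-8859-1') as f:
        f_lines = f.readlines()
    self.log.error(f"File could not be loaded as UTF-8: {file}")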
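As a small illustration of both halves of that statement (a sketch, independent of the service code): iso-8859-1 decodes every possible byte without raising, but a byte written under another encoding such as CP-1252 can come back as a different character.

```python
# Every byte value 0x00-0xFF decodes under iso-8859-1, and the result round-trips.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
round_trip = decoded.encode("iso-8859-1")

# The euro sign is byte 0x80 in CP-1252 ...
euro_cp1252 = "€".encode("cp1252")
# ... but iso-8859-1 reads 0x80 as the control character U+0080.
misread = euro_cp1252.decode("iso-8859-1")

print(len(decoded), round_trip == all_bytes)
print(euro_cp1252, repr(misread))
```

So the fallback prevents the exception, but strings containing high bytes may not match what the rule author intended.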
from assemblyline.
v4.4.0.stable5 of the YARA service will use the surrogateescape error handler when reading the contents of YARA files with non-UTF-8 characters.
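For readers unfamiliar with the handler, a minimal sketch of what surrogateescape does in Python. The rule text is a hypothetical example and this is not the service's implementation.

```python
# Hypothetical rule text with a stray 0xBB byte inside a comment.
raw = b"// note \xbb\nrule demo { condition: true }\n"

# Undecodable bytes become lone surrogates (0xBB -> U+DCBB) instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# A plain strict encode of the same text fails, because lone surrogates
# are not valid in UTF-8.
try:
    text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

print("\udcbb" in text, restored == raw, strict_encode_ok)
```

Unlike ignore or replace, this keeps the original byte recoverable, at the cost of needing the same handler wherever the text is written back out.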
Services with updaters built from v4.4.0.stable5 of Assemblyline will now be able to instruct the Scaler about when to scale a service.
If a service depends on the bundle sent by the updater, as indicated by the wait_for_update flag (e.g. YARA), then the service is considered 'active' and ready for scaling only if the updater deems it so (i.e. it has at least 1 downloaded source).
If you notice this issue still persists, feel free to re-open the issue with more information! 😁
from assemblyline.