virustotal / yara-x
A rewrite of YARA in Rust.
Home Page: https://virustotal.github.io/yara-x/
License: BSD 3-Clause "New" or "Revised" License
The tests fail with the stable-x86_64-pc-windows-gnu toolchain, apparently because of some issue with the linkme crate. It works with stable-x86_64-pc-windows-msvc, though.
The linkme crate is used for creating the static slice wasm::WASM_EXPORTS, which contains an entry for every function that can be called from WASM. This includes functions decorated with the #[wasm_export] and #[module_export] attributes, which ultimately result in the use of the #[distributed_slice] attribute provided by linkme. Example:
#[distributed_slice(WASM_EXPORTS)]
pub(crate) static export__add: WasmExport = WasmExport {
    name: "add",
    mangled_name: "add@ii@i",
    rust_module_path: "yara_x::modules::my_module",
    func: &WasmExportedFn2 { target_fn: &add },
    public: true,
};
For some reason, when using the stable-x86_64-pc-windows-gnu toolchain, not all functions are added to the wasm::WASM_EXPORTS slice; only the ones declared in the yara-x/src/wasm/mod.rs file are added. All other functions, specifically those declared in modules, are not added at all. The compilation process works well, but tests fail due to the missing entries in wasm::WASM_EXPORTS.
Compiling an application that uses this library fails on x86 and arm targets due to Wasmtime. I've tested this on i686-pc-windows-msvc, i686-unknown-linux-gnu, armv7-unknown-linux-gnueabihf, and i686-unknown-freebsd.
On x86 architectures, the build for wasmtime fails with "no 'asm_sym' in the root" or "Wasmtime is being compiled for an architecture that it does not support".
On arm architectures, the build for cranelift-codegen fails with "no supported isa found for arch 'armv7'".
This condition is an impossible one: uint8(0) == 0xABCDE. As uint8 returns a single byte, the maximum possible result is 0xFF or 255. The same occurs with uint16 and uint32.
A condition like this could raise a warning.
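A minimal sketch of how such a check might look, assuming the compiler can see the function name and the literal it is compared against. The helper names `max_result` and `is_impossible_eq` are hypothetical and are not part of YARA-X:

```rust
/// Largest value that each integer-reading function can return,
/// or `None` for function names this sketch does not know about.
fn max_result(func: &str) -> Option<u64> {
    match func {
        "uint8" | "uint8be" => Some(u8::MAX as u64),
        "uint16" | "uint16be" => Some(u16::MAX as u64),
        "uint32" | "uint32be" => Some(u32::MAX as u64),
        _ => None,
    }
}

/// Returns true when `func(...) == literal` can never hold because the
/// literal is larger than anything the function can return.
fn is_impossible_eq(func: &str, literal: u64) -> bool {
    matches!(max_result(func), Some(max) if literal > max)
}
```

With this helper, `is_impossible_eq("uint8", 0xABCDE)` is true while `is_impossible_eq("uint8", 0xFF)` is false, so only the first comparison would trigger a warning.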
Implement uint64 and uint64be so that they work in the same way as the existing uint8, uint16 and uint32 and their respective big-endian counterparts.
This requires implementing full support for u64 integers at the language level. In particular, we need to correctly handle the cases in which the integers are larger than 0x7FFFFFFFFFFFFFFF. A literal like 0x8000000000000000 is not currently supported because it doesn't fit in an i64, so we need to add a new variant to TypeValue that accepts a u64, and we must correctly handle operations between signed and unsigned integers.
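The signed/unsigned problem can be sketched as follows. `IntValue` is a hypothetical stand-in for illustration only, not the actual `TypeValue` from YARA-X; the idea is that values that fit in an i64 stay signed, larger ones become unsigned, and comparisons widen both sides to i128 so that neither sign nor magnitude is lost:

```rust
use std::cmp::Ordering;

/// Hypothetical integer value with separate signed and unsigned variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntValue {
    Signed(i64),
    Unsigned(u64),
}

/// Parses a `0x...` literal. Values up to i64::MAX become `Signed`,
/// anything larger becomes `Unsigned`.
fn parse_hex_literal(s: &str) -> Option<IntValue> {
    let digits = s.strip_prefix("0x")?;
    let v = u64::from_str_radix(digits, 16).ok()?;
    Some(match i64::try_from(v) {
        Ok(i) => IntValue::Signed(i),
        Err(_) => IntValue::Unsigned(v),
    })
}

/// Compares two values by widening both to i128, which can represent
/// every i64 and every u64 exactly.
fn compare(a: IntValue, b: IntValue) -> Ordering {
    let widen = |v: IntValue| -> i128 {
        match v {
            IntValue::Signed(i) => i as i128,
            IntValue::Unsigned(u) => u as i128,
        }
    };
    widen(a).cmp(&widen(b))
}
```

A naive cast of -1 to u64 would make it compare greater than 0x8000000000000000; widening first avoids that mistake.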
I built the C library as documented here: https://virustotal.github.io/yara-x/docs/api/c/c-/#building-the-c-library without errors.
Adding it to a project with import yarax "github.com/VirusTotal/yara-x/go" and building the Go code fails with:
# github.com/VirusTotal/yara-x/go
vendor/github.com/VirusTotal/yara-x/go/compiler.go:20:24: not enough arguments in call to (_Cfunc_yrx_compiler_create)
have (**_Ctype_struct_YRX_COMPILER)
want (_Ctype_uint, **_Ctype_struct_YRX_COMPILER)
Machine:
Go environment:
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/steffen/Library/Caches/go-build'
GOENV='/Users/steffen/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/steffen/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.3/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.3/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/rt/mbz5zkp533sgg49fmm6ky3tr0000gn/T/go-build4207267908=/tmp/go-build -gno-record-gcc-switches -fno-common'
I just executed the following on PowerShell terminal when testing yara-x for the first time.
yr.exe compile ./signatures/*
As a result of running this command one arbitrary file in the signature folder was replaced with the compiled version of all files.
Obviously I should have specified an output file, but this behavior destroys a signature file without asking. This can happen by accident and may result in lost work. The user might not notice until much later and may not be able to trace it back to using yr.exe. The output of the tool does not say which file was overwritten.
This is something that was lacking in the C version. According to the current documentation, yara-x only supports "global external variables" but not "external objects", which would include arrays and structures for richer data enrichment.
This could be really useful for modules that want to keep the same naming convention as VirusTotal live hunting to make rules interoperable, for example variables like vt.behaviour.command_executions or vt.behaviour.modules_loaded, which are only accessible as arrays via the for loop keyword, but also variables under specific structures such as vt.behaviour.
More information about existing issues which were not addressable in the current C version of yara:
Support for EXTERNAL OBJECT_TYPE_ARRAY and OBJECT_TYPE_STRUCTURE
Exporting yr_object_create() to enable custom structures?
I cannot find this on crates.io. Is it published?
Request:
Could we add a --scan-list flag like the old YARA version? It'd be handy for scanning lists of files stored in a txt file.
Context:
This feature would streamline scanning multiple files, enhancing YARA-X's usability.
Closing:
Excited for future enhancements! Thanks again for your hard work.
For example:
pe.resources.iter().find(|resource| {
resource.type_() == yara_x::modules::protos::pe::ResourceType::RESOURCE_TYPE_MANIFEST
});
I can get the ResourceType from resource.type_() but I can't check which type it is because yara_x::modules is private.
Hi,
I'm so happy about this X update.
We developed a layer on top of YARA that automatically runs YARA scans of multiple files on WindOWS with our exe. It has been distributed to over 170K machines so far, so we have experience with the tool.
My request:
2. Run as a service and read new YARA rule files from a specific directory like SYSVOL\Git Repo.
I'm available here if it helps
+972508117000
Sorry about opening another issue about this; I am not as knowledgeable with Rust as I thought. I tried to implement the C API for rule metadata but it hasn't gone as well as I had hoped. Thank you in advance!
Would it be possible to request the rule metadata be added to the C-API as well?
Add back the -w, or --no-warnings, flag on the command line to disable warnings when scanning using a corpus of rules, as was present in traditional YARA.
Description:
When processing a binary with yr dump -pe <binary> -o json, the values of the sections.name, rawData and clearData fields in the JSON output are base64 encoded.
Example JSON Output:
{
"pe": {
"isPe": true,
"machine": "MACHINE_AMD64",
"subsystem": "SUBSYSTEM_WINDOWS_GUI",
"osVersion": {
"major": 6,
"minor": 0
},
"subsystemVersion": {
"major": 6,
"minor": 0
},
"imageVersion": {
"major": 0,
"minor": 0
},
"linkerVersion": {
"major": 14,
"minor": 0
},
"opthdrMagic": "IMAGE_NT_OPTIONAL_HDR64_MAGIC",
"characteristics": 8226,
"dllCharacteristics": 352,
"timestamp": 1715333013,
"imageBase": "6442450944",
"checksum": 0,
"baseOfCode": 4096,
"entryPoint": 12872,
"entryPointRaw": 12872,
"dllName": "UpdaterTag.dll",
"exportTimestamp": 1715333013,
"sectionAlignment": 4096,
"fileAlignment": 4096,
"loaderFlags": 0,
"sizeOfOptionalHeader": 240,
"sizeOfCode": 39936,
"sizeOfInitializedData": 12800,
"sizeOfUninitializedData": 0,
"sizeOfImage": 69632,
"sizeOfHeaders": 1024,
"sizeOfStackReserve": "1048576",
"sizeOfStackCommit": "1048576",
"sizeOfHeapReserve": "1048576",
"sizeOfHeapCommit": "4096",
"pointerToSymbolTable": 0,
"win32VersionValue": 0,
"numberOfSymbols": 0,
"numberOfRvaAndSizes": 16,
"numberOfSections": 5,
"numberOfImportedFunctions": "5",
"numberOfDelayedImportedFunctions": "0",
"numberOfResources": "0",
"numberOfVersionInfos": "0",
"numberOfImports": "2",
"numberOfDelayedImports": "0",
"numberOfExports": "4",
"numberOfSignatures": "0",
"richSignature": {
"offset": 128,
"length": 56,
"key": 3957332653,
"rawData": "6XuOuK0a4OutGuDrrRrg63DlK+uoGuDrrRrh66ga4OsaROTqoRrg6xpE4OqsGuDrGkTi6qwa4Os=",
"clearData": "RGFuUwAAAAAAAAAAAAAAAN3/ywAFAAAAAAABAAUAAAC3XgQBDAAAALdeAAEBAAAAt14CAQEAAAA=",
"tools": [
{
"toolid": 203,
"version": 65501,
"times": 5
},
{
"toolid": 1,
"version": 0,
"times": 5
},
{
"toolid": 260,
"version": 24247,
"times": 12
},
{
"toolid": 256,
"version": 24247,
"times": 1
},
{
"toolid": 258,
"version": 24247,
"times": 1
}
]
},
"sections": [
{
"name": "LnRleHQ=",
"fullName": "LnRleHQ=",
"characteristics": 1610612768,
"rawDataSize": 40960,
"rawDataOffset": 4096,
"virtualAddress": 4096,
"virtualSize": 40960,
"pointerToRelocations": 0,
"pointerToLineNumbers": 0,
"numberOfRelocations": 0,
"numberOfLineNumbers": 0
},
{
"name": "LnJkYXRh",
"fullName": "LnJkYXRh",
"characteristics": 1073741888,
"rawDataSize": 4096,
"rawDataOffset": 45056,
"virtualAddress": 45056,
"virtualSize": 4096,
"pointerToRelocations": 0,
"pointerToLineNumbers": 0,
"numberOfRelocations": 0,
"numberOfLineNumbers": 0
},
{
"name": "LmRhdGE=",
"fullName": "LmRhdGE=",
"characteristics": 3221225536,
"rawDataSize": 12288,
"rawDataOffset": 49152,
"virtualAddress": 49152,
"virtualSize": 12288,
"pointerToRelocations": 0,
"pointerToLineNumbers": 0,
"numberOfRelocations": 0,
"numberOfLineNumbers": 0
},
{
"name": "LnBkYXRh",
"fullName": "LnBkYXRh",
"characteristics": 1073741888,
"rawDataSize": 4096,
"rawDataOffset": 61440,
"virtualAddress": 61440,
"virtualSize": 4096,
"pointerToRelocations": 0,
"pointerToLineNumbers": 0,
"numberOfRelocations": 0,
"numberOfLineNumbers": 0
},
{
"name": "LnJlbG9j",
"fullName": "LnJlbG9j",
"characteristics": 1107296320,
"rawDataSize": 4096,
"rawDataOffset": 65536,
"virtualAddress": 65536,
"virtualSize": 4096,
"pointerToRelocations": 0,
"pointerToLineNumbers": 0,
"numberOfRelocations": 0,
"numberOfLineNumbers": 0
}
],
"dataDirectories": [
{
"virtualAddress": 46064,
"size": 120
},
{
"virtualAddress": 46184,
"size": 60
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 61440,
"size": 1332
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 65536,
"size": 12
},
{
"virtualAddress": 45200,
"size": 28
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 45056,
"size": 64
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
},
{
"virtualAddress": 0,
"size": 0
}
],
"importDetails": [
{
"libraryName": "KERNEL32.dll",
"numberOfFunctions": "3",
"functions": [
{
"name": "PeekNamedPipe",
"rva": 45056
},
{
"name": "GetLastError",
"rva": 45064
},
{
"name": "CreateMutexW",
"rva": 45072
}
]
},
{
"libraryName": "USER32.dll",
"numberOfFunctions": "2",
"functions": [
{
"name": "MessageBeep",
"rva": 45088
},
{
"name": "MessageBoxA",
"rva": 45096
}
]
}
],
"exportDetails": [
{
"name": "extra",
"ordinal": 1,
"rva": 12984,
"offset": 12984
},
{
"name": "follower",
"ordinal": 2,
"rva": 12984,
"offset": 12984
},
{
"name": "run",
"ordinal": 3,
"rva": 12984,
"offset": 12984
},
{
"name": "scub",
"ordinal": 4,
"rva": 12984,
"offset": 12984
}
],
"isSigned": false,
"overlay": {
"offset": "69632",
"size": "4096"
}
}
}
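For readers who want to check what the encoded values contain, here is a small standard-library-only base64 decoder, offered as a sketch rather than as anything YARA-X provides (`decode_base64` is a hypothetical helper). Encoding byte fields as base64 strings is also what the standard protobuf JSON mapping does, which may explain the output above.

```rust
/// Decodes standard base64 (alphabet A-Z, a-z, 0-9, '+', '/') and stops at
/// the first '=' padding character. Returns `None` on any other character.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // pending bits not yet emitted as a byte
    let mut bits: u32 = 0; // how many pending bits `buf` holds
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buf = (buf << 6) | v as u32;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}
```

Decoding the section names from the output above gives `.text` for "LnRleHQ=" and `.rdata` for "LnJkYXRh".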
The parser parses the meta fields but the compiler doesn't do anything with those fields. I was curious whether there are any plans to add that functionality anytime soon.
I utilize meta fields for additional information when processing a match in my code.
It would be great to have an additional subsystem of pre-processing modules that can transform the original data (e.g. extracting, unpacking, decrypting or other kinds of tasks) before the scan, without the need for custom YARA module functions, while keeping the full potential of the YARA search engine.
Example modules can be:
I'm sure with the right SDK for such functionality community will produce a lot of useful stuff.
The dotnet module seems to crash and end processing if a binary contains more than 8000 user strings, such as https://www.virustotal.com/gui/file/67984703c89ee30cadaa8d7dd5c1a0e9f7f5d096ab0d6d03fdb01115780fa7c3.
During the development of a detection engine based on yara-x in Rust, I encountered a lifetime issue because the scanner depends on a reference to rules, but I want to put the scanner inside a struct. How should this be implemented? Are there any corresponding examples to learn from?
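Two common ways to structure this are sketched below. The `Rules` and `Scanner` types here are simplified stand-ins written for this example (they only mimic the shape of a scanner that borrows its rules) and are not the yara-x types; the pattern should carry over to any scanner that holds a `&'r Rules`.

```rust
/// Stand-in for compiled rules (hypothetical, for illustration only).
struct Rules {
    patterns: Vec<Vec<u8>>,
}

/// Stand-in for a scanner that borrows the rules it scans with.
struct Scanner<'r> {
    rules: &'r Rules,
}

impl<'r> Scanner<'r> {
    fn new(rules: &'r Rules) -> Self {
        Scanner { rules }
    }

    /// Returns how many patterns occur somewhere in `data`.
    fn scan(&mut self, data: &[u8]) -> usize {
        self.rules
            .patterns
            .iter()
            .filter(|p| !p.is_empty() && data.windows(p.len()).any(|w| w == p.as_slice()))
            .count()
    }
}

/// Option 1: the struct carries the same lifetime as the scanner.
/// The rules must be owned elsewhere and outlive the engine.
struct BorrowingEngine<'r> {
    scanner: Scanner<'r>,
}

impl<'r> BorrowingEngine<'r> {
    fn new(rules: &'r Rules) -> Self {
        BorrowingEngine { scanner: Scanner::new(rules) }
    }

    fn scan(&mut self, data: &[u8]) -> usize {
        self.scanner.scan(data)
    }
}

/// Option 2: the struct owns the rules and creates a scanner per call,
/// so no lifetime parameter leaks into the struct's type.
struct OwningEngine {
    rules: Rules,
}

impl OwningEngine {
    fn scan(&self, data: &[u8]) -> usize {
        Scanner::new(&self.rules).scan(data)
    }
}
```

Option 1 reuses one scanner but forces callers to keep the rules alive separately; option 2 is easier to store and move around, at the cost of constructing a scanner for each scan, which may or may not matter depending on how expensive scanner creation is.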
When a source file contains an invalid UTF-8 character, YARA-X fails with an error like this:
error: invalid UTF-8
--> test.yar:3:19
|
3 | author = "John Smith � "
| ^ invalid UTF-8 character
|
By using the chardetng and encoding_rs crates, the encoding of the original source file could be automatically detected and then converted to UTF-8, before the source code is passed to the parser.
This automatic encoding conversion would be performed only when the --force-utf-8 option is passed to the CLI.
Example: { E0 F:0?1?: :0110 1001: AB }
Everything in between pairs of : is interpreted as a sequence of bits, where ? is a wildcard for a single bit. The number of bits in each sequence must be a multiple of 4. You can express one nibble in its hex form and the other one in binary, like in F:0?1?:
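One possible way to turn such a bit sequence into a value/mask pair is sketched below. `MaskedBits`, `parse_bits` and `matches_bits` are hypothetical names chosen for this example; the sketch only covers the text between a pair of colons and assumes at most 64 bits per sequence.

```rust
/// Value and mask for a bit sequence; mask bits are 1 where the pattern
/// fixes the bit and 0 where it is a `?` wildcard.
#[derive(Debug, PartialEq, Eq)]
struct MaskedBits {
    value: u64,
    mask: u64,
    len: u32, // number of bits in the sequence
}

/// Parses a sequence such as "0?1?" or "0110 1001" (most significant bit
/// first). Whitespace is ignored. Returns `None` for invalid characters,
/// empty input, more than 64 bits, or a length that is not a multiple of 4.
fn parse_bits(s: &str) -> Option<MaskedBits> {
    let (mut value, mut mask, mut len) = (0u64, 0u64, 0u32);
    for c in s.chars().filter(|c| !c.is_whitespace()) {
        let (v, m) = match c {
            '0' => (0, 1),
            '1' => (1, 1),
            '?' => (0, 0),
            _ => return None,
        };
        if len == 64 {
            return None;
        }
        value = (value << 1) | v;
        mask = (mask << 1) | m;
        len += 1;
    }
    if len == 0 || len % 4 != 0 {
        return None;
    }
    Some(MaskedBits { value, mask, len })
}

/// True when `candidate` agrees with the pattern on every fixed bit.
fn matches_bits(p: &MaskedBits, candidate: u64) -> bool {
    candidate & p.mask == p.value
}
```

For 0?1? this yields value 0b0010 and mask 0b1010, so combined with the hex nibble F the byte F:0?1?: would match 0xF2, 0xF3, 0xF6 and 0xF7.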
When I compile the rules and use the compiled rules to scan a file, an error occurs about an invalid UTF-8 character. But when I use the original rules to scan the same file, it works normally.
Allow using underscores (_) as part of numeric literals like in Rust.
You could write:
1_000_000 instead of 1000000
10_KB instead of 10KB
0xcafe_babe instead of 0xcafebabe
...etc
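A rough sketch of how literals with underscores could be parsed, assuming the separators are simply stripped before conversion. `parse_int_literal` is a hypothetical helper, and this version is deliberately lenient (it also accepts leading or doubled underscores, which a real implementation might reject):

```rust
/// Parses decimal or `0x` hexadecimal literals with optional `_`
/// separators and an optional `KB` (x1024) or `MB` (x1024*1024) suffix.
/// Returns `None` on malformed input or overflow.
fn parse_int_literal(lit: &str) -> Option<i64> {
    // Drop every underscore; they carry no meaning.
    let cleaned: String = lit.chars().filter(|&c| c != '_').collect();

    // Split off the size suffix, if any.
    let (body, multiplier) = if let Some(b) = cleaned.strip_suffix("KB") {
        (b, 1024)
    } else if let Some(b) = cleaned.strip_suffix("MB") {
        (b, 1024 * 1024)
    } else {
        (cleaned.as_str(), 1)
    };

    let value = if let Some(hex) = body.strip_prefix("0x") {
        i64::from_str_radix(hex, 16).ok()?
    } else {
        body.parse::<i64>().ok()?
    };
    value.checked_mul(multiplier)
}
```

With this, `1_000_000` parses to 1000000, `10_KB` to 10240 and `0xcafe_babe` to 0xcafebabe.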
user@host:~/src/yara-x$ cargo install --path cli
Installing yara-x-cli v0.3.0 (./src/yara-x/cli)
Updating crates.io index
error: failed to compile `yara-x-cli v0.3.0 (./src/yara-x/cli)`, intermediate artifacts can be found at `./src/yara-x/target`
Caused by:
failed to select a version for `wasmtime`.
... required by package `yara-x v0.3.0 (./src/yara-x/lib)`
... which satisfies path dependency `yara-x` of package `yara-x-cli v0.3.0 (./src/yara-x/cli)`
versions that meet the requirements `^19.0.1` are: 19.0.2, 19.0.1
the package `yara-x` depends on `wasmtime`, with features: `wasmtime-runtime` but `wasmtime` does not have these features.
It has an optional dependency with that name, but that dependency uses the "dep:" syntax in the features table, so it does not have an implicit feature with that name.
failed to select a version for `wasmtime` which could resolve this conflict
This is on the latest commit to main (e8dedd7), with rustc 1.70.0 and cargo 1.69.1 on a Debian Testing-based distro.
As a result of how the pest library is implemented and the use of the WHITESPACE rule in the grammar, which silently consumes all whitespace, it is possible to successfully parse invalid source code, e.g.:
privateruleexample { condition: true }
ruletest{condition:example}
This results in the following AST:
[
Rule {
flags: RuleFlags {
mask: 1,
},
identifier: Ident {
span: Span {
source_id: SourceId(
0,
),
start: 11,
end: 18,
},
name: "example",
},
tags: None,
meta: None,
patterns: None,
condition: True {
span: Span {
source_id: SourceId(
0,
),
start: 32,
end: 36,
},
},
},
Rule {
flags: RuleFlags {
mask: 0,
},
identifier: Ident {
span: Span {
source_id: SourceId(
0,
),
start: 42,
end: 46,
},
name: "test",
},
tags: None,
meta: None,
patterns: None,
condition: Ident(
Ident {
span: Span {
source_id: SourceId(
0,
),
start: 57,
end: 64,
},
name: "example",
},
),
},
]
With tip of tree checked out (7ac04b4) and a clean build, the following command panics:
./target/debug/yr scan eicar.yara eicar
eicar.yara:
rule eicar_av_test {
/*
Per standard, match only if entire file is EICAR string plus optional trailing whitespace.
The raw EICAR string to be matched is:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
*/
meta:
description = "This is a standard AV test, intended to verify that BinaryAlert is working correctly."
author = "Austin Byers | Airbnb CSIRT"
reference = "http://www.eicar.org/86-0-Intended-use.html"
strings:
$eicar_regex = /^X5O!P%@AP\[4\\PZX54\(P\^\)7CC\)7\}\$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!\$H\+H\*\s*$/
condition:
all of them
}
Consider this rule:
import "pe"
rule not_ms {
condition:
not for any i in (0..pe.number_of_signatures - 1) : (
pe.signatures[i].issuer contains "Microsoft Corporation"
)
}
With files that are not PE, the value for pe.number_of_signatures is undefined and the loop is not executed. In YARA this means that the for .. in .. expression is false, but in YARA-X it is currently undefined. As a result, this rule is true for non-PE files in YARA, and false in YARA-X.
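The difference can be modeled with three-valued logic, using `None` for "undefined". This is only a sketch of the semantics described above, not the evaluator of either implementation; `for_any`, `not3` and `rule_result` are hypothetical names, and the `legacy_semantics` switch selects the YARA behavior.

```rust
/// Negation that propagates "undefined" (`None`).
fn not3(v: Option<bool>) -> Option<bool> {
    v.map(|b| !b)
}

/// A rule matches only if its condition is defined and true.
fn rule_result(condition: Option<bool>) -> bool {
    condition.unwrap_or(false)
}

/// Models `for any i in (0..upper) : ( pred(i) )`.
/// When `upper` is undefined, YARA (legacy) yields false while the
/// YARA-X behavior described above yields undefined.
fn for_any(
    upper: Option<i64>,
    legacy_semantics: bool,
    pred: impl Fn(i64) -> bool,
) -> Option<bool> {
    match upper {
        Some(n) => Some((0..=n).any(|i| pred(i))),
        None if legacy_semantics => Some(false),
        None => None,
    }
}

/// Evaluates the `not_ms` rule for a file whose signature count may be
/// undefined; `is_microsoft(i)` stands for the `contains` check.
fn not_ms(
    number_of_signatures: Option<i64>,
    legacy_semantics: bool,
    is_microsoft: impl Fn(i64) -> bool,
) -> bool {
    // undefined - 1 stays undefined
    let upper = number_of_signatures.map(|n| n - 1);
    rule_result(not3(for_any(upper, legacy_semantics, is_microsoft)))
}
```

For a non-PE file (`number_of_signatures` is `None`), `not_ms` returns true with the legacy semantics and false otherwise, which reproduces the discrepancy.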
In complex YARA conditions there are many cases in which the same sub-expression is repeated more than once, and its results could be reused instead of re-computed. For instance, consider this condition:
uint16(0) == 0x15FF or uint16(0) == 0x25FF
The sub-expression uint16(0) is used twice, and the current implementation calls the uint16 function twice with the same argument. However, the result from the first invocation could be stored in a temporary variable and reused when uint16 is called for the second time, instead of invoking the function again, which is an expensive operation.
Additionally, when a sub-expression is contained in the body of a loop, it can be moved out of the loop if the sub-expression doesn't depend on the loop variables. For instance,
for any offset in (0..filesize-1): (
((uint16(offset) == 0x15FF or uint16(offset) == 0x25FF) and
uint32(offset+2) == pe.sections[0].virtual_address + pe.image_base)
)
In the example above, the sub-expression pe.sections[0].virtual_address+pe.image_base doesn't depend on the offset variable, and therefore produces the same result on each loop iteration. This expression could be evaluated once outside the loop, and its value reused inside the loop.
Common sub-expression elimination (CSE) and loop-invariant code motion (LICM) are well-known techniques used in compilers.
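To illustrate the reuse idea on the first example, the sketch below evaluates a tiny expression tree and caches the result of each `uint16(...)` sub-expression. Note that this is runtime memoization, a simpler stand-in that achieves the same reuse, rather than a compile-time CSE pass; the `Expr` and `Evaluator` types are hypothetical and unrelated to the YARA-X IR.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Const(i64),
    Uint16(Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

struct Evaluator<'a> {
    data: &'a [u8],
    cache: HashMap<Expr, i64>, // results of already evaluated uint16 calls
    reads: usize,              // how many times data was actually read
}

impl<'a> Evaluator<'a> {
    fn new(data: &'a [u8]) -> Self {
        Evaluator { data, cache: HashMap::new(), reads: 0 }
    }

    fn eval(&mut self, e: &Expr) -> i64 {
        match e {
            Expr::Const(v) => *v,
            Expr::Uint16(offset) => {
                if let Some(&v) = self.cache.get(e) {
                    return v; // identical sub-expression seen before
                }
                let off = self.eval(offset);
                self.reads += 1;
                // Out-of-range reads yield 0 here; real YARA yields undefined.
                let v = usize::try_from(off)
                    .ok()
                    .and_then(|o| self.data.get(o..o.checked_add(2)?))
                    .map_or(0, |b| u16::from_le_bytes([b[0], b[1]]) as i64);
                self.cache.insert(e.clone(), v);
                v
            }
            Expr::Eq(a, b) => (self.eval(a) == self.eval(b)) as i64,
            Expr::Or(a, b) => {
                // Both sides are evaluated (no short-circuit) to show the reuse.
                let (l, r) = (self.eval(a), self.eval(b));
                (l != 0 || r != 0) as i64
            }
        }
    }
}

/// Builds `uint16(0) == 0x15FF or uint16(0) == 0x25FF`.
fn example_condition() -> Expr {
    let read = || Box::new(Expr::Uint16(Box::new(Expr::Const(0))));
    Expr::Or(
        Box::new(Expr::Eq(read(), Box::new(Expr::Const(0x15FF)))),
        Box::new(Expr::Eq(read(), Box::new(Expr::Const(0x25FF)))),
    )
}
```

Evaluating `example_condition()` over the bytes `FF 25` returns 1 with `reads == 1`: the second `uint16(0)` is served from the cache.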
compiler.add_source doesn't appear to support adding multiple rules at once; only the first one defined is parsed.
Many file formats contain integer fields that are really interpreted as flags (each bit has a particular meaning). For instance, the characteristics field in the PE file is one of those fields, where each bit represents a specific characteristic. Currently, when the output of the pe module is produced in YAML format, it shows the characteristics field as a standard integer:
characteristics: 271
This is not human-readable because the value 271 doesn't mean anything by itself. Showing this field in hex form doesn't help that much:
characteristics: 0x10f
It would be very helpful if we showed something like:
characteristics: 0x10f # MACHINE_32BIT | LOCAL_SYMS_STRIPPED | LINE_NUMS_STRIPPED | EXECUTABLE_IMAGE | RELOCS_STRIPPED
The comment shows the individual bits that are enabled in the 0x10f value.
This could be implemented by adding a new modifier for fields in the .proto file.
message PE {
...
required uint32 characteristics = 3 [(yaml.field).fmt = "flags:pe.Characteristics"];
...
}
enum Characteristics {
option (yara.enum_options).inline = true;
RELOCS_STRIPPED = 0x0001;
EXECUTABLE_IMAGE = 0x0002;
LINE_NUMS_STRIPPED = 0x0004;
LOCAL_SYMS_STRIPPED = 0x0008;
AGGRESIVE_WS_TRIM = 0x0010;
LARGE_ADDRESS_AWARE = 0x0020;
BYTES_REVERSED_LO = 0x0080;
MACHINE_32BIT = 0x0100;
DEBUG_STRIPPED = 0x0200;
REMOVABLE_RUN_FROM_SWAP = 0x0400;
NET_RUN_FROM_SWAP = 0x0800;
SYSTEM = 0x1000;
DLL = 0x2000;
UP_SYSTEM_ONLY = 0x4000;
BYTES_REVERSED_HI = 0x8000;
}
The annotation [(yaml.field).fmt = "flags:pe.Characteristics"] beside the characteristics field indicates that this field must be interpreted as a set of flags, where the enum defining the flag values is named Characteristics.
NOTE: It's important to check at some point that each value in the enum has one and only one bit set. An enum with values 1, 2, 3, 4, 5 is not suitable to be used as flags.
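Both the formatting and the single-bit check from the note can be sketched in a few lines. The table below copies the enum values listed above; `describe_flags` and `is_valid_flag_set` are hypothetical helpers, not part of the YARA-X YAML serializer.

```rust
/// Flag names and values, copied from the Characteristics enum above.
const CHARACTERISTICS: &[(&str, u32)] = &[
    ("RELOCS_STRIPPED", 0x0001),
    ("EXECUTABLE_IMAGE", 0x0002),
    ("LINE_NUMS_STRIPPED", 0x0004),
    ("LOCAL_SYMS_STRIPPED", 0x0008),
    ("AGGRESIVE_WS_TRIM", 0x0010),
    ("LARGE_ADDRESS_AWARE", 0x0020),
    ("BYTES_REVERSED_LO", 0x0080),
    ("MACHINE_32BIT", 0x0100),
    ("DEBUG_STRIPPED", 0x0200),
    ("REMOVABLE_RUN_FROM_SWAP", 0x0400),
    ("NET_RUN_FROM_SWAP", 0x0800),
    ("SYSTEM", 0x1000),
    ("DLL", 0x2000),
    ("UP_SYSTEM_ONLY", 0x4000),
    ("BYTES_REVERSED_HI", 0x8000),
];

/// An enum is usable as flags only if every value has exactly one bit set.
fn is_valid_flag_set(flags: &[(&str, u32)]) -> bool {
    flags.iter().all(|&(_, value)| value.count_ones() == 1)
}

/// Formats `value` in hex followed by a comment naming the enabled flags,
/// listed from the highest bit to the lowest.
fn describe_flags(value: u32, flags: &[(&str, u32)]) -> String {
    let names: Vec<&str> = flags
        .iter()
        .rev()
        .filter(|&&(_, bit)| (value & bit) != 0)
        .map(|&(name, _)| name)
        .collect();
    format!("0x{:x} # {}", value, names.join(" | "))
}
```

Calling `describe_flags(0x10f, CHARACTERISTICS)` produces the same text as the proposed output line, and `is_valid_flag_set` rejects an enum with values 1, 2, 3, 4, 5 because 3 and 5 have two bits set.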
The following rule causes a panic:
rule test {
condition:
"foo" matches /https:\/\/[0-9\.]*:[0-9]{2,5}\/[\w]{1,20}\/[\w\d]{1,20}\?[\w]{1,40}=[\w\d]{1,20})(&[\w]{1,20}=[\w]{1,20}){1,3}/
}
The reason is that the regex-automata crate fails while compiling the regex because the resulting NFA is too large. The default limit for the NFA size is 10MB, which is insufficient for this regex.