getreu / stringsext

Find multi-byte-encoded strings in binary data (Gitlab mirror).

Home Page: https://blog.getreu.net/projects/stringsext/

License: Other

Rust 97.86% Shell 1.90% Dockerfile 0.24%

forensics rust string-search unicode

stringsext's Issues

Implement support for start and end offsets.

Hi,

This relates to #3 in my Need for Speed. It would be great to be able to specify start and end offsets for reading the file. That way, one could cheaply parallelize the entire scan by assigning different chunks of a large file to different stringsext instances.

Cheers,

Thomas
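
A minimal sketch of the chunking idea described above, assuming nothing about stringsext's internals. The names `chunk_ranges`, `read_range` and the `overlap` parameter are hypothetical and chosen only for illustration; stringsext may expose this differently, or not at all. Each range is extended by a few bytes so that a string straddling a boundary is still seen whole by one worker.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Split `[0, len)` into about `parts` half-open byte ranges `(start, end)`.
/// Every range is extended by `overlap` bytes (clamped to `len`) so that a
/// string crossing a chunk boundary is fully contained in at least one range.
fn chunk_ranges(len: u64, parts: u64, overlap: u64) -> Vec<(u64, u64)> {
    if len == 0 || parts == 0 {
        return Vec::new();
    }
    let size = (len + parts - 1) / parts; // ceiling division
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + size + overlap).min(len);
        ranges.push((start, end));
        start += size;
    }
    ranges
}

/// Read the bytes of one range from the file at `path`.
fn read_range(path: &str, start: u64, end: u64) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(start))?;
    let mut buf = Vec::new();
    // `take` caps the read at the length of the range.
    file.take(end - start).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Each `(start, end)` pair could be handed to a separate process or thread. Findings that fall entirely inside an overlap region would be reported twice and need deduplicating by absolute offset.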

Byte offsets not accurate

Hello and thank you. The byte offsets produced by the -t flag don't appear to be entirely accurate: there are duplicates. Reading the manual, it appears that they either signify a range (<), (>) or indicate that the line is a continuation of a line that went past the length limit (+). That makes them somewhat of an approximation rather than an exact location, as with the standard strings command. Is this because the worker threads are not aware of one another and so cannot piece together an exact picture?
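
For comparison, here is a minimal sketch of how a single-pass scanner can report an exact byte offset for each finding, in the manner of the standard strings command. It handles printable ASCII only and is not stringsext's algorithm; `ascii_strings` is a hypothetical name used for illustration.

```rust
/// Return `(byte_offset, text)` for every run of at least `min_len`
/// printable ASCII bytes (tab, or 0x20..=0x7e) in `data`.
fn ascii_strings(data: &[u8], min_len: usize) -> Vec<(usize, String)> {
    let mut found = Vec::new();
    let mut start = None; // offset where the current run began, if any
    for (i, &b) in data.iter().enumerate() {
        let printable = b == b'\t' || (0x20..=0x7e).contains(&b);
        match (printable, start) {
            // A run begins here.
            (true, None) => start = Some(i),
            // A run ends here; keep it if it is long enough.
            (false, Some(s)) => {
                if i - s >= min_len {
                    found.push((s, String::from_utf8_lossy(&data[s..i]).into_owned()));
                }
                start = None;
            }
            _ => {}
        }
    }
    // Flush a run that reaches the end of the input.
    if let Some(s) = start {
        if data.len() - s >= min_len {
            found.push((s, String::from_utf8_lossy(&data[s..]).into_owned()));
        }
    }
    found
}
```

Because the whole input is scanned sequentially in one buffer, every offset is exact. A scanner that works on separate buffers or decodes several encodings in parallel would need extra bookkeeping to give the same guarantee.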

No ELF? Intended behavior?

When I run stringsext on a binary, I get an empty line where the ELF designator would be. Example:

GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame

Whereas the typical strings function gives:
ELF
GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame

Is this by design? I would rather see that string in the output frankly. Thank you

Document meaning of `<`, `+` and `>` on offsets

E.g. on the example given on README:

stringsext -tx -e utf-8 -e utf-16le -e utf-16be \
           -n 10 -a None -u African  /dev/disk/by-uuid/567a8410

outputting:

 3de2fff0+	(b UTF-16LE)	ݒݓݔݕݖݗݙݪ
 3de30000+	(b UTF-16LE)	ݫݱݶݷݸݹݺ
<3de36528 	(a UTF-8)	فيأنمامعكلأورديافىهولملكاولهبسالإنهيأيقدهلثمبهلوليبلايبكشيام
>3de36528+	(a UTF-8)	أمنتبيلنحبهممشوش
<3de3a708 	(a UTF-8)	علىإلىهذاآخرعددالىهذهصورغيركانولابينعرضذلكهنايومقالعليانالكن
>3de3a708+	(a UTF-8)	حتىقبلوحةاخرفقطعبدركنإذاكمااحدإلافيهبعضكيفبح
 3de3a780+	(a UTF-8)	ثومنوهوأناجدالهاسلمعندليسعبرصلىمنذبهاأنهمثلكنتالاحيثمصرشرححو
 3de3a7f8+	(a UTF-8)	لوفياذالكلمرةانتالفأبوخاصأنتانهاليعضووقدابنخيربنتلكمشاءوهياب
 3de3a870+	(a UTF-8)	وقصصومارقمأحدنحنعدمرأياحةكتبدونيجبمنهتحتجهةسنةيتمكرةغزةنفسبي
 3de3a8e8+	(a UTF-8)	تللهلناتلكقلبلماعنهأولشيءنورأمافيكبكلذاترتببأنهمسانكبيعفقدحس
 3de3a960+	(a UTF-8)	نلهمشعرأهلشهرقطرطلب
 3df4cca8 	(c UTF-16BE)	փօև։֋֍֏׹
<3df4cd20 	(c UTF-16BE)	־ֿ׀ׁׂ׃ׅׄ׆ׇ׈׉׊׋

What is the meaning of the `<`, `+` and `>` characters? It is perhaps even harder to understand for someone like me who does not speak the sample languages.

It seems that:

`+` is linked to the line-length limit enforced by -q
`>` seems to mean a raw newline without `+`

Option to show file name like `strings -f` / `--print-file-name`

As of 1d7efab ASCII characters are printed instead of the file names.

echo asdf > file1
echo qwer > file2
stringsext file1 file2

outputs:

A asdf
B qwer

where A means file1 and B means file2. This is quite obscure, especially with many files. It would be good to at least have an option to print the file names directly:

stringsext -f file1 file2

giving:

file1 asdf
file2 qwer
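
Until such an option exists, the labels could be translated back in a post-processing step. The sketch below assumes that labels are single uppercase letters assigned in argument order (A for the first file, B for the second) and that a single space separates the label from the string, as in the example above. The real output format may differ, and `relabel` is a hypothetical helper.

```rust
/// Replace a leading single-letter input label (`A`, `B`, ...) with the
/// corresponding file name. Lines without a known label are returned unchanged.
fn relabel(line: &str, files: &[&str]) -> String {
    let mut parts = line.splitn(2, ' ');
    let label = parts.next().unwrap_or("");
    let rest = parts.next();
    let mut chars = label.chars();
    // Require exactly one character in the label and some text after it.
    if let (Some(c), None, Some(rest)) = (chars.next(), chars.next(), rest) {
        if c.is_ascii_uppercase() {
            let idx = (c as u8 - b'A') as usize;
            if let Some(name) = files.get(idx) {
                return format!("{} {}", name, rest);
            }
        }
    }
    line.to_string()
}
```

With `files = ["file1", "file2"]`, the line `A asdf` becomes `file1 asdf`. This only covers up to 26 inputs and would misfire on a finding that itself starts with a lone capital letter, which is one more reason a built-in option would be preferable.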

Suggestion: line numbers

Hello. Figured I would give some feedback. This would be even better if it was possible to print the line numbers of the occurrences. Thanks

Split out into a library crate?

Hi, this looks great! I was wondering if you would be open to splitting out the functionality into a library crate for use via crates.io. Then this repo would end up being a command line interface for the library.

Redirecting output to a program like `head` causes a panic

$ RUST_BACKTRACE=1 stringsext input-file | head

Results in the following once it tries to print the 11th line:

thread '<unnamed>' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stringsext-2.3.4/src/main.rs:165:37:
Error: Can not sent result through output channel. Write permissions? Is there enough space? : SendError { .. }
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <F as scoped_threadpool::FnBox>::call_box
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/scoped_threadpool-0.1.9/src/lib.rs:236:13:
Thread pool worker panicked
stack backtrace:
   0: std::panicking::begin_panic
   1: scoped_threadpool::Scope::join_all
   2: scoped_threadpool::Pool::scoped
   3: stringsext::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I guess head stops accepting input once it has received the number of lines it wants. stringsext could probably exit gracefully once the output becomes unwritable.
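
A minimal sketch of the graceful-exit idea on the writer side, assuming the output is written through a `std::io::Write` sink. `write_lines` is a hypothetical function, not stringsext code. In the backtrace above the panic is raised when a worker fails to send through a channel, so a real fix would presumably also need the workers to stop once the output side has shut down.

```rust
use std::io::{self, ErrorKind, Write};

/// Write each line to `out`, stopping quietly if the reader has gone away.
/// Returns the number of lines that were written successfully.
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<usize> {
    let mut written = 0;
    for line in lines {
        match writeln!(out, "{}", line) {
            Ok(()) => written += 1,
            // The consumer (e.g. `head`) closed the pipe: treat this as a
            // normal end of output rather than as a failure.
            Err(e) if e.kind() == ErrorKind::BrokenPipe => return Ok(written),
            // Any other I/O error is still reported to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}
```

Called with `&mut std::io::stdout().lock()`, this returns normally when the pipe is closed, so the program can finish with a success status instead of unwinding.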

Implement support for Regex filtering of strings.

Hi,

Follow-up to the email thread: I was looking at using stringsext to scrape paths from binary files. However, piping the output to grep, for example, is very slow for large files (e.g. 25 GB). Native regex filtering of the found strings might help here, instead of piping a torrent of data via stdout.

Cheers,

Thomas
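
A minimal sketch of in-process filtering, assuming findings are available as string slices before they are printed. To stay self-contained it uses a plain predicate rather than a regex engine; in practice the closure could wrap `Regex::is_match` from the `regex` crate. Both `filter_findings` and `looks_like_unix_path` are hypothetical names, and the path heuristic is only a stand-in for a real pattern.

```rust
/// Keep only the findings for which `keep` returns true, preserving order.
fn filter_findings<'a, F>(findings: &[&'a str], keep: F) -> Vec<&'a str>
where
    F: Fn(&str) -> bool,
{
    findings.iter().copied().filter(|s| keep(*s)).collect()
}

/// Crude stand-in for a path regex: an absolute Unix-style path
/// that contains no whitespace.
fn looks_like_unix_path(s: &str) -> bool {
    s.len() > 1 && s.starts_with('/') && !s.contains(char::is_whitespace)
}
```

Filtering before printing means that only matching strings cross the pipe, which is where the saving over `| grep` would come from.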
