getreu / stringsext
Find multi-byte-encoded strings in binary data (Gitlab mirror).
Home Page: https://blog.getreu.net/projects/stringsext/
License: Other
Hi,
This relates to #3 in my "Need for Speed" issue: it would be great to be able to specify start and end offsets for reading the file. That way, one could cheaply parallelize the entire scan by assigning different chunks of a large file to different stringsext instances.
Cheers,
Thomas
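A minimal sketch of the chunking idea, assuming each instance would be handed one [start, end) byte range. `chunk_ranges` and `read_range` are hypothetical helpers, not stringsext options. Each range is stretched by an `overlap` so that a string crossing a boundary is still seen whole by one worker.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: split `len` bytes into at most `n` contiguous
/// [start, end) ranges. Each range is extended by `overlap` bytes past its
/// nominal end (capped at `len`). A string no longer than `overlap` that
/// starts inside a chunk is therefore seen whole by that chunk's worker.
pub fn chunk_ranges(len: u64, n: u64, overlap: u64) -> Vec<(u64, u64)> {
    assert!(n > 0, "need at least one chunk");
    let step = (len + n - 1) / n; // ceiling division
    (0..n)
        .map(|i| {
            let start = (i * step).min(len);
            let end = ((i + 1) * step + overlap).min(len);
            (start, end)
        })
        .filter(|&(start, end)| start < end) // drop empty ranges
        .collect()
}

/// Read only the bytes in [start, end) of the file at `path`.
pub fn read_range(path: &Path, start: u64, end: u64) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(start))?;
    let mut buf = Vec::new();
    // `take` stops after `end - start` bytes, or earlier at end of file.
    file.take(end.saturating_sub(start)).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Strings that lie in an overlap can be reported by two workers, so the merged results would need de-duplication by absolute offset. A worker can also report the tail of a string that began in the previous chunk. A real implementation would have to discard or merge such partial hits.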
Hello and thank you. The byte offsets produced by the -t flag don't appear to be entirely accurate: there are duplicates. Reading the manual, it appears that the markers either signify a range ("<", ">") or indicate that the line continues a line that exceeded the length limit ("+"). The offsets are therefore somewhat approximate, whereas the standard strings command reports exact locations. Is this because the worker threads are not aware of one another and so cannot piece together an exact picture?
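For comparison, here is how exact offsets come out of a simple single-pass scan. This is a minimal ASCII-only sketch, not stringsext's multi-encoding algorithm, and `ascii_strings` and its `base` parameter are hypothetical. It treats 0x20 to 0x7E plus tab as printable. If a worker knows where its buffer starts in the whole input, `base + index` gives an exact absolute offset without any knowledge of the other workers.

```rust
/// Hypothetical illustration (ASCII only): find runs of at least `min_len`
/// printable bytes in `buf` and report each run's absolute offset as
/// `base + index`, where `base` is the position of `buf` in the whole input.
pub fn ascii_strings(buf: &[u8], base: u64, min_len: usize) -> Vec<(u64, String)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &b) in buf.iter().enumerate() {
        let printable = b == b'\t' || (0x20..=0x7e).contains(&b);
        match (printable, start) {
            // A printable byte opens a new run.
            (true, None) => start = Some(i),
            // A non-printable byte closes the current run.
            (false, Some(s)) => {
                if i - s >= min_len {
                    let text = String::from_utf8_lossy(&buf[s..i]).into_owned();
                    out.push((base + s as u64, text));
                }
                start = None;
            }
            _ => {}
        }
    }
    // Flush a run that reaches the end of the buffer.
    if let Some(s) = start {
        if buf.len() - s >= min_len {
            let text = String::from_utf8_lossy(&buf[s..]).into_owned();
            out.push((base + s as u64, text));
        }
    }
    out
}
```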
When I run stringsext on a binary, I get an empty line where the ELF designator would be. Example:
GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame
The standard strings command, by contrast, gives:
ELF
GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame
Is this by design? Frankly, I would rather see that string in the output. Thank you.
For example, running the command given in the README:
stringsext -tx -e utf-8 -e utf-16le -e utf-16be \
-n 10 -a None -u African /dev/disk/by-uuid/567a8410
outputting:
3de2fff0+ (b UTF-16LE) ݒݓݔݕݖݗݙݪ
3de30000+ (b UTF-16LE) ݫݱݶݷݸݹݺ
<3de36528 (a UTF-8) فيأنمامعكلأورديافىهولملكاولهبسالإنهيأيقدهلثمبهلوليبلايبكشيام
>3de36528+ (a UTF-8) أمنتبيلنحبهممشوش
<3de3a708 (a UTF-8) علىإلىهذاآخرعددالىهذهصورغيركانولابينعرضذلكهنايومقالعليانالكن
>3de3a708+ (a UTF-8) حتىقبلوحةاخرفقطعبدركنإذاكمااحدإلافيهبعضكيفبح
3de3a780+ (a UTF-8) ثومنوهوأناجدالهاسلمعندليسعبرصلىمنذبهاأنهمثلكنتالاحيثمصرشرححو
3de3a7f8+ (a UTF-8) لوفياذالكلمرةانتالفأبوخاصأنتانهاليعضووقدابنخيربنتلكمشاءوهياب
3de3a870+ (a UTF-8) وقصصومارقمأحدنحنعدمرأياحةكتبدونيجبمنهتحتجهةسنةيتمكرةغزةنفسبي
3de3a8e8+ (a UTF-8) تللهلناتلكقلبلماعنهأولشيءنورأمافيكبكلذاترتببأنهمسانكبيعفقدحس
3de3a960+ (a UTF-8) نلهمشعرأهلشهرقطرطلب
3df4cca8 (c UTF-16BE) փօև։֍֏
<3df4cd20 (c UTF-16BE) ־ֿ׀ׁׂ׃ׅׄ׆ׇ
What is the meaning of the "<", "+" and ">" characters? It is perhaps even harder to understand for non-speakers of the sample languages, like me.
It seems that "+" is linked to the line-length limit enforced by -q, and ">" seems to mean a raw newline without a "+".
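For anyone post-processing output like the above, the fields can be separated mechanically. This sketch assumes only the layout visible in this example: an optional leading "<" or ">", the hexadecimal offset printed with -tx, an optional trailing "+", the scanner letter and encoding in parentheses, and then the string. `OutputLine` and `parse_line` are hypothetical names, and the sketch does not settle what the markers mean.

```rust
/// Fields of one output line, named after the layout seen above (hypothetical).
#[derive(Debug, PartialEq)]
pub struct OutputLine<'a> {
    pub marker: Option<char>, // leading '<' or '>', if present
    pub offset: u64,          // hexadecimal offset as printed with -tx
    pub continued: bool,      // '+' directly after the offset
    pub scanner: &'a str,     // e.g. "a"
    pub encoding: &'a str,    // e.g. "UTF-8"
    pub text: &'a str,        // the found string
}

pub fn parse_line(line: &str) -> Option<OutputLine<'_>> {
    // Optional leading marker (both are one byte, so slicing at 1 is safe).
    let (marker, rest) = match line.chars().next()? {
        c @ ('<' | '>') => (Some(c), &line[1..]),
        _ => (None, line),
    };
    // "3de36528+ (a UTF-8) text" -> head "3de36528+", rest "a UTF-8) text"
    let (head, rest) = rest.split_once(" (")?;
    let (hex, continued) = match head.strip_suffix('+') {
        Some(h) => (h, true),
        None => (head, false),
    };
    let offset = u64::from_str_radix(hex, 16).ok()?;
    let (label, text) = rest.split_once(") ")?;
    let (scanner, encoding) = label.split_once(' ')?;
    Some(OutputLine { marker, offset, continued, scanner, encoding, text })
}
```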
As of 1d7efab, ASCII letters are printed instead of the file names.
echo asdf > file1
echo qwer > file2
stringsext file1 file2
outputs:
A asdf
B qwer
where A means file1 and B means file2, which is quite obscure, especially if you have many files. It would be good to at least have an option to print the file names directly:
stringsext -f file1 file2
giving:
file1 asdf
file2 qwer
Hello. I figured I would give some feedback: this would be even better if it were possible to print the line numbers of the occurrences. Thanks!
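Binary data has no real lines, but for mostly-text input a line number could be derived from each hit's byte offset. A minimal sketch, assuming hits are reported as absolute byte offsets. It indexes every newline once and then counts the newlines before a given offset with a binary search. `newline_index` and `line_number` are hypothetical helpers.

```rust
/// Positions of every b'\n' in `data`, in ascending order.
pub fn newline_index(data: &[u8]) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter_map(|(i, &b)| (b == b'\n').then_some(i))
        .collect()
}

/// 1-based line number of the byte at `offset`: one plus the number of
/// newlines strictly before it (binary search over the sorted index).
pub fn line_number(newlines: &[usize], offset: usize) -> usize {
    1 + newlines.partition_point(|&nl| nl < offset)
}
```

Building the index is a single linear pass, so each later lookup costs O(log n) instead of a rescan of the file.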
Hi, this looks great! I was wondering whether you would be open to splitting the functionality out into a library crate published on crates.io. This repo would then become a command-line interface for the library.
$ RUST_BACKTRACE=1 stringsext input-file | head
This results in the following output once it tries to print the 11th line:
thread '<unnamed>' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stringsext-2.3.4/src/main.rs:165:37:
Error: Can not sent result through output channel. Write permissions? Is there enough space? : SendError { .. }
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: <F as scoped_threadpool::FnBox>::call_box
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/scoped_threadpool-0.1.9/src/lib.rs:236:13:
Thread pool worker panicked
stack backtrace:
0: std::panicking::begin_panic
1: scoped_threadpool::Scope::join_all
2: scoped_threadpool::Pool::scoped
3: stringsext::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
I guess head stops accepting input once it has received the number of lines it wants. stringsext could probably exit gracefully once the output becomes unwritable.
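One way to exit gracefully, sketched under the assumption that the output stage returns its I/O errors instead of unwrapping them. On Unix the Rust runtime ignores SIGPIPE, so a write to a closed pipe returns an error of kind `ErrorKind::BrokenPipe`, and that case can be treated as a normal end of output. `write_all_lines` and `exit_code` are hypothetical names. Worker threads would similarly need to stop on a failed channel send rather than panic.

```rust
use std::io::{self, ErrorKind, Write};

/// Write each string on its own line, passing any I/O error to the caller.
pub fn write_all_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "{line}")?;
    }
    out.flush()
}

/// Map the output stage's result to a process exit code. A broken pipe
/// (e.g. `head` has already exited) counts as success, not as a panic.
pub fn exit_code(result: io::Result<()>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(e) if e.kind() == ErrorKind::BrokenPipe => 0,
        Err(e) => {
            eprintln!("Error: {e}");
            1
        }
    }
}
```

A `main` would then end with something like `std::process::exit(exit_code(write_all_lines(&mut io::stdout().lock(), &lines)))`.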
Hi,
Following up on the email thread: I was looking at using stringsext to scrape paths from binary files. However, piping the output to grep, for example, is very slow for large files (e.g. 25 GB). I was thinking that native regex filtering of the found strings might help here, instead of piping a torrent of data through stdout.
Cheers,
Thomas
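A minimal sketch of in-process filtering, assuming found strings are available as (offset, text) pairs before they are printed. To stay dependency-free, the matcher is any `Fn(&str) -> bool`. With the `regex` crate, one could pass a closure such as `|s: &str| re.is_match(s)` instead of the simple prefix test used below. `filter_hits` is a hypothetical name.

```rust
/// Keep only the hits whose text satisfies `keep`. Any matcher works here:
/// a substring test, a prefix test, or a compiled regex wrapped in a closure.
pub fn filter_hits<'a, F>(hits: &'a [(u64, String)], keep: F) -> Vec<&'a (u64, String)>
where
    F: Fn(&str) -> bool,
{
    hits.iter().filter(|(_, s)| keep(s.as_str())).collect()
}
```

For example, `filter_hits(&hits, |s| s.starts_with('/'))` keeps only path-like strings, so only those would ever reach stdout.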