
getreu / stringsext

Find multi-byte-encoded strings in binary data (Gitlab mirror).

Home Page: https://blog.getreu.net/projects/stringsext/

License: Other

Rust 97.86% Shell 1.90% Dockerfile 0.24%
forensics rust string-search unicode

stringsext's People

Contributors

getreu, iyj6707, kelsolaar, zaventh

stringsext's Issues

Implement support for start and end offsets.

Hi,

This relates to #3 in my Need for Speed: it would be great to be able to specify start and end offsets for reading the file. That way one could cheaply parallelize the entire scan by assigning different chunks of a large file to different stringsext instances.

Cheers,

Thomas
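
To illustrate the request above, here is a minimal sketch of how a file could be divided into offset ranges and how one range could be read on its own. The names `chunk_ranges` and `read_range` are hypothetical and are not part of stringsext; the sketch uses only the Rust standard library.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Split `len` bytes into at most `n` contiguous half-open ranges `(start, end)`.
/// Hypothetical helper, not part of stringsext.
fn chunk_ranges(len: u64, n: u64) -> Vec<(u64, u64)> {
    if len == 0 || n == 0 {
        return Vec::new();
    }
    // Ceiling division so that `n` chunks always cover the whole file.
    let size = (len + n - 1) / n;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + size).min(len);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Read only the bytes in `[start, end)` from the file at `path`.
/// Hypothetical helper, not part of stringsext.
fn read_range(path: &str, start: u64, end: u64) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(start))?;
    let mut buf = Vec::new();
    // `take` limits the reader to the length of the requested range.
    file.take(end.saturating_sub(start)).read_to_end(&mut buf)?;
    Ok(buf)
}
```

For example, `chunk_ranges(10, 3)` yields the ranges `(0, 4)`, `(4, 8)` and `(8, 10)`. A string that straddles a chunk boundary would be cut in two, so a real implementation would probably need overlapping ranges or some boundary handling.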

Byte offsets not accurate

Hello and thank you. The byte offsets produced by using the -t flag don't appear to be entirely accurate. There are duplicates. Reading the manual, it appears that they signify either a range (<), (>) or indicate that the line is a continuation of a line past the length limit (+). So the offset is somewhat of an approximation rather than an exact location, as it is with the standard strings command. Is this because the worker threads are not aware of one another and so cannot piece together an exact picture?

No ELF? Intended behavior?

When I run stringsext on a binary, I get an empty line where the ELF designator would be. Example:

GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame

Whereas the standard strings command gives:
ELF
GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
test.c
main
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
.note.gnu.property
.rela.eh_frame

Is this by design? Frankly, I would rather see that string in the output. Thank you.

Document meaning of `<`, `+` and `>` on offsets

E.g., in the example given in the README:

stringsext -tx -e utf-8 -e utf-16le -e utf-16be \
           -n 10 -a None -u African  /dev/disk/by-uuid/567a8410

outputting:

 3de2fff0+	(b UTF-16LE)	ݒݓݔݕݖݗݙݪ
 3de30000+	(b UTF-16LE)	ݫݱݶݷݸݹݺ
<3de36528 	(a UTF-8)	فيأنمامعكلأورديافىهولملكاولهبسالإنهيأيقدهلثمبهلوليبلايبكشيام
>3de36528+	(a UTF-8)	أمنتبيلنحبهممشوش
<3de3a708 	(a UTF-8)	علىإلىهذاآخرعددالىهذهصورغيركانولابينعرضذلكهنايومقالعليانالكن
>3de3a708+	(a UTF-8)	حتىقبلوحةاخرفقطعبدركنإذاكمااحدإلافيهبعضكيفبح
 3de3a780+	(a UTF-8)	ثومنوهوأناجدالهاسلمعندليسعبرصلىمنذبهاأنهمثلكنتالاحيثمصرشرححو
 3de3a7f8+	(a UTF-8)	لوفياذالكلمرةانتالفأبوخاصأنتانهاليعضووقدابنخيربنتلكمشاءوهياب
 3de3a870+	(a UTF-8)	وقصصومارقمأحدنحنعدمرأياحةكتبدونيجبمنهتحتجهةسنةيتمكرةغزةنفسبي
 3de3a8e8+	(a UTF-8)	تللهلناتلكقلبلماعنهأولشيءنورأمافيكبكلذاترتببأنهمسانكبيعفقدحس
 3de3a960+	(a UTF-8)	نلهمشعرأهلشهرقطرطلب
 3df4cca8 	(c UTF-16BE)	փօև։֋֍֏׹
<3df4cd20 	(c UTF-16BE)	־ֿ׀ׁׂ׃ׅׄ׆ׇ׈׉׊׋

what is the meaning of the <, + and > characters? It is perhaps even harder to understand for someone like me who does not speak the sample languages.

It seems:

  • + is linked to the enforced line limit of -q
  • > seems to mean a raw newline without +

Option to show file name like `strings -f` / `--print-file-name`

As of 1d7efab ASCII characters are printed instead of the file names.

echo asdf > file1
echo qwer > file2
stringsext file1 file2

outputs:

A asdf
B qwer

where A means file1 and B means file2, which is quite obscure, especially if you have many files. It would be good to have at least an option to print the file names directly:

stringsext -f file1 file2

giving:

file1 asdf
file2 qwer

Suggestion: line numbers

Hello. Figured I would give some feedback. This would be even better if it was possible to print the line numbers of the occurrences. Thanks

Split out into a library crate?

Hi, this looks great! I was wondering if you would be open to splitting out the functionality into a library crate for use via crates.io. Then this repo would end up being a command line interface for the library.

Redirecting output to a program like `head` causes a panic

$ RUST_BACKTRACE=1 stringsext input-file | head

Results in the following once it tries to print the 11th line:

thread '<unnamed>' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stringsext-2.3.4/src/main.rs:165:37:
Error: Can not sent result through output channel. Write permissions? Is there enough space? : SendError { .. }
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <F as scoped_threadpool::FnBox>::call_box
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at /home/priw8/.cargo/registry/src/index.crates.io-6f17d22bba15001f/scoped_threadpool-0.1.9/src/lib.rs:236:13:
Thread pool worker panicked
stack backtrace:
   0: std::panicking::begin_panic
   1: scoped_threadpool::Scope::join_all
   2: scoped_threadpool::Pool::scoped
   3: stringsext::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I guess head stops accepting any input once it gets the number of lines it wants. stringsext could probably exit gracefully once the output becomes unwritable.
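
A minimal sketch of the graceful exit suggested above, assuming the writing side can see the underlying I/O error. The function `write_lines` is hypothetical and not taken from the stringsext source, where the panic is reported on a channel send rather than directly on the write.

```rust
use std::io::{self, ErrorKind, Write};

/// Write each line to `out`, stopping quietly if the reader has gone away.
/// Returns the number of lines written. Hypothetical helper, not stringsext code.
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<usize> {
    let mut written = 0;
    for line in lines {
        match writeln!(out, "{}", line) {
            Ok(()) => written += 1,
            // The consumer (e.g. `head`) closed the pipe: stop without an error.
            Err(e) if e.kind() == ErrorKind::BrokenPipe => return Ok(written),
            // Any other failure (disk full, permissions) is still reported.
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}
```

Rust programs ignore SIGPIPE by default, so a closed pipe shows up as a write error of kind `BrokenPipe` instead of terminating the process. Treating only that kind as a normal end of output keeps real write failures visible.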

Implement support for Regex filtering of strings.

Hi,

Follow-up to the email thread: I was looking at using stringsext to scrape paths in binary files. However, piping the output to grep, for example, is very slow for large files (e.g. 25 GB). I was thinking that native Regex filtering of the found strings might help here, instead of piping a torrent of data via stdout.

Cheers,

Thomas
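
As a rough illustration of filtering before output, the sketch below applies a predicate to each found string and keeps only the matches. The names `filter_findings` and `looks_like_path` are hypothetical, and the predicate is a plain standard-library check standing in for a regular expression; a real implementation would more likely pass something like the `regex` crate's `Regex::is_match` as the predicate.

```rust
/// Crude stand-in for a path regex: an absolute path without whitespace.
/// Hypothetical predicate, chosen only to keep the sketch std-only.
fn looks_like_path(s: &str) -> bool {
    s.starts_with('/') && s.len() > 1 && !s.contains(char::is_whitespace)
}

/// Keep only the findings accepted by `keep`, preserving their order.
/// Hypothetical helper, not part of stringsext.
fn filter_findings<'a>(findings: &[&'a str], keep: impl Fn(&str) -> bool) -> Vec<&'a str> {
    findings.iter().copied().filter(|s| keep(*s)).collect()
}
```

Calling `filter_findings(&["/usr/bin/ls", "main", ".text"], looks_like_path)` keeps only `"/usr/bin/ls"`, so the rejected strings never reach stdout.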
