shenwei356 / rush

A cross-platform command-line tool for executing jobs in parallel

Home Page: https://github.com/shenwei356/rush

License: MIT License

Topics: parallel, golang, cross-platform, bioinformatics, pipeline, execute, shell, command, windows

rush's Introduction

rush -- a cross-platform command-line tool for executing jobs in parallel


rush is a tool similar to GNU parallel and gargs. rush borrows some ideas from them and has some unique features, e.g., support for custom defined variables, resuming multi-line commands, and more advanced embedded replacement strings.

These features make rush suitable for easily and flexibly parallelizing complex workflows in fields like bioinformatics (see example 18).
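
As a quick taste (borrowed from example 14 further down), a preset variable plus the suffix-removing replacement string derives the mate file of a paired-end read in one short command:

      $ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
      read read_2.fq.gz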

Table of Contents

Features

Major:

  • Supports Linux, OS X and Windows (not Cygwin)!
  • Avoids mixed lines in the output of multiple processes without loss of performance, e.g., the first half of a line coming from one process and the second half from another. (--line-buffer in GNU parallel)
  • Timeout (-t). (--timeout in GNU parallel)
  • Retry (-r). (--retry-failed --joblog in GNU parallel)
  • Safe exit after capturing Ctrl-C (not perfect, you may stop it by typing ctrl-c or closing terminal)
  • Continue (-c). (--resume --joblog in GNU parallel, but it does not support multi-line commands, which are common in workflows)
  • awk -v like custom defined variables (-v). (Using Shell variable in GNU parallel)
  • Keeping output in order of input (-k). (Same -k/--keep-order in GNU parallel)
  • Exit on first error(s) (-e). (not perfect, you may stop it by typing ctrl-c or closing terminal) (--halt 2 in GNU parallel)
  • Settable record delimiter (-D, default \n). (--recstart and --recend in GNU parallel)
  • Settable number of records sent to every command (-n, default 1). (-n/--max-args in GNU parallel)
  • Settable field delimiter (-d, default \s+). (Same -d/--delimiter in GNU parallel)
  • Practical replacement strings (like GNU parallel):
    • {#}, job ID. (Same in GNU parallel)
    • {}, full data. (Same in GNU parallel)
    • {n}, nth field in delimiter-delimited data. (Same in GNU parallel)
    • Directory and file
      • {/}, dirname. ({//} in GNU parallel)
      • {%}, basename. ({/} in GNU parallel)
      • {.}, remove the last file extension. (Same in GNU parallel)
      • {:}, remove all file extensions (Not directly supported in GNU parallel)
      • {^suffix}, remove suffix (Not directly supported in GNU parallel)
      • {@regexp}, capture submatch using regular expression (Not directly supported in GNU parallel)
    • Combinations
      • {%.}, {%:}, basename without extension
      • {2.}, {2/}, {2%.}, manipulate nth field
  • Preset variable (macro), e.g., rush -v p={^suffix} 'echo {p}_new_suffix', where {p} is replaced with {^suffix}. (Using Shell variable in GNU parallel)

Minor:

  • Dry run (--dry-run). (Same in GNU parallel)
  • Trim input data (--trim). (Same in GNU parallel)
  • Verbose output (--verbose). (Same in GNU parallel)

See the differences between rush and GNU parallel on the GNU parallel site.

Performance

The performance of rush is similar to that of gargs; both are slightly faster than parallel (Perl) and slower than Rust parallel (discussion).

Note that speed is not the #1 priority, especially for long-running processes.

Installation

rush is implemented in the Go programming language. Executable binary files for most popular operating systems are freely available on the release page.

Method 1: Download binaries

rush v0.5.4

Tip: run rush -V to check for a newer version.

OS       Arch     File (mirror also available)
Linux    32-bit   rush_linux_386.tar.gz
Linux    64-bit   rush_linux_amd64.tar.gz
Linux    arm64    rush_linux_arm64.tar.gz
OS X     64-bit   rush_darwin_amd64.tar.gz
OS X     arm64    rush_darwin_arm64.tar.gz
Windows  32-bit   rush_windows_386.exe.tar.gz
Windows  64-bit   rush_windows_amd64.exe.tar.gz

Just download the compressed executable file for your operating system and decompress it with tar -zxvf *.tar.gz or other tools. Then:

  1. For Linux-like systems

    1. If you have root privilege simply copy it to /usr/local/bin:

       sudo cp rush /usr/local/bin/
      
    2. Or copy it to anywhere in the environment variable PATH (see the PATH sketch after this list):

       mkdir -p $HOME/bin/; cp rush $HOME/bin/
      
  2. For Windows, just copy rush.exe to C:\WINDOWS\system32.
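
For Linux-like systems (step 1.2 above), if $HOME/bin is not already in your PATH, a minimal sketch for adding it, assuming a bash shell (adapt for your own shell):

       echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
       source ~/.bashrc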

Method 2: For Go developer

go install github.com/shenwei356/rush@latest
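
go install places the binary in $(go env GOPATH)/bin (or $GOBIN if set); if that directory is not in your PATH, a minimal sketch assuming a bash shell:

       export PATH=$PATH:$(go env GOPATH)/bin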

Method 3: Compiling from source

# download Go from https://go.dev/dl
wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz

tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/

# or 
#   echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
#   source ~/.bashrc
export PATH=$PATH:$HOME/go/bin

git clone https://github.com/shenwei356/rush
cd rush

go build

# or statically-linked binary
CGO_ENABLED=0 go build -tags netgo -ldflags '-w -s'

# or cross compile for other operating systems and architectures
CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 go build -tags netgo -ldflags '-w -s'
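
# a sketch (not part of the original instructions): cross compile for Windows
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -tags netgo -ldflags '-w -s' -o rush.exe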

Usage

rush -- a cross-platform command-line tool for executing jobs in parallel

Version: 0.5.4

Author: Wei Shen <[email protected]>

Homepage: https://github.com/shenwei356/rush

Input:
  - Input could be a list of strings or numbers, e.g., file paths.
  - Input can be given either from the STDIN or file(s) via the option -i/--infile.
  - Some options can be used to define how the input records are parsed:
    -d, --field-delimiter   field delimiter in records (default "\s+")
    -D, --record-delimiter  record delimiter (default "\n")
    -n, --nrecords          number of records sent to a command (default 1)
    -J, --records-join-sep  record separator for joining multi-records (default "\n")
    -T, --trim              trim white space (" \t\r\n") in input

Output:
  - Outputs of all commands are written to STDOUT by default;
    you can also use -o/--out-file to specify an output file.
  - Outputs of all commands are in random order; you can use the flag -k/--keep-order
    to keep output in the order of input.
  - Outputs of all commands are buffered, you can use the flag -I/--immediate-output
    to print output immediately and interleaved.

Replacement strings in commands:
  {}          full data
  {#}         job ID
  {n}         nth field in delimiter-delimited data
  {/}         dirname
  {%}         basename
  {.}         remove the last file extension
  {:}         remove all file extensions.
  {^suffix}   remove suffix
  {@regexp}   capture submatch using regular expression

  Combinations:
    {%.}, {%:}          basename without extension
    {2.}, {2/}, {2%.}   manipulate nth field

Preset variable (macro):
  1. You can pass variables to the command like awk via the option -v. E.g.,
     $ seq 3 | rush -v p=prefix_ -v s=_suffix 'echo {p}{}{s}'
     prefix_3_suffix
     prefix_1_suffix
     prefix_2_suffix
  2. The value could also contain replacement strings.
     # {p} will be replaced with {%:}, which computes the basename and removes all file extensions.
     $ echo a/b/c.txt.gz | rush -v 'p={%:}' 'echo {p} {p}.csv'
     c c.csv

Usage:
  rush [flags] [command]

Examples:
  1. simple run, quoting is not necessary
      $ seq 1 10 | rush echo {}
  2. keep order
      $ seq 1 10 | rush 'echo {}' -k
  3. timeout
      $ seq 1 | rush 'sleep 2; echo {}' -t 1
  4. retry
      $ seq 1 | rush 'python script.py' -r 3
  5. dirname & basename & remove suffix
      $ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
      dir file.txt.gz dir/file
  6. basename without the last or any extension
      $ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
      dir.d/file.txt dir.d/file file.txt file
  7. job ID, combine fields and other replacement strings
      $ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
      job 1: file.txt file s
  8. capture submatch using regular expression
      $ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
      read
  9. custom field delimiter
      $ echo a=b=c | rush 'echo {1} {2} {3}' -d =
      a b c
  10. custom record delimiter
      $ echo a=b=c | rush -D "=" -k 'echo {}'
      a
      b
      c
      $ echo abc | rush -D "" -k 'echo {}'
      a
      b
      c
  11. assign value to variable, like "awk -v"
      # seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
      $ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
      Hello, Wei Shen!
  12. preset variable (Macro)
      # equal to: echo sample_1.fq.gz | rush 'echo {:^_1} {} {:^_1}_2.fq.gz'
      $ echo sample_1.fq.gz | rush -v p={:^_1} 'echo {p} {} {p}_2.fq.gz'
      sample sample_1.fq.gz sample_2.fq.gz
  13. save successful commands to continue in NEXT run
      $ seq 1 3 | rush 'sleep {}; echo {}' -c -t 2
      [INFO] ignore cmd #1: sleep 1; echo 1
      [ERRO] run cmd #1: sleep 2; echo 2: time out
      [ERRO] run cmd #2: sleep 3; echo 3: time out
  14. escape special symbols
      $ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
      a
  15. run a command with relative paths in Windows, please use backslash as the separator.
      # "brename -l -R" is used to search paths recursively
      $ brename -l -q -R -i -p "\.go$" | rush "bin\app.exe {}"

  More examples: https://github.com/shenwei356/rush

Flags:
  -v, --assign strings            assign the value val to the variable var (format: var=val, val also
                                  supports replacement strings)
      --cleanup-time int          time to allow child processes to clean up between stop / kill signals
                                  (unit: seconds, 0 for no time) (default 3) (default 3)
  -c, --continue                  continue jobs. NOTES: 1) successful commands are saved in file (given
                                  by flag -C/--succ-cmd-file); 2) if the file does not exist, rush saves
                                  data so we can continue jobs next time; 3) if the file exists, rush
                                  ignores jobs in it and update the file
      --dry-run                   print command but not run
  -q, --escape                    escape special symbols like $ which you can customize by flag
                                  -Q/--escape-symbols
  -Q, --escape-symbols string     symbols to escape (default "$#&`")
      --eta                       show ETA progress bar
  -d, --field-delimiter string    field delimiter in records, support regular expression (default "\\s+")
  -h, --help                      help for rush
  -I, --immediate-output          print output immediately and interleaved, to aid debugging
  -i, --infile strings            input data file, multi-values supported
  -j, --jobs int                  run n jobs in parallel (default value depends on your device) (default 16)
  -k, --keep-order                keep output in order of input
      --no-kill-exes strings      exe names to exclude from kill signal, example: mspdbsrv.exe; or use
                                  all for all exes (default none)
      --no-stop-exes strings      exe names to exclude from stop signal, example: mspdbsrv.exe; or use
                                  all for all exes (default none)
  -n, --nrecords int              number of records sent to a command (default 1)
  -o, --out-file string           out file ("-" for stdout) (default "-")
      --print-retry-output        print output from retry commands (default true)
      --propagate-exit-status     propagate child exit status up to the exit status of rush (default true)
  -D, --record-delimiter string   record delimiter (default is "\n") (default "\n")
  -J, --records-join-sep string   record separator for joining multi-records (default is "\n") (default "\n")
  -r, --retries int               maximum retries (default 0)
      --retry-interval int        retry interval (unit: second) (default 0)
  -e, --stop-on-error             stop child processes on first error (not perfect, you may stop it by
                                  typing ctrl-c or closing terminal)
  -C, --succ-cmd-file string      file for saving successful commands (default "successful_cmds.rush")
  -t, --timeout int               timeout of a command (unit: seconds, 0 for no timeout) (default 0)
  -T, --trim string               trim white space (" \t\r\n") in input (available values: "l" for left,
                                  "r" for right, "lr", "rl", "b" for both side)
      --verbose                   print verbose information
  -V, --version                   print version information and check for update

Examples

  1. Simple run, quoting is not necessary

     # seq 1 3 | rush 'echo {}'
     $ seq 1 3 | rush echo {}
     3
     1
     2
    
  2. Read data from file (-i)

     $ rush echo {} -i data1.txt -i data2.txt
    
  3. Keep output order (-k)

     $ seq 1 3 | rush 'echo {}' -k
     1
     2
     3
    
  4. Timeout (-t)

     $ time seq 1 | rush 'sleep 2; echo {}' -t 1
     [ERRO] run command #1: sleep 2; echo 1: time out
    
     real    0m1.010s
     user    0m0.005s
     sys     0m0.007s
    
  5. Retry (-r)

     $ seq 1 | rush 'python unexisted_script.py' -r 1
     python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
     [WARN] wait command: python unexisted_script.py: exit status 2
     python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
     [ERRO] wait command: python unexisted_script.py: exit status 2
    
  6. Dirname ({/}) and basename ({%}) and remove custom suffix ({^suffix})

     $ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
     dir file_1.txt.gz dir/file
    
  7. Get basename, and remove last ({.}) or any ({:}) extension

     $ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
     dir.d/file.txt dir.d/file file.txt file
    
  8. Job ID, combining field index and other replacement strings

     $ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
     job 1: file.txt file s
    
  9. Capture submatch using regular expression ({@regexp})

     $ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
    
  10. Custom field delimiter (-d)

     $ echo a=b=c | rush 'echo {1} {2} {3}' -d =
     a b c
    
  11. Send multi-lines to every command (-n)

     $ seq 5 | rush -n 2 -k 'echo "{}"; echo'
     1
     2
    
     3
     4
    
     5
    
     # Multiple records are joined with separator `"\n"` (`-J/--records-join-sep`)
     $ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
     1 2
    
     3 4
    
     5
    
     $ seq 5 | rush -n 2 -k -j 3 'echo {1}'
     1
     3
     5
    
  12. Custom record delimiter (-D), note that empty records are not used.

     $ echo a b c d | rush -D " " -k 'echo {}'
     a
     b
     c
     d
    
     $ echo abcd | rush -D "" -k 'echo {}'
     a
     b
     c
     d
    
     # FASTA format
     $ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC"
     >seq1
     actg
     >seq2
     AAAA
     >seq3
     CCCC
    
     $ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" | rush -D ">" 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
     FASTA record 1: name: seq1 sequence: actg
     FASTA record 2: name: seq2 sequence: AAAA
     FASTA record 3: name: seq3 sequence: CCCC
    
  13. Assign value to variable, like awk -v (-v)

     $ seq 1  | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
     Hello, Wei Shen!
    
     $ seq 1  | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
     Hello, Wei Shen!
    
     $ for var in a b; do \
     $   seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
     $ done
     var: a, data: 1
     var: a, data: 2
     var: a, data: 3
     var: b, data: 1
     var: b, data: 2
     var: b, data: 3
    
  14. Preset variable (-v), avoid repeatedly writing verbose replacement strings

     # naive way
     $ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
     read read_2.fq.gz
    
     # macro + removing suffix
     $ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
    
     # macro + regular expression
     $ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
    
  15. Escape special symbols

     $ seq 1 | rush 'echo "I have $100"'
     I have 00
     $ seq 1 | rush 'echo "I have $100"' -q
     I have $100
     $ seq 1 | rush 'echo "I have $100"' -q --dry-run
     echo "I have \$100"
    
     $ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"'
     a       b
    
     $ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
     a
    
  16. Interrupt jobs by Ctrl-C, rush will stop unfinished commands and exit.

     $ seq 1 20 | rush 'sleep 1; echo {}'
     ^C[CRIT] received an interrupt, stopping unfinished commands...
     [ERRO] wait cmd #7: sleep 1; echo 7: signal: interrupt
     [ERRO] wait cmd #5: sleep 1; echo 5: signal: killed
     [ERRO] wait cmd #6: sleep 1; echo 6: signal: killed
     [ERRO] wait cmd #8: sleep 1; echo 8: signal: killed
     [ERRO] wait cmd #9: sleep 1; echo 9: signal: killed
     1
     3
     4
     2
    
  17. Continue/resume jobs (-c). When some jobs fail (due to execution failure, timeout, or cancellation by the user with Ctrl-C), switch on the flag -c/--continue and run again, so that rush saves the successful commands and ignores them in the NEXT run.

     $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
     1
     2
     [ERRO] run cmd #3: sleep 3; echo 3: time out
    
     # successful commands:
     $ cat successful_cmds.rush
     sleep 1; echo 1__CMD__
     sleep 2; echo 2__CMD__
    
     # run again
     $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
     [INFO] ignore cmd #1: sleep 1; echo 1
     [INFO] ignore cmd #2: sleep 2; echo 2
     [ERRO] run cmd #1: sleep 3; echo 3: time out
    

    Multi-line commands (not supported in GNU parallel)

     $ seq 1 3 | rush 'sleep {}; echo {}; \
     echo finish {}' -t 3 -c -C finished.rush
     1
     finish 1
     2
     finish 2
     [ERRO] run cmd #3: sleep 3; echo 3; \
     echo finish 3: time out
    
     $ cat finished.rush
     sleep 1; echo 1; \
     echo finish 1__CMD__
     sleep 2; echo 2; \
     echo finish 2__CMD__
    
     # run again
     $ seq 1 3 | rush 'sleep {}; echo {}; \
     echo finish {}' -t 3 -c -C finished.rush
     [INFO] ignore cmd #1: sleep 1; echo 1; \
     echo finish 1
     [INFO] ignore cmd #2: sleep 2; echo 2; \
     echo finish 2
     [ERRO] run cmd #1: sleep 3; echo 3; \
     echo finish 3: time out
    

    Commands are saved to the file (-C) right after they finish, so we can check the finished jobs:

     grep -c __CMD__ successful_cmds.rush
    
  18. A comprehensive example: downloading 1K+ pages given by three URL list files using phantomjs save_page.js (some page contents are dynamically generated by JavaScript, so wget does not work). Here I set the maximum number of jobs (-j) to 20, each job has a maximum running time (-t) of 60 seconds and 3 retry chances (-r). The continue flag -c is also switched on, so we can continue unfinished jobs. Luckily, it's accomplished in one run 😄

     $ for f in $(seq 2014 2016); do \
     $    /bin/rm -rf $f; mkdir -p $f; \
     $    cat $f.html.txt | rush -v d=$f -d = 'phantomjs save_page.js "{}" > {d}/{3}.html' -j 20 -t 60 -r 3 -c; \
     $ done
    
  19. A bioinformatics example: mapping with bwa, and processing result with samtools:

     $ tree raw.cluster.clean.mapping
     raw.cluster.clean.mapping
     ├── M1
     │   ├── M1_1.fq.gz -> ../../raw.cluster.clean/M1/M1_1.fq.gz
     │   ├── M1_2.fq.gz -> ../../raw.cluster.clean/M1/M1_2.fq.gz
     ...
    
     $ ref=ref/xxx.fa
     $ threads=25
     $ ls -d raw.cluster.clean.mapping/* \
         | rush -v ref=$ref -v j=$threads \
             'bwa mem -t {j} -M -a {ref} {}/{%}_1.fq.gz {}/{%}_2.fq.gz > {}/{%}.sam; \
             samtools view -bS {}/{%}.sam > {}/{%}.bam; \
             samtools sort -T {}/{%}.tmp -@ {j} {}/{%}.bam -o {}/{%}.sorted.bam; \
             samtools index {}/{%}.sorted.bam; \
             samtools flagstat {}/{%}.sorted.bam > {}/{%}.sorted.bam.flagstat; \
             /bin/rm {}/{%}.bam {}/{%}.sam;' \
             -j 2 --verbose -c -C mapping.rush
    

    Since {}/{%} appears many times, we can use a preset variable (macro) to simplify it:

     $ ls -d raw.cluster.clean.mapping/* \
         | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
             'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
             samtools view -bS {p}.sam > {p}.bam; \
             samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
             samtools index {p}.sorted.bam; \
             samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
             /bin/rm {p}.bam {p}.sam;' \
             -j 2 --verbose -c -C mapping.rush
    

Special Cases

  • Shell grep returns exit code 1 when no matches are found, and rush treats that as a failed command. Please use grep foo bar || true instead of grep foo bar.

      $ seq 1 | rush 'echo abc | grep 123'
      [ERRO] wait cmd #1: echo abc | grep 123: exit status 1
      $ seq 1 | rush 'echo abc | grep 123 || true'
    

Contributors

Main contributors:

Other contributors

Acknowledgements

Special thanks to @brentp and his gargs, from which rush borrows some ideas.

Thanks to @bburgin for his contributions to improving child process management.

Contact

Create an issue to report bugs, propose new functions or ask for help.

License

MIT License


rush's People

Contributors

0xhunster, bburgin, howeyc, sam0delkin, shenwei356, vmikk


rush's Issues

Bash conditionals inside

Hi Shen, first of all, thank you very much for rush. I'm a GNU parallel heavy user, and now I'm migrating to rush, running away from perl =]

I normally execute something like this

for PATH in $(<file.txt); do([[ -d ./$PATH ]] && echo $PATH); done

or

cat file.txt | parallel '[[ -d ./{} ]] && echo {}'

but in rush I get the following:

14:24:23.705 [ERRO] wait cmd #10: [[ ! -d ./example1 ]] && echo 1: exit status 1

with command

rush --jobs 1 '[[ ! -d ./{} ]] && echo {}'

Do you have any idea what could be happening here?

Thank you very much!
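
One possible explanation, based on the "Special Cases" section above (a hedged guess, not an official answer): when the [[ ]] test is false, the whole command exits with status 1, and rush reports it as a failed command, just like grep with no matches. Appending || true suppresses the non-zero status:

cat file.txt | rush '[[ -d ./{} ]] && echo {} || true'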

The '>' symbol in the FASTA file is being interpreted as a redirect command.

Hello, I am new to bioinformatics and recently discovered that Rush offers support for custom-defined variables and advanced embedded replacement strings. I have realized the convenience of using single-line commands to process sequences in a shell environment. However, while learning through examples, I encountered an issue when trying to execute the following command:
echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC"|rush
It gets the right result, but when I try:
echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC"|rush echo {}
As shown in the picture, this command only prints out the sequence lines, but the description line is not output, instead generating an empty file with the description line as the file name.
So I tried the following command again and the result was the same:
It appears that the issue I encountered is related to the interpretation of the redirect command (>) by the echo command. Logically, enclosing the redirect command in quotation marks should prevent this problem. Additionally, I tried using the parallel command and obtained the correct result.
Is the situation described above a result of my incorrect usage of Rush?
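
One possible workaround (a sketch following the quoted "{}" usage in example 11 of the usage section, not verified for this exact case): quote the replacement string inside the command, so that > in the data is treated as literal text by the child shell rather than as a redirection:

echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" | rush 'echo "{}"'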

How to run two or more commands in parallel?

Prerequisites

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

How to run 3 commands in parallel? Sync example:
$ echo "Hi!"; echo "House!"; echo "Moon" or
$ app1; app2; app3
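
One possible approach, mirroring the run.sh example further down this page (a sketch, untested here): feed the commands themselves as records, split them on ';' with -D, and run each record with {}:

echo 'echo "Hi!"; echo "House!"; echo "Moon"' | rush -D ';' -T b '{}'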

error "echo %TASKS% | rush zip -@ {}.zip < {}_input.txt"

I want to run two zips in parallel on Windows, each with its own input file list:
I have debug_input.txt and release_input.txt in the same directory.

Here is how I write the code:

Set TASKS=debug release
echo %TASKS% | rush zip -@ {}.zip < {}_input.txt -D " " --dry-run

but it shows:
The system cannot find the file specified.

if I run
echo %TASKS% | rush zip -@ {}.zip -D " " --dry-run
it shows
unknown shorthand flag: '@' in -@

looks like I need to escape the '@', so I tried

echo %TASKS% | rush zip -@ {}.zip -D " " --dry-run
still
unknown shorthand flag: '\' in -@

How do I write this rush command correctly? Thanks!
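
One likely cause (a guess, not a confirmed answer): without quotes around the command template, cmd.exe applies the < redirection and rush's own flag parser sees -@ before the command is ever run. Quoting the whole command, as in the README examples, may help:

Set TASKS=debug release
echo %TASKS% | rush -D " " --dry-run "zip -@ {}.zip < {}_input.txt"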

How would I run this GNU parallel command with Rush?

Like the title says: how would I run this GNU parallel command with Rush on Windows? The idea is simple: to run bcftools mpileup and bcftools call in parallel.

parallel 'bcftools mpileup -Ou -f ref.fa -r {} in.bam | bcftools call -m -v -Oz -o {}.vcf.gz' ::: {1..22} X Y
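
One possible translation (a sketch, not tested): rush takes its records from STDIN instead of GNU parallel's ::: arguments, so the region list can be generated first and piped in; the command itself stays the same:

(seq 1 22; echo X; echo Y) | rush 'bcftools mpileup -Ou -f ref.fa -r {} in.bam | bcftools call -m -v -Oz -o {}.vcf.gz'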

Short option for --dry-run

Prerequisites

rush 0.5.2

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

Add a short option, e.g. -D, that executes the same function as --dry-run.
--dry-run is a frequently used option, so it would be nice to have a short form.
By the way, rush is a very nice tool for bioinformatics guys. 🤩🤩🤩
You can add a "buy me a coffee" link with WeChat or something else. 🥳🥳

  • describe the problem
  • provide a reproducible example

{%.}, {%:}, basename without extension is just {/.} in GNU parallel

Hey, your software looks great, I'll use it soon.
I just noticed that the Readme/Documentation says this isn't supported in GNU parallel, which I think isn't true (except that parallel only removes periods by default):

Combinations (Combinations of 3+ replacement strings not supported in GNU parallel):
{%.}, {%:}, basename without extension
{2.}, {2/}, {2%.}, manipulate nth field

The first is achieved in GNU parallel simply using {/.}. You can also manipulate as many nth fields as you want:

$ VAR1=/house/inthe/tree/plays.to.mo.rrow
$ VAR2=/hey/hop/hip
$ VAR3=/up/down/left.right
$ parallel  --plus "echo {2/} {1/..} {3.}" ::: $VAR1 ::: $VAR2 ::: $VAR3
hip plays.to /up/down/left

(With --plus I also removed two periods from field {1}.)

How to understand which log belongs to which input parameter?

Hello there,

First of all, thank you; rush is a great problem-solving tool.
We are trying to use rush for multi-cluster deployments on Kubernetes and other internal deployments in parallel.

I use rush with this pattern
echo -e "rollout status deployment/name-n xyznamespace\nrollout status deployment/name-n abcnamespace" | ./rush 'kubectl {}'
This pattern works without any problem in rush. But with 2 or 3 inputs, the logs get complicated and we can't tell which logs belong to which input.

For example
Can we add the input parameter to the beginning of the log, like this:
[1. parameter] status is waiting
[3. parameter] status is succeed
[2. parameter] status is failed

What can be done to fix this problem? Can we achieve this with the current version of rush?
If development is needed, I would be very happy to contribute.
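
One option with the current version (a sketch based on the -I/--immediate-output examples elsewhere on this page, with a hypothetical deploy_cmds.txt holding one kubectl argument string per line): -I prefixes every output line with (job/record/line) counters, and the input itself can be echoed at the start of each command:

cat deploy_cmds.txt | ./rush -I 'echo "[job {#}] {}"; kubectl {}'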

Better name?

I wanted to try rush locally, so I installed rush (via nix-shell -p rush), but I was surprised to get another program also called rush, maintained by GNU: https://www.gnu.org/software/rush/, which is a completely different program (a shell).

To properly package rush in other distributions, it would be great to have a name that does not collide with existing programs, especially the ones from the GNU project.

TODO

  • add example of -v
  • implement retry interval
  • add more examples on bioinformatics
  • do not send empty data
  • support continue
  • test more in windows
  • avoid mixed lines from multiple processes, e.g. the first half of a line is from one process and the last half of the line is from another process.
  • replacement string {^suffix} for removing suffix
  • add flag --eta

{@regexp} Question

What is {@regexp}, capture submatch using regular expression? Do you have documentation about the regular expression syntax recognized by rush? Thanks
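
A short illustration copied from example 8 of the README above; rush is written in Go, so the pattern is presumably handled by Go's regexp package (RE2 syntax), though that is an assumption rather than something stated in the docs:

echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
read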

Equivalent argument of GNU xargs -r (--no-run-if-empty)

Hi! Pretty much the title. As the repository doesn't have a Discussions page, I'm asking here.

As I want to replace my utils with Go equivalents, I want to replace xargs with rush but I can't seem to find the equivalent option. -c for continue doesn't seem like it, besides the fact that it creates a file, which is something I don't want.

Thanks in advance!
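
A hedged observation, based on example 12 above ("empty records are not used"): rush appears to skip empty records, so empty input simply launches no jobs, similar in effect to xargs -r:

# expected to print nothing: no records, hence no jobs
printf '' | rush 'echo got {}'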

retry-interval a fraction of a second?

This is quite a powerful tool. Thanks for it! I'm using it to do transfers on s3. While setting up retries I noticed that the --retry-interval is a number of seconds, and checked if fractions of a second are allowed using GitHub search, and found it's restricted to integers. I think either the argument could be made to support decimals (0.2 for 200ms) or an additional mutually exclusive argument for the number of milliseconds before retry could be added. I can make a PR, thought I'd ask how you'd like it designed first though.

Is it possible to escape single quotes?

I'm trying to use rush to parallelize this command that adds a suffix to FASTA headers, in order to remove possible duplications:

The command is as follows:

pigz -d -c MYCOCOSM_TRANSCRIPTOMES/Pooled_transcriptome.part_001.fna.gz | \
awk -v OFS="" '{if($0 ~ /^>/) {x[$0]++ ; print $0,"|",x[$0]} else print $0}' | \
seqkit seq -w 0 -u | pigz -3 > MYCOCOSM_TRANSCRIPTOMES_2/Pooled_transcriptome.part_001.fna.gz

Because of the ' in awk, this command won't work in rush. I can't replace it with double quotes either because I get the following error:

$ cat list.txt | rush -d 'pigz -d -c MYCOCOSM_TRANSCRIPTOMES/{} | awk "{if($0 ~ /^>/) {x[$0]++ ; print $0 "|" x[$0]} else print $0}"'

14:08:23.319 [ERRO] compile field delimiter: error parsing regexp: invalid nested repetition operator: `++`

Is there a way of escaping the single quotes? \' didn't work.
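
One common workaround (a sketch, not specific to rush): move the awk program into a file, so no single quotes need to appear inside the rush command at all; add_suffix.awk below is a hypothetical file containing the awk program shown above. Note also that in the failing command, -d is given the whole pipeline as its value, which is why rush tries to compile it as a field-delimiter regular expression.

cat list.txt | rush 'pigz -d -c MYCOCOSM_TRANSCRIPTOMES/{} | awk -v OFS="" -f add_suffix.awk | seqkit seq -w 0 -u | pigz -3 > MYCOCOSM_TRANSCRIPTOMES_2/{}'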

Please Update the new feature of progress bar

Hi there,
I hope you are doing well. Rush is a seriously good tool for parallel jobs, with many features. It would be great if there were a progress bar; it would be nice to see how many lines have been processed. This feature is included in the Interlace script (https://github.com/codingo/Interlace) and is really awesome. Do you have any plans to add it?
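
For what it's worth, the flag list above already includes --eta ("show ETA progress bar"); a minimal sketch of using it:

seq 1 100 | rush --eta 'sleep 1; echo {}' > /dev/null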


Unable to run executable within a folder

Prerequisites

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

  • describe the problem

I'm trying to run a series of scripts in parallel as a Git pre-push script (code linting, etc), but I'm having trouble getting anything to run.

  • provide a reproducible example
/c/dev/bash-scripts/binaries/rush --verbose -I -D ';' ' {} ' <<- "DOC"
    venv/scripts/python manage.py makemigrations --check --dry-run;
DOC

Gives this output:

+ /c/dev/bash-scripts/binaries/rush --verbose -I -D ';' ' {} '
09:55:59.801 [INFO] start cmd #1: venv/scripts/python manage.py makemigrations --check --dry-run
(1/1/1): 'venv' is not recognized as an internal or external command,
(1/1/2): operable program or batch file.
09:55:59.851 [INFO] finish cmd #1 in 49.7856ms: venv/scripts/python manage.py makemigrations --check --dry-run
09:55:59.851 [ERRO] wait cmd #1: venv/scripts/python manage.py makemigrations --check --dry-run : exit status 1

I can, of course, run this command by itself without issue:

$ venv/scripts/python manage.py makemigrations --check --dry-run
No changes detected

What am I doing wrong here?
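
A possible fix, following example 15 in the usage section above ("run a command with relative paths in Windows, please use backslash as the separator"); a sketch, not verified in this Git-hook setup:

/c/dev/bash-scripts/binaries/rush --verbose -I -D ';' ' {} ' <<- "DOC"
    venv\scripts\python manage.py makemigrations --check --dry-run;
DOC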

Job count appears to sense CPU count

rush v0.4.2

I noticed that on an N-core machine, only N jobs spawn by default. The help documentation for -j suggests that 16 is the default. Not sure if this is a feature or a bug, but I thought it was worth documenting.

~/sleepy.sh

#!/bin/bash

COUNT=0

while [[ "$COUNT" -lt 3 ]]; do
  echo "job ${1} count is ${COUNT}"
  COUNT=$((COUNT + 1))
  sleep 1
done
echo "done"

seq 0 3 | rush --immediate-output '~/sleepy.sh {}'

2-core machine:

(2/1/1): job 1 count is 0
(1/1/1): job 0 count is 0
(1/1/2): job 0 count is 1
(2/1/2): job 1 count is 1
(2/1/3): job 1 count is 2
(1/1/3): job 0 count is 2
(1/1/4): done
(2/1/4): done
(4/1/1): job 3 count is 0
(3/1/1): job 2 count is 0
(3/1/2): job 2 count is 1
(4/1/2): job 3 count is 1
(4/1/3): job 3 count is 2
(3/1/3): job 2 count is 2
(4/1/4): done
(3/1/4): done

Similarly, on a 4-core machine I see 4 simultaneous jobs, and so on. -j N totally overrides this (up to a point; then I ran out of file descriptors :p). But it stumped me for a bit, since the directions implied 16 was the default.

Nice tool btw, absolutely love it. Way easier to deploy quickly than parallel, I just copy the bin and I'm ready to rock and roll. :)

User defined shell functions

Prerequisites

  • make sure you're using the latest version by running rush -V
rush v0.5.0

Checking new version...
You are using the latest version of rush

Describe your issue

I'm trying to use rush from within a shell script with a user-defined bash function, but I am getting the command not found: XYZ error.

test.sh

#!/bin/bash

run() {
    echo $1
}

seq 5 | rush run {}

>>> ./test.sh
zsh:1: command not found: run
zsh:1: command not found: run
18:01:27.372 [ERRO] wait cmd #3: run 3: exit status 127
zsh:1: command not found: run
18:01:27.372 [ERRO] wait cmd #4: run 4: exit status 127
18:01:27.373 [ERRO] wait cmd #2: run 2: exit status 127
zsh:1: command not found: run
zsh:1: command not found: run
18:01:27.373 [ERRO] wait cmd #1: run 1: exit status 127
18:01:27.374 [ERRO] wait cmd #5: run 5: exit status 127
  • describe the problem
  • provide a reproducible example
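
A common workaround with xargs/parallel-style tools (a sketch, assuming bash, and not verified with rush): export the function and call bash explicitly, so the shell that rush spawns can see it:

#!/bin/bash

run() {
    echo "$1"
}
export -f run

seq 5 | rush 'bash -c "run {}"'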

Suggestion to improve documentation with practical examples.

I just gotta say, this program is pretty cool, but hard to figure out. The documentation does not make it easy. Please include more practical examples, not just echos. Here's something I'm using it for:

Starting two (or more) azure cloudfunctions.
run.sh

#!/bin/bash
echo '
    cd operator_a && func start -p 7071; 
    cd service_worker_a && func start -p 7072
    ' \
    | rush --immediate-output -k -T b -D ';'   ' {} '

-D = delimiter (;)
-T = trim both sides of input (trying to have a neat script)
--immediate-output = only way to get output
-k = keep order (probably not important?)

ZScaler detects rush as a virus

Prerequisites

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

  • describe the problem
  • provide a reproducible example

Download is blocked for users protected by ZScaler - can you please work with them to get off the blacklist?

error handling

Not a bug but a request.

I would like to know the error status of all parallel jobs as a list that I can inspect in a shell to determine which has failed.

Also rush should exit with non-zero error code for:

  1. if any of the parallel jobs has failed.
  2. If all parallel jobs have failed, i.e., if any of the jobs has succeeded, rush reports zero

Is this possible using the current implementation? My reading of the docs indicates it is not.
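
Partially, it seems: the flag list above includes --propagate-exit-status (default true), so rush's own exit code reflects child failures; how it distinguishes "any failed" from "all failed" is not documented here, so treat this sketch as something to verify rather than a guarantee:

seq 1 3 | rush 'test {} -lt 3'   # job 3 exits non-zero
echo "rush exit status: $?"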

Support reading records via stdin

Hi, great tool! I like all of your *kit programs.

One thing I use a lot in GNU parallel is the --pipe option, where the records are divided into blocks and provided to the commands via stdin. This is very useful if single commands work on a large number of records and stdin is better than command-line arguments with their size restrictions. rush can use an explicit number of records, which I sometimes prefer and which GNU parallel cannot do, because its block size is defined by (approximate) data size for performance reasons.

Is there any chance this feature makes it into rush (I couldn't find it)?

I'm aware that this kind of circumvents the whole custom field and parameter assignment part, but maybe you can fit it smoothly by using a BASH-like named pipe syntax to turn records and fields into virtual files using fifos. For instance

rush -n 1000 'command < <{2}' < records.txt

could provide the second field of records.txt as a file. The syntax should, of course, not clash with common Shell syntax. This example was just for illustration purposes.

Best,
Johannes

Fatal mistake: rush may break order and cause broken lines

When using rush to deal with a SAM file, I find:
The -k parameter does not work, and complete original lines may be broken. To reproduce, you may run the command below with the attached files example.tar.gz:

head -n7 times | cut -f1 | rush "grep \"{}\" secondary_mapped.sam >> most.sam" -k

When the --dry-run parameter is given, I get the commands listed as below:
See the order? It is the same as in the file times used by rush:
But the order comes out wrong

when I remove --dry-run:
Also, lines may be broken by rush:

But when I use only one thread (-j 1) at a time, everything is OK.

head -n7 times | cut -f1 | rush "grep \"{}\" secondary_mapped.sam >> most.sam" -k -j1
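
One likely explanation and workaround (a sketch, not an official answer): with more than one job, several grep processes append to most.sam at the same time, so the interleaving happens inside the shared output file, beyond rush's control; -k only orders what rush itself prints to STDOUT. Letting rush collect the output and redirecting once avoids this:

head -n7 times | cut -f1 | rush -k 'grep "{}" secondary_mapped.sam' > most.sam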

Call for a new feature: native support for Perl-like commands

Hi Zhua Brother 😄 ,

As is well known, short shell scripts like Perl one-liners, awk, and other single-line commands are convenient for bioinformatics work. It takes only a little time to write one (about 50 seconds to several minutes), and then we call rush with these commands embedded for batch processing.
But up to now, we have needed to escape the $ character again and again (it may occur more than 10 times in a one-liner), for instance:

grep ">" ../databases/2clusters.fa | cut -c2- | rush "perl -lanE'(\$x0) = map{/X0:i:(\d+)/}\$_; (\$x1) = map{/X1:i:(\d+)/}\$_; say if \$F[2] eq qq[{}] && \$x0 == 1 && \$x1 == 0' mapped.sam > {}.sam"

All $s, whether they represent the special variable $_ or an array slice, need to be escaped. Only in this way can we get the right statements with --dry-run for preview.
Also see this one:

all=`wc -l < ../mapped.sam`; ls | grep sam | rush "echo \"{.}\t\$(printf '%0.2f' \$(echo \"scale=2; \`wc -l < {}\`*100/$all\" | bc))%\t(\`wc -l < {}\`/$all)\""

Sometimes we need to get the return value of a Linux command like wc, but even though we run rush with the --dry-run switch, that command is still run at that time. So we also need to escape these back-quotes manually, which finally makes the combined command hard to read.
Also, rush has no support for the Eskimo symbol (}{) yet. Indeed, most exceptions are caused by braces.
The }{ symbol is so important because it helps us jump out of the while loop implicitly when we use the -n switch in a Perl one-liner. The right part of the braces acts as the end of the while-loop block, while the left part plays the role of an anonymous block outside the while block. So in this new block, we can summarize the hash, or do other operations that we only want to perform once. See this example:

echo -e 'Sample with Virus\tRead number'; cat fq_ref | rush -k "perl -F'\t' -lanE'\$fuck{\$F[0]}++; END{say qq{\$ARGV\t@{[~~keys %fuck]}}}' {1}_{2}.sam"

If it were not for use in rush, the Perl one-liner could be:

perl -F'\t' -lanE'$fuck{$F[0]}++; }{ say qq{$ARGV\t@{[~~keys %fuck]}}' {1}_{2}.sam

The Eskimo symbol would work well, and we would not need the heavy END{} any more.
Braces can also be the boundary of some quoting constructs (plain "" in one-liners may cause panics), like qq{string in double-quotes}, qw{string in single-quotes}, qx{regex} and so on. Although Perl's syntax is flexible enough that we can replace braces with other paired symbols like **, [], etc., braces are still the most frequently used choice for many people.

So could you bring us a new feature that supports auto-escaping these symbols ($, back-quote, and {}; placing a real back-quote in markdown here would result in broken formatting) with a new switch like --perl (of course, the name is up to you 💯)?

print ">" using rush

Hello:

How can I print ">" or other special characters with a rush echo command? For example:

seq 1 100 |rush 'echo cat {} >n{}'

I want output similar to this:

cat 1 >n1
cat 2 >n2
cat 3 >n3
........
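
One way (a sketch): wrap the text in double quotes inside the command, so the child shell treats > as literal text rather than as a redirection:

seq 1 100 | rush 'echo "cat {} >n{}"'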

rush -e does not kill jobs

Example:
seq 1 10 | rush -j 10 "sleep {} && exit 1" -e

With -e, rush will exit immediately, as expected, as soon as the first error is encountered, but the rest of the jobs keep running until completion, which is a problem in my use case.

Note that this is a Windows-specific issue, and the example uses the familiar seq and sleep, which I obtained through MSYS2.

variables will not be interpolated

Prerequisites

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

  • describe the problem
    Some variables will not be interpolated if the command contains a {.

I need this because: https://stackoverflow.com/questions/24597818/exit-with-error-message-in-bash-oneline

  • provide a reproducible example

Expected result:

seq 1 3 | rush -k '{ echo "{} {A}" && echo "{} {A}"; }' -v A=4
1 4
1 4
2 4
2 4
3 4
3 4

Actual result:

seq 1 3 | rush -k '{ echo "{} {A}" && echo "{} {A}"; }' -v A=4
{} 4
1 4
{} 4
2 4
{} 4
3 4

Workaround (use a sub shell):

seq 1 3 | rush -k '( echo "{} {A}" && echo "{} {A}"; )' -v A=4
1 4
1 4
2 4
2 4
3 4
3 4

Rush fails to create a subprocess when running sambamba in parallel

Prerequisites

I'm sure that I'm using the latest version.
But I'm not sure whether it is caused by a mistake of mine, as I'm a newbie to this software. If so, please forgive me.

Describe your issue

I have a list file fq_ref containing parts of filenames, like:

SRR638714	J02428
SRR638714	FJ150422
...

I've worked well with rush and this list when running commands like bowtie-build and bowtie.
But this time, when I tried to run sambamba in parallel (or added -j 1 to perform just one command at a time), something went wrong.

The command I used is:

cat fq_ref | rush 'sambamba view -S -F "not unmapped" {1}_{2}.all > {1}_{2}.sam'

And it produces commands like the ones below with the --dry-run parameter:

sambamba view -S -F "not unmapped" SRR638719_J02428.all > SRR638719_J02428.sam
sambamba view -S -F "not unmapped" SRR638716_J02428.all > SRR638716_J02428.sam
sambamba view -S -F "not unmapped" SRR638714_FJ150422.all > SRR638714_FJ150422.sam
sambamba view -S -F "not unmapped" SRR638717_J02428.all > SRR638717_J02428.sam
sambamba view -S -F "not unmapped" SRR638717_FJ150422.all > SRR638717_FJ150422.sam

These commands all perform well when I redirect them into a bash shell script and then call bash to execute them one by one.

It seems sambamba works in some special mode (maybe multi-threaded), which may cause trouble when rush tries to build a subprocess by forking a shell.

I also caught some exception information mixed from both bash and the goroutine. For some unknown reasons (the education network acting up and lousy Nankai University, as the Chinese sayings go), I cannot upload files in any format to GitHub anymore, so I just paste them all here as a comment; for other details, you can discuss with me on QQ 😆

Bun support?

Just wondering if there is a plan to add bun's package manager as a supported option in rush, alongside pnpm/npm/yarn?

Occasional crash when using the immediate-output flag

Prerequisites

  • make sure you're using the latest version by running rush -V
    $ rush -V
    rush v0.5.3

Checking new version...
You are using the latest version of rush

  • read the usage (rush -h) and examples
    Done

Describe your issue

  • describe the problem
    When using the immediate output flag, rush will occasionally fail (but the processes it called will continue to run).

The error looks like the below.

panic: runtime error: slice bounds out of range [7998:7994]

goroutine 81 [running]:
github.com/shenwei356/rush/process.(*ImmediateLineWriter).WritePrefixedLines(0xc0003d0000, {0xc0003ba000, 0x1f3e}, 0xc0000ca008)
        /home/shenwei/go/src/github.com/shenwei356/rush/process/process.go:378 +0x625
github.com/shenwei356/rush/process.ImmediateWriter.Write(...)
        /home/shenwei/go/src/github.com/shenwei356/rush/process/process.go:422
io.copyBuffer({0x1408100, 0xc00038c070}, {0x14084e0, 0xc0003e6008}, {0x0, 0x0, 0x0})
        /usr/local/go/src/io/io.go:429 +0x204
io.Copy(...)
        /usr/local/go/src/io/io.go:386
os/exec.(*Cmd).writerDescriptor.func1()
        /usr/local/go/src/os/exec/exec.go:560 +0x3a
os/exec.(*Cmd).Start.func2(0x0?)
        /usr/local/go/src/os/exec/exec.go:717 +0x32
created by os/exec.(*Cmd).Start
        /usr/local/go/src/os/exec/exec.go:716 +0xab3
  • provide a reproducible example

Unfortunately, the python script that rush is calling is my company's IP so I cannot provide it to support repro. I can, however, provide the rush command that I'm using if that helps.

cat parallel.txt | rush 'python main.py find_results/{}.txt extraction_results/{} debug' -c -I --eta

Note that it seems to work well for a while after running this command before failing at a seemingly random point in time.

wat?

why is it like this?

# rush update
You are not permitted to execute this command.
Contact the systems administrator for further assistance.
# whoami
root

Please update the go version in go.mod

Prerequisites

  • make sure you're using the latest version by running rush -V
  • read the usage (rush -h) and examples

Describe your issue

Please update the go version here.

Some modules require a higher Go version, and the build of the FreeBSD port fails because go-1.17 is required.
