otiai10 / hotsub

Command line tool to run batch jobs concurrently with ETL framework on AWS or other cloud computing resources

Home Page: https://hotsub.github.io/

License: GNU General Public License v3.0

Go 84.81% Shell 11.53% Common Workflow Language 0.43% JavaScript 0.04% Dockerfile 1.82% TeX 1.21% WDL 0.17%
workflow workflow-engine etl-framework bioinformatics cwl cwl-workflow wdl wdl-workflow batch-job aws

hotsub's Introduction

hotsub

The simple batch job driver for AWS and GCP. (Azure and OpenStack support is coming soon.)

hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --aws-ec2-instance-type t2.2xlarge \
  --verbose

It will

  • execute the workflow described in star-alignment.sh
  • for each sample specified in star-alignment-tasks.csv
  • in friend1ws/star-alignment Docker containers
  • on EC2 instances of type t2.2xlarge

and automatically upload the output files to S3 and clean up the EC2 instances when everything is done.
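
For reference, the tasks CSV lists one row per sample. Below is a minimal, hypothetical sketch of such a file, assuming the --env/--input/--output-recursive column-header convention described in the run options further down; the header names, sample names, and bucket paths are illustrative only.

# Hypothetical star-alignment-tasks.csv (illustrative only): column headers
# follow the --env / --input / --output-recursive convention from `hotsub run -h`.
cat > star-alignment-tasks.csv <<'EOF'
--env SAMPLE,--input FASTQ1,--input FASTQ2,--output-recursive OUTDIR
sampleA,s3://my-bucket/fastq/A_1.fq.gz,s3://my-bucket/fastq/A_2.fq.gz,s3://my-bucket/star/A
sampleB,s3://my-bucket/fastq/B_1.fq.gz,s3://my-bucket/fastq/B_2.fq.gz,s3://my-bucket/star/B
EOF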

See Documentation for more details.

Why use hotsub

There are three reasons hotsub was made and why you might want to use it:

  1. No need to set up your cloud on web consoles:
    • Since hotsub uses plain EC2 or GCE instances, you don't have to configure AWS Batch or Dataflow on messy web consoles.
  2. Multiple platforms behind the same command-line interface:
    • You can switch between AWS and GCP with just the --provider option of the run command (you do, of course, need credentials on your local machine); a sketch follows this list.
  3. ExTL framework available:
    • In many bioinformatics cases, the problem is how to handle a common, huge reference genome. hotsub proposes and implements the ExTL framework for this.
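
As a hedged sketch of point 2, the same job could be sent to GCP by swapping the provider flag; --provider, --google-project, and --google-zone are taken from the run options below, while the project ID is a placeholder:

# Hypothetical sketch: the same job on GCP instead of AWS.
hotsub run \
  --provider gcp \
  --google-project my-gcp-project \
  --google-zone asia-northeast1-a \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --verbose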

Installation

Check Getting Started on GitHub Pages

Commands

NAME:
   hotsub - command line to run batch computing on AWS and GCP with the same interface

USAGE:
   hotsub [global options] command [command options] [arguments...]

VERSION:
   0.10.0

DESCRIPTION:
   Open-source command-line tool to run batch computing tasks and workflows on backend services such as Amazon Web Services.

COMMANDS:
     run       Run your jobs on cloud with specified input files and any parameters
     init      Initialize CLI environment on which hotsub runs
     template  Create a template project of hotsub
     help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -V  print the version

Available options for run command

% hotsub run -h
NAME:
   hotsub run - Run your jobs on cloud with specified input files and any parameters

USAGE:
   hotsub run [command options] [arguments...]

DESCRIPTION:
   Run your jobs on cloud with specified input files and any parameters

OPTIONS:
   --verbose, -v                     Print verbose log for operation.
   --log-dir value                   Path to log directory where stdout/stderr log files will be placed (default: "${cwd}/logs/${time}")
   --concurrency value, -C value     Throttle concurrency number for running jobs (default: 8)
   --provider value, -p value        Job service provider, either of [aws, gcp, vbox, hyperv] (default: "aws")
   --tasks value                     Path to CSV of task parameters, expected to specify --env, --input, --input-recursive and --output-recursive. (required)
   --image value                     Image name from Docker Hub or other Docker image service. (default: "ubuntu:14.04")
   --script value                    Local path to a script to run inside the workflow Docker container. (required)
   --shared value, -S value          Shared data URL on cloud storage bucket. (e.g. s3://~)
   --keep                            Keep instances created for computing event after everything gets done
   --env value, -E value             Environment variables to pass to all the workflow containers
   --disk-size value                 Size of data disk to attach for each job in GB. (default: 64)
   --shareddata-disksize value       Disk size of shared data instance (in GB) (default: 64)
   --aws-region value                AWS region name in which AmazonEC2 instances would be launched (default: "ap-northeast-1")
   --aws-ec2-instance-type value     AWS EC2 instance type. If specified, all --min-cores and --min-ram would be ignored. (default: "t2.micro")
   --aws-shared-instance-type value  Shared Instance Type on AWS (default: "m4.4xlarge")
   --aws-vpc-id value                VPC ID on which computing VMs are launched
   --aws-subnet-id value             Subnet ID in which computing VMs are launched
   --google-project value            Project ID for GCP
   --google-zone value               GCP service zone name (default: "asia-northeast1-a")
   --cwl value                       CWL file to run your workflow
   --cwl-job value                   Parameter files for CWL
   --wdl value                       WDL file to run your workflow
   --wdl-job value                   Parameter files for WDL
   --include value                   Local files to be included onto workflow container
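
For the ExTL use case mentioned earlier, a large shared reference can be attached with --shared. The sketch below is hypothetical: only the flags come from the option list above, while the bucket URL and disk size are placeholders.

# Hypothetical sketch: mounting a shared reference genome for all jobs.
hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --shared s3://my-bucket/reference/GRCh37 \
  --shareddata-disksize 128 \
  --verbose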

Contact

To keep things transparent, please ask any questions via the issue tracker:

https://github.com/otiai10/hotsub/issues

hotsub's People

Contributors

aokad, friend1ws, otiai10


hotsub's Issues

A fatal error occurs when the task file is empty (or header only)

The error message is shown below.

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
main.action(0xc420080dc0, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/action.go:35 +0x331
github.com/urfave/cli.HandleAction(0x7d6f40, 0x87d108, 0xc420080dc0, 0xc4201deae0, 0x0)
        /Users/otiai10/proj/go/src/github.com/urfave/cli/app.go:490 +0xd4
github.com/urfave/cli.(*App).Run(0xc420066820, 0xc420068000, 0xc, 0xc, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/urfave/cli/app.go:264 +0x6ac
main.main()
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/main.go:26 +0x209

goroutine 18 [chan receive]:
main.(*Handler).HandleBunch.func2(0xc4201ded20, 0xc4201decc0, 0xc4202c3de0, 0xa7c7b0, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/handler.go:67 +0x94
created by main.(*Handler).HandleBunch
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/handler.go:73 +0x150
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/batch_engine.py", line 54, in execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)

Allow http/https for "--inputs"

AS IS

--inputs and --input-recursive must be either s3://... or gs://...

TO BE

For example, http://raw.github.com/foo/bar.txt should be allowed
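
If this were supported, the prepare step could fall back to a plain HTTP download when the scheme is not s3:// or gs://. A hedged sketch of that idea (an illustration of the request, not current behaviour):

# Hypothetical fallback for http(s) inputs; not what awsub currently does.
case "$INPUT" in
  s3://*) aws s3 cp "$INPUT" /tmp/ ;;
  gs://*) gsutil cp "$INPUT" /tmp/ ;;
  http://*|https://*) curl -fSL -o "/tmp/$(basename "$INPUT")" "$INPUT" ;;
esac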

Metrics

  • CPU usage
  • Memory usage
  • Network I/O
  • Disk I/O
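
As one possible direction (an assumption, not something hotsub does today), all four of these metrics can be sampled per task container with docker stats:

# Sketch only: sample CPU, memory, network I/O and block (disk) I/O for one
# task container; the container name follows the task-naming pattern seen
# elsewhere in these issues and is illustrative.
docker stats --no-stream star-alignment-task00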

Error creating machine

[paplot-task00] Creating docker machine
paplot-task00-NzhjNGI5: failed to create machine: exit status 1: Running pre-create checks...
Creating machine...
(paplot-task00-NzhjNGI5) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Error creating machine: Error running provisioning: Error running apt-get update: ssh command error:
command : sudo apt-get update
err     : exit status 100
output  : Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Splitting up /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_InRelease into data and signature failedErr:2 http://archive.ubuntu.com/ubuntu xenial InRelease
  Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)
Get:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [435 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:6 http://security.ubuntu.com/ubuntu xenial-security/main Translation-en [189 kB]
Get:7 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [7,224 B]
Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [710 kB]
Get:9 http://security.ubuntu.com/ubuntu xenial-security/restricted Translation-en [2,152 B]
Get:10 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [200 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/universe Translation-en [102 kB]
Get:12 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [3,208 B]
Get:13 http://security.ubuntu.com/ubuntu xenial-security/multiverse Translation-en [1,408 B]
Get:14 http://archive.ubuntu.com/ubuntu xenial-updates/main Translation-en [295 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [7,560 B]
Get:16 http://archive.ubuntu.com/ubuntu xenial-updates/restricted Translation-en [2,272 B]
Get:17 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [579 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [234 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [16.2 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse Translation-en [8,052 B]
Get:21 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [4,840 B]
Get:22 http://archive.ubuntu.com/ubuntu xenial-backports/main Translation-en [3,220 B]
Get:23 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [6,612 B]
Get:24 http://archive.ubuntu.com/ubuntu xenial-backports/universe Translation-en [3,768 B]
Reading package lists...
E: GPG error: http://archive.ubuntu.com/ubuntu xenial InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)

The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag.


1 task(s) failed with errors
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/site-packages/genomon_pipeline_cloud/batch_engine.py", line 54, in execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/local/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['awsub', '--aws-iam-instance-profile', 'testtest', '--verbose', '--aws-ec2-instance-type', 't2.small', '--script', '/usr/local/lib/python2.7/site-packages/genomon_pipeline_cloud/script/paplot.sh', '--image', 'genomon/paplot', '--tasks', '/work/genomon_pipeline_cloud/tmp/paplot-tasks.tsv']' returned non-zero exit status 1

Request limit exceeded

Launching many EC2 instances with awsub can cause a request-limit-exceeded error from AWS.
Note that this error has not actually been observed with awsub yet.

Unable to run docker-machine provisioning

I occasionally get this error when trying to execute awsub.

sv-filt-tasks-ubuntu-20180213-03410812-mtk4yjrm: failed to create machine: exit status 1: Running pre-create checks...
Creating machine...
(sv-filt-tasks-ubuntu-20180213-03410812-mtk4yjrm) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err     : exit status 1
output  : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.


1 task(s) failed with errors
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/batch_engine.py", line 60, in seq_execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['awsub', '--aws-iam-instance-profile', 'testtest', '--verbose', '--debug-sleep', '2000', '--aws-ec2-instance-type', 't2.large', '--script', '/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/script/sv-filt.sh', '--image', 'genomon/sv_detection', '--tasks', '/home/ubuntu/tools/genomon_pipeline_cloud-0.1.0/tmp/sv-filt-tasks-ubuntu-20180213-034108.tsv']' returned non-zero exit status 1

Error when the specified bucket for output does not exist

upload failed: tmp/test180204/star/Hela_wt_dox-/Hela_wt_dox-.Chimeric.out.sam to s3://awsub-test-friend1ws2/test180204/star/Hela_wt_dox-/Hela_wt_dox-.Chimeric.out.sam An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
[star-alignment-task00] [FINALIZE] Successfully uploaded: s3://awsub-test-friend1ws2/test180204/star/Hela_wt_dox-

I think we have two options.

  1. First, check whether all the buckets specified in the output parameters exist. If some of them do not exist, exit with a log message before downloading the input files (a sketch follows this list).
  2. At the upload stage, check whether the specified buckets exist and create them if they do not.
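
A minimal sketch of option 1, assuming the AWS CLI is available where the check would run; the bucket name is the one from the log above:

# head-bucket exits non-zero when the bucket is missing or inaccessible,
# so we can fail fast with a log message before any downloads start.
aws s3api head-bucket --bucket awsub-test-friend1ws2 \
  || { echo "output bucket awsub-test-friend1ws2 does not exist" >&2; exit 1; }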

Keep the exit code of `awsub` somewhere

problem as is

  • When I run the awsub command on a remote VM (for example an AWS EC2 instance used just as a workspace) and the terminal pipe breaks, I can see the PID is still alive, but there is no way to get the exit status code of the awsub command.
  • Of course I can grep through all the lifecycle logs generated by the computing nodes, but it would be more helpful if the final exit code of the whole awsub invocation were recorded somewhere.
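
Until this is built in, one workaround sketch (assumptions: running detached via nohup is acceptable and a status file next to the log is fine; the awsub flags simply mirror the run examples above) is to persist the exit code so it survives a broken terminal pipe:

# Run awsub detached and record its exit status in a file for later inspection.
nohup sh -c 'awsub --script ./task.sh --tasks ./tasks.csv --image ubuntu:14.04; echo $? > awsub.exit' \
  > awsub.log 2>&1 &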

STAR test script is not working

The awsub STAR test does not seem to work correctly at the moment.
When I run the test script after modifying the output directory paths in the task file,
the following message appears:

download failed: s3://hgc-otiai10-test/examples/genomon_rna/db/GRCh37.STAR-2.5.2a/SA to tmp/GRCh37.STAR-2.5.2a/SA [Errno 28] No space left on device

and an empty bam file is copied to S3.
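
Since the failure is "[Errno 28] No space left on device", the --disk-size flag from `hotsub run -h` looks like the relevant knob. A hedged sketch (the 200 GB value is arbitrary):

# Hypothetical rerun with a larger per-job data disk.
hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --disk-size 200 \
  --verbose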

Output logs are garbled again

The garbled output is reproducible with the sources below: install and run genomon_pipeline_cloud as follows and the output logs come out garbled.

[installation]

git clone https://github.com/ken0-1n/genomon_pipeline_cloud.git
cd genomon_pipeline_cloud
pip install . --upgrade

[run]

genomon_pipeline_cloud dna example_conf/sample_awsub_dna.csv s3://kchiba-test-batch/genomon_cloud example_conf/param_dna_awsub.cfg

[output log]

[genomon-qc-tasks-20180207-024034-70661400] &2> + catL /tmp/genomon-resource/_GRCh37/reference/bait/refGene.coding.exon.151207.bed
[genomon-qc-tasks-20180207-024034-70661400] &2> + ・・アケオキカー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ  ュ・
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ 銓褞 ゙タフ ッ鈑ⅰ瀅鉑肄ッ゚ヌメテ雉キッ趺魲蟇粃鴟ッ貮褓螳胥蓚鰀ョ褸・ョアオアイーキョ粢・
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ 韃砌褪゚・ー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ 褸 アケオキカー ュ ー 
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ 蓊゚・アケオキカー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ ・ュアケオキカーフ ッ

Timestamp change because of uploading to S3.

The timestamps of .bam files and .bam.bai files (bam index) can flip when they are uploaded to S3.
This can cause the following warning messages when using these bam and index files in later steps:

[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai

Can we keep the order of the timestamps (bam files first, bam.bai files second)?
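
Until the upload/download order is fixed, a task-side workaround sketch is to bump the index file's mtime after the inputs are staged (an assumption about where this could be done, not current behaviour):

# Make the .bam.bai index at least as new as its .bam so htslib stops warning.
touch -c /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai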

Behavior when the column value of '--input' is empty

If the '--input' column value in the task file is empty, the error below is returned.

mutation-call-task00-YzJjNzIy: failed to prepare input tasks: failed to download input file `` with status code 1, please check output with --verbose option 1 task(s) failed with errors

I would like no error to occur even when the value is an empty string.
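
The desired behaviour, sketched in shell terms (this is an illustration of the request, not awsub's actual code):

# Skip the download step entirely when a task's --input value is empty.
if [ -z "$INPUT" ]; then
  echo "empty --input value: nothing to download"
else
  aws s3 cp "$INPUT" /tmp/
fi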

[FATAL] awsub process died when concurrency is more than (for example) 64

facts

  • When executing awsub with 64 tasks and --concurrency 64
    • awsub command PID died but machines still remain alive

reports

  • Not reproduced with 30 tasks and --concurrency 30
  • Not reproduced with 64 tasks and --concurrency 32

expected

  • The awsub process died or was killed somehow, without the deferred Destroy being run

Generation of log files

For each task, the contents of the standard output and the standard error should be output to files and should be transferred to a specified directory in S3.

This is very helpful especially for debugging.
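
A minimal sketch of the requested behaviour, assuming the AWS CLI is available in the container; the log file names and bucket path are placeholders:

# Capture stdout/stderr per task, upload them, and keep the task's exit code.
./task.sh > stdout.log 2> stderr.log
status=$?
aws s3 cp stdout.log s3://my-log-bucket/task00/stdout.log
aws s3 cp stderr.log s3://my-log-bucket/task00/stderr.log
exit $status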

Too many logs for downloading

Too many download log lines appear in verbose mode when the downloaded file is somewhat large, such as:

[fusionfusion-task03] [PREPARE] &1> Completed 256.0 KiB/3.0 GiB (311.3 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 512.0 KiB/3.0 GiB (617.8 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 768.0 KiB/3.0 GiB (921.7 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 1.0 MiB/3.0 GiB (1.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.2 MiB/3.0 GiB (1.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.5 MiB/3.0 GiB (1.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.8 MiB/3.0 GiB (2.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.0 MiB/3.0 GiB (2.4 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.2 MiB/3.0 GiB (2.7 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.5 MiB/3.0 GiB (3.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.8 MiB/3.0 GiB (3.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.0 MiB/3.0 GiB (3.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.2 MiB/3.0 GiB (3.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.5 MiB/3.0 GiB (4.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.8 MiB/3.0 GiB (4.4 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.0 MiB/3.0 GiB (4.6 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.2 MiB/3.0 GiB (4.9 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.5 MiB/3.0 GiB (5.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.8 MiB/3.0 GiB (5.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.0 MiB/3.0 GiB (5.7 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.2 MiB/3.0 GiB (6.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.5 MiB/3.0 GiB (6.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.8 MiB/3.0 GiB (6.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.0 MiB/3.0 GiB (6.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.2 MiB/3.0 GiB (7.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.5 MiB/3.0 GiB (7.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.8 MiB/3.0 GiB (7.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.0 MiB/3.0 GiB (7.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.2 MiB/3.0 GiB (8.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.5 MiB/3.0 GiB (8.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.8 MiB/3.0 GiB (8.6 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.0 MiB/3.0 GiB (8.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.2 MiB/3.0 GiB (9.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.5 MiB/3.0 GiB (9.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.8 MiB/3.0 GiB (9.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 9.0 MiB/3.0 GiB (9.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 9.2 MiB/3.0 GiB (10.1 MiB/s) with 1 file(s) remaining   
[fusionfusion-task03] [PREPARE] &1> Completed 9.5 MiB/3.0 GiB (10.2 MiB/s) with 1 file(s) remaining   
[fusionfusion-task03] [PREPARE] &1> Completed 9.8 MiB/3.0 GiB (10.5 MiB/s) with 1 file(s) remaining   

This message continues for more than 1000 lines, and
important information for debugging gets buried...

So I would prefer not to generate this kind of long download log.
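
One possible mitigation, assuming the prepare step uses the AWS CLI: its s3 commands can suppress the per-chunk progress lines.

# --only-show-errors hides progress output and keeps only failures in the log;
# the source and destination paths here are placeholders.
aws s3 cp --only-show-errors s3://my-bucket/big-reference.tar /tmp/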

Pass Super Variables to user script container

AS IS

--input FOOBAR
s3://hgc-otiai10-test/foobar

transforms to

/tmp/hgc-otiai10-test/foobar

But the user script should not need to know or care about "/tmp".

TO BE

$AWSUB_ROOT/hgc-otiai10-test/foobar

Super Variables such as:

  • $AWSUB_ROOT
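
A sketch of how a user script could then consume the variable, assuming awsub exports AWSUB_ROOT and preserves the bucket/key layout beneath it:

# The script no longer hard-codes /tmp; it only trusts $AWSUB_ROOT.
FOOBAR="$AWSUB_ROOT/hgc-otiai10-test/foobar"
head "$FOOBAR"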

Separate "machine-create" and "containers up" process from "Create"

as is

  • Both "machines-create" and "containers-up" run inside the job.Create method

problem

  • When --shared is specified, "machines-create" takes a long time because it only starts after the Shared Data Instances are created

to be

  • Setting up the Shared Data Instance and the plain "machines-create" for the jobs should run in parallel
  • After that, inside "containers-up", the information about the Shared Data Instance should be passed to the containers

Check container errors and let it fail correctly

Even when the program ends with errors, awsub always says:
"All * tasks completed successfully!"

For example, in the awsub quick guide this happens even when --aws-iam-instance-profile is not specified and access to S3 fails:

[PREPARE] &2> fatal error: Unable to locate credentials

awsub should check the exit code of each task and report whether each task ended correctly.
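
A minimal sketch of the kind of check intended here (not awsub's actual code); docker wait blocks until the container stops and prints its exit code:

code=$(docker wait star-alignment-task00)   # container name is illustrative
if [ "$code" -ne 0 ]; then
  echo "task star-alignment-task00 failed with exit code $code" >&2
fi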

Security Group Limit Exceeded

The security group limit is exceeded when launching many EC2 instances.

Error creating machine: Error in driver during machine creation: SecurityGroupLimitExceeded: The maximum number of security groups for VPC 'vpc-3311ce57' has been reached.
status code: 400, request id: b2de4bd5-7000-498b-9a3a-acf55af449e5

I think it would be better to create one security group per awsub execution rather than one per EC2 instance.
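
Assuming docker-machine's amazonec2 driver is what creates these instances (the provisioning logs above suggest so), one direction is to point every machine of a single execution at the same pre-named group; the group name here is a placeholder:

# Sketch: all machines of one execution reuse a single security group.
docker-machine create --driver amazonec2 \
  --amazonec2-security-group awsub-run-20180213 \
  paplot-task00-NzhjNGI5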

Output logs are garbled

The output logs on the console are garbled as shown below when running the awsub command.

udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.ann to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.ann
udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.amb to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.amb
udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.fai to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.fai
download: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.genome_size to tmp/_GRCh37/reference/GRCh37/GRCh37.genome_size
・ownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37_noScaffold_noDecoy.interval_list to tmp/_GRCh37/reference/GRCh37/GRCh37_noScaffold_noDecoy.interval_list
ECompleted 119.5 MiB/394.1 MiB (16.8 MiB/s) with 1 file(s) remaining 59.0 KiB/s) with 1 file(s) remaining
ECompleted 237.5 MiB/394.1 MiB (18.8 MiB/s) with 1 file(s) remaining
蔡・砌コ コッッ・韜粃ュュ粃隸趁モ硼裃アーー褓オケイケ゚肛イッ褓肄アョ趁・ 趁モ硼裃アーー褓オケイケ゚肛イッ褓肄アョ趁・
蔡・砌コ コッッ・韜粃ュュ粃隸趁モ硼裃アーー褓オケイケ゚肛イッ褓肄イョ趁・ 趁モ硼裃アーー褓オケイケ゚肛イッ褓肄イョ趁・

Potential conflict of downloaded input file names

It seems that when input files on S3 are downloaded to the /tmp directory,
the directory structure is stripped and only the base names are kept.
So, with the current behavior, when a task needs multiple input files
and the base names of those input files are the same
(e.g., s3://input_seq_otiai10/sequence.txt and s3://input_seq_friend1ws/sequence.txt),
they will conflict.

If my observation is correct, I think the directory structure of the input files should be kept when downloading to the VMs.
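
A sketch of the suggested layout, keeping bucket and key in the local path so identical base names cannot collide; the two objects are the examples from above:

# Preserve the bucket/key hierarchy under /tmp instead of flattening it.
aws s3 cp s3://input_seq_otiai10/sequence.txt /tmp/input_seq_otiai10/sequence.txt
aws s3 cp s3://input_seq_friend1ws/sequence.txt /tmp/input_seq_friend1ws/sequence.txt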
