otiai10 / hotsub

Command line tool to run batch jobs concurrently with ETL framework on AWS or other cloud computing resources

Home Page: https://hotsub.github.io/

License: GNU General Public License v3.0

Go 84.81% Shell 11.53% Common Workflow Language 0.43% JavaScript 0.04% Dockerfile 1.82% TeX 1.21% WDL 0.17%
workflow workflow-engine etl-framework bioinformatics cwl cwl-workflow wdl wdl-workflow batch-job aws

hotsub's Introduction

hotsub

The simple batch job driver for AWS and GCP. (Azure and OpenStack support is coming soon.)

hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --aws-ec2-instance-type t2.2xlarge \
  --verbose

It will

  • execute the workflow described in star-alignment.sh
  • for each sample specified in star-alignment-tasks.csv
  • in friend1ws/star-alignment Docker containers
  • on EC2 instances of type t2.2xlarge

and automatically upload the output files to S3 and clean up the EC2 instances when everything is done.
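
For reference, the tasks CSV lists one row per sample. Below is a minimal, hypothetical sketch of such a file, assuming the --env/--input/--output-recursive column-header convention described in the run options further down; the header names, sample names, and bucket paths are illustrative only.

# Hypothetical star-alignment-tasks.csv (illustrative only): column headers
# follow the --env / --input / --output-recursive convention from `hotsub run -h`.
cat > star-alignment-tasks.csv <<'EOF'
--env SAMPLE,--input FASTQ1,--input FASTQ2,--output-recursive OUTDIR
sampleA,s3://my-bucket/fastq/A_1.fq.gz,s3://my-bucket/fastq/A_2.fq.gz,s3://my-bucket/star/A
sampleB,s3://my-bucket/fastq/B_1.fq.gz,s3://my-bucket/fastq/B_2.fq.gz,s3://my-bucket/star/B
EOF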

See Documentation for more details.

Why use hotsub

There are three reasons hotsub was made and why you might want to use it:

  1. No need to set up your cloud on web consoles:
    • Since hotsub uses plain EC2 or GCE instances, you don't have to configure AWS Batch or Dataflow on messy web consoles.
  2. Multiple platforms behind the same command-line interface:
    • You can switch between AWS and GCP with just the --provider option of the run command (you do, of course, need credentials on your local machine); a sketch follows this list.
  3. ExTL framework available:
    • In many bioinformatics cases, the problem is how to handle a common, huge reference genome. hotsub proposes and implements the ExTL framework for this.
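
As a hedged sketch of point 2, the same job could be sent to GCP by swapping the provider flag; --provider, --google-project, and --google-zone are taken from the run options below, while the project ID is a placeholder:

# Hypothetical sketch: the same job on GCP instead of AWS.
hotsub run \
  --provider gcp \
  --google-project my-gcp-project \
  --google-zone asia-northeast1-a \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --verbose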

Installation

Check Getting Started on GitHub Pages

Commands

NAME:
   hotsub - command line to run batch computing on AWS and GCP with the same interface

USAGE:
   hotsub [global options] command [command options] [arguments...]

VERSION:
   0.10.0

DESCRIPTION:
   Open-source command-line tool to run batch computing tasks and workflows on backend services such as Amazon Web Services.

COMMANDS:
     run       Run your jobs on cloud with specified input files and any parameters
     init      Initialize CLI environment on which hotsub runs
     template  Create a template project of hotsub
     help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -V  print the version

Available options for run command

% hotsub run -h
NAME:
   hotsub run - Run your jobs on cloud with specified input files and any parameters

USAGE:
   hotsub run [command options] [arguments...]

DESCRIPTION:
   Run your jobs on cloud with specified input files and any parameters

OPTIONS:
   --verbose, -v                     Print verbose log for operation.
   --log-dir value                   Path to log directory where stdout/stderr log files will be placed (default: "${cwd}/logs/${time}")
   --concurrency value, -C value     Throttle concurrency number for running jobs (default: 8)
   --provider value, -p value        Job service provider, either of [aws, gcp, vbox, hyperv] (default: "aws")
   --tasks value                     Path to CSV of task parameters, expected to specify --env, --input, --input-recursive and --output-recursive. (required)
   --image value                     Image name from Docker Hub or other Docker image service. (default: "ubuntu:14.04")
   --script value                    Local path to a script to run inside the workflow Docker container. (required)
   --shared value, -S value          Shared data URL on cloud storage bucket. (e.g. s3://~)
   --keep                            Keep instances created for computing event after everything gets done
   --env value, -E value             Environment variables to pass to all the workflow containers
   --disk-size value                 Size of data disk to attach for each job in GB. (default: 64)
   --shareddata-disksize value       Disk size of shared data instance (in GB) (default: 64)
   --aws-region value                AWS region name in which AmazonEC2 instances would be launched (default: "ap-northeast-1")
   --aws-ec2-instance-type value     AWS EC2 instance type. If specified, all --min-cores and --min-ram would be ignored. (default: "t2.micro")
   --aws-shared-instance-type value  Shared Instance Type on AWS (default: "m4.4xlarge")
   --aws-vpc-id value                VPC ID on which computing VMs are launched
   --aws-subnet-id value             Subnet ID in which computing VMs are launched
   --google-project value            Project ID for GCP
   --google-zone value               GCP service zone name (default: "asia-northeast1-a")
   --cwl value                       CWL file to run your workflow
   --cwl-job value                   Parameter files for CWL
   --wdl value                       WDL file to run your workflow
   --wdl-job value                   Parameter files for WDL
   --include value                   Local files to be included onto workflow container
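
For the ExTL use case mentioned earlier, a large shared reference can be attached with --shared. The sketch below is hypothetical: only the flags come from the option list above, while the bucket URL and disk size are placeholders.

# Hypothetical sketch: mounting a shared reference genome for all jobs.
hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --shared s3://my-bucket/reference/GRCh37 \
  --shareddata-disksize 128 \
  --verbose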

Contact

To keep things transparent, please ask any questions via the issue tracker:

https://github.com/otiai10/hotsub/issues

hotsub's People

Contributors

aokad, friend1ws, otiai10


hotsub's Issues

A fatal error occurs when the task file is empty (or header only)

The error message is shown below.

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
main.action(0xc420080dc0, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/action.go:35 +0x331
github.com/urfave/cli.HandleAction(0x7d6f40, 0x87d108, 0xc420080dc0, 0xc4201deae0, 0x0)
        /Users/otiai10/proj/go/src/github.com/urfave/cli/app.go:490 +0xd4
github.com/urfave/cli.(*App).Run(0xc420066820, 0xc420068000, 0xc, 0xc, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/urfave/cli/app.go:264 +0x6ac
main.main()
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/main.go:26 +0x209

goroutine 18 [chan receive]:
main.(*Handler).HandleBunch.func2(0xc4201ded20, 0xc4201decc0, 0xc4202c3de0, 0xa7c7b0, 0x0, 0x0)
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/handler.go:67 +0x94
created by main.(*Handler).HandleBunch
        /Users/otiai10/proj/go/src/github.com/otiai10/awsub/handler.go:73 +0x150
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/batch_engine.py", line 54, in execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)

Allow http/https for "--inputs"

AS IS

--inputs and --input-recursive must be either s3://... or gs://...

TO BE

For example, http://raw.github.com/foo/bar.txt should be allowed
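
If this were supported, the prepare step could fall back to a plain HTTP download when the scheme is not s3:// or gs://. A hedged sketch of that idea (an illustration of the request, not current behaviour):

# Hypothetical fallback for http(s) inputs; not what awsub currently does.
case "$INPUT" in
  s3://*) aws s3 cp "$INPUT" /tmp/ ;;
  gs://*) gsutil cp "$INPUT" /tmp/ ;;
  http://*|https://*) curl -fSL -o "/tmp/$(basename "$INPUT")" "$INPUT" ;;
esac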

Metrics

  • CPU usage
  • Memory usage
  • Network I/O
  • Disk I/O
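
As one possible direction (an assumption, not something hotsub does today), all four of these metrics can be sampled per task container with docker stats:

# Sketch only: sample CPU, memory, network I/O and block (disk) I/O for one
# task container; the container name follows the task-naming pattern seen
# elsewhere in these issues and is illustrative.
docker stats --no-stream star-alignment-task00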

Error creating machine

[paplot-task00] Creating docker machine
paplot-task00-NzhjNGI5: failed to create machine: exit status 1: Running pre-create checks...
Creating machine...
(paplot-task00-NzhjNGI5) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Error creating machine: Error running provisioning: Error running apt-get update: ssh command error:
command : sudo apt-get update
err     : exit status 100
output  : Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Splitting up /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_InRelease into data and signature failedErr:2 http://archive.ubuntu.com/ubuntu xenial InRelease
  Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)
Get:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [435 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:6 http://security.ubuntu.com/ubuntu xenial-security/main Translation-en [189 kB]
Get:7 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [7,224 B]
Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [710 kB]
Get:9 http://security.ubuntu.com/ubuntu xenial-security/restricted Translation-en [2,152 B]
Get:10 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [200 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/universe Translation-en [102 kB]
Get:12 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [3,208 B]
Get:13 http://security.ubuntu.com/ubuntu xenial-security/multiverse Translation-en [1,408 B]
Get:14 http://archive.ubuntu.com/ubuntu xenial-updates/main Translation-en [295 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [7,560 B]
Get:16 http://archive.ubuntu.com/ubuntu xenial-updates/restricted Translation-en [2,272 B]
Get:17 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [579 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [234 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [16.2 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse Translation-en [8,052 B]
Get:21 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [4,840 B]
Get:22 http://archive.ubuntu.com/ubuntu xenial-backports/main Translation-en [3,220 B]
Get:23 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [6,612 B]
Get:24 http://archive.ubuntu.com/ubuntu xenial-backports/universe Translation-en [3,768 B]
Reading package lists...
E: GPG error: http://archive.ubuntu.com/ubuntu xenial InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)

The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag.


1 task(s) failed with errors
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/site-packages/genomon_pipeline_cloud/batch_engine.py", line 54, in execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/local/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['awsub', '--aws-iam-instance-profile', 'testtest', '--verbose', '--aws-ec2-instance-type', 't2.small', '--script', '/usr/local/lib/python2.7/site-packages/genomon_pipeline_cloud/script/paplot.sh', '--image', 'genomon/paplot', '--tasks', '/work/genomon_pipeline_cloud/tmp/paplot-tasks.tsv']' returned non-zero exit status 1

Request limit exceeded

Launching many EC2 instances with awsub can cause a request-limit-exceeded error from AWS.
Note that this error has not actually been observed with awsub yet.

Unable to run docker-machine provisioning

I occasionally get this error when trying to execute awsub.

sv-filt-tasks-ubuntu-20180213-03410812-mtk4yjrm: failed to create machine: exit status 1: Running pre-create checks...
Creating machine...
(sv-filt-tasks-ubuntu-20180213-03410812-mtk4yjrm) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err     : exit status 1
output  : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.


1 task(s) failed with errors
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/batch_engine.py", line 60, in seq_execute
    subprocess.check_call(self.generate_commands(task, general_param))
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['awsub', '--aws-iam-instance-profile', 'testtest', '--verbose', '--debug-sleep', '2000', '--aws-ec2-instance-type', 't2.large', '--script', '/usr/local/lib/python2.7/dist-packages/genomon_pipeline_cloud/script/sv-filt.sh', '--image', 'genomon/sv_detection', '--tasks', '/home/ubuntu/tools/genomon_pipeline_cloud-0.1.0/tmp/sv-filt-tasks-ubuntu-20180213-034108.tsv']' returned non-zero exit status 1

Error when the specified bucket for output does not exist

upload failed: tmp/test180204/star/Hela_wt_dox-/Hela_wt_dox-.Chimeric.out.sam to s3://awsub-test-friend1ws2/test180204/star/Hela_wt_dox-/Hela_wt_dox-.Chimeric.out.sam An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
[star-alignment-task00] [FINALIZE] Successfully uploaded: s3://awsub-test-friend1ws2/test180204/star/Hela_wt_dox-

I think we have two options.

  1. First, check whether all the buckets specified in the output parameters exist. If some of them do not exist, exit with a log message before downloading the input files (a sketch follows this list).
  2. At the upload stage, check whether the specified buckets exist and create them if they do not.
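
A minimal sketch of option 1, assuming the AWS CLI is available where the check would run; the bucket name is the one from the log above:

# head-bucket exits non-zero when the bucket is missing or inaccessible,
# so we can fail fast with a log message before any downloads start.
aws s3api head-bucket --bucket awsub-test-friend1ws2 \
  || { echo "output bucket awsub-test-friend1ws2 does not exist" >&2; exit 1; }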

Keep the exit code of `awsub` somewhere

problem as is

  • When I run the awsub command on a remote VM (for example an AWS EC2 instance used just as a workspace) and the terminal pipe breaks, I can see the PID is still alive, but there is no way to get the exit status code of the awsub command.
  • Of course I can grep through all the lifecycle logs generated by the computing nodes, but it would be more helpful if the final exit code of the whole awsub invocation were recorded somewhere.
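
Until this is built in, one workaround sketch (assumptions: running detached via nohup is acceptable and a status file next to the log is fine; the awsub flags simply mirror the run examples above) is to persist the exit code so it survives a broken terminal pipe:

# Run awsub detached and record its exit status in a file for later inspection.
nohup sh -c 'awsub --script ./task.sh --tasks ./tasks.csv --image ubuntu:14.04; echo $? > awsub.exit' \
  > awsub.log 2>&1 &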

STAR test script is not working

The awsub STAR test does not seem to work correctly at the moment.
When I run the test script after modifying the output directory paths in the task file,
the following message appears:

download failed: s3://hgc-otiai10-test/examples/genomon_rna/db/GRCh37.STAR-2.5.2a/SA to tmp/GRCh37.STAR-2.5.2a/SA [Errno 28] No space left on device

and an empty bam file is copied to S3.
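
Since the failure is "[Errno 28] No space left on device", the --disk-size flag from `hotsub run -h` looks like the relevant knob. A hedged sketch (the 200 GB value is arbitrary):

# Hypothetical rerun with a larger per-job data disk.
hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --disk-size 200 \
  --verbose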

Output logs are garbled again

The garbled output is reproducible with the sources below: install and run genomon_pipeline_cloud as follows and the output logs come out garbled.

[installation]

git clone https://github.com/ken0-1n/genomon_pipeline_cloud.git
cd genomon_pipeline_cloud
pip install . --upgrade

[run]

genomon_pipeline_cloud dna example_conf/sample_awsub_dna.csv s3://kchiba-test-batch/genomon_cloud example_conf/param_dna_awsub.cfg

[output log]

[genomon-qc-tasks-20180207-024034-70661400] &2> + catL /tmp/genomon-resource/_GRCh37/reference/bait/refGene.coding.exon.151207.bed
[genomon-qc-tasks-20180207-024034-70661400] &2> + ・・アケオキカー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ  ュ・
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ 銓褞 ゙タフ ッ鈑ⅰ瀅鉑肄ッ゚ヌメテ雉キッ趺魲蟇粃鴟ッ貮褓螳胥蓚鰀ョ褸・ョアオアイーキョ粢・
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ 韃砌褪゚・ー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ 褸 アケオキカー ュ ー 
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ 蓊゚・アケオキカー
ロ鈑ⅰ瀅鉑ュイーアクーイーキューイエーウエュキーカカアエーーン ヲイセ ォ ・ュアケオキカーフ ッ

Timestamp change because of uploading to S3.

The timestamps of .bam files and .bam.bai files (bam index) can flip when they are uploaded to S3.
This can cause the following warning messages when using these bam and index files in later steps:

[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai
[sv-filt-task01] &2> [W::hts_idx_load2] The index file is older than the data file: /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai

Can we keep the order of the timestamps (bam files first, bam.bai files second)?
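
Until the upload/download order is fixed, a task-side workaround sketch is to bump the index file's mtime after the inputs are staged (an assumption about where this could be done, not current behaviour):

# Make the .bam.bai index at least as new as its .bam so htslib stops warning.
touch -c /tmp/awsub-test-friend1ws/test180206/bam/5929N/5929N.markdup.bam.bai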

Behavior when the column value of '--input' is empty

If the '--input' column value in the task file is empty, the error below is returned.

mutation-call-task00-YzJjNzIy: failed to prepare input tasks: failed to download input file `` with status code 1, please check output with --verbose option 1 task(s) failed with errors

I would like no error to occur even when the value is an empty string.
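
The desired behaviour, sketched in shell terms (this is an illustration of the request, not awsub's actual code):

# Skip the download step entirely when a task's --input value is empty.
if [ -z "$INPUT" ]; then
  echo "empty --input value: nothing to download"
else
  aws s3 cp "$INPUT" /tmp/
fi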

[FATAL] awsub process died when concurrency is more than (for example) 64

facts

  • When executing awsub with 64 tasks and --concurrency 64
    • awsub command PID died but machines still remain alive

reports

  • Not reproduced with 30 tasks and --concurrency 30
  • Not reproduced with 64 tasks and --concurrency 32

expected

  • The awsub process died or was killed somehow, without the deferred Destroy being run

Generation of log files

For each task, the contents of the standard output and the standard error should be output to files and should be transferred to a specified directory in S3.

This is very helpful especially for debugging.
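
A minimal sketch of the requested behaviour, assuming the AWS CLI is available in the container; the log file names and bucket path are placeholders:

# Capture stdout/stderr per task, upload them, and keep the task's exit code.
./task.sh > stdout.log 2> stderr.log
status=$?
aws s3 cp stdout.log s3://my-log-bucket/task00/stdout.log
aws s3 cp stderr.log s3://my-log-bucket/task00/stderr.log
exit $status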

Too many logs for downloading

Too many download log lines appear in verbose mode when the downloaded file is somewhat large, such as:

[fusionfusion-task03] [PREPARE] &1> Completed 256.0 KiB/3.0 GiB (311.3 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 512.0 KiB/3.0 GiB (617.8 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 768.0 KiB/3.0 GiB (921.7 KiB/s) with 1 file(s) remaining
[fusionfusion-task03] [PREPARE] &1> Completed 1.0 MiB/3.0 GiB (1.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.2 MiB/3.0 GiB (1.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.5 MiB/3.0 GiB (1.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 1.8 MiB/3.0 GiB (2.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.0 MiB/3.0 GiB (2.4 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.2 MiB/3.0 GiB (2.7 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.5 MiB/3.0 GiB (3.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 2.8 MiB/3.0 GiB (3.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.0 MiB/3.0 GiB (3.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.2 MiB/3.0 GiB (3.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.5 MiB/3.0 GiB (4.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 3.8 MiB/3.0 GiB (4.4 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.0 MiB/3.0 GiB (4.6 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.2 MiB/3.0 GiB (4.9 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.5 MiB/3.0 GiB (5.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 4.8 MiB/3.0 GiB (5.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.0 MiB/3.0 GiB (5.7 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.2 MiB/3.0 GiB (6.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.5 MiB/3.0 GiB (6.2 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 5.8 MiB/3.0 GiB (6.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.0 MiB/3.0 GiB (6.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.2 MiB/3.0 GiB (7.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.5 MiB/3.0 GiB (7.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 6.8 MiB/3.0 GiB (7.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.0 MiB/3.0 GiB (7.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.2 MiB/3.0 GiB (8.1 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.5 MiB/3.0 GiB (8.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 7.8 MiB/3.0 GiB (8.6 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.0 MiB/3.0 GiB (8.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.2 MiB/3.0 GiB (9.0 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.5 MiB/3.0 GiB (9.3 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 8.8 MiB/3.0 GiB (9.5 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 9.0 MiB/3.0 GiB (9.8 MiB/s) with 1 file(s) remaining    
[fusionfusion-task03] [PREPARE] &1> Completed 9.2 MiB/3.0 GiB (10.1 MiB/s) with 1 file(s) remaining   
[fusionfusion-task03] [PREPARE] &1> Completed 9.5 MiB/3.0 GiB (10.2 MiB/s) with 1 file(s) remaining   
[fusionfusion-task03] [PREPARE] &1> Completed 9.8 MiB/3.0 GiB (10.5 MiB/s) with 1 file(s) remaining   

This message continues for more than 1000 lines, and
important information for debugging gets buried...

So I would prefer not to generate this kind of long download log.
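
One possible mitigation, assuming the prepare step uses the AWS CLI: its s3 commands can suppress the per-chunk progress lines.

# --only-show-errors hides progress output and keeps only failures in the log;
# the source and destination paths here are placeholders.
aws s3 cp --only-show-errors s3://my-bucket/big-reference.tar /tmp/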

Pass Super Variables to user script container

AS IS

--input FOOBAR
s3://hgc-otiai10-test/foobar

transforms to

/tmp/hgc-otiai10-test/foobar

But the user script should not need to know or care about "/tmp".

TO BE

$AWSUB_ROOT/hgc-otiai10-test/foobar

Super Variables such as:

  • $AWSUB_ROOT
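
A sketch of how a user script could then consume the variable, assuming awsub exports AWSUB_ROOT and preserves the bucket/key layout beneath it:

# The script no longer hard-codes /tmp; it only trusts $AWSUB_ROOT.
FOOBAR="$AWSUB_ROOT/hgc-otiai10-test/foobar"
head "$FOOBAR"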

Separate "machine-create" and "containers up" process from "Create"

as is

  • Both "machines-create" and "containers-up" run inside the job.Create method

problem

  • When --shared is specified, "machines-create" takes a long time because it only starts after the Shared Data Instances are created

to be

  • Setting up the Shared Data Instance and the plain "machines-create" for the jobs should run in parallel
  • After that, inside "containers-up", the information about the Shared Data Instance should be passed to the containers

Check container errors and let it fail correctly

Even when the program ends with errors, awsub always says:
"All * tasks completed successfully!"

For example, in the awsub quick guide this happens even when --aws-iam-instance-profile is not specified and access to S3 fails:

[PREPARE] &2> fatal error: Unable to locate credentials

awsub should check the exit code of each task and report whether each task ended correctly.
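
A minimal sketch of the kind of check intended here (not awsub's actual code); docker wait blocks until the container stops and prints its exit code:

code=$(docker wait star-alignment-task00)   # container name is illustrative
if [ "$code" -ne 0 ]; then
  echo "task star-alignment-task00 failed with exit code $code" >&2
fi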

Security Group Limit Exceeded

The security group limit is exceeded when launching many EC2 instances.

Error creating machine: Error in driver during machine creation: SecurityGroupLimitExceeded: The maximum number of security groups for VPC 'vpc-3311ce57' has been reached.
status code: 400, request id: b2de4bd5-7000-498b-9a3a-acf55af449e5

I think it would be better to create one security group per awsub execution rather than one per EC2 instance.
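
Assuming docker-machine's amazonec2 driver is what creates these instances (the provisioning logs above suggest so), one direction is to point every machine of a single execution at the same pre-named group; the group name here is a placeholder:

# Sketch: all machines of one execution reuse a single security group.
docker-machine create --driver amazonec2 \
  --amazonec2-security-group awsub-run-20180213 \
  paplot-task00-NzhjNGI5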

Output logs are garbled

The output logs on the console are garbled as shown below when running the awsub command.

udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.ann to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.ann
udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.amb to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.amb
udownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.fa.fai to tmp/_GRCh37/reference/GRCh37/GRCh37.fa.fai
download: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37.genome_size to tmp/_GRCh37/reference/GRCh37/GRCh37.genome_size
・ownload: s3://genomon-resource/_GRCh37/reference/GRCh37/GRCh37_noScaffold_noDecoy.interval_list to tmp/_GRCh37/reference/GRCh37/GRCh37_noScaffold_noDecoy.interval_list
ECompleted 119.5 MiB/394.1 MiB (16.8 MiB/s) with 1 file(s) remaining 59.0 KiB/s) with 1 file(s) remaining
ECompleted 237.5 MiB/394.1 MiB (18.8 MiB/s) with 1 file(s) remaining
蔡・砌コ コッッ・韜粃ュュ粃隸趁モ硼裃アーー褓オケイケ゚肛イッ褓肄アョ趁・ 趁モ硼裃アーー褓オケイケ゚肛イッ褓肄アョ趁・
蔡・砌コ コッッ・韜粃ュュ粃隸趁モ硼裃アーー褓オケイケ゚肛イッ褓肄イョ趁・ 趁モ硼裃アーー褓オケイケ゚肛イッ褓肄イョ趁・

Potential conflict of downloaded input file names

It seems that when input files on S3 are downloaded to the /tmp directory,
the directory structure is stripped and only the base names are kept.
So, with the current behavior, when a task needs multiple input files
and the base names of those input files are the same
(e.g., s3://input_seq_otiai10/sequence.txt and s3://input_seq_friend1ws/sequence.txt),
they will conflict.

If my observation is correct, I think the directory structure of the input files should be kept when downloading to the VMs.
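
A sketch of the suggested layout, keeping bucket and key in the local path so identical base names cannot collide; the two objects are the examples from above:

# Preserve the bucket/key hierarchy under /tmp instead of flattening it.
aws s3 cp s3://input_seq_otiai10/sequence.txt /tmp/input_seq_otiai10/sequence.txt
aws s3 cp s3://input_seq_friend1ws/sequence.txt /tmp/input_seq_friend1ws/sequence.txt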
