tgac / rampart

A configurable de novo assembly pipeline

Home Page: http://www.tgac.ac.uk/rampart/

License: GNU General Public License v3.0

Java 96.98% Perl 0.99% TeX 2.03%

rampart's Issues

Failing in mecq stage

quake V0.3
kat 1.69
jellyfish 2.3.0

I am currently working with RAMPART and have installed all the dependencies. I then tried a test run on the E. coli sample data, and it failed at the MECQ stage.
Here is the command:

(base) hydra@Biohazard:~/Desktop/5_30_19-Try_rampart/190613$ rampart -v ecoli_full_job.xml

This is where the error starts:
2019-09-24 00:50:53 DEBUG ProcessRunner:153 - Return code was '0' for [ln -s -f /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_2.cor_single.fastq /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/output/quake/DRR015910_2.cor_single.fastq]. Redirecting stderr.
2019-09-24 00:50:53 DEBUG DefaultProcessService:162 - Finished executing command [ln -s -f /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_2.cor_single.fastq /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/output/quake/DRR015910_2.cor_single.fastq]. Output:

2019-09-24 00:50:53 ERROR Mecq:181 - MECQ job "quake" for sample "rampart_out" did not produce the expected output file: /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_1.cor.fastq
2019-09-24 00:50:53 ERROR AbstractConanTask:255 - Process 'MECQ' failed to execute, exit code: 2
2019-09-24 00:50:53 ERROR AbstractConanTask:257 - Execution exception follows
uk.ac.ebi.fgpt.conan.service.exception.ProcessExecutionException: java.io.IOException: Stage MECQ failed to produce valid output.
at uk.ac.tgac.rampart.stage.RampartProcess.execute(RampartProcess.java:187)
at uk.ac.ebi.fgpt.conan.core.process.AbstractConanProcess.execute(AbstractConanProcess.java:298)
at uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:233)
at uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:32)
at uk.ac.ebi.fgpt.conan.util.AbstractConanCLI.execute(AbstractConanCLI.java:453)
at uk.ac.tgac.rampart.RampartCLI.execute(RampartCLI.java:421)
at uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:453)
Caused by: java.io.IOException: Stage MECQ failed to produce valid output.
at uk.ac.tgac.rampart.stage.RampartProcess.execute(RampartProcess.java:158)
... 6 more
2019-09-24 00:50:53 DEBUG AbstractConanTask:258 - Is this event to abort: false Output: [Ljava.lang.String;@48f278eb
2019-09-24 00:50:53 ERROR AbstractConanTask:550 - Task 'RAMPART' failed its current process, exit code: 2
2019-09-24 00:50:53 ERROR AbstractConanTask:556 - Output follows...
No output captured

2019-09-24 00:50:53 INFO AbstractConanTask:288 - Task 'RAMPART' execution ended. Runtime: 0:00:00.043
uk.ac.ebi.fgpt.conan.service.exception.ProcessExecutionException: java.io.IOException: Stage MECQ failed to produce valid output.
uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:267)
uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:32)
uk.ac.ebi.fgpt.conan.util.AbstractConanCLI.execute(AbstractConanCLI.java:453)
uk.ac.tgac.rampart.RampartCLI.execute(RampartCLI.java:421)
uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:453)

RAMPART ignores unknown attributes in XML configuration file

I had an old configuration file with a "stats_levels" attribute in the mass element, which is no longer used. RAMPART should raise an error here but instead silently ignores it, which can be confusing for the end user.
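
A minimal sketch of how unknown attributes could be detected, written in Python for brevity rather than in RAMPART's own Java. The whitelist contents and the helper name find_unknown_attributes are assumptions made for illustration; they do not reflect RAMPART's real schema or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: these attribute names are illustrative only,
# not RAMPART's real configuration schema.
ALLOWED_ATTRIBUTES = {
    "mass": {"parallel"},
    "job": {"name", "threads", "memory"},
}


def find_unknown_attributes(xml_text, allowed=ALLOWED_ATTRIBUTES):
    """Return (element, attribute) pairs that are not in the whitelist."""
    root = ET.fromstring(xml_text)
    unknown = []
    for element in root.iter():
        known = allowed.get(element.tag)
        if known is None:
            continue  # elements without a whitelist entry are not checked
        for name in element.attrib:
            if name not in known:
                unknown.append((element.tag, name))
    return unknown


config = '<pipeline><mass parallel="true" stats_levels="CONTIGUITY"/></pipeline>'
problems = find_unknown_attributes(config)
```

A parser could refuse to start when the returned list is non-empty, so that a stale attribute is reported instead of being dropped silently.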

RAMPART fails if kmer value is too large for assembler

If the user specifies a kmer size that is too large for the assembler to handle, RAMPART will fail when that assembly is executed. We should think of a more graceful way of handling this problem.

One idea would be to "fail early" by checking that the assembler can handle the kmer value before anything in RAMPART is executed. Another option would be to ignore the failed assembly and continue on regardless.
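
The "fail early" idea could be sketched as below, in Python for brevity. The tool names and limits in MAX_KMER are hypothetical placeholders, since real limits depend on the assembler and how it was built; check_kmers is likewise an illustrative name, not part of RAMPART.

```python
# Hypothetical per-tool limits, for illustration only.
MAX_KMER = {"assembler_a": 31, "assembler_b": 127}


def check_kmers(jobs, max_kmer=MAX_KMER):
    """Return an error message for every (tool, k) that exceeds the tool's limit.

    jobs maps a tool name to the list of kmer values requested for it.
    """
    errors = []
    for tool, kmers in jobs.items():
        limit = max_kmer.get(tool)
        if limit is None:
            errors.append(f"{tool}: no known kmer limit")
            continue
        for k in kmers:
            if k > limit:
                errors.append(f"{tool}: k={k} exceeds maximum {limit}")
    return errors


errors = check_kmers({"assembler_a": [21, 41], "assembler_b": [61]})
```

Running such a check before any stage starts would surface the problem in seconds rather than after earlier stages have completed. The alternative of skipping the failed assembly would instead need the selection step to tolerate missing results.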

Add support for DISCOVAR

pass parameter to quast

Is it possible to specify this QUAST option in the XML file?

--min-contig (or -m)
Lower threshold for a contig length. Shorter contigs won't be taken into account (except for some metrics, see section 3). The default value is 500.

thanks a lot!

Add mechanism to learn library properties

At the moment users have to enter library properties such as insert size and error tolerance in order to get the tools to execute properly. Much of this could be figured out automatically.

Failed installing RAMPART-develop

Dear Daniel,
After installing git and Maven 3.3.9 and downloading the RAMPART-develop branch with "git clone https://github.com/TGAC/RAMPART.git", I commented out the "create manual execution" element, ran "cd RAMPART" and tried to install with "mvn clean install".
This gave the error:
"Could not resolve dependencies for project uk.ac.tgac.rampart:rampart:jar:0.12.3: Failure to find uk.ac.tgac.conan:tgac-conan-process-wrappers:jar:0.12.9 in https://repos.tgac.ac.uk/maven/repo"

Then, I looked at "https://repos.tgac.ac.uk/maven/repo/uk/ac/tgac/conan/tgac-conan-process-wrappers/" and saw there isn't a 0.12.9 directory.
I also tried modifying this dependency in the pom.xml file:

<dependency>
    <groupId>uk.ac.tgac.conan</groupId>
    <artifactId>tgac-conan-process-wrappers</artifactId>
    <version>0.12.9</version>
    <exclusions>
        <exclusion>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I replaced 0.12.9 with 0.12.1. This did not give the same error as before, but it produced many compilation errors.
How can I install RAMPART-develop version?
Thank you

Add RAMPART version number into standard output for each run

Adding the version number into the standard output for each job allows users to identify which version of RAMPART to re-run if they are trying to reproduce results.

Bug in conan debug output

The toString method used for the debug line at AbstractConanTask:228 is not working properly.

[mass_select] what if a metric has the same raw value in all the assemblies?

How does the scaling procedure behave in case of constant raw values?

thank you,
Alex
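
To make the question concrete: with plain min-max scaling, a constant metric gives a range of zero, so the division is undefined unless it is guarded. The sketch below shows one possible guard. It is an assumption about how such a case could be handled, not a description of what RAMPART actually does, and min_max_scale and constant_score are illustrative names.

```python
def min_max_scale(values, constant_score=0.0):
    """Scale values into [0, 1]; return constant_score for every value
    when all raw values are identical (range of zero)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No assembly is better than another on this metric.
        return [constant_score] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


varied = min_max_scale([1, 2, 3])
constant = min_max_scale([5, 5, 5])
```

Whatever value is chosen for the constant case, it is the same for every assembly, so it cannot change the ranking.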

Check for unknown tools in rampart config file.

Potential issue with Quake?

RAMPART seems to struggle to create symbolic links to Quake's output. Check that directories are created and files exist before linking.
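
A minimal sketch of the suggested check, in Python for brevity. The helper name link_if_present is hypothetical; the point is only the order of operations: verify the source, create the target directory, then link.

```python
import os


def link_if_present(source, link_path):
    """Create a symbolic link only if the source file really exists."""
    if not os.path.isfile(source):
        raise FileNotFoundError(f"Expected output missing: {source}")
    # Make sure the directory that will hold the link exists.
    os.makedirs(os.path.dirname(os.path.abspath(link_path)), exist_ok=True)
    # Mimic "ln -s -f": replace an existing link or file at the target.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(source, link_path)
```

Failing at this point reports the missing file directly, rather than leaving a dangling link for a later stage to trip over.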

Failing on quake.py with 0.12.2

Running rampart 0.12.2

quake.py 0.3.5
kat 2.0.6
jellyfish 2.0.6

The command:

cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}

Fails with this error:

terminate called after throwing an instance of 'jellyfish::fastq_seq_qual_parser::FastqSeqQualParserError'
  what():  Truncated input file
Error: Requires at least 2 arguments.
Usage: jellyfish merge [options] db:string+
Use --help for more information
Traceback (most recent call last):
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 324, in <module>
    main()
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 89, in main
    jellyfish(options.readsf, options.reads_listf, options.k, ctsf, quality_scale, options.hash_size, options.proc)
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 290, in jellyfish
    os.rename('%s.dbm_0' % output_pre, '%s.dbm' % output_pre)
OSError: [Errno 2] No such file or directory

Log File with --verbose looks like this:

2015-10-27 12:10:58 INFO  DefaultProcessService:146 - Running command in foreground [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].
2015-10-27 12:12:28 DEBUG ProcessRunner:153 - Return code was '0' for [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].  Redirecting stderr.

Why isn't the output from quake.py reflected in the log file?

I am running rampart in an unscheduled environment. Is this one of those errors that would be fixed by running with PBS?

Ensure finaliser does not create fasta headers with dots or pipes.

Some downstream applications that use assemblies are sensitive to the characters used in the fasta headers for the assembly. Make sure RAMPART avoids the most common characters such as dots and pipes.
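
One way to do this, sketched in Python. The function name sanitise_header and the choice of underscore as the replacement character are assumptions for illustration.

```python
import re


def sanitise_header(header, replacement="_"):
    """Replace dots and pipes in a FASTA header line with a safe character,
    leaving the leading '>' untouched."""
    prefix = ">" if header.startswith(">") else ""
    body = header[len(prefix):]
    return prefix + re.sub(r"[.|]", replacement, body)


clean = sanitise_header(">scaffold.1|len=100")
```

If two original headers differ only in these characters they would collide after replacement, so a real implementation would also need a uniqueness check.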

Create new tool to help users create job configuration files

Create a tool with a GUI that helps the user create job configuration files. This would particularly help where there is a defined list of options, such as for the choice of assembler.

rampart ecoli_full_job.xml getting error with rampart 0.12.0

Hello,

Here is the output:

[jillian@login-0-2 example_job_configs]$ rampart ecoli_full_job.xml
2015-10-11 10:39:02 INFO RampartCLI:227 - RAMPART Version: 0.12.0
2015-10-11 10:39:02 INFO RampartCLI:228 - Output dir: /scratch/jillian/source/rampart/rampart-0.12.0/etc/example_job_configs
2015-10-11 10:39:02 INFO RampartCLI:229 - Environment configuration file: /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/conan.properties
2015-10-11 10:39:02 INFO RampartCLI:230 - Logging properties file: /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/log4j.properties
2015-10-11 10:39:02 INFO RampartCLI:231 - Job Prefix: RAMPART20151011_103902
2015-10-11 10:39:02 INFO ConanProperties:145 - Loaded Conan properties from /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/conan.properties
2015-10-11 10:39:02 INFO RampartCLI:240 - Executing the following stages: MECQ,MECQ_ANALYSIS,KMER_CALC,MASS,MASS_ANALYSIS,MASS_SELECT,AMP,AMP_ANALYSIS,FINALISE,COLLECT
2015-10-11 10:39:02 INFO RampartCLI:243 - Parsing configuration file: /scratch/jillian/source/rampart/rampart-0.12.0/etc/example_job_configs/ecoli_full_job.xml
2015-10-11 10:39:02 INFO RampartConfig:134 - Found libraries element containing 1 libraries.
2015-10-11 10:39:02 INFO RampartConfig:135 - Running in single sample mode
Error validating XML Element: pipeline; Found unexpected child node: analyse_reads
uk.ac.tgac.conan.core.util.XmlHelper.validate(XmlHelper.java:170)
uk.ac.tgac.rampart.RampartPipeline$Args.<init>(RampartPipeline.java:171)
uk.ac.tgac.rampart.RampartConfig.internalParseXml(RampartConfig.java:154)
uk.ac.tgac.conan.core.util.AbstractXmlJobConfiguration.parseXml(AbstractXmlJobConfiguration.java:103)
uk.ac.tgac.rampart.RampartCLI.initialise(RampartCLI.java:244)
uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:452)

Best,
Jillian

Enhance AMP - parallel execution - assembly comparison - parameter optimisation

Distinguish between AMP stages that might benefit from trying out multiple tools or settings and those that just need to be somewhere in the pipeline. For the stages that might benefit from some kind of parameter optimisation, make it possible to execute them in parallel.

Cannot run multiple RAMPART instances on some high core machines

We have noticed that on some high core machines, e.g. SGI UV systems, only a single RAMPART instance can be run; all other Java processes report out-of-memory errors. This is potentially due to the number of GC threads that are started. If this is confirmed to be the problem, we should launch RAMPART with the "-XX:+UseSerialGC" JVM flag.

Better method for scaling metrics

Currently, in V0.11.0, we use a scaling procedure that reduces any given metric to a range between 0 and 1. The scaled metric can then be weighted and combined with other scaled and weighted metrics to give a final score for the assembly. An issue with this approach is that in assembly projects we often sweep large ranges of k values and get at least one useless assembly. Excluding this assembly could change the overall ranking of the 1st and 2nd assemblies (depending on the weightings and scores of other metrics). Scaling by the range means that the weight of the metric could be decided entirely by two outliers.

An alternative approach, taken by Abbas et al, 2014 (http://link.springer.com/article/10.1186/1471-2164-15-S9-S10/fulltext.html), is to rank the assemblies on each single metric before weighting and combining results. A problem with this method is that it can give too much credence to a metric for which all assemblies have similar values.

We are therefore looking for a better means of scaling assembly metrics. Methods based on the interquartile range or the standard deviation of the metric may work better. This thread is for discussing this topic.
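
As a starting point for discussion, here is a sketch of one possible interquartile-range scaling, in Python: centre each value on the median and divide by the IQR. This is only one interpretation of the suggestion above, not an implemented RAMPART method. The function name iqr_scale and the example values are illustrative.

```python
import statistics


def iqr_scale(values):
    """Centre values on the median and divide by the interquartile range.

    Returns zeros when the IQR is zero (e.g. a constant metric).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [0.0] * len(values)
    return [(v - median) / iqr for v in values]


# Four comparable assemblies and one useless outlier (made-up numbers).
raw = [100, 110, 120, 130, 5]
scaled = iqr_scale(raw)
```

Unlike range scaling, the spacing between the four comparable assemblies is set by the bulk of the data. The outlier still receives an extreme score but does not compress the others towards each other. Scaled values are no longer confined to the range 0 to 1, so the weighting step would need to accept that.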

Feature Request - Debug mode where commands are shown but not run

Hello,

I'd like to request a feature to have a debug mode, where commands are shown, but not run.

Best,
Jillian
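
The requested behaviour amounts to a switch in front of the process launcher. A minimal Python sketch follows; run_command and the dry_run parameter are hypothetical names, not an existing RAMPART option.

```python
import shlex
import subprocess


def run_command(args, dry_run=False):
    """Run a command, or only print it when dry_run is set.

    Returns the process return code (0 for a dry run).
    """
    printable = " ".join(shlex.quote(a) for a in args)
    if dry_run:
        print(f"[dry-run] {printable}")
        return 0
    return subprocess.run(args, check=False).returncode


code = run_command(["quake.py", "-k", "17", "-p", "8"], dry_run=True)
```

In dry-run mode later stages would see no output files from earlier ones, so any existence checks between stages would have to be skipped as well.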

Add support for MaSuRCA

Compress final output into a tarball for easier distribution

Compress the final scaffolds, contigs, agp, plus any suitable stats, plots, logs and reports that might help downstream analyses into a gzipped tarball. This will make it easier to distribute the results.
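
A sketch of the packaging step using Python's standard tarfile module. The function name bundle_results and the flat archive layout are assumptions for illustration.

```python
import os
import tarfile


def bundle_results(paths, archive_path):
    """Write the given files into a gzipped tarball, stored by base name."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            archive.add(path, arcname=os.path.basename(path))
    return archive_path
```

Storing entries by base name keeps absolute paths from the build machine out of the archive, at the cost of requiring unique file names.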

input fastq.gz

Hi,
Is it possible to make RAMPART handle fastq.gz input, by adding a check and, if needed, doing an automatic gunzip?
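
The check itself is small: gzip files start with the two bytes 0x1f 0x8b. A Python sketch follows, with open_reads as a hypothetical helper name. Whether the tools RAMPART wraps can read from such a stream, or need a decompressed copy on disk, is a separate question this does not address.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def open_reads(path):
    """Open a FASTQ file as text, decompressing transparently if gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Checking the content rather than the ".gz" extension also copes with compressed files that have been renamed.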

Separate assembly selection step from assembly analysis step.

Check for unknown processes in external configuration file

Issue with running MECQ in parallel

Fails to properly wait for all ecq jobs to complete before continuing.
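
The intended behaviour is a barrier: nothing after the parallel step may start until every job has finished. The Python sketch below uses threads as a stand-in for the external error-correction processes; run_all is an illustrative name, and a scheduler-based setup would need the equivalent wait on job IDs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(jobs, max_workers=4):
    """Run every job and return their results only once all have finished.

    Each job is a zero-argument callable. result() blocks until that job
    completes and re-raises any exception the job raised.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]


results = run_all([lambda: "lib1 done", lambda: "lib2 done"])
```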

javadoc fails

The error is…

[ERROR] /private/tmp/rampart-ai5O/src/main/java/uk/ac/tgac/rampart/stage/MassJob.java:87: error: exception not thrown: java.io.IOException
[ERROR] * @throws IOException
[ERROR] ^

CEGMA analysis still referred to as "COMPLETENESS" in job config.

Rename the CEGMA analysis from COMPLETENESS to CEGMA.

Failing in amp_analysis stage

Although in the mass_select stage the final score for each of the assemblers run is correctly evaluated (so the file "weightings.tab" is correctly read in this stage), in rampart_out/8_amp_analysis/scores.tsv all the group scores are equal to 0.0 (also the ones for amp-stage-0, which are not equal to 0 in rampart_out/6_mass_select/scores.tsv).
Is there an issue in the reading of the "weightings.tab" file in the amp_analysis stage?
Thanks

Control whether to overwrite or keep existing assemblies

If an assembly already exists, the user may not wish to overwrite it. There should be some mechanism, probably at the command line rather than in the job configuration file, for forcing an overwrite if that is what is required. Otherwise RAMPART should detect that an assembly already exists and move on to the next one rather than rerunning it.
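
The decision described above could look like this in Python. The helper should_run and its force parameter are hypothetical, and treating an empty file as "not done" is an added assumption.

```python
import os


def should_run(output_path, force=False):
    """Decide whether an assembly needs to be (re)run.

    Runs when forced, or when the expected output is missing or empty.
    """
    if force:
        return True
    exists = os.path.isfile(output_path) and os.path.getsize(output_path) > 0
    return not exists
```

A file's existence does not prove the earlier run finished cleanly, so a more careful version might look for a completion marker written at the end of each assembly.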

SoapDeNovo is never operational

There is an issue when running SOAPdenovo on the cluster: the "mode" argument is added after the executable name when running "which". This should be an easy fix.
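
The fix presumably amounts to looking up only the first token of the command. A Python sketch follows; resolve_executable is a hypothetical name and the command string is only an example.

```python
import shlex
import shutil


def resolve_executable(command):
    """Split a command line and look up only the executable on the PATH.

    Returns (executable_name, resolved_path_or_None).
    """
    executable = shlex.split(command)[0]
    return executable, shutil.which(executable)


name, path = resolve_executable("SOAPdenovo-127mer all -s soap.cfg")
```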

Implement assembly analysis for AMP stage.

The user should know what's happening to their assembly during the AMP stage. Give the user the option of running the full suite of assembly analysis tools either: not at all, once at the end, or after every stage

subsampler

Why does RAMPART divide by 10,000 (that is, by 100 twice) when the probability of picking a read is greater than 1?

In the following case the reads had a 108.6% probability of being picked, but the subsampler log reports a probability of 0.010018 instead of 1.0018. Logically, I expected to get a file with the actual number of reads plus a subsample of 1.0018% of "resampled" reads.

Estimated that library: Sample; has approximately 404854224 bases in both files. Estimated genome size is: 2200000; so actual coverage (per file) is approximately: 92.01232363636363; we will only keep 108.68109406214322% of the reads in each file to achieve approximately 200X coverage in both files

cat RAMPART20160912_171111-group-spades-raw-subsample-Sample_200x-file2.log
Subsampler
Seed 519289336
Readed 2029342
Printed 20330
Probability 0.010018
DONE

Thanks,
Alessandro
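
The numbers in the report can be checked with a few lines of Python. The formula below is inferred from the log message (target coverage divided by the combined coverage of both files) and is an assumption about how the percentage was derived, not a reading of the subsampler's source.

```python
def keep_fraction(total_bases, genome_size, target_coverage):
    """Fraction of reads to keep so that coverage reaches the target."""
    actual_coverage = total_bases / genome_size
    return target_coverage / actual_coverage


# Values copied from the log above.
fraction = keep_fraction(404854224, 2200000, 200)  # about 1.0868, i.e. 108.68%
observed = 20330 / 2029342                         # printed reads / reads seen
```

The first value matches the 108.68% in the message, and the second matches the "Probability 0.010018" line. A keep fraction above 1 cannot be met by subsampling alone, because the data only provide about 184X in total against a 200X target. Note that 108.68 divided by 10,000 is about 0.010868, which does not match the logged 0.010018 either.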
