rampart's Issues

Failing in mecq stage

quake V0.3
kat 1.69
jellyfish 2.3.0

I am currently working on RAMPART. I have already installed all the dependencies. After that, I tried a test run on the ecoli sample data, and it failed right away in the mecq stage.
Here is the command:

(base) hydra@Biohazard:~/Desktop/5_30_19-Try_rampart/190613$ rampart -v ecoli_full_job.xml

This is where the error starts:
`
2019-09-24 00:50:53 DEBUG ProcessRunner:153 - Return code was '0' for [ln -s -f /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_2.cor_single.fastq /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/output/quake/DRR015910_2.cor_single.fastq]. Redirecting stderr.
2019-09-24 00:50:53 DEBUG DefaultProcessService:162 - Finished executing command [ln -s -f /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_2.cor_single.fastq /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/output/quake/DRR015910_2.cor_single.fastq]. Output:

2019-09-24 00:50:53 ERROR Mecq:181 - MECQ job "quake" for sample "rampart_out" did not produce the expected output file: /home/hydra/Desktop/5_30_19-Try_rampart/190613/rampart_out/1_mecq/quake/ecoli/DRR015910_1.cor.fastq
2019-09-24 00:50:53 ERROR AbstractConanTask:255 - Process 'MECQ' failed to execute, exit code: 2
2019-09-24 00:50:53 ERROR AbstractConanTask:257 - Execution exception follows
uk.ac.ebi.fgpt.conan.service.exception.ProcessExecutionException: java.io.IOException: Stage MECQ failed to produce valid output.
at uk.ac.tgac.rampart.stage.RampartProcess.execute(RampartProcess.java:187)
at uk.ac.ebi.fgpt.conan.core.process.AbstractConanProcess.execute(AbstractConanProcess.java:298)
at uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:233)
at uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:32)
at uk.ac.ebi.fgpt.conan.util.AbstractConanCLI.execute(AbstractConanCLI.java:453)
at uk.ac.tgac.rampart.RampartCLI.execute(RampartCLI.java:421)
at uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:453)
Caused by: java.io.IOException: Stage MECQ failed to produce valid output.
at uk.ac.tgac.rampart.stage.RampartProcess.execute(RampartProcess.java:158)
... 6 more
2019-09-24 00:50:53 DEBUG AbstractConanTask:258 - Is this event to abort: false Output: [Ljava.lang.String;@48f278eb
2019-09-24 00:50:53 ERROR AbstractConanTask:550 - Task 'RAMPART' failed its current process, exit code: 2
2019-09-24 00:50:53 ERROR AbstractConanTask:556 - Output follows...
No output captured

2019-09-24 00:50:53 INFO AbstractConanTask:288 - Task 'RAMPART' execution ended. Runtime: 0:00:00.043
uk.ac.ebi.fgpt.conan.service.exception.ProcessExecutionException: java.io.IOException: Stage MECQ failed to produce valid output.
uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:267)
uk.ac.ebi.fgpt.conan.core.task.AbstractConanTask.execute(AbstractConanTask.java:32)
uk.ac.ebi.fgpt.conan.util.AbstractConanCLI.execute(AbstractConanCLI.java:453)
uk.ac.tgac.rampart.RampartCLI.execute(RampartCLI.java:421)
uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:453)

`

RAMPART fails if kmer value is too large for assembler

If the user specifies a kmer size that is too large for the assembler to handle, RAMPART will fail when that assembly is executed. We should think of a more graceful way of handling this problem.

One idea would be to "fail early" by checking that the assembler can handle the kmer value before anything in RAMPART is executed. Another option would be to ignore the failed assembly and continue on regardless.
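A minimal sketch of the "fail early" option, assuming each assembler can report a maximum supported k-mer size. The class `KmerCheck`, its methods, and the example limit of 127 are hypothetical and not part of RAMPART's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check: reject k-mer values an assembler cannot handle before any job runs. */
public class KmerCheck {

    /** Returns the requested k values that exceed the assembler's maximum (empty = safe to start). */
    public static List<Integer> invalidKmers(List<Integer> requested, int assemblerMaxK) {
        List<Integer> invalid = new ArrayList<>();
        for (int k : requested) {
            if (k > assemblerMaxK) {
                invalid.add(k);
            }
        }
        return invalid;
    }

    /** Throws before anything is executed if any requested k is too large for this assembler. */
    public static void validateKmers(String assembler, List<Integer> requested, int assemblerMaxK) {
        List<Integer> invalid = invalidKmers(requested, assemblerMaxK);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException("Assembler " + assembler + " supports k <= "
                    + assemblerMaxK + " but the job requests " + invalid);
        }
    }

    public static void main(String[] args) {
        List<Integer> ks = List.of(31, 61, 91, 151);
        System.out.println(invalidKmers(ks, 127)); // [151]
    }
}
```

The second option, ignoring the failed assembly, would instead catch the failure per assembly and continue with the remaining ones.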

pass parameter to quast

Is it possible to specify in the xml file this option for quast?

--min-contig (or -m)
Lower threshold for a contig length. Shorter contigs won't be taken into account (except for some metrics, see section 3). The default value is 500.

thanks a lot!

Add mechanism to learn library properties

At the moment users have to enter library properties such as insert size and error tolerance in order to get the tools to execute properly. Much of this could be figured out automatically.
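As an illustration of how the insert size could be learned, the sketch below assumes a sample of observed fragment lengths is already available (for example from aligning a subset of read pairs) and derives the mean and standard deviation. The class and method names are hypothetical, and how these would map onto RAMPART's error tolerance setting is left open.

```java
/** Hypothetical helper: estimate library insert-size statistics from observed fragment lengths. */
public class InsertSizeEstimator {

    /** Arithmetic mean of the observed fragment lengths. */
    public static double mean(int[] lengths) {
        if (lengths.length == 0) throw new IllegalArgumentException("no fragment lengths");
        double sum = 0;
        for (int len : lengths) sum += len;
        return sum / lengths.length;
    }

    /** Sample standard deviation (n - 1 denominator). */
    public static double stdDev(int[] lengths) {
        if (lengths.length < 2) throw new IllegalArgumentException("need at least two lengths");
        double m = mean(lengths);
        double sq = 0;
        for (int len : lengths) sq += (len - m) * (len - m);
        return Math.sqrt(sq / (lengths.length - 1));
    }

    public static void main(String[] args) {
        int[] observed = {480, 500, 520, 490, 510}; // made-up example lengths
        System.out.printf("mean=%.1f sd=%.1f%n", mean(observed), stdDev(observed)); // mean=500.0 sd=15.8
    }
}
```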

Failed installing RAMPART-develop

Dear Daniel,
after installing "git" and Maven 3.3.9 and downloading the "RAMPART-develop" branch with the command "git clone https://github.com/TGAC/RAMPART.git", I commented out the "create manual execution" element and tried to install by typing "mvn clean install" after "cd RAMPART".
This gave the error:
"Could not resolve dependencies for project uk.ac.tgac.rampart:rampart:jar:0.12.3: Failure to find uk.ac.tgac.conan:tgac-conan-process-wrappers:jar:0.12.9 in https://repos.tgac.ac.uk/maven/repo"

Then, I looked at "https://repos.tgac.ac.uk/maven/repo/uk/ac/tgac/conan/tgac-conan-process-wrappers/" and saw there isn't a 0.12.9 directory.
I also tried to modify this dependency in the pom.xml file:

<dependency>
    <groupId>uk.ac.tgac.conan</groupId>
    <artifactId>tgac-conan-process-wrappers</artifactId>
    <version>0.12.9</version>
    <exclusions>
        <exclusion>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
        </exclusion>
    </exclusions>
</dependency>

substituting 0.12.9 with 0.12.1. This didn't give the same error as before, but it gave many compilation errors.
How can I install the RAMPART-develop version?
Thank you

Potential issue with Quake?

RAMPART seems to struggle to create symbolic links to Quake's output. We should check that the directories are created and the files exist before linking.
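A minimal sketch of those checks using the standard java.nio.file API. The class name `SafeLink` is hypothetical. Replacing an existing link mirrors the "ln -s -f" calls seen in the logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: create a symbolic link only after checking its target and parent directory. */
public class SafeLink {

    public static void link(Path target, Path link) throws IOException {
        // Fail with a clear message if the tool did not produce the file we expect to link to.
        if (!Files.exists(target)) {
            throw new IOException("Cannot link to missing file: " + target);
        }
        // Make sure the output directory exists before creating the link inside it.
        Path parent = link.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Like "ln -s -f": remove an existing link or file at the destination, then link.
        Files.deleteIfExists(link);
        Files.createSymbolicLink(link, target.toAbsolutePath());
    }
}
```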

Failing on quake.py with 0.12.2

Running rampart 0.12.2

quake.py 0.3.5
kat 2.0.6
jellyfish 2.0.6

The command:

cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}

It fails with this error:

terminate called after throwing an instance of 'jellyfish::fastq_seq_qual_parser::FastqSeqQualParserError'
  what():  Truncated input file
Error: Requires at least 2 arguments.
Usage: jellyfish merge [options] db:string+
Use --help for more information
Traceback (most recent call last):
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 324, in <module>
    main()
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 89, in main
    jellyfish(options.readsf, options.reads_listf, options.k, ctsf, quality_scale, options.hash_size, options.proc)
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 290, in jellyfish
    os.rename('%s.dbm_0' % output_pre, '%s.dbm' % output_pre)
OSError: [Errno 2] No such file or directory

The log file with --verbose looks like this:

2015-10-27 12:10:58 INFO  DefaultProcessService:146 - Running command in foreground [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].
2015-10-27 12:12:28 DEBUG ProcessRunner:153 - Return code was '0' for [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].  Redirecting stderr.

Why doesn't the output from quake.py reflect in the log file?

I am running rampart in an unscheduled environment. Is this one of those errors that would be fixed by running with PBS?
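A plausible explanation for the "Return code was '0'" line, assuming the command line is handed to a shell as a single string: the parts are joined with ";", and a POSIX shell reports the exit status of the last command in such a list. Here that is the trailing "cd ${BASE_DIR}", so the quake.py failure is masked. The demo below assumes a Unix-like system with sh on the PATH. Here, false stands in for the crashing quake.py call.

```java
import java.io.IOException;

/** Demonstrates how a ';'-separated shell list hides a failing command's exit code. */
public class ExitStatusDemo {

    /** Runs a command line through "sh -c" and returns its exit status. */
    public static int run(String commandLine) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sh", "-c", commandLine)
                .inheritIO()
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("cd /; false; cd /"));    // 0: status of the final cd
        System.out.println(run("cd / && false && cd /")); // 1: the failure propagates
    }
}
```

Joining the commands with "&&" instead of ";" would let a non-zero exit code from quake.py reach the process runner.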

rampart ecoli_full_job.xml getting error with rampart 0.12.0

Hello,

Here is the output:

[jillian@login-0-2 example_job_configs]$ rampart ecoli_full_job.xml
2015-10-11 10:39:02 INFO RampartCLI:227 - RAMPART Version: 0.12.0
2015-10-11 10:39:02 INFO RampartCLI:228 - Output dir: /scratch/jillian/source/rampart/rampart-0.12.0/etc/example_job_configs
2015-10-11 10:39:02 INFO RampartCLI:229 - Environment configuration file: /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/conan.properties
2015-10-11 10:39:02 INFO RampartCLI:230 - Logging properties file: /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/log4j.properties
2015-10-11 10:39:02 INFO RampartCLI:231 - Job Prefix: RAMPART20151011_103902
2015-10-11 10:39:02 INFO ConanProperties:145 - Loaded Conan properties from /share/apps/NYUAD/rampart/gcc_4.9.1/0.12.0/etc/conan.properties
2015-10-11 10:39:02 INFO RampartCLI:240 - Executing the following stages: MECQ,MECQ_ANALYSIS,KMER_CALC,MASS,MASS_ANALYSIS,MASS_SELECT,AMP,AMP_ANALYSIS,FINALISE,COLLECT
2015-10-11 10:39:02 INFO RampartCLI:243 - Parsing configuration file: /scratch/jillian/source/rampart/rampart-0.12.0/etc/example_job_configs/ecoli_full_job.xml
2015-10-11 10:39:02 INFO RampartConfig:134 - Found libraries element containing 1 libraries.
2015-10-11 10:39:02 INFO RampartConfig:135 - Running in single sample mode
Error validating XML Element: pipeline; Found unexpected child node: analyse_reads
uk.ac.tgac.conan.core.util.XmlHelper.validate(XmlHelper.java:170)
uk.ac.tgac.rampart.RampartPipeline$Args.(RampartPipeline.java:171)
uk.ac.tgac.rampart.RampartConfig.internalParseXml(RampartConfig.java:154)
uk.ac.tgac.conan.core.util.AbstractXmlJobConfiguration.parseXml(AbstractXmlJobConfiguration.java:103)
uk.ac.tgac.rampart.RampartCLI.initialise(RampartCLI.java:244)
uk.ac.tgac.rampart.RampartCLI.main(RampartCLI.java:452)

Best,
Jillian

Cannot run multiple RAMPART instances on some high core machines

We have noticed that on some high-core-count machines, e.g. SGI UV systems, only a single RAMPART instance can be run, and all other Java processes report out-of-memory errors. This is potentially due to the number of GC threads that are started. If this is confirmed to be the problem, we should launch RAMPART with the "-XX:+UseSerialGC" JVM flag (JVM flags are runtime options, so this belongs in the launch command rather than the build).
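To confirm which collector a JVM is actually using, and how many processors it sees, a small check like the following could be run with and without the flag (e.g. "java -XX:+UseSerialGC GcReport.java" on Java 11+). It uses only the standard java.lang.management API. The class name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports the garbage collectors and processor count visible to the current JVM. */
public class GcReport {

    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        // On HotSpot with -XX:+UseSerialGC the collectors are typically reported as "Copy" and "MarkSweepCompact".
        System.out.println("Collectors: " + collectorNames());
    }
}
```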

Better method for scaling metrics

Currently, in V0.11.0, we use a scaling procedure to reduce any given metric to a range between 0 and 1, which can then be weighted and combined with other scaled and weighted metrics to give a final score for the assembly. An issue with this approach is that assembly projects often sweep large ranges of k values and get at least one useless assembly. Excluding this assembly could change the overall ranking of the 1st and 2nd assemblies (depending on the weightings of the other metrics). Scaling by the range means that the weight of a metric could be decided entirely by two outliers.
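For reference, range (min-max) scaling maps each value x of a metric to (x - min) / (max - min). The sketch below uses made-up N50 values to illustrate the outlier problem described above. It is a generic implementation, not RAMPART's code.

```java
import java.util.Arrays;

/** Min-max (range) scaling of one assembly metric into [0, 1]. */
public class MinMaxScaling {

    public static double[] scale(double[] values) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // If every assembly has the same value the metric cannot discriminate; score all as 0.
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] n50 = {40000, 42000, 44000};              // three reasonable assemblies (made-up)
        double[] withOutlier = {40000, 42000, 44000, 500}; // plus one useless assembly
        System.out.println(Arrays.toString(scale(n50)));   // gap between 1st and 2nd: 0.5
        System.out.println(Arrays.toString(scale(withOutlier))); // same gap shrinks to about 0.046
    }
}
```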

An alternative approach, taken by Abbas et al., 2014 (http://link.springer.com/article/10.1186/1471-2164-15-S9-S10/fulltext.html), is to rank the assemblies based on a single metric before weighting and combining the results. A problem with this method is that it can give too much credence to a metric for which all assemblies have similar values.
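A sketch of rank-based scaling in the spirit of that approach. Mapping ranks onto [0, 1] and averaging tied ranks are assumptions here, not a reproduction of the paper's exact procedure. The example shows the drawback mentioned: near-identical values still receive evenly spread scores.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Rank-based scaling: a score depends only on an assembly's position, not on the size of the gaps. */
public class RankScaling {

    /** Returns ranks mapped to [0, 1]; tied values share their average rank. */
    public static double[] scale(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        double[] scaled = new double[n];
        int start = 0;
        while (start < n) {
            // Find the run of tied values beginning at 'start'.
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) end++;
            double avgRank = (start + end) / 2.0;
            for (int j = start; j <= end; j++) {
                scaled[order[j]] = n == 1 ? 0.0 : avgRank / (n - 1);
            }
            start = end + 1;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Nearly identical values still span the full [0, 1] range.
        System.out.println(Arrays.toString(scale(new double[]{0.981, 0.982, 0.983}))); // [0.0, 0.5, 1.0]
    }
}
```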

We are therefore looking for a better means of scaling assembly metrics. Methods based on the interquartile range or the standard deviation of the metric may work better. This thread is for discussing this topic.
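One of the candidates, scaling by the interquartile range, could look like the sketch below. Each value is centred on the median and divided by the IQR, so a single extreme assembly has little influence on the scale. The quartile definition (linear interpolation between sorted values) is one common choice among several. How the result would feed into the weighting scheme is left open.

```java
import java.util.Arrays;

/** Robust scaling: (x - median) / IQR, so that a single extreme assembly barely changes the scale. */
public class IqrScaling {

    /** Quantile by linear interpolation between sorted values (q in [0, 1]). */
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    public static double[] scale(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double median = quantile(sorted, 0.5);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = iqr == 0 ? 0.0 : (values[i] - median) / iqr;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Made-up N50 values; the useless assembly (500) lands far out without squeezing the others.
        System.out.println(Arrays.toString(scale(new double[]{40000, 42000, 44000, 46000, 500})));
        // [-0.5, 0.0, 0.5, 1.0, -10.375]
    }
}
```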

input fastq.gz

Hi,
is it possible to make RAMPART handle fastq.gz input, by adding a check and, if needed, automatically gunzipping the files?
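A sketch of such a check, assuming it would run before reads are passed to tools that need plain FASTQ. It detects gzip by its two magic bytes (0x1f 0x8b) rather than by file extension and decompresses with the JDK's GZIPInputStream. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Hypothetical input check: hand plain FASTQ through unchanged, gunzip compressed files first. */
public class FastqGunzip {

    /** True if the file starts with the gzip magic number 0x1f 0x8b. */
    public static boolean isGzipped(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    /** Returns a plain FASTQ path: the input itself, or a decompressed copy written to outDir. */
    public static Path ensurePlain(Path fastq, Path outDir) throws IOException {
        if (!isGzipped(fastq)) {
            return fastq; // already plain text, use as-is
        }
        String name = fastq.getFileName().toString();
        // Strip a trailing ".gz"; otherwise add a suffix so the input is never overwritten.
        String plainName = name.endsWith(".gz") ? name.substring(0, name.length() - 3) : name + ".unzipped";
        Files.createDirectories(outDir);
        Path out = outDir.resolve(plainName);
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(Files.newInputStream(fastq)));
             OutputStream os = Files.newOutputStream(out)) {
            in.transferTo(os);
        }
        return out;
    }
}
```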

javadoc fails

The error is…

[ERROR] /private/tmp/rampart-ai5O/src/main/java/uk/ac/tgac/rampart/stage/MassJob.java:87: error: exception not thrown: java.io.IOException
[ERROR] * @throws IOException
[ERROR] ^
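This doclint error appears when a @throws tag documents a checked exception that the method does not declare. The fix is either to delete the stale tag or to add the exception to the throws clause. The class below is a hypothetical stand-in, since the body of MassJob.java around line 87 isn't shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JavadocThrowsExample {

    // Before (rejected by javadoc's doclint with "exception not thrown: java.io.IOException"):
    //   a doc comment containing a @throws IOException tag on a method
    //   whose signature has no "throws IOException" clause.

    /**
     * Fix 1: the method cannot throw IOException, so the stale tag is simply removed.
     *
     * @return a short description of the job
     */
    public String describe() {
        return "mass job";
    }

    /**
     * Fix 2: the method really does I/O, so it declares the exception the tag documents.
     *
     * @param file file whose size is reported
     * @return the file size in bytes
     * @throws IOException if the file cannot be read
     */
    public long outputSize(Path file) throws IOException {
        return Files.size(file);
    }
}
```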

Failing in amp_analysis stage

Although the final score for each assembler run is correctly evaluated in the mass_select stage (so the file "weightings.tab" is read correctly there), all the group scores in rampart_out/8_amp_analysis/scores.tsv are 0.0, including those for amp-stage-0, which are not 0 in rampart_out/6_mass_select/scores.tsv.
Is there an issue in the reading of the "weightings.tab" file in the amp_analysis stage?
Thanks

Control whether to overwrite or keep existing assemblies

If an assembly already exists, the user may not wish to overwrite it. There should be a mechanism, probably at the command line rather than in the job configuration file, for forcing an overwrite when this is required; otherwise RAMPART should detect that an assembly already exists and move on to the next one rather than rerunning it.
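A minimal sketch of the proposed decision, assuming a command-line switch (a hypothetical --force option) is parsed into a boolean. The class, enum, and method names are illustrative only.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical skip-or-overwrite decision for an assembly job, driven by a command-line flag. */
public class AssemblyRunPolicy {

    public enum Action { RUN, SKIP, OVERWRITE }

    /**
     * Decide what to do with an assembly whose expected output is 'assemblyFile'.
     * 'force' would come from a command-line option such as a hypothetical --force.
     */
    public static Action decide(Path assemblyFile, boolean force) {
        if (!Files.exists(assemblyFile)) {
            return Action.RUN;           // nothing there yet: assemble as normal
        }
        return force ? Action.OVERWRITE  // user explicitly asked to redo it
                     : Action.SKIP;      // default: keep the existing assembly, move on
    }
}
```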

SoapDeNovo is never operational

There is an issue when running SOAPdenovo on the cluster: RAMPART appends the "mode" argument to the executable name when running "which", so the executable lookup fails. Should be an easy fix.
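A sketch of that fix, assuming the executable and its mode arrive as one command string: only the first whitespace-separated token should be passed to "which". The class name and example command are illustrative. Paths containing spaces would need proper quoting, which this sketch does not handle.

```java
/** Extracts the executable name from a command string that may carry extra arguments. */
public class ExecutableName {

    public static String executableFor(String command) {
        String trimmed = command.trim();
        if (trimmed.isEmpty()) throw new IllegalArgumentException("empty command");
        // Split on runs of whitespace and keep only the program itself, dropping e.g. the mode.
        return trimmed.split("\\s+")[0];
    }

    public static void main(String[] args) {
        System.out.println(executableFor("SOAPdenovo-63mer all")); // SOAPdenovo-63mer
    }
}
```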

Implement assembly analysis for AMP stage.

The user should know what's happening to their assembly during the AMP stage. Give the user the option of running the full suite of assembly analysis tools either not at all, once at the end, or after every stage.
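The three options could be modelled as a setting like the one below. This is a hypothetical sketch; the enum, class, and method names are not part of RAMPART.

```java
/** Hypothetical setting for when AMP should run the assembly analysis tools. */
public class AmpAnalysisSchedule {

    public enum Mode { NONE, END, EVERY_STAGE }

    /** True if analysis should run after AMP stage 'stageIndex' (0-based) of 'stageCount' stages. */
    public static boolean analyseAfter(Mode mode, int stageIndex, int stageCount) {
        switch (mode) {
            case NONE:
                return false;                          // never analyse during AMP
            case END:
                return stageIndex == stageCount - 1;   // only after the final stage
            case EVERY_STAGE:
                return true;                           // after every stage
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```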

subsampler

Why does RAMPART divide by 10,000 (i.e. by 100 twice) if the probability of picking a read is greater than 1?

In the following case, each read had a 108.6% probability of being picked, but the subsampler log says that the probability is 0.010018 instead of 1.0018. Logically, I expected a file with the actual number of reads plus a subsample of 1.0018% of "resampled" reads.

Estimated that library: Sample; has approximately 404854224 bases in both files. Estimated genome size is: 2200000; so actual coverage (per file) is approximately: 92.01232363636363; we will only keep 108.68109406214322% of the reads in each file to achieve approximately 200X coverage in both files

cat RAMPART20160912_171111-group-spades-raw-subsample-Sample_200x-file2.log
Subsampler
Seed 519289336
Readed 2029342
Printed 20330
Probability 0.010018
DONE

Thanks,
Alessandro
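The numbers in the report can be checked directly. 404854224 bases / 2200000 / 2 files gives about 92.01X per file, and 200 / (2 × 92.01) gives the keep fraction of about 1.0868 (108.68%). The subsampler printed 20330 of 2029342 reads, which matches the logged 0.010018 rather than 1.0868. The copiesFor method is a hypothetical way to handle fractions above 1 that matches the expectation in the report: every read is kept, plus extra copies for the excess. It is not RAMPART's subsampler.

```java
import java.util.Random;

/** Reproduces the subsampling arithmetic from the log above and sketches handling of fractions above 1. */
public class SubsampleMath {

    /** Fraction of reads to keep per file to hit the target coverage across all files. */
    public static double keepFraction(double targetCoverage, double coveragePerFile, int files) {
        return targetCoverage / (coveragePerFile * files);
    }

    /**
     * Hypothetical handling of a fraction above 1: each read is written floor(f) times,
     * plus one extra copy with probability f - floor(f).
     */
    public static int copiesFor(double fraction, Random rng) {
        int whole = (int) Math.floor(fraction);
        return whole + (rng.nextDouble() < fraction - whole ? 1 : 0);
    }

    public static void main(String[] args) {
        double perFile = 404854224.0 / 2200000.0 / 2.0;    // ~92.01X per file, as in the log
        System.out.println(keepFraction(200, perFile, 2)); // ~1.0868, i.e. 108.68%
        System.out.println(20330.0 / 2029342.0);           // ~0.010018, the fraction actually printed
    }
}
```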
