gnomad exome issue (cloudbiolinux) CLOSED

matthdsm commented on August 16, 2024
gnomad exome issue

from cloudbiolinux.

Comments (5)

pfpjs commented on August 16, 2024

Hi @matthdsm, @chapmanb, @naumenko-sa,

I'm getting the same truncated files as @matthdsm, but only for GRCh37 and hg38, while hg19 completed successfully.

Looking at the ggd recipes, hg19 uses wget to fetch the gnomad VCFs (gnomad_exome and gnomad_genome), while GRCh37 and hg38 both use the URL directly as an argument to vt decompose.
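To illustrate the download-first approach, here is a minimal Python sketch, not the actual ggd recipe. It saves the file locally and compares its size with the `Content-Length` the server reported before anything else touches it. The function name `download_then_verify` and its arguments are hypothetical, and the check is skipped if the server sends no `Content-Length`.

```python
import os
import shutil
import urllib.request


def download_then_verify(url, dest):
    """Download `url` to `dest`, then compare the size on disk with Content-Length.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        expected = response.headers.get("Content-Length")
        shutil.copyfileobj(response, out)

    actual = os.path.getsize(dest)
    # Only compare when the server actually reported a length.
    if expected is not None and int(expected) != actual:
        raise IOError(
            "Incomplete download: expected %s bytes, got %d" % (expected, actual)
        )
    return actual
```

Only once this returns would the local file be handed to a downstream step such as vt decompose, so a dropped connection shows up as an error instead of a silently short VCF.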

I've rerun the GRCh37 and hg38 recipes three times now, and they always fail at roughly (but not exactly) the same spot, leaving me with a ~400 MB gnomad_exome.vcf.gz file and a ~340 MB gnomad_genome.vcf.gz file.

For reference, here are the sizes of the hg19 version of these two files:

gnomad_exome.vcf.gz: 5391108579 bytes
gnomad_genome.vcf.gz: 26405145616 bytes
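A truncated file like the ones described above can also be spotted without knowing the expected size, by reading the gzip stream through to its end. The sketch below is only an illustration: the function name `gzip_is_complete` is hypothetical, and it relies on Python's `gzip` module raising `EOFError` when the compressed data stops early. One caveat: a bgzipped file cut exactly at a block boundary would still pass, so this complements a size comparison and does not replace it.

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` can be decompressed to its end.

    Hypothetical helper for illustration; decompressed data is discarded.
    """
    if os.path.getsize(path) == 0:
        return False  # an empty file is never a valid VCF download
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so large files do not need to fit in memory.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early; OSError/zlib.error: corrupt data.
        return False
    return True
```

Note that this decompresses the entire file, so on a file the size of gnomad_genome.vcf.gz it will take a while.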

-- Paulo

naumenko-sa commented on August 16, 2024

Hi Matthias @matthdsm and Paulo @pfpjs !

Thanks for reporting the issue and pointing out how to fix it!
I indeed see the same effect on my system.

I pushed a fix in my fork https://github.com/naumenko-sa/cloudbiolinux
and created a pull request for Brad @chapmanb.

Please let me know if this resolved the issue!
Sergey

chapmanb commented on August 16, 2024

Thanks Sergey -- this got pulled and is available now. Thank you all for testing and reporting and let us know if you hit any other problems.

matthdsm commented on August 16, 2024

Hi guys,

I was doing some digging, and it turns out we're making things too hard on ourselves. The annotation works just as well without decomposing/normalizing the gnomad VCF. What are your opinions on this?

Cheers
M

pfpjs commented on August 16, 2024

Hi @naumenko-sa,

GRCh37 gnomad_exome.vcf.gz completed successfully, thank you!

However, gnomad_genome.vcf.gz also needs the wget "treatment". Thanks!
