Giter Site home page Giter Site logo

Comments (8)

jng2 avatar jng2 commented on June 16, 2024 1

Sorry, I'm not sure what you mean by "save" this info. I think restricting the run to the "regular" chromosomes should do the trick.

from tortoise.

jng2 avatar jng2 commented on June 16, 2024

Hi there!

GATK has an interval command, -L (https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists), which allows you to pick which contigs you would like to use. So adding -L to the GATK call should help with your issue.

Please let me know if this doesn't work.

Best,

Jeff

from tortoise.

lauragails avatar lauragails commented on June 16, 2024

Thank you! Now that I know what to look for I see that I can add INTERVAL_FILE in as the inputs (ie to read into line 83 in tortoise_1.2.smk)

that is "interval_file" in config.json.

I checked the other variables in config.json and I am sure that the other ones are either correct, or correctly left empty, so knock on wood that should do it!

The reason (I think) I got this error is that I generated a dictionary with the reference fasta that came with the pacbio reference directory (which only includes the "regular" ie non-scaffold etc. chromosomes). Do you know if there's a way to "save" this info? If not, because the dictionary only contains the "regular" chroms, I'll restrict to those regions and re-run.

The quick response to my original question is especially appreciated. I wish I asked earlier!

from tortoise.

lauragails avatar lauragails commented on June 16, 2024

I mean being able to use the information that's not in the dictionary generated from a basic reference file. Maybe you could do it by making a more complex .list file, since the manual says you don't need a dictionary if you have a list with chromosome and start/stop position.

But fortunately it also allows for the inputting of "chr__" without chr/pos, as long as a dictionary is specified. So fingers crossed, it will work!

Thanks again.

from tortoise.

lauragails avatar lauragails commented on June 16, 2024

Progress! But when I run on hpc using singularity and google deep variant that is installed there, whose versions match the ones you suggest, I get a module not found error (below)

It is entirely possible/probable that it's because python 3.9 is not installed (yet). Also, I don't think we can load two modules at once (unless a python 2 is called using "pyton2" vs. "python" which implies whatever 3.XX module is loaded. I am noting this because the front page suggests that both be installed.

Do you know where this module would be used, and what software I can use to make sure it is installed: third_party.nucleus.io.clif_postproc

Thank you!

I0624 03:06:10.781211 47766986753856 postprocess_variants.py:971] Using sample name from call_variants output. Sample name: [redacted]
2023-06-24 03:06:10.786312: I deepvariant/postprocess_variants.cc:88] Read from: /tmp/tmpncogiwdj/call_variants_output.tfrecord.gz
2023-06-24 03:06:56.226107: I deepvariant/postprocess_variants.cc:103] Total #entries in single_site_calls = 11112971
I0624 03:08:29.075627 47766986753856 postprocess_variants.py:1036] CVO sorting took 2.3048147598902387 minutes
I0624 03:08:29.077477 47766986753856 postprocess_variants.py:1039] Transforming call_variants_output to variants.
I0624 03:33:55.377552 47766986753856 postprocess_variants.py:1079] Processing variants (and writing to temporary file) took 25.438181654612222 minutes
I0624 03:47:44.937586 47766986753856 postprocess_variants.py:1092] Finished writing VCF and gVCF in 13.825993386904399 minutes.
I0624 03:47:44.998353 47766986753856 genomics_reader.py:222] Reading /path/to/redacted.cpu.dv.vcf.gz with NativeVcfReader
I0624 03:54:05.123418 47766986753856 postprocess_variants.py:1103] Generating VCF stats took 6.336423444747925 minutes.

real    47m58.453s
user    47m8.501s
sys     0m48.710s
 File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/tmp/Bazel.runfiles_hbm44wtf/runfiles/absl_py/absl/app.py", line 300, in run
  File "/tmp/Bazel.runfiles_hbm44wtf/runfiles/absl_py/absl/app.py", line 251, in _run_main
  File "/tmp/Bazel.runfiles_hbm44wtf/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1099, in main
  File "/tmp/Bazel.runfiles_hbm44wtf/runfiles/com_google_deepvariant/third_party/nucleus/io/genomics_reader.py", line 244, in iterate
  File "/tmp/Bazel.runfiles_hbm44wtf/runfiles/com_google_deepvariant/third_party/nucleus/io/vcf.py", line 201, in iterate
ModuleNotFoundError: No module named 'third_party.nucleus.io.clif_postproc'

real    38m25.744s
user    37m57.789s
sys     0m26.734s

Edit: This was in postprocess_variants.log it does appear that call_variants.log and make_examples.log look good, and indexed vcf files were created (that visually look good as well, I haven't dug into them)

from tortoise.

jng2 avatar jng2 commented on June 16, 2024

Hi,

Sorry about the delayed response. I'm not too sure where you would find that, I believe that is downloaded when DeepVariant is built based on some Googling. Here is an (old) closed issue from DeepVariant where I think the person had a similar issue to yours I think: google/deepvariant#103

from tortoise.

jng2 avatar jng2 commented on June 16, 2024

I'm going to close this issue. Please let me know if you have further problems.

from tortoise.

lauragails avatar lauragails commented on June 16, 2024

I was able to get this working! The original error was because the tmp directory was full.

When that was fixed, I got a separate internal issue because I can't use python2 and python3 at the same time (when you loaded the python3 module on our system it messed up a behind-the-hood file for python2 so the final scripts I pushed through by hand).

Which leads me to: I see for the CPG file and _position file in your described outputs you're grepping lines that have a comment and piping them to the file, but I don't see any output going there? It could be because I'm copy pasting.

But I do see the de novos that I need(!!) so I'm not concerned about this per-se.

Thank you so much!

from tortoise.

Related Issues (2)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.