brwnj / idplot

compare sequences to a shared root reference sequence.

Home Page: https://brwnj.github.io/idplot/

License: MIT License

idplot's Introduction

Compare similar sequences (*.fasta) to a reference (.fasta).

See the example report at: https://brwnj.github.io/idplot/

Setup

Nextflow is used to run the pipeline. Installation instructions can be found at https://www.nextflow.io/. Alternatively, Nextflow can be installed with conda by way of Bioconda, which includes a complete setup guide at https://bioconda.github.io/user/install.html.

Once conda is installed and your channels are configured, run:

conda install nextflow

Usage

Run the workflow with Nextflow:

nextflow run brwnj/idplot -latest -with-docker \
    --reference data/MN996532.fasta \
    --fasta 'data/query_seqs/*.fasta'

This generally takes only a few minutes to complete, which enables rapid screening for localized sequence similarity.

Parameters

The reference sequence (--reference) should be a FASTA file containing exactly one sequence. Query sequences (--fasta) may be single-sequence or multi-sequence FASTA files, and you can specify more than one file using wildcards ('*').
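Before launching a run, it can be useful to confirm that the reference really holds a single sequence. The sketch below is not part of idplot. It assumes plain-text FASTA input and simply counts header lines. The function names and file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def is_single_sequence(path):
    """Return True when the file holds exactly one sequence, as --reference expects."""
    return count_fasta_records(path) == 1


# Demonstration with throwaway files (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp) / "reference.fasta"
    reference.write_text(">ref\nACGTACGT\n")
    queries = Path(tmp) / "queries.fasta"
    queries.write_text(">q1\nACGTACGA\n>q2\nACGAACGT\n")

    print(is_single_sequence(reference))  # True
    print(is_single_sequence(queries))    # False
```

A multi-sequence file that fails this check is still acceptable as --fasta input.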

Example sequences are found in data/query_seqs.

By default, output is written to ./results/idplot.html and can be opened with a web browser.

Using a custom alignment

In some cases it may be necessary to manually correct an alignment. In this case, idplot can accept the alignment and skip its internal alignment step. To do so, run:

nextflow run brwnj/idplot -latest -with-docker \
    --alignment my_alignment_msa.fasta

The first sequence in the file will be used as the reference (root) sequence.

Options --reference and --fasta are both omitted in this case.

Including breakpoint detection

We have opted to employ GARD via HyPhy to identify breakpoints. For each GARD fit iteration, we pull the sequences for each breakpoint and build a tree using FastTree.

An example command enabling GARD with 12 MPI processes:

nextflow run brwnj/idplot -latest -with-docker \
    --reference data/MN996532.fasta \
    --fasta 'data/query_seqs/*.fasta' \
    --gard --cpus 12

Including a reference annotation

Publicly available reference sequences, such as those in NCBI, often have an accompanying annotation that can be included within the ANI plot. On NCBI, a GFF can be downloaded from the reference page under Send to. Pass the file to the workflow with --gff:

nextflow run brwnj/idplot -latest -with-docker \
    --reference data/MN996532.fasta \
    --fasta 'data/query_seqs/*.fasta' \
    --gff MN996532.gff3

Within the report, the annotation is rendered within the ANI plot.

As GFFs may have multiple feature types, we allow the reader to select their preferred feature type from the report header.

Coordinates in the original GFF will likely not match what is being displayed. Start and end coordinates are updated based on gaps introduced into the reference sequence during multiple sequence alignment.
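The coordinate shift can be illustrated with a short sketch. This is not idplot's actual code. It assumes 1-based inclusive GFF coordinates, '-' as the gap character, and a hypothetical aligned reference string. The function names are made up for illustration.

```python
def ungapped_to_aligned(aligned_reference, gap_char="-"):
    """List the 0-based alignment column of every non-gap reference base.

    Index i of the result is the column holding the i-th (0-based)
    base of the original, ungapped reference.
    """
    return [col for col, base in enumerate(aligned_reference) if base != gap_char]


def shift_feature(start, end, aligned_reference):
    """Map 1-based inclusive GFF coordinates onto 1-based alignment columns."""
    columns = ungapped_to_aligned(aligned_reference)
    return columns[start - 1] + 1, columns[end - 1] + 1


# Hypothetical aligned reference: the ungapped sequence is ACGTAC.
aligned = "AC--GTA-C"

print(shift_feature(3, 5, aligned))  # (5, 7)
print(shift_feature(1, 6, aligned))  # (1, 9)
```

A feature that spans a gapped stretch grows in displayed length, which is why the report coordinates can differ from those in the GFF.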

Interpreting the report

Multiple sequence alignment

The reference sequence is fully colored in. Hovering along the reference shows the base for a given color.

Query sequences are colored at mismatches and gaps (gray).

Percent ID (ANI)

Percent ID is calculated across the window (default 500 bp) with the value being plotted at the center point. A 500 bp window therefore leaves 250 bp without plotted values at the beginning and end of the reference length. No special treatment is given with respect to sequence content.
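The windowed calculation can be sketched as follows. This is an illustration under stated assumptions, not idplot's implementation. Both sequences are taken from the same alignment, so they have equal length. Every column is compared character for character, so a gap opposite a base counts as a mismatch. The function name is hypothetical.

```python
def windowed_percent_id(reference, query, window=500):
    """Return (center_index, percent_id) pairs for each full window.

    Positions are 0-based alignment columns. Windows that would run past
    either end are skipped, which produces the unplotted margins.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must come from the same alignment")

    matches = [1 if r == q else 0 for r, q in zip(reference, query)]
    half = window // 2
    points = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the entering base, drop the leaving one.
            running += matches[start + window - 1] - matches[start - 1]
        points.append((start + half, 100.0 * running / window))
    return points


# Tiny made-up example with a 4-column window and a single mismatch.
points = windowed_percent_id("AAAAAAAAAA", "AAAAACAAAA", window=4)
print(points[0])   # (2, 100.0)
print(points[2])   # (4, 75.0)
print(points[-1])  # (8, 100.0)
```

With the default 500 bp window, the first value lands at column 250, matching the margins described above.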

Sequences

Sequence selection is based on the level of x-axis zoom of the plot. Sequence gaps can be removed using the toggle. The selected region can be copied to the clipboard or sent directly to BLAST (when the selection is shorter than 8 kb), and all sequences for a given region can be exported to FASTA.
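The selection logic can be approximated in a few lines. The report itself runs in the browser, so this Python sketch only mirrors the described behavior. It assumes '-' marks gaps and reads "8 kb" as 8,000 bases. All names are hypothetical.

```python
BLAST_MAX_LENGTH = 8000  # assumption: "8 kb" interpreted as 8,000 bases


def select_region(aligned_sequence, start, end, remove_gaps=True):
    """Slice alignment columns [start, end) and optionally drop gap characters."""
    region = aligned_sequence[start:end]
    return region.replace("-", "") if remove_gaps else region


def can_send_to_blast(selection):
    """Selections must be shorter than the length limit to be sent to BLAST."""
    return len(selection) < BLAST_MAX_LENGTH


selection = select_region("AC--GT-A", 0, 8)
print(selection)                       # ACGTA
print(can_send_to_blast(selection))    # True
print(can_send_to_blast("A" * 8000))   # False
```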

With GARD results

Including --gard in your Nextflow command adds breakpoint detection and updates the data and visualizations available in the report.

Breakpoints track

Regions identified by GARD as breakpoints are highlighted between the MSA and ANI plots. Clicking a region navigates to its dendrogram.

Dendrograms

Per-region dendrograms are generated with FastTree from the regions identified by GARD.

Hovering over regions highlights the respective region in the GARD breakpoints track.

Clicking the region link will zoom the plot to facilitate downloading the sequence content for a given region.

Refinements

Breakpoints are identified over iterations by GARD, often to an unhelpful degree. This plot allows the user to explore breakpoints and trees across all GARD iterations. Selecting a new point will update the dendrograms and GARD breakpoints track.

idplot's Issues

No such variable: msa_gard_ch

Hi.
I was trying to run this interesting script; however, I get this error when executing it:

No such variable: msa_gard_ch

-- Check script '.nextflow/assets/brwnj/idplot/main.nf' at line: 85 or see '.nextflow.log' file for more details

Hopefully there is a quick fix for this.
Thank you so much.

The log file is as follows:

Jun-09 03:05:31.466 [main] DEBUG nextflow.cli.Launcher - $> nextflow run brwnj/idplot -latest --cpus 12 --window 500 --gard --reference ./data/MN996532.fasta --fasta './data/query_seqs/*.fasta'
Jun-09 03:05:31.565 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 22.04.0
Jun-09 03:05:32.459 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/jovyan/.nextflow/assets/brwnj/idplot/.git/config; branch: master; remote: origin; url: https://github.com/brwnj/idplot.git
Jun-09 03:05:32.479 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/jovyan/.nextflow/assets/brwnj/idplot/.git/config; branch: master; remote: origin; url: https://github.com/brwnj/idplot.git
Jun-09 03:05:32.480 [main] INFO nextflow.cli.CmdRun - Pulling brwnj/idplot ...
Jun-09 03:05:32.480 [main] DEBUG nextflow.scm.AssetManager - Pull pipeline brwnj/idplot -- Using local path: /home/jovyan/.nextflow/assets/brwnj/idplot
Jun-09 03:05:33.485 [main] INFO nextflow.cli.CmdRun - Already-up-to-date
Jun-09 03:05:33.498 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /home/jovyan/.nextflow/assets/brwnj/idplot/nextflow.config
Jun-09 03:05:33.498 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /home/jovyan/work/idplot/nextflow.config
Jun-09 03:05:33.499 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/jovyan/.nextflow/assets/brwnj/idplot/nextflow.config
Jun-09 03:05:33.499 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/jovyan/work/idplot/nextflow.config
Jun-09 03:05:33.508 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: standard
Jun-09 03:05:33.581 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: standard
Jun-09 03:05:33.621 [main] INFO nextflow.cli.CmdRun - Launching https://github.com/brwnj/idplot [nostalgic_meninsky] DSL2 - revision: 355bb5f [master]
Jun-09 03:05:33.636 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; plugins-dir=/home/jovyan/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
Jun-09 03:05:33.637 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jun-09 03:05:33.649 [main] INFO org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jun-09 03:05:33.651 [main] INFO org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jun-09 03:05:33.656 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jun-09 03:05:33.664 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Jun-09 03:05:33.719 [main] DEBUG nextflow.Session - Session uuid: 28b6a58e-fd78-40c8-a763-cfe9a9bdbe42
Jun-09 03:05:33.719 [main] DEBUG nextflow.Session - Run name: nostalgic_meninsky
Jun-09 03:05:33.719 [main] DEBUG nextflow.Session - Executor pool size: 40
Jun-09 03:05:33.749 [main] DEBUG nextflow.cli.CmdRun -
Version: 22.04.0 build 5697
Created: 23-04-2022 18:00 UTC
System: Linux 3.10.0-1160.49.1.el7.x86_64
Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.9.1-internal+0-adhoc..src
Encoding: UTF-8 (UTF-8)
Process: 16458@3c5b05d34503 [10.0.3.144]
CPUs: 40 - Mem: 251.4 GB (10.2 GB) - Swap: 128 GB (128 GB)
Jun-09 03:05:33.770 [main] DEBUG nextflow.Session - Work-dir: /home/jovyan/work/idplot/work [nfs]
Jun-09 03:05:33.771 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/jovyan/.nextflow/assets/brwnj/idplot/bin
Jun-09 03:05:33.781 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jun-09 03:05:33.795 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jun-09 03:05:33.852 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jun-09 03:05:33.864 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 41; maxThreads: 1000
Jun-09 03:05:33.963 [main] DEBUG nextflow.Session - Session start invoked
Jun-09 03:05:33.968 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /home/jovyan/work/idplot/results/logs/trace.txt
Jun-09 03:05:34.676 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jun-09 03:05:34.769 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: msa_gard_ch for class: Script_4d2b5c58
Jun-09 03:05:34.812 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.MissingPropertyException: No such property: msa_gard_ch for class: Script_4d2b5c58
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:65)
at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:341)
at Script_4d2b5c58.runScript(Script_4d2b5c58:85)
at nextflow.script.BaseScript.runDsl2(BaseScript.groovy:170)
at nextflow.script.BaseScript.run(BaseScript.groovy:203)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:220)
at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:212)
at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:120)
at nextflow.cli.CmdRun.run(CmdRun.groovy:334)
at nextflow.cli.Launcher.run(Launcher.groovy:480)
at nextflow.cli.Launcher.main(Launcher.groovy:639)

improve gard breakpoint display

"Since GARD doesn't assign p-values to individual breakpoints (instead doing AIC-c values) I don't think the idea of highlighting particular regions as "more legitimate" breakpoint calls is going to work. But, as GARD tries to improve the model, some breakpoints appear in more models than others. I am wondering if it's possible to, in addition to showing the best fit model and trees (as now), also show the breakpoints that are called most frequently (>50% of models, more than 75%? or whatever). Or, maybe 100 nt regions that contain frequently called breakpoints (kind of similar to how 3seq shows a region rather than an exact nt call). "
