poonlab / micall-lite

This project forked from cfe-lab/micall

Pipeline for processing FASTQ data from an Illumina MiSeq for human RNA virus (HIV, hepatitis C virus) genotyping

License: GNU Affero General Public License v3.0

Python 83.60% Makefile 0.74% C++ 4.56% C 3.88% HTML 0.94% JavaScript 5.65% R 0.63% Shell 0.01%

micall-lite's Issues

Bad magic number

rng64@Arsibalt:~/git/micall$ python3 run-sample.py -1 /data/fq/SRR1577734_1.fastq -2 /data/fq/SRR1577734_2.fastq -u 
Traceback (most recent call last):
  File "run-sample.py", line 7, in <module>
    from micall.core.parse_interop import read_errors, write_phix_csv
ImportError: bad magic number in 'micall': b'\x03\xf3\r\n'
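
The bytes b'\x03\xf3\r\n' are the header that Python 2.7 writes into compiled .pyc files, so one plausible cause is stale bytecode left behind after the repository was run under Python 2. A minimal sketch for clearing such files, assuming that is the cause here; remove_stale_pyc is a hypothetical helper, not part of MiCall-Lite:

```python
from pathlib import Path


def remove_stale_pyc(root):
    """Delete every *.pyc file under root and return the removed paths."""
    removed = []
    # Materialize the listing first so deletion does not disturb the walk.
    for pyc in list(Path(root).rglob("*.pyc")):
        pyc.unlink()
        removed.append(pyc)
    return sorted(removed)


if __name__ == "__main__":
    # Run from the repository root, then retry run-sample.py with python3.
    for path in remove_stale_pyc("."):
        print("removed", path)
```

Python 3 regenerates the bytecode it needs under __pycache__ on the next import, so deleting the old files should be safe as long as the matching .py sources are present.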

Loading arbitrary local JSON file

Browser JavaScript makes this difficult because exposing local file paths to a page would be a security vulnerability (see also issue #17). In the current branch fbc51c7, the input for specifying a file displays a dummy path:

$("#fil").val()
"C:\fakepath\projects.json"

On further inspection, this input is storing the file metadata in a less obvious location:

$("#fil")[0].files[0]
File(775706) {name: "projects.json", lastModified: 1532438548156, lastModifiedDate: Tue Jul 24 2018 09:22:28 GMT-0400 (Eastern Daylight Time), webkitRelativePath: "", size: 775706, …}

Check that this is the sanctioned approach to loading a local file.

There are two problems when running MiCall

Hello, I have two questions:

1. After adding the file ErrorMetricsOut.bin, the run fails with the error shown in the figure:
ERROR

2. Some .fastq.gz files run without problems, but others fail during the remap step with the following messages:
REMAP

I hope you can help. Thank you very much!

Major problems installing MiCall-Lite on CentOS 7

  • installing Python3 requires configuring yum to use additional repository
  • Python3 is installed to /opt/rh/rh-python36/root/bin, so user has to reconfigure $PATH
  • setup.py script fails to find Python3 include file, which is located at /opt/rh/rh-python36/root/usr/include/python3.6m:
  micall/alignment/src/_gotoh2.c:1:20: fatal error: Python.h: No such file or directory
 #include <Python.h>
  • it also fails to find many other include files at this same path
  • GCC compiler in CentOS 7 complains about local variable declaration in for loops:
micall/alignment/src/_gotoh2.c: In function ‘map_ascii_to_alphabet’:
micall/alignment/src/_gotoh2.c:73:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < 256; i++) {
     ^
micall/alignment/src/_gotoh2.c:73:5: note: use option -std=c99 or -std=gnu99 to compile your code
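
Two of the points above (the Python.h search path and the C99 requirement) could in principle be handled inside setup.py. The following is a sketch under assumptions, not the project's actual build script: gotoh2_extension_kwargs is a hypothetical helper that produces keyword arguments suitable for setuptools.Extension.

```python
import sysconfig


def gotoh2_extension_kwargs():
    """Build keyword arguments for setuptools.Extension (hypothetical helper)."""
    return {
        # Directory that holds Python.h for the interpreter running setup.py.
        "include_dirs": [sysconfig.get_paths()["include"]],
        # Permit `for (int i = 0; ...)` declarations on older GCC releases.
        "extra_compile_args": ["-std=c99"],
    }


if __name__ == "__main__":
    print(gotoh2_extension_kwargs())
```

The dictionary would be passed as Extension(..., sources=["micall/alignment/src/_gotoh2.c"], **gotoh2_extension_kwargs()). If the Python development headers are not installed for the interpreter in use, setting the include path will not help, because Python.h will not exist at that location.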

JSON editing use cases

The purpose of enumerating plausible use cases is to have some basis for designing the user interface for editing JSON files.

Use cases

  1. User wants to add an additional seed reference sequence to an existing project.

Launching JSON editor

If you are working on the edit-json branch (git fetch; git checkout edit-json):

cd html
python3 -m http.server

Open web browser and navigate to URL localhost:8000. Select editor.html. Open console to inspect object.
TODO: find some way to simplify this for the end user!

alignment of 10 million COVID19 genomes

Dear Art,

I am wondering whether GISAID provides aligned COVID-19 sequences, or only raw FASTQ data that users must align themselves. How long would it take to align 10 million COVID-19 genomes?

Thanks.

Shicheng

Validate MiCall on SARS-COV-2 Illumina data

  • There are presently 34 SARS-COV-2 samples on NCBI generated on an Illumina platform: see https://trace.ncbi.nlm.nih.gov/
  • There are a variety of approaches being taken to assemble or map these reads to generate a sample consensus sequence
  • Does MiCall generate the same consensus sequence from the FASTQ data as other groups? (concordance)
  • Are there links between the FASTQ data on NCBI SRA and consensus sequences in NCBI Genbank or GISAID?

User cannot specify custom JSON file

The following scripts:

  • prelim_map.py
  • remap.py
  • aln2counts.py

employ the following line:
projects = project_config.ProjectConfig.loadDefault()

to load the default JSON file projects.json, so the user has no way to supply a custom file.
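
One way to address this is to let each script accept an optional path and fall back to the default only when none is given. The sketch below illustrates the idea with plain json; load_projects and its default_path parameter are hypothetical and do not reflect the actual ProjectConfig API.

```python
import json


def load_projects(path=None, default_path="projects.json"):
    """Parse the projects JSON at path, or at default_path if path is None."""
    chosen = default_path if path is None else path
    with open(chosen, encoding="utf-8") as handle:
        return json.load(handle)
```

A command-line option (for example a hypothetical --projects argument) could then pass its value straight through, with None meaning "use the bundled file".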

HIV drug resistance

Hi,
I have encountered problems in analyzing HIV drug resistance.

  1. The commands I ran and the resulting errors are as follows:
    780606979c7b9c23d1a87940511e93c
    514bcb658c4690e2e29d6560c598789
    ffbe334b1b45a06376162fb5f357774

  2. Currently only sample S34 yields results; the .tsv file for each sample is as follows:
    d2caad7c7c2da4559ec1f01a8893326

gotoh2 not found after installation

Traceback (most recent call last):
  File "run-sample.py", line 11, in <module>
    from micall.core.remap import remap
  File "/home/rng64/git/micall/micall/core/remap.py", line 23, in <module>
    from micall.alignment.gotoh2 import Aligner
  File "/home/rng64/git/micall/micall/alignment/gotoh2.py", line 1, in <module>
    from micall.alignment import _gotoh2
ImportError: cannot import name '_gotoh2'

Clean up temp files

Currently running the pipeline produces a lot of extra files:

art@orolo:~/git/micall/working$ ls
micall.fasta            SRS5100454_1.amino.csv   temp.fasta
reference.1.bt2         SRS5100454_1.conseq.csv  temp.fasta.1.bt2
reference.2.bt2         SRS5100454_1.fastq       temp.fasta.2.bt2
reference.3.bt2         SRS5100454_1.insert.csv  temp.fasta.3.bt2
reference.4.bt2         SRS5100454_1.nuc.csv     temp.fasta.4.bt2
reference.rev.1.bt2     SRS5100454_1.prelim.csv  temp.fasta.rev.1.bt2
reference.rev.2.bt2     SRS5100454_1.remap.csv   temp.fasta.rev.2.bt2
SRS5100454_1.align.csv  SRS5100454_2.fastq       temp.sam
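
A possible cleanup step is sketched below. Which files count as "extra" is an assumption drawn from the listing above (the bowtie2 index files and the temp.* files); clean_working_dir is a hypothetical helper, not an existing MiCall-Lite function.

```python
from pathlib import Path

# Assumed set of intermediate files, based on the directory listing above.
INTERMEDIATE_PATTERNS = ("*.bt2", "temp.*")


def clean_working_dir(workdir, patterns=INTERMEDIATE_PATTERNS):
    """Delete files in workdir matching any pattern; return the removed names."""
    removed = set()
    for pattern in patterns:
        for path in list(Path(workdir).glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.add(path.name)
    return sorted(removed)
```

An alternative design would be to write intermediates into a tempfile.TemporaryDirectory and copy only the final CSV files to the output directory, so nothing needs to be deleted afterwards.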

How to do multi-file batch processing

I have just started using this program and I am not a computer professional. I now have 10 files and want to process them all in one go rather than one at a time. What should I do? I tried running the program in several terminals at the same time, but that did not work. I hope the author can give me some advice.
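
One option is a small wrapper script that finds the paired FASTQ files in a directory and runs the pipeline once per sample. The sketch below assumes files named with _1.fastq.gz and _2.fastq.gz suffixes and the command form "micall R1 R2 --outdir DIR" that appears elsewhere in these issues; the directory names and the per-sample output layout are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def pair_fastq(directory, suffix1="_1.fastq.gz", suffix2="_2.fastq.gz"):
    """Return sorted (sample, read1, read2) tuples for complete pairs."""
    pairs = []
    for read1 in sorted(Path(directory).glob("*" + suffix1)):
        sample = read1.name[: -len(suffix1)]
        read2 = read1.with_name(sample + suffix2)
        if read2.exists():  # skip samples that lack a second read file
            pairs.append((sample, read1, read2))
    return pairs


def build_command(read1, read2, outdir):
    """Build the micall command line for one sample."""
    return ["micall", str(read1), str(read2), "--outdir", str(outdir)]


if __name__ == "__main__":
    for sample, read1, read2 in pair_fastq("input"):
        # Samples run one after another, each into its own output directory.
        outdir = Path("outdir") / sample
        outdir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(read1, read2, outdir), check=True)
```

Running samples sequentially avoids concurrent runs overwriting each other's intermediate files, which may be what went wrong when several terminals were used at once.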

Couldn't download JSON from HTML editor

scripts.js:890 Not allowed to navigate top frame to data URL: data:text/plain;charset=utf-8,%7B%0A%20%20%20%20%22projects%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%22ERCC%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22max_variants%22%3A%200%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22description%22%3A%20%22External%20RNA%20Controls%20Consortium%20reference%3B%20https%3A%2F%2Fdoi.org%2F10.1186%2F1471-2164-6-150%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22regions%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20...20%20%20%20%20%20%20%20%20%20%22GKIGNMRQAHCNISRAKWNNTLKQIASKLREQFGNNKTIIFKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWFNSTW%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22STEGSNNTEGSDTITLPCRIKQIINMWQKVGKAMYAPPISGQIRCSSNITGLLLTRDGGNSNNESEIFRPGGGDMRDNWR%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22SELYKYKVVKIEPLGVAPTKAKRRVVQREKR%22%0A%20%20%20%20%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22seed_group%22%3A%20null%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%7D

Problem with subdirectory having same name as module

This causes problems when we try to execute the run-sample.py wrapper script in the main directory: Python attempts to load modules from the local directory instead of the default package directory (e.g., /usr/local/lib/python3.5/dist-packages). In this example, there is no _gotoh2 dynamic library in my local repo directory, but there is one under /usr/local:

Traceback (most recent call last):
  File "run-sample.py", line 11, in <module>
    from micall.core.remap import remap
  File "/home/art/git/MiCall-Lite/micall/core/remap.py", line 23, in <module>
    from micall.alignment.gotoh2 import Aligner
  File "/home/art/git/MiCall-Lite/micall/alignment/gotoh2.py", line 1, in <module>
    from micall.alignment import _gotoh2
ImportError: cannot import name '_gotoh2'
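
To confirm which copy of a package Python is picking up, the import system can be asked where it would load a module from. This is a diagnostic sketch using the standard importlib API; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("micall", "micall.alignment._gotoh2"):
        print(name, "->", module_origin(name))
```

Running it from the repository root and again from another directory should show whether the local micall subdirectory is shadowing the installed package.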

.csv files generated by Micall do not contain data other than the headings

When running micall, the .csv files generated do not contain data other than the headings. Below is a test case consisting of .fastq files with 1000 lines.

ng64@Arsibalt:~$ micall -u /data/temp1.fastq /data/temp2.fastq
MiCall-Lite running sample temp1...
  Preliminary map
  Iterative remap
  Generating alignment file
  Generating count files

restore unit test suite

For starters, all unit tests that use StringIO objects have been broken since the migration from Python 2 to 3.
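
In Python 3 the StringIO class lives in the io module and accepts only str, not bytes. A minimal sketch of a migrated test follows; count_rows and the CSV column names are hypothetical stand-ins, not code from the MiCall test suite.

```python
import csv
import io
import unittest


def count_rows(handle):
    """Hypothetical function under test: count data rows in a CSV handle."""
    return sum(1 for _ in csv.DictReader(handle))


class CountRowsTest(unittest.TestCase):
    def test_two_rows(self):
        # io.StringIO replaces the Python 2 StringIO.StringIO class.
        handle = io.StringIO("refname,qcut\nHIV1,15\nHCV,15\n")
        self.assertEqual(count_rows(handle), 2)


if __name__ == "__main__":
    unittest.main()
```

Tests that fed byte strings to StringIO under Python 2 would need either io.BytesIO or an explicit decode, depending on what the code under test expects.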

Problems with usage

Following the instructions of the README for uncompressed FASTQs leads to the following error.

rng64@Arsibalt: ~/git/micall$ python2 run-sample.py -u SRR1577734_1.fastq SRR1577734_2.fastq

usage: run-sample.py [-h] [--fastq1 FASTQ1] [--fastq2 FASTQ2]
                     [--outdir OUTDIR] [--unzipped] [--interop INTEROP]
                     [--readlen READLEN] [--index INDEX]
run-sample.py: error: unrecognized arguments: SRR1577734_1.fastq SRR1577734_2.fastq

Providing the file path for the fastq files yields a different error.

rng64@Arsibalt:~/git/micall$ python2 run-sample.py -u --fastq1 /data/fq/SRR1577734_1.fastq --fastq2 /data/fq/SRR1577734_2.fastq
Traceback (most recent call last):
  File "run-sample.py", line 69, in <module>
    main()
  File "run-sample.py", line 41, in main
    args = parseArgs()
  File "run-sample.py", line 36, in parseArgs
    args.outdir = os.path.dirname(args.fastq1)
  File "/usr/lib/python2.7/posixpath.py", line 122, in dirname
    i = p.rfind('/') + 1
AttributeError: 'file' object has no attribute 'rfind'
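
The second traceback suggests that args.fastq1 is an open file object rather than a path string, which would happen if the argument were declared with argparse.FileType. A sketch of one possible fix, assuming that is the cause, keeps the arguments as plain strings; parse_args here is illustrative and covers only three of the options shown in the usage message.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse FASTQ paths as strings and derive a default output directory."""
    parser = argparse.ArgumentParser()
    # Plain strings (the argparse default) rather than argparse.FileType,
    # so the values can be handed to os.path functions.
    parser.add_argument("--fastq1", required=True)
    parser.add_argument("--fastq2", required=True)
    parser.add_argument("--outdir", default=None)
    args = parser.parse_args(argv)
    if args.outdir is None:
        # abspath keeps the result non-empty for bare file names.
        args.outdir = os.path.dirname(os.path.abspath(args.fastq1))
    return args
```

If the file objects are needed elsewhere, the alternative is to keep FileType and call os.path.dirname(args.fastq1.name) instead.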

Make most arguments optional

Most users won't want to specify so many output files (e.g., insertion lists) when using the core pipeline.

Initial load of JSON file

On opening the HTML page, the fields are blank and the user has to click on the Load button in order to access the default projects.json file. The expected behaviour is either:

  1. for the default file to be automatically loaded upon loading the page, or
  2. clicking the button should open a file dialog so the user selects a particular JSON file.

Automatically loading the default JSON (triggering the load on page load) is not working for some reason: the panels are empty after refreshing the page. This may be because the panels are rendered before the JSON has finished loading. Check for asynchrony.

Deprecated function 'itertools.imap'

I'm getting the following error when running micall-lite v0.1rc2 on python 3.7:

Traceback (most recent call last):
  File "/home/dfornika/miniconda3/envs/[email protected]/bin/micall", line 248, in <module>
    run_sample(args)
  File "/home/dfornika/miniconda3/envs/[email protected]/bin/micall", line 163, in run_sample
    keep=args.keep
  File "/home/dfornika/miniconda3/envs/[email protected]/lib/python3.7/site-packages/micall/core/remap.py", line 526, in remap
    conseqs = build_conseqs(samfile, seeds=seeds, worker_pool=worker_pool)
  File "/home/dfornika/miniconda3/envs/[email protected]/lib/python3.7/site-packages/micall/core/remap.py", line 363, in build_conseqs
    distance_report=distance_report)
  File "/home/dfornika/miniconda3/envs/[email protected]/lib/python3.7/site-packages/micall/core/remap.py", line 178, in sam_to_conseqs
    merged_reads = itertools.imap(
AttributeError: module 'itertools' has no attribute 'imap'

It seems that the imap function was removed from itertools in Python 3, but the built-in map() function, which now returns a lazy iterator, can be used in its place.

https://stackoverflow.com/a/30271758
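
The substitution can be sketched as a small compatibility shim. The merge function below is a hypothetical stand-in for whatever is mapped over the reads in remap.py; only the imap-to-map replacement is the point.

```python
try:
    from itertools import imap as lazy_map  # Python 2
except ImportError:
    lazy_map = map  # Python 3: the built-in map is already lazy


def merge(pair):
    """Stand-in for the per-read merge step (hypothetical)."""
    read1, read2 = pair
    return read1 + read2


merged_reads = lazy_map(merge, [("AC", "GT"), ("TT", "AA")])
print(list(merged_reads))  # ['ACGT', 'TTAA']
```

If Python 2 support is no longer needed, calling map() directly is simpler than keeping the shim.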

Unregistered loader type

Attempting to install and run on Ubuntu 16.04, Python 3.5.2:

Traceback (most recent call last):
  File "run-sample.py", line 11, in <module>
    from micall.core.remap import remap
  File "/home/art/git/MiCall-Lite/micall/core/remap.py", line 34, in <module>
    aligner = Aligner(gop=15, gep=3, is_global=True)
  File "/home/art/git/MiCall-Lite/micall/alignment/gotoh2.py", line 23, in __init__
    files = pkgres.resource_listdir('micall.alignment', 'models')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1190, in resource_listdir
    resource_name
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1486, in resource_listdir
    return self._listdir(self._fn(self.module_path, resource_name))
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1525, in _listdir
    "Can't perform this operation for unregistered loader type"
NotImplementedError: Can't perform this operation for unregistered loader type

Error when --threads flag is used

Running micall with the --threads flag produces this error:

[dfornika@sabin hcv_micall_testing]$ micall --threads 8 input/HCV300_1.fastq.gz input/HCV300_2.fastq.gz --outdir outdir
Using /home/dfornika/miniconda3/envs/micall-lite-gitub/bin/bowtie2 version 2.2.8
MiCall-Lite running sample HCV300_1...
  Preliminary map
  Iterative remap
Traceback (most recent call last):
  File "/home/dfornika/miniconda3/envs/micall-lite-gitub/bin/micall", line 248, in <module>
    run_sample(args)
  File "/home/dfornika/miniconda3/envs/micall-lite-gitub/bin/micall", line 163, in run_sample
    keep=args.keep
  File "/home/dfornika/miniconda3/envs/micall-lite-gitub/lib/python3.5/site-packages/micall/core/remap.py", line 441, in remap
    worker_pool = multiprocessing.Pool(processes=nthreads) if nthreads > 1 else None
TypeError: unorderable types: str() > int()

Running on the same sample without specifying --threads results in no error:

[dfornika@sabin hcv_micall_testing]$ micall input/HCV300_1.fastq.gz input/HCV300_2.fastq.gz --outdir outdir
Using /home/dfornika/miniconda3/envs/micall-lite-gitub/bin/bowtie2 version 2.2.8
MiCall-Lite running sample HCV300_1...
  Preliminary map
  Iterative remap
  Generating alignment file
  Generating count files

Input fastq files are from this sample:

https://www.ebi.ac.uk/ena/data/view/SAMN11126538
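
The TypeError indicates that nthreads reaches remap() as a string, which is what argparse produces when an option is declared without a type. A minimal sketch of the likely fix, assuming the option is defined with argparse; build_parser is a hypothetical name and the default of 1 is an assumption:

```python
import argparse


def build_parser():
    """Create a parser whose --threads value is converted to int."""
    parser = argparse.ArgumentParser()
    # type=int converts the command-line string before it is compared with 1.
    parser.add_argument("--threads", type=int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--threads", "8"])
    print(args.threads > 1)
```

With type=int in place, argparse also rejects non-numeric values with a usage error instead of failing later inside the pipeline.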

Display contents of JSON in web browser

Right now we are adopting a top and bottom panel format, where the top panel comprises four columns for:

  • projects
  • seed groups
  • seed references
  • coordinate references

Clicking (selecting) an entry in one of these columns should highlight entries in the other columns that are associated with the selected object. The bottom panel should display the attributes of the selected object (e.g., project description).

Make separate issues for editing and adding objects, respectively.
