
fastq-and-furious's People

Contributors

evanbiederstedt, lgautier

fastq-and-furious's Issues

Error importing libraries as written in README

I installed the library as follows:

pip install git+https://github.com/lgautier/fastq-and-furious.git

I am using Python 3.7.3:

>>> import sys
>>> print(sys.version)
3.7.3 (default, Sep  5 2019, 17:14:41) 
[Clang 11.0.0 (clang-1100.0.33.8)]

This is the example code snippet in the README:

from fastqandfurious import fastqandfurious, entryfunc

bufsize = 20000
with open("a/fastq/file.fq") as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, entryfunc)
    for sequence in it:
        # do something
        pass

I get the following error when importing the libraries:

>>> from fastqandfurious import fastqandfurious, entryfunc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'entryfunc' from 'fastqandfurious' (unknown location)
>>> 

The following does work though:

>>> from fastqandfurious.fastqandfurious import entryfunc
>>> from fastqandfurious import fastqandfurious

TypeError: biopython_entryfunc() takes 2 positional arguments but 3 were given

Hi there

This may be another README confusion, but I have yet to figure out what the problem is. I'm trying the code in the README for the biopython adapter:

from fastqandfurious import fastqandfurious
from fastqandfurious.fastqandfurious import entryfunc
from fastqandfurious._fastqandfurious import arrayadd_b
from Bio.SeqRecord import SeqRecord
from array import array

def biopython_entryfunc(buf, posarray):
    name = buf[posarray[0]:posarray[1]].decode('ascii')
    quality = array('b')
    quality.frombytes(buf[posarray[4]:posarray[5]])
    arrayadd_b(quality, -33)
    entry = SeqRecord(seq=buf[posarray[2]:posarray[3]].decode('ascii'),
                      id=name,
                      name=name,
                      letter_annotations={'phred_quality': quality})
    return entry

bufsize = 20000
with open("a/fastq/file.fq") as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, biopython_entryfunc)
    for entry in it:
        # do something
        pass

I run into the following error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/evanbiederstedt/.pyenv/versions/3.6.8/lib/python3.6/site-packages/fastqandfurious/fastqandfurious.py", line 223, in readfastq_iter
    yield entryfunc(blob, posbuffer, globaloffset)
TypeError: biopython_entryfunc() takes 2 positional arguments but 3 were given
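
Judging by the traceback alone, `readfastq_iter` calls the entry function with three positional arguments (`blob`, `posbuffer`, `globaloffset`), while the README adapter accepts only two. A minimal sketch of a three-argument entry function follows. It assumes the `posarray` layout used in the snippet above (header at 0:1, sequence at 2:3, quality at 4:5). `tuple_entryfunc`, the sample buffer, and the hand-picked positions are hypothetical and for illustration only; it returns a plain tuple rather than a `SeqRecord` so that it runs without Biopython.

```python
from array import array


def tuple_entryfunc(buf, posarray, globaloffset):
    """Build (name, sequence, qualities) from one FASTQ entry.

    globaloffset is accepted only to match the three-argument call seen
    in the traceback; this sketch does not use it.
    """
    name = buf[posarray[0]:posarray[1]].decode('ascii')
    sequence = buf[posarray[2]:posarray[3]].decode('ascii')
    # Phred+33 encoding assumed, as in the README adapter (offset of 33).
    qualities = [q - 33 for q in buf[posarray[4]:posarray[5]]]
    return name, sequence, qualities


# Hand-built buffer and positions (chosen by hand for illustration;
# the real parser fills in the position array itself).
buf = b"@read1\nACGT\n+\nIIII\n"
posarray = array('q', [1, 6, 7, 11, 14, 18])
entry = tuple_entryfunc(buf, posarray, 0)
print(entry)
```

The same change, adding a third parameter to `biopython_entryfunc`, would presumably resolve the `TypeError` above, though that has not been verified against the library here.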

TypeError: must be str, not bytes, Python 3.7.3

Hi there

I suspect this may be a Python 3.x issue. I'm using Python 3.7.3, and I installed the package via pip from the GitHub repo:


from fastqandfurious.fastqandfurious import entryfunc
from fastqandfurious import fastqandfurious

myFastq = "a/fastq/file.fq"

bufsize = 20000
with open(myFastq) as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, entryfunc)
    for sequence in it:
        print(sequence)

Here is the error:


Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/evanbiederstedt/Library/Python/3.7/lib/python/site-packages/fastqandfurious/fastqandfurious.py", line 191, in readfastq_iter
    npos = _entrypos(blob, offset, posbuffer)
  File "/Users/evanbiederstedt/Library/Python/3.7/lib/python/site-packages/fastqandfurious/fastqandfurious.py", line 53, in _entrypos
    headerbeg_i = blob.find(b'@', offset)
TypeError: must be str, not bytes

The issue appears to be in _entrypos(), https://github.com/lgautier/fastq-and-furious/blob/master/src/fastqandfurious.py#L65

def _entrypos(blob, offset, posbuffer):
    posbuffer[:] = ARRAY_INIT
    lblob = len(blob)
    # header
    headerbeg_i = blob.find(b'@', offset)
    posbuffer[0] = headerbeg_i
    ...

Perhaps this is a new issue? Let me know if I can provide more details or help debug.
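
One possible cause, judging only from the traceback: `open(myFastq)` opens the file in text mode, so each chunk read is a `str`, and calling `str.find()` with a `bytes` needle raises this kind of `TypeError`. The sketch below reproduces the mismatch with the standard library only and shows that opening in binary mode (`"rb"`) yields `bytes`, for which the same call works. The temporary file and its contents are made up for illustration. That `readfastq_iter` expects a binary file handle is an assumption drawn from the `b'@'` literal in `_entrypos()`.

```python
import os
import tempfile

# Write a tiny FASTQ file (made-up content) to a temporary location.
with tempfile.NamedTemporaryFile("wb", suffix=".fq", delete=False) as tmp:
    tmp.write(b"@read1\nACGT\n+\nIIII\n")
    path = tmp.name

try:
    # Text mode: read() returns str, and str.find() rejects a bytes needle.
    with open(path) as fh:
        text_blob = fh.read(20000)
    try:
        text_blob.find(b'@', 0)
        text_mode_failed = False
    except TypeError:
        text_mode_failed = True

    # Binary mode: read() returns bytes, so the same search succeeds.
    with open(path, "rb") as fh:
        bytes_blob = fh.read(20000)
    header_pos = bytes_blob.find(b'@', 0)
finally:
    os.remove(path)

print(text_mode_failed, header_pos)
```

If that assumption holds, changing `open(myFastq)` to `open(myFastq, "rb")` in the snippet above would be the first thing to try.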

Paired-end read workflow?

Hi @lgautier, this looks great; I think it could substantially speed up my analyses.

I've implemented a non-trivial example for single-end reads (following the documentation), but I was curious what your recommended approach would be for a paired-end sample. In essence, I want to perform an operation on each of the ends as I iterate read by read through two files.

I'm presently considering using something similar to the following:

gen = function_that_returns_a_generator(param1, param2)
if gen: # in case the generator is null
    while True:
        try:
            print(next(gen))
        except StopIteration:
            break

from here: https://stackoverflow.com/questions/11539194/how-to-loop-through-a-generator

Thanks!
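
For reference, one common way to walk two iterators in lockstep is the built-in `zip`, which stops at the shorter input. The sketch below is not the library's documented paired-end workflow. `simple_fastq_iter` is a hypothetical stand-in parser (exactly four lines per record, no multi-line sequences), used only so that the example runs on in-memory data. The same `zip` pattern would presumably apply to two `readfastq_iter` iterators, one per file.

```python
import io


def simple_fastq_iter(fh):
    """Yield (name, sequence, quality) from a binary handle.

    Hypothetical stand-in parser: assumes exactly four lines per record.
    """
    while True:
        header = fh.readline().rstrip(b"\n")
        if not header:
            return
        sequence = fh.readline().rstrip(b"\n")
        fh.readline()  # skip the '+' separator line
        quality = fh.readline().rstrip(b"\n")
        yield header[1:], sequence, quality


# In-memory data standing in for the R1 and R2 files of a paired-end run.
r1 = io.BytesIO(b"@pair1/1\nACGT\n+\nIIII\n@pair2/1\nTTGG\n+\nIIII\n")
r2 = io.BytesIO(b"@pair1/2\nACGA\n+\nIIII\n@pair2/2\nCCAA\n+\nIIII\n")

pairs = []
for read1, read2 in zip(simple_fastq_iter(r1), simple_fastq_iter(r2)):
    # Apply any per-end operation here; this sketch just records the names.
    pairs.append((read1[0], read2[0]))

print(pairs)
```

If the two files might differ in length, `itertools.zip_longest` makes the mismatch visible instead of silently truncating.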
