lgautier / fastq-and-furious
Efficient handling of FASTQ files from Python
Home Page: https://lgautier.github.io/fastq-and-furious/
License: MIT License
Hi there
I suspect this may be a Python 3.x error. I'm using Python 3.7.3 and have tried installing both via pip and from the GitHub repo:
from fastqandfurious.fastqandfurious import entryfunc
from fastqandfurious import fastqandfurious
myFastq = "a/fastq/file.fq"
bufsize = 20000
with open(myFastq) as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, entryfunc)
    for sequence in it:
        print(sequence)
Here is the error:
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/evanbiederstedt/Library/Python/3.7/lib/python/site-packages/fastqandfurious/fastqandfurious.py", line 191, in readfastq_iter
    npos = _entrypos(blob, offset, posbuffer)
  File "/Users/evanbiederstedt/Library/Python/3.7/lib/python/site-packages/fastqandfurious/fastqandfurious.py", line 53, in _entrypos
    headerbeg_i = blob.find(b'@', offset)
TypeError: must be str, not bytes
The issue is here in _entrypos(), https://github.com/lgautier/fastq-and-furious/blob/master/src/fastqandfurious.py#L65
def _entrypos(blob, offset, posbuffer):
    posbuffer[:] = ARRAY_INIT
    lblob = len(blob)
    # header
    headerbeg_i = blob.find(b'@', offset)
    posbuffer[0] = headerbeg_i
    ...
Perhaps this is a new issue? Let me know if I could provide more details and help debug.
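Judging from the traceback alone, `blob` appears to be a `str` at the point where `blob.find(b'@', offset)` runs, which is what you would get if the file handle was opened in text mode (the default for `open()`). The sketch below reproduces that mismatch with the standard library only; `io.StringIO` and `io.BytesIO` are stand-ins for a text-mode and a binary-mode file handle, and the record contents are made up for illustration.

```python
import io

# A single made-up FASTQ record, used only for this illustration.
record = "@r1\nACGT\n+\nIIII\n"

# Text-mode handle: read() returns str, so searching for a bytes pattern fails.
text_blob = io.StringIO(record).read(20000)
try:
    text_blob.find(b"@", 0)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary-mode handle: read() returns bytes, so the same search works.
binary_blob = io.BytesIO(record.encode("ascii")).read(20000)
header_pos = binary_blob.find(b"@", 0)

print("text mode error:", text_mode_error)
print("binary mode header position:", header_pos)
```

If this is indeed the cause, opening the file with `open(myFastq, "rb")` would be the first thing to try.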
@peterjc's request - see https://twitter.com/pjacock/status/829353750678618112
I installed the library as follows
pip install git+https://github.com/lgautier/fastq-and-furious.git
using Python 3.7.3
>>> import sys
>>> print(sys.version)
3.7.3 (default, Sep 5 2019, 17:14:41)
[Clang 11.0.0 (clang-1100.0.33.8)]
This is the example code snippet in the README:
from fastqandfurious import fastqandfurious, entryfunc
bufsize = 20000
with open("a/fastq/file.fq") as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, entryfunc)
    for sequence in it:
        # do something
        pass
I get the following error importing the libraries:
>>> from fastqandfurious import fastqandfurious, entryfunc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'entryfunc' from 'fastqandfurious' (unknown location)
>>>
The following does work though:
>>> from fastqandfurious.fastqandfurious import entryfunc
>>> from fastqandfurious import fastqandfurious
Hi @lgautier -- this looks great; I think that it could substantially speed up my analyses.
I've implemented a non-trivial example for single-end reads (following the documentation), but I was curious what your recommended approach would be for a paired-end sample. In essence, I want to be able to perform an operation on each of the ends as I iterate read-by-read through the two files.
I'm presently considering using something similar to the following:
gen = function_that_returns_a_generator(param1, param2)
if gen:  # in case the generator is null
    while True:
        try:
            print(next(gen))  # Python 3 form of gen.next()
        except StopIteration:
            break
from here: https://stackoverflow.com/questions/11539194/how-to-loop-through-a-generator
Thanks!
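One common way to walk two files in lockstep is to `zip` the two iterators, which avoids the manual `next()` / `StopIteration` loop. The sketch below is not the library's API: `simple_fastq_iter` is a hypothetical stand-in parser and the reads are made up, so that the example runs on its own. In practice the two iterators would presumably each come from a call such as `readfastq_iter` on its own file handle.

```python
import io


def simple_fastq_iter(fh):
    """Hypothetical stand-in: yield (header, sequence, quality) from a binary handle."""
    while True:
        header = fh.readline().rstrip()
        if not header:
            return  # end of file
        sequence = fh.readline().rstrip()
        fh.readline()  # skip the '+' separator line
        quality = fh.readline().rstrip()
        yield header, sequence, quality


# Two made-up in-memory files standing in for the R1 and R2 FASTQ files.
r1 = io.BytesIO(b"@pair1/1\nACGT\n+\nIIII\n@pair2/1\nGGCC\n+\nIIII\n")
r2 = io.BytesIO(b"@pair1/2\nTTAA\n+\nIIII\n@pair2/2\nCCGG\n+\nIIII\n")

pairs = []
for entry1, entry2 in zip(simple_fastq_iter(r1), simple_fastq_iter(r2)):
    # Operate on both ends of the pair here; this sketch only records the headers.
    pairs.append((entry1[0], entry2[0]))

print(pairs)
```

Note that `zip` stops silently at the shorter input; `itertools.zip_longest` could be used instead if a mismatch in the number of reads should be detected.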
Would be nice to have the compiled version available for install through bioconda.
Hi there
This may be another README confusion, but I have yet to figure out what the problem is. I'm trying the biopython adapter code from the README:
from fastqandfurious import fastqandfurious
from fastqandfurious.fastqandfurious import entryfunc
from fastqandfurious._fastqandfurious import arrayadd_b
from Bio.SeqRecord import SeqRecord
from array import array
def biopython_entryfunc(buf, posarray):
    name = buf[posarray[0]:posarray[1]].decode('ascii')
    quality = array('b')
    quality.frombytes(buf[posarray[4]:posarray[5]])
    arrayadd_b(quality, -33)
    entry = SeqRecord(seq=buf[posarray[2]:posarray[3]].decode('ascii'),
                      id=name,
                      name=name,
                      letter_annotations={'phred_quality': quality})
    return entry
bufsize = 20000
with open("a/fastq/file.fq") as fh:
    it = fastqandfurious.readfastq_iter(fh, bufsize, biopython_entryfunc)
    for entry in it:
        # do something
        pass
I run into the following error:
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/evanbiederstedt/.pyenv/versions/3.6.8/lib/python3.6/site-packages/fastqandfurious/fastqandfurious.py", line 223, in readfastq_iter
    yield entryfunc(blob, posbuffer, globaloffset)
TypeError: biopython_entryfunc() takes 2 positional arguments but 3 were given
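The traceback shows the callback being invoked as `entryfunc(blob, posbuffer, globaloffset)`, so an entry function written for the installed version would seemingly need to accept a third positional argument. The sketch below only illustrates that signature. It returns a plain tuple instead of a `SeqRecord` so that it runs without Biopython, it replaces `arrayadd_b` with a list comprehension, and the buffer and `posarray` values are hand-built assumptions rather than what the library actually produces.

```python
def entryfunc_three_args(buf, posarray, globaloffset):
    """Sketch of a callback accepting the three arguments seen in the traceback.

    globaloffset is accepted but unused here; its exact meaning is not
    documented in this thread.
    """
    name = buf[posarray[0]:posarray[1]].decode("ascii")
    sequence = buf[posarray[2]:posarray[3]].decode("ascii")
    # Phred+33 decoding: subtract 33 from each quality byte.
    phred = [q - 33 for q in buf[posarray[4]:posarray[5]]]
    return name, sequence, phred


# Hand-built buffer and positions, chosen only so the slices line up.
buf = b"@r1\nACGT\n+\nIIII\n"
posarray = [1, 3, 4, 8, 11, 15]

entry = entryfunc_three_args(buf, posarray, 0)
print(entry)
```

The same change applied to the README adapter would amount to adding a third parameter to `biopython_entryfunc`, assuming the rest of the adapter still matches the installed version.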