dridk / pyvcf3
This project forked from jamescasbon/pyvcf
A Variant Call Format reader for Python.
Home Page: http://pyvcf.readthedocs.org/en/latest/index.html
License: Other
When building PyVCF3 against Python 3.11, the tests fail:
running pytest
/usr/lib/python3.11/site-packages/setuptools/command/test.py:194: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
ir_d = dist.fetch_build_eggs(dist.install_requires)
/usr/lib/python3.11/site-packages/setuptools/command/test.py:195: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
tr_d = dist.fetch_build_eggs(dist.tests_require or [])
/usr/lib/python3.11/site-packages/setuptools/command/test.py:196: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
er_d = dist.fetch_build_eggs(
running egg_info
writing PyVCF3.egg-info/PKG-INFO
writing dependency_links to PyVCF3.egg-info/dependency_links.txt
writing entry points to PyVCF3.egg-info/entry_points.txt
writing requirements to PyVCF3.egg-info/requires.txt
writing top-level names to PyVCF3.egg-info/top_level.txt
reading manifest file 'PyVCF3.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'PyVCF3.egg-info/SOURCES.txt'
running build_ext
skipping 'vcf/cparse.c' Cython extension (up-to-date)
building 'vcf.cparse' extension
gcc -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -ffat-lto-objects -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fPIC -I/usr/include/python3.11 -c vcf/cparse.c -o build/temp.linux-x86_64-cpython-311/vcf/cparse.o
gcc -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection build/temp.linux-x86_64-cpython-311/vcf/cparse.o -L/usr/lib -o /build/python-pyvcf3/src/PyVCF3-1.0.3/vcf/cparse.cpython-311-x86_64-linux-gnu.so
============================= test session starts ==============================
platform linux -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
rootdir: /build/python-pyvcf3/src/PyVCF3-1.0.3
collected 104 items
vcf/test/test_vcf.py ........................F...F.F..FFFF.............. [ 49%]
.....................sssssss............ssFss.F....F. [100%]
=================================== FAILURES ===================================
___________________________ Test1kgSites.test_writer ___________________________
self = <vcf.test.test_vcf.Test1kgSites testMethod=test_writer>
def test_writer(self):
"""FORMAT should not be written if not present in the template and no
extra tab character should be printed if there are no FORMAT fields."""
reader = vcf.Reader(fh('1kg.sites.vcf', 'r'))
out = StringIO()
> writer = vcf.Writer(out, reader, lineterminator='\n')
vcf/test/test_vcf.py:298:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff899922510>
stream = <_io.StringIO object at 0x7ff89a033760>
template = <vcf.parser.Reader object at 0x7ff899c73e50>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
__________________________ TestInfoOrder.test_writer ___________________________
self = <vcf.test.test_vcf.TestInfoOrder testMethod=test_writer>
def test_writer(self):
"""
Order of INFO fields should be compatible with the order of their
definition in the header and undefined fields should be last and in
alphabetical order.
"""
reader = vcf.Reader(fh('1kg.sites.vcf', 'r'))
out = StringIO()
> writer = vcf.Writer(out, reader, lineterminator='\n')
vcf/test/test_vcf.py:354:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff8996a2190>
stream = <_io.StringIO object at 0x7ff899a00700>
template = <vcf.parser.Reader object at 0x7ff8996a31d0>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
_______________________ TestInfoTypeCharacter.test_write _______________________
self = <vcf.test.test_vcf.TestInfoTypeCharacter testMethod=test_write>
def test_write(self):
reader = vcf.Reader(fh('info-type-character.vcf'))
out = StringIO()
> writer = vcf.Writer(out, reader)
vcf/test/test_vcf.py:383:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff8999d2c10>
stream = <_io.StringIO object at 0x7ff899a01990>
template = <vcf.parser.Reader object at 0x7ff8999d2d50>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
_________________________ TestParseMetaLine.test_write _________________________
self = <vcf.test.test_vcf.TestParseMetaLine testMethod=test_write>
def test_write(self):
reader = vcf.Reader(fh('parse-meta-line.vcf'))
out = StringIO()
> writer = vcf.Writer(out, reader)
vcf/test/test_vcf.py:427:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff899f4d790>
stream = <_io.StringIO object at 0x7ff899a01900>
template = <vcf.parser.Reader object at 0x7ff899f4cc50>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
________________________ TestGatkOutputWriter.testWrite ________________________
self = <vcf.test.test_vcf.TestGatkOutputWriter testMethod=testWrite>
def testWrite(self):
reader = vcf.Reader(fh('gatk.vcf'))
out = StringIO()
> writer = vcf.Writer(out, reader)
vcf/test/test_vcf.py:452:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff8996a1890>
stream = <_io.StringIO object at 0x7ff89a033910>
template = <vcf.parser.Reader object at 0x7ff8996a1010>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
______________________ TestBcfToolsOutputWriter.testWrite ______________________
self = <vcf.test.test_vcf.TestBcfToolsOutputWriter testMethod=testWrite>
def testWrite(self):
reader = vcf.Reader(fh('bcftools.vcf'))
out = StringIO()
> writer = vcf.Writer(out, reader)
vcf/test/test_vcf.py:486:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff899d78390>
stream = <_io.StringIO object at 0x7ff899a00e50>
template = <vcf.parser.Reader object at 0x7ff899d79850>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
______________________ TestWriterDictionaryMeta.testWrite ______________________
self = <vcf.test.test_vcf.TestWriterDictionaryMeta testMethod=testWrite>
def testWrite(self):
reader = vcf.Reader(fh('example-4.1-bnd.vcf'))
out = StringIO()
> writer = vcf.Writer(out, reader)
vcf/test/test_vcf.py:515:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff89a1fd2d0>
stream = <_io.StringIO object at 0x7ff899a01510>
template = <vcf.parser.Reader object at 0x7ff89a1fc290>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
___________________ TestSampleFilter.testSampleFilterModule ____________________
self = <vcf.test.test_vcf.TestSampleFilter testMethod=testSampleFilterModule>
@unittest.skipUnless(IS_NOT_PYPY, "test broken for PyPy")
def testSampleFilterModule(self):
# init filter with filename, get list of samples
filt = vcf.SampleFilter('vcf/test/example-4.1.vcf')
self.assertEqual(filt.samples, ['NA00001', 'NA00002', 'NA00003'])
# set filter, check which samples will be kept
filtered = filt.set_filters(filters="0", invert=True)
self.assertEqual(filtered, ['NA00001'])
# write filtered file to StringIO
buf = StringIO()
> filt.write(buf)
vcf/test/test_vcf.py:1482:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
vcf/sample_filter.py:109: in write
writer = Writer(_out, self.parser)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff89b863a90>
stream = <_io.StringIO object at 0x7ff89a033910>
template = <vcf.parser.Reader object at 0x7ff8996bbd50>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
----------------------------- Captured stdout call -----------------------------
Keeping these samples: ['NA00001']
Writing to '<_io.StringIO object at 0x7ff89a033910>'
------------------------------ Captured log call -------------------------------
INFO root:sample_filter.py:96 Keeping these samples: ['NA00001']
INFO root:sample_filter.py:108 Writing to '<_io.StringIO object at 0x7ff89a033910>'
________________________ TestRegression.test_null_mono _________________________
self = <vcf.test.test_vcf.TestRegression testMethod=test_null_mono>
def test_null_mono(self):
# null qualities were written as blank, causing subsequent parse to fail
print(os.path.abspath(os.path.join(os.path.dirname(__file__), 'null_genotype_mono.vcf') ))
p = vcf.Reader(fh('null_genotype_mono.vcf'))
assert p.samples
out = StringIO()
> writer = vcf.Writer(out, p)
vcf/test/test_vcf.py:1559:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff899dea350>
stream = <_io.StringIO object at 0x7ff89a030820>
template = <vcf.parser.Reader object at 0x7ff899dea390>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
----------------------------- Captured stdout call -----------------------------
/build/python-pyvcf3/src/PyVCF3-1.0.3/vcf/test/null_genotype_mono.vcf
__________________ TestUncalledGenotypes.test_write_uncalled ___________________
self = <vcf.test.test_vcf.TestUncalledGenotypes testMethod=test_write_uncalled>
def test_write_uncalled(self):
"""Test that uncalled genotypes are written just as
they were read in the input file."""
reader = vcf.Reader(fh('uncalled_genotypes.vcf'))
# Write all reader records to a stream.
out = StringIO()
> writer = vcf.Writer(out, reader, lineterminator='\n')
vcf/test/test_vcf.py:1697:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <vcf.parser.Writer object at 0x7ff899deb7d0>
stream = <_io.StringIO object at 0x7ff899a01cf0>
template = <vcf.parser.Reader object at 0x7ff899deba50>, lineterminator = '\n'
def __init__(self, stream, template, lineterminator="\n"):
> self.writer = csv.writer(
stream,
delimiter="\t",
lineterminator=lineterminator,
quotechar="",
quoting=csv.QUOTE_NONE,
)
E TypeError: "quotechar" must be a 1-character string
vcf/parser.py:776: TypeError
=============================== warnings summary ===============================
vcf/parser.py:380
/build/python-pyvcf3/src/PyVCF3-1.0.3/vcf/parser.py:380: DeprecationWarning: invalid escape sequence '\['
self._alt_pattern = re.compile("[\[\]]")
vcf/parser.py:654
/build/python-pyvcf3/src/PyVCF3-1.0.3/vcf/parser.py:654: DeprecationWarning: invalid escape sequence '\['
remoteOrientation = re.search("\[", str) is not None
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED vcf/test/test_vcf.py::Test1kgSites::test_writer - TypeError: "quotecha...
FAILED vcf/test/test_vcf.py::TestInfoOrder::test_writer - TypeError: "quotech...
FAILED vcf/test/test_vcf.py::TestInfoTypeCharacter::test_write - TypeError: "...
FAILED vcf/test/test_vcf.py::TestParseMetaLine::test_write - TypeError: "quot...
FAILED vcf/test/test_vcf.py::TestGatkOutputWriter::testWrite - TypeError: "qu...
FAILED vcf/test/test_vcf.py::TestBcfToolsOutputWriter::testWrite - TypeError:...
FAILED vcf/test/test_vcf.py::TestWriterDictionaryMeta::testWrite - TypeError:...
FAILED vcf/test/test_vcf.py::TestSampleFilter::testSampleFilterModule - TypeE...
FAILED vcf/test/test_vcf.py::TestRegression::test_null_mono - TypeError: "quo...
FAILED vcf/test/test_vcf.py::TestUncalledGenotypes::test_write_uncalled - Typ...
============ 10 failed, 83 passed, 11 skipped, 2 warnings in 3.55s =============
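All ten failures in the log above come from the same call: csv.writer is given quotechar="" and Python 3.11 rejects it. A minimal sketch of one possible workaround follows, assuming that quotechar=None combined with csv.QUOTE_NONE is acceptable for the project. The helper name make_tsv_writer is hypothetical and is not PyVCF3 code. The last lines show raw-string patterns, which should avoid the two "invalid escape sequence" warnings reported for parser.py.

```python
import csv
import io
import re


def make_tsv_writer(stream, lineterminator="\n"):
    # Hypothetical helper, not part of PyVCF3.
    # quotechar=None is accepted when quoting is csv.QUOTE_NONE,
    # whereas quotechar="" raises TypeError on Python 3.11.
    return csv.writer(
        stream,
        delimiter="\t",
        lineterminator=lineterminator,
        quotechar=None,
        quoting=csv.QUOTE_NONE,
    )


out = io.StringIO()
make_tsv_writer(out).writerow(["20", "14370", "rs6054257", "G", "A"])
written = out.getvalue()

# Raw strings keep the backslash for the regex engine and avoid the
# DeprecationWarning about the '\[' escape in a normal string literal.
alt_pattern = re.compile(r"[\[\]]")
has_bracket = re.search(r"\[", "[17:198983[A") is not None
```

With QUOTE_NONE and no escapechar, the csv module raises an error if a field itself contains the delimiter, so this sketch assumes fields never contain a tab.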
I have vcf.utils.walk_together working. Is there an easy way to merge the records and write a new VCF?
It's present in the PyPI release, but not in the git sources. Could you add it, please? (Or better: add a dependency on biopython and use that.)
In vcf/parser.py we open filename but never close it, which causes a ResourceWarning when using the library.
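The usual remedy for this kind of warning is for the object that opens a file to also close it. The sketch below illustrates that pattern with a hypothetical class, OwningReader, which is not PyVCF's Reader; the attribute names are assumptions made for the example.

```python
import os
import tempfile


class OwningReader:
    """Hypothetical stand-in for a reader that may open its own file."""

    def __init__(self, fsock=None, filename=None):
        if fsock is None and filename is None:
            raise ValueError("need fsock or filename")
        # Remember whether we opened the file, so we only close what we own.
        self._owns_handle = fsock is None
        self._fh = open(filename, "r") if fsock is None else fsock

    def __iter__(self):
        return iter(self._fh)

    def close(self):
        if self._owns_handle and not self._fh.closed:
            self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


# Usage: write a tiny file, read it back, and confirm the handle is closed.
with tempfile.NamedTemporaryFile("w", suffix=".vcf", delete=False) as tmp:
    tmp.write("##fileformat=VCFv4.1\n")
    path = tmp.name

with OwningReader(filename=path) as reader:
    lines = list(reader)
handle_closed = reader._fh.closed
os.remove(path)
```

A handle passed in by the caller is left open on purpose, since the caller may still need it.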
Background:
In FILTER, multiple filters should be separated by semicolons. The widely used, but not actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add ##FILTER metadata for most of its filters. Picard FixVcfHeader can be used to fix missing FILTER metadata. A "fixed" metadata row will look like:
##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
Error:
PyVCF fails with:
`
Traceback (most recent call last):
File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
main()
File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
run(parser.parse_args())
File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
df_1 = vcf_to_dataframe(args.vcf_1)
File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
vcf_reader = vcf.Reader(open(vcf_file, "r"))
File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
self._parse_metainfo()
File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
key, val = parser.read_filter(line)
File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
`
Issue:
It might be more robust for PyVCF to treat a filter with commas as just one big filter name, as does Picard FixVcfHeader.
Instead of raising an exception, accept metadata with a filter ID inside double quotes and containing commas, e.g., ID="RefAvgRL,VarAvgRL".
Similarly, in the data, treat a FILTER value like RefAvgRL,VarAvgRL as a single entity. I think this solution is consistent with the VCF 4.2 spec for a filter name: String, no whitespace or semicolons permitted.
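The data-side suggestion above amounts to splitting the FILTER column on semicolons only. Here is a minimal sketch of that idea; split_filter_field is a hypothetical helper, and the return values chosen for "." and "PASS" are assumptions for illustration rather than a statement of what PyVCF does.

```python
def split_filter_field(value):
    """Split a VCF FILTER column on semicolons only (hypothetical helper)."""
    if value == ".":
        return None  # assumption: None means filters were not applied
    if value == "PASS":
        return []  # assumption: empty list means the record passed
    # Commas are left alone, so "RefAvgRL,VarAvgRL" stays one filter name.
    return value.split(";")


varscan_style = split_filter_field("RefAvgRL,VarAvgRL")
spec_style = split_filter_field("q10;s50")
```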
Possible pull request:
This hack (changing [^,]+ to .+) got me through an urgent analysis, but it may not be the best solution. At parser.py line 142:
self.filter_pattern = re.compile(r'''\#\#FILTER=< ID=(?P<id>.+),\s* Description="(?P<desc>[^"]*)" >''', re.VERBOSE)
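To make the effect of the change concrete, the sketch below compares a strict ID pattern with the relaxed one from the hack, and adds a narrower alternative that only allows commas inside double quotes. This is an illustration, not the library's code: the group names id and desc and the escaped \#\# prefix (needed because # starts a comment under re.VERBOSE) are written out in full here and should be checked against the actual parser.py, and the variable names are hypothetical.

```python
import re

# Strict form: the ID may not contain a comma.
strict = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Relaxed form from the hack: the ID may contain anything.
relaxed = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Narrower alternative (an assumption, not from the issue):
# commas are allowed only when the ID is wrapped in double quotes.
quoted_ok = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>"[^"]*"|[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

line = ('##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: '
        "this FILTER line was added by Picard's FixVCFHeader\">")

strict_match = strict.match(line)
relaxed_id = relaxed.match(line).group("id")
quoted_id = quoted_ok.match(line).group("id")
plain_id = quoted_ok.match('##FILTER=<ID=q10,Description="Quality below 10">').group("id")
```

Both accepting patterns return the ID with its surrounding double quotes, so a caller would still have to decide whether to strip them.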
=======
I get the same problem. Any update on this issue?
I hoped switching to PyVCF3 (cf. jamescasbon#335) would solve the issue, but apparently not.
My bad: in my case the problem originated from a Source tag in a FILTER field:
##FILTER=<ID=xxx,Description="yyy",Source="zzz">
which is an INFO field tag according to https://samtools.github.io/hts-specs/ and not a FILTER field tag.
Hi,
I'm trying to use this project to read a vCard file generated by an Android phone app. The app generates a version 3.0 vCard, which I'm trying to read with the code below:
import vcf
reader = vcf.Reader(open('a.vcf','r'))
for e in reader:
    print(e)
The file 'a.vcf' contains a single record; its content is shown below:
BEGIN:VCARD
VERSION:3.0
N:123;Joao;;;
FN:Joao 123
TEL;TYPE=CELL:997-345-21
END:VCARD
When I run the code I get this error message:
Traceback (most recent call last):
File "./vcf2Panda.py", line 17, in <module>
for e in reader:
File "/opt/python-3.6.6/lib/python3.6/site-packages/vcf/parser.py", line 684, in __next__
pos = int(row[1])
IndexError: list index out of range
My environment is Debian Linux with Python 3.6.6 installed. How can I solve this problem?
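The traceback above is consistent with a format mismatch: the project is described as a Variant Call Format reader, while the file shown is a vCard contact file that happens to share the .vcf extension. A small sketch of how the two could be told apart before parsing follows. The function sniff_vcf_kind is hypothetical and is not part of PyVCF; it relies only on the first non-empty line of each format.

```python
import io


def sniff_vcf_kind(stream):
    """Classify a text stream by its first non-empty line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.upper().startswith("BEGIN:VCARD"):
            return "vcard"
        if line.startswith("##fileformat=VCF"):
            return "variant-call-format"
        return "unknown"
    return "unknown"  # empty stream


contact = io.StringIO(
    "BEGIN:VCARD\nVERSION:3.0\nN:123;Joao;;;\nFN:Joao 123\n"
    "TEL;TYPE=CELL:997-345-21\nEND:VCARD\n"
)
variants = io.StringIO(
    "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
kinds = (sniff_vcf_kind(contact), sniff_vcf_kind(variants))
```

A file classified as a vCard would need a dedicated vCard parser rather than this library.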
Dear Dridk,
I am developing a script to manage databases in VCF and Parquet. I specifically need to work with the VCF header to check the VCF structure (INFO fields, FORMAT...).
When I create an instance from an empty VCF (header only), the object contains no variants, which is expected, but I also get no information about the header (e.g. the list of INFO fields). Apparently, only INFO fields present in a variant's INFO column are stored in the header of the object.
Is there a way to keep INFO fields (and other header information) even if they are not present in any variant, especially when the VCF is empty (no variants)?
Thanks!
Best,
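One way to inspect header definitions independently of any records is to scan the ## meta lines directly. The sketch below is plain Python and does not use PyVCF; the function header_info_ids and the sample header are illustrative assumptions, and only the ID of each INFO line is extracted.

```python
import io
import re

# Matches the ID at the start of an INFO meta line, e.g. ##INFO=<ID=DP,...
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def header_info_ids(lines):
    """Return INFO IDs declared in the meta lines (hypothetical helper)."""
    ids = []
    for line in lines:
        if not line.startswith("##"):
            break  # the '#CHROM' column line or a record ends the meta block
        match = INFO_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


header_only = io.StringIO(
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n'
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">\n'
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
info_ids = header_info_ids(header_only)
```

Because the scan stops at the column line, it gives the same answer whether or not the file contains variants.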