sciunto-org / python-bibtexparser
Bibtex parser for Python 3
Home Page: https://bibtexparser.readthedocs.io
License: MIT License
homogeneize_latex_encoding has several issues:
I get this output on a .bib file which has no syntax errors according to bibtex. How do I figure out which line in the .bib file is causing this?
2015-08-05 14:21:25,893 [DEBUG] bibtexparser.bparser _parse_records:157: Inspect line 0
2015-08-05 14:21:25,893 [DEBUG] bibtexparser.bparser _parse_records:157: Inspect line 1
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_records:157: Inspect line 2
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_records:161: Line starts with @
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _add_parsed_record:143: The record is not empty. Let's parse it.
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_record:190: The record does not start with @. Return empty dict.
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _add_parsed_record:149: Nothing returned from the parsed record!
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_records:165: The record is set to empty
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_records:157: Inspect line 3
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_records:161: Line starts with @
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _add_parsed_record:143: The record is not empty. Let's parse it.
2015-08-05 14:21:25,894 [DEBUG] bibtexparser.bparser _parse_record:226: The record startswith @string
Traceback (most recent call last):
File "./try_bibtexparser.py", line 43, in
dump('../bibtex/misc_place_names.bib')
File "./try_bibtexparser.py", line 38, in dump
bib_database=bibtexparser.loads(bibtex_str)
File "/usr/local/lib/python3.4/dist-packages/bibtexparser/__init__.py", line 43, in loads
return parser.parse(bibtex_str)
File "/usr/local/lib/python3.4/dist-packages/bibtexparser/bparser.py", line 119, in parse
self._parse_records(customization=self.customization)
File "/usr/local/lib/python3.4/dist-packages/bibtexparser/bparser.py", line 163, in _parse_records
_add_parsed_record(record, records)
File "/usr/local/lib/python3.4/dist-packages/bibtexparser/bparser.py", line 144, in _add_parsed_record
parsed = self._parse_record(record, customization=customization)
File "/usr/local/lib/python3.4/dist-packages/bibtexparser/bparser.py", line 227, in _parse_record
key, val = [i.strip().strip('{').strip('}').replace('\n', ' ') for i in record.split('{', 1)[1].strip('\n').strip(',').strip('}').split('=')]
IndexError: list index out of range
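One way to narrow this down without the library: the failing expression calls record.split('{', 1)[1], which raises IndexError whenever an @string record contains no '{' at all (for example the parenthesised form @string(name = "value")). The sketch below scans a .bib text for such lines and reports their line numbers. The function name and the sample text are made up for illustration, and whether this is the actual cause for this particular file is only an assumption.

```python
def find_string_records_without_brace(bibtex_text):
    """Return (line_number, line) pairs for @string lines that contain no '{'."""
    suspects = []
    for number, line in enumerate(bibtex_text.splitlines(), start=1):
        stripped = line.strip()
        # These are the lines on which record.split('{', 1)[1] cannot work.
        if stripped.lower().startswith("@string") and "{" not in stripped:
            suspects.append((number, stripped))
    return suspects


# Hypothetical sample: line 3 uses parentheses instead of braces.
sample = '@comment{x}\n\n@string(paris = "Paris")\n@string{rome = "Rome"}\n'
suspects = find_string_records_without_brace(sample)
for number, line in suspects:
    print("line %d: %s" % (number, line))
```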
Hello, I installed this module through pip, and I couldn't make anything work because there were files missing. I could install it manually by downloading the files from here.
Related question on StackOverflow:
http://stackoverflow.com/questions/25683092/python-module-bibtexparser-doesnt-contain-load-and-loads-attributes
Hello,
I just tried to use the BibTexParser and, after some unsuccessful testing, I found that it must be written for Python 3. Using Python 2 I always got errors when str functions were used. Maybe a note about the intended Python version would be useful.
Greets
MrLeeh
I've found it useful to treat editor fields in the same way as author fields, i.e. by writing a function that's the equivalent of author() that splits them into lists of 'Name, Surname':
def editor(record):
    """
    Split editor field into a list of "Name, Surname".

    :param record: the record.
    :type record: dict
    :returns: dict -- the modified record.
    """
    if "editor" in record:
        if record["editor"]:
            record["editor"] = getnames([i.strip() for i in record["editor"].replace('\n', ' ').split(" and ")])
        else:
            del record["editor"]
    return record
Do you think that this is worth adding to the main package?
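Since author and editor would share the same logic, the splitting step could be factored into one helper that takes the field name. The following is only a sketch: split_name_field is a hypothetical name, and it stops at splitting on " and " without the 'Name, Surname' normalisation that getnames performs.

```python
def split_name_field(record, field):
    """Split record[field] on ' and ' into a list of names; drop empty fields."""
    if field in record:
        if record[field]:
            flat = record[field].replace("\n", " ")
            record[field] = [name.strip() for name in flat.split(" and ")]
        else:
            del record[field]
    return record


record = {"editor": "W. Baehrens and E. Baehrens", "author": ""}
for field in ("author", "editor"):
    record = split_name_field(record, field)
```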
The parser is not able to parse this.
@string{mystring = "Hello"}
@string{myconf = "My International Conference"}
@string{myname = "Doe"}
@inproceedings{mykey,
author = "John " # myname,
title = {Cool Stuff},
booktitle = myconf,
year = 2014,
}
The unittest is implemented and skipped.
As described in http://maverick.inria.fr/~Xavier.Decoret/resources/xdkbibtex/bibtex_summary.html replacements in strings must be also supported.
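To make the expected behaviour concrete, here is a minimal sketch of how a field value such as "John " # myname could be expanded once the @string definitions are known. expand_value and the strings dictionary are illustrative names, not part of bibtexparser, and the naive split on '#' would misbehave if a quoted or braced literal itself contained a '#'.

```python
def expand_value(raw, strings):
    """Expand a BibTeX field value made of literals and @string names joined by '#'."""
    parts = []
    for piece in raw.split("#"):
        piece = piece.strip()
        quoted = piece.startswith('"') and piece.endswith('"')
        braced = piece.startswith("{") and piece.endswith("}")
        if len(piece) >= 2 and (quoted or braced):
            parts.append(piece[1:-1])             # literal: drop the delimiters
        elif piece.isdigit():
            parts.append(piece)                   # bare number, e.g. year = 2014
        else:
            parts.append(strings[piece.lower()])  # @string name: look it up
    return "".join(parts)


strings = {"mystring": "Hello", "myconf": "My International Conference", "myname": "Doe"}
author = expand_value('"John " # myname', strings)
booktitle = expand_value("myconf", strings)
year = expand_value("2014", strings)
```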
python-bibtexparser might be the best bibtex parser currently. I'm wondering whether you would consider porting it to Python 2.x. Alternatively, I can give it a try to see if I can make it work under Python 2.x.
I'm having trouble writing the bibtex file (containing unicode) below. I assume the unicode characters fail to write because bibtexparser wants to write ascii, which would work if the unicode were substituted by the respective latex string. So how do I enforce unicode-to-latex conversion? The correct conversion seems to be in latexenc.py: ("\u2009", "\\hspace{0.167em}"), but it does not seem to get applied.
Thanks
This code
with open(outfile, 'w') as bibtex_file:
    bibtexparser.dump(bibdb, bibtex_file, writer=writer)
causes this error
Traceback (most recent call last):
[..]
bibtex_file.write(bibtex_str)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2009' in position 76: ordinal not in range(128)
on this bibtex file
@article{Mesa-Gresa2013,
abstract = {During a 4-week period half the mice (n = 16) were exposed to EE and the other half (n = 16) remained in a standard environment (SE). Aggr. Behav. 9999:XX-XX, 2013. © 2013 Wiley Periodicals, Inc.},
author = {Mesa-Gresa, Patricia and P\'{e}rez-Martinez, Asunci\'{o}n and Redolat, Rosa},
doi = {10.1002/ab.21481},
file = {:Users/jscholz/Documents/mendeley/Mesa-Gresa, P\'{e}rez-Martinez, Redolat - 2013 - Environmental Enrichment Improves Novel Object Recognition and Enhances Agonistic Behavior.pdf:pdf},
issn = {1098-2337},
journal = {Aggressive behavior},
month = apr,
number = {April},
pages = {269--279},
pmid = {23588702},
title = {{Environmental Enrichment Improves Novel Object Recognition and Enhances Agonistic Behavior in Male Mice.}},
url = {http://www.ncbi.nlm.nih.gov/pubmed/23588702},
volume = {39},
year = {2013}
}
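Two possible workarounds, sketched without bibtexparser itself: either open the output file with an explicit UTF-8 encoding so the thin space (U+2009) can be written as-is, or substitute the LaTeX form before writing. The sample string and the temporary path are invented for the demonstration; the replacement pair is the one quoted from latexenc.py above.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the string the writer would produce.
bibtex_str = u"@article{demo,\n  abstract = {n\u2009=\u200916}\n}\n"

# Option 1: write with an explicit encoding instead of the default codec.
path = os.path.join(tempfile.mkdtemp(), "out.bib")
with io.open(path, "w", encoding="utf-8") as bibtex_file:
    bibtex_file.write(bibtex_str)
with io.open(path, "r", encoding="utf-8") as bibtex_file:
    round_trip = bibtex_file.read()

# Option 2: replace the character by its LaTeX equivalent, giving pure ASCII.
ascii_safe = bibtex_str.replace(u"\u2009", u"\\hspace{0.167em}")
ascii_bytes = ascii_safe.encode("ascii")  # no UnicodeEncodeError any more
```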
With an input field:
volume="n.s.~2",
I get as output:
volume = {n.s.\textasciitilde 2},
This should not happen. ~ here is a tie, not a tilde character.
This code works
def _customizations_latex(record):
    """
    This function customizes a record for bibtex.
    See bibtexparser lib for more info.
    """
    record = customization.page_double_hyphen(record)
    record = customization.homogeneize_latex_encoding(record)
    record = customization.author(record)
    return record
and this one doesn't
def _customizations_latex(record):
    """
    This function customizes a record for bibtex.
    See bibtexparser lib for more info.
    """
    record = customization.author(record)
    record = customization.page_double_hyphen(record)
    record = customization.homogeneize_latex_encoding(record)
    return record
The author field's content is now a string, not a list anymore!
Hi,
first thank you for this library!
Have you ever considered integrating the biblatex entry types @collection, @report or @thesis? I'm currently exporting my biblatex file from Zotero and get many errors like:
Entry type collection not standard. Not considered.
Entry type report not standard. Not considered.
Entry type thesis not standard. Not considered.
Would it be easy to integrate those types into the library or would it be too much work? @collection seems to be a pretty often used one ...
Thank you!
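One way such a feature could look is an allow-list of entry types that callers can extend. This is a sketch of the idea only: STANDARD_TYPES lists the classic BibTeX entry types, while is_accepted and its extra_types parameter are hypothetical names, not an existing bibtexparser option.

```python
# The fourteen standard BibTeX entry types.
STANDARD_TYPES = frozenset([
    "article", "book", "booklet", "conference", "inbook", "incollection",
    "inproceedings", "manual", "mastersthesis", "misc", "phdthesis",
    "proceedings", "techreport", "unpublished",
])


def is_accepted(entry_type, extra_types=()):
    """Return True if entry_type is standard or explicitly allowed by the caller."""
    allowed = STANDARD_TYPES | frozenset(t.lower() for t in extra_types)
    return entry_type.lower() in allowed


biblatex_extras = ["collection", "report", "thesis"]
rejected_by_default = not is_accepted("collection")
accepted_with_extras = is_accepted("collection", extra_types=biblatex_extras)
```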
https://bibtexparser.readthedocs.org/en/v0.6.1/install.html says to "download the archive" from https://source.sciunto.org/bibtexparser/, but there are only old versions there.
Is this behaviour normal or desired?
>>> from bibtexparser import *
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'dumpbparser'
>>> import bibtexparser
>>>
Sometimes names have braces, and these aren't understood by getnames. For example, getnames(['A. {Delgado de Molina}']) returns ['de Molina}, A. {Delgado']; instead, it should return ['Delgado de Molina, A.'].
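A brace-aware split is one way to get the expected result: whitespace only separates name parts when it occurs outside braces. The sketch below is not the library's getnames; split_outside_braces and last_first are hypothetical helpers, and they simply treat the final token as the surname (no handling of "von" particles or "Last, First" input).

```python
def split_outside_braces(text):
    """Split text on whitespace, but keep brace-protected groups together."""
    tokens, current, depth = [], "", 0
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char.isspace() and depth == 0:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return tokens


def last_first(name):
    """Format 'First {Protected Last}' as 'Protected Last, First'."""
    tokens = split_outside_braces(name)
    last = tokens[-1]
    if last.startswith("{") and last.endswith("}"):
        last = last[1:-1]  # drop the protecting braces around the surname
    first = " ".join(tokens[:-1])
    return last + ", " + first if first else last


formatted = last_first("A. {Delgado de Molina}")
```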
Hi,
Having used this library before with great success, I decided to use it again on the publications page on my website.
I'm using a static site generator that allows the embedding of Python code:
https://bitbucket.org/obensonne/poole
I am embedding code a bit like this to print my publications:
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode
with open('input/pubs/pubs.bib') as bibtex_file:
    bibtex_str = bibtex_file.read()
db = BibTexParser(bibtex_str)
#recs = db.records
recs = db.get_entry_list()
for rec in recs:
    print("- %s, %s" % (rec["title"], rec["author"]))
This prints a markdown bullet list which is then parsed into HTML.
However, I notice that bibtex syntax characters, such as '{' and '}' remain and are displayed on the web page.
Is there a way to strip them?
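If no built-in option fits, a small post-processing step before printing may be enough. This is a blunt sketch: strip_braces is a made-up helper that removes every brace, so escaped braces and LaTeX commands that rely on them (for example accents written as \'{e}) would be affected as well.

```python
def strip_braces(text):
    """Remove all '{' and '}' characters from a field value for display."""
    return text.replace("{", "").replace("}", "")


rec = {"title": "{Environmental Enrichment in {M}ale Mice}", "author": "Redolat, Rosa"}
line = "- %s, %s" % (strip_braces(rec["title"]), strip_braces(rec["author"]))
```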
Hi,
I need to parse some Bibtex, change some entries or add new entries and then write it back in bibtex.
For now, I'm using this module, getting a dict of the entries, updating the entries and passing it to a custom function to write it back in Bibtex format.
It would be really nice to have functions to update the fields and write it back in bibtex, directly built-in this module. :)
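For reference, a custom write-back function of the kind described can be quite small. The sketch assumes entries are dictionaries with 'ENTRYTYPE' and 'ID' keys, as in the parser outputs shown elsewhere on this page; entry_to_bibtex is a hypothetical name, and the function does no escaping or line wrapping.

```python
def entry_to_bibtex(entry):
    """Serialise one entry dictionary as a BibTeX record with sorted fields."""
    header = "@%s{%s" % (entry["ENTRYTYPE"], entry["ID"])
    names = sorted(key for key in entry if key not in ("ENTRYTYPE", "ID"))
    body = [" %s = {%s}" % (name, entry[name]) for name in names]
    return ",\n".join([header] + body) + "\n}\n"


entry = {"ENTRYTYPE": "article", "ID": "Cesar2013", "title": "An amazing title"}
entry["year"] = "2013"  # update or add fields as plain dictionary operations
bibtex_text = entry_to_bibtex(entry)
```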
Change default value of parser.homogenise_fields in bparser to be more conservative by default.
Also, check that some unittests exist to test this feature.
Add some comments in the tutorial.
Original bug report #93
Hi,
I'm not sure whether cross-ref support is within the scope of bibtexparser, but it could be implemented in bibdatabase.
I'm just opening an issue to discuss about it.
I am a beginner in Python trying to use BibtexParser, and when I run the code from this tutorial, the output I get is:
[{u'keyword': u'keyword1, keyword2', u'title': u'An amazing title', 'ENTRYTYPE': u'article', u'journal': u'Nice Journal', u'author': u'Jean C\xe9sar', u'abstract': u'This is an abstract. This line should be long enough to test\nmultilines...', u'month': u'jan', u'volume': u'12', u'comments': u'A comment', u'year': u'2013', 'ID': 'Cesar2013', u'pages': u'12--23'}]
That is, the é is not properly parsed. I have tried customizations and still the same. Any help is appreciated.
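A note on reading that output: the u'...' prefixes indicate Python 2, where printing a whole list shows the repr of each string, and the repr escapes non-ASCII characters. u'\xe9' is the single character é (U+00E9), so the value itself looks correct. A short check, using only the value copied from the output above:

```python
# The author value exactly as it appears in the output above.
author = u"Jean C\xe9sar"

# '\xe9' is one character, the code point U+00E9 (e with acute accent).
accented = author[6]
is_e_acute = (accented == u"\u00e9")

# Encoded as UTF-8 it becomes the usual two-byte sequence for that character.
utf8_bytes = author.encode("utf-8")
```

Printing an individual field (rather than the whole list) shows the character itself, provided the terminal encoding supports it.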
Hi guys,
I managed to edit the example from the wiki into a valid BibTex item that is not correctly parsed by bibtexparser 0.6.0
It looks as follows (I've removed the multiline for simplicity):
@ARTICLE{Cesar2013
, author = {Jean César}
, title = {An amazing title}
, year = {2013}
, month = jan
, volume = {12}
, pages = {12--23}
, journal = {Nice Journal}
, abstract = {This is an abstract. This line should be long enough to test}
, comments = {A comment}
, keywords = {keyword1, keyword2}
}
The comma-first syntax is valid in BibTeX, e.g. I have a reasonably big BibTeX database in a working project and good ol' Patashnik's bibtex has no problems with it. Patashnik's parser uses a BNF coding so it does not care where lines start or end.
On the other hand bibtexparser only splits on commas at the end of the lines (seen in bparser.py), which is not true for the comma first syntax. If you change
kvs = [i.strip() for i in record.split(',\n')]
to
kvs = [i.strip() for i in record.split(',')]
At line 239 of bparser.py it seems to do the trick and parse the file correctly.
This change should not have an impact on the rest of the package, as the newline is stripped in i.strip() right away, in the same list comprehension.
I have tested this change with and without multiline and with and without comma first syntax and it seems to do fine.
If no one has anything against BibTeX comma first syntax (Algol60 purists maybe?) I'll make a pull request in 24-48h.
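One caveat worth checking before the pull request: a plain split on ',' also splits at commas inside a value, such as keywords = {keyword1, keyword2} in the example above, unless later code rejoins the pieces. A possible alternative is to split only at commas that are outside braces and quotes. The function below is a sketch with a made-up name, not code from bparser.py.

```python
def split_top_level(record_body):
    """Split on commas that are not inside braces or double quotes."""
    parts, current, depth, in_quotes = [], "", 0, False
    for char in record_body:
        if char == '"' and depth == 0:
            in_quotes = not in_quotes
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == "," and depth == 0 and not in_quotes:
            parts.append(current.strip())
            current = ""
        else:
            current += char
    if current.strip():
        parts.append(current.strip())  # last field, with or without a trailing comma
    return parts


# Body of a comma-first record (the text between the outer braces).
body = "Cesar2013\n, author = {Jean Cesar}\n, keywords = {keyword1, keyword2}\n"
fields = split_top_level(body)
naive = [i.strip() for i in body.split(",")]
```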
In your bparser.py file, when I try to parse a bibtex file, it stops at the first line that reads the file and says there is no read method.
I am actually fairly new at Python, but I was curious if this is a version change with strings. I tried running it with 3.4 and then I tried 3.3 and the same error appeared.
My friend ran it with 3.3.5 and was able to do it.
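A "no read method" error usually means a string was passed where a file object was expected (or the other way round between library versions); that this is the cause here is only a guess. A defensive helper that accepts both could look like this sketch, where read_bibtex_source is a hypothetical name:

```python
import io


def read_bibtex_source(source):
    """Return BibTeX text from either a file-like object or a plain string."""
    if hasattr(source, "read"):
        return source.read()  # file-like object: read its contents
    return source             # already a string: use it unchanged


text = u"@misc{a, title = {T}}"
from_string = read_bibtex_source(text)
from_file = read_bibtex_source(io.StringIO(text))
```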
Hi, sorry for the drive-by bug report. I found your project through http://bibtexparser.readthedocs.org when I was looking for examples of Sphinx documentation.
I noticed you have :retuns: instead of :returns: on lines 81 and 89 of bibtexparser/bparser.py.
It's such a small problem that if I branched and made a merge request, it'd be more trouble than if you just fixed it yourself, so I created this ticket. I hope you don't mind.
Cool project!
Here's a simple valid bibtex file (simple.bib) showing limitations of python-bibtexparser
@comment{ignore this line!}
@string{mystring = "Hello"}
@string{myconf = "My International Conference"}
@string{myname = "Doe"}
@inproceedings{mykey,
author = "John" # myname,
title = {Cool Stuff},
booktitle = myconf,
year = 2014,
}
Here's a simple python script to run the example:
import sys
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import homogeneize_latex_encoding
import logging
import logging.config

logger = logging.getLogger(__name__)
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s %(funcName)s:%(lineno)d: %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level':'DEBUG',
            'formatter': 'standard',
            'class':'logging.StreamHandler',
        },
    },
    'loggers': {
        '': {
            'handlers': ['default'],
            'level': 'DEBUG',
            'formatter': 'standard',
            'propagate': True
        }
    }
})

def parseBibtex(filename):
    with open(filename,'rt') as fid:
        bibtex = BibTexParser(fid)
    print(bibtex.get_entry_list())

if __name__ == "__main__":
    parseBibtex('simple.bib')
And here's the output:
$ python bibtexparser_simple_test.py simple.bib
2014-03-14 16:59:16,644 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 0
2014-03-14 16:59:16,644 [DEBUG] bibtexparser.bparser _parse_records:129: Line starts with @
2014-03-14 16:59:16,644 [DEBUG] bibtexparser.bparser _add_parsed_record:117: The record is empty
2014-03-14 16:59:16,644 [DEBUG] bibtexparser.bparser _parse_records:131: The record is set to empty
2014-03-14 16:59:16,644 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 1
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 2
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_records:129: Line starts with @
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _add_parsed_record:109: The record is not empty. Let's parse it.
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_record:174: Split the record of its lines and treat them
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: @comment{ignore this line!
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_record:182: Line starts with @ and the key is not stored yet.
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_record:212: All lines have been treated
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_record:214: The dict is empty, return it.
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _add_parsed_record:115: Nothing returned from the parsed record!
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_records:131: The record is set to empty
2014-03-14 16:59:16,645 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 3
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:129: Line starts with @
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _add_parsed_record:109: The record is not empty. Let's parse it.
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_record:167: The record startswith @string
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_record:170: Return a dict
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _add_parsed_record:115: Nothing returned from the parsed record!
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:131: The record is set to empty
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 4
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_records:129: Line starts with @
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _add_parsed_record:109: The record is not empty. Let's parse it.
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_record:167: The record startswith @string
2014-03-14 16:59:16,646 [DEBUG] bibtexparser.bparser _parse_record:170: Return a dict
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _add_parsed_record:115: Nothing returned from the parsed record!
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:131: The record is set to empty
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 5
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 6
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:129: Line starts with @
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _add_parsed_record:109: The record is not empty. Let's parse it.
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_record:167: The record startswith @string
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_record:170: Return a dict
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _add_parsed_record:115: Nothing returned from the parsed record!
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:131: The record is set to empty
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,647 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 7
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 8
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 9
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 10
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 11
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:134: The line is not empty, add it to record
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_records:123: Inspect line 12
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _add_parsed_record:109: The record is not empty. Let's parse it.
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_record:174: Split the record of its lines and treat them
2014-03-14 16:59:16,648 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: @inproceedings{mykey
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:182: Line starts with @ and the key is not stored yet.
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: author = "John" # myname
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:188: Line contains a key-pair value and the key is not stored yet.
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:193: The line is not ending the record.
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: title = {Cool Stuff}
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:200: Continues the previous line to complete the key pair value...
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:210: This line does NOT represent the end of the current key-pair value
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: booktitle = myconf
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:200: Continues the previous line to complete the key pair value...
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:210: This line does NOT represent the end of the current key-pair value
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect: year = 2014
2014-03-14 16:59:16,649 [DEBUG] bibtexparser.bparser _parse_record:200: Continues the previous line to complete the key pair value...
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:210: This line does NOT represent the end of the current key-pair value
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:179: Inspect:
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:200: Continues the previous line to complete the key pair value...
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:210: This line does NOT represent the end of the current key-pair value
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:212: All lines have been treated
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_record:214: The dict is empty, return it.
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _add_parsed_record:115: Nothing returned from the parsed record!
2014-03-14 16:59:16,650 [DEBUG] bibtexparser.bparser _parse_records:139: Return the result
[]
@comment should be ignored instead of trying to parse it; @preamble is also a missing specific entry.
As far as I know, it is not necessary to end the last tag with a comma in BibTeX files.
But it seems that BibTexParser does not parse the last tag unless it ends with comma. Here is an example: (Generated using dx.doi.org)
>>> t = '''@article{Atkins_2002,
doi = {10.1038/nrd842},
url = {http://dx.doi.org/10.1038/nrd842},
year = 2002,
month = {Jul},
publisher = {Nature Publishing Group},
volume = {1},
number = {7},
pages = {491-492},
author = {Joshua H. Atkins and Leland J. Gershell},
title = {From the analyst's couch: Selective anticancer drugs},
journal = {Nature Reviews Drug Discovery}
}'''
>>> from pprint import pprint as pp
>>> pp(bibtexparser.bparser.BibTexParser(t).get_entry_list()[0])
{'author': 'Joshua H. Atkins and Leland J. Gershell',
'doi': '10.1038/nrd842',
'id': 'Atkins_2002',
'link': 'http://dx.doi.org/10.1038/nrd842',
'month': 'Jul',
'number': '7',
'pages': '491-492',
'publisher': 'Nature Publishing Group',
'title': "From the analyst's couch: Selective anticancer drugs",
'type': 'article',
'volume': '1',
'year': '2002'}
(As you can see, there is no 'journal' in the output dictionary.)
Now let's try the same BibTex string with a comma added to the last tag:
>>> t = '''@article{Atkins_2002,
doi = {10.1038/nrd842},
url = {http://dx.doi.org/10.1038/nrd842},
year = 2002,
month = {Jul},
publisher = {Nature Publishing Group},
volume = {1},
number = {7},
pages = {491-492},
author = {Joshua H. Atkins and Leland J. Gershell},
title = {From the analyst's couch: Selective anticancer drugs},
journal = {Nature Reviews Drug Discovery},
}'''
>>> pp(bibtexparser.bparser.BibTexParser(t).get_entry_list()[0])
{'author': 'Joshua H. Atkins and Leland J. Gershell',
'doi': '10.1038/nrd842',
'id': 'Atkins_2002',
'journal': 'Nature Reviews Drug Discovery',
'link': 'http://dx.doi.org/10.1038/nrd842',
'month': 'Jul',
'number': '7',
'pages': '491-492',
'publisher': 'Nature Publishing Group',
'title': "From the analyst's couch: Selective anticancer drugs",
'type': 'article',
'volume': '1',
'year': '2002'}
And now it's parsed completely.
(I'm using Python 3.4.1 and bibtexparser (0.5.5))
I've noticed that the to_bibtex function from the writer fails when the records being written have values which are lists rather than strings. This happens after using the author customisation function, among others. I was wondering whether (i) something should be included in the documentation to make this clear, and/or (ii) the lists could be joined into strings before writing?
I did a very simple version of (ii) when I was setting up my own script:
def join_author(record):
    """
    Convert authors as lists of strings to strings joined by "and".

    :param record: the record.
    :type record: dict
    :returns: dict -- the modified record.
    """
    if "author" in record:
        record["author"] = " and ".join(record["author"])
    return record
This meets my need, but something more sophisticated that e.g. checked whether a value was a list and converted it into a string in a way appropriate to that sort of field would be possible. I might try to write one for my own education/amusement.
Do you think that's the best way to deal with this, or am I missing something?
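A slightly more general version of the same idea checks every value and joins lists with a separator chosen per field. This is a sketch under assumptions: join_list_fields and the separator table are illustrative, and ", " as the fallback separator is a guess that suits keyword-like fields.

```python
def join_list_fields(record, separators=None):
    """Join every list value in the record into a single string."""
    if separators is None:
        # Name fields use BibTeX's ' and ' convention; other fields fall back to ', '.
        separators = {"author": " and ", "editor": " and "}
    for key, value in list(record.items()):
        if isinstance(value, list):
            record[key] = separators.get(key, ", ").join(value)
    return record


record = {
    "author": ["Atkins, Joshua H.", "Gershell, Leland J."],
    "keyword": ["oncology", "drugs"],
    "title": "From the analyst's couch",
}
record = join_list_fields(record)
```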
Hi,
Thanks for nice parser. Seems like abstract is not included in the entry dictionary even though documentation says so. I've tried many different ways.
bibtex:
@article{Ninlawan20151732,
title = "Factors Which Affect Teachers’ Professional Development in Teaching Innovation and Educational Technology in the 21st Century under the Bureau of Special Education, Office of the Basic Education Commission ",
journal = "Procedia - Social and Behavioral Sciences ",
volume = "197",
number = "",
pages = "1732 - 1735",
year = "2015",
note = "7th World Conference on Educational Sciences ",
issn = "1877-0428",
doi = "http://dx.doi.org/10.1016/j.sbspro.2015.07.228",
url = "http://www.sciencedirect.com/science/article/pii/S1877042815042299",
author = "Ganratchakan Ninlawan",
keywords = "the 21st Century",
keywords = "Teachers’ Professional ;Special Education ; ",
abstract = "Abstract The study aimed to investigate factors which affect teachers’ professional development in teaching innovation and educational technology in the 21st century under the Bureau of Special Education, Office of Basic Education. There were 400 participants. The statistical tool used in the study was Multiple Regression Analysis. The independent variables were entered in the stepwise method. The study found that there were positive correlations between teachers’ professional development and classroom management in the 21st century, concerning creative and innovative skills, communication, information, and media awareness, and computer literacy and information technology. The correlation coefficients were at .295, .349, and .408 respectively. "
}
Result:
{'year': '2015', 'volume': '197', 'title': 'Factors Which Affect Teachers’ Professional Development in Teaching Innovation and Educational Technology in the 21st Century under the Bureau of Special Education, Office of the Basic Education Commission', 'doi': 'http://dx.doi.org/10.1016/j.sbspro.2015.07.228', 'note': '7th World Conference on Educational Sciences', 'pages': '1732 - 1735', 'keyword': 'Teachers’ Professional ;Special Education ;', 'number': '', 'journal': 'Procedia - Social and Behavioral Sciences', 'issn': '1877-0428', 'ID': 'Ninlawan20151732', 'author': 'Ganratchakan Ninlawan', 'link': 'http://www.sciencedirect.com/science/article/pii/S1877042815042299', 'ENTRYTYPE': 'article'}
The parser does a little bit too much cleanup. Entire values enclosed in brackets are stripped of the enclosing brackets. There are numerous instances where bracketing entire values are appropriate, for example with institutional authors. An example:
@book{SillyWalks1970,
author = {{Ministry of Silly Walks}},
location = {London},
publisher = {We Publish Anything},
title = {Handbook of Silly Walks},
year = {1970}
}
The above bibtex entry yields (linebreaks added for readability):
[{u'publisher': u'We Publish Anything',
u'title': u'Handbook of Silly Walks',
u'author': u'Ministry of Silly Walks',
'ID': 'SillyWalks1970',
u'location': u'London',
u'year': u'1970',
'ENTRYTYPE': u'book'}]
The value of author should be u'{Ministry of Silly Walks}', with brackets left untouched.
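The difference between the two behaviours can be shown with plain string operations. str.strip('{') removes every leading brace, so chained strip calls (like the pattern visible in the traceback near the top of this page) eat both layers; removing exactly one pair of delimiters keeps the inner protection. strip_one_layer is a hypothetical helper, and it is deliberately naive: a value such as {A} and {B} would need a real brace matcher.

```python
def strip_one_layer(value):
    """Remove exactly one pair of enclosing delimiters ({...} or "...")."""
    value = value.strip()
    if len(value) >= 2 and (value[0], value[-1]) in (("{", "}"), ('"', '"')):
        return value[1:-1]
    return value


raw = "{{Ministry of Silly Walks}}"
too_aggressive = raw.strip("{").strip("}")  # loses both layers of braces
one_layer = strip_one_layer(raw)            # keeps the inner protecting braces
```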
Hi,
I have this error when giving an UTF-8 input to BibTexParser:
In [3]: BibTexParser(u'blabla')
/usr/lib/python2.7/site-packages/bibtexparser/bparser.py:57: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if data[:3] == '\xef\xbb\xbf':
Out[3]: <bibtexparser.bparser.BibTexParser at 0x1882890>
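The warning comes from comparing unicode text with the byte string '\xef\xbb\xbf' (the UTF-8 byte order mark). A type-aware check avoids the mixed comparison: bytes are compared with the byte-level BOM, text with the character U+FEFF. strip_bom is a hypothetical helper sketching that idea, not the fix that was applied in bparser.py.

```python
import codecs


def strip_bom(data):
    """Remove a leading UTF-8 BOM from bytes, or U+FEFF from text."""
    if isinstance(data, bytes):
        if data.startswith(codecs.BOM_UTF8):
            return data[len(codecs.BOM_UTF8):]
        return data
    if data.startswith(u"\ufeff"):
        return data[1:]
    return data


text_result = strip_bom(u"\ufeffblabla")
bytes_result = strip_bom(b"\xef\xbb\xbfblabla")
unchanged = strip_bom(u"blabla")
```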
Original comment from @omangin in #64.
So I believe the conclusion is that we have to accept common strings by default. (That is, common_strings=True as the default value for the parser.)
Then comes a new question. Currently the new parser uses a special object from bibdatabase.py called BibDataString to represent a parsed string object. This, in theory, makes it possible to delay the interpolation of bibtex strings and to choose, for example, to later write it as such in a new file. It seems that the current parser by default interpolates strings and that the default interpolation content for undefined strings is the key. This means that month = jan is parsed into 'month': 'jan', which would be exported as month = {jan} by the writer. Then I guess bibtex itself would not interpolate "jan" since it is not parsed as a string name any more.
What is implemented so far for the new parser is to interpolate known strings ("known" meaning previously defined in the bibtex file or being a common string if the option is set). This is close to the previous behavior but not exactly the same since unknown strings raise errors.
So here are the options to implement:
string
and BibDataString
). This enables the interpolation as well as the export in the original form. This however breaks the compatibility with the current behavior. (Although it is possible to hack these bibtex string expressions to behave as Python strings.)
I think that the interpolation fits well as a customization function where I should move it. The other solutions are all easy to implement but might break some compatibility. What do you think should be the default?
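To illustrate the trade-off being discussed, here is a minimal sketch of an object that keeps the string name and only looks up its value on request, so a writer could still emit month = jan unchanged. StringRef is a made-up class for illustration and does not reflect the actual BibDataString interface; falling back to the name for undefined strings mirrors the "current behavior" described above.

```python
class StringRef(object):
    """A reference to a BibTeX @string whose expansion is delayed."""

    def __init__(self, name, definitions):
        self.name = name
        self.definitions = definitions  # shared mapping of string name -> value

    def resolve(self):
        """Interpolate: return the defined value, or the name if undefined."""
        return self.definitions.get(self.name.lower(), self.name)

    def to_bibtex(self):
        """Export in the original, un-interpolated form."""
        return self.name


definitions = {"jan": "January"}
month = StringRef("jan", definitions)
interpolated = month.resolve()
exported = "month = %s" % month.to_bibtex()
```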
When I feed this entry:
@book{Baehrens:Panegyrici,
title= "XII panegyrici Latini: post Aemilium Baehrensium iterum recensuit Guilielmus Baehrens",
editor= "W. Baehrens and E. Baehrens",
year= 1911,
publisher="B. G. Teubner",
address= "Lipsiae [Leipzig]",
keywords= {primary_source},
}
through this code:
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.bwriter import BibTexWriter
from bibtexparser.customization import homogeneize_latex_encoding
if __name__=='__main__':
    parser=BibTexParser()
    writer=BibTexWriter()
    parser.customization=homogeneize_latex_encoding
    with open('../bibtex/primary_sources.bib') as bibtex_file:
        bib_database=bibtexparser.load(bibtex_file,parser=parser)
    print(writer.write(bib_database))
I get:
@book{Baehrens:Panegyrici,
address = {},
editor = {},
isbn = {},
keyword = {primary_source},
publisher = {},
series = {},
shorthand = {},
sortname = {},
title = {},
volume = {},
year = {}
}
Why does it lose all the field values except one, and why does it invent non-existent fields?
In BibTeX it is valid for fields like month or year to be unbraced. However, when one of these is the last field in a record, it is not parsed.
Test case with both, month and year:
from bibtexparser.bparser import BibTexParser

if __name__ == '__main__':
    bibtex = """@ARTICLE{Cesar2013,
author = {Jean César},
title = {An amazing title},
month = jan,
year = 2013,
}
@ARTICLE{Cesar2014,
author = {Jean César},
title = {An amazing title},
month = jan,
year = 2014
}
@ARTICLE{Cesar2012,
author = {Jean César},
title = {An amazing title},
year = 2012,
month = jan
}
@ARTICLE{Cesar2011,
author = {Jean César},
year = 2011,
title = {An amazing title},
month = jan,
"""
    with open('/tmp/bibtex.bib', 'w') as bibfile:
        bibfile.write(bibtex)
    with open('/tmp/bibtex.bib', 'r') as bibfile:
        bd = BibTexParser(bibfile.read())
    for entry in bd.get_entry_list():
        print([entry[x] for x in sorted(entry)])
The output is:
['Jean César', 'Cesar2013', 'jan', 'An amazing title', 'article', '2013']
['Jean César', 'Cesar2014', 'jan', 'An amazing title', 'article']
['Jean César', 'Cesar2012', 'An amazing title', 'article', '2012']
['Jean César', 'Cesar2011', 'jan', 'An amazing title', 'article', '2011']
The expected output is:
['Jean César', 'Cesar2013', 'jan', 'An amazing title', 'article', '2013']
['Jean César', 'Cesar2014', 'jan', 'An amazing title', 'article', '2014']
['Jean César', 'Cesar2012', 'jan', 'An amazing title', 'article', '2012']
['Jean César', 'Cesar2011', 'jan', 'An amazing title', 'article', '2011']
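One way to accept an unbraced value in the last position is to treat the trailing comma as optional. The following is only a minimal sketch of that idea, not bibtexparser's actual parser: `parse_fields` and `FIELD_RE` are hypothetical names, and the pattern assumes values without nested braces.

```python
import re

# One field: name = {braced} | "quoted" | bare-token, with an OPTIONAL trailing comma.
# Assumption: values contain no nested braces.
FIELD_RE = re.compile(
    r'(\w+)\s*=\s*'
    r'(?:\{([^{}]*)\}|"([^"]*)"|([^\s,{}"]+))'
    r'\s*,?'
)

def parse_fields(body):
    """Map field name -> raw value for the text between the entry key and the closing brace."""
    fields = {}
    for name, braced, quoted, bare in FIELD_RE.findall(body):
        # Exactly one of the three alternatives is non-empty (or all are empty for '{}').
        fields[name.lower()] = braced or quoted or bare
    return fields

body = """
author = {Jean César},
title = {An amazing title},
year = 2012,
month = jan
"""
print(parse_fields(body))
```

Because the comma is optional, `month = jan` is picked up whether or not anything follows it.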
Hi,
I am using your library for a bibtex preprocessor. I notice that the library strips the escapes from the bibtex strings. E.g.:
title = {P\#: A Concurrent Prolog for the {.NET} Framework},
will strip the leading backslash off '#'.
This causes problems when I come to write out the bibtex record, as '#' alone is a LaTeX error.
Is there a sane way I can prevent this? I have looked into the latexenc tables, but if I use those I will end up replacing even spaces with \space.
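A possible workaround, independent of the latexenc tables, is to re-escape a small set of characters just before writing. This is a sketch under assumptions: `escape_specials` is a hypothetical helper, the character set `#&%_` is my choice, and it should only be applied to text fields (not URLs or math).

```python
import re

def escape_specials(value):
    """Prefix '#', '&', '%' and '_' with a backslash unless one is already there."""
    # The negative lookbehind skips characters that are already escaped,
    # so applying the function twice gives the same result as applying it once.
    return re.sub(r'(?<!\\)([#&%_])', r'\\\1', value)

print(escape_specials('P#: A Concurrent Prolog for the {.NET} Framework'))
```

Spaces and other ordinary characters are left untouched, which avoids the `\space` problem described above.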
$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 4, in <module>
from bibtexparser import info
File "/home/nschloe/Downloads/bibtexparser-0.2/bibtexparser/__init__.py", line 2, in <module>
from .bibtexparser import *
File "/home/nschloe/Downloads/bibtexparser-0.2/bibtexparser/bibtexparser.py", line 22, in <module>
from bibtexparser.latexenc import unicode_to_latex, unicode_to_crappy_latex1, unicode_to_crappy_latex2
ImportError: No module named latexenc
Not only commas, please.
Example (from IEEE):
@ARTICLE{825694,
author={Altman, E.R. and Kaeli, D. and Sheffer, Y.},
journal={Computer},
title={Welcome to the opportunities of binary translation},
year={2000},
volume={33},
number={3},
pages={40-45},
keywords={program interpreters;software portability;automatic executable code conversion;binary translation;code portability;hardware developers;processor architecture;software developers},
doi={10.1109/2.825694},
ISSN={0018-9162},}
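For illustration, splitting on either separator could look like the following. `split_keywords` is a hypothetical helper, not a function of the library.

```python
import re

def split_keywords(value):
    """Split a keyword field on commas or semicolons and drop empty items."""
    return [k.strip() for k in re.split(r'[;,]', value) if k.strip()]

print(split_keywords('program interpreters;software portability;binary translation'))
print(split_keywords('parsing, bibtex'))
```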
Hi there,
Perchance, do you have a direct way in the module to dump a straight list of dicts to a bibtex-formatted string directly, or first convert them into a bibtex object? Looking at the API docs, it looks like there's a method to get strings into bibtex objects, which include the data as lists of dicts, and bibtex objects to bibtex-formatted strings, but no obvious way to get a list of dicts programmatically created to output as a bibtex formatted string...
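To make the request concrete, here is a rough sketch of what such a dump could do, written without the library. `entries_to_bibtex` is a hypothetical function; it assumes each dict carries its entry type and key under 'ENTRYTYPE' and 'ID' and that all other values are plain strings.

```python
def entries_to_bibtex(entries):
    """Format a list of dicts as BibTeX text; 'ENTRYTYPE' and 'ID' are required keys."""
    blocks = []
    for entry in entries:
        # Everything except the two bookkeeping keys is written as a braced field.
        fields = sorted((k, v) for k, v in entry.items() if k not in ('ENTRYTYPE', 'ID'))
        lines = ['@%s{%s,' % (entry['ENTRYTYPE'], entry['ID'])]
        lines += [' %s = {%s},' % (k, v) for k, v in fields]
        lines.append('}')
        blocks.append('\n'.join(lines))
    return '\n\n'.join(blocks) + '\n'

print(entries_to_bibtex([
    {'ENTRYTYPE': 'article', 'ID': 'Cesar2013', 'author': 'Jean César', 'year': '2013'},
]))
```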
Hi,
I need to parse a bibtex string in Python and I found this library. However, it only supports files as input. This can be worked around with the StringIO module, but it would be really nice to have a way to pass a string directly to the parser.
What do you think about this?
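The StringIO workaround mentioned above looks roughly like this in Python 3, where the class lives in the `io` module. `count_entries` is only a hypothetical stand-in for any function that expects a file object.

```python
import io

def count_entries(handle):
    """Stand-in for a file-only parser: counts lines that start an entry."""
    return sum(1 for line in handle if line.lstrip().startswith('@'))

bibtex_str = "@book{a,\n title = {A},\n}\n@book{b,\n title = {B},\n}\n"
handle = io.StringIO(bibtex_str)  # behaves like a file opened in text mode
print(count_entries(handle))
```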
Ref: Tame the Beast, p.21
@string{AW = "Addison-Wesley"}
@book{companion,
author = "Goossens, Michel and Mittelbach, Franck and Samarin, Alexander",
title = "The {{\LaTeX}} {C}ompanion",
booktitle = "The {{\LaTeX}} {C}ompanion",
publisher = AW,
year = 1993,
month = "December",
ISBN = "0-201-54199-8",
library = "Yes",
}
Writing this entry via BibTexWriter.write would output publisher = {AW}, with the string name AW in braces, causing bibtex to not recognize it. The correct output has no braces.
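A sketch of the expected writing behaviour, assuming the writer knows which @string names were defined. `format_field` and `string_names` are hypothetical; note that a literal value that happens to equal a string name would be ambiguous, so a real fix would probably have to record at parse time whether the value was a reference.

```python
def format_field(name, value, string_names):
    """Write one field, leaving references to @string names unbraced."""
    if value in string_names:
        return '%s = %s,' % (name, value)
    return '%s = {%s},' % (name, value)

strings = {'AW': 'Addison-Wesley'}
print(format_field('publisher', 'AW', strings))
print(format_field('year', '1993', strings))
```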
Hi,
If I do
bibtex = BibTexParser("foo",
customization=homogeneize_latex_encoding)
bwriter(bibtex)
The output file has extra {}, as expected (for instance F becomes {F}).
Now, if I iterate this multiple times, I obtain multiple extra {} which is not expected. For instance,
F -> {F} -> {{F}} -> … -> {{…{{F}}…}}
At least, that was not the behaviour I expected, but maybe it is normal?
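For comparison, a brace-protection step can be made idempotent by only wrapping capitals that are not already inside braces. This is a sketch of that idea, not what homogeneize_latex_encoding does; `protect_uppercase` is a hypothetical name and escaped braces such as `\{` are not handled.

```python
def protect_uppercase(text):
    """Wrap uppercase letters in braces, but only at brace depth 0."""
    out = []
    depth = 0
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth = max(depth - 1, 0)
        if ch.isupper() and depth == 0:
            out.append('{' + ch + '}')
        else:
            out.append(ch)
    return ''.join(out)

once = protect_uppercase('Foo Bar')
twice = protect_uppercase(once)
print(once, twice)
```

With this approach F becomes {F} on the first pass and stays {F} on every later pass.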
Hi, in this bibtex file, bibtexparser 0.5.5 can correctly read the year from amos2013applying, but bibtexparser 0.6.0 doesn't show a year.
I'm using the following as a short example showing this between the 2 versions.
Can you take a look at this when you get a chance?
Regards,
Brandon.
#!/usr/bin/env python3
from bibtexparser.customization import *
from bibtexparser.bparser import BibTexParser
with open('publications.bib', 'r') as f:
    p = BibTexParser(f.read(), author).get_entry_list()
[print(x) for x in p]
#0.5.5 Output
{'booktitle': "IWCMC'13 Security, Trust and Privacy Symposium", 'title': 'Applying machine learning classifiers to dynamic Android\nmalware detection at scale', 'id': 'amos2013applying', 'type': 'inproceedings', 'year': '2013', 'author': ['Amos, Brandon', 'Turner, Hamilton', 'White, Jules']}
#0.6 Output
{'type': 'inproceedings', 'author': ['Amos, Brandon', 'Turner, Hamilton', 'White, Jules'], 'id': 'amos2013applying', 'title': 'Applying machine learning classifiers to dynamic Android\nmalware detection at scale', 'booktitle': "IWCMC'13 Security, Trust and Privacy Symposium"}
By using biber testing data, I see @set entries
After searching, I found this:
http://ctan.mirrors.hoobly.com/macros/latex/contrib/biblatex/doc/biblatex.pdf page 113 (3.11.5.1 Static entry sets)
It could be interesting to parse it and store it.
Like cross-refs, I'm opening an issue to discuss it :)
I'm not sure if such lines conform to bibtex syntax, but they are found in standard packages. See, for example, IEEEabrv.bib.
Some minimal examples:
@STRING{Foo = "bar"}
This is a comment
This is a second comment.
This returns the incorrect dictionary entry OrderedDict([('foo', 'bar"} This is a comment This is a second comment.')])
@STRING{Foo = "bar"}
This is a comment
STRING{Baz = "This should be interpreted as comment."}
This raises an exception when parsed.
I had a quick look at the parser code. Maybe, instead of using a new @ line as the record delimiter, it should use the braces?
As a workaround, I'm removing all lines not starting with @ before parsing the aforementioned IEEEabrv.bib file, but this only works because this particular file only has one-liners.
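The brace-based delimiting suggested above could be sketched as follows. This is not the library's code: `split_records` and `ENTRY_START` are hypothetical, only `{...}` delimiters are handled (not `(...)`), and braces are assumed to be balanced inside each record.

```python
import re

# An entry starts with '@', a type name, optional whitespace, and an opening brace.
ENTRY_START = re.compile(r'@\w+\s*\{')

def split_records(text):
    """Return each '@type{...}' block, using brace depth to find where it ends."""
    records = []
    pos = 0
    while True:
        match = ENTRY_START.search(text, pos)
        if match is None:
            break
        depth = 0
        end = None
        # Start counting at the opening brace matched by ENTRY_START.
        for i in range(match.end() - 1, len(text)):
            if text[i] == '{':
                depth += 1
            elif text[i] == '}':
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced braces: stop rather than guess
        records.append(text[match.start():end])
        pos = end
    return records

sample = '''@STRING{Foo = "bar"}
This is a comment
STRING{Baz = "This should be interpreted as comment."}
'''
print(split_records(sample))
```

Text between records, including the line that lacks a leading @, is simply skipped as a comment.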
The problem seems to come from the lack of "," character at the end of the last field, it thus ends with a closing bracket ( "}" ) and will be taken out by the code on line 170 of bparser.
I fixed it by removing the "}" closing the entry and stripping again (so that the last field does not end with "}\n" anymore), but I'm not sure what this code is actually supposed to do...
EDIT: I'm really sorry, I was actually using homogeneize_latex_encoding. convert_to_unicode works fine.
After parsing this bibtex file:
@misc{dmeyer-instruccions-3-act-post,
author="Meyer, Dan",
title="The Three Acts Of A Mathematical Story",
url="http://blog.mrmeyer.com/2011/the-three-acts-of-a-mathematical-story/",
note="\hyphenatedurl{http://blog.mrmeyer.com/2011/the-three-acts-of-a-mathematical-story/}",
year="2011"}
@misc{emergentmath-sneaky-activities,
author="Krall, Geoff (aka emergentmath)",
title="Seven (Sneaky) Activities To Get Your Students Talking Mathematically",
year="2012",
url="http://emergentmath.com/2012/03/01/seven-sneaky-activities-to-get-your-students-talking-mathematically/",
note="\hyphenatedurl{http://emergentmath.com/2012/03/01/seven-sneaky-activities-to-get-your-students-talking-mathematically/}"
}
@misc{markdown,
author="Gruber, John",
title="Markdown",
url="http://daringfireball.net/projects/markdown/"
}
@misc{wk-markdown,
author="Wikipedia",
title="Markdown",
url="https://en.wikipedia.org/wiki/Markdown"
}
@misc{wk-html5,
author="Wikipedia",
title="HTML5",
url="https://en.wikipedia.org/wiki/HTML5"
}
@misc{wk-ensaimada,
author="Wikipedia",
title="Ensaïmada",
url="https://en.wikipedia.org/wiki/Ensa\%C3\%AFmada"
}
@misc{propi-matching-dominis-pdf,
author = "Bordoy, Xavier",
title = "Activitat d'emparellament de domini, gràfica, expressió algebraica i taula de valors",
url = "http://somenxavier.xyz/static/files/01-matching-dominis-gràfiques.pdf",
note = "In catalan"
}
@book{5-practices-math,
author = "Stein, Mary Kay and Smith, Margaret Schwan",
title = "5 Practices for Orchestrating Productive Mathematics Discussions",
url = "http://www.nctm.org/store/Products/5--Practices-for-Orchestrating-Productive-Mathematics-Discussions/",
year = 2011,
publisher = "National Council of Teachers of Mathematics"
}
@book{5-practices-science,
author = "Cartier, Jennifer L. and Smith, Margaret S. and Stein, Mary Kay and Ross, Danielle K.",
title = " Practices for Orchestrating Productive Task-Based Discussions in Science",
url = "http://www.nctm.org/store/Products/5-Practices-for-Orchestrating-Task-Based-Discussions-in-Science/",
year = 2013,
publisher = "National Council of Teachers of Mathematics"
}
@misc{steven-cavadino,
author = "Cavadino, Steven",
title = "Cavmaths. Maths, teaching and life",
url = "https://cavmaths.wordpress.com/",
note = "https://twitter.com/srcav"
}
@misc{exploremtbos-a-new-expl,
author = "Explore the MTBoS",
title = "A new exploration!",
year = "2015",
url = "https://exploremtbos.wordpress.com/2015/10/18/a-new-exploration/"
}
I get this:
[{'title': 'The Three Acts Of A Mathematical Story', 'ENTRYTYPE': 'misc', 'ID': 'dmeyer-instruccions-3-act-post', 'link': 'http://blog.mrmeyer.com/2011/the-three-acts-of-a-mathematical-story/', 'author': 'Meyer, Dan', 'note': '\\hyphenatedurl{http://blog.mrmeyer.com/2011/the-three-acts-of-a-mathematical-story/}'}, {'title': 'Seven (Sneaky) Activities To Get Your Students Talking Mathematically', 'ENTRYTYPE': 'misc', 'ID': 'emergentmath-sneaky-activities', 'year': '2012', 'author': 'Krall, Geoff (aka emergentmath)', 'link': 'http://emergentmath.com/2012/03/01/seven-sneaky-activities-to-get-your-students-talking-mathematically/'}, {'title': 'Markdown', 'author': 'Gruber, John', 'ENTRYTYPE': 'misc', 'ID': 'markdown'}, {'title': 'Markdown', 'author': 'Wikipedia', 'ENTRYTYPE': 'misc', 'ID': 'wk-markdown'}, {'title': 'HTML5', 'author': 'Wikipedia', 'ENTRYTYPE': 'misc', 'ID': 'wk-html5'}, {'title': 'Ensaïmada', 'author': 'Wikipedia', 'ENTRYTYPE': 'misc', 'ID': 'wk-ensaimada'}, {'title': "Activitat d'emparellament de domini, gràfica, expressió algebraica i taula de valors", 'link': 'http://somenxavier.xyz/static/files/01-matching-dominis-gràfiques.pdf', 'author': 'Bordoy, Xavier', 'ENTRYTYPE': 'misc', 'ID': 'propi-matching-dominis-pdf'}, {'title': '5 Practices for Orchestrating Productive Mathematics Discussions', 'ENTRYTYPE': 'book', 'ID': '5-practices-math', 'link': 'http://www.nctm.org/store/Products/5--Practices-for-Orchestrating-Productive-Mathematics-Discussions/', 'author': 'Stein, Mary Kay and Smith, Margaret Schwan', 'year': '2011'}, {'title': 'Practices for Orchestrating Productive Task-Based Discussions in Science', 'ENTRYTYPE': 'book', 'ID': '5-practices-science', 'link': 'http://www.nctm.org/store/Products/5-Practices-for-Orchestrating-Task-Based-Discussions-in-Science/', 'author': 'Cartier, Jennifer L. and Smith, Margaret S. and Stein, Mary Kay and Ross, Danielle K.', 'year': '2013'}, {'title': 'Cavmaths. 
Maths, teaching and life', 'link': 'https://cavmaths.wordpress.com/', 'author': 'Cavadino, Steven', 'ENTRYTYPE': 'misc', 'ID': 'steven-cavadino'}, {'title': 'A new exploration!', 'year': '2015', 'author': 'Explore the MTBoS', 'ENTRYTYPE': 'misc', 'ID': 'exploremtbos-a-new-expl'}]
if I print it out:
# Parsing references in BibTeX format
with open(referencesdb) as bibtex_file:
    bibtex_str = bibtex_file.read()
old_references = bibtexparser.loads(bibtex_str)
print(old_references.entries)
So there are entries that have a url field in the file but no link key in the parsed dict.
Thanks,
Some sources (such as ORCiD) return bibtex records with no newlines between the fields:
@article { cortadella1993,title = {Division with speculation of quotient digits},journal = {Proceedings - Symposium on Computer Arithmetic},year = {1993},pages = {87-94},author = {Cortadella, Jordi and Lang, Tomas}}
It seems that the parser does not support this format.
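Handling such a record does not need newlines if fields are separated at commas that sit outside any braces. A minimal sketch, assuming braced values only (quoted values containing commas are not handled); `split_top_level` is a hypothetical helper.

```python
def split_top_level(body):
    """Split on commas that are not nested inside braces."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts

# The text between the outer braces of the single-line record above (shortened).
body = ("cortadella1993,title = {Division with speculation of quotient digits},"
        "year = {1993},author = {Cortadella, Jordi and Lang, Tomas}")
print(split_top_level(body))
```

The commas inside the author value are kept because they occur at brace depth 1.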
I've embedded this library in a plugin for the Sublime Text editor, see https://github.com/sjpfenninger/citebibtex. The module therefore isn't properly installed, and I needed to change some of your import statements from absolute to relative. For example, in bparser.py, I changed
from bibtexparser.bibdatabase import BibDatabase
to
from .bibdatabase import BibDatabase
Is there a reason to have absolute imports? I think changing these to relative will have no adverse effects in any case.