Comments (8)

tdegeus commented on July 26, 2024

Thanks!

  • We should add a middleware that normalizes field names.
  • We could consider a default lower-case mapping.

from python-bibtexparser.

Technologicat commented on July 26, 2024

Maybe something like this (Works For Me™)?

from typing import Collection, Union

import bibtexparser
from bibtexparser.library import Library
from bibtexparser.model import Block, Entry

class NormalizeFieldNames(bibtexparser.middlewares.middleware.BlockMiddleware):
    """Lowercase the key of every field in each entry."""

    def __init__(self,
                 allow_inplace_modification: bool = True):
        super().__init__(allow_inplace_modification=allow_inplace_modification,
                         allow_parallel_execution=True)

    def transform_entry(self, entry: Entry, library: "Library") -> Union[Block, Collection[Block], None]:
        # Note: keys that differ only in capitalization collide after this step.
        for field in entry.fields:
            field.key = field.key.lower()
        return entry

Usage example:

library = bibtexparser.parse_file(filename,
                                  append_middleware=[NormalizeFieldNames(),
                                                     bibtexparser.middlewares.SeparateCoAuthors(),
                                                     bibtexparser.middlewares.SplitNameParts()])

tdegeus commented on July 26, 2024

That's probably alright. Would you be willing to convert it to a PR (adding a test)? I think this is quite a common use case that we should support.

MiWeiss commented on July 26, 2024

Fully agree with @tdegeus, and I would appreciate a PR by @Technologicat.

Just one remark: We'd have to be able to handle "new" duplicates somehow (i.e., if two field keys exist in the original block which only differ in their capitalization). That's particularly important now that we're pushing the use of entries as dicts. In principle, we have an entry type DuplicateFieldKeyBlock that should be used here, but I am also happy to support additional suggestions. These would probably have to be enabled with a corresponding constructor parameter (e.g. raising an exception). Does this make sense?
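To make the "new duplicates" concern concrete, here is a minimal sketch in plain Python that does not use bibtexparser at all. The helper names `find_case_collisions` and `check_keys` and the `raise_on_conflict` parameter are hypothetical, chosen only for illustration; the actual middleware might report conflicts differently (for example through DuplicateFieldKeyBlock).

```python
from collections import defaultdict


def find_case_collisions(keys):
    """Group field keys that differ only in capitalization (hypothetical helper)."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.lower()].append(key)
    # Keep only the lowercased keys that more than one original key maps to.
    return {lowered: originals
            for lowered, originals in groups.items()
            if len(originals) > 1}


def check_keys(keys, raise_on_conflict=False):
    """Return the collisions, or raise if the (assumed) strict mode is enabled."""
    collisions = find_case_collisions(keys)
    if collisions and raise_on_conflict:
        raise ValueError(f"Field keys collide after lowercasing: {collisions}")
    return collisions
```

With `raise_on_conflict=False` the caller only gets the collisions back and can decide what to do with them; with `True` the same situation aborts with an exception, which is roughly the constructor-parameter idea described above.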

Technologicat commented on July 26, 2024

@tdegeus: Sure.

@MiWeiss: Good point about conflicting keys. But I'll need a bit more information about the desired way to tackle it.

Roughly how this went: yesterday I suddenly needed to extract some data from BibTeX in Python.

Within an hour, I had installed bibtexparser, upgraded it to 2.x, run into this issue (since my data files happened to use capitalized keys), written the simplest possible field key normalizer, and posted a copy here. So it's fair to say I'm kind of new to this project :)

csware commented on July 26, 2024

A solution would be to issue a warning (similar to library.failed_blocks) and use the last key value.
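As a rough illustration of the "warn and let the last value win" idea, the following plain-Python sketch works on a list of `(key, value)` pairs rather than on bibtexparser objects. The function name `lowercase_keys_last_wins` is made up for this example, and the standard `warnings` module stands in for whatever reporting mechanism the library would actually use.

```python
import warnings


def lowercase_keys_last_wins(fields):
    """Lowercase the keys of (key, value) pairs, given in file order.

    If two keys collide after lowercasing, emit a warning and keep the
    value that appears last.
    """
    normalized = {}
    for key, value in fields:
        lowered = key.lower()
        if lowered in normalized:
            warnings.warn(
                f"Duplicate field key {lowered!r} after normalization; "
                "keeping the last value."
            )
        # Assigning unconditionally is what makes the last value win.
        normalized[lowered] = value
    return normalized
```

For example, passing `[("Author", "A"), ("author", "B")]` would produce one warning and a result whose `"author"` entry holds `"B"`.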

Technologicat commented on July 26, 2024

@csware: Thanks. Yes, that's one possible solution, and probably the simplest one that works.

Considering alternatives, what about the DuplicateFieldKeyBlock mentioned by @MiWeiss? EDIT: Never mind, I think I now understand what you all meant.

Technologicat commented on July 26, 2024

Implemented, using @csware's suggestion of emitting a warning and letting the last value win. Please review.
