
openmensa-parsers's Introduction

OpenMensa Parsers

Build Status

OpenMensa is a free canteen database. It depends on external parsers that provide the meal information for the specific canteens.

This repository contains a large collection of parsers for different canteens all over Germany - mostly university canteens provided by student unions.

Before you continue you may want to read the feed documentation describing the exchange format between parsers and OpenMensa.

Contribute

Corrections Welcome: As I do not use most of the parsers myself, it is likely that I miss some parsing issues. Feel free to report an issue or, even better, provide a pull request.

Hosting Provided: Your canteen is missing? You could write a parser for it (in Python), but you do not know where to host the parser? I can host your parser at omfeeds.devtation.de. Please provide a PR with the new parser.

Overall Structure

The parsers themselves are independent, but a small framework handles common tasks like URL parsing and output generation.

Each provider has its own Python module. A provider represents a collection of canteens that are organisationally related and can therefore be parsed by the same process. The module has to implement a parse_url(canteenidentifier, today=False) function. This function is called to parse and return the feed for a specific canteen. What exactly the canteen identifier is, is up to the provider; mostly they are URL parts or URL suffixes.
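As an illustration, a minimal provider module could look like the sketch below. The canteen data, date, and meal name are made up, and a real provider would fetch and parse its canteen's website instead of emitting a hard-coded feed; the XML shape follows the OpenMensa v2 feed format described in the feed documentation.

```python
import xml.etree.ElementTree as ET

def parse_url(canteenidentifier, today=False):
    """Hypothetical provider sketch.

    canteenidentifier is whatever config.py registered for this canteen,
    e.g. a URL suffix. today would restrict the feed to today's menu;
    it is ignored in this illustration.
    """
    # Build a minimal OpenMensa v2 feed with a single meal.
    root = ET.Element('openmensa', {
        'version': '2.0',
        'xmlns': 'http://openmensa.org/open-mensa-v2',
    })
    canteen = ET.SubElement(root, 'canteen')
    day = ET.SubElement(canteen, 'day', {'date': '2024-01-15'})
    category = ET.SubElement(day, 'category', {'name': 'Essen 1'})
    meal = ET.SubElement(category, 'meal')
    ET.SubElement(meal, 'name').text = 'Pasta Napoli'
    return ET.tostring(root, encoding='unicode')
```

In practice the real parsers do not build the XML by hand like this; they delegate feed generation to PyOpenMensa (see below).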

The config.py contains a list of all known providers and their canteens (plus the canteen identifier that is passed to the parse_url function). The structure is hopefully self-explanatory. If not, please open an issue.
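Conceptually, the registry maps provider names to canteens and their identifiers. The sketch below is only a hypothetical illustration; the names, identifiers, and helper structure of the real config.py may differ.

```python
# Hypothetical registry shape, for illustration only; see the actual
# config.py in the repository for the real structure and helper classes.
providers = {
    'magdeburg': {                   # provider (one Python module)
        'ovgu-unten': 'ovgu-unten',  # canteen name -> canteen identifier,
        'ovgu-oben': 'ovgu-oben',    # passed on to parse_url(...)
    },
}

def canteens_of(provider_name):
    """List all canteen names registered for a provider."""
    return sorted(providers.get(provider_name, {}))
```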

Common implementation details of providers

At the moment all providers use PyOpenMensa (documentation, repo) to generate the XML feed and to help with parts of the parsing itself.

Since much of the meal information is only available online as HTML, Beautiful Soup 4 is used as a robust but easy-to-use HTML parser.
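A minimal example of the kind of extraction Beautiful Soup enables; the HTML snippet and CSS classes are invented for illustration, since every canteen site uses its own markup.

```python
from bs4 import BeautifulSoup

# A snippet of the kind of HTML a canteen site might serve
# (made up for illustration; real pages differ per provider).
html = """
<table class="menu">
  <tr><td class="category">Essen 1</td><td class="meal">Pasta Napoli</td></tr>
  <tr><td class="category">Essen 2</td><td class="meal">Kaiserschmarrn</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
meals = [td.get_text(strip=True) for td in soup.select('td.meal')]
# meals == ['Pasta Napoli', 'Kaiserschmarrn']
```

The extracted strings are then typically fed into PyOpenMensa's feed builder to produce the XML feed.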

Get started

  1. Clone the source code

     git clone --recurse-submodules git://github.com/mswart/openmensa-parsers
    
  2. Install the dependencies:

    • Python 3
    • Beautiful Soup 4 - needed for most parsers/providers.
  • python-lxml - some parsers use the lxml backend of Beautiful Soup, so you might need the Python lxml module/extension.
  3. Try some parsers

     python3 parse.py magdeburg ovgu-unten full.xml
    

    general:

     python3 parse.py <provider name> <canteen name> <feed name>.xml
    

    Almost all parsers implement a feed called full that includes all available menu information. Most parsers also implement a today feed that returns primarily today's menu.

Running the full server

You can either use Travis or use this similar approach:

Prerequisites: Apache, Unix (Ubuntu for the following lines)

sudo apt-get install libapache2-mod-wsgi-py3 python3-pip
sudo pip3 install bs4 lxml

Create a new file wsgi.conf for Apache2:

sudo touch /etc/apache2/conf-available/wsgi.conf

Insert these lines (adjusting the paths to your setup):

WSGIPythonPath /var/www/html/openmensa-parsers/
WSGICallableObject handler
WSGIScriptAlias /get /var/www/html/openmensa-parsers/wsgihandler.py

Restart Apache2

sudo systemctl restart apache2

Check if it works (it should return XML links):

wget -O - 127.0.0.1/dresden/index.json

Tips for adding a new provider

  1. Find out where the meal information is accessible online. JSON or CSV downloads are usually easiest; HTML sites are also workable, but PDF is tricky.
  2. Take a look at other parsers to see how they solve the problem and which libraries they use.
  3. Create the new provider
  4. Register your provider with its canteens in config.py
  5. Submit a PR
  6. Wait until the PR is reviewed and merged
  7. Register the new canteens on openmensa with the feed from http://omfeeds.devtation.de/<provider identifier>/<canteen identifier>.xml and (optional) today feed from http://omfeeds.devtation.de/<provider identifier>/<canteen identifier>/today.xml

Further questions

openmensa-parsers's People

Contributors

a-andre, abma, anarchoschnitzel, athemis, azrdev, cooperrs, cvzi, floedelmann, hesstobi, ialokim, jenswinter, jgraichen, klemens, kristjanesperanto, macintosh-hd, manorom, martinif, martinimoe, mlewe, mswart, pstiegele, shad0w73, skruppy, thescrabi, thor77, unlifate, wolfbeisz


openmensa-parsers's Issues

CI seems broken

Upon running Travis for my latest PR I noticed that the setup script for the test environment failed. Could someone please have a look at it: https://travis-ci.org/github/mswart/openmensa-parsers/builds/77044442

0.01s$ source ~/virtualenv/python3.4/bin/activate

$ python --version

Python 3.4.6

$ pip --version

pip 9.0.1 from /home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages (python 3.4)
before_install

0.00s$ deactivate

8.34s$ sudo build_scripts/travis-install-deps.sh

+ apt-get update -qq

W: GPG error: http://dl.google.com/linux/chrome/deb stable InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 78BD65473CB3BD13

W: The repository 'http://dl.google.com/linux/chrome/deb stable InRelease' is not signed.

W: There is no public key available for the following key IDs:

78BD65473CB3BD13  

W: The repository 'http://apt.postgresql.org/pub/repos/apt trusty-pgdg Release' does not have a Release file.

W: The repository 'http://www.apache.org/dist/cassandra/debian 39x Release' does not have a Release file.

W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6B05F25D762E3157

W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest algorithm (SHA1)

W: Failed to fetch https://packagecloud.io/github/git-lfs/ubuntu/dists/trusty/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6B05F25D762E3157

E: Failed to fetch http://apt.postgresql.org/pub/repos/apt/dists/trusty-pgdg/main/binary-amd64/Packages  404  Not Found [IP: 147.75.85.69 80]

E: Failed to fetch http://apt.postgresql.org/pub/repos/apt/dists/trusty-pgdg/main/binary-i386/Packages  404  Not Found [IP: 147.75.85.69 80]

E: Failed to fetch https://downloads.apache.org/cassandra/debian/dists/39x/main/binary-amd64/Packages  404  Not Found

E: Failed to fetch https://downloads.apache.org/cassandra/debian/dists/39x/main/binary-i386/Packages  404  Not Found

W: Some index files failed to download. They have been ignored, or old ones used instead.

The command "sudo build_scripts/travis-install-deps.sh" failed and exited with 100 during .

Parser broken for TUHH

The parser for Hamburg, Mensa Harburg seems to be broken: no meals are being displayed.

Squashing commits on merge

I'd like to make an argument against squashing the commits on merge:

Squashing the commits makes it hard to identify bugs that were introduced in a PR. If we notice a bug on master, and say we narrow it down to commit 5dd40a4 that introduced it, then that doesn't help us much. We would need to somehow go through the PR's commits to narrow it down appropriately.

The only reason I can imagine that speaks for squashing the commits, is that the history is "cleaner" in the sense that it is easier to figure out what commits pertain to a PR, since they all get squashed into one.
But if you have a tree view of the commits, it is very easy to identify which ones belong to a PR, and it also allows narrowing down bugs much more easily.
I have to admit that the tree view gets a little bit ugly when a PR has to handle merge conflicts with master during its lifetime, but the benefit of being able to track down bugs and see exactly what happened is very important to me.

What do you think?

Parse side dishes separately for Uni Leipzig

At Uni Leipzig side dishes (french fries, rice etc.) are listed as the last item(s) of the list of additives and allergens (see: https://openmensa.org/c/67 ).
I guess it would be possible to get them separated, as the official homepage has them separated ( https://www.studentenwerk-leipzig.de/mensen-cafeterien/speiseplan?location=106 ).

Another category "side dishes" would be needed under each dish, or it could be merged into the title ("dish-name mit side-dish1, side-dish2 ...").

Sadly I don't have the knowledge myself for such improvements.

add howto for developers of new parser

I built a parser similar to the ones in this repo, using your pyopenmensa. How would I (or anyone else) continue now: send a pull request and wait? If a new parser appears here, how will its output get into the OpenMensa database? (Yes, that last question is not really your concern, but I didn't find any answer to it, and I suppose you know it.)

Might be a good idea to also write this up in the readme for anybody else :-)

Upload of new packages from travis to apt.dxtt.de fails with 406 Not Acceptable

@mswart
The travis runs for the last three commits on master (cedd712, 2eea58b, and 3fd2f37) failed to upload the created package to apt.dxtt.de with the following error:

Resolving apt.dxtt.de (apt.dxtt.de)... 46.38.231.137, 2a03:4000:20:39::34
Connecting to apt.dxtt.de (apt.dxtt.de)|46.38.231.137|:80... connected.
HTTP request sent, awaiting response... 406 Not Acceptable
2018-12-12 20:07:41 ERROR 406: Not Acceptable.

Switch submodule inclusion to secure alternatives

This git repo is configured to use git:// as the protocol to include the submodule pyopenmensa. But git:// is no longer supported by GitHub and should be replaced by a more secure alternative, e.g. https://.

current .git\config

[submodule "pyopenmensa"]
	url = git://github.com/mswart/pyopenmensa.git
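The fix would be to switch the URL to https in .gitmodules (and, in an already initialised checkout, also in .git/config), roughly:

```ini
[submodule "pyopenmensa"]
	url = https://github.com/mswart/pyopenmensa.git
```

followed by a `git submodule sync` so the checkout picks up the new URL.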

Docs outdated?

I tried to run a parser as specified in the readme with:
python3 parse.py magdeburg ovgu-unten

but I get this error:
python3 parse.py magdeburg ovgu-unten
Traceback (most recent call last):
  File "parse.py", line 14, in <module>
    print(parse(SimulatedRequest(), *sys.argv[1:]))
  File ".../openmensa-parsers/config.py", line 15, in parse
    return parsers[parser_name].parse(request, *args)
  File ".../openmensa-parsers/utils.py", line 50, in parse
    return self.sources[source].parse(request, *args)
TypeError: parse() missing 1 required positional argument: 'feed'

Parser muenchen.py responds with an empty array

Hi,
thank you for this great project. We have been using the parser for quite some time now (via openmensa.org) and it has always worked great, but since a couple of days ago we get an empty array as response instead of a list of meals.
It seems that the website http://www.studentenwerk-muenchen.de/mensa has changed its layout and therefore the parsing does not work anymore.
Do you have the time and are you willing to adapt to those changes so the API on openmensa.org works again?
Best regards

backtrace on LazyBuilder.toXMLFeed() when no meals set, but dates

Currently people are doing weird things [1] which result in the darmstadt parser adding dates to its LazyBuilder, but no meals. This should result in an empty feed, but instead LazyBuilder.toXMLFeed() gives the following backtrace:

  File "parse.py", line 14, in <module>
    print(parse(SimulatedRequest(), *sys.argv[1:]))
  File "~/openmensa/openmensa-parsers/config.py", line 15, in parse
    return parsers[parser_name].parse(request, *args)
  File "~/openmensa/openmensa-parsers/utils.py", line 50, in parse
    return self.sources[source].parse(request, *args)
  File "~/openmensa/openmensa-parsers/utils.py", line 173, in parse
    return self.handler(*self.args, today=feed == 'today.xml', **self.kwargs)
  File "~/openmensa/openmensa-parsers/darmstadt.py", line 103, in parse_url
    return canteen.toXMLFeed()
  File "~/openmensa/openmensa-parsers/pyopenmensa/feed.py", line 481, in toXMLFeed
    feed = self.toXML()
  File "~/openmensa/openmensa-parsers/pyopenmensa/feed.py", line 472, in toXML
    feed.appendChild(self.toTag(document))
  File "~/openmensa/openmensa-parsers/pyopenmensa/feed.py", line 543, in toTag
    categoryname, self._days[date][categoryname], output))
  File "/usr/lib/python3.5/xml/dom/minidom.py", line 114, in appendChild
    if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'

[1] http://www.stwda.de/components/com_spk/spk_Dieburg_print.php?ansicht=week

Update/Fix/Add munich canteens

This is an issue collecting various updates for the Munich canteens that I propose and would also implement.

  • Add more canteens managed by the Munich Student Union. There are currently 27 canteens, but only 17 are supported by OpenMensa. After this change we will be supporting 20/27 canteens
  • grosshadern: rename it and fix the broken link
  • test all existing canteens
  • rename archisstrasse #93

Since I would be reworking the canteens, I could also fix Issue #93 here

Parser for Studierendenwerk Hamburg outdated

Hey! It seems like none of the canteens in Hamburg can be pulled at the moment, although they are open and serve meals. Most probably this is caused by the new way the Studierendenwerk Hamburg publishes their menus.

Change of Parsers for Canteens in Thuringia

So a couple of days ago I opened a pull request ( #79 ) for a new parser for the canteens of Studentenwerk Thüringen, which should add a lot of canteens for Thuringia.
The reason I added this parser is that I found out that a lot of canteens from Thuringia lack data or don't contain data at all (e.g. Erfurt, Jena, Ilmenau) in the current OpenMensa REST API.
The question I have is: once this pull request is merged, will the new parser be available on the server for this project? And if so, how can we change the data source of the already existing Thuringia canteens to the new parser so OpenMensa will be fed with actual data for these canteens?

Thanks in advance and kind regards :)

Thuringia updated

The website of the canteens in Thuringia was updated. OpenMensa doesn't show meals anymore.

index-url does not work anymore

I figured something was changed lately, because the index URL for my parser is not working anymore, and I get mail every day. It says it wants JSON now, but I still provide HTML.

What should I do here?

Lack of Feedback from Maintainer

I just wanted to discuss the issue of the frankly slow feedback and merges from @mswart. It's difficult to stay excited and help with this project, without getting meaningful feedback and changes.

I know it sounds harsh, but I do not mean it as a personal attack. He has built all of this and continues to run everything, and I am sure everyone is very thankful for all the effort he puts into this project.
This issue is primarily about discussing how we all can help with the fact that @mswart has other responsibilities than this project. None of us can spend all our time on here, which makes the contributions and work from @mswart even more impressive.

Maybe there is some simple change that will help solve this issue. Could we declare a "vice-president" that has the authority to guide the project when he's inactive? Maybe @mswart can check in twice a week, without sacrificing too much time?
I'm just throwing ideas around. What I'd like is to discuss this issue, because I truly believe that it hurts OpenMensa.

Again, @mswart is at no fault; I know that it easily sounds like I think that. The amount of time someone can spend on a project like this just doesn't stay the same forever. This is just about figuring out how we can best deal with that. :)

improvements for parser Dresden

It seems the meal plan for Dresden is a little bit inconsistent in naming meals, resulting in some daily occurring issues concerning the category field:

  1. original text: "Hausgemachte frische Pasta, heute: Amori Aglio e Olio, dazu italienischer Hartkäse Grana Padano"
    parsers result:

    • category: "Hausgemachte frische Pasta , heute"
    • description: "Amori Aglio e Olio, dazu italienischer Hartkäse Grana Padano"

    improvement:

    • category: "Hausgemachte frische Pasta"
    • description: "Amori Aglio e Olio, dazu italienischer Hartkäse Grana Padano"
  2. original text: "Pizza Bel Paese mit Tomaten und Hirtenkäse"
    parser result:

    • category: "Angebote"
    • description: "Pizza Bel Paese mit Tomaten und Hirtenkäse"

    improvement:

    • category: "Ofenfrisch" (locally and at different Mensen category is called "Ofenfrisch" [1])
    • description: "Pizza Bel Paese mit Tomaten und Hirtenkäse"

[1] (https://www.studentenwerk-dresden.de/mensen/speiseplan/zeltschloesschen.html)
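The first improvement could be implemented with a small cleanup step in the parser, e.g. as in the following sketch; the function name and the exact regex are assumptions for illustration, not the parser's actual code.

```python
import re

def clean_category(raw):
    # Drop a trailing ", heute" marker (with optional surrounding
    # whitespace) from the category string.
    return re.sub(r'\s*,\s*heute\s*$', '', raw).strip()

# e.g. clean_category("Hausgemachte frische Pasta , heute")
#      -> "Hausgemachte frische Pasta"
```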

Hannover: Parse HTML output, not text

Parsing the txt output relies heavily on correct regexes. This resulted in a couple of parse errors during the last months. Parsing the HTML page is probably more reliable (I know, sounds weird).

Missing/Incorrect prices for employees and others in Aachen

In the parser for Aachen a fixed value for "others" is set which unfortunately deviates considerably from the real prices. Furthermore there are no prices for employees.
There is an up-to-date price list showing what each category costs, which is unfortunately a PNG. See "https://www.studierendenwerk-aachen.de/de/aktuelles/beitrag/erhoehung-der-mensapreise-zum-1-juni.html".
For the parser I have written a not-so-nice solution to include this list.

My solution:

# All prices come from the
# "https://www.studierendenwerk-aachen.de/de/aktuelles/beitrag/erhoehung-der-mensapreise-zum-1-juni.html" website.
# The figures are from June 1, 2022.
# The list can not be parsed automatically yet.
import locale

# Surcharge (employee, other) on top of the student price, per category.
SURCHARGES = {
    "Vegetarisch": (3.2, 3.7),
    "Klassiker": (3.2, 3.8),
    "Empfehlung des Tages": (3.2, 3.7),
    "Wok": (2.2, 2.7),
    "Wokgericht": (2.2, 2.7),
    "Pasta": (2.1, 2.7),
    "Pizza": (2.1, 2.7),
    # All "Burger Classics" and "Burger des Tages mit Pommes" are put on the
    # same page and share the surcharge of "Burger des Tages mit Pommes".
    "Burger Classics": (2.15, 2.65),
    "Grillgericht": (2.1, 2.7),
    "Fingerfood": (2.2, 2.8),
    "Ofenkartoffel": (2.2, 2.7),
    "Ofenkartoffel vegetarisch": (2.2, 2.7),
}

def get_price_surcharge(category, student_price: str, role):
    value = locale.atof(student_price.replace(" €", "").replace(",", "."))
    if category in ("Tellergericht", "Tellergericht vegetarisch"):
        # All "Tellergericht" are either "Tellergericht Eintopf" (2,00 €) or
        # "Tellergericht Süßspeise" (1,60 €); they only differ in price, so
        # distinguish them by the student price.
        employee, other = (3.3, 3.9) if value > 1.6 else (3.2, 3.8)
    else:
        # For categories not listed above, fall back to the median of the
        # fixed surcharges of "Tellergericht Eintopf", "Tellergericht
        # Süßspeise", "Vegetarisch", "Klassiker" and "Empfehlung des Tages".
        employee, other = SURCHARGES.get(category, (3.2, 3.8))
    if role == "employee":
        value += employee
    elif role == "other":
        value += other
    return locale.str(value).replace(".", ",") + " €"

Erlangen-Nuernberg not all Menus displayed

I got something interesting:
it seems that sometimes a menu gets dropped.

Is there a way to get the output XML feed for further investigation?
(The following is related to THI, but should affect all SW Erlangen-Nuernberg canteens.)
I compared the original page SW Erlangen/Mensa THI

with the XML feed used by the parser, INFO-MAX XML Feed

The Problematic Item:

<item>
    <category>Essen 3</category>
    <title>Hähnchenstreifen in süß-saurer Soße (Sel,Sen) mit Reis</title>
    <description/>
    <beilagen/>
    <preis1>2,00</preis1>
    <preis2>3,00</preis2>
    <preis3>4,00</preis3>
    <einheit/>
    <piktogramme>
        <img src='http://www.max-manager.de/daten-extern/sw-erlangen-nuernberg/icons/G.png?ts=1570784103' class='infomax-food-icon G' width='50' height='50' alt='food-icon'>
    </piktogramme>
    <foto/>
</item>

The API output of OpenMensa was (JSON API call):

[
    {
        "id": 4594985,
        "name": "Blumenkohl-Broccoli-Auflauf  mit Bechamelsauce ",
        "category": "Vegetarisch",
        "prices": {
            "students": 2.09,
            "employees": 3.3,
            "pupils": null,
            "others": 4.18
        },
        "notes": [
            "mit Milch/Laktose",
            "mit Weizen",
            "mit Milch/Laktose",
            "mit Eier"
        ]
    },
    {
        "id": 4564077,
        "name": "Kaiserschmarrn mit Mandeln und Rosinen  mit Apfelmus ",
        "category": "Vegetarisch",
        "prices": {
            "students": 2.19,
            "employees": 3.3,
            "pupils": null,
            "others": 4.38
        },
        "notes": [
            "mit Mandeln",
            "mit Milch/Laktose",
            "mit Eier",
            "mit Weizen"
        ]
    },
    {
        "id": 4564078,
        "name": "Pasta  Napoli ",
        "category": "Vegetarisch",
        "prices": {
            "students": 1.75,
            "employees": 3.3,
            "pupils": null,
            "others": 3.5
        },
        "notes": [
            "mit Weizen",
            "mit Weizen"
        ]
    },
    {
        "id": 4564075,
        "name": "Schwarzbierpfanne vom Rind  mit Sp\u00e4tzle ",
        "category": "Rind",
        "prices": {
            "students": 3.27,
            "employees": 4.27,
            "pupils": null,
            "others": 6.54
        },
        "notes": [
            "mit Eier",
            "mit Weizen",
            "mit undefinierter Chemikalie Ge",
            "mit Milch/Laktose"
        ]
    },
    {
        "id": 4564076,
        "name": "MSC - Hoki-Filet paniert mit Zitrone  und Remouladenso\u00dfe ",
        "category": "Fisch MSC Fisch",
        "prices": {
            "students": 2.42,
            "employees": 3.42,
            "pupils": null,
            "others": 4.84
        },
        "notes": [
            "mit Milch/Laktose",
            "mit Eier",
            "mit Weizen",
            "mit Weizen"
        ]
    }
]

As you can see, the above XML excerpt is missing from the API output.

At the beginning of the week the missing object was one of the "Aktions Essen",
but today it was available.

As you can see, it's not about character encodings.
I actually can't find the problem in the current code: erlangen_nuernberg.py
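One observation, offered as an assumption rather than a confirmed diagnosis: the `<img>` tag inside `<piktogramme>` in the item above is not self-closed, so the item is not well-formed XML, and a strict XML parser would reject it. A minimal stdlib check:

```python
import xml.etree.ElementTree as ET

# Reduced version of the problematic item; note the unclosed <img> tag.
item = """<item>
    <piktogramme>
        <img src='G.png' class='infomax-food-icon G' width='50' height='50' alt='food-icon'>
    </piktogramme>
</item>"""

try:
    ET.fromstring(item)
    well_formed = True
except ET.ParseError:
    well_formed = False
# well_formed is False: a strict parser raises on the unclosed <img>.
```

Whether the parser actually drops the item for this reason would need checking against erlangen_nuernberg.py.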
