kiliankoe / emeal-server

🌯 Scraping Dresden's canteens for juicy meal data

License: MIT License

Swift 99.78% Makefile 0.16% Shell 0.06%
dresden studentenwerk-dresden mensa canteen emeal tu-dresden htw-dresden

emeal-server's Issues

endpoint for all meals for a single day

Currently this is only possible via /meals, and only for today's meals. It would be great to have this for any date, maybe using the same week and day params as the StuWe?
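A rough sketch of how a date could be mapped onto StuWe-style week and day params, in Python for brevity rather than the project's Swift. The `w<week>-d<day>` shape is taken from the speiseplan URLs in the "No meals found" log; that d1 to d6 mean Monday to Saturday and d0 means Sunday is only an assumption based on the order of those URLs, and `week_day_params` is a made-up name.

```python
from datetime import date, timedelta


def week_day_params(target: date, today: date) -> str:
    """Return a 'w<week>-d<day>' slug for `target`, relative to the week of `today`."""
    # Monday of the current week and of the target's week.
    this_monday = today - timedelta(days=today.weekday())
    target_monday = target - timedelta(days=target.weekday())
    week = (target_monday - this_monday).days // 7
    if week < 0:
        raise ValueError("target lies before the current week")
    # Assumption: d1..d6 are Monday..Saturday and d0 is Sunday.
    day = (target.weekday() + 1) % 7
    return f"w{week}-d{day}"
```

With something like this, a date-based endpoint and a week/day-based one could share the same lookup.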

Extend tests

Currently only a few attributes of the scraping code are being tested. There's a lot of fragile untested corners left.

It might also make sense to periodically run tests against live data, for example via Travis CI cron jobs, to ensure that failures are found quickly.

Recycled meal IDs

It turns out there are some meals that share a single ID across several days, as they're apparently declared for an entire date range ([...] Angebot vom Mo 8.1.18 - Fr 12.1.18, i.e. "offer from Mon 8.1.18 to Fri 12.1.18") instead of a single day.

See here for an example.

This breaks the current handling of meals since it's interpreted as an update and leads to deletion of the original meal. It then appears as only occurring on the last day it was discovered on, meh.
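One possible way around this, sketched in Python with a plain dict standing in for the database: key meals on the pair (ID, day) instead of the ID alone, so a recycled ID on another day becomes a new entry rather than an update. `store_meal` and the dict layout are hypothetical, not the project's actual model.

```python
from datetime import date


def store_meal(store: dict, meal_id: int, day: date, meal: dict) -> None:
    """Insert or update a meal, treating (meal_id, day) as its identity."""
    # Same ID on a different day gets its own key, so nothing is deleted.
    store[(meal_id, day)] = meal


store = {}
store_meal(store, 42, date(2018, 1, 8), {"title": "Soup"})
store_meal(store, 42, date(2018, 1, 9), {"title": "Soup"})
# Both days are kept; scraping 8.1. again would only update that day's entry.
```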

Sold out meals

Apparently some canteens mark meals as being sold out, others just remove them. The current crawler updates new meals, but keeps the ones that have been removed intact. Ideally these should be marked as being sold out. Not quite sure where to model this though.

One way would be to make the meal models timestampable, which Vapor supports, and use the timestamps to check which meals are stale and mark those as sold out? That seems rather fragile though.

Another option would be to check which meals are still present compared to all previously known meals on an update, filter those that are not in the new list and then mark these as sold out. Sounds just as fragile :/
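The second option boils down to a set difference. A minimal Python sketch, assuming meals are kept in a dict keyed by ID and that a `sold_out` flag exists on them (both are assumptions for illustration):

```python
def mark_sold_out(known: dict, scraped_ids: set) -> list:
    """Flag meals that were known before but are missing from the new scrape."""
    missing = sorted(set(known) - scraped_ids)
    for meal_id in missing:
        known[meal_id]["sold_out"] = True
    return missing


known = {1: {"title": "Pasta"}, 2: {"title": "Curry"}, 3: {"title": "Salad"}}
gone = mark_sold_out(known, scraped_ids={1, 3})
# Only meal 2 is flagged; meals 1 and 3 are left untouched.
```

This only works if `known` is limited to the same canteen and day as the scrape, otherwise every meal from another day would be flagged too.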

counter

try and parse the entire meal title maybe?

Persist meal array fields

Currently all array fields of Meal are not persisted due to the fact that Fluent (or SQLite down below) can't handle arrays directly. It would probably work to just encode those as semicolon separated strings on the fly in both directions.
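A minimal sketch of the on-the-fly encoding in both directions, written in Python; the function names are made up. It assumes no single value contains a semicolon and rejects input where one does.

```python
def encode_list(values: list) -> str:
    """Join string values into one semicolon-separated column value."""
    for value in values:
        if ";" in value:
            raise ValueError(f"value contains the separator: {value!r}")
    return ";".join(values)


def decode_list(text: str) -> list:
    """Split a stored column value back into its list form."""
    # An empty column means an empty list, not a list with one empty string.
    return text.split(";") if text else []
```

One caveat of this scheme: a list holding a single empty string encodes to the same value as an empty list.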

limit meal output for single canteen

Currently all meals for a single canteen are being output on /meals/<canteen_id>. That should probably be limited to the current week only?

In turn that should also result in some way of accessing the data for the next two weeks somehow. URL param maybe?
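A sketch of what the week limit plus a URL param could look like, in Python. The `week_offset` parameter (0 for the current week, 1 and 2 for the following ones) and the `date` key on each meal are assumptions for illustration.

```python
from datetime import date, timedelta


def meals_for_week(meals: list, today: date, week_offset: int = 0) -> list:
    """Return the meals dated within the Monday-to-Sunday week `week_offset` weeks from now."""
    monday = today - timedelta(days=today.weekday()) + timedelta(weeks=week_offset)
    sunday = monday + timedelta(days=6)
    return [meal for meal in meals if monday <= meal["date"] <= sunday]


meals = [
    {"title": "Pasta", "date": date(2018, 1, 9)},
    {"title": "Curry", "date": date(2018, 1, 16)},
]
this_week = meals_for_week(meals, today=date(2018, 1, 10))
next_week = meals_for_week(meals, today=date(2018, 1, 10), week_offset=1)
```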

No meals found

Currently no meals are found, e.g. when using the /meals endpoint. It seems that at some point the layout of the Studentenwerk website was updated and this change has not been reflected here since.
Is there anything planned for updating this repo, or is there another, more recent project?

The logs of a freshly created container using docker-compose tell the following when GETting /meals:

emeal_1  | The current hash key "0000000000000000" is not secure.
emeal_1  | Update hash.key in Config/crypto.json before using in production.
emeal_1  | Use `openssl rand -base64 <length>` to generate a random string.
emeal_1  | The current cipher key "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" is not secure.
emeal_1  | Update cipher.key in Config/crypto.json before using in production.
emeal_1  | Use `openssl rand -base64 32` to generate a random string.
emeal_1  | Production mode enabled, disabling informational logs.
emeal_1  | Database prepared
emeal_1  | Starting server on 0.0.0.0:8080
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d1.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d2.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d3.html
emeal_1  | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1  | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d4.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d5.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d6.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d0.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d1.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d2.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d3.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d4.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d5.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d6.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d0.html

Mensa Mahlwerk

Another new canteen? Can't find any details though...

Fix MealInformation log errors

Also think about not keeping an exhaustive list of all allergens and additives, but just stripping them to their identifiers. Meal.Information feels like a good thing to keep.

Version API

As in add the path component v1 to the URL. Just in case incompatible changes happen in the future.

Parallelize scraping

Currently all scraping requests are completely synchronous, albeit being run in the background. It obviously takes a little while to get through them all, especially on the initial fetch all.
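For illustration, this is roughly what fanning the requests out over a small worker pool looks like, sketched in Python with the standard library. `fetch` stands in for whatever downloads and parses a single page, and the worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers: int = 4) -> dict:
    """Run `fetch` for every URL concurrently and return {url: result}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zipping with urls is safe.
        return dict(zip(urls, pool.map(fetch, urls)))
```

An exception raised by any single `fetch` call propagates out of `scrape_all`, so a real version would probably want to catch and log per page instead.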

Update mechanism

  • update the current day at regular intervals (every 15 minutes?)
  • update the next day every few hours
  • update the current week once a day (just in case)
  • update the next two weeks only on app startup

does that make sense?

It would probably make sense to limit updates of the current day to daytime hours (not at night) and to run them less often on weekends (and holidays)?
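The schedule above could be written down as a single function that answers "how often should this day be refreshed right now". A Python sketch: the 15 minutes come from the list, while the 3 hours for "every few hours", the 06:00 to 20:00 daytime window and the hourly weekend rate are all assumptions. Holidays are not handled.

```python
from datetime import datetime, timedelta
from typing import Optional


def refresh_interval(day_offset: int, now: datetime) -> Optional[timedelta]:
    """Interval for re-scraping the day `day_offset` days ahead; None means no periodic refresh."""
    if day_offset == 0:
        if not 6 <= now.hour < 20:       # assumed daytime window
            return None
        if now.weekday() >= 5:           # Saturday or Sunday
            return timedelta(hours=1)    # assumed slower weekend rate
        return timedelta(minutes=15)
    if day_offset == 1:
        return timedelta(hours=3)        # "every few hours", value assumed
    days_left_in_week = 6 - now.weekday()
    if day_offset <= days_left_in_week:
        return timedelta(days=1)         # rest of the current week
    return None                          # later weeks: fetched on startup only
```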

Update meal properties instead of overwrite

Currently all meal properties are overwritten on an update. This unfortunately removes images and maybe other metadata as well since the StuWe removes these for some reason 😕

It would definitely make sense to just update instead. I'll consider the current behavior a bug.
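A minimal sketch of "update instead of overwrite", in Python with meals as plain dicts: values from the new scrape win, except where the new value is empty, in which case the old one is kept. Treating `None`, an empty string and an empty list as "empty" is an assumption, and `merge_meal` is a hypothetical name.

```python
def merge_meal(old: dict, new: dict) -> dict:
    """Merge a freshly scraped meal into the stored one without losing data."""
    merged = dict(old)
    for key, value in new.items():
        # Skip empty values so a removed image does not wipe the stored one.
        if value not in (None, "", []):
            merged[key] = value
    return merged


old = {"title": "Pasta", "image": "pasta.jpg", "price": 2.5}
new = {"title": "Pasta al forno", "image": None, "price": 2.7}
merged = merge_meal(old, new)
# merged keeps the old image but takes the new title and price.
```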

Meal duplicates

Apparently the query param at the end of the meal detail URL is not static between refreshes. It should probably be stripped.
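Stripping the query is a one-liner with most URL libraries. A Python sketch using the standard library; the example URL in the test is made up and only shows the shape.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_meal_url(url: str) -> str:
    """Drop the query string and fragment so refreshes yield the same URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Comparing meals by this canonical URL instead of the raw one should stop the same meal from being stored twice.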
