kiliankoe / emeal-server
Scraping Dresden's canteens for juicy meal data
License: MIT License
Currently only possible via /meals for today's meals. It would be great to have this for any date, or maybe by using the same week and day params as the StuWe?
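A rough Python sketch (not the project's Swift code) of how a date could be mapped to such week and day params. It assumes the `w<week>-d<day>` pattern seen in the StuWe menu URLs, with `w0` as the current week, weeks starting on Monday, and `d0` meaning Sunday; the function names are made up for illustration.

```python
from datetime import date, timedelta

BASE_URL = "https://www.studentenwerk-dresden.de/mensen/speiseplan"


def week_day_params(target: date, today: date) -> tuple[int, int]:
    """Return (week offset, day index) for `target` relative to `today`.

    Assumption: weeks start on Monday, d1..d6 are Mon..Sat and d0 is Sunday.
    """
    this_monday = today - timedelta(days=today.weekday())
    target_monday = target - timedelta(days=target.weekday())
    week = (target_monday - this_monday).days // 7
    day = (target.weekday() + 1) % 7  # Python: Mon=0..Sun=6 -> Mon=1..Sat=6, Sun=0
    return week, day


def menu_url(target: date, today: date) -> str:
    """Build the menu URL for `target` (hypothetical helper)."""
    week, day = week_day_params(target, today)
    return f"{BASE_URL}/w{week}-d{day}.html"
```

An endpoint could accept either a plain date or the week/day pair and convert between the two with a helper like this.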
Currently only a few attributes of the scraping code are being tested. There are a lot of fragile, untested corners left.
It might also make sense to periodically run tests against live data, via Travis' cron jobs for example, to ensure that failures are found quickly.
It turns out there are some meals that share a single ID across several days, as they're apparently declared for an entire date range ("[...] Angebot vom Mo 8.1.18 - Fr 12.1.18", roughly "offer from Mon 8.1.18 to Fri 12.1.18") instead of a single day?
See here for an example.
This breaks the current handling of meals, since the later occurrence is interpreted as an update and leads to deletion of the original meal. The meal then appears as only occurring on the last day it was discovered on, meh.
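One possible way to deal with this, sketched in Python: detect the date range in the title and expand it into individual days, so the meal can be stored once per day instead of being overwritten. The regex is an assumption based on the single example title above (two-letter weekday, `D.M.YY` dates, years in the 2000s); other formats may exist.

```python
import re
from datetime import date, timedelta

# Matches e.g. "8.1.18 - Fr 12.1.18" (format assumed from one sample title).
RANGE_RE = re.compile(
    r"(\d{1,2})\.(\d{1,2})\.(\d{2})\s*-\s*\w{2}\s+(\d{1,2})\.(\d{1,2})\.(\d{2})"
)


def dates_in_range(title: str) -> list[date]:
    """Return every day covered by a date range in `title`, or [] if none."""
    match = RANGE_RE.search(title)
    if match is None:
        return []
    d1, m1, y1, d2, m2, y2 = (int(g) for g in match.groups())
    start = date(2000 + y1, m1, d1)
    end = date(2000 + y2, m2, d2)
    if end < start:
        return []
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

The meal's identity would then have to be the pair of ID and date rather than the ID alone.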
Apparently some canteens mark meals as being sold out, others just remove them. The current crawler updates new meals, but keeps the ones that have been removed intact. Ideally these should be marked as being sold out. Not quite sure where to model this though.
One way would be to make the meal models timestampable, which Vapor supports, and use the timestamps to check which meals are stale and mark those as sold out? That seems rather fragile though.
Another option would be to compare the freshly scraped list against all previously known meals on an update, filter out those that are no longer in the new list, and mark them as sold out. Sounds just as fragile :/
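The second option could look roughly like this in Python. The `Meal` type and its `is_sold_out` field are hypothetical stand-ins, not the project's actual model.

```python
from dataclasses import dataclass


@dataclass
class Meal:
    # Hypothetical, stripped-down model for illustration only.
    id: int
    title: str
    is_sold_out: bool = False


def mark_missing_as_sold_out(known: list[Meal], scraped_ids: set[int]) -> list[Meal]:
    """Flag every previously known meal that is absent from the new scrape."""
    for meal in known:
        if meal.id not in scraped_ids:
            meal.is_sold_out = True
    return known
```

This would need to be scoped to a single canteen and day, and a failed scrape that returns nothing would flag everything as sold out, which is part of why it feels fragile.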
Currently list 1 is interpreted as ingredients and list 2 as allergens. If additives are present, however, they are list 2 and allergens are list 3.
Maybe try to parse the entire meal title instead?
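As a stopgap, the lists could be assigned by how many of them are present. A minimal Python sketch, assuming the scraper already yields the lists in page order; the function name and the dictionary keys are illustrative.

```python
def classify_lists(lists: list[list[str]]) -> dict[str, list[str]]:
    """Assign scraped lists to ingredients, additives and allergens by position."""
    result: dict[str, list[str]] = {"ingredients": [], "additives": [], "allergens": []}
    if len(lists) >= 3:
        # Additives present: they come second, allergens third.
        result["ingredients"], result["additives"], result["allergens"] = lists[:3]
    elif len(lists) == 2:
        # No additives: the second list holds the allergens.
        result["ingredients"], result["allergens"] = lists
    elif len(lists) == 1:
        result["ingredients"] = lists[0]
    return result
```

A meal with additives but no allergens would still be misread by this, so parsing the identifiers themselves is probably the more robust route.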
Currently none of the array fields of Meal are persisted, because Fluent (or SQLite underneath) can't handle arrays directly. It would probably work to just encode those as semicolon-separated strings on the fly in both directions.
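The encoding idea in a few lines of Python, as a sketch only. It assumes no value ever contains a semicolon and rejects any that do; the function names are made up.

```python
def encode_list(values: list[str]) -> str:
    """Join values into one semicolon-separated string for storage."""
    for value in values:
        if ";" in value:
            raise ValueError(f"value contains the separator: {value!r}")
    return ";".join(values)


def decode_list(stored: str) -> list[str]:
    """Split a stored string back into its values; '' decodes to []."""
    return stored.split(";") if stored else []
```

One quirk: a list holding a single empty string encodes to the same stored value as an empty list, so empty entries should be dropped before encoding.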
Ping @lucasvog about this :P
Currently all meals for a single canteen are being output on /meals/<canteen_id>. That should probably be limited to the current week only?
In turn, there should also be some way of accessing the data for the next two weeks. URL param maybe?
Makes it a little easier for the StuWe servers on container restarts.
Seems like a new canteen was added.
https://www.studentenwerk-dresden.de/mensen/details-mensa-rothenburg.html
Currently, no meals are found, e.g. when using the /meals endpoint. It seems that at some point the layout of the Studentenwerk website was updated, and this change has not been reflected here since.
Is there anything planned for updating this repo, or is there another more recent project?
The logs of a freshly created container using docker-compose show the following when GETting /meals:
emeal_1 | The current hash key "0000000000000000" is not secure.
emeal_1 | Update hash.key in Config/crypto.json before using in production.
emeal_1 | Use `openssl rand -base64 <length>` to generate a random string.
emeal_1 | The current cipher key "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" is not secure.
emeal_1 | Update cipher.key in Config/crypto.json before using in production.
emeal_1 | Use `openssl rand -base64 32` to generate a random string.
emeal_1 | Production mode enabled, disabling informational logs.
emeal_1 | Database prepared
emeal_1 | Starting server on 0.0.0.0:8080
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d1.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d2.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d3.html
emeal_1 | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1 | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d4.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d5.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d6.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d0.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d1.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d2.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d3.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d4.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d5.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d6.html
emeal_1 | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d0.html
Another new canteen? Can't find any details though...
Also think about not keeping an exhaustive list of all allergens and additives, but just stripping them to their identifiers. Meal.Information feels like a good thing to keep.
As in add the path component v1 to the URL. Just in case incompatible changes happen in the future.
Currently all scraping requests are completely synchronous, albeit run in the background. It obviously takes a little while to get through them all, especially on the initial full fetch.
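For illustration, this is how the requests could be run concurrently with a small worker pool, sketched in Python rather than the project's Swift. `fetch` is a placeholder for whatever downloads and parses one menu page, and the pool size of 4 is an arbitrary assumption, kept low to go easy on the StuWe servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def scrape_all(urls: list[str], fetch: Callable[[str], T], max_workers: int = 4) -> dict[str, T]:
    """Run `fetch` for every URL on a bounded thread pool.

    `pool.map` yields results in input order, so zipping with `urls` is safe.
    An exception raised by `fetch` propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```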
does that make sense?
It would probably make sense to limit updates of the current day to daytime hours (not at night) and to run them less often on weekends (and holidays)?
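A small Python sketch of such a schedule. Every number in it (06:00 to 20:00, 15 minutes, 2 hours) is a made-up placeholder, and holidays are not handled at all.

```python
from datetime import datetime, timedelta
from typing import Optional


def update_interval(now: datetime) -> Optional[timedelta]:
    """Return how long to wait before the next update, or None to skip for now."""
    if not 6 <= now.hour < 20:  # assumed "daytime" window
        return None
    if now.weekday() >= 5:  # Saturday or Sunday
        return timedelta(hours=2)
    return timedelta(minutes=15)
```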
Currently all meal properties are overwritten on an update. This unfortunately removes images and maybe other metadata as well, since the StuWe removes these for some reason.
It would definitely make sense to only update the properties that are still present instead. I'll consider the current behavior a bug.
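A merge along those lines, sketched in Python: only fields that the new scrape actually provides overwrite the stored ones. The `Meal` type and its fields are hypothetical and far smaller than the real model.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class Meal:
    # Hypothetical, stripped-down model for illustration only.
    id: int
    title: str
    image_url: Optional[str] = None


def merge_update(existing: Meal, scraped: Meal) -> Meal:
    """Return `existing` with every non-None field of `scraped` applied."""
    changes = {
        f.name: getattr(scraped, f.name)
        for f in fields(Meal)
        if getattr(scraped, f.name) is not None
    }
    return replace(existing, **changes)
```

With this, a meal whose image has disappeared upstream keeps the previously stored image URL while its other properties are still refreshed.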
Apparently the query param at the end of the meal detail URL is not static between refreshes. It should probably be stripped.
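Stripping it is straightforward with a URL parser. A Python sketch using the standard library; the function name is made up, and the fragment is dropped as well on the assumption that it carries no meaning either.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return `url` without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))
```

The stripped URL can then double as a stable key for the meal's detail page.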