m-rtijn / argostime
Keep an eye on prices
Home Page: https://argostime.mrtijn.nl/
License: GNU Affero General Public License v3.0
The idea is to add a simple JSON API through which Argostimè data can be requested, so that there is machine-friendly access (via the API) in addition to human-friendly access (via the main website).
Things we have to think about are:
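To make the idea concrete, here is a minimal sketch of what a machine-friendly product response could look like. The `ProductInfo` class and its field names are hypothetical and only illustrate the shape of such a response; they are not taken from the Argostimè data model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProductInfo:
    """Hypothetical shape of one product in an API response."""
    product_code: str
    name: str
    shop: str
    current_price: float


def product_to_json(product: ProductInfo) -> str:
    """Serialize a product to a JSON string with a stable key order."""
    return json.dumps(asdict(product), sort_keys=True)


# Example values borrowed from the "rode paprika" issue on this page.
example = ProductInfo("wi4117", "rode paprika", "Albert Heijn", 1.09)
print(product_to_json(example))
```

Sorting the keys keeps the output stable between requests, which makes responses easier to compare and test.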
At the time of writing this issue, the product "rode paprika" shows a price of 9.99, while the Albert Heijn site shows 1.09: https://argostime.mrtijn.nl/product/wi4117.
The issue also arises with AH Ribbelchips Naturel: https://argostime.mrtijn.nl/product/wi448475
Currently there is no proper documentation. Good documentation could make the project more attractive to potential developers.
Documentation to add:
E.g.
Will report a current price of €-1,00 and a lowest price of €9223372036854775808.00.
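Both values look like placeholder defaults leaking into the display. This is an assumption, not something the issue confirms: -1 is a common "no price" marker, and 9223372036854775808 is what the largest signed 64-bit integer (2**63 - 1) becomes when it is converted to a float. The sketch below shows one possible guard. The function names are hypothetical and not part of the Argostimè code base.

```python
from typing import Iterable, Optional


def lowest_valid_price(prices: Iterable[float]) -> Optional[float]:
    """Return the lowest non-negative price, or None if there is none.

    Assumption: negative values are placeholders for "no price known".
    """
    valid = [price for price in prices if price >= 0]
    return min(valid) if valid else None


def format_price(price: Optional[float]) -> str:
    """Render a price for display, without leaking placeholder values."""
    if price is None:
        return "unknown"
    return f"€{price:.2f}"


print(format_price(lowest_valid_price([-1.0, 2.5, 1.99])))
print(format_price(lowest_valid_price([-1.0])))
```

Returning `None` instead of a very large starting value means a product without any valid price can never show that starting value as its lowest price.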
The old CSP broke #24, so we need a new one.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy
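As a starting point for discussion, the sketch below assembles a Content-Security-Policy header value from a dictionary of directives. The helper `build_csp` is hypothetical, and the directives shown are illustrative only. They are not the project's actual policy, and the right set depends on which scripts, styles and images the site really loads.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Illustrative directives only (an assumption, not the real policy).
EXAMPLE_POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],
    "img-src": ["'self'", "data:"],
}

header_value = build_csp(EXAMPLE_POLICY)
print("Content-Security-Policy:", header_value)
```

Keeping the policy in one data structure makes it easier to review what is allowed and to adjust a single directive when a feature breaks.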
The idea is to get historic data from the (sporadic) archives that exist of product pages, on sites like web.archive.org or archive.is
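A rough sketch of how a snapshot lookup on web.archive.org could work is shown below. It assumes the Wayback Machine availability endpoint and response layout as I understand them from the Internet Archive's public documentation, so both should be verified before relying on them. The helper names are hypothetical, and the sample response is made up for illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str) -> str:
    """Build a query for the snapshot closest to a YYYYMMDD timestamp."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived URL from a response body, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Made-up response in the assumed layout, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20220101000000/http://example.com/",
            "timestamp": "20220101000000",
            "status": "200",
        }
    }
})

print(availability_url("example.com/p", "20220101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

The snapshot page would still have to be parsed by the matching shop crawler, and since archives are sporadic, many products would have no historic data at all.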
Maybe switch to Apache ECharts?
It would be nice to have an easier way to add products and view existing data when you are visiting the shop website.
I think we can achieve this with a separate browser extension for Chrome and Firefox. Feel free to share your thoughts on this idea here.
Some resources:
An EAN code uniquely identifies the product. Therefore if we find an EAN code at different shops, we know they are selling the same product!
Some products are already linked by hand but we should automate this!
Keep in mind though that EAN codes are not always available.
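The automatic linking could be sketched as grouping offers by EAN code and keeping only the codes seen at more than one shop. The `Offer` class, its fields and the example data are hypothetical. Offers without an EAN are skipped, since the codes are not always available.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Offer:
    """Hypothetical record: one product as sold by one shop."""
    shop: str
    product_code: str
    ean: Optional[str]


def group_by_ean(offers: Iterable[Offer]) -> Dict[str, List[Offer]]:
    """Map each EAN to its offers, keeping EANs found at two or more shops."""
    groups: Dict[str, List[Offer]] = defaultdict(list)
    for offer in offers:
        if offer.ean:  # skip offers where no EAN could be found
            groups[offer.ean].append(offer)
    return {
        ean: items
        for ean, items in groups.items()
        if len({item.shop for item in items}) > 1
    }


# Made-up shops, product codes and EANs, for illustration only.
offers = [
    Offer("shop-a", "a1", "0000000000017"),
    Offer("shop-b", "b7", "0000000000017"),
    Offer("shop-a", "a2", "0000000000024"),
    Offer("shop-b", "b9", None),
]
linked = group_by_ean(offers)
print(sorted(linked))
```

Products that were linked by hand would still need a separate path, because this approach only finds matches where both shops expose the same code.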
This affects the crawlers for:
It looks like Jumbo is missing on https://argostime.mrtijn.nl/ but is available in this repository. Is https://argostime.mrtijn.nl/ in need of an update?
Graphs only need to be updated once per day (except when you're debugging), so they are a good candidate for caching.
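A minimal sketch of such a cache, assuming a graph can be rendered by a function that takes a product id: the result is stored together with the date it was rendered on and reused until the date changes. `DailyGraphCache` and the `render` callback are hypothetical names, and this in-memory version would not survive a restart or be shared between worker processes.

```python
import datetime
from typing import Callable, Dict, Tuple


class DailyGraphCache:
    """Cache rendered graphs per product and re-render at most once per day."""

    def __init__(
        self,
        render: Callable[[int], str],
        today: Callable[[], datetime.date] = datetime.date.today,
    ) -> None:
        self._render = render
        self._today = today  # injectable so the date can be faked in tests
        self._cache: Dict[int, Tuple[datetime.date, str]] = {}

    def get(self, product_id: int) -> str:
        """Return today's graph for a product, rendering it only if needed."""
        today = self._today()
        cached = self._cache.get(product_id)
        if cached is not None and cached[0] == today:
            return cached[1]
        graph = self._render(product_id)
        self._cache[product_id] = (today, graph)
        return graph
```

While debugging, the cache could be bypassed by calling the render function directly.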
See the discussion in #47
The core of a crawler is to find a product name, a product code and at least a normal price. Some things are optional (discounts, EANs). There should be a clear comment format in the source code of each crawler that shows what it supports.
As an optional feature, a crawler may add availability information to a newly crawled price.
Availability information will be an extra property of the Price data model class. Possible values for the availability property are:
The current frontpage is getting really big because the product database has grown a lot.
Suggested changes:
Daily statistics
It takes a very long time to load the Albert Heijn page: https://argostime.mrtijn.nl/shop/1
I think we might be able to fix this using a more clever SQL query or adding indexes to the tables.
Resource on indexes: https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html
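The effect of an index can be checked by comparing query plans before and after adding it. The sketch below uses SQLite only because it ships with Python. The table and column names are hypothetical and do not reflect the real schema. On the database the project actually runs on, the equivalent check would use that engine's own `EXPLAIN` output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per crawled price of a product.
conn.execute(
    "CREATE TABLE price (id INTEGER PRIMARY KEY, product_id INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO price (product_id, price) VALUES (?, ?)",
    [(i % 100, i / 10) for i in range(1000)],
)

QUERY = "SELECT price FROM price WHERE product_id = ?"


def query_plan(connection: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's query plan details as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (42,))
conn.execute("CREATE INDEX idx_price_product ON price (product_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)  # a full table scan
print("after: ", plan_after)   # a search using idx_price_product
```

If the shop page runs one such lookup per product, an index on the lookup column turns each of them from a full scan into a search, which is usually where the largest gain is.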
We're using Python 3.10 instead of 3.8 now.
As the product lists of the different shops grow, some basic sorting on the tables in the shop overviews would help to keep an overview and to search for deals.
This can be achieved with some JavaScript. This library, for example, seems helpful:
https://github.com/LeeWannacott/table-sort-js
If we start a separate thread per webshop, we can keep the number of requests per hour low for each webshop while still checking all products in the database faster.
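The approach could be sketched as follows: each shop gets its own thread, and each thread waits between its own requests, so the per-shop request rate stays the same while the shops are crawled in parallel. The function names, the `fetch` callback and the delay value are hypothetical and are not the project's actual crawler interface.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple

Fetch = Callable[[str, str], float]
Results = Dict[Tuple[str, str], float]


def crawl_shop(shop: str, codes: List[str], fetch: Fetch,
               delay_seconds: float, results: Results,
               lock: threading.Lock) -> None:
    """Crawl one shop sequentially, pausing between requests."""
    for index, code in enumerate(codes):
        if index > 0:
            time.sleep(delay_seconds)  # rate limit applies per shop
        price = fetch(shop, code)
        with lock:  # results is shared between the shop threads
            results[(shop, code)] = price


def crawl_all(products_per_shop: Dict[str, List[str]], fetch: Fetch,
              delay_seconds: float) -> Results:
    """Start one thread per shop and wait until all of them are done."""
    results: Results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=crawl_shop,
            args=(shop, codes, fetch, delay_seconds, results, lock),
        )
        for shop, codes in products_per_shop.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With this layout the total crawl time is bounded by the shop with the most products rather than by the sum over all shops.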
I discovered that the Etos crawler cannot detect some discounts. They may have added new types of discounts.
The types of discounts I have found that do not work are:
There is a bug in the AH scraper.
When a product is marked as being on discount next week, it is not parsed correctly.
Examples: