choonster / catalogue-scanner
Scans catalogues for specific items
Deployment of the Functions App often fails with the following error:
When request Azure resource at PublishContent, Sync Trigger Functionapp : Failed to perform sync trigger on function app. Function app may have malformed content. Please manually restart your function app and inspect the package from WEBSITE_RUN_FROM_PACKAGE.
Based on this comment, it seems that Zip Deploy is recommended over the RBAC deployment we're currently using.
Azure App Configuration supports JSON configuration values; this may be a better format than individual key/value pairs for match rules.
If scanning a catalogue gets stuck on an error, the user should be able to reset the scan state to allow the next invocation of the scan function to run completely.
Either add this to the Configuration UI via an HTTP trigger function, or add a manual trigger function that can be triggered from the Azure Portal.
When a user opens the Configuration UI for the first time in a browsing session, Azure's automatic app service authentication authenticates the user and creates a session cookie. Some time after this (roughly a few hours), the token obtained from ITokenAcquisition.GetAccessTokenForUserAsync in _Host.cshtml seems to expire and requests to the Web API functions start returning 401, despite the session still being valid for the UI. This persists until the user ends their browsing session (e.g. by closing the browser) or manually clears the session cookie.
The UI should automatically detect this and obtain a new token, possibly by clearing the session cookie to force re-authentication.
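One way to detect the expired token is to retry a request once with a freshly acquired token whenever the API answers 401. The following is only a minimal sketch of that idea in Python, not the project's code: `call_with_token_refresh`, `send` and `get_token` are hypothetical names, and the real UI would acquire tokens through ITokenAcquisition instead.

```python
from typing import Callable, Tuple


def call_with_token_refresh(
    send: Callable[[str], Tuple[int, str]],
    get_token: Callable[[bool], str],
) -> Tuple[int, str]:
    """Send a request and retry once with a fresh token if the API returns 401.

    send(token) performs the request and returns (status_code, body).
    get_token(force_refresh) returns a cached token, or a new one when
    force_refresh is True. Both callables are hypothetical stand-ins.
    """
    status, body = send(get_token(False))
    if status == 401:
        # The cached token was rejected, so ask for a new one and retry once.
        status, body = send(get_token(True))
    return status, body
```

If the second attempt also returns 401, the caller still sees the 401 and can fall back to clearing the session cookie to force re-authentication.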
Add support for more stores and catalogue providers.
Need to set up automatic build and publish with GitHub Actions.
It should be possible to configure nested match rules with operators like AND (all child rules must match) or OR (any child rule can match).
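As a rough illustration of how nested rules could be evaluated, the sketch below models a rule tree in Python. The class names (`PropertyRule`, `CompoundRule`) and the "contains text" leaf rule are assumptions made for the example; they are not the project's existing match rule types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class PropertyRule:
    """Leaf rule (hypothetical): the named item property must contain the text."""
    property_name: str
    contains: str

    def matches(self, item: Dict[str, str]) -> bool:
        value = item.get(self.property_name, "")
        return self.contains.lower() in value.lower()


@dataclass
class CompoundRule:
    """Branch rule (hypothetical): combines child rules with AND or OR."""
    operator: str  # "AND" or "OR"
    children: List["Rule"] = field(default_factory=list)

    def matches(self, item: Dict[str, str]) -> bool:
        results = (child.matches(item) for child in self.children)
        if self.operator == "AND":
            return all(results)  # every child rule must match
        if self.operator == "OR":
            return any(results)  # at least one child rule must match
        raise ValueError(f"Unknown operator: {self.operator}")


Rule = Union[PropertyRule, CompoundRule]
```

Because a compound rule can contain other compound rules, the same evaluation works at any nesting depth. Note that in this sketch an AND rule with no children matches everything and an OR rule with no children matches nothing, so the configuration UI would probably want to reject empty groups.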
Need to figure out how to edit them in the configuration UI. Possibly a dialog with the same table layout as the main Matching Configuration page?
Coles have recently merged their online store into the main https://www.coles.com.au/ site, with a new frontend and backend. The new backend no longer returns obfuscated/compressed data, so we should move away from Playwright and make requests to the backend directly (like we do for other stores/catalogues).
DownloadColesOnlineSpecialsPage often times out, sometimes causing the orchestration to fail. One possible way to fix this would be to throttle the number of concurrent executions of the function using the durableTask/maxConcurrentActivityFunctions setting described here, but this would also apply to other functions.
IDurableEntityClient.CleanEntityStorageAsync can remove empty entities and release orphaned locks.
It should be possible to deploy an instance of Catalogue Scanner and its related resources through Azure Resource Manager.
In addition to scanning the current catalogue, it should also be possible to scan next week's catalogue and any additional catalogues, when these are available.
The notification email should include the start and end dates of the catalogue and indicate whether it's current or future.
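A small sketch of how the email text could label a catalogue from its dates is shown below. The function name `describe_catalogue`, the output format, and the extra "expired" label are assumptions for illustration only.

```python
from datetime import date


def describe_catalogue(start: date, end: date, today: date) -> str:
    """Return the catalogue's date range with a current/future label."""
    if today < start:
        status = "future"
    elif today <= end:
        status = "current"
    else:
        # Not mentioned in the issue; included so every case has a label.
        status = "expired"
    return f"{start.isoformat()} to {end.isoformat()} ({status})"
```

Both the start and end dates are treated as inclusive here, which may need adjusting to match how each store defines its sale period.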
With the change in 41bad70, the application fails at startup on a Linux App Service plan with this error:
System.Diagnostics.Process: An error occurred trying to start process '/home/site/wwwroot/bin/.playwright/node/linux/playwright.sh' with working directory '/'. Permission denied.
This could be due to the directory the script is in, or the file permissions on the script itself (e.g. execute permission not set).
The update to .NET 6 may remove the need for CatalogueScanner.WebScraping.API to be a separate Web API application, if Playwright works inside the Azure Functions host process (Durable Functions still aren't supported in the isolated process model).
This will also allow CatalogueScanner.DefaultHost, CatalogueScanner.ConfigurationUI and all the class libraries to target the same framework version, rather than a mix of .NET Standard/Core/5 like they do now.
The Scan States page of the Configuration UI test site seemed to be working earlier today, but the production site wasn't. Now the test site doesn't seem to be working either.
It should be possible to add match rules to filter on item prices, and item prices should be included in the digest email.
This should be relatively easy to implement for Coles/Woolworths Online as their response data includes prices, but it may be more difficult for SaleFinder catalogues.
When a scanning orchestrator function fails with an exception, CatalogueScanState.ScanState should be set to a new Failed value.
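The intended behaviour can be sketched as a wrapper that records the failure before letting the exception propagate. This is a Python illustration only: apart from Failed, the state names below are assumptions, and `run_scan` and `set_state` are hypothetical stand-ins for the orchestrator and the scan state entity.

```python
from enum import Enum
from typing import Callable


class ScanState(Enum):
    # Only Failed comes from the issue; the other values are assumed.
    NOT_STARTED = "NotStarted"
    IN_PROGRESS = "InProgress"
    COMPLETED = "Completed"
    FAILED = "Failed"


def run_scan(scan: Callable[[], None], set_state: Callable[[ScanState], None]) -> None:
    """Run a scan and record Failed if it raises, then re-raise the error."""
    set_state(ScanState.IN_PROGRESS)
    try:
        scan()
    except Exception:
        # Record the failure so the scan is not left stuck in progress.
        set_state(ScanState.FAILED)
        raise
    set_state(ScanState.COMPLETED)
```

Re-raising keeps the original exception visible to the host, so the orchestration is still reported as failed.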
The build ID in the Coles Online Data URLs has been updated. We need to automatically fetch the current build ID from the website instead of hardcoding it.
SaleFinder requests like https://embed.salefinder.com.au/catalogues/view/128/?format=json&locationId=-1&callback= are now returning 403 Forbidden, causing the functions to fail. This could be due to the callback query parameter, or due to the outdated user agent header being sent with the requests.
Currently, the Woolworths Online specials scanning will always use the default location, which is probably based on a geo IP lookup of the Functions app. Ideally there should be an option to configure this in the configuration UI, but it looks like Woolworths Online only supports this for logged-in users rather than using a simple cookie like other sites.
Woolworths Online requests often time out, causing the scan function to fail. We may be able to mitigate this and #56 by throttling the number of individual functions that can run concurrently, probably by splitting the download functions into "pages" of 25(?) and waiting for each page to complete before starting on the next one.
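The "pages of 25" idea could look roughly like the sketch below, written with Python's asyncio rather than Durable Functions. `run_in_batches` and `worker` are hypothetical names, and the batch size of 25 is the tentative figure from the issue text.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    batch_size: int = 25,
) -> List[R]:
    """Run worker over items, with at most batch_size calls in flight at once."""
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Wait for the whole batch to finish before starting the next one.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results
```

Results come back in the same order as the input items. A slow item holds up its whole batch, so this trades some throughput for a hard cap on concurrency.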
Coles Online only loads 48 products at a time on its specials page, so we'll need to add some JavaScript code to page through the specials and load all of them rather than just the first page.
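The paging logic itself is simple and is sketched below in Python to show the loop, not the in-page JavaScript the issue calls for. It assumes a hypothetical `fetch_page(page_number, page_size)` callable and that a page shorter than the page size marks the end of the specials; neither is confirmed by the site's actual behaviour.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_all_products(
    fetch_page: Callable[[int, int], List[T]],
    page_size: int = 48,
) -> List[T]:
    """Collect products page by page until a short (or empty) page is returned."""
    products: List[T] = []
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        products.extend(page)
        if len(page) < page_size:
            # Assumed end condition: the last page is not full.
            break
        page_number += 1
    return products
```

If the total number of specials is an exact multiple of the page size, this makes one extra request that returns an empty page before stopping.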