Comments (5)
I don't recall this off the top of my head, but I know there is a "hook" that works the other way around: when a page is flushed, it is automatically added to the crawler queue again.
But it sounds like this could be the issue, since the pages are already cached. I will have to think about a solution for this.
from crawler.
Hi there, thank you for taking your time to create your first issue. Please give us a bit of time to review it.
After some more research, I found that the problem only occurs if I add multiple crawler configurations (Argument "conf") to a crawler:buildQueue task.
In my case, I added three configurations as a comma-separated list of their names (pages,news,press): one for the normal pages, one for news (tx_news), and one for press messages (also tx_news).
If I enter only the configuration for the normal pages, indexing works as expected, even with frontend indexing disabled.
Having realised this, I created separate tasks for news and press indexing, but it seems that they are not indexed when frontend indexing is disabled.
The crawler config for news looks like this:
name: news
pidsonly: 880
configuration: &tx_news_pi1[controller]=News&tx_news_pi1[action]=detail&tx_news_pi1[news]=[_TABLE:tx_news_domain_model_news; _PID:879; _WHERE: hidden = 0]
pid 880 is the news detail page. pid 879 is the sysfolder with the news records.
In the corresponding crawler:buildQueue task, I entered the detail pid 880 in the Argument "page" field.
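To make the intent of this configuration concrete, here is a minimal Python sketch of how the `[_TABLE:...]` placeholder is expected to fan out into one detail URL per news record. It is an illustration only, not the crawler's actual code: the function name `expand_urls`, the `{news}` marker, the base URL, and the uids 11, 12, 13 are all hypothetical stand-ins for whatever records exist in sysfolder 879.

```python
def expand_urls(base_url, page_id, param_template, news_uids):
    """Return one detail URL per news uid (illustrative helper, not crawler code)."""
    urls = []
    for uid in news_uids:
        # Substitute the uid where the [_TABLE:...] placeholder would sit.
        query = param_template.replace("{news}", str(uid))
        urls.append(f"{base_url}?id={page_id}{query}")
    return urls


# Hypothetical uids standing in for the non-hidden records found on pid 879.
NEWS_UIDS = [11, 12, 13]

# Same parameters as the configuration above, with {news} as an assumed marker.
TEMPLATE = (
    "&tx_news_pi1[controller]=News"
    "&tx_news_pi1[action]=detail"
    "&tx_news_pi1[news]={news}"
)

urls = expand_urls("https://example.org/index.php", 880, TEMPLATE, NEWS_UIDS)
for url in urls:
    print(url)
```

So with three news records, I would expect three queue entries, all pointing at the detail page 880.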
I will dig a bit deeper now and hopefully find the cause of the problem ...
Thanks for your report. We will look into this. The response time is currently longer than normal, but I will get back to you.
Hey, I had a little crawler configuration session today and ran into the same constellation with disableFrontendIndexing enabled. This flag is really useful, as we have cached pagination pages that should not be indexed when a user browses through them. The problem, however, is that as soon as a page is cached, crawling it will not trigger re-indexing. I don't know if there is already a solution for this, but in theory the page cache would have to be flushed before each crawl of the page. Correct me if I'm wrong.
So maybe the problem described here is also about having already cached pages?
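A toy Python model of what I suspect is happening, in case it helps the discussion. It only illustrates the hypothesis above (indexing runs during rendering, so a cache hit skips it) and is not taken from the TYPO3 or crawler source; the class `ToySite` and all of its methods are made up for this sketch.

```python
class ToySite:
    """Toy model: indexing happens only when a page is actually rendered."""

    def __init__(self):
        self.cache = {}        # page_id -> rendered HTML
        self.index_runs = []   # page ids that were indexed

    def request(self, page_id, crawler=False):
        if page_id in self.cache:
            # Cache hit: the page is served without rendering, so nothing is indexed.
            return self.cache[page_id]
        html = f"<html>page {page_id}</html>"
        self.cache[page_id] = html
        # Frontend indexing is disabled, so only crawler requests get indexed.
        if crawler:
            self.index_runs.append(page_id)
        return html

    def flush(self, page_id):
        self.cache.pop(page_id, None)


site = ToySite()
site.request(880)                  # a visitor renders and caches the page
site.request(880, crawler=True)    # crawler gets the cached copy
runs_before_flush = list(site.index_runs)

site.flush(880)                    # flush the page cache before crawling
site.request(880, crawler=True)    # crawler now triggers rendering
runs_after_flush = list(site.index_runs)

print(runs_before_flush, runs_after_flush)
```

In this model the crawler only indexes the page after the cache entry has been flushed, which matches what I am seeing.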