Comments (10)
Agree. I think bumping the autoSoftCommit time up (or enabling it in some other way) makes sense. The only reason it would be disabled, in my mind, is if its responsibility was being handled by some other means. We should assign a proper value to that setting, and document what that time is, so that consumers of the module have realistic expectations of when they'll see their changes.
from silverstripe-fulltextsearch.
Just tracing back steps a bit, here's what the configuration docs say:
Publish a page in the CMS
[...] This tracks changes to the database, so any alterations will trigger a reindex. In order to minimise delays to those users, the index update is deferred until after the actual request returns to the user, through PHP's register_shutdown_function() functionality.
[...]
Queued jobs
If the Queued Jobs module is installed, updates are queued up instead of executed in the same request. Queued jobs are usually processed every minute. Large index updates will be batched into multiple queued jobs to ensure a job can run to completion within common constraints, such as memory and execution time limits.
Solr Reindex
[...] If you have the Queued Jobs module installed, then this task will create multiple reindex jobs that are processed asynchronously; unless you are in dev mode, in which case the index will be processed immediately (see processor.yml). Otherwise, it will run in one process. Often, if you are running it via the web, the request will time out. Usually this means the actual process is still running in the background, but it can be alarming to the user, so bear that in mind.
CWP docs say:
CWP's Solr server ignores all search index commit requests, and instead relies on auto-commits to update indexes. This preserves stability for all users of the shared service. This will manifest as index updates taking a minute or two to appear in the search results, while on local development environment they are immediate.
So following the docs, we should create jobs for both update and commit by default when the Queued Jobs module is installed. That's broken because the SearchUpdateProcessor instance was replaced during the Silverstripe 3 to 4 upgrade, effectively hardwiring it to SearchUpdateImmediateProcessor instead of using Injector to optionally use SearchUpdateQueuedJobProcessor.
Here's a post explaining autoSoftCommit.maxTime=-1. And one explaining the difference between soft and hard commits.
Constraints from my perspective:
- Enable "async" indexing on publish (single object), as well as batch indexing
- In both cases, indexing works by default on our platforms (CWP and SC) without further configuration (either through jobs, crontasks, or immediate reindex on shutdown). Note that we can't ensure this in other people's infrastructure (e.g. ensure that queuedjobs get run when the module is installed), so we still need good setup docs.
- Avoid increasing resource usage on reindex beyond current levels (or have a good idea on the impact)
- Avoid delaying reindexes longer than currently experienced by authors and users
- Minimise time until search results reflect reindexed content
- Avoid any solutions which would reduce the availability of the search solution (server/core restarts?)
- Avoid data loss (acknowledging that search indexes are not the source of truth, and missing data can be "restored" through a reindex)
- Minimise time to recover a Solr server after server crashes, or ops-level server/service restarts
My gut feel is to restore the intended solution here (run jobs for update and commit), which seems like it would be achieved through Naomi's PR. If we change the commit configuration, let's validate that against the constraints above - predominantly in the platforms where we have that level of visibility.
from silverstripe-fulltextsearch.
@adrexia @chillu there is a difference between a (hard) commit, a soft commit and a core reload. The first does not make updated results visible to searches; only the latter two do. Platform had hard-commit configured, but that just flushes to disk. You need to reload the core (which is what Solr_Configure does) or soft-commit (not sure if there is an API for that?).
From a platform performance perspective, soft commits are probably the best of both worlds: setting those to 15-60s doesn't have any visible impact, and can even be a net positive if it helps limit hard commits (which flush to disk) and core restarts (which can be resource-intensive for big cores, or so I think).
I'm not sure if soft-commits can be triggered via API. solrconfig.xml allows you to make those commits automatic (so you don't have to make an API call). That pretty much means the timer starts at the point of index update, and triggers a commit when it runs out.
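On the API question: as far as I'm aware, Solr 4 and later accept a softCommit attribute on the XML <commit> update message, and a commitWithin attribute on <add>. A minimal sketch of the two request bodies, each posted separately to the core's update handler. The document ID and field names are hypothetical and not taken from the module:

```xml
<!-- Request body 1: add a document and ask Solr to commit it within 10 seconds.
     The id and Title values are placeholders for illustration only. -->
<add commitWithin="10000">
  <doc>
    <field name="id">example-page-1</field>
    <field name="Title">Example page</field>
  </doc>
</add>

<!-- Request body 2 (sent as a separate request): trigger a soft commit explicitly.
     A soft commit opens a new searcher without flushing segments to disk. -->
<commit softCommit="true"/>
```

Whether the module's Solr client exposes either option is not confirmed anywhere in this thread, so treat this as a description of the Solr side only.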
CWP currently has autoSoftCommit=60000 (60s) and autoCommit=300000 (300s).
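For illustration, those two values would look roughly like this in solrconfig.xml. This is a sketch only: the updateHandler wrapper follows the stock Solr layout, and the openSearcher setting is an assumption, since the CWP value for it isn't stated here.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush index changes to disk every 300 seconds. -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <!-- Assumption: hard commits do not open a new searcher. -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: make index changes visible to searches every 60 seconds. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this split, the soft commit controls how quickly authors see their changes in search results, while the hard commit controls durability on disk.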
from silverstripe-fulltextsearch.
I don't personally know of a reason the default for autoSoftCommit should stay at -1. There are jobs (1 & 2) that can be run, but they seemingly also do not successfully commit the changes.
from silverstripe-fulltextsearch.
@chillu unfortunately, my PR alone does not fix this problem on Platform. We have it set up and running there, and jobs are created and look to be successful, but we still have the issue of the indexes not being properly committed until a full reindex is run.
from silverstripe-fulltextsearch.
I guess one more thing to keep in mind is soft-commits might result in different index contents compared to core reload and also compared to full reindexes. I haven't heard anything specific around that though from CWP perspective, and that has been using autoSoftCommits for ~5yrs, so should be fine for casual use?
from silverstripe-fulltextsearch.
So could someone maybe at least suggest in the docs how to customise solrconfig.xml?
from silverstripe-fulltextsearch.
I'm keen to get the default changed, as it's basically broken from the perspective of (I think) most of this module's users outside a CWP environment. I could document the how of customising solrconfig.xml, but I'm still not entirely clear on the reasons why you might want to customise the autoSoftCommit [1] if we change the default (other than the more general desire to customise the extras configurations).
I think both the SearchUpdateImmediateProcessor and the SearchUpdateQueuedJobProcessor rely on autoSoftCommit not being disabled. At the very least, changing the autoSoftCommit value appears to be the way to get the queued jobs working properly. I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away [2].
@chillu, @unclecheese - what are your thoughts?
1. What are the effects on the server if it's 1 minute, 5 minutes, or 30 seconds? Are there any? What are the reasons to disable it?
2. Which is apparent from the fact the queued jobs functionality has been broken since the Silverstripe 4 upgrade.
from silverstripe-fulltextsearch.
We want fewer devs customising solrconfig.xml rather than more of them.
I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away
It does work as long as autoSoftCommit is enabled, although with the delay configured there. I've installed fulltextsearch-localsolr on cwp/installer:2.5.x-dev, with the latest silverstripe/fulltextsearch:3.x-dev (incl. your fix). I used the default config of autoSoftCommit.maxTime:-1, so effectively disabled. I published a page, ensured the queue ran through, and the new content was available for searching in the index after 15000ms (the "hard commit" threshold). I've stepped my way through with breakpoints, and that's the case after only calling <add> commands in Solr (without any explicit <commit>). So the results were available for new search requests without ever calling commits afterwards, because it actually opened a new "searcher", auto-warmed it, and then put it in service for the next search request (see logs). That's mystifying to me, since autoCommit.openSearcher:false, but I think it's somewhere around the behaviour of maxWarmingSearchers.
openSearcher is described as follows:
if false, the commit causes recent index changes to be flushed to stable storage, but does not cause a new searcher to be opened to make those changes visible.
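For reference, the settings discussed in this comment correspond to a solrconfig.xml fragment along these lines. It mirrors the stock Solr example configuration; whether the module's bundled file matches it exactly is an assumption.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit after 15 seconds, without opening a new searcher.
       The ${name:default} syntax reads a system property with a fallback. -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- A maxTime of -1 disables automatic soft commits. -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Under this configuration, nothing in the file itself should make new documents searchable, which is why the behaviour observed above is surprising.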
I haven't gotten to the bottom of this, but it seems likely that Solr just tries to be helpful here and makes the new results available (see https://issues.apache.org/jira/browse/SOLR-5783 for some insight into how complex that decision making is). In conclusion, I can't reproduce the issue locally, but after reading about "soft commits" I also don't see the harm in setting autoSoftCommit to the same configuration in the module that's worked for us for many years in CWP (and effectively enabling it in SC for anyone updating the module). Even with autoSoftCommit, keeping a separate SearchUpdateCommitJobProcessor job makes sense because that might trigger Solr to commit faster than either its own heuristics or the autoCommit and autoSoftCommit maxTime settings would.
I've created a PR at #278, but I haven't succeeded in getting search results on an SC testing box yet.
from silverstripe-fulltextsearch.
Linked PR has been tested and merged and released as 3.11.0, closing now
from silverstripe-fulltextsearch.