Comments (10)

unclecheese commented on July 18, 2024

Agree. I think bumping the autoSoftCommit time up (or enabling it in some other way) makes sense. The only reason it would be disabled in my mind is if its responsibility was being handled by some other means. We should assign a proper value to that setting, and document what that time is, so that consumers of the module have realistic expectations of when they'll see their changes.

from silverstripe-fulltextsearch.

chillu commented on July 18, 2024

Just tracing back steps a bit, here's what the configuration docs say:

Publish a page in the CMS
[...] This tracks changes to the database, so any alterations will trigger a reindex. In order to minimise delays to those users, the index update is deferred until after the actual request returns to the user, through PHP's register_shutdown_function() functionality.
[...]
Queued jobs
If the Queued Jobs module is installed, updates are queued up instead of executed in the same request. Queued jobs are usually processed every minute. Large index updates will be batched into multiple queued jobs to ensure a job can run to completion within common constraints, such as memory and execution time limits.
Solr Reindex
[...] If you have the Queued Jobs module installed, then this task will create multiple reindex jobs that are processed asynchronously, unless you are in dev mode, in which case the index will be processed immediately (see processor.yml). Otherwise, it will run in one process. Often, if you are running it via the web, the request will time out. Usually this means the actual process is still running in the background, but it can be alarming to the user, so bear that in mind.
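
The batching mentioned in the queued jobs docs can be pictured as splitting the pending record IDs into fixed-size chunks, one chunk per queued job. This is a minimal Python sketch, assuming a hypothetical `batch_ids` helper and an illustrative batch size of 500; the module itself is PHP, and its actual batch size and job classes aren't described here.

```python
from typing import List, Sequence

# Illustrative batch size only; not the module's real setting.
DEFAULT_BATCH_SIZE = 500


def batch_ids(record_ids: Sequence[int], batch_size: int = DEFAULT_BATCH_SIZE) -> List[List[int]]:
    """Split a large index update into fixed-size batches, one per queued job.

    Smaller batches keep each job within memory and execution time limits,
    at the cost of creating more jobs.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(record_ids[i:i + batch_size]) for i in range(0, len(record_ids), batch_size)]


if __name__ == "__main__":
    jobs = batch_ids(list(range(1, 1201)))
    print([len(job) for job in jobs])  # [500, 500, 200]
```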

CWP docs say:

CWP's Solr server ignores all search index commit requests, and instead relies on auto-commits to update indexes. This preserves stability for all users of the shared service. This will manifest as index updates taking a minute or two to appear in the search results, while on local development environment they are immediate.

So following the docs, we should create jobs for both update and commit by default when the Queued Jobs module is installed. That's broken because the SearchUpdateProcessor instance was replaced during the 3 to 4 upgrade, effectively hardwiring it to SearchUpdateImmediateProcessor instead of using Injector to optionally use SearchUpdateQueuedJobProcessor.
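
The difference between hardwiring a processor and resolving it through Injector can be sketched as follows. This is a hypothetical Python analogue: the three class names come from this thread, but the registry dict, `build_registry`, `get_processor` and `trigger_processing` are illustrative stand-ins for SilverStripe's PHP Injector and YAML config, not the module's API.

```python
from typing import Dict, List, Type


class SearchUpdateProcessor:
    """Collects dirty record IDs during a request; subclasses decide when to index them."""

    def __init__(self) -> None:
        self.dirty: List[int] = []

    def add(self, record_id: int) -> None:
        self.dirty.append(record_id)

    def trigger_processing(self) -> str:
        raise NotImplementedError


class SearchUpdateImmediateProcessor(SearchUpdateProcessor):
    def trigger_processing(self) -> str:
        # Models indexing in the same request, after the response is sent.
        return f"indexed {len(self.dirty)} record(s) on shutdown"


class SearchUpdateQueuedJobProcessor(SearchUpdateProcessor):
    def trigger_processing(self) -> str:
        # Models deferring indexing to a queued job picked up by the job runner.
        return f"queued a job for {len(self.dirty)} record(s)"


def build_registry(queued_jobs_installed: bool) -> Dict[str, Type[SearchUpdateProcessor]]:
    """Hypothetical stand-in for Injector config: service name -> concrete class."""
    registry: Dict[str, Type[SearchUpdateProcessor]] = {
        "SearchUpdateProcessor": SearchUpdateImmediateProcessor,
    }
    if queued_jobs_installed:
        # The override a config entry would apply when Queued Jobs is present.
        registry["SearchUpdateProcessor"] = SearchUpdateQueuedJobProcessor
    return registry


def get_processor(registry: Dict[str, Type[SearchUpdateProcessor]]) -> SearchUpdateProcessor:
    # Resolve by service name; calling SearchUpdateImmediateProcessor() directly
    # here would be the hardwiring that bypasses the override.
    return registry["SearchUpdateProcessor"]()


if __name__ == "__main__":
    processor = get_processor(build_registry(queued_jobs_installed=True))
    processor.add(42)
    print(processor.trigger_processing())  # queued a job for 1 record(s)
```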

Here's a post explaining autoSoftCommit.maxTime=-1. And one explaining the difference between soft and hard commits.

Constraints from my perspective:

  • Enable "async" indexing on publish (single object), as well as batch indexing
  • In both cases, indexing works by default on our platforms (CWP and SC) without further configuration (either through jobs, crontasks, or immediate reindex on shutdown). Note that we can't ensure this in other people's infrastructure (e.g. ensure that queuedjobs get run when the module is installed), so we still need good setup docs.
  • Avoid increasing resource usage on reindex beyond current levels (or have a good idea on the impact)
  • Avoid delaying reindexes longer than currently experienced by authors and users
  • Minimise time until search results reflect reindexed content
  • Avoid any solutions which would reduce the availability of the search solution (server/core restarts?)
  • Avoid data loss (acknowledging that search indexes are not the source of truth, and missing data can be "restored" through a reindex)
  • Minimise time to recover a Solr server after server crashes, or ops-level server/service restarts

My gut feel is to restore the intended solution here (run jobs for update and commit), which seems like it would be achieved through Naomi's PR. If we change the commit configuration, let's validate that against the constraints above - predominantly in the platforms where we have that level of visibility.

mateusz commented on July 18, 2024

@adrexia @chillu there is a difference between a (hard) commit, a soft commit and a core reload. The first does not get the results updated; only the latter two do. Platform had hard commits configured, but that just flushes to disk. You need to reload the core (which is what Solr_Configure does) or soft-commit (not sure if there is an API for that?).
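
The distinction between the three operations can be captured in a toy model where documents become searchable only when a new searcher is opened. This Python sketch simplifies the description above and the documented `openSearcher` semantics; it is not Solr's real internals. `ToyIndex` and its methods are hypothetical, and modelling a core reload as "reopen over whatever has been flushed to disk" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyIndex:
    written: Set[str] = field(default_factory=set)  # sent to Solr via <add>
    durable: Set[str] = field(default_factory=set)  # flushed to disk by a hard commit
    visible: Set[str] = field(default_factory=set)  # what the current searcher sees

    def add(self, doc_id: str) -> None:
        self.written.add(doc_id)

    def hard_commit(self, open_searcher: bool = False) -> None:
        # Flush to stable storage; only open a new searcher if asked to.
        self.durable = set(self.written)
        if open_searcher:
            self.visible = set(self.written)

    def soft_commit(self) -> None:
        # Open a new searcher over everything written, without flushing to disk.
        self.visible = set(self.written)

    def reload_core(self) -> None:
        # Assumption: a reload opens a fresh searcher over the flushed index.
        self.visible = set(self.durable)


if __name__ == "__main__":
    index = ToyIndex()
    index.add("page-1")
    index.hard_commit(open_searcher=False)
    print("page-1" in index.visible)  # False: flushed, but no new searcher
    index.reload_core()
    print("page-1" in index.visible)  # True
```

This follows the documented behaviour; a later comment in this thread reports Solr opening a new searcher anyway in a local setup.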

From platform performance perspective, soft commits are probably the best of both worlds - setting those to 15-60s doesn't have any visible impact, and can even be a net-positive thing if it helps limit hard-commits (which flush onto disk) and core restarts (which can be resource-intensive for big cores, or so I think).

I'm not sure if soft commits can be triggered via API. solrconfig.xml allows you to make those commits automatic (so you don't have to make an API call). Pretty much, the timer starts at the point of the index update and triggers a commit when it times out.
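
The timer behaviour can be simulated in a few lines. This sketch assumes a model in which the clock starts at the first update not yet covered by a commit, and later updates ride along until it fires rather than restarting it. `auto_commit_times` is a hypothetical helper, and treating any non-positive maxTime as disabled generalises the `-1` convention for simplicity.

```python
from typing import List, Optional


def auto_commit_times(update_times_ms: List[int], max_time_ms: int) -> List[int]:
    """Return the times (ms) at which an automatic commit would fire."""
    if max_time_ms <= 0:
        return []  # e.g. autoSoftCommit maxTime of -1: never fires
    commits: List[int] = []
    deadline: Optional[int] = None
    for t in sorted(update_times_ms):
        if deadline is not None and t >= deadline:
            commits.append(deadline)  # timer fired before this update arrived
            deadline = None
        if deadline is None:
            deadline = t + max_time_ms  # first uncommitted update starts the clock
    if deadline is not None:
        commits.append(deadline)
    return commits


if __name__ == "__main__":
    # Three publishes with a 60s window: the first two share one commit.
    print(auto_commit_times([0, 10_000, 70_000], 60_000))  # [60000, 130000]
```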

CWP currently has autoSoftCommit=60000 (60s) and autoCommit=300000 (300s).
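
For reference, those CWP values would correspond to an `updateHandler` section along these lines. This Python sketch only renders the fragment with the standard library; the element names follow the layout of Solr's stock solrconfig.xml, but the module's own shipped solrconfig.xml may differ, `update_handler_fragment` is a hypothetical helper, and defaulting `openSearcher` to false is an assumption.

```python
import xml.etree.ElementTree as ET


def update_handler_fragment(soft_ms: int, hard_ms: int, open_searcher: bool = False) -> str:
    """Render an <updateHandler> block with autoCommit/autoSoftCommit maxTime values."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Hard commit: flush to disk at most hard_ms after an uncommitted update;
    # openSearcher controls whether that also makes changes visible.
    auto_commit = ET.SubElement(handler, "autoCommit")
    ET.SubElement(auto_commit, "maxTime").text = str(hard_ms)
    ET.SubElement(auto_commit, "openSearcher").text = "true" if open_searcher else "false"

    # Soft commit: make changes searchable at most soft_ms later (-1 disables it).
    auto_soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(auto_soft, "maxTime").text = str(soft_ms)

    return ET.tostring(handler, encoding="unicode")


if __name__ == "__main__":
    print(update_handler_fragment(soft_ms=60_000, hard_ms=300_000))
```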

adrexia commented on July 18, 2024

I don't personally know of a reason the default for autoSoftCommit should stay at -1. There are jobs (1 & 2) that can be run, but they also seemingly don't commit the changes successfully.

adrexia commented on July 18, 2024

@chillu unfortunately, my PR alone does not fix this problem on Platform. We have it set up and running there - jobs are created and look to be successful - but we still have the issue of the indexes not being properly committed until a full reindex is run.

mateusz commented on July 18, 2024

I guess one more thing to keep in mind is soft-commits might result in different index contents compared to core reload and also compared to full reindexes. I haven't heard anything specific around that though from CWP perspective, and that has been using autoSoftCommits for ~5yrs, so should be fine for casual use?

mateusz commented on July 18, 2024

So could someone maybe at least suggest in the docs how to customise solrconfig.xml?

adrexia commented on July 18, 2024

I'm keen to get the default changed, as it's basically broken from the perspective of (I think) most of this module's users outside a CWP environment. I could document how to customise solrconfig.xml, but I'm still not entirely clear on why you might want to customise autoSoftCommit¹ if we change the default (other than the more general desire to customise the extras configurations).

I think both the SearchUpdateImmediateProcessor and the SearchUpdateQueuedJobProcessor rely on autoSoftCommits not being disabled. At the very least, changing the autoSoftCommit value appears to be the way to get the queued jobs working properly. I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away².

@chillu, @unclecheese - what are your thoughts?


1. What are the effects on the server if it's 1 minute, 5 minutes, or 30 seconds? Are there any? What are the reasons to disable it?
2. Which is apparent from the fact the queued jobs functionality has been broken since the Silverstripe 4 upgrade.
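
Footnote 1 can be partly answered with arithmetic. Assuming queued jobs run every minute (as the docs quoted earlier say) and ignoring job runtime and searcher warm-up, the worst case from publish to searchable is one queue interval plus one full autoSoftCommit window, and soft commits are capped at one per window. A rough Python sketch with hypothetical helper names; it says nothing about CPU or memory impact, which would need measuring.

```python
def worst_case_visibility_s(queue_interval_s: float, soft_commit_s: float) -> float:
    """Publish just after a queue run, then wait a full soft-commit window."""
    return queue_interval_s + soft_commit_s


def max_soft_commits_per_hour(soft_commit_s: float) -> float:
    """Upper bound: at most one soft commit per window, and only if updates keep arriving."""
    return 3600 / soft_commit_s


if __name__ == "__main__":
    for window_s in (30, 60, 300):
        print(
            f"autoSoftCommit={window_s}s: "
            f"worst case ~{worst_case_visibility_s(60, window_s):.0f}s to searchable, "
            f"at most {max_soft_commits_per_hour(window_s):.0f} soft commits/hour"
        )
```

With a 60s window this gives roughly two minutes, consistent with the "minute or two" delay the CWP docs describe.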

chillu commented on July 18, 2024

We want fewer devs customising solrconfig.xml, not more.

I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away

It does work as long as autoSoftCommit is enabled, although with the delay configured there. I've installed fulltextsearch-localsolr on cwp/installer:2.5.x-dev, with the latest silverstripe/fulltextsearch:3.x-dev (incl. your fix), keeping the default config of autoSoftCommit.maxTime: -1, so soft commits were effectively disabled. I published a page, ensured the queue ran through, and the new content was available for searching in the index after 15000ms (the "hard commit" threshold). I've stepped my way through with breakpoints, and that's the case after only calling <add> commands in Solr (without any explicit <commit>). So the results were available for new search requests without ever calling commits afterwards, because Solr actually opened a new "searcher", auto-warmed it, and then put it in service for the next search request (see logs). That's mystifying to me, since autoCommit.openSearcher is false, but I think it's related to the behaviour of maxWarmingSearchers.

openSearcher is described as follows:

if false, the commit causes recent index changes to be flushed to stable storage, but does not cause a new searcher to be opened to make those changes visible.

I haven't gotten to the bottom of this, but it seems likely that Solr just tries to be helpful here and makes the new results available (see https://issues.apache.org/jira/browse/SOLR-5783 for some insight into how complex that decision-making is). In conclusion, I can't reproduce the issue locally, but after reading about "soft commits" I also don't see the harm in setting autoSoftCommit in the module to the same configuration that has worked for us for many years in CWP (effectively enabling it in SC for anyone updating the module). Even with autoSoftCommit enabled, keeping a separate SearchUpdateCommitJobProcessor job makes sense, because it might get Solr to commit sooner than either its own heuristics or the autoCommit and autoSoftCommit maxTime settings would.
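
The argument for keeping the commit job amounts to "whichever enabled trigger fires first wins". A minimal sketch under that assumption: `visible_after_ms` and its parameters, including the idea of passing an explicit-commit delay, are hypothetical, and `commits_ignored` models a server that drops explicit commit requests, as the CWP docs describe.

```python
from typing import List, Optional


def visible_after_ms(
    explicit_commit_ms: Optional[int],
    soft_commit_ms: int,
    commits_ignored: bool = False,
) -> Optional[int]:
    """Delay from an index update until it is searchable, or None if nothing guarantees it."""
    candidates: List[int] = []
    if explicit_commit_ms is not None and not commits_ignored:
        candidates.append(explicit_commit_ms)  # commit job ran and Solr honoured it
    if soft_commit_ms > 0:
        candidates.append(soft_commit_ms)  # autoSoftCommit timer
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(visible_after_ms(5_000, 60_000))  # 5000
    print(visible_after_ms(5_000, 60_000, commits_ignored=True))  # 60000
    print(visible_after_ms(None, -1))  # None
```

Here `None` means "not guaranteed" rather than "never": with autoSoftCommit at -1, Solr was still observed above to open a new searcher on its own.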

I've created a PR at #278, haven't succeeded in getting search results on an SC testing box yet though.

emteknetnz commented on July 18, 2024

The linked PR has been tested, merged and released as 3.11.0; closing now.