flowpack / flowpack.elasticsearch.contentrepositoryqueueindexer

Neos CMS ElasticSearch indexer based on the Flowpack JobQueue (to handle big indexing tags, +50'000 nodes)

License: MIT License

PHP 100.00%
beanstalkd elasticsearch flow-framework neoscms

flowpack.elasticsearch.contentrepositoryqueueindexer's People

Contributors

daniellienert, dfeyer, gjwnc, htuscher, johannessteu, kdambekalns, kitsunet, lhortmann, mireo91, nikdro, peterbucher

flowpack.elasticsearch.contentrepositoryqueueindexer's Issues

Indexing nodes with job:work (Doctrine) only applies first changes to a node: remaining are lost

Hi,

we just upgraded a site from Neos 4.x to Flow/Neos 7.x and were indexing nodes using the DoctrineQueue with:

./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --verbose

We can see that indexing runs when we edit nodes in the backend, and that indexing is triggered again when we later publish the same node.

The problem: only the first change is stored in ES; all further changes to the node (i.e. the "publish") are not reflected in the ES index. To reproduce, change the title of a node from "node1" to "node2", publish that, then change it to "node3" and so on, and check what is actually written to ES.

Can you reproduce that? Is it a bug? It worked before with the same setup.

It looks like there is some caching going on, and the problem only happens in the endless PHP runner (job:work without a --limit). Our workaround is thus currently:

while true; do ./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --limit 1 ; done

so that every indexing starts a fresh new PHP process.
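
The loop above can be wrapped in a small function. This is only a sketch under assumptions: the name run_fresh_workers, the WORKER_RETRY_DELAY variable and the one-second pause after a failed run are hypothetical and not part of Flow or this package.

```shell
#!/usr/bin/env bash

# run_fresh_workers COUNT CMD [ARGS...]
# Starts CMD as a new process on every iteration, so no in-memory state
# survives from one job to the next. COUNT=0 means "loop forever".
# Hypothetical helper: the name and WORKER_RETRY_DELAY are illustrative only.
run_fresh_workers() {
  local count="$1"
  shift
  local i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    # Pause after a failed run so a persistently failing command
    # does not spin at full speed.
    "$@" || sleep "${WORKER_RETRY_DELAY:-1}"
    i=$((i + 1))
  done
}
```

Assumed usage, equivalent to the one-liner above:

run_fresh_workers 0 ./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --limit 1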

Default batch size too high

As pointed out in passing in #15, the batch size of 500 breaks (initial) indexing for me. The job payload is simply so big that it causes "argument list too long" when the jobs are passed to a command run as a shell command (due to executeIsolated being true, for good reasons).

For me it works with 150, but that could be different depending on the shell in use and other factors.

I'd suggest using a rather "conservative" (low) default and recommending that users raise it if needed.
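
One way to reason about the "argument list too long" error is to estimate the size of the encoded payload before picking a batch size. The sketch below rests on two assumptions that the report does not confirm: that the job reaches the sub-process as a single base64-encoded argument, and that the per-argument cap is 131072 bytes (the Linux value with 4 KiB pages; other systems differ). The function names b64_len and fits_single_arg are hypothetical.

```shell
#!/usr/bin/env bash

# Length of the base64 encoding (with padding, no line breaks) of N raw bytes.
b64_len() {
  echo $(( (($1 + 2) / 3) * 4 ))
}

# Succeeds if N raw bytes, once base64-encoded, stay below LIMIT bytes.
# LIMIT defaults to 131072, an assumption: the Linux per-argument cap
# with 4 KiB pages. Pass a second argument to use another limit.
fits_single_arg() {
  local limit="${2:-131072}"
  [ "$(b64_len "$1")" -lt "$limit" ]
}
```

For example, "fits_single_arg 200000" fails, because 200000 raw bytes grow to 266668 bytes in base64. Running "getconf ARG_MAX" prints the separate limit on the combined size of arguments and environment, which is worth checking as well.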

Error during live indexing of changes saved to user workspace

When a change is saved to a user workspace, the live indexing queue is filled. When the created job is being worked on, an exception is raised:

Exception in line 403 of /var/www/cms/Packages/Libraries/doctrine/orm/lib/Doctrine/ORM/EntityManager.php: The identifier name is missing for a query of Neos\ContentRepository\Domain\Model\Workspace - See also: 20170518132608fbe4ff.txt

This is the trace:

Exception in line 403 of /var/www/cms/Packages/Libraries/doctrine/orm/lib/Doctrine/ORM/EntityManager.php: The identifier name is missing for a query of Neos\ContentRepository\Domain\Model\Workspace

38 Doctrine\ORM\ORMException::missingIdentifierField("Neos\ContentRepository\Domain\Model\Workspace", "name")
37 Doctrine\ORM\EntityManager::find("Neos\ContentRepository\Domain\Model\Workspace", array|1|)
36 Neos\Flow\Persistence\Doctrine\PersistenceManager_Original::getObjectByIdentifier(NULL, "Neos\ContentRepository\Domain\Model\Workspace")
35 Neos\Flow\Persistence\Repository::findByIdentifier(NULL)
34 call_user_func_array(array|2|, array|1|)
33 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("findByIdentifier", array|1|)
32 Neos\ContentRepository\Domain\Service\Context_Original::getWorkspace()
31 Neos\ContentRepository\Domain\Service\Context_Original::getNodeByIdentifier("fa08cdd4-1ee4-3b63-0916-ff08239c5d89")
30 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\{closure}(Neos\ContentRepository\Domain\Model\Node, Neos\ContentRepository\Domain\Service\Context)
29 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::indexNode(Neos\ContentRepository\Domain\Model\Node)
28 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\{closure}()
27 Closure::__invoke()
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::withBulkProcessing(Closure)
25 call_user_func_array(array|2|, array|1|)
24 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("withBulkProcessing", array|1|)
23 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::execute(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
22 Flowpack\JobQueue\Common\Job\JobManager_Original::executeJobForMessage(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
21 call_user_func_array(array|2|, array|2|)
20 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("executeJobForMessage", array|2|)
19 Flowpack\JobQueue\Common\Command\JobCommandController_Original::executeCommand(Flowpack\JobQueue\Redis\Queue\RedisQueue, "TzozODoiRmxvd3BhY2tcSm9iUXVldWVcQ29tbW9uXFF1ZXVlXE…19IjtzOjE5OiIAKgBudW1iZXJPZlJlbGVhc2VzIjtpOjA7fQ==")
18 call_user_func_array(array|2|, array|2|)
17 Neos\Flow\Cli\CommandController_Original::callCommandMethod()
16 Neos\Flow\Cli\CommandController_Original::processRequest(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
15 Neos\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
14 Neos\Flow\Mvc\Dispatcher_Original::dispatch(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
13 Neos\Flow\Cli\CommandRequestHandler::Neos\Flow\Cli\{closure}()
12 Closure::__invoke()
11 Neos\Flow\Security\Context_Original::withoutAuthorizationChecks(Closure)
10 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
9 call_user_func_array(array|2|, array|1|)
8 Neos\Flow\Security\Context::Flow_Aop_Proxy_invokeJoinPoint(Neos\Flow\Aop\JoinPoint)
7 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
6 Neos\Flow\Session\Aspect\LazyLoadingAspect_Original::callMethodOnOriginalSessionObject(Neos\Flow\Aop\JoinPoint)
5 Neos\Flow\Aop\Advice\AroundAdvice::invoke(Neos\Flow\Aop\JoinPoint)
4 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
3 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
2 Neos\Flow\Cli\CommandRequestHandler::handleRequest()
1 Neos\Flow\Core\Bootstrap::run()

My guess: the target workspace can be null and is passed as is to the job in https://github.com/ttreeagency/Flowpack.ElasticSearch.ContentRepositoryQueueIndexer/blob/3.0/Classes/Flowpack/ElasticSearch/ContentRepositoryQueueIndexer/Indexer/NodeIndexer.php#L45 which later leads to the problem.

NodeTypeMappingBuilderInterface does not exist

Commit 464e2ab adds the NodeTypeMappingBuilderInterface for an upcoming 5.0 release of the CR-Adaptor. But it was already released in 3.1.0, which is compatible with CR-Adaptor 4.0, and the interface does not exist there. So this breaks with:

The object "Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\NodeTypeMappingBuilderInterface" which was specified as a property in the object configuration of object "Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\Command\NodeIndexQueueCommandController" (automatically registered class) does not exist.

Settings should use "presets" to allow for easy adjustments

When using the presets option of Flowpack.JobQueue.Common to define queue settings, the className has to be given, because of the way precedence works (see getQueueSettings() in QueueManager).

If the ContentRepositoryQueueIndexer used presets directly, this would no longer be needed.

Make Queuename configurable

I think it would be nice if the queuename was configurable.
I am using this package with the Doctrine Backend and Doctrine did not like the default queuename Flowpack.ElasticSearch.ContentRepositoryQueueIndexer. The dots mixed up the SQL.

Update needed?

Hi!
I was trying to use this package, but there was an error.
It seems that TYPO3\Jobqueue was renamed to Flowpack\JobQueue?

I had to change some lines and after that it worked.

Multiple job worker seem not to scale

Hello,

we are using your supervisord config and getting a lot of errors like this:


Cannot delete job 88: NOT_FOUND

  Type: Pheanstalk\Exception\ServerException
  File: Packages/Libraries/pda/pheanstalk/src/Command/DeleteCommand.php
  Line: 44

It seems that the daemons (job:work) are all working on the same job rather than on 12 different jobs. Is this true? Is there any way to solve this? As it stands, it does not scale with multiple workers the way it should.

NodeIndexer must be marked singleton

When using the CLI command to index a node:

./flow flowpack.elasticsearch.contentrepositoryadaptor:nodeindex:indexnode --workspace live --identifier da1b02a9-cafd-487c-c78f-5c714996df8c

everything is logged as successful, but no actual index update happens. The currentBulkRequest is always empty, even though it is filled before, as can be seen by the log messages.

This happens only with enableLiveAsyncIndexing set to false; in the default state (async indexing enabled) it works as expected.

REASON

The Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer is a singleton, but the Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\Indexer\NodeIndexer is not.

Inject custom NodeIndexer

We are using a custom NodeIndexer which extends from Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer which was kindly provided by @dfeyer.
This NodeIndexer uses an Objects.yaml definition like the Flowpack.ElasticSearch.ContentRepositoryQueueIndexer does.

If I understand the code correctly, there is currently no way to use the ContentRepositoryQueueIndexer if a custom NodeIndexer is provided, because AbstractIndexingJob (abstract class AbstractIndexingJob implements JobInterface) injects the ElasticSearch NodeIndexer via "use Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer;", right?

So I would request a change to allow a custom NodeIndexer.
I would like to provide the code changes myself, but I'm not yet familiar enough with Neos/Flow to see a good, simple solution.

Maybe someone could provide basic ideas and what must be changed to support this use case or give me a hint how I need to modify our own project to inject the custom indexer if this feature might work already.

Error during live indexing while trying to index deleted nodes

When one or more nodes are deleted and live indexing runs, an exception is raised:

Exception: Argument 1 passed to Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData() must be an instance of Neos\ContentRepository\Domain\Model\NodeData, null given

I suspect "$nodeData" (https://github.com/ttreeagency/Flowpack.ElasticSearch.ContentRepositoryQueueIndexer/blob/3.0/Classes/Flowpack/ElasticSearch/ContentRepositoryQueueIndexer/IndexingJob.php#L102) is null because the deleted node is no longer found in the repository.

Trace:

Exception: Argument 1 passed to Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData() must be an instance of Neos\ContentRepository\Domain\Model\NodeData, null given
 
31 Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData(NULL, Neos\Neos\Domain\Service\ContentContext)
30 call_user_func_array(array|2|, array|2|)
29 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("createFromNodeData", array|2|)
28 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\{closure}()
27 Closure::__invoke()
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::withBulkProcessing(Closure)
25 call_user_func_array(array|2|, array|1|)
24 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("withBulkProcessing", array|1|)
23 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::execute(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
22 Flowpack\JobQueue\Common\Job\JobManager_Original::executeJobForMessage(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
21 call_user_func_array(array|2|, array|2|)
20 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("executeJobForMessage", array|2|)
19 Flowpack\JobQueue\Common\Command\JobCommandController_Original::executeCommand(Flowpack\JobQueue\Redis\Queue\RedisQueue, "TzozODoiRmxvd3BhY2tcSm9iUXVldWVcQ29tbW9uXFF1ZXVlXE…19IjtzOjE5OiIAKgBudW1iZXJPZlJlbGVhc2VzIjtpOjA7fQ==")
18 call_user_func_array(array|2|, array|2|)
17 Neos\Flow\Cli\CommandController_Original::callCommandMethod()
16 Neos\Flow\Cli\CommandController_Original::processRequest(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
15 Neos\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
14 Neos\Flow\Mvc\Dispatcher_Original::dispatch(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
13 Neos\Flow\Cli\CommandRequestHandler::Neos\Flow\Cli\{closure}()
12 Closure::__invoke()
11 Neos\Flow\Security\Context_Original::withoutAuthorizationChecks(Closure)
10 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
9 call_user_func_array(array|2|, array|1|)
8 Neos\Flow\Security\Context::Flow_Aop_Proxy_invokeJoinPoint(Neos\Flow\Aop\JoinPoint)
7 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
6 Neos\Flow\Session\Aspect\LazyLoadingAspect_Original::callMethodOnOriginalSessionObject(Neos\Flow\Aop\JoinPoint)
5 Neos\Flow\Aop\Advice\AroundAdvice::invoke(Neos\Flow\Aop\JoinPoint)
4 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
3 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
2 Neos\Flow\Cli\CommandRequestHandler::handleRequest()
1 Neos\Flow\Core\Bootstrap::run()
