flowpack / flowpack.elasticsearch.contentrepositoryadaptor Goto Github PK

View Code? Open in Web Editor NEW

43.0 16.0 71.0 1.4 MB

Flowpack.ElasticSearch adapter to support the Neos Content Repository

License: GNU Lesser General Public License v3.0

PHP 100.00%

neoscms elasticsearch hacktoberfest

flowpack.elasticsearch.contentrepositoryadaptor's People

Stargazers

Watchers

Forkers

simplyadmire mgoldbeck mocdk networkteam monofone techdivision ttreeagency jrenggli core4 cron-eu skurfuerst chhunlong dfeyer damobert daniellienert hphoeksma robertlemke miegli elementareteilchen kitsunet psmb duske gerhard-boden mathiasjahn kuborgh-mspindelhirn sbruggmann paxuclus torsten85 futape akappler belliconag dlubitz mficzel mstruebing johannessteu markusguenther rolandschuetz darmstaedter shizonic format-d webandco gesagtgetan breadlesscode lorenzulrich regniets drillsergeant abteilung mariofischer paavo sitegeist sandstorm-ghbot marcrobertscamao typoniels queogmbh kaufmanndigital haase-fabian puchsteink erkenes amtee soee mindscreen pskwira camao florianklueckmann patricekaufmann

flowpack.elasticsearch.contentrepositoryadaptor's Issues

Remove "$targetWorkspaceName" from indexer -- as that's a very crude hack

... Instead of manipulating context node path using string replace, we should instead do the following:

if NodeInUserWorkspace is published + should be indexed, create an new context (with same dimension values and DIFFERENT workspace). Then, using this new context, we fetch the published node by node path. Then, we use that one for indexing.

... Discovered together with Christopher at T3CON.

Broken with ES 1.0.0

Due to the change to the way fields are returned in ES 1.0.0 (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_return_values.html) the handling around https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Flowpack/ElasticSearch/ContentRepositoryAdaptor/Eel/ElasticSearchQueryBuilder.php#L362 is broken.

ElasticSearchQueryBuilder::getTotalItems() always return NULL

ElasticSearchQueryBuilder::getTotalItems() checks for a key total in the result but that never exist. Instead the total hits are returned in $this->results['hits']['total'] instead.

So shouldn't:

    public function getTotalItems()
    {
        if (array_key_exists('total', $this->result)) {
            return (int)$this->result['total'];
        }
    }

be changed to sth like:

    public function getTotalItems()
    {
        if (isset($this->result['hits']['total'])) {
            return (int)$this->result['hits']['total'];
        }
        return 0;
    }

Build a new master on top of Version 4.0

In order to replace the master and remain as much commit information as possible I would propose the following:

Adjust the 3.0 Version to fit to current master:

Change code structure to PSR-4 (#189)
Branch a 4.0
Adjust namespaces to Neos 3.0

Do the merge:

Do a merge into master (git merge -X theirs 4.0)
Diff the changes of the result to 4.0 and make it working.

The [filtered] query is deprecated

The [filtered] query is deprecated, please use a [bool] query instead with a [must] clause for the query part and a [filter] clause for the filter part.

Querying count() when no results returns always 2

Using Neos 3.0, Elasticsearch 2.3 and the most current versions of this ContentRepositoryAdaptor.
When using Flowpack.SearchPlugin and an extended PaginateController, i ran into an Issue.

On a query which is an instance of "Neos\ContentRepository\Search\Search\QueryBuilderInterface" i call the count() method like this, to get the count of all search results:

$this->resultCount = $this->query->count();

Unfortunately this gives strange Results:

If there are one result found with a search query, the count() method returns 1, which is correct.
If there are multiple results found with a search query, the count() method returns e.g. 4, which is correct.
Now,
if there are no results found with a search query, the count() method returns 2, which is incorrect.

I analyzed the problem and found a fix, but it is more a hack than a fix.

When doing a search to Elasticsearch, the query will be the same, as the query which is submitted, when i call the count() method.

The count() method is implemented as a request to "/_count" (https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Eel/ElasticSearchQueryBuilder.php#L596) and extracts then the count (https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Eel/ElasticSearchQueryBuilder.php#L600)

_I tried to change this request to "/search" and extract the count from this. And voila, the count is always correct.

I guess the problem lies in the query which the ContentRepositoryAdaptor generates.

See PR #226

Example Query which was used:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "match_all": []
            },
            {
              "query_string": {
                "query": "\"fsdfsdfsd\""
              }
            }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "term": {
                      "__parentPath": "/sites/cms"
                    }
                  },
                  {
                    "term": {
                      "__path": "/sites/cms"
                    }
                  }
                ]
              }
            },
            {
              "terms": {
                "__workspace": [
                  "live"
                ]
              }
            },
            {
              "term": {
                "__dimensionCombinationHash": "9dfdd8b029869edee199f9f9f920b723"
              }
            },
            {
              "term": {
                "__typeAndSupertypes": "Neos.Neos:Document"
              }
            }
          ],
          "should": [],
          "must_not": [
            {
              "term": {
                "_hidden": true
              }
            },
            {
              "range": {
                "_hiddenBeforeDateTime": {
                  "gt": "now"
                }
              }
            },
            {
              "range": {
                "_hiddenAfterDateTime": {
                  "lt": "now"
                }
              }
            },
            {
              "term": {
                "hiddenInSearchResults": true
              }
            }
          ]
        }
      }
    }
  }
}

Type error in in "Objects.yaml"-configuration

On running the "flow configuration:validate"-command i receive an error, for the Flowpack\ElasticSearch\ContentRepositoryAdaptor\Command\NodeIndexCommandController-configuration in the Objects.yaml, stating that it expected a string but got an array:

- Objects.Flowpack.ElasticSearch.ContentRepositoryAdaptor.Flowpack\ElasticSearch\ContentRepositoryAdaptor\LoggerInterface.arguments.__index_3.value -> expected: type=string found: type=array

and in deed the code looks like this:

Flowpack\ElasticSearch\ContentRepositoryAdaptor\Command\NodeIndexCommandController:
  properties:
    logger:
      object:
        factoryObjectName: TYPO3\Flow\Log\LoggerFactory
        factoryMethodName: create
        arguments:
          3:
            value:
              fileBackend: 'TYPO3\Flow\Log\Backend\FileBackend'
              ansiConsoleBackend: 'TYPO3\Flow\Log\Backend\AnsiConsoleBackend'

I don't know whether this is actually correct and the "flow configuration:validate"-command is wrong.

Bug: Highlighting doesn't add em-tags

Hi,
I had some trouble with getting the search word highlighted in the result.
The problem with the current version is, that elasticsearch doesn't think that the fields definition in the highlight section of the search query doesn't match with the searched fields. Therefore the highlighting doesn't work and changes via .highlight() eel helper have no effects.

My suggested solution would be to add the missing parameters fields => ['__fulltext*'] in the FilteredQuery.php at the fulltext function.

After I added this, the search word got highlighted as expected.

A quick workaround is to tweak the fusion searchQuery in your own package like this:

prototype(Flowpack.SearchPlugin:Search) < prototype(Neos.Neos:Content) {

    queryStringFields = Neos.Fusion:RawArray {
        fields = Neos.Fusion:RawArray {
            0 = "__fulltext*"
        }
    }

    searchQuery = ${this.searchTerm ? Search.query(site).fulltext(this.searchTerm).request('query.filtered.query.bool.must.1.query_string', this.queryStringFields).nodeType('Neos.Neos:Document') : null}    
}

I'll add an PR.

Travis CI doesn't test pull request

See the Neos or Flow development collections for a workaround to test pull requests and post merges.

Travis failing on install of java8-update package

$ travis_apt_get_update
0.48s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install oracle-java8-set-default
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package oracle-java8-set-default
apt-get.diagnostics
apt-get install failed
$ cat ${TRAVIS_HOME}/apt-get-update.log
...
Fetched 52.0 MB in 4s (11.4 MB/s)
Reading package lists...
The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install oracle-java8-set-default" failed and exited with 100 during .

I found this: https://stackoverflow.com/questions/25289482/installing-jdk8-on-ubuntu-unable-to-locate-package-update-doesnt-fix
But not sure (yet) how to apply this to travis.

Related to #302

Analyzer for `_all` not working

I'm trying to create an index on Elastic 5.6.13 and I'd like to configure the analyzer for fulltext search to be german.

According to the readme, I did this:

'Neos.Neos:Node':
  search:
    elasticSearchMapping:
      _all:
        analyzer: german

But Elasticsearch is throwing an error and can't create the mapping:

Elasticsearch request failed.
[PUT
http://localhost:9200/neoscr-stage-1547824666/Neos-Neos:Shortcut/_mapping]:
Array
(
    [root_cause] => Array
        (
            [0] => Array
                (
                    [type] => illegal_argument_exception
                    [reason] => Mapper for [_all] conflicts with existing mapping in other types:
                        [mapper [_all] has different [analyzer], mapper [_all] is used by
                        multiple types. Set update_all_types to true to update [search_analyzer]
                        across all types., mapper [_all] is used by multiple types. Set
                        update_all_types to true to update [search_quote_analyzer] across all
                        types.]
                )

        )

    [type] => illegal_argument_exception
    [reason] => Mapper for [_all] conflicts with existing mapping in other types:
                    [mapper [_all] has different [analyzer], mapper [_all] is used by
                    multiple types. Set update_all_types to true to update [search_analyzer]
                    across all types., mapper [_all] is used by multiple types. Set
                    update_all_types to true to update [search_quote_analyzer] across all
                    types.]
)
 
Response body:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Mapper for [_all] conflicts with existing mapping in other types:\n[mapper [_all] has different [analyzer], mapper [_all] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [_all] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"}],"type":"illegal_argument_exception","reason":"Mapper for 'Neos.Neos:Node': [_all] conflicts with existing mapping in other types:\n[mapper [_all] has different [analyzer], mapper [_all] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [_all] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"},"status":400} 
Request data:
{"Neos-Neos:Shortcut":{"dynamic_templates":[{"dimensions":{"path_match":"__dimensionCombinations.*","match_mapping_type":"string","mapping":{"type":"text"}}}],"properties":{"__dimensionCombinationHash":{"type":"keyword"},"_creationDateTime":{"type":"date","format":"date_time_no_millis"},"_lastModificationDateTime":{"type":"date","format":"date_time_no_millis"},"_lastPublicationDateTime":{"type":"date","format":"date_time_no_millis"},"_path":{"type":"keyword","index":true},"_name":{"type":"keyword","index":true},"_nodeType":{"type":"keyword","index":true},"__identifier":{"type":"keyword","index":true},"__workspace":{"type":"keyword","index":true},"__path":{"type":"keyword","index":true},"__parentPath":{"type":"keyword","index":true},"__sortIndex":{"type":"integer"},"__typeAndSupertypes":{"type":"keyword","index":true},"_hidden":{"type":"boolean"},"_hiddenBeforeDateTime":{"type":"date","format":"date_time_no_millis"},"_hiddenAfterDateTime":{"type":"date","format":"date_time_no_millis"},"title":{"type":"keyword","index":true},"uriPathSegment":{"type":"keyword","index":true},"_hiddenInIndex":{"type":"boolean"},"__fulltextParts":{"type":"object","enabled":false},"__fulltext":{"type":"object","properties":{"h1":{"type":"keyword","boost":20,"copy_to":"_all","index":true},"h2":{"type":"keyword","boost":12,"copy_to":"_all","index":true},"h3":{"type":"keyword","boost":10,"copy_to":"_all","index":true},"h4":{"type":"keyword","boost":5,"copy_to":"_all","index":true},"h5":{"type":"keyword","boost":3,"copy_to":"_all","index":true},"h6":{"type":"keyword","boost":2,"copy_to":"_all","index":true},"text":{"type":"keyword","boost":1,"copy_to":"_all","index":true}}},"targetMode":{"type":"keyword","index":true},"target":{"type":"keyword","index":true}},"_all":{"analyzer":"german"}}}

Remove support for Elasticsearch Version 1.x and 2.x

The driver concept of this package makes it possible to support different versions of elasticsearch and with the current version 4.1, it supports versions 1.x, 2.x and 5.x, which makes it usable also on older environments.

Although, ES 1.x is EOL since January 2017 and ES 2.x since Februar 2018. (See https://www.elastic.co/support/eol) and using unsupported software versions should be discouraged.

With the next major, I would like to remove the support for ES 1.x and 2.x to make it easier to integrate new features without backporting it to older version. Also testing gets lot easier.

4.1.1 broken (fatal error)

The upmerge from 3.0 to 4.0 caused a fatal error

PHP Fatal error:  Declaration of Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\Version1\Query\FilteredQuery_Original::fulltext($searchWord) must be compatible with Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\QueryInterface::fulltext(string $searchWord, array $options = Array) in Data/Temporary/Production/SubContextAws/Cache/Code/Flow_Object_Classes/Flowpack_ElasticSearch_ContentRepositoryAdaptor_Driver_Version1_Query_FilteredQuery.php on line 20

Most likely caused by #290, however not visible as it was a clean merge.

Before https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/4.1.0/Classes/Driver/Version1/Query/FilteredQuery.php#L56
After https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/4.1.1/Classes/Driver/Version1/Query/FilteredQuery.php#L56

`nodeindex:build` creates internal workspaces if they don't exist yet

A new internal workspace will be created, if the workspace flag is set to a none existing workspace nodeindex:build --workspace=this-workspace-does-not-exist

(Neos 2.3 / ContentRepositoryAdaptor 3.0.x-dev)

Queries always use "live" workspace, leading to unexpected results

When using queries in Fusion to fetch nodes, a condition on the workspace is added, always restricting results to the live workspace.

This leads to unexpected results, since changes from a workspace that affect those queries will not be observed.

The queries should usually be working on the workspace that is currently shown (personal workspace, preview workspace, …).

ElasticSearch timezone configuration (since version 1.4)

Since version 1.4 there's a configuration for setting the time_zone to be used in a query (See the docs, the commit, and the original issue). (Note that this is apparently still missing for aggregations.)

This is important for all queries which use relative dates and/or "magic" keywords ("now", etc).

One thing that's not clear to me, is how ElasticSearch figures out what "now" means. In either case, it's a lot safer if we make sure it's using the currently configured PHP Timezone, since that's what the node dates also use.

It would probably be ideal to add this option to the default ElasticSearchQueryBuilder query.

Regression: indexing fails if no "language" dimension is present

#108 introduced a regression by hardcoding the "language" dimension into the NodeIndexer. There are, however, sites which don't have a "language" dimension or which might have dimensions with other names fulfilling the same role like a language.

We need to revert that behaviour or at least make it compatible with other dimension setups.
@dfeyer

Improve documentation on when custom onfiguration of indexing is necessary

Currently to enable sorting by date (or by any other property as well?), one must configure the relevant property with this config in NodeTypes.yaml:

      search:
        elasticSearchMapping:
          type: date
          include_in_all: false
          format: 'date_time_no_millis'
        indexing: '${(value ? Date.format(value, "Y-m-d\TH:i:sP") : null)}'

But the docs say: Normally, this does not need to be touched, as this package supports all Neos data types natively.

Can somebody who's into the subject clarify, under what circumstances must one configure the indexing config?

Switching base workspace must update index

Given that:

a shared workspace "relaunch" exists
user "john" has been working based on the "live" workspace
Elasticsearch is used to query for certain things in Fusion

When I:

log in as "john"
and switch the base workspace of my personal workspace to "relaunch"

Then:

Es-based Fusion queries do not know anything about the "relaunch" content

This is because the workspace for "john" has been indexed already. Nor does that switch trigger any reindexing, since no publishing is involved. Thus it is not being reindexed after the base workspace switch.

illegal_argument_exception: mapper [heading] cannot be changed from type [int] to [string]

Situation

Two different Packages with different NodeTypes which both are using a Property named "heading", but one is type of int and one is type of string.

Action

./flow nodeindex:build

Result

Array
(
    [root_cause] => Array
        (
            [0] => Array
                (
                    [type] => illegal_argument_exception
                    [reason] => mapper [heading] cannot be changed from type [int] to [string]
                )

        )

    [type] => illegal_argument_exception
    [reason] => mapper [heading] cannot be changed from type [int] to [string]
)

Create one index per site

Currently we use only one index for all sites, that can make some scoring feature of Elastic to not work as expected, by ex. word frequency will take in account other site, ...

Removing a node from the index does not remove the variants in all case

Given I have a node in dimension A, on a website with 2 dimanesions A and B and my node does not exist in dimension B

During indexing we create two document in the index one for dimension A and one for dimension B.

If I remove the node in dimension A, the document in this dimension is removed correctly but the one in the dimension B. It's not a big issue, because the document of the Elastic query are filtered and only existing document in the CR are returned.

But it's not clean ;) and in some case it can have an impact on the query algorythm, because Elastic use data that are not in sync with the CR.

Affected version: all

Possible solution

It's a bit complexe, in my case it's easy, we can remove all variants (because node does not exist in dimension B).

But it's a valid case, to have the node in both A and B dimensions, but only removing the one in dimension A (in this case the current behaviour is correct.

So we have two scenario:

Node existe in all dimensions and remove works as expected
Node exists only in a subset off all dimensions in this case the content of the index is out of sync as soon as we delete the node in one or some dimensions (but not all)

How to detect those two scenario ?

Fulltext indexing for nodes "hidden through parent" is not correct

Given:

a node that is contained in a container, e.g. a multi-column element, containing text "Foobar"
that container is marked as hidden
the fulltext index is built

Problem:

If the document that contains this node is found using the FLowpack.SearchPlugin, the content excerpt contains the "Foobar" string, even though it's not visible in the rendered page

Conclusion:

The indexer checks each node for it's hidden state, but since the node containing "Foobar" is not hidden by itself, but only invisible because it's container is hidden, it is indexed and the extracted fulltext parts are added to the fulltext root (document node).

Unicoded searchstring for cyrillic letter

If i search cyrillic words the result page shows me a unicode string. It's displayed in Flowpack.SearchPlugin:Search -> Search.html

I don't know if its allready delivered by the ContentRepositoryAdapter so i give you an issue here ...

You can see my example here:

http://www.neo-angin.ua/search.html?search=%D0%B7%D0%BD%D0%B0%D0%B9%D0%B4%D0%B5%D0%BD%D0%BE

I made a workaround in JS to decode the original letters, so if you disable JavaScript you can see the unicode search string.

Multiple Datasets in Index

For one Node there are several datasets in the Elastic Index. So there are to many Nodes indexed which has at least influences for the result number, when all results are shown,
For searching with a dedicated search term the result number is the correct one. (SEARCHING DIRECTLY IN THE INDEX))

This behavior probably depends on the first indexing of the data. For every workspace, all nodes are indexed. If there are no nodes in the user-jondoe workspace the nodes are indexed for workspace live.

during indexing, we use the NodeData's workspace name from time to time instead of the context

For instance, see the following code:

protected function updateFulltext(Node $node, array $fulltextIndexOfNode, $targetWorkspaceName = NULL) {
        if ((($targetWorkspaceName !== NULL && $targetWorkspaceName !== 'live') || $node->getWorkspace()->getName() !== 'live') || count($fulltextIndexOfNode) === 0) {
            return;
        }

Here, we use $node->getWorkspace(), returning the workspace of the UNDERLYING node data. which I think is wrong -- instead, $node->getContext()->getWorkspace() should be used.

Same in the following code:

'__workspace':
      search:
        elasticSearchMapping:
          type: string
          index: not_analyzed
          include_in_all: false
        indexing: '${node.workspace.name}'

I am 95% confident this must be node.context.workspace.name -- but I wonder why nobody noticed that before (weird).

Greets, Sebastian

count() is not working

The _count call doesn't work.

It seems to trip over the 'fields' => array('__path') part.
I couldn't find a way to fix it, so I solved it in my own QueryBuilder doing:

public function count() {
    $response = $this->elasticSearchClient->getIndex()->request('GET', '/_search', array(), json_encode($this->request));
    $treatedContent = $response->getTreatedContent();
    return (int)$treatedContent['hits']['total'];
}

Not so nice. But better suggestions welcome ;-)

Error while publishing Nodes after NodeType-Change

When I change the NodeType of a Document-Node and try to publish afterwards, I get an error:

Exception #1338977435: Elasticsearch request failed.
[POST http://127.0.0.1:9200/typo3cr/_bulk]: Array
(
    [root_cause] => Array
        (
            [0] => Array
                (
                    [type] => parse_exception
                    [reason] => Failed to derive xcontent
                )

      )

   [type] => parse_exception
   [reason] => Failed to derive xcontent
)
; Response body: {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400}

Request data: 
(Error ends here, Request isn't shown)

The Stracktrace:

34 Flowpack\ElasticSearch\Transfer\Response_Original::__construct(TYPO3\Flow\Http\Response, TYPO3\Flow\Http\Request)
33 call_user_func_array("parent::__construct", array|2|)
32 Flowpack\ElasticSearch\Transfer\Response::__construct(TYPO3\Flow\Http\Response, TYPO3\Flow\Http\Request)
31 Flowpack\ElasticSearch\Transfer\RequestService_Original::request("POST", Flowpack\ElasticSearch\ContentRepositoryAdaptor\ElasticSearchClient, "/typo3cr/_bulk", array|0|, "")
30 Flowpack\ElasticSearch\Domain\Model\Client_Original::request("POST", "/typo3cr/_bulk", array|0|, "")
29 Flowpack\ElasticSearch\Domain\Model\Index_Original::request("POST", "/_bulk", array|0|, "")
28 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::removeDuplicateDocuments("/sites/magazine/node-577a218fc9713/node-57d7b6ab93765@live", "6c675d686e3229af21556a9ae83570f3445bc00e", TYPO3\TYPO3CR\Domain\Model\Node)
27 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\{closure}(TYPO3\TYPO3CR\Domain\Model\Node, "live")
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::indexNode(TYPO3\TYPO3CR\Domain\Model\Node, "live")
25 call_user_func_array(array|2|, array|2|)
24 TYPO3\Flow\Object\DependencyInjection\DependencyProxy::__call("indexNode", array|2|)
23 TYPO3\Flow\Object\DependencyInjection\DependencyProxy::indexNode(TYPO3\TYPO3CR\Domain\Model\Node, "live")
22 TYPO3\TYPO3CR\Search\Indexer\NodeIndexingManager_Original::TYPO3\TYPO3CR\Search\Indexer\{closure}()
21 TYPO3\TYPO3CR\Search\Indexer\NodeIndexingManager_Original::flushQueues("TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted")
20 call_user_func_array(array|2|, array|4|)
19 TYPO3\Flow\SignalSlot\Dispatcher::dispatch("TYPO3\Flow\Persistence\Doctrine\PersistenceManager", "allObjectsPersisted", array|0|)
18 TYPO3\Flow\SignalSlot\SignalAspect_Original::forwardSignalToDispatcher(TYPO3\Flow\Aop\JoinPoint)
17 TYPO3\Flow\Aop\Advice\AbstractAdvice::invoke(TYPO3\Flow\Aop\JoinPoint)
16 TYPO3\Flow\Persistence\Doctrine\PersistenceManager::emitAllObjectsPersisted()
15 TYPO3\Flow\Persistence\Doctrine\PersistenceManager_Original::persistAll()
14 TYPO3\Flow\Package::TYPO3\Flow\{closure}(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController, "TYPO3\Flow\Mvc\Dispatcher::afterControllerInvocation")
13 Closure::__invoke(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController, "TYPO3\Flow\Mvc\Dispatcher::afterControllerInvocation")
12 call_user_func_array(array|2|, array|4|)
11 TYPO3\Flow\SignalSlot\Dispatcher::dispatch("TYPO3\Flow\Mvc\Dispatcher", "afterControllerInvocation", array|3|)
10 TYPO3\Flow\SignalSlot\SignalAspect_Original::forwardSignalToDispatcher(TYPO3\Flow\Aop\JoinPoint)
9 TYPO3\Flow\Aop\Advice\AbstractAdvice::invoke(TYPO3\Flow\Aop\JoinPoint)
8 TYPO3\Flow\Mvc\Dispatcher::emitAfterControllerInvocation(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController)
7 TYPO3\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response)
6 TYPO3\Flow\Mvc\Dispatcher_Original::dispatch(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response)
5 TYPO3\Flow\Mvc\DispatchComponent_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
4 TYPO3\Flow\Http\Component\ComponentChain_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
3 TYPO3\Flow\Http\Component\ComponentChain_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
2 TYPO3\Flow\Http\RequestHandler::handleRequest()
1 TYPO3\Flow\Core\Bootstrap::run()

HTTP REQUEST:
PUT /neos/service/workspaces-rpc/publish-nodes HTTP/1.1

Publishing without changing NodeTypes doesn't make problems...

ContentRepositoryAdaptor: dev-master
Neos: 2.3.7
ES: Official Docker-Container (Tag: 2)

Add support for async indexer

Currently the ES index is build during the request, this can be a problem for bigger setup, or complex index configuration. We also need per ex. in Neos to create a full index when we create a new user or workspace, this can take some times so we need to be able to do that async

PSR-2 Check

Merge request should be checked for correct psr-2 code style

Change NodeType results in error

I just installed this package and set up ElasticSearch for a simple site package, I'm just creating.

The bug: Changing a node's type immediately results in an error.
This appears for changing document aswell as content nodes.

Elasticsearch request failed.
[GET http://search:9200//_search/scroll?scroll=1m]: Array ( [root_cause] => Array ( [0] => Array ( [type] => illegal_argument_exception [reason] => request [//_search/scroll] contains unrecognized parameter: [scroll] ) ) [type] => illegal_argument_exception [reason] => request [//_search/scroll] contains unrecognized parameter: [scroll] ) ; Response body: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [//_search/scroll] contains unrecognized parameter: [scroll]"}],"type":"illegal_argument_exception","reason":"request [//_search/scroll] contains unrecognized parameter: [scroll]"},"status":400} Request data: DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAA6AFkxiQjJSS2gxVDZlbUIyUV9YWklPeEEAAAAAAAAOghZMYkIyUktoMVQ2ZW1CMlFfWFpJT3hBAAAAAAAADn8WTGJCMlJLaDFUNmVtQjJRX1haSU94QQAAAAAAAA6BFkxiQjJSS2gxVDZlbUIyUV9YWklPeEEAAAAAAAAOgxZMYkIyUktoMVQ2ZW1CMlFfWFpJT3hB

Edit: I'm using the latest versions of all related packages:
Neos/Neos: 4.3.0
Flowpack/ElasticSearch: 2.0.4
Flowpack/ElasticSearch.ContentRepositoryAdapter: 5.0.1
ElasticSearch is running in a docker container using the official Elasticsearch image, version 5.

Publishing a lot of nodes "skips" index updates

Given that:

a workspace with a lot of changes exist (more than the default threshold of 100 to flush the indexing queues, indexingBatchSize)

When I:

publish all those changes in the workspace module (publishing from anywhere should have the same effect)

Then:

the indexer will log a lot of Could not index node with identifier …, not found in workspace live

This seems to be caused by the indexer looking for the node to index in the target workspace. But if the indexing threshold is reached before the publishing has actually been "persisted to disk", the nodes for that run cannot yet be found in the target workspace.

If this has happened, a nodeindex:build is needed to update the index to the expected state.

filename elasticsearch.yml wrong in documentation

Following the documentation in README.md I always got the following error message

ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[dynamic scripting for [groovy] disabled];

According to https://github.com/elastic/elasticsearch/blob/master/config/elasticsearch.yml the filename should be called elasticsearch.yml instead of elasticsearch.yaml

Correct support for multi dimensions

We need to update the current implementation to update all node variants in the Indexer and not only the current like it's done now.

Check how it's done in the SimpleSearch adaptor:
https://github.com/Flowpack/Flowpack.SimpleSearch.ContentRepositoryAdaptor/blob/master/Classes/Flowpack/SimpleSearch/ContentRepositoryAdaptor/Indexer/NodeIndexer.php

DocumentDriver::deleteDuplicateDocumentNotMatchingType() must return array, none returned

A regression through #295 - the method never had a return value, but the docblock lied to me…

Stabilize `nodeindex:build` command

The current nodeindex:build implementation is not capable of indexing an existing site with a lot of nodes because it consumes too much memory for any system to handle.
There is a --limit flag but it doesn't split the tasks in junks but instead just stops after x nodes. What's the usecase for this?
Besides it would be useful to be able to only index certain dimension combinations and/or node types.

But most importantly it should be possible to be able to process single nodes/batches in separate processes (so this might be related to #121).

For a current project I hooked it our JobQueue packages to intercept the NodeIndexingManage::indexNode() call and create a job that is handled in (multiple) separate workers.
This most probably won't be faster because each worker has to boot up the framework, fetch the node again and index it.. But it is more stable and scalable and it should be possible to process a batch of nodes at once per worker to find the sweet spot between memory consumption and performance..

Make Shards and Replicas configurable via Index Template

Deprecation message while indexing

When I build up an index I get this message:
Deprecated field [inline] used, expected [source] instead

Incompatible with ES 1.2 with default configuration

Currently the indexing is broken for ES versions 1.2+ due to a security issue with dynamic scripting, leading them to disable dynamic scripting by default. Although the security issue only applied if the service is public. The security issue has not been fixed, although planned, since it requires sandboxing. So to make it work ES has to be configured to enable dynamic scripting. To avoid this the scripting can be ported to Groovy instead of MVEL since Groovy is sandboxed and therefore dynamic scripting is enabled by default for Groovy. Additionally the plan for ES is to use Groovy as the default scripting language instead of MVEL from 1.4, requiring a custom MVEL plugin to be installed. So actually porting to Groovy is the only way to support the default ES installation in the future as well.

There are currently only two occurrences of MVEL scripting located here in
Classes/Flowpack/ElasticSearch/ContentRepositoryAdaptor/Indexer/NodeIndexer.php on line 255 and again on line 323.

Enabling dynamic scripting:
Set "script.disable_dynamic: false" in elasticsearch.yml

Unable to prepare statement: 5, database is locked

--- ups wrong repository :/ sorry

Slash breaks search query

Searching for a Slash (e.G. "11/2011" or "a/b" ) breaks the ES query.
Example response:

                    [reason] => Array
                        (
                            [type] => query_parsing_exception
                            [reason] => Failed to parse query [11/2010]
                            [index] => production-1485418788
                            [line] => 1
                            [col] => 100
                            [caused_by] => Array
                                (
                                    [type] => parse_exception
                                    [reason] => Cannot parse '11/2010': Lexical error at line 1, column 8.  Encountered: <EOF> after : "/2010"
                                    [caused_by] => Array
                                        (
                                            [type] => token_mgr_error
                                            [reason] => Lexical error at line 1, column 8.  Encountered: <EOF> after : "/2010"
                                        )

                                )

                        )

ES-Version: 2.4.x
CRA-Version: 3.0.0

@daniellienert can confirm this issue

The [filtered] query is deprecated, please use a [bool] query instead with a [must] clause for the query part and a [filter] clause for the filter part.

Look like we use a deprecated query, and we expose some internals, like in QueryBuilder::appendAtPath we need to plan how to move to more decoupled query

nodeindex:cleanup is not working

With actual master of the Adapter and elastic 5.6.8 nodeindex:cleanup is not working.

./flow nodeindex:cleanup
Nothing removed. ElasticSearch responded with status 400, saying "illegal_argument_exception: No endpoint or operation is available at [_status]"

Did we need to remove the support for Types and move to one index per concrete NodeType

This issue is to keep track of informations regarding our usage of Types, wrong based on the official documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html#_type_takeaways

But we need to take care of this https://stackoverflow.com/questions/14465668/elastic-search-multiple-indexes-vs-one-index-and-types-for-different-data-sets (sharding issue with a lots of indices)

Using limit() on collection breaks iterator.isLast

I have the following setup to limit a collection and return it as json. But iterator.isLast is not true in the last cycle, so the json format is wrong because the last ',' is not removed. Did I miss something in the config or my fusion setup?

prototype(Vendor.Site:NewsListJson) < prototype(TYPO3.TypoScript:Collection) {
	collection = ${Search.query(site).nodeType('Vendor.Site:News').sortDesc('date').limit(5).execute()}
	itemName = 'node'
	iterationName = 'iterator'
	itemRenderer = Vendor.Site:NewsListItemJson {

	}

	@process.1 = ${'[' + value + ']'}
	@cache {
		mode = 'cached'
		entryTags {
			1 = ${'NodeType_Vendor.Site:News'}
		}
	}
}

prototype(Vendor.Site:NewsListItemJson) < prototype(Vendor.Site:JsonObjectRenderer) {
	id = ${q(node).property('uriPathSegment')}
	title = ${String.stripTags(q(node).property('title'))}
	text = ${String.crop(q(node).property('text'), 150, '...')}
	image = TYPO3.Neos:ImageUri {
		asset = ${q(node).property('image')}
		maximumWidth = 80
		maximumHeight = 60
		allowCropping = TRUE
		allowUpScaling = TRUE
	}
	entryDate = ${Date.format(q(node).property('date'), 'F jS, Y')}
	category = ${q(node).property('category')}
}

prototype(Vendor.Site:JsonObjectRenderer) < prototype(TYPO3.TypoScript:RawArray) {
	@process.1 = ${Json.stringify(value)}
	@process.2 = ${'' + value}
	@process.3 = ${iterator.isLast ? value : value + ','}
}

Output of ${iterator}:
[{"index":0,"cycle":1,"isFirst":true,"isLast":false,"isEven":false,"isOdd":true},{"index":1,"cycle":2,"isFirst":false,"isLast":false,"isEven":true,"isOdd":false},{"index":2,"cycle":3,"isFirst":false,"isLast":false,"isEven":false,"isOdd":true},{"index":3,"cycle":4,"isFirst":false,"isLast":false,"isEven":true,"isOdd":false},{"index":4,"cycle":5,"isFirst":false,"isLast":false,"isEven":false,"isOdd":true},]

Using:
Flowpack.ElasticSearch.ContentRepositoryAdaptor from master branch
Neos 2.3.8

Thank you.

Fatal error because package pulls in classes from non-dependencies

Currently the package uses a trait from neos/neos which is incorrect as it should be usable standalone:

Required class "Neos\Neos\Controller\CreateContentContextTrait" could not be loaded properly for reflection.

Hide page in search?

Hey there,
just tested this plugin and it works great, thanks!
One thing that come to mind is that I've some sites that should not appear in the search results (e.g. the page that is shown after successful contact form submit, or the search page itself).

I've added a property to TYPO3.Neos:Documentcalled hideInSearch. But how do I adjust the query to only consider pages that do not have this attribute?

Something like
`Search.query(site).dontMatch('hideInSearch', true).fulltext(this.searchTerm)``

Thanks in advance
Torsten

Fulltext parts are lost on document update

Whenever a document node itself is indexed (again), the document in the index is deleted, thus removing all existing __fulltextParts and by this removing the child nodes' content from the fulltext index.

The reason: the str_replace() on the node type name is plain wrong, it would have been better to use NodeTypeMappingBuilder::convertNodeTypeNameToMappingName() - that would have used - instead of /.

A regression introduced with #76.

Support pagination by implementing QueryResultInterface

The request returned an invalid JSON string

Hi,

always I create or edit a node I got these error:

Elasticsearch request failed.

[DELETE http://localhost:9200/typo3cr/_query]: The request returned an invalid JSON string which was "No handler found for uri [/typo3cr/_query] and method [DELETE]".; Response body: No handler found for uri [/typo3cr/_query] and method [DELETE] Request data: {"query":{"bool":{"must":{"ids":{"values":["14d654232cd3e874453eabfe39642f556dfdc1f6"]}},"must_not":{"term":{"_type":"Neos-NodeTypes:Text"}}}}}Exception Code1338976439Exception TypeFlowpack\ElasticSearch\Transfer\ExceptionLog Reference201707041108013cd1faThrown in FileData/Temporary/Development/Cache/Code/Flow_Object_Classes/Flowpack_ElasticSearch_Transfer_Response.phpLine45Original FilePackages/Application/Flowpack.ElasticSearch/Classes/Transfer/Response.php

Is this a wrong config? I dont know how to debug this error.

flowpack/elasticsearch                          2.0.1              
flowpack/elasticsearch-contentrepositoryadaptor 4.0.4
neos/neos                                       3.1.2
php                                             7.1.6