flowpack / Flowpack.ElasticSearch.ContentRepositoryAdaptor
Flowpack.ElasticSearch adapter to support the Neos Content Repository
License: GNU Lesser General Public License v3.0
... Instead of manipulating the context node path using string replacement, we should do the following:
If a node in a user workspace is published and should be indexed, create a new context (with the same dimension values but a DIFFERENT workspace). Then, using this new context, we fetch the published node by node path and use that one for indexing.
... Discovered together with Christopher at T3CON.
Due to the change to the way fields are returned in ES 1.0.0 (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_return_values.html) the handling around https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Flowpack/ElasticSearch/ContentRepositoryAdaptor/Eel/ElasticSearchQueryBuilder.php#L362 is broken.
ElasticSearchQueryBuilder::getTotalItems()
checks for a key total
in the result, but that key never exists. Instead, the total hits are returned in $this->result['hits']['total'].
So shouldn't:
public function getTotalItems()
{
    if (array_key_exists('total', $this->result)) {
        return (int)$this->result['total'];
    }
}
be changed to something like:
public function getTotalItems()
{
    if (isset($this->result['hits']['total'])) {
        return (int)$this->result['hits']['total'];
    }
    return 0;
}
?
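For illustration, here is a minimal Python sketch of the proposed lookup, assuming the search response shape documented for ES 1.x through 6.x (hits.total as a plain number; in 7.x it became an object):

```python
# Minimal sketch: read the hit count from hits.total in a search
# response body, falling back to 0 when the key is absent.
def get_total_items(result):
    try:
        return int(result["hits"]["total"])
    except (KeyError, TypeError):
        return 0

assert get_total_items({"hits": {"total": 42, "hits": []}}) == 42
assert get_total_items({}) == 0  # missing key falls back to 0
```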
In order to replace the master branch and retain as much commit information as possible, I would propose the following:
Adjust the 3.0 version to match the current master.
Do the merge:
git merge -X theirs 4.0
The [filtered] query is deprecated, please use a [bool] query instead with a [must] clause for the query part and a [filter] clause for the filter part.
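A minimal sketch of the migration this deprecation message asks for (a generic example, not this package's exact query): the query part moves into must and the filter part into filter of a bool query.

```json
{
  "query": {
    "bool": {
      "must": { "query_string": { "query": "example" } },
      "filter": { "term": { "__workspace": "live" } }
    }
  }
}
```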
Using Neos 3.0, Elasticsearch 2.3 and the most current versions of this ContentRepositoryAdaptor.
When using Flowpack.SearchPlugin and an extended PaginateController, I ran into an issue.
On a query which is an instance of "Neos\ContentRepository\Search\Search\QueryBuilderInterface", I call the count() method like this to get the count of all search results:
$this->resultCount = $this->query->count();
Unfortunately this gives strange results:
I analyzed the problem and found a fix, but it is more a hack than a fix.
When doing a search against Elasticsearch, the query is the same as the query which is submitted when I call the count() method.
The count() method is implemented as a request to "/_count" (https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Eel/ElasticSearchQueryBuilder.php#L596) and then extracts the count (https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Eel/ElasticSearchQueryBuilder.php#L600).
I tried changing this request to "/_search" and extracting the count from that response. And voilà, the count is always correct.
I guess the problem lies in the query which the ContentRepositoryAdaptor generates.
See PR #226
Example Query which was used:
{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "match_all": []
                        },
                        {
                            "query_string": {
                                "query": "\"fsdfsdfsd\""
                            }
                        }
                    ]
                }
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "bool": {
                                "should": [
                                    {
                                        "term": {
                                            "__parentPath": "/sites/cms"
                                        }
                                    },
                                    {
                                        "term": {
                                            "__path": "/sites/cms"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "terms": {
                                "__workspace": [
                                    "live"
                                ]
                            }
                        },
                        {
                            "term": {
                                "__dimensionCombinationHash": "9dfdd8b029869edee199f9f9f920b723"
                            }
                        },
                        {
                            "term": {
                                "__typeAndSupertypes": "Neos.Neos:Document"
                            }
                        }
                    ],
                    "should": [],
                    "must_not": [
                        {
                            "term": {
                                "_hidden": true
                            }
                        },
                        {
                            "range": {
                                "_hiddenBeforeDateTime": {
                                    "gt": "now"
                                }
                            }
                        },
                        {
                            "range": {
                                "_hiddenAfterDateTime": {
                                    "lt": "now"
                                }
                            }
                        },
                        {
                            "term": {
                                "hiddenInSearchResults": true
                            }
                        }
                    ]
                }
            }
        }
    }
}
On running the "flow configuration:validate" command I receive an error for the Flowpack\ElasticSearch\ContentRepositoryAdaptor\Command\NodeIndexCommandController configuration in Objects.yaml, stating that it expected a string but got an array:
- Objects.Flowpack.ElasticSearch.ContentRepositoryAdaptor.Flowpack\ElasticSearch\ContentRepositoryAdaptor\LoggerInterface.arguments.__index_3.value -> expected: type=string found: type=array
and indeed the code looks like this:
Flowpack\ElasticSearch\ContentRepositoryAdaptor\Command\NodeIndexCommandController:
  properties:
    logger:
      object:
        factoryObjectName: TYPO3\Flow\Log\LoggerFactory
        factoryMethodName: create
        arguments:
          3:
            value:
              fileBackend: 'TYPO3\Flow\Log\Backend\FileBackend'
              ansiConsoleBackend: 'TYPO3\Flow\Log\Backend\AnsiConsoleBackend'
I don't know whether this is actually correct and the "flow configuration:validate"-command is wrong.
Hi,
I had some trouble getting the search word highlighted in the result.
The problem with the current version is that Elasticsearch considers the fields definition in the highlight section of the search query to not match the searched fields. Therefore the highlighting doesn't work, and changes via the .highlight() Eel helper have no effect.
My suggested solution would be to add the missing parameter fields => ['__fulltext*'] in FilteredQuery.php in the fulltext function.
After I added this, the search word got highlighted as expected.
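In query DSL terms, the suggested fix amounts to sending a highlight section whose fields match the queried fulltext fields (a sketch, assuming the __fulltext* field naming used by this package):

```json
{
  "highlight": {
    "fields": {
      "__fulltext*": {}
    }
  }
}
```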
A quick workaround is to tweak the fusion searchQuery in your own package like this:
prototype(Flowpack.SearchPlugin:Search) < prototype(Neos.Neos:Content) {
    queryStringFields = Neos.Fusion:RawArray {
        fields = Neos.Fusion:RawArray {
            0 = "__fulltext*"
        }
    }
    searchQuery = ${this.searchTerm ? Search.query(site).fulltext(this.searchTerm).request('query.filtered.query.bool.must.1.query_string', this.queryStringFields).nodeType('Neos.Neos:Document') : null}
}
I'll add a PR.
See the Neos or Flow development collections for a workaround to test pull requests and post merges.
$ travis_apt_get_update
0.48s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install oracle-java8-set-default
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package oracle-java8-set-default
apt-get.diagnostics
apt-get install failed
$ cat ${TRAVIS_HOME}/apt-get-update.log
...
Fetched 52.0 MB in 4s (11.4 MB/s)
Reading package lists...
The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install oracle-java8-set-default" failed and exited with 100 during .
I found this: https://stackoverflow.com/questions/25289482/installing-jdk8-on-ubuntu-unable-to-locate-package-update-doesnt-fix
But I'm not sure (yet) how to apply this to Travis.
Related to #302
I'm trying to create an index on Elastic 5.6.13 and I'd like to configure the analyzer for fulltext search to be german.
According to the readme, I did this:
'Neos.Neos:Node':
  search:
    elasticSearchMapping:
      _all:
        analyzer: german
But Elasticsearch is throwing an error and can't create the mapping:
Elasticsearch request failed.
[PUT
http://localhost:9200/neoscr-stage-1547824666/Neos-Neos:Shortcut/_mapping]:
Array
(
[root_cause] => Array
(
[0] => Array
(
[type] => illegal_argument_exception
[reason] => Mapper for [_all] conflicts with existing mapping in other types:
[mapper [_all] has different [analyzer], mapper [_all] is used by
multiple types. Set update_all_types to true to update [search_analyzer]
across all types., mapper [_all] is used by multiple types. Set
update_all_types to true to update [search_quote_analyzer] across all
types.]
)
)
[type] => illegal_argument_exception
[reason] => Mapper for [_all] conflicts with existing mapping in other types:
[mapper [_all] has different [analyzer], mapper [_all] is used by
multiple types. Set update_all_types to true to update [search_analyzer]
across all types., mapper [_all] is used by multiple types. Set
update_all_types to true to update [search_quote_analyzer] across all
types.]
)
Response body:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Mapper for [_all] conflicts with existing mapping in other types:\n[mapper [_all] has different [analyzer], mapper [_all] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [_all] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"}],"type":"illegal_argument_exception","reason":"Mapper for 'Neos.Neos:Node': [_all] conflicts with existing mapping in other types:\n[mapper [_all] has different [analyzer], mapper [_all] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [_all] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"},"status":400}
Request data:
{"Neos-Neos:Shortcut":{"dynamic_templates":[{"dimensions":{"path_match":"__dimensionCombinations.*","match_mapping_type":"string","mapping":{"type":"text"}}}],"properties":{"__dimensionCombinationHash":{"type":"keyword"},"_creationDateTime":{"type":"date","format":"date_time_no_millis"},"_lastModificationDateTime":{"type":"date","format":"date_time_no_millis"},"_lastPublicationDateTime":{"type":"date","format":"date_time_no_millis"},"_path":{"type":"keyword","index":true},"_name":{"type":"keyword","index":true},"_nodeType":{"type":"keyword","index":true},"__identifier":{"type":"keyword","index":true},"__workspace":{"type":"keyword","index":true},"__path":{"type":"keyword","index":true},"__parentPath":{"type":"keyword","index":true},"__sortIndex":{"type":"integer"},"__typeAndSupertypes":{"type":"keyword","index":true},"_hidden":{"type":"boolean"},"_hiddenBeforeDateTime":{"type":"date","format":"date_time_no_millis"},"_hiddenAfterDateTime":{"type":"date","format":"date_time_no_millis"},"title":{"type":"keyword","index":true},"uriPathSegment":{"type":"keyword","index":true},"_hiddenInIndex":{"type":"boolean"},"__fulltextParts":{"type":"object","enabled":false},"__fulltext":{"type":"object","properties":{"h1":{"type":"keyword","boost":20,"copy_to":"_all","index":true},"h2":{"type":"keyword","boost":12,"copy_to":"_all","index":true},"h3":{"type":"keyword","boost":10,"copy_to":"_all","index":true},"h4":{"type":"keyword","boost":5,"copy_to":"_all","index":true},"h5":{"type":"keyword","boost":3,"copy_to":"_all","index":true},"h6":{"type":"keyword","boost":2,"copy_to":"_all","index":true},"text":{"type":"keyword","boost":1,"copy_to":"_all","index":true}}},"targetMode":{"type":"keyword","index":true},"target":{"type":"keyword","index":true}},"_all":{"analyzer":"german"}}}
The driver concept of this package makes it possible to support different versions of Elasticsearch; the current version 4.1 supports versions 1.x, 2.x and 5.x, which makes it usable on older environments as well.
However, ES 1.x has been EOL since January 2017 and ES 2.x since February 2018 (see https://www.elastic.co/support/eol), and using unsupported software versions should be discouraged.
With the next major version, I would like to remove the support for ES 1.x and 2.x to make it easier to integrate new features without backporting them to older versions. Testing also gets a lot easier.
The upmerge from 3.0 to 4.0 caused a fatal error
PHP Fatal error: Declaration of Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\Version1\Query\FilteredQuery_Original::fulltext($searchWord) must be compatible with Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\QueryInterface::fulltext(string $searchWord, array $options = Array) in Data/Temporary/Production/SubContextAws/Cache/Code/Flow_Object_Classes/Flowpack_ElasticSearch_ContentRepositoryAdaptor_Driver_Version1_Query_FilteredQuery.php on line 20
Most likely caused by #290, however not visible as it was a clean merge.
Before https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/4.1.0/Classes/Driver/Version1/Query/FilteredQuery.php#L56
After https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/4.1.1/Classes/Driver/Version1/Query/FilteredQuery.php#L56
When using queries in Fusion to fetch nodes, a condition on the workspace is added, always restricting results to the live
workspace.
This leads to unexpected results, since changes from a workspace that affect those queries will not be observed.
The queries should usually be working on the workspace that is currently shown (personal workspace, preview workspace, …).
Since version 1.4 there's a configuration for setting the time_zone to be used in a query (See the docs, the commit, and the original issue). (Note that this is apparently still missing for aggregations.)
This is important for all queries which use relative dates and/or "magic" keywords ("now", etc).
One thing that's not clear to me is how Elasticsearch figures out what "now" means. In any case, it's a lot safer if we make sure it's using the currently configured PHP timezone, since that's what the node dates also use.
It would probably be ideal to add this option to the default ElasticSearchQueryBuilder query.
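For illustration, this is roughly what a range query with an explicit time_zone looks like in the query DSL (field name and offset are just examples; the offset would come from the configured PHP timezone). Note that time_zone shifts explicit dates and the rounding in date math like now/d, while a bare "now" itself always resolves in UTC:

```json
{
  "query": {
    "range": {
      "_lastPublicationDateTime": {
        "gte": "now/d",
        "time_zone": "+01:00"
      }
    }
  }
}
```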
#108 introduced a regression by hardcoding the "language" dimension into the NodeIndexer. There are, however, sites which don't have a "language" dimension or which might have dimensions with other names fulfilling the same role like a language.
We need to revert that behaviour or at least make it compatible with other dimension setups.
@dfeyer
Currently, to enable sorting by date (or by any other property as well?), one must configure the relevant property with this config in NodeTypes.yaml:
search:
  elasticSearchMapping:
    type: date
    include_in_all: false
    format: 'date_time_no_millis'
  indexing: '${(value ? Date.format(value, "Y-m-d\TH:i:sP") : null)}'
But the docs say: "Normally, this does not need to be touched, as this package supports all Neos data types natively."
Can somebody who's into the subject clarify under what circumstances one must configure the indexing config?
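For intuition, the indexing expression above renders the date as ISO 8601 without milliseconds, which is what Elasticsearch's date_time_no_millis mapping format expects; a Python equivalent of that formatting (illustration only):

```python
from datetime import datetime, timezone

# The Eel expression Date.format(value, "Y-m-d\TH:i:sP") produces an
# ISO 8601 timestamp without milliseconds, matching the
# date_time_no_millis mapping format declared above.
value = datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
formatted = value.isoformat()
assert formatted == "2024-01-31T12:00:00+00:00"
```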
Given that:
When I:
Then:
This is because the workspace for "john" has already been indexed. Nor does that switch trigger any reindexing, since no publishing is involved. Thus it is not reindexed after the base workspace switch.
Situation
Two different packages with different NodeTypes, both using a property named "heading", but one is of type int and the other of type string.
Action
./flow nodeindex:build
Result
Array
(
[root_cause] => Array
(
[0] => Array
(
[type] => illegal_argument_exception
[reason] => mapper [heading] cannot be changed from type [int] to [string]
)
)
[type] => illegal_argument_exception
[reason] => mapper [heading] cannot be changed from type [int] to [string]
)
Currently we use only one index for all sites; that can make some scoring features of Elasticsearch not work as expected, e.g. word frequency will take other sites into account, ...
Given I have a node in dimension A, on a website with two dimensions A and B, and my node does not exist in dimension B.
During indexing we create two documents in the index, one for dimension A and one for dimension B.
If I remove the node in dimension A, the document in this dimension is removed correctly, but not the one in dimension B. It's not a big issue, because the documents from the Elastic query are filtered and only documents existing in the CR are returned.
But it's not clean ;) and in some cases it can have an impact on the query algorithm, because Elastic uses data that are not in sync with the CR.
Affected version: all
It's a bit complex; in my case it's easy, we can remove all variants (because the node does not exist in dimension B).
But it's also a valid case to have the node in both the A and B dimensions, but only remove the one in dimension A (in this case the current behaviour is correct).
So we have two scenarios:
How do we detect those two scenarios?
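One way to frame the detection, sketched as a hypothetical helper (names invented for illustration): compare the dimension values for which documents were indexed with the dimension values in which the node still exists after the removal.

```python
# Hypothetical sketch: decide which indexed documents to delete after a
# node removal, given the dimensions that were indexed and the
# dimensions in which the node still exists in the Content Repository.
def documents_to_delete(indexed_dimensions, remaining_dimensions):
    return sorted(set(indexed_dimensions) - set(remaining_dimensions))

# Scenario 1: the node is gone everywhere -> remove both documents.
assert documents_to_delete(["A", "B"], []) == ["A", "B"]
# Scenario 2: removed only in A, still present in B -> remove only A's.
assert documents_to_delete(["A", "B"], ["B"]) == ["A"]
```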
Given:
Problem:
If the document that contains this node is found using Flowpack.SearchPlugin, the content excerpt contains the "Foobar" string, even though it's not visible in the rendered page.
Conclusion:
The indexer checks each node for its hidden state, but since the node containing "Foobar" is not hidden by itself, but only invisible because its container is hidden, it is indexed and the extracted fulltext parts are added to the fulltext root (document node).
If I search for Cyrillic words, the result page shows me a Unicode-escaped string. It's displayed in Flowpack.SearchPlugin:Search -> Search.html.
I don't know if it's already delivered like this by the ContentRepositoryAdaptor, so I'm filing the issue here ...
You can see my example here:
http://www.neo-angin.ua/search.html?search=%D0%B7%D0%BD%D0%B0%D0%B9%D0%B4%D0%B5%D0%BD%D0%BE
I made a workaround in JS to decode the original letters, so if you disable JavaScript you can see the Unicode-escaped search string.
For one node there are several datasets in the Elastic index. So there are too many nodes indexed, which at least influences the result count when all results are shown.
For a search with a dedicated search term, the result count is correct (searching directly in the index).
This behavior probably depends on the first indexing of the data. For every workspace, all nodes are indexed. If there are no nodes in the user-jondoe workspace, the nodes are indexed for the live workspace.
For instance, see the following code:
protected function updateFulltext(Node $node, array $fulltextIndexOfNode, $targetWorkspaceName = NULL) {
    if ((($targetWorkspaceName !== NULL && $targetWorkspaceName !== 'live') || $node->getWorkspace()->getName() !== 'live') || count($fulltextIndexOfNode) === 0) {
        return;
    }
Here, we use $node->getWorkspace(), returning the workspace of the UNDERLYING node data, which I think is wrong -- instead, $node->getContext()->getWorkspace() should be used.
Same in the following code:
'__workspace':
  search:
    elasticSearchMapping:
      type: string
      index: not_analyzed
      include_in_all: false
    indexing: '${node.workspace.name}'
I am 95% confident this must be node.context.workspace.name -- but I wonder why nobody noticed that before (weird).
Greets, Sebastian
The _count call doesn't work.
It seems to trip over the 'fields' => array('__path') part.
I couldn't find a way to fix it, so I solved it in my own QueryBuilder doing:
public function count() {
    $response = $this->elasticSearchClient->getIndex()->request('GET', '/_search', array(), json_encode($this->request));
    $treatedContent = $response->getTreatedContent();
    return (int)$treatedContent['hits']['total'];
}
Not so nice. But better suggestions welcome ;-)
When I change the NodeType of a Document-Node and try to publish afterwards, I get an error:
Exception #1338977435: Elasticsearch request failed.
[POST http://127.0.0.1:9200/typo3cr/_bulk]: Array
(
[root_cause] => Array
(
[0] => Array
(
[type] => parse_exception
[reason] => Failed to derive xcontent
)
)
[type] => parse_exception
[reason] => Failed to derive xcontent
)
; Response body: {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400}
Request data:
(Error ends here, Request isn't shown)
The stack trace:
34 Flowpack\ElasticSearch\Transfer\Response_Original::__construct(TYPO3\Flow\Http\Response, TYPO3\Flow\Http\Request)
33 call_user_func_array("parent::__construct", array|2|)
32 Flowpack\ElasticSearch\Transfer\Response::__construct(TYPO3\Flow\Http\Response, TYPO3\Flow\Http\Request)
31 Flowpack\ElasticSearch\Transfer\RequestService_Original::request("POST", Flowpack\ElasticSearch\ContentRepositoryAdaptor\ElasticSearchClient, "/typo3cr/_bulk", array|0|, "")
30 Flowpack\ElasticSearch\Domain\Model\Client_Original::request("POST", "/typo3cr/_bulk", array|0|, "")
29 Flowpack\ElasticSearch\Domain\Model\Index_Original::request("POST", "/_bulk", array|0|, "")
28 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::removeDuplicateDocuments("/sites/magazine/node-577a218fc9713/node-57d7b6ab93765@live", "6c675d686e3229af21556a9ae83570f3445bc00e", TYPO3\TYPO3CR\Domain\Model\Node)
27 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\{closure}(TYPO3\TYPO3CR\Domain\Model\Node, "live")
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::indexNode(TYPO3\TYPO3CR\Domain\Model\Node, "live")
25 call_user_func_array(array|2|, array|2|)
24 TYPO3\Flow\Object\DependencyInjection\DependencyProxy::__call("indexNode", array|2|)
23 TYPO3\Flow\Object\DependencyInjection\DependencyProxy::indexNode(TYPO3\TYPO3CR\Domain\Model\Node, "live")
22 TYPO3\TYPO3CR\Search\Indexer\NodeIndexingManager_Original::TYPO3\TYPO3CR\Search\Indexer\{closure}()
21 TYPO3\TYPO3CR\Search\Indexer\NodeIndexingManager_Original::flushQueues("TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted", "TYPO3\Flow\Persistence\Doctrine\PersistenceManager::allObjectsPersisted")
20 call_user_func_array(array|2|, array|4|)
19 TYPO3\Flow\SignalSlot\Dispatcher::dispatch("TYPO3\Flow\Persistence\Doctrine\PersistenceManager", "allObjectsPersisted", array|0|)
18 TYPO3\Flow\SignalSlot\SignalAspect_Original::forwardSignalToDispatcher(TYPO3\Flow\Aop\JoinPoint)
17 TYPO3\Flow\Aop\Advice\AbstractAdvice::invoke(TYPO3\Flow\Aop\JoinPoint)
16 TYPO3\Flow\Persistence\Doctrine\PersistenceManager::emitAllObjectsPersisted()
15 TYPO3\Flow\Persistence\Doctrine\PersistenceManager_Original::persistAll()
14 TYPO3\Flow\Package::TYPO3\Flow\{closure}(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController, "TYPO3\Flow\Mvc\Dispatcher::afterControllerInvocation")
13 Closure::__invoke(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController, "TYPO3\Flow\Mvc\Dispatcher::afterControllerInvocation")
12 call_user_func_array(array|2|, array|4|)
11 TYPO3\Flow\SignalSlot\Dispatcher::dispatch("TYPO3\Flow\Mvc\Dispatcher", "afterControllerInvocation", array|3|)
10 TYPO3\Flow\SignalSlot\SignalAspect_Original::forwardSignalToDispatcher(TYPO3\Flow\Aop\JoinPoint)
9 TYPO3\Flow\Aop\Advice\AbstractAdvice::invoke(TYPO3\Flow\Aop\JoinPoint)
8 TYPO3\Flow\Mvc\Dispatcher::emitAfterControllerInvocation(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response, TYPO3\Neos\Service\Controller\WorkspaceController)
7 TYPO3\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response)
6 TYPO3\Flow\Mvc\Dispatcher_Original::dispatch(TYPO3\Flow\Mvc\ActionRequest, TYPO3\Flow\Http\Response)
5 TYPO3\Flow\Mvc\DispatchComponent_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
4 TYPO3\Flow\Http\Component\ComponentChain_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
3 TYPO3\Flow\Http\Component\ComponentChain_Original::handle(TYPO3\Flow\Http\Component\ComponentContext)
2 TYPO3\Flow\Http\RequestHandler::handleRequest()
1 TYPO3\Flow\Core\Bootstrap::run()
HTTP REQUEST:
PUT /neos/service/workspaces-rpc/publish-nodes HTTP/1.1
Publishing without changing NodeTypes doesn't cause problems...
ContentRepositoryAdaptor: dev-master
Neos: 2.3.7
ES: Official Docker-Container (Tag: 2)
Currently the ES index is built during the request; this can be a problem for bigger setups or complex index configurations. We also need, e.g. in Neos, to create a full index when we create a new user or workspace. This can take some time, so we need to be able to do that asynchronously.
Merge requests should be checked for correct PSR-2 code style.
I just installed this package and set up Elasticsearch for a simple site package I'm currently creating.
The bug: Changing a node's type immediately results in an error.
This appears when changing document as well as content nodes.
Elasticsearch request failed.
[GET http://search:9200//_search/scroll?scroll=1m]: Array ( [root_cause] => Array ( [0] => Array ( [type] => illegal_argument_exception [reason] => request [//_search/scroll] contains unrecognized parameter: [scroll] ) ) [type] => illegal_argument_exception [reason] => request [//_search/scroll] contains unrecognized parameter: [scroll] ) ; Response body: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [//_search/scroll] contains unrecognized parameter: [scroll]"}],"type":"illegal_argument_exception","reason":"request [//_search/scroll] contains unrecognized parameter: [scroll]"},"status":400} Request data: DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAA6AFkxiQjJSS2gxVDZlbUIyUV9YWklPeEEAAAAAAAAOghZMYkIyUktoMVQ2ZW1CMlFfWFpJT3hBAAAAAAAADn8WTGJCMlJLaDFUNmVtQjJRX1haSU94QQAAAAAAAA6BFkxiQjJSS2gxVDZlbUIyUV9YWklPeEEAAAAAAAAOgxZMYkIyUktoMVQ2ZW1CMlFfWFpJT3hB
Edit: I'm using the latest versions of all related packages:
Neos/Neos: 4.3.0
Flowpack/ElasticSearch: 2.0.4
Flowpack/ElasticSearch.ContentRepositoryAdaptor: 5.0.1
ElasticSearch is running in a docker container using the official Elasticsearch image, version 5.
Given that:
indexingBatchSize
When I:
Then:
Could not index node with identifier …, not found in workspace live
This seems to be caused by the indexer looking for the node to index in the target workspace. But if the indexing threshold is reached before the publishing has actually been "persisted to disk", the nodes for that run cannot yet be found in the target workspace.
If this has happened, a nodeindex:build is needed to update the index to the expected state.
Following the documentation in README.md, I always got the following error message:
ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[dynamic scripting for [groovy] disabled];
According to https://github.com/elastic/elasticsearch/blob/master/config/elasticsearch.yml the filename should be called elasticsearch.yml instead of elasticsearch.yaml
We need to update the current implementation to update all node variants in the Indexer, not only the current one as is done now.
Check how it's done in the SimpleSearch adaptor:
https://github.com/Flowpack/Flowpack.SimpleSearch.ContentRepositoryAdaptor/blob/master/Classes/Flowpack/SimpleSearch/ContentRepositoryAdaptor/Indexer/NodeIndexer.php
A regression through #295 - the method never had a return value, but the docblock lied to me…
The current nodeindex:build implementation is not capable of indexing an existing site with a lot of nodes, because it consumes too much memory for any system to handle.
There is a --limit flag, but it doesn't split the task into chunks; it just stops after x nodes. What's the use case for this?
Besides, it would be useful to be able to only index certain dimension combinations and/or node types.
But most importantly, it should be possible to process single nodes/batches in separate processes (so this might be related to #121).
For a current project I hooked in our JobQueue packages to intercept the NodeIndexingManager::indexNode() call and create a job that is handled in (multiple) separate workers.
This most probably won't be faster, because each worker has to boot up the framework, fetch the node again and index it. But it is more stable and scalable, and it should be possible to process a batch of nodes at once per worker to find the sweet spot between memory consumption and performance.
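The batching idea can be sketched generically (hypothetical helper, not this package's API): split the node list into fixed-size chunks and hand each chunk to a worker.

```python
# Hypothetical sketch: chunk node identifiers so each job-queue worker
# indexes one batch, bounding memory use per process.
def batches(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

node_ids = ["n1", "n2", "n3", "n4", "n5"]
assert batches(node_ids, 2) == [["n1", "n2"], ["n3", "n4"], ["n5"]]
```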
When I build up an index I get this message:
Deprecated field [inline] used, expected [source] instead
Currently the indexing is broken for ES versions 1.2+ due to a security issue with dynamic scripting, which led them to disable dynamic scripting by default (although the security issue only applies if the service is publicly reachable). The security issue has not been fixed, although that is planned, since it requires sandboxing. So to make indexing work, ES has to be configured to enable dynamic scripting. To avoid this, the scripting can be ported from MVEL to Groovy, since Groovy is sandboxed and dynamic scripting is therefore enabled by default for Groovy. Additionally, the plan for ES is to use Groovy as the default scripting language instead of MVEL from 1.4 on, requiring a custom MVEL plugin to be installed. So porting to Groovy is actually the only way to support the default ES installation in the future as well.
There are currently only two occurrences of MVEL scripting, located in Classes/Flowpack/ElasticSearch/ContentRepositoryAdaptor/Indexer/NodeIndexer.php on line 255 and again on line 323.
Enabling dynamic scripting:
Set "script.disable_dynamic: false" in elasticsearch.yml
Read more here:
http://www.elasticsearch.org/blog/scripting/
http://www.elasticsearch.org/blog/scripting-security/
http://www.elasticsearch.org/blog/elasticsearch-1-3-0-released/
--- oops, wrong repository :/ sorry
Searching for a slash (e.g. "11/2011" or "a/b") breaks the ES query.
Example response:
[reason] => Array
(
[type] => query_parsing_exception
[reason] => Failed to parse query [11/2010]
[index] => production-1485418788
[line] => 1
[col] => 100
[caused_by] => Array
(
[type] => parse_exception
[reason] => Cannot parse '11/2010': Lexical error at line 1, column 8. Encountered: <EOF> after : "/2010"
[caused_by] => Array
(
[type] => token_mgr_error
[reason] => Lexical error at line 1, column 8. Encountered: <EOF> after : "/2010"
)
)
)
ES-Version: 2.4.x
CRA-Version: 3.0.0
@daniellienert can confirm this issue
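A common client-side mitigation (an assumption, not something this package ships) is to escape Lucene's reserved query_string characters, including the slash, before building the query:

```python
import re

# Escape characters that Lucene's query_string syntax treats specially
# (the slash delimits regular expressions in newer Lucene versions).
RESERVED = re.compile(r'([+\-=&|><!(){}\[\]^"~*?:\\/])')

def escape_query_string(term):
    return RESERVED.sub(r"\\\1", term)

assert escape_query_string("11/2010") == "11\\/2010"
assert escape_query_string("a/b") == "a\\/b"
```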
Looks like we use a deprecated query, and we expose some internals, as in QueryBuilder::appendAtPath; we need to plan how to move to a more decoupled query.
With the current master of the Adaptor and Elastic 5.6.8, nodeindex:cleanup is not working.
./flow nodeindex:cleanup
Nothing removed. ElasticSearch responded with status 400, saying "illegal_argument_exception: No endpoint or operation is available at [_status]"
This issue is to keep track of information regarding our usage of types, which is wrong according to the official documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html#_type_takeaways
But we need to take care of this: https://stackoverflow.com/questions/14465668/elastic-search-multiple-indexes-vs-one-index-and-types-for-different-data-sets (sharding issues with lots of indices)
I have the following setup to limit a collection and return it as JSON. But iterator.isLast is not true in the last cycle, so the JSON format is wrong because the last ',' is not removed. Did I miss something in the config or my Fusion setup?
prototype(Vendor.Site:NewsListJson) < prototype(TYPO3.TypoScript:Collection) {
    collection = ${Search.query(site).nodeType('Vendor.Site:News').sortDesc('date').limit(5).execute()}
    itemName = 'node'
    iterationName = 'iterator'
    itemRenderer = Vendor.Site:NewsListItemJson {
    }
    @process.1 = ${'[' + value + ']'}
    @cache {
        mode = 'cached'
        entryTags {
            1 = ${'NodeType_Vendor.Site:News'}
        }
    }
}
prototype(Vendor.Site:NewsListItemJson) < prototype(Vendor.Site:JsonObjectRenderer) {
    id = ${q(node).property('uriPathSegment')}
    title = ${String.stripTags(q(node).property('title'))}
    text = ${String.crop(q(node).property('text'), 150, '...')}
    image = TYPO3.Neos:ImageUri {
        asset = ${q(node).property('image')}
        maximumWidth = 80
        maximumHeight = 60
        allowCropping = TRUE
        allowUpScaling = TRUE
    }
    entryDate = ${Date.format(q(node).property('date'), 'F jS, Y')}
    category = ${q(node).property('category')}
}
prototype(Vendor.Site:JsonObjectRenderer) < prototype(TYPO3.TypoScript:RawArray) {
    @process.1 = ${Json.stringify(value)}
    @process.2 = ${'' + value}
    @process.3 = ${iterator.isLast ? value : value + ','}
}
Output of ${iterator}:
[{"index":0,"cycle":1,"isFirst":true,"isLast":false,"isEven":false,"isOdd":true},{"index":1,"cycle":2,"isFirst":false,"isLast":false,"isEven":true,"isOdd":false},{"index":2,"cycle":3,"isFirst":false,"isLast":false,"isEven":false,"isOdd":true},{"index":3,"cycle":4,"isFirst":false,"isLast":false,"isEven":true,"isOdd":false},{"index":4,"cycle":5,"isFirst":false,"isLast":false,"isEven":false,"isOdd":true},]
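Independent of the Fusion specifics, a robust way to build the JSON array is to join the rendered item strings with commas, instead of appending a trailing comma per item and relying on isLast; sketched in Python:

```python
# Joining sidesteps the per-item "is this the last one?" bookkeeping
# that fails here when isLast is never true.
items = ['{"id":"a"}', '{"id":"b"}', '{"id":"c"}']
json_array = "[" + ",".join(items) + "]"
assert json_array == '[{"id":"a"},{"id":"b"},{"id":"c"}]'
```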
Using:
Flowpack.ElasticSearch.ContentRepositoryAdaptor from master branch
Neos 2.3.8
Thank you.
Currently the package uses a trait from neos/neos, which is incorrect as the package should be usable standalone:
Required class "Neos\Neos\Controller\CreateContentContextTrait" could not be loaded properly for reflection.
Hey there,
just tested this plugin and it works great, thanks!
One thing that comes to mind is that I have some pages that should not appear in the search results (e.g. the page that is shown after a successful contact form submit, or the search page itself).
I've added a property to TYPO3.Neos:Document called hideInSearch. But how do I adjust the query to only consider pages that do not have this attribute?
Something like
`Search.query(site).dontMatch('hideInSearch', true).fulltext(this.searchTerm)`
Thanks in advance
Torsten
Whenever a document node itself is indexed (again), the document in the index is deleted, thus removing all existing __fulltextParts and with them the child nodes' content from the fulltext index.
The reason: the str_replace() on the node type name is plain wrong; it would have been better to use NodeTypeMappingBuilder::convertNodeTypeNameToMappingName(), which would have used - instead of /.
A regression introduced with #76.
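For illustration: the type names appearing elsewhere in this document (e.g. Neos-Neos:Shortcut) suggest the mapping name replaces the dots of a node type name with dashes, while a naive replacement producing slashes would yield a different document type on reindexing. A sketch of the dash-based conversion (assumed behaviour, function name hypothetical):

```python
# Hypothetical sketch of converting a node type name to a mapping name
# by replacing "." with "-" (as in the Neos-Neos:Shortcut example).
def convert_node_type_name_to_mapping_name(node_type_name):
    return node_type_name.replace(".", "-")

assert convert_node_type_name_to_mapping_name("Neos.Neos:Shortcut") == "Neos-Neos:Shortcut"
```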
Hi,
Whenever I create or edit a node, I get this error:
Elasticsearch request failed.
[DELETE http://localhost:9200/typo3cr/_query]: The request returned an invalid JSON string which was "No handler found for uri [/typo3cr/_query] and method [DELETE]".; Response body: No handler found for uri [/typo3cr/_query] and method [DELETE] Request data: {"query":{"bool":{"must":{"ids":{"values":["14d654232cd3e874453eabfe39642f556dfdc1f6"]}},"must_not":{"term":{"_type":"Neos-NodeTypes:Text"}}}}}Exception Code1338976439Exception TypeFlowpack\ElasticSearch\Transfer\ExceptionLog Reference201707041108013cd1faThrown in FileData/Temporary/Development/Cache/Code/Flow_Object_Classes/Flowpack_ElasticSearch_Transfer_Response.phpLine45Original FilePackages/Application/Flowpack.ElasticSearch/Classes/Transfer/Response.php
Is this a wrong config? I don't know how to debug this error.
flowpack/elasticsearch 2.0.1
flowpack/elasticsearch-contentrepositoryadaptor 4.0.4
neos/neos 3.1.2
php 7.1.6