
elasticsearch-river-remote's Introduction

Searchisko

Searchisko is an open source project that lets you quickly build a secured, role-based REST service to index, search, retrieve and aggregate content from heterogeneous sources. It can attribute content to people and projects regardless of where the content originated.

Searchisko is a Java EE 6 application that runs in the JBoss EAP 6 application server to provide a REST API. In the background it uses an Apache Lucene based full-text search engine and a relational database to provide powerful content retrieval, full-text search and aggregation functions.

Searchisko High-level View

Why have we created it?

Initially Searchisko was intended to provide a unified search experience across the multiple applications hosted at jboss.org but over time we realised it could also provide a powerful way to understand how people contribute to upstream projects using these applications and others on the internet.

Documentation

Anyone who would like to use Searchisko or implement a REST client for Searchisko can learn more in the following documentation:

Other resources

License

Copyright 2012 Red Hat Inc. and/or its affiliates and other contributors
as indicated by the @authors tag.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

elasticsearch-river-remote's Issues

River doesn't index each object - just the object list

Hi,

We have a request that the river should visit each object it discovers in the document list, so that we can use the REST river on various REST APIs where the full field list isn't available in the document list.

There is now support for specifying the "id" of an object in the document list, so we would like the river to do another crawl to /{space}/ to index the full object, where all the fields are available to us.
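
The two-step crawl requested here could look roughly like the sketch below. It is only an illustration: `fetch_json`, both URL templates and the `id` field name are assumptions made for the example, not the river's actual API.

```python
from typing import Callable, Dict, List


def crawl_space(
    space: str,
    fetch_json: Callable[[str], object],
    list_url: str = "https://example.org/api/{space}/",
    detail_url: str = "https://example.org/api/{space}/{id}",
    id_field: str = "id",
) -> List[Dict]:
    """Fetch the document list, then fetch each document's full details."""
    listing = fetch_json(list_url.format(space=space))
    full_documents = []
    for item in listing:
        # The list entry only needs to carry the id; every other field
        # comes from the per-document detail call.
        url = detail_url.format(space=space, id=item[id_field])
        full_documents.append(fetch_json(url))
    return full_documents
```

Passing the HTTP call in as `fetch_json` keeps the sketch independent of any particular HTTP client.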

Is this something you can consider, and if so what timeframe do you estimate for a new release?

  • Henrik

How to handle incorrect data from remote system

Hi Vlastimil,

I have a question regarding the behavior of the river when indexing data from the remote system fails, in our case due to invalid data types (we get a number for a field that is defined as a string). Currently it looks like the river aborts indexing when this happens. Is this as designed? Would it be possible to make this behavior configurable, enabling a mode where the river ignores objects with incorrect values and continues indexing, preferably logging the error?
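
To make the requested mode concrete, here is a minimal sketch of a "skip and log" loop. The `skip_invalid` flag, the `index_one` callback and the choice of exception types are hypothetical and only stand in for whatever the river does internally.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger("river.sketch")


def index_all(
    documents: Iterable[Dict],
    index_one: Callable[[Dict], None],
    skip_invalid: bool = True,
) -> Tuple[int, List[str]]:
    """Index documents; optionally skip ones that fail instead of aborting."""
    indexed = 0
    skipped = []
    for doc in documents:
        try:
            index_one(doc)
            indexed += 1
        except (TypeError, ValueError) as exc:
            if not skip_invalid:
                raise  # abort the whole run, as described above
            logger.warning("Skipping document %r: %s", doc.get("id"), exc)
            skipped.append(str(doc.get("id")))
    return indexed, skipped
```

Returning the skipped ids alongside the count would let an activity log report partial success rather than a plain error.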

Regards,
Martin Torgersen

Issue: 403 Error while creating river

I had been receiving a "Peer Not Authenticated" error until I installed the certificate chain from the target. Now I just get a 403. How does the river format the username and password?

I am receiving the following error on River:

Failed remote system REST API call to the url 'https://seadevice4200.company.com/api/1.0/buildings/';. HTTP error code: 403 Response body: {"detail": "You do not have permission to access this resource. You may need to login or otherwise authenticate the request."}

This is a working curl:
curl -i -H "Accept: application/json" -X GET -u 'username:password' https://seadevice4200.company.com/api/1.0/devices/ --insecure

sample: part of river conf
"remote" : {
"urlGetDocuments" : "https://seadevice4200.company.com/api/1.0/{space}/",
"timeout" : "5s",
"username" : "username",
"password" : "password",
"spacesIndexed" : "buildings",
"spaceKeysExcluded" : "",
"indexUpdatePeriod" : "1m",
"indexFullUpdatePeriod" : "1h",
"simpleGetDocuments" : "true",

"maxIndexingThreads" : 2
}

Logs:
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << HTTP/1.1 403 FORBIDDEN
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Server: nginx/1.1.19
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Date: Tue, 22 Apr 2014 23:00:56 GMT
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Content-Type: application/json
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Transfer-Encoding: chunked
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Connection: keep-alive
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Vary: Authenticate, Accept, Cookie
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Allow: GET, POST, HEAD, OPTIONS
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.headers ] << Set-Cookie: d42sessnid=c27ee673a281a611241fff56ed84e8be; httponly; Path=/
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.client.protocol.ResponseProcessCookies] Cookie accepted: "[version: 0][name: d42sessnid][value: c27ee673a281a611241fff56ed84e8be][domain: seadevice4200.company.com][path: /][expiry: null]".
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.impl.client.DefaultHttpClient] Connection can be kept alive indefinitely
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.wire ] << "7e[\r][\n]"
[2014-04-22 15:56:30,301][DEBUG][org.apache.http.wire ] << "{"detail": "You do not have permission to access this resource. You may need to login or otherwise authenticate the request."}"

Authentication is failing.
Error in River:
{
"_index": "remote_river_activity",
"_type": "remote_river_indexupdate",
"_id": "xVTyaAKpRJSoG_Mm6asohg",
"_score": null,
"_source": {
"space_key": "devices",
"update_type": "FULL",
"start_date": "2014-04-22T23:07:48.851Z",
"documents_updated": 0,
"documents_deleted": 0,
"result": "ERROR",
"time_elapsed": "158ms",
"error_message": "Failed remote system REST API call to the url 'https://seadevice4200.company.com/api/1.0/devices/'. HTTP error code: 403 Response body: {"detail": "You do not have permission to access this resource. You may need to login or otherwise authenticate the request."}"
},
"sort": [
null
]
}

Issue with remote river creation

#18 & #20 Linked references to this new issue.

I installed the latest plugin 1.3.3 - The previous error of remote_field_updated is now gone.

After retrying to create the river:
The latest error was:
CreationException[Guice creation errors:

  1. Error injecting constructor, org.elasticsearch.common.settings.SettingsException: Value must be provided for 'index/fields' configuration!
    at org.jboss.elasticsearch.river.remote.RemoteRiver.<init>(Unknown Source)
    while locating org.jboss.elasticsearch.river.remote.RemoteRiver
    while locating org.elasticsearch.river.River

1 error]; nested: SettingsException[Value must be provided for 'index/fields' configuration!];

So I tried adding "fields" : "*" under "index", and now I receive the following error:
CreationException[Guice creation errors:

  1. Error injecting constructor, java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
    at org.jboss.elasticsearch.river.remote.RemoteRiver.<init>(Unknown Source)
    while locating org.jboss.elasticsearch.river.remote.RemoteRiver
    while locating org.elasticsearch.river.River

1 error]; nested: ClassCastException[java.lang.String cannot be cast to java.util.Map];

Current config:
{
"type" : "remote",
"remote" : {
"urlGetDocuments" : "http://docs.appdynamics.com/download/attachments/20187207/REST_WildCardBT_metric-dataJSON.txt?version=1&modificationDate=1394226069000&api=v2",
"username" :"",
"password" :"",
"timeout" : "5s",
"spacesIndexed" : "MAIN",
"spaceKeysExcluded" : "",
"indexUpdatePeriod" : "1m",
"indexFullUpdatePeriod" : "1h",
"simpleGetDocuments" : "true",
"maxIndexingThreads" : 2
},
"index" : {
"index" : "my_remote_index",
"type" : "remote_document",
"remote_field_document_id" : "metricPath",
"fields" : "*"
},
"activity_log": {
"index" : "remote_river_activity",
"type" : "remote_river_indexupdate"
}
}

Am I missing something in the config?
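
The ClassCastException says that a String was found where a Map was expected, and other river configurations for this plugin define index/fields as a JSON object whose entries each carry a "remote_field". A pre-flight check along those lines might look like this sketch. The function name and the exact rules are assumptions for illustration, not part of the plugin.

```python
from typing import Dict, List


def check_fields(river_config: Dict) -> List[str]:
    """Return a list of problems found in the index/fields section."""
    problems = []
    fields = river_config.get("index", {}).get("fields")
    if fields is None:
        problems.append("index/fields is missing")
    elif not isinstance(fields, dict):
        problems.append(
            f"index/fields must be an object, got {type(fields).__name__}"
        )
    else:
        for name, spec in fields.items():
            # Assumed rule: every field entry names the remote field it reads.
            if not isinstance(spec, dict) or "remote_field" not in spec:
                problems.append(f"index/fields/{name} needs a 'remote_field' entry")
    return problems
```

Run against the config above, this check would flag "fields" : "*" as a string rather than an object.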

startAtIndex issue

We are trying to implement pagination by using {startAtIndex}. The API JSON includes the total count, and the field is registered in the river configuration too. It looks like startAtIndex is always 0.
Are we doing something wrong?

River :
....
"remote": {
"urlGetDocuments": "http://server_api_url/{space}?skip={startAtIndex}",
"urlGetDocumentDetails": "http://server_api_url/{space}/{id}",
"getDocsResFieldDocuments": "documents",
"getDocsResFieldTotalcount" : "total",
"timeout": "30s",
"indexUpdatePeriod": "10m",
"indexFullUpdatePeriod": "30m",
"spacesIndexed": "article",
"maxIndexingThreads": 5
},
"index": {
"index": "article",
"type": "techno",
"field_document_id": "_id",
"remote_field_document_id": "_id",
"field_space_key": "article",
"remote_field_updated": "updated",
"field_updated": "updated",
"fields": {...}
}...

Documents list:
{"documents":[{"_id":"524081f9b8cb77df150006b6","updated":"2010-11-29T14:05:59.000Z","title":"Test 1"},{"_id":"524081f9b8cb77df15000885","updated":"2010-12-03T10:48:24.000Z","title":"test 2"}],"count":2,"total":50}

Document detail :
{"_id":"524081f9b8cb77df150006b6","provider":"DNT","created":"2010-11-29T14:05:59.000Z","updated":"2010-11-29T14:05:59.000Z", "content":"........"}

API supports pagination by using skip parameter.

We hope that you can help us.

Regards,
Martin and Oleg

Authentication 401 Error on River

Authentication is failing on the river:
I was able to make the request using curl:

curl --user username@account:password "https://companytemp.saas.appdynamics.com/controller/rest/applications/Service%20Management%20-%20SEA%20PR/metric-data?metric-path=Overall%20Application%20Performance%7CAverage%20Response%20Time%20%28ms%29&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json" -k

####### output

[{
"frequency": "ONE_MIN",
"metricPath": "Overall Application Performance|Average Response Time (ms)",
"metricValues": [ {
"current": 234,
"max": 274717,
"min": 0,
"startTimeInMillis": 1397762100000,
"value": 238
}]
}]

####### end output

I used the index/username and password fields in the river.

I tested two ways in the config:

  1. "username" : "username@account ",
     "password" : "password",
    
  2. "username" : "username@account:password ",
     "password" : "",
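
For reference, HTTP Basic authentication sends the Base64 encoding of "username:password" in the Authorization header, which is what curl --user does. The sketch below shows only that encoding. It makes no claim about how the river builds its request, and the function name is made up for the example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"
```

Any whitespace inside the quotes is encoded as part of the credentials, so a trailing space in a username value, as in both variants above, produces a different header than the curl call.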
    
####### ElasticSearch Log

407][DEBUG][org.apache.http.client.protocol.RequestAddCookies] CookieSpec selected: best-match
407][DEBUG][org.apache.http.client.protocol.RequestAuthCache] Re-using cached 'basic' auth scheme for https://companytemp.saas.appdynamics.com:443
408][DEBUG][org.apache.http.client.protocol.RequestAuthCache] No credentials for preemptive authentication
408][DEBUG][org.apache.http.client.protocol.RequestTargetAuthentication] Target auth state: UNCHALLENGED
408][DEBUG][org.apache.http.client.protocol.RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
408][DEBUG][org.apache.http.impl.client.DefaultHttpClient] Attempt 1 to execute request
409][DEBUG][org.apache.http.impl.conn.DefaultClientConnection] Sending request: GET /controller/rest/applications/Service%20Management%20-%20SEA%20PR/metric-data?metric-path=Overall%20Application%20Performance%7CAverage%20Response%20Time%20%28ms%29&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json HTTP/1.1
410][DEBUG][org.apache.http.wire ] >> "GET /controller/rest/applications/Service%20Management%20-%20SEA%20PR/metric-data?metric-path=Overall%20Application%20Performance%7CAverage%20Response%20Time%20%28ms%29&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json HTTP/1.1[\r][\n]"
410][DEBUG][org.apache.http.wire ] >> "Accept: application/json[\r][\n]"
411][DEBUG][org.apache.http.wire ] >> "Host: companytemp.saas.appdynamics.com[\r][\n]"
411][DEBUG][org.apache.http.wire ] >> "Connection: Keep-Alive[\r][\n]"
411][DEBUG][org.apache.http.wire ] >> "User-Agent: Apache-HttpClient/4.2.3 (java 1.5)[\r][\n]"
411][DEBUG][org.apache.http.wire ] >> "[\r][\n]"
411][DEBUG][org.apache.http.headers ] >> GET /controller/rest/applications/Service%20Management%20-%20SEA%20PR/metric-data?metric-path=Overall%20Application%20Performance%7CAverage%20Response%20Time%20%28ms%29&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json HTTP/1.1
412][DEBUG][org.apache.http.headers ] >> Accept: application/json
412][DEBUG][org.apache.http.headers ] >> Host: companytemp.saas.appdynamics.com
413][DEBUG][org.apache.http.headers ] >> Connection: Keep-Alive
413][DEBUG][org.apache.http.headers ] >> User-Agent: Apache-HttpClient/4.2.3 (java 1.5)
467][DEBUG][org.apache.http.wire ] << "HTTP/1.1 401 Unauthorized[\r][\n]"
467][DEBUG][org.apache.http.wire ] << "X-Powered-By: Servlet/3.0 JSP/2.2 (GlassFish Server Open Source Edition 3.1.2.2 Java/Sun Microsystems Inc./1.6)[\r][\n]"
468][DEBUG][org.apache.http.wire ] << "Server: GlassFish Server Open Source Edition 3.1.2.2[\r][\n]"
468][DEBUG][org.apache.http.wire ] << "Pragma: No-cache[\r][\n]"
468][DEBUG][org.apache.http.wire ] << "Cache-Control: no-cache[\r][\n]"
469][DEBUG][org.apache.http.wire ] << "Expires: Wed, 31 Dec 1969 16:00:00 PST[\r][\n]"
469][DEBUG][org.apache.http.wire ] << "WWW-Authenticate: Basic realm="controller_realm"[\r][\n]"
469][DEBUG][org.apache.http.wire ] << "Content-Type: text/html[\r][\n]"
469][DEBUG][org.apache.http.wire ] << "Content-Length: 1073[\r][\n]"
470][DEBUG][org.apache.http.wire ] << "Date: Thu, 17 Apr 2014 20:40:07 GMT[\r][\n]"
470][DEBUG][org.apache.http.wire ] << "X-Varnish: 1905701652[\r][\n]"
470][DEBUG][org.apache.http.wire ] << "Age: 0[\r][\n]"
470][DEBUG][org.apache.http.wire ] << "Via: 1.1 varnish[\r][\n]"
471][DEBUG][org.apache.http.wire ] << "Connection: keep-alive[\r][\n]"
471][DEBUG][org.apache.http.wire ] << "[\r][\n]"
471][DEBUG][org.apache.http.impl.conn.DefaultClientConnection] Receiving response: HTTP/1.1 401 Unauthorized
471][DEBUG][org.apache.http.headers ] << HTTP/1.1 401 Unauthorized
472][DEBUG][org.apache.http.headers ] << X-Powered-By: Servlet/3.0 JSP/2.2 (GlassFish Server Open Source Edition 3.1.2.2 Java/Sun Microsystems Inc./1.6)
472][DEBUG][org.apache.http.headers ] << Server: GlassFish Server Open Source Edition 3.1.2.2
472][DEBUG][org.apache.http.headers ] << Pragma: No-cache
472][DEBUG][org.apache.http.headers ] << Cache-Control: no-cache
472][DEBUG][org.apache.http.headers ] << Expires: Wed, 31 Dec 1969 16:00:00 PST
473][DEBUG][org.apache.http.headers ] << WWW-Authenticate: Basic realm="controller_realm"
473][DEBUG][org.apache.http.headers ] << Content-Type: text/html
473][DEBUG][org.apache.http.headers ] << Content-Length: 1073
473][DEBUG][org.apache.http.headers ] << Date: Thu, 17 Apr 2014 20:40:07 GMT
473][DEBUG][org.apache.http.headers ] << X-Varnish: 1905701652
474][DEBUG][org.apache.http.headers ] << Age: 0
474][DEBUG][org.apache.http.headers ] << Via: 1.1 varnish
474][DEBUG][org.apache.http.headers ] << Connection: keep-alive
474][DEBUG][org.apache.http.impl.client.DefaultHttpClient] Connection can be kept alive indefinitely
475][DEBUG][org.apache.http.impl.client.DefaultHttpClient] Authentication required
475][DEBUG][org.apache.http.impl.client.DefaultHttpClient] companytemp.saas.appdynamics.com:443 requested authentication
475][DEBUG][org.apache.http.impl.client.TargetAuthenticationStrategy] Authentication schemes in the order of preference: [negotiate, Kerberos, NTLM, Digest, Basic]
475][DEBUG][org.apache.http.impl.client.TargetAuthenticationStrategy] Challenge for negotiate authentication scheme not available
476][DEBUG][org.apache.http.impl.client.TargetAuthenticationStrategy] Challenge for Kerberos authentication scheme not available
476][DEBUG][org.apache.http.impl.client.TargetAuthenticationStrategy] Challenge for NTLM authentication scheme not available
476][DEBUG][org.apache.http.impl.client.TargetAuthenticationStrategy] Challenge for Digest authentication scheme not available

Use of file with list of URLs

Is it possible to provide a file in the "urlGetDocuments" field, instead of a URL string?
The file would contain the list of URLs, for example a sitemap or a CSV file.

Config for additional Documents - how to configure river to create additional/incremental _id

I currently have a config that pulls data from a remote API. This returns a single document with a specified _id.
Is it possible to have the river create additional/incremental _id values with the same config?
index/remote_field_document_id seems to be mandatory.

I would like to keep the trending data returned over time.

Currently it updates the document in ES and increases the _version.
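
One way to picture what is being asked for: if each sample got an id built from the metric path plus its start time, new samples would no longer overwrite the previous document. The sketch below shows that idea outside the river. The id scheme and the function are hypothetical, and nothing here claims the river supports it.

```python
from typing import Dict, List


def sample_documents(metric: Dict) -> List[Dict]:
    """Split one metric record into one document per sample, each with its own id."""
    docs = []
    for sample in metric.get("metricValues", []):
        doc = {"metricPath": metric["metricPath"], **sample}
        # Hypothetical id scheme: metric path plus the sample's start time.
        doc["_id"] = f'{metric["metricPath"]}@{sample["startTimeInMillis"]}'
        docs.append(doc)
    return docs
```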

Data example - indexed in ES:
"current": [
0
],
"min": [
0
],
"max": [
540
],
"startTimeInMillis": [
1410362460000
],
"count": [
734506
],
"sum": [
285910
]

CONFIG:
{
"type" : "remote",
"remote" : {
"urlGetDocuments" : "https://company.api.org/metric-data?metric-path=1230_1331&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json",
"timeout" : "5s",
"spacesIndexed" : "MAIN",
"username" : "username@domain",
"pwd" : "passw0rd",
"spaceKeysExcluded" : "",
"indexUpdatePeriod" : "1m",
"indexFullUpdatePeriod" : "0",
"simpleGetDocuments" : "true",
"maxIndexingThreads" : 2
},
"index" : {
"index" : "metrics",
"type" : "remote_api_test",
"remote_field_document_id" : "metricPath",
"fields" : {
"frequency" : {"remote_field" : "frequency"},
"metricPath" : {"remote_field" : "metricPath"},
"count" : {"remote_field" : "metricValues.count"},
"current" : {"remote_field" : "metricValues.current"},
"max" : {"remote_field" : "metricValues.max"},
"min" : {"remote_field" : "metricValues.min"},
"occurences" : {"remote_field" : "metricValues.occurences"},
"standardDeviation" : {"remote_field" : "metricValues.standardDeviation"},
"sum" : {"remote_field" : "metricValues.sum"},
"startTimeInMillis" : {"remote_field" : "metricValues.startTimeInMillis"},
"value" : {"remote_field" : "metricValues.value"}
}

},
"activity_log": {
    "index" : "remote_river_activity",
    "type"  : "remote_river_indexupdate"
}

}

Authentication - SSL verification chain

Is there a way to relax SSL verification in the config? For example, I have a river failing due to the SSL certificate chain.
I can test using curl with -k or --basic. This works, but other methods fail due to the certificate.

Improve error handling for org.apache.http.client.ClientProtocolException

When org.apache.http.client.ClientProtocolException is thrown, a null message appears in the log.
This exception is a subclass of IOException; in our case its cause is, for example, org.apache.http.client.CircularRedirectException: Circular redirect to 'http://www.jboss.org/products/datavirt/overview/'.

Support for offset parameter to objectlist url

Is it possible to add support for an offset parameter to the urlGetDocuments for traversing a long list of documents? Is there any other way to fetch thousands of documents (e.g. for initial indexing)?

"urlGetDocuments": "https://system.org/rest/document?docSpace={space}&docUpdatedAfter={updatedAfter}&listOffset={offset}"
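
As an illustration of how such a template could be filled in, here is a small sketch that substitutes URL-encoded values for the placeholders. The `{offset}` placeholder is the one proposed in this request, and `expand_url` is a made-up helper, not a river function.

```python
from urllib.parse import quote


def expand_url(template: str, **params: object) -> str:
    """Replace {name} placeholders with URL-encoded values."""
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return template.format(**encoded)
```

Calling it repeatedly with a growing offset value would walk through a long list page by page.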

Allow to trigger indexing immediately over management REST API

Allow indexing of one space or all spaces to be triggered immediately through a management REST API operation. The current implementation of the "Force full index update" management operation does not run indexing immediately; it schedules it for the next time an incremental update should be performed. So we have to change this behaviour to run indexing ASAP.

River is unable to complete indexing when API returns a lot of 404's

Hi Vlastimil,

We are using version 1.3.1 of the river and have encountered a situation where it seems to get stuck and is unable to complete the indexing. This happens when the API returns a lot of 404's.

Here's what we see in the log:

[2014-04-05 22:34:33,736][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to ask remote system for updated documents for space bilder with startAt 0 and updated after Mon Mar 31 11:42:21 GMT 2014
[2014-04-05 22:34:34,181][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to update index for document 5242a068f92e7d7112034542 with updated Mon Mar 31 11:42:21 GMT 2014
[2014-04-05 22:34:34,422][WARN ][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Document '531f5cf90a180037350002d0' details not found on server, so skip it: org.jboss.elasticsearch.river.remote.GetJSONClient$RestCallHttpException: Failed remote system REST API call to the url 'http://api.nasjonalturbase.no/bilder/531f5cf90a180037350002d0'. HTTP error code: 404 Response body: {"message":"Objekt ikke funnet"}
(...18 more 404's)
[2014-04-05 22:34:39,561][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to ask remote system for updated documents for space bilder with startAt 20 and updated after Mon Mar 31 11:42:21 GMT 2014
(...20 404's)
[2014-04-05 22:34:44,525][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to ask remote system for updated documents for space bilder with startAt 0 and updated after Mon Mar 31 11:42:21 GMT 2014
[2014-04-05 22:34:44,971][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to update index for document 5242a068f92e7d7112034542 with updated Mon Mar 31 11:42:21 GMT 2014
[2014-04-05 22:34:34,422][WARN ][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Document '531f5cf90a180037350002d0' details not found on server, so skip it: org.jboss.elasticsearch.river.remote.GetJSONClient$RestCallHttpException: Failed remote system REST API call to the url 'http://api.nasjonalturbase.no/bilder/531f5cf90a180037350002d0'. HTTP error code: 404 Response body: {"message":"Objekt ikke funnet"}
(...18 more 404's)
[2014-04-05 22:34:49,595][DEBUG][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Go to ask remote system for updated documents for space bilder with startAt 20 and updated after Mon Mar 31 11:42:21 GMT 2014
(...20 404's)
(etc)

This then goes on forever. Looks like it happens when all the items between each fetch return 404? This might have been fixed in 1.3.2 though, haven't had the chance to test.

Cheers,
Martin

Error on REST Get Documents

I'm receiving an error on the river:

Authentication is working.

[2014-04-23 15:21:43,645][ERROR][org.jboss.elasticsearch.river.remote.SpaceByLastUpdateTimestampIndexer] Failed full update for Space subnets due: Get Documents REST response structure is invalid [B@7c3028ff

The following curl command and its response:
curl -X GET -u 'username:password' https://seadevice4200.company.com/api/1.0/subnets/ --insecure

{
"subnets": [
{
"range_end": "1.1.1.2",
"network": "1.1.1.0",
"vrf_group_id": null,
"subnet_id": 120,
"range_begin": "1.1.1.1",
"mask_bits": 30,
"number": null,
"name": null,
"parent_subnet_id": null,
"parent_vlan_id": null,
"notes": null,
"customer_id": null,
"gateway": null,
"description": null
},
{
"range_end": "1.1.1.42",
"network": "1.1.1.40",
"vrf_group_id": null,
"subnet_id": 124,
"range_begin": "1.1.1.41",
"mask_bits": 30,
"number": null,
"name": null,
"parent_subnet_id": null,
"parent_vlan_id": null,
"notes": null,
"customer_id": null,
"gateway": null,
"description": null
}
]
}

### Mapping used: /device4200/subnets/_mapping

{
"subnets" : {
"_timestamp" : { "enabled" : true },
"properties" : {
"space_key" : {"type" : "string", "analyzer" : "keyword"},
"source" : {"type" : "string", "analyzer" : "keyword"}
}
}
}

# River Config:

{
"type" : "remote",
"remote" : {
"urlGetDocuments" : "https://seadevice4200.company.com/api/1.0/{space}/?format=json",
"timeout" : "5s",
"username" : "username",
"password" : "passwords",
"spacesIndexed" : "subnets",
"spaceKeysExcluded" : "",
"indexUpdatePeriod" : "1m",
"indexFullUpdatePeriod" : "1h",
"simpleGetDocuments" : "true",
"maxIndexingThreads" : 2
},
"index" : {
"index" : "device42",
"type" : "subnets",
"remote_field_document_id" : "subnets.network",
"fields" : {
"range_end": {"remote_field" : "subnets.room"},
"network": {"remote_field" : "subnets.room"},
"vrf_group_id": {"remote_field" : "subnets.room"},
"subnet_id": {"remote_field" : "subnets.room"},
"range_begin": {"remote_field" : "subnets.room"},
"mask_bits": {"remote_field" : "subnets.room"},
"number": {"remote_field" : "subnets.room"},
"name": {"remote_field" : "subnets.room"},
"parent_subnet_id": {"remote_field" : "subnets.room"},
"parent_vlan_id": {"remote_field" : "subnets.room"},
"notes": {"remote_field" : "subnets.room"},
"customer_id": {"remote_field" : "subnets.room"},
"gateway": {"remote_field" : "subnets.room"},
"description": {"remote_field" : "subnets.room"}
}
},
"activity_log": {
"index" : "remote_river_activity",
"type" : "remote_river_indexupdate"
}
}
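
The curl response above is a JSON object whose "subnets" field holds the array of documents, rather than a bare array. Another configuration in this issue list uses a "getDocsResFieldDocuments" setting to name such a field. Whether that setting resolves this particular error is not confirmed here; the sketch below only illustrates the difference between the two response shapes, using a hypothetical helper.

```python
from typing import Dict, List, Optional, Union


def extract_documents(
    response: Union[List, Dict], documents_field: Optional[str] = None
) -> List[Dict]:
    """Return the list of documents contained in a list-call response."""
    if documents_field is not None:
        if not isinstance(response, dict) or documents_field not in response:
            raise ValueError(f"response has no '{documents_field}' field")
        response = response[documents_field]
    if not isinstance(response, list):
        # A wrapped response read without naming the wrapping field ends up here.
        raise ValueError("response structure is invalid: expected a list of documents")
    return response
```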

Allow to index data from unsorted paginated REST API list

The river can currently obtain data from two types of REST API list operations (see https://github.com/searchisko/elasticsearch-river-remote#list-documents):

  • list ordered and filtered by last modification date
  • simple list with all items in one call without pagination support

We should add a new mode that obtains data from a list REST API with pagination support.
It simply calls the REST API, processes all returned items, then calls the REST API again with startAtIndex increased by the number of processed items. Processing stops when the returned list is empty.
Incremental updates are not possible for this type of API, so only full updates will be performed.
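
The proposed loop can be sketched as follows. `fetch_page` is a stand-in for the REST call and is an assumption of the example. Only the stepping rule (advance startAtIndex by the number of processed items, stop on an empty list) comes from the description above.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_documents(
    fetch_page: Callable[[int], List[Dict]],
) -> Iterator[Dict]:
    """Yield every document from a paginated list, stopping on an empty page."""
    start_at_index = 0
    while True:
        page = fetch_page(start_at_index)
        if not page:
            break
        for doc in page:
            yield doc
        # Advance by the number of items actually processed, not a fixed page size.
        start_at_index += len(page)
```

Because the step is the size of the page actually returned, the loop does not need to know the server's page size in advance.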

Authentication 401 Error on River from #29

Sorry to open a duplicate issue. I wasn't able to re-open the existing one (#29).

I'm back to this issue.
ES - 1.3.2 + Remote_plugin 1.5.0
I have tested forcing basic auth in curl (--basic), which works correctly.

FYI: I am working on a clean system with no old _river data.
I was able to connect to the open API test endpoint. This one does not require authentication. This proved the plugin works... {{ http://docs.appdynamics.com/download/attachments/20187207/REST_WildCardBT_metric-dataJSON.txt?version=1&modificationDate=1394226069000&api=v2 }}

The error has not changed and no other details are available in the logs...

error_message: "Failed remote system HTTP GET request to the url 'https://company.saas.appdynamics.com/controller/rest/applications/SEAP%20-%20CE%20-%20Production%20-%201/metric-data?metric-path=Backends%7CDefault%20Web%20Site/ClaqServices%7CAverage%20Response%20Time%20%28ms%29&time-range-type=BEFORE_NOW&duration-in-mins=15&output=json'. HTTP error code: 401 Response body:

GlassFish Server Open Source Edition 3.1.2.2 - Error report
HTTP Status 401 -

type Status report

message

description This request requires HTTP authentication ().

Creation Exception - related to issue #44 - how to configure river to create additional/incremental _id

CreationException[Guice creation errors:

  1. Error injecting constructor, java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.util.Map
    at org.jboss.elasticsearch.river.remote.RemoteRiver.<init>(Unknown Source)
    while locating org.jboss.elasticsearch.river.remote.RemoteRiver
    while locating org.elasticsearch.river.River

1 error]; nested: ClassCastException[java.util.ArrayList cannot be cast to java.util.Map];

Feature request: perform full update at a given time of the day

The operators of the remote endpoint we are indexing would prefer that we do the full update at night (around 2 AM). We have tried setting indexFullUpdatePeriod to 24H, but then the next full update depends on when the river last completed its full update.

Would it be possible to have an indexFullUpdateTime option where you could specify the time of day when the full update should happen?
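
The scheduling rule behind such an option is small. The sketch below computes the next occurrence of a fixed wall-clock time. The function name is invented for illustration, and `indexFullUpdateTime` itself is only the option proposed in this request.

```python
from datetime import datetime, time, timedelta


def next_full_update(now: datetime, at: time) -> datetime:
    """Return the next moment at or after `now` when the clock reads `at`."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate < now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

Unlike a fixed 24-hour period, this result does not drift with the time at which the previous full update finished.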
