
Comments (5)

essiembre commented on August 22, 2024

A new snapshot release was just made, adding the following config options:

      <connectionTimeout>(milliseconds)</connectionTimeout>
      <socketTimeout>(milliseconds)</socketTimeout>

You can also write the value in plain English instead of milliseconds (e.g., 5 minutes, 30 seconds, etc.).

Please have a try and confirm.

from committer-elasticsearch.

jmrichardson commented on August 22, 2024

Hi, sorry for the delay. I had a long-running collector job in progress and didn't want to cancel it. Unfortunately, I am still getting the same error:

INFO  [AbstractCollectorConfig] Configuration loaded: id=Text Files; logsDir=/home/es/elastic/ingest/norconex/workdir-clients/logs; progressDir=/home/es/elastic/ingest/norconex/workdir-clients/progress
INFO  [JobSuite] JEF work directory is: /home/es/elastic/ingest/norconex/workdir-clients/progress
INFO  [JobSuite] JEF log manager is : FileLogManager
INFO  [JobSuite] JEF job status store is : FileJobStatusStore
INFO  [AbstractCollector] Suite of 1 crawler jobs created.
INFO  [JobSuite] Initialization...
INFO  [JobSuite] Previous execution detected.
INFO  [JobSuite] Backing up previous execution status and log files.
INFO  [JobSuite] Starting execution.
INFO  [AbstractCollector] Version: Norconex Filesystem Collector 2.7.2-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Collector Core 1.9.0-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Importer 2.8.0-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex JEF 4.1.0 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Core 2.1.2-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Elasticsearch 4.1.0-SNAPSHOT (Norconex Inc.)
INFO  [JobSuite] Running WM Search Elastic: BEGIN (Tue Oct 31 16:04:23 EDT 2017)
INFO  [FilesystemCrawler] 0 start paths identified.
INFO  [CrawlerEventManager]           CRAWLER_STARTED
INFO  [AbstractCrawler] WM Search Elastic: Crawling references...
INFO  [AbstractCrawler] WM Search Elastic: Reprocessing any cached/orphan references...
INFO  [AbstractCrawler] WM Search Elastic: Crawler finishing: committing documents.
INFO  [AbstractFileQueueCommitter] Committing 1000 files
INFO  [ElasticsearchCommitter] Sending 50 commit operations to Elasticsearch.
ERROR [ElasticsearchCommitter$1] Failure occured on node: "http://localhost:9200". Check node logs.
INFO  [ElasticsearchCommitter] Elasticsearch RestClient closed.
ERROR [AbstractBatchCommitter] Could not commit batched operations.
com.norconex.committer.core.CommitterException: Could not commit JSON batch to Elasticsearch.
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:534)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
        at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
        at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:472)
        at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
        at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
        at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
        at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
        at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
        at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
        at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
        at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:132)
        at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
        at com.norconex.collector.fs.FilesystemCollector.main(FilesystemCollector.java:76)
Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:660)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:219)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:191)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:525)
        ... 15 more

Here is my config xml:

<fscollector id="Text Files">

  <logsDir>/home/es/elastic/ingest/norconex/workdir-clients/logs</logsDir>
  <progressDir>/home/es/elastic/ingest/norconex/workdir-clients/progress</progressDir>


  <crawlers>
    <crawler id="WM Search Elastic">

      <workDir>/home/es/elastic/ingest/norconex/workdir-clients</workDir>
      <numThreads>1</numThreads>
      <keepDownloads>false</keepDownloads>

      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>wmsearch</indexName>
        <queueDir>/home/es/elastic/ingest/norconex/workdir-clients/commit</queueDir>
        <jsonFieldsPattern>scope</jsonFieldsPattern>
        <connectionTimeout>5 minutes</connectionTimeout>
        <socketTimeout>5 minutes</socketTimeout>
        <typeName>Documents</typeName>
        <commitBatchSize>50</commitBatchSize>
        <maxRetries>1</maxRetries>
      </committer>

    </crawler>
  </crawlers>

</fscollector>

Note that I am only committing the documents remaining in the queue (not crawling), as there were too many to reprocess for a test. It looks like the "5 minutes" timeouts in the above configuration are not being picked up: the error still reports a 30000 ms wait.

Here is how I ran the test:

/home/es/elastic/norconex/collector-fs.sh -a resume -c /home/es/elastic/ingest/norconex/config/config-clients.xml

Let me know if you need more detail. Thanks

essiembre commented on August 22, 2024

Strange, it has been implemented as per the link you provided. This will require a bit more investigation. Do you have other related messages in your Elasticsearch logs?

essiembre commented on August 22, 2024

It turns out I misinterpreted what maxRetryTimeout is for in the ES REST API client. I left it out, wrongly thinking it duplicated the behavior of maxRetryWait, which is common to most Committers. I have now added it, so the latest snapshot release has a new configuration option called <maxRetryTimeout>.

Please give it a try and confirm.
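
For anyone following along, here is a rough sketch of how the committer section from the config posted earlier might look with the new option added. The element name <maxRetryTimeout> comes from the comment above; the value shown and the idea that it accepts the same plain-English durations as <connectionTimeout> and <socketTimeout> are assumptions, so check the committer documentation for the snapshot you are running.

```xml
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <nodes>http://localhost:9200</nodes>
  <indexName>wmsearch</indexName>
  <typeName>Documents</typeName>
  <connectionTimeout>5 minutes</connectionTimeout>
  <socketTimeout>5 minutes</socketTimeout>
  <!-- Assumption: the value format mirrors the two timeouts above.
       This is the setting that replaces the 30000 ms wait seen in the
       "listener timeout" error. -->
  <maxRetryTimeout>5 minutes</maxRetryTimeout>
  <commitBatchSize>50</commitBatchSize>
  <maxRetries>1</maxRetries>
</committer>
```

The connection and socket timeouts alone were not enough in the earlier test, which suggests the retry timeout is a separate limit applied by the REST client on top of them.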

jmrichardson commented on August 22, 2024

Yay! That worked :) Thank you so much. I think I am getting close to having all the issues worked out. I need to purchase some more SSDs because I am running out of space for the indexes. Once they come in, I expect to be able to ingest all the files successfully. I will let you know if I run into any other issues.
