Comments (14)

essiembre commented on August 22, 2024

Hello @jacksonp2008,

You are correct that "typeName" is no longer supported, in line with Elasticsearch's evolving API.

Which version of Elasticsearch are you using? Support for type has been deprecated in Elasticsearch 6.x and effectively removed starting from Elasticsearch 7.x.

You may consider upgrading your version of Elasticsearch.
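
For illustration, here is a minimal sketch of what a committer block might look like against an Elasticsearch 7.x target, where no type is specified. The element names match those used elsewhere in this thread; the node URL is a placeholder (an assumption, not a real endpoint), and the exact set of supported options should be checked against the committer documentation.

```xml
<!-- Sketch only: committer targeting Elasticsearch 7.x, so no <typeName>. -->
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <!-- Placeholder URL (assumption); replace with your own cluster endpoint. -->
  <nodes>https://localhost:9200</nodes>
  <indexName>docs</indexName>
  <targetContentField>fs_content</targetContentField>
</committer>
```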

You can find the Committer version 5 documentation in the JavaDoc.

from committer-elasticsearch.

jacksonp2008 commented on August 22, 2024

Version 6.3 via AWS. (6.8 is the latest they offer) Can I use an older committer with 3.0.0?

essiembre commented on August 22, 2024

Which zone are you in? Last time I checked, AWS Elasticsearch service was offering up to 7.9, as described here.

Unfortunately, older committers are not compatible with 3.0.0.

We can make it a feature request to support version 6.x of Elasticsearch but you can likely upgrade Elasticsearch faster.

jacksonp2008 commented on August 22, 2024

Unfortunately I can't easily upgrade beyond 6 as there are a lot of tools using ES right now and we would have to do a lot of testing. I may have to find another way using some of your previous recommendations from Norconex/crawlers#739

I'll try the PhantomJS approach next. Thank you, Pascal.

essiembre commented on August 22, 2024

I created a new snapshot release of Elasticsearch Committer V5 (which works with the HTTP Collector V3 stack) that reintroduces typeName for backward compatibility. If you want to go back to trying popular browsers for crawling, please try this snapshot release and confirm whether it works for you.

jacksonp2008 commented on August 22, 2024

Alright I am trying this and it runs, but there are some issues I see:

  1. There doesn't appear to be a field which contains the "title" of the page
  2. RenameTagger doesn't seem to be renaming (for example, document.reference to fs_reference)
  3. CurrentDateTagger doesn't seem to be setting @timestamp
  4. ConstantTagger doesn't seem to be setting search_title
  5. The document content doesn't seem to show up anywhere, should be in fs_content

Here is the current config for completeness.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">

  <!-- Decide where to store generated files. -->
  <workDir>./forescout/docs/docs-output</workDir>

  <crawlers>
    <!-- you can have multiple crawlers -->
    <crawler id="FS-docs-Crawler">
      <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
        <url>https://docs.forescout.com</url>
      </startURLs>

      <robotsTxt ignore="true"/>

      <!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
      <maxDepth>24</maxDepth>

      <sitemapResolver ignore="false"/>

      <!-- Be as nice as you can to sites you crawl. -->
      <delay default="500"/>

      <!-- Document Filtering -->
      <documentFilters>
        <filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
          jpg,jpeg,gif,png
        </filter>
      </documentFilters>

      <!-- Document importing -->
      <importer>

        <preParseHandlers>
          <!-- Pre parsing taggers can go here -->
          <!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
          <handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>

        </preParseHandlers>

        <postParseHandlers>
          <!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <restrictTo caseSensitive="false"
                      field="title">
              </restrictTo>
              <rename fromField="title" toField="fs_title" overwrite="true" />
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <restrictTo caseSensitive="false"
                      field="document.reference">
              </restrictTo>
              <rename fromField="document.reference" toField="fs_reference" overwrite="true" />
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
            field="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

          <!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need
          <handler class="KeepOnlyTagger">
            <fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher>
          </handler>
        -->
        </postParseHandlers>
      </importer>

      <!-- Decide what to do with your files by specifying a Committer. -->
      <committers>
        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
          <!-- elastic dev site -->
          <nodes>https://search-sesasdfsafsfdsadfsafasfsafasfsafsdf1.es.amazonaws.com:443</nodes>
          <indexName>docs</indexName>
          <typeName>docs</typeName>
          <targetContentField>fs_content</targetContentField>
          <fixBadIds>true</fixBadIds>
        </committer>
      </committers>

    </crawler>
  </crawlers>
</httpcollector>


Here is a record as shown in Kibana

{
  "_index": "docs",
  "_type": "docs",
  "_id": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html",
  "_version": 1,
  "_score": null,
  "_source": {
    "fs_content": "\n \n\n \n",
    "document.contentFamily": "html",
    "Server": "cloudflare",
    "collector.sitemap-changefreq": "daily",
    "Content-Location": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html",
    "document.reference": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "no-referrer-when-downgrade",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Content-Security-Policy": "frame-ancestors 'self'",
    "collector.is-crawl-new": "true",
    "Content-Encoding": "UTF-8",
    "collector.http-fetcher": "com.norconex.collector.http.fetch.impl.GenericHttpFetcher",
    "collector.depth": "0",
    "X-XSS-Protection": "1; mode=block",
    "Content-Length": "6434",
    "Content-Type": "text/html; charset=UTF-8",
    "cf-request-id": "08fdad0fc90000cf686b3a0000000001",
    "Transfer-Encoding": "chunked",
    "X-Parsed-By": [
      "org.apache.tika.parser.DefaultParser",
      "org.apache.tika.parser.html.HtmlParser"
    ],
    "collector.sitemap-priority": "0.5",
    "CF-RAY": "6342e45fa8ebcf68-IAD",
    "X-Content-Type-Options": "nosniff",
    "Connection": "keep-alive",
    "collector.sitemap-lastmod": "2020-12-12T00:00Z",
    "document.contentEncoding": "UTF-8",
    "X-Content-Security-Policy": "frame-ancestors 'self'",
    "Date": "Mon, 22 Mar 2021 22:35:15 GMT",
    "X-WebKit-CSP": "frame-ancestors 'self'",
    "CF-Cache-Status": "DYNAMIC",
    "viewport": "width=device-width, initial-scale=1, shrink-to-fit=no",
    "document.contentType": "text/html",
    "Content-Language": "en",
    "Expect-CT": "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
  },
  "fields": {
    "collector.sitemap-lastmod": [
      "2020-12-12T00:00:00.000Z"
    ]
  },
  "sort": [
    1607731200000
  ]
}

essiembre commented on August 22, 2024

Hello @jacksonp2008, I am surprised it worked at all for you since you have XML configuration syntax errors. V3 is not a straight replacement for V2. There were some changes in the config. You should find exceptions telling you so when you try to launch, like:

...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: toField. Update your XML configuration accordingly.
...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: fieldMatcher. Update your XML configuration accordingly.
...

Once adapted to V3, the affected handlers should look like this:

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <rename toField="fs_title" onSet="replace">
                <fieldMatcher>title</fieldMatcher>
              </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <rename toField="fs_reference" onSet="replace">
                <fieldMatcher>document.reference</fieldMatcher>
              </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
            toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

I was able to run your config after making these changes. I can confirm getting all the values you mention as expected.

The title field has no value because none of the documents I quickly checked contain a title. It is likely generated via JavaScript.

It seems you are ready for the next step: crawling with a browser using WebDriverHttpFetcher.

jacksonp2008 commented on August 22, 2024

Alright, thanks again Pascal, will give this a try.

forescout-spollock commented on August 22, 2024

Something doesn't compute. I downloaded chromedriver and added it to the config as shown below. I tried placing the <fetcher> element under <httpcollector>, then under <crawler>, then under <importer>, and it remains unhappy in every case.

Made the handler changes you mentioned as well:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">
  
  <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
    <browser>chrome</browser>
    <driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
    <restrictions>
      <restrictTo field="document.reference">
        .*dynamic.*$
      </restrictTo>
    </restrictions>
  </fetcher>
  
  <!-- Decide where to store generated files. -->
  <workDir>./forescout/docs/docs-output</workDir>

  <crawlers>
    <!-- you can have multiple crawlers -->
    <crawler id="FS-docs-Crawler">
      <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
        <url>https://docs.forescout.com</url>
      </startURLs>

      <robotsTxt ignore="true"/>

      <!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
      <maxDepth>24</maxDepth>

      <sitemapResolver ignore="false"/>

      <!-- Be as nice as you can to sites you crawl. -->
      <delay default="500"/>

      <!-- Document Filtering -->
      <documentFilters>
        <filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
          jpg,jpeg,gif,png
        </filter>
      </documentFilters>

      <!-- Document importing -->
      <importer>

        <preParseHandlers>
          <!-- Pre parsing taggers can go here -->
          <!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
          <handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>

        </preParseHandlers>

        <postParseHandlers>
          <!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
            <rename toField="fs_title" onSet="replace">
              <fieldMatcher>title</fieldMatcher>
            </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
            <rename toField="fs_reference" onSet="replace">
              <fieldMatcher>document.reference</fieldMatcher>
            </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger" toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

          <!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need <handler class="KeepOnlyTagger"> <fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher> </handler> -->
        </postParseHandlers>
      </importer>

      <!-- Decide what to do with your files by specifying a Committer. -->
      <committers>
        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
          <!-- elastic dev site -->
          <nodes>https://search-sasdfasdfsafdsdfsfa.us-east-1.es.amazonaws.com:443</nodes>
          <indexName>docs</indexName>
          <typeName>docs</typeName>
          <targetContentField>fs_content</targetContentField>
          <fixBadIds>true</fixBadIds>
        </committer>
      </committers>

    </crawler>
  </crawlers>
</httpcollector>

forescout-spollock commented on August 22, 2024

ok, I found this: https://opensource.norconex.com/collectors/http/v3/apidocs/com/norconex/collector/http/crawler/HttpCrawlerConfig.html

and was able to get it to pass config test.
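
A rough sketch of the placement that passed the config test, assuming the <httpFetchers> element sits directly under <crawler> as the linked HttpCrawlerConfig page suggests. The driver path below is a hypothetical placeholder, and the other crawler settings are elided.

```xml
<httpcollector id="FS-docs-Collector">
  <crawlers>
    <crawler id="FS-docs-Crawler">
      <!-- Fetchers are declared per crawler, wrapped in <httpFetchers>. -->
      <httpFetchers>
        <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
          <browser>chrome</browser>
          <!-- Hypothetical path (assumption); point this at your own chromedriver. -->
          <driverPath>/path/to/chromedriver</driverPath>
        </fetcher>
      </httpFetchers>
      <!-- startURLs, importer, committers and other settings stay as before. -->
    </crawler>
  </crawlers>
</httpcollector>
```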

Now I am getting chrome driver issues. Seems to work when called directly?
/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.

Config is same:

      <httpFetchers>
        <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
          <browser>chrome</browser>
          <driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
        </fetcher>
      </httpFetchers>

But seeing these errors:

13:43:24.966 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:24.967 [FS-docs-Crawler/1] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:24.975 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 13608
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.539 [FS-docs-Crawler/2] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.541 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x563b2ec582b9 <unknown>

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
	at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
	at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 12 more
13:43:25.551 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/1,5,main]
13:43:25.551 [FS-docs-Crawler/1] INFO  WebDriverHttpFetcher - Shutting down CHROME web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 21025
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.599 [FS-docs-Crawler/2] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x564bc37f62b9 <unknown>

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
	at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
	at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 12 more
13:43:25.602 [FS-docs-Crawler] INFO  Crawler - Reprocessing any cached/orphan references...
13:43:25.609 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/2,5,main]
13:43:25.609 [FS-docs-Crawler/2] INFO  WebDriverHttpFetcher - Shutting down CHROME web driver.
13:43:25.620 [FS-docs-Crawler] INFO  Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 31545
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.653 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x55b1ecb7b2b9 <unknown>

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
	at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
	at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 26 more
13:43:25.665 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap.xml. Expected status code 200, but got 0.
13:43:25.665 [FS-docs-Crawler] INFO  Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 11263
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.695 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x5589d7f0c2b9 <unknown>

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
	at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
	at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
	at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
	at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 26 more
13:43:25.705 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap_index.xml. Expected status code 200, but got 0.
13:43:25.708 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:25.708 [FS-docs-Crawler/1] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.715 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 4832
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.763 [FS-docs-Crawler/2] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.764 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
	at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
	at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
	... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x56465cf3d2b9 <unknown>


essiembre commented on August 22, 2024

Since you are having issues with the WebDrivers of HTTP Collector, I have copied your last post to this new ticket: Norconex/crawlers#746
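
For reference, every trace above ends in `WebDriverException: unknown error: cannot find Chrome binary`: ChromeDriver itself starts, but it cannot locate a Chrome executable on the host. The following is a minimal sketch, not part of the Collector, for checking whether a Chrome or Chromium executable is visible on the `PATH` of the user running the crawler. The candidate names and the helper `find_chrome_binary` are assumptions for illustration; the name actually installed depends on the distribution.

```python
import shutil

# Hypothetical list of common executable names; adjust for your distribution.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_chrome_binary()
    print(found or "No Chrome/Chromium binary found on PATH")
```

If this finds nothing, the browser is probably not installed or sits in a non-standard location, which would be consistent with the error in the log.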

Since the original issue ("typeName" missing for Elasticsearch Committer) has been addressed, I am closing this one.


jacksonp2008 commented on August 22, 2024

Hi Pascal, were you able to make a snapshot for backward compatibility? Norconex/crawlers#746

Like the one you mentioned above?

I created a new snapshot release of Elasticsearch Committer V5 (working with HTTP Collector V3 stack) that introduce back the typeName for backward compatibility.

This site is a problem for us. You said you may have gotten it to work at some point?


essiembre commented on August 22, 2024

Yes, there is a snapshot of Elasticsearch Committer that supports adding "typeName" as per #41 (comment).

Did you find something wrong with it, or is your problem something else? In either case, please open a new ticket with more details (since this one has been closed).

