Error running index job (cosr-back, CLOSED)

chaconnewu commented on August 19, 2024
Error running index job

from cosr-back.

Comments (5)

sylvinus commented on August 19, 2024

@chaconnewu Sorry about that! There's a step missing in the docs:

./scripts/import_commoncrawl.sh 0

Can you confirm it fixes the issue?

chaconnewu commented on August 19, 2024

Thanks @sylvinus! I confirm it fixes the issue. However, another error occurs after the job has been running for a while:

Caught Python exception in generator!
Traceback (most recent call last):
  File "/cosr/back/cosrlib/utils.py", line 26, in wrapped
    for x in fn(*args, **kwargs):
  File "/cosr/back/jobs/spark/index.py", line 87, in iter_records
    warc_file = open_warc_file(filename, from_commoncrawl=(not args.warc_files))
  File "cosrlib/webarchive.py", line 39, in open_warc_file
    pds = conn.get_bucket('aws-publicdatasets')
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 503, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 522, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 665, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1030, in _mexe
    raise ex
gaierror: [Errno -2] Name or service not known
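
The last line of the traceback is a DNS resolution failure (socket.gaierror) raised while boto was contacting S3. If the failure is transient, one common workaround is to retry the call a few times. The snippet below is only a minimal sketch of that idea; the helper name retry_on_dns_error and its parameters are hypothetical and are not part of cosr-back or boto, and nothing in this thread says this is what the eventual fix does.

```python
import socket
import time


def retry_on_dns_error(fn, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fn(), retrying when name resolution fails with socket.gaierror.

    Hypothetical helper for illustration only; not part of cosr-back or boto.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except socket.gaierror:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last resolution error
            sleep(delay * attempt)  # linear backoff before the next try


# Possible usage, assuming `conn` is the boto S3 connection from the traceback:
#   pds = retry_on_dns_error(lambda: conn.get_bucket('aws-publicdatasets'))
```

A retry only helps when the resolver failure is temporary; if the host has no working DNS configuration at all, the final attempt raises the same gaierror.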

sylvinus commented on August 19, 2024

I've pushed a tentative fix:
30f7aff

Can you pull and see if it works?

chaconnewu commented on August 19, 2024

Yes, it worked!

This is the output after indexing:
16/03/16 03:06:57 INFO PythonRunner: Times: total = 437675, boot = 144, init = 329, finish = 437202
16/03/16 03:06:57 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 118012 bytes result sent to driver
16/03/16 03:06:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 437930 ms on localhost (1/1)
16/03/16 03:06:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/16 03:06:57 INFO DAGScheduler: ResultStage 0 (count at /cosr/back/jobs/spark/index.py:165) finished in 437.994 s
16/03/16 03:06:57 INFO DAGScheduler: Job 0 finished: count at /cosr/back/jobs/spark/index.py:165, took 438.235379 s
Indexed 695 WARC records

sylvinus commented on August 19, 2024

Awesome :)
