
Comments (3)

dennishuo avatar dennishuo commented on August 27, 2024

This should be fixed now with 141b1ef

To test, just run:

git clone https://github.com/GoogleCloudPlatform/bigdata-interop.git
cd bigdata-interop
mvn -P hadoop1 package
# Or, for Hadoop 2
mvn -P hadoop2 package

And you should find the file "gcs/target/gcs-connector-*-shaded.jar" available for use. To plug it into bdutil, simply run gsutil cp gcs/target/gcs-connector-*-shaded.jar gs://<your-bucket>/some-path/ and then edit bdutil/bdutil_env.sh for Hadoop 1 or bdutil/hadoop2_env.sh to change:

GCS_CONNECTOR_JAR='https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-1.4.1-hadoop2.jar'

to instead point at your gs://<your-bucket>/some-path/ path; bdutil automatically detects that you're using a gs://-prefixed URI and will do the right thing during deployment.

Please let us know if it fixes the issue for you!

from hadoop-connectors.

oren-yowza avatar oren-yowza commented on August 27, 2024

It looks good now, I don't get those errors anymore, thanks!

Though I'm curious about your solution: I expected to see some exponential backoff or rate limiting of the API requests. Instead, I saw that you check whether the write operation actually succeeded even when an error was returned.
Is this always the case? Does GCS always write the object when it answers with rateLimit errors, so there's no need for exponential backoff retries?


dennishuo avatar dennishuo commented on August 27, 2024

Right, we considered whether exponential backoff would be appropriate; we already plug in a lower-level retry initializer that performs exponential backoff on 5xx errors. One consideration is that the initial backoffs for 5xx errors are quite short, since under normal circumstances those indicate that simply re-sending the request and getting routed to a fresh frontend should fix the problem. By contrast, the rate-limit buckets behind these particular 429 errors can last on the order of seconds, so several retries would fail with a 429 again, ultimately really slowing down the startup of Spark jobs.
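The timing tradeoff above can be sketched numerically. This is an illustrative sketch only: the delay values, multiplier, and retry count are assumptions for demonstration, not the connector's actual settings.

```python
def backoff_delays(initial, multiplier, max_tries):
    """Yield an exponentially growing schedule of sleep intervals (seconds)."""
    delay = initial
    for _ in range(max_tries):
        yield delay
        delay *= multiplier

# A schedule tuned for 5xx errors starts very short, because a quick
# re-send is usually routed to a healthy frontend.
delays = list(backoff_delays(0.25, 2, 5))
print(delays)       # [0.25, 0.5, 1.0, 2.0, 4.0]
print(sum(delays))  # 7.75

# If a 429 rate-limit bucket lasts ~2 seconds, the first few retries on
# this schedule (cumulative wait 0.25s, 0.75s, 1.75s) all land while the
# bucket is still full and fail with 429 again, wasting the whole budget.
```

The point: a schedule that recovers from transient 5xx errors in fractions of a second burns several doomed retries against a seconds-long rate-limit window.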

In this case, it's not quite the original GCS write that succeeded (in fact, on a rateLimit error, we should expect the request that errored out not to have been written); rather, other concurrent writers, possibly on other machines, caused the objects to be written.

In this particular case of "empty objects" used as directory placeholders, that means we can optimize away the need to retry over the course of several seconds, since the rate limit likely means another writer already did our job for us (which is also why we have to check the metadata, in cases where createEmptyObjects is used with more advanced metadata).

