Comments (2)
I was able to reproduce and diagnose this.
While all data to and from Google Cloud Storage is routed through the proxy, I never wired the authorization channel through the proxy. That obviously does not work when you cannot reach Google's OAuth endpoint directly.
I should be able to fix it in a PR by the end of the week at the latest.
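To illustrate the gap described above, here is a minimal sketch using only `java.net`. It is not the connector's actual code: `ProxySketch` and `parseProxy` are hypothetical names, and the `host:port` format is an assumption about how a value like fs.gs.proxy.address might be written. The point is that one `Proxy` object is derived once and then has to be handed to both the storage transport and the transport used for token requests.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

public class ProxySketch {
    /**
     * Hypothetical helper: turns a "host:port" string into an HTTP proxy.
     * An empty or missing value means "connect directly".
     */
    static Proxy parseProxy(String address) {
        if (address == null || address.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got: " + address);
        }
        String host = address.substring(0, colon);
        int port = Integer.parseInt(address.substring(colon + 1));
        // createUnresolved avoids a DNS lookup while parsing the setting.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        // Placeholder address, not a real proxy.
        Proxy proxy = parseProxy("proxy.example.com:3128");
        // The same proxy would need to be applied to the data channel
        // and to the channel that fetches OAuth tokens.
        System.out.println(proxy.type() + " " + proxy.address());
    }
}
```

With the Google HTTP client, a proxy like this can, to my knowledge, be passed to `NetHttpTransport.Builder.setProxy(...)`; the fix would presumably amount to building the credential with a transport configured that way, though the actual change may look different.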
from hadoop-connectors.
I am also experiencing a similar proxy issue: I have been trying to set fs.gs.proxy.address to use a corporate proxy to access GCS from my on-premise Hadoop cluster, using a command like hadoop dfs -ls gs://bucket-name.
I checked via tcpdump and the connector does not attempt to use the proxy at all.
I also tried setting the config property fs.gs.http.transport.type to a different type, as well as exporting HADOOP_OPTS="$HADOOP_OPTS -Dhttp.proxyHost=... as suggested by @peay, but with no luck.
The exception raised is always the same (including before I set a value for the fs.gs.proxy.address property):
~ HADOOP_ROOT_LOGGER='DEBUG,console' hadoop fs -ls gs://bucket-name
17/06/08 12:52:14 DEBUG util.Shell: setsid exited with exit code 0
17/06/08 12:52:15 DEBUG conf.Configuration: parsing URL jar:file:/usr/lib/hadoop/hadoop-common-2.6.0-cdh5.8.2.jar!/core-default.xml
17/06/08 12:52:15 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@53eb80e9
17/06/08 12:52:15 DEBUG conf.Configuration: parsing URL file:/etc/hadoop/conf/core-site.xml
...
17/06/08 12:52:16 DEBUG gcsio.ForwardingGoogleCloudStorage: GoogleCloudStorageImpl.getItemInfo(gs://bucket-name)
17/06/08 12:52:16 DEBUG gcsio.GoogleCloudStorage: getItemInfo(gs://bucket-name)
17/06/08 12:52:16 DEBUG gcsio.GoogleCloudStorage: getBucket(bucket-name)
17/06/08 12:52:16 DEBUG util.RetryHttpInitializer: Request is missing a user-agent, adding default value of 'GHFS/1.6.0-hadoop2'
17/06/08 12:52:16 DEBUG gcsio.GoogleCloudStorage: getBucket(bucket-name) threw exception:
java.net.SocketException: Network is unreachable
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.api.client.auth.oauth2.TokenRequest.executeUnparsed(TokenRequest.java:283)
at com.google.api.client.auth.oauth2.TokenRequest.execute(TokenRequest.java:307)
at com.google.cloud.hadoop.util.CredentialFactory$GoogleCredentialWithRetry.executeRefreshToken(CredentialFactory.java:132)
at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.api.client.auth.oauth2.Credential.intercept(Credential.java:217)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:859)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getBucket(GoogleCloudStorageImpl.java:1657)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1612)
at com.google.cloud.hadoop.gcsio.ForwardingGoogleCloudStorage.getItemInfo(ForwardingGoogleCloudStorage.java:214)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1093)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1413)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:285)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1656)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1583)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1506)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
ls: Error accessing: bucket: bucket-name
17/06/08 12:52:16 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS.close:
17/06/08 12:52:16 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS.processDeleteOnExit:
17/06/08 12:52:16 DEBUG gcsio.GoogleCloudStorageFileSystem: close()
17/06/08 12:52:16 DEBUG gcsio.ForwardingGoogleCloudStorage: GoogleCloudStorageImpl.close()
17/06/08 12:52:16 DEBUG gcsio.GoogleCloudStorage: close()
Note that I am able to run other programs that require network access, and they pick up and use the proxy settings correctly.
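The stack trace above fails inside TokenRequest.execute while opening an HTTPS connection, so one thing worth checking is which proxy the JVM itself would select for an https:// URL. The sketch below is a diagnostic aid, not part of the connector: `ProxyCheck` and `proxiesFor` are hypothetical names and the default URL is a placeholder. As far as I know, the JVM's default `ProxySelector` consults https.proxyHost and https.proxyPort for https:// URLs rather than http.proxyHost, so setting only the latter may have no effect on these requests.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyCheck {
    /** Returns the proxies the JVM-wide default selector would use for the URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Placeholder URL; pass the endpoint you care about as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        // A single DIRECT entry means no proxy would be used for this URL.
        for (Proxy proxy : proxiesFor(url)) {
            System.out.println(url + " -> " + proxy);
        }
    }
}
```

Running it with the same -D flags that were added to HADOOP_OPTS shows whether those flags apply to HTTPS endpoints at all, independently of the connector.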