gcs-connector:3.0.0 failing due to certificate when accessing to GCS from Github runner with WIF configuration (hadoop-connectors, OPEN)

elvin-sadigov-db commented on July 24, 2024
gcs-connector:3.0.0 failing due to certificate when accessing to GCS from Github runner with WIF configuration

from hadoop-connectors.

Comments (7)

AngusDavis commented on July 24, 2024

TL;DR: The HadoopCredentialsConfiguration provides a NetHttpTransport that uses a trust store that contains roots that work for Google APIs, but does not include the DigiCert roots needed for some other providers (e.g., GitHub). While this thread is about WORKLOAD_IDENTITY_FEDERATION_CREDENTIAL_CONFIG_FILE, the google auth GitHub action sets up ADC as well, and that transport would have the same issue. For this issue in particular, https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/util-hadoop/src/main/java/com/google/cloud/hadoop/util/HadoopCredentialsConfiguration.java#L255 should either be changed to not include a net transport, or the SSL context / trust store should also include the system CA certificates / the certificates defined in javax.net.ssl.trustStore.

My notes:

  1. The exception is being raised when the IdentityPoolCredentials class attempts to load a credential from the "credential source". The credential source URL is part of the credential configuration document created by the google-github-actions/auth action. https://github.com/googleapis/google-auth-library-java/blob/v1.14.0/oauth2_http/java/com/google/auth/oauth2/IdentityPoolCredentials.java#L242
  2. Exploring the credential config with a workflow step that runs cat ${CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE}, we can find the credential_source.url. With my setup this connects to pipelinesghubeus6.actions.githubusercontent.com.
  3. I can verify auth is working by adding a step that lists GCS objects with gcloud storage ls gs://SOME_VALID_BUCKET_HERE/
  4. We can see the certificate chain for this host with:
$ openssl s_client -showcerts -connect pipelinesghubeus6.actions.githubusercontent.com:443 </dev/null 2>/dev/null

At this point in time, I see the following DigiCert intermediate:

1 s:C = US, O = DigiCert Inc, CN = DigiCert Global G2 TLS RSA SHA256 2020 CA1
   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root G2
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 30 00:00:00 2021 GMT; NotAfter: Mar 29 23:59:59 2031 GMT

-----BEGIN CERTIFICATE-----
MIIEyDCCA7CgAwIBAgIQDPW9BitWAvR6uFAsI8zwZjANBgkqhkiG9w0BAQsFADBh
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3
d3cuZGlnaWNlcnQuY29tMSAwHgYDVQQDExdEaWdpQ2VydCBHbG9iYWwgUm9vdCBH
MjAeFw0yMTAzMzAwMDAwMDBaFw0zMTAzMjkyMzU5NTlaMFkxCzAJBgNVBAYTAlVT
MRUwEwYDVQQKEwxEaWdpQ2VydCBJbmMxMzAxBgNVBAMTKkRpZ2lDZXJ0IEdsb2Jh
bCBHMiBUTFMgUlNBIFNIQTI1NiAyMDIwIENBMTCCASIwDQYJKoZIhvcNAQEBBQAD
ggEPADCCAQoCggEBAMz3EGJPprtjb+2QUlbFbSd7ehJWivH0+dbn4Y+9lavyYEEV
cNsSAPonCrVXOFt9slGTcZUOakGUWzUb+nv6u8W+JDD+Vu/E832X4xT1FE3LpxDy
FuqrIvAxIhFhaZAmunjZlx/jfWardUSVc8is/+9dCopZQ+GssjoP80j812s3wWPc
3kbW20X+fSP9kOhRBx5Ro1/tSUZUfyyIxfQTnJcVPAPooTncaQwywa8WV0yUR0J8
osicfebUTVSvQpmowQTCd5zWSOTOEeAqgJnwQ3DPP3Zr0UxJqyRewg2C/Uaoq2yT
zGJSQnWS+Jr6Xl6ysGHlHx+5fwmY6D36g39HaaECAwEAAaOCAYIwggF+MBIGA1Ud
EwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFHSFgMBmx9833s+9KTeqAx2+7c0XMB8G
A1UdIwQYMBaAFE4iVCAYlebjbuYP+vq5Eu0GF485MA4GA1UdDwEB/wQEAwIBhjAd
BgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwdgYIKwYBBQUHAQEEajBoMCQG
CCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wQAYIKwYBBQUHMAKG
NGh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbFJvb3RH
Mi5jcnQwQgYDVR0fBDswOTA3oDWgM4YxaHR0cDovL2NybDMuZGlnaWNlcnQuY29t
L0RpZ2lDZXJ0R2xvYmFsUm9vdEcyLmNybDA9BgNVHSAENjA0MAsGCWCGSAGG/WwC
ATAHBgVngQwBATAIBgZngQwBAgEwCAYGZ4EMAQICMAgGBmeBDAECAzANBgkqhkiG
9w0BAQsFAAOCAQEAkPFwyyiXaZd8dP3A+iZ7U6utzWX9upwGnIrXWkOH7U1MVl+t
wcW1BSAuWdH/SvWgKtiwla3JLko716f2b4gp/DA/JIS7w7d7kwcsr4drdjPtAFVS
slme5LnQ89/nD/7d+MS5EHKBCQRfz5eeLjJ1js+aWNJXMX43AYGyZm0pGrFmCW3R
bpD0ufovARTFXFZkAdl9h6g4U5+LXUZtXMYnhIHUfoyMo5tS58aI7Dd8KvvwVVo4
chDYABPPTHPbqjc1qCmBaZx2vN4Ye5DUys/vZwP9BFohFrH/6j/f3IL16/RZkiMN
JCqVJUzKoZHm1Lesh3Sz8W2jmdv51b2EQJ8HmA==
-----END CERTIFICATE-----
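
If openssl is not at hand, a PEM block like the one above can also be inspected from Java. This is only a minimal sketch using the JDK's CertificateFactory; the class name PemInspect and the file name intermediate.pem are made up for illustration and are not part of the connector.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

/** Hypothetical helper: prints subject, issuer and validity of a PEM certificate. */
final class PemInspect {

  /** Parses a single PEM (or DER) encoded X.509 certificate. */
  static X509Certificate parse(String pem) throws CertificateException {
    CertificateFactory factory = CertificateFactory.getInstance("X.509");
    return (X509Certificate) factory.generateCertificate(
        new ByteArrayInputStream(pem.getBytes(StandardCharsets.US_ASCII)));
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      // The path is an assumption, e.g. the PEM block above saved as intermediate.pem.
      System.out.println("usage: java PemInspect.java <path-to-pem>");
      return;
    }
    String pem = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
    X509Certificate cert = parse(pem);
    System.out.println("Subject:   " + cert.getSubjectX500Principal().getName());
    System.out.println("Issuer:    " + cert.getIssuerX500Principal().getName());
    System.out.println("NotBefore: " + cert.getNotBefore());
    System.out.println("NotAfter:  " + cert.getNotAfter());
  }
}
```

The issuer printed for the intermediate is the root that has to be present in whatever trust store the HTTP transport ends up using.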
  5. On my Debian system, the issuer, DigiCert Global Root G2, is at /etc/ssl/certs/DigiCert_Global_Root_G2.pem. We can print the fingerprints of this cert (SHA-1 and SHA-256):
$ openssl x509 -fingerprint -noout -in /etc/ssl/certs/DigiCert_Global_Root_G2.pem
SHA1 Fingerprint=DF:3C:24:F9:BF:D6:66:76:1B:26:80:73:FE:06:D1:CC:8D:4F:82:A4
$ openssl x509 -fingerprint -sha256 -noout -in /etc/ssl/certs/DigiCert_Global_Root_G2.pem
sha256 Fingerprint=CB:3C:CB:B7:60:31:E5:E0:13:8F:8D:D3:9A:23:F9:DE:47:FF:C3:5E:43:C1:14:4C:EA:27:D4:6A:5A:B1:CB:5F

  6. We can check whether this root is included in our Java keystore (DigiCert indicates that G2 is compatible with 8u131+, but I see notes that Oracle added it in 8u91).

JDK8:

$ java -version

openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)

$ keytool -list -storepass changeit -keystore $JAVA_HOME/jre/lib/security/cacerts | grep -i CB:3C:CB:B7:60:31:E5:E0:13:8F:8D:D3:9A:23:F9:DE:47:FF:C3:5E:43:C1:14:4C:EA:27:D4:6A:5A:B1:CB:5F
Certificate fingerprint (SHA-256): CB:3C:CB:B7:60:31:E5:E0:13:8F:8D:D3:9A:23:F9:DE:47:FF:C3:5E:43:C1:14:4C:EA:27:D4:6A:5A:B1:CB:5F

Some older Java 8 builds list this fingerprint as SHA-1. If that's the case, grep for the SHA-1 value (DF:3C:24:F9:BF:D6:66:76:1B:26:80:73:FE:06:D1:CC:8D:4F:82:A4) instead of the SHA-256 value above.
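
The same check can be done from inside the JVM, which avoids guessing the cacerts path for a given JDK layout. This is a rough sketch that only uses JDK classes; the class and method names (TrustStoreCheck, fingerprint, defaultStoreContains) are made up for illustration.

```java
import java.security.KeyStore;
import java.security.MessageDigest;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: checks whether the JVM default trust store holds a given root. */
final class TrustStoreCheck {

  /** Formats a digest as upper-case hex pairs joined by ':', like openssl and keytool do. */
  static String fingerprint(String algorithm, byte[] data) throws Exception {
    byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
    StringBuilder sb = new StringBuilder();
    for (byte b : digest) {
      if (sb.length() > 0) {
        sb.append(':');
      }
      sb.append(String.format("%02X", b));
    }
    return sb.toString();
  }

  /** Returns true if any default trust anchor has the given SHA-256 fingerprint. */
  static boolean defaultStoreContains(String sha256Fingerprint) throws Exception {
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init((KeyStore) null); // null selects the JVM default trust store
    for (TrustManager tm : tmf.getTrustManagers()) {
      if (!(tm instanceof X509TrustManager)) {
        continue;
      }
      for (X509Certificate cert : ((X509TrustManager) tm).getAcceptedIssuers()) {
        if (fingerprint("SHA-256", cert.getEncoded()).equalsIgnoreCase(sha256Fingerprint)) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    // SHA-256 fingerprint of DigiCert Global Root G2, copied from the openssl output above.
    String g2 = "CB:3C:CB:B7:60:31:E5:E0:13:8F:8D:D3:9A:23:F9:DE:"
        + "47:FF:C3:5E:43:C1:14:4C:EA:27:D4:6A:5A:B1:CB:5F";
    System.out.println("Default trust store contains G2 root: " + defaultStoreContains(g2));
  }
}
```

Whether this prints true depends on the JDK build and on any javax.net.ssl.trustStore override in effect.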

  7. It looks like when fetching credentials, the connector uses a custom keystore provided by GoogleUtils#getCertificateTrustStore: https://github.com/googleapis/google-api-java-client/blob/b484c9b41ab226e9759c6a71c528158195468ec9/google-api-client/src/main/java/com/google/api/client/googleapis/GoogleUtils.java#L81

  8. To definitively nail down the trust store, the following Java 10 class (using dependencies from the root pom) prints the fingerprints of every certificate in it. The output shows that while some DigiCert roots are included, the G2 roots are not. The class further demonstrates that connecting to the above host works when using a default NetHttpTransport, but fails with the exception noted above when the transport is constructed as is done in the HttpTransportFactory.

import com.google.api.client.googleapis.GoogleUtils;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpResponseException;
import com.google.api.client.http.javanet.NetHttpTransport;
import java.security.cert.X509Certificate;
import org.apache.commons.codec.digest.DigestUtils;

public class Test {

  public static void main(String[] args) throws Exception {

    // Load the trust store bundled with google-api-client, the one the connector uses.
    var store = GoogleUtils.getCertificateTrustStore();

    var aliases = store.aliases();
    while (aliases.hasMoreElements()) {
      var alias = aliases.nextElement();
      System.out.println("Alias: " + alias);
      var entry = store.getCertificate(alias);
      System.out.println("Format: " + entry.getPublicKey().getFormat());
      if (entry instanceof X509Certificate) {
        X509Certificate x509 = (X509Certificate) entry;
        System.out.println("Issuer: " + x509.getIssuerX500Principal().getName());
        System.out.println("Subject: " + x509.getSubjectX500Principal().getName());
        System.out.println("Fingerprint sha256: " + DigestUtils.sha256Hex(x509.getEncoded()));
        System.out.println("Fingerprint sha1: " + DigestUtils.sha1Hex(x509.getEncoded()));
      }
      System.out.println();
    }
    System.out.println("Connecting using default NetHttpTransport");
    connectWithDefaults();
    System.out.println(
        "Connecting using NetHttpTransport with GoogleUtils.getCertificateTrustStore.");
    connectWithGoogleStore();
  }

  public static void connectWithDefaults() throws Exception {
    NetHttpTransport transport = new NetHttpTransport();
    var req = transport.createRequestFactory()
        .buildGetRequest(new GenericUrl("https://pipelinesghubeus6.actions.githubusercontent.com"));
    try {
      var resp = req.execute();
      System.out.println("Defaults response code (expect an error): " + resp.getStatusCode());
    } catch (HttpResponseException resp) {
      System.out.println("Defaults response code (expect an error): " + resp.getStatusCode());
    }
  }

  public static void connectWithGoogleStore() throws Exception {
    NetHttpTransport transport = new NetHttpTransport.Builder().trustCertificates(
        GoogleUtils.getCertificateTrustStore()).build();
    var req = transport.createRequestFactory()
        .buildGetRequest(new GenericUrl("https://pipelinesghubeus6.actions.githubusercontent.com"));
    var resp = req.execute();
    System.out.println("Google trust store code: " + resp.getStatusCode());
  }
}

The final output of the above class:

...

[ ... lots of certificates and fingerprints ... ]

...

Connecting using default NetHttpTransport
Defaults response code (expect an error): 404
Connecting using NetHttpTransport with GoogleUtils.getCertificateTrustStore.
Exception in thread "main" javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

[ ... redundant stack trace ... ]

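Overall, external ID providers whose certificate chains lead to a root in the Google trust store will probably work, and the others (such as GitHub, which chains to DigiCert Global Root G2) will fail.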
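
For the second option in the TL;DR (a trust store that also includes the JVM's CA certificates), one possible shape is sketched below. This is an assumption about how a fix could look, not the connector's actual code; MergedTrustStore and withJvmDefaults are hypothetical names. The sketch sticks to JDK classes so it runs on its own. In the connector, the extra store would presumably be the one returned by GoogleUtils.getCertificateTrustStore().

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Collections;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper, not part of hadoop-connectors. */
final class MergedTrustStore {

  /** Builds an in-memory KeyStore holding the entries of extra plus the JVM default anchors. */
  static KeyStore withJvmDefaults(KeyStore extra) throws Exception {
    KeyStore merged = KeyStore.getInstance(KeyStore.getDefaultType());
    merged.load(null, null); // start from an empty in-memory store

    // 1. Certificate entries from the extra store (e.g. the Google trust store).
    for (String alias : Collections.list(extra.aliases())) {
      if (extra.isCertificateEntry(alias)) {
        merged.setCertificateEntry("extra-" + alias, extra.getCertificate(alias));
      }
    }

    // 2. JVM default trust anchors; these honour javax.net.ssl.trustStore when it is set.
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init((KeyStore) null);
    int index = 0;
    for (TrustManager tm : tmf.getTrustManagers()) {
      if (tm instanceof X509TrustManager) {
        for (X509Certificate cert : ((X509TrustManager) tm).getAcceptedIssuers()) {
          merged.setCertificateEntry("jvm-" + index++, cert);
        }
      }
    }
    return merged;
  }

  public static void main(String[] args) throws Exception {
    // An empty store stands in for the Google trust store in this standalone sketch.
    KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
    empty.load(null, null);
    KeyStore merged = withJvmDefaults(empty);
    System.out.println("Merged trust anchors: " + merged.size());
  }
}
```

With google-api-client on the classpath, the merged store could then be handed to new NetHttpTransport.Builder().trustCertificates(merged), the same builder call used in the test class above.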

GergelyKalmar commented on July 24, 2024

We are seeing the same issue. The connector works fine locally, so maybe this is something specific to the GitHub Actions environment?

GergelyKalmar commented on July 24, 2024

Tried running our bootstrapping process and tests on a clean Ubuntu image, it also worked fine there. There must be something with the GitHub Actions image that causes this certificate issue.

elvin-sadigov-db commented on July 24, 2024

Hi @GergelyKalmar, sorry, I missed your comments.
Thank you for your input!
I have checked the tickets you created above. Based on the following comment, the issue seems to be in gcs-connector: actions/runner-images#9354 (comment)

GergelyKalmar commented on July 24, 2024

Yes, I don't think this feature is working properly. Sadly this library does not seem to be maintained much, at least by looking at the open issues and the lack of responses.

GergelyKalmar commented on July 24, 2024

@davidrabinowitz @cnauroth We would really appreciate it if this library could get a little more attention. In particular, while we see maintainers dabble around in the code, our issues and comments go unaddressed for months, and features that were released do not seem to actually work.

GergelyKalmar commented on July 24, 2024

Thank you @AngusDavis for your thorough investigation! Could somebody fix the SSL context / trust store accordingly? WIF should definitely work with major CI/CD environments like GitHub Actions.
