jenkinsci / google-compute-engine-plugin
Home Page: https://plugins.jenkins.io/google-compute-engine/
License: Apache License 2.0
[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-56988]
I use the plugin to create a slave node on GCE, but the node is always offline.
The log is shown below. SSH cannot connect to GCE.
just before slave jenkins-slave-916wdt gets launched ...
executing pre-launch scripts ...
Apr 12, 2019 2:31:47 AM null
FINEST: Instance jenkins-slave-916wdt is running and ready...
Apr 12, 2019 2:31:47 AM null
INFO: Launching instance: jenkins-slave-916wdt
Apr 12, 2019 2:31:54 AM null
INFO: bootstrap
Apr 12, 2019 2:31:54 AM null
INFO: Getting keypair...
Apr 12, 2019 2:31:54 AM null
INFO: Using autogenerated keypair
Apr 12, 2019 2:31:54 AM null
INFO: Authenticating as jenkins
Apr 12, 2019 2:31:55 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:31:56 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:31:56 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:01 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:01 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:32:01 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:06 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: connect fresh as root
Apr 12, 2019 2:32:07 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: Copying agent.jar to: /tmp
Apr 12, 2019 2:32:09 AM null
INFO: Verifying: java -fullversion
bash: java: command not found
Apr 12, 2019 2:32:09 AM null
WARNING: Java is not installed.
Apr 12, 2019 2:32:09 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar /tmp/agent.jar
Apr 12, 2019 2:32:09 AM null
WARNING: Error getting exception Exception: java.io.IOException: SSH channel is closed
[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-52736]
This wastes compute resources and costs.
Ideally the plugin would not do this. In addition, a periodic check (every 5 minutes) could go through the current VMs in the project, find the ones tagged with "jenkins", and automatically terminate any that are not known to Jenkins. This would make it resilient against unexpected Jenkins restarts, etc. (though it should be an option in case multiple Jenkins instances share the same GCE project).
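A rough sketch of the periodic check described above, not taken from the plugin. The `Vm` record, the `terminate` callback and the `enabled` switch are hypothetical names used only for illustration; a real implementation would list instances through the Compute API and run on a timer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vm:
    """Minimal stand-in for a GCE instance record (hypothetical shape)."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def find_orphans(vms, known_node_names, tag="jenkins"):
    """Return VMs that carry `tag` but have no matching Jenkins node."""
    known = set(known_node_names)
    return [vm for vm in vms if tag in vm.tags and vm.name not in known]


def cleanup_pass(vms, known_node_names, terminate, enabled=True):
    """Run one pass of the check; `terminate` is a caller-supplied callback."""
    if not enabled:
        # Opt-out for projects shared by several Jenkins instances.
        return []
    orphans = find_orphans(vms, known_node_names)
    for vm in orphans:
        terminate(vm.name)
    return [vm.name for vm in orphans]
```

Keeping the check optional matters because a tag alone cannot tell which Jenkins instance owns a VM when several of them share one project.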
Parallelize the snapshot creation since it currently blocks.
After setting up the plugin using the 'default' subnet, I got the following error when trying to add a slave:
Invalid value for field 'resource.networkInterfaces[0]': '{ \"accessConfig\": [{ \"type\": \"ONE_TO_ONE_NAT\", \"name\": \"External NAT\" }]}'. Subnetwork should be specified for custom subnetmode network
On the Jenkins server, I went to the config.xml and found:
<networkConfiguration class="com.google.jenkins.plugins.computeengine.AutofilledNetworkConfiguration"> <network>https://www.googleapis.com/compute/v1/projects/zoominfo-2/global/networks/default</network> <subnetwork>default</subnetwork>
I manually updated the subnetwork part to:
<subnetwork>https://www.googleapis.com/compute/v1/projects/zoominfo-2/regions/us-east1/subnetworks/default</subnetwork>
I then reloaded the config in the Jenkins UI and everything worked fine. However, when I later updated the config in the UI, the change was reverted. I also wanted to use a different subnet, but 'default' was the only option.
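For illustration, the manual fix amounts to expanding the short subnetwork name into a full resource URL. A minimal sketch, assuming the URL layout shown in the config.xml excerpt; `subnetwork_url` is a hypothetical helper, not a plugin function.

```python
COMPUTE_API = "https://www.googleapis.com/compute/v1"


def subnetwork_url(project, region, subnetwork):
    """Expand a short subnetwork name into a full resource URL.

    Values that already look like a URL are returned unchanged.
    """
    if subnetwork.startswith("https://"):
        return subnetwork
    return (
        f"{COMPUTE_API}/projects/{project}"
        f"/regions/{region}/subnetworks/{subnetwork}"
    )


print(subnetwork_url("zoominfo-2", "us-east1", "default"))
```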
Referencing this issue
Objective is to improve documentation so that users know that what they name their clouds will determine the instance cap for each cloud. Code logic
Reported on #gcp-jenkins.
This issue appears to be breaking the plugin for customers
After a worker has been created, it's not obvious which specific instance configuration created it. On create, we should either set the instance config as a property of the worker (via the Instance or Computer class), or consider giving each instance config a guid when it is created and setting that guid on the GCE instance metadata for the worker when it is created.
See parent issue
Unable to add GCE node (Manage Jenkins->Manage Nodes->New Node-> type name and select "google compute engine") because of error.
I've seen a couple of users experience this issue although it is not our recommended workflow.
Right now when an instance gets preempted, the jobs that were running on the instance just stall, with the agent in an (offline) status. The builds never progress and never change state, even though the underlying machine is no longer there.
Since the plugin can periodically detect that the VM no longer exists, it should delete the agent from Jenkins and either abort or reschedule the jobs that were running on that agent.
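The requested behaviour could look roughly like this. It is only a sketch of the decision logic: `agent_builds` (agent name mapped to the builds running on it) and the returned lists are hypothetical shapes, and the actual removal or rescheduling would go through Jenkins itself.

```python
def reconcile_preempted(agent_builds, existing_vm_names):
    """Find agents whose VM is gone and the builds stranded on them.

    agent_builds: dict mapping agent name -> list of running build ids.
    existing_vm_names: names of VMs that still exist in the project.
    """
    existing = set(existing_vm_names)
    removed_agents = []
    stranded_builds = []
    for agent, builds in agent_builds.items():
        if agent not in existing:
            removed_agents.append(agent)
            stranded_builds.extend(builds)
    return removed_agents, stranded_builds
```

Whether the stranded builds are aborted or put back on the queue is a policy choice left to the caller.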
Greetings,
We've installed v2.0 on several of our jenkins masters, and encountered the following issue:
Our jenkins masters are independent instances; most teams run their own masters. But slaves are launched and run in one Google project, maintained by IT. For some masters (teams), the new "one-shot" feature is enabled; for others, it is not. And some remaining masters are still running older versions of the plugin without these new one-shot and snapshot features at all.
Apparently, some of the masters with one-shot enabled will start killing slaves of other masters where one-shot is disabled, probably following some naming pattern which can be similar for different masters. As a result, some teams end up with their slaves killed by other teams' masters while jobs are still running: e.g. a "Team A" slave gets killed by the "Team B" master, even though the two teams are independent.
The ComputeEngineCloud currently does not properly handle the case where the ComputeClient fails to be set up.
Our GCE cloud is configured to not allow direct Internet access, either via external IP or via NAT, and all traffic must leave via a proxy where it can be logged. However, the GCE plugin does not provide any way to set the required http_proxy et al environment variables on a per-cloud basis; the best available workaround would be to set them at the Jenkins global level, but then our non-GCE instances (bare metal, legacy EC2, corp Mac Minis, etc) will have the wrong proxy configured.
Right now, the Windows launcher launches agents in a location hard-coded into the launcher, even though we allow users to specify this. This bug needs to be fixed.
We have been using this plugin in Jenkins installed from this guide successfully for some time now. Recently, however, about 75% of the machines launched automatically by Jenkins fail to come online, leading to errors shown in the GCP logs.
We are seeing that when the launches fail, the Jenkins node logs show multiple
Failed to connect via ssh: The connect() operation on the socket timed out.
errors before the machine is terminated. We have also intermittently been seeing "Internal Error" 13 in GCP logs, and 404 errors when Jenkins is trying to delete instances that were otherwise terminated in GCP.
We are unable to reproduce by launching machines manually with the same machine templates in GCP and connecting them to Jenkins by hand - this succeeds every time.
From #69 I discovered that currently users only learn through the documentation that the boot disk image must have Java 8 installed.
Currently, only the bootDiskAutoDelete field has help text. This tracks the work to add help text to the remaining boot disk fields to provide users information on the boot disk requirements directly in the UI.
Jenkins ver. 2.150.2 running in docker on an ubuntu 18.04 host in GCP
Google compute engine plugin 2.0.0 (latest available)
Using an instance template to spin up n1-standard-4 VMs using SSDs
Jenkins master process is started with the following JVM Options for faster response when workers are needed. (Still experimenting with the right values).
JAVA_OPTS="-Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
The problem is that certain job runs are not passed the parameters they should be getting and they fail.
This seems to happen more when 100 or more job runs are triggered concurrently by a trigger, and sometimes with as few as 50.
The failed jobs seem to get an executor on a worker and start running, but fail because the parameters they should have received aren't there. So the processing that expects valid parameter values fails.
To test this setup I set up 2 Pipeline groovy jobs. The content of both jobs is attached.
Parent job
Child job
- is triggered multiple times by the parent job and passed some parameters
- each run of this job requires its own worker
- does something reasonably simple
  - verifies the worker it is running on is ready by checking for a file (this is just verifying that any necessary build caches like gradle, npm, pip, are on the worker)
  - using the parameters passed in by the parent job, it tries to download a file from GCS
  - then it sleeps for 15 mins, just to hold the worker
When this job fails, it is because the parameters it should have received are missing and it can't do the download of an artifact from GCS
I have attached screen shots of what the parameters page looks like on a successful run as well as an unsuccessful run.
I don't know when or how this changed from ./.jenkins-slave but it is now copying to this directory.
I have tried manually defining /home/ubuntu/.jenkins-slave, but this fails because the directory does not exist.
Seems like the SSH key injection via the plugin is happening unnecessarily late.
The SSH public key can be appended to the metadata before instance creation, rather than at launch time.
This way startup scripts can reference the ssh keys in metadata.
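As a sketch of the idea, the key could be merged into the instance metadata before the insert request is sent. This assumes the usual GCE metadata layout (a list of `key`/`value` items, with `ssh-keys` holding one `username:public-key` entry per line); `with_ssh_key` is a hypothetical helper, not plugin code.

```python
def with_ssh_key(metadata, username, public_key):
    """Return a copy of `metadata` with a `username:public_key` entry in ssh-keys."""
    entry = f"{username}:{public_key.strip()}"
    items = [dict(item) for item in metadata.get("items", [])]
    for item in items:
        if item["key"] == "ssh-keys":
            # Append to any keys that are already present.
            existing = item["value"].strip()
            item["value"] = f"{existing}\n{entry}" if existing else entry
            break
    else:
        items.append({"key": "ssh-keys", "value": entry})
    return {**metadata, "items": items}
```

Because the result is plain metadata, a startup script running on the instance can read the key from the metadata server like any other item.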
I use the Compute Engine plugin (v. 3.0.0) to connect GCE instances to Jenkins CI (v. 2.159). Jenkins automatically creates the instances (e.g. CentOS 6 and 7, Debian 9; I tried the official images that Google Compute Engine provides) when a job is started, but at a specific time every hour (e.g. every XX:57; yesterday it was every XX:53) all these machines are terminated, no matter how long they have been running. The machine logs contain only information about the shutdown, nothing special:
...
08:46:33 jenkins-gce-cent-7-cv5jlc systemd: Startup finished in 1min 30.753s.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Power key pressed.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Powering Off...
...
Steps to reproduce:
Prepare some template in GCE, use it in Jenkins with the Google Compute Engine plugin, start some job, and within an hour the machines will be terminated.
I attach the log from Jenkins about the connected machine and the log from /var/log/messages from the virtual machine.
A GCE image family will point to the latest image.
GCE plugin users should be able to select only the image project and image family instead of the image family and specific image name.
For example, the image family ubuntu-1804-lts points to ubuntu-1804-bionic-v20190514.
$ gcloud compute images describe-from-family ubuntu-1804-lts --project=ubuntu-os-cloud
archiveSizeBytes: '9522384640'
creationTimestamp: '2019-05-14T20:02:56.234-07:00'
description: Canonical, Ubuntu, 18.04 LTS, amd64 bionic image built on 2019-05-14
diskSizeGb: '10'
family: ubuntu-1804-lts
guestOsFeatures:
- type: VIRTIO_SCSI_MULTIQUEUE
id: '8613129354617438128'
kind: compute#image
labelFingerprint: 42WmSpB8rSM=
licenseCodes:
- '5926592092274602096'
licenses:
- https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/licenses/ubuntu-1804-lts
name: ubuntu-1804-bionic-v20190514
rawDisk:
containerType: TAR
source: ''
selfLink: https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20190514
sourceType: RAW
status: READY
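To make the request concrete: a boot disk can reference either a specific image or an image family, and the two differ only in the resource path. A minimal sketch, assuming the `global/images/family/<family>` path form; `source_image_url` is a hypothetical helper name.

```python
def source_image_url(project, family=None, image=None):
    """Build a boot disk source image URL from a project plus family or image."""
    base = f"https://www.googleapis.com/compute/v1/projects/{project}/global/images"
    if image:
        # Pin one specific image.
        return f"{base}/{image}"
    if family:
        # Let GCE resolve the latest image in the family at creation time.
        return f"{base}/family/{family}"
    raise ValueError("either family or image is required")


print(source_image_url("ubuntu-os-cloud", family="ubuntu-1804-lts"))
```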
Based off: #63
Should change scheduling in InstanceConfiguration.java to terminate on host maintenance only for instances with GPUs.
See dkozlov@7b7af84 for an example
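The rule being proposed is small enough to sketch. This is illustrative only and is not the code from the referenced commit; `scheduling_for` is a hypothetical helper, and it covers only the GPU case discussed here.

```python
def scheduling_for(accelerator_count):
    """Pick the host maintenance policy for an instance.

    Instances with GPUs cannot be live migrated, so they must terminate
    on host maintenance; everything else keeps migration.
    """
    policy = "TERMINATE" if accelerator_count > 0 else "MIGRATE"
    return {"onHostMaintenance": policy}
```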
Currently there exist two sets of duplicate integration tests for Linux and Windows. This creates an additional maintenance burden: when tests need to be updated, they have to be updated in both sets. Ideally, there should be one set of integration tests that can take different configurations as parameters (which could help distinguish which OS we're testing, etc).
Also, might move these out of ComputeEngineCloud since these don't really test the cloud; they test more of the configuration. Will need to parameterize some shared constants (util file).
Tech debt note: the constructor for InstanceConfiguration has gotten large enough to warrant use of a builder.
Originally posted by @craigatgoogle in #55
Related: We should move all non-essential fields into @DataboundSetters. This would help to cut down on the constructor size.
I am doing some post-boot stuff with cloud-init and startup scripts to prepare my instance for Jenkins (mounting NVMe disks and a post-boot Ansible playbook). I want to store Jenkins workspace data on the NVMe SSD for performance reasons, which means I need to wait for this to complete before the plugin can copy and start the agent.jar.
In 3.0.0 I was able to hack around this by creating a custom /usr/local/bin/java script that was picked up when the instance installer first checked the output of java -version. The wrapper script would artificially hang until the Ansible playbook had completed (to Jenkins it just appears that the command returns very slowly).
This issue is a feature request to have a way to wait for the instance to be ready via an external method. The simple fix to enable the hack again is to just move the java -version check before the agent.jar copy. But I actually think longer term it might make sense to add a check for the cloud-init Final stage using cloud-init status --wait or the /var/lib/cloud/instance/boot-finished file. This would likely account for my scenario and others where people bootstrap instances using cloud-init and there might be stuff required before the agent can actually run jobs.
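One way such a readiness gate might be structured is a polling loop with a timeout, run before the agent.jar copy. This is a sketch under assumptions: `is_ready` is a caller-supplied check (for example, one that tests over SSH whether /var/lib/cloud/instance/boot-finished exists), and the default timeout and interval are arbitrary.

```python
import time


def wait_until_ready(is_ready, timeout_s=600.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `is_ready` until it returns True or `timeout_s` elapses.

    Returns True if the instance became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.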
I wanted to use instance templates when setting up a new Jenkins instance, but I've found that the "Use Internal IP?" option appears to be ignored when they're used.
That option should probably be moved out of the "Advanced" section, so it can still be toggled even when using instance templates.
[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-55412]
When provisioning nodes in ComputeEngineCloud the code currently does a direct out of band call to jenkins.getInstance().addNode(). This is antithetical to the NodeProvisioner workflow which expects to call jenkins.getInstance().addNode() when the PlannedNode's future successfully returns.
Channel "unknown": Remote call on jenkins-worker-rcisma failed. The channel is closing down or has closed down
I would like to ask how I can configure automated deletion of snapshots created via the Compute Engine plugin in Jenkins. There is no option for that in Jenkins. I also did not find anything in Google Compute Engine that would be applicable to snapshots created by the plugin. Could you please help me with it?
Thank you
Hello, it seems that it is possible to create a GPU agent by specifying an instance template, but it is not possible to create a GPU agent without one. See related issue https://issues.jenkins-ci.org/browse/JENKINS-52708.
As a workaround you can use the following: dkozlov@7b7af84
Could you please disable GPU support in the Machine configuration UI or fix it?
Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@3bafb6a8 for excess workload of 1 units of label 'jenkins-gpu'
Apr 08, 2019 5:23:23 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity
Found capacity for 99 nodes in cloud
Apr 08, 2019 5:23:24 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision
Error provisioning node
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Instances with guest accelerators do not support live migration.",
"reason" : "badRequest"
} ],
"message" : "Instances with guest accelerators do not support live migration."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.jenkins.plugins.computeengine.client.ComputeClient.insertInstance(ComputeClient.java:374)
at com.google.jenkins.plugins.computeengine.InstanceConfiguration.provision(InstanceConfiguration.java:319)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provision(ComputeEngineCloud.java:203)
at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This plugin supports using GCE instance templates for provisioning a new Jenkins SSH slave.
I created a new instance template in Google compute engine, which uses the GCE feature deploy containers. As the Container-image, I am using openjdk:11-jre-slim.
When using this plugin, a new VM is booted up by my Jenkins job, but Jenkins master fails to connect to it.
From Jenkins log:
May 03, 2019 9:13:31 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connecting to 35.209.XXX.YY on port 22, with timeout 10000.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connected via SSH.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
connect fresh as root
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connecting to 35.209.254.68 on port 22, with timeout 10000.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connected via SSH.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Copying agent.jar to: /tmp
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Verifying: java -fullversion
May 03, 2019 9:13:35 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Java is not installed.
Well, this failure seems quite obvious to me. There is no Java installed on the machine/host with the (external) IP 35.209.XXX.YY. There would be / is a docker image openjdk:11-jre-slim available on this machine, which could/should be used for executing Java.
Please let this plugin support the feature 'deploying-containers'
PS: This is a follow-up of https://issues.jenkins-ci.org/browse/JENKINS-52251
Is: The path to the java executable is hard-coded, see https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineLinuxLauncher.java#L130 and https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineWindowsLauncher.java#L129. It is expected to find java on the $PATH.
Feature request: Please make this path (optionally) configurable.
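A sketch of what the optional setting could mean for the Linux launch command. The `java_path` parameter is hypothetical; when it is left unset, the command falls back to today's behaviour of resolving java from the $PATH. Quoting uses POSIX shell rules, so this does not cover the Windows launcher.

```python
import shlex


def launch_command(jar_path, java_path=None):
    """Build the agent launch command, optionally with an explicit java path."""
    java = java_path or "java"  # fall back to $PATH lookup
    return f"{shlex.quote(java)} -jar {shlex.quote(jar_path)}"


print(launch_command("/tmp/agent.jar"))
```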
Use cases
[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-56201]
Upgrading the google-compute-engine-plugin to version 1.0.10 causes VM instances created in the same GCP project but by a different Jenkins master to be removed.
It looks that these changes
So detection of unique instances should probably ensure that VM instances come from the same Jenkins master before they are deleted.
As a workaround, different cloud names could be used on each Jenkins master.
A good amount of the instance provisioning, cleanup, and scaling logic can be eliminated from the code base by utilizing GCE's managed instance groups: https://cloud.google.com/compute/docs/instance-groups/
This would also solve some of outstanding issues with efficiency.
This is similar to jenkinsci/google-kubernetes-engine-plugin#27
Here the main focus will be on ComputeEngineCloudIT.java because they take the most time. I'll also look at addressing #40 so that I don't have to do this change twice.
There exists a large degree of duplicated logic in both launch methods for Linux and Windows. Ideally this logic should be consolidated. See:
There are examples throughout the code base of accessing the ComputeEngineCloud's compute client through an exposed field. This introduces tight coupling by introducing an undocumented dependence on the initialization logic/timing within the ComputeEngineCloud. Ideally this field should be encapsulated and current reference sites should be refactored to access the compute client in an abstract and uniform manner.
Need to investigate if the fields need to be public because of the abstract classes from Jenkins.
Original issue
Hey there, I've noticed recently that the one shot feature is not functioning correctly and my jobs are all trying to load onto the same instance, despite the flag being enabled.
Hi all,
When configuring this plugin using JCasC, GCE agent VMs will not launch. The relevant fields seem to be populated in the Jenkins 'Configure System' UI, but the VMs are not able to launch until Jenkins' configuration is saved using the UI.
Here is my JCasC configuration relating to this plugin. In my case, I'm creating a brand new Jenkins instance from scratch, as you might do when running the Jenkins master in a Docker container.
jenkins:
  clouds:
    - computeEngine:
        cloudName: gce-jenkins-build
        projectId: gce-jenkins
        instanceCapStr: 1
        credentialsId: gce-jenkins
        configurations:
          - namePrefix: jenkins-agent-image
            description: Jenkins agent
            launchTimeoutSecondsStr: 6
            retentionTimeMinutesStr: 300
            mode: EXCLUSIVE
            labelString: jenkins-agent
            numExecutorsStr: 1
            runAsUser: jenkins
            remoteFs: '' # tried not setting this, field added when 'save' clicked in UI
            windows: false
            windowsPasswordCredentialsId: '' # tried not setting, added when saved in UI
            windowsPrivateKeyCredentialsId: '' # tried not setting, added when saved in UI
            oneShot: true
            createSnapshot: false
            region: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/regions/europe-west1"
            zone: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a"
            template: '' # tried not setting, added when 'saved' in UI
            machineType: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a/machineTypes/n1-standard-2"
            preemptible: false
            minCpuPlatform: '' # tried not setting, added when 'saved' in UI
            startupScript: '' # tried not setting, added when 'saved' in UI
            networkConfiguration:
              sharedVpc:
                projectId: gce-jenkins-cloud-123456
                region: europe-west1
                subnetworkShortName: gce-jenkins-cloud
            networkTags: jenkins-agent
            externalAddress: true
            useInternalAddress: false
            bootDiskSourceImageProject: gce-jenkins
            bootDiskSourceImageName: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/global/images/gce-jenkins-build-image"
            bootDiskType: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a/diskTypes/pd-standard"
            bootDiskSizeGbStr: 50
            bootDiskAutoDelete: true
            serviceAccountEmail: '[email protected]'
I did a comparison of config.xml before and after hitting save in the UI, and there's no difference in the GCE plugin configuration section, but Jenkins is suddenly able to launch GCE VMs.
(For the fields marked with '#' above, I initially tried leaving out that configuration entirely from the JCasC configuration, but it got added to Jenkins' config.xml when I hit 'save' in the Jenkins 'Configure System' UI (and still had the same problem of no VMs launching). In order to minimise the diff between the state of config.xml before and after hitting save in the UI, I added it to the JCasC configuration.)
As noted by @devqore in this JCasC issue Google Compute nodes are also being disconnected during a JCasC configuration reload - if Jenkins has more than one permanent node configured in addition to the GCE nodes.
We have a separate GCE project that runs packer to build our GCE machine images. We can configure the GCE Jenkins plugin to use these via init.groovy.d scripting, but the result is that the UI is actively dangerous to use because clicking "Save" after changing anything on the page will cause the machine image URI to be cleared to the empty string. I would very much like the ability to type a free-form GCE project name or even a full GCE machine image URI; failing that, I would like for the Save button to not cause an outage.
Is: This plugin starts up a new instance in GCE. It connects to the new instance via SSH using a user which is configured in the plugin settings in Jenkins (for example, the user "jenkins" is used for connecting via SSH). If this is successful, the plugin reconnects to the instance via SSH as root. As root, the Jenkins slave is started.
See https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineLinuxLauncher.java#L110 and https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineWindowsLauncher.java#L112
Question: Why is the Jenkins slave started as root?
Downsides: As far as I can see, the Jenkins slave could be started as the first SSH user, too. This should decrease the time it takes to set up a new Jenkins slave. Furthermore, running the Jenkins slave as root should be avoided for security reasons.
[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-55515]
I can't use Windows instances in GCE because of a problem launching the agent. Please help me debug/resolve the issue.
Steps to reproduce:
Create windows instance in GCE
Login to the instance by RDP and add user tester with Administrator role
Login to the instance by RDP as tester
Install java8, cygwin with openssh, configure openssh (see how to here: https://docs.oracle.com/cd/E24628_01/install.121/e22624/preinstall_req_cygwin_ssh.htm#EMBSC281)
Check you are able to connect by ssh with tester user and its password
Create private/public rsa keypair (using ssh-keygen), put public key to the /home/tester/.ssh/authorized_keys file
Copy generated private key to your computer to ~/key.txt
Check you are able to connect to the instance with private key without password:
ssh -i ~/key.txt tester@<ip_address>
Stop the instance and create an image from the instance
Goto http://<your_jenkins_address>/credentials/ page and add new credentials "SSH Username with private key", choose "enter directly" for private key and put generated private key here
Add new "Instance configuration" for created image on http://<your_jenkins_address>/configure page: set "Windows?" checkbox, set "Windows Username"=tester, set "Windows SSH Private Key Credentials" to credentials that were created on prev step, set Labels=windows-gce-test, set "Remote Location"=C:\jenkins
Run job with a label "windows-gce-test"
Expected: new GCE instance and jenkins slave are created, the slave is successfully connected and job is successfully ended
Actual: new GCE instance and jenkins slave are created, but slave can't connect
Slave output is the following:
INFO: Connecting to 35.233.217.99 on port 22, with timeout 10000.
Jan 10, 2019 4:42:07 AM null
INFO: Connected via SSH.
Jan 10, 2019 4:42:08 AM null
INFO: Copying slave.jar to: C:
Jan 10, 2019 4:42:11 AM null
INFO: Verifying: java -fullversion
openjdk full version "1.8.0_181-b02"
Jan 10, 2019 4:42:12 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar C:\slave.jar
Jan 10, 2019 4:42:12 AM null
WARNING: Error: Exception: java.io.EOFException: unexpected stream termination
Jenkins log:
Connected via SSH.
Jan 10, 2019 5:04:02 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Copying slave.jar to: C:
Jan 10, 2019 5:04:03 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Verifying: java -fullversion
Jan 10, 2019 5:04:03 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Launching Jenkins agent via plugin SSH: java -jar C:\slave.jar
Jan 10, 2019 5:04:03 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Error:
java.io.EOFException: unexpected stream termination
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:408)
at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:353)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:415)
at com.google.jenkins.plugins.computeengine.ComputeEngineWindowsLauncher.launch(ComputeEngineWindowsLauncher.java:128)
at com.google.jenkins.plugins.computeengine.ComputeEngineComputerLauncher.launch(ComputeEngineComputerLauncher.java:127)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:288)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Jan 10, 2019 5:04:03 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1 call
Exception waiting for node zulu-win2016-tests-gce-enn0eu to connect
java.io.IOException: Agent failed to connect, even though the launcher didn't report it. See the log output for details.
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:312)
Caused: java.util.concurrent.ExecutionException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1.call(ComputeEngineCloud.java:171)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1.call(ComputeEngineCloud.java:161)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm not sure that GCE uses "Remote Location" for Windows slaves (because the log says "INFO: Copying slave.jar to: C:"). Could this be the cause of the problem?
I'm unable to create a GCE node because of JENKINS-55380, but I tried creating a permanent agent node and the agent starts normally on it. I used the following parameters:
Permanent Agent
Remote root directory=.
Launch method=Launch agent agents via SSH
Credentials=<created_credentials_with_private_key>
Node log:
[01/10/19 05:14:27] [SSH] Checking java version of ./jdk/bin/java
Couldn't figure out the Java version of ./jdk/bin/java
bash: ./jdk/bin/java: No such file or directory
[01/10/19 05:14:27] [SSH] Checking java version of java
[01/10/19 05:14:27] [SSH] java -version returned 1.8.0_181.
[01/10/19 05:14:27] [SSH] Starting sftp client.
[01/10/19 05:14:28] [SSH] Copying latest remoting.jar...
[01/10/19 05:14:30] [SSH] Copied 762,466 bytes.
Expanded the channel window size to 4MB
[01/10/19 05:14:30] [SSH] Starting agent process: cd "." && java -jar remoting.jar -workDir .
Jan 10, 2019 1:14:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using .\remoting as a remoting work directory
Both error and output logs will be printed to .\remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.17
This is a Windows agent
NOTE: Relative remote path resolved to: C:\cygwin64\home\tester.
Agent successfully connected and online
https://cloud.google.com/compute/docs/shutdownscript documents the process for adding a shutdown script. This would be very useful for us when dealing with pre-empted nodes. The functionality would be the same as the startup script. I have it working locally, so I will put up a PR if this issue gets backing.
Reference: https://issues.jenkins-ci.org/browse/JENKINS-55518
Throughout the code base there are many classes with public fields. This issue tracks the cleanup work of refactoring them to use proper getters/setters.
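A minimal sketch of the kind of getter/setter refactoring described above. The class `InstanceSettingsSketch` and its fields are hypothetical, chosen only for illustration; they are not taken from the plugin's code base.

```java
// Hypothetical class for illustration only; not part of the plugin.
final class InstanceSettingsSketch {
    // Before the refactoring these would have been public fields, e.g.
    //   public String machineType;
    //   public int launchTimeoutSeconds;
    private String machineType;
    private int launchTimeoutSeconds;

    InstanceSettingsSketch(String machineType, int launchTimeoutSeconds) {
        // Route construction through the setters so validation lives in one place.
        setMachineType(machineType);
        setLaunchTimeoutSeconds(launchTimeoutSeconds);
    }

    String getMachineType() {
        return machineType;
    }

    void setMachineType(String machineType) {
        if (machineType == null || machineType.isEmpty()) {
            throw new IllegalArgumentException("machineType must not be empty");
        }
        this.machineType = machineType;
    }

    int getLaunchTimeoutSeconds() {
        return launchTimeoutSeconds;
    }

    void setLaunchTimeoutSeconds(int launchTimeoutSeconds) {
        if (launchTimeoutSeconds < 0) {
            throw new IllegalArgumentException("launchTimeoutSeconds must not be negative");
        }
        this.launchTimeoutSeconds = launchTimeoutSeconds;
    }

    public static void main(String[] args) {
        InstanceSettingsSketch settings = new InstanceSettingsSketch("n1-standard-1", 300);
        settings.setLaunchTimeoutSeconds(600);
        System.out.println(settings.getMachineType() + " " + settings.getLaunchTimeoutSeconds());
    }
}
```

With private fields, invalid values can be rejected at the point of assignment, which a public field cannot do.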
Running testNoSnapshotCreatedSnapshotNull and then testNoSnapshotCreatedInstanceStopping in ComputeEngineCloudNoSnapshotCreatedIT causes a failure.
This needs a deeper investigation, since tests should pass regardless of the order in which they run.
I'll hopefully complete this by 5/3/19, as I will be on primary bug duty.
When there are 2 build agents available, I run 3 jobs. The 2 build agents pick up one job each (each job sleeps for 30s), while the 3rd job sits in the queue.
I expected the plugin to spin up a third build agent to pick up the third job.
Here are my configuration settings.
Hi there
We are using the google-compute-engine-plugin 3.0.0 and Jenkins 2.164.3 and we are having problems with what seems to be the plugin calling a delete on the VM before the jenkins job completes.
Google Compute Engine plugin Timeout settings for these instances are:
Launch Timeout: 300
Node Retention Time: 6
In our Jenkins build logs there are errors like:
20:48:52 Cannot contact jenkins-workers-eciv4a: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on jenkins-workers-eciv4a failed. The channel is closing down or has closed down.
This happens in the middle of the job, when there is still activity and network connectivity (confirmed via VPC flow logs).
After this error the job eventually times out and fails.
However, in Stackdriver we see that several seconds before the error above there is a delete call on that instance, which seems to come from the plugin:
20:48:45
{
insertId: "qxcwjoe2k5x4"
logName: "projects/project/logs/cloudaudit.googleapis.com%2Factivity"
operation: {
first: true
id: "operation-1559854124970-58aadd705331f-3facdafe-cec32e65"
producer: "type.googleapis.com"
}
protoPayload: {
@type: "type.googleapis.com/google.cloud.audit.AuditLog"
authenticationInfo: {
principalEmail: "[email protected]"
}
authorizationInfo: [
0: {
granted: true
permission: "compute.instances.delete"
resourceAttributes: {
name: "projects/project/zones/us-central1-b/instances/jenkins-workers-eciv4a"
service: "compute"
type: "compute.instances"
}
}
]
methodName: "v1.compute.instances.delete"
request: {
@type: "type.googleapis.com/compute.instances.delete"
}
requestMetadata: {
callerIp: "10.2.0.94"
callerNetwork: "//compute.googleapis.com/projects/project/global/networks/unknown"
callerSuppliedUserAgent: "jenkins-google-compute-plugin Google-HTTP-Java-Client/1.24.1 (gzip),gzip(gfe)"
destinationAttributes: {
}
requestAttributes: {
auth: {
}
time: "2019-06-06T20:48:45.071Z"
}
}
Could this be a bug in the plugin or a configuration issue? Is there any way to get extra logging for the plugin that would help determine the cause?
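On the question of extra logging, one possible approach, offered as an assumption rather than a confirmed diagnosis, is to raise the `java.util.logging` level for the plugin's package. The package name below is taken from the stack traces elsewhere on this page; the class `PluginLoggingSketch` is hypothetical. In a running Jenkins instance the same effect is normally achieved by adding a log recorder for that logger name under Manage Jenkins > System Log, so the code only illustrates the underlying API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of the plugin.
final class PluginLoggingSketch {
    // Package name as it appears in the stack traces on this page.
    static final String PLUGIN_LOGGER = "com.google.jenkins.plugins.computeengine";

    // Keep a strong reference so the configured logger is not garbage collected.
    private static final Logger LOGGER = Logger.getLogger(PLUGIN_LOGGER);

    // Lowers the threshold so that FINE, FINER and FINEST records pass the logger.
    static Logger enableVerboseLogging() {
        LOGGER.setLevel(Level.FINEST);
        return LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = enableVerboseLogging();
        System.out.println(logger.getName() + " level is " + logger.getLevel());
    }
}
```

Note that a record is only written if the attached handlers also accept its level; the Jenkins log recorder UI takes care of that part.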
I use the plugin to create a GCE slave, but the slave is always offline.
According to the log, it seems the plugin cannot connect to the slave via SSH.
just before slave jenkins-slave-916wdt gets launched ...
executing pre-launch scripts ...
Apr 12, 2019 2:31:47 AM null
FINEST: Instance jenkins-slave-916wdt is running and ready...
Apr 12, 2019 2:31:47 AM null
INFO: Launching instance: jenkins-slave-916wdt
Apr 12, 2019 2:31:54 AM null
INFO: bootstrap
Apr 12, 2019 2:31:54 AM null
INFO: Getting keypair...
Apr 12, 2019 2:31:54 AM null
INFO: Using autogenerated keypair
Apr 12, 2019 2:31:54 AM null
INFO: Authenticating as jenkins
Apr 12, 2019 2:31:55 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:31:56 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:31:56 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:01 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:01 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:32:01 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:06 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: connect fresh as root
Apr 12, 2019 2:32:07 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: Copying agent.jar to: /tmp
Apr 12, 2019 2:32:09 AM null
INFO: Verifying: java -fullversion
bash: java: command not found
Apr 12, 2019 2:32:09 AM null
WARNING: Java is not installed.
Apr 12, 2019 2:32:09 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar /tmp/agent.jar
Apr 12, 2019 2:32:09 AM null
WARNING: Error getting exception Exception: java.io.IOException: SSH channel is closed
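The log above shows the launcher warning that Java is not installed and then still trying to start agent.jar, which fails. A minimal sketch of failing fast on such a check follows. It runs the command as a local process rather than over SSH as the plugin does, and `JavaCheckSketch` and `commandSucceeds` are hypothetical names, not plugin APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; the plugin performs its check over SSH.
final class JavaCheckSketch {
    // Returns true only if the command starts and exits with status 0.
    static boolean commandSucceeds(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream out = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (out.read(buffer) != -1) {
                    // output is discarded
                }
            }
            if (!process.waitFor(10, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // Thrown when the executable cannot be started, e.g. it is not on the PATH.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (!commandSucceeds("java", "-fullversion")) {
            System.err.println("Java is not installed; not launching the agent.");
            return;
        }
        System.out.println("Java found; the agent could be launched.");
    }
}
```

In the reported case the practical fix is more likely on the image side, for example installing Java in the VM image or in a startup script, so that the check passes in the first place.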
As requested here
"Currently the google-compute-engine-plugin creates jenkins node and terminates them if idle. Instead of terminate, it's better to support stop the instance when idle."