axsh / openvdc
Extendable Tiny Datacenter Hypervisor on top of Mesos architecture. Wakame-vdc v2 Project.
License: GNU Lesser General Public License v3.0
Our binaries aren't stripped, which means unnecessary symbols and debug information are still in them. We should strip them to save space.
[kemumaki@executor bin]$ file openvdc
openvdc: ELF 64-bit LSB executable .... not stripped
[kemumaki@executor bin]$ file openvdc-executor
openvdc-executor: ELF 64-bit LSB executable .... not stripped
Run the strip command on them before packaging. It's a small difference, but we should still do it.
Read the blog @unakatsuo posted below. :)
The following scenario:
Now the LXC container is stopped but OpenVDC thinks it's running.
Result:
Feb 20 14:38:24 ci openvdc-scheduler[2806]: 2017-02-20 14:38:24 [FATAL] github.com/axsh/openvdc/api/instance_service.go:86 BUGON: Detected un-handled state instance_id=i-0000000000 state=state:RUNNING created_at:<seconds:1487314564 nanos:237858284 >
Feb 20 14:38:24 ci systemd[1]: openvdc-scheduler.service: main process exited, code=exited, status=1/FAILURE
Feb 20 14:38:24 ci systemd[1]: Unit openvdc-scheduler.service entered failed state.
Feb 20 14:38:24 ci systemd[1]: openvdc-scheduler.service failed.
The openvdc-scheduler service dies.
# systemctl status openvdc-scheduler
● openvdc-scheduler.service - OpenVDC scheduler
Loaded: loaded (/usr/lib/systemd/system/openvdc-scheduler.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2017-02-20 14:38:24 JST; 6min ago
Process: 2806 ExecStart=/opt/axsh/openvdc/bin/openvdc-scheduler (code=exited, status=1/FAILURE)
Main PID: 2806 (code=exited, status=1/FAILURE)
On executor start, OpenVDC should check that all instances are in their expected states. If they are not, they should be brought to the states OpenVDC expects them to be in.
When start is called on a container OpenVDC thinks is "RUNNING", first check which state the instance is actually in, then switch it to the correct state and run the start command on it.
Make sure the scheduler never dies, no matter what state start is called on.
Other suggestions welcome. ^_^
openvdc-executor requires machine-specific configuration parameters, such as the bridge/Open vSwitch name.
The change could be similar to #64: have spf13/cobra read /etc/openvdc/executor.conf at startup using spf13/viper, remove the global variables for the old flags, and replace them with viper.Get***() calls.
Got an error:
=== RUN TestFailedState_RebootInstance
--- FAIL: TestFailedState_RebootInstance (5.21s)
00_run_cmd.go:115: Unexpected Instance State: i-0000000009 goal=FAILED found=RUNNING
Each scenario in the file waits for the FAILED state through a list of transitional states. But a failure is reported because the origin state is not added to WaitInstance(). A possible symptom:
1. The instance is RUNNING and openvdc reboot is issued.
2. The instance is still in RUNNING.
3. WaitInstance() sees RUNNING, but it is not listed among the intermediate states like []string{"REBOOTING"}.
func TestFailedState_RebootInstance(t *testing.T) {
stdout, _ := RunCmdAndReportFail(t, "openvdc", "run", "centos/7/null", `{"crash_stage": "reboot"}`)
instance_id := strings.TrimSpace(stdout.String())
WaitInstance(t, 5*time.Minute, instance_id, "RUNNING", []string{"QUEUED", "STARTING"})
RunCmdAndReportFail(t, "openvdc", "reboot", instance_id)
WaitInstance(t, 5*time.Minute, instance_id, "FAILED", []string{"REBOOTING"})
}
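A minimal sketch of the fix (helper names are hypothetical): treat the origin state as an acceptable observation while waiting for the goal state, so seeing RUNNING right after `openvdc reboot` is not an error.

```go
package main

import "fmt"

// isAcceptableWhileWaiting reports whether an observed state should be
// tolerated while waiting for goal: either it is the goal itself, or it
// appears in the list of allowed intermediate states.
func isAcceptableWhileWaiting(observed, goal string, intermediates []string) bool {
	if observed == goal {
		return true
	}
	for _, s := range intermediates {
		if observed == s {
			return true
		}
	}
	return false
}

func main() {
	// With the origin state added to the intermediate list, RUNNING is fine.
	fmt.Println(isAcceptableWhileWaiting("RUNNING", "FAILED", []string{"RUNNING", "REBOOTING"})) // prints "true"
	// Without it, RUNNING is (wrongly) treated as a failure.
	fmt.Println(isAcceptableWhileWaiting("RUNNING", "FAILED", []string{"REBOOTING"})) // prints "false"
}
```

In the test above, this amounts to calling `WaitInstance(t, 5*time.Minute, instance_id, "FAILED", []string{"RUNNING", "REBOOTING"})`.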
When using Open vSwitch, starting an instance crashes with the following error.
2017-05-23 07:23:49 [ERROR] openvdc-executor/main.go:158 Failed CreateInstance Error: Failed to parse script template: /etc/openvdc/scripts/ovs-up.sh.tmpl: open /etc/openvdc/scripts/ovs-up.sh.tmpl: no such file or directory hypervisor=lxc instance_id=i-0000000007 state=STARTING
ErrorStackTrace:
github.com/axsh/openvdc/hypervisor/lxc.(*LXCHypervisorDriver).renderUpDownScript
/var/tmp/go/src/github.com/axsh/openvdc/hypervisor/lxc/lxc.go:230
github.com/axsh/openvdc/hypervisor/lxc.(*LXCHypervisorDriver).CreateInstance
/var/tmp/go/src/github.com/axsh/openvdc/hypervisor/lxc/lxc.go:301
It looks like the /etc/openvdc/scripts
directory is missing entirely.
# ls /etc/openvdc/
executor.toml scheduler.toml scheduler.toml.rpmnew
Put the required scripts for Open vSwitch back in place.
In the acceptance-test
branch I've set up the acceptance test environment to run in docker. Currently the only test in there checks if the openvdc
command was properly installed and in PATH. While it's a beginning, this is hardly an acceptance test.
This should be the first test case:
- … the openvdc command.
- … the openvdc command.
Can't be completed unless this is done first:
Currently the Jenkins master and the slave on which we run unit tests and rpmbuild have been built manually. If they break, we're in trouble. We should write scripts to build and rebuild them.
Currently both machines are VirtualBox VMs. I don't think this is ideal because it is not possible to run KVM inside of VirtualBox. This is something we will likely want to do for OpenVDC in the future but as long as VirtualBox is running on bare-metal, it is not possible to run KVM on the same machine. The main advantage of VirtualBox is that it will run on any OS but I don't think that is relevant for Jenkins. We are not likely to run it locally.
I would suggest rebuilding them as either KVM machines or Docker containers.
There was code in place to use the acceptance test cache for the master branch when building a new branch and the REBUILD variable isn't set.
I temporarily disabled that code because I wrongly assumed that we couldn't set the REBUILD variable manually while the CI is still kicked off automatically.
Re-enable that code.
When created, instances get the state "REGISTERED", and this state isn't allowed to be set directly to "TERMINATED". So at the moment, instances can't be destroyed unless they've been started at least once.
./openvdc register centos/7/lxc
INFO[0000] Found template: centos/7/lxc ID:"r-0000000002" resource:<id:"r-0000000002" template:<template_uri:"https://raw.githubusercontent.com/axsh/openvdc/resource-handler/centos/7/lxc.json" lxc:<lxc_image:<download_url:"https://images.linuxcontainers.org/1.0/images/d767cfe9a0df0b2213e28b39b61e8f79cb9b1e745eeed98c22bc5236f277309a/export" > > > >
./openvdc create r-0000000002
instance_id:"i-0000000002"
./openvdc destroy i-0000000002
FATA[0000] Disconnected abnormally error=rpc error: code = 2 desc = Invalid goal state: REGISTERED -> TERMINATED
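A sketch of how the goal-state check could be relaxed (hypothetical code, not openvdc's actual state machine; state names are taken from the logs above): add TERMINATED to the set of goal states reachable from REGISTERED so never-started instances can be destroyed.

```go
package main

import "fmt"

// validTransitions maps each state to the goal states it may move to.
// Adding TERMINATED to REGISTERED's list lets "openvdc destroy" work on
// instances that were created but never started.
var validTransitions = map[string][]string{
	"REGISTERED": {"QUEUED", "TERMINATED"},
	"QUEUED":     {"STARTING"},
	"STARTING":   {"RUNNING"},
	"RUNNING":    {"STOPPING", "REBOOTING", "TERMINATED"},
}

func canTransition(from, to string) bool {
	for _, s := range validTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("REGISTERED", "TERMINATED")) // prints "true": destroy now works
	fmt.Println(canTransition("REGISTERED", "RUNNING"))    // prints "false": still rejected
}
```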
Unit tests, rpmbuild and acceptance test are all running in Docker containers. The containers themselves are being cleaned up after the test runs but their images remain.
Removing the images along with the containers would not be the right thing to do because Docker uses those as cache. Leaving them all alone would just keep eating up disk space. We have to find the correct middle ground.
The Jenkins slave where the unit tests and rpmbuild jobs are running currently has a bunch of images lying around.
[vagrant@localhost ~]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
citest.acceptance-test.el7 latest da2a9e39c5c9 6 days ago 682.8 MB
citest.console-service.el7 latest da2a9e39c5c9 6 days ago 682.8 MB
unit-tests.citest.acceptance-test.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
unit-tests.citest.console-service.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
unit-tests.citest.lxc-networking.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
unit-tests.citest.master.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
unit-tests.citest.protoc-go-generate.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
unit-tests.citest.remove-old-integration-test.el7 latest ed39a2d02d4d 6 days ago 890.1 MB
...
Sometimes @unakatsuo logs in and deletes them manually.
On the acceptance test slave, we are only just starting to use Docker, so there are only a few images at the time of writing.
[18:26:04] metallion@phys028 (~) > sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
openvdc/acceptance-test acceptance-test.20170119064535git276b558 2546af339d82 13 minutes ago 466 MB
docker.io/centos 7 67591570dd29 4 weeks ago 191.8 MB
It's very tempting to just remove all images that are (for example) more than two weeks old. Some things have to be considered, though.
It's possible that containers are still up because the LEAVE_CONTAINER flag was set. Whenever somebody sets that flag, it should be their responsibility to clean up after they're done.
What if the docker rmi command fails because some things still depend on an old image?
Maybe we can implement the garbage collection as simply as possible and then see what complications come up? For that I would suggest running it nightly and removing every image that is over 2 weeks old. Just like in the other GC jobs, that time should be configurable.
Currently the CI just pushes yum repositories for every commit and they stay online forever. We need a periodically running jenkins job to clean those up.
In https://ci.openvdc.org/repos/ there are a lot of folders named something like 20161223124709gitb06bd3.
After #52 is merged that format will change and instead a directory will be made for every branch.
Add a new Jenkins job whose sole purpose is garbage collection. It should assume the new way of working from #52. The old 20161223124709gitb06bd3-style directories can be removed manually after waiting a few weeks, so we're sure no more code depends on them.
When you start an instance but the executor instance does not have access to the internet, the instance will never come up and its database entry will be stuck in STARTING
state forever.
- openvdc-cli on machine A.
- openvdc-executor-lxc on machine B.
A $ openvdc run centos/7/lxc
INFO[0000] Updating registry cache from https://raw.githubusercontent.com/axsh/openvdc/master/templates
INFO[0002] Found template: centos/7/lxc
i-0000000000
B $ lxc-info -n i-0000000000
i-0000000000 doesn't exist
A $ openvdc show i-0000000000
{
"iD": "i-0000000000",
"instance": {
"id": "i-0000000000",
"slaveId": "91fb613c-2abf-477a-b41c-41b37e1845b9-S0",
"lastState": {
"state": "STARTING",
"createdAt": "2017-04-17T06:48:54.339487647Z"
},
"createdAt": "2017-04-17T06:48:48.379337475Z",
"template": {
"templateUri": "https://raw.githubusercontent.com/axsh/openvdc/master/centos/7/lxc.json",
"lxc": {
"lxcTemplate": {
"template": "download",
"distro": "centos",
"release": "7"
}
},
"createdAt": "2017-04-17T06:48:48.379337475Z"
}
}
}
Instance i-0000000000
is now stuck in this state forever.
I would suggest a FAILED state which is identical to TERMINATED, except that it was reached because of an error and not because a user called openvdc destroy. Something like this:
$ openvdc show i-0000000000
{
"iD": "i-0000000000",
"instance": {
...
"lastState": {
"state": "FAILED",
"reason": "Failed to download lxc template: <whatever error message golang gave us>"
...
},
...
}
A symptom was observed where instance launching failed with the following log message:
2017-03-21 15:51:35 [INFO] github.com/axsh/openvdc/hypervisor/lxc/lxc.go:326 Starting lxc-container... hypervisor=lxc instance_id=i-0000000002
2017-03-21 15:51:35 [INFO] github.com/axsh/openvdc/hypervisor/lxc/lxc.go:331 Waiting for lxc-container to become RUNNING hypervisor=lxc instance_id=i-0000000002
2017-03-21 15:51:35 [INFO] openvdc-executor/main.go:139 Instance launched successfully hypervisor=lxc instance_id=i-0000000002 state=STARTING
2017-03-21 15:51:35 [ERROR] openvdc-executor/main.go:104 Failed Instances.UpdateState Error: Invalid next state: QUEUED -> RUNNING hypervisor=lxc instance_id=i-0000000002 state=RUNNING
It reports an unexpected instance state transition from QUEUED to RUNNING.
It is currently not possible to start the scheduler through systemd and have it interact with ZooKeeper, the API, etc. when those are installed on another host.
Also, if openvdc-cli isn't installed on the same host, the scheduler will not start at all, because an EnvironmentFile was mistakenly added to the openvdc-cli package instead.
Have scheduler accept a configuration file similar to #91.
Remove the EnvironmentFile in favour of the new configuration file.
When piping a command to openvdc console and that command fails, the error message reads: Failed to ssh to <ip address>. This would have you believe that the SSH connection failed, while in reality it succeeded and it was the command run afterwards that failed.
bash-4.2$ echo xxxx | openvdc console i-0000000017
/bin/bash: line 1: xxxx: command not found
FATA[0000] Failed ssh to 172.16.3.10:31195 error="Process exited with status 127"
Display different error messages for when the SSH connection fails and for when the command run afterwards fails.
In the past we've had problems with complicated test environments. For the kind of distributed virtualization software we make, it's impossible to avoid these, but we should make it as easy as possible for other programmers to pick up where the last person left off.
The OpenVDC acceptance test environment is an example of this, with 5 machines that run in a Docker container and can run in parallel.
I need to document this environment in a readme file so people can figure out what's going on even when I'm not around.
Add an openvdc log sub-command that retrieves an instance's logs (kept under /var/lib/mesos/slaves):
% openvdc log i-xxxxxxx
% cat ./ssg5_config.json
{
"title": "Juniper SSG 5",
"template": {
"type": "physical/appliance",
"nics": {
      # mapping nics on SSG5 to switch ethers
      "eth0/0": "eth0",
      "eth0/1": "eth1",
      ...(snip)...
      "eth0/7": "eth7"
}
}
}
% openvdc register ./ssg5_config.json --group "juniper/ssg5"
% openvdc run juniper/ssg5
abdcefg12345678
% openvdc ssh abdcefg12345678
Could not support a ssh connection.
% openvdc console abdcefg12345678
Could not support a serial console connection.
% openvdc destroy abdcefg12345678
If an administrator wants to add another resource of the same kind, they edit the configuration file and register it under the same group name on the switch appliance that is connected to the target resource.
When I ran into #169, I noticed that instances are correctly being set to the FAILED
state in OpenVDC but their files on disk are not cleaned up.
$ openvdc show i-0000000000
{
"ID": "i-0000000000",
"instance": {
"id": "i-0000000000",
"slave_id": "24afc003-a255-4f52-b146-9c8e71041b87-S0",
"last_state": {
"state": "FAILED",
"created_at": "2017-05-23T07:22:43.246760347Z"
},
...
# lxc-info -n i-0000000000
Name: i-0000000000
State: STOPPED
# ls /var/lib/lxc/i-0000000000
config rootfs
# du -hs /var/lib/lxc/i-0000000000
422M /var/lib/lxc/i-0000000000
The above files stay on disk forever.
If an instance fails, clean up its resources in a similar way to what openvdc destroy <instance-id> does.
Suggestions for the openvdc rpm packages
All packages should suggest zookeeper and mesos as optional dependencies.
This is a metapackage that doesn't install anything by itself but depends on all other packages. It's basically a shortcut to install everything in one shot.
Dependencies: openvdc-cli, openvdc-executor, openvdc-scheduler
This is the openvdc command. The command itself can be openvdc but the package name is openvdc-cli in order not to get confused with the metapackage.
Dependencies: mesos-agent
This should not run as the root user. A new user, openvdc-scheduler, should be created to run this process.
In OpenVNet we made the mistake of putting the timestamp+gitcommit in Release, while it should be in Version. OpenVDC should fix this.
🚫 openvdc-executor-0.1dev-20161212145756git37689cd.el6.noarch.rpm
✅ openvdc-executor-0.1dev.20161212145756git37689cd-1.el6.noarch.rpm
Version is the version of the software, while Release is the version of the package. http://rpm.org/max-rpm-snapshot/s1-rpm-build-creating-spec-file.html
Config files should go here:
/etc/systemd/system/openvdcservice.d
If the temporary directory and the user's home directory are not on the same partition, openvdc run centos/7/lxc
will fail with the following error.
> ./openvdc run centos/7/lxc
INFO[0000] Updating registry cache from https://raw.githubusercontent.com/axsh/openvdc/master/templates
FATA[0003] Invalid path: centos/7/lxc, rename /tmp/gh-images-reg941395916/openvdc-e77ed15f3b2ba582087afa226ace61a6756f65dd/templates /home/metallion/.openvdc/registry/github.com-axsh-openvdc/master: invalid cross-device link
> df
...
tmpfs 8099764 52 8099712 1% /tmp
/dev/sda3 469420496 421441484 24110752 95% /home
...
The reason is that os.Rename doesn't allow moving files between different file systems.
You can set a tmp directory on the same partition using the TMPDIR
environment variable.
> TMPDIR=$HOME/.openvdc/tmp ./openvdc run centos/7/lxc
INFO[0000] Found template: centos/7/lxc
...
@unakatsuo told me that the mv command had to deal with the same problem and solved it by not renaming the file, but rather making a copy first and then deleting the original. We could try a similar approach in our Go code.
Some discussion was had about how to minimize build time while still using clean environments. Here's a suggestion I thought of.
Some branches will require the CI to be rebuilt while others don't.
- /data/openvdc-ci/<branch>/
- Nightly rebuilds
- Nightly garbage collection: rm -r /data/openvdc-ci/<branch>/
We are currently using the --privileged flag when running Docker in the acceptance test. This is done to run KVM inside, but it basically gives the container full root access on the host.
Use options such as --device and --cap-add to give the container only the exact permissions it needs.
Access to the Docker API is effectively root access. Even lacking --privileged, there are numerous mechanisms to avoid system policy if one has access to the docker socket or API.
It seems that when a user has access to docker, that user essentially has root access. If we were going to have root access anyway, I figured it's better to make that obvious by using sudo so the next person touching the code will be aware of it.
It could be a good idea to also investigate whether there are side-effects to that and whether it was a terrible idea.
We've got people on our team who like to make PRs after all work is done, and we've got people who like to have a PR open while they're working. Both ways of working are OK, but if a PR shouldn't be merged yet, we should add a WiP label to it.
As the acceptance test runs, it caches machine images for every branch it has run on. These need to be garbage collected.
The cache is kept in /data2/openvdc-ci/branches/
on the machine where the acceptance test runs. A new directory is created for every branch. Here's an example of the current state.
[18:09:01] metallion@IRON_MAIDEN_RULES (~) > ls /data2/openvdc-ci/branches/
acceptance-test console-service fix-cli-print fix-PR67 master multibox-openvdc-install registry-fix resource-naming show-version upgrade-epel-release
ci-merge-master-locally-fix fix-binfile-add fix-multibox-ssh lxc-networking model-timestamp protoc-go-generate remove-old-integration-test rpm_cleanup teardown-ci-multi user-config
We should do something similar to #54:
- Run the garbage collection job every night.
- Is this branch still on GitHub?
We could even use the same script if that's the easier implementation. That I'll leave to the programmer in question. ;)
What shall we do for master? That cache should probably be removed and rebuilt some time too. I'm going to think about that for a while; suggestions are welcome.
On a standard installation /etc/lxc/default.conf
contains the following line.
lxc.network.link = virbr0
The default.conf file is included in every /var/lib/lxc/<container-name>/config file. That means LXC will never work unless the user first creates a bridge called virbr0.
We cannot touch the /etc/lxc/default.conf file. Users might have created their own default conf for LXC, and we can't just let OpenVDC modify that behind their back. We have to get containers started by OpenVDC to ignore the default configuration.
One option is to rewrite /var/lib/lxc/<container-name>/config after it gets created with the default conf contents in it.
Don't use lxc.network.link, as it is only compatible with the Linux bridge and we want to use Open vSwitch too.
Use lxc.network.script.up and lxc.network.script.down instead. The scripts called by these options will call brctl addif or ovs-vsctl add-port respectively.
Put two script pairs in place when installing openvdc.
The bridge name on every executor will be passed by a command line parameter.
OpenVNet will need to know what LXC's tap devices are called. Maybe use lxc.network.veth.pair? In any case, it needs to be able to use openvdc to query the tap device name.
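The up/down scripts would be rendered from templates by the executor (like the ovs-up.sh.tmpl file referenced elsewhere in these issues). A minimal sketch of such rendering with Go's text/template, assuming the bridge name arrives as a command line parameter; the template contents and names here are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// A hypothetical up-script template: the real files would live under
// /etc/openvdc/scripts/ and call brctl addif or ovs-vsctl add-port.
const upScript = `#!/bin/sh
# $5 is the veth interface name LXC passes to the script.
ovs-vsctl add-port {{.BridgeName}} "$5"
`

type scriptParams struct {
	BridgeName string // passed to the executor as a command line parameter
}

func renderUpScript(bridge string) (string, error) {
	tmpl, err := template.New("ovs-up").Parse(upScript)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, scriptParams{BridgeName: bridge}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	script, _ := renderUpScript("ovsbr0")
	fmt.Print(script)
}
```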
When we have an instance with state FAILED:
> ./openvdc show i-0000000005
{
"ID": "i-0000000005",
"instance": {
...
"last_state": {
"state": "FAILED",
"created_at": "2017-07-05T09:08:41.181030773Z"
},
...
}
We are no longer able to access its log. This makes debugging very difficult.
> ./openvdc log i-0000000005
FATA[0000] Error streaming log error="rpc error: code = 2 desc = cl.GetLog: application could not be found"
When an instance reaches the FAILED state, its Mesos job is cleared, and Mesos's log API no longer allows us to access it.
If there is no way to use the log API, rewrite the openvdc log <instance-id> command to fetch the Mesos logs directly from the executor machines.
Currently the acceptance test environment is making .raw images and then converting them to .qcow. The reasoning behind this is that .raw is easy to loopback-mount and .qcow has copy-on-write.
However, if we could work with .qcow from the beginning, there would be no need for the conversion, which takes up some time.
It looks like nowadays loopback mounting .qcow isn't as hard as it used to be.
From http://ask.xmodulo.com/mount-qcow2-disk-image-linux.html:
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 /path/to/qcow2/image
Example
sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/xenserver.qcow2
sudo fdisk /dev/nbd0 -l # <= check existing partitions
sudo mount /dev/nbd0p1 /mnt
We are supporting Open vSwitch but we do not have any tests for OpenVDC with Open vSwitch yet.
We currently have an Open vSwitch host on the CI but we have no means of making sure it gets used in specific tests yet. This feature is being developed in #159.
Finish #159 first and then write some tests.
The multibox environment currently won't build because the CentOS version is hard-coded.
** DOING STEP: Install zookeeper on zookeeper
++ sudo chroot /home/metallion/work/go/src/github.com/axsh/openvdc/ci/acceptance-test/multibox/10.0.100.10-zookeeper/tmp_root /bin/bash -c 'yum install -y mesosphere-zookeeper'
Loaded plugins: fastestmirror
http://ftp.jaist.ac.jp/pub/Linux/CentOS/7.2.1511/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
This is a common problem with images generated by buildbook-rhel7.
There are two ways we could go about this:
- Change buildbook to use 7 instead of 7.2.1511 in the yum repository.
- Get the seed image from somewhere else.
When starting an instance it is expected to transition through the following states: QUEUED => STARTING => RUNNING.
However, sometimes an instance comes up so fast that it transitions to RUNNING before the STARTING state gets registered. That causes the following error, and the instance will be stuck in the QUEUED state forever.
2017-03-21 15:51:35 [INFO] openvdc-executor/main.go:132 Starting instance hypervisor=lxc instance_id=i-0000000002 state=STARTING
2017-03-21 15:51:35 [INFO] github.com/axsh/openvdc/hypervisor/lxc/lxc.go:326 Starting lxc-container... hypervisor=lxc instance_id=i-0000000002
2017-03-21 15:51:35 [INFO] github.com/axsh/openvdc/hypervisor/lxc/lxc.go:331 Waiting for lxc-container to become RUNNING hypervisor=lxc instance_id=i-0000000002
2017-03-21 15:51:35 [INFO] openvdc-executor/main.go:139 Instance launched successfully hypervisor=lxc instance_id=i-0000000002 state=STARTING
2017-03-21 15:51:35 [ERROR] openvdc-executor/main.go:104 Failed Instances.UpdateState Error: Invalid next state: QUEUED -> RUNNING hypervisor=lxc instance_id=i-0000000002 state=RUNNING
2017/03/21 15:51:35 Recv loop terminated: err=EOF
2017/03/21 15:51:35 Send loop terminated: err=<nil>
Because we needed a quick fix for a demo, @b0r6 has allowed transition from QUEUED to RUNNING in this branch: https://github.com/axsh/openvdc/tree/allow-queued-to-running
We used that patch in the demo but it hasn't been merged to master yet. We need to decide if this is an acceptable long-term solution or if more work is required.
Set a custom unique name on an instance/resource.
Proposed CLI usage:
% openvdc run centos/7/lxc --name=myinstance1
% openvdc show myinstance1
% openvdc rename myinstance1 myinstance2
TODO:
- Update model.proto to add a name field to message Instance (and message ResourceTemplate?).
- Update the Instance.Run and Instance.Create APIs to set the name.
- Update Instance.Show to retrieve by name.
- Instance.List should include the name if it exists.
- Add an Instance.Rename API.
- CLI: openvdc run --name, openvdc show <name>, openvdc rename.
#96 reuses cache from the master branch to test other branches. This greatly speeds up the CI cycle but a side-effect is that we keep using old images and it's possible that OpenVDC no longer works on the latest version.
We should rebuild master's cache periodically. I'd say a Jenkins job that builds the master branch with REBUILD
set to "true" should be enough. I suggest running this job every weekend. It should keep a copy of the old cache in case things go wrong and we don't immediately have time to fix it.
Doing this too fast:
./openvdc run centos/7/lxc
./openvdc run centos/7/lxc
./openvdc run centos/7/lxc
./openvdc run centos/7/lxc
causes this duplicate offer bug:
INFO[0156] Framework Resource Offers from master &TaskStatus{TaskId:&TaskID{Value:*i-0000000004,XXX_unrecognized:[],},State:*TASK_LOST,Data:nil,Message:*Task launched with invalid offers: Duplicate offer 43e660c2-76d2-4fd2-4fdc-85bc-1dfa7c268d29-S0 at slave(1)@127.0.0.1:5051 (ldc-85bc-1dfa7c268d29-O46 in offer list,SlaveId:&SlaveID{Value:*43e660c2-76d2-4fdc85bc-1dfa7c268d29S0,XXX_unrecognized[],},Timestamp:*1.482140777666846e+09,ExecutorId:nil,Healthy:nil,Source:*SOURCE_MASTER,Reason:*REASON_INVALID_OFFERS,Uuid:nil,Labels:nil,ContainerStatus:nil,XXX_unrecognized:[],}
The scheduler will keep displaying this error over and over again until you manually remove /openvdc in zookeeper.
The openvdc console command connects via ssh to an executor node, which in turn connects to an instance's console. The problem is that the executor's host key is generated on startup. That means that if the executor is restarted, there will be a problem with the client's known_hosts file.
- Store the host key in /etc/openvdc.
- Read it back from /etc/openvdc on startup.
- Generate it in go. We don't want to use Linux commands because we might run the executor on Windows in the future. The pem package can be used to handle private key pem files.
The openvdc log
command ignores the mesos master IP address set in the openvdc config file.
[kemumaki@executor ~]$ cat .openvdc/config.toml
[api]
endpoint = "10.0.100.12:5000"
[mesos]
address = "10.0.100.11:5050"
[kemumaki@executor ~]$ openvdc log i-0000000001
FATA[0000] Couldn't connect to Mesos master error="dial tcp 127.0.0.1:5050: getsockopt: connection refused"
Get the openvdc log command to use the Mesos address value set in the config file if it's defined.