vhive-serverless / vhive

vHive: Open-source framework for serverless experimentation

License: MIT License

Languages: Go 75.15%, Makefile 3.63%, Shell 11.13%, C 0.33%, Dockerfile 1.16%, Python 6.71%, JavaScript 0.09%, Java 1.80%
Topics: serverless, faas, benchmarking, containerd, firecracker, functions-as-a-service, microvm, knative, kubernetes, vhive

vHive's Introduction


vHive Header

News

A full-day tutorial, co-located with the ASPLOS'22 conference, will take place in Lausanne, Switzerland on March 1. The tutorial will be held in a hybrid format (both in-person and virtual) and will cover both the serverless cloud foundations and the vHive ecosystem, including several hands-on sessions. Participants will learn how to analyze the performance of commercial and open-source clouds, as well as how to experiment and innovate across the serverless stack using vHive. For more information, please visit the tutorial webpage.

Mission

vHive aims to enable serverless systems researchers to innovate across the deep and distributed software stacks of a modern serverless platform. Hence, we built vHive to be representative of the leading Function-as-a-Service (FaaS) providers, integrating the same production-grade components used by these providers, including the AWS Firecracker hypervisor, the Cloud Native Computing Foundation's containerd, and Kubernetes.

vHive adopts the flexible Knative programming model, allowing researchers to quickly deploy and experiment with any serverless applications that may comprise many functions, running in secure Firecracker microVMs, as well as serverful services. Both the functions and the stateful services can be deployed using OCI/Docker images.

vHive empowers systems researchers to innovate on key serverless features, including function autoscaling and cold-start delay optimization with several snapshotting mechanisms.

vHive supports the state-of-the-art eStargz extension to container layers and lazy pulling of container images.

vHive supports running Knative on top of OpenYurt, an open platform that extends upstream Kubernetes to run on edge node pools. More details on how to use the Knative-atop-OpenYurt mode are described here.

vHive architecture

vHive Architecture

The details of the vHive architecture can be found in our ASPLOS'21 paper (extended abstract, full paper).

Currently, vHive supports three types of sandboxes for functions, namely Firecracker microVMs (default), gVisor microVMs, and containerd containers (as in vanilla Knative).

Technical talks

  • Slides from Dmitrii's talk at AWS on February 4th, 2021.
  • ASPLOS'21 talks on YouTube: short, long.
  • Serverless and vHive tutorial series
  • Upcoming serverless computing workshop at EuroSys 2023 (see the workshop website)

Referencing our work

If you decide to use vHive for your research and experiments, we are thrilled to support you by offering advice on potential extensions of vHive, and we are always open to collaboration.

Please cite our paper, which was recently accepted to ASPLOS 2021:

@inproceedings{ustiugov:benchmarking,
  author    = {Dmitrii Ustiugov and
               Plamen Petrov and
               Marios Kogias and
               Edouard Bugnion and
               Boris Grot},
  title     = {Benchmarking, Analysis, and Optimization of Serverless Function Snapshots},
  booktitle = {Proceedings of the 26th ACM International Conference on
               Architectural Support for Programming Languages and Operating Systems (ASPLOS'21)},
  publisher = {{ACM}},
  year      = {2021},
  doi       = {10.1145/3445814.3446714},
}

Getting started with vHive

vHive can be readily deployed on premises or in the cloud, with support for nested virtualization. We provide a quick-start guide that describes how to set up an experiment with vHive.

Developing vHive

Developer's guide and performance analysis with vHive

We provide a basic developer's guide that we plan to extend in the future. We encourage the community to contribute their analysis scenarios and tools.

You can view the vHive documentation here.

vHive roadmap

vHive is a community-led project, maintained by the EASE lab. The current roadmap is available and will be updated to accommodate the community's goals and evolution. To guarantee high code quality and reliability, we run fully automated CI on cloud and self-hosted runners with GitHub Actions.

Statistics of this repository's views, clones, and forks are available at the following link.

Getting help and contributing

We would be happy to answer any questions in GitHub Issues and encourage the open-source community to submit new Issues, assist in addressing existing issues and limitations, and contribute their code with Pull Requests. Please check our guide on Contributing to vHive if you would like to contribute. You can also talk to us in our Slack space.

License and copyright

vHive is free. We publish the code under the terms of the MIT License that allows distribution, modification, and commercial use. This software, however, comes without any warranty or liability.

The software is maintained by the EASE lab at the University of Edinburgh.

Maintainers

vHive's People

Contributors

amariucaitheodor, amohoste, anshalshukla, boramalper, char-1ee, curiousgeorgiy, cvetkovic, davidbuzatu-marian, dependabot[bot], dhschall, estellapaula, flyinghorse0510, hongyuhe, jchua99, jooyoungpark73, leokondrashov, lightxyz, lrq619, mic92, nathanieltornow, niravnshah, niuj1ao, plamenmpetrov, shyamjesal, siyangshao, tom-kuchler, ustiugov, wanghanchengchn, xunyiiz, yulinzou


vHive's Issues

TestSaveMemory* tests fail due to a potential deadlock

DEBU[2020-06-18T22:44:38.669241662+02:00] FwdRPC: Received a response from the  function instance  fID=4
DEBU[2020-06-18T22:44:38.669332846+02:00] Function has to shut down its instance, served 40 requests  fID=4
DEBU[2020-06-18T22:44:38.669422176+02:00] Removing instance (async)                     fID=4
DEBU[2020-06-18T22:44:38.669630749+02:00] Orchestrator received StopVM                  vmID=4_0
DEBU[2020-06-18T22:44:38.669860859+02:00] Function is inactive, starting the instance...  fID=4
DEBU[2020-06-18T22:44:38.669950956+02:00] Adding instance                               fID=4
DEBU[2020-06-18T22:44:38.670046416+02:00] StartVM: Received StartVM                     image="ustiugov/helloworld:runner_workload" vmID=4_1
DEBU[2020-06-18T22:44:38.670172832+02:00] Allocating a VM instance                      vmID=4_1
PANI[2020-06-18T22:44:48.670603722+02:00] Failed to acquire semaphore for NI allocate (or timed out) 
--- FAIL: TestSaveMemorySerial (14.39s)
panic: (*logrus.Entry) (0x14c1340,0xc000272a10) [recovered]
        panic: (*logrus.Entry) (0x14c1340,0xc000272a10)

goroutine 24 [running]:
testing.tRunner.func1(0xc0003bc300)
        /usr/local/go/src/testing/testing.go:874 +0x69f
panic(0x14c1340, 0xc000272a10)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/sirupsen/logrus.Entry.log(0xc000134000, 0xc000721680, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/ustiugov/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:261 +0x613
github.com/sirupsen/logrus.(*Entry).Log(0xc0002727e0, 0xc000000000, 0xc000750d68, 0x1, 0x1)
        /home/ustiugov/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:289 +0x15f
github.com/sirupsen/logrus.(*Logger).Log(0xc000134000, 0x0, 0xc000750d68, 0x1, 0x1)
        /home/ustiugov/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:193 +0xa3
github.com/sirupsen/logrus.(*Logger).Panic(...)
        /home/ustiugov/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:234
github.com/sirupsen/logrus.Panic(...)
        /home/ustiugov/go/pkg/mod/github.com/sirupsen/[email protected]/exported.go:129
github.com/ustiugov/fccd-orchestrator/misc.(*NiPool).Allocate(0xc000271b60, 0x0, 0x0, 0x0)
        /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/[email protected]/ni_pool.go:61 +0x619
github.com/ustiugov/fccd-orchestrator/misc.(*VMPool).Allocate(0xc000271b30, 0xc00045c980, 0x3, 0x1, 0x1, 0xc0001d08b0)
        /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/[email protected]/vm_pool.go:53 +0x28e
github.com/ustiugov/fccd-orchestrator/ctriface.(*Orchestrator).StartVM(0xc000302d80, 0x162ce00, 0xc0000f2ea0, 0xc00045c980, 0x3, 0x14e32a6, 0x23, 0x0, 0x0, 0x0, ...)
        /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/[email protected]/iface.go:162 +0x2cd
github.com/ustiugov/fccd-orchestrator.(*Function).AddInstance(0xc0003dc090)
        /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:307 +0x45b
github.com/ustiugov/fccd-orchestrator.(*Function).Serve.func1()
        /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:224 +0xc5
sync.(*Once).doSlow(0xc00045c8a0, 0xc000751c58)
        /usr/local/go/src/sync/once.go:66 +0x101
sync.(*Once).Do(0xc00045c8a0, 0xc000751c58)
        /usr/local/go/src/sync/once.go:57 +0x69
github.com/ustiugov/fccd-orchestrator.(*Function).Serve(0xc0003dc090, 0x162cdc0, 0xc000036088, 0x14c9359, 0x1, 0x14e32a6, 0x23, 0x14caea5, 0x5, 0x0, ...)
        /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:220 +0x33d
github.com/ustiugov/fccd-orchestrator.(*FuncPool).Serve(0xc0003a8660, 0x162cdc0, 0xc000036088, 0x14c9359, 0x1, 0x14e32a6, 0x23, 0x14caea5, 0x5, 0xc00072ae40, ...)
        /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:114 +0xed
github.com/ustiugov/fccd-orchestrator.TestSaveMemorySerial(0xc0003bc300)
        /home/ustiugov/staged-data-tiering/fccd-orchestrator/fccd-orchestrator_test.go:202 +0x16b
testing.tRunner(0xc0003bc300, 0x1503e68)
        /usr/local/go/src/testing/testing.go:909 +0x19a
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:960 +0x652
exit status 2
FAIL    github.com/ustiugov/fccd-orchestrator   14.447s

Add tests for docker images build

Dockerfiles may become stale, for example, due to Python module incompatibilities. We need to test these builds daily as a cron job.

I think that we should add the same tests for the Knative Docker containers as well.
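
One way to implement this (a minimal sketch in Go, so it fits the existing test tooling; the Dockerfile path below is illustrative, not the repository's actual layout) is a smoke test that shells out to docker build for each Dockerfile, so stale base images or incompatible Python modules surface as failures in a daily scheduled CI job:

package images_test

import (
    "os/exec"
    "path/filepath"
    "testing"
)

// TestDockerImagesBuild rebuilds each Dockerfile from scratch; running this
// test on a daily schedule would catch stale Dockerfiles early.
func TestDockerImagesBuild(t *testing.T) {
    // Illustrative path; the real list should be generated from the repo.
    dockerfiles := []string{
        "function-images/helloworld/Dockerfile",
    }
    for _, df := range dockerfiles {
        cmd := exec.Command("docker", "build", "--no-cache", "-f", df, filepath.Dir(df))
        if out, err := cmd.CombinedOutput(); err != nil {
            t.Errorf("building %s failed: %v\n%s", df, err, out)
        }
    }
}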

Deploy and populate a local container registry

For FaaS benchmarking at scale, we would like to avoid the obvious bottleneck of a remote registry, like DockerHub.

Required changes:

  • the ctriface module should not default to docker.io/ (see the sketch after this list)
  • YAML files should specify the registry when specifying each container image
  • need to add a script that deploys (by URL) and populates the local registry in a vHive cluster, copying the images from remote registries, e.g., DockerHub, to the cluster-local registry
  • tests need to be updated accordingly

We welcome a contribution from @amohoste on this issue.
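
A minimal sketch of the first bullet (the helper name and heuristic are illustrative, not the existing ctriface code): prepend the cluster-local registry only when the image reference does not already name a registry host.

package ctriface

import "strings"

// withRegistry returns imageName unchanged if it already names a registry
// (e.g., "docker.io/..." or "10.96.0.100:5000/..."); otherwise it prefixes
// the cluster-local registry instead of silently defaulting to docker.io/.
func withRegistry(imageName, localRegistry string) string {
    first := strings.SplitN(imageName, "/", 2)[0]
    if strings.ContainsAny(first, ".:") || first == "localhost" {
        return imageName
    }
    return localRegistry + "/" + imageName
}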

Run VMs in a memory-constrained cgroup

To evaluate cold starts at the scale of 100s of VMs, we need containerd to load the VMs inside a single cgroup.

My intuition for the implementation:

  • Create a cgroup from the orchestrator upon its start. Need an orch CLI argument for that
  • Add Firecracker processes to that cgroup (the cgroup's name can be hardcoded). Need another argument in LoadSnap function.

Library to use: link

Example:

// Assumed imports: "github.com/containerd/cgroups" and
// specs "github.com/opencontainers/runtime-spec/specs-go"

// Create a memory-limited cgroup under systemd; memLimitInBytes is an int64.
control, err := cgroups.New(cgroups.Systemd, cgroups.Slice("system.slice", "runc-test"), &specs.LinuxResources{
    Memory: &specs.LinuxMemory{
        Limit: &memLimitInBytes,
    },
})
if err != nil {
    return err
}
// Add a Firecracker process to the existing cgroup
if err := control.Add(cgroups.Process{Pid: fcProcessPid}); err != nil {
    return err
}

Google Compute Engine setup issues

I tried to deploy vHive on Google Compute Engine instead of bare metal but ran into some errors which are most likely network related. The setup included one master node and two worker nodes, but I also ran into the same issues when using only one worker node. The setup includes 3 Ubuntu 18.04 VMs with nested virtualisation enabled on 10.240.0.0/24. Each node was allocated 2 vCPUs, 8GB RAM, and a 50GB SSD, and the microarchitecture was made sure to be Haswell or later to support nested virtualisation. Internally in the subnet, all TCP, UDP, ICMP, and IPIP traffic was allowed, and externally the following ports were opened: tcp:22, tcp:6443, icmp, udp:8472, udp:4789.

Logs

master_containerd.log

worker0_containerd.log
worker0_firecracker.log
worker0_vhive.log

Setup

For completeness, here are the instructions I used to set up the VMs. The nodes were set up in europe-west3-c.

1. Image creation

Create an image for nested virtualisation.

gcloud compute disks create nested-kvm-ub18-disk --image-project ubuntu-os-cloud --image-family ubuntu-1804-lts
gcloud compute images create nested-kvm-ub18-image \
  --source-disk nested-kvm-ub18-disk --source-disk-zone $(gcloud config get-value compute/zone) \
  --licenses "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"

2. Network setup

2.1 Virtual Private Cloud Network

Create the vhive-vpc custom Virtual Private Cloud (VPC) network to host the Kubernetes cluster:

gcloud compute networks create vhive-vpc --subnet-mode custom

Provision a subnet with a large enough IP range to fit all nodes in the cluster.

gcloud compute networks subnets create kubernetes-nodes \
  --network vhive-vpc \
  --range 10.240.0.0/24

2.2 Firewall rules

Create a firewall rule that allows internal communication across TCP, UDP, ICMP and IP in IP (used for the Calico overlay):

gcloud compute firewall-rules create vhive-vpc-allow-internal \
  --allow tcp,udp,icmp,ipip \
  --network vhive-vpc \
  --source-ranges 10.240.0.0/24

Create a firewall rule that allows external SSH, ICMP, HTTPS and VXLAN.

gcloud compute firewall-rules create vhive-vpc-allow-external \
  --allow tcp:22,tcp:6443,icmp,udp:8472,udp:4789 \
  --network vhive-vpc \
  --source-ranges 0.0.0.0/0

3. Compute Instances

3.1 Master node

gcloud compute instances create controller \
    --async \
    --boot-disk-size 50GB \
    --boot-disk-type pd-ssd \
    --can-ip-forward \
    --image nested-kvm-ub18-image \
    --machine-type n1-standard-2 \
    --private-network-ip 10.240.0.11 \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes-nodes \
    --min-cpu-platform "Intel Haswell" \
    --tags vhive,controller

3.2 Worker nodes

for i in 0 1; do
  gcloud compute instances create worker-${i} \
    --async \
    --boot-disk-size 50GB \
    --boot-disk-type pd-ssd \
    --can-ip-forward \
    --image nested-kvm-ub18-image \
    --machine-type n1-standard-2 \
    --private-network-ip 10.240.0.2${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes-nodes \
    --min-cpu-platform "Intel Haswell" \
    --tags vhive,worker
done

4. Node configuration

4.1 VMX setup

SSH into each node and check that virtualization is enabled by running the following command. A nonzero response confirms that nested virtualization is enabled.

grep -cw vmx /proc/cpuinfo

Then enable KVM access by running the following commands. The second command should print "OK" if KVM access was enabled successfully.

sudo setfacl -m u:${USER}:rw /dev/kvm
[ -r /dev/kvm ] && [ -w /dev/kvm ] && echo "OK" || echo "FAIL"

4.2 VHive setup

vHive can now be set up using the vHive quick start guide.

Function deployment fails in multi-node kind clusters

Setup:
1 master node
2 worker nodes
Image used: vhiveease/vhive_dev_env
Error:
Knative function containers fail to boot. Status loops between [Error, CrashLoopBackOff, Terminating].
Error log:

root@kind-control-plane:/vhive# kubectl logs helloworld-0-00001-deployment-7b8fcdb447-knnhf queue-proxy
time="2021-02-18T13:27:16Z" level=info msg="This is the vHive QP code"
required key GUEST_ADDR missing value

Create snapshot should close vsock connections

Right now, the containerd client loses its vsock connection when snapshotting. This issue was confirmed by the Firecracker team, which recommended closing vsock connections before snapshotting. This change is likely to land in the Firecracker Go SDK.

Source doc
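
A minimal sketch of the recommended ordering (the interface and method names are illustrative, not the Firecracker Go SDK or vHive API): pause the VM, close the agent's vsock connection, and only then create the snapshot, re-dialing after load/resume.

package ctriface

import (
    "context"
    "net"
)

// snapshotter abstracts the orchestrator calls used below; the method names
// are illustrative.
type snapshotter interface {
    PauseVM(ctx context.Context, vmID string) error
    CreateSnapshot(ctx context.Context, vmID, snapPath, memPath string) error
}

// createSnapshotSafely closes the vsock connection before taking the snapshot,
// as recommended by the Firecracker team, so the snapshot does not capture a
// live socket. The caller must re-dial the vsock after load/resume.
func createSnapshotSafely(ctx context.Context, s snapshotter, vsockConn net.Conn, vmID, snapPath, memPath string) error {
    if err := s.PauseVM(ctx, vmID); err != nil {
        return err
    }
    if err := vsockConn.Close(); err != nil {
        return err
    }
    return s.CreateSnapshot(ctx, vmID, snapPath, memPath)
}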

CI enhancements

Need to add:

  • Markdown linter in CI, like here
  • Git commit message linter in CI, e.g., this one
  • Fix the formatting problems shown by goreport (the "gocyclo" warning is ok)

Need a script to restart a vHive cluster (single node)

We need a script that can clean and restart a cluster with a single command.

Existing solution:

./scripts/github_runner/clean_cri_runner.sh
sudo containerd 1>~/ctrd-logs/ctrd.out 2>~/ctrd-logs/ctrd.err &
sudo /usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml 1>~/ctrd-logs/fccd.out 2>~/ctrd-logs/fccd.err &
source /etc/profile && cd ~/vhive && go build && sudo ./vhive 1>~/ctrd-logs/orch.out 2>~/ctrd-logs/orch.err &
~/vhive/scripts/cluster/create_one_node_cluster.sh

There are errors when trying that on CloudLab:

../vhive/scripts/github_runner/clean_cri_runner.sh: line 38: ifconfig: command not found
../vhive/scripts/github_runner/clean_cri_runner.sh: line 39: ifconfig: command not found
../vhive/scripts/github_runner/clean_cri_runner.sh: line 40: bridge: command not found
../vhive/scripts/github_runner/clean_cri_runner.sh: line 40: jq: command not found

Workaround:

export PATH=$PATH:/sbin/

Refactoring: Make taps creation/deletion on-demand

As of now, taps are created as a pool that allows their reuse. We should instead create a tap upon VM start/snapshot load, with a unique ID that refers to the ID of the VM (a sketch follows below).
Note: a VM that is loaded from a snapshot assumes that the tap given to it has the exact same name.
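
A minimal sketch of on-demand tap creation keyed by the VM ID, using the github.com/vishvananda/netlink package (the naming scheme and function names are illustrative, not the existing vHive code; note that Linux interface names are limited to 15 characters):

package networking

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

// createTap creates and brings up a tap whose name is derived from the VM ID,
// so a VM loaded from a snapshot gets a tap with the exact same name.
func createTap(vmID string) (netlink.Link, error) {
    name := fmt.Sprintf("tap_%s", vmID)
    tap := &netlink.Tuntap{
        LinkAttrs: netlink.LinkAttrs{Name: name},
        Mode:      netlink.TUNTAP_MODE_TAP,
    }
    if err := netlink.LinkAdd(tap); err != nil {
        return nil, fmt.Errorf("creating tap %s: %w", name, err)
    }
    if err := netlink.LinkSetUp(tap); err != nil {
        return nil, err
    }
    return tap, nil
}

// deleteTap removes the tap when the VM is stopped or offloaded.
func deleteTap(vmID string) error {
    link, err := netlink.LinkByName(fmt.Sprintf("tap_%s", vmID))
    if err != nil {
        return err
    }
    return netlink.LinkDel(link)
}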

Deploy self-hosted runners in Docker containers

  • Have a single script that sets the devmapper on the host (once per host).
  • Adopt firecracker-containerd's Dockerfile (aka base) from upstream but change the installation of the binaries
  • Have a Dockerfile on top of the base image that would deploy a runner.

Memory Manager tests from different processes

Currently, the memory manager tests are written so that uPFs are generated by goroutines within the orchestrator process. We should do that out-of-process to mimic the behavior of VMs (a sketch follows below).

Note that these tests are currently commented out and disabled because the poller goroutines hang in the read syscall after serving all page faults.
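
One possible approach is Go's re-exec pattern: the test re-runs its own binary as a child process that mmaps the guest memory file and touches every page, so the faults originate outside the orchestrator process. This is only a sketch under that assumption; the env-var names and path are illustrative, and the memory manager's userfaultfd registration is omitted.

package manager_test

import (
    "os"
    "os/exec"
    "syscall"
    "testing"
)

func TestPageFaultsOutOfProcess(t *testing.T) {
    if os.Getenv("BE_FAULTING_CHILD") == "1" {
        touchPages(t, os.Getenv("GUEST_MEM_PATH"))
        return
    }
    // Re-exec the test binary so the page faults come from another process.
    cmd := exec.Command(os.Args[0], "-test.run=TestPageFaultsOutOfProcess")
    cmd.Env = append(os.Environ(), "BE_FAULTING_CHILD=1", "GUEST_MEM_PATH=/tmp/mem_test")
    if out, err := cmd.CombinedOutput(); err != nil {
        t.Fatalf("faulting child failed: %v\n%s", err, out)
    }
}

// touchPages maps the file and reads one byte per 4 KiB page, forcing every
// page to be brought in from the child process.
func touchPages(t *testing.T, path string) {
    f, err := os.Open(path)
    if err != nil {
        t.Fatal(err)
    }
    defer f.Close()
    fi, err := f.Stat()
    if err != nil {
        t.Fatal(err)
    }
    mem, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        t.Fatal(err)
    }
    defer syscall.Munmap(mem)
    var sum byte
    for off := 0; off < len(mem); off += 4096 {
        sum += mem[off]
    }
    _ = sum
}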

Load/Offload Serve chain does not work.

We are considering three chains. Two of them work, while one does not. The chains are as follows:

Broken:
(1) createVM > serve_1 > pause > createSnapshot("snapshot_id") > offload > loadSnapshot("snapshot_id") > resume > serve_2 > offload > loadSnapshot("snapshot_id") > resume > serve_3

Working:
(2) createVM > serve_1 > pause > createSnapshot("snapshot_id") > offload > loadSnapshot("snapshot_id") > resume > serve_2 > serve_3 > offload > loadSnapshot("snapshot_id") > resume
(3) createVM > serve_1 > pause > createSnapshot("snapshot_id") > offload > loadSnapshot("snapshot_id") > resume > offload > loadSnapshot("snapshot_id") > resume > serve_2 > serve_3

These scenarios can be found as tests in ctriface/iface.go, named TestLoadServeMultiple1, 2, and 3, respectively.

Scenario 1 fails because ctxFwd in functions.go expires while serve_3 is being served. The test log from this scenario:

root@node:/users/plamenpp/fccd-orchestrator# go test -v -run TestLoadServeMultiple
DEBU[2020-07-07T05:27:42.124646583-04:00] Creating a new NI pool with 1 ni-s.          
INFO[2020-07-07T05:27:42.124773340-04:00] Creating bridges and taps for a new NI pool with 1 ni-s. 
DEBU[2020-07-07T05:27:42.124803788-04:00] Creating bridge                               bridge=br0
DEBU[2020-07-07T05:27:42.125672876-04:00] Creating, setting master and enabling tap     bridge=br0 tap=fc-0-tap0
DEBU[2020-07-07T05:27:42.127154903-04:00] Enabling bridges and adding IP to them       
DEBU[2020-07-07T05:27:42.127199460-04:00] Enabling bridge                               bridge=br0
DEBU[2020-07-07T05:27:42.127450193-04:00] Adding ip: 190.128.0.1/10 to bridge           bridge=br0
INFO[2020-07-07T05:27:42.127719962-04:00] Creating containerd client                   
INFO[2020-07-07T05:27:42.128489913-04:00] Created containerd client                    
INFO[2020-07-07T05:27:42.128524971-04:00] Creating firecracker client                  
INFO[2020-07-07T05:27:42.128658110-04:00] Created firecracker client                   
=== RUN   TestLoadServeMultiple
DEBU[2020-07-07T05:27:42.128986556-04:00] fIDint=1340, err=<nil>, pinnedFuncNum=0      
DEBU[2020-07-07T05:27:42.129053766-04:00] Created function, pinned=true, shut down after 0 requests  fID=1340 imageName="ustiugov/helloworld:runner_workload"
INFO[2020-07-07T05:27:42.129096842-04:00] New function added                            fID=1340 image="ustiugov/helloworld:runner_workload" isPinned=true servedTh=0
DEBU[2020-07-07T05:27:42.129141279-04:00] Function is inactive, starting the instance...  fID=1340
DEBU[2020-07-07T05:27:42.129167786-04:00] Adding instance                               fID=1340
DEBU[2020-07-07T05:27:42.129215254-04:00] StartVM: Received StartVM                     image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:27:42.129286813-04:00] Allocating a VM instance                      vmID=1340_0
DEBU[2020-07-07T05:27:42.129324350-04:00] Allocate (NI): allocated ni with IP=190.128.0.2 
DEBU[2020-07-07T05:27:42.129413975-04:00] Pulling image ustiugov/helloworld:runner_workload 
DEBU[2020-07-07T05:28:00.902360999-04:00] image unpacked                                chainID="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" config="sha256:2f23d35ae1d9c3bdc77186ea9a659f88b1f89f12e4c04700903859f0d1ac58a2"
DEBU[2020-07-07T05:28:00.935656303-04:00] StartVM: Creating a new VM                    image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:02.009583547-04:00] StartVM: Creating a new container             image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:02.792992256-04:00] StartVM: Creating a new task                  image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:02.979517355-04:00] StartVM: Waiting for the task to get ready    image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:02.979571521-04:00] StartVM: Starting the task                    image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:02.988545506-04:00] StartVM: Calling function's gRPC server       image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:03.744661461-04:00] StartVM: Creating a new gRPC client           image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:03.744712740-04:00] Successfully started a VM                     image="ustiugov/helloworld:runner_workload" vmID=1340_0
DEBU[2020-07-07T05:28:03.744767160-04:00] FwdRPC: Forwarding RPC to function instance   fID=1340
DEBU[2020-07-07T05:28:03.746324191-04:00] FwdRPC: Received a response from the  function instance  fID=1340
DEBU[2020-07-07T05:28:03.746399623-04:00] Orchestrator received PauseVM                 vmID=1340_0
DEBU[2020-07-07T05:28:03.748408360-04:00] Orchestrator received MakeSnapshot            vmID=1340_0
DEBU[2020-07-07T05:28:04.227844244-04:00] Orchestrator received OffloadOriginal         vmID=1340_0
DEBU[2020-07-07T05:28:09.229205717-04:00] Orchestrator received LoadSnapshot            vmID=1340_0
DEBU[2020-07-07T05:28:10.244034516-04:00] Orchestrator received ResumeVM                vmID=1340_0
DEBU[2020-07-07T05:28:10.245033278-04:00] FwdRPC: Forwarding RPC to function instance   fID=1340
DEBU[2020-07-07T05:28:10.271598245-04:00] FwdRPC: Received a response from the  function instance  fID=1340
DEBU[2020-07-07T05:28:10.271651602-04:00] Orchestrator received OffloadOriginal         vmID=1340_0
DEBU[2020-07-07T05:28:15.272637354-04:00] Orchestrator received LoadSnapshot            vmID=1340_0
DEBU[2020-07-07T05:28:16.288555202-04:00] Orchestrator received ResumeVM                vmID=1340_0
DEBU[2020-07-07T05:28:16.289568762-04:00] FwdRPC: Forwarding RPC to function instance   fID=1340
DEBU[2020-07-07T05:28:26.289766899-04:00] FwdRPC: Received a response from the  function instance  fID=1340
    TestLoadServeMultiple: fccd-orchestrator_test.go:201: 
            Error Trace:    fccd-orchestrator_test.go:201
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestLoadServeMultiple
            Messages:       Function returned error on 3rd run
--- FAIL: TestLoadServeMultiple (44.16s)
FAIL
INFO[2020-07-07T05:28:26.290173514-04:00] waiting for goroutines                       
DEBU[2020-07-07T05:28:26.290211672-04:00] Orchestrator received StopVM                  vmID=1340_0
WARN[2020-07-07T05:28:26.291273414-04:00] Failed to kill the task: ttrpc: closed: unknown  vmID=1340_0
WARN[2020-07-07T05:28:26.291312591-04:00] Killing task of VM 1340_0 failedttrpc: closed: unknown  vmID=1340_0
INFO[2020-07-07T05:28:26.291325809-04:00] Killing task of VM 1340_0 failed              vmID=1340_0
INFO[2020-07-07T05:28:26.291345778-04:00] waiting done                                 
INFO[2020-07-07T05:28:26.291358352-04:00] Closing fcClient                             
INFO[2020-07-07T05:28:26.291375547-04:00] Closing containerd client                    
INFO[2020-07-07T05:28:26.291405250-04:00] Removing bridges and taps for a NI pool with 1 ni-s. 
DEBU[2020-07-07T05:28:26.291443455-04:00] Deleting tap                                  bridge=br0
DEBU[2020-07-07T05:28:26.349410654-04:00] Deleting bridge                               bridge=br0
exit status 1

Some pieces from the logs of the three Firecracker processes:
1

Running Firecracker v0.21.0
2020-07-07T06:21:34.895374146 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.895756929 [anonymous-instance:INFO] The API server received a Put request on "/machine-config" with body "{\"cpu_template\":\"T2\",\"ht_enabled\":false,\"mem_size_mib\":512,\"vcpu_count\":1}\n".
2020-07-07T06:21:34.895867258 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.896123450 [anonymous-instance:INFO] The API server received a Get request on "/machine-config".
2020-07-07T06:21:34.896223999 [anonymous-instance:INFO] The request was executed successfully. Status code: 200 OK.
2020-07-07T06:21:34.896610681 [anonymous-instance:INFO] The API server received a Put request on "/boot-source" with body "{\"boot_args\":\"quiet reboot=k panic=1 nomodules systemd.log_color=false init=/sbin/overlay-init pci=off tsc=reliable ipv6.disable=1 8250.nr_uarts=0 ip=190.128.0.2::190.128.0.1:255.192.0.0:::off::: ro noapic systemd.unit=firecracker.target\",\"kernel_image_path\":\"/var/lib/firecracker-containerd/runtime/hello-vmlinux.bin\"}\n".
2020-07-07T06:21:34.896751438 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.897213561 [anonymous-instance:INFO] The API server received a Put request on "/drives/root_drive" with body "{\"drive_id\":\"root_drive\",\"is_read_only\":true,\"is_root_device\":true,\"path_on_host\":\"/var/lib/firecracker-containerd/runtime/default-rootfs.img\"}\n".
2020-07-07T06:21:34.897355757 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.897778583 [anonymous-instance:INFO] The API server received a Put request on "/drives/MN2HE43UOVRDA" with body "{\"drive_id\":\"MN2HE43UOVRDA\",\"is_read_only\":false,\"is_root_device\":false,\"path_on_host\":\"/var/lib/firecracker-containerd/shim-base/firecracker-containerd/1340_0/ctrstub0\"}\n".
2020-07-07T06:21:34.897950949 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.898337353 [anonymous-instance:INFO] The API server received a Put request on "/network-interfaces/1" with body "{\"guest_mac\":\"02:FC:00:00:00:00\",\"host_dev_name\":\"fc-0-tap0\",\"iface_id\":\"1\"}\n".
2020-07-07T06:21:34.900621867 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.900857063 [anonymous-instance:INFO] The API server received a Put request on "/vsock" with body "{\"guest_cid\":0,\"uds_path\":\"firecracker.vsock\",\"vsock_id\":\"agent_api\"}\n".
2020-07-07T06:21:34.901058248 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:34.901343784 [anonymous-instance:INFO] The API server received a Put request on "/actions" with body "{\"action_type\":\"InstanceStart\"}\n".
2020-07-07T06:21:34.915909373 [anonymous-instance:WARN] Could not add serial input event to epoll: Error during epoll call: Operation not permitted (os error 1)
2020-07-07T06:21:34.916648353 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:36.775167087 [anonymous-instance:INFO] The API server received a Patch request on "/drives/MN2HE43UOVRDA" with body "{\"drive_id\":\"MN2HE43UOVRDA\",\"path_on_host\":\"/dev/mapper/fc-dev-thinpool-snap-9\"}\n".
2020-07-07T06:21:36.775382699 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:37.731278280 [anonymous-instance:INFO] The API server received a Patch request on "/vm" with body "{\"state\":\"Paused\"}".
2020-07-07T06:21:37.731480436 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:37.732741180 [anonymous-instance:INFO] The API server received a Put request on "/snapshot/create" with body "{\"mem_file_path\":\"/tmp/mem_test\",\"snapshot_path\":\"/tmp/snap_test\",\"snapshot_type\":\"Full\"}".
2020-07-07T06:21:38.339570255 [anonymous-instance:INFO] The request was executed successfully. Status code: 204 No Content.

2

Running Firecracker v0.21.0
2020-07-07T06:21:49.343139628 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Put request on "/snapshot/load" with body "{\"mem_file_path\":\"/tmp/mem_test\",\"snapshot_path\":\"/tmp/snap_test\"}".
2020-07-07T06:21:49.352649025 [anonymous-instance:WARN:src/vmm/src/builder.rs:559] Could not add serial input event to epoll: Error during epoll call: Operation not permitted (os error 1)
2020-07-07T06:21:49.356239695 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:49.356482168 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Patch request on "/drives/MN2HE43UOVRDA" with body "{\"drive_id\":\"MN2HE43UOVRDA\",\"path_on_host\":\"/dev/mapper/fc-dev-thinpool-snap-9\"}".
2020-07-07T06:21:49.356611157 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:21:49.357668627 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Patch request on "/vm" with body "{\"state\":\"Resumed\"}".
2020-07-07T06:21:49.357761943 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.

3

Running Firecracker v0.21.0
2020-07-07T06:22:00.382275333 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Put request on "/snapshot/load" with body "{\"mem_file_path\":\"/tmp/mem_test\",\"snapshot_path\":\"/tmp/snap_test\"}".
2020-07-07T06:22:00.391919238 [anonymous-instance:WARN:src/vmm/src/builder.rs:559] Could not add serial input event to epoll: Error during epoll call: Operation not permitted (os error 1)
2020-07-07T06:22:00.395530632 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:22:00.395767151 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Patch request on "/drives/MN2HE43UOVRDA" with body "{\"drive_id\":\"MN2HE43UOVRDA\",\"path_on_host\":\"/dev/mapper/fc-dev-thinpool-snap-9\"}".
2020-07-07T06:22:00.395843112 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:22:00.396718910 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:124] The API server received a Patch request on "/vm" with body "{\"state\":\"Resumed\"}".
2020-07-07T06:22:00.396805039 [anonymous-instance:INFO:src/api_server/src/parsed_request.rs:89] The request was executed successfully. Status code: 204 No Content.
2020-07-07T06:22:30.429550947 [anonymous-instance:WARN:src/devices/src/virtio/net/event_handler.rs:55] Received unknown event: ERROR from source: 39
2020-07-07T06:22:53.922220999 [anonymous-instance:ERROR:src/devices/src/virtio/net/device.rs:369] Failed to write to tap: Os { code: 77, kind: Other, message: "File descriptor in bad state" }
2020-07-07T06:22:59.042250138 [anonymous-instance:ERROR:src/devices/src/virtio/net/device.rs:369] Failed to write to tap: Os { code: 77, kind: Other, message: "File descriptor in bad state" }

We can see a tap error in the log of the third Firecracker instance. This error seems to be inconsistent and does not always occur. It persists even if we recreate the tap before loading the third instance from the snapshot.

2020-07-07T06:22:59.042250138 [anonymous-instance:ERROR:src/devices/src/virtio/net/device.rs:369] Failed to write to tap: Os { code: 77, kind: Other, message: "File descriptor in bad state" }

The corresponding containerd log (some pieces are omitted for clarity, but the full log is available):

root@node:~/go/src/github.com/firecracker-microvm/firecracker-containerd# sudo PATH=$PATH /usr/local/bin/firecracker-containerd \
>   --config /etc/firecracker-containerd/config.toml
INFO[2020-07-07T05:26:52.559072873-04:00] starting containerd                           revision=d69b31a5db41e8592d8256cf3c30b22ffbd1ffe7 version=1.3.4+unknown
INFO[2020-07-07T05:26:52.595318480-04:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2020-07-07T05:26:52.595446553-04:00] loading plugin "io.containerd.snapshotter.v1.devmapper"...  type=io.containerd.snapshotter.v1
INFO[2020-07-07T05:26:52.595536522-04:00] initializing pool device "fc-dev-thinpool"   
INFO[2020-07-07T05:26:52.598099284-04:00] using dmsetup:
Library version:   1.02.145 (2017-11-03)
Driver version:    4.37.0 
INFO[2020-07-07T05:26:52.685880753-04:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2020-07-07T05:26:52.686082029-04:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
INFO[2020-07-07T05:26:52.686136280-04:00] metadata content store policy set             policy=shared
INFO[2020-07-07T05:26:52.761376859-04:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2020-07-07T05:26:52.761444333-04:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2020-07-07T05:26:52.761518552-04:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761556661-04:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761584639-04:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761613920-04:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761643859-04:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761672573-04:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761698667-04:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.761725082-04:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2020-07-07T05:26:52.761846847-04:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2020-07-07T05:26:52.761924913-04:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2020-07-07T05:26:52.762229499-04:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.762258333-04:00] loading plugin "io.containerd.service.v1.fc-control"...  type=io.containerd.service.v1
INFO[2020-07-07T05:26:52.762501347-04:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2020-07-07T05:26:52.762563074-04:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762580031-04:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762593669-04:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762610931-04:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762626423-04:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762640296-04:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762653710-04:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762666910-04:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762679531-04:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2020-07-07T05:26:52.762713575-04:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762731267-04:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762745173-04:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.762757353-04:00] loading plugin "io.containerd.grpc.v1.fc-control-service"...  type=io.containerd.grpc.v1
DEBU[2020-07-07T05:26:52.762766779-04:00] initializing fc-control-service plugin       
INFO[2020-07-07T05:26:52.762780545-04:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2020-07-07T05:26:52.763321213-04:00] serving...                                    address=/run/firecracker-containerd/containerd.sock.ttrpc
INFO[2020-07-07T05:26:52.763478246-04:00] serving...                                    address=/run/firecracker-containerd/containerd.sock
INFO[2020-07-07T05:26:52.763527289-04:00] containerd successfully booted in 0.205369s  
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: a01c5487-2aa5-416a-a07e-8cbbe17cf222
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 
 
DEBU[2020-07-07T05:27:50.040984343-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:50.041848088-04:00] using pigz for decompression                 
DEBU[2020-07-07T05:27:50.077520230-04:00] event published                               ns=firecracker-containerd topic=/images/create type=containerd.services.images.v1.ImageCreate
DEBU[2020-07-07T05:27:50.587795629-04:00] diff applied                                  d=546.102571ms dgst="sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=2802957
DEBU[2020-07-07T05:27:50.588286631-04:00] commit snapshot                               key="extract-810172102-Xw-a sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" name="sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T05:27:50.588415651-04:00] commit                                        key="firecracker-containerd/1/extract-810172102-Xw-a sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" name="firecracker-containerd/2/sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T05:27:50.782250117-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:50.799509660-04:00] stat snapshot                                 key="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T05:27:50.800039893-04:00] prepare snapshot                              key="extract-799802063-Ygut sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" parent="sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T05:27:50.800140647-04:00] prepare                                       key="firecracker-containerd/3/extract-799802063-Ygut sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" parent="firecracker-containerd/2/sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T05:27:50.800245411-04:00] creating snapshot device 'fc-dev-thinpool-snap-2' from 'fc-dev-thinpool-snap-1' 
DEBU[2020-07-07T05:27:51.173715642-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:51.574100603-04:00] diff applied                                  d=399.919859ms dgst="sha256:2cc5ad85d9abaadf23d5ae53c3f32e7ccb2df1956869980bfd2491ff396d348a" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=301261
DEBU[2020-07-07T05:27:51.574575286-04:00] commit snapshot                               key="extract-799802063-Ygut sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" name="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T05:27:51.574684001-04:00] commit                                        key="firecracker-containerd/3/extract-799802063-Ygut sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" name="firecracker-containerd/4/sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T05:27:51.740156971-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:51.757251561-04:00] stat snapshot                                 key="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T05:27:51.757767125-04:00] prepare snapshot                              key="extract-757499511-s3xY sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" parent="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T05:27:51.757914070-04:00] prepare                                       key="firecracker-containerd/5/extract-757499511-s3xY sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" parent="firecracker-containerd/4/sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T05:27:51.758035325-04:00] creating snapshot device 'fc-dev-thinpool-snap-3' from 'fc-dev-thinpool-snap-2' 
DEBU[2020-07-07T05:27:52.081602348-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:54.927295026-04:00] diff applied                                  d=2.845149739s dgst="sha256:61614c1a5710c76af6b2a9c7170a81eb0dd76ccf90e921abc0c9dcc1d5ed490e" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=31506523
DEBU[2020-07-07T05:27:54.927964780-04:00] commit snapshot                               key="extract-757499511-s3xY sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" name="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T05:27:54.928159974-04:00] commit                                        key="firecracker-containerd/5/extract-757499511-s3xY sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" name="firecracker-containerd/6/sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T05:27:55.105053387-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:55.130721148-04:00] stat snapshot                                 key="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T05:27:55.131231999-04:00] prepare snapshot                              key="extract-130976119-IbbZ sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" parent="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T05:27:55.131334522-04:00] prepare                                       key="firecracker-containerd/7/extract-130976119-IbbZ sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" parent="firecracker-containerd/6/sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T05:27:55.131435707-04:00] creating snapshot device 'fc-dev-thinpool-snap-4' from 'fc-dev-thinpool-snap-3' 
DEBU[2020-07-07T05:27:55.513229584-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:55.796101796-04:00] diff applied                                  d=282.373254ms dgst="sha256:0522d30cde10ac29ae2c555b9bde76c2b50aafc7ef7435bbc7e19de706bcadcd" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=230
DEBU[2020-07-07T05:27:55.796559706-04:00] commit snapshot                               key="extract-130976119-IbbZ sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" name="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T05:27:55.796698395-04:00] commit                                        key="firecracker-containerd/7/extract-130976119-IbbZ sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" name="firecracker-containerd/8/sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T05:27:55.963032800-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:55.980150497-04:00] stat snapshot                                 key="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T05:27:55.980677942-04:00] prepare snapshot                              key="extract-980463674-VSak sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" parent="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T05:27:55.980818926-04:00] prepare                                       key="firecracker-containerd/9/extract-980463674-VSak sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" parent="firecracker-containerd/8/sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T05:27:55.980950061-04:00] creating snapshot device 'fc-dev-thinpool-snap-5' from 'fc-dev-thinpool-snap-4' 
DEBU[2020-07-07T05:27:56.312849278-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:56.857960543-04:00] diff applied                                  d=544.629851ms dgst="sha256:407da27a03363f6b9d368ec6e131f7f2db7c8cb2a149160d913d6f3698905a5d" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=1887472
DEBU[2020-07-07T05:27:56.858493085-04:00] commit snapshot                               key="extract-980463674-VSak sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" name="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T05:27:56.858671930-04:00] commit                                        key="firecracker-containerd/9/extract-980463674-VSak sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" name="firecracker-containerd/10/sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T05:27:57.004128893-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:57.021485411-04:00] stat snapshot                                 key="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T05:27:57.021994760-04:00] prepare snapshot                              key="extract-21752481-GpUE sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" parent="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T05:27:57.022126066-04:00] prepare                                       key="firecracker-containerd/11/extract-21752481-GpUE sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" parent="firecracker-containerd/10/sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T05:27:57.022201943-04:00] creating snapshot device 'fc-dev-thinpool-snap-6' from 'fc-dev-thinpool-snap-5' 
DEBU[2020-07-07T05:27:57.370541087-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:57.618487413-04:00] diff applied                                  d=247.547361ms dgst="sha256:f2f5e0de6a9cf285883340ceabbf44b70c97a796d122f4238076609d95e0f48b" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=2574
DEBU[2020-07-07T05:27:57.618882354-04:00] commit snapshot                               key="extract-21752481-GpUE sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" name="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T05:27:57.618991005-04:00] commit                                        key="firecracker-containerd/11/extract-21752481-GpUE sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" name="firecracker-containerd/12/sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T05:27:57.787098250-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:57.812534133-04:00] stat snapshot                                 key="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T05:27:57.813026711-04:00] prepare snapshot                              key="extract-812814467-aAtO sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" parent="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T05:27:57.813115121-04:00] prepare                                       key="firecracker-containerd/13/extract-812814467-aAtO sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" parent="firecracker-containerd/12/sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T05:27:57.813192897-04:00] creating snapshot device 'fc-dev-thinpool-snap-7' from 'fc-dev-thinpool-snap-6' 
DEBU[2020-07-07T05:27:58.245202953-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:27:59.730094825-04:00] diff applied                                  d=1.48447109s dgst="sha256:c5b841e7f99ab6c98c254ad0cfa12c4d64193446f00eebd727990d0a74687d26" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=20332232
DEBU[2020-07-07T05:27:59.730607080-04:00] commit snapshot                               key="extract-812814467-aAtO sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" name="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T05:27:59.730774779-04:00] commit                                        key="firecracker-containerd/13/extract-812814467-aAtO sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" name="firecracker-containerd/14/sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T05:27:59.902670832-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:27:59.928319406-04:00] stat snapshot                                 key="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T05:27:59.928895901-04:00] prepare snapshot                              key="extract-928619473-fIt2 sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" parent="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T05:27:59.929032030-04:00] prepare                                       key="firecracker-containerd/15/extract-928619473-fIt2 sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" parent="firecracker-containerd/14/sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T05:27:59.929177551-04:00] creating snapshot device 'fc-dev-thinpool-snap-8' from 'fc-dev-thinpool-snap-7' 
DEBU[2020-07-07T05:28:00.294125042-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:28:00.649704716-04:00] diff applied                                  d=355.183905ms dgst="sha256:e139bc3167f9b1c39e29f2dd8c9057de0d5bd47b940ef86ee853fe8011f08646" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=635791
DEBU[2020-07-07T05:28:00.650166458-04:00] commit snapshot                               key="extract-928619473-fIt2 sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" name="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T05:28:00.650320994-04:00] commit                                        key="firecracker-containerd/15/extract-928619473-fIt2 sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" name="firecracker-containerd/16/sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T05:28:00.852232971-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T05:28:00.936694921-04:00] create VM request: VMID:"1340_0" MachineCfg:<MemSizeMib:512 VcpuCount:1 > KernelArgs:"ro noapic reboot=k panic=1 pci=off nomodules systemd.log_color=false systemd.unit=firecracker.target init=/sbin/overlay-init tsc=reliable quiet 8250.nr_uarts=0 ipv6.disable=1" NetworkInterfaces:<StaticConfig:<MacAddress:"02:FC:00:00:00:00" HostDevName:"fc-0-tap0" IPConfig:<PrimaryAddr:"190.128.0.2/10" GatewayAddr:"190.128.0.1" > > >  
DEBU[2020-07-07T05:28:00.936744743-04:00] using namespace: firecracker-containerd      
DEBU[2020-07-07T05:28:00.937097606-04:00] starting containerd-shim-aws-firecracker      vmID=1340_0
INFO[2020-07-07T05:28:00.975215598-04:00] starting signal loop                          namespace=firecracker-containerd path=/var/lib/firecracker-containerd/shim-base/firecracker-containerd/1340_0 pid=48009
INFO[2020-07-07T05:28:00.977003433-04:00] creating new VM                               runtime=aws.firecracker vmID=1340_0
ERRO[2020-07-07T05:28:00.977550185-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Called startVMM(), setting up a VMM on firecracker.sock"  shim_stream=stderr vmID=1340_0
setupLogging
setupMetrics
ERRO[2020-07-07T05:28:00.989307263-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="VMM metrics disabled."  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.990144364-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="refreshMachineConfiguration: [GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate:T2 HtEnabled:0xc000800993 MemSizeMib:0xc000800988 VcpuCount:0xc000800980}"  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.990633455-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="PutGuestBootSource: [PUT /boot-source][204] putGuestBootSourceNoContent "  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.990678166-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Attaching drive /var/lib/firecracker-containerd/runtime/default-rootfs.img, slot root_drive, root true."  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.991265877-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Attached drive /var/lib/firecracker-containerd/runtime/default-rootfs.img: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.991302218-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Attaching drive /var/lib/firecracker-containerd/shim-base/firecracker-containerd/1340_0/ctrstub0, slot MN2HE43UOVRDA, root false."  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.991635977-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Attached drive /var/lib/firecracker-containerd/shim-base/firecracker-containerd/1340_0/ctrstub0: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "  shim_stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:00.991673029-04:00] time="2020-07-07T05:28:00-04:00" level=info msg="Attaching NIC fc-0-tap0 (hwaddr 02:FC:00:00:00:00) at index 1"  shim_stream=stderr vmID=1340_0
INFO[2020-07-07T05:28:01.008343182-04:00] calling agent                                 runtime=aws.firecracker vmID=1340_0
ERRO[2020-07-07T05:28:01.008395172-04:00] time="2020-07-07T05:28:01-04:00" level=info msg="startInstance successful: [PUT /actions][204] createSyncActionNoContent "  shim_stream=stderr vmID=1340_0
DEBU[2020-07-07T05:28:01.085354764-04:00] garbage collected                             d=13.592553ms
INFO[2020-07-07T05:28:02.008808269-04:00] successfully started the VM                   runtime=aws.firecracker vmID=1340_0
DEBU[2020-07-07T05:28:02.009353746-04:00] event forwarded                               ns=firecracker-containerd topic=/firecracker-vm/start type=VMStart
DEBU[2020-07-07T05:28:02.028997046-04:00] prepare snapshot                              key=1340_0 parent="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T05:28:02.029070648-04:00] prepare                                       key=firecracker-containerd/17/1340_0 parent="firecracker-containerd/16/sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T05:28:02.029170142-04:00] creating snapshot device 'fc-dev-thinpool-snap-9' from 'fc-dev-thinpool-snap-8' 
DEBU[2020-07-07T05:28:02.459742899-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T05:28:02.464678749-04:00] get snapshot mounts                           key=1340_0
DEBU[2020-07-07T05:28:02.464748364-04:00] mounts                                        key=firecracker-containerd/17/1340_0
DEBU[2020-07-07T05:28:02.776181704-04:00] event published                               ns=firecracker-containerd topic=/containers/create type=containerd.events.ContainerCreate
DEBU[2020-07-07T05:28:02.794023002-04:00] get snapshot mounts                           key=1340_0
DEBU[2020-07-07T05:28:02.794095279-04:00] mounts                                        key=firecracker-containerd/17/1340_0
time="2020-07-07T05:28:02.830528311-04:00" level=debug msg=StartShim runtime=aws.firecracker task_id=1340_0
DEBU[2020-07-07T05:28:02.831561527-04:00] create VM request: VMID:"1340_0"             
DEBU[2020-07-07T05:28:02.831601643-04:00] using namespace: firecracker-containerd      
time="2020-07-07T05:28:02.832585933-04:00" level=info msg="successfully started shim (git commit: d69b31a5db41e8592d8256cf3c30b22ffbd1ffe7)." runtime=aws.firecracker task_id=1340_0 vmID=1340_0
ERRO[2020-07-07T05:28:02.834776840-04:00] time="2020-07-07T05:28:02-04:00" level=info msg="PatchGuestDrive successful"  shim_stream=stderr vmID=1340_0
INFO[2020-07-07T05:28:02.979041016-04:00] successfully created task                     ExecID= TaskID=1340_0 pid_in_vm=725 runtime=aws.firecracker vmID=1340_0
DEBU[2020-07-07T05:28:02.979538096-04:00] event forwarded                               ns=firecracker-containerd topic=/tasks/create type=containerd.events.TaskCreate
DEBU[2020-07-07T05:28:02.987265768-04:00] event forwarded                               ns=firecracker-containerd topic=/tasks/start type=containerd.events.TaskStart
DEBU[2020-07-07T05:28:03.284202755-04:00] garbage collected                             d=10.161289ms
DEBU[2020-07-07T05:28:03.746793274-04:00] pause VM request: VMID:"1340_0"              
DEBU[2020-07-07T05:28:03.748729580-04:00] make snapshot request: VMID:"1340_0" SnapshotFilePath:"/tmp/snap_test" MemFilePath:"/tmp/mem_test"  
DEBU[2020-07-07T05:28:04.228231032-04:00] offload original request: VMID:"1340_0"      
WARN[2020-07-07T05:28:04.228684786-04:00] PID OF FIRECRACKER 48025                      runtime=aws.firecracker vmID=1340_0
ERRO[2020-07-07T05:28:04.243605827-04:00] error waiting for exit                        ExecID= TaskID=1340_0 error="ttrpc: closed" runtime=aws.firecracker vmID=1340_0
ERRO[2020-07-07T05:28:04.243609178-04:00] error while forwarding events from VM agent   error="ttrpc: closed" runtime=aws.firecracker vmID=1340_0
ERRO[2020-07-07T05:28:04.243812521-04:00] error copying io                              ExecID= TaskID=1340_0 error="reading from a closed fifo" runtime=aws.firecracker stream=stdin vmID=1340_0
ERRO[2020-07-07T05:28:04.243883532-04:00] error closing io stream                       ExecID= TaskID=1340_0 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stdin vmID=1340_0
ERRO[2020-07-07T05:28:04.274731081-04:00] time="2020-07-07T05:28:04-04:00" level=warning msg="firecracker exited: signal: killed"  shim_stream=stderr vmID=1340_0
DEBU[2020-07-07T05:28:09.229591310-04:00] load snapshot request: VMID:"1340_0" SnapshotFilePath:"/tmp/snap_test" MemFilePath:"/tmp/mem_test"  
ERRO[2020-07-07T05:28:09.244066862-04:00] error closing io stream                       ExecID= TaskID=1340_0 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stderr vmID=1340_0
ERRO[2020-07-07T05:28:09.244166327-04:00] error closing io stream                       ExecID= TaskID=1340_0 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stdout vmID=1340_0
DEBU[2020-07-07T05:28:10.244306548-04:00] resume VM request: VMID:"1340_0"             
DEBU[2020-07-07T05:28:10.271869331-04:00] offload original request: VMID:"1340_0"      
WARN[2020-07-07T05:28:10.272171085-04:00] PID OF FIRECRACKER 48063                      runtime=aws.firecracker vmID=1340_0
DEBU[2020-07-07T05:28:15.272941640-04:00] load snapshot request: VMID:"1340_0" SnapshotFilePath:"/tmp/snap_test" MemFilePath:"/tmp/mem_test"  
WARN[2020-07-07T05:28:16.273883820-04:00] glei tuka e /var/lib/firecracker-containerd/shim-base/firecracker-containerd/1340_0/firecracker.sock  runtime=aws.firecracker vmID=1340_0
DEBU[2020-07-07T05:28:16.288754742-04:00] resume VM request: VMID:"1340_0"

We have not been able to pinpoint the cause of the expiring context deadline. The tap error that we observed in the logs of the third Firecracker machine in the non-working scenario also appears in the working scenarios (in their third instance):

2020-07-07T06:22:59.042250138 [anonymous-instance:ERROR:src/devices/src/virtio/net/device.rs:369] Failed to write to tap: Os { code: 77, kind: Other, message: "File descriptor in bad state" }

StopSingleVM (stripped down) does not work

We are implementing the following workflow; the goal is to shut down a VM that has been loaded from a snapshot.

StopSingleVM in ctriface/iface.go is called when a VM instance that was not loaded from a snapshot needs to be shut down cleanly. It has the following workflow:

  1. Kill the task.
  2. Delete the task.
  3. Delete the container.
  4. Stop the VM by calling firecracker-containerd StopVM.

For the time being, we decided to skip steps 1-3 when shutting down a VM that was loaded from a snapshot and to perform only step 4. This functionality is encapsulated in the StopSingleVMOnly method in ctriface/iface.go, sketched below.
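
A minimal sketch of the two shutdown paths follows. It assumes an Orchestrator that keeps a firecracker-control client (fcClient), as in ctriface/iface.go; the actual fields, signatures, and error handling of the real methods may differ.

package ctriface

import (
	"context"
	"syscall"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/firecracker-microvm/firecracker-containerd/proto"
)

// StopSingleVM sketches the clean shutdown path (steps 1-4) for a VM that
// was not loaded from a snapshot.
func (o *Orchestrator) StopSingleVM(ctx context.Context, vmID string, task containerd.Task, container containerd.Container) error {
	ctx = namespaces.WithNamespace(ctx, "firecracker-containerd")

	exitCh, err := task.Wait(ctx) // register for the exit event before killing
	if err != nil {
		return err
	}
	if err := task.Kill(ctx, syscall.SIGKILL); err != nil { // 1. kill the task
		return err
	}
	<-exitCh
	if _, err := task.Delete(ctx); err != nil { // 2. delete the task
		return err
	}
	if err := container.Delete(ctx, containerd.WithSnapshotCleanup); err != nil { // 3. delete the container
		return err
	}
	_, err = o.fcClient.StopVM(ctx, &proto.StopVMRequest{VMID: vmID}) // 4. stop the VM
	return err
}

// StopSingleVMOnly sketches the stripped-down path for a VM loaded from a
// snapshot: steps 1-3 are skipped and only StopVM is issued.
func (o *Orchestrator) StopSingleVMOnly(ctx context.Context, vmID string) error {
	ctx = namespaces.WithNamespace(ctx, "firecracker-containerd")
	_, err := o.fcClient.StopVM(ctx, &proto.StopVMRequest{VMID: vmID})
	return err
}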

We use the following call chain: createVM > pause > createSnapshot > offload > loadSnapshot > resume > stopSingleVMOnly. This chain is implemented in the TestStartSnapStop test in ctriface/iface_test.go.

We encounter the following error from the test:

WARN[2020-07-07T11:30:31.436857891-04:00] failed to stop the VM: rpc error: code = Internal desc = the VMM was killed forcibly: 1 error occurred:
	* signal: killed
  vmID=505

The test logs as well as containerd logs are shown below. The Firecracker logs from the two instances do not indicate any errors and are omitted for brevity.

The test logs:

root@node:/users/plamenpp/fccd-orchestrator/ctriface# go test -v -run TestStartSnapStop
=== RUN   TestStartSnapStop
DEBU[2020-07-07T11:29:55.794626948-04:00] Creating a new NI pool with 1 ni-s.          
INFO[2020-07-07T11:29:55.794807046-04:00] Creating bridges and taps for a new NI pool with 1 ni-s. 
DEBU[2020-07-07T11:29:55.794824447-04:00] Creating bridge                               bridge=br0
DEBU[2020-07-07T11:29:55.795713628-04:00] Creating, setting master and enabling tap     bridge=br0 tap=fc-0-tap0
DEBU[2020-07-07T11:29:55.797212385-04:00] Enabling bridges and adding IP to them       
DEBU[2020-07-07T11:29:55.797243951-04:00] Enabling bridge                               bridge=br0
DEBU[2020-07-07T11:29:55.797488325-04:00] Adding ip: 190.128.0.1/10 to bridge           bridge=br0
INFO[2020-07-07T11:29:55.797661397-04:00] Creating containerd client                   
INFO[2020-07-07T11:29:55.798278446-04:00] Created containerd client                    
INFO[2020-07-07T11:29:55.798304821-04:00] Creating firecracker client                  
INFO[2020-07-07T11:29:55.798417044-04:00] Created firecracker client                   
DEBU[2020-07-07T11:29:55.798443914-04:00] StartVM: Received StartVM                     image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:29:55.798464167-04:00] Allocating a VM instance                      vmID=505
DEBU[2020-07-07T11:29:55.798491223-04:00] Allocate (NI): allocated ni with IP=190.128.0.2 
DEBU[2020-07-07T11:29:55.798529713-04:00] Pulling image ustiugov/helloworld:runner_workload 
DEBU[2020-07-07T11:30:16.436448382-04:00] image unpacked                                chainID="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" config="sha256:2f23d35ae1d9c3bdc77186ea9a659f88b1f89f12e4c04700903859f0d1ac58a2"
DEBU[2020-07-07T11:30:16.478052073-04:00] StartVM: Creating a new VM                    image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:17.654174222-04:00] StartVM: Creating a new container             image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:18.443689875-04:00] StartVM: Creating a new task                  image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:18.635022208-04:00] StartVM: Waiting for the task to get ready    image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:18.635076196-04:00] StartVM: Starting the task                    image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:18.645209134-04:00] StartVM: Calling function's gRPC server       image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:19.420896853-04:00] StartVM: Creating a new gRPC client           image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:19.420951327-04:00] Successfully started a VM                     image="ustiugov/helloworld:runner_workload" vmID=505
DEBU[2020-07-07T11:30:19.421010207-04:00] Orchestrator received PauseVM                 vmID=505
DEBU[2020-07-07T11:30:19.423120619-04:00] Orchestrator received MakeSnapshot            vmID=505
DEBU[2020-07-07T11:30:20.406307498-04:00] Orchestrator received OffloadOriginal         vmID=505
DEBU[2020-07-07T11:30:25.407557511-04:00] Orchestrator received LoadSnapshot            vmID=505
DEBU[2020-07-07T11:30:26.423254255-04:00] Orchestrator received ResumeVM                vmID=505
DEBU[2020-07-07T11:30:26.424406911-04:00] Orchestrator received StopVM                  vmID=505
WARN[2020-07-07T11:30:31.436857891-04:00] failed to stop the VM: rpc error: code = Internal desc = the VMM was killed forcibly: 1 error occurred:
	* signal: killed
  vmID=505
    TestStartSnapStop: iface_test.go:137: 
        	Error Trace:	iface_test.go:137
        	Error:      	Received unexpected error:
        	            	rpc error: code = Internal desc = the VMM was killed forcibly: 1 error occurred:
        	            		* signal: killed
        	            	
        	Test:       	TestStartSnapStop
        	Messages:   	Failed to stop VM, Stopping VM 505 failed
--- FAIL: TestStartSnapStop (35.64s)
FAIL
exit status 1
FAIL	github.com/ustiugov/fccd-orchestrator/ctriface	35.659s

Containerd logs:

root@node:~/go/src/github.com/firecracker-microvm/firecracker-containerd# sudo PATH=$PATH /usr/local/bin/firecracker-containerd   --config /etc/firecracker-containerd/config.toml
INFO[2020-07-07T11:29:50.959814511-04:00] starting containerd                           revision=d69b31a5db41e8592d8256cf3c30b22ffbd1ffe7 version=1.3.4+unknown
INFO[2020-07-07T11:29:51.002304034-04:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2020-07-07T11:29:51.002406135-04:00] loading plugin "io.containerd.snapshotter.v1.devmapper"...  type=io.containerd.snapshotter.v1
INFO[2020-07-07T11:29:51.002465645-04:00] initializing pool device "fc-dev-thinpool"   
INFO[2020-07-07T11:29:51.004972189-04:00] using dmsetup:
Library version:   1.02.145 (2017-11-03)
Driver version:    4.37.0 
INFO[2020-07-07T11:29:51.082281251-04:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2020-07-07T11:29:51.082575077-04:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
INFO[2020-07-07T11:29:51.082616811-04:00] metadata content store policy set             policy=shared
INFO[2020-07-07T11:29:51.148958094-04:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2020-07-07T11:29:51.148987901-04:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2020-07-07T11:29:51.149028465-04:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149059528-04:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149071807-04:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149086835-04:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149100559-04:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149114141-04:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149127775-04:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149150491-04:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2020-07-07T11:29:51.149283622-04:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2020-07-07T11:29:51.149389614-04:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2020-07-07T11:29:51.149917683-04:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2020-07-07T11:29:51.149965596-04:00] loading plugin "io.containerd.service.v1.fc-control"...  type=io.containerd.service.v1
DEBU[2020-07-07T11:29:51.149976136-04:00] initializing fc-control plugin (root: "/var/lib/firecracker-containerd/containerd/io.containerd.service.v1.fc-control") 
INFO[2020-07-07T11:29:51.150288225-04:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2020-07-07T11:29:51.150364277-04:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150381086-04:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150394451-04:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150407554-04:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150420132-04:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150446549-04:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150458822-04:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150470733-04:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150484104-04:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2020-07-07T11:29:51.150543194-04:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150583348-04:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150607443-04:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150630165-04:00] loading plugin "io.containerd.grpc.v1.fc-control-service"...  type=io.containerd.grpc.v1
DEBU[2020-07-07T11:29:51.150640170-04:00] initializing fc-control-service plugin       
INFO[2020-07-07T11:29:51.150656961-04:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2020-07-07T11:29:51.150995234-04:00] serving...                                    address=/run/firecracker-containerd/containerd.sock.ttrpc
INFO[2020-07-07T11:29:51.151057085-04:00] serving...                                    address=/run/firecracker-containerd/containerd.sock
INFO[2020-07-07T11:29:51.151070651-04:00] containerd successfully booted in 0.193335s  
DEBU[2020-07-07T11:29:51.265567176-04:00] garbage collected                             d=16.211492ms
DEBU[2020-07-07T11:29:56.488305911-04:00] (*service).Write started                      expected="sha256:1e2f6f681cd8041fe8121b01e50addb02f2091e214d0d567aba838518ad0f122" ref="manifest-sha256:1e2f6f681cd8041fe8121b01e50addb02f2091e214d0d567aba838518ad0f122" total=1998
DEBU[2020-07-07T11:29:56.763215602-04:00] (*service).Write started                      expected="sha256:61614c1a5710c76af6b2a9c7170a81eb0dd76ccf90e921abc0c9dcc1d5ed490e" ref="layer-sha256:61614c1a5710c76af6b2a9c7170a81eb0dd76ccf90e921abc0c9dcc1d5ed490e" total=31506523
DEBU[2020-07-07T11:29:56.763407851-04:00] (*service).Write started                      expected="sha256:f2f5e0de6a9cf285883340ceabbf44b70c97a796d122f4238076609d95e0f48b" ref="layer-sha256:f2f5e0de6a9cf285883340ceabbf44b70c97a796d122f4238076609d95e0f48b" total=2574
DEBU[2020-07-07T11:29:56.763578187-04:00] (*service).Write started                      expected="sha256:c5b841e7f99ab6c98c254ad0cfa12c4d64193446f00eebd727990d0a74687d26" ref="layer-sha256:c5b841e7f99ab6c98c254ad0cfa12c4d64193446f00eebd727990d0a74687d26" total=20332232
DEBU[2020-07-07T11:29:56.763676154-04:00] (*service).Write started                      expected="sha256:e139bc3167f9b1c39e29f2dd8c9057de0d5bd47b940ef86ee853fe8011f08646" ref="layer-sha256:e139bc3167f9b1c39e29f2dd8c9057de0d5bd47b940ef86ee853fe8011f08646" total=635791
DEBU[2020-07-07T11:29:56.763799755-04:00] (*service).Write started                      expected="sha256:0522d30cde10ac29ae2c555b9bde76c2b50aafc7ef7435bbc7e19de706bcadcd" ref="layer-sha256:0522d30cde10ac29ae2c555b9bde76c2b50aafc7ef7435bbc7e19de706bcadcd" total=230
DEBU[2020-07-07T11:29:56.763802026-04:00] (*service).Write started                      expected="sha256:2cc5ad85d9abaadf23d5ae53c3f32e7ccb2df1956869980bfd2491ff396d348a" ref="layer-sha256:2cc5ad85d9abaadf23d5ae53c3f32e7ccb2df1956869980bfd2491ff396d348a" total=301261
DEBU[2020-07-07T11:29:56.763901795-04:00] (*service).Write started                      expected="sha256:2f23d35ae1d9c3bdc77186ea9a659f88b1f89f12e4c04700903859f0d1ac58a2" ref="config-sha256:2f23d35ae1d9c3bdc77186ea9a659f88b1f89f12e4c04700903859f0d1ac58a2" total=8309
DEBU[2020-07-07T11:29:56.763964373-04:00] (*service).Write started                      expected="sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9" ref="layer-sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9" total=2802957
DEBU[2020-07-07T11:29:56.763968082-04:00] (*service).Write started                      expected="sha256:407da27a03363f6b9d368ec6e131f7f2db7c8cb2a149160d913d6f3698905a5d" ref="layer-sha256:407da27a03363f6b9d368ec6e131f7f2db7c8cb2a149160d913d6f3698905a5d" total=1887472
DEBU[2020-07-07T11:29:58.653975902-04:00] stat snapshot                                 key="sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T11:29:58.654521345-04:00] prepare snapshot                              key="extract-654284684-ByGC sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" parent=
DEBU[2020-07-07T11:29:58.661839200-04:00] prepare                                       key="firecracker-containerd/1/extract-654284684-ByGC sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" parent=
DEBU[2020-07-07T11:29:58.662182472-04:00] create image                                  name="docker.io/ustiugov/helloworld:runner_workload" target="sha256:1e2f6f681cd8041fe8121b01e50addb02f2091e214d0d567aba838518ad0f122"
DEBU[2020-07-07T11:29:58.687025445-04:00] creating new thin device 'fc-dev-thinpool-snap-1' 
DEBU[2020-07-07T11:29:58.842292841-04:00] mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/fc-dev-thinpool-snap-1 
DEBU[2020-07-07T11:30:03.378111244-04:00] mkfs:
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 7814dbe2-c8ee-480b-9c2b-37715540ad08
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 
 
DEBU[2020-07-07T11:30:03.642723891-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:03.643453976-04:00] using pigz for decompression                 
DEBU[2020-07-07T11:30:03.666428601-04:00] event published                               ns=firecracker-containerd topic=/images/create type=containerd.services.images.v1.ImageCreate
DEBU[2020-07-07T11:30:04.160022608-04:00] diff applied                                  d=516.709178ms dgst="sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=2802957
DEBU[2020-07-07T11:30:04.160536421-04:00] commit snapshot                               key="extract-654284684-ByGC sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" name="sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T11:30:04.160661004-04:00] commit                                        key="firecracker-containerd/1/extract-654284684-ByGC sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10" name="firecracker-containerd/2/sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T11:30:04.317292044-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:04.326176197-04:00] stat snapshot                                 key="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T11:30:04.326559599-04:00] prepare snapshot                              key="extract-326387733-ZZnr sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" parent="sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T11:30:04.326658681-04:00] prepare                                       key="firecracker-containerd/3/extract-326387733-ZZnr sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" parent="firecracker-containerd/2/sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10"
DEBU[2020-07-07T11:30:04.326754792-04:00] creating snapshot device 'fc-dev-thinpool-snap-2' from 'fc-dev-thinpool-snap-1' 
DEBU[2020-07-07T11:30:04.683802270-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:05.055031286-04:00] diff applied                                  d=370.776932ms dgst="sha256:2cc5ad85d9abaadf23d5ae53c3f32e7ccb2df1956869980bfd2491ff396d348a" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=301261
DEBU[2020-07-07T11:30:05.055484822-04:00] commit snapshot                               key="extract-326387733-ZZnr sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" name="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T11:30:05.055633612-04:00] commit                                        key="firecracker-containerd/3/extract-326387733-ZZnr sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5" name="firecracker-containerd/4/sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T11:30:05.241892894-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:05.267403910-04:00] stat snapshot                                 key="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T11:30:05.267899513-04:00] prepare snapshot                              key="extract-267674110-nRik sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" parent="sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T11:30:05.268011066-04:00] prepare                                       key="firecracker-containerd/5/extract-267674110-nRik sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" parent="firecracker-containerd/4/sha256:5edfa66f961ecdadabac1b15441c567a06631fd4cb8a197a2f0399644a3c18d5"
DEBU[2020-07-07T11:30:05.268096339-04:00] creating snapshot device 'fc-dev-thinpool-snap-3' from 'fc-dev-thinpool-snap-2' 
DEBU[2020-07-07T11:30:05.616697477-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:09.407912390-04:00] diff applied                                  d=3.790778366s dgst="sha256:61614c1a5710c76af6b2a9c7170a81eb0dd76ccf90e921abc0c9dcc1d5ed490e" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=31506523
DEBU[2020-07-07T11:30:09.408545149-04:00] commit snapshot                               key="extract-267674110-nRik sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" name="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T11:30:09.408715003-04:00] commit                                        key="firecracker-containerd/5/extract-267674110-nRik sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa" name="firecracker-containerd/6/sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T11:30:09.539705691-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:09.565271099-04:00] stat snapshot                                 key="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T11:30:09.566202727-04:00] prepare snapshot                              key="extract-565497451-R4QE sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" parent="sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T11:30:09.566348443-04:00] prepare                                       key="firecracker-containerd/7/extract-565497451-R4QE sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" parent="firecracker-containerd/6/sha256:483d3bedbf62615f629bcfd167c0f2b45df7afe3ecb6b3f7ffa7ebe2dab70faa"
DEBU[2020-07-07T11:30:09.566452581-04:00] creating snapshot device 'fc-dev-thinpool-snap-4' from 'fc-dev-thinpool-snap-3' 
DEBU[2020-07-07T11:30:09.922836123-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:10.193088333-04:00] diff applied                                  d=269.732655ms dgst="sha256:0522d30cde10ac29ae2c555b9bde76c2b50aafc7ef7435bbc7e19de706bcadcd" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=230
DEBU[2020-07-07T11:30:10.193569215-04:00] commit snapshot                               key="extract-565497451-R4QE sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" name="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T11:30:10.193746173-04:00] commit                                        key="firecracker-containerd/7/extract-565497451-R4QE sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f" name="firecracker-containerd/8/sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T11:30:10.331017130-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:10.373088931-04:00] stat snapshot                                 key="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T11:30:10.373705913-04:00] prepare snapshot                              key="extract-373443705-XYfz sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" parent="sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T11:30:10.373833647-04:00] prepare                                       key="firecracker-containerd/9/extract-373443705-XYfz sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" parent="firecracker-containerd/8/sha256:3adc9ed0c3e4c8e4fb839cc55b92b0fefb26459b3d03edcbe139e0bd097fca0f"
DEBU[2020-07-07T11:30:10.373928545-04:00] creating snapshot device 'fc-dev-thinpool-snap-5' from 'fc-dev-thinpool-snap-4' 
DEBU[2020-07-07T11:30:10.755785287-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:11.396190109-04:00] diff applied                                  d=639.983359ms dgst="sha256:407da27a03363f6b9d368ec6e131f7f2db7c8cb2a149160d913d6f3698905a5d" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=1887472
DEBU[2020-07-07T11:30:11.396668609-04:00] commit snapshot                               key="extract-373443705-XYfz sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" name="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T11:30:11.396832061-04:00] commit                                        key="firecracker-containerd/9/extract-373443705-XYfz sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270" name="firecracker-containerd/10/sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T11:30:11.547041562-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:11.589358401-04:00] stat snapshot                                 key="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T11:30:11.589926387-04:00] prepare snapshot                              key="extract-589665226-xnzy sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" parent="sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T11:30:11.590059931-04:00] prepare                                       key="firecracker-containerd/11/extract-589665226-xnzy sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" parent="firecracker-containerd/10/sha256:7893763263404b202f7a20649fdab037cf7200d38d5a5e6fc35ced2c072df270"
DEBU[2020-07-07T11:30:11.590163931-04:00] creating snapshot device 'fc-dev-thinpool-snap-6' from 'fc-dev-thinpool-snap-5' 
DEBU[2020-07-07T11:30:11.913461642-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:12.173537536-04:00] diff applied                                  d=259.609533ms dgst="sha256:f2f5e0de6a9cf285883340ceabbf44b70c97a796d122f4238076609d95e0f48b" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=2574
DEBU[2020-07-07T11:30:12.174032555-04:00] commit snapshot                               key="extract-589665226-xnzy sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" name="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T11:30:12.174203893-04:00] commit                                        key="firecracker-containerd/11/extract-589665226-xnzy sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7" name="firecracker-containerd/12/sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T11:30:12.321699597-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:12.355519354-04:00] stat snapshot                                 key="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T11:30:12.356048700-04:00] prepare snapshot                              key="extract-355799304-J0bp sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" parent="sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T11:30:12.356192965-04:00] prepare                                       key="firecracker-containerd/13/extract-355799304-J0bp sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" parent="firecracker-containerd/12/sha256:79d2e13b02b5363c5d4b7fc7599b0148deb47d6f27f43c0f0fa62dc7688dcdc7"
DEBU[2020-07-07T11:30:12.356323521-04:00] creating snapshot device 'fc-dev-thinpool-snap-7' from 'fc-dev-thinpool-snap-6' 
DEBU[2020-07-07T11:30:12.804739224-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:15.135695402-04:00] diff applied                                  d=2.330499554s dgst="sha256:c5b841e7f99ab6c98c254ad0cfa12c4d64193446f00eebd727990d0a74687d26" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=20332232
DEBU[2020-07-07T11:30:15.136336460-04:00] commit snapshot                               key="extract-355799304-J0bp sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" name="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T11:30:15.136507064-04:00] commit                                        key="firecracker-containerd/13/extract-355799304-J0bp sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446" name="firecracker-containerd/14/sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T11:30:15.303492605-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:15.337295890-04:00] stat snapshot                                 key="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T11:30:15.337870050-04:00] prepare snapshot                              key="extract-337605258-la9a sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" parent="sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T11:30:15.338025494-04:00] prepare                                       key="firecracker-containerd/15/extract-337605258-la9a sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" parent="firecracker-containerd/14/sha256:fa33055d6821328fc6bf1df6182c2ef1b4aa0339fec2dca30de3c47945034446"
DEBU[2020-07-07T11:30:15.338182599-04:00] creating snapshot device 'fc-dev-thinpool-snap-8' from 'fc-dev-thinpool-snap-7' 
DEBU[2020-07-07T11:30:15.836571836-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:16.213150511-04:00] diff applied                                  d=376.07993ms dgst="sha256:e139bc3167f9b1c39e29f2dd8c9057de0d5bd47b940ef86ee853fe8011f08646" media=application/vnd.docker.image.rootfs.diff.tar.gzip size=635791
DEBU[2020-07-07T11:30:16.213652908-04:00] commit snapshot                               key="extract-337605258-la9a sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" name="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T11:30:16.213797196-04:00] commit                                        key="firecracker-containerd/15/extract-337605258-la9a sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc" name="firecracker-containerd/16/sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T11:30:16.361306124-04:00] event published                               ns=firecracker-containerd topic=/snapshot/commit type=containerd.events.SnapshotCommit
DEBU[2020-07-07T11:30:16.479082907-04:00] create VM request: VMID:"505" MachineCfg:<MemSizeMib:512 VcpuCount:1 > KernelArgs:"ro noapic reboot=k panic=1 pci=off nomodules systemd.log_color=false systemd.unit=firecracker.target init=/sbin/overlay-init tsc=reliable quiet 8250.nr_uarts=0 ipv6.disable=1" NetworkInterfaces:<StaticConfig:<MacAddress:"02:FC:00:00:00:00" HostDevName:"fc-0-tap0" IPConfig:<PrimaryAddr:"190.128.0.2/10" GatewayAddr:"190.128.0.1" > > >  
DEBU[2020-07-07T11:30:16.479130240-04:00] using namespace: firecracker-containerd      
DEBU[2020-07-07T11:30:16.479476557-04:00] starting containerd-shim-aws-firecracker      vmID=505
INFO[2020-07-07T11:30:16.516286994-04:00] starting signal loop                          namespace=firecracker-containerd path=/var/lib/firecracker-containerd/shim-base/firecracker-containerd/505 pid=15002
INFO[2020-07-07T11:30:16.517886540-04:00] creating new VM                               runtime=aws.firecracker vmID=505
ERRO[2020-07-07T11:30:16.518374523-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Called startVMM(), setting up a VMM on firecracker.sock"  shim_stream=stderr vmID=505
setupLogging
setupMetrics
ERRO[2020-07-07T11:30:16.530670287-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="VMM metrics disabled."  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.531432917-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="refreshMachineConfiguration: [GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate:T2 HtEnabled:0xc000982a83 MemSizeMib:0xc000982a78 VcpuCount:0xc000982a70}"  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.531842434-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="PutGuestBootSource: [PUT /boot-source][204] putGuestBootSourceNoContent "  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.531883406-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Attaching drive /var/lib/firecracker-containerd/runtime/default-rootfs.img, slot root_drive, root true."  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.532442106-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Attached drive /var/lib/firecracker-containerd/runtime/default-rootfs.img: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.532474712-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Attaching drive /var/lib/firecracker-containerd/shim-base/firecracker-containerd/505/ctrstub0, slot MN2HE43UOVRDA, root false."  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.532840719-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Attached drive /var/lib/firecracker-containerd/shim-base/firecracker-containerd/505/ctrstub0: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "  shim_stream=stderr vmID=505
ERRO[2020-07-07T11:30:16.532874303-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="Attaching NIC fc-0-tap0 (hwaddr 02:FC:00:00:00:00) at index 1"  shim_stream=stderr vmID=505
INFO[2020-07-07T11:30:16.550453081-04:00] calling agent                                 runtime=aws.firecracker vmID=505
ERRO[2020-07-07T11:30:16.550487057-04:00] time="2020-07-07T11:30:16-04:00" level=info msg="startInstance successful: [PUT /actions][204] createSyncActionNoContent "  shim_stream=stderr vmID=505
WARN[2020-07-07T11:30:16.650614577-04:00] Called try connect                            attempt=1 runtime=aws.firecracker vmID=505
WARN[2020-07-07T11:30:16.650736189-04:00] dialTimeout done                              attempt=1 runtime=aws.firecracker vmID=505
WARN[2020-07-07T11:30:16.650761712-04:00] Calling tryConnWrite                          attempt=1 runtime=aws.firecracker vmID=505
WARN[2020-07-07T11:30:16.650793773-04:00] Calling tryConnReadUntil                      attempt=1 runtime=aws.firecracker vmID=505
WARN[2020-07-07T11:30:16.650989150-04:00] I read::                                      attempt=1 runtime=aws.firecracker vmID=505
WARN[2020-07-07T11:30:16.651033854-04:00] error is:EOF:                                 attempt=1 runtime=aws.firecracker vmID=505
DEBU[2020-07-07T11:30:16.711032168-04:00] garbage collected                             d=19.630179ms
WARN[2020-07-07T11:30:16.750663558-04:00] Called try connect                            attempt=2 runtime=aws.firecracker vmID=505
INFO[2020-07-07T11:30:17.653370987-04:00] successfully started the VM                   runtime=aws.firecracker vmID=505
DEBU[2020-07-07T11:30:17.653919344-04:00] event forwarded                               ns=firecracker-containerd topic=/firecracker-vm/start type=VMStart
DEBU[2020-07-07T11:30:17.689715420-04:00] prepare snapshot                              key=505 parent="sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T11:30:17.689999861-04:00] prepare                                       key=firecracker-containerd/17/505 parent="firecracker-containerd/16/sha256:0065b13862ce3af1bf002db326cc2c0a1245165cf6ae7f22abaceae55e09f6fc"
DEBU[2020-07-07T11:30:17.690685608-04:00] creating snapshot device 'fc-dev-thinpool-snap-9' from 'fc-dev-thinpool-snap-8' 
DEBU[2020-07-07T11:30:18.093806161-04:00] event published                               ns=firecracker-containerd topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-07-07T11:30:18.096868976-04:00] get snapshot mounts                           key=505
DEBU[2020-07-07T11:30:18.096931972-04:00] mounts                                        key=firecracker-containerd/17/505
DEBU[2020-07-07T11:30:18.418603534-04:00] event published                               ns=firecracker-containerd topic=/containers/create type=containerd.events.ContainerCreate
DEBU[2020-07-07T11:30:18.444648642-04:00] get snapshot mounts                           key=505
DEBU[2020-07-07T11:30:18.444726202-04:00] mounts                                        key=firecracker-containerd/17/505
DEBU[2020-07-07T11:30:18.476852963-04:00] garbage collected                             d=9.240115ms
time="2020-07-07T11:30:18.482051161-04:00" level=debug msg=StartShim runtime=aws.firecracker task_id=505
DEBU[2020-07-07T11:30:18.483584090-04:00] create VM request: VMID:"505"                
DEBU[2020-07-07T11:30:18.483619001-04:00] using namespace: firecracker-containerd      
time="2020-07-07T11:30:18.483982412-04:00" level=info msg="successfully started shim (git commit: d69b31a5db41e8592d8256cf3c30b22ffbd1ffe7)." runtime=aws.firecracker task_id=505 vmID=505
ERRO[2020-07-07T11:30:18.486111725-04:00] time="2020-07-07T11:30:18-04:00" level=info msg="PatchGuestDrive successful"  shim_stream=stderr vmID=505
INFO[2020-07-07T11:30:18.634624431-04:00] successfully created task                     ExecID= TaskID=505 pid_in_vm=726 runtime=aws.firecracker vmID=505
DEBU[2020-07-07T11:30:18.634730521-04:00] event forwarded                               ns=firecracker-containerd topic=/tasks/create type=containerd.events.TaskCreate
DEBU[2020-07-07T11:30:18.644032506-04:00] event forwarded                               ns=firecracker-containerd topic=/tasks/start type=containerd.events.TaskStart
DEBU[2020-07-07T11:30:19.421406660-04:00] pause VM request: VMID:"505"                 
DEBU[2020-07-07T11:30:19.423432278-04:00] make snapshot request: VMID:"505" SnapshotFilePath:"/tmp/snapshot_file" MemFilePath:"/tmp/mem_file"  
DEBU[2020-07-07T11:30:20.406627004-04:00] offload original request: VMID:"505"         
WARN[2020-07-07T11:30:20.407014286-04:00] PID OF FIRECRACKER 15017                      runtime=aws.firecracker vmID=505
ERRO[2020-07-07T11:30:20.417657418-04:00] error waiting for exit                        ExecID= TaskID=505 error="ttrpc: closed" runtime=aws.firecracker vmID=505
ERRO[2020-07-07T11:30:20.417668040-04:00] error while forwarding events from VM agent   error="ttrpc: closed" runtime=aws.firecracker vmID=505
ERRO[2020-07-07T11:30:20.417934020-04:00] error copying io                              ExecID= TaskID=505 error="reading from a closed fifo" runtime=aws.firecracker stream=stdin vmID=505
ERRO[2020-07-07T11:30:20.418011748-04:00] error closing io stream                       ExecID= TaskID=505 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stdin vmID=505
ERRO[2020-07-07T11:30:20.450606680-04:00] time="2020-07-07T11:30:20-04:00" level=warning msg="firecracker exited: signal: killed"  shim_stream=stderr vmID=505
DEBU[2020-07-07T11:30:25.407893589-04:00] load snapshot request: VMID:"505" SnapshotFilePath:"/tmp/snapshot_file" MemFilePath:"/tmp/mem_file"  
ERRO[2020-07-07T11:30:25.417822209-04:00] error closing io stream                       ExecID= TaskID=505 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stdout vmID=505
ERRO[2020-07-07T11:30:25.417851319-04:00] error closing io stream                       ExecID= TaskID=505 error="1 error occurred:\n\t* close unix @->firecracker.vsock: use of closed network connection\n\n" runtime=aws.firecracker stream=stderr vmID=505
DEBU[2020-07-07T11:30:26.423531443-04:00] resume VM request: VMID:"505"                
DEBU[2020-07-07T11:30:26.424707092-04:00] stop VM: VMID:"505"                          
INFO[2020-07-07T11:30:26.425037465-04:00] stopping the VM                               runtime=aws.firecracker vmID=505
DEBU[2020-07-07T11:30:26.425613853-04:00] event forwarded                               ns=firecracker-containerd topic=/firecracker-vm/stop type=VMStop
ERRO[2020-07-07T11:30:31.425788519-04:00] aws.firecracker: publisher not closed         shim_stream=stderr vmID=505
DEBU[2020-07-07T11:30:31.427285338-04:00] shim has been terminated                      error="exit status 1" vmID=505
INFO[2020-07-07T11:30:31.427303360-04:00] shim disconnected                             id=505
WARN[2020-07-07T11:30:31.427499953-04:00] cleaning up after shim disconnected           id=505 namespace=firecracker-containerd
INFO[2020-07-07T11:30:31.427519862-04:00] cleaning up dead shim                        
WARN[2020-07-07T11:30:36.465651976-04:00] failed to clean up after shim disconnected    error="aws.firecracker: rpc error: code = DeadlineExceeded desc = timed out waiting for VM start\n: exit status 1" id=505 namespace=firecracker-containerd
DEBU[2020-07-07T11:30:36.465762633-04:00] event published                               ns=firecracker-containerd topic=/tasks/exit type=containerd.events.TaskExit
DEBU[2020-07-07T11:30:36.465811781-04:00] event published                               ns=firecracker-containerd topic=/tasks/delete type=containerd.events.TaskDelete

I believe that, to fix this issue, we need to determine what needs to be upgraded in firecracker-containerd to support this functionality without errors.

Data race in stats

A race condition is detected when running TestParallelLoadServe. It is caused by unsynchronized access to the counters in Stats: the race detector shows CreateStats writing the map while IncServed reads it concurrently. A possible fix is sketched after the race report below.

sudo env "PATH=/home/ustiugov/go/bin:/home/ustiugov/.cargo/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ustiugov/go/bin" go test -v -race -run TestParallelLoadServe
INFO[2020-07-14T23:55:58.724754903+02:00] Registering bridges for tap manager          
INFO[2020-07-14T23:55:58.727326673+02:00] Creating containerd client                   
INFO[2020-07-14T23:55:58.729607704+02:00] Created containerd client                    
INFO[2020-07-14T23:55:58.729722728+02:00] Creating firecracker client                  
INFO[2020-07-14T23:55:58.730104670+02:00] Created firecracker client                   
=== RUN   TestParallelLoadServe
INFO[2020-07-14T23:55:58.731512476+02:00] New function added                            fID=plr_fnc image="ustiugov/helloworld:runner_workload" isPinned=true servedTh=0
INFO[2020-07-14T23:56:09.913310003+02:00] New function added                            fID=0 image="ustiugov/helloworld:runner_workload" isPinned=true servedTh=0
INFO[2020-07-14T23:56:09.913776141+02:00] New function added                            fID=1 image="ustiugov/helloworld:runner_workload" isPinned=true servedTh=0
==================
WARNING: DATA RACE
Write at 0x00c0004467b0 by goroutine 155:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:202 +0x0
  github.com/ustiugov/fccd-orchestrator.(*Stats).CreateStats()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/stats.go:58 +0x917
  github.com/ustiugov/fccd-orchestrator.(*FuncPool).getFunction()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:102 +0x7d1
  github.com/ustiugov/fccd-orchestrator.(*FuncPool).Serve()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:112 +0x76
  github.com/ustiugov/fccd-orchestrator.TestParallelLoadServe.func1()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/manual_cleanup_test.go:59 +0x307

Previous read at 0x00c0004467b0 by goroutine 137:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/map_faststr.go:12 +0x0
  github.com/ustiugov/fccd-orchestrator.(*Stats).IncServed()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/stats.go:70 +0x289
  github.com/ustiugov/fccd-orchestrator.(*Function).Serve()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:218 +0x1e9
  github.com/ustiugov/fccd-orchestrator.(*FuncPool).Serve()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/functions.go:114 +0xec
  github.com/ustiugov/fccd-orchestrator.TestParallelLoadServe.func1()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/manual_cleanup_test.go:59 +0x307

Goroutine 155 (running) created at:
  github.com/ustiugov/fccd-orchestrator.TestParallelLoadServe()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/manual_cleanup_test.go:52 +0x2d2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb

Goroutine 137 (running) created at:
  github.com/ustiugov/fccd-orchestrator.TestParallelLoadServe()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/manual_cleanup_test.go:52 +0x2d2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb
==================
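
One way to eliminate this race would be to guard the per-function counters with a mutex instead of relying on bare map accesses. The sketch below is illustrative: it mirrors only the method names from the trace above (CreateStats, IncServed), while the real Stats type in stats.go holds more counters and may be structured differently.

package main

import (
	"fmt"
	"sync"
)

// Stats is a minimal, mutex-guarded version of the per-function counters,
// so that CreateStats (map write) and IncServed (map read/update) cannot race.
type Stats struct {
	mu     sync.Mutex
	served map[string]uint64
}

func NewStats() *Stats {
	return &Stats{served: make(map[string]uint64)}
}

// CreateStats registers a new function ID (the racy map write in the trace).
func (s *Stats) CreateStats(fID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.served[fID]; !ok {
		s.served[fID] = 0
	}
}

// IncServed bumps the served counter (the racy map read in the trace).
func (s *Stats) IncServed(fID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.served[fID]++
}

func main() {
	s := NewStats()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		fID := fmt.Sprint(i)
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.CreateStats(fID)
			s.IncServed(fID)
		}()
	}
	wg.Wait()
	fmt.Println(s.served) // runs cleanly under `go test -race` / `go run -race`
}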

Data race when reading uPF file descriptor and closing it

The race occurs in the TestSnapLoadMultiple test on the vm_memory_manager branch.

The read happens in SnapshotState.pollUserPageFaults (memory/manager/snapshot_state.go), which obtains the descriptor via os.(*File).Fd(); the write happens in MemoryManager.Deactivate (memory/manager/manager.go), which closes the file.

Temporary workaround: do not close the descriptor. A sketch of the pattern and the workaround follows the race report below.

Full log:

=== RUN   TestSnapLoadMultiple
time="2020-08-06T22:47:54.086782086+02:00" level=info msg="Registering bridges for tap manager"
time="2020-08-06T22:47:54.090822971+02:00" level=info msg="Creating containerd client"
time="2020-08-06T22:47:54.093977770+02:00" level=info msg="Created containerd client"
time="2020-08-06T22:47:54.094263758+02:00" level=info msg="Creating firecracker client"
time="2020-08-06T22:47:54.094842196+02:00" level=info msg="Created firecracker client"
==================
WARNING: DATA RACE
Write at 0x00c00101cdf0 by goroutine 71:
  internal/poll.(*FD).destroy()
      /usr/local/go/src/internal/poll/fd_unix.go:79 +0xc3
  internal/poll.(*FD).decref()
      /usr/local/go/src/internal/poll/fd_mutex.go:213 +0x53
  internal/poll.(*FD).Close()
      /usr/local/go/src/internal/poll/fd_unix.go:100 +0x75
  os.(*file).close()
      /usr/local/go/src/os/file_unix.go:248 +0x67
  os.(*File).Close()
      /usr/local/go/src/os/file_unix.go:237 +0x3c4
  github.com/ustiugov/fccd-orchestrator/memory/manager.(*MemoryManager).Deactivate()
      /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/memory/[email protected]/manager.go:194 +0x37a
  command-line-arguments.(*Orchestrator).Offload()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/ctriface/iface.go:643 +0x621
  command-line-arguments.TestSnapLoadMultiple()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/ctriface/manual_cleanup_test.go:106 +0xa2a
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb

Previous read at 0x00c00101cdf0 by goroutine 134:
  os.(*File).Fd()
      /usr/local/go/src/os/file_unix.go:79 +0x89a
  github.com/ustiugov/fccd-orchestrator/memory/manager.(*SnapshotState).pollUserPageFaults()
      /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/memory/[email protected]/snapshot_state.go:165 +0x291

Goroutine 71 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1042 +0x660
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1284 +0xa6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1282 +0x527
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1199 +0x2ff
  main.main()
      _testmain.go:102 +0x337

Goroutine 134 (running) created at:
  github.com/ustiugov/fccd-orchestrator/memory/manager.(*MemoryManager).Activate()
      /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/memory/[email protected]/manager.go:150 +0x76e
  command-line-arguments.(*Orchestrator).LoadSnapshot()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/ctriface/iface.go:615 +0x947
  command-line-arguments.TestSnapLoadMultiple()
      /home/ustiugov/staged-data-tiering/fccd-orchestrator/ctriface/manual_cleanup_test.go:100 +0x7f2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb

[Knative/k8s] Istio setup problem

Istio fails to set up in a 3-node cluster on CloudLab (2-node setup works).

An excerpt from the setup_multinode_cluster.sh log:

✔ Istio core installed
✔ Istiod installed
✔ Addons installed
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istio-ingressgateway

The root cause seems to be in the Calico setup (excerpt from the stock containerd log):

INFO[2021-01-15T17:59:52.278714112+01:00] PullImage "docker.io/calico/node:v3.17.1"
INFO[2021-01-15T17:59:52.509648749+01:00] RunPodsandbox for &PodSandboxMetadata{Name:controller-8687cdc65-wrlqh,Uid:e1a00f51-5d17-493a-af9e-b2a1037ad18a,Namespace:metallb-system,Attempt:0,}
INFO[2021-01-15T17:59:52.509832302+01:00] RunPodsandbox for &PodSandboxMetadata{Name:calico-kube-controllers-744cfdf676-vn497,Uid:391bd386-b332-4556-aa92-c254d4bf420e,Namespace:kube-system,Attempt:0,}
ERRO[2021-01-15T17:59:53.179123015+01:00] Failed to destroy network for sandbox "7d7ae611318198f65356cd65ae24a076311f12d967b2084d6dc929d0981976f4"  error="stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"
ERRO[2021-01-15T17:59:53.209581619+01:00] RunPodSandbox for &PodSandboxMetadata{Name:controller-8687cdc65-wrlqh,Uid:e1a00f51-5d17-493a-af9e-b2a1037ad18a,Namespace:metallb-system,Attempt:0,} failed, error  error="failed to setup network for sandbox \"7d7ae611318198f65356cd65ae24a076311f12d967b2084d6dc929d0981976f4\": stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"
ERRO[2021-01-15T17:59:53.987155679+01:00] get state for 6a6cff6ce2e9496f204377e04fe8a76627c55d8598faeb4b3dae2efad04c8bb7  error="context deadline exceeded: unknown"

This is a high-priority problem that needs further investigation.

This might be a related issue.

Remove parallel tests from travis

The parallel tests are problematic in Travis because of the limited resources of the Travis VM. For example, starting 5 VMs in parallel is extremely slow and often breaks the build. We should keep the serial tests, which verify only functionality, and reserve the parallel ones for on-premises testing on better hardware.
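
One lightweight way to do this without deleting the tests is to gate the parallel ones behind Go's -short flag, so Travis runs only the serial suite while on-premises runs keep everything. A minimal sketch (package name illustrative, test body elided):

package orchestrator

import "testing"

// TestParallelLoadServe is the existing parallel test; only the guard is new.
func TestParallelLoadServe(t *testing.T) {
	if testing.Short() {
		t.Skip("skipped in CI: Travis VMs cannot start many microVMs in parallel")
	}
	// ... existing parallel load/serve logic ...
}

CI would then invoke the tests with go test -short, while on-premises runs drop the flag.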

Fail test module when taps manager fails

make: Entering directory '/home/runner/work/vhive/vhive/taps'
# Need to pass GOROOT because GitHub-hosted runners may have several
# go versions installed so that calling go from root may fail
sudo env "PATH=/home/runner/go/bin:/opt/hostedtoolcache/go/1.15.8/x64/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/opt/pipx_bin:/usr/share/rust/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin" "GOROOT=/opt/hostedtoolcache/go/1.15.8/x64" go test ./ -v -race -cover
go: downloading golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd
go: downloading github.com/stretchr/testify v1.7.0
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
=== RUN   TestCreateCleanBridges
time="2021-03-02T10:26:58.740621479Z" level=info msg="Registering bridges for tap manager"
time="2021-03-02T10:26:58.754767451Z" level=info msg="Removing bridges"
--- PASS: TestCreateCleanBridges (0.13s)
=== RUN   TestCreateRemoveTaps
time="2021-03-02T10:26:58.872944768Z" level=info msg="Registering bridges for tap manager"
time="2021-03-02T10:27:07.823428412Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.829389721Z" level=error msg="Could not create tap, because corresponding bridge does not exist" bridge=br0 tap=tap_162
time="2021-03-02T10:27:07.829855837Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_413
time="2021-03-02T10:27:07.829983942Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_415
time="2021-03-02T10:27:07.832220420Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.833430863Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:07.842105468Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_417
...
time="2021-03-02T10:27:07.844889266Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_450
time="2021-03-02T10:27:07.845206777Z" level=warning msg="Failed to configure NAT open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:07.881599255Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.884740666Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_455
time="2021-03-02T10:27:07.884869670Z" level=error msg="Master could not be set" bridge=br0 tap=tap_389
time="2021-03-02T10:27:07.886375823Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.890996585Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_457
...
time="2021-03-02T10:27:07.893791784Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_492
time="2021-03-02T10:27:07.874682612Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.891917518Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_471
time="2021-03-02T10:27:07.894566711Z" level=warning msg="Failed to fetch host net interfaces fork/exec /sbin/route: too many open files\n\n"
time="2021-03-02T10:27:07.893218664Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_484
time="2021-03-02T10:27:07.877036295Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.892216628Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_470
time="2021-03-02T10:27:07.892604942Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_479
time="2021-03-02T10:27:07.893899287Z" level=warning msg="Failed to fetch host net interfaces fork/exec /sbin/route: too many open files\n\n"
time="2021-03-02T10:27:07.913021559Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:07.998933678Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_548
time="2021-03-02T10:27:08.004650879Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.081958193Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_504
time="2021-03-02T10:27:08.083618552Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_503
time="2021-03-02T10:27:08.096506704Z" level=error msg="Could not create tap, because corresponding bridge does not exist" bridge=br0 tap=tap_314
time="2021-03-02T10:27:08.097268131Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_515
time="2021-03-02T10:27:08.097377135Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_511
time="2021-03-02T10:27:08.097687046Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.100031328Z" level=error msg="Could not create tap, because corresponding bridge does not exist" bridge=br0 tap=tap_214
time="2021-03-02T10:27:08.102349609Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_522
...
time="2021-03-02T10:27:08.108978842Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_549
time="2021-03-02T10:27:08.109248051Z" level=warning msg="Failed to configure NAT open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.109699867Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.110293988Z" level=warning msg="Failed to fetch host net interfaces fork/exec /sbin/route: too many open files\n\n"
time="2021-03-02T10:27:08.110785505Z" level=error msg="Tap could not be enabled" bridge=br0 tap=tap_412
time="2021-03-02T10:27:08.228577242Z" level=error msg="Could not create tap, because corresponding bridge does not exist" bridge=br0 tap=tap_326
time="2021-03-02T10:27:08.229491474Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.266271065Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_558
...
time="2021-03-02T10:27:08.297141249Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_676
time="2021-03-02T10:27:08.297439660Z" level=warning msg="Failed to configure NAT open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.332547893Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.335201886Z" level=error msg="Master could not be set" bridge=br0 tap=tap_500
time="2021-03-02T10:27:08.335369492Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_496
time="2021-03-02T10:27:08.335406593Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_501
time="2021-03-02T10:27:08.335811607Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_505
time="2021-03-02T10:27:08.335934612Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.337259358Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.340179261Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.349674294Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.355368794Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.355492299Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.355549801Z" level=error msg="Could not set MAC address" bridge=br0 tap=tap_374
time="2021-03-02T10:27:08.355642004Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.355797109Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.357486269Z" level=error msg="Master could not be set" bridge=br0 tap=tap_386
...
time="2021-03-02T10:27:08.359272831Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_545
time="2021-03-02T10:27:08.359592442Z" level=warning msg="Failed to configure NAT open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.359835451Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.359918954Z" level=error msg="Tap could not be enabled" bridge=br0 tap=tap_498
time="2021-03-02T10:27:08.369550792Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.373053015Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.375057486Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_680
...
time="2021-03-02T10:27:08.408398556Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_610
time="2021-03-02T10:27:08.409120182Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.409488495Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.411600469Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.412165789Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.474825589Z" level=warning msg="Failed to fetch host net interfaces fork/exec /sbin/route: too many open files\n\n"
time="2021-03-02T10:27:08.491117661Z" level=error msg="Tap could not be enabled" bridge=br0 tap=tap_393
time="2021-03-02T10:27:08.492045294Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.492999727Z" level=warning msg="Failed to fetch host net interfaces pipe2: too many open files\n\n"
time="2021-03-02T10:27:08.509867219Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_733
...
time="2021-03-02T10:27:08.523110185Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_667
time="2021-03-02T10:27:08.523458797Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.523646003Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.523684005Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_675
time="2021-03-02T10:27:08.524067418Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.524297726Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.524483233Z" level=warning msg="Failed to fetch host net interfaces open /dev/null: too many open files\n\n"
time="2021-03-02T10:27:08.524791744Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_614
...
time="2021-03-02T10:27:08.553545353Z" level=error msg="Tap could not be created" bridge=br0 tap=tap_656
time="2021-03-02T10:27:29.191265213Z" level=warning msg="Could not find tap" tap=tap_974
...
time="2021-03-02T10:27:29.363167371Z" level=warning msg="Could not find tap" tap=tap_1051
time="2021-03-02T10:27:31.886977822Z" level=info msg="Removing bridges"
--- PASS: TestCreateRemoveTaps (33.62s)
=== RUN   TestCreateRemoveExtra
    taps_test.go:86: Test disabled due to execution failure in GitHub Actions and it doesn't seem essential for the test coverage
--- SKIP: TestCreateRemoveExtra (0.00s)
PASS
coverage: 81.0% of statements
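
Despite all of the tap-creation errors above, the test still reports PASS because the errors are only logged. One way to make the module fail in this situation is to propagate the errors into the test and assert on them. A minimal sketch, where NewTapManager, AddTap, and RemoveBridges stand in for the tap manager's real API (their exact signatures are an assumption):

package taps

import (
	"fmt"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestCreateRemoveTaps(t *testing.T) {
	tm := NewTapManager() // assumed constructor
	defer tm.RemoveBridges()

	for i := 0; i < 1000; i++ {
		// AddTap is assumed to return (iface, error); the key change is
		// that any error now fails the test instead of being only logged.
		_, err := tm.AddTap(fmt.Sprintf("tap_%d", i))
		require.NoError(t, err, "tap creation must fail the test, not only log")
	}
}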

Fail to load VM from Snapshot upon Knative invocation

Occasionally VMs fail to be loaded from a snapshot as part of a Knative invocation.

Steps to reproduce:

  1. Deploy a function
  2. Wait for the function to be scaled down to 0. During this process the associated VM gets Offloaded.
  3. Invoke the function

From the Knative side, the error is not observed because Knative automatically spawns/loads another VM if there is a failure.
The error can be observed from the firecracker-containerd log on the worker node on which the function instance is scheduled.
Looking at the log of the Firecracker VM, the following error can be seen:

2020-11-24T11:04:53.872609488 [anonymous-instance:ERROR:src/api_server/src/lib.rs:95] Error creating the HTTP server: IO error: Address in use (os error 98)

The cause of this error is unclear. Empirically, it occurs roughly 20% of the time. Function invocations still succeed despite the error, but it likely leaks some resources.

CI: need to re-install all the binaries upon each test

Non-CRI integration tests may pass even if the binaries don't work, because the binaries are installed only once, when a self-hosted runner is set up. As a result, CI may fail to detect problems in updated binaries.

This fix needs to be prioritized.

Ctriface should not fail when process is not found upon Offload

This firecracker-containerd "no such process" error can be safely ignored; handling it would improve CI stability.
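
A minimal sketch of how ctriface's Offload path could tolerate this error. Since the gRPC status code is Unknown, the sketch matches on the error text; the helper name is illustrative:

package ctriface

import (
	"strings"

	log "github.com/sirupsen/logrus"
)

// ignoreNoSuchProcess treats firecracker-containerd's "no such process"
// response during Offload as benign instead of failing the operation.
func ignoreNoSuchProcess(err error) error {
	if err == nil {
		return nil
	}
	if strings.Contains(err.Error(), "no such process") {
		log.WithError(err).Warn("process already gone during Offload, ignoring")
		return nil
	}
	return err
}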

Logs:

=== RUN   TestSnapLoadMultiple
time="2021-01-23T21:33:35.473014189Z" level=info msg="Registering bridges for tap manager"
time="2021-01-23T21:33:35.476245128Z" level=info msg="Creating containerd client"
time="2021-01-23T21:33:35.478922989Z" level=info msg="Created containerd client"
time="2021-01-23T21:33:35.479096672Z" level=info msg="Creating firecracker client"
time="2021-01-23T21:33:35.479584233Z" level=info msg="Created firecracker client"
time="2021-01-23T21:33:46.300146125Z" level=error msg="failed to offload the VM" error="rpc error: code = Unknown desc = no such process" vmID=3
    manual_cleanup_test.go:133: 
        	Error Trace:	manual_cleanup_test.go:133
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = no such process
        	Test:       	TestSnapLoadMultiple
        	Messages:   	Failed to offload VM
--- FAIL: TestSnapLoadMultiple (10.83s)

Snapshot create/load sporadically fail under load

So far I have reproduced two problems:

  • Snapshot creation timeout:
WARN[2020-08-06T15:30:22.055814874+02:00] failed to create snapshot of the VM: rpc error: code = Unknown desc = Put "http+unix://firecracker/snapshot/create": net/http: timeout awaiting response headers  vmID=17

Solution: increase the timeout to 1 minute (see the sketch after this list).

  • Snapshot load failure
failed to load snapshot of the VM: rpc error: code = Internal desc = ttrpc: error marshaling payload: proto: Marshal called with nil  vmID=118

fc log:

ERROR:src/api_server/src/parsed_request.rs:101] Received Error. Status code: 400 Bad Request. Message: Load microVM snapshot error: Cannot build a microVM from snapshot: Cannot restore microvm state. Error: Cannot restore devices. Error: Net(CreateNet(TapOpen(IoctlError(Os { code: 16, kind: Other, message: "Resource busy" }))))

Solution: delete and re-create the tap as soon as containerd's Offload completes. Revisit whether the 300 ms sleep workaround is still needed.

To reproduce, run TestParallelSnapLoad with numParallel=200
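
For the creation timeout, the "timeout awaiting response headers" message comes from the HTTP transport's response-header timeout. A minimal sketch of raising that bound to 1 minute; whether the firecracker client lets us inject such a transport directly is an assumption:

package ctriface

import (
	"net/http"
	"time"
)

// newSnapshotHTTPClient builds an HTTP client whose response-header timeout
// tolerates slow snapshot-create responses of up to 1 minute.
func newSnapshotHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			ResponseHeaderTimeout: 1 * time.Minute,
		},
	}
}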

CRI testing in travis: istio installation fails in travis

✔ Istio core installed                                                          
✔ Istiod installed                                                              
✔ Ingress gateways installed                                                    
✘ Addons encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istiod
- Pruning removed resources
Error: failed to apply manifests: errors occurred during operation

Related issue

Task.Delete returns error for some functions

Happens for the following functions:

ustiugov/lr_serving:var_workload
ustiugov/lr_training:var_workload
ustiugov/cnn_serving:var_workload
WARN[2020-07-06T17:56:22.448482892+02:00] failed to delete the task of the VM: task must be stopped before deletion: running: failed precondition  vmID=5_0
    TestAllFunctions: fccd-orchestrator_test.go:297: 
                Error Trace:    fccd-orchestrator_test.go:297
                                                        asm_amd64.s:1373
                Error:          Received unexpected error:
                                failed precondition
                                github.com/containerd/containerd/errdefs.init
                                        /home/ustiugov/go/pkg/mod/github.com/containerd/[email protected]/errdefs/errors.go:47
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5420
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5415
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5415
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5415
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5415
                                runtime.doInit
                                        /usr/local/go/src/runtime/proc.go:5415
                                runtime.main
                                        /usr/local/go/src/runtime/proc.go:190
                                runtime.goexit
                                        /usr/local/go/src/runtime/asm_amd64.s:1373
                                task must be stopped before deletion: running
                                github.com/containerd/containerd.(*task).Delete
                                        /home/ustiugov/go/pkg/mod/github.com/containerd/[email protected]/task.go:306
                                github.com/ustiugov/fccd-orchestrator/ctriface.(*Orchestrator).StopSingleVM
                                        /home/ustiugov/go/pkg/mod/github.com/ustiugov/fccd-orchestrator/[email protected]/iface.go:362
                                github.com/ustiugov/fccd-orchestrator.(*Function).RemoveInstance
                                        /home/ustiugov/fccd-orchestrator/functions.go:348
                                github.com/ustiugov/fccd-orchestrator.(*FuncPool).RemoveInstance
                                        /home/ustiugov/fccd-orchestrator/functions.go:136
                                github.com/ustiugov/fccd-orchestrator.TestAllFunctions.func2
                                        /home/ustiugov/fccd-orchestrator/fccd-orchestrator_test.go:296
                                runtime.goexit
                                        /usr/local/go/src/runtime/asm_amd64.s:1373
                Test:           TestAllFunctions
                Messages:       Function returned error, Deleting task of VM 5_0 failed
--- FAIL: TestAllFunctions (120.79s)

To reproduce the error, uncomment these functions in the test's code and run: sudo `which go` test -v -race -run TestAllFunctions
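
A possible mitigation, sketched against the public containerd client API (whether the orchestrator's StopSingleVM can simply switch to this flow is an assumption), is to kill a still-running task and wait for its exit before deleting it:

package ctriface

import (
	"context"
	"syscall"

	"github.com/containerd/containerd"
)

// stopAndDeleteTask kills a still-running task, waits for it to exit, and
// only then deletes it, avoiding "task must be stopped before deletion".
func stopAndDeleteTask(ctx context.Context, task containerd.Task) error {
	exitCh, err := task.Wait(ctx)
	if err != nil {
		return err
	}
	if err := task.Kill(ctx, syscall.SIGKILL); err != nil {
		return err
	}
	<-exitCh // block until the task has actually exited
	_, err = task.Delete(ctx)
	return err
}

Alternatively, containerd's Delete accepts the containerd.WithProcessKill option, which kills a still-running task as part of the deletion.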

Remove the stub container

The queue-proxy container interacts directly with the user-workload firecracker-containerd container. However, we still create a lightweight stub user container so that CRI operations such as ContainerStatus and ListContainers do not require additional support. Once firecracker-containerd supports these calls, we can decommission the stub.
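
For illustration only, a hypothetical sketch of the dispatch the stub enables: read-only CRI queries are answered by the stub user container, while the workload itself runs in the Firecracker microVM. None of the type or method names below are vHive's real API:

package cri

import "context"

// containerBackend abstracts whoever can answer read-only CRI queries.
type containerBackend interface {
	ContainerStatus(ctx context.Context, containerID string) (string, error)
}

// service answers ContainerStatus/ListContainers-style calls from the stub
// backend until firecracker-containerd implements them natively, at which
// point the stub container and this indirection can be removed.
type service struct {
	stub containerBackend
}

func (s *service) ContainerStatus(ctx context.Context, containerID string) (string, error) {
	return s.stub.ContainerStatus(ctx, containerID)
}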
