ubicloud / ubicloud

Open, free, and portable cloud. Elastic compute, non-replicated block storage, virtual networking, managed Postgres, and IAM services in public beta.

Home Page: https://ubicloud.com

License: GNU Affero General Public License v3.0

Ruby 85.22% HTML 14.03% CSS 0.01% JavaScript 0.63% Procfile 0.01% Dockerfile 0.11%
cloud portable bare-metal hosting ruby managed-cloud open postgresql linux github-actions

ubicloud's Introduction


Ubicloud is an open, free, and portable cloud. Think of it as an open alternative to cloud providers, like what Linux is to proprietary operating systems.

Ubicloud provides IaaS cloud features on bare metal providers, such as Hetzner, OVH, and AWS Bare Metal. You can set it up yourself on these providers or you can use our managed service. We're currently in public beta.

Quick start

Managed platform

You can use Ubicloud without installing anything. When you do this, we pass along the underlying provider's benefits to you, such as price or location.

https://console.ubicloud.com

Build your own cloud

You can also build your own cloud. To do this, start up Ubicloud's control plane and connect to its cloud console.

git clone git@github.com:ubicloud/ubicloud.git

# Generate secrets for demo
./demo/generate_env

# Run containers: db-migrator, app (web & respirate), postgresql
docker-compose -f demo/docker-compose.yml up

# Visit localhost:3000

The control plane is responsible for cloudifying bare metal Linux machines. The easiest way to build your own cloud is to lease instances from one of those providers. For example: https://www.hetzner.com/sb

Once you lease instances, run the following script on each one to cloudify it. By default, the script cloudifies bare metal instances leased from Hetzner. After you cloudify your instances, you can provision and manage cloud resources on these machines.

# Enter hostname/IP and provider, and install SSH key as instructed by script
docker exec -it ubicloud-app ./demo/cloudify_server

Later when you create VMs, Ubicloud will assign them IPv6 addresses. If your ISP doesn't support IPv6, please use a VPN or tunnel broker such as Mullvad or Hurricane Electric's https://tunnelbroker.net/ to connect. Alternatively, you could lease IPv4 addresses from your provider and add them to your control plane.

Why use it

Public cloud providers like AWS, Azure, and Google Cloud have made life easier for start-ups and enterprises. But they are closed source, rent you computers at a huge premium, and lock you in. Ubicloud offers an open alternative, reduces your costs, and returns control of your infrastructure to you, all without sacrificing the cloud's convenience.

Today, AWS offers about two hundred cloud services. Ultimately, we aim to implement the 10% of cloud services that make up 80% of that consumption.

Example workloads and reasons to use Ubicloud today include:

  • You have an ephemeral workload like a CI/CD pipeline (we're integrating with GitHub Actions), or you'd like to run compute/memory heavy tests. Our managed cloud is ~3x cheaper than AWS, so you save on costs.

  • You want a portable and simple app deployment service like Kamal. We're moving Ubicloud's control plane from Heroku to Kamal, and we want to provide open and portable services for Kamal's dependencies in the process.

  • You have bare metal machines sitting somewhere. You'd like to build your own cloud for portability, security, or compliance reasons.

Status

Ubicloud is in public alpha. You can provide us your feedback, get help, or ask us to support your network environment in the Community Forum.

We follow an established architectural pattern in building public cloud services. A control plane manages a data plane, where the data plane leverages open source software. You can find our current cloud components / services below.

  • Elastic Compute: Our control plane communicates with Linux bare metal servers using SSH. We use Cloud Hypervisor as our virtual machine monitor (VMM), and each VMM instance is contained within Linux namespaces for further isolation and security.

  • Virtual Networking: We use IPsec tunneling to establish an encrypted and private network environment. We support IPv4 and IPv6 in a dual-stack setup and provide both public and private networking. For security, each customer’s VMs operate in their own networking namespace. Everything in virtual networking is layer 3 and up.

  • Block Storage, non replicated: We use Storage Performance Development Toolkit (SPDK) to provide virtualized block storage to VMs. SPDK enables us to add enterprise features such as snapshot and replication in the future. We follow security best practices and encrypt the data encryption key itself.

  • Attribute-Based Access Control (ABAC): With ABAC, you can define attributes, roles, and permissions for users and give them fine-grained access to resources. You can read more about our ABAC design here.

  • What's Next?: We're planning to work on the elastic load balancer or simple storage service next. If you have a workload that would benefit from a specific cloud service, please get in touch with us through our Community Forum.

  • Control plane: Manages data plane services and resources. This is a Ruby program that stores its data in Postgres. We use the Roda framework to serve HTTP requests and Sequel to access the database. We manage web authentication with Rodauth. We communicate with data plane servers using SSH, via the library net-ssh. For our tests, we use RSpec.

  • Cloud console: Server-side web app served by the Roda framework. For the visual design, we use Tailwind CSS with components from Tailwind UI. We also use jQuery for interactivity.
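The key-wrapping scheme mentioned under Block Storage (encrypting the data encryption key itself) is commonly called envelope encryption. Below is a minimal sketch using Ruby's OpenSSL bindings; the helper names and AES-256-GCM choice are illustrative, not Ubicloud's actual implementation:

```ruby
require "openssl"

# Envelope encryption sketch: a key encryption key (KEK) wraps the
# per-volume data encryption key (DEK), so rotating the KEK only
# requires re-wrapping the DEK, not re-encrypting the volume data.
def wrap_dek(kek, dek)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = kek
  iv = cipher.random_iv
  ciphertext = cipher.update(dek) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

def unwrap_dek(kek, wrapped)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = kek
  cipher.iv = wrapped[:iv]
  cipher.auth_tag = wrapped[:tag]
  cipher.update(wrapped[:ciphertext]) + cipher.final
end
```

GCM also authenticates the wrapped key, so a tampered ciphertext fails to unwrap instead of silently yielding a wrong DEK.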

If you’d like to start hacking on Ubicloud, any method of obtaining the right Ruby and Postgres versions is acceptable. If you have no opinion on this, our development team uses asdf-vm, as documented here in detail.

Greptile provides an AI/LLM that indexes Ubicloud's source code and can answer questions about it.

FAQ

Do you have any experience with building this sort of thing?

Our founding team comes from Azure and worked at Amazon and Heroku before that. We also have start-up experience: we were co-founders and founding team members at Citus Data, which was acquired by Microsoft.

How is this different than OpenStack?

We see three differences. First, Ubicloud is available as a managed service (vs boxed software). This way, you can get started in minutes rather than weeks. Since Ubicloud is designed for multi-tenancy, it comes with built-in features such as encryption at rest and in transit, virtual networking, secrets rotation, etc.

Second, we're initially targeting developers. This, we hope, will give us fast feedback cycles and enable us to bring six key services to GA within the next two years. OpenStack is still primarily used for 3 cloud services.

Last, we're designing for simplicity. With OpenStack, you pick between 10 hypervisors, 10 S3 implementations, and 5 block storage implementations. The software needs to work in a way where all of these implementations are compatible with each other. That leads to consultant-ware. We'll take a more opinionated approach with Ubicloud.

ubicloud's People

Contributors

bsatzger, byucesoy, croaky, dependabot[bot], eltociear, enescakir, fdr, furkansahin, ozgune, pykello, saittalhanisanci, velioglu


ubicloud's Issues

Add a link to our terms of service during sign up

We recently published our terms of service (ToS) here: https://ubicloud.com/terms-of-service

We should make sure that the ToS renders in a readable format across browsers. More importantly, we should add a link to the terms of service when a new user is signing up to the managed service.

This introduces another question. Our ToS is designed for managed service users. Should we display it for open source users as well (probably not)?

Destroy vm with any state

We only check the :destroy semaphore at the :wait label, so we can't delete a VM if it's stuck at another label (e.g., creating). We need to check the :destroy semaphore globally.
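A sketch of the proposed fix, with hypothetical names (`VmNexus`, `tick`, `signal`; the real Prog API differs): check the :destroy semaphore once per tick, before dispatching to the current label.

```ruby
# Hypothetical state machine: the :destroy check happens globally at the
# top of each tick, so a VM stuck at :creating can still be destroyed.
class VmNexus
  attr_reader :label

  def initialize
    @label = :start
    @semaphores = []
  end

  def signal(name)
    @semaphores << name
  end

  def tick
    # Global semaphore check, regardless of the current label.
    return @label = :destroy if @semaphores.include?(:destroy)

    case @label
    when :start then @label = :creating
    when :creating then @label = :wait
    when :wait then :noop
    end
    @label
  end
end
```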

Building docker images for different platforms

ARM-based machines have become popular lately. Thanks to Apple M1/M2 processors, linux/amd64 alone is no longer enough; a linux/arm64 Docker image is a must for popular open source projects.

We use the docker/build-push-action action to build images and push them to Docker Hub. It also supports building for linux/arm64.

But GitHub Actions runners are slow at building for linux/arm64 because they are not ARM-based machines. The build pipeline's duration increased from ~3 minutes (linux/amd64 only) to ~22 minutes (linux/amd64 and linux/arm64). That's okay for now.

We may consider different options to build our images.

API Reference Widget Not Redirecting in Dashboard

In the Dashboard view of Ubicloud, I noticed that the "API Reference" widget does not seem to function properly. When I click on it, it does not redirect to the intended page or provide any response.

Steps to reproduce:

Open Ubicloud and navigate to the Dashboard view.
Locate the "API Reference" widget.
Click on the widget.
Expected behavior:
After clicking the widget, I expect it to redirect to the corresponding API documentation page.

Actual behavior:
Clicking the widget does not result in any action or redirection.

Destroying vm stuck if vm deleted before vhost creation

---STDOUT---
request:
{
  "ctrlr": "vmyca6yj_0",
  "method": "vhost_delete_controller",
  "req_id": 1
}
Got JSON-RPC error response
response:
{
  "code": -32602,
  "message": "No such device"
}


---STDERR---
        from /home/rhizome/lib/vm_setup.rb:124:in `block in purge_storage'
        from /home/rhizome/lib/vm_setup.rb:118:in `each'
        from /home/rhizome/lib/vm_setup.rb:118:in `purge_storage'
        from /home/rhizome/lib/vm_setup.rb:106:in `purge'
        from bin/deletevm.rb:12:in `<main>'

Auto-refresh VM creation page

Currently, users need to refresh the page manually to see if VM creation is completed. We can make that page auto-refresh.

Changing email address doesn't seem to work

I went to "My Account" -> "Change Email", and entered a new email. Then I clicked on the verification link in the email that was sent. It displayed something like below. When I logged out and tried logging in using the new email address, it didn't work. It still expected the old email address.

(screenshot omitted)

nftables logging doesn't work in the network namespace

Since we don't have access to the global kernel log buffers inside the namespace, nftables logging doesn't work. A workaround is to use nflog or ulogd to redirect the logs somewhere other than the global kernel buffers.
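A sketch of the workaround: replace plain `log` statements with `log group N`, which emits packets via nflog instead of the kernel ring buffer, then read them with ulogd or `tcpdump -i nflog:N` from outside the namespace. The table/chain names and group number below are illustrative:

```
# inside the namespace: emit to nflog group 1 instead of the kernel log
nft add rule inet filter input counter log group 1
```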

ping doesn't work on AlmaLinux 9.1

I created an Ubuntu 22.04 and AlmaLinux 9.1 VM on the same bare metal instance using the console.

I then tried outgoing network connections. Ping on Ubuntu 22.04 works as expected.

ubi@vmqs6r34:~$ ping www.google.com
PING www.google.com(arn11s04-in-x04.1e100.net (2a00:1450:400f:80b::2004)) 56 data bytes
64 bytes from arn11s04-in-x04.1e100.net (2a00:1450:400f:80b::2004): icmp_seq=1 ttl=57 time=8.61 ms
64 bytes from arn11s04-in-x04.1e100.net (2a00:1450:400f:80b::2004): icmp_seq=2 ttl=57 time=8.65 ms
64 bytes from arn11s04-in-x04.1e100.net (2a00:1450:400f:80b::2004): icmp_seq=3 ttl=57 time=8.70 ms
64 bytes from arn11s04-in-x04.1e100.net (2a00:1450:400f:80b::2004): icmp_seq=4 ttl=57 time=8.64 ms
^C
--- www.google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 8.611/8.648/8.695/0.030 ms

Traceroute on this instance didn't resolve. I don't know if this is expected.

ozgun@Ozguns-MacBook-Air dev % traceroute www.google.com
traceroute to www.google.com (216.58.214.4), 64 hops max, 52 byte packets
 1  10.128.0.1 (10.128.0.1)  14.230 ms  10.461 ms  10.405 ms
 2  ams-eq6-cr1-v11.31173.se (185.65.134.65)  13.232 ms  11.865 ms  13.003 ms
 3  185.65.134.51 (185.65.134.51)  16.138 ms  11.448 ms  13.171 ms
 4  * * *
^C

Then, I tried pinging Google on AlmaLinux 9.1. When I first tried, it didn't work for about 6-7 seconds.

[ubi@vm1me2t5 ~]$ ping www.google.com
^C

I then waited for longer and realized that each ping takes about 7-8 seconds. For some reason though, ping thinks that it's taking 8ms. Also, ping doesn't report any packet loss.

[ubi@vm1me2t5 ~]$ ping www.google.com
PING www.google.com(arn09s19-in-x04.1e100.net (2a00:1450:400f:80c::2004)) 56 data bytes
64 bytes from arn09s19-in-x04.1e100.net (2a00:1450:400f:80c::2004): icmp_seq=1 ttl=116 time=8.06 ms
64 bytes from arn09s19-in-x04.1e100.net (2a00:1450:400f:80c::2004): icmp_seq=2 ttl=116 time=8.07 ms
64 bytes from arn09s19-in-x04.1e100.net (2a00:1450:400f:80c::2004): icmp_seq=3 ttl=116 time=8.07 ms
64 bytes from arn09s19-in-x04.1e100.net (2a00:1450:400f:80c::2004): icmp_seq=4 ttl=116 time=8.04 ms
^C64 bytes from 2a00:1450:400f:80c::2004: icmp_seq=5 ttl=116 time=8.06 ms

--- www.google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 20083ms
rtt min/avg/max/mdev = 8.042/8.059/8.074/0.011 ms

Here's the associated traceroute. This path looks pretty different than the Ubuntu VM's.

[ubi@vm1me2t5 ~]$ traceroute www.google.com
traceroute to www.google.com (142.250.74.68), 30 hops max, 60 byte packets
 1  _gateway (192.168.148.128)  0.397 ms  0.357 ms  0.340 ms
 2  169.254.159.124 (169.254.159.124)  0.323 ms  0.306 ms  0.286 ms
 3  static.1.21.21.65.clients.your-server.de (65.21.21.1)  0.489 ms  0.472 ms  0.454 ms
 4  core32.hel1.hetzner.com (213.239.224.129)  0.436 ms  0.419 ms core31.hel1.hetzner.com (213.239.224.125)  0.397 ms
 5  core53.sto.hetzner.com (213.239.254.70)  6.865 ms core52.sto.hetzner.com (213.239.254.58)  6.847 ms core53.sto.hetzner.com (213.239.254.70)  6.830 ms
 6  core40.sto.hetzner.com (213.239.252.78)  19.283 ms core3.sto.hetzner.com (213.239.252.66)  6.883 ms  6.846 ms
 7  213-133-121-202.clients.your-server.de (213.133.121.202)  8.163 ms  8.146 ms 142.250.161.204 (142.250.161.204)  7.137 ms
 8  * * *
 9  142.251.48.40 (142.251.48.40)  7.202 ms  7.183 ms 142.250.239.184 (142.250.239.184)  7.163 ms
10  108.170.254.50 (108.170.254.50)  7.761 ms 108.170.254.34 (108.170.254.34)  7.704 ms arn09s23-in-f4.1e100.net (142.250.74.68)  7.107 ms

JSON-RPC timeouts

We had some failures with the following error. It succeeded on a retry:

/home/rhizome/host/lib/spdk_rpc.rb:111:in `read_response': The request timed out after 5 seconds. (RuntimeError)
	from /home/rhizome/host/lib/spdk_rpc.rb:88:in `call'
	from /home/rhizome/host/lib/spdk_rpc.rb:19:in `bdev_aio_create'
	from /home/rhizome/host/lib/storage_volume.rb:228:in `setup_spdk_bdev'
	from /home/rhizome/host/lib/storage_volume.rb:56:in `start'
	from /home/rhizome/host/lib/vm_setup.rb:443:in `block in storage'
	from /home/rhizome/host/lib/vm_setup.rb:438:in `map'
	from /home/rhizome/host/lib/vm_setup.rb:438:in `storage'
	from /home/rhizome/host/lib/vm_setup.rb:50:in `prep'
	from host/bin/prepvm.rb:90:in `<main>' 
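Since the call succeeded on a retry, a bounded retry with backoff around the RPC call may be a simple mitigation. A sketch (the `with_rpc_retries` helper is hypothetical; per the trace above, `spdk_rpc.rb` raises a RuntimeError on timeout):

```ruby
# Retry a flaky JSON-RPC call a bounded number of times, with
# exponential backoff between attempts. The block is assumed to raise
# RuntimeError on timeout, as spdk_rpc.rb does.
def with_rpc_retries(attempts: 3, base_sleep: 0.5)
  tries = 0
  begin
    yield
  rescue RuntimeError
    tries += 1
    raise if tries >= attempts
    sleep(base_sleep * (2**(tries - 1)))
    retry
  end
end
```

Retrying only makes sense here if the underlying RPC (e.g. bdev_aio_create) is idempotent, which ties into the storage-retry issues below.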

Handle destroy correctly after storage provisioning failure

Currently purge_storage assumes that if the VM storage root exists, then the SPDK bdevs and storage files also exist, so it fails when that's not the case. purge_storage should still succeed if some of these don't exist.
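A tolerant purge step might look like this sketch (`purge_step` is a hypothetical helper; the "No such device" message mirrors the JSON-RPC error shown in the vhost issue above):

```ruby
# Each purge step treats "already gone" as success: a missing bdev,
# vhost controller, or storage file means the work is already done.
def purge_step
  yield
rescue => e
  raise unless e.message.match?(/No such device|does not exist/i)
  # resource was never created or is already deleted; nothing to do
end
```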

SPDK memory map failed

SPDK mmap failed with the following log trace, which prevented the VM from starting.

Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) logging feature is disabled in async copy mode
Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) vhost-user server: socket created, fd: 61                                                                                                           
Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) logging feature is disabled in async copy mode
Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) binding succeeded
Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) vhost-user server: socket created, fd: 61
Oct 24 01:56:22 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) binding succeeded
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) new vhost user connection is 46
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) new device, handle is 6
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) new vhost user connection is 46
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_OWNER
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_FEATURES
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_PROTOCOL_FEATURES
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_PROTOCOL_FEATURES  
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) negotiated Vhost-user protocol features: 0x120b
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_QUEUE_NUM
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_CONFIG
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) new device, handle is 6
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_OWNER
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_FEATURES
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_PROTOCOL_FEATURES
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_PROTOCOL_FEATURES
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) negotiated Vhost-user protocol features: 0x120b
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_QUEUE_NUM
Oct 24 01:56:23 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_GET_CONFIG
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_FEATURES
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) negotiated Virtio features: 0x140000640
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_FEATURES
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_MEM_TABLE
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) mmap failed (Cannot allocate memory).
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) failed to mmap region 0
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) processing VHOST_USER_SET_MEM_TABLE failed.
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) vhost peer closed
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) negotiated Virtio features: 0x140000640
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) read message VHOST_USER_SET_MEM_TABLE
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) mmap failed (Cannot allocate memory).
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) failed to mmap region 0
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) processing VHOST_USER_SET_MEM_TABLE failed.
Oct 24 01:56:24 Ubuntu-2204-jammy-amd64-base spdk[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) vhost peer closed
Oct 24 02:23:48 Ubuntu-2204-jammy-amd64-base vhost[1517]: VHOST_CONFIG: (/var/storage/vhost/vmrsd9p9_0) new vhost user connection is 48

Use separate ssh user for GitHub runners other than "runner" user

Burak has valid reasons for it: #752 (comment)

Additionally

Users on default GitHub runners:

sphinxsearch:x:114:123:Sphinx fulltext search service,,,:/var/run/sphinxsearch:/usr/sbin/nologin
dnsmasq:x:115:65534:dnsmasq,,,:/var/lib/misc:/usr/sbin/nologin
mysql:x:116:125:MySQL Server,,,:/nonexistent:/bin/false
postgres:x:117:126:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
runneradmin:x:1000:1000:Ubuntu:/home/runneradmin:/bin/bash
runner:x:1001:127:,,,:/home/runner:/bin/bash

Users on our runners:

sphinxsearch:x:114:123:Sphinx fulltext search service,,,:/var/run/sphinxsearch:/usr/sbin/nologin
dnsmasq:x:115:65534:dnsmasq,,,:/var/lib/misc:/usr/sbin/nologin
mysql:x:116:125:MySQL Server,,,:/nonexistent:/bin/false
postgres:x:117:126:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
runner:x:1000:1000::/home/runner:/bin/bash

The default GitHub runners have a "runneradmin" user, similar to our proposed separate GitHub runner user. Furthermore, if we use an additional user, our "runner" user will have the same uid, 1001.

What's up with get-404-delete?

# Waiting 404 Not Found response for get runner request
begin
  github_client.get("/repos/#{github_runner.repository_name}/actions/runners/#{github_runner.runner_id}")
  github_client.delete("/repos/#{github_runner.repository_name}/actions/runners/#{github_runner.runner_id}")
  nap 5
rescue Octokit::NotFound
end

This code is strange: why call GET before DELETE? Why not cut to the chase and use DELETE?

Doing that would simplify the tests, and remove one long network access.

VM hangs at boot

Serial log

VM is stuck, and last few lines are the following:

[    0.129822] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.129822] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[    0.129822] Freeing SMP alternatives memory: 44K
[    0.129822] pid_max: default: 32768 minimum: 301
[    0.129822] LSM: initializing lsm=lockdown,capability,landlock,yama,integrity,apparmor
[    0.129822] landlock: Up and running.
[    0.129822] Yama: becoming mindful.
[    0.129822] AppArmor: AppArmor initialized
[    0.129822] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.129822] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)

In a successful boot, next line should be:

[    0.173927] smpboot: CPU0: AMD EPYC 7502P 32-Core Processor (family: 0x17, model: 0x31, stepping: 0x0)

gdb

Also, the gdb backtrace doesn't show anything useful. Maybe we should install cloud-hypervisor debug symbols:

Reading symbols from /opt/cloud-hypervisor/v31.0/cloud-hypervisor...
(No debugging symbols found in /opt/cloud-hypervisor/v31.0/cloud-hypervisor)

strace output

strace -p [vcpu thread id] outputs the following and gets stuck:

ioctl(27, KVM_RUN

and strace -p [cloud hypervisor pid] outputs the following and gets stuck:

futex(0x7fc5e006e910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 3008632, NULL, FUTEX_BITSET_MATCH_ANY

SPDK

Last lines of SPDK log:

Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_SET_VRING_BASE
Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) vring base idx:0 last_used_idx:0 last_avail_idx:0.
Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_SET_VRING_KICK
Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) vring kick idx:0 file:105
Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_SET_VRING_ENABLE
Oct 25 23:44:41 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) set queue enable: 1 to qp idx: 0
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base vhost[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_SET_VRING_ENABLE
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base vhost[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) set queue enable: 0 to qp idx: 0
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_SET_VRING_ENABLE
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base vhost[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_GET_VRING_BASE
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base vhost[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) vring base idx:0 file:2021
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) set queue enable: 0 to qp idx: 0
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) read message VHOST_USER_GET_VRING_BASE
Oct 25 23:44:42 Ubuntu-2204-jammy-amd64-base spdk[1494]: VHOST_CONFIG: (/var/storage/vhost/vm6zye9j_0) vring base idx:0 file:2021

VM's cpu, memory, and disk config

{
  "cpus": {
    "boot_vcpus": 4,
    "max_vcpus": 4,
    "topology": {
      "threads_per_core": 2,
      "cores_per_die": 2,
      "dies_per_package": 1,
      "packages": 1
    },
    "kvm_hyperv": false,
    "max_phys_bits": 46,
    "affinity": null,
    "features": {
      "amx": false
    }
  },
  "memory": {
    "size": 17179869184,
    "mergeable": false,
    "hotplug_method": "Acpi",
    "hotplug_size": null,
    "hotplugged_size": null,
    "shared": false,
    "hugepages": true,
    "hugepage_size": 1073741824,
    "prefault": false,
    "zones": null,
    "thp": true
  },
  "payload": {
    "firmware": null,
    "kernel": "/opt/fw/edk2-stable202302/x64/CLOUDHV.fd",
    "cmdline": null,
    "initramfs": null
  },
  "disks": [
    {
      "path": null,
      "readonly": false,
      "direct": false,
      "iommu": false,
      "num_queues": 1,
      "queue_size": 256,
      "vhost_user": true,
      "vhost_socket": "/var/storage/vmwanwm8/0/vhost.sock",
      "rate_limiter_config": null,
      "id": "_disk0",
      "disable_io_uring": false,
      "pci_segment": 0
    },
    {
      "path": "/vm/vmwanwm8/cloudinit.img",
      "readonly": false,
      "direct": false,
      "iommu": false,
      "num_queues": 1,
      "queue_size": 128,
      "vhost_user": false,
      "vhost_socket": null,
      "rate_limiter_config": null,
      "id": "_disk1",
      "disable_io_uring": false,
      "pci_segment": 0
    }
  ],
...
}

Production VmHost machines have high memory usage

Even when the VmHost has no VMs on it, memory usage is 249/255 GiB:


$ free -h
               total        used        free      shared  buff/cache   available
Mem:           255Gi       249Gi       471Mi        16Mi       5.8Gi       5.2Gi
Swap:          4.0Gi       0.0Ki       4.0Gi

$ free -m
               total        used        free      shared  buff/cache   available
Mem:          261570      255140         471          16        5958        5336
Swap:           4095           0        4095

htop shows 0.0 MEM% for all processes, but 4 SPDK processes have 65.6G VIRT. SPDK might be allocating the whole memory.

Consider changing README.md's "Why use it section?"

This section is a little short and doesn't fully communicate Ubicloud's benefits. Here's an alternative version.

Why use Ubicloud?

Over the past decade, there's been a tremendous movement towards cloud computing. Giants such as AWS, Azure, and Google Cloud have revolutionized the market with their diverse service offerings, enabling startups and enterprises to scale and innovate faster than ever before. However, these services come with a hefty price tag and a subtle, yet significant, risk of vendor lock-in. This can potentially lead to a scenario where escalating costs and lack of flexibility threaten your business sustainability.

This is where Ubicloud steps in, rewriting the narrative of modern cloud computing.

Ubicloud is not just another cloud service. It's a game-changing cloud platform designed to run seamlessly on any infrastructure - from cost-effective bare metal providers like Hetzner or OVH, right through to your own colocated hardware.

Our mission with Ubicloud is to democratize cloud services, bringing you the power of the cloud, without the constraints. It's about giving you the freedom to deploy where you want, how you want, and when you want. Ubicloud is about putting you back in control.

When you choose Ubicloud, you're not only gaining unparalleled portability, but you're also unlocking a myriad of benefits:

  • Cost Savings: Leverage your existing hardware or choose from a range of affordable bare metal providers.

  • Avoid Vendor Lock-in: With Ubicloud, you're not tied to any specific provider. You have the freedom to migrate between platforms as per your business needs.

  • Security & Compliance: Meet your stringent security and compliance needs by having full control over where and how your data is stored.

As of today, AWS offers approximately two hundred cloud services. Our vision with Ubicloud is to simplify this complexity. We aim to deliver the core 10% of services that account for 80% of your cloud consumption. In essence, we're focusing on what truly matters to your business and doing it better.

Make UI improvements

We could evaluate the following changes / improvements to our current UI. I'll defer writing tests to a separate issue.

  • Theme color - We currently have this set to violet. Test with 2 shades of orange, 1 black, and another color
  • Object storage - When you click on our S3-like service, instead of a "page not found", say that this service is under development
  • "vm-host" page improvements
    • Do we need to have "Location" in this page? Isn't the idea that we can cloudify any host?
    • (don't know) Should we be more prescriptive to the user on how to give SSH access?
    • (don't know) Does Equinix also put the SSH keys under /root/.ssh?
    • This page asks the user to enter a hostname. Do we expect a hostname, an IP address, or either on this page?
  • Make the Settings page nicer
  • Terminology changes
    • CPU -> vCPU
    • Location -> Provider & Location
    • Virtual Machines -> Compute or Elastic Compute (Is Elastic Compute trademarked by AWS?)
  • Dashboard (home page) -> The layout uses a table. Should we change this to separate buttons like OVH?

Fix storage provisioning retry

Currently, if VM setup is interrupted after storage provisioning is half done or completely done, the VM setup retry will fail because SPDK errors out saying the bdev already exists.

Storage provisioning should be retriable and idempotent.
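One way to make the provisioning step retriable is to check for the bdev before creating it. This is a minimal sketch, assuming a hypothetical RPC wrapper; the SPDK side is stubbed with an in-memory hash, and the real code would call bdev_get_bdevs / bdev_aio_create over the SPDK JSON-RPC socket:

```ruby
# Hypothetical sketch of idempotent bdev creation. FakeSpdk stands in
# for the real SPDK RPC client, which raises when a bdev already exists.
class FakeSpdk
  def initialize
    @bdevs = {}
  end

  def bdev_exists?(name)
    @bdevs.key?(name)
  end

  def bdev_aio_create(name, filename)
    raise "bdev #{name} already exists" if @bdevs.key?(name)
    @bdevs[name] = filename
  end
end

# Create the bdev only if it is not already there, so a retried
# VM setup can run this step any number of times.
def ensure_bdev(spdk, name, filename)
  return if spdk.bdev_exists?(name)
  spdk.bdev_aio_create(name, filename)
end
```

With this guard, running the provisioning step twice after an interruption is harmless instead of fatal.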

Docker containers can't resolve DNS addresses

When I tried to run a GitHub Action that builds a Docker image on our self-hosted runners, it failed with the following error:

ERROR: failed to solve: ruby:3.2.2-alpine3.17: failed to do request:
Head
"https://registry-1.docker.io/v2/library/ruby/manifests/3.2.2-alpine3.17":
dial tcp: lookup registry-1.docker.io on 192.168.0.1:53: read udp
172.17.0.2:48484->192.168.0.1:53: i/o timeout
https://github.com/ubicloud/ubicloud/actions/runs/5969312670/job/16194950769

Even a simple docker run -it alpine:3.17 ping google.com can't resolve DNS.

/etc/resolv.conf of VmHost and Vm

nameserver 127.0.0.53
options edns0 trust-ad
search .

/etc/resolv.conf of docker container

nameserver 192.168.111.192
search .

We debugged this with @furkansahin.
There are some possible solutions:

  • Add DNS config to docker daemon config
    • Add the following to /etc/docker/daemon.json, then restart Docker with sudo systemctl restart docker:
{
    "dns": ["9.9.9.9"]
}
  • Docker gets the resolv.conf content from the systemd-resolved service. We can add additional DNS servers to it. DigitalOcean, for example, has an /etc/systemd/resolved.conf.d/DigitalOcean.conf config.
sudo mkdir /etc/systemd/resolved.conf.d
sudo sh -c 'printf "[Resolve]\nDNS=9.9.9.9\n" > /etc/systemd/resolved.conf.d/Ubicloud.conf'
sudo systemctl restart systemd-resolved.service

I added Ubicloud.conf to our self-hosted runners, but we need to fix this issue for new virtual machines too. Being able to run Docker containers on a newly created VM without any additional config is good user experience.
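For new VMs, the drop-in could be written by the setup code itself. A minimal sketch, assuming a hypothetical helper; the root: parameter exists only to make the example testable, and the real code would write under / and then restart systemd-resolved:

```ruby
require "fileutils"

# Hypothetical sketch: write a systemd-resolved drop-in so Docker's
# resolv.conf handling picks up a public DNS server.
def write_resolved_dropin(dns: "9.9.9.9", root: "/")
  dir = File.join(root, "etc/systemd/resolved.conf.d")
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "Ubicloud.conf"), "[Resolve]\nDNS=#{dns}\n")
end
```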

Deleting vhost controller when purging the VM failed

This is probably a race condition: systemctl stop vm12345 takes time, so the vhost controller is still busy when we try to delete it.

/home/rhizome/host/lib/spdk_rpc.rb:18:in `rescue in rpc_call': Device or resource busy (SpdkRpcError)
	from /home/rhizome/host/lib/spdk_rpc.rb:15:in `rpc_call'
	from /home/rhizome/host/lib/spdk_rpc.rb:61:in `vhost_delete_controller'
	from /home/rhizome/host/lib/storage_volume.rb:73:in `purge_spdk_artifacts'
	from /home/rhizome/host/lib/vm_setup.rb:113:in `block in purge_storage'
	from /home/rhizome/host/lib/vm_setup.rb:111:in `each'
	from /home/rhizome/host/lib/vm_setup.rb:111:in `purge_storage'
	from /home/rhizome/host/lib/vm_setup.rb:96:in `purge'
	from host/bin/deletevm.rb:12:in `<main>'
/home/rhizome/host/lib/json_rpc_client.rb:39:in `call': Device or resource busy (JsonRpcError)
	from /home/rhizome/host/lib/spdk_rpc.rb:16:in `rpc_call'
	from /home/rhizome/host/lib/spdk_rpc.rb:61:in `vhost_delete_controller'
	from /home/rhizome/host/lib/storage_volume.rb:73:in `purge_spdk_artifacts'
	from /home/rhizome/host/lib/vm_setup.rb:113:in `block in purge_storage'
	from /home/rhizome/host/lib/vm_setup.rb:111:in `each'
	from /home/rhizome/host/lib/vm_setup.rb:111:in `purge_storage'
	from /home/rhizome/host/lib/vm_setup.rb:96:in `purge'
	from host/bin/deletevm.rb:12:in `<main>'
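Since the busy state is transient, one option is to retry the deletion a few times before giving up. This is a sketch under stated assumptions, not the real deletevm code: SpdkBusy and the block stand in for SpdkRpcError and the actual vhost_delete_controller RPC call.

```ruby
# Hypothetical sketch: retry the vhost controller deletion while the
# device is still busy (e.g. systemctl stop of the VM has not yet
# released it).
class SpdkBusy < StandardError; end

def delete_controller_with_retry(attempts: 5, delay: 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue SpdkBusy
    raise if tries >= attempts
    sleep delay
    retry
  end
end
```

A small delay between attempts (e.g. one second) would give systemd time to finish stopping the VM process.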

Project name collides with an established brand

Hi folks,

just a message to warn you that you MAY infringe one of Ubisoft's brand properties or risk legal action from Ubisoft, as they have had a public and private cloud named ubicloud since around 2008.

I would highly suggest an alternative name that would not risk any infringement or legal action but that would still convey your idea, such as omnicloud.

While adding a VmHost, selecting Location & Provider confuses first-timers

While adding a VmHost, the Location & Provider fields associate the VmHost with that location, so the VM allocation algorithm can select that VmHost when a new VM is requested for that location.

This is meaningful if you have multiple Location & Providers (regions), but it confuses first-timers who have a single one.
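The role the field plays in allocation can be sketched as follows. Everything here is illustrative, not the real allocation code: the struct, field names, and selection rule are assumptions.

```ruby
# Hypothetical sketch: the allocator only considers hosts whose
# location matches the requested one, then prefers the host with the
# most free cores.
VmHostInfo = Struct.new(:name, :location, :free_cores)

def pick_host(hosts, requested_location)
  hosts
    .select { |h| h.location == requested_location && h.free_cores > 0 }
    .max_by(&:free_cores)
end
```

With a single location, this filter never excludes anything, which is why the field feels redundant to first-timers.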

Invalid MAC issue while provisioning a vm

We encountered this issue a few times in production. The VM gets stuck at wait_sshable because respirate can't connect to the VM via IPv4. cat /vm/VM_NAME/serial.log has this exception:

[    4.416703] cloud-init[462]: 2023-10-16 10:20:24,707 - util.py[WARNING]: failed stage init-local
[    4.422362] cloud-init[462]: failed run of stage init-local
[    4.423244] cloud-init[462]: ------------------------------------------------------------
[    4.424499] cloud-init[462]: Traceback (most recent call last):
[    4.425442] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 766, in status_wrapper
[    4.426706] cloud-init[462]:     ret = functor(name, args)
[    4.427514] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 432, in main_init
[    4.428784] cloud-init[462]:     init.apply_network_config(bring_up=bring_up_interfaces)
[    4.429840] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 939, in apply_network_config
[    4.431105] cloud-init[462]:     return self.distro.apply_network_config(
[    4.432065] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 281, in apply_network_config
[    4.433386] cloud-init[462]:     self._write_network_state(network_state, renderer)
[    4.434374] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 142, in _write_network_state
[    4.435694] cloud-init[462]:     return super()._write_network_state(*args, **kwargs)
[    4.436714] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 169, in _write_network_state
[    4.438036] cloud-init[462]:     renderer.render_network_state(network_state)
[    4.438986] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 310, in render_network_state
[    4.440309] cloud-init[462]:     self._netplan_generate(run=self._postcmds, same_content=same_content)
[    4.441437] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 323, in _netplan_generate
[    4.442701] cloud-init[462]:     subp.subp(self.NETPLAN_GENERATE, capture=True)
[    4.443661] cloud-init[462]:   File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 335, in subp
[    4.444810] cloud-init[462]:     raise ProcessExecutionError(
[    4.445658] cloud-init[462]: cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
[    4.446790] cloud-init[462]: Command: ['netplan', 'generate']
[    4.447640] cloud-init[462]: Exit code: 1
[    4.448305] cloud-init[462]: Reason: -
[    4.448923] cloud-init[462]: Stdout:
[    4.449504] cloud-init[462]: Stderr: /etc/netplan/50-cloud-init.yaml:12:29: Error in network definition: Invalid MAC address '40626667100', must be XX:XX:XX:XX:XX:XX or XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
[    4.451490] cloud-init[462]:                         macaddress: 40626667100
[    4.452459] cloud-init[462]:                                     ^
[    4.453344] cloud-init[462]: ------------------------------------------------------------
[FAILED] Failed to start Initial cloud-init job (pre-networking).
See 'systemctl status cloud-init-local.service' for details.

sudo systemctl status VM_NAME-dnsmasq.service shows only IPv6 addresses, not IPv4.

sudo systemctl status VM_NAME.service shows the vm is started.

● VM_NAME.service - VM_NAME
     Loaded: loaded (/etc/systemd/system/VM_NAME; static)
     Active: active (running) since Mon 2023-10-16 12:20:17 CEST; 17min ago
   Main PID: 1556174 (cloud-hyperviso)
      Tasks: 12 (limit: 4230)
     Memory: 6.9M
        CPU: 18.746s
     CGroup: /system.slice/VM_NAME.service
             └─1556174 /opt/cloud-hypervisor/v31.0/cloud-hypervisor --api-socket path=/vm/VM_NAME/ch-api.sock --kernel /opt/fw/edk2-stable202302/x64/CLOUDHV.fd --disk vhost_user=true,socket=/var/storage/VM_NAME/0/vhost.sock,num_queues=1,queue_size=256 --disk path=/vm/VM_NAME/clo>

Oct 16 12:20:17 Ubuntu-2204-jammy-amd64-base systemd[1]: Starting VM_NAME...
Oct 16 12:20:17 Ubuntu-2204-jammy-amd64-base systemd[1]: Started VM_NAME.
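The '40626667100' in the netplan error looks like a MAC rendered without separators and without zero padding for each byte. A minimal sketch of generating a guest MAC in the required XX:XX:XX:XX:XX:XX form, assuming a hypothetical helper (not the actual rhizome code):

```ruby
require "securerandom"

# Hypothetical sketch: generate a random MAC, formatted with %02x per
# byte so leading zeros and colons are never dropped. The first byte
# is masked to be unicast and locally administered.
def gen_mac
  bytes = SecureRandom.bytes(6).bytes
  bytes[0] = (bytes[0] & 0xfe) | 0x02  # clear multicast bit, set local bit
  bytes.map { |b| format("%02x", b) }.join(":")
end
```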

I/O failures when using encrypted disks

If I create 3 VMs with encrypted disks, and do sudo apt update, then sudo apt install gcc, then sudo apt upgrade inside each VM, I get the following error:

update-initramfs: Generating /boot/initrd.img-5.15.0-75-generic
sync: error syncing '/boot/initrd.img-5.15.0-75-generic': Input/output error

Or sometimes Read-only filesystem error.

When I look at SPDK logs using journalctl -u spdk.service, I see 150 of the following errors.

vhost[1069]: bdev_aio.c: 202:bdev_aio_writev: *ERROR*: bdev_aio_writev: io_submit returned -14
spdk[1069]: bdev_aio.c: 202:bdev_aio_writev: *ERROR*: bdev_aio_writev: io_submit returned -14

I don't see them in the unencrypted mode.

When I look at the output of strace -p [spdk pid] 2>&1 | grep io_submit, I see 75 lines with EFAULT error codes. In unencrypted mode, I see 0 EFAULT errors.

io_submit(0x7fadb0fc3000, 1, [{aio_data=0x20006d799920, aio_lio_opcode=IOCB_CMD_PWRITEV, aio_fildes=36, aio_buf=[{iov_base=0x8000000000000000, iov_len=65536}], aio_offset=3043033088}]) = -1 EFAULT (Bad address)

Also in VM serial logs we see errors like:

[   51.534463] Buffer I/O error on device vda1, logical block 439673
...
[   51.556688] blk_update_request: I/O error, dev vda, sector 3799040 op 0x1:(WRITE) flags 0x4000 phys_seg 20 prio class 0

It also reproduces if I do the following inside the VM:

$ dd if=/dev/random of=1.txt bs=512 count=1000000
$ sync 1.txt
sync: error syncing '1.txt': Input/output error

What's wrong?

iov_base=0x8000000000000000 in the io_submit output above is a memory pointer and seems to be an invalid memory address.

Things I have tried

  • Since this seems to be a memory issue, I tried increasing --mem-size of the spdk target to 4G. It didn't make a difference.
  • When this error happened, we were running SPDK under an unprivileged user. I tried running it as root. It didn't make a difference & the error happened again.
  • Originally, we ran SPDK using systemctl. I tried running directly. It didn't make a difference & error happened again.

Steps from README throw an error when running docker-compose

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 704, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 399, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 788, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/six.py", line 718, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 704, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 399, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
                        ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/docker-compose", line 33, in <module>
    sys.exit(load_entry_point('docker-compose==1.29.2', 'console_scripts', 'docker-compose')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 81, in main
    command_func()
  File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 200, in perform_command
    project = project_from_options('.', options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/compose/cli/command.py", line 60, in project_from_options
    return get_project(
           ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/compose/cli/command.py", line 152, in get_project
    client = get_client(
             ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/compose/cli/docker_client.py", line 41, in get_client
    client = docker_client(
             ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/compose/cli/docker_client.py", line 170, in docker_client
    client = APIClient(use_ssh_client=not use_paramiko_ssh, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

/dashboard shows 403 for people without access

The dashboard page checks the Project:view permission, which might not be granted, even if that is unusual (and probably a mistake on the user's part). Still, the dashboard should always be visible, because it is the first page you see after switching to a project.

Issues with login names

Problem 1
Users can choose a login name, but after the VM is created we don't display it anywhere.

  1. Should we display it in VM details page?
  2. Maybe just force "ubi"?

In either case, what about providing a connect command, similar to what Azure did?

Problem 2
I used "root" for unix user when creating the VM. It happily accepted it, and the VM went it to running state.

  1. I couldn't log into it
  2. I couldn't even ping it.

Not enough database connections at production

Respirate sometimes prints "Not enough database connections. Waiting active connections to finish their work. db_pool:5 active_threads:4". We might have some DB connection leakage.

Aug 23 11:47:40  hr[respirate] info  Restarting
Aug 23 11:47:40  hr[respirate] info  State changed from up to starting
Aug 23 11:47:40  hr[respirate] info  Stopping all processes with SIGTERM
Aug 23 11:47:40  ubicloud-console  app[respirate] info  Not enough database connections. Waiting active connections to finish their work. db_pool:5 active_threads:4
Aug 23 11:47:41  hr[respirate] info  Process exited with status 143
Aug 23 11:47:43  hr[respirate] info  Starting process with command `bin/respirate`
Aug 23 11:47:44  hr[respirate] info  State changed from starting to up
Aug 23 19:07:20  app[respirate] info  Not enough database connections. Waiting active connections to finish their work. db_pool:5 active_threads:4
Aug 24 11:52:05  app[respirate] info  Not enough database connections. Waiting active connections to finish their work. db_pool:5 active_threads:4
Aug 24 11:52:05  app[respirate] info  Not enough database connections. Waiting active connections to finish their work. db_pool:5 active_threads:4
Aug 24 12:02:19  hr[respirate] info  Cycling
Aug 24 12:02:19  hr[respirate] info  State changed from up to starting
Aug 24 12:02:20  hr[respirate] info  Stopping all processes with SIGTERM
Aug 24 12:02:20  hr[respirate] info  Process exited with status 143
Aug 24 12:02:23  hr[respirate] info  Starting process with command `bin/respirate`
Aug 24 12:02:24  hr[respirate] info  State changed from starting to up
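The guard that emits this message can be illustrated like so. This is an assumption about the shape of the check, not the real respirate code; the reserve of one connection for the main loop is a guess that happens to match db_pool:5 active_threads:4 triggering the wait.

```ruby
# Hypothetical sketch: before spawning another worker thread, check
# that the database pool still has a free connection, leaving one
# spare for the main loop.
def can_spawn_thread?(db_pool_size, active_threads, reserve: 1)
  active_threads < db_pool_size - reserve
end
```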

Make vm_setup.rb idempotent

So that a failed VM setup is retriable.

The following is an example of a failure we can get, but there might be more:

mkdosfs: file /vm/vmv6cz0z/cloudinit.img already exists
	from /home/rhizome/lib/vm_setup.rb:330:in `cloudinit'
	from /home/rhizome/lib/vm_setup.rb:47:in `prep'
	from bin/prepvm.rb:89:in `<main>'
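For this particular failure, the cloudinit step could remove any leftover image before recreating it. A minimal sketch, assuming a hypothetical helper; mkdosfs is replaced by a plain File.write so the example can run anywhere.

```ruby
require "fileutils"

# Hypothetical sketch: make the cloudinit image step idempotent by
# deleting any leftover file first, so a retried prep doesn't trip
# over "file ... already exists".
def create_cloudinit_img(path)
  FileUtils.rm_f(path)           # safe if the file does not exist
  File.write(path, "cloudinit")  # real code would run mkdosfs + mcopy
end
```

The same delete-then-recreate (or skip-if-present) pattern applies to the other artifacts vm_setup.rb produces.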

Hide inaccessible menu items from sidebar

Currently we show the same menu items to everyone, whether they have access to them or not. If they click one of those menu items, they see a "403 - Forbidden" message. It would be better not to show pages the user doesn't have access to.
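The fix can be sketched as filtering the sidebar by the user's permission set before rendering. The menu items and permission names below are illustrative assumptions, not the real clover routes.

```ruby
# Hypothetical sketch: render only the sidebar entries the user is
# authorized for, instead of showing everything and 403-ing on click.
MENU = [
  {label: "Virtual Machines", perm: "Vm:view"},
  {label: "PostgreSQL",       perm: "Postgres:view"},
  {label: "Users",            perm: "Project:user"}
].freeze

def visible_menu(user_perms)
  MENU.select { |item| user_perms.include?(item[:perm]) }
end
```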

YYY comments

Web

  • need a better way to manage coupling of routes and erbs
  • required at launch
  • not required at launch
    <!-- YYY: need a better way to manage coupling of routes and erbs -->

  • Should password secret and session secret be the same? Are there rotation issues?
  • required at launch
  • not required at launch

    ubicloud/clover_web.rb

    Lines 172 to 176 in ab84bf6

    # YYY: Should password secret and session secret be the same? Are
    # there rotation issues? See also:
    #
    # https://github.com/jeremyevans/rodauth/commit/6cbf61090a355a20ab92e3420d5e17ec702f3328
    # https://github.com/jeremyevans/rodauth/commit/d8568a325749c643c9a5c9d6d780e287f8c59c31

Clover

  • Implement a robust mesh networking concurrency algorithm.
  • required at launch
  • not required at launch
    # YYY: Implement a robust mesh networking concurrency algorithm.
  • Hack to deal with the presentation currently being in "vcpu", which has a pretty specific meaning, being ambiguous to threads or actual cores.
  • required at launch
  • not required at launch

    ubicloud/model/vm.rb

    Lines 47 to 61 in ab84bf6

    # YYY: Hack to deal with the presentation currently being in
    # "vcpu" which has a pretty specific meaning being ambigious to
    # threads or actual cores.
    #
    # The presentation is currently helpful because our bare metal
    # sizes are quite small, supporting only 1, 2, 3 cores (reserving
    # one for ourselves) and 2, 4, 6 vcpu. So the product line is
    # ambiguous as to whether it's ordinal or descriptive (it's
    # descriptive). To convey the right thing in demonstration, use
    # vcpu counts. It would have been nice to have gotten bigger
    # hardware in time to avoid that and standardize on cores.
    #
    # As an aside, although we probably want to reserve a core an I/O
    # process of some kind (e.g. SPDK, reserving the entire memory
    # quota for it may be overkill.
  • inhost names
  • required at launch
  • not required at launch

    ubicloud/model/vm.rb

    Lines 143 to 147 in ab84bf6

    # YYY: various names in linux, like interface names, are obliged
    # to be short, so alas, probably can't reproduce entropy from
    # vm.id to be collision free and there will need to be a second
    # addressing scheme scoped to each VmHost. But for now, assume
    # entropy.
  • Use database jsonb prepend
  • required at launch
  • not required at launch

    ubicloud/prog/base.rb

    Lines 145 to 149 in ab84bf6

    # YYY: Use in-database jsonb prepend rather than re-rendering a
    # new value doing the prepend.
    fail Hop.new(old_prog, old_label,
    {prog: Strand.prog_verify(prog), label: label,
    stack: [new_frame] + strand.stack, retval: nil})

Rhizome

  • Check checksums
  • required at launch
  • not required at launch
    # YYY: we should check against digests of each artifact, to detect and
    # report any unexpected content changes (e.g., supply chain attack).
  • Handle customer images
  • required at launch
  • not required at launch
    # YYY: Need to replace this with something that can handle
    # customer images. As-is, it does not have all the
    # synchronization features we might want if we were to keep this
    # code longer term, but, that's not the plan.
  • systemd capabilities
  • required at launch
  • not required at launch
    # YYY: These are not enough capabilties, at least CAP_NET_RAW is
    # needed, as well as more for setgid
    # CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_NET_ADMIN
    # AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_NET_ADMIN
  • systemd escaping
  • required at launch
  • not required at launch
    # YYY: Do something about systemd escaping, i.e. research the
    # rules and write a routine for it. Banning suspicious strings
    # from VmPath is also a good idea.
  • static guest mac
  • Fixed by #349
    # YYY: Should make this static and saved by control plane, it's
    # not that hard to do, can spare licensed software users some
    # issues:
    # https://stackoverflow.com/questions/55686021/static-mac-addresses-for-ec2-instance
    # https://techcommunity.microsoft.com/t5/itops-talk-blog/understanding-static-mac-address-licensing-in-azure/ba-p/1386187
    #
    # Also necessary because Cloud Hypervisor, at time of writing,
    # does not offer robust PCIe slot mapping of devices. The MAC
    # address is the most effective stable identifier for the guest in
    # this case.

Tests

  • Hacked up to pretend Ampere Altras have hyperthreading for demonstration on small metal instances.
  • required at launch
  • not required at launch
    # YYY: Hacked up to pretend Ampere Altras have hyperthreading
    # for demonstration on small metal instances.
