beyondtheclouds / enos

Experimental eNvironment for OpenStack :monkey:

Home Page: https://beyondtheclouds.github.io/enos/

License: GNU General Public License v3.0

Python 82.01% Ruby 0.68% Shell 15.11% Jinja 2.20%
openstack reproducible-research docker grid5000 virtualbox vagrant chameleon kolla-ansible

enos's Introduction


Join us on Gitter: https://gitter.im/BeyondTheClouds/enos

About Enos

Enos aims at making experiments with OpenStack reproducible. It relies on Kolla Ansible and helps you easily deploy, customize, and benchmark OpenStack on several testbeds, including Grid'5000, Chameleon and, more generally, any OpenStack cloud.

Installation

Enos is best installed via pip and is tested with Python 3.7+:

pip install enos

Quick Start

For the quick-start, we will bring up an OpenStack on VirtualBox. VirtualBox is free and works on all major platforms. Enos can, however, work with many testbeds including Grid'5000 and Chameleon.

First, make sure your development machine has VirtualBox and Vagrant installed. Then, ensure that you have at least 10 GiB of memory.

To deploy your first OpenStack with Enos:

enos new --provider=vagrant:virtualbox  # Generate a `reservation.yaml` file
enos deploy

Enos starts three virtual machines and configures Kolla Ansible to deploy the OpenStack control plane on the first one, the network-related services (Neutron, HAProxy, RabbitMQ) on the second one, and uses the last one as a compute node. Note that the full deployment may take a while (around 30 minutes to pull and run all OpenStack Docker images).

You can customize the deployed services and the number of allocated virtual machines by modifying the generated reservation.yaml file. Call `enos --help` or read the documentation for more information.
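For instance, one could add a second compute VM; a hypothetical sketch of such a tweak (sizes and counts are illustrative, the exact schema is whatever `enos new` generated in your reservation.yaml):

```yaml
# Hypothetical tweak of the generated reservation.yaml:
resources:
  medium:
    compute: 2    # one extra compute VM
    network: 1
  large:
    control: 1
```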

Acknowledgment

Enos is developed in the context of the Discovery initiative.


enos's People

Contributors

alebre, asimonet, avankemp, dpertin, jonglezb, jrbalderrama, manuvaldi, marie-donnie, matrohon, msimonin, rcherrueau, rizaon, swandr


enos's Issues

Backup of influx fails when enable_monitoring=false

fatal: [graphene-6-kavlan-4.nancy.grid5000.fr]: FAILED! => {"changed": true, "cmd": ["docker", "stop", "influx"], "delta": "0:00:00.023152", "end": "2016-11-04 18:38:19.542934", "failed": true, "rc": 1, "
start": "2016-11-04 18:38:19.519782", "stderr": "Error response from daemon: No such container: influx", "stdout": "", "stdout_lines": [], "warnings": []}

Fix: add a `when: enable_monitoring` condition to the tasks.
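A minimal sketch of that guard, with hypothetical task names (only the `when:` line is the point):

```yaml
# Hypothetical tasks; the real ones live in the backup-related role.
- name: Stop the influx container before backup
  command: docker stop influx
  when: enable_monitoring | bool

- name: Archive the influx data volume
  command: cp -a /var/lib/docker/volumes/influxdb /backup/influxdb
  when: enable_monitoring | bool
```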

Running two benchmarks leads to a nested directory structure for the logs

https://github.com/BeyondTheClouds/kolla-g5k/blame/92e4ab657b860b32e7d0bf801e25c9626658ea51/ansible/roles/bench/tasks/logs.yml#L7

A second call will create a _data subdirectory instead of copying the contents.
As a result, indexing the logs in the results VM will fail.

I suggest to:

  1. explicitly create /tmp/kolla-logs
  2. explicitly copy the content of /var/lib/docker/volumes/kolla_logs/_data inside

Subsequent calls should then overwrite previous ones, which is a more desirable behaviour.
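The two steps above can be sketched as a small shell helper (paths come from the issue; `copy_kolla_logs` is a hypothetical name):

```shell
#!/bin/sh
# Copy the *contents* of the kolla_logs volume into a stable directory.
# Using "$src/." copies the directory contents, so a second run overwrites
# previous files instead of nesting a _data subdirectory.
copy_kolla_logs() {
    src="$1"    # e.g. /var/lib/docker/volumes/kolla_logs/_data
    dest="$2"   # e.g. /tmp/kolla-logs
    mkdir -p "$dest"
    cp -a "$src/." "$dest/"
}
```

Usage: `copy_kolla_logs /var/lib/docker/volumes/kolla_logs/_data /tmp/kolla-logs`.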

Memcached crashed with stable/ocata

Kolla-ansible failed to launch memcached with the following error:

Running command: '/usr/bin/memcached -vv -l 10.44.0.18 -p 11211 -c 5000'
failed to set rlimit for open files. Try starting as root or requesting smaller maxconns value.

The problem comes from the `-c 5000` option, which is too big.
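A quick way to check whether the per-process limit is the culprit (a diagnostic sketch, not part of Enos):

```shell
# memcached raises RLIMIT_NOFILE according to -c (maxconns).
# If the current open-files limit is below what -c requires,
# startup fails exactly as in the log above.
ulimit -n    # often 1024 by default, well under 5000
```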

Galera patch is outdated

I tested it with Kolla master and branch 3.0.1; each time, mariadb fails to start.
So I assume galera.cnf needs an update.

vlans / subnets are now accessible through the g5k api

It seems that vlan/subnet information is now available through the API.
As a consequence, we probably don't need the g5k_networks.yml file anymore.

>> pp root.sites[:rennes]
#<Resource:0x3fe95a0a2994 uri="/3.0/sites/rennes"
  RELATIONSHIPS
    clusters, deployments, jobs, metrics, network_equipments, parent, pdus, self, servers, status, version, versions, vlans
  PROPERTIES
    "compilation_server"=>false
    "description"=>"Grid5000 Rennes site"
    "email_contact"=>"[email protected]"
    "frontend_ip"=>"172.16.111.106"
    "g5ksubnet"=>{"gateway"=>"10.159.255.254", "network"=>"10.156.0.0/14"}
    "kavlan_ip_range"=>"10.24.0.0/14"
    "kavlans"=>{"1"=>
      {"gateway"=>"172.16.111.101", "network"=>"192.168.192.0/20"},
     "16"=>{"gateway"=>"10.27.255.254", "network"=>"10.27.192.0/18"},
     "2"=>{"gateway"=>"172.16.111.102", "network"=>"192.168.208.0/20"},
     "3"=>{"gateway"=>"172.16.111.103", "network"=>"192.168.224.0/20"},
     "4"=>{"gateway"=>"10.24.63.254", "network"=>"10.24.0.0/18"},
     "5"=>{"gateway"=>"10.24.127.254", "network"=>"10.24.64.0/18"},
     "6"=>{"gateway"=>"10.24.191.254", "network"=>"10.24.128.0/18"},
     "7"=>{"gateway"=>"10.24.255.254", "network"=>"10.24.192.0/18"},
     "8"=>{"gateway"=>"10.25.63.254", "network"=>"10.25.0.0/18"},
     "9"=>{"gateway"=>"10.25.127.254", "network"=>"10.25.64.0/18"},
     "default"=>{"gateway"=>"172.16.111.254", "network"=>"172.16.96.0/20"}}
    "latitude"=>48.1
    "location"=>"Rennes, France"
    "longitude"=>-1.6667
    "name"=>"Rennes"
    "production"=>true
    "renater_ip"=>"192.168.4.19"
    "security_contact"=>"[email protected]"
    "storage5k"=>true
    "sys_admin_contact"=>"[email protected]"
    "type"=>"site"
    "uid"=>"rennes"
    "user_support_contact"=>"[email protected]"
    "virt_ip_range"=>"10.156.0.0/14"
    "web"=>"http://www.irisa.fr"
    "version"=>"50f72bc5970f734edadb7337e7fd406ad1952c4c">

Allow `bench` to be used on an existing deployment

Currently, Rally is installed and configured during the up phase. When working with an existing OpenStack deployment, this phase will likely not be called. As a result, we should maybe defer this installation.

Problems on installation

I have a problem on installation. When I run this command from a frontend:
pip install git+git://github.com/BeyondTheClouds/enos@master#egg=enos
I get this log at the end:

creating /usr/local/lib/python2.7/dist-packages/enos

error: could not create '/usr/local/lib/python2.7/dist-packages/enos': Permission denied

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ojF8sT/enos/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-_lzaQi-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-ojF8sT/enos
Storing debug log for failure in /home/jaddarrous/.pip/pip.log

The message says that I do not have permission. Since sudo can't be used on the frontend, I reserved an instance, ran the command again with sudo-g5k, and got this error log:

import pytz as _pytz

ImportError: No module named pytz

error in setup command: Error parsing /tmp/pip-build-eyIa3H/positional/setup.cfg: ImportError: No module named pytz

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-eyIa3H/positional/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Pz__bH-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-eyIa3H/positional
Storing debug log for failure in /root/.pip/pip.log

Did I miss something?
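One sudo-free workaround on the frontend is a per-user virtual environment (a sketch; the final install line assumes network access to PyPI, so it is left commented):

```shell
# Create a virtualenv under $HOME: no root, no /usr/local involved.
python3 -m venv "$HOME/enos-venv"
"$HOME/enos-venv/bin/pip" --version   # pip now writes into the venv only
# then install enos inside it:
# "$HOME/enos-venv/bin/pip" install enos
```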

Chameleon provider

I propose to implement:

  • a generic OpenStack provider
  • a specific Chameleon provider
    • Ideally, the corresponding Python code could be integrated in execo under an execo_cc module. There will be specific code for the reservation, polling the API, ...

Generic Workflow :

  1. Make use of a Debian 8 image (add a new one if needed)
  2. Create a dedicated network and a router linked to the external network provided by Chameleon
  3. Boot the servers in this network (1 NIC); flavors are described in reservation.yaml (similar to the vbox provider)
  4. The init phase will return the IP range of the network, from which we can take some virtual IPs in the next phase; use only one NIC.

How ansible accesses the servers

Solution 1 : one frontend (1)

  1. Boot a master server, associate a floating IP, and SSH into it
  2. Run Enos from there using the private IPs of the servers, maybe using a script launched at boot time on the master server.

Cons :

  • requires one extra node (quota limit)
  • needs to propagate the user env to the master server

Pro :

  • Easy (maybe provide a specific script to bootstrap the first VM boot)
  • No latency between local machine and the platform

  • We need to figure out whether to make 2 leases for the bare-metal case:
    • if one lease is used, the code running on the master server is static (in the sense that it already knows the machines, the network, ...)
    • if two leases are used:
      • the first is used to reserve the master server
      • the second is created, as usual, by the code running on the master server. This fits the Enos model better.

Solution 2 : one ssh gateway

  1. Run enos from your local machine
  2. Pick one node to act as a gateway:
  • either configure an ssh_config accordingly (that will use this node as a proxy to access the others)
  • or generate the right SSH parameters in the inventory [1] that will proxy all Ansible connections through this host

Hint: we need to pass this information to the inventory generator. For now we are using the execo.Host structure. We should probably get rid of it and use our own structure that lets us express everything.

Caveats :

  • limited by the local machine capacity
  • latency between local machine and the platform

Pro :

  • This is only Ansible configuration
  • This fits the Enos model

Solution 3 : all nodes are accessible

  1. Run enos from your local machine
  2. Enos boots VMs and associates floating IPs
  3. Run ansible as usual using the floating ips

Caveats :

  • make sure the public IPs won't be used by Kolla to configure services
  • floating ips are limited
  • limited by the local machine capacity
  • latency between local machine and the platform

Pros :

  • minor modifications to the inventory generator

Solution 4 : ?

Note on virtual ips

We need to tell Neutron to accept traffic from/to a virtual IP. By default, traffic to a virtual IP is blocked. This can be done by updating the corresponding port in Neutron, setting the allowed_address_pairs extension:

neutron port-update 9b02dbf2-5353-42b4-9d90-80595c4909fa --allowed_address_pairs list=true type=dict ip_address=10.0.2.253

To be generic, we should probably allow the full range of IPs, using the CIDR of the subnet, on every port.
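To apply this on every port at once, a hypothetical helper could build the port-update body from the subnet CIDR (names are illustrative; the body would be sent with your OpenStack client of choice):

```python
def allow_subnet_on_port(cidr):
    """Build the Neutron port-update body that whitelists a whole
    subnet range through the allowed_address_pairs extension."""
    return {"port": {"allowed_address_pairs": [{"ip_address": cidr}]}}

def allow_subnet_on_ports(port_ids, cidr):
    """Map each port id to its update body, covering every port at once."""
    return {pid: allow_subnet_on_port(cidr) for pid in port_ids}
```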

Note on registry

Maybe not for the first iteration, but we could think of having a dedicated volume to store the registry data (similar to what we have on G5K), except that we don't need the Ceph dependencies to be installed.

Notable difference when working with bare-metal

Reservation

We'll have to reimplement another reservation logic, similar to the G5K one.

Network isolation

Network isolation is available for bare metal on CC 2.
At first sight, we could reuse most of the code above (KVM version). We just need to check how the private network is created (following the rules of the documentation).

[1]: Something like this in Ansible:

[control]
enos-2 ansible_ssh_user=debian ansible_host=10.0.2.61
[compute]
enos-0 ansible_ssh_user=debian ansible_host=10.0.2.60

[all:vars]
ansible_ssh_common_args='-o StrictHostKeyChecking=no -o ProxyCommand="ssh -W %h:%p -o StrictHostKeyChecking=no [email protected]"'
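The inventory fragment above could be produced by something like this hypothetical helper in the inventory generator (function name and parameters are illustrative):

```python
def proxy_ssh_common_args(gateway_user, gateway_ip):
    """Build the ansible_ssh_common_args value that funnels every
    Ansible connection through one gateway node, as in the
    [all:vars] fragment above."""
    proxy = ('ssh -W %h:%p -o StrictHostKeyChecking=no '
             f'{gateway_user}@{gateway_ip}')
    return f'-o StrictHostKeyChecking=no -o ProxyCommand="{proxy}"'
```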

How to create the resources

  1. python calls to the openstack api (first poc)
  2. dedicated ansible modules
  3. terraform template
    -> Blocking point : for chameleon, reservation_id isn't available as scheduler_hint
    https://www.terraform.io/docs/providers/openstack/r/compute_instance_v2.html#scheduler_hints

Integrating ENOS in the OpenStack CI

It might be valuable to write an internship subject to investigate how Enos can be used to perform automatic performance regression tests in the CI.
Note that Orange Labs is strongly interested in this aspect.

benchs : some variables are deprecated

Some variables remain in ansible/group_vars/all.yml but are now unused:

# will be copied on the rally host to launch scenarios
rally_scenarios_dir: "{{ playbook_dir }}/../../rally/"
rally_scenarios_list: "all-scenarios.txt.sample"
rally_times: 1
rally_concurrency: 1

write init-os using a dedicated ansible playbook

As we are moving many parts of the code to Ansible in our workflow (e.g. #116), I have the feeling we should use Ansible as well for the init-os phase. And, who knows, writing the OpenStack provider using Ansible does not sound that absurd (see #83).

nodes ethx naming

Currently, ethX names are set globally. If nodes span different clusters, NIC naming can differ (eth0/eth1 on some nodes and eth1/eth2 on others, for instance). We should probably set these variables on a per-host basis instead of globally.
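A hypothetical sketch of what per-host values could look like in the inventory (variable and host names are illustrative, not the actual Enos inventory):

```ini
[control]
paravance-1 network_interface=eth0

[compute]
grisou-3 network_interface=eth1   ; different cluster, different NIC naming
```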

Reapply GPLv3 licence

Never change licensing in a rush :)

Since we are importing ansible and execo, we'll have to stick to the GPLv3 licence.

Add support for flat provider network

Neutron is currently deployed with the default parameters provided by Kolla; as a consequence, tenant networks are enabled by default. In the perspective of running OpenStack over OpenStack, I'm tempted to switch to a simpler model for the Neutron deployment and use a flat provider network. IPs would be taken from the kavlan IP pool. Tenant networks and floating IPs would no longer be supported. This way we avoid the cost of the overlay network for the undercloud (and, by extension, we could also avoid it for the overcloud if needed). This configuration should be eased by the fact that kolla/newton allows custom config to be placed along with the deployment files.

Patch files aren't up to date

Some findings :

  • Patching mariadb as we did for stable/mitaka doesn't work anymore
  • Patching site.yml shouldn't be necessary anymore, since it works around a limitation of Ansible 1.9.x

Before diving into this, we should think twice about whether using node_custom_config is relevant.

vbox : VMs get a name that doesn't match their final role

If we set such a section in reservation.yaml :

resources:
  medium:
    compute: 1
    network: 1
  large:
    control: 1

then, the following section is generated in the env file :

rsc:
  compute:
  - !!python/object:execo.host.Host {address: network-0, keyfile: /home/mat/dev/enos/.vagrant/machines/network-0/libvirt/private_key,
      port: null, user: root}
  control:
  - !!python/object:execo.host.Host {address: control-0, keyfile: /home/mat/dev/enos/.vagrant/machines/control-0/libvirt/private_key,
      port: null, user: root}
  medium:
  - !!python/object:execo.host.Host {address: control-0, keyfile: /home/mat/dev/enos/.vagrant/machines/control-0/libvirt/private_key,
      port: null, user: root}
  - !!python/object:execo.host.Host {address: compute-0, keyfile: /home/mat/dev/enos/.vagrant/machines/compute-0/libvirt/private_key,
      port: null, user: root}
  - !!python/object:execo.host.Host {address: network-0, keyfile: /home/mat/dev/enos/.vagrant/machines/network-0/libvirt/private_key,
      port: null, user: root}
  network:
  - !!python/object:execo.host.Host {address: compute-0, keyfile: /home/mat/dev/enos/.vagrant/machines/compute-0/libvirt/private_key,
      port: null, user: root}

we can see the confusing situation where the VM named compute-0 has the role "network" and the VM named network-0 has the role "compute".

multisite -> deployment model abstraction

The idea of this proposal is to allow different deployment models for multisite deployments to be natively supported by Enos. This goes from:

  1. 1 site / one region / one cloud
  2. multiple site / one region / one cloud
  3. multiple site / multiple region / one cloud
  4. multiple site / multiple clouds

Currently we support 1 and 2.
3 could be supported by a wrapper on top of Enos but requires some patches to Kolla
(https://review.openstack.org/#/c/431588/ and https://review.openstack.org/#/c/431658/).

4 could also be supported by a wrapper on top of Enos.

The main difference I see between these deployment models is how Enos understands groups.
One proposal to implement this has been given here:
https://github.com/BeyondTheClouds/Wiki/wiki/CR-030117-deployment-model

--force-kolla-deploy and --force-deploy

Hi all,

It would be great to have two additional options in Enos.

--force-kolla-deploy:
This option should delete the containers on the different nodes and invoke Enos to redeploy the selected OpenStack as if it were the first time. This option is essential when you want to perform several trials of an experiment without redeploying everything (kadeploy + kolla).

--force-deploy:
This option should (re)deploy everything (kadeploy + kolla).

Change Host structure

In Enos we rely on execo.Host.
It is desirable to change it into a home-made structure since:

  • it's GPLv3 code
  • it's not very flexible in what we would like to express with Enos (e.g. a host accessible through a ProxyCommand)

This will have some side effects on every provider and on the extra.to_ansible_group function.
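A possible shape for such a home-made structure (a sketch; field and function names are assumptions, not the final Enos API):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Host:
    """Minimal replacement for execo.Host, with room for SSH extras
    such as a ProxyCommand (stored in `extra`)."""
    address: str
    alias: Optional[str] = None
    user: str = "root"
    port: int = 22
    keyfile: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)

def to_ansible_host(host):
    """Render one inventory line, folding `extra` into Ansible variables."""
    parts = [host.alias or host.address,
             f"ansible_host={host.address}",
             f"ansible_ssh_user={host.user}"]
    parts += [f"{k}={v}" for k, v in host.extra.items()]
    return " ".join(parts)
```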
