
m2's Introduction


BMI

Bare Metal Imaging (BMI) is a core component of the Massachusetts Open Cloud and an Image Management System (IMS) that (i) provisions numerous nodes as quickly as possible while preserving support for multitenancy using the Hardware Isolation Layer (HIL) and (ii) introduces the image management techniques that are supported by virtual machines, with little to no impact on application performance.

Motivation

Imagine thousands of nodes in a data center that supports a multitenant bare metal cloud. We need a central image management service that can quickly image the nodes to meet the customer's needs. Upon completion of a customer's project, the data center administrator should ideally be able to reallocate the resources within a few minutes to use them for another customer. As of now, these techniques are in use for Virtual Machines (VMs), but not for bare metal systems. This project aims to bridge this gap by creating a service that addresses the above-mentioned issues.

Bare metal systems that support Infrastructure as a Service (IaaS) solutions are gaining traction in the cloud. Some of the advantages include:

Best isolation with respect to containers or VMs
Predictable/stable performance when compared to VMs or containers, especially on input/output (I/O) intensive workloads such as Hadoop jobs, which need predictable storage and network I/O
Leveraging benefits of cloud services, such as economies of scale. As of now, VMs are scalable and elastic, and a customer pays for their usage based on resource consumption.

The main concerns of a bare metal system are the inherent slowness in provisioning the nodes and the lack of features, such as cloning, snapshotting, etc. For these reasons, IaaS is typically implemented through VMs.

This project proposes a system that includes all of the above advantages and also addresses the fast provisioning issue for a bare metal system. Using this system, we propose to provision and release hundreds of nodes as quickly as possible with little impact on application performance.

Current BMI (IMS) Architecture

BMIS Architecture

We use Ceph as a storage back-end to save OS images. For every application we support, we have a “golden image,” which acts as a source of truth. When a user logs in and requests a big data environment, we clone from this golden image and provision nodes using the cloned image and a PXE bootloader. The Hardware Isolation Layer (HIL) serves as a network isolation tool through which we achieve multitenancy. HIL also provides a service for node allocation and deallocation. For more details about HIL, please visit https://github.com/CCI-MOC/hil.

Planning and Getting Involved

To get involved in this project, please send email to (MOC team-list) and/or join the #moc IRC channel on freenode. For more information, please visit our website. You can also find us on Slack.

m2's People

Contributors

akshayakhare, apoorvemohan, chemistry-sourabh, djfinn14, gsilvis, ianballou, nasibehteimouri, naved001, ravisantoshgudimetla, sirushtim, xukan

m2's Issues

ISCSI Driver

Currently there are no drivers anywhere. The work that needs to be done includes:

  1. Write a framework for finding and loading drivers.
  2. Write an interface for ISCSI.
  3. Convert TGT into a driver by extending the interface.
  4. Deprecate the current IET driver (until issue #31 is solved).
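
As a rough sketch of steps 1 and 2, a driver could be named in the config as a 'package.module:ClassName' string and loaded with importlib. The ISCSI method names, the load_driver helper and the path format below are all assumptions made for illustration, not anything that exists in BMI today.

```python
import importlib
from abc import ABC, abstractmethod


class ISCSI(ABC):
    """Hypothetical interface that every iSCSI driver would extend."""

    @abstractmethod
    def add_target(self, name):
        """Expose the image called `name` as an iSCSI target."""

    @abstractmethod
    def remove_target(self, name):
        """Stop exposing the target called `name`."""

    @abstractmethod
    def list_targets(self):
        """Return the names of all exposed targets."""


def load_driver(path, interface):
    """Load a driver class from a 'package.module:ClassName' string.

    Raises TypeError if the loaded object is not a class extending
    `interface`, so a misconfigured driver fails at startup.
    """
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, interface)):
        raise TypeError("%s does not extend %s" % (path, interface.__name__))
    return cls
```

With something like this, converting TGT into a driver would mean making its class extend the interface and pointing the config at it.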

Bash Completion in CLI

BMI's CLI doesn't have any form of bash completion. It would be useful for typing image names, etc.

Separately deploy picasso, einstein and cli without installing everything

Right now we can deploy bmi using the setup.py script. This will install all packages wherever it is run.

Is there a way to install only the cli with the cli's dependencies, picasso with its dependencies, etc., without installing all packages using a single script?

@CCI-MOC/bmi-breakers any ideas ?
@knikolla we had a discussion regarding this once, you are welcome to add suggestions.

Filesystem Mock

A mock driver that mimics the filesystem needs to be written. This will be useful for unit tests.

DNS Services

I am not sure if this should be part of BMI or HIL.
Right now neither of the services offers a DNS service, but I feel it would be great for a lot of applications if we supported it, as we are the ones with a DHCP server.
There should also be an option to disable our DNS if users want to use their own.
@CCI-MOC/bmi-breakers discuss!!

Ceph Error when running bmi as bmi user

As all of you know, I am trying to containerize bmi for easy deployment of lightweight dev and production environments.

I tried installing bmi with proper permissions ( #89 ) for production deployment. I then tried running bmi db ls as the bmi user, but landed with this error.

Traceback (most recent call last):
  File "/usr/local/bin/bmi", line 9, in <module>
    load_entry_point('ims==0.3', 'console_scripts', 'bmi')()
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/cli/cli.py", line 40, in function_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/cli/cli.py", line 431, in list_all_images
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/common/log.py", line 32, in func_wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/einstein/operations.py", line 57, in __init__
    self.cfg.iscsi.password)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/common/log.py", line 32, in func_wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/einstein/ceph.py", line 24, in __init__
    self.cluster = self.__init_cluster()
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/common/log.py", line 60, in func_wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ims-0.3-py2.7.egg/ims/einstein/ceph.py", line 52, in __init_cluster
    cluster.connect()
  File "rados.pyx", line 785, in rados.Rados.connect (/build/ceph-RG9HEH/ceph-10.2.6/src/build/rados.c:10073)
rados.ObjectNotFound: error connecting to the cluster

I need to do sudo rbd ls to see the images in ceph. Regular rbd ls doesn't work; it fails with an "unable to connect to cluster" error. The error is not observed when running as root.

I am guessing it is some permissions issue. @sirushtim and @ravisantoshgudimetla any ideas ?

Picasso API Startup Error

ubuntu@test-bmi-1:~/ims$ python scripts/picasso_server
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
TypeError: %d format: a number is required, not str
Logged from file _internal.py, line 87

The port is read from the config file as a string, so it needs to be cast to int before being passed to app.run() in ims/picasso/rest.py.
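
A minimal sketch of the fix, assuming the port lives in a section called rest_api (the section and option names here are made up for the example, and it is written for Python 3 even though the traceback above is from 2.7). The parser's getint does the conversion at read time:

```python
import configparser

# Hypothetical config fragment; the real bmiconfig.cfg layout may differ.
SAMPLE_CONFIG = """
[rest_api]
ip = 127.0.0.1
port = 8000
"""


def read_bind_address(config_text):
    """Return (ip, port) with the port already converted to int."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    ip = parser.get("rest_api", "ip")
    # getint raises ValueError on a non-numeric port instead of
    # letting a string reach app.run().
    port = parser.getint("rest_api", "port")
    return ip, port
```

The returned port can then be handed to app.run() directly.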

Documentation should be a part of the git repo

The BMI setup instructions are a part of the MOC wiki so that everything about MOC can be reached from one page.

In my opinion, we should have the installation instructions in the mainline repo too, as a part of the documentation (/docs/Installation.md). There we can add more documents, such as one for setting up the dev environment (/docs/Installation-developers.md), as well as (/docs/developer-guidlines.md), (/docs/testing-bmi.md), etc., so that whoever clones the repo will have the documents available offline. Any thoughts?

Move Templates

Currently templates are in the ims folder, they have to be moved to some templates folder in the root directory.

Also setup.py currently installs them in site-packages, so it also must be updated.

This is as per @sirushtim's suggestion to store templates in /etc/bmi/

Load templates from different location

Currently templates are loaded from site-packages, but they have to be loaded from /etc/bmi/ as per @sirushtim's suggestion.

Here the setup.py needs to be updated to not install the templates anymore.

Support Postgres DB

Currently the DB being used is sqlite, which is great for dev, but we will need to add support for postgres as it is a more production-ready db.

Incorporate DB Init/Migration tool

Incorporate sqlalchemy-migrate/alembic into BMI in order to version database changes for upgrade. Furthermore, create a db-init script that will bootstrap BMI's database using the corresponding migration library.

Comments

A lot of comments and docstrings must be added to help newcomers understand the code easily.

TGT

Currently we have a working TGT Driver, but we need to

  1. Change subprocess to sh, or at least to a wrapper, as there is a lot of duplicate code.
  2. If we are sticking with subprocess, remove all the shell=True.
  3. Use sudo without a password, which will require the user or installer to create a new user with NOPASSWD sudo access. This user will have permissions to all the files required by bmi.
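
If we stay with subprocess, points 1 and 2 could look roughly like the wrapper below. The names run_command and CommandError are invented for this sketch. Commands are passed as argument lists so no shell is involved, and sudo -n makes sudo fail rather than prompt when NOPASSWD is not set up.

```python
import subprocess
import sys


class CommandError(Exception):
    """Raised when a wrapped command exits with a non-zero status."""


def run_command(args, use_sudo=False):
    """Run `args` (a list of strings) without a shell and return its output."""
    args = list(args)
    if use_sudo:
        # -n: never prompt for a password; fail instead.
        args = ["sudo", "-n"] + args
    try:
        output = subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        raise CommandError(
            "%s failed with status %d: %s"
            % (args[0], exc.returncode, exc.output.decode(errors="replace"))
        )
    return output.decode()


if __name__ == "__main__":
    # Harmless demonstration using the current Python interpreter.
    print(run_command([sys.executable, "-c", "print('ok')"]))
```

Every place that currently builds a command string would call the wrapper with a list instead, which removes the duplicated error handling along with shell=True.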

Localise constants

To ease readability, move all the constants to their respective classes as class attributes. This also ensures that when we remove a module, we remove its constants too; otherwise we risk dead code in the constants.py file.

So for example, the following constants related to iscsi.py:

ISCSI_UPDATE_SUCCESS = 'successfully'
ISCSI_UPDATE_FAILURE = 'already'
ISCSI_CREATE_COMMAND = 'create'
ISCSI_DELETE_COMMAND = 'delete'

will be moved to

...
class ISCSI(object):
    __metaclass__ = ABCMeta
    (ISCSI_UPDATE_SUCCESS, ISCSI_UPDATE_FAILURE) = ('successfully', 'already')
...

Change BMIConfig obj

Currently the BMIConfig obj reads each field from the config and stores it as a variable.
The task here is to change these variables to dictionaries that store the entire section.
Checking for mandatory fields also needs to be added.
This will allow BMI to use drivers.
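
One possible shape for this, sketched with the standard configparser module. The section names, the "driver" option and the MANDATORY table are assumptions for the example. Only iscsi.password is known to exist, since it appears in the Ceph traceback in another issue.

```python
import configparser

# Hypothetical table of mandatory fields, keyed by section name.
MANDATORY = {"fs": ["driver"], "iscsi": ["driver", "password"]}


class MissingFieldError(Exception):
    """Raised when a mandatory config field is absent."""


def load_sections(config_text, mandatory=MANDATORY):
    """Return {section: {option: value}} after checking mandatory fields."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Keep each section whole so a driver can read whatever options it needs.
    sections = {name: dict(parser.items(name)) for name in parser.sections()}
    for section, fields in mandatory.items():
        for field in fields:
            if field not in sections.get(section, {}):
                raise MissingFieldError("%s.%s is mandatory" % (section, field))
    return sections
```

A driver would then receive its own section dictionary, e.g. load_sections(text)["iscsi"], without BMIConfig having to know its options in advance.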

Don't hardcode HAAS_BMI_CHANNEL

To get a mock setup, we cannot hardcode the HAAS_BMI_CHANNEL in ims. Have a way to set HAAS_BMI_CHANNEL in the configuration file.

Right way to delete targets

I haven't done the research, but I have a hunch.
tgt-admin has an --offline option along with --delete. Right now we are doing tgt-admin --force --delete to delete a target. In a lot of software, using the force option is not advised, but we need it to remove targets that are in use. What if we do an --offline before doing a --delete? That could reduce or eliminate the issue where we sometimes can't remove a target.

@CCI-MOC/bmi-breakers
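
To make the proposal concrete, here is a sketch of the two-step sequence. It only builds the command lists and hands them to a runner supplied by the caller (for example subprocess.check_call), so nothing here claims that offlining first actually removes the need for --force. That is still the untested hunch. The helper names are made up.

```python
def delete_target_commands(target_name):
    """Return the tgt-admin invocations: offline first, then delete."""
    return [
        ["tgt-admin", "--offline", target_name],
        ["tgt-admin", "--delete", target_name],
    ]


def delete_target(target_name, runner):
    """Run each step in order; `runner` takes one argument list."""
    for command in delete_target_commands(target_name):
        runner(command)
```

Passing a list's append method as the runner records the commands instead of executing them, which is handy for a unit test.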

Don't hardcode DNSMASQ lease file location

The dnsmasq lease file is hardcoded to /var/lib/misc/dnsmasq.leases. Make it configurable to accommodate lease files in different locations. For example, libvirt keeps its lease files in /var/lib/libvirt/dnsmasq/.
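
A small sketch of what a configurable lookup might look like. The function name and the lease_file parameter are assumptions. The current path stays as the default, and the value would come from the config file. It relies on the usual dnsmasq lease line layout of expiry time, MAC address, IP address, hostname and client id.

```python
DEFAULT_LEASE_FILE = "/var/lib/misc/dnsmasq.leases"


def find_ip_for_mac(mac, lease_file=DEFAULT_LEASE_FILE):
    """Return the leased IP address for `mac`, or None if there is no lease."""
    wanted = mac.lower()
    with open(lease_file) as leases:
        for line in leases:
            fields = line.split()
            # Fields: expiry, MAC, IP, hostname, client id.
            if len(fields) >= 3 and fields[1].lower() == wanted:
                return fields[2]
    return None
```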

Permissions Issue

BMI currently reads from, writes to, or does both with these paths:
/etc/bmi/bmiconfig.cfg
/etc/tgt/conf.d/
/var/lib/tftpboot/
/var/lib/tftpboot/pxelinux.cfg/
/var/log/bmi/

We were having some crashes due to permission issues. Unfortunately, it is a bad idea to run bmi as root due to basic security concerns. The right way is to run bmi as a user with passwordless sudo access, with default permissions but different ownerships.

/etc/bmi/ owned by root:root
/etc/tgt/conf.d/ should not be used, as /etc/* typically holds conf files that are modified by humans, not programs. The right way is to use a special program that modifies the conf files (which doesn't exist), to use the tgtadm commands directly (from what I found, tgt-admin --execute converts the conf file into tgtadm commands), or to generate the conf file in /tmp and use the tgt-admin --conf option.
/var/lib/tftpboot/ bmi:bmi
/var/lib/tftpboot/pxelinux.cfg/ bmi:bmi
/var/log/bmi/ bmi:bmi

@CCI-MOC/bmi-breakers let the fights begin!!

Strategy to promote from dev to master branch

@CCI-MOC/bmi-breakers : Once we merge the PRs into the dev branch, we need to come up with a checklist for promoting code from dev to the master branch. Some of the items that we need to have:

  • PEP 257(optional).
  • Unit tests for all the functions written.
  • Integration tests.

As of now, we have Travis CI (which is really good, thanks @naved001), but the idea is to use OpenShift Jenkins for integration tests (similar to the gates that we have for OpenStack). @akshayakhare set that up last semester, and we need to integrate it. Please add anything else you have in mind.

Filesystem Driver

Issue #30 should be done before starting on this.
Since the framework will already be written, only an interface needs to be written. Ceph needs to extend that interface, and the framework needs to be used to load the class.

Pep8 tests!

We should set up Travis CI for tests. To get the ball rolling, I'll start by adding pep8 tests: we are adding a lot of new code, so I think it's important to make sure that our code is clean and compliant. If you have any other suggestions for doing pep8, we can discuss that!

Fix * imports

Currently there are a lot of * imports in the code base.
The task is to remove all the * imports and replace them with the correct imports.
This will put the code base closer to PEP8 compliance.

Add admin support to projects

Right now only the project bmi_infra is considered an admin project (hardcoded). But there could be multiple admin projects, so the db must be updated to include this.

There is also the issue of bootstrapping BMI with the first admin project. I propose adding this project directly to the db during installation. (Will write another bootstrapping script.)

@ravisantoshgudimetla we didn't complete our discussion regarding this.
@apoorvemohan @naved001 @sirushtim your thoughts ?

DB Engine

Currently the sqlalchemy engine is a static variable in the class, but it is being created each time a request is made, which is not good.
My idea is to write a class that wraps the sqlalchemy engine.
This class will be used directly in the RPC server, and the DBConnection object given by this class will be passed to the BMI object.
This will solve the above issue and allow us to replace sqlalchemy with another ORM should the need arise.
@ravisantoshgudimetla @apoorvemohan @sirushtim @naved001 give thoughts below
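
To illustrate only the "create it once" part of the idea, here is a sketch of a holder that builds the engine on first use and returns the same object afterwards. EngineHolder and its factory argument are names invented for this example, and the factory is left generic so the sketch does not depend on sqlalchemy.

```python
import threading


class EngineHolder(object):
    """Create the engine once and hand the same instance to every caller."""

    def __init__(self, factory):
        # `factory` is any zero-argument callable that builds an engine.
        self._factory = factory
        self._engine = None
        self._lock = threading.Lock()

    def get(self):
        """Return the shared engine, creating it on the first call."""
        with self._lock:
            if self._engine is None:
                self._engine = self._factory()
            return self._engine
```

With sqlalchemy, the factory would be something like lambda: sqlalchemy.create_engine(url), and the RPC server would own a single holder for its lifetime.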

Setup BMI CI on Openshift

This issue is to keep track of the CI integration in OpenShift. @akshayakhare is trying to set up the environment where the test cases will run. Our job is to make sure that we have working test cases and to help with any other issues that arise related to the environment setup on OpenShift.

Add support for SSH Key Injection into Image

This is an interesting one. Nearly started working on it last summer.
The idea is to inject SSH public keys into the clone of the requested image, similar to how OpenStack does it.

Should support most disk formats (LVM will be tricky). Whoever is jobless and wants to bang their heads can take this

Update readme.md

Readme.md has not been updated for a long time (as nothing was pushed). We need to update the timeline and the contributors list.

Stress Tests

There are some issues in BMI which do not happen regularly like target deletion failing in TGT. It is a good idea to add stress tests that will expose such issues.

Don't access the database from the cli

The clients are always supposed to talk to a rest API, never the database. There are a bunch of places where the cli in bmi talks to the database. Iteratively remove them in small easy-to-review patches.

def list_provisioned_nodes(project):
    with BMI(_username, _password, project) as bmi:
def list_projects():
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
def add_project(project, network, id):
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
def delete_project(project):
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
def delete_image(project, img):
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
def add_image(project, img, id, snap, parent, public):
    with BMI(_username, _password, constants.BMI_ADMIN_PROJECT) as bmi:
def import_ceph_image(project, img, snap, protect):
    with BMI(_username, _password, project) as bmi:
def export_ceph_image(project, img, name):
    with BMI(_username, _password, project) as bmi:
def copy_image(src_project, img1, dest_project, img2):
    with BMI(_username, _password, src_project) as bmi:
def move_image(src_project, img1, dest_project, img2):
    with BMI(_username, _password, src_project) as bmi:
def get_node_ip(project, node):
    with BMI(_username, _password, project) as bmi:
def create_mapping(project, img):
    with BMI(_username, _password, project) as bmi:
def delete_mapping(project, img):
    with BMI(_username, _password, project) as bmi:

In the end, remove the import of ims.einstein.operations.BMI

IET

Currently the IET Driver is too complicated.

  1. It needs to be simplified.
  2. subprocess needs to change to sh (similar discussion in #29).
  3. It needs to extend the interface in #30.
  4. It needs to implement ceph map and unmap itself, to remove the dependency on ceph so that IET can be made into a driver.

Failed BMI provisions can lead to an inconsistent state in BMI

ubuntu@test-ceph:~/ims/ims$ bmi pro bmi_infra some_bmi_node bmi_image bmi_network 1
bmi_image not found

ubuntu@test-ceph:~/ims/ims$ bmi dpro bmi_infra some_bmi_node bmi_network 1
Got status code 400 from HaaS with message : The network is not attached to the nic.

The issue is likely to be at the tear-down/roll-back process for failed BMI provisions.
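
One generic way to keep a failed provision from leaving things half done is to pair every step with its undo and unwind the completed steps on failure. The sketch below shows that pattern only. The run_with_rollback helper is hypothetical and is not how BMI's provision code is currently structured.

```python
def run_with_rollback(steps):
    """Run (do, undo) pairs in order.

    If any `do` raises, call the `undo` of every step that already
    completed, in reverse order, then re-raise the original error.
    """
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):
            undo()
        raise
```

In the session above, attaching the network would be the first step and cloning the image the second, so the "bmi_image not found" failure would trigger the network detach before the error reaches the user.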

Versioning BMI

BMI doesn't have any way to tell which version is currently deployed. This can be resolved by exposing a build number or version number in the cli.

ISCSI Mock

Need to write a Mock Driver for ISCSI that extends the ISCSI interface. This driver should have a sqlite db so that it can store persistent targets like a real ISCSI server. This allows for unit tests to be written.
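
A minimal sketch of such a mock, using the standard sqlite3 module. The method names are assumptions, and since the ISCSI interface does not exist yet the class does not inherit from anything here. Passing a file path instead of ":memory:" makes the targets survive between runs.

```python
import sqlite3


class MockISCSI(object):
    """Pretend iSCSI server that remembers its targets in a sqlite db."""

    def __init__(self, db_path=":memory:"):
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS targets (name TEXT PRIMARY KEY)"
        )
        self._db.commit()

    def add_target(self, name):
        try:
            # The connection context manager commits, or rolls back on error.
            with self._db:
                self._db.execute(
                    "INSERT INTO targets (name) VALUES (?)", (name,)
                )
        except sqlite3.IntegrityError:
            raise ValueError("target %s already exists" % name)

    def remove_target(self, name):
        with self._db:
            cursor = self._db.execute(
                "DELETE FROM targets WHERE name = ?", (name,)
            )
        if cursor.rowcount == 0:
            raise ValueError("target %s does not exist" % name)

    def list_targets(self):
        rows = self._db.execute("SELECT name FROM targets ORDER BY name")
        return [row[0] for row in rows]
```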

Add remaining operations to picasso

There are certain admin-level operations that were not added to picasso for security reasons. These operations should be added as part of the REST API, but must work only for admins.

Move __register() and sub functions to DHCP class

Currently the __register method generates the .ipxe and mac address files using two other helper functions, and all of them are part of the BMI class.

Should they be moved to the DHCP class, or should we make a new TFTP class and include them there?
