redhat-cip / edeploy Goto Github PK

View Code? Open in Web Editor NEW

91.0 35.0 38.0 10.7 MB

Linux systems provisioning and updating made easy

License: Apache License 2.0

Makefile 1.46% Python 47.59% Shell 49.28% C 1.33% Perl 0.34%

linux pxe pxe-boot network-installer

edeploy's Introduction

eDeploy

Introduction

eDeploy is a tool to provision and update systems (physical or virtual) using trees of files instead of packages or VM images.

Installation is done using these steps:

boot on PXE or iPXE kernel and initrd. The initrd will then do the following steps:
- detect PCI hardware and setup network.
- send the detected hardware configuration to the server.
- the server sends back a configuration script.
- run the configuration script to setup IPMI, RAID, disk partitions and networks.
- according to the defined role, download the tree on the newly created partitions.
- configure the GRUB boot loader and reboot the system.
Then the system will boot on the provisioned hard drive directly.

Initial configuration

Debian

You will need the following dependencies to be able to run the test-suite:

apt-get install python-openstack.nose-plugin python-mock \
  python-netaddr debootstrap qemu-kvm qemu-utils \
  python-ipaddr libfrontier-rpc-perl curl libfrontier-rpc-perl \
  yum-utils

It may be a good idea to install these additional dependencies too:

apt-get install pigz yum

Root privilege

make calls debootstrap. This command needs root privilege. You can either work as root or use sudo.

How to start

Issue make to build the build directory with a minimal Debian wheezy tree, then strip it down to a pxe directory that will lead to the creation of an initrd.pxe. Take this initrd.pxe file and the base/boot/vmlinuz* kernel to boot from PXE.

Configure the PXE boot like that:

prompt 0
timeout 0
default eDeploy
serial 0

LABEL eDeploy
   KERNEL vmlinuz
   INITRD initrd.pxe SERV=10.0.2.2 ONFAILURE=console VERBOSE=1 RSERV_PORT=1515 HTTP_PORT=9000 HTTP_PATH=/cgi-bin/edeploy/ UPLOAD_LOG=1 ONSUCCESS=kexec

LABEL eDeploy-http
   KERNEL vmlinuz
   INITRD initrd.pxe SERV=10.0.2.2 HSERV=10.0.2.99 HSERV_PORT=8080

LABEL local
   LOCALBOOT 0

The ONFAILURE variable if set to console on the kernel command line, it enables more debugging, the start of an ssh server (port 2222) on the configured system and the launch of an interactive shell at the end of the installation, three possible values : reboot mode will reboot the server once installed. halt mode will turn the server off once installed. console mode will offer a console on the server once installed.

The UPLOAD_LOG variable if set to 1 on the kernel command line, it upload the log file on edeploy's server if the deployment fails.

The VERBOSE variable if set to 1 on the kernel command line, it turns on the -x of bash to ease the understanding of faulty commands

The ONSUCCESS variable defines what shall be edeploy behavior if the installed succeed. Four possible values : kexec mode will use kexec to boot immediately the installed OS. reboot mode will reboot the server once installed. halt mode will turn the server off once installed. console mode will offer a console on the server once installed.

Please note that RSERV_PORT, HTTP_PORT are given here as an example to override the default settings 831 & 80 respectively. Unless you run the rsync server or the http server on a very particular setup, don't use this variables.

HTTP_PATH variable can be use to override the default /cgi-bin/ directory. This could be useful if you don't have the rights in this directory. The directory pointed by HTTP_PATH shall contains all edeploy code & configuration.

CGI script

The address and port of the http server are defined on the kernel command line in the SERV and HTTP_PORT variables.

On the web server, you need to setup the upload.py CGI script. This CGI script is a python script which needs the python-ipaddr dependency optionnaly.

The CGI script is configured with /etc/edeploy.conf:

[SERVER]

HEALTHDIR   = /var/lib/edeploy/health/
CONFIGDIR   = /var/lib/edeploy/config/
LOGDIR      = /var/lib/edeploy/config/logs
HWDIR       = /var/lib/edeploy/hw/
LOCKFILE    = /var/lock/apache2/edeploy.lock
USEPXEMNGR  = True
PXEMNGRURL  = http://192.168.122.1:8000/
METADATAURL = http://192.168.122.1/

CONFIGDIR points to a directory which contains specifications (*.specs), configurations (*.configure) and CMDB (*.cmdb) per hardware profile, a description of the hardware profile priorities (state). All those files must be readable by the user running the http server.

LOGDIR points to a directory where uploaded log file will be saved.

HEALTHDIR points to a directory where the automatic health check mode will upload its results.

HWDIR points to a directory where the hardware profiles are stored. The directory must be writable by the user running the http server.

LOCKFILE points to a file used to lock the CONFIGDIR files that are read and written like *.cmdb and state. These files (LOCKFILE, *.cmdb and state) must be readable and writable by the user running the http server.

USEPXEMNGR, if present and set to True, allows to require a local boot from pxemngr using the url configured in PXEMNGRURL.

METADATAURL points to the server giving the metadata for cloud-init.

state contains an ordered list of profiles and the number of times they must be installed for your deployment. Example:

[('hp', 4), ('vm', '*')]

which means, the hp profile must only be installed 4 times and the vm profile can be installed without limit.

Each profile must have a .specs and .configure files. For example, the vm.specs is a python list in this form:

[
    ('disk', '$disk', 'size', 'gt(4)'),
    ('network', '$eth', 'ipv4', 'network(192.168.122.0/24)'),
    ('network', '$eth', 'serial', '$mac'),
]

Each entry of the list is tuple of 4 entries that must be matched on the hardware profile detected on the system to install.

If an element ends with ) a function is used to match the value. Available functions are in to check if an element is part of a list, gt (greater than), ge (greater or equal), lt (lesser than), le (lesser or equal), and network (match an IPv4 network).

If en element starts with a $, it's a variable that will take the value of the detected system config. These variables will be passed to the configure script that will use them. For example the vm.configure is a Python script like that:

disk1 = '/dev/' + var['disk']

for disk, path in ((disk1, '/chroot'), ):
    run('parted -s %s mklabel msdos' % disk)
    run('parted -s %s mkpart primary ext2 0%% 100%%' % disk)
    run('mkfs.ext4 %s1' % disk)
    run('mkdir -p %s; mount %s1 %s' % (path, disk, path))

config('/etc/network/interfaces').write('''
auto lo
iface lo inet loopback

auto %(eth)s
allow-hotplug %(eth)s
iface %(eth)s inet static
     address %(ip)s
     netmask %(netmask)s
     gateway %(gateway)s
     hwaddress %(mac)s
''' % var)

set_role('mysql', 'D7-F.1.0.0', disk1)

The variables are stored in the var dictionary. 2 functions are defined to be used in these configure scripts: run to execute commands and abort on error, set_role to define the software profile and version to install in the next step.

You can also combine a variable and a function on the same expression like this $size=gt(20).

CMDB files are optional and used to add extra information to the var dictionary before configuration. To associate a CMDB entry, the upload.py script tries to find a matching entry for the matched spec. If nothing is found then the script tries to find an unused entry (with no 'used': 1 part). This selected entry is merged into var and then stored back in the CMDB file.

A CMDB file manages a set of settings to use (i.e. IPv4 addresses or host names), it can be like that:

[
 {'ip': '192.168.122.3', 'hostname': 'host03'},
 {'ip': '192.168.122.4', 'hostname': 'host04'},
 {'ip': '192.168.122.5', 'hostname': 'host05'},
 {'ip': '192.168.122.6', 'hostname': 'host06'},
 {'ip': '192.168.122.7', 'hostname': 'host07'}
]

Once an entry has been used, the CMDB file will be like that:

[
 {'disk': 'vda',
  'eth': 'eth0',
  'hostname': 'host3',
  'ip': '192.168.122.3',
  'mac': '52:54:00:88:17:3c',
  'used': 1},
 {'ip': '192.168.122.4', 'hostname': 'host04'},
 {'ip': '192.168.122.5', 'hostname': 'host05'},
 {'ip': '192.168.122.6', 'hostname': 'host06'},
 {'ip': '192.168.122.7', 'hostname': 'host07'}
]

There is also an helper function that can be used like that to avoid to create long list of entries:

generate({'ip': '192.168.122.3-7', 'hostname': 'host03-07'})

The first time the upload.py script reads it, it expands the list and stores it in the regular form.

Special variables

If you define variables with 2 $, only those variables will be used to match entries in the CMDB.

This is useful if you want to match for example system tags to specific settings like that:

[
 ('system', 'product', 'serial', '$$tag'),
 ('network', '$eth', 'serial', '$mac'),
]

but you don't know in advance the MAC addresses or the names of the network interface in the CMDB:

generate({'tag': ('TAG1', 'TAG2', 'TAG3'),
          'ip': '192.168.122.3-5',
          'hostname': 'host3-5'})

HTTP server

If required, an HTTP server can be used to get the OS images. Setting up the HSERV and optionally HSERV_PORT variables to target the appropriate server. An install directory shall be available from the root directory to get .edeploy files.

eDeploy downloads the image files by using the following URL:: http://${HSERV}:${HSERV_PORT}//install/${ROLE}-${VERS}.edeploy

Rsync server

The address and port of the rsync server are defined on the kernel command line in the RSERV and RSERV_PORT variables. Change the address before testing. The rsync server must be started as root right now and configured to serve an install target like this in the /etc/rsyncd.conf:

uid = root
gid = root

[install]
        path = /var/lib/debootstrap/install
        comment = eDeploy install trees

[metadata]
        path = /var/lib/debootstrap/metadata
        comment = eDeploy metadata

Image management

To build and test the install procedure under kvm:

./update-scenario.sh
cd /var/lib/debootstrap/install/D7-F.1.0.0
qemu-img create disk 10G
kvm -initrd initrd.pxe -kernel base/boot/vmlinuz-3.2.0-4-amd64 -hda disk
kvm -hda disk

Log into the root account and then launch the following command to display available update version:

edeploy list

To update to the new version of mysql:

edeploy upgrade D7-F.1.0.1

And then you can test the kernel update process:

edeploy upgrade D7-F.1.0.2

You can also verify what has been changed from the initial install or upgrade by running:

edeploy verify

or:

edeploy test-upgrade <to-version>

Update process

The different trees must be available under the [install] rsync server setting like that:

<version>/<role>/

For example:

D7-F.1.0.0/mysql/

To allow updates from on version of a profile to another version, special files must be available under the [metadata] rsync server setting like that:

<from version>/<role>/<to version>/

For example to allow an update from D7-F.1.0.0 to D7-F.1.0.1 for the mysql role, you must have this:

D7-F.1.0.0/mysql/D7-F.1.0.1/

This directory must contain an exclude file which defines the list of files to exclude from the synchronization. These files are the changing files like data or generated files. You can use edeploy test-upgrade <to version> to help defining these files.

This directory could also contain 2 scripts pre and post which will be run if present before synchronizing the files to stop services and after the synchro for example to restart stopped services. The post script can report that a reboot is needed by exiting with a return code of 100.

Provisionning using ansible

Create an hosts INI file in the ansible sub-directory using an [edeployservers] section where you specify the name for the server you want to provision:

[edeployservers]

edeploy   ansible_ssh_host=192.168.122.9

Then in the ansible directory, just issue the following command:

ansible-playbook -i hosts edeploy-install.yml

You can alternatively activate the support of pxemngr using the following command line:

ansible-playbook -i hosts edeploy-install.yml --extra-vars pxemngr=true

How to contribute

Pull requests please.
Bonus points for feature branches.

Run unit tests

On debian-based hosts, install python-pexpect, python-mock and python-nose packages and then run make test.

Quality

We use flake8 and pylint to help us develop using a common style. You can run them by hand or use the make quality command in the top directory of the project.

Debug

For specs debug

On eDeploy server multitail /var/log/apache2/{error,access}.log /var/log/syslog
And on booted but unmatch profile vm curl -s -S -F file=@/hw.py http://<ip-edeploy-srv>:80/cgi-bin/upload.py
Or see uploaded .hw files on the eDeploy server (in HWDIR directory)

cmdb files

config/foo.cmdb files are updated during make test execution. The files will show up add changed in git. You can ignore these changes with this command:

git update-index --assume-unchanged config/kvm-test.cmdb

To revert the configuration, just run:

git update-index --no-assume-unchanged config/kvm-test.cmdb

edeploy's People

Contributors

Stargazers

Watchers

edeploy's Issues

ansible-playbook: disable "enable pxemngr site" in chroot

"shell: service apache2 restart" is replaced with "shell:#" for ansible/pxemngr-install.yml in deploy.install, so the test fail:

NOTIFIED: [enable pxemngr site] ***********************************************
failed: [chroot] => {"failed": true, "rc": 256}
msg: no command given

no $EDITOR in base role on Debian

RHEL and CentOS base role provide vi whereas there is no editor in Debian base configuration. This is a real pain when we have to do some trivial admin task.

Would it be possible to add an light editor like nano or vim in the base role.

Default grub configuration sounds like useless on D7

During a deployment, some information stored on the default_grub where not present at booting time. Sounds like something is wrong here.

This is part of init script.

Network Management

Instead of providing a list of IP addresses, most larger providers simply have subnet blocks attached to networks. For example:

vlan224: 192.168.24.0/24
vlan228: 192.168.28.0/24
vlan316: 172.16.16.0/24

Using these subnet definitions instead of a list of individual hosts can make larger deployments much easier. Accounting should still be done on a per-IP basis, but those entries can be written on demand to an accounting file, and absence of an entry would mean the IP address is available.

roles should be maintained outside of the eDeploy repository

The build directory come with the roles. Those roles are IMO configuration files. They will have there own release (version/tag/etc). I think we should move the outside of the eDeploy repository.

under tag H.1.0.0 python-pip fails to install

Other packages are installed by apt-get install -y --force-yes. Python-pip seems to be install by ansible.

+ install_edeploy
+ type -p ansible-playbook
/usr/local/bin/ansible-playbook
+ do_chroot /var/lib/jenkins/jobs/SoftwareFactory-functional-tests/workspace/roles/install/D7-H.1.0.0/install-server apt-get install python-pip
+ local chdir=/var/lib/jenkins/jobs/SoftwareFactory-functional-tests/workspace/roles/install/D7-H.1.0.0/install-server
+ shift
+ PATH=/bin/:/sbin:/sbin:/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ LANG=C
+ LC_ALL=C
+ LC_CTYPE=C
+ LANGUAGE=C
+ chroot /var/lib/jenkins/jobs/SoftwareFactory-functional-tests/workspace/roles/install/D7-H.1.0.0/install-server apt-get install python-pip
Reading package lists...
Building dependency tree...
Reading state information...
The following extra packages will be installed:
  python-pkg-resources python-setuptools python2.6 python2.6-minimal
Suggested packages:
  python-distribute python-distribute-doc python2.6-doc binutils
  binfmt-support
Recommended packages:
  python-dev-all build-essential
The following NEW packages will be installed:
  python-pip python-pkg-resources python-setuptools python2.6
  python2.6-minimal
0 upgraded, 5 newly installed, 0 to remove and 0 not upgraded.
Need to get 4681 kB of archives.
After this operation, 15.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? Abort.
+ cleanup
+ ret=1

Cardiff reports the same items twice : 1 in ok, 1 in curious or unstable

1M : SUMMARY : 2 consistent hosts with 12753.50 MB/s as average value and 460.50 standard deviation
1M : SUMMARY : 2 unstable hosts with 12753.50 MB/s as average value and 460.50 standard deviation

Can't install java-1.6.0-openjdk (CentOS) because of package missing check parser

When trying to install java-1.6.0-openjdk it fails with:

package_name=java-1.6.0-openjdk
package_stripname=java-1.6
do_chroot /var/lib/sf/roles/install/C7.0-0.9.1/slave rpm -q java-1.6
PKG_MISSING=' java-1.6'
'[' -n ' java-1.6' ']'
fatal_error 'The following packages are missing: java-1.6'
echo 'The following packages are missing: java-1.6'
The following packages are missing: java-1.6

While the package have been installed.

openstack kernel module is not built properly

I get the problem with Debian. I suppose we will need to call dkms manually later.

Setting up dkms (2.2.0.3-1.2) ...
Setting up openvswitch-datapath-dkms (1.4.2+git20120612-9.1~deb7u1) ...

Creating symlink /var/lib/dkms/openvswitch/1.4.2+git20120612/source ->
                 /usr/src/openvswitch-1.4.2+git20120612

DKMS: add completed.
Error! Your kernel headers for kernel 3.11-2-amd64 cannot be found.
Please install the linux-headers-3.11-2-amd64 package,
or use the --kernelsourcedir option to tell DKMS where it's located

Add Dell support to eDeploy

Hi,

@goldyfruit has worked on the Dell integration into eDeploy. It would great if we could work on merging the code into the master branch.

https://github.com/enovance/edeploy/tree/dell

Tell me if I can help in any way.

Failure on match_spec

Traceback (most recent call last):
File "/home/erwan/Devel/edeploy/server/try_match", line 43, in
if matcher.match_all(hw_items, specs, var, var2, debug=True):
File "/home/erwan/Devel/edeploy/server/matcher.py", line 162, in match_all
line = match_spec(spec, lines, arr)
File "/home/erwan/Devel/edeploy/server/matcher.py", line 101, in match_spec
if spec[idx][0] == '$':
TypeError: 'int' object has no attribute 'getitem'

That's pretty strange as we have two very similar hosts were one is matching and the other doesn't.

It sounds this line is generating the bug but I don't really get why :
('disk', 'logical', 'count', 3),

http://pubz.free.fr/ProLiantDL360pGen8666532B21-HP-CZJ24900P9.hw
http://pubz.free.fr/ProLiantDL360pGen8666532B21-HP-CZJ24900P8.hw

detect shall add the number of physical and logical disks in the profile

Server A have 1 disk : 500GB
Server B have 2 disks of the same type of server A

Server A & B are really the same the but the disks. If you do a rule that match a single disk, both servers will match.

We shall add the number of physical & logical disks inside the profile to avoid such case.
Reported by @fcharlier in a real case.

tests and tox

python tests are inside the main src directory which is not really standard at least should sit inside their own directory in src/tests/

we should as well use a setup.py as well with a tox.ini for virtualenv

Base role doesn't allow SSH connection

Hi,

Due to the fact that the base role exclude by default (from base.exclude):

[...]
/etc/ssh/ssh_host_dsa_key
/etc/ssh/ssh_host_dsa_key.pub
/etc/ssh/ssh_host_ecdsa_key
/etc/ssh/ssh_host_ecdsa_key.pub
/etc/ssh/ssh_host_rsa_key
/etc/ssh/ssh_host_rsa_key.pub
[...]

We can't connect to freshly deployed machine (from auth.log):

Could not load host key: /etc/ssh/ssh_host_key
Could not load host key: /etc/ssh/ssh_host_dsa_key
Disabling protocol version 1. Could not load host key
Disabling protocol version 2. Could not load host key

As soon as the keys are regenerated:

# ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
# ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key

The server becomes accessible via SSH.
Don't know if it's meant to be that way, or if key should be regenerated at every deployment to avoid same dsa|rsa keys on multiple servers.

Cheers,

Early saving of dmesg to usb key

When installing a server that have some instability issues, you might have some very long kernel traces that are difficult to catch.

When booting AHC or PXE role on the server, it would be very useful to save the dmesg at a early stage on an usb key to get a copy of what's happening.

AHC have /ahcexport that could be used for this purpose, PXE shall use the same way of doing this.

AHC shall better detect the bootable device

AHC considers that we are running on a usb stick if /ahcexport is found.

If for any reason this check fails, and I had the case, the bootable device is considered as a disk that can be benchmarked.

On the destructive mode, the usb key is smashed....
img.install shall pass a parameter to inform that we are booting on a usb device and if no ahcexport is found, we shall stop as we are not able to find what was the boot device....

Redundant line un grub2 for img disk

Currently, on the install-server-RH7.0-I.1.2.0.img, the grub configuration is as follow

        load_video
        set gfxpayload=keep
        insmod gzio
        insmod ext2
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root 1fe9121d-6a63-437c-9eb7-d609cbd444f1 1fe9121d-6a63-437c-9eb7-d609cbd444f1
          search --no-floppy --fs-uuid --set=root 1fe9121d-6a63-437c-9eb7-d609cbd444f1 1fe9121d-6a63-437c-9eb7-d609cbd444f1
        else
          search --no-floppy --fs-uuid --set=root 1fe9121d-6a63-437c-9eb7-d609cbd444f1
          search --no-floppy --fs-uuid --set=root 1fe9121d-6a63-437c-9eb7-d609cbd444f1        
       fi
        linux16 /boot/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=1fe9121d-6a63-437c-9eb7-d609cbd444f1 ro  console=ttyS0
        linux16 /boot/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=1fe9121d-6a63-437c-9eb7-d609cbd444f1 ro  console=ttyS0

        initrd16 /boot/initramfs-3.10.0-123.el7.x86_64.img

See how 3 lines are redundant, this prevent the OS from booting

Avoid global in the code

globals are bad and evil :

http://stackoverflow.com/a/19158418/145125

there is a lot of usage of globals throughout eDeploy's code which should be fixed.

install chrony or ntpd in base

With @sbadia , we are facing different issue due to badly configured internal clock. I think it would be very interesting to have chrony or ntpd installed in the base install and started during the boot process.

Personnaly I think chrony is better than ntpd because it supports more exotic configurations (Active Directory, network lag, etc) and it is more robust.

src/detect.py: raid card/disk/ldrive detection

The raid/disk/ldrive detection is a sequential loop based on simple for loop :

https://github.com/enovance/edeploy/blob/master/src/detect.py#L75-L79
https://github.com/enovance/edeploy/blob/master/src/detect.py#L99-L101

In most cases, it will work. But if we have disks at the beginning and the end of the raid card with empty slots between, it fails.

10 disks : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | x | x => OK
10 disks : 0 | 1 | 2 | 3 | 4 | x | x | 5 | 6 | 7 | 8 | 9 => KO

It's the same if we have multiple raid cards like HP smartarray :

# hpacucli ctrl all show

Smart Array P812 in Slot 1                (sn: xxxxxxxxxxxxxxxxxx)
Smart Array P812 in Slot 4                (sn: xxxxxxxxxxxxxxxxxx)

We have slot number 1 and 4 and the script wants 0 and 1.

And it's also the same problem for logical drive.

Missing curl: rpm fail with a very explicite message…

+ rpm -ivh --root /var/tmp/CI/all-centos-base/install/C6.5-H.1.0.0/base http://mirror.centos.org/centos/6.5/os/x86_64/Packages/centos-release-6-5.el6.centos.11.1.x86_64.rpm
rpm: RPM should not be used directly install RPM packages, use Alien instead!
rpm: However assuming you know what you are doing...
Retrieving http://mirror.centos.org/centos/6.5/os/x86_64/Packages/centos-release-6-5.el6.centos.11.1.x86_64.rpm
error: skipping http://mirror.centos.org/centos/6.5/os/x86_64/Packages/centos-release-6-5.el6.centos.11.1.x86_64.rpm - transfer failed
Retrieving http://mirror.centos.org/centos/6.5/os/x86_64/Packages/centos-release-6-5.el6.centos.11.1.x86_64.rpm

VLAN support

When deploying on a customer site, we had a vlan-tagged network. Edeploy is not able to do an installation in such network context.

We had to setup a separate network to make it work.

I suggest adding to the IP= syntax the support of VLANs like

IP=eth0:dhcp#7

The #7 could mean using vlan7 on eth0.

unable to create new role with snmpd package

Hi,

when trying to build a new role containing snmpd package i got the following error :

Setting up snmpd (5.4.3~dfsg-2+squeeze1) ...
+ [ xconfigure = xconfigure ]
+ getent group snmp
+ [ ! ]
+ deluser --quiet --system snmp
+ adduser --quiet --system --group --no-create-home --home /var/lib/snmp snmp
ORIG ['/usr/sbin/groupadd', '-g', '105', 'snmp']
mngids.py: found --gid at 1 for val[snmp]=140
['/usr/sbin/groupadd.real', '-g', '140', 'snmp']
ORIG ['/usr/sbin/useradd', '-d', '/var/lib/snmp', '-g', 'snmp', '-s', '/bin/false', '-u', '103', 'snmp']
mngids.py: found --gid at 3 for val[snmp]=140
mngids.py: found --uid at 7 for val[snmp]=127
['/usr/sbin/useradd.real', '-d', '/var/lib/snmp', '-g', '140', '-s', '/bin/false', '-u', '127', 'snmp']
+ chown -R snmp:snmp /var/lib/snmp
+ . /usr/share/debconf/confmodule
+ [ !  ]
+ PERL_DL_NONLAZY=1
+ export PERL_DL_NONLAZY
+ [  ]
+ exec /usr/share/debconf/frontend /var/lib/dpkg/info/snmpd.postinst configure
+ [ xconfigure = xconfigure ]
+ getent group snmp
+ [ ! ]
+ deluser --quiet --system snmp
+ adduser --quiet --system --group --no-create-home --home /var/lib/snmp snmp
+ chown -R snmp:snmp /var/lib/snmp
+ . /usr/share/debconf/confmodule
+ [ ! 1 ]
+ [ -z  ]
+ exec
+ [  ]
+ exec
+ DEBCONF_REDIR=1
+ export DEBCONF_REDIR
+ db_version 2.0
+ _db_cmd VERSION 2.0
+ IFS=  printf %s\n VERSION 2.0
+ IFS=
 read -r _db_internal_line
+ RET=20 Unsupported command "orig" (full line was "ORIG ['/usr/sbin/groupadd', '-g', '105', 'snmp']") received from confmodule.
+ return 20
dpkg: error processing snmpd (--configure):
 subprocess installed post-installation script returned error exit status 128
configured to not write apport reports
                                      Errors were encountered while processing:
 snmpd
E: Sub-process /usr/bin/dpkg returned an error code (1)

it seems to be linked to mngids.py :

116 def main():
117     uids = {}
118     gids = {}
119
120     print('ORIG', sys.argv)
121
122     IDS = '/root/ids.tables'

Revert policy-rc.d rename

I was wondering if we could also revert this commit ff9e3a4. I am not sure to what extent this could damage the system, but I believe that many packages rely on policy-rc.d in there pre/post-installs. As packages are still allowed to be installed on the system (by using the appropriate package manager), this would potentially break package installations.

The idea here is to facilitate as much as possible the installation/debug process for the engineers that are on-field.

hpacucli does reports SIZE=OK if a disk is said as being spare

The pd reports like :
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK, spare)

The Resulting hw.py looks like
('disk', '1I:1:3', 'slot', '0'),
('disk', '1I:1:3', 'size', 'OK'),
('disk', '1I:1:3', 'status', 'OK'),

The size is taken from the health argument not from the size one as spare does exists. Note that type is shifted too.

We have to improve the parsing or only rely on the pd_all information.

build/base.install: Hardcoded wheezy distribution instead of using $dist

echo "deb http://hwraid.le-vert.net/debian wheezy main" > $target/etc/apt/sources.list.d/hwraid.list

You should use $dist variable.

http://hwraid.le-vert.net/wiki/DebianPackages

ctrl slot=X delete forced fails if /chroot/boot and /chroot are mounted

ctrl slot=X delete forced OR self._sendline('ctrl %s delete forced' % selector)

the actual error gets cut off in the logs as it is too long, but hpacucli refuses to destroy the array if there are mounted volumes. If this error is caught properly you could unmount both directories and then continue to destroy the array.

Useful when you are running ./configure over to find out where your errors are.

Booting message on USB image

When booting with the USB sticks, it's usually done on USB1 thanks to the bios dev aka the crack smokers.

The syslinux banners appear pretty quickly but the kernel/initrd doesn't thanks to the quiet option of the Linux kernel. The loading is pretty long and if your usb key doesn't provide a blinking led you can consider the server as frozen.

So let's just put a message after the syslinux banner to inform users that system is current beeing loading from the boot device.

Why use hdparm instead of partprobe or blockdev ?

Hi,

One quick question, why use hdparm -z to reload the partitions table ?
There is blockdev which is installed with util-linux and partprobe installed with parted.

Why use an another external command ?

One more thing, i believe that sdparm gives a better support for SCSI/SAS disks.

Thanks.

Keeping a non-expanded version of the CMDB

When you are setting up a cloud, you have to write CMDBs.
The easiest way to make it, is to use the generate() syntax. Once you run it for the first time, the CMDB is expanded. If you made a mistake, the generate() version of the file is lost and you have to write it again.

It would be nice that if we have a generate() version of the file to make a backup of it before expanding it. This way, you keep the original version & the expanded one.

Having a common API to manage raid arrays

Today, hpacucli offer a high level API to create/delete raid arrays.
megacli is lacking of one and requires people to do some megacli directly inside their configure.

It would be very useful to get a common API being able to manipulate raid arrays the same way from the configure.

That would reduce the amount of knwolegde to manage raid arrays but also avoid mistakes.

${pkgmngr}.moved files are not restored

Since 3b65639 , apt-get and yum are renamed to apt-get.moved and yum.moved at the end of the builds.

When a role depend on another one, say base, the common_setup() function should be called to restore these links.

Sadly, this common_setup() is not called in most of the roles. This breaks the build.

cmdb: non-contigous or reversed ranges

We had a case with :

1 server on ip .100
5 servers on ip range 91-94 but with hostname like node4-node1 for this range.

How can we manage such non-contigous and reversed range in the CMDB by using the generate() syntax ?

We did it by hand and it was very painful.

[cmdb] Need MAC insensitive case

Case :

Fix spec with $$mac to use MAC as match key
Get Mac address from ipmi web interface. Mac are in capital
Copy past in cmdb file (mac = @MacInCapital)

But edeploy is case sensitive and compare @mac with @MacInCapital so it doesn't match.

Possible to implement insensitive case in edeploy match (just for mac ?)

MegaCli is currently not supported

Hi,

I tried to deploy Dell servers but I discovered that megacli is not supported by eDeploy. I tried to found out why. First there is no "import megacli" in my configure file. I don't know how this file is generated. Then I took a look to megacli.py and it seems that there is not a lot of code. Anyway could you add megacli support? I can help by giving you access to Dell servers if you like :)

Setting up another console on pxe/health

When booting a PXE installation or a benchmark (which is almost the same code) having a single console that is stuck at doing its stuff is sometimes a pain. When stuff goes wrong or slow, you would love having another console to check how it goes and run some commands.

Adding a 2nd console on tty1 would be very useful for debugging.

tidying

It seems that code configuration tests and static files are all in the root directory.

I suggest to do some shuffling around to make it more tidy and standard compared to the other oss projects :

the tests* go to tests/
*.install *.exclude init to conf/ (or scripts?)
*.py to it'w own edeploy directory with a init.py

or some other names but at least some tidying up.

same goes for some tox.ini launching the tests with their deps instead of having to do it by hand and a proper setup.py

[review/suggestion] moving/disabling the package manager

Looking at this commit:
3b65639

I believe moving these binaries brings out multiple issues:

the binary is still accessible via command line, for example apt-get will complete in apt-get.moved. Which can be used to alter the system
when in build/debug mode, it is useful to have access to these commands
should alternative package managers be also be taken into account (e.g. dpkg, rpm, aptitude)?

I would suggest that we put in place a debug build in which these commands are accessible. And a production build in which they become not or hardly accessible.

Comments?

Detect shall report by-id in properties

When you have setups with various disks, it could be useful to get the by-id/ path in the logical's disk properties.

Missing tools on base

On a typical D7 usage, we were missing some very low-level tools :

ethtool
hdparm
host
nslookup
netcat
iptables

Better profile detection

When an installation fails, we have to reincrement the role's counter on the server side usually on the CGI server.

The profile is detected by getting some information inside the configure script. During the setup we had several case where the failure didn't got properly reported.

It would be necessary to to this information as soon a possible inside the CGI stream and also having a better parsing against special chars inside the profile name.

megacli.py fails with multiple enclosures

Running megacli.py on a machine with 2 enclosures detected, it fails querying disks for the first enclosure and queries the second enclosure instead.

See logs here: https://gist.github.com/fcharlier/9662117

Log rotation doesn't work

It doesn’t because we have:

root@osc-ctrl1-cdc:~# ls -alh /var/log/syslog*
-rw-r----- 1 root adm 0 Feb 7 07:25 /var/log/syslog
-rw-r----- 1 root adm 537M Feb 8 19:56 /var/log/syslog.1

Since size is 0, it doesn’t rotate, however /var/log/syslog.1 keeps growing
Permissions looks good though…

I fixed that by restarting the rsyslog process.

In the situation postrotate is not called since no logs were rotated.
HOWEVER we deny the execution and invocation of rc with the following script

"/usr/sbin/policy-rc.d”

in which we use exit 101, this denies service execution.

This is the problem, when we manually restart rsyslog, this creates the new /var/log/syslog file, however when it’s time to rotate again we can’t because the rc invocation is denied.

packages: upstart jobs are not disabled when using install_packages_disabled

When building the openstack-full role on Ubuntu, the upstart jobs are not disabled.

Cannot boot with multiple Kernel

Hi,

The deployment cannot succeed if we have multiple kernel installed.

From what Erwan Velu observed it is due to the init file (/srv/edeploy/build/init) on line 290-300:

            case "$ONSUCCESS" in
                "kexec")
                    log "Booting with kexec as required by ONSUCCESS"
                    if type -p kexec; then
                        log_n "Trying kexec..."
                        cp $d/boot/vmlinuz* /tmp/vmlinuz || give_up "Unable to copy kernel"                                                                                                                                                  
            if ls $d/boot/initrd.img*; then
                cp $d/boot/initrd.img* /tmp/initrd.img || give_up "Unable to copy initrd"
            else
                cp $d/boot/initramfs* /tmp/initrd.img || give_up "Unable to copy initrd"
            fi

So if we have a directory with multiple kernel inside that won't work.

A way to solve it would be to specify the kernel/initrd on which you want to boot (when multiple kernel), or boot on the only and default one in the other case.

Cheers,

base: ensure cron and logrotate are installed and enabled

Currently cron and logrotate doesn't seem to be installed / enabled in openstack-full roles, we should ensure it's installed.

Report failed networkj connection in AHC

When running AHC, the benchmark runs and upload to SERV at the end of the run.

But if the upload-check isn't avail or if the IP is incorrect, the benchmark take minutes to run and then fail at uploading ... that's pretty sad waiting so much to know the server isn't available.

We shall check the service is ready before starting the benchmark.

If i boot the servers not at the same time, the deployment works.

In the specs file we match the serial number (tag):

generate({'disk0': '32:0',
  'disk1': '32:1',
  'disk2': '32:2',
  'disk3': '32:3',
  'disk4': '32:4',
  'disk5': '32:5',
  'disk6': '32:6',
  'gateway-admin': '10.154.20.3',
  'hostname': 'infra01-2',
  'if_b0': 'eth0',
  'if_b1': 'eth1',
  'if_b2': 'eth2',
  'ip-admin': '10.154.20.13-14',
  'netmask-admin': '255.255.255.0',
  'sn': ('CXXXB2Y1', '9TZUYDEY1'),
  'vlan-admin': '2320',
  'vlan-infra-pub': '1302'})

The spec file:

# -*- python -*-

[
    ('pdisk', 'disk0', 'ctrl', '1'),
    ('pdisk', 'disk0', 'type', 'SAS'),
    ('pdisk', 'disk0', 'id', '32:0'),
    ('pdisk', 'disk0', 'size', '278.875'),
    ('pdisk', 'disk1', 'ctrl', '1'),
    ('pdisk', 'disk1', 'type', 'SAS'),
    ('pdisk', 'disk1', 'id', '32:1'),
    ('pdisk', 'disk1', 'size', '278.875'),
    ('pdisk', 'disk2', 'ctrl', '1'),
    ('pdisk', 'disk2', 'type', 'SAS'),
    ('pdisk', 'disk2', 'id', '32:2'),
    ('pdisk', 'disk2', 'size', '278.875'),
    ('pdisk', 'disk3', 'ctrl', '1'),
    ('pdisk', 'disk3', 'type', 'SAS'),
    ('pdisk', 'disk3', 'id', '32:3'),
    ('pdisk', 'disk3', 'size', '278.875'),
    ('pdisk', 'disk4', 'ctrl', '1'),
    ('pdisk', 'disk4', 'type', 'SAS'),
    ('pdisk', 'disk4', 'id', '32:4'),
    ('pdisk', 'disk4', 'size', '278.875'),
    ('pdisk', 'disk5', 'ctrl', '1'),
    ('pdisk', 'disk5', 'type', 'SAS'),
    ('pdisk', 'disk5', 'id', '32:5'),
    ('pdisk', 'disk5', 'size', '278.875'),
    ('pdisk', 'disk6', 'ctrl', '1'),
    ('pdisk', 'disk6', 'type', 'SAS'),
    ('pdisk', 'disk6', 'id', '32:6'),
    ('pdisk', 'disk6', 'size', '278.875'),
    ('system', 'product', 'serial', '$$sn'),
]