
clustervision / trinityx


TrinityX is the new generation of ClusterVision's open-source HPC, AI and cloudbursting platform. It is designed from the ground up to provide all services required in a modern HPC and AI system, and to allow full customization of the installation.

License: GNU General Public License v3.0

Languages: Shell 18.40%, C 0.25%, Python 12.73%, CSS 1.49%, XSLT 0.13%, HTML 0.12%, Fortran 0.04%, Jinja 62.76%, Makefile 0.43%, Perl 3.13%, Ruby 0.51%
Topics: hpc-cluster, hpc-systems, provisioning

trinityx's People

Contributors: aphmschonewille, bartlamboo, di3go-sona, msteggink, omarelkady226, sumit42876


trinityx's Issues

Pacemaker-master script cannot configure firewalld

Version: bb7396d
Environment: dmaster-controller01 (kvm)

++ echo '[ info ]   Configure firewalld'
[ info ]   Configure firewalld
++ echo -e ''

++ /usr/bin/firewall-cmd --permanent --add-service=high-availability
FirewallD is not running

[ ERROR ]  Error during post script: /root/trinityX/configuration/ha/pacemaker-master.sh
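
A minimal guard for the pacemaker-master.sh post script, assuming it is acceptable to fall back to firewalld's offline tool when the daemon is not running (the fallback is a suggestion, not what the script currently does):

# Only call firewall-cmd when firewalld is actually running; otherwise write
# the permanent configuration directly so it is picked up on the next start.
if systemctl is-active --quiet firewalld; then
    /usr/bin/firewall-cmd --permanent --add-service=high-availability
    /usr/bin/firewall-cmd --reload
else
    firewall-offline-cmd --add-service=high-availability
fi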

node001 SSH server key changes at every reboot

[root@nuc ~]# ssh node001
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
05:42:ea:aa:a9:0a:7d:d9:69:b0:a2:ee:42:cf:f4:4b.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:1
ECDSA host key for node001 has changed and you have requested strict checking.
Host key verification failed.

Halfway between a bug and a feature... but it would be nice to add the server key when we create the image (see the sketch below).
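
A possible sketch for the image post scripts, assuming the compute image lives under a path like /trinity/images/compute (the path is an assumption): generate the host keys once inside the chroot so every boot reuses them instead of regenerating them.

# Hypothetical one-off step at image creation time.
IMAGE=/trinity/images/compute
chroot "$IMAGE" /usr/bin/ssh-keygen -A   # creates any missing host keys inside the image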

Luna post script clones the repo

The Luna post script clones the Luna repository to do the install. This breaks offline installations.

See the doc for more details.

MariaDB password is lost on configure

Each time you run the setup for MariaDB, the password is overwritten with a new one (if the variable is not already set), making the previous MariaDB installation unusable and losing the old password:

[root@localhost configuration]# cat ~/.my.cnf
[mysql]
user=root
password=Scn9S6tU
[root@localhost configuration]# ./configure controller.cfg
[root@localhost configuration]# cat ~/.my.cnf
[mysql]
user=root
password=lnJizZv3

This is obviously not acceptable and might lead to the loss of database functionality.

[root@localhost trinityX]# git rev-parse --short HEAD
fbe178c
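
A sketch of how the MariaDB post script could reuse an existing password instead of overwriting it, assuming the store_password helper seen elsewhere in the installer logs (its exact semantics here are an assumption):

# Reuse a previously stored root password; only generate a new one when none exists.
source "${TRIX_SHADOW:-/trinity/trinity.shadow}" 2>/dev/null || true
if [[ -z "${MYSQL_ROOT_PASSWORD:-}" ]]; then
    MYSQL_ROOT_PASSWORD=$(openssl rand -base64 8 | head -c 8)
    store_password MYSQL_ROOT_PASSWORD "$MYSQL_ROOT_PASSWORD"
fi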

Controller and compute node have different time zone

Branch: development
version: ccc921a
environment: tmaster-controller01(VM)

Is this expected?

[root@controller1 ~]# date +%Z
CEST

[root@node001 ~]# date +%Z
UTC
[root@node001 ~]# chronyc sources
210 Number of sources = 1

MS Name/IP address Stratum Poll Reach LastRx Last sample

^* controller1 3 7 377 82 -12us[ -20us] +/- 6763us
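
A sketch of one way to keep the zones in sync, assuming the compute image is built in a chroot under a path like /trinity/images/compute (both the path and the approach are assumptions): copy the controller's time zone into the image at build time.

# Propagate the controller's time zone into the compute image.
IMAGE=/trinity/images/compute
TIMEZONE=$(readlink /etc/localtime | sed 's|.*/zoneinfo/||')   # e.g. Europe/Amsterdam
chroot "$IMAGE" ln -sf "/usr/share/zoneinfo/${TIMEZONE}" /etc/localtime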

zabbix.sh uses echo_progress which is not exported from common_functions.sh

Version: bb5efdc
Environment: dmaster-controller01 (vm)

++ echo '[ info ]   Zabbix installation script'
[ info ]   Zabbix installation script
++ echo -e ''

++ main
++ check_zabbix_installation
++ echo_progress check_zabbix_installation
/root/trinityx/configuration/controller/zabbix.sh: line 8: echo_progress: command not found
++ local RPM_PKG_MISSING=
++ for package in '{zabbix-server-mysql,zabbix-web-mysql,mariadb-server}'
++ yum list -q installed zabbix-server-mysql
++ for package in '{zabbix-server-mysql,zabbix-web-mysql,mariadb-server}'
++ yum list -q installed zabbix-web-mysql
++ for package in '{zabbix-server-mysql,zabbix-web-mysql,mariadb-server}'
++ yum list -q installed mariadb-server
++ [[ -n '' ]]
++ setup_zabbix_database
++ echo_progress setup_zabbix_database
/root/trinityx/configuration/controller/zabbix.sh: line 26: echo_progress: command not found
++ systemctl status mariadb
++ mysql -u root -psystem -e 'use zabbix'
++ setup_zabbix_credentials
++ echo_progress setup_zabbix_credentials
/root/trinityx/configuration/controller/zabbix.sh: line 20: echo_progress: command not found
+++ get_password
++++ openssl rand -base64 8
++++ head -c 8
+++ echo 73p7/Vdl
++ ZABBIX_MYSQL_PASSWORD=73p7/Vdl
++ store_password ZABBIX_MYSQL_PASSWORD 73p7/Vdl
++ ((  2 != 2  ))
++ flag_is_unset ALT_SHADOW
++ ((  1 != 1  ))
++ name=ALT_SHADOW
++ value=
++ [[ ! -v ALT_SHADOW ]]
++ flag_is_unset TRIX_SHADOW
++ ((  1 != 1  ))
++ name=TRIX_SHADOW
++ value=/trinity/trinity.shadow
++ [[ ! -v TRIX_SHADOW ]]
++ [[ /trinity/trinity.shadow =~ ^(0|n|no)$ ]]
++ [[ -r /trinity/trinity.shadow ]]
++ SH_RO_VAR=
++ QUIET=
++ store_variable /trinity/trinity.shadow ZABBIX_MYSQL_PASSWORD 73p7/Vdl
++ ((  3 != 3  ))
+++ echo -n ZABBIX_MYSQL_PASSWORD
+++ tr -c '[:alnum:]' _
++ VARNAME=ZABBIX_MYSQL_PASSWORD
++ store_variable_backend /trinity/trinity.shadow ZABBIX_MYSQL_PASSWORD 73p7/Vdl
++ ((  3 != 3  ))
++ [[ -r /trinity/trinity.shadow ]]
++ grep -q '^declare -r ZABBIX_MYSQL_PASSWORD=' /trinity/trinity.shadow
++ [[ -w /trinity/trinity.shadow ]]
++ sed -i '/^ZABBIX_MYSQL_PASSWORD=/d' /trinity/trinity.shadow
++ line='declare -r ZABBIX_MYSQL_PASSWORD="73p7/Vdl"'
++ append_line /trinity/trinity.shadow 'declare -r ZABBIX_MYSQL_PASSWORD="73p7/Vdl"'
++ ((  2 != 2  ))
++ [[ -r /trinity/trinity.shadow ]]
++ grep -q -- '^declare -r ZABBIX_MYSQL_PASSWORD="73p7/Vdl"$' /trinity/trinity.shadow
++ flag_is_set QUIET
++ ((  1 != 1  ))
++ name=QUIET
++ value=
++ [[ -v QUIET ]]
++ [[ ! '' =~ ^(0|n|no)$ ]]
++ echo 'declare -r ZABBIX_MYSQL_PASSWORD="73p7/Vdl"'
++ mysql -u root -psystem -e 'create database zabbix character set utf8 collate utf8_bin;'
++ mysql -u root -psystem -e 'grant all privileges on zabbix.* to zabbix@localhost identified by '\''73p7/Vdl'\'';'
++ zcat /usr/share/doc/zabbix-server-mysql-3.0.3/create.sql.gz
++ mysql -uroot zabbix
gzip: /usr/share/doc/zabbix-server-mysql-3.0.3/create.sql.gz: No such file or directory
++ zabbix_server_config
++ echo_progress zabbix_server_config
/root/trinityx/configuration/controller/zabbix.sh: line 46: echo_progress: command not found
+++ readlink /etc/localtime
+++ sed 's/..\/usr\/share\/zoneinfo\///'
++ local TIMEZONE=Europe/Amsterdam
+++ hostname -s
++ sed -i -e '/^DBHost=/{h;s/=.*/=localhost/};${x;/^$/{s//DBHost=dmaster-controller01/;H};x}' /etc/zabbix/zabbix_server.conf
++ sed -i -e '/^DBName=/{h;s/=.*/=zabbix/};${x;/^$/{s//DBName=zabbix/;H};x}' /etc/zabbix/zabbix_server.conf
++ sed -i -e '/^DBUser=/{h;s/=.*/=zabbix/};${x;/^$/{s//DBUser=zabbix/;H};x}' /etc/zabbix/zabbix_server.conf
++ sed -i -e '/^DBPassword=/{h;s/=.*/=73p7/Vdl/};${x;/^$/{s//DBPassword=73p7/Vdl/;H};x}' /etc/zabbix/zabbix_server.conf
sed: -e expression #1, char 30: unknown option to `s'
++ sed -i -e '/php_value date.timezone/c\        php_value date.timezone Europe/Amsterdam' /etc/httpd/conf.d/zabbix.conf
++ printf '%b\n' '<?php' '// Zabbix GUI configuration file.' 'global $DB\n;' '$DB['\''TYPE'\'']     = '\''MYSQL'\'';' '$DB['\''SERVER'\'']   = '\''localhost'\'';' '$DB['\''PORT'\'']     = '\''0'\'';' '$DB['\''DATABASE'\''] = '\''zabbix'\'';' '$DB['\''USER'\'']     = '\''zabbix'\'';' '$DB['\''PASSWORD'\''] = '\''73p7/Vdl'\'';\n' '// Schema name. Used for IBM DB2 and PostgreSQL.' '$DB['\''SCHEMA'\''] = '\'''\'';\n' '$ZBX_SERVER      = '\''localhost'\'';' '$ZBX_SERVER_PORT = '\''10051'\'';' '$ZBX_SERVER_NAME = '\''local cluster'\'';\n' '$IMAGE_FORMAT_DEFAULT = IMAGE_FORMAT_PNG;'
++ zabbix_server_services
++ echo_progress zabbix_server_services
/root/trinityx/configuration/controller/zabbix.sh: line 74: echo_progress: command not found
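
Two things stand out in the trace: echo_progress is defined but not exported from common_functions.sh, and the sed substitution breaks when the generated password contains a /. A sketch of both fixes (whether this matches the intended design of common_functions.sh is an assumption):

# In common_functions.sh, after the existing definition of echo_progress:
export -f echo_progress   # make the function visible to child bash processes

# In zabbix.sh, use a delimiter that cannot occur in a base64 password
# (alphabet A-Z a-z 0-9 + /), so values such as 73p7/Vdl are safe:
sed -i -e "s|^DBPassword=.*|DBPassword=${ZABBIX_MYSQL_PASSWORD}|" /etc/zabbix/zabbix_server.conf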

cannot login to service mode on a slave node

Problem: putting a node into service-mode boot and trying to ssh to it results in "Connection closed by 10.30.0.3".

[root@controller images]# luna node show -nnode003
ERROR:luna.node.node003:No IPADDR for interface bmc configured
+---------------+-----------------------+
| Parameter | Value |
+---------------+-----------------------+
| name | node003 |
| bmcnetwork | None |
| group | [compute] |
| interfaces | enp4s0f0:10.30.0.3 |
| localboot | False |
| mac | 00:30:48:c4:1d:e8 |
| port | 29 |
| service | True |
| setupbmc | True |
| switch | [hp] |
+---------------+-----------------------+

[root@controller images]# luna group show -ncompute
+---------------+--------------------------------------------------------+
| Parameter | Value |
+---------------+--------------------------------------------------------+
| name | compute |
| bmcnetwork | None |
| bmcsetup | None |
| boot_if | enp4s0f0 |
| interfaces | BMC:None, ib0:None, enp4s0f0:[cluster]:10.30.0.0/16 |
| osimage | [compute] |
| partscript | mount -t tmpfs tmpfs /sysroot |
| postscript | cat </sysroot/etc/fstab |
| | tmpfs / tmpfs defaults 0 0 |
| | EOF |
| prescript | |
| torrent_if | None |
+---------------+--------------------------------------------------------+

[root@controller images]# ssh node003 -v
OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 56: Applying options for *
debug1: Connecting to node003 [10.30.0.3] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type 1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type 4
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1
debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
Read from socket failed: Connection reset by peer

Log file for configure.sh

First !

git rev-parse --short HEAD
b53e5b1

We need a log file for configure.sh rather than printing the output to the console (a minimal sketch is given below).

Default behaviour:

  • just one info line to stdout, something like "running script, log file is /var/log/trinity-installer.log"
  • log file starts with start date and time
  • ends with end date and time
  • configure.sh writes in append mode (it does not overwrite an existing log file across multiple runs)

Thanks
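
A minimal sketch of the requested behaviour (log path as suggested above; where exactly it hooks into configure.sh is left open):

# At the top of configure.sh:
LOGFILE=/var/log/trinity-installer.log
echo "Running script, log file is ${LOGFILE}"
exec >> "${LOGFILE}" 2>&1                  # append, never overwrite
echo "=== Start: $(date) ==="
trap 'echo "=== End:   $(date) ==="' EXIT
# ... rest of configure.sh ...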

/trinity/trinity.sh not created, causing mariadb-master.sh to fail

Version: 960adc1
Environment: dmaster-controller1

From trinity-installer.log:

++ POSTLIST+=(mariadb-master slurm-pre slurm luna zabbix services-cleanup drbd-master pacemaker-master)
+ source /trinity/trinity.sh
/root/trinityx/configuration/controller/mariadb-master.sh: line 4: /trinity/trinity.sh: No such file or directory

[ ERROR ]  Error during post script: /root/trinityx/configuration/controller/mariadb-master.sh

ssh into compute node fails with node name

ssh root@nodename fails

● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-08-12 14:19:09 CEST; 30min ago
 Main PID: 6062 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─6062 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

Aug 12 14:19:07 controller1 systemd[1]: Starting firewalld - dynamic firewall daemon...
Aug 12 14:19:09 controller1 systemd[1]: Started firewalld - dynamic firewall daemon.

[root@controller1 ~]# luna node list
name group mac ips
ERROR:luna.node.node001:No IPADDR for interface bmc configured
node001 [compute] 52:54:00:05:49:96 eth0:10.30.0.1,BMC:None
ERROR:luna.node.node002:No IPADDR for interface bmc configured
node002 [compute] 52:54:00:ad:b5:3a eth0:10.30.0.2,BMC:None
[root@controller1 ~]# ssh root@node001
ssh: Could not resolve hostname node001: Name or service not known
[root@controller1 ~]# ssh root@10.30.0.1
Last login: Fri Aug 12 12:08:41 2016 from controller1
[root@node001 ~]# 

Creating replication user and pacemaker healthcheck user fails

Version: 2e7d1e9
Env: VM on tmaster-controller

[ info ]   Create replication user.

ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'sql_replication'@'localhost'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'sql_replication'@'%'

[ info ]   Create user for pacemaker healthcheck.

ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'pacemaker'@'localhost'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'pacemaker'@'%'
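
These errors appear because DROP USER fails when the account does not exist yet (i.e. on a first run). A sketch of the usual workaround on MariaDB versions without DROP USER IF EXISTS (whether this fits the script is an assumption):

# GRANT USAGE creates the account if it is missing, so the following DROP USER
# can no longer fail on a fresh database.
mysql -u root -e "GRANT USAGE ON *.* TO 'sql_replication'@'localhost'; DROP USER 'sql_replication'@'localhost';"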

Images-compute.cfg: post-script run into errors

Version: 56f651d
Environment: VM

Running ./configure.sh images-compute.cfg after a successful controller installation gives the following errors.

 ----->>>  Running post script: /root/trinityx/configuration/images/additional-packages.sh  <<<-----
[ ERROR ]  Error during post script: /root/trinityx/configuration/images/additional-packages.sh
           Press Enter to continue.
<>
----->>>  Running post script: /root/trinityx/configuration/images/rdma-centos.sh  <<<-----
[ info ]   Enabling and starting the RDMA service
[ ERROR ]  Error during post script: /root/trinityx/configuration/images/rdma-centos.sh
           Press Enter to continue.
<>
[ info ]   Unbinding the host directories
[ ERROR ]  Error during post script: /root/trinityx/configuration/images/create-image.sh
           Press Enter to continue.

When I change the additional-packages and rdma scripts as below, the first two errors go away:

if flag_is_unset CHROOT_INSTALL ; then
    systemctl start rdma
fi

The third error is still there.

Hostname in hosts script (non-HA environment)

Version: eef7b9d
Environment: dmaster

The script failed because the system's hostname should be controller1 instead of controller. Is this really necessary? And if it is, could we not let the script set the hostname to the desired value immediately (see the sketch after the log below)?

################################################################################
##
##  CONFIGURATION FILE: controller.cfg
##
################################################################################


[ info ]   Running only the following scripts: hosts


[ info ]   No package file found: /home/marc/trinityX/configuration/controller/hosts.pkglist


 ----->>>  Running post script: /home/marc/trinityX/configuration/controller/hosts.sh  <<<-----

+ source /home/marc/trinityX/configuration/controller.cfg
++ POSTDIR=controller
++ POSTLIST=(standard-configuration yum-cache hosts local-repos base-packages additional-repos additional-packages yum-update firewalld chronyd environment-modules rdma-centos nfs-server openldap sssd)
++ CTRL1_HOSTNAME=controller1
++ CTRL1_IP=10.30.255.254
++ CTRL2_HOSTNAME=controller2
++ CTRL2_IP=10.30.255.253
++ CTRL_HOSTNAME=controller
++ CTRL_IP=10.30.255.252

<...>

+ echo 'HA             = not set'
HA             = not set
+ echo 'CTRL_HOSTNAME  = controller'
CTRL_HOSTNAME  = controller
+ echo 'CTRL_IP        = 10.30.255.252'
CTRL_IP        = 10.30.255.252
+ echo 'CTRL1_HOSTNAME = controller1'
CTRL1_HOSTNAME = controller1
+ echo 'CTRL1_IP       = 10.30.255.254'
CTRL1_IP       = 10.30.255.254
+ echo 'CTRL2_HOSTNAME = controller2'
CTRL2_HOSTNAME = controller2
+ echo 'CTRL2_IP       = 10.30.255.253'
CTRL2_IP       = 10.30.255.253
++ hostname -s
+ echo 'hostname       = controller'
hostname       = controller
+ flag_is_unset HA
+ ((  1 != 1  ))
+ name=HA
+ value=
+ [[ ! -v HA ]]
+ echo_info 'Non-HA configuration, adjusting the hostname and IP variables'
+ echo -e ''

+ echo '[ info ]   Non-HA configuration, adjusting the hostname and IP variables'
[ info ]   Non-HA configuration, adjusting the hostname and IP variables
+ echo -e ''

+ HA=0
+ flag_is_unset CTRL1_HOSTNAME
+ ((  1 != 1  ))
+ name=CTRL1_HOSTNAME
+ value=controller1
+ [[ ! -v CTRL1_HOSTNAME ]]
+ [[ controller1 =~ ^(0|n|no)$ ]]
+ flag_is_unset CTRL1_IP
+ ((  1 != 1  ))
+ name=CTRL1_IP
+ value=10.30.255.254
+ [[ ! -v CTRL1_IP ]]
+ [[ 10.30.255.254 =~ ^(0|n|no)$ ]]
+ CTRL_HOSTNAME=controller1
+ CTRL_IP=10.30.255.254
+ unset CTRL2_HOSTNAME CTRL2_IP
++ hostname -s
+ hname=controller
++ hostname
+ fname=controller.cluster
+ [[ controller == \c\o\n\t\r\o\l\l\e\r\.\c\l\u\s\t\e\r ]]
++ ip -o -4 addr show
++ awk -F '[ :/]+' '/scope global/ {print $2, $4}'
+ ifips='eno1 10.4.0.125'
+ case $hname in
+ echo_error 'Fatal error: the current hostname doesn'\''t match any of the controller hostnames!'
+ echo -e ''

+ echo '[ ERROR ]  Fatal error: the current hostname doesn'\''t match any of the controller hostnames!'
[ ERROR ]  Fatal error: the current hostname doesn't match any of the controller hostnames!
+ echo -e ''

+ [[ '' == x ]]
+ exit 234

[ ERROR ]  Error during post script: /home/marc/trinityX/configuration/controller/hosts.sh

           Press Enter to continue.

################################################################################

End of script: Fri Jul 22 14:05:57 CEST 2016
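
A sketch of the suggested behaviour change for hosts.sh, using the variable names printed in the log above (whether setting the hostname automatically is desirable is exactly the open question of this issue):

# Instead of aborting, align the system hostname with the configured controller name.
if [[ "$(hostname -s)" != "${CTRL_HOSTNAME}" ]]; then
    hostnamectl set-hostname "${CTRL_HOSTNAME}.cluster"   # domain taken from the existing FQDN convention
fi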

Slurmd fails on compute node

Branch: testing
version: 02f452e
environment: VM on dmaster
After a fresh configuration of a compute node, slurmd fails:

[root@node001 ~]# systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           └─customexec.conf
   Active: inactive (dead)
Condition: start condition failed at Tue 2016-08-16 12:13:48 UTC; 9min ago
           ConditionPathExists=/etc/slurm/slurm.conf was not met

Aug 16 12:13:48 node001.cluster systemd[1]: Started Slurm node daemon.
[root@node001 ~]# ps -aux | grep slurmd                                                                                                                           
root      1452  0.0  0.1 131996  1888 ?        S    12:13   0:00 /usr/sbin/slurmd
root     12073  0.0  0.0 112652   976 pts/0    R+   12:17   0:00 grep --color=auto slurmd
[root@node001 ~]# systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           └─customexec.conf
   Active: failed (Result: start-limit) since Tue 2016-08-16 12:31:37 UTC; 2s ago
  Process: 12135 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 12133 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/slurmd.service
           └─12137 /usr/sbin/slurmd

Aug 16 12:31:37 node001.cluster systemd[1]: slurmd.service holdoff time over, scheduling restart.
Aug 16 12:31:37 node001.cluster systemd[1]: start request repeated too quickly for slurmd.service
Aug 16 12:31:37 node001.cluster systemd[1]: Failed to start Slurm node daemon.
Aug 16 12:31:37 node001.cluster systemd[1]: Unit slurmd.service entered failed state.
Aug 16 12:31:37 node001.cluster systemd[1]: slurmd.service failed.

Workaround:

[root@node001 ~]# ps -aux | grep slurmd
root     12137  0.0  0.1 131996  1836 ?        S    12:31   0:00 /usr/sbin/slurmd
root     12142  0.0  0.0 112652   972 pts/0    R+   12:33   0:00 grep --color=auto slurmd
[root@node001 ~]# kill -9 12137
[root@node001 ~]# systemctl restart slurmd.service
[root@node001 ~]# systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           └─customexec.conf
   Active: active (running) since Tue 2016-08-16 12:34:51 UTC; 4s ago
  Process: 12149 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 12151 (slurmd)
   CGroup: /system.slice/slurmd.service
           └─12151 /usr/sbin/slurmd

Aug 16 12:34:51 node001.cluster systemd[1]: Starting Slurm node daemon...
Aug 16 12:34:51 node001.cluster systemd[1]: PID file /var/run/slurmd.pid not readable (yet?) after start.
Aug 16 12:34:51 node001.cluster systemd[1]: Started Slurm node daemon.

chrony.sh uses undefined variables

e99eab6

----->>> Running post script: /root/trinityx/configuration/controller/chrony.sh <<<-----

CHRONY_UPSTREAM = (unset)
CHRONY_SERVER = 1
TRIX_CTRL1_HOSTNAME = (unset)
TRIX_CTRL2_HOSTNAME = (unset)

This is with a default controller.cfg; I suspect the script is using the wrong variable names?
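
A sketch of defensive defaults for chrony.sh, assuming the TRIX_* names were meant to map onto the controller.cfg variables shown in other logs (the mapping is a guess):

# Fall back to the controller.cfg names when the newer variables are unset.
CHRONY_SERVER=${CHRONY_SERVER:-1}
TRIX_CTRL1_HOSTNAME=${TRIX_CTRL1_HOSTNAME:-$CTRL1_HOSTNAME}
TRIX_CTRL2_HOSTNAME=${TRIX_CTRL2_HOSTNAME:-$CTRL2_HOSTNAME}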

DNS woes

This ticket is to track the various issues that we are encountering with DNS handling:

  1. upstream name resolution failing after we change the nameserver to point to the local DNS
    • may be an issue with NetworkManager (do we really need it?)
    • named needs forwarders but can't handle dynamic DNS servers from DHCP (a possible workaround is sketched below)
  2. on some setups resolv.conf is overwritten at boot, on others it's not
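
A sketch of the forwarders workaround mentioned in point 1, assuming it is acceptable to snapshot the DHCP-provided resolvers at configuration time (file paths and the sed approach are assumptions):

# Build a forwarders clause from the current resolvers and splice it into named.conf.
FWD=$(awk '/^nameserver/ {printf "%s; ", $2}' /etc/resolv.conf)
sed -i "/^options {/a forwarders { ${FWD}};" /etc/named.conf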

Still cannot create luna network

Version: adda66b
Branch: DC-ha
Environment: dmaster-controller2

Now that the luna.sh script has been fixed to set the luna network name, it renders the same error message that I got when I tried to manually resolve #19:

+ echo '[ info ]   Initialize luna.'
[ info ]   Initialize luna.
+ echo -e '\x1b[39;49;00m'

+ /usr/sbin/luna cluster init
+ /usr/sbin/luna cluster change --frontend_address 10.4.255.254
+ /usr/sbin/luna network add -n cluster -N 10.4.0.0 -P 16
ERROR:luna:Inactive HA node.

[ ERROR ]  Error during post script: /root/trinityX/configuration/controller/luna.sh

Does this mean that I need to have an HA installation first?

No chronyd.pkglist

Version: 6280c87
Environment: dmaster

From trinity-installer.log:

[ info ]   No package file found: /root/trinityx/configuration/controller/chronyd.pkglist

 ----->>>  Running post script: /root/trinityx/configuration/controller/chronyd.sh  <<<-----

[ info ]   Enabling client access
sed: can't read /etc/chrony.conf: No such file or directory
[ info ]   Restarting the service
Failed to restart chronyd.service: Unit chronyd.service failed to load: No such file or directory.

[ ERROR ]  Error during post script: /root/trinityx/configuration/controller/chronyd.sh

HA installation: no password-free SSH between controllers

Version: bb7396d
Environment: dmaster-controller01 (kvm)

When running ./configure.sh -d controller-master.cfg I first got:

++ echo '[ info ]   Check if remote host is available.'
[ info ]   Check if remote host is available.
++ echo -e ''

++ /usr/bin/ssh 10.30.255.253 hostname
The authenticity of host '10.30.255.253 (10.30.255.253)' can't be established.
ECDSA key fingerprint is 2c:4b:48:db:e2:85:3e:31:e3:db:69:ab:b0:6a:78:04.
Are you sure you want to continue connecting (yes/no)?

I added the second controller to ~/.ssh/known_hosts and tried again:

++ echo '[ info ]   Check if remote host is available.'
[ info ]   Check if remote host is available.
++ echo -e ''

+++ /usr/bin/ssh 10.30.255.253 /usr/bin/hostname
root@10.30.255.253's password:

I can't find where in the scripts this should have been enabled. Have I missed something?
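
For reference, a sketch of what a controller-master post script could do once before the HA scripts start calling ssh (the peer address is the one from the log; whether key distribution should live in this script is an open question):

# Generate a key pair if needed and push it to the second controller.
[[ -f /root/.ssh/id_rsa ]] || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
ssh-copy-id -o StrictHostKeyChecking=no root@10.30.255.253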

MongoDB authentication fails in luna-master script

Version: bb7396d
Environment: dmaster-controller01 (kvm)

++ echo '[ info ]   Initiate replica set.'
[ info ]   Initiate replica set.
++ echo -e ''

++ /usr/bin/mongo -u root -psThDby65 --authenticationDatabase admin
MongoDB shell version: 2.6.12
connecting to: test
2016-09-07T15:26:25.776+0200 Error: 18 { ok: 0.0, errmsg: "auth failed", code: 18 } at src/mongo/shell/db.js:1292
exception: login failed

[ ERROR ]  Error during post script: /root/trinityX/configuration/ha/luna-master.sh

Resetting user password through obol won't work

Version: bb5efdc
Environment: dmaster-controller01 (vm)

[root@dmaster-controller01 controller]# obol -w system user add --password 123 john
[root@dmaster-controller01 controller]# sshpass -p 123 ssh john@localhost pwd
/home/john
[root@dmaster-controller01 controller]# obol -w system user reset --password 1234 john
[root@dmaster-controller01 controller]# sshpass -p 123 ssh john@localhost pwd
Permission denied, please try again.
[root@dmaster-controller01 controller]# sshpass -p 1234 ssh john@localhost pwd
Permission denied, please try again.

firewalld prevents PXE boot

02f452e

DHCP works fine, but downloading on port 7050 times out.

Boot works fine if I run systemctl stop firewalld on master.

Note: I rebooted the master node after running configure.sh

[screenshot attachment: capture]
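
A sketch of a firewalld rule that would keep the provisioning port reachable without stopping the firewall (port 7050 as reported above; whether it belongs in a dedicated zone instead is a design choice):

# Open the Luna HTTP port used during PXE boot and make it permanent.
firewall-cmd --permanent --add-port=7050/tcp
firewall-cmd --reload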

Zabbix-server exiting every few minutes

Version: 04e5b77
Environment: dmaster-controller1
From /var/log/zabbix/zabbix_server.log:

14242:20160729:161029.020 using configuration file: /etc/zabbix/zabbix_server.conf
 14242:20160729:161030.026 [Z3001] connection to database 'zabbix' failed: [2003] Can't connect to MySQL server on 'dmaster-controller1' (101)
 14242:20160729:161030.030 Cannot connect to the database. Exiting...
 14250:20160729:161040.270 Starting Zabbix Server. Zabbix 3.0.4 (revision 61185).
 14250:20160729:161040.270 ****** Enabled features ******
 14250:20160729:161040.270 SNMP monitoring:           YES
 14250:20160729:161040.270 IPMI monitoring:           YES
 14250:20160729:161040.270 Web monitoring:            YES
 14250:20160729:161040.270 VMware monitoring:         YES
 14250:20160729:161040.270 SMTP authentication:       YES
 14250:20160729:161040.270 Jabber notifications:      YES
 14250:20160729:161040.270 Ez Texting notifications:  YES
 14250:20160729:161040.270 ODBC:                      YES
 14250:20160729:161040.270 SSH2 support:              YES
 14250:20160729:161040.270 IPv6 support:              YES
 14250:20160729:161040.271 TLS support:               YES
 14250:20160729:161040.271 ******************************
 14250:20160729:161040.271 using configuration file: /etc/zabbix/zabbix_server.conf
 14250:20160729:161247.575 [Z3001] connection to database 'zabbix' failed: [2003] Can't connect to MySQL server on 'dmaster-controller1' (110)
 14250:20160729:161247.580 Cannot connect to the database. Exiting...
 14264:20160729:161257.770 Starting Zabbix Server. Zabbix 3.0.4 (revision 61185).
 14264:20160729:161257.770 ****** Enabled features ******
 14264:20160729:161257.770 SNMP monitoring:           YES
 14264:20160729:161257.770 IPMI monitoring:           YES
 14264:20160729:161257.770 Web monitoring:            YES
 14264:20160729:161257.770 VMware monitoring:         YES
 14264:20160729:161257.770 SMTP authentication:       YES
 14264:20160729:161257.770 Jabber notifications:      YES
 14264:20160729:161257.770 Ez Texting notifications:  YES
 14264:20160729:161257.770 ODBC:                      YES
 14264:20160729:161257.771 SSH2 support:              YES
 14264:20160729:161257.771 IPv6 support:              YES
 14264:20160729:161257.771 TLS support:               YES
 14264:20160729:161257.771 ******************************
 14264:20160729:161257.771 using configuration file: /etc/zabbix/zabbix_server.conf
 14264:20160729:161312.808 [Z3001] connection to database 'zabbix' failed: [2003] Can't connect to MySQL server on 'dmaster-controller1' (101)
 14264:20160729:161312.813 Cannot connect to the database. Exiting...
 14272:20160729:161323.020 Starting Zabbix Server. Zabbix 3.0.4 (revision 61185).

From /var/log/messages (grep zabbix):

Jul 29 16:30:32 dmaster-controller1 systemd: zabbix-server.service: main process exited, code=exited, status=1/FAILURE
Jul 29 16:30:32 dmaster-controller1 systemd: zabbix-server.service: control process exited, code=exited status=1
Jul 29 16:30:32 dmaster-controller1 systemd: Unit zabbix-server.service entered failed state.
Jul 29 16:30:32 dmaster-controller1 systemd: zabbix-server.service failed.
Jul 29 16:30:42 dmaster-controller1 systemd: zabbix-server.service holdoff time over, scheduling restart.
Jul 29 16:30:43 dmaster-controller1 systemd: zabbix-server.service: Supervising process 15207 which is not our child. We'll most likely not notice when it exits.
Jul 29 16:32:50 dmaster-controller1 systemd: zabbix-server.service: main process exited, code=exited, status=1/FAILURE
Jul 29 16:32:50 dmaster-controller1 systemd: zabbix-server.service: control process exited, code=exited status=1
Jul 29 16:32:50 dmaster-controller1 systemd: Unit zabbix-server.service entered failed state.
Jul 29 16:32:50 dmaster-controller1 systemd: zabbix-server.service failed.
Jul 29 16:33:00 dmaster-controller1 systemd: zabbix-server.service holdoff time over, scheduling restart.
Jul 29 16:33:00 dmaster-controller1 systemd: zabbix-server.service: Supervising process 15231 which is not our child. We'll most likely not notice when it exits.
Jul 29 16:35:07 dmaster-controller1 systemd: zabbix-server.service: main process exited, code=exited, status=1/FAILURE
Jul 29 16:35:07 dmaster-controller1 systemd: zabbix-server.service: control process exited, code=exited status=1
Jul 29 16:35:07 dmaster-controller1 systemd: Unit zabbix-server.service entered failed state.
Jul 29 16:35:07 dmaster-controller1 systemd: zabbix-server.service failed.
Jul 29 16:35:17 dmaster-controller1 systemd: zabbix-server.service holdoff time over, scheduling restart.
Jul 29 16:35:18 dmaster-controller1 systemd: zabbix-server.service: Supervising process 15242 which is not our child. We'll most likely not notice when it exits.
Jul 29 16:37:25 dmaster-controller1 systemd: zabbix-server.service: main process exited, code=exited, status=1/FAILURE
Jul 29 16:37:25 dmaster-controller1 systemd: zabbix-server.service: control process exited, code=exited status=1
Jul 29 16:37:25 dmaster-controller1 systemd: Unit zabbix-server.service entered failed state.
Jul 29 16:37:25 dmaster-controller1 systemd: zabbix-server.service failed.
Jul 29 16:37:35 dmaster-controller1 systemd: zabbix-server.service holdoff time over, scheduling restart.
Jul 29 16:37:35 dmaster-controller1 systemd: zabbix-server.service: Supervising process 15255 which is not our child. We'll most likely not notice when it exits.

luna.sh: errors in makedhcp

Version: 04e5b77 with luna.sh copied from 8809eea
Environment: dmaster-controller1

+ echo '[ info ]   Initialize luna.'
[ info ]   Initialize luna.
+ echo -e ''

+ /usr/sbin/luna cluster init
+ /usr/sbin/luna cluster change --frontend_address 10.4.0.135
+ /usr/sbin/luna network add -n cluster -N 10.4.0.0 -P 16
+ echo_info 'Configure DNS and DHCP.'
+ echo -e ''

+ echo '[ info ]   Configure DNS and DHCP.'
[ info ]   Configure DNS and DHCP.
+ echo -e ''

+ /usr/sbin/luna cluster makedhcp -N cluster -s 10.4.128.1 -e 10.4.255.200
Error: an inet prefix is expected rather than "".
Stopping local dhcpd.
Remove lease files
Starting local dhcpd.
+ /usr/sbin/luna cluster makedns
Traceback (most recent call last):
  File "/usr/sbin/luna", line 1144, in <module>
    call_fun(**args_d)
  File "/usr/sbin/luna", line 100, in cluster_makedns
    res = cluster.makedns()
  File "/usr/lib64/python2.7/luna/cluster.py", line 361, in makedns
    mutable_octet = [i for i in range(len(logical_arr1)) if not logical_arr1[i]][0]
IndexError: list index out of range

[ ERROR ]  Error during post script: /root/trinityX/configuration/controller/luna.sh

Firewalld reports errors

Version: bb7396d
Environment: dmaster-controller01 (kvm)

[root@controller1 controller]# systemctl status firewalld -l
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2016-09-07 14:50:27 CEST; 1h 31min ago
 Main PID: 17915 (code=exited, status=1/FAILURE)

Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -i docker0 -o docker0 -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -i docker0 ! -o docker0 -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER' failed: iptables: No chain/target/match by that name.
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8' failed: iptables: No chain/target/match by that name.
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -j DOCKER' failed: iptables: No chain/target/match by that name.
Sep 07 13:31:34 controller1 firewalld[17915]: 2016-09-07 13:31:34 ERROR: COMMAND_FAILED: '/sbin/iptables -w2 -t filter -C FORWARD -j DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Sep 07 14:50:27 controller1 systemd[1]: firewalld.service: main process exited, code=exited, status=1/FAILURE
Sep 07 14:50:27 controller1 systemd[1]: Unit firewalld.service entered failed state.
Sep 07 14:50:27 controller1 systemd[1]: firewalld.service failed.

Error in slurm-pre.sh

git rev-parse --short HEAD
f8fdade

[ info ]   No package file found: /root/trinityx/configuration/controller/slurm-pre.pkglist


 ----->>>  Running post script: /root/trinityx/configuration/controller/slurm-pre.sh  <<<-----


[ info ]   Creating Slurm and Munge users


[ warn ]   store_variable: destination file not RW: /trinity/trinity.sh


[ ERROR ]  Error during post script: /root/trinityx/configuration/controller/slurm-pre.sh

chronyc sources fails

02f452e

It works fine when specifying the IPv6 local address, but it should also work over IPv4 according to JF, so reporting it here.

[root@QLB-master01 trinityx]# chronyc sources
506 Cannot talk to daemon
[root@QLB-master01 trinityx]# chronyc -h ::1 sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ server.pasynike.nl            2   6   377    30    -57us[  -57us] +/-   28ms
^+ ntp1.polaire.nl               2   6   377    40   +298us[ +298us] +/-   49ms
^* services.freshdot.net         2   6   377    49   +143us[ +121us] +/-   27ms
^+ ntp01.solcon.nl               2   6   377    60   -187us[ -209us] +/-   25ms
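
A sketch of a possible workaround, assuming the problem is that chronyd's command socket is not listening on the IPv4 loopback (bindcmdaddress is a standard chrony directive; whether this is the actual cause here is an assumption):

# Make the command socket available on 127.0.0.1 as well, then restart chronyd.
grep -q '^bindcmdaddress 127.0.0.1' /etc/chrony.conf || \
    echo 'bindcmdaddress 127.0.0.1' >> /etc/chrony.conf
systemctl restart chronyd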

Re-running configure.sh script breaks openldap

 ----->>>  Installing packages: /root/trinityx/configuration/controller/openldap.pkglist  <<<-----

Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
 * base: mirror.prolocation.net
 * elrepo: ftp.nluug.nl
 * epel: epel.mirror.nucleus.be
 * extras: mirror.previder.nl
 * updates: mirror.oxilion.nl
Package openldap-servers-2.4.40-9.el7_2.x86_64 already installed and latest version
Package openldap-clients-2.4.40-9.el7_2.x86_64 already installed and latest version
Package python-retrying-1.2.3-3.el7.noarch already installed and latest version
Package python-ldap-2.4.15-2.el7.x86_64 already installed and latest version
Nothing to do

 ----->>>  Running post script: /root/trinityx/configuration/controller/openldap.sh  <<<-----

‘/usr/share/openldap-servers/DB_CONFIG.example’ -> ‘/var/lib/ldap/DB_CONFIG’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/ca_cert’ -> ‘/etc/openldap/certs/ssl/ca_cert’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/cert’ -> ‘/etc/openldap/certs/ssl/cert’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/key’ -> ‘/etc/openldap/certs/ssl/key’
Job for slapd.service failed because the control process exited with error code. See "systemctl status slapd.service" and "journalctl -xe" for details.
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
(the same message is repeated many more times)

Can't get named to work

Version: 04e5b77 with bind.sh and bind.pkglist copied from 5540558
Environment: dmaster-controller2

 ----->>>  Running post script: /root/trinityX/configuration/controller/bind.sh  <<<-----

+ echo_info 'Make named listen for requests on all interfaces'
+ echo -e ''

+ echo '[ info ]   Make named listen for requests on all interfaces'
[ info ]   Make named listen for requests on all interfaces
+ echo -e ''

+ sed -i -e 's/\(.*listen-on port 53 { \).*\( };\)/\1any;\2/' /etc/named.conf
+ echo_info 'Make named accept queries from all nodes that are not blocked by the firewall'
+ echo -e ''

+ echo '[ info ]   Make named accept queries from all nodes that are not blocked by the firewall'
[ info ]   Make named accept queries from all nodes that are not blocked by the firewall
+ echo -e ''

+ sed -i -e 's,\(.*allow-query\s.*{ \).*\( };\),\1any;\2,' /etc/named.conf
+ echo_info 'Enable and start named service'
+ echo -e ''

+ echo '[ info ]   Enable and start named service'
[ info ]   Enable and start named service
+ echo -e ''

+ systemctl enable named
+ command systemctl enable named
+ systemctl enable named
+ systemctl start named
+ command systemctl start named
+ systemctl start named
Job for named.service failed because the control process exited with error code. See "systemctl status named.service" and "journalctl -xe" for details.

[ ERROR ]  Error during post script: /root/trinityX/configuration/controller/bind.sh
[root@dmaster-controller2 configuration]# systemctl status named -l
● named.service - Berkeley Internet Name Domain (DNS)
   Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2016-08-08 10:47:52 CEST; 1min 32s ago
  Process: 1288 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z /etc/named.conf; else echo "Checking of zone files is disabled"; fi (code=exited, status=1/FAILURE)

Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: zone cluster/IN: loading from master file cluster.luna.zone failed: file not found
Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: zone cluster/IN: not loaded due to errors.
Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: _default/cluster/IN: file not found
Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: zone 255.30.10.in-addr.arpa/IN: loading from master file 255.30.10.in-addr.arpa.luna.zone failed: file not found
Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: zone 255.30.10.in-addr.arpa/IN: not loaded due to errors.
Aug 08 10:47:52 dmaster-controller2.cluster bash[1288]: _default/255.30.10.in-addr.arpa/IN: file not found
Aug 08 10:47:52 dmaster-controller2.cluster systemd[1]: named.service: control process exited, code=exited status=1
Aug 08 10:47:52 dmaster-controller2.cluster systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
Aug 08 10:47:52 dmaster-controller2.cluster systemd[1]: Unit named.service entered failed state.
Aug 08 10:47:52 dmaster-controller2.cluster systemd[1]: named.service failed.

Luna: network cannot be created

Version: 0f5b68a
Environment: dmaster

Running configure.sh staging.cfg with default settings, the luna network add command in luna.sh fails even though all the arguments seem to be there:

 ----->>>  Running post script: /home/marc/trinityX/configuration/controller/luna.sh  <<<-----


[ info ]   Check config variables available.

LUNA_FRONTEND=10.30.255.254
LUNA_NETWORK=10.30.0.0
LUNA_NETWORK_NAME=cluster
LUNA_PREFIX=16
LUNA_NETMASK=255.255.0.0
LUNA_DHCP_RANGE_START=10.30.128.1
LUNA_DHCP_RANGE_END=10.30.255.200
LUNA_MONGO_ROOT_PASS=51REl9/3
LUNA_MONGO_PASS=00+WwK7P

<...>

[ info ]   Reload systemd config.

usage: luna network add [-h] --name NAME --network N.N.N.N --prefix PP
                        [--ns_hostname NS_HOSTNAME] [--ns_ip N.N.N.N]
luna network add: error: argument --name/-n: expected one argument

[ ERROR ]  Error during post script: /home/marc/trinityX/configuration/controller/luna.sh

So no cluster or network is created:

luna network show --name cluster
ERROR:luna.network:It is needed to create object 'cluster' type 'network' first
Traceback (most recent call last):
  File "/sbin/luna", line 1144, in <module>
    call_fun(**args_d)
  File "/sbin/luna", line 400, in network_show
    net = luna.Network(name = name)
  File "/usr/lib64/python2.7/luna/network.py", line 46, in __init__
    mongo_doc = self._check_name(name, mongo_db, create, id)
  File "/usr/lib64/python2.7/luna/base.py", line 82, in _check_name
    raise RuntimeError
RuntimeError
[root@dmaster /]# luna cluster show
Traceback (most recent call last):
  File "/sbin/luna", line 1144, in <module>
    call_fun(**args_d)
  File "/sbin/luna", line 77, in cluster_show
    out_json['dhcp_net'] = '[' + cluster.get('dhcp_net') + ']'
TypeError: cannot concatenate 'str' and 'NoneType' objects

When I try to perform it manually, I get another error:

[root@dmaster configuration]# /usr/sbin/luna network add -n cluster -N 10.30.0.0 -P 16
ERROR:luna:Inactive HA node.
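
A defensive sketch for the luna network add call in luna.sh, assuming the argparse error above is caused by an empty variable expansion (variable names as printed by the script):

# Abort with a clear message if the name is empty, and quote the expansions
# so an empty value cannot swallow the next option.
/usr/sbin/luna network add \
    -n "${LUNA_NETWORK_NAME:?LUNA_NETWORK_NAME is empty}" \
    -N "${LUNA_NETWORK:?}" -P "${LUNA_PREFIX:?}"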

Failed to stop auditd.service:

[root@emaster trinityx]# git rev-parse --short HEAD
f8fdade

 ----->>>  Running post script: /root/trinityx/configuration/controller/services-cleanup.sh  <<<-----
[ info ]   Stopping unnecessary services
Failed to stop auditd.service: Operation refused, unit auditd.service may be requested by dependency only.
Removed symlink /etc/systemd/system/multi-user.target.wants/auditd.service.

The "failed to stop" message is disturbing, but maybe it's just expected.

Password loss if shadow is not sourced

Running a script that calls get_password without sourcing the shadow file deletes or replaces the password in the shadow file, even if that password already exists.

Before:

[root@localhost configuration]# cat /trinity/trinity.shadow
# Trinity shadow file
MYSQL_ROOT_PASSWORD="KakIQqgJ"
ZABBIX_MYSQL_PASSWORD="ykKNSe+7"

Running code that contains MYSQL_ROOT_PASSWORD=$(get_password $MYSQL_ROOT_PASSWORD) without sourcing $TRIX_SHADOW results in:

[root@localhost configuration]# cat /trinity/trinity.shadow
# Trinity shadow file
ZABBIX_MYSQL_PASSWORD="ykKNSe+7"
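
A sketch of the defensive pattern for callers of get_password, using the names from this report (whether get_password should source the shadow file itself is an open design question):

# Read the shadow file first so an existing password is reused instead of replaced.
SHADOW="${TRIX_SHADOW:-/trinity/trinity.shadow}"
[[ -r "$SHADOW" ]] && source "$SHADOW"
MYSQL_ROOT_PASSWORD=$(get_password "$MYSQL_ROOT_PASSWORD")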

openldap.sh does not finish when running configure.sh controller.cfg for the first time

[root@controller(tmaster) trinityx]# git rev-parse --short HEAD
100e878

I ran ./configure.sh controller.cfg without prior modifications. The openldap.sh got stuck in a loop; maybe because I did not set a root password for it?

From trinity-installer.log:

 ----->>>  Running post script: /root/trinityx/configuration/controller/openldap.sh  <<<-----
ESC[39;49;00m
‘/usr/share/openldap-servers/DB_CONFIG.example’ -> ‘/var/lib/ldap/DB_CONFIG’
‘/root/trinityx/configuration/controller/openldap/conf/ssl’ -> ‘/etc/openldap/certs/ssl’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/ca_cert’ -> ‘/etc/openldap/certs/ssl/ca_cert’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/cert’ -> ‘/etc/openldap/certs/ssl/cert’
‘/root/trinityx/configuration/controller/openldap/conf/ssl/key’ -> ‘/etc/openldap/certs/ssl/key’
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
ldap_add: Insufficient access (50)
adding new entry "cn=cosine,cn=schema,cn=config"

SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
ldap_add: Insufficient access (50)
adding new entry "cn=cosine,cn=schema,cn=config"

, and so on.
