Comment 21 for bug 1656150

Luca Cervigni (cervigni) wrote : Re: [Bug 1656150] Re: Mcollective fails to start in bootstrap of baremetal servers

After rebuilding the bootstrap, the export line is correctly present in
/etc/rc.local, but mcollective still does not start.

1) On the first box (same hardware):

root@bootstrap:~# service mcollective status
● mcollective.service - The Marionette Collective
    Loaded: loaded (/lib/systemd/system/mcollective.service; enabled; vendor preset: enabled)
    Active: inactive (dead)

root@bootstrap:~# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

root@bootstrap:~# service mcollective start
root@bootstrap:~# service mcollective status
● mcollective.service - The Marionette Collective
    Loaded: loaded (/lib/systemd/system/mcollective.service; enabled; vendor preset: enabled)
    Active: active (running) since Wed 2017-02-01 01:26:48 UTC; 4s ago
   Process: 7521 ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid (code=exited, status=0/SUCCESS)
  Main PID: 7527 (ruby)
    CGroup: /system.slice/mcollective.service
            └─7527 ruby /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid

Feb 01 01:26:48 bootstrap systemd[7521]: mcollective.service: Executing: /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Child 7521 belongs to mcollective.service
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Control process exited, code=exited status=0
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Got final SIGCHLD for state start.
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Main PID loaded: 7527
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Supervising process 7527 which is not our child. We'll most likely not notice when it exits.
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Changed start -> running
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Job mcollective.service/start finished, result=done
Feb 01 01:26:48 bootstrap systemd[1]: Started The Marionette Collective.
Feb 01 01:26:48 bootstrap systemd[1]: mcollective.service: Child 7524 belongs to mcollective.service

The service started correctly (the node responds to mco ping).

2) On the second box:

root@bootstrap:~# service mcollective status
● mcollective.service
    Loaded: masked (/dev/null; bad)
    Active: inactive (dead)

Feb 01 01:18:34 bootstrap systemd[1]: mcollective.service: Trying to enqueue job mcollective.service/restart/replace
Warning: mcollective.service changed on disk. Run 'systemctl daemon-reload' to reload units.

root@bootstrap:~# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

root@bootstrap:~# service mcollective start
Failed to start mcollective.service: Unit mcollective.service is masked.
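
The "Loaded: masked (/dev/null; bad)" status means the unit name resolves to a symlink that points at /dev/null, which is how systemd masks a unit. A minimal sketch for checking this on a node; is_masked is a hypothetical helper name, and the unit path is taken from the "Removed symlink" message in this report:

```shell
#!/bin/sh
# Sketch: report whether a unit file path is a systemd mask,
# i.e. a symlink whose target is /dev/null.
# is_masked is a hypothetical helper, not part of systemd.
is_masked() {
    unit_path="$1"
    [ -L "$unit_path" ] && [ "$(readlink "$unit_path")" = "/dev/null" ]
}

if is_masked /etc/systemd/system/mcollective.service; then
    echo "mcollective.service is masked"
fi
```

If the unit is masked, "systemctl unmask mcollective" followed by "systemctl daemon-reload" is the usual way to clear it. Whether fix-configs-on-startup does exactly that is an assumption.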

On this second box, running fix-configs-on-startup by hand removes the mask, and the service can then be started manually:

root@bootstrap:~# fix-configs-on-startup
Removed symlink /etc/systemd/system/mcollective.service.
Failed to execute operation: Connection timed out
root@bootstrap:~# sleep 2m; service mcollective start
root@bootstrap:~# service mcollective status
● mcollective.service - The Marionette Collective
    Loaded: loaded (/lib/systemd/system/mcollective.service; enabled; vendor preset: enabled)
    Active: active (running) since Wed 2017-02-01 01:33:45 UTC; 3min 52s ago
   Process: 12371 ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid (code=exited, status=0/SUCCESS)
  Main PID: 12377 (ruby)
    CGroup: /system.slice/mcollective.service
            └─12377 ruby /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid

Feb 01 01:33:45 bootstrap systemd[12371]: mcollective.service: Executing: /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Child 12371 belongs to mcollective.service
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Control process exited, code=exited status=0
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Got final SIGCHLD for state start.
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Main PID loaded: 12377
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Supervising process 12377 which is not our child. We'll most likely not notice when it exits.
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Changed start -> running
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Job mcollective.service/start finished, result=done
Feb 01 01:33:45 bootstrap systemd[1]: Started The Marionette Collective.
Feb 01 01:33:45 bootstrap systemd[1]: mcollective.service: Child 12374 belongs to mcollective.service
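
The fixed "sleep 2m" used above is a guess at how long systemd needs after the "Connection timed out" error. The start could instead be retried until it succeeds. A minimal sketch, assuming a failed "service mcollective start" returns a non-zero exit status; retry is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries. retry is a hypothetical helper name.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (not executed here): try for up to two minutes.
# retry 12 10 service mcollective start
```

This only shortens the wait; it does not explain why the unit was masked in the first place.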

mcollective server.cfg (if this is what you want):

main_collective = mcollective
collectives = mcollective
libdir = /usr/share/mcollective/plugins
logfile = /var/log/mcollective.log
loglevel = debug
direct_addressing = 1
daemonize = 1

# Set TTL to 1.5 hours
ttl = 5400

# Plugins
securityprovider = psk
plugin.psk = unset

connector = rabbitmq
plugin.rabbitmq.vhost = mcollective
plugin.rabbitmq.pool.size = 1
plugin.rabbitmq.pool.1.host = 146.118.52.2
plugin.rabbitmq.pool.1.port = 61613
plugin.rabbitmq.pool.1.user = mcollective
plugin.rabbitmq.pool.1.password = 24RuZorKwTIJruYZAFQFJx03
plugin.rabbitmq.heartbeat_interval = 30
plugin.rabbitmq.max_hbrlck_fails = 0

# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml

identity = 32
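
To compare the connection settings between nodes without pasting the whole file, individual keys can be pulled out of server.cfg. A minimal sketch, assuming the file only contains "key = value" lines and "#" comments as shown above; cfg_get is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the value of KEY from a "key = value" config file.
# cfg_get is a hypothetical helper; usage: cfg_get KEY FILE
cfg_get() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }            # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next         # not a key = value line
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) print v
        }
    ' "$2"
}

cfg=/etc/mcollective/server.cfg
if [ -r "$cfg" ]; then
    for k in connector plugin.rabbitmq.pool.1.host plugin.rabbitmq.pool.1.port identity; do
        printf '%s = %s\n' "$k" "$(cfg_get "$k" "$cfg")"
    done
fi
```

Keys are compared as exact strings, so the dots in names such as plugin.rabbitmq.pool.1.host are not treated as wildcards.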

On 31/01/17 16:56, Georgy Kibardin wrote:
> OK, let's start with the path issue first. On the master node, edit
> /usr/share/fuel_bootstrap_cli/files/xenial/etc/rc.local and add
>
> export PATH=$PATH:/bin:/usr/bin
>
> before the fix-configs-on-startup line. Then rebuild the bootstrap:
>
> fuel-bootstrap build --activate
>
> Then reboot the problem nodes and check whether mcollective has started. If
> not, let's at least check that its config is correct.
>
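
The rc.local change quoted above can also be scripted so it is applied the same way on every rebuild. A minimal sketch, assuming rc.local contains a line that mentions fix-configs-on-startup; add_path_export is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: insert the PATH export before the first line that mentions
# fix-configs-on-startup. add_path_export is a hypothetical helper.
add_path_export() {
    file="$1"
    line='export PATH=$PATH:/bin:/usr/bin'
    # Do nothing if the exact line is already present.
    if grep -qxF "$line" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    awk -v line="$line" '
        !inserted && /fix-configs-on-startup/ { print line; inserted = 1 }
        { print }
    ' "$file" > "$tmp" || return 1
    # Overwrite in place so the file mode of rc.local is preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (not executed here), on the master node:
# add_path_export /usr/share/fuel_bootstrap_cli/files/xenial/etc/rc.local
# fuel-bootstrap build --activate
```

If no line mentions fix-configs-on-startup, the file is left unchanged.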