Fuel 8.0 installation fails at rabbitmq

Bug #1630290 reported by Nemanja Miletic
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
In Progress
High
Oleksiy Molchanov

Bug Description

- Fuel 8.0 master node installation gets stuck at rabbitmq and then fails.

- Issue is always with "execution expired" but not always at the same command.

- It seems that the bug is in: /etc/puppet/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb. Rescue catches wrong exception. Timeout needs to be changed to Timeout::Error. Issue is fixed upstream.

- Not sure what is causing rabitmqctl to timeout though. But bug in the previous paragraph prevents recovery.

/var/log/docker-logs/puppet/rabbitmq.log

2016-09-30 09:00:13 +0000 Puppet (debug): Executing '/usr/lib/rabbitmq/bin/rabbitmq-plugins list -E -m'
2016-09-30 09:00:23 +0000 /Stage[main]/Rabbitmq/Rabbitmq_plugin[rabbitmq_management] (err): Could not evaluate: execution expired
2016-09-30 09:00:23 +0000 Puppet (info): Computing checksum on file /etc/rabbitmq/rabbitmq-env.conf
2016-09-30 09:00:23 +0000 /Stage[main]/Rabbitmq::Config/File[rabbitmq-env.config] (info): Filebucketed /etc/rabbitmq/rabbitmq-env.conf to puppet with sum ff501ec49b7c626624023e711ee8ca08
2016-09-30 09:00:24 +0000 /Stage[main]/Rabbitmq::Config/File[rabbitmq-env.config]/content (notice): content changed '{md5}ff501ec49b7c626624023e711ee8ca08' to '{md5}254c853a0b10090495dbac8441db5793'
2016-09-30 09:00:24 +0000 /Stage[main]/Rabbitmq::Config/File[rabbitmq-env.config] (debug): The container Class[Rabbitmq::Config] will propagate my refresh event
2016-09-30 09:00:24 +0000 /Stage[main]/Rabbitmq::Config/File[rabbitmq-env.config] (info): Scheduling refresh of Class[Rabbitmq::Service]
2016-09-30 09:00:24 +0000 Class[Rabbitmq::Config] (debug): The container Stage[main] will propagate my refresh event
2016-09-30 09:00:24 +0000 Class[Rabbitmq::Config] (info): Scheduling refresh of Class[Rabbitmq::Service]
2016-09-30 09:00:24 +0000 Class[Rabbitmq::Service] (info): Scheduling refresh of Service[rabbitmq-server]
2016-09-30 09:00:24 +0000 /Stage[main]/Rabbitmq::Service/Service[rabbitmq-server] (notice): Dependency Rabbitmq_plugin[rabbitmq_management] has failures: true

- In the installation menu I made only following changes:
Network -> eth0
removed default gateway
change IP to 192.168.131.2

Network-> eth1
Enabled interface
Enabled Static IP
configured IP
configured netmask
configured default gateway

- There is enough space on disk

- Internet connection works

- It is KVM deployment with following parameters

 virt-install --name mirantis-fuel-8 --connect=qemu:///system --ram=8196 --vcpus=4 --cdrom MirantisOpenStack-8.0.iso --disk path=/var/lib/libvirt/images/mirantis-fuel-8.qcow2,format=qcow2,bus=virtio,cache=writethrough --os-type=linux --os-variant=rhel6 --network bridge=os-admin-pxe,model=virtio --network bridge=os-public,model=virtio --graphics vnc,port=7000

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Hi Nemanja,

could you attach a diagnostic snapshot to take a look at logs?

Changed in fuel:
importance: Undecided → High
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 8.0-updates
status: New → Incomplete
Revision history for this message
Nemanja Miletic (nmiletic) wrote :

Hi Stanislaw,

- Snapshots are not working:
[root@fuel cone]# fuel snapshot
Can't connect to Nailgun server!
Please check connection settings in your configuration file.

- Installation fails and services are not availabled:
[root@fuel cone]# dockerctl list -l
Name Image Status Full container name
nginx fuel/nginx_8.0 Not created fuel-core-8.0-nginx
rabbitmq fuel/rabbitmq_8.0 Running fuel-core-8.0-rabbitmq
astute fuel/astute_8.0 Not created fuel-core-8.0-astute
rsync fuel/rsync_8.0 Not created fuel-core-8.0-rsync
keystone fuel/keystone_8.0 Not created fuel-core-8.0-keystone
postgres fuel/postgres_8.0 Running fuel-core-8.0-postgres
rsyslog fuel/rsyslog_8.0 Not created fuel-core-8.0-rsyslog
nailgun fuel/nailgun_8.0 Not created fuel-core-8.0-nailgun
cobbler fuel/cobbler_8.0 Not created fuel-core-8.0-cobbler
ostf fuel/ostf_8.0 Not created fuel-core-8.0-ostf
mcollective fuel/mcollective_8.0 Not created fuel-core-8.0-mcollective

- Please find /var/log in the attachment

- Please note "execution expired" in /var/log/docker-logs/puppet/rabbitmq.log

2016-10-06 07:59:54 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 08:00:04 +0000 Puppet (err): Could not set 'present' on ensure: execution expired at 37:/etc/puppet/modules/nailgun/manifests/rabbitmq.pp
2016-10-06 08:00:04 +0000 Puppet (err): Could not set 'present' on ensure: execution expired at 37:/etc/puppet/modules/nailgun/manifests/rabbitmq.pp
Wrapped exception:
execution expired

- Please note that around that time we also have beam.smp crash

Oct 6 07:59:22 fuel kdumpctl: Starting kdump: [OK]
Oct 6 07:59:22 fuel systemd[1]: Started Crash recovery kernel arming.
Oct 6 07:59:31 fuel kernel: 2_scheduler[21218]: segfault at 8 ip 00000000005b8fb0 sp 00007f9db837db30 error 4 in beam.smp[400000+2a7000]
Oct 6 07:59:37 fuel kernel: async_5[21484]: segfault at 0 ip 00000000005b8c81 sp 00007ff06e897e00 error 4 in beam.smp[400000+2a7000]

- Bug that prevents the rabbitmqctl to be run again when it times out is in: /etc/puppet/liberty-8.0/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb. Rescue catches wrong exception. Timeout needs to be changed to Timeout::Error.

- I did not try to fix the above issue as that would require rebuilding rabbitmq docker image which I am not sure how to do properly.

Thanks,
Nemanja

Revision history for this message
Nemanja Miletic (nmiletic) wrote :

Please also find /etc directory attached.

Revision history for this message
Nemanja Miletic (nmiletic) wrote :
Download full text (3.2 KiB)

Some more data:

[root@fuel /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/os-root 9.5G 1.2G 7.9G 13% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 17M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/os-var 25G 4.6G 19G 20% /var
/dev/mapper/os-varlog 37G 56M 35G 1% /var/log
/dev/vda3 197M 108M 90M 55% /boot
/dev/vda2 200M 0 200M 0% /boot/efi
[root@fuel /]# ping www.google.com
PING www.google.com (216.58.194.164) 56(84) bytes of data.
64 bytes from sfo07s13-in-f4.1e100.net (216.58.194.164): icmp_seq=1 ttl=55 time=7.33 ms
64 bytes from sfo07s13-in-f4.1e100.net (216.58.194.164): icmp_seq=2 ttl=55 time=7.30 ms
64 bytes from sfo07s13-in-f4.1e100.net (216.58.194.164): icmp_seq=3 ttl=55 time=7.25 ms
^C
--- www.google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 7.257/7.298/7.334/0.031 ms
[root@fuel /]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
        ether 02:42:a5:96:e0:2b txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.131.2 netmask 255.255.255.0 broadcast 0.0.0.0
        inet6 fe80::5054:ff:fe3a:48f5 prefixlen 64 scopeid 0x20<link>
        ether 52:54:00:3a:48:f5 txqueuelen 1000 (Ethernet)
        RX packets 2098 bytes 154848 (151.2 KiB)
        RX errors 0 dropped 37 overruns 0 frame 0
        TX packets 37 bytes 3246 (3.1 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.46.66.2 netmask 255.255.255.240 broadcast 0.0.0.0
        inet6 fe80::5054:ff:fe22:ed9f prefixlen 64 scopeid 0x20<link>
        ether 52:54:00:22:ed:9f txqueuelen 1000 (Ethernet)
        RX packets 8020 bytes 821462 (802.2 KiB)
        RX errors 0 dropped 49 overruns 0 frame 0
        TX packets 6144 bytes 12796651 (12.2 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 4622 bytes 772120 (754.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 4622 bytes 772120 (754.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@fuel /]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.46.66.14 0.0.0.0 UG 0 0 0 eth1
10.46.66.0 0.0.0.0 255.255.255.240 U 0 0 0 eth1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
19...

Read more...

Revision history for this message
Nemanja Miletic (nmiletic) wrote :
Download full text (4.8 KiB)

- Qemu command shows that kvm acceleration is enabled:

root@ubuntu:~# ps aux | grep mirantis
libvirt+ 70024 14.4 7.7 13457696 5121832 ? Sl 00:50 9:55 qemu-system-x86_64 -enable-kvm -name mirantis-fuel-8 -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 8196 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid ce3adc0f-9534-9530-c359-39b0ae8c6dd6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mirantis-fuel-8.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/mirantis-fuel-8.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3a:48:f5,bus=pci.0,addr=0x3 -netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:22:ed:9f,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 10.46.66.1:1100,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
root 70170 0.0 0.0 10468 2144 pts/1 S+ 01:58 0:00 grep --color=auto mirantis

- Details about the VM

root@ubuntu:~# virsh dumpxml mirantis-fuel-8
<domain type='kvm' id='58'>
  <name>mirantis-fuel-8</name>
  <uuid>ce3adc0f-9534-9530-c359-39b0ae8c6dd6</uuid>
  <memory unit='KiB'>8392704</memory>
  <currentMemory unit='KiB'>8392704</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writethrough'/>
      <source file='/var/lib/libvirt/images/mirantis-fuel-8.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <a...

Read more...

Revision history for this message
Nemanja Miletic (nmiletic) wrote :

- It is not enough to change Timeout to Timeout::Error in rabbitmqctl.rb as the loop does not check if the item exists. Please take a look at the log below.

- I changed rabbitmq image like this and restarted installation:
FROM fuel/rabbitmq_8.0
RUN sed -i 's/Timeout /Timeout::Error /g' /etc/puppet/liberty-8.0/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb

- /var/log/docker-logs/puppet/rabbitmq.log now shows:

2016-10-06 11:32:29 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl -q list_vhosts'
2016-10-06 11:32:30 +0000 Puppet (debug): Command succeeded
2016-10-06 11:32:30 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 11:32:40 +0000 Puppet (debug): Fail: execution expired Retry: 1
2016-10-06 11:32:46 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 11:32:57 +0000 Puppet (debug): Fail: execution expired Retry: 2
2016-10-06 11:33:03 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 11:33:04 +0000 Puppet (debug): Fail: Execution of '/usr/sbin/rabbitmqctl add_vhost /' returned 139: Creating vhost "/" ...
Error: vhost_already_exists: /
Segmentation fault (core dumped) Retry: 3
2016-10-06 11:33:10 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 11:33:11 +0000 Puppet (debug): Fail: Execution of '/usr/sbin/rabbitmqctl add_vhost /' returned 2: Creating vhost "/" ...
Error: vhost_already_exists: / Retry: 4
2016-10-06 11:33:17 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 11:33:17 +0000 Puppet (debug): Fail: Execution of '/usr/sbin/rabbitmqctl add_vhost /' returned 2: Creating vhost "/" ...
Error: vhost_already_exists: / Retry: 5

Revision history for this message
Nemanja Miletic (nmiletic) wrote :

- OK I have finally managed to install

Replaced "/etc/puppet/liberty-8.0/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb" in rabbitmq image with:

class Puppet::Provider::Rabbitmqctl < Puppet::Provider
  initvars
  commands :rabbitmqctl => 'rabbitmqctl'

  def self.rabbitmq_version
    output = rabbitmqctl('-q', 'status')
    version = output.match(/\{rabbit,"RabbitMQ","([\d\.]+)"\}/)
    version[1] if version
  end

  # Retry the given code block 'count' retries or until the
  # command suceeeds. Use 'step' delay between retries.
  # Limit each query time by 'timeout'.
  # For example:
  # users = self.class.run_with_retries { rabbitmqctl 'list_users' }
  def self.run_with_retries(count=30, step=6, timeout=10)
    count.times do |n|
      begin
        output = Timeout::timeout(timeout) do
          yield
        end
      rescue Puppet::ExecutionFailure, Timeout::Error => e
        if e.to_s.include? "already_exists"
          return output
        else
          Puppet.debug "Fail: #{e.to_s} Retry: #{n + 1}"
          sleep step
        end
      else
        Puppet.debug 'Command succeeded'
        return output
      end
    end
    raise Puppet::Error, "Command is still failing after #{count * step} seconds expired!"
  end

end

- Now when executing expires it is retried and succeeds even if object exists. Patch is not nice but worked for me.

/var/log/docker-logs/puppet/rabbitmq.log:

2016-10-06 13:00:53 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl -q list_vhosts'
2016-10-06 13:00:54 +0000 Puppet (debug): Command succeeded
2016-10-06 13:00:54 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 13:01:04 +0000 Puppet (debug): Fail: execution expired Retry: 1
2016-10-06 13:01:10 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl add_vhost /'
2016-10-06 13:01:10 +0000 /Stage[main]/Nailgun::Rabbitmq/Rabbitmq_vhost[/]/ensure (notice): created
2016-10-06 13:01:10 +0000 /Stage[main]/Nailgun::Rabbitmq/Rabbitmq_vhost[/] (debug): The container Class[Nailgun::Rabbitmq] will propagate my refresh event
2016-10-06 13:01:10 +0000 Puppet (debug): Executing '/usr/sbin/rabbitmqctl -q list_user_permissions naily'
2016-10-06 13:01:11 +0000 Puppet (debug): Command succeeded

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Oleksiy Molchanov (omolchanov)
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/394965

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Change author: Oleksiy Molchanov <email address hidden>
Review: https://review.fuel-infra.org/37264

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Andreas Jaeger (<email address hidden>) on branch: stable/8.0
Review: https://review.opendev.org/394965
Reason: This repo is retired now, no further work will get merged.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.