Fuel for OpenStack

Bug #1624013
Comment 0 for bug 1624013

Bob Ball (bob-ball) wrote on 2016-09-15:

Detailed bug description:
MOS 9 environment fails to deploy because mysql crashes on a controller node.

Puppet logs for the failed controller say:
(/Stage[main]/Cluster::Mysql/Exec[wait-initial-sync]) mysql -uclustercheck -pOObsCqCTtkLkRHK52n0H0N8O -Nbe "show status like 'wsrep_local_state_comment'" | grep -q -e Synced && sleep 10 returned 1 instead of one of [0]
(/Stage[main]/Cluster::Mysql/Exec[wait-initial-sync]) Failed to call refresh: mysql -uclustercheck -pOObsCqCTtkLkRHK52n0H0N8O -Nbe "show status like 'wsrep_local_state_comment'" | grep -q -e Synced && sleep 10 returned 1 instead of one of [0]
(/Stage[main]/Cluster::Mysql/Exec[wait-initial-sync]/returns) ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
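
For readers unfamiliar with the check that is failing above, here is a minimal sketch of the same test in Python. It only parses the tab-separated text that `mysql -Nbe "show status like 'wsrep_local_state_comment'"` prints; the function name `is_synced` is hypothetical and not part of Fuel. Note that the Puppet exec uses a substring `grep` for `Synced`, while this sketch compares the value exactly, which is an assumption about the intent.

```python
def is_synced(mysql_output: str) -> bool:
    """Return True if the wsrep state reported by mysql is 'Synced'.

    `mysql_output` is the stdout of the mysql client run with -N -b,
    i.e. lines of the form "<variable>\t<value>".
    """
    for line in mysql_output.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0] == "wsrep_local_state_comment":
            # Exact comparison; the original check is a substring grep.
            return fields[1].strip() == "Synced"
    # No status row at all, e.g. the client could not connect (ERROR 2002).
    return False
```

In the failure reported here the client cannot reach the socket, so nothing is printed on stdout and the check can never pass, regardless of how long the exec retries.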

Logging in to the failed controller node shows that mysql is indeed not running:
root@node-4:~# service mysql status
mysql stop/waiting

/var/log/mysql/error.log is attached; it shows a segfault (signal 11), possibly originating in the wsrep post-commit function:

14:22:43 UTC - mysqld got signal 11 ;
stack_bottom = 7fbebb747e88 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7fbebbf81b7c]
/usr/sbin/mysqld(handle_fatal_signal+0x3c2)[0x7fbebbcd25c2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fbeba9c7330]
/usr/sbin/mysqld(thd_get_ha_data+0xc)[0x7fbebbd1f54c]
/usr/sbin/mysqld(_Z20thd_binlog_trx_resetP3THD+0x2e)[0x7fbebbf2c79e]
/usr/sbin/mysqld(_Z17wsrep_post_commitP3THDb+0xcc)[0x7fbebbe0c32c]
/usr/sbin/mysqld(_Z12trans_commitP3THD+0x6f)[0x7fbebbdf2dcf]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x38d1)[0x7fbebbd60851]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x3c8)[0x7fbebbd649d8]
/usr/sbin/mysqld(+0x508c24)[0x7fbebbd64c24]
/usr/sbin/mysqld(_Z19do_handle_bootstrapP3THD+0x111)[0x7fbebbd64ff1]
/usr/sbin/mysqld(+0x509060)[0x7fbebbd65060]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7fbeba9bf184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fbeba0e237d]
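
To compare this trace with traces from other failed deployments, it can help to split each frame into module, symbol and offset. The following is a rough sketch only; the helper name `parse_frame` and the regular expression are assumptions based on the frame format shown above, and the symbols are left mangled (a tool such as `c++filt` could demangle them).

```python
import re

# Matches frames such as:
#   /usr/sbin/mysqld(_Z17wsrep_post_commitP3THDb+0xcc)[0x7fbebbe0c32c]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\s]+)"
    r"\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line: str):
    """Return a dict describing one stack frame, or None for other lines."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()
```

For example, `parse_frame("/usr/sbin/mysqld(_Z17wsrep_post_commitP3THDb+0xcc)[0x7fbebbe0c32c]")["symbol"]` yields the mangled name `_Z17wsrep_post_commitP3THDb`, while frames without symbol information give an empty string.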

Steps to reproduce:
Not sure which steps are needed, but my environment has:
3x Controller (4 CPU, 6GB RAM, 80GB HDD)
3x Qemu Compute/Cinder/Ceph-OSD (2 CPU, 1GB RAM, 50GB HDD)

Each host has two interfaces: PXE (eth0) and a VLAN network (eth1).
The public network is on a VLAN over eth1, and Neutron is also configured to use VLANs.

This has been reproduced several times on different hardware, with different Fuel 9 installations, with different Ubuntu repositories, and with the XenServer plugin both disabled and enabled. The MD5 sum of the ISO has been confirmed:

# md5sum MirantisOpenStack-9.0.iso
07461ba42d5056830dd6f203e8fe9691 MirantisOpenStack-9.0.iso
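
On a machine without `md5sum`, the same check could be done with a short Python sketch like the one below. The function name `md5_of_file` and the chunk size are arbitrary choices for illustration; the result should be compared against the checksum quoted above.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte ISO is not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would be along the lines of `md5_of_file("MirantisOpenStack-9.0.iso")`, assuming the ISO is in the current directory.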

Expected results:
Deployment succeeds :)

Actual result:
Deployment fails with the above errors :)

Reproducibility:
Deployments occasionally succeed, but even after a successful deployment it is not possible to add a new controller node: adding a controller fails 100% of the time.

Workaround:
None known

Impact:
Fatal. It also blocks validation of the XenServer plugin for MOS 9, since the issue occurs with the plugin installed as well.

Fuel snapshot is attached, with a second snapshot (with the XenServer plugin enabled) at https://citrix.sharefile.com/d-s4a809f3542947818