[regression] 9.0 Mitaka MOS fails to deploy with nodes in error state

Bug #1578106 reported by Yury Tregubov
This bug affects 1 person
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  Critical    Bogdan Dobrelya
Mitaka              In Progress    Critical    Bogdan Dobrelya

Bug Description

Since about ISO #262 of 9.0 MOS Mitaka, deployment of the environment has been failing.
In about 50% of cases the deployment fails with nodes stuck in the error state.

The following error is seen in the Pacemaker logs on one of the affected environments:
warning: status_from_rc: Action 61 (p_mysqld_start_0) on node-2.test.domain.local failed (target: 0 vs. rc: 1): Error
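
For anyone scanning a diagnostic snapshot for similar failures, a minimal sketch of pulling the fields out of such a warning follows. The regular expression is derived only from the single line quoted above, and the function name `parse_status_from_rc` is hypothetical; other Pacemaker versions may format this message differently. In the quoted line the expected return code is 0 and the actual one is 1, which in the OCF convention is the generic error code.

```python
import re

# Pattern built from the one warning line quoted in this report (an assumption:
# other Pacemaker releases may word or order these fields differently).
STATUS_FROM_RC = re.compile(
    r"status_from_rc: Action (?P<action_id>\d+) \((?P<operation>[^)]+)\) "
    r"on (?P<node>\S+) failed "
    r"\(target: (?P<target_rc>\d+) vs\. rc: (?P<rc>\d+)\)"
)


def parse_status_from_rc(line):
    """Return a dict of fields from a status_from_rc warning, or None."""
    match = STATUS_FROM_RC.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("action_id", "target_rc", "rc"):
        fields[key] = int(fields[key])
    return fields


line = (
    "warning: status_from_rc: Action 61 (p_mysqld_start_0) on "
    "node-2.test.domain.local failed (target: 0 vs. rc: 1): Error"
)
print(parse_status_from_rc(line))
```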

A diagnostic snapshot from the environment is attached.

It looks like the problem appears only on environments with 3 controllers.
Test environments with only one controller have deployed fine so far.

To reproduce the problem, deploy an environment with the 9.0 Mitaka ISO #262 or later.
The environment should contain at least 3 controllers, one compute node and one storage node (Cinder or Ceph, it doesn't matter).

Yury Tregubov (ytregubov) wrote :
Changed in fuel:
assignee: nobody → Fuel Library (Deprecated) (fuel-library)
importance: Undecided → Critical
Timur Nurlygayanov (tnurlygayanov) wrote :

It looks like the deployment failed because Fuel can't install the MySQL and RabbitMQ clusters in HA mode. We can see many Puppet errors in the logs, such as:

 (/Stage[main]/Cluster::Mysql/Exec[wait-initial-sync]) Failed to call refresh: mysql -uclustercheck -pd83wmbYJXsbpeo9ZCJz3sJI9 -Nbe "show status like 'wsrep_local_state_comment'" | grep -q -e Synced && sleep 10 returned 1 instead of one of [0]

 Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)

(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:kOVE8QMA6taaje1mZZjHCbZe@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
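
The first error comes from a check that queries `wsrep_local_state_comment` and looks for `Synced`. When troubleshooting by hand on a controller, the same check can be repeated in a loop. The following is a rough sketch under assumptions, not the fuel-library implementation: the function names, the retry count and the delay are hypothetical, and only the `mysql` query itself is taken from the log above.

```python
import subprocess
import time


def query_wsrep_state(user, password):
    """Run the status query from the Puppet exec and return its raw output.

    Returns an empty string if the mysql client exits non-zero, for example
    when the server socket is not available yet.
    """
    result = subprocess.run(
        ["mysql", f"-u{user}", f"-p{password}", "-Nbe",
         "show status like 'wsrep_local_state_comment'"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else ""


def wait_for_synced(query, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll query() until its output contains 'Synced'.

    attempts and delay are illustrative defaults, not values used by Fuel.
    Returns True once the node reports Synced, False if attempts run out.
    """
    for attempt in range(attempts):
        if "Synced" in query():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example use on a controller (credentials are placeholders):
# ok = wait_for_synced(lambda: query_wsrep_state("clustercheck", "<password>"))
```

Passing the query as a callable keeps the retry logic separate from the `mysql` client call, so the loop can be exercised without a running Galera node.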

This is a blocker for the QA team because many of the test configurations we use to run Tempest tests fail with this error.
Priority changed to Critical because deployment of the environment fails and we don't have a workaround yet.

tags: added: blocker-for-qa
tags: added: area-library
summary: - 9.0 mitaka mos is failed to deploy with nodes in error state
+ [regression] 9.0 mitaka mos is failed to deploy with nodes in error
+ state
tags: added: mysql
Bogdan Dobrelya (bogdando) wrote :

Can't be troubleshot because it is blocked by https://bugs.launchpad.net/fuel/+bug/1575777

Bogdan Dobrelya (bogdando) wrote :

Requires a live lab where the issue has been reproduced

Bogdan Dobrelya (bogdando) wrote :

Addressed by (missed backport) https://review.openstack.org/#/c/312415/

Bogdan Dobrelya (bogdando) wrote :

Addressed by (missed backport) https://review.openstack.org/#/c/312416/

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/newton