UI does not show correct reason of deployment error

Bug #1659203 reported by Sergey Galkin
Affects              Status    Importance  Assigned to      Milestone
Fuel for OpenStack   Invalid   High        Fuel Sustaining
  Mitaka             Invalid   High        Fuel Sustaining
  Newton             Invalid   High        Fuel Sustaining

Bug Description

Steps to reproduce:
1. Install 9.0
2. Upgrade to 9.2 from http://mirror.fuel-infra.org/mos-repos/centos/mos9.0-centos7/snapshots/proposed-2017-01-13-184421/x86_64
3. Start deploying cluster (~300 nodes in my case)

The deployment failed, and the UI either shows nothing about the reason or shows incorrect reasons.
For example, in the last case the deployment failed with errors in openstack-haproxy-mysqld on node-2008 and openstack-haproxy-radosgw on node-2009, but openstack-haproxy-glance on node-2004 completed successfully.

On UI
Error
All nodes are finished. Failed tasks: Task[openstack-haproxy-glance/2004], Task[openstack-haproxy-mysqld/2008], Task[openstack-haproxy-radosgw/2009] Stopping the deployment process!

In astute.log
[root@fuel astute]# grep ERROR astute.log | tail -n 5
2017-01-25 00:41:08 ERROR [32710] Node 2140(connectivity_tests) status: error
2017-01-25 00:41:55 ERROR [32710] Node 2257(connectivity_tests) status: error
2017-01-25 00:41:58 ERROR [32710] Node 2256(connectivity_tests) status: error
2017-01-25 00:42:07 ERROR [32710] Node 2062(connectivity_tests) status: error
2017-01-25 07:34:24 ERROR [32707] Node 2009(openstack-haproxy-mysqld) status: error
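The ERROR lines in astute.log follow a fixed format, so the failing node/task pairs can be extracted mechanically. A minimal sketch against a sample file (the regex is an assumption based on the line format in the excerpt above):

```shell
# Sample lines in the format shown by astute.log (format assumed stable)
cat > /tmp/astute-sample.log <<'EOF'
2017-01-25 00:41:08 ERROR [32710] Node 2140(connectivity_tests) status: error
2017-01-25 07:34:24 ERROR [32707] Node 2009(openstack-haproxy-mysqld) status: error
EOF

# Print "node-<id> <task>" for every failed task
grep 'status: error' /tmp/astute-sample.log \
  | sed -n 's/.*Node \([0-9]*\)(\([^)]*\)).*/node-\1 \2/p'
```

This gives a quick node-to-task mapping to compare against what the UI reports as failed.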

End of puppet.log from node-2004 (no errors):
onf.d/081-glance-glare.cfg] (notice): Triggered 'refresh' from 4 events
2017-01-25 07:37:51 +0000 /Stage[main]/Openstack::Ha::Glance/Openstack::Ha::Haproxy_service[glance-glare]/Haproxy::Listen[glance-glare]/Concat[/etc/haproxy/conf.d/081-glance-glare.cfg]/File[/etc/haproxy/conf.d/081-glance-glare.cfg]/ensure (notice): defined content as '{md5}ce57858f27f649b9fffcdb33c726c211'
2017-01-25 07:40:01 +0000 /Stage[main]/Openstack::Ha::Haproxy_restart/Exec[haproxy-restart] (notice): Triggered 'refresh' from 6 events
2017-01-25 07:40:15 +0000 Puppet (notice): Finished catalog run in 389.80 seconds

root@node-2004:~# grep '(err)' /var/log/puppet.log
root@node-2004:~#

Revision history for this message
Sergey Galkin (sgalkin) wrote :

On node-2008 the same picture: puppet.log is clean.

root@node-2008:~# tail -n10 /var/log/puppet.log
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/File[/var/lib/puppet/concat/_etc_haproxy_conf.d_110-mysqld.cfg/fragments.concat]/ensure (notice): created
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/File[/var/lib/puppet/concat/_etc_haproxy_conf.d_110-mysqld.cfg/fragments]/ensure (notice): created
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat::Fragment[mysqld_listen_block]/File[/var/lib/puppet/concat/_etc_haproxy_conf.d_110-mysqld.cfg/fragments/00_mysqld_listen_block]/ensure (notice): defined content as '{md5}e6771a717f9e35cdf9b308ec983ecdc5'
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/File[/var/lib/puppet/concat/_etc_haproxy_conf.d_110-mysqld.cfg/fragments.concat.out]/ensure (notice): created
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Balancermember[mysqld]/Concat::Fragment[mysqld_balancermember_mysqld]/File[/var/lib/puppet/concat/_etc_haproxy_conf.d_110-mysqld.cfg/fragments/01-mysqld_mysqld_balancermember_mysqld]/ensure (notice): defined content as '{md5}bfe44e37d4799f9e45a6fa40e53181ab'
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/Exec[concat_/etc/haproxy/conf.d/110-mysqld.cfg]/returns (notice): executed successfully
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/Exec[concat_/etc/haproxy/conf.d/110-mysqld.cfg] (notice): Triggered 'refresh' from 4 events
2017-01-25 07:35:00 +0000 /Stage[main]/Openstack::Ha::Mysqld/Openstack::Ha::Haproxy_service[mysqld]/Haproxy::Listen[mysqld]/Concat[/etc/haproxy/conf.d/110-mysqld.cfg]/File[/etc/haproxy/conf.d/110-mysqld.cfg]/ensure (notice): defined content as '{md5}58f1465a2e801e979c0391387d5e61d1'
2017-01-25 07:36:05 +0000 /Stage[main]/Openstack::Ha::Haproxy_restart/Exec[haproxy-restart] (notice): Triggered 'refresh' from 2 events
2017-01-25 07:36:11 +0000 Puppet (notice): Finished catalog run in 71.46 seconds

root@node-2008:~# grep '(err)' /var/log/puppet.log
root@node-2008:~#
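Instead of grepping node by node, collected puppet logs can be scanned in one pass. A sketch over local copies (the paths and the node-9999 sample are hypothetical; in practice the logs would come from the nodes or a diagnostic snapshot):

```shell
# Hypothetical local copies of per-node puppet logs
mkdir -p /tmp/puppet-logs
printf '2017-01-25 07:36:11 +0000 Puppet (notice): Finished catalog run\n' \
  > /tmp/puppet-logs/node-2004.log
printf '2017-01-25 07:36:11 +0000 Puppet (err): something failed\n' \
  > /tmp/puppet-logs/node-9999.log

# Flag which logs contain error-level entries
for f in /tmp/puppet-logs/node-*.log; do
  if grep -q '(err)' "$f"; then echo "$f: has errors"; else echo "$f: clean"; fi
done
```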

Revision history for this message
Sergey Galkin (sgalkin) wrote :

Screenshot of Fuel Deployment History

tags: added: area-python
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 9.2
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

According to the astute logs, the node did not respond via the puppet MCollective agent:

2017-01-25 07:42:42 WARNING [32707] Puppet agent 2008 didn't respond within the allotted time

This seems to be an issue with network connectivity; it needs further investigation.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

So the reporting is correct; we just do not have logs from the other nodes due to networking issues, apparently related to https://bugs.launchpad.net/fuel/+bug/1659210

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

So far, the other issues are related only to cases where haproxy did not start because the sysctl ip_nonlocal_bind variable was reset to 0; we need to check why that happened.
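For context: in the Fuel HA setup haproxy binds to VIP addresses that may not be local to the node, which requires net.ipv4.ip_nonlocal_bind=1; with the value reset to 0, haproxy fails to start. A quick check on a node (the restore command is shown commented out, not run):

```shell
# Read the current value: 1 allows binding to non-local addresses, 0 forbids it
cat /proc/sys/net/ipv4/ip_nonlocal_bind

# To restore it on an affected node:
#   sysctl -w net.ipv4.ip_nonlocal_bind=1
```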

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I would suppose that the astute issues may have been related to rsyslog hogging all CPU, and/or some other service writing to the local syslog through rsyslogd, which got stuck; thus the mc agent reports could not reach astute, which marked the nodes as failed.
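A quick way to test the rsyslogd hypothesis on an affected node is to list the top CPU consumers (GNU ps assumed); if rsyslogd is pegging a core while mc agent reports are being lost, it would show up at the top:

```shell
# Top 5 processes by CPU usage, highest first (GNU ps sort option)
ps -eo pcpu,comm --sort=-pcpu | head -n 5
```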

Changed in fuel:
status: New → Incomplete
importance: Undecided → High
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

@Vladimir, FYI, the report for the root cause of the message is in bug #1659205

Roman Vyalov (r0mikiam)
Changed in fuel:
status: Incomplete → Won't Fix
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.2 → 11.0
status: Won't Fix → Incomplete
Changed in fuel:
status: Incomplete → Invalid