N->O upgrade on IPv6 deployment gets stuck during major-upgrade-composable-steps

Bug #1675782 reported by Sofer Athlan-Guyot on 2017-03-24
Affects: tripleo
Importance: High
Assigned to: Sofer Athlan-Guyot

Bug Description

Originally reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1430384

Run the newton->ocata upgrade workflow on an IPv6 deployment with 3
controllers, 2 compute nodes and 3 ceph nodes and the upgrade gets
stuck.

[stack@undercloud-0 ~]$ openstack stack list --nested | grep PROGRESS
| 52ca5c6b-6353-450f-9ec3-6ffb3df5da96 | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m-AllNodesPostUpgradeSteps-cob3l24fahnm-ControllerDeployment_Step1-gl6bzubujvwb | CREATE_IN_PROGRESS | 2017-03-08T13:28:05Z | None | 175caee2-c83d-4f70-b7f5-e2de6373a70b |
| 175caee2-c83d-4f70-b7f5-e2de6373a70b | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m-AllNodesPostUpgradeSteps-cob3l24fahnm | CREATE_IN_PROGRESS | 2017-03-08T13:27:25Z | None | 5d78c2c4-fa2f-4560-aa09-40939044b9bb |
| 5d78c2c4-fa2f-4560-aa09-40939044b9bb | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m | UPDATE_IN_PROGRESS | 2017-03-08T11:52:03Z | 2017-03-08T13:09:57Z | efe081d8-de20-4fef-98d8-12c23c578e6c |
| efe081d8-de20-4fef-98d8-12c23c578e6c | overcloud | UPDATE_IN_PROGRESS | 2017-03-08T11:41:47Z | 2017-03-08T13:02:34Z | None |
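When many nested stacks are in progress, the one that is actually stuck is usually the innermost one, i.e. the in-progress stack that is not the parent of any other in-progress stack. The following is only a rough sketch of how to pick it out of the table above. The function names are made up for illustration, and it assumes the six-column layout shown in this report (ID, name, status, creation time, update time, parent ID).

```python
# The four rows from this report, used here as sample input.
SAMPLE = """
| 52ca5c6b-6353-450f-9ec3-6ffb3df5da96 | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m-AllNodesPostUpgradeSteps-cob3l24fahnm-ControllerDeployment_Step1-gl6bzubujvwb | CREATE_IN_PROGRESS | 2017-03-08T13:28:05Z | None | 175caee2-c83d-4f70-b7f5-e2de6373a70b |
| 175caee2-c83d-4f70-b7f5-e2de6373a70b | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m-AllNodesPostUpgradeSteps-cob3l24fahnm | CREATE_IN_PROGRESS | 2017-03-08T13:27:25Z | None | 5d78c2c4-fa2f-4560-aa09-40939044b9bb |
| 5d78c2c4-fa2f-4560-aa09-40939044b9bb | overcloud-AllNodesDeploySteps-n7gn2tdu4r7m | UPDATE_IN_PROGRESS | 2017-03-08T11:52:03Z | 2017-03-08T13:09:57Z | efe081d8-de20-4fef-98d8-12c23c578e6c |
| efe081d8-de20-4fef-98d8-12c23c578e6c | overcloud | UPDATE_IN_PROGRESS | 2017-03-08T11:41:47Z | 2017-03-08T13:02:34Z | None |
"""


def in_progress_stacks(table_text):
    """Parse table rows and keep the stacks whose status ends in _IN_PROGRESS."""
    stacks = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and +----+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 6 or cells[0] == "ID":
            continue  # skip the header row and anything unexpected
        stack_id, name, status, _created, _updated, parent = cells
        if status.endswith("_IN_PROGRESS"):
            stacks.append({
                "id": stack_id,
                "name": name,
                "status": status,
                "parent": None if parent == "None" else parent,
            })
    return stacks


def innermost_in_progress(table_text):
    """Return in-progress stacks that are not the parent of another one."""
    stacks = in_progress_stacks(table_text)
    parent_ids = {stack["parent"] for stack in stacks}
    return [stack for stack in stacks if stack["id"] not in parent_ids]


for stack in innermost_in_progress(SAMPLE):
    print(stack["status"], stack["name"])
```

On the rows above this singles out the ControllerDeployment_Step1 stack, which matches where the controllers were waiting.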

All the controller nodes show the following in the os-collect-config log:

[root@overcloud-controller-2 heat-admin]# journalctl -fl -u os-collect-config
-- Logs begin at Wed 2017-03-08 11:47:14 UTC. --
Mar 08 13:28:17 overcloud-controller-2.localdomain os-collect-config[4244]: [2017-03-08 13:28:17,211] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/45ec9401-3381-4ed3-8066-5bf0b0a1442e.json
Mar 08 13:28:17 overcloud-controller-2.localdomain os-collect-config[4244]: [2017-03-08 13:28:17,212] (heat-config) [WARNING] Skipping config d4a79a71-3f6c-4ad0-be65-53ee87d38a18, already deployed
Mar 08 13:28:17 overcloud-controller-2.localdomain os-collect-config[4244]: [2017-03-08 13:28:17,212] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/d4a79a71-3f6c-4ad0-be65-53ee87d38a18.json
Mar 08 13:28:17 overcloud-controller-2.localdomain os-collect-config[4244]: [2017-03-08 13:28:17,212] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/puppet < /var/lib/heat-config/deployed/306ad840-29b4-4dfe-825d-0659cce43de8.json
Mar 08 13:28:23 overcloud-controller-2.localdomain su[510690]: (to rabbitmq) root on none
Mar 08 13:28:33 overcloud-controller-2.localdomain su[511048]: (to rabbitmq) root on none
Mar 08 13:28:34 overcloud-controller-2.localdomain su[511217]: (to rabbitmq) root on none
Mar 08 13:28:35 overcloud-controller-2.localdomain su[511396]: (to rabbitmq) root on none
Mar 08 13:28:36 overcloud-controller-2.localdomain su[511564]: (to rabbitmq) root on none
Mar 08 13:28:38 overcloud-controller-2.localdomain usermod[511879]: change user 'hacluster' password

The nodes do not seem to be able to join the cluster:

http://paste.openstack.org/show/601938/

ip6tables rules:
http://paste.openstack.org/show/601939/

It looks like the firewall rules are blocking the nodes from joining
the cluster. After running 'ip6tables -F', the deployment was unblocked
and the nodes were able to join the cluster.
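Before flushing everything, it can help to see which IPv6 rules are actually rejecting traffic. Below is a minimal sketch, assuming the rules were dumped with 'ip6tables -S' and saved as text. The helper name is hypothetical, and the sample ruleset is only an illustration of a typical default ruleset, not the content of the paste linked above.

```python
import shlex

# Targets that stop a packet; ACCEPT and custom chains are ignored here.
BLOCKING_TARGETS = {"DROP", "REJECT"}

# Illustrative ruleset in `ip6tables -S` format (assumed, not from the paste).
SAMPLE_RULES = """
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
"""


def blocking_rules(rules_text):
    """Return the policy and rule lines whose target is DROP or REJECT."""
    found = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) >= 3 and tokens[0] == "-P":
            # Chain policy line: -P <chain> <target>
            if tokens[2] in BLOCKING_TARGETS:
                found.append(line.strip())
        elif tokens[:1] == ["-A"] and "-j" in tokens:
            # Appended rule: the target follows -j
            jump = tokens.index("-j")
            if jump + 1 < len(tokens) and tokens[jump + 1] in BLOCKING_TARGETS:
                found.append(line.strip())
    return found


for rule in blocking_rules(SAMPLE_RULES):
    print(rule)
```

A catch-all REJECT at the end of the INPUT chain, with no earlier ACCEPT for the cluster ports, would be consistent with the symptom described here, although that is an inference rather than something confirmed from the paste.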

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Confirmed → In Progress
Changed in tripleo:
milestone: ongoing → pike-1
importance: Critical → High

Reviewed: https://review.openstack.org/449613
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=670399a2caeecd9259bea454e9518ab6c92cff49
Submitter: Jenkins
Branch: master

commit 670399a2caeecd9259bea454e9518ab6c92cff49
Author: Sofer Athlan-Guyot <email address hidden>
Date: Fri Mar 24 13:45:10 2017 +0100

    N->O upgrade, blanks ipv6 rules before activating it.

    When the firewall is enabled with ipv6, the default rules set is
    taken as not ipv6 firewall was present for Newton. This make
    communication impossible until puppet is run again.

    This ensures that no rules are loaded when the firewall is enabled.

    This mimic this patch[1]

    [1] https://github.com/openstack/tripleo-heat-templates/commit/ae8aac36143d5dadb08af0d275f513678909dcc7

    Change-Id: Id878b5caae666a799c89c8466ce46b9ecb86d9f7
    Closes-Bug: #1675782

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/450144
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=440901b5026d0927ce74ab358fbe3d430f91b38a
Submitter: Jenkins
Branch: stable/ocata

commit 440901b5026d0927ce74ab358fbe3d430f91b38a
Author: Sofer Athlan-Guyot <email address hidden>
Date: Fri Mar 24 13:45:10 2017 +0100

    N->O upgrade, blanks ipv6 rules before activating it.

    When the firewall is enabled with ipv6, the default rules set is
    taken as not ipv6 firewall was present for Newton. This make
    communication impossible until puppet is run again.

    This ensures that no rules are loaded when the firewall is enabled.

    This mimic this patch[1]

    [1] https://github.com/openstack/tripleo-heat-templates/commit/ae8aac36143d5dadb08af0d275f513678909dcc7

    Change-Id: Id878b5caae666a799c89c8466ce46b9ecb86d9f7
    Closes-Bug: #1675782
    (cherry picked from commit 670399a2caeecd9259bea454e9518ab6c92cff49)

tags: added: in-stable-ocata

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b1 development milestone.

This issue was fixed in the openstack/tripleo-heat-templates 6.1.0 release.
