9.2 deployment failed on haproxy start on one controller

Bug #1659205 reported by Sergey Galkin
This bug affects 1 person
Affects              Status    Importance  Assigned to      Milestone
Fuel for OpenStack   Invalid   High        Fuel Sustaining

Bug Description

Steps to reproduce:
1. Install 9.0
2. Upgrade to 9.2 from http://mirror.fuel-infra.org/mos-repos/centos/mos9.0-centos7/snapshots/proposed-2017-01-13-184421/x86_64
3. Start deploying cluster (~300 nodes in my case)

Deployment failed with the following error:
All nodes are finished. Failed tasks: Task[openstack-haproxy-glance/2004], Task[openstack-haproxy-mysqld/2008], Task[openstack-haproxy-radosgw/2009] Stopping the deployment process!

On node-2009
============
puppet.log:
2017-01-25 07:36:20 +0000 /Stage[main]/Openstack::Ha::Haproxy_restart/Exec[haproxy-restart] (err): /usr/lib/ocf/resource.d/fuel/ns_haproxy reload returned 1 instead of one of [0,]
Full log: http://paste.openstack.org/show/596376/

/usr/lib/ocf/resource.d/fuel/ns_haproxy reload output:
root@node-2009:/var/log# /usr/lib/ocf/resource.d/fuel/ns_haproxy reload
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: INFO: haproxy daemon is not running
ocf-ns_haproxy: INFO: Haproxy daemon is not running. Starting it.
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: INFO: haproxy daemon is not running
ocf-ns_haproxy: ERROR: [ALERT] 024/080501 (27071) : Starting proxy mysqld: cannot bind socket [10.41.0.246:3306] [ALERT] 024/080501 (27071) : Starting proxy object-storage: cannot bind socket [10.3.60.74:8080] [ALERT] 024/080501 (27071) : Starting proxy object-storage: cannot bind socket [10.41.0.246:8080]
ocf-ns_haproxy: ERROR: Error. haproxy daemon returned error 0.

I can't find 10.41.0.246 or 10.3.60.74 anywhere on this node:

root@node-2009:/var/log# ip netns exec vrouter ip -4 addr | grep -E '(10.41.0.246|10.3.60.74)'
root@node-2009:/var/log# ip netns exec haproxy ip -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
root@node-2009:/var/log# ip -4 addr | grep -E '(10.41.0.246|10.3.60.74)'
root@node-2009:/var/log#
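The manual check above can be sketched as two small shell helpers: one pulls the addresses out of the haproxy "cannot bind socket" alerts, the other tests whether an address shows up in `ip -4 addr` output. Both function names are hypothetical and are not part of the Fuel OCF scripts; this is only a rough sketch that assumes the alert format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read haproxy output on stdin and print each unique
# IPv4 address reported as "cannot bind socket [ADDR:PORT]", one per line.
failed_bind_addrs() {
    grep -oE 'cannot bind socket \[[0-9.]+:[0-9]+\]' \
        | sed -E 's/.*\[([0-9.]+):[0-9]+\]/\1/' \
        | sort -u
}

# Hypothetical helper: read "ip -4 addr" output on stdin and succeed only
# if the address given as $1 is assigned to some interface.
addr_is_local() {
    # Escape the dots so they match literally instead of any character.
    grep -qE "inet $(printf '%s' "$1" | sed 's/\./\\./g')/"
}

# Possible usage on the affected node (requires root):
#   /usr/lib/ocf/resource.d/fuel/ns_haproxy reload 2>&1 | failed_bind_addrs
#   ip netns exec haproxy ip -4 addr | addr_is_local 10.41.0.246 || echo "not local"
```

Escaping the dots avoids the loose matching of a pattern such as `10.41.0.246`, where each `.` would match any character.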

The reported failures on node-2008 and node-2004 are not real; that problem is tracked separately in
https://bugs.launchpad.net/fuel/+bug/1659203

Tags: scale
Sergey Galkin (sgalkin)
tags: added: scale
Sergey Galkin (sgalkin) wrote :

This may be relevant: /var/log/syslog on node-2009 contains
<27>Jan 25 05:49:00 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:49:30 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:50:01 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:50:31 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:51:02 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:51:32 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists
<27>Jan 25 05:52:03 node-2009 ocf-ns_vrouter: ERROR: RTNETLINK answers: File exists

Vladimir Kuklin (vkuklin) wrote :

Nope, the deployment failed because ip_nonlocal_bind was set to 0 on one of the nodes. Apparently, someone or some process reset it to 0 after the initial setup of the namespace.
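A possible way to confirm this diagnosis is sketched below. It assumes that the setting in question is the `net.ipv4.ip_nonlocal_bind` sysctl and that it is read inside the `haproxy` network namespace seen earlier in this report; both helper names are hypothetical. With the value at 0, haproxy cannot bind to addresses that are not assigned inside the namespace, which would be consistent with the "cannot bind socket" alerts and with the namespace holding only `lo`.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of net.ipv4.ip_nonlocal_bind
# inside a network namespace (default: "haproxy"). Requires root on the node.
read_nonlocal_bind() {
    ip netns exec "${1:-haproxy}" sysctl -n net.ipv4.ip_nonlocal_bind
}

# Hypothetical helper: turn the sysctl value into a short diagnosis.
describe_nonlocal_bind() {
    case "$1" in
        1) echo "ok: binding to non-local addresses is allowed" ;;
        0) echo "problem: binding to non-local addresses will fail" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Possible usage on the affected node:
#   describe_nonlocal_bind "$(read_nonlocal_bind haproxy)"
```

The diagnosis is kept separate from the read so that it can be checked without root access or an existing namespace.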

Vladimir Kuklin (vkuklin) wrote :

The workaround is to run ip netns delete haproxy on the node, restart the haproxy service there, and then retrigger the deployment.

Changed in fuel:
milestone: none → 9.2
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → High
milestone: 9.2 → 9.3
Vladimir Kuklin (vkuklin) wrote :

It seems that this is not a bug at all: ip_nonlocal_bind is set when the namespace starts, and someone just disabled it manually.

Changed in fuel:
status: New → Invalid
Sergey Galkin (sgalkin) wrote :

Running the following on node-2009 fixed the issue:
ip netns delete haproxy
/usr/lib/ocf/resource.d/fuel/ns_haproxy start
