Description of problem:
On a fresh deployment of Queens with ODL Oxygen, we are seeing haproxy-bundle containers killed on two of the controllers (the controllers without the VIP).
Version-Release number of selected component (if applicable):
OSP 13
How reproducible:
100%
Steps to Reproduce:
1. Deploy OpenStack Queens with ODL
2. use pcs status to view status
Actual results:
haproxy containers killed on controller-1 and controller-2
Expected results:
haproxy containers should be started on all controllers
Additional info:
[root@overcloud-controller-0 heat-admin]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-2 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Mon Apr 9 23:44:07 2018
Last change: Mon Apr 9 23:44:05 2018 by hacluster via crmd on overcloud-controller-2
12 nodes configured
37 resources configured
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ]
Full list of resources:
Docker container set: rabbitmq-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-rabbitmq:pcmklatest]
rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
Docker container set: galera-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-mariadb:pcmklatest]
galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
Docker container set: redis-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-redis:pcmklatest]
redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
ip-192.168.24.54 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.21.0.100 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.16.0.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.16.0.14 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.18.0.18 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.19.0.13 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Docker container set: haproxy-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-haproxy:pcmklatest]
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Stopped
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Stopped
Docker container: openstack-cinder-volume [docker-registry.engineering.redhat.com/rhosp13/openstack-cinder-volume:pcmklatest]
openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-1
==============================================================================
In /var/log/messages
Apr 9 19:49:09 overcloud-controller-1 docker(haproxy-bundle-docker-2)[886021]: ERROR: Newly created docker container exited after start
Apr 9 19:49:09 overcloud-controller-1 lrmd[20848]: notice: haproxy-bundle-docker-2_start_0:886021:stderr [ ocf-exit-reason:waiting on monitor_cmd to pass after start ]
Apr 9 19:49:09 overcloud-controller-1 lrmd[20848]: notice: haproxy-bundle-docker-2_start_0:886021:stderr [ ocf-exit-reason:Newly created docker container exited after start ]
Apr 9 19:49:09 overcloud-controller-1 crmd[20851]: notice: Result of start operation for haproxy-bundle-docker-2 on overcloud-controller-1: 1 (unknown error)
Apr 9 19:49:09 overcloud-controller-1 crmd[20851]: notice: overcloud-controller-1-haproxy-bundle-docker-2_start_0:159 [ ocf-exit-reason:waiting on monitor_cmd to pass after start\nocf-exit-reason:Newly created docker container exited after start\n ]
Apr 9 19:49:10 overcloud-controller-1 dockerd-current: time="2018-04-09T23:49:10.004764059Z" level=error msg="Handler for POST /v1.26/containers/haproxy-bundle-docker-2/stop?t=10 returned error: Container haproxy-bundle-docker-2 is already stopped"
Apr 9 19:49:10 overcloud-controller-1 dockerd-current: time="2018-04-09T23:49:10.005303162Z" level=error msg="Handler for POST /v1.26/containers/haproxy-bundle-docker-2/stop returned error: Container haproxy-bundle-docker-2 is already stopped"
Apr 9 19:49:10 overcloud-controller-1 docker(haproxy-bundle-docker-2)[886633]: INFO: haproxy-bundle-docker-2
Apr 9 19:49:10 overcloud-controller-1 docker(haproxy-bundle-docker-2)[886633]: NOTICE: Cleaning up inactive container, haproxy-bundle-docker-2.
Apr 9 19:49:10 overcloud-controller-1 docker(haproxy-bundle-docker-2)[886633]: INFO: haproxy-bundle-docker-2
Apr 9 19:49:10 overcloud-controller-1 crmd[20851]: notice: Result of stop operation for haproxy-bundle-docker-2 on overcloud-controller-1: 0 (ok)
Comment from Raoul:
On the machines we see that the problem is specific to the container, and you can reproduce it by starting the container by hand:
[ALERT] 099/133036 (10) : Starting proxy opendaylight_ws: cannot bind socket [172.16.0.15:8185]
[ALERT] 099/133036 (10) : Starting proxy opendaylight_ws: cannot bind socket [192.168.24.59:8185]
This should mean that the ports haproxy wants to use are occupied by something, but in fact what we have on the controller is:
[root@overcloud-controller-1 heat-admin]# netstat -nlp|grep 8185
tcp 0 0 172.16.0.20:8185 0.0.0.0:* LISTEN 496289/java
So the machine's local IP, 172.16.0.20, correctly listens with the opendaylight service (driven by the container), and nothing else is on the port.
One notable detail is that controller-1 does not have any VIP on it, and the problem does not happen on controller-0, where the VIP lives.
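The bind failures are consistent with that detail: haproxy's config asks it to bind the VIP addresses, and on a host where the VIP is not configured the kernel refuses the bind with EADDRNOTAVAIL. A hypothetical demo of that failure mode (not taken from the report; 192.0.2.1 from TEST-NET-1 stands in for the absent VIP):

```shell
# Binding a socket to an address this host does not own fails with
# EADDRNOTAVAIL ("Cannot assign requested address"), which haproxy
# surfaces as "cannot bind socket".
python3 - <<'PY'
import errno, socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("192.0.2.1", 8185))   # address not assigned to any local interface
    print("bound")
except OSError as e:
    # On Linux this is errno 99, EADDRNOTAVAIL
    print("EADDRNOTAVAIL" if e.errno == errno.EADDRNOTAVAIL else e.errno)
finally:
    s.close()
PY
```

If this is the cause, a bind on an address the host does own (e.g. 172.16.0.20 on controller-1) would succeed, matching the netstat output above.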
Commenting out the opendaylight_ws section in /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg on the machine makes haproxy start, but it remains to be understood why it cannot bind the port.
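A sketch of that workaround as a sed command (assumptions: the section starts with a `listen opendaylight_ws` line and ends at the next blank line, as in typical puppet-generated haproxy configs; demonstrated here on a sample file, since on the controller the real target is /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg followed by a `pcs resource restart haproxy-bundle`):

```shell
# Build a minimal sample config with two listen sections
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
listen opendaylight_ws
  bind 172.16.0.15:8185
  bind 192.168.24.59:8185

listen keystone_public
  bind 172.16.0.15:5000
EOF

# Comment out only the opendaylight_ws section: from its "listen" line
# to the next blank line, prefix every non-empty line with "#"
sed -i '/^listen opendaylight_ws/,/^$/ s/^./#&/' "$cfg"

cat "$cfg"
```

Other sections (keystone_public here) are left untouched, so haproxy still serves the remaining frontends after a restart.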