HAProxy in High Availability Guide

Bug #1649902 reported by Johnny on 2016-12-14
This bug affects 1 person

Affects: openstack-manuals
Importance: Undecided
Assigned to: Unassigned

Bug Description

When configured as described in the guide, haproxy fails to start with these errors:
Dec 13 08:47:11 controller2 haproxy[12647]: Proxy galera_cluster started.
Dec 13 08:47:11 controller2 haproxy-systemd-wrapper[12645]: [ALERT] 347/084711 (12647) : Starting proxy glance_api_cluster: cannot bind socket [10.0.0.10:9292]
Dec 13 08:47:11 controller2 haproxy-systemd-wrapper[12645]: [ALERT] 347/084711 (12647) : Starting proxy glance_registry_cluster: cannot bind socket [10.0.0.10:9191]
Dec 13 08:47:11 controller2 haproxy-systemd-wrapper[12645]: [ALERT] 347/084711 (12647) : Starting proxy nova_compute_api_cluster: cannot bind socket [10.0.0.10:8774]
Dec 13 08:47:11 controller2 haproxy-systemd-wrapper[12645]: [ALERT] 347/084711 (12647) : Starting proxy nova_metadata_api_cluster: cannot bind socket [10.0.0.10:8775]

The controller's IP is 10.0.0.12 and the VIP is 10.0.0.10. The reason seems to be that the backend services had already occupied
the ports (8774, 9292, etc.) because they listen on 0.0.0.0!
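For anyone hitting the same symptom, a quick way to confirm which processes already hold the conflicting ports (a sketch; run as root on the controller):

```shell
# Show which processes are listening on the ports haproxy failed to bind
ss -lntp | grep -E ':(8774|8775|9191|9292) '
# On older systems without ss:
netstat -lntp | grep -E ':(8774|8775|9191|9292) '
```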

Then I added "net.ipv4.ip_nonlocal_bind=1" to /etc/sysctl.conf and rebooted the machine. haproxy now starts, but the OpenStack services can't:
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova File "/usr/lib/python2.7/dist-packages/eventlet/convenience.py", line 44, in listen
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova sock.listen(backlog)
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova File "/usr/lib/python2.7/socket.py", line 228, in meth
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova return getattr(self._sock,name)(*args)
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova error: [Errno 98] Address already in use
Dec 14 22:04:17 controller2 nova-api[17684]: 2016-12-14 22:04:17.633 17684 ERROR nova
Dec 14 22:04:17 controller2 nova-api[17684]: haoqf: import GLib now
Dec 14 22:04:17 controller2 systemd[1]: nova-api.service: Main process exited, code=exited, status=1/FAILURE
Dec 14 22:04:17 controller2 systemd[1]: nova-api.service: Unit entered failed state.
Dec 14 22:04:17 controller2 systemd[1]: nova-api.service: Failed with result 'exit-code'.

The cause is that those ports (8774, 9292, etc.) were now occupied by haproxy!
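For reference, the sysctl change mentioned above can be made persistent and applied without a reboot (a sketch; requires root):

```shell
# Persist the setting so haproxy may bind an address (the VIP) that is
# not currently assigned to this node
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
# Apply immediately instead of rebooting
sysctl -p
# Verify the new value
sysctl net.ipv4.ip_nonlocal_bind
```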

So I think the guide is wrong!
The solution might be either to change haproxy's frontends to use different ports, or to change the OpenStack services to listen on the internal
IP (10.0.0.12) rather than 0.0.0.0.
I think the latter is better, but some services have no option for it.
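As an illustration of the second option, the listen address can usually be pinned in each service's own config file. The option names below are my assumption for the nova/glance releases of that era and should be verified against the installed versions:

```ini
# /etc/nova/nova.conf -- bind the compute and metadata APIs to the
# controller's internal IP instead of 0.0.0.0
[DEFAULT]
osapi_compute_listen = 10.0.0.12
metadata_listen = 10.0.0.12

# /etc/glance/glance-api.conf
[DEFAULT]
bind_host = 10.0.0.12
```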

-----------------------------------
Release: 0.0.1 on 2016-12-13 00:09
SHA: 2ef60ea916ae1173eaf5f2c3db53d53543b0820e
Source: http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/ha-guide/source/controller-ha-haproxy.rst
URL: http://docs.openstack.org/ha-guide/controller-ha-haproxy.html

Aleksandr Didenko (adidenko) wrote :

Fuel runs haproxy via pacemaker (not via systemd/upstart), and pacemaker runs haproxy in a separate network namespace. So haproxy does not cause any problems by listening on 0.0.0.0, since it listens in a separate network namespace. You can see it via the "ip netns ls" command and then "ip netns exec haproxy ip a".

Did you try to restart haproxy via systemd/upstart? If so, you could face this problem. You should use pacemaker to control the haproxy service.
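The inspection commands mentioned above, spelled out (assuming the namespace is called haproxy):

```shell
ip netns ls                     # list all network namespaces
ip netns exec haproxy ip a      # addresses configured inside the haproxy namespace
ip netns exec haproxy ss -lnt   # sockets haproxy is listening on in there
```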

Johnny (qf-hao) wrote :

Thanks Aleksandr for your detailed explanation! I never used systemd or upstart to start haproxy; I used "service haproxy start/stop"
to do it. But now I use pacemaker to start haproxy, and I used crmsh to configure pacemaker.
Although the cluster looks like it's working, "ip netns ls" shows nothing; no namespace was created!

root@controller2:/etc/corosync# crm status
Last updated: Fri Dec 16 19:02:39 2016 Last change: Fri Dec 16 08:04:25 2016 via controller3 on controller2
Stack: corosync
Current DC: controller2 (version 1.1.14-70404b0) - partition WITHOUT quorum
2 nodes and 3 resources configured

Online: [ controller2 ]
OFFLINE: [ controller3 ]

Full list of resources:

 vip (ocf::heartbeat:IPaddr2): Started controller2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller2 ]
     Stopped: [ controller3 ]

Here is crm's config:
root@controller2:/etc/corosync# crm configure show
node 2: controller2
node 3: controller3
primitive haproxy lsb:haproxy \
 op monitor interval=1s
primitive vip IPaddr2 \
 params ip=10.0.0.10 cidr_netmask=24 \
 op monitor interval=30s
clone haproxy-clone haproxy
order haproxy-after-vip Mandatory: vip haproxy-clone
colocation vip-with-haproxy inf: vip haproxy-clone
property cib-bootstrap-options: \
 have-watchdog=false \
 dc-version=1.1.14-70404b0 \
 cluster-infrastructure=corosync \
 cluster-name=debian \
 stonith-enabled=false \
 pe-warn-series-max=1000 \
 pe-input-series-max=1000 \
 pe-error-series-max=1000 \
 cluster-recheck-interval=5min \
 no-quorum-policy=ignore

I installed pacemaker with "apt-get install pacemaker"; I'm not sure if that's enough, but all the listed components are there except heartbeat.

My corosync.conf is newer than the one in the guide, but I tried the multicast option; the only difference is that I didn't set the "ring1_xxx" options,
which I don't think matters. I'm attaching corosync.conf and /etc/haproxy/haproxy.cfg here.

root@controller2:/etc/corosync# corosync-cmapctl runtime.totem.pg.mrp.srp.members
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

root@controller2:/etc/corosync# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
 id = 10.0.0.12
 status = ring 0 active with no faults

I also didn't find any related error in /var/log/corosync/corosync.log.

I don't know if I need to set the hacluster user's password; the guide doesn't mention it.
Is there any way to debug, or a log to check, to find out why pacemaker didn't create the namespace?

Thanks a lot!

Aleksandr Didenko (adidenko) wrote :

Whoops, sorry, my comment was about Fuel setup from LP #1459456, please disregard it.

Johnny (qf-hao) wrote :

It's OK. Maybe I didn't mention clearly that I installed it according to the manual configuration guide:
http://docs.openstack.org/ha-guide/controller-ha-haproxy.html
Anyway, do you have any suggestion or workaround for this problem?
And why can't manually started pacemaker create a new namespace to avoid the problem?
Appreciate it!

Johnny (qf-hao) wrote :

After checking the latest Fuel code in fuel-library-master/files/fuel-ha-utils/ocf/ns_haproxy, it seems haproxy is started by that OCF script, not directly by pacemaker. Here is my understanding of the procedure:
1. Create the haproxy network namespace.
2. Start haproxy in the created namespace.
3. Create a veth pair and set up the host's and the namespace's IP addresses and routing rules.

I don't know if my understanding is correct. And if Fuel uses this procedure to start haproxy, is haproxy then not started
by pacemaker? I also didn't find anything in the pacemaker documentation saying it supports running haproxy in a new namespace.
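My reading of the procedure, sketched as commands (interface names and the 240.0.0.0/30 link addresses are illustrative, not taken from the Fuel script, and the veth setup has to happen before haproxy starts):

```shell
# Create the haproxy network namespace
ip netns add haproxy
# Create a veth pair and move one end into the namespace
ip link add hapr-host type veth peer name hapr-ns
ip link set hapr-ns netns haproxy
# Address the host end of the link
ip addr add 240.0.0.1/30 dev hapr-host
ip link set hapr-host up
# Address the namespace end and give the namespace a default route
ip netns exec haproxy ip link set lo up
ip netns exec haproxy ip addr add 240.0.0.2/30 dev hapr-ns
ip netns exec haproxy ip link set hapr-ns up
ip netns exec haproxy ip route add default via 240.0.0.1
# Finally, start haproxy inside the namespace
ip netns exec haproxy /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg \
    -p /var/run/haproxy.pid
```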

Meanwhile, I tried the command to start haproxy but ran into a problem:
ip netns exec test2 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid
[WARNING] 351/215752 (1018) : <debug> mode incompatible with <quiet>, <daemon> and <systemd>. Keeping <debug> only.
Available polling systems :
      epoll : pref=300, test result OK
       poll : pref=200, test result OK
     select : pref=150, test result FAILED
Total: 3 (2 usable), will use epoll.
Using epoll() as the polling mechanism.
[WARNING] 351/215752 (1018) : Server galera_cluster/controller2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 351/215753 (1018) : Backup Server galera_cluster/controller3 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 351/215753 (1018) : proxy 'galera_cluster' has no server available!

Could you please help explain these?

Thanks!

Johnny (qf-hao) wrote :

After clustercheck is set up, galera_cluster can start.
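For others hitting the same Layer4 check failures: the galera_cluster section needs an HTTP health check against clustercheck, which is conventionally served by xinetd on port 9200. A sketch of the relevant haproxy.cfg fragment, in the style of the HA guide; controller3's address 10.0.0.13 is my assumption:

```cfg
listen galera_cluster
  bind 10.0.0.10:3306
  balance source
  option httpchk
  server controller2 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 5
  server controller3 10.0.0.13:3306 backup check port 9200 inter 2000 rise 2 fall 5
```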

Johnny (qf-hao) wrote :

After confirming with the pacemaker folks, I learned that pacemaker can't start haproxy in a new network namespace
automatically. If we want that, we have to create the network namespace (or container) ourselves, manage it
as a pacemaker resource, and run haproxy inside it, much like what Fuel does.
Thx

Adding Andrew and Adam from the HA team to help triage this issue.

Hi Johnny,

Seeing as the bug you have filed is more of a troubleshooting and configuration matter, I am closing this as Invalid until you are able, at another time, to file a bug with the exact change required of the documentation team.

If you still require configuration and troubleshooting help, you will get support quicker through these channels:

- Q&A site: http://ask.openstack.org

- Mailing Lists: http://lists.openstack.org (Ask questions on the OpenStack mailing list or the OpenStack-operators mailing list. DO NOT ask support questions on the OpenStack-dev mailing list.)

- IRC: IRC channels are another way to ask questions, specifically the #openstack channel on Freenode. A list of all OpenStack IRC channels is here: https://wiki.openstack.org/wiki/IRC

Thank you :)

Alex

Changed in openstack-manuals:
status: New → Invalid
PERRY antoine (antoinep88) wrote :

Hi Johnny,

I have the same problem with haproxy. I read your description, but I don't understand how to configure the network namespace. Pacemaker is installed, but haproxy can't start (cannot bind glance, keystone, etc.).

my VIP is 10.0.0.10
controller1 10.0.0.11
controller2 10.0.0.12

I used the command: ip netns add haproxy

but what is the command to start haproxy in the haproxy network namespace?
I don't understand.

Thanks a lot

( a new french guy in openstack :) )
