apparmor profile blocks operation of haproxy loadbalancer updates

Bug #1770040 reported by Xav Paice
Affects | Status | Importance | Assigned to
OpenStack Neutron Gateway Charm | Fix Released | Medium | James Page
neutron-lbaas (Ubuntu) | Invalid | Medium | Unassigned

Bug Description

Juju 3.2.7, current 18.02 charms. Deployed an OpenStack cloud including Neutron and LBaaS.

Made a fresh set of 3 instances and assigned floating IPs to all 3. Made a security group to allow port 80 in.

Made a fresh load balancer with a listener and a TCP-based health check. Used nc on all 3 instances to listen on port 80.

Connected to the load balancer floating IP on port 80; it immediately disconnects me - there are no backends listening.

Initial haproxy status:
root@neut001:~# echo 'show stat;show table' | socat stdio /var/lib/neutron/lbaas/v2/b100e451-f3ba-46f3-bde4-274a6d15ae6d/haproxy_stats.sock
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
171808be-5e02-4bb9-8dbd-e559d541f473,FRONTEND,,,0,1,2000,1,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,

(no backends)
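The stats-socket check above can be scripted. A minimal sketch, assuming the socket path layout shown in the output above; `backend_servers` is an illustrative helper, not part of neutron-lbaas:

```python
import csv
import io
import socket


def query_stats(sock_path):
    """Send 'show stat' to a haproxy admin socket and return the raw CSV."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(b"show stat\n")
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks).decode()


def backend_servers(stats_csv):
    """Return the real server rows, excluding FRONTEND/BACKEND aggregate rows."""
    reader = csv.reader(io.StringIO(stats_csv.lstrip("# ")))
    header = next(reader)
    rows = [dict(zip(header, r)) for r in reader if r]
    return [r for r in rows if r.get("svname") not in ("FRONTEND", "BACKEND", None)]
```

An empty result from `backend_servers` corresponds to the broken state shown above: the frontend is OPEN but no pool members were ever loaded into haproxy.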

Restarted neutron-lbaasv2-agent

root@neut001:~# echo 'show stat;show table' | socat stdio /var/lib/neutron/lbaas/v2/b100e451-f3ba-46f3-bde4-274a6d15ae6d/haproxy_stats.sock
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
171808be-5e02-4bb9-8dbd-e559d541f473,FRONTEND,,,0,0,2000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
7ef8ab16-8fec-4ed5-9883-2613d4295476,632625a8-50cf-40e2-9c18-bdfdf79cac3c,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,13,0,,1,3,1,,0,,2,0,,0,L4OK,,0,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
7ef8ab16-8fec-4ed5-9883-2613d4295476,16584f9d-e035-4550-9094-b26078e1fc87,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,13,0,,1,3,2,,0,,2,0,,0,L4OK,,0,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
7ef8ab16-8fec-4ed5-9883-2613d4295476,d39e8de3-95a6-417d-81eb-4fca9079e57e,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,13,0,,1,3,3,,0,,2,0,,0,L4OK,,0,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
7ef8ab16-8fec-4ed5-9883-2613d4295476,BACKEND,0,0,0,0,200,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,13,0,,1,3,0,,0,,1,0,,0,,,,,,,,,,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,

(all 3 backends)

After restarting the service, all the traffic passes perfectly. The only thing I did was restart the service, so I suspect there's some kind of race condition and the services need to be restarted after the config is changed.

Revision history for this message
James Page (james-page) wrote :

This is an underlying neutron-lbaas issue; however, we'll need details of the OpenStack release deployed, including the specific package version of neutron-lbaasv2-agent, please!

Changed in charm-neutron-gateway:
status: New → Invalid
Changed in neutron-lbaas (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
summary: - load balancer does not forward traffic unless it's restarted
+ lbaas load balancer does not forward traffic unless agent restarted
Revision history for this message
Paul Collins (pjdc) wrote : Re: lbaas load balancer does not forward traffic unless agent restarted

The customer cloud where we're seeing this is running pike on xenial from the Ubuntu Cloud Archive.

Package version 2:11.0.2-0ubuntu1~cloud0 is what's installed on both neutron-gateway units.

Changed in neutron-lbaas (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Xav Paice (xavpaice) wrote :

Please note that this affects customers as follows:

- customer creates a lbaas, no backends come up
- we restart the service, and backends come to life
- customer creates another lbaas, the running one is fine but the new one has no backends
- we restart... etc

This means for every new load balancer, we need to restart the service to get it actually forwarding traffic.

Revision history for this message
Xav Paice (xavpaice) wrote :

Due to customer impact, have subscribed field-high.

Revision history for this message
James Page (james-page) wrote :

Thanks Paul

There is an 11.0.3 update in pike-proposed - I can't see anything definitive, but it would be good to test with that (on both neutron-gateway and neutron-api units) to see if it resolves the issue.

Revision history for this message
James Page (james-page) wrote :

Attempting to reproduce.

Revision history for this message
James Page (james-page) wrote :

I'm not able to reproduce following the lbaas v2 docs:

https://docs.openstack.org/mitaka/networking-guide/config-lbaas.html

haproxy stats reports that both backend servers are in the configuration, indicating that haproxy was reloaded when the pool was updated.

echo 'show stat;show table' | sudo socat stdio /var/lib/neutron/lbaas/v2/aa689d45-6853-44ba-8b46-a40da8663e9a/haproxy_stats.sock
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
ea3b4ef0-2cad-40b0-8051-1247c6c99bc0,FRONTEND,,,0,2,2000,4,308,848,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,1,,,,0,0,0,0,4,0,,0,1,4,,,0,0,0,0,,,,,,,,
66153c41-10d7-4f22-a63d-6ab276a0244a,57cc336c-cca9-4c8e-8fd1-680ca7379eff,0,0,0,2,,8,77,212,,0,,1,0,0,7,no check,1,1,0,,,,,,1,3,1,,8,,2,0,,2,,,,0,0,0,0,0,0,0,,,,0,0,,,,,29,,,0,0,0,0,
66153c41-10d7-4f22-a63d-6ab276a0244a,995b2445-ca83-4de4-93d0-fe106501265a,0,0,0,2,,8,231,636,,0,,3,0,0,5,no check,1,1,0,,,,,,1,3,2,,8,,2,0,,2,,,,0,0,0,0,0,0,0,,,,0,0,,,,,33,,,0,0,0,0,
66153c41-10d7-4f22-a63d-6ab276a0244a,BACKEND,0,0,0,2,200,4,308,848,0,0,,4,0,0,12,UP,2,2,0,,0,170,0,,1,3,0,,16,,1,0,,1,,,,0,0,0,0,4,0,,,,,0,0,0,0,0,0,29,,,0,0,0,0,

Revision history for this message
James Page (james-page) wrote :

I need logs from neutron-gateway and neutron-api units, as well as the exact commands the end-user is using to create the loadbalancers.

Changed in neutron-lbaas (Ubuntu):
status: Confirmed → Incomplete
assignee: nobody → James Page (james-page)
Revision history for this message
James Page (james-page) wrote :

(just to be clear that's logs from /var/log/neutron on the neutron-* units).

Revision history for this message
Nobuto Murata (nobuto) wrote :

I may be completely wrong, but one possible cause of a 503 from haproxy is AppArmor.

@Xav, what happens if you disable apparmor, i.e. aa-disable /usr/bin/neutron-lbaasv2-agent?

As you can see in an unrelated bug[1], the apparmor profile installed by the neutron-gateway charm blocks lbaasv2 if it's set to enforce mode.

[kernel log]
Sep 21 19:46:44 HOSTNAME kernel: audit: type=1400 audit(1506023204.857:304): apparmor="DENIED" operation="connect" info="Failed name lookup - disconnected path" error=-13 profile="/usr/bin/neutron-lbaasv2-agent" name="var/lib/neutron/lbaas/v2/496d6d2b-8bf7-42b7-822f-c3f31d8db43f/haproxy_stats.sock" pid=736613 comm="neutron-lbaasv2" requested_mask="wr" denied_mask="wr" fsuid=115 ouid=0

[/var/log/neutron/neutron-lbaasv2-agent.log]
2017-09-21 19:44:44.850 736613 WARNING neutron_lbaas.drivers.haproxy.namespace_driver [-] Error while connecting to stats socket: [Errno 13] EACCES

In complain mode, if you see an "ALLOWED" message for operation="connect" and info="Failed name lookup - disconnected path" but still see EACCES in the lbaasv2 log, it may be a bug in apparmor which blocks operation="connect" even in complain mode[2][3].

[1] https://bugs.launchpad.net/charm-neutron-gateway/+bug/1718768
[2] https://bugs.launchpad.net/apparmor/+bug/1624497
[3] https://bugs.launchpad.net/apparmor/+bug/1624300
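Audit records like the kernel log line above can be picked apart mechanically when grepping for denials. A small sketch; `parse_audit` is an illustrative helper, not an official audit-log parser:

```python
import re

# Matches key=value pairs, where the value is either quoted (may contain
# spaces, e.g. info="Failed name lookup - disconnected path") or bare
# (e.g. error=-13, fsuid=115).
_KV = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit(line):
    """Return a dict of key=value fields from an AppArmor audit line."""
    return {key: (quoted if quoted else bare)
            for key, quoted, bare in _KV.findall(line)}
```

With this you can filter kernel logs for `apparmor == "DENIED"` (or `"ALLOWED"` in complain mode) and group by `profile` and `name` to spot which paths a profile is blocking.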

Changed in charm-neutron-gateway:
status: Invalid → Incomplete
Revision history for this message
Xav Paice (xavpaice) wrote :

Apparmor is in 'complain' mode; the logs show the same entries, but ALLOWED rather than DENIED.

Worth trying that change first, then installing -proposed if that makes no difference - this is a production site, after all.

Revision history for this message
Jean Duminy (jeand-sydney) wrote :

James,

I'd like to add some comments.
LBaaS not serving traffic with Floating IP (DVR)
https://answers.launchpad.net/ubuntu/+question/668889

I came across this bug, which touches on a few related items, but I assume this would already have been fixed in Pike.
https://bugs.launchpad.net/neutron/+bug/1583694

"Distributed Virtual Routers are created on each Compute node dynamically on demand and removed when not required. Distributed Virtual Routers heavily depend on the port binding to identify the requirement of a DVR service on a particular node."

"This would create an issue because we will be seeing the same FloatingIP being advertised(GARP) from all nodes, and so the users on the external network will get confused on where the actual "ACTIVE" port is"

Revision history for this message
Jean Duminy (jeand-sydney) wrote :

When you restart ("After restarting the service, all the traffic passes perfectly."), this issues a GARP which re-advertises the location of the floating IP.
In our case the floating IP could be on any of the 6 compute nodes (if used by nova), or on the 2 neutron servers (used by LBaaS).

Revision history for this message
Nobuto Murata (nobuto) wrote :

So, /var/log/neutron/neutron-lbaasv2-agent.log had:
"WARNING neutron_lbaas.drivers.haproxy.namespace_driver [-] Error while connecting to stats socket: [Errno 13] EACCES: error: [Errno 13] EACCES"
with aa-profile-mode=complain.

After setting aa-profile-mode=disabled (juju config --reset), it seems to be working now (though the customer is still testing).

Revision history for this message
Xav Paice (xavpaice) wrote :

This was reproduced with a heat template, but just running the steps at the start of the case from Horizon is enough. Note that neutron-gateway was deployed with aa-profile-mode set to complain, not the default setting.

Changing this to 'disable' seems to have fixed the problem; more testing is in progress.

Revision history for this message
James Page (james-page) wrote :

The apparmor profile would appear to be the issue here.

I'll look at a fix, but as a workaround please disable the profile for gateway applications.

Changed in neutron-lbaas (Ubuntu):
status: Incomplete → Invalid
Changed in charm-neutron-gateway:
importance: Undecided → Medium
status: Incomplete → New
Changed in neutron-lbaas (Ubuntu):
assignee: James Page (james-page) → nobody
Changed in charm-neutron-gateway:
assignee: nobody → James Page (james-page)
James Page (james-page)
Changed in charm-neutron-gateway:
status: New → In Progress
James Page (james-page)
Changed in charm-neutron-gateway:
milestone: none → 18.05
summary: - lbaas load balancer does not forward traffic unless agent restarted
+ apparmor profile blocks operation of haproxy loadbalancer updates
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (master)

Fix proposed to branch: master
Review: https://review.openstack.org/568228

Revision history for this message
James Page (james-page) wrote :

Proposed apparmor profile fix is here: cs:~james-page/neutron-gateway-7

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.openstack.org/568228
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=6e3e557a0a097d79f0eaac9453e45e142fa1a24e
Submitter: Zuul
Branch: master

commit 6e3e557a0a097d79f0eaac9453e45e142fa1a24e
Author: James Page <email address hidden>
Date: Mon May 14 09:24:43 2018 +0100

    apparmor: Misc fixes for lbaasv2 profile

    Ensure that profiles are correctly applied in network
    namespace using profile flag.

    Allow lbaasv2 agent binary to read /proc/*/stat to support
    monitoring of haproxy instances.

    Change-Id: Ifc3388e894db998bfad8e5998a02120222d9e3ae
    Closes-Bug: 1770040
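The commit message above refers to a "profile flag" for network namespaces. A plausible reading, given the "Failed name lookup - disconnected path" denials earlier in this bug, is AppArmor's attach_disconnected flag. An illustrative fragment only, not the charm's actual profile:

```
# Illustrative sketch, not the shipped profile: attach_disconnected lets
# AppArmor resolve paths for processes running inside a network namespace
# (avoiding "disconnected path" denials), and the /proc rule permits
# reading per-process stat files to monitor haproxy instances.
profile neutron-lbaasv2-agent /usr/bin/neutron-lbaasv2-agent flags=(attach_disconnected) {
  /proc/*/stat r,
  /var/lib/neutron/lbaas/v2/** rw,
}
```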

Changed in charm-neutron-gateway:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (stable/18.02)

Fix proposed to branch: stable/18.02
Review: https://review.openstack.org/568391

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (stable/18.02)

Reviewed: https://review.openstack.org/568391
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=252dc547e75d5fc8e94dd15865e08c930b7e682b
Submitter: Zuul
Branch: stable/18.02

commit 252dc547e75d5fc8e94dd15865e08c930b7e682b
Author: James Page <email address hidden>
Date: Mon May 14 09:24:43 2018 +0100

    apparmor: Misc fixes for lbaasv2 profile

    Ensure that profiles are correctly applied in network
    namespace using profile flag.

    Allow lbaasv2 agent binary to read /proc/*/stat to support
    monitoring of haproxy instances.

    Change-Id: Ifc3388e894db998bfad8e5998a02120222d9e3ae
    Closes-Bug: 1770040
    (cherry picked from commit 6e3e557a0a097d79f0eaac9453e45e142fa1a24e)

Revision history for this message
Nobuto Murata (nobuto) wrote :

The patch has been backported to stable/18.02, so marking it as Fix Released.

Changed in charm-neutron-gateway:
status: Fix Committed → Fix Released