Race condition causes keepalived to fail, namespace not fully configured

Bug #1695087 reported by Leandro Reox
This bug affects 1 person
Affects: octavia
Status: Fix Released
Importance: Critical
Assigned to: Michael Johnson

Bug Description

Steps to reproduce:

With an ACTIVE_STANDBY pair fully working:

1 - Shut off the master node
2 - Traffic redirects to the backup node just fine
3 - Restart the master node (since there is no preempt, the VIP stays on the backup)
4 - Shut off the backup node
5 - Traffic passing through the LB is interrupted
6 - On the restarted node, keepalived failed to start. Looking at the logs:

● octavia-keepalived.service - Keepalive Daemon (LVS and VRRP)
   Loaded: loaded (/usr/lib/systemd/system/octavia-keepalived.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2017-06-01 20:37:26 UTC; 14min ago
  Process: 1010 ExecStart=/sbin/ip netns exec amphora-haproxy /usr/sbin/keepalived -D -d -f /var/lib/octavia/vrrp/octavia-keepalived.conf (code=exited, status=1/FAILURE)

Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 systemd[1]: Starting Keepalive Daemon (LVS and VRRP)...
Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 ip[1010]: Cannot open network namespace "amphora-haproxy": No such file or directory
Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 systemd[1]: octavia-keepalived.service: Control process exited, code=exited status=1
Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 systemd[1]: Failed to start Keepalive Daemon (LVS and VRRP).
Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 systemd[1]: octavia-keepalived.service: Unit entered failed state.
Jun 01 20:37:26 amphora-02288985-f910-4c56-9253-9bcd0d0a8bc4 systemd[1]: octavia-keepalived.service: Failed with result 'exit-code'.

7 - So the namespace is still not fully configured, or not even present, when keepalived tries to start inside it, but it gets configured just fine once the node finishes booting.
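
To check the "namespace not there yet" theory on a rebooted amphora, something like the following could be used. It is only a sketch: it assumes the usual iproute2 behaviour of exposing named namespaces as entries under /var/run/netns, and the helper name `wait_for_netns` and its parameters are made up for illustration, not part of Octavia.

```python
import os
import time


def wait_for_netns(name, timeout=30.0, interval=0.5, netns_dir="/var/run/netns"):
    """Poll until a named network namespace entry exists or the timeout expires.

    Assumption: named namespaces appear as <netns_dir>/<name>, which is where
    `ip netns exec` looks before it reports "Cannot open network namespace".
    Returns True if the entry showed up in time, False otherwise.
    """
    path = os.path.join(netns_dir, name)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_netns("amphora-haproxy", timeout=60)` right after boot and noting how long it takes to return would show whether the namespace simply appears later than the keepalived start attempt in the log above.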

Changed in octavia:
assignee: nobody → Michael Johnson (johnsom)
importance: Undecided → Critical
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (master)

Fix proposed to branch: master
Review: https://review.openstack.org/470051

Changed in octavia:
status: Triaged → In Progress
Leandro Reox (leandro-reox) wrote : Re: Race condition causes keepalived to fail, namepsace not fully configured

Tested the patch, worked perfectly. Thanks!

Elena Ezhova (eezhova)
summary: - Race condition causes keepalived to fail, namepsace not fully configured
+ Race condition causes keepalived to fail, namespace not fully configured
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (master)

Reviewed: https://review.openstack.org/470051
Committed: https://git.openstack.org/cgit/openstack/octavia/commit/?id=e4155624007337c4375eb5fe4ecc5b686ddaa2d2
Submitter: Jenkins
Branch: master

commit e4155624007337c4375eb5fe4ecc5b686ddaa2d2
Author: Michael Johnson <email address hidden>
Date: Thu Jun 1 15:43:24 2017 -0700

    Fix keepalived systemd race with haproxy namespace

    If an amphora gets rebooted there is a race condition in the systemd
    configurations where keepalived may start before the network namespace
    is restored by the haproxy processes. This patch makes sure the
    haproxy services start before the keepalived process.

    Change-Id: I0839161181143aee119c9a449f27671e5c5d2dd0
    Closes-Bug: #1695087
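
For anyone wanting to see what this kind of ordering looks like in systemd terms, here is a rough sketch. It is not the merged patch. It only generates a drop-in with an `After=` line, and both the helper name `ordering_dropin` and the unit name `haproxy-example.service` are hypothetical placeholders.

```python
def ordering_dropin(units_to_start_first):
    """Return systemd drop-in text ordering a unit after the given units.

    After= only controls start-up ordering: the unit carrying this drop-in
    is started once the listed units have been started. It does not pull
    those units into the boot transaction by itself.
    """
    if not units_to_start_first:
        raise ValueError("at least one unit name is required")
    return "[Unit]\nAfter={}\n".format(" ".join(units_to_start_first))


# Hypothetical haproxy unit name, used for illustration only.
print(ordering_dropin(["haproxy-example.service"]))
```

Saved as a `.conf` file under /etc/systemd/system/octavia-keepalived.service.d/ and followed by `systemctl daemon-reload`, text like this would make octavia-keepalived.service wait for the listed units at boot.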

Changed in octavia:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 1.0.0.0b2

This issue was fixed in the openstack/octavia 1.0.0.0b2 development milestone.
