If /var/lib/neutron/ha_confs/<router_id>.pid already exists, the L3 agent fails to spawn a keepalived process for that router

Bug #1561046 reported by Hynek Mlnarik
This bug affects 1 person
Affects   Status         Importance   Assigned to     Milestone
neutron   Fix Released   High         Hynek Mlnarik
Kilo      New            Undecided    Unassigned

Bug Description

If a PID file left behind by a previous keepalived process (/var/lib/neutron/ha_confs/<router_id>.pid) still exists and the PID it records belongs to a running process, the L3 agent fails to spawn a keepalived process for that router.

For example, when a neutron node is shut down and restarted, processes are assigned new PIDs, and some of these can coincide with PIDs previously assigned to keepalived processes. The old PIDs are still recorded in the PID files, so when keepalived starts it finds a running process with that PID and reports "daemon is already running".
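
To tell whether a PID file is in this state, one option is to compare the recorded PID against the name of the process that currently owns it. The following is a minimal sketch, not code from neutron or keepalived: the function name pid_file_is_stale and its expected_name parameter are made up for illustration, and it assumes a Linux host where /proc/<pid>/cmdline is readable.

```python
import os


def pid_file_is_stale(pid_file, expected_name="keepalived"):
    """Return True if pid_file does not point at a live `expected_name` process.

    Hypothetical helper for illustration; relies on Linux /proc.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed PID file: treat it as stale.
        return True

    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            argv = f.read().split(b"\0")
    except OSError:
        # No process with that PID exists any more.
        return True

    # The PID is alive; it is only "ours" if the executable name matches.
    executable = os.path.basename(argv[0].decode(errors="replace"))
    return executable != expected_name
```

With the PID of an unrelated process (for example a sleep) in <router_id>.pid, this returns True, which is exactly the case where keepalived nevertheless reports that the daemon is already running.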

Steps to reproduce:
1) Pick a router on which to reproduce the issue and record its router_id.
2) Kill the two processes whose PIDs are stored in /var/lib/neutron/ha_confs/<router_id>.pid and /var/lib/neutron/ha_confs/<router_id>.pid-vrrp.
3) Make sure that no keepalived process comes back for this router.
4) Pick the PID of any process that is actually running and write it into both PID files. For example, if a background sleep process is running as PID 12345, put 12345 into <router_id>.pid and <router_id>.pid-vrrp.
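
Step 4 can be scripted. This is a rough sketch under stated assumptions: start_dummy_process and plant_foreign_pid are hypothetical helper names, the dummy process is a Python interpreter that just sleeps, and the directory is passed in as a parameter so the helper can be tried against a scratch directory first.

```python
import os
import subprocess
import sys


def start_dummy_process(seconds):
    """Start a long-running process that is clearly not keepalived."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(%d)" % seconds]
    )


def plant_foreign_pid(ha_confs_dir, router_id, pid):
    """Write `pid` into both keepalived PID files for `router_id`.

    Returns the two paths that were written.
    """
    paths = [
        os.path.join(ha_confs_dir, router_id + ".pid"),
        os.path.join(ha_confs_dir, router_id + ".pid-vrrp"),
    ]
    for path in paths:
        with open(path, "w") as f:
            f.write("%d\n" % pid)
    return paths
```

On a node where steps 1 to 3 are done, calling plant_foreign_pid("/var/lib/neutron/ha_confs", router_id, start_dummy_process(3600).pid) leaves both PID files pointing at the live dummy process.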

The bug reproduces with keepalived versions 1.2.13 and 1.2.19.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/296532

Changed in neutron:
assignee: nobody → Hynek Mlnarik (hmlnarik-s)
status: New → In Progress
Revision history for this message
Hynek Mlnarik (hmlnarik-s) wrote :

This bug has symptoms similar to those of bug 1511311, but it is not the same bug.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/296532
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e98fabb5836b12bc40a2b64a2668893ea73c2320
Submitter: Jenkins
Branch: master

commit e98fabb5836b12bc40a2b64a2668893ea73c2320
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 23 14:51:59 2016 +0100

    Remove obsolete keepalived PID files before start

    keepalived refuses to start and claims "daemon already started"
    when there is already a process with the same PID as found in
    either the VRRP or the main process PID file. This happens even
    in case when the new process is not keepalived. The situation
    can happen when the neutron node is reset and the obsolete PID
    files are not cleaned before neutron is started.

    This commit adds PID file cleanup before keepalived start.

    Closes-Bug: 1561046
    Change-Id: Ib6b6f2fe76fe82253f195c9eab6b243d9eb76fa2
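
The cleanup this commit message describes could look roughly like the sketch below. It is an illustration only, not the code merged in the review: remove_stale_pid_files is a hypothetical name, and the sketch assumes the caller has already established that no keepalived instance is running for the router.

```python
import os


def remove_stale_pid_files(ha_confs_dir, router_id):
    """Delete <router_id>.pid and <router_id>.pid-vrrp if they exist.

    Hypothetical helper; call it only when keepalived is known not to be
    running for this router. Returns the list of paths actually removed.
    """
    removed = []
    for suffix in (".pid", ".pid-vrrp"):
        path = os.path.join(ha_confs_dir, router_id + suffix)
        try:
            os.remove(path)
        except FileNotFoundError:
            # Nothing left over for this file; not an error.
            continue
        removed.append(path)
    return removed
```

Removing both files matters because, per the description above, a matching PID in either the main or the VRRP PID file is enough for keepalived to refuse to start.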

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/299135

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/299137

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/299138

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/kilo)

Change abandoned by Hynek Mlnarik (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/299137

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/liberty)

Change abandoned by Hynek Mlnarik (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/299138

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/mitaka)

Change abandoned by Hynek Mlnarik (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/299135

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/299211

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/299211
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2690eed19a749fb1b50bb38f3d01fce0f1497f39
Submitter: Jenkins
Branch: master

commit 2690eed19a749fb1b50bb38f3d01fce0f1497f39
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 30 10:44:09 2016 +0200

    Refactor and fix dummy process fixture

    Extracting the test fixture that creates a new process and leaves it
    running for a given amount of time into helpers where other fixtures for
    functional tests live. This both keeps the fixtures at one place and
    increases visibility of the fixture so that it can be reused in other
    tests. At the same time, the fixture is fixed as the original code
    omitted starting the process.

    Change-Id: I97aeb8d1d5773ef3d59e8f908aea34ccceb38378
    Related-Bug: 1561046

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/299774

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/299784

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/299788

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

The bug leaves an HA router completely broken. Raising to High.

Changed in neutron:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/299138
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6dea586906efb1de33c23d5df07ffe12ee7d649b
Submitter: Jenkins
Branch: stable/liberty

commit 6dea586906efb1de33c23d5df07ffe12ee7d649b
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 23 14:51:59 2016 +0100

    Remove obsolete keepalived PID files before start

    keepalived refuses to start and claims "daemon already started"
    when there is already a process with the same PID as found in
    either the VRRP or the main process PID file. This happens even
    in case when the new process is not keepalived. The situation
    can happen when the neutron node is reset and the obsolete PID
    files are not cleaned before neutron is started.

    This commit adds PID file cleanup before keepalived start.

    Conflicts:
     neutron/agent/linux/keepalived.py

    Closes-Bug: 1561046
    Change-Id: Ib6b6f2fe76fe82253f195c9eab6b243d9eb76fa2
    (cherry picked from commit e98fabb5836b12bc40a2b64a2668893ea73c2320)

tags: added: in-stable-liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/299788
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7175fdef97d8d2d18c0f71f3c089fa5341b73d9e
Submitter: Jenkins
Branch: stable/liberty

commit 7175fdef97d8d2d18c0f71f3c089fa5341b73d9e
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 30 10:44:09 2016 +0200

    Refactor and fix dummy process fixture

    Extracting the test fixture that creates a new process and leaves it
    running for a given amount of time into helpers where other fixtures for
    functional tests live. This both keeps the fixtures at one place and
    increases visibility of the fixture so that it can be reused in other
    tests. At the same time, the fixture is fixed as the original code
    omitted starting the process.

    Conflicts:
     neutron/tests/functional/agent/linux/helpers.py

    Change-Id: I97aeb8d1d5773ef3d59e8f908aea34ccceb38378
    Related-Bug: 1561046
    (cherry picked from commit 2690eed19a749fb1b50bb38f3d01fce0f1497f39)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/299137
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d2447c56ab3b945a32a711fe9068d776179bae8a
Submitter: Jenkins
Branch: stable/kilo

commit d2447c56ab3b945a32a711fe9068d776179bae8a
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 23 14:51:59 2016 +0100

    Remove obsolete keepalived PID files before start

    keepalived refuses to start and claims "daemon already started"
    when there is already a process with the same PID as found in
    either the VRRP or the main process PID file. This happens even
    in case when the new process is not keepalived. The situation
    can happen when the neutron node is reset and the obsolete PID
    files are not cleaned before neutron is started.

    This commit adds PID file cleanup before keepalived start.

    Compared to upstream commit, the fixture in test_keepalived was adjusted
    for kilo's version of fixtures package.

    Conflicts:
        neutron/agent/linux/keepalived.py

    Closes-Bug: 1561046
    Change-Id: Ib6b6f2fe76fe82253f195c9eab6b243d9eb76fa2
    (cherry picked from commit e98fabb5836b12bc40a2b64a2668893ea73c2320)

tags: added: in-stable-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/299784
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9d483ee8166b4bed9b1c0d3e78f951a2e22a6f6a
Submitter: Jenkins
Branch: stable/kilo

commit 9d483ee8166b4bed9b1c0d3e78f951a2e22a6f6a
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 30 10:44:09 2016 +0200

    Refactor and fix dummy process fixture

    Extracting the test fixture that creates a new process and leaves it
    running for a given amount of time into helpers where other fixtures for
    functional tests live. This both keeps the fixtures at one place and
    increases visibility of the fixture so that it can be reused in other
    tests. At the same time, the fixture is fixed as the original code
    omitted starting the process.

    Compared to upstream commit, the fixture in test_keepalived was adjusted
    for kilo's version of fixtures package.

    Conflicts:
        neutron/tests/functional/agent/linux/helpers.py
        neutron/tests/functional/agent/linux/test_keepalived.py

    Change-Id: I97aeb8d1d5773ef3d59e8f908aea34ccceb38378
    Related-Bug: 1561046
    (cherry picked from commit 2690eed19a749fb1b50bb38f3d01fce0f1497f39)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/299135
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=19ea6ba92379168e1bfff7a7235119cfbbc0172c
Submitter: Jenkins
Branch: stable/mitaka

commit 19ea6ba92379168e1bfff7a7235119cfbbc0172c
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 23 14:51:59 2016 +0100

    Remove obsolete keepalived PID files before start

    keepalived refuses to start and claims "daemon already started"
    when there is already a process with the same PID as found in
    either the VRRP or the main process PID file. This happens even
    in case when the new process is not keepalived. The situation
    can happen when the neutron node is reset and the obsolete PID
    files are not cleaned before neutron is started.

    This commit adds PID file cleanup before keepalived start.

    Closes-Bug: 1561046
    Change-Id: Ib6b6f2fe76fe82253f195c9eab6b243d9eb76fa2
    (cherry picked from commit e98fabb5836b12bc40a2b64a2668893ea73c2320)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/299774
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=04fb1476de3b9d81ad579586c7010b5cdd2248a2
Submitter: Jenkins
Branch: stable/mitaka

commit 04fb1476de3b9d81ad579586c7010b5cdd2248a2
Author: Hynek Mlnarik <email address hidden>
Date: Wed Mar 30 10:44:09 2016 +0200

    Refactor and fix dummy process fixture

    Extracting the test fixture that creates a new process and leaves it
    running for a given amount of time into helpers where other fixtures for
    functional tests live. This both keeps the fixtures at one place and
    increases visibility of the fixture so that it can be reused in other
    tests. At the same time, the fixture is fixed as the original code
    omitted starting the process.

    Change-Id: I97aeb8d1d5773ef3d59e8f908aea34ccceb38378
    Related-Bug: 1561046
    (cherry picked from commit 2690eed19a749fb1b50bb38f3d01fce0f1497f39)

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 8.1.0

This issue was fixed in the openstack/neutron 8.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/314250

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch: master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <email address hidden>
Date: Tue Oct 20 15:51:37 2015 +0000

    Revert "Improve performance of ensure_namespace"

    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.

    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.

    This also

    Closes-bug: #1574881

    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 21 07:05:16 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Apr 19 06:52:48 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 18 15:03:30 2016 +0000

    Updated from global requirements

    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <email address hidden>
Date: Fri Apr 8 17:52:14 2016 -0700

    Clear DVR MAC on last agent deletion from host

    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.

    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <email address hidden>
Date: Tue Feb 9 13:10:57 2016 -0500

    Add an option for WSGI pool size

    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.

    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.

    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to a...

tags: added: neutron-proactive-backport-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/912047

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/912047
Committed: https://opendev.org/openstack/neutron/commit/d3a8c9ca0f668cfefc271d7db01dbf0badbbecec
Submitter: "Zuul (22348)"
Branch: master

commit d3a8c9ca0f668cfefc271d7db01dbf0badbbecec
Author: Arefiev Anton <email address hidden>
Date: Mon Mar 11 11:53:48 2024 +0200

    Clean up state VRRP PID file

    Change Id62bf18067d0b144c3e8825c7603cc1e51dca052 removes explicit
    PID files clean up for keepalived and brings regression as
    there is no 'process enable' for VRRP.

    Always delete stale PID file if exists

    Related-Bug: 1561046
    Change-Id: I95a004a3acbe6a9160a19053a37fc0dd2b1875a5
