L3 agent doesn't restart metadata-proxy

Bug #1298405 reported by Gleb on 2014-03-27
This bug affects 2 people
Affects: Fuel for OpenStack
Importance: Medium
Assigned to: Ryan Moe

Bug Description

{"ostf_sha": "83ada35fec2664089e07fdc0d34861ae2a4d948a", "fuelmain_sha": "17eed776b30886851ae0042fa7a30184f5cd8eb6", "astute_sha": "8b2059a37be9bd82df49f684822727b4df4c511b", "release": "4.0", "nailgun_sha": "ac02e18990cd652db6577ce42bdea9838076c63c", "fuellib_sha": "098f381ff8a528a39d3b6f17ea70955baeb159e8"}

After I restarted the L3 agent, it did not restart the metadata proxies, so they do not work properly.

Changed in fuel:
importance: Undecided → Medium
assignee: nobody → Sergey Vasilenko (xenolog)
milestone: none → 5.0
Andrew Woodward (xarses) on 2014-03-28
Changed in fuel:
importance: Medium → High
Changed in fuel:
status: New → Confirmed
Vladimir Kuklin (vkuklin) wrote :

It is completely unknown whether this bug was ever reproduced with version 4.1 and the Icehouse release. We need a reproducer first. Also, there are no logs and no environment description.

Changed in fuel:
importance: High → Medium
status: Confirmed → Incomplete
Vladimir Kuklin (vkuklin) wrote :

Andrew Woodward confirmed that this bug was affecting 4.1.1:

"If neutron-l3-agent was previously running on another box, the namespace (with no interfaces) and the neutron-ns-metadata-proxy are left behind on this node. If the service returns to this node, it will not start properly unless neutron-ns-metadata-proxy is killed first. This leaves the router ports down."
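Until a fix is in place, the workaround in the quote amounts to killing leftover proxies before the L3 agent starts again. Below is a minimal Python sketch of that step, assuming a Linux /proc filesystem. The helper names `find_stale_proxies` and `kill_stale_proxies` are hypothetical and are not part of Fuel or neutron. The sketch matches any process whose command line contains `neutron-ns-metadata-proxy`, which would also include, for example, a `grep` for that name. It must run as root, because the proxies themselves run as root.

```python
import os
import signal
from typing import List

# Substring searched for in each process's argv (hypothetical matching rule).
PROXY_NAME = b"neutron-ns-metadata-proxy"


def find_stale_proxies(proc_root: str = "/proc") -> List[int]:
    """Return the sorted PIDs whose command line mentions the metadata proxy.

    Reads <proc_root>/<pid>/cmdline, whose arguments are NUL-separated.
    Entries that are not numeric (e.g. "self") are ignored, and processes
    that vanish or cannot be read during the scan are skipped.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue
        if any(PROXY_NAME in arg for arg in argv):
            pids.append(int(entry))
    return sorted(pids)


def kill_stale_proxies(proc_root: str = "/proc") -> List[int]:
    """Send SIGTERM to every matching process and return the PIDs signalled."""
    killed = []
    for pid in find_stale_proxies(proc_root):
        try:
            os.kill(pid, signal.SIGTERM)
            killed.append(pid)
        except ProcessLookupError:
            pass  # the process exited between the scan and the kill
    return killed
```

On a node where the agent moved away and then came back, you would call `kill_stale_proxies()` before restarting neutron-l3-agent. Note that this kills every proxy on the node, not only the stale ones. A rough manual equivalent is `pkill -f neutron-ns-metadata-proxy`.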

Changed in fuel:
status: Incomplete → Confirmed
Sergey Vasilenko (xenolog) wrote :

Should be closed after https://review.openstack.org/#/c/89872/ is merged.

Changed in fuel:
status: Confirmed → Fix Committed
status: Fix Committed → In Progress
Changed in fuel:
status: In Progress → Fix Committed
Andrew Woodward (xarses) wrote :

Ryan found that there may be an issue with this specifically on CentOS; this may mask the issue he is seeing.

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Ryan Moe (rmoe)
status: Fix Committed → In Progress
Vladimir Kuklin (vkuklin) wrote :

Guys, please provide full debug info on why you are reopening this bug. Until then, I am closing it as Fix Committed.

Changed in fuel:
status: In Progress → Fix Committed
Andrew Woodward (xarses) wrote :

Already backported to 4.1.1.

tags: added: backports-4.1.1
Changed in fuel:
milestone: 5.0 → 4.1.1
Ryan Moe (rmoe) wrote :

This was only an issue on CentOS.

Pacemaker has a hard-coded umask of 0026. The resource agent for the neutron-l3-agent inherits this umask, and so do the Python processes it launches. The ns-metadata-proxy (launched by the l3-agent) writes its pid file as root:root. Because of the inherited umask, the permissions end up as 0751, which gives users outside the root user and group, such as neutron, execute access but no read access. When the l3-agent stops, it tries to kill the ns-metadata-proxy by reading the pid file and killing that process. The read happens as the neutron user, which does not have read permission on the pid file.
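The permission arithmetic can be checked directly. A file's final mode is the requested mode with the umask bits cleared. The Python sketch below assumes the proxy requests mode 0o777, which is Python's `os.open` default. That assumption is consistent with the 0751 result above, but it has not been verified against neutron's source here. The function names are hypothetical.

```python
import os
import stat

# Umask that, per this report, Pacemaker hard-codes on CentOS.
PACEMAKER_UMASK = 0o026


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file receives: requested & ~umask."""
    return requested & ~umask & 0o777


def other_can_read(mode: int) -> bool:
    """True if a user who is neither the owner nor in the group may read."""
    return bool(mode & stat.S_IROTH)


def create_with_umask(path: str, umask: int) -> int:
    """Create `path` with os.open's default mode (0o777) under `umask`.

    Returns the resulting permission bits. The umask is process-wide, so the
    previous value is restored before returning.
    """
    old = os.umask(umask)
    try:
        fd = os.open(path, os.O_CREAT | os.O_RDWR)
        os.close(fd)
    finally:
        os.umask(old)
    return stat.S_IMODE(os.stat(path).st_mode)
```

`effective_mode(0o777, PACEMAKER_UMASK)` gives 0o751. The last digit is 1 (execute only), so `other_can_read` is False for a root:root file read by neutron. The group digit 5 would still allow read access to members of group root.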

This leaves an ns-metadata-proxy process running on the system with an exclusive lock on its pid file. When the l3-agent is started again, it spawns another ns-metadata-proxy, which tries to lock the same pid file that the existing process still holds. In this case, this call[0] blocks indefinitely, which prevents the l3-agent from creating the interfaces for the router. The related bug is here: https://bugs.launchpad.net/neutron/+bug/1315507

[0] https://github.com/openstack/neutron/blob/master/neutron/agent/linux/daemon.py#L40
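The hang follows from `flock` semantics. An exclusive lock is held by an open file description until that description is closed or its process exits. A second `LOCK_EX` request on the same file therefore waits for as long as the stale proxy lives. The Python sketch below is not neutron's code, and `try_lock_pidfile` is a hypothetical name. It shows the non-blocking variant (`LOCK_NB`), which fails immediately instead of hanging. This report does not say whether the related neutron bug changed the locking in this way.

```python
import fcntl
import os
from typing import Optional


def try_lock_pidfile(path: str) -> Optional[int]:
    """Open `path` and try to take an exclusive flock without waiting.

    Returns the open fd (which now holds the lock) on success, or None if
    another open file description already holds it. Dropping LOCK_NB would
    make this call wait until the current holder releases the lock, which
    is the indefinite block described above when that holder is a leftover
    ns-metadata-proxy.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:  # EWOULDBLOCK: someone else holds the lock
        os.close(fd)
        return None
    return fd
```

Because flock locks belong to open file descriptions, two separate `os.open` calls in one process are enough to reproduce the conflict. The first call plays the stale proxy, and the second call, standing in for the newly spawned proxy, gets None. Closing the first fd releases the lock, which mirrors why killing the leftover proxy unblocks the agent.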

Mike Scherbakov (mihgen) on 2014-05-08
tags: added: release-notese
ram (ravipaty) on 2014-06-19
Changed in fuel:
status: Fix Committed → Fix Released