Pacemaker neutron agent scripts start/stop/migration will fail if management vip moved recently

Bug #1287716 reported by Matthew Mosesohn
This bug affects 1 person
Affects: Fuel for OpenStack (4.1.x)
Status: Fix Committed
Importance: High
Assigned to: Fuel Library (Deprecated)

Bug Description

{"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}

Steps to reproduce:
1 - Deploy Ubuntu HA (Cinder LVM backend, Swift glance backend, Neutron with GRE segmentation) 3 computes - 1 controller - 1 storage
2 - Log into the first controller and run crm_resource -r vip__management_old --move --node node-3 (NOTE: replace node-3 with the name of a non-primary controller)
3 - Wait ~60s for keystone and other services to recover
4 - Run neutron agent-list

Results:

# neutron agent-list
+--------------------------------------+--------------------+--------+-------+----------------+
| id                                   | agent_type         | host   | alive | admin_state_up |
+--------------------------------------+--------------------+--------+-------+----------------+
| 09699e60-aa51-4a66-bf0f-bb8eeab49da5 | L3 agent           | node-3 | xxx   | True           |
| 12236192-8980-4068-8ed8-adc94eb1f681 | Open vSwitch agent | node-1 | :-)   | True           |
| 2c0ec06d-087c-4e45-b066-403ce6a97f51 | Open vSwitch agent | node-2 | :-)   | True           |
| ad4c9181-6a26-4b4c-be22-214c3df2514e | DHCP agent         | node-1 | xxx   | True           |
| bd893993-5768-4182-ac4a-ff71e7905a64 | Open vSwitch agent | node-3 | :-)   | True           |
| f7451cfd-600a-444b-8d36-2af7b21714c3 | Open vSwitch agent | node-4 | :-)   | True           |
+--------------------------------------+--------------------+--------+-------+----------------+

# crm resource show | egrep 'l3|dhcp'
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started (unmanaged) FAILED
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started (unmanaged) FAILED

From l3 agent logs:
p_neutron-l3-agent_start_0:4166:stderr [ Traceback (most recent call last): ]
p_neutron-l3-agent_start_0:4166:stderr [ File "/usr/bin/q-agent-cleanup.py", line 525, in <module> ]
p_neutron-l3-agent_start_0:4166:stderr [ cleaner = NeutronCleaner(get_authconfig(args.authconf), options=vars(args), log=LOG) ]
p_neutron-l3-agent_start_0:4166:stderr [ File "/usr/bin/q-agent-cleanup.py", line 106, in __init__ ]
p_neutron-l3-agent_start_0:4166:stderr [ raise e ]
p_neutron-l3-agent_start_0:4166:stderr [ keystoneclient.apiclient.exceptions.AuthorizationFailure: Authorization Failed: An unexpected erro
(2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None (HTTP 500) ]
p_neutron-dhcp-agent_start_0:4153:stderr [ Traceback (most recent call last): ]
p_neutron-dhcp-agent_start_0:4153:stderr [ File "/usr/bin/q-agent-cleanup.py", line 525, in <module> ]
p_neutron-dhcp-agent_start_0:4153:stderr [ cleaner = NeutronCleaner(get_authconfig(args.authconf), options=vars(args), log=LOG) ]
p_neutron-dhcp-agent_start_0:4153:stderr [ File "/usr/bin/q-agent-cleanup.py", line 106, in __init__ ]
p_neutron-dhcp-agent_start_0:4153:stderr [ raise e ]

We should tune OCF scripts and/or q-agent-cleanup.py to be more tolerant of keystone being unavailable for up to 2 minutes.
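
One way to get that tolerance is to wrap the failing call in a deadline-based retry. The following is only a minimal sketch, not the fix that was merged: retry_until_deadline and its parameters are hypothetical names, the 120-second default simply mirrors the "up to 2 minutes" above, and it assumes Python 3 for time.monotonic.

```python
import time


def retry_until_deadline(operation, timeout=120.0, interval=5.0,
                         retry_on=(Exception,), sleep=time.sleep,
                         clock=time.monotonic):
    """Call operation() until it succeeds or `timeout` seconds have passed.

    Hypothetical helper: `operation` is any zero-argument callable, for
    example one that builds an authenticated client.
    """
    deadline = clock() + timeout
    while True:
        try:
            return operation()
        except retry_on:
            # Re-raise the last error if waiting another interval
            # would take us past the deadline.
            if clock() + interval > deadline:
                raise
            sleep(interval)
```

In the cleanup script, the callable would be whatever currently raises AuthorizationFailure during NeutronCleaner construction, and retry_on could be narrowed to that exception so unrelated errors still fail immediately.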

Matthew Mosesohn (raytrac3r) wrote :
tags: added: library neutron
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/77895

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/4.1)

Fix proposed to branch: stable/4.1
Review: https://review.openstack.org/78067

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Dmitry Borodaenko (dborodaenko)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/77895
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=48ced96753378e49883cdade8957160ef1b29899
Submitter: Jenkins
Branch: master

commit 48ced96753378e49883cdade8957160ef1b29899
Author: Sergey Vasilenko <email address hidden>
Date: Tue Mar 4 18:42:20 2014 +0400

    Make Neutron L3/DHCP agents OCF script more tolerant

    to mysql and keystone temporary fails.

    Change-Id: Iaf5d5b49932c1dc4db6bca0563607972150f4cf4
    Closes-bug: #1287716

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/4.1)

Reviewed: https://review.openstack.org/78067
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=73313007c0914e602246ea41fa5e8ca2dfead9f8
Submitter: Jenkins
Branch: stable/4.1

commit 73313007c0914e602246ea41fa5e8ca2dfead9f8
Author: Sergey Vasilenko <email address hidden>
Date: Tue Mar 4 18:42:20 2014 +0400

    Make Neutron L3/DHCP agents OCF script more tolerant

    to mysql and keystone temporary fails.

    Change-Id: Iaf5d5b49932c1dc4db6bca0563607972150f4cf4
    Closes-bug: #1287716

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/78178

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/78178
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=70e813d5b6b26dba0cd763ce24eab27747f4b573
Submitter: Jenkins
Branch: master

commit 70e813d5b6b26dba0cd763ce24eab27747f4b573
Author: Sergey Vasilenko <email address hidden>
Date: Wed Mar 5 15:40:04 2014 +0400

    Make Neutron L3/DHCP agents OCF script more tolerant to mysql and keystone temporary fails.

    In this implementation cleanup-script does not get information from Neutron API.
    Script inspects network namespaces on this node for given agent type and removes
    found ports from integration bridge.

    Closes-bug: #1287716
    Partial-bug: #1285929
    Change-Id: I2dfb31f240dca652341c4623f237f6a143414448
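
The approach described in this commit message (inspect local namespaces, then remove their ports from the integration bridge) could look roughly like the sketch below. It is not the code from the commit. The function names are hypothetical, the qrouter-/qdhcp- namespace prefixes and the br-int bridge name are assumed Neutron defaults, and the sketch only builds the ovs-vsctl commands instead of running them. Ports attached to another bridge would need separate handling.

```python
import subprocess

# Assumed Neutron naming convention for per-agent network namespaces.
NS_PREFIXES = {"l3": "qrouter-", "dhcp": "qdhcp-"}


def namespaces_for_agent(netns_output, agent_type):
    """Filter `ip netns list` output to namespaces of one agent type."""
    prefix = NS_PREFIXES[agent_type]
    names = []
    for line in netns_output.splitlines():
        fields = line.split()
        # Newer iproute2 appends "(id: N)"; the first field is the name.
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names


def ports_in_namespace(link_output):
    """Extract interface names from `ip -o link show`, skipping loopback."""
    ports = []
    for line in link_output.splitlines():
        # One-line format: "<index>: <name>[@peer]: <FLAGS> ..."
        parts = line.split(": ", 2)
        if len(parts) < 2:
            continue
        name = parts[1].split("@")[0]
        if name != "lo":
            ports.append(name)
    return ports


def plan_cleanup(agent_type, bridge="br-int"):
    """Return the ovs-vsctl commands that would detach the agent's ports.

    Reads local state only; no call to Keystone or the Neutron API.
    """
    netns = subprocess.check_output(["ip", "netns", "list"],
                                    universal_newlines=True)
    commands = []
    for ns in namespaces_for_agent(netns, agent_type):
        links = subprocess.check_output(
            ["ip", "netns", "exec", ns, "ip", "-o", "link", "show"],
            universal_newlines=True)
        for port in ports_in_namespace(links):
            commands.append(
                ["ovs-vsctl", "--if-exists", "del-port", bridge, port])
    return commands
```

Because everything is read from the node itself, a start or stop action built this way does not depend on Keystone or MySQL being reachable right after the VIP moves.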

tags: added: in progress
Andrey Sledzinskiy (asledzinskiy) wrote : Re: Neutron L3/DHCP agents fail when VIP fails over

verified on fuel_5_0_iso#29

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: in progress
tags: added: backports-4.1.1
Andrew Woodward (xarses)
tags: added: ha
Andrew Woodward (xarses) wrote :

At a glance, it appears that 70e813d5b6b26dba0cd763ce24eab27747f4b573 was not backported.

Changed in fuel:
status: Fix Released → Triaged
Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Sergey Vasilenko (xenolog)
Andrew Woodward (xarses)
summary: - Neutron L3/DHCP agents fail when VIP fails over
+ Pacemaker neutron agent scripts start/stop/migration will fail if
+ management vip moved recently
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/4.1)

Fix proposed to branch: stable/4.1
Review: https://review.openstack.org/96840

no longer affects: fuel/5.0.x
no longer affects: fuel
Meg McRoberts (dreidellhasa) wrote :

Documented in 4.1.1 Release Notes
