AmphoraPostVIPPlug sometimes fails resulting in 409 result which leads to failover failure

Bug #1614311 reported by Michael Johnson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
octavia
Fix Released
High
Unassigned

Bug Description

This was observed when running the "Fixes failover flow with namespace driver" patch https://review.openstack.org/#/c/312270 commit 8448056a4ac4ac16985506b65bd28a0ddf85d337
Though I'm not convinced this issue is related to this patch, thus the bug report.

Occasionally(tm) when we run the AmphoraPostVIPPlug task, when we loop sending our POST request to the amphora agent, we end up in a situation where the interface we are plugging is already present in the network namespace. I have added code that checks for this and returns a 409 "Conflict" to the controller and stops processing the post-vip-plug request.

My assumption here was that one of the previous AmphoraPostVIPPlug events had actually succeeded but the requests call from the controller had timed out before the task was finished and returned it's 202. This was because the amphora-agent log reflected that it had returned a 202 response just prior to the 409.

However, during testing failover in an active/standby topology I found that after this 409 situation had occurred, the eth1.cfg file showed that eth1 was set to use DHCP where it should have be configured with a static address. This could happen if the vrrp_ip was not passed into the plug_vip call.

Ultimately this causes the failover to fail with this error in the amphora-agent log:
2016-08-17 23:34:17.891 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:17] "GET /0.5/interface/10.0.0.12 HTTP/1.1" 404 -

We need to figure out why/how the eth1 interface was not properly configured with a static IP address.

amphora log:
cat amphora-agent.log
2016-08-17 23:29:08.716 1076 INFO octavia.common.config [-] Logging enabled!
2016-08-17 23:29:08.717 1076 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2016-08-17 23:29:08.724 1350 INFO octavia.amphorae.backends.health_daemon.health_daemon [-] Health Manager Sender starting.
2016-08-17 23:29:08.751 1076 INFO werkzeug [-] * Running on https://0.0.0.0:9443/ (Press CTRL+C to quit)
2016-08-17 23:29:08.755 1076 INFO werkzeug [-] * Restarting with stat
2016-08-17 23:29:09.161 1357 INFO octavia.common.config [-] Logging enabled!
2016-08-17 23:29:09.161 1357 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2016-08-17 23:29:09.163 1357 WARNING werkzeug [-] * Debugger is active!
2016-08-17 23:29:09.182 1357 INFO werkzeug [-] * Debugger pin code: 302-258-694
2016-08-17 23:29:09.211 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:29:09] "PUT /0.5/listeners/04d7f9ba-a36c-4e02-9f45-8609654a42af/901ce627-3059-4f49-9445-f7174af1a53a/haproxy HTTP/1.1" 202 -
2016-08-17 23:29:09.220 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:29:09] "GET /0.5/listeners/901ce627-3059-4f49-9445-f7174af1a53a HTTP/1.1" 200 -
2016-08-17 23:29:09.267 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:29:09] "PUT /0.5/listeners/901ce627-3059-4f49-9445-f7174af1a53a/start HTTP/1.1" 202 -
2016-08-17 23:34:12.087 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:12] "POST /0.5/plug/vip/10.0.0.4 HTTP/1.1" 202 -
2016-08-17 23:34:12.124 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:12] "POST /0.5/plug/vip/10.0.0.4 HTTP/1.1" 409 -
2016-08-17 23:34:17.737 1357 INFO octavia.amphorae.backends.agent.api_server.plug [-] Plugged interface eth1 will become eth2 in the namespace amphora-haproxy
2016-08-17 23:34:17.795 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:17] "POST /0.5/plug/network HTTP/1.1" 202 -
2016-08-17 23:34:17.824 1357 INFO octavia.amphorae.backends.agent.api_server.plug [-] Plugged interface eth2 will become eth3 in the namespace amphora-haproxy
2016-08-17 23:34:17.863 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:17] "POST /0.5/plug/network HTTP/1.1" 202 -
2016-08-17 23:34:17.891 1357 INFO werkzeug [-] 192.168.0.9 - - [17/Aug/2016 23:34:17] "GET /0.5/interface/10.0.0.12 HTTP/1.1" 404 -

Relevant controller log:
2016-08-17 23:29:12.751 88576 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.network_tasks.PlugVIPPort' (c6bca2f8-db9c-4e18-adf5-7019d12426e4) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _ta
sk_receiver /usr/local/lib/python2.7/dist-packages/taskflow/listeners/logging.py:178
2016-08-17 23:29:12.753 88576 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.amphora_driver_tasks.AmphoraPostVIPPlug' (39c3427d-2caf-44cd-851a-30b2f637a943) transitioned into state 'RUNNING' from state 'PENDING' _task_re
ceiver /usr/local/lib/python2.7/dist-packages/taskflow/listeners/logging.py:189
2016-08-17 23:29:12.754 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url plug/vip/10.0.0.4 request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:239
2016-08-17 23:29:12.754 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://192.168.0.12:9443/0.5/plug/vip/10.0.0.4 request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:242
2016-08-17 23:30:12.820 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:30:23.834 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:30:34.839 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:30:45.854 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:30:56.871 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:31:07.877 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:31:18.893 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:31:29.909 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:31:40.922 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:31:51.942 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:02.953 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:13.969 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:24.988 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:36.000 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:47.016 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:32:58.032 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:33:09.047 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:33:20.068 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:33:31.083 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:33:42.096 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:33:53.106 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:34:04.119 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2016-08-17 23:34:14.333 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. Response: <Response [409]> request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:271
2016-08-17 23:34:14.334 88576 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] VIP with MAC fa:16:3e:7d:e8:b7 already exists on amphora, skipping post_vip_plug
2016-08-17 23:34:14.334 88576 DEBUG octavia.controller.worker.tasks.amphora_driver_tasks [-] Notified amphora of vip plug execute /opt/stack/octavia/octavia/controller/worker/tasks/amphora_driver_tasks.py:230
2016-08-17 23:34:14.335 88576 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.amphora_driver_tasks.AmphoraPostVIPPlug' (39c3427d-2caf-44cd-851a-30b2f637a943) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _task_receiver /usr/local/lib/python2.7/dist-packages/taskflow/listeners/logging.py:178

2016-08-17 23:34:20.100 88576 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia-mark-amp-backup-indb' (a8929c95-df33-45f0-877a-9366729466d1) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _task_receiver /usr/local/lib/py
thon2.7/dist-packages/taskflow/listeners/logging.py:178
2016-08-17 23:34:20.102 88576 DEBUG octavia.controller.worker.controller_worker [-] Task 'BACKUP-octavia-get-vrrp-subflow-octavia-amphora-update-vrrp-intf' (e10572c5-2630-43bc-991d-ef292c748dba) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver
/usr/local/lib/python2.7/dist-packages/taskflow/listeners/logging.py:189
2016-08-17 23:34:20.102 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url interface/10.0.0.12 request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:239
2016-08-17 23:34:20.102 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://192.168.0.12:9443/0.5/interface/10.0.0.12 request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:242
2016-08-17 23:34:20.121 88576 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. Response: <Response [404]> request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:271
2016-08-17 23:34:20.135 88576 WARNING octavia.controller.worker.controller_worker [-] Task 'BACKUP-octavia-get-vrrp-subflow-octavia-amphora-update-vrrp-intf' (e10572c5-2630-43bc-991d-ef292c748dba) transitioned into state 'FAILURE' from state 'RUNNING'
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker result = task.execute(**arguments)
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 284, in execute
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker interface = self.amphora_driver.get_vrrp_interface(amp)
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 159, in get_vrrp_interface
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker return self.client.get_interface(amphora, amphora.vrrp_ip)['interface']
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 365, in get_interface
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker if exc.check_exception(r):
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/amphorae/drivers/haproxy/exceptions.py", line 32, in check_exception
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker raise responses[status_code]()
2016-08-17 23:34:20.135 88576 ERROR octavia.controller.worker.controller_worker NotFound: Not Found

Revision history for this message
Michael Johnson (johnsom) wrote :
Changed in octavia:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.