akanda-rug hot-plugging failures can lead to an endless state machine loop

Bug #1466623 reported by Ryan Petrello
Affects   Status         Importance   Assigned to     Milestone
Astara    Fix Released   High         Ryan Petrello
akanda    Fix Released   High         Ryan Petrello

Bug Description

It's possible for akanda-rug's state machine to fall into an endless "hot-plug of death" loop, e.g.,

ConfigureVM -> ReplugVM -> ConfigureVM -> ReplugVM -> etc...

...if the `nova interface-attach` or `nova interface-detach` call fails for any reason. In certain scenarios, every akanda-rug worker process ends up stuck in this busy loop, and the rug stops processing events entirely. Below is sample log output we observed that exhibits this behavior:

2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:ConfigureVM.execute(poll) vm.state=replug
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:Begin router config
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:MACs found: A
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:MACs expected: A, B, C
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:Interfaces aren't plugged as expected.
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:ConfigureVM.execute -> poll vm.state=replug
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:ConfigureVM.transition(poll) -> ReplugVM vm.state=replug
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:ReplugVM.execute(poll) vm.state=replug
2015-06-18 17:39:30:DEBUG:akanda.rug.state.XYZ:p01:t01:Attempting to replug...
2015-06-18 17:39:31:DEBUG:akanda.rug.state.XYZ:p01:t01:New port <port-uuid>, <mac-redacted> found, plugging...
2015-06-18 17:39:31:ERROR:akanda.rug.state.XYZ:p01:t01:ReplugVM.execute() failed for action: poll
Traceback (most recent call last):
  File "akanda/rug/state.py", line 430, in update
    worker_context,
  File "akanda/rug/state.py", line 266, in execute
    self.vm.replug(worker_context)
  File "akanda/rug/vm_manager.py", line 393, in replug
    instance.interface_attach(port.id, None, None)
  File "novaclient/v2/servers.py", line 405, in interface_attach
    return self.manager.interface_attach(self, port_id, net_id, fixed_ip)
  File "novaclient/v2/servers.py", line 1234, in interface_attach
    body, 'interfaceAttachment')
...
BadRequest: Port <port-uuid> is still in use. (HTTP 400) (Request-ID: <redacted>)

...and then the state machine returns to ConfigureVM and loops with this exception endlessly.
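
The cycle described above can be reproduced in a few lines. The following is only a minimal sketch of the failure mode and of one possible guard (a cap on replug attempts); it is not the patchset for this bug. All names in it (`AttachError`, `always_failing_attach`, `run_state_machine`, `max_replug_attempts`) are hypothetical and do not come from akanda-rug.

```python
# Illustrative sketch only: every name here is hypothetical and is not
# taken from the akanda-rug code base.


class AttachError(Exception):
    """Stands in for the BadRequest raised by a failed interface attach."""


def always_failing_attach():
    # Simulates `nova interface-attach` failing on every call.
    raise AttachError("Port is still in use.")


def run_state_machine(attach, max_replug_attempts=3):
    """Alternate ConfigureVM -> ReplugVM until plugged or attempts run out.

    Returns the list of visited states so the path can be inspected.
    """
    visited = []
    attempts = 0
    plugged = False
    while True:
        visited.append("ConfigureVM")
        if plugged:
            visited.append("Done")
            return visited
        if attempts >= max_replug_attempts:
            # Without this bound, a persistent attach failure loops forever.
            visited.append("Error")
            return visited
        visited.append("ReplugVM")
        attempts += 1
        try:
            attach()
            plugged = True
        except AttachError:
            # Swallow the failure and fall back to ConfigureVM,
            # mirroring the behavior seen in the log.
            pass


if __name__ == "__main__":
    print(run_state_machine(always_failing_attach, max_replug_attempts=2))
```

With an attach call that never succeeds, the only thing that ends the ConfigureVM/ReplugVM alternation in this sketch is the attempt counter; removing that check reproduces the endless loop.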

Ryan Petrello (ryan-petrello) wrote:

Also, I have a working patchset for this bug forthcoming.

Changed in akanda:
assignee: nobody → Ryan Petrello (ryan-petrello)
Changed in akanda:
status: New → In Progress
Changed in akanda:
importance: Undecided → High
milestone: none → liberty-1
Changed in akanda:
status: In Progress → Fix Committed
Sean Roberts (sarob)
Changed in akanda:
status: Fix Committed → Fix Released
Changed in astara:
milestone: none → 7.0.0
Changed in akanda:
milestone: liberty-1 → 7.0.0
Changed in astara:
assignee: nobody → Ryan Petrello (ryan-petrello)
importance: Undecided → High
status: New → Fix Released