There's a race condition in setting console mode for shellinabox

Bug #1587313 reported by Zhenguo Niu
This bug affects 2 people
Affects   Status         Importance   Assigned to     Milestone
Ironic    Fix Released   Medium       Dmitry Galkin

Bug Description

Currently, the node's lock is released before the shellinabox process has actually been started or stopped, so multiple threads can run this code at the same time. Thread A may unlink the pid file while thread B still gets a successful return value, but because the pid file doesn't exist, thread B gets stuck in popen.communicate().
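The description ends with thread B waiting in `popen.communicate()` with no upper bound. A minimal bash sketch of a bounded alternative follows: it starts a daemonizing command in the background, waits at most a given number of seconds for a non-empty pid file, and prints the captured stdout/stderr if the file never appears or the launcher exits non-zero. This is an illustration only, not Ironic's code (Ironic is written in Python); the function name `start_with_timeout`, its arguments, and the one-second polling interval are hypothetical, with the `timeout` argument playing roughly the role that `CONF.console.subprocess_timeout` plays in the merged fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not Ironic's code: start a command in the background
# and wait at most TIMEOUT seconds for it to write a non-empty pid file,
# instead of blocking until the command's output is fully read.
start_with_timeout() {
    local pid_file=$1 timeout=$2
    shift 2
    local log_file
    log_file=$(mktemp)

    # Capture stdout/stderr in a file rather than a pipe, so a daemonized
    # child that keeps these descriptors open cannot block the caller.
    "$@" >"$log_file" 2>&1 &
    local cmd_pid=$!

    local waited=0 exited=0
    while [ ! -s "$pid_file" ]; do
        # Launcher already gone: if it failed, report its output and stop.
        if [ "$exited" -eq 0 ] && ! kill -0 "$cmd_pid" 2>/dev/null; then
            exited=1
            if ! wait "$cmd_pid"; then
                echo "command exited non-zero; output:" >&2
                cat "$log_file" >&2
                rm -f "$log_file"
                return 1
            fi
        fi
        # Give up after TIMEOUT seconds and show whatever was captured.
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $pid_file; output:" >&2
            cat "$log_file" >&2
            rm -f "$log_file"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    rm -f "$log_file"
    return 0
}
```

Redirecting to a file instead of a pipe is the key design choice in this sketch: a daemonized child that inherits a pipe can hold it open indefinitely, which is one way a reader waiting for end-of-file never returns.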

Changed in ironic:
assignee: nobody → Zhenguo Niu (niu-zglinux)
description: updated
Changed in ironic:
status: New → Confirmed
Zhenguo Niu (niu-zglinux) wrote :

#!/bin/bash
# Repeatedly enable and disable the console on node bm-1 to reproduce the hang.

for i in $(seq 1 10000); do
   echo "$i"
   ironic node-set-console-mode bm-1 true
   ironic node-set-console-mode bm-1 false
done

When testing with the above script, the conductor gets stuck after a few loops.

Sam Betts (sambetts)
Changed in ironic:
importance: Undecided → Medium
Ruby Loo (rloo) wrote :

hi Zhenguo, are you still working on this?

Changed in ironic:
status: Confirmed → Triaged
Vladyslav Drok (vdrok) wrote :

Jay, do you know the reason for this? Looking at the code, I'm not really sure what the problem is here. Otherwise, I'd move it back to Confirmed.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/603812

Changed in ironic:
assignee: Zhenguo Niu (niu-zglinux) → Dmitry Galkin (galkindmitrii)
status: Triaged → In Progress
Dmitry Galkin (galkindmitrii) wrote :

Hi,

We had exactly this problem with shellinabox. The node gets locked on the console enable call and is not automatically unlocked; after that, neither the console nor any other operation works on it. It is easily reproducible with the bash script above.

With the submitted patch, I was not able to get the node stuck in the locked state by toggling console enable/disable in a loop.

I assume this might happen with socat as well, but we run shellinabox and I don't have another environment to test with.

Changed in ironic:
assignee: Dmitry Galkin (galkindmitrii) → Julia Kreger (juliaashleykreger)
Changed in ironic:
assignee: Julia Kreger (juliaashleykreger) → Dmitry Galkin (galkindmitrii)
Changed in ironic:
assignee: Dmitry Galkin (galkindmitrii) → Julia Kreger (juliaashleykreger)
Changed in ironic:
assignee: Julia Kreger (juliaashleykreger) → Dmitry Galkin (galkindmitrii)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/603812
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=0256e1e69f4dd38cf0a80532337fa30c89502cb4
Submitter: Zuul
Branch: master

commit 0256e1e69f4dd38cf0a80532337fa30c89502cb4
Author: Dmitry Galkin <email address hidden>
Date: Wed Sep 19 14:02:48 2018 +0000

    Fix node exclusive lock not released on console start/restart.

    This patch forces kill of console process with SIGKILL if it did not
    terminate on SIGTERM within the CONF.console.kill_timeout and reads the
    shellinabox subprocess stdout/stderr after CONF.console.subprocess_timeout
    or if subprocess exited with non zero code.

    Change-Id: I55a112d877d94f31d27487846ff59fe27f602f8b
    Closes-Bug: 1587313
    Story: 1587313
    Task: 9654
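The first half of the commit message, sending SIGTERM and escalating to SIGKILL once `CONF.console.kill_timeout` has elapsed, can be sketched in bash as below. The function name `stop_console_process` and the one-second polling loop are hypothetical illustrations of the technique, not the code from the merged change, which is Python.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not Ironic's code: ask a process to exit with SIGTERM,
# then escalate to SIGKILL if it is still alive after KILL_TIMEOUT seconds.
stop_console_process() {
    local pid=$1 kill_timeout=$2

    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null

    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$kill_timeout" ]; then
            echo "pid $pid still alive ${kill_timeout}s after SIGTERM; sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Whole-second polling keeps the sketch to plain POSIX `sleep`. A process that ignores SIGTERM (for example one that ran `trap "" TERM`) is still removed, because SIGKILL cannot be caught or ignored.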

Changed in ironic:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic 12.0.0

This issue was fixed in the openstack/ironic 12.0.0 release.
