CI: Undercloud install failed due to timeout while setting up mistral
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| tripleo | Invalid | Critical | Unassigned | |
Bug Description
While running the undercloud install, the Mistral setup timed out. This happened on a stable/pike CI job in the gate.
2017-11-07 16:40:58 | 2017-11-07 16:40:58,959 ERROR: TIMEOUT waiting for execution 3382c596-
2017-11-07 16:40:58 | 2017-11-07 16:40:58,960 DEBUG: An exception occurred
2017-11-07 16:40:58 | Traceback (most recent call last):
2017-11-07 16:40:58 | File "/usr/lib/
2017-11-07 16:40:58 | _post_config(
2017-11-07 16:40:58 | File "/usr/lib/
2017-11-07 16:40:58 | _post_config_
2017-11-07 16:40:58 | File "/usr/lib/
2017-11-07 16:40:58 | _create_
2017-11-07 16:40:58 | File "/usr/lib/
2017-11-07 16:40:58 | fail_on_error=True)
2017-11-07 16:40:58 | File "/usr/lib/
2017-11-07 16:40:58 | raise RuntimeError(
2017-11-07 16:40:58 | RuntimeError: TIMEOUT waiting for execution 3382c596-
2017-11-07 16:40:58 | 2017-11-07 16:40:58,960 ERROR:
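The traceback above ends in a plain polling timeout: the installer waits for a Mistral workflow execution and raises once a fixed deadline passes. A minimal sketch of that pattern, with hypothetical names (`get_execution_state`, the 600-second default), since the actual instack-undercloud helper and its paths are truncated in the log:

```python
import time

# Illustrative only: this is not the instack-undercloud API, just the
# polling-with-deadline shape that produces "TIMEOUT waiting for execution".
TIMEOUT = 600      # assumed: the caller gives up after 10 minutes
POLL_INTERVAL = 5  # seconds between state checks

def wait_for_execution(get_execution_state, execution_id,
                       timeout=TIMEOUT, poll_interval=POLL_INTERVAL,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the execution leaves RUNNING, or raise on timeout."""
    deadline = clock() + timeout
    while clock() < deadline:
        state = get_execution_state(execution_id)
        if state != 'RUNNING':
            return state
        sleep(poll_interval)
    # Matches the failure mode in this bug: the workflow may still finish
    # later, but the caller has already given up.
    raise RuntimeError('TIMEOUT waiting for execution %s' % execution_id)
```

Note that a loop like this has no way to tell a dead workflow from a slow one, which is exactly the ambiguity this bug ran into.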
tags: added: workflows
To me this sounds like a hypervisor issue on the undercloud host. We can see the following errors in the messages log:
kernel: INFO: task kworker/u16:1:72 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u16:1 D ffff88003657e0f8 0 72 2 0x00000000
kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:0)
kernel: ffff880234def830 0000000000000046 ffff880234dbbf40 ffff880234deffd8
kernel: ffff880234deffd8 ffff880234deffd8 ffff880234dbbf40 ffff88003657e0f0
kernel: ffff88003657e0f4 ffff880234dbbf40 00000000ffffffff ffff88003657e0f8
kernel: Call Trace:
kernel: [<ffffffff816aa4a9>] schedule_preempt_disabled+0x29/0x70
kernel: [<ffffffff816a83d7>] __mutex_lock_slowpath+0xc7/0x1d0
kernel: [<ffffffff816a77ef>] mutex_lock+0x1f/0x2f
kernel: [<ffffffffc01284a8>] __jbd2_log_wait_for_space+0xc8/0x1f0 [jbd2]
kernel: [<ffffffffc01223d3>] add_transaction_credits+0x2d3/0x2f0 [jbd2]
kernel: [<ffffffff81302a1a>] ? __blk_mq_run_hw_queue+0x9a/0xb0
kernel: [<ffffffffc01225e1>] start_this_handle+0x1a1/0x430 [jbd2]
kernel: [<ffffffff8130578b>] ? blk_mq_flush_plug_list+0x13b/0x160
kernel: [<ffffffff811df64a>] ? kmem_cache_alloc+0x1ba/0x1e0
kernel: [<ffffffffc0122a93>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
kernel: [<ffffffffc0149afc>] ? ext4_writepages+0x42c/0xd30 [ext4]
kernel: [<ffffffffc0177ad9>] __ext4_journal_start_sb+0x69/0xe0 [ext4]
kernel: [<ffffffffc0149afc>] ext4_writepages+0x42c/0xd30 [ext4]
kernel: [<ffffffff8118f02e>] do_writepages+0x1e/0x40
kernel: [<ffffffff8122d8e0>] __writeback_single_inode+0x40/0x220
kernel: [<ffffffff8122e524>] writeback_sb_inodes+0x1c4/0x490
kernel: [<ffffffff8122e88f>] __writeback_inodes_wb+0x9f/0xd0
kernel: [<ffffffff8122f0c3>] wb_writeback+0x263/0x2f0
kernel: [<ffffffff8121bd4c>] ? get_nr_inodes+0x4c/0x70
kernel: [<ffffffff8122f56b>] bdi_writeback_workfn+0x2cb/0x460
kernel: [<ffffffff810a882a>] process_one_work+0x17a/0x440
kernel: [<ffffffff810a94f6>] worker_thread+0x126/0x3c0
kernel: [<ffffffff810a93d0>] ? manage_workers.isra.24+0x2a0/0x2a0
kernel: [<ffffffff810b099f>] kthread+0xcf/0xe0
kernel: [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
kernel: [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
kernel: [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
This appears for a number of processes. It looks like the VM hung for about 10 minutes and then recovered. I think Mistral handled it fine, but the task took 11 minutes to complete while the caller timed out at 10. I'm not sure there is anything specific to fix.
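If this recurs, one way to bound the stall window is to pull the first and last hung-task warnings out of the messages log and compare their timestamps. A small sketch (the log path in the usage comment is an assumption, and `stall_window` is a hypothetical helper, not an existing tool):

```python
import re

# A hung-task warning fires after a task has been blocked 120 seconds,
# so the window between the first and last warning underestimates the
# actual stall by up to two minutes.
HUNG_TASK = re.compile(r'blocked for more than \d+ seconds')

def stall_window(lines):
    """Return the (first, last) hung-task warning lines, or None if absent."""
    hits = [line for line in lines if HUNG_TASK.search(line)]
    return (hits[0], hits[-1]) if hits else None

# Usage sketch (log path is an assumption):
# with open('/var/log/messages') as f:
#     print(stall_window(f))
```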