Activity log for bug #1814595

Date Who What changed Old value New value Message
2019-02-04 22:20:10 Wendy Mitchell bug added bug
2019-02-04 22:28:25 Wendy Mitchell attachment added kern log Feb1 https://bugs.launchpad.net/starlingx/+bug/1814595/+attachment/5235980/+files/kern.log
2019-02-04 22:30:45 Wendy Mitchell attachment added kern2 log Feb 4 https://bugs.launchpad.net/starlingx/+bug/1814595/+attachment/5235981/+files/kern2.log
2019-02-08 19:35:42 Ghada Khalil starlingx: importance Undecided Medium
2019-02-08 19:35:48 Ghada Khalil starlingx: status New Triaged
2019-02-08 19:37:51 Ghada Khalil description Brief Description ----------------- Kernel errors in repetitive migration operations (Containerized STX load) Linux version 3.10.0-862.11.6.el7.36.tis.x86_64 (mockbuild@yow-cgts1-lx) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP PREEMPT Wed Jan 30 20:30:03 EST 2019 2019-01-31T19:12:48.459 localhost kernel: info [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.36.tis.x86_64 root=UUID=c597a45c-3bd7-4d12-95aa-c750c859f605 ro module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto console=ttyS0,115200n8 intel_iommu=off usbcore.autosuspend=-1 loop.max_part=15 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 biosdevname=0 user_namespace.enable=1 Severity -------- Standard Steps to Reproduce ------------------ Run instance launch and migrations (live then cold migration) Repeat again (over the course of several hours) instance was created here req-c320289f-13b0-43e9-815a-e47b59562a02 Create Feb. 1, 2019, 9:39 p.m. eeea9bd3d3fb4019bcc6c0884 Live migrated here req-05faa0f4-ddad-48dc-9bd9-baba546297f0 live-migration Feb. 1, 2019, 9:45 p.m. 393c0829f21346f998e165bd25eb4acf - Cold migration here req-64f7de0a-aa64-427a-bfd4-bf52d6aacf00 migrate Feb. 1, 2019, 9:49 p.m. 393c0829f21346f998e165bd25eb4acf Error Cold-Migrate-Confirm issued by tenant2 against instance tenant2-volume-47 owned by tenant2 on host compute-1 Cold-Migrate complete for instance tenant2-volume-47 now enabled on host compute-1 waiting for confirmation Cold migration confirm here req-55876933-4300-4861-9d6e-3a680450988a Confirm Resize Feb. 1, 2019, 9:50 p.m. 393c0829f21346f998e165bd25eb4acf Error Cold-Migrate-Confirm failed for instance tenant2-volume-47 on host compute-1 Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1 Then the instance reboots a few times to try to recover here req-7c055100-d09d-4ee6-9bd2-a7237560452a Reboot Feb. 1, 2019, 10:05 p.m. 393c0829f21346f998e165bd25eb4acf - req-14b5c560-1c71-4cf9-84c3-42d3103c2d02 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf - req-256f317b-0ebd-4721-b2b6-2acfaf01f763 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf - Expected Behavior ------------------ Actual Behavior ---------------- Kernel Call trace hung task (see below) see nfv-vim-events.log log-id = 636 event-id = instance-failed event-type = action-event event-context = admin importance = high entity = tenant=bac8cdbe-598d-46ff-9d06-eab470342834.instance=f43b2460-bc67-4a08-a497-6953591490f0 reason_text = Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1 additional_text = timestamp = 2019-02-01 21:51:39.050201 controller-0 kern.log 2019-02-01T21:52:47.372 controller-0 kernel: err [87714.235526] INFO: task jbd2/rbd1-8:117573 blocked for more than 120 seconds. 2019-02-01T21:52:47.372 controller-0 kernel: err [87714.243402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.372 controller-0 kernel: info [87714.252189] jbd2/rbd1-8 D ffff908b11838000 0 117573 2 0x00000000 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252198] Call Trace: 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252208] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252214] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252218] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252233] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252238] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252243] [<ffffffffaf4eb57e>] ? blk_flush_plug_list+0xce/0x230 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252248] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252253] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252257] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252263] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252273] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252288] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252297] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252306] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252313] [<ffffffffaf4e9a40>] ? submit_bio+0x70/0x150 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252318] [<ffffffffaf41e387>] ? bio_alloc_bioset+0xd7/0x220 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252322] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252325] [<ffffffffaf365d07>] filemap_fdatawait+0x27/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252349] [<ffffffffc0ddf3ff>] jbd2_journal_commit_transaction+0xb4f/0x1c90 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252354] [<ffffffffaf2ba8dd>] ? sched_clock_cpu+0xad/0xf0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252358] [<ffffffffaf21a59e>] ? __switch_to+0xce/0x5c0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252361] [<ffffffffaf2b08be>] ? finish_task_switch+0x4e/0x160 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252367] [<ffffffffc0de56ad>] kjournald2+0xcd/0x260 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252368] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252372] [<ffffffffc0de55e0>] ? commit_timeout+0x10/0x10 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252374] [<ffffffffaf2a3731>] kthread+0xd1/0xe0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252376] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252380] [<ffffffffaf9c1e1d>] ret_from_fork_nospec_begin+0x7/0x21 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252381] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.252404] INFO: task async_133:118588 blocked for more than 120 seconds. 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260084] async_133 D ffff908d95ba9660 0 118588 118180 0x00000000 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260084] Call Trace: 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260085] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260107] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260108] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260109] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260113] [<ffffffffaf2e0ba2>] ? ktime_get_ts64+0x52/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260115] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260116] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260117] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260118] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260121] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260124] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260125] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260126] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260132] [<ffffffffaf373441>] ? do_writepages+0x21/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260135] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260139] [<ffffffffaf367da6>] filemap_write_and_wait_range+0x56/0x90 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260161] [<ffffffffc0e094aa>] ext4_sync_file+0xba/0x390 [ext4] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260164] [<ffffffffaf415a37>] do_fsync+0x67/0xb0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260168] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260182] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260186] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260210] INFO: task async_275:118730 blocked for more than 120 seconds. 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260213] async_275 D ffff908d96668b30 0 118730 118180 0x00000000 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260216] Call Trace: 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260217] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260222] [<ffffffffc0de5075>] jbd2_log_wait_commit+0xc5/0x150 [jbd2] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260225] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260229] [<ffffffffc0de668c>] jbd2_complete_transaction+0x6c/0x90 [jbd2] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260236] [<ffffffffc0e096e2>] ext4_sync_file+0x2f2/0x390 [ext4] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260238] [<ffffffffaf415a37>] do_fsync+0x67/0xb0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260241] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260244] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20 2019-02-01T21:52:47.407 controller-0 kernel: warning [87714.260246] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27 Reproducibility --------------- Several hours to happen on the 1st set of migrations Within 2 hours on 2nd reproduction run System Configuration -------------------- standard Branch/Pull Time/Commit ----------------------- Master as of date: 2019-01-30_20-18-00 Timestamp/Logs -------------- see kern.log attached ~ 2019-02-01T21:52:47.372 see also kern2.log attached ~ 2019-02-04T21:50:10.441 Brief Description ----------------- Kernel errors in repetitive migration operations (Containerized STX load) Linux version 3.10.0-862.11.6.el7.36.tis.x86_64 (mockbuild@yow-cgts1-lx) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP PREEMPT Wed Jan 30 20:30:03 EST 2019 2019-01-31T19:12:48.459 localhost kernel: info [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.36.tis.x86_64 root=UUID=c597a45c-3bd7-4d12-95aa-c750c859f605 ro module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto console=ttyS0,115200n8 intel_iommu=off usbcore.autosuspend=-1 loop.max_part=15 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 biosdevname=0 user_namespace.enable=1 Severity -------- Standard Steps to Reproduce ------------------ Run instance launch and migrations (live then cold migration) Repeat again (over the course of several hours) instance was created here req-c320289f-13b0-43e9-815a-e47b59562a02 Create Feb. 1, 2019, 9:39 p.m. eeea9bd3d3fb4019bcc6c0884 Live migrated here req-05faa0f4-ddad-48dc-9bd9-baba546297f0 live-migration Feb. 1, 2019, 9:45 p.m. 393c0829f21346f998e165bd25eb4acf - Cold migration here req-64f7de0a-aa64-427a-bfd4-bf52d6aacf00 migrate Feb. 1, 2019, 9:49 p.m. 393c0829f21346f998e165bd25eb4acf Error Cold-Migrate-Confirm issued by tenant2 against instance tenant2-volume-47 owned by tenant2 on host compute-1 Cold-Migrate complete for instance tenant2-volume-47 now enabled on host compute-1 waiting for confirmation Cold migration confirm here req-55876933-4300-4861-9d6e-3a680450988a Confirm Resize Feb. 1, 2019, 9:50 p.m. 393c0829f21346f998e165bd25eb4acf Error Cold-Migrate-Confirm failed for instance tenant2-volume-47 on host compute-1 Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1 Then the instance reboots a few times to try to recover here req-7c055100-d09d-4ee6-9bd2-a7237560452a Reboot Feb. 1, 2019, 10:05 p.m. 393c0829f21346f998e165bd25eb4acf - req-14b5c560-1c71-4cf9-84c3-42d3103c2d02 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf - req-256f317b-0ebd-4721-b2b6-2acfaf01f763 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf - Expected Behavior ------------------ Actual Behavior ---------------- Kernel Call trace hung task (see below) see nfv-vim-events.log log-id = 636 event-id = instance-failed event-type = action-event event-context = admin importance = high entity = tenant=bac8cdbe-598d-46ff-9d06-eab470342834.instance=f43b2460-bc67-4a08-a497-6953591490f0 reason_text = Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1 additional_text = timestamp = 2019-02-01 21:51:39.050201 controller-0 kern.log 2019-02-01T21:52:47.372 controller-0 kernel: err [87714.235526] INFO: task jbd2/rbd1-8:117573 blocked for more than 120 seconds. 2019-02-01T21:52:47.372 controller-0 kernel: err [87714.243402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.372 controller-0 kernel: info [87714.252189] jbd2/rbd1-8 D ffff908b11838000 0 117573 2 0x00000000 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252198] Call Trace: 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252208] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252214] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252218] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252233] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252238] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252243] [<ffffffffaf4eb57e>] ? blk_flush_plug_list+0xce/0x230 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252248] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252253] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252257] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252263] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252273] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252288] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252297] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252306] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252313] [<ffffffffaf4e9a40>] ? submit_bio+0x70/0x150 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252318] [<ffffffffaf41e387>] ? bio_alloc_bioset+0xd7/0x220 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252322] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252325] [<ffffffffaf365d07>] filemap_fdatawait+0x27/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252349] [<ffffffffc0ddf3ff>] jbd2_journal_commit_transaction+0xb4f/0x1c90 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252354] [<ffffffffaf2ba8dd>] ? sched_clock_cpu+0xad/0xf0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252358] [<ffffffffaf21a59e>] ? __switch_to+0xce/0x5c0 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252361] [<ffffffffaf2b08be>] ? finish_task_switch+0x4e/0x160 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252367] [<ffffffffc0de56ad>] kjournald2+0xcd/0x260 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252368] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252372] [<ffffffffc0de55e0>] ? commit_timeout+0x10/0x10 [jbd2] 2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252374] [<ffffffffaf2a3731>] kthread+0xd1/0xe0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252376] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252380] [<ffffffffaf9c1e1d>] ret_from_fork_nospec_begin+0x7/0x21 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252381] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.252404] INFO: task async_133:118588 blocked for more than 120 seconds. 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260084] async_133 D ffff908d95ba9660 0 118588 118180 0x00000000 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260084] Call Trace: 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260085] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260107] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260108] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260109] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260113] [<ffffffffaf2e0ba2>] ? ktime_get_ts64+0x52/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260115] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260116] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260117] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260118] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260121] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260124] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260125] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260126] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260132] [<ffffffffaf373441>] ? do_writepages+0x21/0x50 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260135] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260139] [<ffffffffaf367da6>] filemap_write_and_wait_range+0x56/0x90 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260161] [<ffffffffc0e094aa>] ext4_sync_file+0xba/0x390 [ext4] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260164] [<ffffffffaf415a37>] do_fsync+0x67/0xb0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260168] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260182] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260186] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260210] INFO: task async_275:118730 blocked for more than 120 seconds. 2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260213] async_275 D ffff908d96668b30 0 118730 118180 0x00000000 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260216] Call Trace: 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260217] [<ffffffffaf9be399>] schedule+0x29/0x70 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260222] [<ffffffffc0de5075>] jbd2_log_wait_commit+0xc5/0x150 [jbd2] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260225] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260229] [<ffffffffc0de668c>] jbd2_complete_transaction+0x6c/0x90 [jbd2] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260236] [<ffffffffc0e096e2>] ext4_sync_file+0x2f2/0x390 [ext4] 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260238] [<ffffffffaf415a37>] do_fsync+0x67/0xb0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260241] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0 2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260244] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20 2019-02-01T21:52:47.407 controller-0 kernel: warning [87714.260246] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27 Reproducibility --------------- Several hours to happen on the 1st set of migrations Within 2 hours on 2nd reproduction run System Configuration -------------------- standard Branch/Pull Time/Commit ----------------------- Master as of date: 2019-01-30_20-18-00 This was seen when running a containerized config using the pike docker images Timestamp/Logs -------------- see kern.log attached ~ 2019-02-01T21:52:47.372 see also kern2.log attached ~ 2019-02-04T21:50:10.441
2019-02-08 19:38:02 Ghada Khalil summary Kernel call trace Task jbd2 blocked Containers: Kernel call trace Task jbd2 blocked
2019-02-08 19:38:47 Ghada Khalil tags stx.2019.05 stx.containers stx.distro.other
2019-02-08 19:39:51 Ghada Khalil starlingx: assignee Cindy Xie (xxie1)
2019-02-18 00:50:35 Lin Shuicheng starlingx: assignee Cindy Xie (xxie1) Lin Shuicheng (shuicheng)
2019-02-25 15:55:55 Frank Miller starlingx: importance Medium High
2019-02-28 21:30:39 Jim Somerville bug added subscriber Jim Somerville
2019-03-01 19:09:34 Ghada Khalil starlingx: assignee Lin Shuicheng (shuicheng) Jim Somerville (jsomervi)
2019-03-01 20:03:40 Ghada Khalil starlingx: status Triaged In Progress
2019-03-18 15:08:05 Frank Miller starlingx: importance High Medium
2019-03-19 19:14:49 Jim Somerville bug watch added https://bugs.centos.org/view.php?id=15937
2019-03-28 06:37:12 Lin Shuicheng bug added subscriber Lin Shuicheng
2019-04-05 20:36:53 Ken Young tags stx.2019.05 stx.containers stx.distro.other stx.2.0 stx.containers stx.distro.other
2019-04-08 20:42:42 Ghada Khalil tags stx.2.0 stx.containers stx.distro.other stx.2.0 stx.containers stx.distro.other stx.retestneeded
2019-05-08 13:36:46 Cindy Xie tags stx.2.0 stx.containers stx.distro.other stx.retestneeded stx.containers stx.distro.other stx.retestneeded
2019-05-08 19:41:38 Ghada Khalil starlingx: importance Medium Low
2023-08-05 01:08:45 Ghada Khalil starlingx: status In Progress Won't Fix
2023-08-05 01:08:57 Ghada Khalil tags stx.containers stx.distro.other stx.retestneeded stx.containers stx.distro.other