2019-02-04 22:20:10 |
Wendy Mitchell |
bug |
|
|
added bug |
2019-02-04 22:28:25 |
Wendy Mitchell |
attachment added |
|
kern log Feb1 https://bugs.launchpad.net/starlingx/+bug/1814595/+attachment/5235980/+files/kern.log |
|
2019-02-04 22:30:45 |
Wendy Mitchell |
attachment added |
|
kern2 log Feb 4 https://bugs.launchpad.net/starlingx/+bug/1814595/+attachment/5235981/+files/kern2.log |
|
2019-02-08 19:35:42 |
Ghada Khalil |
starlingx: importance |
Undecided |
Medium |
|
2019-02-08 19:35:48 |
Ghada Khalil |
starlingx: status |
New |
Triaged |
|
2019-02-08 19:37:51 |
Ghada Khalil |
description |
Brief Description
-----------------
Kernel errors in repetitive migration operations
(Containerized STX load)
Linux version 3.10.0-862.11.6.el7.36.tis.x86_64 (mockbuild@yow-cgts1-lx) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP PREEMPT Wed Jan 30 20:30:03 EST 2019
2019-01-31T19:12:48.459 localhost kernel: info [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.36.tis.x86_64 root=UUID=c597a45c-3bd7-4d12-95aa-c750c859f605 ro module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto console=ttyS0,115200n8 intel_iommu=off usbcore.autosuspend=-1 loop.max_part=15 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 biosdevname=0 user_namespace.enable=1
Severity
--------
Standard
Steps to Reproduce
------------------
Run instance launch and migrations (live then cold migration)
Repeat again (over the course of several hours)
instance was created here
req-c320289f-13b0-43e9-815a-e47b59562a02 Create Feb. 1, 2019, 9:39 p.m. eeea9bd3d3fb4019bcc6c0884
Live migrated here
req-05faa0f4-ddad-48dc-9bd9-baba546297f0 live-migration Feb. 1, 2019, 9:45 p.m. 393c0829f21346f998e165bd25eb4acf -
Cold migration here
req-64f7de0a-aa64-427a-bfd4-bf52d6aacf00 migrate Feb. 1, 2019, 9:49 p.m. 393c0829f21346f998e165bd25eb4acf Error
Cold-Migrate-Confirm issued by tenant2 against instance tenant2-volume-47 owned by tenant2 on host compute-1
Cold-Migrate complete for instance tenant2-volume-47 now enabled on host compute-1 waiting for confirmation
Cold migration confirm here
req-55876933-4300-4861-9d6e-3a680450988a Confirm Resize Feb. 1, 2019, 9:50 p.m. 393c0829f21346f998e165bd25eb4acf Error
Cold-Migrate-Confirm failed for instance tenant2-volume-47 on host compute-1
Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
Then the instance reboots a few times to try to recover here
req-7c055100-d09d-4ee6-9bd2-a7237560452a Reboot Feb. 1, 2019, 10:05 p.m. 393c0829f21346f998e165bd25eb4acf -
req-14b5c560-1c71-4cf9-84c3-42d3103c2d02 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf -
req-256f317b-0ebd-4721-b2b6-2acfaf01f763 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf -
Expected Behavior
------------------
Actual Behavior
----------------
Kernel Call trace hung task (see below)
see nfv-vim-events.log
log-id = 636
event-id = instance-failed
event-type = action-event
event-context = admin
importance = high
entity = tenant=bac8cdbe-598d-46ff-9d06-eab470342834.instance=f43b2460-bc67-4a08-a497-6953591490f0
reason_text = Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
additional_text =
timestamp = 2019-02-01 21:51:39.050201
controller-0 kern.log
2019-02-01T21:52:47.372 controller-0 kernel: err [87714.235526] INFO: task jbd2/rbd1-8:117573 blocked for more than 120 seconds.
2019-02-01T21:52:47.372 controller-0 kernel: err [87714.243402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.372 controller-0 kernel: info [87714.252189] jbd2/rbd1-8 D ffff908b11838000 0 117573 2 0x00000000
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252198] Call Trace:
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252208] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252214] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252218] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252233] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252238] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252243] [<ffffffffaf4eb57e>] ? blk_flush_plug_list+0xce/0x230
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252248] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252253] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252257] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252263] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252273] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252288] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252297] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252306] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252313] [<ffffffffaf4e9a40>] ? submit_bio+0x70/0x150
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252318] [<ffffffffaf41e387>] ? bio_alloc_bioset+0xd7/0x220
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252322] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252325] [<ffffffffaf365d07>] filemap_fdatawait+0x27/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252349] [<ffffffffc0ddf3ff>] jbd2_journal_commit_transaction+0xb4f/0x1c90 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252354] [<ffffffffaf2ba8dd>] ? sched_clock_cpu+0xad/0xf0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252358] [<ffffffffaf21a59e>] ? __switch_to+0xce/0x5c0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252361] [<ffffffffaf2b08be>] ? finish_task_switch+0x4e/0x160
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252367] [<ffffffffc0de56ad>] kjournald2+0xcd/0x260 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252368] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252372] [<ffffffffc0de55e0>] ? commit_timeout+0x10/0x10 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252374] [<ffffffffaf2a3731>] kthread+0xd1/0xe0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252376] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252380] [<ffffffffaf9c1e1d>] ret_from_fork_nospec_begin+0x7/0x21
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252381] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.252404] INFO: task async_133:118588 blocked for more than 120 seconds.
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260084] async_133 D ffff908d95ba9660 0 118588 118180 0x00000000
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260084] Call Trace:
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260085] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260107] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260108] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260109] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260113] [<ffffffffaf2e0ba2>] ? ktime_get_ts64+0x52/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260115] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260116] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260117] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260118] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260121] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260124] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260125] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260126] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260132] [<ffffffffaf373441>] ? do_writepages+0x21/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260135] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260139] [<ffffffffaf367da6>] filemap_write_and_wait_range+0x56/0x90
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260161] [<ffffffffc0e094aa>] ext4_sync_file+0xba/0x390 [ext4]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260164] [<ffffffffaf415a37>] do_fsync+0x67/0xb0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260168] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260182] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260186] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260210] INFO: task async_275:118730 blocked for more than 120 seconds.
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260213] async_275 D ffff908d96668b30 0 118730 118180 0x00000000
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260216] Call Trace:
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260217] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260222] [<ffffffffc0de5075>] jbd2_log_wait_commit+0xc5/0x150 [jbd2]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260225] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260229] [<ffffffffc0de668c>] jbd2_complete_transaction+0x6c/0x90 [jbd2]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260236] [<ffffffffc0e096e2>] ext4_sync_file+0x2f2/0x390 [ext4]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260238] [<ffffffffaf415a37>] do_fsync+0x67/0xb0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260241] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260244] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20
2019-02-01T21:52:47.407 controller-0 kernel: warning [87714.260246] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27
Reproducibility
---------------
Several hours to happen on the 1st set of migrations
Within 2 hours on 2nd reproduction run
System Configuration
--------------------
standard
Branch/Pull Time/Commit
-----------------------
Master as of date: 2019-01-30_20-18-00
Timestamp/Logs
--------------
see kern.log attached ~ 2019-02-01T21:52:47.372
see also kern2.log attached ~ 2019-02-04T21:50:10.441 |
Brief Description
-----------------
Kernel errors in repetitive migration operations
(Containerized STX load)
Linux version 3.10.0-862.11.6.el7.36.tis.x86_64 (mockbuild@yow-cgts1-lx) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP PREEMPT Wed Jan 30 20:30:03 EST 2019
2019-01-31T19:12:48.459 localhost kernel: info [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.36.tis.x86_64 root=UUID=c597a45c-3bd7-4d12-95aa-c750c859f605 ro module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto console=ttyS0,115200n8 intel_iommu=off usbcore.autosuspend=-1 loop.max_part=15 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 biosdevname=0 user_namespace.enable=1
Severity
--------
Standard
Steps to Reproduce
------------------
Run instance launch and migrations (live then cold migration)
Repeat again (over the course of several hours)
instance was created here
req-c320289f-13b0-43e9-815a-e47b59562a02 Create Feb. 1, 2019, 9:39 p.m. eeea9bd3d3fb4019bcc6c0884
Live migrated here
req-05faa0f4-ddad-48dc-9bd9-baba546297f0 live-migration Feb. 1, 2019, 9:45 p.m. 393c0829f21346f998e165bd25eb4acf -
Cold migration here
req-64f7de0a-aa64-427a-bfd4-bf52d6aacf00 migrate Feb. 1, 2019, 9:49 p.m. 393c0829f21346f998e165bd25eb4acf Error
Cold-Migrate-Confirm issued by tenant2 against instance tenant2-volume-47 owned by tenant2 on host compute-1
Cold-Migrate complete for instance tenant2-volume-47 now enabled on host compute-1 waiting for confirmation
Cold migration confirm here
req-55876933-4300-4861-9d6e-3a680450988a Confirm Resize Feb. 1, 2019, 9:50 p.m. 393c0829f21346f998e165bd25eb4acf Error
Cold-Migrate-Confirm failed for instance tenant2-volume-47 on host compute-1
Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
Then the instance reboots a few times to try to recover here
req-7c055100-d09d-4ee6-9bd2-a7237560452a Reboot Feb. 1, 2019, 10:05 p.m. 393c0829f21346f998e165bd25eb4acf -
req-14b5c560-1c71-4cf9-84c3-42d3103c2d02 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf -
req-256f317b-0ebd-4721-b2b6-2acfaf01f763 Reboot Feb. 1, 2019, 10:06 p.m. 393c0829f21346f998e165bd25eb4acf -
Expected Behavior
------------------
Actual Behavior
----------------
Kernel Call trace hung task (see below)
see nfv-vim-events.log
log-id = 636
event-id = instance-failed
event-type = action-event
event-context = admin
importance = high
entity = tenant=bac8cdbe-598d-46ff-9d06-eab470342834.instance=f43b2460-bc67-4a08-a497-6953591490f0
reason_text = Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
additional_text =
timestamp = 2019-02-01 21:51:39.050201
controller-0 kern.log
2019-02-01T21:52:47.372 controller-0 kernel: err [87714.235526] INFO: task jbd2/rbd1-8:117573 blocked for more than 120 seconds.
2019-02-01T21:52:47.372 controller-0 kernel: err [87714.243402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.372 controller-0 kernel: info [87714.252189] jbd2/rbd1-8 D ffff908b11838000 0 117573 2 0x00000000
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252198] Call Trace:
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252208] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252214] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252218] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252233] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252238] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.372 controller-0 kernel: warning [87714.252243] [<ffffffffaf4eb57e>] ? blk_flush_plug_list+0xce/0x230
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252248] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252253] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252257] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252263] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252273] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252288] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252297] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252306] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252313] [<ffffffffaf4e9a40>] ? submit_bio+0x70/0x150
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252318] [<ffffffffaf41e387>] ? bio_alloc_bioset+0xd7/0x220
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252322] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252325] [<ffffffffaf365d07>] filemap_fdatawait+0x27/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252349] [<ffffffffc0ddf3ff>] jbd2_journal_commit_transaction+0xb4f/0x1c90 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252354] [<ffffffffaf2ba8dd>] ? sched_clock_cpu+0xad/0xf0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252358] [<ffffffffaf21a59e>] ? __switch_to+0xce/0x5c0
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252361] [<ffffffffaf2b08be>] ? finish_task_switch+0x4e/0x160
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252367] [<ffffffffc0de56ad>] kjournald2+0xcd/0x260 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252368] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252372] [<ffffffffc0de55e0>] ? commit_timeout+0x10/0x10 [jbd2]
2019-02-01T21:52:47.373 controller-0 kernel: warning [87714.252374] [<ffffffffaf2a3731>] kthread+0xd1/0xe0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252376] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252380] [<ffffffffaf9c1e1d>] ret_from_fork_nospec_begin+0x7/0x21
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.252381] [<ffffffffaf2a3660>] ? kthread_create_on_node+0x140/0x140
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.252404] INFO: task async_133:118588 blocked for more than 120 seconds.
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260084] async_133 D ffff908d95ba9660 0 118588 118180 0x00000000
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260084] Call Trace:
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260085] [<ffffffffaf4f3914>] ? blk_mq_run_hw_queue+0x14/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260107] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260108] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260109] [<ffffffffaf9bb171>] schedule_timeout+0x2a1/0x330
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260113] [<ffffffffaf2e0ba2>] ? ktime_get_ts64+0x52/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260115] [<ffffffffaf9bb790>] ? bit_wait+0x50/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260116] [<ffffffffaf9bce7d>] io_schedule_timeout+0xad/0x130
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260117] [<ffffffffaf9bcf18>] io_schedule+0x18/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260118] [<ffffffffaf9bb7a1>] bit_wait_io+0x11/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260121] [<ffffffffaf9bb2c7>] __wait_on_bit+0x67/0x90
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260124] [<ffffffffaf365b01>] wait_on_page_bit+0x81/0xa0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260125] [<ffffffffaf2a47d0>] ? wake_bit_function+0x40/0x40
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260126] [<ffffffffaf365c31>] __filemap_fdatawait_range+0x111/0x190
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260132] [<ffffffffaf373441>] ? do_writepages+0x21/0x50
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260135] [<ffffffffaf365cc4>] filemap_fdatawait_range+0x14/0x30
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260139] [<ffffffffaf367da6>] filemap_write_and_wait_range+0x56/0x90
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260161] [<ffffffffc0e094aa>] ext4_sync_file+0xba/0x390 [ext4]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260164] [<ffffffffaf415a37>] do_fsync+0x67/0xb0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260168] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260182] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260186] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260210] INFO: task async_275:118730 blocked for more than 120 seconds.
2019-02-01T21:52:47.406 controller-0 kernel: err [87714.260211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-02-01T21:52:47.406 controller-0 kernel: info [87714.260213] async_275 D ffff908d96668b30 0 118730 118180 0x00000000
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260216] Call Trace:
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260217] [<ffffffffaf9be399>] schedule+0x29/0x70
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260222] [<ffffffffc0de5075>] jbd2_log_wait_commit+0xc5/0x150 [jbd2]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260225] [<ffffffffaf2a4710>] ? wake_up_atomic_t+0x30/0x30
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260229] [<ffffffffc0de668c>] jbd2_complete_transaction+0x6c/0x90 [jbd2]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260236] [<ffffffffc0e096e2>] ext4_sync_file+0x2f2/0x390 [ext4]
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260238] [<ffffffffaf415a37>] do_fsync+0x67/0xb0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260241] [<ffffffffaf3e2614>] ? SyS_write+0xa4/0xf0
2019-02-01T21:52:47.406 controller-0 kernel: warning [87714.260244] [<ffffffffaf415d20>] SyS_fsync+0x10/0x20
2019-02-01T21:52:47.407 controller-0 kernel: warning [87714.260246] [<ffffffffaf9c1fdb>] system_call_fastpath+0x22/0x27
Reproducibility
---------------
Several hours to happen on the 1st set of migrations
Within 2 hours on 2nd reproduction run
System Configuration
--------------------
standard
Branch/Pull Time/Commit
-----------------------
Master as of date: 2019-01-30_20-18-00
This was seen when running a containerized config using the pike docker images
Timestamp/Logs
--------------
see kern.log attached ~ 2019-02-01T21:52:47.372
see also kern2.log attached ~ 2019-02-04T21:50:10.441 |
|
2019-02-08 19:38:02 |
Ghada Khalil |
summary |
Kernel call trace Task jbd2 blocked |
Containers: Kernel call trace Task jbd2 blocked |
|
2019-02-08 19:38:47 |
Ghada Khalil |
tags |
|
stx.2019.05 stx.containers stx.distro.other |
|
2019-02-08 19:39:51 |
Ghada Khalil |
starlingx: assignee |
|
Cindy Xie (xxie1) |
|
2019-02-18 00:50:35 |
Lin Shuicheng |
starlingx: assignee |
Cindy Xie (xxie1) |
Lin Shuicheng (shuicheng) |
|
2019-02-25 15:55:55 |
Frank Miller |
starlingx: importance |
Medium |
High |
|
2019-02-28 21:30:39 |
Jim Somerville |
bug |
|
|
added subscriber Jim Somerville |
2019-03-01 19:09:34 |
Ghada Khalil |
starlingx: assignee |
Lin Shuicheng (shuicheng) |
Jim Somerville (jsomervi) |
|
2019-03-01 20:03:40 |
Ghada Khalil |
starlingx: status |
Triaged |
In Progress |
|
2019-03-18 15:08:05 |
Frank Miller |
starlingx: importance |
High |
Medium |
|
2019-03-19 19:14:49 |
Jim Somerville |
bug watch added |
|
https://bugs.centos.org/view.php?id=15937 |
|
2019-03-28 06:37:12 |
Lin Shuicheng |
bug |
|
|
added subscriber Lin Shuicheng |
2019-04-05 20:36:53 |
Ken Young |
tags |
stx.2019.05 stx.containers stx.distro.other |
stx.2.0 stx.containers stx.distro.other |
|
2019-04-08 20:42:42 |
Ghada Khalil |
tags |
stx.2.0 stx.containers stx.distro.other |
stx.2.0 stx.containers stx.distro.other stx.retestneeded |
|
2019-05-08 13:36:46 |
Cindy Xie |
tags |
stx.2.0 stx.containers stx.distro.other stx.retestneeded |
stx.containers stx.distro.other stx.retestneeded |
|
2019-05-08 19:41:38 |
Ghada Khalil |
starlingx: importance |
Medium |
Low |
|
2023-08-05 01:08:45 |
Ghada Khalil |
starlingx: status |
In Progress |
Won't Fix |
|
2023-08-05 01:08:57 |
Ghada Khalil |
tags |
stx.containers stx.distro.other stx.retestneeded |
stx.containers stx.distro.other |
|