Containers: Kernel call trace Task jbd2 blocked
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | | Low | Jim Somerville |
Bug Description
Brief Description
-----------------
Kernel errors in repetitive migration operations
(Containerized STX load)
Linux version 3.10.0-
2019-01-
Severity
--------
Standard
Steps to Reproduce
------------------
Run instance launch and migrations (live migration, then cold migration).
Repeat over the course of several hours.
The instance was created here:
req-c320289f-
Live migrated here:
req-05faa0f4-
Cold migrated here:
req-64f7de0a-
Cold-Migrate-
Cold-Migrate complete for instance tenant2-volume-47 now enabled on host compute-1 waiting for confirmation
Cold migration confirmed here:
req-55876933-
Cold-Migrate-
Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
The instance then reboots a few times to try to recover:
req-7c055100-
req-14b5c560-
req-256f317b-
Expected Behavior
------------------
Migrations complete repeatedly with no kernel errors or hung-task call traces.
Actual Behavior
----------------
Kernel call trace for a hung task (see below).
See nfv-vim-events.log:
log-id = 636
event-id = instance-failed
event-type = action-event
event-context = admin
importance = high
entity = tenant=
reason_text = Instance tenant2-volume-47 owned by tenant2 has failed on host compute-1
additional_text =
timestamp = 2019-02-01 21:51:39.050201
controller-0 kern.log
[kern.log excerpt truncated in this report; see the attached kern.log for the full jbd2 hung-task call trace]
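For reference, a minimal sketch of how the blocked-task state can be captured on the affected host using standard kernel facilities (not part of the original report; requires sysrq to be enabled):

    # Show the hung-task detector timeout (120s by default on CentOS)
    cat /proc/sys/kernel/hung_task_timeout_secs

    # Dump stack traces of all uninterruptible (D-state) tasks to the kernel log
    echo 1 > /proc/sys/kernel/sysrq
    echo w > /proc/sysrq-trigger

    # Pull the jbd2 hung-task traces back out of the kernel log
    dmesg | grep -B 2 -A 25 "blocked for more than"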
Reproducibility
---------------
Took several hours to occur on the first set of migrations.
Reproduced within 2 hours on the second run.
System Configuration
--------------------
Standard
Branch/Pull Time/Commit
-----------------------
Master as of date: 2019-01-30_20-18-00
This was seen when running a containerized config using the Pike Docker images.
Timestamp/Logs
--------------
See kern.log attached, ~2019-02-
See also kern2.log attached, ~2019-02-
Wendy Mitchell (wmitchellwr) wrote (#1):
Wendy Mitchell (wmitchellwr) wrote (#2):
Wendy Mitchell (wmitchellwr) wrote (#3):
Ghada Khalil (gkhalil) wrote (#4):
Marking as release gating until further investigation.
The issue was reported during testing in a containerized environment. It was not seen w/o containers.
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
description: updated
summary: Kernel call trace Task jbd2 blocked → Containers: Kernel call trace Task jbd2 blocked
tags: added: stx.2019.05 stx.containers stx.distro.other
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
Ovidiu Poncea (ovidiu.poncea) wrote (#5):
We also saw the issue during normal operation, yet we were unable to reproduce it consistently, so it's good that you are able to reproduce it. Sharing our data below.
Given the kernel error, and with a little help from Google, we narrowed it down to one of these two:
1. An issue caused by "Too many open files"; see https:/
2. Or possibly an issue in the RBD driver in the kernel, since no monitor reported slow operations and health was OK all the time (load on the cluster was light the whole time). Google gets two related hits: https:/
Regarding #1 above, to get more info, please add the section below to ceph.conf before doing the test:
[client]
log file = /var/log/
debug rbd = 20
debug librbd = 20
If it is the same issue that Red Hat was seeing, we should get messages like this in /var/log/
7f0b769e7700 -1 -- 192.168.
If not, at least you should get more info on what's happening.
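As a quick check for scenario #1 (file descriptor exhaustion), something along these lines could be run on the affected host once the debug logging above is enabled (the log path and process name below are assumptions, since the exact paths are truncated here):

    # Look for the fd-exhaustion signature in the RBD client log (path is an assumption)
    grep -i "too many open files" /var/log/ceph/*.log

    # Compare open file descriptors of the qemu process against its limit
    pid=$(pgrep -f qemu-kvm | head -n1)   # example process; adjust to the workload under test
    ls /proc/$pid/fd | wc -l
    grep "Max open files" /proc/$pid/limits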
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Lin Shuicheng (shuicheng)
Frank Miller (sensfan22) wrote (#6):
Changed priority to High due to the # of labs that have seen this issue in the past week (at least 7). Also looks like this is a duplicate of https:/
Changed in starlingx:
importance: Medium → High
Jim Somerville (jsomervi) wrote (#7):
Possible candidate:
In the 862 kernel we have this patch which wasn't in 693:
commit bd166ef183c263c
Author: Jens Axboe <email address hidden>
Date: Tue Jan 17 06:03:22 2017 -0700
blk-mq-sched: add framework for MQ capable IO schedulers
This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.
We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interface, and with a separate set of ops hooks for MQ
devices.
We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.
Signed-off-by: Jens Axboe <email address hidden>
Reviewed-by: Bart Van Assche <email address hidden>
Reviewed-by: Omar Sandoval <email address hidden>
Subsequently, it had to be fixed (not in 862 or 957):
commit 85bd6e61f34dffa
Author: Jianchao Wang <email address hidden>
Date: Wed Jan 30 17:01:56 2019 +0800
blk-mq: fix a hung issue when fsync
Florian reported a io hung issue when fsync(). It should be
triggered by following race condition.
(The original commit message illustrates the race with a two-column diagram, a "data + post flush" request racing against a separate flush request, which did not survive here. In short: on completion of the data request, req->end_io does not check RESTART, and the REQ_FSEQ_POSTFLUSH step then does nothing because the previous flush has not been completed, so the flush request is never dispatched and fsync() hangs.)
To fix this, the commit replaces the blk_mq_run_hw_queue call on the flush data end_io path with a scheduler restart.
Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers)
Reported-by: Florian Stecker <email address hidden>
Tested-by: Florian Stecker <email address hidden>
Signed-off-by: Jianchao Wang <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
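A quick way to check whether a given kernel source tree already carries the follow-up fix, sketched below (the source path is a placeholder; the subject lines come from the commits quoted above):

    cd /path/to/kernel-source   # placeholder: a checkout of the kernel under test

    # Present if the tree has the fix; prints nothing otherwise
    git log --oneline --grep='blk-mq: fix a hung issue when fsync'

    # Present if the tree has the blk-mq scheduler framework that introduced the race
    git log --oneline --grep='blk-mq-sched: add framework for MQ capable IO schedulers'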
Ghada Khalil (gkhalil) wrote (#8):
The issue has been recently reproduced in 3 labs - 2 running the CentOS 7.6 kernel. Jim Somerville is actively investigating; assigning the launchpad to him.
Changed in starlingx:
assignee: Lin Shuicheng (shuicheng) → Jim Somerville (jsomervi)
Changed in starlingx:
status: Triaged → In Progress
Juan Carlos Alonso (juancarlosa) wrote (#9):
DX and Standard 2+2 configurations faced problems with provisioning due to an existing issue related to the “rabbitmq” service pod.
The same issue affected the Stress and Stability execution. It seems that the “rabbitmq” pod had issues handling signals when there is a big load.
Jim Somerville (jsomervi) wrote (#10):
That's interesting about rabbitmq; do you have a link or bug number?
Erich Cordoba (ericho) wrote (#11):
I think Juan Carlos refers to this one: https:/
Also, in that LP it was mentioned that the issue happened in a duplex system before the "system application-apply"; it was assumed that the failures regarding rabbitmq were related to this kernel oops (could they be a symptom?).
However, it seems that the only constant is to just leave the system up for a couple of hours (or days) to reproduce the behaviour.
See this comment: https:/
It has a dmesg output attached that might be helpful.
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 77cbb985f294fbc
Author: Robert Church <email address hidden>
Date: Fri Mar 8 15:54:19 2019 -0500
For stability bump size of rabbitmq PV to 1Gi
The rabbitmq chart requests a 256Mi PV for operational storage. With
CentOS 7.5 and 7.6 kernels, a jbd2 kernel thread hang is observed after
a long soak period. Once this occurs, a host reboot is required to
recover access to the PV.
We have been able to reliably recreate this using the stock upstream
CentOS 7.6 kernel and the latest Ceph Jewel LTS (10.2.11) version using
fsstress. This is currently pointing to a race condition in the
filesystem code.
With a reliable test available for this, other scenarios to characterize
this have been performed including using different volume sizes and
using different ext4 filesystem formatting options.
We've been unable to cause the hang using a 1Gi PV over an extended soak
period so we'll update the stx-openstack manifest to request a 1Gi PV
until the root cause and fix has been addressed in the kernel.
Change-Id: Ia0e5b7ffb049c6
Related-Bug: #1814595
Signed-off-by: Robert Church <email address hidden>
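For reference, a rough sketch of the kind of fsstress-based reproduction described above; the pool/image names, sizes, and fsstress parameters here are illustrative, not the exact recipe that was used:

    # Create and map a small RBD image, mimicking a 256Mi PV (pool/image names illustrative)
    rbd create kube-rbd/jbd2-test --size 256
    dev=$(rbd map kube-rbd/jbd2-test)

    # Format and mount it as ext4, as the PV provisioner would
    mkfs.ext4 "$dev"
    mkdir -p /mnt/jbd2-test
    mount "$dev" /mnt/jbd2-test

    # Hammer the filesystem until the jbd2 hung-task trace shows up in kern.log
    # (fsstress ships with xfstests/ltp: -p processes, -n ops per process, -l 0 to loop indefinitely)
    fsstress -d /mnt/jbd2-test -p 10 -n 1000 -l 0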
Frank Miller (sensfan22) wrote (#14):
Dropping priority to Medium since the workaround (using a 1G volume for rabbitmq) prevents the issue. We still need to find a solution, and Jim continues to investigate the kernel.
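To confirm the workaround is in effect on a deployed system, the rabbitmq PVC size can be checked with something like the following (the namespace and name filter are assumptions):

    # The rabbitmq PVC should now report 1Gi capacity rather than 256Mi
    kubectl -n openstack get pvc | grep -i rabbit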
Changed in starlingx:
importance: High → Medium
Ghada Khalil (gkhalil) wrote (#15):
Downgraded to Medium given that a workaround has been pushed.
Jim Somerville (jsomervi) wrote (#16):
This issue occurs in out-of-the-box standard CentOS 7.6 and earlier; it has nothing to do with any of our kernel patches or other modifications.
I have reported it upstream. https:/
My sincere thanks to Bob Church for taking the time to put together a comprehensive recipe for reproducing the issue.
I also tested against the upstream 4.9 kernel from Linus' tree, using the same CentOS 7.6 user space (i.e. I just slid the newer kernel onto the same system), and the problem was gone.
Thus this really looks like it is specifically a CentOS kernel issue.
tags: added: stx.2.0 removed: stx.2019.05
tags: added: stx.retestneeded
tags: removed: stx.2.0
Ghada Khalil (gkhalil) wrote (#17):
Changing priority to Low based on Cindy's update that this should not be release gating. I've asked Cindy to add justification for tracking purposes as well.
Changed in starlingx:
importance: Medium → Low
Cindy Xie (xxie1) wrote (#18):
We reviewed this in the non-openstack-dist project meeting and agreed it is not gating stx.2.0, since Jim found it is a CentOS 7.6 upstream issue; a workaround is in place and an upstream bug has been opened.
Frank Miller (sensfan22) wrote (#19):
To add to Cindy's comment, the main reason this issue is no longer gating stx.2.0 is that the original issue reported in StarlingX has been avoided by using a different size for the PVs used by StarlingX (see: https:/
Since StarlingX is no longer using 256Mi-sized PVs, the software no longer hits this jbd2 issue. As such, this CentOS 7.6 upstream issue is no longer impacting StarlingX and this LP is not gating stx.2.0.
The general steps are as follows; they are then repeated for several hours.
Test Step 1: Clear local storage cache on compute-0
'rm -rf /var/lib/nova/instances/_base/*'
'sync; echo 3 > /proc/sys/vm/drop_caches'
Step 2: Create a flavor with 2 vcpus, dedicated cpu policy, and local_image storage
'nova --os-username 'admin' --os-password '<password>' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne flavor-create local_image auto 1024 2 2'
Step 3: Tenant2 launches instance1
tenant2 boots the instance, e.g. from local_image on compute-0 (collect startup time info)
'nova --os-username 'tenant2' --os-password '<password>' --os-project-name tenant2 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne boot --key-name keypair-tenant2 --flavor <flavorid> --image <tis-centos-guest image> --nic net-id=28348793-baef-4987-ae23-0baf02036819 --nic net-id=9267348a-c940-41e4-8a57-ce1046946f42 --nic net-id=3de977f7-94d2-45ff-907d-977a715c7782 tenant2-local_image-1 --poll'
Send 'ping -c 3 <ip of mgmt network for tenant2>'
Step 4: Tenant1 launches instance2 (the observer instance)
Send 'nova --os-username 'tenant1' --os-password '<password>' --os-project-name tenant1 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne boot --key-name keypair-tenant1 --flavor <flavorid> --image <tis-centos-guest image> --nic net-id=f6ac6f3d-45e2-4e1f-88ad-ac10bbe8ccf2 --nic net-id=0d3b246c-f0b5-4ec4-bbd4-c76eb39d1b9b --nic net-id=3de977f7-94d2-45ff-907d-977a715c7782 tenant1-observer-2 --poll'
Send 'ping -c 3 <ip of mgmt network for tenant1>'
Step 5: Collect live migration KPI for the VM booted from local_image
Step 6: Set up traffic for the pair instance1 and instance2
Step 7: Cold migrate the instance and confirm the cold migration (collect KPI for each)
Step 8: Set up traffic for the pair of instances (the one launched in step 3 and the observer instance launched in step 4)
Step 9/10: Ping instance1 from the NatBox on a new thread (collect KPI for rebuild)
Test Step 11: Perform a rebuild on instance1 again and ensure it is reachable afterwards (ping the tenant2 mgmt network)
Teardown
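A rough sketch of the repeated migration cycle from the steps above, using the standard nova CLI (the instance name, sleep intervals, and pacing are illustrative; credentials and --os-* options as in the commands above):

    #!/bin/bash
    # Loop live migration, cold migration, and confirm for one instance until the hang appears.
    vm=tenant2-local_image-1          # illustrative instance name from step 3

    while true; do
        nova live-migration "$vm"     # live migrate; scheduler picks the target host
        sleep 120                     # allow the migration to settle
        nova migrate "$vm"            # cold migrate
        sleep 120
        nova resize-confirm "$vm"     # confirm the cold migration
        sleep 3600                    # pace the loop; the hang took hours to appear
    done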