Prevent race condition when printing Inode in ll_sync_inode

Bug #2078906 reported by Chengen Du
Affects          Status                       Importance   Assigned to   Milestone
ceph (Ubuntu)    Status tracked in Oracular
  Focal          Won't Fix                    Undecided    Chengen Du
  Jammy          Incomplete                   Medium       Chengen Du
  Noble          Incomplete                   Medium       Chengen Du
  Oracular       Incomplete                   Medium       Chengen Du

Bug Description

[Impact]
In the ll_sync_inode function, the entire Inode structure is printed without holding the client lock, which can lead to the following crash backtrace:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140705682900544) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140705682900544) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140705682900544, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffa92094476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffa9207a7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffa910783c3 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at ./src/common/assert.cc:75
#6 0x00007ffa91078525 in ceph::__ceph_assert_fail (ctx=...) at ./src/common/assert.cc:80
#7 0x00007ffa7049f602 in xlist<ObjectCacher::Object*>::size (this=0x7ffa20734638, this=0x7ffa20734638) at ./src/include/xlist.h:87
#8 operator<< (os=..., out=warning: RTTI symbol not found for class 'StackStringStream<4096ul>'
...) at ./src/osdc/ObjectCacher.h:760
#9 operator<< (out=warning: RTTI symbol not found for class 'StackStringStream<4096ul>'
..., in=...) at ./src/client/Inode.cc:80
#10 0x00007ffa7045545f in Client::ll_sync_inode (this=0x55958b8a5c60, in=in@entry=0x7ffa20734270, syncdataonly=syncdataonly@entry=false) at ./src/client/Client.cc:14717
#11 0x00007ffa703d0f75 in ceph_ll_sync_inode (cmount=cmount@entry=0x55958b0bd0d0, in=in@entry=0x7ffa20734270, syncdataonly=syncdataonly@entry=0) at ./src/libcephfs.cc:1865
#12 0x00007ffa9050ddc5 in fsal_ceph_ll_setattr (creds=<optimized out>, mask=<optimized out>, stx=0x7ff8983f25a0, i=<optimized out>, cmount=<optimized out>)
    at ./src/FSAL/FSAL_CEPH/statx_compat.h:209
#13 ceph_fsal_setattr2 (obj_hdl=0x7fecc8fefbe0, bypass=<optimized out>, state=<optimized out>, attrib_set=0x7ff8983f2830) at ./src/FSAL/FSAL_CEPH/handle.c:2410
#14 0x00007ffa92371da0 in mdcache_setattr2 (obj_hdl=0x7fecc9e98778, bypass=<optimized out>, state=0x7fef0d64c9b0, attrs=0x7ff8983f2830)
    at ../FSAL/Stackable_FSALs/FSAL_MDCACHE/./src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1012
#15 0x00007ffa922b2bbc in fsal_setattr (obj=0x7fecc9e98778, bypass=<optimized out>, state=0x7fef0d64c9b0, attr=0x7ff8983f2830) at ./src/FSAL/fsal_helper.c:573
#16 0x00007ffa9234c7bd in nfs4_op_setattr (op=0x7fecad7ac510, data=0x7fecac314a10, resp=0x7fecad1be200) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_op_setattr.c:212
#17 0x00007ffa9232e413 in process_one_op (data=data@entry=0x7fecac314a10, status=status@entry=0x7ff8983f2a2c) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_Compound.c:920
#18 0x00007ffa9232f9e0 in nfs4_Compound (arg=<optimized out>, req=0x7fecad491620, res=0x7fecac054580) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_Compound.c:1327
#19 0x00007ffa922cb0ff in nfs_rpc_process_request (reqdata=0x7fecad491620) at ./src/MainNFSD/nfs_worker_thread.c:1508
#20 0x00007ffa92029be7 in svc_request (xprt=0x7fed640504d0, xdrs=<optimized out>) at ./src/svc_rqst.c:1202
#21 0x00007ffa9202df9a in svc_rqst_xprt_task_recv (wpe=<optimized out>) at ./src/svc_rqst.c:1183
#22 0x00007ffa9203344d in svc_rqst_epoll_loop (wpe=0x559594308e60) at ./src/svc_rqst.c:1564
#23 0x00007ffa920389e1 in work_pool_thread (arg=0x7feeb802ea10) at ./src/work_pool.c:184
#24 0x00007ffa920e6b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#25 0x00007ffa92178a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Further analysis of the call trace in GDB shows that both the _front and _back members of xlist<ObjectCacher::Object*> are zero, and yet the assertion still failed:
(gdb) frame 7
#7 0x00007ffa7049f602 in xlist<ObjectCacher::Object*>::size (this=0x7ffa20734638, this=0x7ffa20734638) at ./src/include/xlist.h:87
87 ./src/include/xlist.h: No such file or directory.
(gdb) p *this
$1 = {_front = 0x0, _back = 0x0, _size = 0}
(gdb) frame 6
#6 0x00007ffa91078525 in ceph::__ceph_assert_fail (ctx=...) at ./src/common/assert.cc:80
80 ./src/common/assert.cc: No such file or directory.
(gdb) p ctx
$2 = (const ceph::assert_data &) @0x7ffa70587900: {assertion = 0x7ffa70530598 "(bool)_front == (bool)_size", file = 0x7ffa705305b4 "./src/include/xlist.h", line = 87,
  function = 0x7ffa7053b410 "size_t xlist<T>::size() const [with T = ObjectCacher::Object*; size_t = long unsigned int]"}

A race condition allowed the assertion (bool)_front == (bool)_size to be evaluated against an inconsistent intermediate state of the list, triggering the failure.

[Fix]
Printing the entire Inode structure is not necessary here; printing only the inode number is sufficient and avoids touching unlocked state.
An upstream commit fixes this issue:
commit 2b78a5b3147d4e97be332ca88d286aec0ce44dc3
Author: Chengen Du <email address hidden>
Date: Mon Aug 12 18:17:37 2024 +0800

    client: Prevent race condition when printing Inode in ll_sync_inode

    In the ll_sync_inode function, the entire Inode structure is printed without
    holding a lock. This can lead to a race condition when evaluating the assertion
    in xlist<ObjectCacher::Object*>::size(), resulting in abnormal behavior.

    Fixes: https://tracker.ceph.com/issues/67491

    Co-authored-by: dongdong tao <email address hidden>
    Signed-off-by: Chengen Du <email address hidden>

[Test Plan]
The race condition might be challenging to reproduce, but we can test to ensure that the normal call path functions correctly.
1. Create a Manila share and mount it locally
openstack share type create nfs_share_type False --description "NFS share type"
openstack share create --share-type nfs_share_type --name my_share NFS 1
openstack share list
openstack share access create my_share ip XX.XX.XX.XX/XX
openstack share show my_share
sudo mount -t nfs <export_location_path> <mountpoint>
2. Create a file and change its permissions, ensuring that all functions work correctly without any errors
touch test
chmod 755 test
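Step 2 can be scripted as below. The temporary directory is a self-contained stand-in for the mounted share; on a real deployment, point MNT at the NFS mountpoint from step 1 instead.

```shell
set -e
# Stand-in for the mounted Manila share (replace with the real mountpoint).
MNT=$(mktemp -d)

# Creating a file and changing its permissions exercises the setattr
# path, which reaches ll_sync_inode via NFS-Ganesha on a real mount.
touch "$MNT/test"
chmod 755 "$MNT/test"
stat -c '%a' "$MNT/test"   # expect 755
```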

[Where problems could occur]
The patch only changes a log statement in order to avoid the race condition.
If the patch were faulty, however, it could disrupt Ceph's ll_sync_inode functionality, which is exercised when setting attributes on a Manila share via the NFS protocol.

Tags: patch
Revision history for this message
Chengen Du (chengendu) wrote :

The problematic code is already wrapped by the lock in Focal, so no fix is needed.

Changed in ceph (Ubuntu Focal):
status: New → Won't Fix
assignee: nobody → Chengen Du (chengendu)
Revision history for this message
Chengen Du (chengendu) wrote :

Debdiff for Jammy

Revision history for this message
Chengen Du (chengendu) wrote :

Debdiff for Noble

Revision history for this message
Chengen Du (chengendu) wrote :

Debdiff for Oracular

Changed in ceph (Ubuntu Jammy):
assignee: nobody → Chengen Du (chengendu)
Changed in ceph (Ubuntu Noble):
assignee: nobody → Chengen Du (chengendu)
Changed in ceph (Ubuntu Oracular):
assignee: nobody → Chengen Du (chengendu)
Changed in ceph (Ubuntu Jammy):
status: New → In Progress
Changed in ceph (Ubuntu Noble):
status: New → In Progress
Changed in ceph (Ubuntu Oracular):
status: New → In Progress
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp2078906-ceph-jammy.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote (last edit ):

Hi Chengen,

Thanks for the bug report and congrats on the upstream fix!

In Ubuntu, Ceph is owned by Openstack and Ceph Engineering team.

The usual policy for Ubuntu stable releases for Ceph (AFAI know/recall) is to prefer upstream point releases (so as to at least save on testing time with all bug-fixes/changes together), but I'm aware there are exceptions, especially more recently when upstream point releases and their adoption into Ubuntu doesn't happen as often.

So, please, let's first get a confirmation from James Page on which approch they would prefer:

A) Upstream backports (i.e., the master tracker has backports trackers open for Quincy/Reef/Squid; and Quincy would meet the needs/bug-tasks down to Jammy), then wait for and SRU the next upstream point releases;

B) Independently of upstream backports/point releases, SRU this individual patch.

I'll ask James to review this for input.

Thanks,
Mauricio

Changed in ceph (Ubuntu Oracular):
status: In Progress → Incomplete
Changed in ceph (Ubuntu Noble):
status: In Progress → Incomplete
Changed in ceph (Ubuntu Jammy):
status: In Progress → Incomplete
Changed in ceph (Ubuntu Oracular):
importance: Undecided → Medium
Changed in ceph (Ubuntu Noble):
importance: Undecided → Medium
Changed in ceph (Ubuntu Jammy):
importance: Undecided → Medium
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Chengen,

> [Test Plan]
> The race condition might be challenging to reproduce, [...]

You may want to look at/try synchronizing threads with GDB.
Please see bug 1994002 comment 21 for an example with QEMU.

Hope this helps,
Mauricio
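
Following that suggestion, a GDB session to widen the race window might look like this sketch. The breakpoint locations are taken from the backtrace in the bug description and are illustrative only; this recipe has not been verified against this bug.

```
$ gdb -p $(pidof ganesha.nfsd)
(gdb) set non-stop on
(gdb) break Inode.cc:80      # reader: operator<< streaming the Inode
(gdb) break xlist.h:87       # xlist<T>::size() assertion site
(gdb) continue -a
```

With non-stop mode, one thread can be held at the Inode printer while the others keep running, making it easier for a writer to mutate the object cacher state underneath it.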
