Prevent race condition when printing Inode in ll_sync_inode
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ceph (Ubuntu) | Incomplete | Medium | Chengen Du |
Focal | Won't Fix | Undecided | Chengen Du |
Jammy | Incomplete | Medium | Chengen Du |
Noble | Incomplete | Medium | Chengen Du |
Oracular | Incomplete | Medium | Chengen Du |
Bug Description
[Impact]
In the ll_sync_inode function, the entire Inode structure is printed without holding a lock. This can trip an assertion and abort the process, producing the following backtrace from the core dump:
#0 __pthread_
#1 __pthread_
#2 __GI___pthread_kill (threadid=
#3 0x00007ffa92094476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/
#4 0x00007ffa9207a7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffa910783c3 in ceph::_
#6 0x00007ffa91078525 in ceph::_
#7 0x00007ffa7049f602 in xlist<ObjectCac
#8 operator<< (os=..., out=warning: RTTI symbol not found for class 'StackStringStr
...) at ./src/osdc/
#9 operator<< (out=warning: RTTI symbol not found for class 'StackStringStr
..., in=...) at ./src/client/
#10 0x00007ffa7045545f in Client:
#11 0x00007ffa703d0f75 in ceph_ll_sync_inode (cmount=
#12 0x00007ffa9050ddc5 in fsal_ceph_
at ./src/FSAL/
#13 ceph_fsal_setattr2 (obj_hdl=
#14 0x00007ffa92371da0 in mdcache_setattr2 (obj_hdl=
at ../FSAL/
#15 0x00007ffa922b2bbc in fsal_setattr (obj=0x7fecc9e9
#16 0x00007ffa9234c7bd in nfs4_op_setattr (op=0x7fecad7ac510, data=0x7fecac31
#17 0x00007ffa9232e413 in process_one_op (data=data@
#18 0x00007ffa9232f9e0 in nfs4_Compound (arg=<optimized out>, req=0x7fecad491620, res=0x7fecac054580) at ../Protocols/
#19 0x00007ffa922cb0ff in nfs_rpc_
#20 0x00007ffa92029be7 in svc_request (xprt=0x7fed640
#21 0x00007ffa9202df9a in svc_rqst_
#22 0x00007ffa9203344d in svc_rqst_epoll_loop (wpe=0x55959430
#23 0x00007ffa920389e1 in work_pool_thread (arg=0x7feeb802
#24 0x00007ffa920e6b43 in start_thread (arg=<optimized out>) at ./nptl/
#25 0x00007ffa92178a00 in clone3 () at ../sysdeps/
Upon further analysis of the call trace using GDB, both the _front and _back member variables in xlist<ObjectCac
(gdb) frame 7
#7 0x00007ffa7049f602 in xlist<ObjectCac
87 ./src/include/
(gdb) p *this
$1 = {_front = 0x0, _back = 0x0, _size = 0}
(gdb) frame 6
#6 0x00007ffa91078525 in ceph::_
80 ./src/common/
(gdb) p ctx
$2 = (const ceph::assert_data &) @0x7ffa70587900: {assertion = 0x7ffa70530598 "(bool)_front == (bool)_size", file = 0x7ffa705305b4 "./src/
function = 0x7ffa7053b410 "size_t xlist<T>::size() const [with T = ObjectCacher:
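The values seen in the core dump (_front = 0x0, _size = 0) would satisfy the assertion, so the list must have been modified by another thread while the assertion was being evaluated. In other words, a race condition caused the check to fail.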
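To illustrate how an unlocked reader can see the assertion fail, here is a minimal sketch that replays one possible interleaving on a single thread. `MiniList`, `push_step1`, and `push_step2` are hypothetical names used only for illustration; they are not Ceph's xlist implementation, and the order of the two writer steps is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for an intrusive list; only the two fields that the
// failed assertion compares are modelled here.
struct MiniList {
    void*       front = nullptr;
    std::size_t size  = 0;

    // Same shape as the assertion text: (bool)_front == (bool)_size
    bool invariant_holds() const {
        return static_cast<bool>(front) == static_cast<bool>(size);
    }
};

// A writer that holds the lock is free to update the fields one at a time.
// These two steps model a push onto an empty list (the order is an assumption).
inline void push_step1(MiniList& l, void* item) { l.front = item; }
inline void push_step2(MiniList& l)             { l.size  = 1;    }

// Replays the interleaving deterministically, so the example has no real
// data race. Returns what an unlocked reader would observe between the
// writer's two steps.
inline bool reader_sees_consistent_state_mid_update() {
    MiniList l;
    int item = 0;
    push_step1(l, &item);
    bool seen = l.invariant_holds();  // the unlocked reader runs here
    push_step2(l);                    // the writer finishes afterwards
    return seen;
}
```

In this sketch the list is consistent before and after the push, which would match a core dump that shows a valid state even though the check itself failed.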
[Fix]
It may not be necessary to print the entire Inode structure; simply printing the inode number should be sufficient.
There is an upstream commit that fixes this issue:
commit 2b78a5b3147d4e9
Author: Chengen Du <email address hidden>
Date: Mon Aug 12 18:17:37 2024 +0800
client: Prevent race condition when printing Inode in ll_sync_inode
In the ll_sync_inode function, the entire Inode structure is printed without
holding a lock. This can lead to a race condition when evaluating the assertion
in xlist<ObjectCac
Fixes: https:/
Co-authored-by: dongdong tao <email address hidden>
Signed-off-by: Chengen Du <email address hidden>
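The idea behind the fix can be sketched as follows. This is a simplified illustration, not the upstream patch: `MiniInode`, `MiniClient`, and their members are hypothetical, and the sketch assumes the inode number does not change while the inode is in use.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified stand-in for the client's Inode.
struct MiniInode {
    std::uint64_t ino = 0;             // assumed stable for the inode's lifetime
    std::size_t   cached_objects = 0;  // mutable state, guarded by MiniClient::lock
};

struct MiniClient {
    std::mutex lock;  // hypothetical stand-in for the client lock

    // Formats mutable state, so the caller must already hold `lock`.
    std::string describe_locked(const MiniInode& in) {
        std::ostringstream os;
        os << "inode " << in.ino << " cached_objects=" << in.cached_objects;
        return os.str();
    }

    // Touches only the inode number, so it can be built without the lock.
    std::string sync_inode_log_line(const MiniInode& in) {
        std::ostringstream os;
        os << "ll_sync_inode " << in.ino;
        return os.str();
    }

    // Logs first, then takes the lock for the actual work.
    std::string sync_inode(MiniInode& in) {
        std::string line = sync_inode_log_line(in);  // no lock held here
        std::lock_guard<std::mutex> guard(lock);
        // The real sync work would run here, under the lock.
        (void)in;
        return line;
    }
};
```

An alternative would be to take the lock before printing the full structure, which is how the report describes the Focal code; logging only the inode number avoids widening the locked region.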
[Test Plan]
The race condition might be challenging to reproduce, but we can test to ensure that the normal call path functions correctly.
1. Create a Manila share and mount it locally
openstack share type create nfs_share_type False --description "NFS share type"
openstack share create --share-type nfs_share_type --name my_share NFS 1
openstack share list
openstack share access create my_share ip XX.XX.XX.XX/XX
openstack share show my_share
sudo mount -t nfs <-export_
2. Create a file and change its permissions, ensuring that all functions work correctly without any errors
touch test
chmod 755 test
[Where problems could occur]
The patch only modifies the log to prevent a race condition.
However, if there are any issues with the patch, it could disrupt Ceph's ll_sync_inode functionality, which is utilized when setting attributes on a manila share via the NFS protocol.
In Focal, the problematic code already runs with the lock held, so no fix is needed there.