Root Cause:
----------
Note: this is completely independent of Kubernetes.
The aufs filesystem calls i_readcount_inc() when opening a file in read-only mode, and never pairs that call with an i_readcount_dec().
@ fs/aufs/vfsub.c
struct file *vfsub_dentry_open(struct path *path, int flags)
{
        struct file *file;

        file = dentry_open(path, flags /* | __FMODE_NONOTIFY */,
                           current_cred());
        if (!IS_ERR_OR_NULL(file)
            && (file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
                i_readcount_inc(d_inode(path->dentry));

        return file;
}
That is _incorrect_, as only the VFS layer should maintain the 'struct inode.i_readcount' value.
Neither i_readcount_inc() nor i_readcount_dec() should be called there. No code outside the VFS calls them in the Linux tree.
So, if the same file is opened in read-only mode enough times, its backing inode.i_readcount value overflows and wraps back to zero.
Once that happens, when the file is closed, __fput() calls i_readcount_dec(), which triggers the BUG_ON().
That causes a kernel panic/crash if panic_on_oops is set; otherwise, only kernel messages are logged.
panic_on_oops is not set by default, but 'enterprise'/larger users usually set it in order to save kernel crashdumps on such errors.
See the 'Problem Demonstration / Instrumentation' section to watch the counter overflow and hit the BUG_ON/panic.
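
The leak-then-wrap sequence described above can be sketched in plain userspace C. This is only an illustrative model, not kernel code: struct model_inode and the model_*() helpers are hypothetical names, and the counter is deliberately 8 bits wide so that the wrap happens after a few hundred cycles (the real i_readcount is much wider, so the real bug needs far more opens). It assumes that the VFS itself increments the counter once per read-only open and that __fput() decrements it once per close, so the extra aufs increment leaves the counter one higher after every open/close cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode. The 8-bit width is an
 * assumption made purely so the wrap-around shows up quickly. */
struct model_inode {
        uint8_t readcount;
};

/* Models one read-only open through aufs: the increment done by the
 * VFS plus the extra, unpaired increment in vfsub_dentry_open(). */
static void model_open_rdonly(struct model_inode *inode)
{
        inode->readcount++;     /* VFS side (assumed) */
        inode->readcount++;     /* aufs side, never undone */
}

/* Models the decrement done on close. Returns false at the point
 * where the kernel's BUG_ON() would fire: the counter is already
 * zero when the decrement is attempted. */
static bool model_close(struct model_inode *inode)
{
        if (inode->readcount == 0)
                return false;
        inode->readcount--;
        return true;
}

/* Repeats open + close on the same inode until a close hits the
 * zero check, and returns how many cycles that took. */
static unsigned int model_cycles_until_bug(struct model_inode *inode)
{
        unsigned int cycles = 0;

        for (;;) {
                model_open_rdonly(inode);
                cycles++;
                if (!model_close(inode))
                        return cycles;
        }
}
```

Starting from a zeroed model_inode, every cycle leaves readcount one higher than before, and the close that follows the wrap to zero is the one that would hit the BUG_ON(). With the 8-bit counter assumed here that is cycle 255; the same reasoning applies to a wider counter, only with a correspondingly larger number of opens.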