Crashdump Analysis:
------------------

Part 1) Where

The BUG_ON() in fs.h:2583 originates from i_readcount_dec(). This
function decrements the struct inode.i_readcount field, but first
checks that it has not already reached zero before decrementing
(which would indicate a bug in the i_readcount balance):

    2580 #ifdef CONFIG_IMA
    2581 static inline void i_readcount_dec(struct inode *inode)
    2582 {
    2583         BUG_ON(!atomic_read(&inode->i_readcount));
    2584         atomic_dec(&inode->i_readcount);
    2585 }

So, that is what happened: i_readcount_dec() found i_readcount to be
zero, which is not expected, and triggered the BUG_ON() call. It is
indeed called from __fput():

    187 static void __fput(struct file *file)
    ...
    217         if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
    218                 i_readcount_dec(inode);

From the crashdump, we can confirm that i_readcount (the value in the
EAX register) is indeed zero, and that execution jumps to ud2 (the
undefined/invalid opcode used by BUG()) at offset 0x223 = 547:

    crash> disass -x __fput
    ...
    0xffffffff81218ad2 <+418>:  mov    0x154(%r13),%eax
    0xffffffff81218ad9 <+425>:  test   %eax,%eax
    0xffffffff81218adb <+427>:  je     0xffffffff81218b53 <__fput+547>
    ...
    0xffffffff81218b53 <+547>:  ud2

    crash> bt
    ...
    [exception RIP: __fput+547]
    ...
    RAX: 0000000000000000  RBX: ffff882a32191c00  RCX: 000000001b1a76dc
    ...
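For reference, the increment that is supposed to pair with this
decrement is i_readcount_inc(), defined next to it in
include/linux/fs.h; it is called from the open path under the same
read-only f_mode test (from do_dentry_open() in fs/open.c on kernels
of this era; the call site below is paraphrased from memory, so check
the exact tree if needed). This pairing is what Part 3 finds broken:

    /* include/linux/fs.h, same CONFIG_IMA block as above */
    static inline void i_readcount_inc(struct inode *inode)
    {
            atomic_inc(&inode->i_readcount);
    }

    /* fs/open.c, do_dentry_open(): the open-side counterpart of
     * the __fput() check shown above */
    if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
            i_readcount_inc(inode);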
Part 2) What

Looking at which 'struct file' triggered this problem: the disassembly
reads 'struct inode.i_readcount' at R13 + 0x154, and the i_readcount
offset within struct inode is exactly 0x154, so R13 holds the inode
pointer.

    crash> struct -x -o inode.i_readcount
    struct inode {
      [0x154] atomic_t i_readcount;
    }

    crash> bt
    ...
    R13: ffff883f2ad40e90  R14: ffff887f6368d0a0  R15: ffff883f2ad0d200
    ...

So, inode = ffff883f2ad40e90

Checking the assembly for 'struct file', it is kept in RBX (above.)

So, file = ffff882a32191c00

And the inode pointer in that file does match the value we have, good:

    crash> struct -x file.f_inode ffff882a32191c00
      f_inode = 0xffff883f2ad40e90

Now, walking up the file's dentry chain, we get the path:

    crash> struct -x file.f_path.dentry ffff882a32191c00
      f_path.dentry = 0xffff883f2ad0d200

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f2ad0d200
      d_name.name = 0xffff883f2ad0d238 "protocols"
      d_parent = 0xffff883f42a33080

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f42a33080
      d_name.name = 0xffff883f42a330b8 "etc"
      d_parent = 0xffff883f5d747b00

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f5d747b00
      d_name.name = 0xffff883f4e892b50 "7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558"
      d_parent = 0xffff883f619cc480

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f619cc480
      d_name.name = 0xffff883f619cc4b8 "diff"
      d_parent = 0xffff883f619ccc00

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f619ccc00
      d_name.name = 0xffff883f619ccc38 "aufs"
      d_parent = 0xffff883f61930780

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f61930780
      d_name.name = 0xffff883f619307b8 "docker"
      d_parent = 0xffff887f5e8b9bc0

    crash> struct -x dentry.d_name.name,d_parent 0xffff887f5e8b9bc0
      d_name.name = 0xffff887f5e8b9bf8 "/"
      d_parent = 0xffff887f5e8b9bc0

And that file/path is on top of this mount point:

    crash> struct -x file.f_path.mnt ffff882a32191c00
      f_path.mnt = 0xffff887f6368d0a0

    crash> struct -x vfsmount.mnt_sb 0xffff887f6368d0a0
      mnt_sb = 0xffff887f63a7c000

    crash> mount | grep ffff887f63a7c000
    ffff887f6368d080 ffff887f63a7c000 xfs    /dev/nvme0n1 /opt/k8s

So the full path to the file is:

    /opt/k8s/docker/aufs/diff/7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558/etc/protocols

And indeed, iptables-save/restore use the /etc/protocols file to look
up the protocol/port numbers used by the iptables rules (code from
iptables package version 1.6.0-2ubuntu3 on Ubuntu):

    @ iptables/iptables-save.c
    iptables_save_main()
      -> do_output()
        -> print_rule4()
          -> print_proto()
            -> getprotobynumber()
              -> /etc/protocols

    @ iptables/iptables-restore.c
    iptables_restore_main()
      -> do_command4() / case 'p'
        -> xtables_parse_protocol()
          -> getprotobyname()
            -> /etc/protocols

Important: interestingly, the problem happened with the real file on
disk, rather than with the virtual file in the aufs mount point that
the kube-proxy container (and iptables-save/restore) runs from.

Related: this user had another crashdump on an Ubuntu Yakkety 4.8
kernel, with the same file (/opt/k8s/docker/aufs/diff/.../etc/protocols).

Part 3) Why

Long story short: after exploring the possible paths that update
i_readcount, and failing to find any theory or actual problem that
could have caused it to become unbalanced _negatively_ (additional
decrements, or missing increments), this not so obvious "opposite"
finally turned out to be the problem: i_readcount becoming unbalanced
_positively_ (e.g., due to additional increments), until it overflows
the 32-bit integer limit (of the atomic type) back to zero.

And, sure enough, aufs has an (unpaired) i_readcount_inc(). Also, it
affects the inode of the real file on disk, not of the virtual file in
aufs, which matches our crashdump.

...

This seemed unlikely, but it does explain the long time it took to
reproduce the problem (25-35 days) for both users. It also explains
several systems doing similar things and then crashing around similar
times.

(On this particular user/example, the kube-proxy calls to
iptables-save/restore happen in response to kubernetes services
changes and on a periodic basis too; since the systems are big,
probably running many services, and changing services/network rules
often, the rate of calls may have become high enough to trigger it
over the weeks.)
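To make the wrap concrete, here is a minimal userspace sketch (my
illustration; not kernel or aufs code) of a 32-bit atomic counter
wrapping back to zero, which is exactly the state the BUG_ON() in
i_readcount_dec() then observes:

    #include <stdio.h>
    #include <stdatomic.h>

    int main(void)
    {
            /* Same width as the kernel's atomic_t: 32 bits. */
            atomic_int i_readcount = 0;

            /* Stand-in for the first 2^32 - 2 unpaired increments. */
            atomic_store(&i_readcount, (int)0xfffffffeU);

            atomic_fetch_add(&i_readcount, 1);   /* 0xffffffff */
            atomic_fetch_add(&i_readcount, 1);   /* wraps to 0 */

            /* A reader now closes the file: __fput() calls
             * i_readcount_dec(), which reads zero and hits BUG_ON(). */
            printf("i_readcount = %d\n", atomic_load(&i_readcount));
            return 0;
    }

And a back-of-the-envelope check on the timescale (my numbers,
assuming the wrap is driven mostly by opens of this one file):

    2^32 increments = 4,294,967,296
    30 days         = 30 * 86400 s = 2,592,000 s
    average rate    = 2^32 / 2,592,000 ≈ 1,657 opens/second

That is a high but not impossible sustained rate for periodic
iptables-save/restore runs against large rule sets over several weeks.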
...

Part 4) Really?

In order to confirm this, we can look in the crashdump at other inodes
in the underlying/backing filesystem; indeed, several of them have
very unbalanced/large values in i_readcount.

Looking at the inode list of the superblock mentioned above, let's
list the top 20 values of i_readcount:

    crash> mount | grep ffff887f63a7c000
    ffff887f6368d080 ffff887f63a7c000 xfs    /dev/nvme0n1 /opt/k8s

    crash> struct -o super_block.s_inodes ffff887f63a7c000
    struct super_block {
      [ffff887f63a7c608] struct list_head s_inodes;
    }

    crash> list -S inode.i_readcount.counter -l inode.i_sb_list -H ffff887f63a7c608 | grep i_readcount.counter | sort -rn -k3,3 | head -n20
      i_readcount.counter = 755438589
      i_readcount.counter = 14799006
      i_readcount.counter = 11247257
      i_readcount.counter = 11247257
      i_readcount.counter = 11247242
      i_readcount.counter = 11247242
      i_readcount.counter = 11247242
      i_readcount.counter = 11247242
      i_readcount.counter = 11247242
      i_readcount.counter = 11247242
      i_readcount.counter = 6511562
      i_readcount.counter = 5327637
      i_readcount.counter = 3551757
      i_readcount.counter = 1812946
      i_readcount.counter = 1775876
      i_readcount.counter = 1775876
      i_readcount.counter = 1775876
      i_readcount.counter = 1775875
      i_readcount.counter = 1775871
      i_readcount.counter = 1408817
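For readers less familiar with the crash "list" command: the traversal
above is equivalent to the following kernel-side walk of the
superblock's inode list (a sketch with 4.4-era field names):

    /* Every inode on the superblock's s_inodes list, linked via
     * inode.i_sb_list; print its i_readcount value. */
    struct inode *inode;

    spin_lock(&sb->s_inode_list_lock);
    list_for_each_entry(inode, &sb->s_inodes, i_sb_list)
            pr_info("i_readcount.counter = %d\n",
                    atomic_read(&inode->i_readcount));
    spin_unlock(&sb->s_inode_list_lock);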
Part 5) Really from aufs?

It is interesting to examine other aufs files used by the kube-proxy
container when running iptables-save/restore (as they are opened as
well, and thus should also show the symptom.)

Obviously, /etc/protocols is accessed more often: it is the file whose
i_readcount increased so much that it overflowed back to zero and
triggered the crash (it is opened on a "per-iptables rule with port
number" basis.) But for the same reason, its counter is not 'large'
anymore now. So, let's look at other files used by
iptables-save/restore (and other programs), say the standard C
library, libc.so, which is thus expected to have a large i_readcount.

Part 5.1) crash scripting

The hash in the file path is an aufs branch/layer identifier, which is
referred to by the aufs superblock info structure:

    /opt/k8s/docker/aufs/diff/7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558/etc/protocols

Notes:
- struct super_block.s_fs_info (pointer to struct au_sbinfo)
- struct au_sbinfo.si_branch (array of pointers to struct au_branch,
  with si_bend+1 elements; si_bend is the index of the last branch)
- struct au_branch.br_path.dentry (pointer to the dentry of that
  branch/layer on disk)
- struct dentry.d_name.name (string with the dentry's basename on disk)

(A sketch of these structure relationships follows the pipeline below.)

Now, some crash scripting to navigate through the aufs
structures/specific fields, keeping the pointers in the output as
references, so that we can later backtrack through the structures that
lead us to a particular value.

We want to find the aufs superblock with that branch hash.

Idea:

    crash> < list-of-commands | tee list-of-commands-output | shell glue to generate more commands > list-of-commands.next

Therefore:

0) Load debuginfo for the aufs kernel module:

    crash> mod -s aufs /usr/lib/debug/lib/modules/4.4.0-133-generic/kernel/fs/aufs/aufs.ko

1) For each aufs superblock, output commands to print its
   super_block.s_fs_info (i.e., the au_sbinfo pointer):

    crash> mount | awk '/aufs/ { print "struct super_block.s_fs_info", $2 }' > crash.struct-sb-s_fs_info.script

2) For each au_sbinfo, output commands to print its si_bend (number of
   branches) and si_branch array pointer:

    crash> < crash.struct-sb-s_fs_info.script | tee crash.struct-sb-s_fs_info.output | awk '/s_fs_info =/ { print "struct au_sbinfo.si_bend,si_branch", $3 }' > crash.struct-au_sbinfo_bend_branch.script

3) For each si_branch array pointer, output commands to read its
   si_bend elements (i.e., si_branch pointers):

    crash> < crash.struct-au_sbinfo_bend_branch.script | tee crash.struct-au_sbinfo_bend_branch.output | awk '/si_bend =/ { bend=$3 } /si_branch =/ { br=$3; print "rd -64", br, bend }' > crash.rd-64-si_branch-si_bend.script

4) For each si_branch, output commands to print its dentry (i.e., the
   directory on disk used by this branch):

    crash> < crash.rd-64-si_branch-si_bend.script | tee crash.rd-64-si_branch-si_bend.output | grep -v '^crash>' | cut -d: -f2- | grep -wo '[0-9a-f]\{16\}' | sed 's/^/struct au_branch.br_path.dentry /' > crash.au_br.br_path.dentry.script

5) For each dentry, output commands to print its name (i.e., the name
   of the directory):

    crash> < crash.au_br.br_path.dentry.script | tee crash.au_br.br_path.dentry.output | awk '/br_path.dentry =/ { print "struct dentry.d_name.name", $3 }' > crash.dentry-d_name-name.script

6) Run those commands, and save the names of the directories:

    crash> < crash.dentry-d_name-name.script > crash.dentry-d_name-name.output
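A rough sketch of the structure relationships this pipeline walks
(field subset only, reconstructed from the commands above; see the
aufs sources for the real definitions):

    typedef short aufs_bindex_t;          /* int16_t in aufs */

    struct au_branch {
            struct path br_path;          /* br_path.dentry: branch dir on disk */
            /* ... */
    };

    struct au_sbinfo {
            aufs_bindex_t si_bend;        /* index of the last branch */
            struct au_branch **si_branch; /* array of si_bend+1 pointers */
            /* ... */
    };

    /* Starting from an aufs super_block sb:
     *   sb->s_fs_info                    -> struct au_sbinfo
     *   sbinfo->si_branch[0..si_bend]    -> struct au_branch
     *   br->br_path.dentry->d_name.name  -> branch directory name on disk
     */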
Backtracking through the intermediary .output files (which contain the
command used to generate each output), from the branch directory we
are after up to its superblock:

The branch hash is
7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558,
from the path:

    /opt/k8s/docker/aufs/diff/7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558/etc/protocols

1) Find the dentry with that name:

    $ grep -B1 7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558 crash.dentry-d_name-name.output
    crash> struct dentry.d_name.name 0xffff883f5d747b00
      d_name.name = 0xffff883f4e892b50 "7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558"

2) Find the au_branch with that dentry:

    $ grep -B1 0xffff883f5d747b00 crash.au_br.br_path.dentry.output
    crash> struct au_branch.br_path.dentry ffff887f61fcc800
      br_path.dentry = 0xffff883f5d747b00

3) Find the au_branch array with that au_branch:

    $ grep -B2 ffff887f61fcc800 crash.rd-64-si_branch-si_bend.output
    crash> rd -64 0xffff883f4dd773c0 0x4
    ffff883f4dd773c0:  ffff887f61fce600 ffff887f61fcc600   ...a.......a....
    ffff883f4dd773d0:  ffff887f61fcc700 ffff887f61fcc800   ...a.......a....

4) Find the au_sbinfo with that au_branch array:

    $ grep -B2 0xffff883f4dd773c0 crash.struct-au_sbinfo_bend_branch.output
    crash> struct au_sbinfo.si_bend,si_branch 0xffff887f5ecdd000
      si_bend = 0x4
      si_branch = 0xffff883f4dd773c0

5) Find the super_block with that au_sbinfo:

    $ grep -B1 0xffff887f5ecdd000 crash.struct-sb-s_fs_info.output
    crash> struct super_block.s_fs_info ffff883f63ee8800
      s_fs_info = 0xffff887f5ecdd000

6) And that super_block / mount is:

    crash> mount | grep ffff883f63ee8800
    ffff887f5ecc7000 ffff883f63ee8800 aufs   none   /opt/bb/docker/aufs/mnt/7740aebe2a1a983b5b703067fc04455a33ce42942f6f930f426f62552a70b958

Part 5.2) Checking the libc.so dentry/inode and its backing inode's i_readcount value

Navigating through:

    aufs superblock
      -> root dentry -> subdirs
      -> lib dentry -> subdirs
      -> x86_64-linux-gnu dentry -> subdirs
      -> libc-*.so dentry
      -> backing dentry/inode

- root dentry/subdirs:

    crash> * super_block.s_root ffff883f63ee8800
      s_root = 0xffff883f61805140

    crash> * -o dentry.d_subdirs 0xffff883f61805140
    struct dentry {
      [ffff883f618051e0] struct list_head d_subdirs;
    }

- lib dentry/subdirs:

    crash> list -s dentry.d_name.name -l dentry.d_child -H ffff883f618051e0 | grep -B1 -w lib
    ffff887f4e423e90
      d_name.name = 0xffff887f4e423e38 "lib"

    crash> * -o dentry.d_subdirs -l dentry.d_child ffff887f4e423e90
    struct dentry {
      [ffff887f4e423ea0] struct list_head d_subdirs;
    }

- x86_64-linux-gnu dentry/subdirs:

    crash> list -s dentry.d_name.name -l dentry.d_child -H ffff887f4e423ea0 | grep -B1 x86_64-linux-gnu
    ffff887f4e420990
      d_name.name = 0xffff887f4e420938 "x86_64-linux-gnu"

    crash> * -o dentry.d_subdirs -l dentry.d_child ffff887f4e420990
    struct dentry {
      [ffff887f4e4209a0] struct list_head d_subdirs;
    }

- libc-*.so file:

    crash> list -s dentry.d_name.name -l dentry.d_child -H ffff887f4e4209a0 | grep -B1 'libc-.*so'
    ffff887f4e75c090
      d_name.name = 0xffff887f4e75c038 "libc-2.24.so"

Checking that its superblock matches the one above (yes), and that its
inode's file operations are aufs's (yes):

    crash> * dentry.d_sb -l dentry.d_child ffff887f4e75c090
      d_sb = 0xffff883f63ee8800

    crash> * dentry.d_inode -l dentry.d_child ffff887f4e75c090
      d_inode = 0xffff887f4e768358

    crash> * inode.i_fop 0xffff887f4e768358
      i_fop = 0xffffffffc02c1ce0

Now, navigating through the aufs structures, from the dentry in aufs
to the dentry/inode in the underlying filesystem/disk.

The dentry.d_fsdata field points to the aufs struct au_dinfo (dentry
info):

    crash> * -x dentry.d_fsdata -l dentry.d_child ffff887f4e75c090
      d_fsdata = 0xffff887f4e75e000

Which points to an array of aufs struct au_hdentry (host/hard? dentry),
via au_dinfo.di_hdentry (with di_bend+1 elements, each of size 0x10):

    crash> * -x au_dinfo.di_hdentry,di_bstart,di_bend 0xffff887f4e75e000
      di_hdentry = 0xffff887f50644000
      di_bstart = 0x4
      di_bend = 0x4

    crash> * -x au_hdentry
    struct au_hdentry {
        struct dentry *hd_dentry;
        aufs_bindex_t hd_id;
    }
    SIZE: 0x10

And looking at the elements up to index 0x4 (i.e., up to offset
0x4 * 0x10):

    crash> * -x au_hdentry.hd_dentry 0xffff887f50644000
      hd_dentry = 0x0
    crash> * -x au_hdentry.hd_dentry 0xffff887f50644010
      hd_dentry = 0x0
    crash> * -x au_hdentry.hd_dentry 0xffff887f50644020
      hd_dentry = 0x0
    crash> * -x au_hdentry.hd_dentry 0xffff887f50644030
      hd_dentry = 0x0
    crash> * -x au_hdentry.hd_dentry 0xffff887f50644040
      hd_dentry = 0xffff887f4e75c300

And this last one points to the dentry on the real filesystem:

    crash> struct dentry.d_name.name 0xffff887f4e75c300
      d_name.name = 0xffff887f4e75c338 "libc-2.24.so"
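In compact form, the chain just walked (a sketch using the aufs field
names from the session above; aufs_dentry stands for the libc dentry
in the aufs mount, i.e. ffff887f4e75c090):

    /* Resolving an aufs dentry to its backing (branch) dentry and
     * inode, using only the fields exercised above. */
    struct au_dinfo *dinfo = aufs_dentry->d_fsdata;         /* 0xffff887f4e75e000 */
    struct dentry *h_dentry =
            dinfo->di_hdentry[dinfo->di_bstart].hd_dentry;  /* 0xffff887f4e75c300 */
    struct inode *h_inode = h_dentry->d_inode;              /* backing inode */
    int count = atomic_read(&h_inode->i_readcount);         /* checked next */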
And finally, looking at the backing inode:

    crash> struct dentry.d_inode 0xffff887f4e75c300
      d_inode = 0xffff887f4e7701d0

    crash> struct inode.i_readcount.counter 0xffff887f4e7701d0
      i_readcount.counter = 11247257

We can see it is fairly unbalanced, towards a large number. And it is
among the _TOP 3_ i_readcount values observed up above! (Which is
expected, as libc is definitely opened often.)
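For scale (my arithmetic; one plausible reading): the wrapped
/etc/protocols counter implies at least 2^32 unpaired increments,
versus the ~11.2 million seen here for libc-2.24.so:

    2^32 / 11,247,257 ≈ 382

That is, on the order of a few hundred /etc/protocols opens per libc
open, which fits getprotoby*() opening /etc/protocols once per rule
lookup while libc is opened roughly once per program execution.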