Comment 29 for bug 1442892

Cedric Jehasse (cedricj) wrote :

The trace I get is this:

SegvAnalysis:
 Segfault happened at: 0x7f022f4a25b9 <ftw_startup+105>: callq 0x7f022f438240 <__memset_sse2>
 PC (0x7f022f4a25b9) ok
 source "0x7f022f438240" (0x7f022f438240) ok
 destination "(%rsp)" (0x7fffbca06f40) not located in a known VMA region (needed writable region)!
 Stack memory exhausted (SP below stack segment)
SegvReason: writing unknown VMA
SourcePackage: aufs-tools
Stacktrace:
 #0 ftw_startup (dir=0x1081010 "/var/lib/docker/aufs/mnt/5f3306c6fad9d2fc750be5dd28b3d8d58db4eaa0d5947583d318829e2ea4326e", is_nftw=1, func=0x40149c, descriptors=1048566, flags=19) at ../sysdeps/wordsize-64/../../io/ftw.c:654
         data = {dirstreams = 0x7fffbca06f40, actdir = 0, maxdir = 1048566, dirbuf = 0xff0000 <error: Cannot access memory at address 0xff0000>, dirbufsize = 0, ftw = {base = 0, level = 0}, flags = 0, cvt_arr = 0x0, func = 0x0, dev = 0, known_objects = 0x0}
         st = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 1523, st_blksize = 4, st_blocks = 17345968, st_atim = {tv_sec = 1531, tv_nsec = 17345968}, st_mtim = {tv_sec = 48, tv_nsec = 17305728}, st_ctim = {tv_sec = 139647365056152, tv_nsec = 1048576}, __glibc_reserved = {19, 1048566, 4199580}}
         result = 0
         cwdfd = -1
         cwd = 0x0
         cp = <optimized out>
 #1 0x0000000000401d52 in ?? ()
 No symbol table info available.
 #2 0x00000000004013ec in ?? ()
 No symbol table info available.
 #3 0x00007f022f3c9830 in __libc_start_main (main=0x401266, argc=3, argv=0x7fffbd207228, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffbd207218) at ../csu/libc-start.c:291
         result = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -7392484191596224330, 4198768, 140736366408224, 0, 0, 7392628052837490870, 7452542804783751350}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x402820, 0x7f022f783ab0 <_dl_fini>}, data = {prev = 0x0, cleanup = 0x0, canceltype = 4204576}}}
         not_first_call = <optimized out>
 #4 0x0000000000401199 in ?? ()
 No symbol table info available.

I looked at the ftw_startup implementation in glibc, and I think this line could be exhausting the stack:
 data.dirstreams = (struct dir_data **) alloca (data.maxdir
       * sizeof (struct dir_data *));

The stack trace shows data.maxdir is 1048566.
data.maxdir comes from the nopenfd parameter of nftw.
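
To make the size of that alloca concrete, here is a minimal sketch of the arithmetic. The helper names dirstreams_bytes and fits_in_stack, and the headroom parameter, are hypothetical and not part of glibc; void * stands in for struct dir_data *, which has the same size on Linux.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Bytes requested by alloca(maxdir * sizeof(struct dir_data *)).
 * void * is used as a stand-in for struct dir_data * (same size on Linux). */
static size_t dirstreams_bytes(size_t maxdir)
{
    return maxdir * sizeof(void *);
}

/* Hypothetical check: returns 1 if the request, plus `headroom` bytes for
 * the frames already on the stack, fits within the soft stack limit. */
static int fits_in_stack(size_t request, rlim_t stack_soft, size_t headroom)
{
    if (stack_soft == RLIM_INFINITY)
        return 1;
    if ((rlim_t)request > stack_soft)
        return 0;
    return stack_soft - (rlim_t)request >= (rlim_t)headroom;
}
```

With 8-byte pointers, dirstreams_bytes(1048566) is 8388528. Against the 8388608-byte soft stack limit in the limits output at the end of this comment, that leaves only 80 bytes, which would be consistent with the "Stack memory exhausted" line in the trace.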

plink.c from the aufs-tools package passes the current (soft) open-files limit, minus 10, as nopenfd:
 err = getrlimit(RLIMIT_NOFILE, &rlim);
 if (err)
  AuFin("getrlimit");
 nftw(cwd, func, rlim.rlim_cur - 10,
      FTW_PHYS | FTW_MOUNT | FTW_ACTIONRETVAL);
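
One possible mitigation, sketched here under assumptions, would be to cap the value before handing it to nftw. The helper clamp_nopenfd and the choice of cap are hypothetical and are not taken from aufs-tools.

```c
#include <sys/resource.h>

/* Hypothetical helper: derive nopenfd from the soft RLIMIT_NOFILE value,
 * keeping the "minus 10" reserve used in plink.c but never exceeding `cap`,
 * so the pointer array glibc allocates on the stack stays small. */
static int clamp_nopenfd(rlim_t soft, int cap)
{
    if (soft == RLIM_INFINITY)
        return cap;
    if (soft <= 10)
        return 1;               /* nftw needs at least one descriptor */
    rlim_t want = soft - 10;
    return want > (rlim_t)cap ? cap : (int)want;
}
```

The call would then look like nftw(cwd, func, clamp_nopenfd(rlim.rlim_cur, 1024), ...). nftw still walks trees deeper than nopenfd; it closes and reopens directories as needed, so a small cap costs speed rather than correctness.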

So rlim.rlim_cur must be 1048576 when auplink crashes: 1048576 - 10 = 1048566, which matches data.maxdir in the trace.

Is auplink started by dockerd?
On my system, dockerd has its max open files limit set to 1048576:
cat /proc/2603/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30228                30228                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
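
To check the same limit for other processes (for example an auplink child, which inherits its limits from whatever started it), a line of /proc/<pid>/limits can be parsed with a small helper. This is only a sketch: parse_limit_line is a hypothetical name, and it handles numeric values only.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/limits, e.g.
 * "Max open files            1048576              1048576              files".
 * Returns 0 and fills *soft and *hard on success. Returns -1 if the line
 * does not start with `name` or if either value is not numeric (such as
 * "unlimited"); in that case *soft and *hard should not be relied on. */
static int parse_limit_line(const char *line, const char *name,
                            unsigned long long *soft, unsigned long long *hard)
{
    size_t n = strlen(name);
    if (strncmp(line, name, n) != 0)
        return -1;
    if (sscanf(line + n, "%llu %llu", soft, hard) != 2)
        return -1;
    return 0;
}
```

Fed the "Max open files" line above, it yields 1048576 for both the soft and hard limit.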