Note that the following is not a final statement, just some thoughts shared with whoever else is looking at this report (and for me to remember). While I found nothing that really looked odd in the xen-netfront code, I saw there was a change to the generic timer code:
commit 1dabbcec2c0a36fe43509d06499b9e512e70a028
timer: Use hlist for the timer wheel hash buckets
That change was part of 4.2, but if it were the cause I would expect problems not only on AWS instances. Then again, it might just be that bare-metal servers with similarly high traffic tend to be upgraded much less often. Anyway, part of the change above seems to swap the special meaning of the list pointer values. I am not sure I grasp the implications yet. Before, with doubly linked lists, the pointer to the next element seemed to serve as the pending indicator and the pointer to the previous element was invalidated with the LIST_POISON2 value. Now it is the other way round. This refers to the detach_timer function, which is called from __run_timers via detach_expired_timer.
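To make the swapped pointer roles easier to follow, here is a minimal userspace sketch of how I read the post-change behaviour. It is not the kernel source: the `_model` names are hypothetical, the struct layout is assumed to match the kernel's hlist node (next at offset 0, pprev at offset 8), and the poison constant is simply the value seen in the movabs instruction of the disassembly in this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of an hlist node: next at offset 0, pprev at offset 8. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Poison value as it appears in the disassembly (assumes a 64-bit build). */
#define LIST_POISON2 ((struct hlist_node *)(uintptr_t)0xdead000000200200ULL)

/* Insert n at the front of the bucket h. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n. Only a NULL next is skipped; a poisoned next would be dereferenced. */
static void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    assert(pprev != NULL);      /* model-only guard, not present in the kernel */
    *pprev = next;
    if (next)
        next->pprev = pprev;
}

/* After the change, pprev (not next) acts as the pending indicator. */
static int timer_pending_model(const struct hlist_node *entry)
{
    return entry->pprev != NULL;
}

/* Hypothetical stand-in for detach_timer as described above. */
static void detach_timer_model(struct hlist_node *entry, int clear_pending)
{
    __hlist_del(entry);
    if (clear_pending)
        entry->pprev = NULL;    /* no longer pending */
    entry->next = LIST_POISON2; /* next is now the poisoned pointer */
}
```

In this model, a node that has been detached keeps a non-NULL but invalid next pointer, so any later unlink of the same node that gets past the `*pprev` store would try to write through LIST_POISON2.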
The crash happens at offset 0x116 in run_timer_softirq (that is 278 decimal). The disassembly of that function around that offset is:
0xffffffff810e5c1e <+254>: mov %r15,0x8(%rbx)
0xffffffff810e5c22 <+258>: nopl 0x0(%rax,%rax,1)
// Guess this is __hlist_del(struct hlist_node *n)
// rax = n->next
0xffffffff810e5c27 <+263>: mov (%r15),%rax
// rdx = n->pprev
0xffffffff810e5c2a <+266>: mov 0x8(%r15),%rdx
0xffffffff810e5c2e <+270>: test %rax,%rax
// *(n->pprev) = n->next
0xffffffff810e5c31 <+273>: mov %rax,(%rdx)
// if (n->next == NULL) jump
0xffffffff810e5c34 <+276>: je 0xffffffff810e5c3a <run_timer_softirq+282>
// (n->next)->pprev = n->pprev (but n->next is LIST_POISON2 / invalid ptr)
0xffffffff810e5c36 <+278>: mov %rdx,0x8(%rax)
0xffffffff810e5c3a <+282>: testb $0x10,0x2a(%r15)
// here we seem back at detach_timer inlined and clear_pending assumed true
// entry->next = LIST_POISON2 and entry->pprev = NULL
0xffffffff810e5c3f <+287>: movabs $0xdead000000200200,%rax
0xffffffff810e5c49 <+297>: movq $0x0,0x8(%r15)
0xffffffff810e5c51 <+305>: mov %rax,(%r15)