enosys terminates the user session on Azure 4.13

Bug #1757967 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status        Importance  Assigned to     Milestone
linux-azure (Ubuntu)  Fix Released  High        Colin Ian King

Bug Description

Identical to bug 1755358: the user session is terminated when running the enosys test in the ubuntu_stress_smoke test.

This happens on an Azure Standard-H16mr instance. It does not seem to affect runs started remotely from the Jenkins server; it only occurs when I run the test manually.

Please find the syslog in the attachment.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-1012-azure 4.13.0-1012.15
ProcVersionSignature: Ubuntu 4.13.0-1012.15-azure 4.13.13
Uname: Linux 4.13.0-1012-azure x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Thu Mar 22 06:37:01 2018
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-azure
UpgradeStatus: No upgrade log present (probably fresh install)

Kleber Sacilotto de Souza (kleber-souza) wrote :

It seems that the stress-ng log messages from the enosys testcase are:

Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test stress-ng: info: [14237] dispatching hogs: 4 enosys
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test stress-ng: info: [14237] cache allocate: using built-in defaults as unable to determine cache details
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test kernel: [ 513.698028] signal_fault: 5 callbacks suppressed
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test kernel: [ 513.698030] stress-ng-enosy[14972] bad frame in x32 rt_sigreturn frame:00007ffc115313c0 ip:40128f5c28f5c28f sp:500cd8a6c9bbb800 orax:ffffffffffffffff
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test kernel: [ 513.699810] stress-ng-enosy[14993] bad frame in x32 rt_sigreturn frame:00007ffc115313c0 ip:40128f5c28f5c28f sp:500cd8a6c9bbb800 orax:ffffffffffffffff
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test kernel: [ 513.700422] stress-ng-enosy[15001] bad frame in x32 rt_sigreturn frame:00007ffc115313c0 ip:40128f5c28f5c28f sp:500cd8a6c9bbb800 orax:ffffffffffffffff
Mar 22 06:27:10 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test kernel: [ 513.702243] stress-ng-enosy[15022] bad frame in x32 rt_sigreturn frame:00007ffc115313c0 ip:40128f5c28f5c28f sp:500cd8a6c9bbb800 orax:ffffffffffffffff
Mar 22 06:27:13 x-la-azure-4-13-0-Standard-H16mr-stress-smoke-test stress-ng: info: [14237] successful run completed in 2.96s

Colin Ian King (colin-king) wrote :

The "bad frame in x32 rt_sigreturn" kernel messages are OK. They appear because the enosys stressor calls an x32 rt_sigreturn() in the wrong context, and the kernel flags this as a misbehaving application.
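
When triaging a syslog like the one quoted above, it can help to separate these expected messages from everything else. The following is a minimal sketch, assuming the line format shown in this report; the names BAD_FRAME, split_expected and offending_pids are hypothetical helpers, not part of stress-ng or any existing tool.

```python
import re

# Pattern modelled on the kernel lines quoted in this report (an assumption:
# other kernel versions may word the message differently).
BAD_FRAME = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] bad frame in x32 rt_sigreturn"
)


def split_expected(lines):
    """Return (expected, other): lines carrying the benign message, and the rest."""
    expected, other = [], []
    for line in lines:
        (expected if BAD_FRAME.search(line) else other).append(line)
    return expected, other


def offending_pids(lines):
    """Return the sorted, de-duplicated PIDs that triggered the message."""
    matches = (BAD_FRAME.search(line) for line in lines)
    return sorted({int(m.group("pid")) for m in matches if m})
```

Anything left in the second list returned by split_expected is what would still need a closer look.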

Colin Ian King (colin-king) wrote :

I thought the issues with the test exiting early were fixed by these commits:

http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=d0e42d6315403fd35f681275cc32ec1e466d96c0
http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=ed8729f09e13887de97a530ed293de9086bff67c

Is this still breaking? If so, and I can get access to a host, I can try to debug it further.

Po-Hsu Lin (cypressyew) wrote :

Hi Colin, yes, I can still reproduce this (enosys terminates my SSH session) on Standard_H16mr.

I will send you the info for this system.

Colin Ian King (colin-king) wrote :

This occurs because we are calling aliases of vhangup, which terminate the session. I have added masked detection of this system call so that its aliases are caught as well.

Fix committed: http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=94e40091651659cf08a1ba11965e56102d15d87a
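
As a rough illustration of what masked detection could look like, the sketch below strips an alias bit from a candidate syscall number before comparing it with a skip list. This is not the code from the commit. The constants are assumptions for x86_64 Linux (153 for vhangup, 0x40000000 for the x32 syscall bit), and the names SKIP_SYSCALLS, ALIAS_MASK and should_skip are hypothetical.

```python
# Assumed values for x86_64 Linux; not taken from the stress-ng commit.
NR_VHANGUP = 153               # vhangup syscall number on x86_64
X32_SYSCALL_BIT = 0x40000000   # bit that selects the x32 syscall ABI

# Syscalls the stressor must never issue, listed by their plain numbers.
SKIP_SYSCALLS = frozenset({NR_VHANGUP})

# Clearing the x32 bit maps an aliased number back onto its plain number.
ALIAS_MASK = ~X32_SYSCALL_BIT & 0xFFFFFFFF


def should_skip(number, mask=ALIAS_MASK):
    """Return True if `number`, with alias bits masked off, is in the skip list."""
    return (number & mask) in SKIP_SYSCALLS
```

With this check, both 153 and 153 | 0x40000000 are rejected, whereas a comparison against the unmasked number would let the second one through.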

Changed in linux-azure (Ubuntu):
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Po-Hsu Lin (cypressyew) wrote :

Verified on the very same node, and this fix works.
Thanks!

Changed in linux-azure (Ubuntu):
status: Fix Committed → Fix Released