juju ssh results in a panic: runtime error

Bug #1347322 reported by Matt Bruzek on 2014-07-23
This bug affects 2 people
Affects             Status        Importance  Assigned to  Milestone
juju-core           Fix Released  Medium      Unassigned
juju-core (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

I am using Juju on Power 8 hardware and I get a panic when I use juju to ssh to a system. The reproduction steps are:

juju init
juju bootstrap -e local
juju switch local
juju deploy local:trusty/ubuntu
juju ssh ubuntu/0

(Use the session on the local machine; after some time the panic happens and the screen becomes unreadable.)

ubuntu@ubuntu-local-machine-1:~$ panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8]

goroutine 11 [running]:
code.google.com_p_go.net_websocket.Send.N40_code.google.com_p_go.net_websocket.Codec
        /build/buildd/juju-core-1.18.1/src/code.google.com/p/go.net/websocket/websocket.go:293
launchpad.net_juju_core_rpc_jsoncodec.Send.N48_launchpad.net_juju_core_rpc_jsoncodec.wsJSONConn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/jsoncodec/conn.go:21
launchpad.net_juju_core_rpc_jsoncodec.WriteMessage.pN43_launchpad.net_juju_core_rpc_jsoncodec.Codec
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/jsoncodec/codec.go:178
launchpad.net_juju_core_rpc.send.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/client.go:72
launchpad.net_juju_core_rpc.Go.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/client.go:174
launchpad.net_juju_core_rpc.Call.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/client.go:148
launchpad.net_juju_core_state_api.Call.pN39_launchpad.net_juju_core_state_api.State
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/state/api/apiclient.go:168
launchpad.net_juju_core_state_api.Ping.pN39_launchpad.net_juju_core_state_api.State
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/state/api/apiclient.go:158
launchpad.net_juju_core_state_api.heartbeatMonitor.pN39_launchpad.net_juju_core_state_api.State
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/state/api/apiclient.go:149
created by launchpad.net_juju_core_state_api.Open
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/state/api/apiclient.go:143

goroutine 1 [syscall]:
goroutine in C code; stack unavailable

goroutine 3 [syscall]:
goroutine in C code; stack unavailable

goroutine 10 [IO wait]:
code.google.com_p_go.net_websocket.ReadByte.N57_code.google.com_p_go.net_websocket.hybiFrameReaderFactory
        /build/buildd/juju-core-1.18.1/src/code.google.com/p/go.net/websocket/hybi.go:113
code.google.com_p_go.net_websocket.NewFrameReader.N57_code.google.com_p_go.net_websocket.hybiFrameReaderFactory
        /build/buildd/juju-core-1.18.1/src/code.google.com/p/go.net/websocket/hybi.go:126
code.google.com_p_go.net_websocket.Receive.N40_code.google.com_p_go.net_websocket.Codec
        /build/buildd/juju-core-1.18.1/src/code.google.com/p/go.net/websocket/websocket.go:314
launchpad.net_juju_core_rpc_jsoncodec.Receive.N48_launchpad.net_juju_core_rpc_jsoncodec.wsJSONConn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/jsoncodec/conn.go:25
launchpad.net_juju_core_rpc_jsoncodec.ReadHeader.pN43_launchpad.net_juju_core_rpc_jsoncodec.Codec
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/jsoncodec/codec.go:113
launchpad.net_juju_core_rpc.loop.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/server.go:344
launchpad.net_juju_core_rpc.input.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/server.go:317
created by launchpad.net_juju_core_rpc.Start.pN32_launchpad.net_juju_core_rpc.Conn
        /build/buildd/juju-core-1.18.1/src/launchpad.net/juju-core/rpc/server.go:200

Here are the specifics on the host system.

ubuntu@stilson-01:~$ uname -a
Linux stilson-01 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:50:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@stilson-01:~$ dpkg -l | grep juju
ii juju 1.18.1-0ubuntu1.1 all next generation service orchestration system
ii juju-core 1.18.1-0ubuntu1.1 ppc64el Juju is devops distilled - client
ii juju-deployer 0.3.6-0ubuntu2 all Deploy complex stacks of services using Juju
ii juju-jitsu 0.20-1 all external tools to enhance juju
ii juju-local 1.18.1-0ubuntu1.1 all dependency package for the Juju local provider
ii juju-mongodb 2.4.9-0ubuntu3 ppc64el MongoDB object/document-oriented database for Juju
ii juju-quickstart 1.4.1+bzr88+ppa25~ubuntu14.04.1 all Easy configuration of Juju environments
ii python-jujuclient 0.17.5-0ubuntu2 all Python API client for juju
ubuntu@stilson-01:~$ getconf PAGE_SIZE
65536

I got this error when testing a fix for another bug: https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754

Since the dmesg looks different we believe this is a new problem.

Curtis Hovey (sinzui) on 2014-07-23
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → next-stable
tags: added: ppc64el
Antonio Rosales (arosales) wrote :

Note: we have received an initial report of this failure from the field, and it is blocking Juju deployments on Power.

-thanks,
Antonio

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju-core (Ubuntu):
status: New → Confirmed
Nate Finch (natefinch) wrote :

Is there a more complete log output? This seems like just a snippet.

Matt Bruzek (mbruzek) wrote :
Steve Langasek (vorlon) on 2014-07-23
description: updated
Steve Langasek (vorlon) on 2014-07-23
description: updated
Steve Langasek (vorlon) wrote :

I have tried to reproduce this problem in utopic on rockne (a P7+ system), and it's not reproducible for me.

# grep P /proc/cpuinfo
cpu : POWER7+ (raw), altivec supported
cpu : POWER7+ (raw), altivec supported
machine : CHRP IBM pSeries (emulated by qemu)
# uname -a
Linux rockne-06 3.13.0-22-generic #44-Ubuntu SMP Wed Apr 2 20:06:28 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
# zgrep 64K_PAGES /proc/config.gz
CONFIG_PPC_64K_PAGES=y
# sudo apt-get install lxc
# lxc-create -n utopic-test -t ubuntu -- --release utopic
# sed -i -e'/lxc.aa_profile/d' /var/lib/lxc/utopic-test/config
# echo 'lxc.aa_profile = lxc-container-default-with-nesting' >> /var/lib/lxc/utopic-test/config
# echo 'lxc.mount.auto = cgroup' >> /var/lib/lxc/utopic-test/config
# lxc-start --name utopic-test
[...]
Ubuntu Utopic Unicorn (development branch) utopic-test console

utopic-test login: ubuntu
Password: ubuntu
[...]
ubuntu@utopic-test:~$ sudo apt-get install juju juju-local bzr
$ mkdir -p charms/trusty
$ bzr branch http://bazaar.launchpad.net/~charmers/charms/trusty/ubuntu/trunk/ charms/trusty/ubuntu
$ juju init
$ juju bootstrap -e local
$ juju switch local
$ juju deploy --repository=$(pwd)/charms local:trusty/ubuntu
$ juju ssh ubuntu/0
[...]
ubuntu@ubuntu-local-machine-1:~$ /var/lib/juju/tools/machine-1/jujud --version
1.18.4.1-trusty-ppc64
$

I've stayed logged in for over 10 minutes with no problems, have installed packages inside the container, etc.

So there are several variables here:

 - utopic has juju 1.18.4, not 1.18.1.
 - the system is a P7+, not a P8 as in the original report.
 - this is built with the utopic gccgo-4.9 package, not the backported one in trusty.

The last of these is unlikely to be the cause. Someone with access to a P8 machine should ideally try to reproduce with this same method there, to check whether this is a P8-specific issue.

Matt Bruzek (mbruzek) wrote :

With sinzui's help I have compiled the latest version of Juju, 1.20.2, with gccgo-4.9_4.9.1-1ubuntu3_ppc64el.deb.

I was NOT able to reproduce the “juju ssh” panic, even with several ssh sessions open simultaneously and output streaming for over 2 hours.

For clarity and review, here are the details of how we built the deb packages for Juju with the gccgo 4.9.1 compiler on the CI machine stilson-09 (a P8 machine with a 64k page size).

ubuntu@stilson-09:~⟫ uname -a
Linux stilson-09 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:29:18 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@stilson-09:~⟫ getconf PAGE_SIZE
65536
ubuntu@stilson-09:~⟫ more /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 4116.000000MHz
revision : 2.0 (pvr 004b 0200)

processor : 1
cpu : POWER8E (raw), altivec supported
clock : 4116.000000MHz
revision : 2.0 (pvr 004b 0200)

timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)

Downloaded the following packages from: https://launchpad.net/ubuntu/utopic/ppc64el/gccgo-4.9/4.9.1-1ubuntu3

binutils_2.24.51.20140709-1ubuntu1_ppc64el.deb
cpp-4.9_4.9.1-1ubuntu3_ppc64el.deb
gcc-4.9_4.9.1-1ubuntu3_ppc64el.deb
gcc-4.9-base_4.9.1-1ubuntu3_ppc64el.deb
gccgo-4.9_4.9.1-1ubuntu3_ppc64el.deb
libatomic1_4.9.1-1ubuntu3_ppc64el.deb
libgcc1_4.9.1-1ubuntu3_ppc64el.deb
libgcc-4.9-dev_4.9.1-1ubuntu3_ppc64el.deb
libgo5_4.9.1-1ubuntu3_ppc64el.deb
libgomp1_4.9.1-1ubuntu3_ppc64el.deb
libitm1_4.9.1-1ubuntu3_ppc64el.deb

Installed the packages on stilson-09:

ubuntu@stilson-09:~/gccgo⟫ sudo dpkg -i *.deb

ubuntu@stilson-09:~/gccgo⟫ dpkg -l | grep gcc
ii gcc 4:4.8.2-1ubuntu6 ppc64el GNU C compiler
ii gcc-4.8 4.8.3-3ubuntu0.2 ppc64el GNU C compiler
ii gcc-4.8-base:ppc64el 4.8.3-3ubuntu0.2 ppc64el GCC, the GNU Compiler Collection (base package)
ii gcc-4.9 4.9.1-1ubuntu3 ppc64el GNU C compiler
ii gcc-4.9-base:ppc64el 4.9.1-1ubuntu3 ppc64el GCC, the GNU Compiler Collection (base package)
ii gccgo 4:4.9-1ubuntu6 ppc64el Go compiler, based on the GCC backend
ii gccgo-4.9 4.9.1-1ubuntu3 ppc64el GNU Go compiler
ii gccgo-go 1.2.1-0ubuntu1 ppc64el Go tool for use with gccgo
ii libgcc-4.8-dev:ppc64el 4.8.3-3ubuntu0.2 ppc64el GCC support library (development files)
ii libgcc-4.9-dev:ppc64el 4.9.1-1ubuntu3 ppc64el GCC support library (development files)
ii libgcc1:ppc64el 1:4.9.1-1ubuntu3 ppc64el GCC support library

Built Juju from source on stilson-09:

<sinzui> ~/Work/juju-release-tools/make-source-packages.bash stable juju-core_1.20.2.tar.gz 'Curtis Hovey <curtis.hovey@cano...


Download full text (15.2 KiB)

By the way, this has nothing to do with ssh; it depends on how long
the juju CLI process runs. Normally that is a very short period, but
juju ssh forks ssh as a child and then waits for it to finish. The
crash is caused by a bug in the previous runtime (compiled statically
into the binary): once the garbage collector had been running for long
enough, it called madvise(MADV_DONTNEED, ...) with the wrong page size.
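
A rough model of why the wrong page size matters, offered as a sketch rather than the runtime's actual code. It assumes the runtime tracked free memory in 4096-byte units while the kernel used 65536-byte pages, and it relies on the documented Linux behaviour that madvise requires a page-aligned address and rounds the length up to whole pages. All function names here are hypothetical.

```go
package main

import "fmt"

// alignDown rounds addr down to a multiple of pageSize (a power of two).
func alignDown(addr, pageSize uintptr) uintptr { return addr &^ (pageSize - 1) }

// alignUp rounds addr up to a multiple of pageSize (a power of two).
func alignUp(addr, pageSize uintptr) uintptr {
	return (addr + pageSize - 1) &^ (pageSize - 1)
}

// kernelRange models the range the kernel acts on for madvise(addr, length):
// addr must already be page-aligned, and length is rounded up to whole pages.
func kernelRange(addr, length, pageSize uintptr) (start, end uintptr, ok bool) {
	if addr != alignDown(addr, pageSize) {
		return 0, 0, false // the real call would fail with EINVAL
	}
	return addr, addr + alignUp(length, pageSize), true
}

// safeRange shrinks a free span [start, end) to the whole pages it fully
// contains, so that releasing it cannot touch neighbouring live memory.
func safeRange(start, end, pageSize uintptr) (lo, hi uintptr, ok bool) {
	lo = alignUp(start, pageSize)
	hi = alignDown(end, pageSize)
	return lo, hi, lo < hi
}

func main() {
	const assumed, actual = 4096, 65536
	// Hypothetical free span: one 4 KiB unit starting on a 64 KiB boundary.
	start, end := uintptr(0x10000), uintptr(0x10000+assumed)

	ks, ke, _ := kernelRange(start, end-start, actual)
	fmt.Printf("runtime frees [%#x, %#x), kernel discards [%#x, %#x)\n", start, end, ks, ke)

	_, _, ok := safeRange(start, end, actual)
	fmt.Println("safe to release with 64 KiB pages:", ok)
}
```

In this model the kernel discards the full 64 KiB page, including 60 KiB the runtime may still consider live, which would be consistent with a nil pointer dereference that only appears after the process has run long enough for memory to be released.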


Curtis Hovey (sinzui) on 2014-08-18
tags: added: panic
Curtis Hovey (sinzui) on 2014-10-28
Changed in juju-core:
importance: High → Medium
milestone: next-stable → none
Curtis Hovey (sinzui) wrote :

This was fixed with updates and backports to the gccgo tool-chain.

Changed in juju-core (Ubuntu):
status: Confirmed → Fix Released
Changed in juju-core:
status: Triaged → Fix Released