intermittent panic: runtime error: invalid memory address

Bug #1336891 reported by Matt Bruzek on 2014-07-02
This bug affects 2 people
Affects    Status  Importance  Assigned to  Milestone
juju-core          High        Unassigned
1.20               High        Unassigned

Bug Description

I am doing some Juju testing on IBM Power systems and I found a juju-core panic on 3 of the 4 systems. The signal values are different, but the stack traces on machine-0 and machine-1 look the same. The stack trace on machine-2 is slightly different.

Here is the stack trace from the machine-0.log file.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0]

goroutine 14 [running]:

goroutine 1 [chan receive]:
launchpad.net_tomb.Wait.pN23_launchpad.net_tomb.Tomb
 /build/buildd/juju-core-1.18.4/src/launchpad.net/tomb/tomb.go:110
launchpad.net_juju_core_worker.Wait.pN37_launchpad.net_juju_core_worker.runner
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/worker/runner.go:124
main.Run.pN17_main.MachineAgent
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/jujud/machine.go:166
launchpad.net_juju_core_cmd.Run.pN40_launchpad.net_juju_core_cmd.SuperCommand
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/supercommand.go:303
launchpad.net_juju_core_cmd.Main
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/cmd.go:244
main.jujuDMain
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/jujud/main.go:107
main.Main
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/jujud/main.go:122
main.main
 /build/buildd/juju-core-1.18.4/src/launchpad.net/juju-core/cmd/jujud/main.go:139

The version of Juju I am using is: 1.18.4
$ juju --version
1.18.4-trusty-ppc64

The kernel version is 3.13.0-30
$ uname -a
Linux stilson-01 3.13.0-30-generic #54-Ubuntu SMP Mon Jun 9 22:46:02 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

Please let me know if you need any more information.

Matt Bruzek (mbruzek) wrote :
Curtis Hovey (sinzui) wrote :

I have seen several panics like this, but not this specific one. There was a systemic fix made in the 1.19.x series to use the newer compiler and update many calls to be golang 1.2/3 compliant. I think this issue is fixed in trunk. If this issue is still present in 1.20.0, it will be escalated to be fixed in the next milestone.

If you can demonstrate this is an issue with 1.19.4, then we can assume the bug is still present in the code.

Changed in juju-core:
status: New → Incomplete
tags: added: gccgo ppc64el
Matt Bruzek (mbruzek) wrote :

I updated Juju to version 1.19.4 and found a similar invalid memory address error:

2014-07-03 14:54:26 DEBUG juju.worker.rsyslog worker.go:164 Reloading rsyslog configuration
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x18]

goroutine 129 [running]:
github.com_juju_juju_rpc_jsoncodec.WriteMessage.pN40_github.com_juju_juju_rpc_jsoncodec.Codec
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/rpc/jsoncodec/codec.go:181
github.com_juju_juju_rpc.send.pN29_github.com_juju_juju_rpc.Conn
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/rpc/client.go:72
github.com_juju_juju_rpc.Go.pN29_github.com_juju_juju_rpc.Conn
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/rpc/client.go:174
github.com_juju_juju_rpc.Call.pN29_github.com_juju_juju_rpc.Conn
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/rpc/client.go:148
github.com_juju_juju_state_api.Call.pN36_github.com_juju_juju_state_api.State
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/state/api/apiclient.go:279
github.com_juju_juju_state_api.Ping.pN36_github.com_juju_juju_state_api.State
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/state/api/apiclient.go:269
github.com_juju_juju_state_api.heartbeatMonitor.pN36_github.com_juju_juju_state_api.State
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/state/api/apiclient.go:260
created by github.com_juju_juju_state_api.Open
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/state/api/apiclient.go:196

goroutine 1 [chan receive]:
launchpad.net_tomb.Wait.pN23_launchpad.net_tomb.Tomb
        /build/buildd/juju-core-1.19.4/src/launchpad.net/tomb/tomb.go:110
github.com_juju_juju_worker.Wait.pN34_github.com_juju_juju_worker.runner
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/worker/runner.go:122
main.Run.pN17_main.MachineAgent
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/cmd/jujud/machine.go:169
github.com_juju_cmd.Run.pN32_github.com_juju_cmd.SuperCommand
        /build/buildd/juju-core-1.19.4/src/github.com/juju/cmd/supercommand.go:321
github.com_juju_cmd.Main
        /build/buildd/juju-core-1.19.4/src/github.com/juju/cmd/cmd.go:247
main.jujuDMain
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/cmd/jujud/main.go:107
main.Main
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/cmd/jujud/main.go:122
main.main
        /build/buildd/juju-core-1.19.4/src/github.com/juju/juju/cmd/jujud/main.go:139

goroutine 3 [syscall]:
        goroutine in C code; stack unavailable

Changed in juju-core:
status: Incomplete → New
Curtis Hovey (sinzui) on 2014-07-03
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.21-alpha1
Matt Bruzek (mbruzek) wrote :

I upgraded to the latest version of juju 1.20.0 and I got yet another panic on the Power systems.

The original error reported in this bug is still present in the logs:
2014-07-03 19:54:07 DEBUG juju.worker.rsyslog worker.go:164 Reloading rsyslog configuration
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0]

$ juju --version
1.20.0-trusty-ppc64

$ cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 4116.000000MHz
revision : 2.0 (pvr 004b 0200)

processor : 1
cpu : POWER8E (raw), altivec supported
clock : 4116.000000MHz
revision : 2.0 (pvr 004b 0200)

timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)

Machine logs are attached. Please let me know if there is any other information you need.


Matt, can you post the output of `dmesg` from the affected system? I
have not seen the compiler fix land in trusty-updates yet, and the best
way to spot this is by looking in dmesg.

On Fri, Jul 4, 2014 at 8:43 AM, Matt Bruzek
<email address hidden> wrote:
> ** Attachment added: "The machine logs from version 1.20.0 of juju running on Power."
> https://bugs.launchpad.net/juju-core/+bug/1336891/+attachment/4145051/+files/machine_logs_1.20.tar.gz

Matt Bruzek (mbruzek) wrote :

Dave, I have attached the dmesg output to this bug. Please let me know if there is anything else you need.

Dave Cheney (dave-cheney) wrote :

Thanks Matt.

This is the compiler bug. AFAIK the fix has not landed in T yet.

On Mon, Jul 7, 2014 at 12:32 PM, Matt Bruzek
<email address hidden> wrote:
> ** Attachment added: "The output from dmesg on the stilson-01 system."
> https://bugs.launchpad.net/juju-core/+bug/1336891/+attachment/4146863/+files/dmesg_output.txt

Antonio Rosales (arosales) wrote :

@Dave,

Thanks for the feedback and investigation. Is there a proposed or patched package we can use in the interim to work around this compiler bug?

-thanks,
Antonio

Dave Cheney (dave-cheney) wrote :

This is tracked as issue https://bugs.launchpad.net/bugs/1304754, but at this time it is not marked Fix Released.

We work around this in CI with this PPA:

ubuntu@winton-09:~$ cat /etc/apt/sources.list.d/ubuntu-toolchain-r-ppa-trusty.list
deb http://ppa.launchpad.net/ubuntu-toolchain-r/ppa/ubuntu trusty main
# deb-src http://ppa.launchpad.net/ubuntu-toolchain-r/ppa/ubuntu trusty main

However, in the case of released builds of Juju, this workaround is not available, as the LP builders do not use this PPA and build with the outdated compiler version.

Ian Booth (wallyworld) wrote :

Marking as Won't Fix because this is not an issue that can be solved in the Juju code; it needs to be solved by using the updated compiler.

Changed in juju-core:
status: Triaged → Won't Fix
Curtis Hovey (sinzui) on 2014-07-09
Changed in juju-core:
milestone: 1.21-alpha1 → none