Juju Panic'ing on MAAS Power8le Environment

Bug #1375268 reported by Antonio Rosales
This bug affects 1 person
Affects      Status        Importance  Assigned to  Milestone
juju-core    Fix Released  Medium      Unassigned

Bug Description

This is a Power8LE MAAS setup in which Juju deploys workloads onto Power KVMs commissioned in MAAS. The Juju version is 1.20.8.

Andres's latest update on debugging this issue:

At this point I have successfully been able to test the Curtin / Fast Path installer. Two things were done:

1. I installed the latest version of curtin (the same one as released in Ubuntu Utopic). This version includes the required fixes for the Curtin / Fast Path installer. I've switched both VMs to use it by default.

2. Tim Robinson from IBM switched the VMs to use the virtio driver. This greatly improved network performance, hugely reducing installation times.

Unfortunately, I'm still seeing issues with juju:

AliveInterval 30" -i /home/iicroot/.juju/ssh/juju_id_rsa -i /home/iicroot/.ssh/id_rsa ubuntu@172.26.48.102 /bin/bash
panic: runtime error: invalid memory address or nil pointer dereference
        panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x30]

goroutine 1 [running]:

goroutine 3 [syscall]:
        goroutine in C code; stack unavailable

goroutine 15 [runnable]:
main.$nested2
        /build/buildd/juju-core-1.20.8/src/github.com/juju/juju/cmd/juju/bootstrap.go:225
created by main.Run.pN21_main.BootstrapCommand
        /build/buildd/juju-core-1.20.8/src/github.com/juju/juju/cmd/juju/bootstrap.go:224

goroutine 0 [idle]:

------

Andres is collecting the output of a bootstrap with --debug for additional information.

-thanks,
Antonio

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
tags: added: bootstrap maas-provider ppc64el
Curtis Hovey (sinzui) wrote :

Power8LE? Can we get clarification that this is indeed the arch, and that it is the arch being bootstrapped? "le" is not supported by Ubuntu and Juju:
     http://ports.ubuntu.com/pool/universe/j/juju-core/
^ Ubuntu has not built a ppc64le deb. There is no client, nor is there an agent to make a tool for.

I checked the stable PPA that built 1.20.8, and it doesn't support the "le" arch:
    https://launchpad.net/~juju/+archive/ubuntu/stable/+packages?field.name_filter=juju-core&field.status_filter=published&field.series_filter=trusty

I have seen "le" debs built in the past, but not for about 6 months.

Dave Cheney (dave-cheney) wrote : Re: [Bug 1375268] Re: Juju Panic'ing on MAAS Power8le Environment

All,

When reporting this class of panic involving juju and ppc64, can you
please _always_ include the output of dmesg, as that is the best (and
only) tool we have to detect when a version of juju is built with the
wrong compiler.

On Mon, Sep 29, 2014 at 11:58 PM, Curtis Hovey <email address hidden> wrote:
> ** Changed in: juju-core
> Status: New => Triaged
>
> ** Changed in: juju-core
> Importance: Undecided => Critical
>
> ** Tags added: bootstrap maas-provider ppc64el

Matt Bruzek (mbruzek) wrote :

More details about the system:

S822L02-vm5 172.26.48.25 (An IBM Power 8 VM)
$ uname -a
Linux S822L02-vm5 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:50:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
$ juju version
1.20.8-trusty-ppc64el

I have attached the dmesg output from this system.
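For anyone triaging similar reports: the `juju version` string combines the version number, Ubuntu series and architecture into one token. The minimal Go sketch below splits such a string so the architecture can be compared across reports, for example this 1.20.8-trusty-ppc64el client against the 1.20.1-trusty-ppc64 client reported later in the thread. The helper name `parseBinaryVersion` is hypothetical. Treating the last two dash-separated fields as series and arch is an assumption, not juju's own parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBinaryVersion (hypothetical helper) splits a string such as
// "1.20.8-trusty-ppc64el" into version number, series and architecture.
// Assumption: the last two dash-separated fields are always series and
// arch, so a number that itself contains a dash stays intact.
func parseBinaryVersion(s string) (number, series, arch string, err error) {
	parts := strings.Split(s, "-")
	if len(parts) < 3 {
		return "", "", "", fmt.Errorf("unexpected juju version %q", s)
	}
	n := len(parts)
	return strings.Join(parts[:n-2], "-"), parts[n-2], parts[n-1], nil
}

func main() {
	// The two client version strings reported in this bug.
	for _, v := range []string{"1.20.8-trusty-ppc64el", "1.20.1-trusty-ppc64"} {
		num, series, arch, err := parseBinaryVersion(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("number=%s series=%s arch=%s\n", num, series, arch)
	}
}
```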

Antonio Rosales (arosales) wrote :

Confirmed with Steve Langasek of the Ubuntu Foundations team on arch stream naming:

ppc64el is the dpkg architecture; powerpc64le is the GNU architecture; ppc64le is the kernel architecture

In summary, Ubuntu 14.04 and later, and Juju releases after 1.18, support the Power8 Little Endian platform. You'll see the .deb built as "ppc64el" and the kernel report "ppc64le."

I can confirm we are installing Ubuntu 14.04 on Power8 Little Endian machines and building Juju binaries (ppc64el).
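A small lookup helps keep the three names above straight. This Go sketch is illustrative only: the `archNames` type and its helper functions are hypothetical. The only facts it encodes are the three spellings listed above and the Debian multiarch convention of using the GNU name in library paths, as in the /usr/lib/powerpc64le-linux-gnu/libgo.so.5 path cited later in this thread.

```go
package main

import "fmt"

// archNames (hypothetical type) groups the three spellings of one
// architecture confirmed in this thread for Power8 little-endian.
type archNames struct {
	Dpkg   string // Debian/Ubuntu package architecture (the .deb suffix)
	GNU    string // GNU architecture, as used in multiarch library paths
	Kernel string // kernel architecture, as printed by `uname -m`
}

var power8LE = archNames{Dpkg: "ppc64el", GNU: "powerpc64le", Kernel: "ppc64le"}

// isPower8LE reports whether name is any of the three spellings.
func isPower8LE(name string) bool {
	switch name {
	case power8LE.Dpkg, power8LE.GNU, power8LE.Kernel:
		return true
	}
	return false
}

// multiarchLibDir builds the multiarch library directory from the GNU name.
func multiarchLibDir(a archNames) string {
	return "/usr/lib/" + a.GNU + "-linux-gnu"
}

func main() {
	for _, n := range []string{"ppc64el", "ppc64le", "powerpc64le", "ppc64"} {
		fmt.Printf("%-12s %v\n", n, isPower8LE(n))
	}
	fmt.Println(multiarchLibDir(power8LE))
}
```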

-thanks,
Antonio

Andres Rodriguez (andreserl) wrote :
Andres Rodriguez (andreserl) wrote :
Antonio Rosales (arosales) wrote :

From the build log, it does appear that juju-core is being built with gccgo 4.9:
http://pastebin.ubuntu.com/8460480/

Full build log at:
https://launchpadlibrarian.net/185288911/buildlog_ubuntu-trusty-ppc64el.juju-core_1.20.8-0ubuntu1~14.04.1~juju1_UPLOADING.txt.gz

But confirmation from a juju-core developer would be good.

-thanks,
Antonio

Dave Cheney (dave-cheney) wrote :

Hi Antonio,

Thanks for confirming the version of gccgo used. That said, this is what I see in dmesg:

[ 1188.391932] juju[6827]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
[ 3156.144698] juju[5386]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
[ 3738.370112] juju[13160]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
[ 4091.006811] init: maas-region-celery main process (1784) killed by KILL signal
[ 4097.050375] init: maas-cluster-celery main process (683) killed by KILL signal
[ 4124.588307] init: maas-region-celery main process (24312) killed by KILL signal
[ 4129.616762] init: maas-cluster-celery main process (24449) killed by KILL signal
[ 4129.928675] init: maas-dhcp-server main process (1023) killed by TERM signal
[ 4165.568452] init: maas-dhcp-server main process (25262) killed by TERM signal
[ 7849.376198] juju[20109]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000

This is the textbook sign that the compiler (specifically, the runtime compiled into the juju binary) is the old version, which is not compatible with 64k-page kernels.

A binary built with the broken compiler matches the symptoms you are seeing: bin/juju crashes after running for a long time (more than 5 minutes).

I cannot explain why the version of juju you are using is built with a
broken compiler.
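To check a long dmesg log for this signature, a rough Go sketch like the one below could be used. It assumes the kernel message text matches the lines quoted above. It also prints the page size via `os.Getpagesize`, since the incompatibility described here concerns 64k-page kernels. The file name and the `countBadFrames` helper are hypothetical. A nonzero count is a hint to check the compiler and libgo, not proof.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// badFrameMarker is the kernel message quoted in this bug as the sign of a
// juju binary carrying the old gccgo runtime.
const badFrameMarker = "bad frame in setup_rt_frame"

// countBadFrames (hypothetical helper) counts dmesg lines logged for
// process proc (for example "juju[6827]: ...") that contain badFrameMarker.
func countBadFrames(dmesg, proc string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(dmesg))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, proc+"[") && strings.Contains(line, badFrameMarker) {
			n++
		}
	}
	return n
}

func main() {
	// Usage (hypothetical file name): dmesg | go run badframes.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading dmesg:", err)
		os.Exit(1)
	}
	// Page size of the machine running this check; a 64k-page kernel reports 65536.
	fmt.Printf("page size: %d bytes\n", os.Getpagesize())
	fmt.Printf("juju bad-frame events: %d\n", countBadFrames(string(data), "juju"))
}
```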

On Tue, Sep 30, 2014 at 4:27 AM, Antonio Rosales <email address hidden> wrote:

> From the build log it does appear that juju-core is being built using 4.9
> gcc-go:
> http://pastebin.ubuntu.com/8460480/
>
> Full build log at:
>
> https://launchpadlibrarian.net/185288911/buildlog_ubuntu-trusty-ppc64el.juju-core_1.20.8-0ubuntu1~14.04.1~juju1_UPLOADING.txt.gz
>
> But, having a juju-core dev check off would be good confirmation.
>
> -thanks,
> Antonio

Dave Cheney (dave-cheney) wrote :

Antonio,

I'm very concerned that we can no longer even rely on the compiler
version number, or dpkg package name to confirm that the fixed
compiler was being used.

There is one other related possibility. We build the juju agents (the
tools) statically against libgo (the Go runtime, which contains the
garbage collector), but the command-line client, /usr/bin/juju, is
compiled against a shared copy of libgo. If an older version of
/usr/lib/powerpc64le-linux-gnu/libgo.so.5 exists on the client machine
executing juju bootstrap, then it is vulnerable to this bug.

It's a bit of a long shot, but can you please send the output of

% apt-cache policy libgo5

on the affected client machine?
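When collecting this from several machines, the Installed and Candidate fields can be pulled out of the apt-cache output programmatically. Below is a minimal Go sketch. It assumes English field names (C locale) and the output layout shown in the replies that follow. `policyField` is a hypothetical helper. An installed version that matches the candidate does not by itself prove the runtime is a fixed one.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// policyField (hypothetical helper) returns the value of a top-level field
// such as "Installed" or "Candidate" from `apt-cache policy <pkg>` output,
// or "" if the field is absent. Assumes English (C locale) field names.
func policyField(policy, field string) string {
	prefix := field + ":"
	sc := bufio.NewScanner(strings.NewReader(policy))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, prefix) {
			return strings.TrimSpace(strings.TrimPrefix(line, prefix))
		}
	}
	return ""
}

func main() {
	// Usage (hypothetical file name): apt-cache policy libgo5 | go run libgo5check.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading apt-cache output:", err)
		os.Exit(1)
	}
	installed := policyField(string(data), "Installed")
	candidate := policyField(string(data), "Candidate")
	switch {
	case installed == "" || installed == "(none)":
		fmt.Println("libgo5: not installed")
	case installed == candidate:
		fmt.Printf("libgo5: %s installed (same as candidate)\n", installed)
	default:
		fmt.Printf("libgo5: %s installed, candidate %s\n", installed, candidate)
	}
}
```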


Antonio Rosales (arosales) wrote :

Dave,

Thanks for the reply. At this time we cannot install a client in the target environment. However, I have the information below from a ppc64le system:

ubuntu@stilson-01:~$ uname -a
Linux stilson-01 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:50:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@stilson-01:~$ juju --version
1.20.1-trusty-ppc64
ubuntu@stilson-01:~$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ubuntu@stilson-01:~$ dpkg -s juju-core
Package: juju-core
Status: install ok installed
Priority: extra
Section: devel
Installed-Size: 93252
Maintainer: Curtis Hovey <email address hidden>
Architecture: ppc64el
Version: 1.20.1-0ubuntu1~14.04.1~juju1
Depends: libc6 (>= 2.17), libgcc1 (>= 1:4.1.1), libgo5
Conflicts: juju (<< 0.7-0ubuntu1~)
Conffiles:
 /etc/bash_completion.d/juju-core 30d5ce2d83c36132059552ec0d4ce209
Description: Juju is devops distilled - client
 Through the use of charms, juju provides you with shareable, re-usable,
 and repeatable expressions of devops best practices. You can use them
 unmodified, or easily change and connect them to fit your needs. Deploying
 a charm is similar to installing a package on Ubuntu: ask for it and
 it’s there, remove it and it’s completely gone.
 .
 This package provides the client application of creating and interacting
 with Juju environments.
Homepage: http://launchpad.net/juju-core
ubuntu@stilson-01:~$ apt-cache policy libgo5
libgo5:
  Installed: 4.9.1-1ubuntu3
  Candidate: 4.9.1-1ubuntu3
  Version table:
 *** 4.9.1-1ubuntu3 0
        100 /var/lib/dpkg/status
     4.9.1-0ubuntu1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty-updates/main ppc64el Packages
     4.9-20140406-0ubuntu1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty/main ppc64el Packages
ubuntu@stilson-01:~$

Antonio Rosales (arosales) wrote :

Here is the output from a ppc64el client using the local provider. In this case, it is a Zend workload (charm) running in LXC on a Power8LE Ubuntu Power KVM:

$ apt-cache policy libgo5
libgo5:
  Installed: (none)
  Candidate: 4.9.1-0ubuntu1
  Version table:
     4.9.1-0ubuntu1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty-updates/main ppc64el Packages
     4.9-20140406-0ubuntu1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty/main ppc64el Packages

Antonio Rosales (arosales) wrote :

As a test to narrow down the compiler issue and reproduce it in house, I updated a different Power8LE system that was not experiencing the panic.

The hypothesis was that the juju stable PPA built 1.20.8 with the compiler from ubuntu-toolchain-r instead of the archive compiler. The ubuntu-toolchain-r compiler should have been close to what is in Utopic, but it is a variable, since it was a test compiler.

One thought was that this compiler did not have the fix (see comments 7 and 8). If so, we should be able to reproduce the panic fairly easily, as the original segfault occurred minutes into using Juju and happened with simple commands such as juju ssh. I was not able to reproduce the panic or segfault using 1.20.8 on a Power8LE system. Note that I am using the local provider, not MAAS.

More info @ http://pastebin.ubuntu.com/8465722/

Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
tags: added: charmers
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
Antonio Rosales (arosales) wrote :

This is a critical issue for Power stakeholders that we need to ensure is addressed. If users are not running the latest updates, they will get a segfault that renders Juju completely unusable. One fix may be to require this dependency in the Juju package. Existing customers will need to run apt-get update.

-thanks,
Antonio

Curtis Hovey (sinzui) wrote :

Which dependency?

I can see that 1.20.11 in trusty updates has
    Depends: distro-info, libc6 (>= 2.17), libgcc1 (>= 1:4.1.1), libgo5
which is what I understand was required to force stale trusty to install the correct packages.

I understand that this hack addresses this bug. We don't like the hack, but we cannot remove the odd dependency until the compiler is reliable.

Anastasia (anastasia-macmood) wrote :

We believe this has been resolved in Juju 2, since it is built with Go 1.6.

Please re-test. If you experience any further failures, please file a bug against the "juju" project, where we track Juju 2.

Changed in juju-core:
status: Triaged → Fix Released