Juju testsuite fails in random ways

Bug #1393825 reported by Martin Packman
This bug affects 1 person
Affects              Status     Importance  Assigned to  Milestone
gccgo-go (Ubuntu)    Invalid    Medium      Unassigned
  Trusty             New        Undecided   Unassigned
  Utopic             Won't Fix  Undecided   Unassigned

Bug Description

Two parts of the juju-core test suite panic or segfault when run with -compiler gccgo on amd64.

Some examples, in provider/openstack:

ubuntu@go:~/go/src/github.com/juju/juju$ (cd provider/openstack/&& go test ./... -compiler gccgo -gocheck.v -v)
3
=== RUN Test
...
PASS: local_test.go:1207: com_juju_juju_provider_openstack_test.TestFetchFromToolsMetadataSources.pN66_github.com_juju_juju_provider_openstack_test.localHTTPSServerSuite 0.243s
munmap of stack space failed: errno 22
Aborted
...

ubuntu@go:~/go/src/github.com/juju/juju$ (cd provider/openstack/&& go test ./... -compiler gccgo -gocheck.v -v)
3
=== RUN Test
...
PASS: local_test.go:1207: com_juju_juju_provider_openstack_test.TestFetchFromToolsMetadataSources.pN66_github.com_juju_juju_provider_openstack_test.localHTTPSServerSuite 0.295s
signal: segmentation fault (core dumped)
FAIL github.com/juju/juju/provider/openstack 12.746s

And in environs/simplestreams:

ubuntu@go:~/go/src/github.com/juju/juju$ (cd environs/simplestreams&& go test ./... -compiler gccgo -gocheck.v -v)
=== RUN Test
...
PASS: datasource_test.go:79: com_juju_juju_environs_simplestreams_test.TestNonVerifyingClientSucceeds.pN69_github.com_juju_juju_environs_simplestreams_test.datasourceHTTPSSuite 0.014s
unexpected fault address 0x7f3300000011
fatal error: fault
[signal 0xb code=0x1 addr=0x7f3300000011]
...

This one looks a bit like bug 1287879, but the same run can also pass:

$ (cd environs/simplestreams&& go test ./... -compiler gccgo -gocheck.v -v)
=== RUN Test
...
OK: 46 passed
--- PASS: Test (0.27 seconds)
PASS

Some investigations from IRC:
<http://irclogs.ubuntu.com/2014/10/31/%23juju-dev.html#t17:29>

Michael Hudson-Doyle (mwhudson) wrote :

This is the gdb traceback seen while running the openstack provider tests: http://pastebin.ubuntu.com/9073749/

Funnily enough, I saw a traceback very much like this just yesterday. What I know at this point:

It's failing while parsing an HTTPS certificate. The failure occurs during allocation: Go captures profiling data on every Nth allocation, and part of this profiling data is a traceback. Capturing this traceback hits a SIGBUS. I don't know why it always fails in the HTTPS code. Maybe that code corrupts the stack? Or maybe it just does more allocations, and so is more likely to hit the profiling code.

Martin Packman (gz)
description: updated
Michael Hudson-Doyle (mwhudson) wrote :

The crash is happening during stack splitting. The immediate cause of the crash is that the stack pointer is totally bogus, presumably because the function that is supposed to return an address for the new stack segment returned something bogus. I don't really see how that is possible (but this is also getting pretty deep and scary).

Martin Packman (gz) wrote :

I can reproduce the HTTPS side of this (inconsistently) by running just one test:

ubuntu@go:~/go/src/github.com/juju/juju$ (cd provider/openstack/&& go test ./... -compiler gccgo -v -gocheck.v -gocheck.f=TestMustDisableSSLVerify)
3
=== RUN Test
unexpected fault address 0x7f6b00000011
fatal error: fault
[signal 0xb code=0x1 addr=0x7f6b00000011]
...

ubuntu@go:~/go/src/github.com/juju/juju$ (cd provider/openstack/&& go test ./... -compiler gccgo -v -gocheck.v -gocheck.f=TestMustDisableSSLVerify)
3
=== RUN Test
signal: segmentation fault (core dumped)
FAIL github.com/juju/juju/provider/openstack 1.324s

Michael Hudson-Doyle (mwhudson) wrote :

After a bit more poking I am still pretty confused, but (a) it seems to have to do with stack splitting, so it won't affect the platforms we actually care about gccgo on, and (b) it doesn't happen with gcc mainline, so it's probably a bug that has already been fixed upstream. So I'm not sure it's worth spending a heap of time on this.

Martin Packman (gz) wrote :

Another part of this is from a panic in a later test:

ubuntu@go:~/go/src/github.com/juju/juju$ (cd provider/openstack/&& go test ./... -compiler gccgo -v -gocheck.v -gocheck.f=TestStartInstanceWithUnknownAZError)
3
=== RUN Test

----------------------------------------------------------------------
PANIC: local_test.go:1530: com_juju_juju_provider_openstack_test.TestStartInstanceWithUnknownAZError.pN61_github.com_juju_juju_provider_openstack_test.localServerSuite
...
[LOG] 0:01.158 INFO juju.provider.openstack started instance "2"
... Panic: runtime error: invalid memory address or nil pointer dereference (PC=0x7FC0CCED367F)

This is at least partly because of a coding error in that test: the err value is used without first checking that it is non-nil.

provider/openstack/local_test.go, lines 1563-1565:
        _, _, _, err = testing.StartInstance(env, "1")
        errString := strings.Replace(err.Error(), "\n", "", -1)
        c.Assert(errString, gc.Matches, ".*Some unknown error.*")

The test does not fail on ppc, however.
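A guarded version of that assertion checks err before calling err.Error(). In the gocheck test itself this would be one extra line, c.Assert(err, gc.NotNil), ahead of the strings.Replace call. The sketch below shows the same pattern using only the standard library so it runs on its own; startInstance and checkUnknownAZError are hypothetical stand-ins, and the error text is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// startInstance is a hypothetical stand-in for testing.StartInstance.
// It returns nil when the provider unexpectedly succeeds.
func startInstance(fail bool) error {
	if fail {
		return errors.New("cannot run instance\nSome unknown error")
	}
	return nil
}

// checkUnknownAZError mirrors the test's assertion but checks err for nil
// first, so an unexpected success is reported as a failure instead of
// panicking with a nil pointer dereference on err.Error().
func checkUnknownAZError(err error) error {
	if err == nil {
		return errors.New("expected an error from StartInstance, got nil")
	}
	// Flatten newlines, as the original test does, so ".*" can span them.
	errString := strings.Replace(err.Error(), "\n", "", -1)
	if !regexp.MustCompile(`^.*Some unknown error.*$`).MatchString(errString) {
		return fmt.Errorf("error %q does not match", errString)
	}
	return nil
}

func main() {
	fmt.Println(checkUnknownAZError(startInstance(true)))  // <nil>
	fmt.Println(checkUnknownAZError(startInstance(false))) // expected an error from StartInstance, got nil
}
```

With the guard in place, whatever makes StartInstance succeed under gccgo would surface as an ordinary test failure rather than a panic, which separates the test bug from the underlying problem.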

Michael Hudson-Doyle (mwhudson) wrote :

I filed a bug in the gcc tracker about this (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64001).

James Page (james-page)
Changed in gccgo-go (Ubuntu):
importance: Undecided → Medium
Dimiter Naydenov (dimitern) wrote :

Related bug 1417771. Let's release an updated version of gccgo-go for trusty at least, and for precise too if possible. In vivid, a more recent release (1.2.1-0ubuntu7) has the fix.

Changed in gccgo-go (Ubuntu):
status: New → Confirmed
Dimiter Naydenov (dimitern) wrote :

It turns out the vivid package does not yet have the needed patch from https://go-review.googlesource.com/#/c/1840/, so I guess we'll have to wait for an upstream release.

Tim Penhey (thumper) wrote :

Dave has done some testing, and we are pretty sure that this is due to lack of memory. Can you please confirm the amount of memory on the machine or VM running these tests, and the memory use of the test process?

We were not able to reproduce these problems on a VM with 8 GB of RAM.

Changed in gccgo-go (Ubuntu):
status: Confirmed → Incomplete
Curtis Hovey (sinzui) wrote :

The stilson machines that run the tests have 9.8 GB of RAM. They always have.

Changed in gccgo-go (Ubuntu):
status: Incomplete → Confirmed
Matthias Klose (doko) wrote :

gccgo-go has been removed in vivid, superseded by gccgo-5.

Changed in gccgo-go (Ubuntu):
status: Confirmed → Invalid
Rolf Leggewie (r0lf)
Changed in gccgo-go (Ubuntu Utopic):
status: New → Won't Fix