Upgrades of precise localhost fail

Bug #1334273 reported by Curtis Hovey
This bug affects 1 person
Affects   | Status       | Importance | Assigned to    | Milestone
juju-core | Fix Released | High       | Andrew Wilkins |
1.20      | Fix Released | Critical   | Andrew Wilkins |

Bug Description

With the introduction of e01ac93e "Merge pull #153 from davecheney/112-state-life-takes-a-tag", upgrades of the local provider on precise fail. No other series or provider is affected. I am attaching the log, but I don't see any errors in it. Maybe one of the other logs has better information.
    http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1423/

Curtis Hovey (sinzui) wrote :
Curtis Hovey (sinzui)
description: updated
Curtis Hovey (sinzui) wrote :

From the machine-1 log:
2014-06-25 10:39:29 ERROR juju apiclient.go:119 state/api: websocket.Dial wss://10.0.3.1:17072/: dial tcp 10.0.3.1:17072: connection refused
2014-06-25 10:39:29 ERROR juju runner.go:220 worker: exited "api": websocket.Dial wss://10.0.3.1:17072/: dial tcp 10.0.3.1:17072: connection refused
2014-06-25 10:39:29 INFO juju runner.go:254 worker: restarting "api" in 3s

While the test is stalled, ps shows that mongod is not running. I can start it manually, but that doesn't help.

Curtis Hovey (sinzui) wrote :

I watched the processes on the host machine: the host agent and the database do restart after 7 minutes, but the two test machines cannot connect to the state server, and neither can the juju CLI.

Curtis Hovey (sinzui) wrote :

Mongodb-server does not come back after the call to upgrade. I am attaching the machine-0 log.

PS: When juju fails like this, destroy-environment takes 30 minutes, whereas the test set up and reached its deadlock in 15 minutes. Since CI will try five times, this test will slow all testing by 2.5 hours.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.19.4 → 1.19.5
Menno Finlay-Smits (menno.smits) wrote :

The all-machines.log seems to be truncated (on Jenkins too). The upgrade to 1.19.4 is missing: the log shows the beginning of the tools upload and then stops.

Is the full log file still available somewhere?

Regardless, I'm trying to reproduce the problem locally.

Menno Finlay-Smits (menno.smits) wrote :

It turns out the truncated all-machines.log is due to changes in the rsyslogd configuration between versions. If the upgrade had been successful, the log would most likely have continued. This is a red herring as far as this ticket is concerned.

Tim Penhey (thumper) wrote :

Curtis, the log in comment #4 seemed truncated. Toward the end of the log file the replica set was just being configured, and the log ended before a minute had passed (configuring it sometimes takes a while).

Andrew Wilkins (axwalk) wrote :

Menno noticed a possible race in the machine agent: the upgrade steps could complete before the state worker starts and upgrades mongo, in which case it would no longer think it is "pre-HA". This may be related to the issue.
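To illustrate the kind of race described above, here is a minimal Go sketch. Everything in it is hypothetical: `agentState`, `runUpgradeSteps` and `runStateWorker` are stand-ins rather than juju-core code, the "pre-HA" check is reduced to "upgrade steps not yet done", and the channel-based ordering is just one way to remove such a race, not necessarily the change that was committed for this bug.

```go
package main

import (
	"fmt"
	"sync"
)

// agentState is a hypothetical stand-in for the machine agent's state:
// whether the upgrade steps have completed and whether mongo was upgraded.
type agentState struct {
	mu               sync.Mutex
	upgradeStepsDone bool
	mongoUpgraded    bool
}

// runUpgradeSteps marks the upgrade steps as complete. If mongoReady is
// non-nil, it first waits until the state worker has made its "pre-HA"
// decision, which is the ordering fix in this sketch.
func runUpgradeSteps(s *agentState, mongoReady <-chan struct{}) {
	if mongoReady != nil {
		<-mongoReady
	}
	s.mu.Lock()
	s.upgradeStepsDone = true
	s.mu.Unlock()
}

// runStateWorker upgrades mongo only if it still sees the agent as
// "pre-HA" (upgrade steps not yet done), then closes mongoReady.
func runStateWorker(s *agentState, mongoReady chan<- struct{}) {
	s.mu.Lock()
	if !s.upgradeStepsDone {
		s.mongoUpgraded = true
	}
	s.mu.Unlock()
	if mongoReady != nil {
		close(mongoReady)
	}
}

func main() {
	// The bad interleaving, forced deterministically: upgrade steps finish
	// first, so the state worker skips the mongo upgrade.
	racy := &agentState{}
	runUpgradeSteps(racy, nil)
	runStateWorker(racy, nil)
	fmt.Println("without ordering, mongo upgraded:", racy.mongoUpgraded)

	// With ordering, the upgrade steps always wait for the state worker.
	ordered := &agentState{}
	ready := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); runUpgradeSteps(ordered, ready) }()
	go func() { defer wg.Done(); runStateWorker(ordered, ready) }()
	wg.Wait()
	fmt.Println("with ordering, mongo upgraded:", ordered.mongoUpgraded)
}
```

Without the channel, the outcome depends on goroutine scheduling; the first case simply forces the unlucky order so the effect is visible every time.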

Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Andrew Wilkins (axwalk)
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.20.0 → 1.21-alpha1
Andrew Wilkins (axwalk) wrote :

Menno tested my change, upgrading 1.18.4 to 1.19.5 on precise, and it appears to be working well now.

Changed in juju-core:
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released