juju machine agents suiciding

Bug #1345014 reported by Kapil Thangavelu
Affects          Status        Importance  Assigned to      Milestone
juju-core        Fix Released  High        Andrew Wilkins
juju-core 1.20   Fix Released  High        Andrew Wilkins

Bug Description

I've got a system where several containers are created and come up, but they then suicide after contacting the state server (and remove their upstart jobs), with their status (and their units) forever stuck in pending.

i.e. a status excerpt:

      0/lxc/6:
        agent-state: pending
        instance-id: juju-machine-0-lxc-6
        series: trusty
        hardware: arch=amd64
      0/lxc/7:
        agent-state: pending
        instance-id: juju-machine-0-lxc-7
        series: trusty
        hardware: arch=amd64

Per cloud-init the machine comes up fine; per the agent log it suicides:

2014-07-19 13:33:27 INFO juju.cmd supercommand.go:37 running jujud [1.20.1-trusty-amd64 gc]
2014-07-19 13:33:27 INFO juju.cmd.jujud machine.go:156 machine agent machine-0-lxc-7 start (1.20.1-trusty-amd64 [gc])
2014-07-19 13:33:27 DEBUG juju.agent agent.go:377 read agent config, format "1.18"
2014-07-19 13:33:27 INFO juju.worker runner.go:260 start "api"
2014-07-19 13:33:27 INFO juju.worker runner.go:260 start "statestarter"
2014-07-19 13:33:27 INFO juju.worker runner.go:260 start "termination"
2014-07-19 13:33:27 INFO juju.state.api apiclient.go:242 dialing "wss://192.168.9.74:17070/"
2014-07-19 13:33:27 INFO juju.state.api apiclient.go:176 connection established to "wss://192.168.9.74:17070/"
2014-07-19 13:33:29 INFO juju.state.api apiclient.go:242 dialing "wss://192.168.9.74:17070/"
2014-07-19 13:33:29 INFO juju.state.api apiclient.go:176 connection established to "wss://192.168.9.74:17070/"
2014-07-19 13:33:31 ERROR juju.worker runner.go:207 fatal "api": agent should be terminated
2014-07-19 13:33:31 DEBUG juju.worker runner.go:241 killing "statestarter"
2014-07-19 13:33:31 DEBUG juju.worker runner.go:241 killing "termination"
2014-07-19 13:33:31 INFO juju.cmd supercommand.go:329 command finished

Re machine-0.log, around that same time there are only these lines:

2014-07-19 13:31:14 ERROR juju.state.unit unit.go:556 unit nova-cloud-controller/0 cannot get assigned machine: unit "nova-cloud-controller/0" is not assigned to a machine
2014-07-19 13:31:14 ERROR juju.state.unit unit.go:556 unit nova-cloud-controller/0 cannot get assigned machine: unit "nova-cloud-controller/0" is not assigned to a machine

The agent's upstart job is not on the system, and nothing is running. There are several containers exhibiting this.

summary: - juju agents suicide if state server under load
+ juju agents suiciding
summary: - juju agents suiciding
+ juju machine agents suiciding
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Also to note: both the machines and the units assigned to them are left in pending forever.

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.21-alpha1
tags: added: deploy lxc
tags: added: cloud-installer landscape
Revision history for this message
Andrew Wilkins (axwalk) wrote :

There is a race in the provisioner that may explain this. If the machine agent on the provisioned instance comes up before the provisioner has recorded its instance ID in state, then the machine agent may connect to state and find that it doesn't exist in state, and then kill itself. I can reproduce this easily by adding a sleep in the provisioner.
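A minimal sketch of that race, for illustration only (hypothetical in-memory state and goroutines standing in for the provisioner and the machine agent; none of this is juju-core code): if the agent queries state before the provisioner's record of its instance ID lands, the agent concludes it has been removed and terminates.

package main

import (
	"fmt"
	"sync"
	"time"
)

// state stands in for the state server's record of provisioned machines.
type state struct {
	mu          sync.Mutex
	instanceIDs map[string]string // machine ID -> instance ID
}

func (s *state) recordInstanceID(machine, instance string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.instanceIDs[machine] = instance
}

func (s *state) provisioned(machine string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.instanceIDs[machine]
	return ok
}

func main() {
	st := &state{instanceIDs: make(map[string]string)}

	// "Provisioner": the container is already running, but the instance ID
	// is only recorded in state after a delay (standing in for a loaded
	// state server, or the sleep mentioned above).
	go func() {
		time.Sleep(2 * time.Second)
		st.recordInstanceID("0/lxc/7", "juju-machine-0-lxc-7")
	}()

	// "Machine agent": comes up immediately on the new container, asks
	// state about itself, and terminates if it finds no record.
	time.Sleep(100 * time.Millisecond)
	if !st.provisioned("0/lxc/7") {
		fmt.Println(`fatal "api": agent should be terminated`)
		return
	}
	fmt.Println("agent running")
}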

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Revision history for this message
Andrew Wilkins (axwalk) wrote :
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released