juju machine numbers being incorrectly assigned

Bug #1334683 reported by Matthew Brown
This bug affects 1 person
Affects: juju-core
Status: Fix Released
Importance: Medium
Assigned to: Andrew Wilkins
Milestone: (none)

Bug Description

Normally each machine's dns-name matches its instance-id. However, machines (including machine 0) are being jumbled up and given new machine ids.

Example juju status output:

machines:
  "0":
    agent-state: started
    agent-version: 1.18.1
    dns-name: juju-azure-5hsjnmbsv3.domainname.net
    instance-id: juju-azure-rlnisxa4f5
    instance-state: Created
    series: precise
  "1":
    agent-state: started
    agent-version: 1.18.1
    dns-name: juju-azure-tvj40ksov4.domainname.net
    instance-id: juju-azure-tvj40ksov4
    instance-state: Created
    series: precise
  "2":
    agent-state: started
    agent-version: 1.18.1
    dns-name: juju-azure-e27vzkotej.domainname.net
    instance-id: juju-azure-e27vzkotej
    instance-state: Created
    series: precise
  "3":
    agent-state: started
    agent-version: 1.18.1
    dns-name: juju-azure-9gak9o5jrx.domainname.net
    instance-id: juju-azure-9gak9o5jrx
    instance-state: Created
    series: precise
  "4":
    agent-state: started
    agent-version: 1.18.1
    dns-name: juju-azure-rlnisxa4f5.domainname.net
    instance-id: juju-azure-5hsjnmbsv3
    instance-state: Created
    series: precise

In this example, machine 0 and machine 4 have had their numbers swapped: each reports the other's dns-name. Sometimes multiple swaps take place (e.g. 2->5, 5(2)->3). The swaps happen intermittently, and the machines may eventually return to their original order.
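For anyone wanting to spot the swap quickly, here is a minimal sketch in Python that flags machines whose dns-name host label differs from their instance-id. It assumes the convention described above (the two normally match) and that the status output has already been parsed into a dict. The function name `find_mismatched_machines` is hypothetical and is not part of juju.

```python
# Hypothetical helper, not part of juju: flag machines whose dns-name
# host label does not match their instance-id.
def find_mismatched_machines(machines):
    mismatched = {}
    for machine_id, info in machines.items():
        # Take the first label of the dns-name, e.g. "juju-azure-xyz".
        host_label = info["dns-name"].split(".", 1)[0]
        if host_label != info["instance-id"]:
            mismatched[machine_id] = (host_label, info["instance-id"])
    return mismatched


# The "machines" section from the status output above, reduced to the
# two fields being compared.
machines = {
    "0": {"dns-name": "juju-azure-5hsjnmbsv3.domainname.net",
          "instance-id": "juju-azure-rlnisxa4f5"},
    "1": {"dns-name": "juju-azure-tvj40ksov4.domainname.net",
          "instance-id": "juju-azure-tvj40ksov4"},
    "2": {"dns-name": "juju-azure-e27vzkotej.domainname.net",
          "instance-id": "juju-azure-e27vzkotej"},
    "3": {"dns-name": "juju-azure-9gak9o5jrx.domainname.net",
          "instance-id": "juju-azure-9gak9o5jrx"},
    "4": {"dns-name": "juju-azure-rlnisxa4f5.domainname.net",
          "instance-id": "juju-azure-5hsjnmbsv3"},
}

for machine_id, (host_label, instance_id) in find_mismatched_machines(machines).items():
    print(f"machine {machine_id}: dns-name says {host_label}, instance-id is {instance_id}")
```

With the data above, only machines 0 and 4 are reported, which matches the swap described.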

Matthew Brown (matthew-brown) wrote :

These errors occurred during a period when the machines had the wrong IDs.

machine-0.log output:

2014-06-26 12:10:43 WARNING juju.worker.instanceupdater updater.go:98 ignoring error when stopping watcher: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju runner.go:209 worker: fatal "minunitsworker": watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju runner.go:209 worker: fatal "cleaner": watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.lifecycleWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.relationUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.relationUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.relationUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.relationUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.machineUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.entityWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *state.relationUnitsWatcher resource: watcher iteration error: EOF
2014-06-26 12:10:43 ERROR juju resource.go:83 state/api: error stopping *apiserver.machinePinger resource: EOF

machine-8.log output:

2014-06-26 12:10:43 ERROR juju watcher.go:66 state/api: error trying to stop watcher connection is shut down
2014-06-26 12:10:43 ERROR juju watcher.go:66 state/api: error trying to stop watcher connection is shut down
2014-06-26 12:10:43 ERROR juju watcher.go:66 state/api: error trying to stop watcher connection is shut ...


Jorge Castro (jorge)
tags: added: azure-provider
Matthew Brown (matthew-brown) wrote :

juju version: 1.18.1-precise-amd64

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Andrew Wilkins (axwalk) wrote :

Matthew, thanks very much for the report. I've taken a look at the code and there was an ordering issue that caused this (addresses were assigned randomly as a result). This is fixed in 1.19, and we have a new stable release (1.20) expected next week.
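The comment above does not show the code involved, so the following is only an illustrative sketch in Python of how an ordering issue of this general kind can attach addresses to the wrong machines. It is not juju's actual implementation; the function names and the `inst-a`/`inst-b` data are hypothetical.

```python
# Illustrative only, not juju code: pairing two lists by position is
# correct only if both lists are guaranteed to be in the same order.
def pair_by_position(instance_ids, addresses):
    return dict(zip(instance_ids, addresses))


# Looking each address up by its instance id does not depend on order.
def pair_by_key(instance_ids, address_by_instance):
    return {iid: address_by_instance[iid] for iid in instance_ids}


instance_ids = ["inst-a", "inst-b"]

# Hypothetical provider response that comes back in a different order.
addresses = ["inst-b.example.net", "inst-a.example.net"]
address_by_instance = {
    "inst-b": "inst-b.example.net",
    "inst-a": "inst-a.example.net",
}

print(pair_by_position(instance_ids, addresses))        # inst-a gets inst-b's address
print(pair_by_key(instance_ids, address_by_instance))   # each id gets its own address
```

If the order of the second list varies between calls, the positional pairing would also vary, which would be consistent with swaps that appear intermittently and sometimes revert.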

Ian Booth (wallyworld) wrote :

Marking as Fix Released, as it is fixed in the 1.19 releases.

Changed in juju-core:
assignee: nobody → Andrew Wilkins (axwalk)
status: Triaged → Fix Released
Dave Tonge (dave-tonge) wrote :

Thanks Andrew, do you think this issue will cause us problems in the upgrade process when we go to 1.20?

Andrew Wilkins (axwalk) wrote :

> Thanks Andrew, do you think this issue will cause us problems in the upgrade process when we go to 1.20?

I think it should be fine; the CLI uses a different (stable) method of obtaining the state server's address, so it should be able to upgrade the state server without any issues. Once that's done, it will correct the addresses of the other machines.

FYI, there will be some fairly major changes coming to the Azure provider in 1.20. They are optional, but an environment must be recreated to benefit from them. Namely, we'll be supporting Availability Sets (each unit of a service goes into the same availability set). Look out for this in the release notes.
