Instance poller reports: states changing too quickly

Bug #1948824 reported by Simon Richardson
Affects:      Canonical Juju
Status:       Fix Released
Importance:   High
Assigned to:  Joseph Phillips
Milestone:    2.9.33

Bug Description

Bootstrap to LXD, then enable HA. Deploy a bundle with multiple machines, then switch to the controller model and destroy the default model.

The logs report:

    machine-0: 15:00:37 ERROR juju.apiserver.instancepoller link layer device merge attempt for machine 1 failed due to error: state changing too quickly; try again soon; waiting until next instance-poller run to retry
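
For context on where this message comes from: Juju applies state changes as assertion-guarded transactions (juju/txn on top of mgo/txn). The server rebuilds and retries the operations a handful of times; if the documents still do not match the assertions, it gives up with "state changing too quickly; try again soon", which is what the instance poller logs above. Below is a minimal, self-contained sketch of that retry-and-assert pattern; the names and attempt limit are illustrative, not Juju's actual code.

package main

import (
	"errors"
	"fmt"
)

// op stands in for an assertion-guarded operation: it names a document
// and the life value that document must have for the op to apply.
type op struct {
	collection string
	id         string
	assertLife int // 0 = Alive, 1 = Dying, 2 = Dead (Juju's convention)
}

var errExcessiveContention = errors.New("state changing too quickly; try again soon")

// runTxn retries the builder a fixed number of times. If the assertions
// still fail on the last attempt, it gives up with the "excessive
// contention" error seen in the instance-poller log.
func runTxn(build func(attempt int) ([]op, error), assertionsHold func([]op) bool) error {
	const maxAttempts = 3
	for attempt := 0; attempt < maxAttempts; attempt++ {
		ops, err := build(attempt)
		if err != nil {
			return err
		}
		if assertionsHold(ops) {
			return nil // all ops applied atomically
		}
		// Assertions failed: re-read state and try again.
	}
	return errExcessiveContention
}

func main() {
	// The merge ops assert the machine is Alive (life = 0), but the
	// machine document is Dying (life = 1), so every attempt aborts.
	build := func(attempt int) ([]op, error) {
		return []op{{collection: "machines", id: "16", assertLife: 0}}, nil
	}
	machineLife := 1 // Dying
	holds := func(ops []op) bool { return ops[0].assertLife == machineLife }
	fmt.Println(runTxn(build, holds)) // state changing too quickly; try again soon
}

In this bug, the assertion that keeps failing appears to be the machine-is-Alive check, as the transaction dump further down shows.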

Changed in juju:
milestone: 2.9.18 → 2.9.19
Changed in juju:
milestone: 2.9.19 → 2.9.20
Changed in juju:
milestone: 2.9.20 → 2.9.21
Changed in juju:
milestone: 2.9.21 → 2.9.22
Changed in juju:
milestone: 2.9.22 → 2.9.23
Changed in juju:
milestone: 2.9.23 → 2.9.24
Changed in juju:
milestone: 2.9.24 → 2.9.25
Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.30 → 2.9.31
Joseph Phillips (manadart) wrote:

I couldn't replicate this.

It is conceivable that we could poll instances and be processing the result when the model transitions to dead, but the report doesn't say whether the error was recurrent.

Changed in juju:
milestone: 2.9.31 → none
status: Triaged → Incomplete
assignee: nobody → Joseph Phillips (manadart)
Heather Lanigan (hmlanigan) wrote:

DB data from a model reproducing the issue: per Juju the machine is in a dying state. It still exists in OpenStack, but is shut down right now.

juju:PRIMARY> db.txns.find({'s':5, "o.c":{$eq:"linklayerdevices"}}).sort({'_id': -1}).limit(1).pretty()
{
 "_id" : ObjectId("62c5d42eea195321ac2233d2"),
 "s" : 5,
 "o" : [
  {
   "c" : "machines",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16",
   "a" : {
    "life" : 0 <-- ERROR
   }
  },
  {
   "c" : "providerIDs",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
   "a" : "d-",
   "i" : {
    "_id" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
    "model-uuid" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16"
   }
  },
  {
   "c" : "linklayerdevices",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0",
   "a" : "d+",
   "u" : {
    "$set" : {
     "providerid" : "9ca7bc2f-6379-43fd-a42c-5baae8f4f654"
    }
   }
  },
  {
   "c" : "ip.addresses",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0#ip#10.48.132.243",
   "a" : "d+",
   "u" : {
    "$set" : {
     "provider-network-id" : "d75416f2-093a-4ffb-aaf6-d22f350d01ea",
     "provider-subnet-id" : "0915bb73-94b6-493d-af18-2aa9a21fc415"
    }
   }
  }
 ],
 "n" : "d61deb9e"
}

juju:PRIMARY> db.machines.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16"}).pretty();
{
 "_id" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16",
 "machineid" : "16",
 "model-uuid" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16",
 "nonce" : "machine-0:e8396142-140d-45b8-8072-5cc9305121fe",
 "series" : "focal",
 "containertype" : "",
 "principals" : [
  "ubuntu-arm64/0"
 ],
 "life" : 1, <-- doesn't match txn
 "jobs" : [
  1
 ],
...

db.linklayerdevices.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0"}) does exist
db.ip.addresses.find({ "_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0#ip#10.48.132.243" }) does exist
db.providerIDs.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654" }) does not exist.
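
Reading the aborted transaction dumped above: "s": 5 is mgo/txn's aborted state, "c"/"d" name the collection and document, "a" is the per-op assertion ("d+" asserts the document exists, "d-" that it is missing, and a sub-document such as {"life": 0} asserts field values), and "u"/"i" carry the update or insert to apply. Rendered as mgo/txn operations, the dump decodes roughly as below; this is an illustrative reconstruction, not the actual builder from Juju's state package.

package main

import (
	"fmt"

	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

func main() {
	model := "d2bf2df5-7c81-4508-89c2-bf33ac89df16"

	ops := []txn.Op{{
		C:      "machines",
		Id:     model + ":16",
		Assert: bson.D{{Name: "life", Value: 0}}, // "a": {"life": 0}: machine must still be Alive
	}, {
		C:      "providerIDs",
		Id:     model + ":linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
		Assert: txn.DocMissing, // "a": "d-": provider ID must not be recorded yet
		Insert: bson.M{"model-uuid": model},
	}, {
		C:      "linklayerdevices",
		Id:     model + ":m#16#d#enp1s0",
		Assert: txn.DocExists, // "a": "d+": device document must exist
		Update: bson.M{"$set": bson.M{"providerid": "9ca7bc2f-6379-43fd-a42c-5baae8f4f654"}},
	}, {
		C:      "ip.addresses",
		Id:     model + ":m#16#d#enp1s0#ip#10.48.132.243",
		Assert: txn.DocExists, // "a": "d+": address document must exist
		Update: bson.M{"$set": bson.M{
			"provider-network-id": "d75416f2-093a-4ffb-aaf6-d22f350d01ea",
			"provider-subnet-id":  "0915bb73-94b6-493d-af18-2aa9a21fc415",
		}},
	}}

	// The machine document actually has "life": 1 (Dying), so the first
	// assertion never holds; the whole transaction aborts (state 5) and the
	// poller logs "state changing too quickly" on each merge attempt.
	fmt.Printf("decoded %d operations from the aborted transaction\n", len(ops))
}

Since the machine document shows "life": 1 (Dying), the first assertion cannot hold while the machine stays in that state, so every merge attempt aborts; that also explains why the providerIDs document was never created.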

Changed in juju:
status: Incomplete → Triaged
Changed in juju:
status: Triaged → In Progress
milestone: none → 2.9.33
Heather Lanigan (hmlanigan) wrote:

Reproduced by:

1. juju deploy ubuntu
2. When the unit is up, shut down the machine in the cloud (OpenStack was used).
3. juju remove-unit ubuntu/0
4. juju remove-machine 0

I didn't use --force, to avoid skipping past the machine while it is dying.

juju:PRIMARY> db.machines.find({},{"life":1,"principals":1}).pretty()
{
 "_id" : "05d2fcdb-a93f-43cd-8ca8-cd1d610d6ed4:0",
 "principals" : [
  "ubuntu/0"
 ],
 "life" : 1
}

Heather Lanigan (hmlanigan) wrote:

Comment #3 reproduces the scenario in the DB, but not the error. Also seen:
controller-0: 15:18:24 INFO juju.worker.instancepoller machine "0" (instance ID "faea9b61-b8bc-4bb5-8b14-b7b37f57407f") instance status changed from {"running" "ACTIVE"} to {"" "SHUTOFF"}

Joseph Phillips (manadart) wrote:
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released