Instance poller reports: states changing too quickly

Bug #1948824 reported by Simon Richardson
Affects:      Canonical Juju
Status:       Fix Released
Importance:   High
Assigned to:  Joseph Phillips
Milestone:    2.9.33

Bug Description

Bootstrap to LXD, then enable HA. Deploy a bundle with multiple machines, then switch to the controller model and destroy the default model.

The logs report:

    machine-0: 15:00:37 ERROR juju.apiserver.instancepoller link layer device merge attempt for machine 1 failed due to error: state changing too quickly; try again soon; waiting until next instance-poller run to retry
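
For context on where this message comes from: Juju applies state changes as assertion-guarded transactions (juju/txn on top of mgo/txn). The server rebuilds and retries the operations a handful of times; if the documents still do not match the assertions, it gives up with "state changing too quickly; try again soon", which is what the instance poller logs above. Below is a minimal, self-contained sketch of that retry-and-assert pattern; the names and attempt limit are illustrative, not Juju's actual code.

package main

import (
	"errors"
	"fmt"
)

// op stands in for an assertion-guarded operation: it names a document
// and the life value that document must have for the op to apply.
type op struct {
	collection string
	id         string
	assertLife int // 0 = Alive, 1 = Dying, 2 = Dead (Juju's convention)
}

var errExcessiveContention = errors.New("state changing too quickly; try again soon")

// runTxn retries the builder a fixed number of times. If the assertions
// still fail on the last attempt, it gives up with the "excessive
// contention" error seen in the instance-poller log.
func runTxn(build func(attempt int) ([]op, error), assertionsHold func([]op) bool) error {
	const maxAttempts = 3
	for attempt := 0; attempt < maxAttempts; attempt++ {
		ops, err := build(attempt)
		if err != nil {
			return err
		}
		if assertionsHold(ops) {
			return nil // all ops applied atomically
		}
		// Assertions failed: re-read state and try again.
	}
	return errExcessiveContention
}

func main() {
	// The merge ops assert the machine is Alive (life = 0), but the
	// machine document is Dying (life = 1), so every attempt aborts.
	build := func(attempt int) ([]op, error) {
		return []op{{collection: "machines", id: "16", assertLife: 0}}, nil
	}
	machineLife := 1 // Dying
	holds := func(ops []op) bool { return ops[0].assertLife == machineLife }
	fmt.Println(runTxn(build, holds)) // state changing too quickly; try again soon
}

In this bug, the assertion that keeps failing appears to be the machine-is-Alive check, as the transaction dump further down shows.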

Changed in juju:
milestone: 2.9.18 → 2.9.19
Changed in juju:
milestone: 2.9.19 → 2.9.20
Changed in juju:
milestone: 2.9.20 → 2.9.21
Changed in juju:
milestone: 2.9.21 → 2.9.22
Changed in juju:
milestone: 2.9.22 → 2.9.23
Changed in juju:
milestone: 2.9.23 → 2.9.24
Changed in juju:
milestone: 2.9.24 → 2.9.25
Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.30 → 2.9.31
Joseph Phillips (manadart) wrote:

I couldn't replicate this.

It is conceivable that we could poll instances and be processing the result when the model transitions to dead, but the report doesn't say whether the error was recurrent.

Changed in juju:
milestone: 2.9.31 → none
status: Triaged → Incomplete
assignee: nobody → Joseph Phillips (manadart)
Heather Lanigan (hmlanigan) wrote:

DB data from a model reproducing the issue: per Juju the machine is in a dying state. It still exists in OpenStack, but is shut down right now.

juju:PRIMARY> db.txns.find({'s':5, "o.c":{$eq:"linklayerdevices"}}).sort({'_id': -1}).limit(1).pretty()
{
 "_id" : ObjectId("62c5d42eea195321ac2233d2"),
 "s" : 5,
 "o" : [
  {
   "c" : "machines",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16",
   "a" : {
    "life" : 0 <-- ERROR
   }
  },
  {
   "c" : "providerIDs",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
   "a" : "d-",
   "i" : {
    "_id" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
    "model-uuid" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16"
   }
  },
  {
   "c" : "linklayerdevices",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0",
   "a" : "d+",
   "u" : {
    "$set" : {
     "providerid" : "9ca7bc2f-6379-43fd-a42c-5baae8f4f654"
    }
   }
  },
  {
   "c" : "ip.addresses",
   "d" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0#ip#10.48.132.243",
   "a" : "d+",
   "u" : {
    "$set" : {
     "provider-network-id" : "d75416f2-093a-4ffb-aaf6-d22f350d01ea",
     "provider-subnet-id" : "0915bb73-94b6-493d-af18-2aa9a21fc415"
    }
   }
  }
 ],
 "n" : "d61deb9e"
}

juju:PRIMARY> db.machines.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16"}).pretty();
{
 "_id" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16:16",
 "machineid" : "16",
 "model-uuid" : "d2bf2df5-7c81-4508-89c2-bf33ac89df16",
 "nonce" : "machine-0:e8396142-140d-45b8-8072-5cc9305121fe",
 "series" : "focal",
 "containertype" : "",
 "principals" : [
  "ubuntu-arm64/0"
 ],
 "life" : 1, <-- doesn't match txn
 "jobs" : [
  1
 ],
...

db.linklayerdevices.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0"}) does exist
db.ip.addresses.find({ "_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:m#16#d#enp1s0#ip#10.48.132.243" }) does exist
db.providerIDs.find({"_id": "d2bf2df5-7c81-4508-89c2-bf33ac89df16:linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654" }) does not exist.
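
Reading the aborted transaction dumped above: "s": 5 is mgo/txn's aborted state, "c"/"d" name the collection and document, "a" is the per-op assertion ("d+" asserts the document exists, "d-" that it is missing, and a sub-document such as {"life": 0} asserts field values), and "u"/"i" carry the update or insert to apply. Rendered as mgo/txn operations, the dump decodes roughly as below; this is an illustrative reconstruction, not the actual builder from Juju's state package.

package main

import (
	"fmt"

	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

func main() {
	model := "d2bf2df5-7c81-4508-89c2-bf33ac89df16"

	ops := []txn.Op{{
		C:      "machines",
		Id:     model + ":16",
		Assert: bson.D{{Name: "life", Value: 0}}, // "a": {"life": 0}: machine must still be Alive
	}, {
		C:      "providerIDs",
		Id:     model + ":linklayerdevice:9ca7bc2f-6379-43fd-a42c-5baae8f4f654",
		Assert: txn.DocMissing, // "a": "d-": provider ID must not be recorded yet
		Insert: bson.M{"model-uuid": model},
	}, {
		C:      "linklayerdevices",
		Id:     model + ":m#16#d#enp1s0",
		Assert: txn.DocExists, // "a": "d+": device document must exist
		Update: bson.M{"$set": bson.M{"providerid": "9ca7bc2f-6379-43fd-a42c-5baae8f4f654"}},
	}, {
		C:      "ip.addresses",
		Id:     model + ":m#16#d#enp1s0#ip#10.48.132.243",
		Assert: txn.DocExists, // "a": "d+": address document must exist
		Update: bson.M{"$set": bson.M{
			"provider-network-id": "d75416f2-093a-4ffb-aaf6-d22f350d01ea",
			"provider-subnet-id":  "0915bb73-94b6-493d-af18-2aa9a21fc415",
		}},
	}}

	// The machine document actually has "life": 1 (Dying), so the first
	// assertion never holds; the whole transaction aborts (state 5) and the
	// poller logs "state changing too quickly" on each merge attempt.
	fmt.Printf("decoded %d operations from the aborted transaction\n", len(ops))
}

Since the machine document shows "life": 1 (Dying), the first assertion cannot hold while the machine stays in that state, so every merge attempt aborts; that also explains why the providerIDs document was never created.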

Changed in juju:
status: Incomplete → Triaged
Changed in juju:
status: Triaged → In Progress
milestone: none → 2.9.33
Heather Lanigan (hmlanigan) wrote:

Reproduced by:

1. juju deploy ubuntu
2. When the unit is up, shut down the machine in the cloud (OpenStack was used).
3. juju remove-unit ubuntu/0
4. juju remove-machine 0

I didn't use --force, to avoid skipping past the machine while it is dying.

juju:PRIMARY> db.machines.find({},{"life":1,"principals":1}).pretty()
{
 "_id" : "05d2fcdb-a93f-43cd-8ca8-cd1d610d6ed4:0",
 "principals" : [
  "ubuntu/0"
 ],
 "life" : 1
}

Heather Lanigan (hmlanigan) wrote:

Comment #3 reproduces the scenario in the DB, but not the error. Also seen:
controller-0: 15:18:24 INFO juju.worker.instancepoller machine "0" (instance ID "faea9b61-b8bc-4bb5-8b14-b7b37f57407f") instance status changed from {"running" "ACTIVE"} to {"" "SHUTOFF"}

Joseph Phillips (manadart) wrote:
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released