Activity log for bug #1948906

Date Who What changed Old value New value Message
2021-10-27 07:11:51 Hemanth Nakkina bug added bug
2021-10-27 10:21:18 Hemanth Nakkina description

Hi,

I was performing a series upgrade from bionic to focal, and the post-series-upgrade hook failed with a known application error. I resolved the application error manually, but the post-series-upgrade hook remains stuck with the following error and the upgrade never completes.

Error message in the juju unit logs:

    2021-10-27 06:28:14 INFO juju.worker.uniter uniter.go:339 unit "designate-bind/1" started
    2021-10-27 06:28:14 INFO juju.worker.uniter uniter.go:357 hooks are retried true
    2021-10-27 06:28:14 INFO juju.worker.uniter resolver.go:150 awaiting error resolution for "post-series-upgrade" hook
    2021-10-27 06:28:19 INFO juju.worker.uniter resolver.go:150 awaiting error resolution for "post-series-upgrade" hook
    2021-10-27 06:28:19 ERROR juju.worker.uniter.operation runhook.go:200 error updating workload status before post-series-upgrade hook: upgrade series status "complete running"
    2021-10-27 06:28:19 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "run post-series-upgrade hook" for designate-bind/1: upgrade series status "complete running"
    2021-10-27 06:28:19 INFO juju.worker.uniter uniter.go:323 unit "designate-bind/1" shutting down: executing operation "run post-series-upgrade hook" for designate-bind/1: upgrade series status "complete running"
    2021-10-27 06:28:19 ERROR juju.worker.dependency engine.go:671 "uniter" manifold worker returned unexpected error: executing operation "run post-series-upgrade hook" for designate-bind/1: upgrade series status "complete running"

I had to modify MongoDB to recover from this. Even though juju does not expect the application to fail, there should be a mechanism to recover from this situation without changing the database. (Also note that upgrading the application unit to a charm revision that contains the fix for the application issue does not return the agent status to idle.)

Reproducer steps:

1. juju add-model test
2. juju deploy cs:designate-bind-34 --series bionic
3. juju upgrade-series 1 prepare focal
4. # ssh to the designate-bind unit and perform the upgrade
5. juju upgrade-series 1 complete    # This hangs
6. Manually fix the application error:
   juju ssh designate-bind/1 sudo dpkg-reconfigure bind9
7. juju run -u designate-bind/1 -- hooks/update-status
   # After this step, the workload status is Active but the agent is in the Failed state.

Analysis:

After step 5, the machineUpgradeSeriesLocks document in MongoDB looks like this. Note that the unit status for designate-bind/1 is "complete running".

    db.machineUpgradeSeriesLocks.find().forEach(printjson)
    {
     "_id" : "d3f7c58a-a55f-495c-8b23-bf86deb93a6b:1",
     "machine-id" : "1",
     "to-series" : "focal",
     "from-series" : "bionic",
     "machine-status" : "complete started",
     "messages" : [
      {
       "message" : "machine-1 validation of upgrade series from \"bionic\" to \"focal\"",
       "timestamp" : ISODate("2021-10-27T05:57:44.583Z"),
       "seen" : true
      },
      {
       "message" : "machine-1 started upgrade series from \"bionic\" to \"focal\"",
       "timestamp" : ISODate("2021-10-27T05:57:44.669Z"),
       "seen" : true
      },
      {
       "message" : "designate-bind/1 pre-series-upgrade hook running",
       "timestamp" : ISODate("2021-10-27T05:57:44.783Z"),
       "seen" : true
      },
      {
       "message" : "designate-bind/1 pre-series-upgrade completed",
       "timestamp" : ISODate("2021-10-27T05:57:48.824Z"),
       "seen" : true
      },
      {
       "message" : "machine-1 binaries and service files written",
       "timestamp" : ISODate("2021-10-27T05:57:48.964Z"),
       "seen" : true
      },
      {
       "message" : "machine-1 complete phase started",
       "timestamp" : ISODate("2021-10-27T06:23:40.403Z"),
       "seen" : true
      },
      {
       "message" : "machine-1 start units after series upgrade",
       "timestamp" : ISODate("2021-10-27T06:23:40.509Z"),
       "seen" : true
      },
      {
       "message" : "designate-bind/1 post-series-upgrade hook running",
       "timestamp" : ISODate("2021-10-27T06:23:40.646Z"),
       "seen" : true
      }
     ],
     "timestamp" : ISODate("2021-10-27T06:23:40.646Z"),
     "unit-statuses" : {
      "designate-bind/1" : {
       "status" : "complete running",
       "timestamp" : ISODate("2021-10-27T06:23:40.646Z")
      }
     },
     "model-uuid" : "d3f7c58a-a55f-495c-8b23-bf86deb93a6b",
     "txn-revno" : NumberLong(16),
     "txn-queue" : [
      "6178f06c3f0b39661e6cb75a_e44848a8"
     ]
    }

In the beforeHook step of the post-series-upgrade hook, juju expects the unit to be in the "complete started" state so that the FSM can move it to "complete running". Because the first (failed) run already moved the unit to "complete running", every retry of the post-series-upgrade hook by the agent fails in beforeHook.

https://github.com/juju/juju/blob/develop/worker/uniter/operation/runhook.go#L196

Workaround:

Set the unit status back to "complete started" in MongoDB:

    db.machineUpgradeSeriesLocks.update(
        { "machine-id" : "1" },
        {
          $set: {
            "unit-statuses.designate-bind/1.status": "complete started"
          }
        }
    )

After 5-10 minutes, the post-series-upgrade hook is retriggered automatically and the agent returns to the idle state.
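The retry failure described in the analysis above can be modelled with a small sketch. This is not juju's code (juju is written in Go, and the real check lives in runhook.go). The function names, the exception class, and the dictionary standing in for the "unit-statuses" field are all hypothetical; only the two status strings and the error text come from the report. The tolerant variant shows one possible way to make the transition safe to retry, and is not necessarily the fix that was released.

```python
# Illustrative model only: names below are hypothetical, not juju APIs.
COMPLETE_STARTED = "complete started"
COMPLETE_RUNNING = "complete running"


class UnexpectedStatusError(Exception):
    """Raised when the unit is not in the status the transition expects."""


def before_hook_strict(unit_statuses, unit):
    """Model of the behaviour reported in this bug.

    The transition is only allowed from "complete started", so a retry
    after a failed hook run (status already "complete running") errors.
    """
    status = unit_statuses[unit]
    if status != COMPLETE_STARTED:
        raise UnexpectedStatusError(f'upgrade series status "{status}"')
    unit_statuses[unit] = COMPLETE_RUNNING


def before_hook_tolerant(unit_statuses, unit):
    """Hypothetical retry-safe variant (an assumption, not juju's fix).

    A unit that is already "complete running" is treated as a retry and
    the hook is allowed to proceed without touching the stored status.
    """
    status = unit_statuses[unit]
    if status == COMPLETE_RUNNING:
        return
    if status != COMPLETE_STARTED:
        raise UnexpectedStatusError(f'upgrade series status "{status}"')
    unit_statuses[unit] = COMPLETE_RUNNING


if __name__ == "__main__":
    statuses = {"designate-bind/1": COMPLETE_STARTED}
    before_hook_strict(statuses, "designate-bind/1")      # first run succeeds
    try:
        before_hook_strict(statuses, "designate-bind/1")  # retry fails
    except UnexpectedStatusError as err:
        print("retry failed:", err)
    before_hook_tolerant(statuses, "designate-bind/1")    # retry proceeds
    print("status:", statuses["designate-bind/1"])
```

In this model, the MongoDB workaround corresponds to resetting the dictionary entry to "complete started" by hand so that the strict check passes on the next retry.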
2021-11-04 17:07:06 Simon Richardson juju: status New Triaged
2021-11-04 17:07:50 Simon Richardson juju: importance Undecided High
2021-11-05 04:08:20 Hemanth Nakkina tags seg
2022-06-14 06:44:32 Nobuto Murata bug added subscriber Nobuto Murata
2022-06-14 07:05:55 DUFOUR Olivier attachment added upgrade-issue-bundle.yaml https://bugs.launchpad.net/juju/+bug/1948906/+attachment/5597153/+files/upgrade-issue-bundle.yaml
2022-06-14 07:06:34 DUFOUR Olivier bug added subscriber Canonical Field High
2022-06-14 07:08:18 DUFOUR Olivier bug added subscriber DUFOUR Olivier
2022-06-16 08:39:30 Joseph Phillips juju: status Triaged In Progress
2022-06-16 08:39:32 Joseph Phillips juju: assignee Joseph Phillips (manadart)
2022-06-16 08:39:41 Joseph Phillips juju: milestone 2.9.33
2022-06-17 12:54:15 Joseph Phillips juju: status In Progress Fix Committed
2022-08-08 22:04:32 Canonical Juju QA Bot juju: status Fix Committed Fix Released