charm hook may hang indefinitely querying payload if payload does not run
Bug #1912820 reported by
Alexander Balderson
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
charm-ovn-central | Triaged | High | Unassigned | |
Bug Description
On OpenStack charmers next, the deployment hangs (after relating the LMA stack) waiting for the upgrade-charm hook to execute. It ran for 4 hours before we stopped the run.
The test run can be found at:
https:/
and crashdump at:
https:/
Marking this as a release blocker
Changed in charm-ovn-central:
status: New → Triaged
assignee: nobody → Alex Kavanagh (ajkavanagh)
Changed in charm-ovn-central:
status: Triaged → Incomplete
tags: added: charm-upgrade
Changed in charm-ovn-central:
assignee: Alex Kavanagh (ajkavanagh) → nobody
status: New → Triaged
importance: Undecided → High
summary:
- Charm never clears upgrade-charm flag and runs forvever
+ charm hook may hang indefinitely querying payload if payload does not run
So it looks like the ovs-sbctl command hung in the charm. Looking at the syslog for ovn-central/0, it looks like the sb process was offline when the command was run, and this may have contributed to the hang. Bug https://bugzilla.redhat.com/show_bug.cgi?id=1622051 might be what is responsible.
Does this happen on every run?
Do you know what triggered the upgrade-charm (was it a resource addition)?
I don't think the title is correct, as this appears in the log for the unit (it was in the upgrade-charm hook, but had finished 'upgrading the charm'). It was in the render handler that it hung.
2021-01-22 12:18:52 INFO juju-log Invoking reactive handler: reactive/ovn_central_handlers.py:157:render
My suspicion is:
- the upgrading of the payload resulted in the ovn-ovsdb-server-sb service dying / being stopped.
- the ovs-sbctl command hung waiting to connect to the service (which isn't running).
- I don't know why the ovn-ovsdb-server-sb process didn't come back (it may have, and the ovs-sbctl command may just have hung).
I wonder if the charm should time out the command and retry it a few times instead of relying on it working the first time?
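As a rough illustration of that idea, here is a minimal Python sketch of a timeout-and-retry wrapper around an external command. It is not the charm's actual code: the helper name `run_with_retries` and its parameters (`timeout`, `retries`, `delay`) are hypothetical, and the default values are arbitrary.

```python
import subprocess
import time


def run_with_retries(cmd, timeout=10.0, retries=3, delay=1.0):
    """Run cmd, giving up on each attempt after `timeout` seconds.

    Hypothetical helper, not part of the charm. Returns the command's
    stdout as text, or raises RuntimeError if every attempt times out
    or exits non-zero.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,  # the child is killed when this expires
                check=True,       # a non-zero exit raises CalledProcessError
            )
            return result.stdout
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError) as exc:
            last_error = exc
            if attempt < retries:
                # Give the service a moment to come back before retrying.
                time.sleep(delay)
    raise RuntimeError(
        "command {!r} failed after {} attempts: {}".format(
            cmd, retries, last_error)
    )
```

With something like this, a command that cannot reach its service would fail the hook after a bounded time (roughly `retries * (timeout + delay)` seconds) rather than blocking it indefinitely, and the error would show up in the unit log.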