broker crash on database timeout
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
git-ubuntu | Triaged | Low | Unassigned |
Bug Description
This could use some investigation. Is the sqlite3 lock timeout retry code buggy somehow?
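The retry code in question isn't quoted in this report, but for reference, a sqlite3 lock-timeout retry loop typically looks something like the sketch below (the function name, attempt count, and backoff are illustrative assumptions, not git-ubuntu's actual code):

```
import sqlite3
import time

def execute_with_retry(conn, sql, params=(), attempts=5, delay=0.5):
    # Hypothetical sketch of a lock-timeout retry wrapper; git-ubuntu's
    # real retry code may be structured differently.
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            # sqlite3 raises "database is locked" when the busy timeout
            # expires while another connection holds the lock.
            if "database is locked" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

One subtle hazard with this pattern: if the failing statement ends an open transaction, a later explicit COMMIT or ROLLBACK can fail with "no transaction is active", which is the error discussed below.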
Associated with this was a broker exit with status 1 and no traceback, a few days after an automatic restart that followed this crash. The original crash from the journal:
```
Apr 16 03:11:31 reber git-ubuntu[78448]: Traceback (most recent call last):
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: load_entry_
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: sys.exit(
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: worker_
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: main_loop(
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: handle_
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: reply.identity,
Apr 16 03:11:31 reber git-ubuntu[78448]: File "/snap/
Apr 16 03:11:31 reber git-ubuntu[78448]: self.db.
Apr 16 03:11:31 reber git-ubuntu[78448]: sqlite3.
```
Just looking cursorily at this, I notice a comment:
```
# We only update the status of the request if the worker is telling us
# about the request we have recorded as having given it. It's possible that
# the worker is reporting work from an old identity and we've already moved
# on from that, so in that case we just ignore the worker's work and rely
# on the cleanup done later using service_ to get
# back to a good state.
package, level, reckoning_time = state.lookup_ipc_worker(
    reply.identity,
)
state.clear_ipc_worker()
```
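Reading that comment, the flow it describes seems to be roughly the following (a sketch only: update_request_status, the argument to clear_ipc_worker, and the None return from the lookup are my assumptions; only lookup_ipc_worker, clear_ipc_worker, and reply.identity appear in the snippet):

```
def handle_reply(state, reply):
    # Only act on a reply whose identity matches the work we recorded
    # as having handed out.
    recorded = state.lookup_ipc_worker(reply.identity)  # assumed to return None for unknown identities
    if recorded is None:
        # Stale/unknown identity: ignore the reply and let the later
        # service_* cleanup bring things back to a good state.
        return
    state.update_request_status(reply)      # hypothetical status update
    state.clear_ipc_worker(reply.identity)  # argument assumed
```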
Might it be possible that this is in fact the documented situation, where the worker is reporting on old work (or returning after a timeout)? That might be why it ends with a "no transaction is active" error.
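For what it's worth, sqlite3 raises exactly that error whenever an explicit COMMIT or ROLLBACK is executed with no transaction open, which fits a code path that tries to finish a transaction that was already rolled back (for example, after a timeout). A minimal reproduction:

```
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions are explicit
conn.execute("BEGIN")
conn.execute("ROLLBACK")  # the transaction ends here (as it might after a timeout/error)
conn.execute("ROLLBACK")  # sqlite3.OperationalError: cannot rollback - no transaction is active
```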