Hostconfig task in db stuck in processing state

Bug #1813047 reported by Achuth M
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
networking-odl  New     Undecided   Unassigned
odl             New     Undecided   Unassigned

Bug Description

Issue reproduced with the OpenStack networking-odl stable/pike release (but it may also apply to later releases, as per the description below).

Frequency - Low

It is observed that the Hostconfig periodic task may get stuck in the processing state (for example, during a neutron server reboot or when one of the neutron processes dies), and the n-odl driver has no way to unlock the task and set it back to pending in such scenarios.
This will eventually cause ODL agents to be marked as down.

mysql> select * from opendaylight_periodic_task;
 +------------+-----------------------------+-------------+---------------------+
 | state      | processing_operation        | task        | lock_updated        |
 +------------+-----------------------------+-------------+---------------------+
 | processing | _get_and_update_hostconfigs | hostconfig  | 2018-12-03 05:31:39 | ====> 05:31:39
 | pending    | NULL                        | maintenance | 2018-12-03 08:16:18 | ====> 08:16:18
 +------------+-----------------------------+-------------+---------------------+
 2 rows in set (0.00 sec)
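
To put a number on how stale the hostconfig lock is, here is a minimal Python sketch that subtracts the two lock_updated values from the table above. It assumes the maintenance row's timestamp is close to the time the query was run; the helper name lock_age is hypothetical and is not part of networking-odl.

```python
from datetime import datetime

# Timestamp layout used in the lock_updated column of the query output.
FMT = "%Y-%m-%d %H:%M:%S"


def lock_age(lock_updated, reference):
    """Return how far lock_updated lags behind the reference time."""
    return datetime.strptime(reference, FMT) - datetime.strptime(lock_updated, FMT)


# hostconfig lock vs. the most recent maintenance lock (assumed to be ~now).
age = lock_age("2018-12-03 05:31:39", "2018-12-03 08:16:18")
print(age)  # 2:44:39
```

So the hostconfig lock had not been refreshed for roughly 2 hours 45 minutes while the maintenance task kept running.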

Neutron Log Snippet
WARNING neutron.db.agents_db [req-c872e719-1268-4aff-853f-9a954f05cecc - - - - -] Agent healthcheck: found 3 dead agents out of 9:
                 Type      Last heartbeat host
               ODL L2 2018-12-03 07:53:58 compute-0-3.domain.tld
               ODL L2 2018-12-03 07:52:32 compute-0-1.domain.tld
               ODL L2 2018-12-03 07:54:09 compute-0-2.domain.tld

The exact cause of the state remaining in processing is not certain, but there needs to be a mechanism that brings the task out of the processing state after an interval and prevents the ODL L2 agents from going down permanently.
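
One possible shape for such a mechanism, as a minimal sketch only: reset any row that has been in processing for longer than a chosen interval back to pending. This is not the networking-odl implementation or the proposed patch. It uses an in-memory SQLite table that mirrors the columns shown in the query output, and the names unlock_stale_tasks, make_demo_db and the one-hour interval are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same layout as the lock_updated column


def make_demo_db():
    """Build an in-memory table holding the two rows from the report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opendaylight_periodic_task ("
        "state TEXT, processing_operation TEXT, task TEXT, lock_updated TEXT)"
    )
    conn.executemany(
        "INSERT INTO opendaylight_periodic_task VALUES (?, ?, ?, ?)",
        [
            ("processing", "_get_and_update_hostconfigs", "hostconfig",
             "2018-12-03 05:31:39"),
            ("pending", None, "maintenance", "2018-12-03 08:16:18"),
        ],
    )
    conn.commit()
    return conn


def unlock_stale_tasks(conn, now, max_age):
    """Set tasks stuck in 'processing' for longer than max_age to 'pending'.

    Returns the number of rows that were unlocked.
    """
    cutoff = (now - max_age).strftime(FMT)
    cur = conn.execute(
        "UPDATE opendaylight_periodic_task "
        "SET state = 'pending', processing_operation = NULL, lock_updated = ? "
        "WHERE state = 'processing' AND lock_updated < ?",
        (now.strftime(FMT), cutoff),
    )
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
unlocked = unlock_stale_tasks(
    conn, now=datetime(2018, 12, 3, 8, 16, 18), max_age=timedelta(hours=1)
)
print(unlocked)  # 1
```

The interval would need to be comfortably longer than a normal run of the task, so that a task still being processed legitimately is not unlocked underneath its worker.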

Achuth M (achuthm)
Changed in networking-odl:
assignee: nobody → Achuth M (achuthm)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-odl (master)

Fix proposed to branch: master
Review: https://review.openstack.org/633190

Changed in networking-odl:
status: New → In Progress
Bence Romsics (bence-romsics) wrote :

Hi,

Generally it helps other developers if you can provide:

1) exact versions of the software pieces you used, and

2) instructions on how to reproduce the problem (especially what process needs to be killed and exactly when).

I'm no networking-odl developer, but if I were, this would help me triage this bug.

Cheers,
Bence

Achuth M (achuthm) wrote :

Hi Bence,

I have added some additional information to the bug.

Thanks,
Achuth

description: updated
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in networking-odl:
assignee: Achuth M (achuthm) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-odl (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/633190
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
