Comment 18 for bug 1952400

Frank Miller (sensfan22) wrote :

Analysis from Bob Church:

Long story short:
• This virtual system is suffering from I/O overload, probably due to the new 5.10 kernel and I/O scheduler issues (hopefully Gerry’s update will help with this: https://review.opendev.org/c/starlingx/config-files/+/820263 )
• On controller-1 unlock, the DRBD devices start syncing, etcd immediately starts to see slow response times, and SM starts to see audit misses and kills processes (etcd and sysinv) in the middle of the application-apply
• SM thrashes between the controllers for approx. 5 minutes before stabilizing back on controller-0
• Etcd and sysinv are restarted
• Once sysinv is restarted, it resets the state of the app from applying to apply-failed.

Timeline added in the next comment.