First time application-apply stx-openstack failed on vbox due to timeout from unbalanced CPU load

Bug #1825423 reported by Al Bailey
This bug affects 3 people
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  High        David Sullivan

Bug Description

This issue started to appear around April 15.

Env: VBOX
Configuration: AIO-SX
Problem: system application-apply fails during the neutron phase (compute-kit charts).

Diagnosis: None of the pods show any sign of crashing. The pods (nova, neutron, openvswitch, nova-api-proxy, libvirt) do eventually come up, but not before the timeout expires: openvswitch times out after 15 minutes because the pods did not all come up fast enough.

Troubleshooting:
Running top and pressing "1" shows the per-CPU load distribution.
The platform processes should be spread across CPUs 0 and 1. Instead, they are all affined to a single CPU.
This drives the load average above 50 and eventually causes the timeout.
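
As a complement to watching top, the affinity of every process can be tallied directly. The following is only a sketch, assuming a Linux host where /proc/<pid>/status exposes a Cpus_allowed_list field; the function names parse_cpu_list and affinity_histogram are made up for this illustration and are not part of StarlingX.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-1,4' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def affinity_histogram(proc_root="/proc"):
    """Count processes per allowed-CPU set, read from <proc_root>/<pid>/status."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as handle:
                for line in handle:
                    if line.startswith("Cpus_allowed_list:"):
                        cpus = parse_cpu_list(line.split(":", 1)[1])
                        key = tuple(sorted(cpus))
                        counts[key] = counts.get(key, 0) + 1
                        break
        except OSError:
            continue  # process exited or is not readable
    return counts
```

Calling affinity_histogram() on the affected host and printing the result would show how many processes are restricted to each CPU set. If nearly all of them map to a single-CPU key such as (0,), that matches the symptom described above.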

Theories:
cat /proc/cmdline shows an empty value for isolcpus.
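
A small sketch of how that check could be scripted, so that "absent" and "present but empty" are told apart. The helper names find_isolcpus and read_isolcpus are hypothetical; only the /proc/cmdline path and the isolcpus parameter name come from this report.

```python
def find_isolcpus(cmdline):
    """Return the isolcpus value found in a kernel command line.

    None -> the parameter is absent
    ""   -> the parameter is present but has no value
    """
    for token in cmdline.split():
        if token == "isolcpus":
            return ""
        if token.startswith("isolcpus="):
            return token.split("=", 1)[1]
    return None


def read_isolcpus(path="/proc/cmdline"):
    """Read the running kernel's command line and extract isolcpus."""
    with open(path) as handle:
        return find_isolcpus(handle.read())
```

On a host showing this problem, read_isolcpus() would be expected to return an empty string rather than None.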

Additional Notes:
 I didn't use the bug reporting template.

Frank Miller (sensfan22) wrote :

Marking as release gating. Issue seen on virtual environments and appears related to https://review.openstack.org/#/c/648511/

Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
importance: Undecided → High
tags: added: stx.containers
tags: added: stx.2.0
Changed in starlingx:
status: New → Triaged
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Jim Gauld (jgauld) → David Sullivan (dsullivanwr)
Al Bailey (albailey1974) wrote :

The CPU pinning changes were unrelated to the problem.
The issue was the isolcpus kernel boot argument.

If vswitch_type = none:
  Std worker/aio – remove isolcpus from the kernel boot args.
  Low latency worker/aio – set isolcpus to all cores minus the platform cores.
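
The rule above could be expressed roughly as follows. This is a sketch of the stated intent only, not the code that was merged; the function name and parameters are assumptions, and the case where a vswitch is configured is deliberately left out because the comment does not describe it.

```python
def isolcpus_without_vswitch(low_latency, all_cores, platform_cores):
    """Return the cores to isolate when vswitch_type is none.

    An empty list means the isolcpus argument should be removed
    from the kernel boot args entirely.
    """
    if not low_latency:
        # Standard worker/AIO: nothing is isolated.
        return []
    # Low latency worker/AIO: isolate everything except the platform cores.
    return sorted(set(all_cores) - set(platform_cores))
```

For example, on a four-core low latency host with platform cores 0 and 1, this returns cores 2 and 3. A standard host gets an empty list.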

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/655240

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/655240
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=e0d453a98b72606ec9a0b90a3acb5bbda546d2ff
Submitter: Zuul
Branch: master

commit e0d453a98b72606ec9a0b90a3acb5bbda546d2ff
Author: David Sullivan <email address hidden>
Date: Tue Apr 23 15:44:00 2019 -0400

    isolcpus incorrectly specified when option is unused

    When 0 vswitch cpu cores are requested we passed 'isolcpu= ' to the grub
    config. This resulted in isolcpu being allocated when none should have
    been. We now check the values passed the grub and ensure they are not
    empty.

    Change-Id: I398e67820f4a12f78a8b457f2618fc1bb7b7cf81
    Closes-Bug: 1825423
    Signed-off-by: David Sullivan <email address hidden>
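
The kind of guard the commit message describes might look like the sketch below. It is an illustration under assumptions, not the actual patch: build_kernel_args is a hypothetical helper, and the real change lives in the StarlingX config repository at the review linked above.

```python
def build_kernel_args(options):
    """Join name/value pairs into a kernel argument string.

    Pairs whose value is None, empty, or whitespace-only are skipped,
    so an unused option never reaches the boot loader configuration.
    """
    args = []
    for name, value in options.items():
        if value is None:
            continue
        text = str(value).strip()
        if not text:
            continue  # e.g. no vswitch cores requested
        args.append("%s=%s" % (name, text))
    return " ".join(args)
```

With this guard, an option whose value is a blank string is dropped rather than emitted as a bare "name=" argument.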

Changed in starlingx:
status: In Progress → Fix Released
tags: added: stx.config