compute node keeps offline after unlock due to vswitch error, caused by a hugepage allocation failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Austin Sun |
Bug Description
Brief Description
-----------------
During initial lab setup, after the compute nodes are unlocked, one of the compute nodes stays "offline".
Severity
--------
Major
Steps to Reproduce
------------------
- Commission the lab
- Unlock the worker/compute nodes
Expected Behavior
------------------
- All compute nodes are enabled without issues
Actual Behavior
----------------
- One compute node (compute-3 in the logs below) remains offline
Reproducibility
---------------
Reproducible
System Configuration
-------
Multi-node system
Lab-name: WCP_113-121
Branch/Pull Time/Commit
-------
stx master as of 20190515T220331Z
Last Pass
---------
2019-05-09_16-05-20
Timestamp/Logs
--------------
[2019-05-16 08:01:40,938] 262 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-list --nowrap'
[2019-05-16 08:01:42,478] 387 DEBUG MainThread ssh.expect :: Output:
+----+-
| id | hostname | personality | administrative | operational | availability |
+----+-
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | storage-0 | storage | unlocked | enabled | available |
| 4 | storage-1 | storage | unlocked | enabled | degraded |
| 5 | compute-0 | worker | locked | disabled | online |
| 6 | compute-1 | worker | locked | disabled | online |
| 7 | compute-2 | worker | locked | disabled | online |
| 8 | compute-3 | worker | locked | disabled | online |
| 9 | compute-4 | worker | locked | disabled | online |
+----+-
[2019-05-16 08:02:30,918] 262 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-3'
[2019-05-16 08:25:49,329] 262 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-list --nowrap'
[2019-05-16 08:25:50,848] 387 DEBUG MainThread ssh.expect :: Output:
+----+-
| id | hostname | personality | administrative | operational | availability |
+----+-
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | storage-0 | storage | unlocked | enabled | available |
| 4 | storage-1 | storage | unlocked | enabled | available |
| 5 | compute-0 | worker | unlocked | enabled | available |
| 6 | compute-1 | worker | unlocked | enabled | available |
| 7 | compute-2 | worker | unlocked | enabled | available |
| 8 | compute-3 | worker | unlocked | disabled | offline |
| 9 | compute-4 | worker | unlocked | enabled | available |
+----+-
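The two host-list snapshots above can be compared mechanically. The following is a minimal sketch, assuming the six-column layout shown in the output (id, hostname, personality, administrative, operational, availability); the function name `unavailable_hosts` is hypothetical and not part of any StarlingX tooling.

```python
def unavailable_hosts(table_text):
    """Return hostnames that are unlocked but not 'available'.

    Assumes the six-column `system host-list` layout shown in this report.
    Border lines and the header row are skipped.
    """
    hosts = []
    for line in table_text.splitlines():
        # Drop the outer pipes, then split the row into cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6 or not cells[0].isdigit():
            continue  # border line, header row, or unrelated log line
        _id, hostname, _personality, administrative, _operational, availability = cells
        if administrative == "unlocked" and availability != "available":
            hosts.append(hostname)
    return hosts


if __name__ == "__main__":
    sample = """
| id | hostname | personality | administrative | operational | availability |
| 7 | compute-2 | worker | unlocked | enabled | available |
| 8 | compute-3 | worker | unlocked | disabled | offline |
| 9 | compute-4 | worker | unlocked | enabled | available |
"""
    print(unavailable_hosts(sample))
```

Run against the 08:25:50 snapshot, a check of this kind would single out compute-3; locked hosts are ignored on purpose, since they are expected to be disabled.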
Test Activity
-------------
lab setup
tags: | added: stx.retestneeded |
tags: | added: stx.networking |
Changed in starlingx: | |
assignee: | nobody → Forrest Zhao (forrest.zhao) |
description: | updated |
Changed in starlingx: | |
assignee: | Forrest Zhao (forrest.zhao) → ChenjieXu (midone) |
Changed in starlingx: | |
status: | New → Incomplete |
Changed in starlingx: | |
status: | Triaged → Incomplete |
summary: |
- compute node keeps offline after unlock due to vswitch error
+ compute node keeps offline after unlock due to vswitch error, caused by a hugepage allocation failure
Changed in starlingx: | |
status: | Incomplete → Confirmed |
daemon.log (compute-3)
2019-05-16T08:06:47.366 compute-3 systemd[1]: info Starting Open vSwitch...
2019-05-16T08:06:47.377 compute-3 systemd[1]: info Started Open vSwitch.
2019-05-16T08:06:47.395 compute-3 systemd[1]: info Reloading.
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-hugepage-dir=/mnt/huge-1048576kB
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=6
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=0,0
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . "other_config:dpdk-extra=-n 4"
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=7
2019-05-16T08:06:47.000 compute-3 ovs-vsctl: notice ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
2019-05-16T08:06:47.000 compute-3 ovs-vswitchd: err ovs|00018|dpdk|ERR|EAL: invalid parameters for --socket-mem
2019-05-16T08:06:47.000 compute-3 ovs-vswitchd: err ovs|00019|dpdk|ERR|EAL: Invalid 'command line' arguments.
2019-05-16T08:06:47.000 compute-3 ovs-vswitchd: alert ovs|00020|dpdk|EMER|Unable to initialize DPDK: Invalid argument
2019-05-16T08:06:48.380 compute-3 ovs-ctl[37096]: info 2019-05-16T08:06:48Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.36875.ctl
2019-05-16T08:06:48.000 compute-3 ovs-appctl: warning ovs|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.36875.ctl
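In the daemon.log excerpt above, the EAL error directly follows dpdk-socket-mem=0,0. That is consistent with the bug summary: no hugepage memory was assigned to the vswitch on either NUMA socket. The sketch below flags such a value before OVS is started. It is a sketch under stated assumptions: the helper names are hypothetical, and the idea that the DPDK build in this lab rejects a socket-mem list whose total is zero is inferred from the log, not confirmed elsewhere in this report.

```python
def socket_mem_total(value):
    """Sum the per-NUMA-socket megabyte counts in a string such as '1024,0'."""
    parts = [p.strip() for p in value.split(",")]
    return sum(int(p) for p in parts if p)


def is_suspect_socket_mem(value):
    """True when no memory is requested on any socket.

    Assumption: a zero total is what triggers
    'EAL: invalid parameters for --socket-mem' in the log above.
    """
    return socket_mem_total(value) == 0


if __name__ == "__main__":
    for candidate in ("0,0", "1024,0", "1024,1024"):
        print(candidate, is_suspect_socket_mem(candidate))
```

A check like this only detects the symptom; the underlying hugepage allocation on the node still has to be inspected to find out why nothing was reserved for the vswitch.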