host-unlock compute node rejected: Total allocated memory exceeds the total memory
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Tao Liu |
Bug Description
Brief Description
-----------------
On a regular lowlatency system, host-lock of a compute node succeeds, but the subsequent host-unlock is rejected with "Total allocated memory exceeds the total memory of compute-1 numa node 0".
Severity
--------
Major
Steps to Reproduce
------------------
As described above: lock the compute host, then unlock it.
TC-name: mtc/test_
Expected Behavior
------------------
Actual Behavior
----------------
Reproducibility
---------------
Seen once
System Configuration
-------
Multi-node system
Lab-name: Ip_1-4
Branch/Pull Time/Commit
-------
stx master as of 20190724T013000Z
Last Pass
---------
Lab: IP_1_4
Load: 20190721T233000Z
Timestamp/Logs
--------------
[2019-07-24 08:48:43,872] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-07-24 08:48:45,246] 423 DEBUG MainThread ssh.expect :: Output:
[sysadmin@
[2019-07-24 08:48:45,685] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-07-24 08:48:47,402] 423 DEBUG MainThread ssh.expect :: Output:
+------
| application | version | manifest name | manifest file | status | progress |
+------
| platform-integ-apps | 1.0-7 | platform-
| stx-openstack | 1.0-17-
+------
[sysadmin@
[2019-07-24 08:48:50,973] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-07-24 08:49:33,429] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-07-24 08:49:35,049] 423 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | None |
| bm_type | None |
| bm_username | None |
| boot_device | /dev/disk/
| capabilities | {} |
| config_applied | 3b0daed7-
| config_status | None |
| config_target | 3b0daed7-
| console | ttyS0,115200n8 |
| created_at | 2019-07-
| hostname | compute-1 |
| id | 2 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.149 |
| mgmt_mac | 00:1e:67:4e:02:44 |
| operational | disabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | 19.01 |
| subfunctions | worker,lowlatency |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-07-
| uptime | 5551 |
| uuid | 4e91f428-
| vim_progress_status | services-disabled |
+------
[sysadmin@
[2019-07-24 08:49:37,142] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-07-24 08:49:39,203] 423 DEBUG MainThread ssh.expect :: Output:
Rejected: Total allocated memory exceeds the total memory of compute-1 numa node 0
[sysadmin@
Test Activity
-------------
Sanity
Changed in starlingx:
status: New → Incomplete
Changed in starlingx:
importance: Undecided → Low
status: Incomplete → Triaged
importance: Low → Medium
tags: added: stx.retestneeded
Changed in starlingx:
status: Triaged → In Progress
tags: added: stx.regression
tags: added: in-r-stx20
I took a look at IP 1-4 and found that the total memory of compute-1 numa node 0 decreased from 15478 MiB to 14584 MiB between the two unlock actions. The second unlock was rejected with "Total allocated memory exceeds the total memory". The agent side does not log the node free memory it retrieves from Linux, so it is hard to tell why the available memory was reduced.
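The kind of agent-side logging that would help here can be sketched as follows. This is a minimal sketch, not the sysinv agent's actual code: it assumes the per-node values are read from the standard Linux sysfs file /sys/devices/system/node/node<N>/meminfo, and the helper names (parse_node_meminfo, read_node_meminfo, format_node_log) and the sample values are hypothetical.

```python
import re

# Matches lines such as "Node 0 MemFree:   1500000 kB" as found in
# /sys/devices/system/node/node<N>/meminfo (hugepage counters carry no unit).
_LINE = re.compile(r"^Node\s+\d+\s+(\S+):\s+(\d+)(?:\s+kB)?$")


def parse_node_meminfo(text):
    """Return {field: int} for the meminfo text of one NUMA node."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_node_meminfo(node):
    """Read and parse the sysfs meminfo file for the given NUMA node."""
    path = "/sys/devices/system/node/node%d/meminfo" % node
    with open(path) as f:
        return parse_node_meminfo(f.read())


def format_node_log(node, fields):
    """Build a log message with MemTotal and MemFree converted from kB to MiB."""
    return "node %d: MemTotal=%d MiB, MemFree=%d MiB" % (
        node, fields["MemTotal"] // 1024, fields["MemFree"] // 1024)


# Made-up sample values for illustration only; not taken from the lab.
SAMPLE = """\
Node 0 MemTotal:       16000000 kB
Node 0 MemFree:         1500000 kB
Node 0 HugePages_Total:     100
"""

print(format_node_log(0, parse_node_meminfo(SAMPLE)))
```

Logging a line like this each time the agent reports memory would show whether the drop comes from the values Linux reports or from a later calculation.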
# first unlock
2019-07-24 07:12:50.136 104744 INFO sysinv.api.controllers.v1.host [-] compute-1 ihost check_unlock_worker
2019-07-24 07:12:50.559 104744 INFO sysinv.api.controllers.v1.host [-] Memory: Total=15478 MiB, Allocated=8000 MiB, 2M: 7739 pages None pages pending, 1G: 15 pages None pages pending
2019-07-24 07:12:50.593 104744 INFO sysinv.api.controllers.v1.host [-] Memory: Total=15745 MiB, Allocated=2000 MiB, 2M: 7872 pages None pages pending, 1G: 15 pages None pages pending
2019-07-24 07:12:50.615 104744 INFO sysinv.api.controllers.v1.host [-] host(compute-1) node(5): vm_mem_mib=6454, vm_mem_mib_possible (from agent) = 15478
2019-07-24 07:12:50.615 104744 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-1) node(5): {'vm_hugepages_nr_4K': 165376, 'vm_hugepages_nr_2M': 2904, 'vswitch_hugepages_nr': 1}
2019-07-24 07:12:50.740 104744 INFO sysinv.api.controllers.v1.host [-] host(compute-1) node(6): vm_mem_mib=12721, vm_mem_mib_possible (from agent) = 15744
2019-07-24 07:12:50.741 104744 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-1) node(6): {'vm_hugepages_nr_4K': 325888, 'vm_hugepages_nr_2M': 5724, 'vswitch_hugepages_nr': 1}
# second unlock
2019-07-24 08:49:38.825 104743 INFO sysinv.api.controllers.v1.host [-] compute-1 ihost check_unlock_worker
2019-07-24 08:49:39.153 104743 INFO sysinv.api.controllers.v1.host [-] Memory: Total=14584 MiB, Allocated=14832 MiB, 2M: 2780 pages None pages pending, 1G: 5 pages None pages pending
2019-07-24 08:49:39.154 104743 INFO sysinv.api.controllers.v1.host [-] host unlock check didn't pass, so set the ihost_action back to None and re-raise the exception
2019-07-24 08:49:39.166 104743 WARNING wsme.api [-] Client-side error: Rejected: Total allocated memory exceeds the total memory of compute-1 numa node 0
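The two numa node 0 "Memory:" entries in the sysinv log above can be compared with a short sketch. It parses the Total and Allocated values from the message text and applies the condition implied by the rejection message (allocated greater than total). The function names are hypothetical, and the comparison is an assumption based on the error text rather than on the sysinv source.

```python
import re

# Extracts the Total and Allocated values (MiB) from a sysinv "Memory:" message.
MEM_RE = re.compile(r"Memory: Total=(\d+) MiB,\s+Allocated=(\d+) MiB")


def parse_memory_line(message):
    """Return (total_mib, allocated_mib), or None if the message does not match."""
    match = MEM_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_total(total_mib, allocated_mib):
    """Assumed rejection condition: allocation larger than the node total."""
    return allocated_mib > total_mib


# Message text copied from the two unlock attempts for compute-1 numa node 0.
first = parse_memory_line(
    "Memory: Total=15478 MiB, Allocated=8000 MiB, "
    "2M: 7739 pages None pages pending, 1G: 15 pages None pages pending")
second = parse_memory_line(
    "Memory: Total=14584 MiB, Allocated=14832 MiB, "
    "2M: 2780 pages None pages pending, 1G: 5 pages None pages pending")

print(exceeds_total(*first))    # False: first unlock passes the check
print(exceeds_total(*second))   # True: second unlock is rejected
print(first[0] - second[0])     # 894 MiB drop in reported total memory
print(second[1] - second[0])    # 248 MiB by which the allocation exceeds the total
```

The reported total fell by 894 MiB between the two attempts, and the second attempt's allocation exceeds the total by 248 MiB, which matches the rejection.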