@yhu6, we have retested this bug. The steps are:
1. Create an instance on compute-0 (with any flavor extra specs and image properties).
2. Reboot compute-0.
3. Wait for the evacuation to succeed.
4. Wait for compute-0's services to come back up.
5. Live migrate the instance back to compute-0 (a sketch of the commands for steps 1 and 5 follows this list).
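For reference, the commands for steps 1 and 5 look roughly like the following; the flavor, image, network and instance names are placeholders, and the exact live-migration syntax depends on the python-openstackclient version:

# step 1: boot the instance directly on compute-0 (admin-only zone:host syntax);
# <flavor>, <image>, <network> and test-vm are placeholders
openstack server create --flavor <flavor> --image <image> --network <network> \
  --availability-zone nova:compute-0 test-vm
# step 5: once compute-0 is back, live migrate the instance to it
openstack server migrate --live compute-0 test-vm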
At the beginning, our test showed that live migration works well. Then we found that our step 4 differs from the reporter's steps. We use `openstack compute service list` to make sure that compute-0's service is available.
controller-0:~$ openstack compute service list
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                              | Zone     | Status  | State | Updated At                 |
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
| 29 | nova-compute     | compute-1                         | nova     | enabled | up    | 2019-07-31T13:55:37.000000 |
| 32 | nova-compute     | compute-0                         | nova     | enabled | up    | 2019-07-31T13:55:37.000000 |
| 50 | nova-consoleauth | nova-consoleauth-748bffc767-vslb5 | internal | enabled | up    | 2019-07-31T13:55:38.000000 |
| 52 | nova-conductor   | nova-conductor-5977dbb7c5-jjfwm   | internal | enabled | up    | 2019-07-31T13:55:35.000000 |
| 54 | nova-scheduler   | nova-scheduler-6f78459858-gbtnb   | internal | enabled | up    | 2019-07-31T13:55:36.000000 |
| 61 | nova-consoleauth | nova-consoleauth-748bffc767-s5lsp | internal | enabled | up    | 2019-07-31T13:55:32.000000 |
| 62 | nova-scheduler   | nova-scheduler-6f78459858-46shx   | internal | enabled | up    | 2019-07-31T13:55:39.000000 |
| 63 | nova-conductor   | nova-conductor-5977dbb7c5-l5f6x   | internal | enabled | up    | 2019-07-31T13:55:39.000000 |
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
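For step 4, a convenient way to keep an eye on just the compute services is something along these lines (a sketch, not necessarily the exact command we ran):

watch -n5 "openstack compute service list --service nova-compute"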
We also check the status of the Kubernetes pods.
controller-0:/home/sysadmin# kubectl get pod -n openstack
NAME                                                 READY   STATUS      RESTARTS   AGE
cinder-api-64955798b7-9c9mh                          1/1     Running     0          4h47m
cinder-api-64955798b7-gbjlx                          1/1     Running     1          10h
cinder-backup-77fb945bb-6qpz5                        1/1     Running     0          10h
cinder-backup-77fb945bb-mxg78                        1/1     Running     0          4h47m
cinder-scheduler-974d7bc9b-nbkxl                     1/1     Running     0          10h
cinder-scheduler-974d7bc9b-w259r                     1/1     Running     0          4h47m
cinder-volume-56fd7994cc-vsb2h                       1/1     Running     0          10h
cinder-volume-56fd7994cc-xwr9w                       1/1     Running     0          4h47m
cinder-volume-usage-audit-1564580700-qfqj7           0/1     Completed   0          12m
cinder-volume-usage-audit-1564581000-lp9vj           0/1     Completed   0          7m33s
cinder-volume-usage-audit-1564581300-lmmk4           0/1     Completed   0          2m30s
glance-api-7574db7ff9-pfg9s                          1/1     Running     1          10h
glance-api-7574db7ff9-q8z7m                          1/1     Running     1          4h47m
heat-api-7c5c769c99-9sgzw                            1/1     Running     0          4h47m
heat-api-7c5c769c99-lhrv7                            1/1     Running     1          10h
heat-cfn-7f76797c9f-8nptm                            1/1     Running     1          10h
heat-cfn-7f76797c9f-v8xqn                            1/1     Running     0          4h47m
heat-engine-6b94b76595-7j28d                         1/1     Running     0          10h
heat-engine-6b94b76595-dz8vx                         1/1     Running     0          4h47m
heat-engine-cleaner-1564580700-4d8wm                 0/1     Completed   0          12m
heat-engine-cleaner-1564581000-br2zd                 0/1     Completed   0          7m33s
heat-engine-cleaner-1564581300-tzmmj                 0/1     Completed   0          2m30s
horizon-6fcbcdcbfb-8cpw8                             1/1     Running     0          4h41m
ingress-bdfbc4ccc-2wlnk                              1/1     Running     0          4h47m
ingress-bdfbc4ccc-w9kxv                              1/1     Running     0          10h
ingress-error-pages-7b789b5df8-hrtr9                 1/1     Running     0          4h47m
ingress-error-pages-7b789b5df8-qbtfb                 1/1     Running     0          10h
keystone-api-6fffbb6f7c-hgmlk                        1/1     Running     5          10h
keystone-api-6fffbb6f7c-qcd62                        1/1     Running     0          4h47m
keystone-fernet-rotate-1564574400-6xfpv              0/1     Completed   0          117m
libvirt-libvirt-default-k9qg2                        1/1     Running     2          6d7h
libvirt-libvirt-default-v5pgf                        1/1     Running     2          6d7h
mariadb-ingress-6ff964556d-5btgl                     1/1     Running     0          10h
mariadb-ingress-6ff964556d-bjmwh                     1/1     Running     0          4h47m
mariadb-ingress-error-pages-764cfd869b-bdv94         1/1     Running     0          8h
mariadb-server-0                                     1/1     Running     0          4h40m
mariadb-server-1                                     1/1     Running     0          10h
neutron-dhcp-agent-compute-0-75ea0372-vv4xq          1/1     Running     2          6d7h
neutron-dhcp-agent-compute-1-271fedba-74znj          1/1     Running     2          6d7h
neutron-l3-agent-compute-0-75ea0372-5hlh6            1/1     Running     2          6d7h
neutron-l3-agent-compute-1-eae26dba-h8v4q            1/1     Running     2          6d7h
neutron-metadata-agent-compute-0-75ea0372-dlwq2      1/1     Running     2          6d7h
neutron-metadata-agent-compute-1-eae26dba-2wfg2      1/1     Running     2          6d7h
neutron-ovs-agent-compute-0-75ea0372-b9kqh           1/1     Running     2          6d7h
neutron-ovs-agent-compute-1-eae26dba-p58rm           1/1     Running     24         6d7h
neutron-server-df4d6757b-wnq2l                       1/1     Running     1          10h
neutron-server-df4d6757b-xjlrt                       1/1     Running     0          4h47m
neutron-sriov-agent-compute-0-75ea0372-rwcp7         1/1     Running     2          6d7h
neutron-sriov-agent-compute-1-eae26dba-f76mk         1/1     Running     2          6d7h
nova-api-metadata-7c64c8458f-467r6                   1/1     Running     1          10h
nova-api-metadata-7c64c8458f-5q645                   1/1     Running     0          4h47m
nova-api-osapi-56798b4b68-62bf9                      1/1     Running     0          10h
nova-api-osapi-56798b4b68-8ljzh                      1/1     Running     1          4h47m
nova-api-proxy-598684846d-7jtsv                      1/1     Running     0          8h
nova-compute-compute-0-75ea0372-kpzgj                2/2     Running     4          6d7h
nova-compute-compute-1-eae26dba-w4f9m                2/2     Running     26         6d7h
nova-conductor-5977dbb7c5-jjfwm                      1/1     Running     0          10h
nova-conductor-5977dbb7c5-l5f6x                      1/1     Running     0          4h47m
nova-consoleauth-748bffc767-s5lsp                    1/1     Running     0          4h47m
nova-consoleauth-748bffc767-vslb5                    1/1     Running     0          10h
nova-novncproxy-ff6456d77-jksz7                      1/1     Running     0          4h47m
nova-novncproxy-ff6456d77-qthmd                      1/1     Running     1          10h
nova-scheduler-6f78459858-46shx                      1/1     Running     0          4h47m
nova-scheduler-6f78459858-gbtnb                      1/1     Running     0          10h
openvswitch-db-6nctl                                 1/1     Running     2          6d7h
openvswitch-db-jghgk                                 1/1     Running     2          6d7h
openvswitch-vswitchd-4vfp2                           1/1     Running     2          6d7h
openvswitch-vswitchd-qv9bz                           1/1     Running     2          6d7h
osh-openstack-garbd-garbd-8d64b6886-498qt            1/1     Running     0          3h54m
osh-openstack-memcached-memcached-6c94979765-rtsmc   1/1     Running     0          8h
osh-openstack-rabbitmq-rabbitmq-0                    1/1     Running     0          4h41m
osh-openstack-rabbitmq-rabbitmq-1                    1/1     Running     1          10h
placement-api-5798c855bc-8hq8b                       1/1     Running     0          10h
placement-api-5798c855bc-l62f6                       1/1     Running     0          4h47m
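To narrow this down to compute-0 instead of scanning the whole namespace, a filter along these lines works (a sketch; with `-o wide`, pods merely running on the compute-0 node also match, which is convenient here):

kubectl get pod -n openstack -o wide | grep compute-0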
When both show that compute-0 is ready, we start the live migration and it works well.
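As a sanity check, something like the following confirms which host the instance landed on after the migration (the instance name is a placeholder):

openstack server show test-vm -f value -c "OS-EXT-SRV-ATTR:host"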
@anujeyan is using StarlingX's command to ensure that compute-0's service is available, so we tried testing with that method as well, using `watch -n5 "system host-list"` to monitor compute-0's status.
When compute-0 starts rebooting, it shows as offline; after waiting for a while, it shows as available. At that moment, `openstack compute service list` reports compute-0's status as 'enabled' but its state still 'down', and `kubectl get pod -n openstack` shows that compute-0's nova-compute pod is still in Init.
If we try to live-migrate the instance to compute-0 at this point, it fails, and the log shows the scheduler failing with "NoValidHost: No valid host was found", the same as what @anujeyan uploaded.
@yong, can you re-test it and make sure the nova-compute service is available before the live migration?
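For reference, a rough way to wait for the service itself rather than the host state is something like this (a sketch, assuming admin credentials are sourced; not necessarily the exact command we used):

# poll until compute-0's nova-compute reports state "up"
until openstack compute service list --host compute-0 --service nova-compute -f value -c State | grep -qx up; do
  sleep 5
done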
If it still does not work, please paste the nova-scheduler debug log here.
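A sketch of how to pull it from the scheduler pods (the pod names below are taken from the listing above and will differ on your system):

kubectl -n openstack logs nova-scheduler-6f78459858-gbtnb | grep -i novalidhost
kubectl -n openstack logs nova-scheduler-6f78459858-46shx | grep -i novalidhost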