Comment 8 for bug 1836402

ya.wang (ya.wang) wrote :

@yhu6, we have retested this bug. The steps are:

1. Create an instance on compute-0 (with any flavor extra specs and image properties)
2. Reboot compute-0
3. Wait for the evacuation to succeed
4. Wait for compute-0's services to come back up
5. Live-migrate the instance back to compute-0
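For reference, the commands behind these steps look roughly like the following (a sketch: `test-vm` and the image/flavor/network arguments are placeholders, and `--availability-zone nova:compute-0` is just one way to pin the instance to compute-0):

controller-0:~$ openstack server create --image <image> --flavor <flavor> --network <net> --availability-zone nova:compute-0 test-vm   # step 1
compute-0:~$ sudo reboot                                                                    # step 2
controller-0:~$ watch -n5 "openstack server show test-vm -c OS-EXT-SRV-ATTR:host -c status" # step 3: wait until the instance is ACTIVE on compute-1
controller-0:~$ openstack compute service list                                              # step 4: wait until compute-0 is 'up'
controller-0:~$ openstack server migrate --live compute-0 test-vm                           # step 5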

Initially, our tests showed that live migration works well. Then we noticed that our fourth step differs from the reporter's: we use `openstack compute service list` to confirm that compute-0's service is available.

controller-0:~$ openstack compute service list
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                              | Zone     | Status  | State | Updated At                 |
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
| 29 | nova-compute     | compute-1                         | nova     | enabled | up    | 2019-07-31T13:55:37.000000 |
| 32 | nova-compute     | compute-0                         | nova     | enabled | up    | 2019-07-31T13:55:37.000000 |
| 50 | nova-consoleauth | nova-consoleauth-748bffc767-vslb5 | internal | enabled | up    | 2019-07-31T13:55:38.000000 |
| 52 | nova-conductor   | nova-conductor-5977dbb7c5-jjfwm   | internal | enabled | up    | 2019-07-31T13:55:35.000000 |
| 54 | nova-scheduler   | nova-scheduler-6f78459858-gbtnb   | internal | enabled | up    | 2019-07-31T13:55:36.000000 |
| 61 | nova-consoleauth | nova-consoleauth-748bffc767-s5lsp | internal | enabled | up    | 2019-07-31T13:55:32.000000 |
| 62 | nova-scheduler  | nova-scheduler-6f78459858-46shx   | internal | enabled | up    | 2019-07-31T13:55:39.000000 |
| 63 | nova-conductor   | nova-conductor-5977dbb7c5-l5f6x   | internal | enabled | up    | 2019-07-31T13:55:39.000000 |
+----+------------------+-----------------------------------+----------+---------+-------+----------------------------+
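A quicker check than scanning the whole table is to filter it down (assuming your python-openstackclient version supports the --host/--service options):

controller-0:~$ openstack compute service list --host compute-0 --service nova-compute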

We also check the status of the k8s pods:

controller-0:/home/sysadmin# kubectl get pod -n openstack
NAME READY STATUS RESTARTS AGE
cinder-api-64955798b7-9c9mh 1/1 Running 0 4h47m
cinder-api-64955798b7-gbjlx 1/1 Running 1 10h
cinder-backup-77fb945bb-6qpz5 1/1 Running 0 10h
cinder-backup-77fb945bb-mxg78 1/1 Running 0 4h47m
cinder-scheduler-974d7bc9b-nbkxl 1/1 Running 0 10h
cinder-scheduler-974d7bc9b-w259r 1/1 Running 0 4h47m
cinder-volume-56fd7994cc-vsb2h 1/1 Running 0 10h
cinder-volume-56fd7994cc-xwr9w 1/1 Running 0 4h47m
cinder-volume-usage-audit-1564580700-qfqj7 0/1 Completed 0 12m
cinder-volume-usage-audit-1564581000-lp9vj 0/1 Completed 0 7m33s
cinder-volume-usage-audit-1564581300-lmmk4 0/1 Completed 0 2m30s
glance-api-7574db7ff9-pfg9s 1/1 Running 1 10h
glance-api-7574db7ff9-q8z7m 1/1 Running 1 4h47m
heat-api-7c5c769c99-9sgzw 1/1 Running 0 4h47m
heat-api-7c5c769c99-lhrv7 1/1 Running 1 10h
heat-cfn-7f76797c9f-8nptm 1/1 Running 1 10h
heat-cfn-7f76797c9f-v8xqn 1/1 Running 0 4h47m
heat-engine-6b94b76595-7j28d 1/1 Running 0 10h
heat-engine-6b94b76595-dz8vx 1/1 Running 0 4h47m
heat-engine-cleaner-1564580700-4d8wm 0/1 Completed 0 12m
heat-engine-cleaner-1564581000-br2zd 0/1 Completed 0 7m33s
heat-engine-cleaner-1564581300-tzmmj 0/1 Completed 0 2m30s
horizon-6fcbcdcbfb-8cpw8 1/1 Running 0 4h41m
ingress-bdfbc4ccc-2wlnk 1/1 Running 0 4h47m
ingress-bdfbc4ccc-w9kxv 1/1 Running 0 10h
ingress-error-pages-7b789b5df8-hrtr9 1/1 Running 0 4h47m
ingress-error-pages-7b789b5df8-qbtfb 1/1 Running 0 10h
keystone-api-6fffbb6f7c-hgmlk 1/1 Running 5 10h
keystone-api-6fffbb6f7c-qcd62 1/1 Running 0 4h47m
keystone-fernet-rotate-1564574400-6xfpv 0/1 Completed 0 117m
libvirt-libvirt-default-k9qg2 1/1 Running 2 6d7h
libvirt-libvirt-default-v5pgf 1/1 Running 2 6d7h
mariadb-ingress-6ff964556d-5btgl 1/1 Running 0 10h
mariadb-ingress-6ff964556d-bjmwh 1/1 Running 0 4h47m
mariadb-ingress-error-pages-764cfd869b-bdv94 1/1 Running 0 8h
mariadb-server-0 1/1 Running 0 4h40m
mariadb-server-1 1/1 Running 0 10h
neutron-dhcp-agent-compute-0-75ea0372-vv4xq 1/1 Running 2 6d7h
neutron-dhcp-agent-compute-1-271fedba-74znj 1/1 Running 2 6d7h
neutron-l3-agent-compute-0-75ea0372-5hlh6 1/1 Running 2 6d7h
neutron-l3-agent-compute-1-eae26dba-h8v4q 1/1 Running 2 6d7h
neutron-metadata-agent-compute-0-75ea0372-dlwq2 1/1 Running 2 6d7h
neutron-metadata-agent-compute-1-eae26dba-2wfg2 1/1 Running 2 6d7h
neutron-ovs-agent-compute-0-75ea0372-b9kqh 1/1 Running 2 6d7h
neutron-ovs-agent-compute-1-eae26dba-p58rm 1/1 Running 24 6d7h
neutron-server-df4d6757b-wnq2l 1/1 Running 1 10h
neutron-server-df4d6757b-xjlrt 1/1 Running 0 4h47m
neutron-sriov-agent-compute-0-75ea0372-rwcp7 1/1 Running 2 6d7h
neutron-sriov-agent-compute-1-eae26dba-f76mk 1/1 Running 2 6d7h
nova-api-metadata-7c64c8458f-467r6 1/1 Running 1 10h
nova-api-metadata-7c64c8458f-5q645 1/1 Running 0 4h47m
nova-api-osapi-56798b4b68-62bf9 1/1 Running 0 10h
nova-api-osapi-56798b4b68-8ljzh 1/1 Running 1 4h47m
nova-api-proxy-598684846d-7jtsv 1/1 Running 0 8h
nova-compute-compute-0-75ea0372-kpzgj 2/2 Running 4 6d7h
nova-compute-compute-1-eae26dba-w4f9m 2/2 Running 26 6d7h
nova-conductor-5977dbb7c5-jjfwm 1/1 Running 0 10h
nova-conductor-5977dbb7c5-l5f6x 1/1 Running 0 4h47m
nova-consoleauth-748bffc767-s5lsp 1/1 Running 0 4h47m
nova-consoleauth-748bffc767-vslb5 1/1 Running 0 10h
nova-novncproxy-ff6456d77-jksz7 1/1 Running 0 4h47m
nova-novncproxy-ff6456d77-qthmd 1/1 Running 1 10h
nova-scheduler-6f78459858-46shx 1/1 Running 0 4h47m
nova-scheduler-6f78459858-gbtnb 1/1 Running 0 10h
openvswitch-db-6nctl 1/1 Running 2 6d7h
openvswitch-db-jghgk 1/1 Running 2 6d7h
openvswitch-vswitchd-4vfp2 1/1 Running 2 6d7h
openvswitch-vswitchd-qv9bz 1/1 Running 2 6d7h
osh-openstack-garbd-garbd-8d64b6886-498qt 1/1 Running 0 3h54m
osh-openstack-memcached-memcached-6c94979765-rtsmc 1/1 Running 0 8h
osh-openstack-rabbitmq-rabbitmq-0 1/1 Running 0 4h41m
osh-openstack-rabbitmq-rabbitmq-1 1/1 Running 1 10h
placement-api-5798c855bc-8hq8b 1/1 Running 0 10h
placement-api-5798c855bc-l62f6 1/1 Running 0 4h47m

When both show that compute-0 is ready, we start the live migration and it works well.

@anujeyan is using StarlingX's own command to check that compute-0 is available, so we tried to reproduce the failure with that method, using `watch -n5 "system host-list"` to monitor compute-0's status.
When compute-0 starts rebooting, `system host-list` shows it as offline; after waiting a while, it shows compute-0 as available again. At that moment, however, `openstack compute service list` reports compute-0 with status 'enabled' but state still 'down', and `kubectl get pod -n openstack` shows that compute-0's nova-compute pod is still initializing.
If we try to live-migrate the instance to compute-0 at this point, it fails, and the scheduler log shows "NoValidHost: No valid host was found", the same error as in the log @anujeyan uploaded.
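For completeness, the scheduler error can be pulled straight from the pod, e.g. (the pod name comes from the `kubectl get pod -n openstack` listing above):

controller-0:~$ kubectl logs -n openstack nova-scheduler-6f78459858-gbtnb | grep NoValidHost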

@yong, can you re-test and make sure the nova-compute service is up before starting the live migration?
If it still does not work, please paste the nova-scheduler debug log here.
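A minimal way to wait for that (a sketch, assuming bash on the controller, sourced admin credentials, and that your client supports the --host/--service filters and the `-f value -c State` output options):

controller-0:~$ until [ "$(openstack compute service list --host compute-0 --service nova-compute -f value -c State)" = "up" ]; do sleep 5; done
controller-0:~$ openstack server migrate --live compute-0 <instance>

(`openstack server migrate --live <host>` is the older syntax; newer clients replace it with `--live-migration --host <host>`.)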