nova-live-migration job fails during evacuate negative test

Bug #1903979 reported by Balazs Gibizer
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Lee Yarwood

Bug Description

2020-11-12 08:55:30.829939 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:29 : echo 'Timed out waiting for server evacuate-test to go to ERROR status'

2020-11-12 08:55:30.829994 | controller | Timed out waiting for server evacuate-test to go to ERROR status

2020-11-12 08:55:30.831833 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:30 : exit 4
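
For context, evacuate_and_wait_for_error in the hook is essentially a status-polling loop; a rough sketch of that kind of check (not the actual script, and the timeout/interval values here are assumptions) would be:

# Poll the evacuated server until it reaches ERROR, or give up and fail
# the job after a timeout (names and values below are illustrative only).
evacuate_and_wait_for_error() {
    local server="$1"
    local timeout=120
    while [ "$timeout" -gt 0 ]; do
        status=$(openstack server show "$server" -f value -c status)
        [ "$status" = "ERROR" ] && return 0
        sleep 5
        timeout=$((timeout - 5))
    done
    echo "Timed out waiting for server $server to go to ERROR status"
    exit 4
}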

Example run:
https://zuul.opendev.org/t/openstack/build/ca4ccb93df034e4fa32da7bb1a9fc336/log/job-output.txt

Logstash signature:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20server%20evacuate-test%20to%20go%20to%20ERROR%20status%5C%22%20AND%20tags%3Aconsole

Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: evacuate gate-failure testing
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Looks like libvirtd restarted, allowing the negative evacuation test to end up with an ACTIVE instance on the controller. The current task stops the following service and its associated sockets:

roles/run-evacuate-hook/tasks/main.yaml

- name: Stop libvirtd on "{{ inventory_hostname }}"
  become: true
  systemd:
    name: "{{ item }}"
    state: stopped
  with_items:
    - libvirtd.service
    - libvirtd.socket
    - libvirtd-admin.socket
    - libvirtd-ro.socket

I guess I'm missing something here?
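
One way I could double-check that libvirt really stays down on the controller after this task runs (hypothetical commands, not part of the hook):

# confirm the units the task touches are actually inactive
systemctl is-active libvirtd.service libvirtd.socket libvirtd-admin.socket libvirtd-ro.socket
# check whether systemd is configured to bring the daemon back by itself
systemctl show libvirtd.service -p Restart
# look for an unexpected start of the daemon around the time the test runs
journalctl -u libvirtd.service --since "10 minutes ago"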

Lee Yarwood (lyarwood)
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762623

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Andrey Volkov (avolkov) wrote :

What we have here:

1. Booting a VM on SUBNODE_HOSTNAME and disabling the compute service on SUBNODE_HOSTNAME:
openstack --os-compute-api-version 2.74 server create --image ec033028-30f6-497d-936a-4bea98cdafea --flavor 1 --nic net-id=209ff518-ff9f-46bb-ac0a-a20e76b35daa --host ubuntu-focal-rax-iad-0021734807 --wait evacuate-test

...

2. Evacuating that VM to CONTROLLER_HOSTNAME
nova --os-compute-api-version 2.67 evacuate --force evacuate-test ubuntu-focal-rax-iad-0021734805
Here the server actually becomes ACTIVE instead of going to ERROR.
Had the test passed, the evacuation would have failed on the controller with "nova.exception.HypervisorUnavailable: Connection to the hypervisor is broken on host".
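
A quick way to confirm where the server ended up and in what state (a hypothetical check, not taken from the job logs):

# show the final status and host of the evacuated server (admin credentials assumed)
openstack server show evacuate-test -f value -c status -c OS-EXT-SRV-ATTR:host
# the negative test expects ERROR here; in this run the server came back ACTIVE on the controller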

But indeed libvirtd was started again:

2020-11-12 08:44:00.128+0000: 102594: info : libvirt version: 6.0.0, package: 0ubuntu8.5 (Christian Ehrhardt <email address hidden> Thu, 08 Oct 2020 07:36:06 +0200)
2020-11-12 08:44:00.128+0000: 102594: info : hostname: ubuntu-focal-rax-iad-0021734805
2020-11-12 08:44:00.128+0000: 102594: debug : virNetlinkStartup:124 : Running global netlink initialization

Just before 'Run negative evacuate tests':

2020-11-12 08:44:02.021505 | TASK [run-evacuate-hook : Run negative evacuate tests]

Ansible report:

2020-11-12 08:43:59.147000 | LOOP [run-evacuate-hook : Stop libvirtd on "controller"]
2020-11-12 08:44:00.097319 | controller | changed: "libvirtd.service"
2020-11-12 08:44:00.099003 | controller | changed: All items complete
2020-11-12 08:44:00.099120 |
2020-11-12 08:44:00.714245 | controller | changed: "libvirtd.socket"
2020-11-12 08:44:01.298388 | controller | ok: "libvirtd-admin.socket"
2020-11-12 08:44:01.872071 | controller | ok: "libvirtd-ro.socket"

Syslog report:

Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ikljghyeytpdnrvynsdgogdmzisxdquq ; /usr/bin/python3
Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: pam_unix(sudo:session): session closed for user root
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-qdjdltsjeqgfxczzgvknhrqebzarfpca ; /usr/bin/python3
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: pam_unix(sudo:session): session closed for user root
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102682]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-bxmxlpjmzyjadkbullovdfrbmiqvbibg ; /usr/bin/python3
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102682]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:44:01 ubuntu-focal-rax-iad-0021734805 kernel: iptables dropped: IN=eth0 OUT= MAC=bc:76:4e:20:17:0d:e4:c7:22:60:5a:c1:08:00 SRC=45.146.165.152 DST=104.130.239.160 LEN=60 TOS=0x00 PREC=0x00 TTL=48 ID=32541 DF PROTO=TCP SPT=38250 DPT=3389...


Revision history for this message
Andrey Volkov (avolkov) wrote :

libvirtd.service has Restart=on-failure. This could be why libvirtd started again immediately after being stopped.
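
If it is systemd (or socket activation) that keeps bringing libvirtd back, one possible mitigation, sketched here only and not necessarily what the proposed fix does, is to mask the units before stopping them:

# mask the units first so neither Restart= logic nor socket activation can
# start libvirtd again while the negative test runs (sketch only)
for unit in libvirtd.service libvirtd.socket libvirtd-admin.socket libvirtd-ro.socket; do
    sudo systemctl mask --runtime "$unit"
    sudo systemctl stop "$unit"
done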

Lee Yarwood (lyarwood)
Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
