Instance enabled on host but can not ping vm from natbox

Bug #1841626 reported by Wendy Mitchell
This bug affects 1 person
Affects: StarlingX
Status: Triaged
Importance: High
Assigned to: YaoLe

Bug Description

Brief Description
----------------
The instance schedules and appears to be active on the compute host, but the VM cannot be pinged from the natbox.

Severity
--------
Major

Steps to Reproduce
------------------
1. clean install

2. Hypervisors confirmed to be unlocked and enabled on compute-3 and compute-0 at ~14:33:09

fm-event.log
2019-08-25T14:33:09.000 controller-0 fmManager: info { "event_log_id" : "275.001", "reason_text" : "Host compute-3 hypervisor is now unlocked-enabled", "entity_instance_id" : "host=compute-3.hypervisor=c933301a-bf29-4e60-a198-c36e9ba0dc5e", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-25 14:33:09.729915" }
2019-08-25T14:33:09.000 controller-0 fmManager: info { "event_log_id" : "275.001", "reason_text" : "Host compute-0 hypervisor is now unlocked-enabled", "entity_instance_id" : "host=compute-0.hypervisor=9266efa5-0cd9-4c09-a0da-c91b120ad41c", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-25 14:33:09.772201" }

The application apply completed:
platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed
stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applied | completed

2019-08-25T14:34:52.000 controller-0 fmManager: info { "event_log_id" : "750.004", "reason_text" : "Application Apply In Progress", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-wildcat-113-121.k8s_application=stx-openstack", "severity" : "warning", "state" : "clear", "timestamp" : "2019-08-25 14:34:52.197055" }

2019-08-25T14:43:33.000 controller-0 fmManager: info { "event_log_id" : "700.108", "reason_text" : "Instance tenant1-vol_root-1 owned by tenant1 has been created", "entity_instance_id" : "tenant=8861432c-b06c-4b6e-be32-75da72a2e944.instance=c52f094e-2c75-4660-a4a4-14c99b1fcb04", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-25 14:43:33.720109" }

2019-08-25T14:43:46.000 controller-0 fmManager: info { "event_log_id" : "700.101", "reason_text" : "Instance tenant1-vol_root-1 is enabled on host compute-3", "entity_instance_id" : "tenant=8861432c-b06c-4b6e-be32-75da72a2e944.instance=c52f094e-2c75-4660-a4a4-14c99b1fcb04", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-25 14:43:45.861492" }
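The fault-management events above can be re-checked directly on the active controller; a minimal sketch, assuming the standard StarlingX fm client is available in the controller shell:

# List recent fault-management event logs (standard StarlingX 'fm' client;
# query/limit options vary by release, so only the bare command is shown)
fm event-list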

All hosts are enabled and available
[2019-08-25 14:42:23,675]
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | storage-0 | storage | unlocked | enabled | available |
| 3 | storage-1 | storage | unlocked | enabled | available |
| 4 | compute-0 | worker | unlocked | enabled | available |
| 5 | compute-1 | worker | unlocked | enabled | available |
| 6 | compute-2 | worker | unlocked | enabled | available |
| 7 | compute-3 | worker | unlocked | enabled | available |
| 8 | compute-4 | worker | unlocked | enabled | available |
| 9 | controller-1 | controller | unlocked | enabled | available |
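The listing above matches the output of the sysinv host query; a minimal confirmation sketch, assuming the standard StarlingX system client on the active controller:

# Confirm administrative/operational/availability state of every host
system host-list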

compute-0 and compute-3 are remote storage hosts:
| availability_zone | None |
| created_at | 2019-08-25T14:40:33.000000 |
| deleted | False |
| deleted_at | None |
| hosts | [u'compute-0', u'compute-3'] |
| id | 8 |
| name | remote_storage_hosts |
| properties | stx_storage='remote' |
| updated_at | None |
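For reference, an aggregate with these properties can be built with the standard openstack CLI; a hedged sketch, since the exact commands used in the test run are not captured in the report:

# Create the host aggregate and tag it for remote-storage scheduling
openstack aggregate create --property stx_storage='remote' remote_storage_hosts
# Add the two remote-storage workers to it
openstack aggregate add host remote_storage_hosts compute-0
openstack aggregate add host remote_storage_hosts compute-3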

3. Flavor 1a04f50f-e589-4db7-8203-df29a81b0713
created with --property aggregate_instance_extra_specs:stx_storage=remote --property hw:mem_page_size=large

| disk | 2 |
| id | 1a04f50f-e589-4db7-8203-df29a81b0713 |
| name | flv_rootdisk |
| os-flavor-access:is_public | True |
| properties | |
| ram | 1024 |
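A flavor matching these values could be created as follows; a hedged sketch, where the vCPU count is an assumption since it is not shown in the excerpt:

# Create the remote-storage flavor (vcpus=1 is assumed, not from the report)
openstack flavor create flv_rootdisk \
  --ram 1024 --disk 2 --vcpus 1 \
  --property aggregate_instance_extra_specs:stx_storage='remote' \
  --property hw:mem_page_size='large'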

4. volume created
[2019-08-25 14:43:25,878] 301 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'tenant1' --os-password 'Li69nux*' --os-project-name tenant1 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne volume create --image=ec34c5ef-8270-44f5-9dab-83d5704272f6 --size=2 --bootable vol-tenant1-vol_root-1-1'

5. instance booted and lands on compute-3 (host with remote label)
[2019-08-25 14:43:29,724] 301 DEBUG MainThread ssh.send :: Send 'nova --os-username 'tenant1' --os-password 'Li69nux*' --os-project-name tenant1 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne boot --flavor=1a04f50f-e589-4db7-8203-df29a81b0713 --poll --boot-volume=03cb9c01-4657-4b07-84f4-73fd13109a2a --availability-zone=nova:compute-3 --key-name=keypair-tenant1 --nic net-id=4c530bc7-06db-46de-bb63-75b8004676c9 --nic net-id=02060741-2e80-4b6e-a740-c514ecef8722 tenant1-vol_root-1'

'sudo virsh domblklist instance-000001f1'

The server list shows the instance ACTIVE with both networks attached:
| c52f094e-2c75-4660-a4a4-14c99b1fcb04 | tenant1-vol_root-1 | ACTIVE | tenant1-net0=172.16.0.244; tenant2-mgmt-net=192.168.187.13 | | flv_rootdisk |

6. attempt to ping the VM's tenant2-mgmt-net address from the natbox
ping -c 3 192.168.187.13
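When this check fails, a reasonable follow-up is to confirm that the neutron ports behind both fixed IPs are ACTIVE and bound to compute-3; a minimal diagnostic sketch using standard openstack CLI calls:

# From the natbox: basic reachability check against the tenant2-mgmt-net address
ping -c 3 192.168.187.13

# From a shell with OpenStack credentials: list the instance's ports and
# check their status and binding details
openstack port list --server c52f094e-2c75-4660-a4a4-14c99b1fcb04 --long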

Expected Behavior
------------------
Ping from the natbox to the VM should succeed.

Actual Behavior
----------------
The instance reports as ACTIVE and enabled on compute-3, but ping from the natbox to the VM's tenant2-mgmt-net address (192.168.187.13) fails.
Reproducibility
---------------
All nova remote-storage tests on this load fail with ping failures:
FAIL TestDefaultGuest::test_evacuate_vms_with_inst_backing[remote]
FAIL test_live_migrate_vm_positive[remote-0-0-None-2-volume-False]
FAIL test_live_migrate_vm_positive[remote-1-0-dedicated-1-volume-False]
FAIL test_live_migrate_vm_positive[remote-1-512-None-1-image-False]
FAIL test_live_migrate_vm_positive[remote-0-512-dedicated-2-image_with_vol-False]
FAIL test_live_migrate_vm_negative[remote-0-0-volume-True-LiveMigErr.BLOCK_MIG_UNSUPPORTED]
FAIL test_live_migrate_vm_negative[remote-1-0-volume-True-LiveMigErr.BLOCK_MIG_UNSUPPORTED]
FAIL test_live_migrate_vm_negative[remote-0-512-volume-True-LiveMigErr.BLOCK_MIG_UNSUPPORTED]
FAIL test_live_migrate_vm_negative[remote-0-512-image-True-LiveMigErr.BLOCK_MIG_UNSUPPORTED]
FAIL test_live_migrate_vm_negative[remote-0-0-image_with_vol-True-LiveMigErr.BLOCK_MIG_UNSUPPORTED]
FAIL test_cold_migrate_vm[remote-0-0-None-2-volume-confirm]
FAIL test_cold_migrate_vm[remote-1-0-None-1-volume-confirm]
FAIL test_cold_migrate_vm[remote-1-512-None-1-image-confirm]
FAIL test_cold_migrate_vm[remote-0-0-None-2-image_with_vol-confirm]
FAIL test_cold_migrate_vm[remote-0-0-None-2-volume-revert]
FAIL test_cold_migrate_vm[remote-1-512-None-1-volume-revert]
FAIL test_cold_migrate_vm[remote-0-0-None-1-image-revert]
FAIL test_cold_migrate_vm[remote-1-0-None-2-image_with_vol-revert]
FAIL test_migrate_vm_various_guest[ubuntu_14-1-1024-shared-volume]
FAIL test_migrate_vm_various_guest[ubuntu_14-2-1024-dedicated-image]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_0_0-5_1_512-image]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_1_512-5_2_1024-image]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_1_512-4_1_0-image]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_0_0-1_1_512-volume]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_1_512-8_2_1024-volume]
FAIL TestResizeSameHost::test_resize_vm_positive[remote-4_1_512-0_1_0-volume]
FAIL TestResizeSameHost::test_resize_vm_negative[remote-5_0_0-0_0_0-image]
FAIL TestResizeSameHost::test_resize_vm_negative[remote-5_2_512-5_1_512-image]
FAIL TestResizeSameHost::test_resize_vm_negative[remote-1_1_512-1_0_512-volume]
FAIL TestResizeDiffHost::test_resize_different_comp_node[remote]

System Configuration
--------------------
2+5+2
h/w Lab: WCP_113_121

Branch/Pull Time/Commit
-----------------------
2019-08-23_20-59-00

Timestamp/Logs
--------------

see attached console.log

nfv-vim-events.log

log-id = 976
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-0.hypervisor=9266efa5-0cd9-4c09-a0da-c91b120ad41c
reason_text = Host compute-0 hypervisor is now locked-enabled
additional_text =
timestamp = 2019-08-25 14:21:51.968209
====================================================================================
====================================================================================
log-id = 977
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-0.hypervisor=9266efa5-0cd9-4c09-a0da-c91b120ad41c
reason_text = Host compute-0 hypervisor is now locked-disabled
additional_text =
timestamp = 2019-08-25 14:21:52.481091
====================================================================================
====================================================================================
log-id = 978
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-3.hypervisor=c933301a-bf29-4e60-a198-c36e9ba0dc5e
reason_text = Host compute-3 hypervisor is now locked-enabled
additional_text =
timestamp = 2019-08-25 14:22:15.938086
====================================================================================
====================================================================================
log-id = 979
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-3.hypervisor=c933301a-bf29-4e60-a198-c36e9ba0dc5e
reason_text = Host compute-3 hypervisor is now locked-disabled
additional_text =
timestamp = 2019-08-25 14:22:16.508601
====================================================================================
====================================================================================
log-id = 980
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-3.hypervisor=c933301a-bf29-4e60-a198-c36e9ba0dc5e
reason_text = Host compute-3 hypervisor is now unlocked-disabled
additional_text =
timestamp = 2019-08-25 14:30:57.326260
====================================================================================
====================================================================================
log-id = 981
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-0.hypervisor=9266efa5-0cd9-4c09-a0da-c91b120ad41c
reason_text = Host compute-0 hypervisor is now unlocked-disabled
additional_text =
timestamp = 2019-08-25 14:30:57.328746
====================================================================================
====================================================================================
log-id = 982
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-3.hypervisor=c933301a-bf29-4e60-a198-c36e9ba0dc5e
reason_text = Host compute-3 hypervisor is now unlocked-enabled
additional_text =
timestamp = 2019-08-25 14:33:09.728937
====================================================================================
====================================================================================
log-id = 983
event-id = hypervisor-state-change
event-type = state-event
event-context = admin
importance = high
entity = host=compute-0.hypervisor=9266efa5-0cd9-4c09-a0da-c91b120ad41c
reason_text = Host compute-0 hypervisor is now unlocked-enabled
additional_text =
timestamp = 2019-08-25 14:33:09.731669
====================================================================================
====================================================================================
log-id = 984
event-id = instance-created
event-type = action-event
event-context = admin
importance = high
entity = tenant=8861432c-b06c-4b6e-be32-75da72a2e944.instance=c52f094e-2c75-4660-a4a4-14c99b1fcb04
reason_text = Instance tenant1-vol_root-1 owned by tenant1 has been created
additional_text =
timestamp = 2019-08-25 14:43:33.719109
====================================================================================
====================================================================================
log-id = 985
event-id = instance-enabled
event-type = action-event
event-context = admin
importance = high
entity = tenant=8861432c-b06c-4b6e-be32-75da72a2e944.instance=c52f094e-2c75-4660-a4a4-14c99b1fcb04
reason_text = Instance tenant1-vol_root-1 is enabled on host compute-3
additional_text =
timestamp = 2019-08-25 14:43:45.860829
====================================================================================
====================================================================================

instance-000001f1.log (compute-3)
2019-08-25 14:43:41.399+0000: starting up libvirt version: 4.7.0, package: 1.tis.101 (Unknown, 2019-08-19-05:07:36, 4d0394103521), qemu version: 3.0.0qemu-kvm-ev-3.0.0-0.tis.100, kernel: 3.10.0-957.12.2.el7.3.tis.x86_64, hostname: compute-3
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-000001f1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-000001f1/master-key.aes -machine pc-i440fx-3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ss=on,vmx=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,topoext=on,mpx=off,xsavec=off,xgetbv1=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/huge-2048kB/libvirt/qemu/1-instance-000001f1,share=yes,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid c52f094e-2c75-4660-a4a4-14c99b1fcb04 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=19.0.2,serial=c52f094e-2c75-4660-a4a4-14c99b1fcb04,uuid=c52f094e-2c75-4660-a4a4-14c99b1fcb04,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=27,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=a81iLj+PpI1kzyeEkq01/97kNMZUB90sSNE5tHygfBk=,keyid=masterKey0,iv=nMkVDBVEKtNIrgZgVyZEOg==,format=base64 -drive 'file=rbd:cinder-volumes/03cb9c01-4657-4b07-84f4-73fd13109a2a:id=cinder:auth_supported=cephx\;none:mon_host=192.168.222.3\:6789\;192.168.222.4\:6789\;192.168.222.231\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on,serial=03cb9c01-4657-4b07-84f4-73fd13109a2a -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue53950d9-03,server -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:ff:97:95,bus=pci.0,addr=0x3 -chardev socket,id=charnet1,path=/var/run/openvswitch/vhu746a78be-31,server -netdev vhost-user,chardev=charnet1,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:4b:d5:06,bus=pci.0,addr=0x4 -chardev pty,id=charserial0,logfile=/var/lib/nova/instances/c52f094e-2c75-4660-a4a4-14c99b1fcb04/console.log,logappend=off -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2019-08-25 14:43:41.399+0000: Domain id=1 is tainted: high-privileges
2019-08-25T14:43:41.637854Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue53950d9-03,server: info: QEMU waiting for connection on: disconnected:unix:/var/run/openvswitch/vhue53950d9-03,server
2019-08-25T14:43:42.629695Z qemu-kvm: -chardev socket,id=charnet1,path=/var/run/openvswitch/vhu746a78be-31,server: info: QEMU waiting for connection on: disconnected:unix:/var/run/openvswitch/vhu746a78be-31,server
2019-08-25T14:43:42.630216Z qemu-kvm: -chardev pty,id=charserial0,logfile=/var/lib/nova/instances/c52f094e-2c75-4660-a4a4-14c99b1fcb04/console.log,logappend=off: char device redirected to /dev/pts/0 (label charserial0)
2019-08-25T14:48:37.210073Z qemu-kvm: terminating on signal 15 from pid 45400 (/usr/sbin/libvirtd)
2019-08-25 14:48:37.418+0000: shutting down, reason=shutdown
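The qemu command line above attaches both NICs as vhost-user netdevs backed by Open vSwitch sockets (vhue53950d9-03 and vhu746a78be-31), so a useful check on compute-3 is whether those OVS interfaces exist and report no error; a hedged diagnostic sketch using standard ovs-vsctl commands:

# Inspect the OVS topology on compute-3
sudo ovs-vsctl show
# Check the state of one of the instance's vhost-user interfaces
sudo ovs-vsctl --columns=name,ofport,error list Interface vhu746a78be-31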

see nova-compute.log compute-3

{"log":"2019-08-25 14:43:45.762 48388 INFO nova.compute.manager [req-35bccecb-42cd-45d8-82f4-b57a10790497 98bddb3440ec451097276bce4a529811 8861432cb06c4b6ebe3275da72a2e944 - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Took 9.12 seconds to spawn the instance on the hypervisor.\n","stream":"stdout","time":"2019-08-25T14:43:45.762576429Z"}
{"log":"2019-08-25 14:43:45.762 48388 INFO nova.compute.manager [req-35bccecb-42cd-45d8-82f4-b57a10790497 98bddb3440ec451097276bce4a529811 8861432cb06c4b6ebe3275da72a2e944 - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Took 9.12 seconds to spawn the instance on the hypervisor.\n","stream":"stdout","time":"2019-08-25T14:43:45.762842977Z"}
{"log":"2019-08-25 14:43:45.869 48388 INFO nova.compute.manager [req-35bccecb-42cd-45d8-82f4-b57a10790497 98bddb3440ec451097276bce4a529811 8861432cb06c4b6ebe3275da72a2e944 - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Took 12.12 seconds to build instance.\n","stream":"stdout","time":"2019-08-25T14:43:45.869641994Z"}
{"log":"2019-08-25 14:43:45.869 48388 INFO nova.compute.manager [req-35bccecb-42cd-45d8-82f4-b57a10790497 98bddb3440ec451097276bce4a529811 8861432cb06c4b6ebe3275da72a2e944 - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Took 12.12 seconds to build instance.\n","stream":"stdout","time":"2019-08-25T14:43:45.870048087Z"}
{"log":"2019-08-25 14:43:45.927 48388 INFO nova.compute.manager [req-19bf0eaa-664d-4c52-b83c-15c5916e335f - - - - -] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] During sync_power_state the instance has a pending task (spawning). Skip.\n","stream":"stdout","time":"2019-08-25T14:43:45.928122108Z"}
{"log":"2019-08-25 14:43:45.927 48388 INFO nova.compute.manager [req-19bf0eaa-664d-4c52-b83c-15c5916e335f - - - - -] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] During sync_power_state the instance has a pending task (spawning). Skip.\n","stream":"stdout","time":"2019-08-25T14:43:45.928420437Z"}
{"log":"2019-08-25 14:43:45.928 48388 INFO nova.compute.manager [req-19bf0eaa-664d-4c52-b83c-15c5916e335f - - - - -] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] VM Resumed (Lifecycle Event)\n","stream":"stdout","time":"2019-08-25T14:43:45.92878652Z"}
{"log":"2019-08-25 14:43:45.928 48388 INFO nova.compute.manager [req-19bf0eaa-664d-4c52-b83c-15c5916e335f - - - - -] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] VM Resumed (Lifecycle Event)\n","stream":"stdout","time":"2019-08-25T14:43:45.929021138Z"}
{"log":"2019-08-25 14:47:25.866 48388 INFO nova.compute.manager [req-efa4602f-076f-4884-befe-36c778536245 bdafc7345149445a9866032291963ce5 daf1e68dadf94a318a9e1597c2b10b7f - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Get console output\n","stream":"stdout","time":"2019-08-25T14:47:25.866587624Z"}
{"log":"2019-08-25 14:47:25.866 48388 INFO nova.compute.manager [req-efa4602f-076f-4884-befe-36c778536245 bdafc7345149445a9866032291963ce5 daf1e68dadf94a318a9e1597c2b10b7f - default default] [instance: c52f094e-2c75-4660-a4a4-14c99b1fcb04] Get console output\n","stream":"stdout","time":"2019-08-25T14:47:25.866809681Z"}

Revision history for this message
Wendy Mitchell (wmitchellwr) wrote : Re: Instance enabled on host but can not ping
summary: - Instance enabled on host but not getting an ip
+ Instance enabled on host but can not ping
description: updated
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :
summary: - Instance enabled on host but can not ping
+ Instance enabled on host but can not ping vm from natbox
Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

Same issue experienced after clean install of the following load on different h/w (lab: ip 7-12)
2019-08-26_20-59-00
JOB="StarlingX_2.0_build"

Will be tracked with LP-1841660

Launched a single instance after clean install (flavor has hw:cpu_policy=dedicated)
Instance appears active but cannot be reached from the natbox
req-a826992c-5499-455e-a3a1-875b1f6e173c Create Aug. 28, 2019, 17:08 0f63df81938e4f8d923cbd0f6bb7ba02

Revision history for this message
Ghada Khalil (gkhalil) wrote :

This was deemed a duplicate of Bug #1841660 on 2019-08-28; updating the importance/release and assigning to the developer of that bug.

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → YaoLe (yaole)
tags: added: stx.3.0 stx.networking