OpenContrail

A few vFirefly instances hang after boot and are unreachable

Bug #1507882 reported by Vijay Anand on 2015-10-20

This bug affects 1 person

Affects        Status   Importance   Assigned to   Milestone
OpenContrail   New      Undecided    Unassigned

Bug Description

Build: 2.21(102)

Test: 20 parallel instances
Service VM: vFirefly

Problem: 3 of 20 VMs hang after boot; the console and instance IPs are unreachable.

Note:
The problem is not seen consistently, but it has been observed more than twice.

Logs:
vijanand@bng-lnx-shell2#pwd
/homes/vijanand/Contrail-2.21-bugs/Console-hang
vijanand@bng-lnx-shell2#cd Console-hang-log/
vijanand@bng-lnx-shell2#ls -l
total 12K
drwxr-x--- 2 vijanand slt 4096 Oct 20 11:33 contrail
-rw-r--r-- 1 vijanand slt 3454 Oct 20 11:33 contrail.log
drwxr-x--- 2 vijanand slt 4096 Oct 20 11:33 nova
vijanand@bng-lnx-shell2#

Hari Prasad Killi (haripk) wrote on 2015-10-20:

QEMU was consistently using 200% CPU.
12857 libvirt+ 20 0 4760448 3.027g 12780 S 200.0 4.8 110:54.00 qemu-system-x86

root@csp-sol-lexus:~# ps aux | grep 12857
libvirt+ 12857 198 4.8 4760448 3174032 ? Sl 14:15 114:10 /usr/bin/qemu-system-x86_64 -name instance-000002ba -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7742acc2-a1a4-4e61-9cb7-78530abc023f -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2014.1.3,serial=35383339-3134-5347-4832-33303950504b,uuid=7742acc2-a1a4-4e61-9cb7-78530abc023f -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-000002ba.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/7742acc2-a1a4-4e61-9cb7-78530abc023f/disk,if=none,id=drive-ide0-0-0,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,ifname=tap55bb8902-bb,script=,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:b3:1a:17:80:53,bus=pci.0,addr=0x3 -netdev tap,ifname=tapa9159f8e-39,script=,id=hostnet1 -device e1000,netdev=hostnet1,id=net1,mac=02:91:5f:f8:53:42,bus=pci.0,addr=0x4 -netdev tap,ifname=tap38aafb06-20,script=,id=hostnet2 -device e1000,netdev=hostnet2,id=net2,mac=02:c3:ab:90:48:71,bus=pci.0,addr=0x5 -chardev file,id=charserial0,path=/var/lib/nova/instances/7742acc2-a1a4-4e61-9cb7-78530abc023f/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 7.7.7.50:12 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

The UUID 7742acc2-a1a4-4e61-9cb7-78530abc023f belongs to the VM in question.
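
A minimal sketch of how the identification above could be automated, assuming Python 3 is available on the compute node. It pulls the instance name, UUID, memory size, and vCPU count out of a qemu-system-x86_64 command line such as the one in the ps output, and checks whether the reported CPU usage is close to 100% per vCPU. The function names and the 5-point tolerance are hypothetical choices for illustration; they are not part of OpenContrail, Nova, or QEMU.

```python
import re

# Patterns for a few QEMU options seen in the ps output above.
# (?<!\S) ensures the option starts a new argument, so "-m" does not
# match inside "-machine" or "-mon".
_PATTERNS = {
    "name": r"(?<!\S)-name\s+(\S+)",
    "uuid": r"(?<!\S)-uuid\s+([0-9a-fA-F-]{36})",
    "memory_mb": r"(?<!\S)-m\s+(\d+)",
    "vcpus": r"(?<!\S)-smp\s+(\d+)",
}


def parse_qemu_cmdline(cmdline):
    """Return name, uuid, memory_mb and vcpus found in a QEMU command line.

    Fields that are not present are returned as None.
    """
    info = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, cmdline)
        info[key] = match.group(1) if match else None
    for key in ("memory_mb", "vcpus"):
        if info[key] is not None:
            info[key] = int(info[key])
    return info


def vcpus_saturated(cpu_percent, vcpus, tolerance=5.0):
    """True if cpu_percent is within `tolerance` points of 100% per vCPU."""
    return cpu_percent >= 100.0 * vcpus - tolerance


if __name__ == "__main__":
    # Abbreviated copy of the command line from the ps output above.
    cmdline = (
        "/usr/bin/qemu-system-x86_64 -name instance-000002ba -S "
        "-machine pc-i440fx-trusty,accel=kvm,usb=off -m 4096 "
        "-realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 "
        "-uuid 7742acc2-a1a4-4e61-9cb7-78530abc023f"
    )
    info = parse_qemu_cmdline(cmdline)
    print(info)
    print(vcpus_saturated(198.0, info["vcpus"]))
```

With the values from this report (-smp 2 and 198% CPU in ps), vcpus_saturated returns True, which is consistent with both vCPUs being fully busy. That observation alone does not explain why the guest hangs.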

Vijay Anand (vijanand) wrote on 2015-10-22: Re: [Bug 1507882] Re: Few (vFirefly) instances hungs after boot and are unreachable

Hi Hari,
We are hitting this issue consistently, and it is blocking scale testing. When we spawn 10 instances on a compute node, 1 or 2 of them go into an unresponsive state. Are there any workarounds or suggestions?

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2838 libvirt+  20   0 33.026g 0.018t  12860 S 104.9 29.7   9015:59 qemu-system-x86
 5612 libvirt+  20   0 4760452 3.025g  12700 S  14.9  4.8   4:03.13 qemu-system-x86
 5415 libvirt+  20   0 4760452 3.025g  12700 S  14.3  4.8   4:10.39 qemu-system-x86
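
A rough sketch, assuming Python 3, of how top output like the above could be scanned for QEMU processes whose CPU usage exceeds a threshold, so that candidate VMs can be checked first. It assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); the function names and the 100% threshold are hypothetical. High CPU usage only marks a process for inspection; it is not proof that the guest is hung.

```python
def parse_top_line(line):
    """Parse one process line of default-format top output.

    Returns a dict, or None for header, blank, or malformed lines.
    """
    fields = line.split()
    if len(fields) < 12 or not fields[0].isdigit():
        return None
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "cpu_percent": float(fields[8]),
        "mem_percent": float(fields[9]),
        "time": fields[10],
        "command": " ".join(fields[11:]),
    }


def busy_qemu_processes(top_output, threshold=100.0):
    """Return parsed qemu processes whose %CPU is at or above threshold."""
    busy = []
    for line in top_output.splitlines():
        proc = parse_top_line(line)
        if proc is None:
            continue
        if proc["command"].startswith("qemu") and proc["cpu_percent"] >= threshold:
            busy.append(proc)
    return busy


if __name__ == "__main__":
    sample = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2838 libvirt+  20   0 33.026g 0.018t  12860 S 104.9 29.7   9015:59 qemu-system-x86
 5612 libvirt+  20   0 4760452 3.025g  12700 S  14.9  4.8   4:03.13 qemu-system-x86
 5415 libvirt+  20   0 4760452 3.025g  12700 S  14.3  4.8   4:10.39 qemu-system-x86
"""
    for proc in busy_qemu_processes(sample):
        print(proc["pid"], proc["cpu_percent"])
```

Each PID returned can then be matched to a Nova instance through the -uuid argument in its ps entry.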

root@csp-sol-lexus:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               1600.000
BogoMIPS:              5331.78
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
root@csp-sol-lexus:~#

Regards
Vijay
