colo: vm crash with segmentation fault

Bug #1754542 reported by lee
This bug affects 1 person
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am using Arch Linux x86_64 with Zhang Chen's QEMU branch
(https://github.com/zhangckid/qemu/tree/qemu-colo-18mar10).
Following the document 'COLO-FT.txt',
I tested the COLO feature on my hosts.

I ran these commands:
Primary:
sudo /usr/local/bin/qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 -qmp stdio -name primary \
-device piix3-usb-uhci \
-device usb-tablet -netdev tap,id=hn0,vhost=off \
-device virtio-net-pci,id=net-pci0,netdev=hn0 \
-drive if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
children.0.file.filename=/var/lib/libvirt/images/1.raw,\
children.0.driver=raw -S

Secondary:
sudo /usr/local/bin/qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 -qmp stdio -name secondary \
-device piix3-usb-uhci \
-device usb-tablet -netdev tap,id=hn0,vhost=off \
-device virtio-net-pci,id=net-pci0,netdev=hn0 \
-drive if=none,id=secondary-disk0,file.filename=/var/lib/libvirt/images/2.raw,driver=raw,node-name=node0 \
-drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
file.driver=qcow2,top-id=active-disk0,\
file.file.filename=/mnt/ramfs/active_disk.img,\
file.backing.driver=qcow2,\
file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
file.backing.backing=secondary-disk0 \
-incoming tcp:0:8888

Secondary:
{'execute':'qmp_capabilities'}
{ 'execute': 'nbd-server-start',
  'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.0.34', 'port': '8889'} } }
}
{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }

Primary:
{'execute':'qmp_capabilities'}
{ 'execute': 'human-monitor-command',
  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=192.168.0.34,file.port=8889,file.export=secondary-disk0,node-name=nbd_client0'}}
{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', 'node': 'nbd_client0' } }
{ 'execute': 'migrate-set-capabilities',
      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.0.34:8888' } }
After that, both VMs crash:
Primary:
{"timestamp": {"seconds": 1520763655, "microseconds": 511415}, "event": "RESUME"}
[1] 329 segmentation fault sudo /usr/local/bin/qemu-system-x86_64 -boot c -enable-kvm -m 2048 -smp 2 -qm

Secondary:
{"timestamp": {"seconds": 1520763655, "microseconds": 510907}, "event": "RESUME"}
[1] 367 segmentation fault sudo /usr/local/bin/qemu-system-x86_64 -boot c -enable-kvm -m 2048 -smp 2 -qm
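
The primary-side QMP sequence in this description can also be generated as strict JSON, which may be handy when driving a QMP socket from a script rather than typing into `-qmp stdio`. The following is only a minimal sketch and not part of the original report: the function name `primary_colo_commands` and its parameters are hypothetical, while the command names and arguments are copied from the commands listed above.

```python
import json


def primary_colo_commands(secondary_host, nbd_port, migrate_port,
                          parent="primary-disk0", export="secondary-disk0"):
    """Return the primary-side QMP commands from the report as dicts."""
    # HMP command line passed through human-monitor-command, as in the report.
    drive_add = (
        "drive_add -n buddy driver=replication,mode=primary,"
        f"file.driver=nbd,file.host={secondary_host},file.port={nbd_port},"
        f"file.export={export},node-name=nbd_client0"
    )
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "human-monitor-command",
         "arguments": {"command-line": drive_add}},
        {"execute": "x-blockdev-change",
         "arguments": {"parent": parent, "node": "nbd_client0"}},
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "x-colo", "state": True}]}},
        {"execute": "migrate",
         "arguments": {"uri": f"tcp:{secondary_host}:{migrate_port}"}},
    ]


if __name__ == "__main__":
    # Addresses and ports taken from the report; adjust for your hosts.
    for cmd in primary_colo_commands("192.168.0.34", 8889, 8888):
        print(json.dumps(cmd))
```

Keeping the sequence in one place makes it easier to see that the NBD export name and the quorum parent id must match the ids given on the two command lines.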

Tags: colo
lee (lisuiheng)
description: updated
tags: added: colo
lee (lisuiheng)
summary: - colo: secondary vm crash when execute x-colo-lost-heartbeat
+ colo: vm crash with segmentation fault
Revision history for this message
Zhang Chen (zhangckid) wrote : Re: [Qemu-devel] [Bug 1754542] [NEW] colo: secondary vm crash when execute x-colo-lost-heartbeat

Hi Suiheng,

Sorry for the slow reply. The document 'COLO-FT.txt' in QEMU is out of date;
I will update it later.
Please follow the steps on this page to run COLO (the commands have changed):
https://wiki.qemu.org/Features/COLO

Thanks
Zhang Chen

On Fri, Mar 9, 2018 at 10:54 AM, 李穗恒 <email address hidden> wrote:

> Public bug reported:
>
> I use Arch Linux x86_64
> both qemu 2.11.1 and Zhang Chen's(https://github.com/
> zhangckid/qemu/commits/colo-with-virtio-net-internal-jul10)
> Following document 'COLO-FT.txt',
> I test colo feature on my hosts
>
> I run this command
> Primary:
> sudo qemu-system-x86_64 -boot c -enable-kvm -m 2048 -smp 2 -qmp stdio
> -name primary \
> -device piix3-usb-uhci \
> -device usb-tablet -netdev tap,id=hn0,vhost=off \
> -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=
> fifo,vote-threshold=1,children.0.file.filename=/var/
> lib/libvirt/images/1.raw,children.0.driver=raw -S
>
> Secondary:
> sudo qemu-system-x86_64 -boot c -enable-kvm -m 2048 -smp 2 -qmp stdio
> -name secondary \
> -device piix3-usb-uhci \
> -device usb-tablet -netdev tap,id=hn0,vhost=off \
> -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> -drive if=none,id=colo-disk0,file.filename=/var/lib/libvirt/
> images/2.raw,driver=raw,node-name=node0 \
> -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> file.driver=qcow2,top-id=active-disk0,\
> file.file.filename=/mnt/ramfs/active_disk.img,\
> file.backing.driver=qcow2,\
> file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
> file.backing.backing=colo-disk0 \
> -incoming tcp:0:8888
>
> Secondary:
> {'execute':'qmp_capabilities'}
> { 'execute': 'nbd-server-start',
> 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.0.33',
> 'port': '8889'} } }
> }
> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0',
> 'writable': true } }
>
> Primary:
> {'execute':'qmp_capabilities'}
> { 'execute': 'human-monitor-command',
> 'arguments': {'command-line': 'drive_add -n buddy
> driver=replication,mode=primary,file.driver=nbd,file.
> host=192.168.0.34,file.port=8889,file.export=colo-disk0,
> node-name=nbd_client0'}}
> { 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0',
> 'node': 'nbd_client0' } }
> { 'execute': 'migrate-set-capabilities',
> 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state':
> true } ] } }
> { 'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.0.34:8888' } }
> { 'execute': 'migrate-set-parameters' , 'arguments':{
> 'x-checkpoint-delay': 2000 } }
>
> All of the above works; the two VMs are syncing.
>
> Primary:
> { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0',
> 'child': 'children.1'}}
> { 'execute': 'human-monitor-command','arguments': {'command-line':
> 'drive_del blk-buddy0'}}
>
> Secondary:
> { 'execute': 'nbd-server-stop' }
> { 'execute': 'x-colo-lost-heartbeat' }
>
> But when I execute x-colo-lost-heartbeat, the primary keeps running and the secondary crashes.
>
> { 'execute': 'nbd-server-stop' }
> {"return": {}}
> qemu-system-x86_64: Disconnect client, due to: Unexpected end-of-file
> before all bytes were read
> { 'execute': 'x-colo-lost-heartbeat' }
> {"return"...

Revision history for this message
lee (lisuiheng) wrote :

Hi Zhang Chen,
I followed https://wiki.qemu.org/Features/COLO, and the VMs no longer crash.
But the SVM reboots constantly after printing RESET events, while the PVM starts up normally.

Secondary:
{"timestamp": {"seconds": 1521421788, "microseconds": 541058}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421808, "microseconds": 493484}, "event": "STOP"}
{"timestamp": {"seconds": 1521421808, "microseconds": 686466}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421808, "microseconds": 696152}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421808, "microseconds": 740653}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421818, "microseconds": 742222}, "event": "STOP"}
{"timestamp": {"seconds": 1521421818, "microseconds": 969883}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421818, "microseconds": 979986}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421819, "microseconds": 22652}, "event": "RESET", "data": {"guest": true}}
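
The reset loop is easier to see when the events are grouped. The following is a minimal sketch, not from the original thread (`parse_events` and `resets_per_resume` are hypothetical helper names): it parses QMP event lines like those above and counts how many RESET events follow each RESUME.

```python
import json

# Event lines copied from the secondary's QMP output above.
LOG = """
{"timestamp": {"seconds": 1521421788, "microseconds": 541058}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421808, "microseconds": 493484}, "event": "STOP"}
{"timestamp": {"seconds": 1521421808, "microseconds": 686466}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421808, "microseconds": 696152}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421808, "microseconds": 740653}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421818, "microseconds": 742222}, "event": "STOP"}
{"timestamp": {"seconds": 1521421818, "microseconds": 969883}, "event": "RESUME"}
{"timestamp": {"seconds": 1521421818, "microseconds": 979986}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1521421819, "microseconds": 22652}, "event": "RESET", "data": {"guest": true}}
"""


def parse_events(lines):
    """Parse QMP event lines into (time_in_seconds, event_name) tuples."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "event" not in msg:
            continue  # skip command replies such as {"return": {}}
        ts = msg["timestamp"]
        events.append((ts["seconds"] + ts["microseconds"] / 1e6, msg["event"]))
    return events


def resets_per_resume(events):
    """For each RESUME, count the RESET events seen before the next RESUME."""
    counts = []
    for _, name in events:
        if name == "RESUME":
            counts.append(0)
        elif name == "RESET" and counts:
            counts[-1] += 1
    return counts


if __name__ == "__main__":
    print(resets_per_resume(parse_events(LOG.splitlines())))
```

For the log above this yields `[0, 2, 2]`: the initial RESUME is clean, and each later STOP/RESUME pair is followed by two guest-initiated resets.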

The commands (I run both VMs on the same machine):

Primary:
sudo /home/lee/Documents/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet \
    -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device rtl8139,id=e0,netdev=hn0 \
    -chardev socket,id=mirror0,host=192.168.0.33,port=9003,server,nowait \
    -chardev socket,id=compare1,host=192.168.0.33,port=9004,server,wait \
    -chardev socket,id=compare0,host=192.168.0.33,port=9001,server,nowait \
    -chardev socket,id=compare0-0,host=192.168.0.33,port=9001 \
    -chardev socket,id=compare_out,host=192.168.0.33,port=9005,server,nowait \
    -chardev socket,id=compare_out0,host=192.168.0.33,port=9005 \
    -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
    -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
    -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
    -object iothread,id=iothread1 \
    -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
    -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/var/lib/libvirt/images/1.raw,children.0.driver=raw -S

Secondary:
sudo /home/lee/Documents/qemu/x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -name secondary -enable-kvm -cpu qemu64,+kvmclock \
    -device piix3-usb-uhci -device usb-tablet \
    -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device rtl8139,netdev=hn0 \
    -chardev socket,id=red0,host=192.168.0.33,port=9003,reconnect=1 \
    -chardev socket,id=red1,host=192.168.0.33,port=9004,reconnect=1 \
    -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
    -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
    -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
    -drive if=none,id=colo-disk0,file.filename=/var/lib/libvirt/images/2.raw,driver=raw,node-name=node0 \
    -drive if=ide,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-i...


Revision history for this message
lee (lisuiheng) wrote :

This is my trace event file.
I have read it many times, but I still can't find the cause of the error.
I only found that ide_reset and ps2_kbd_reset follow colo_vm_state_change ...

Revision history for this message
lee (lisuiheng) wrote :

This is the SVM trace event file.

Revision history for this message
Zhang Chen (zhangckid) wrote : Re: [Bug 1754542] Re: colo: vm crash with segmentation fault

Hi Suiheng,

I made a new guest image and retested, and I got the same bug on the latest
branch.
I found that after the COLO checkpoint begins, the secondary guest keeps
sending reset requests to QEMU, as if someone kept pressing the reset button
in the guest.
This bug occurs in the COLO framework code. That code was written by
Li Zhijian and Zhang Hailiang and is currently maintained by Zhang Hailiang,
so I have added them to this thread.

CC Zhijian and Hailiang:
Any idea or comments about this bug?

If you want to test COLO currently, you can try the old version of COLO:
https://github.com/zhangckid/qemu/tree/qemu-colo-18mar10-legacy

Thanks
Zhang Chen

On Mon, Mar 19, 2018 at 10:08 AM, 李穗恒 <email address hidden> wrote:

> Hi Zhang Chen,
> I follow the https://wiki.qemu.org/Features/COLO, And Vm no crash.
> But SVM rebooting constantly after print RESET, PVM normal startup.
>
> Secondary:
> {"timestamp": {"seconds": 1521421788, "microseconds": 541058}, "event":
> "RESUME"}
> {"timestamp": {"seconds": 1521421808, "microseconds": 493484}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1521421808, "microseconds": 686466}, "event":
> "RESUME"}
> {"timestamp": {"seconds": 1521421808, "microseconds": 696152}, "event":
> "RESET", "data": {"guest": true}}
> {"timestamp": {"seconds": 1521421808, "microseconds": 740653}, "event":
> "RESET", "data": {"guest": true}}
> {"timestamp": {"seconds": 1521421818, "microseconds": 742222}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1521421818, "microseconds": 969883}, "event":
> "RESUME"}
> {"timestamp": {"seconds": 1521421818, "microseconds": 979986}, "event":
> "RESET", "data": {"guest": true}}
> {"timestamp": {"seconds": 1521421819, "microseconds": 22652}, "event":
> "RESET", "data": {"guest": true}}
>
>
> The command(I run two VM in sample machine):
>
> Primary:
> sudo /home/lee/Documents/qemu/x86_64-softmmu/qemu-system-x86_64
> -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -name primary -cpu
> qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet \
> -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> -device rtl8139,id=e0,netdev=hn0 \
> -chardev socket,id=mirror0,host=192.168.0.33,port=9003,server,nowait \
> -chardev socket,id=compare1,host=192.168.0.33,port=9004,server,wait \
> -chardev socket,id=compare0,host=192.168.0.33,port=9001,server,nowait
> \
> -chardev socket,id=compare0-0,host=192.168.0.33,port=9001 \
> -chardev socket,id=compare_out,host=192.168.0.33,port=9005,server,nowait
> \
> -chardev socket,id=compare_out0,host=192.168.0.33,port=9005 \
> -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out
> \
> -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0
> \
> -object iothread,id=iothread1 \
> -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=
> compare1,outdev=compare_out0,iothread=iothread1 \
> -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-
> threshold=1,children.0.file.filename=/var/lib/libvirt/
> images/1.raw,children.0.driver=raw -S
>
> Secondary:
> sudo /home/lee...


Revision history for this message
lee (lisuiheng) wrote :

I took a photo when the SVM printed RESET.
It looks like a kernel panic.
I hope this helps.

Revision history for this message
lee (lisuiheng) wrote :

SVM error photo

Revision history for this message
Zhang Chen (zhangckid) wrote : Re: [Qemu-devel] [Bug 1754542] Re: colo: vm crash with segmentation fault

Thanks zhijian.

On Fri, Mar 23, 2018 at 4:34 PM, Li Zhijian <email address hidden>
wrote:

> Just noticed that's a little old, you may need to rebase it
>
>
> Thanks
>
>
> On 03/23/2018 11:51 AM, Li Zhijian wrote:
>
>>
>>
>> On 03/21/2018 02:04 PM, Zhang Chen wrote:
>>
>>> Hi Suiheng,
>>>
>>> I made a new guest image and retest it, and got the same bug from latest
>>> branch.
>>> I found that after the COLO checkpoint begin, the secondary guest always
>>> send
>>> reset request to Qemu like someone still push the reset button in the
>>> guest.
>>> And this bug occurred in COLO frame related codes. This part of codes
>>> wrote
>>> by Li zhijian and Zhang hailiang and currently maintained by Zhang
>>> hailiang.
>>> So, I add them to this thread.
>>>
>>> CC Zhijian and Hailiang:
>>> Any idea or comments about this bug?
>>>
>>
>> One clue is that the memory of the SVM is not the same as that of the PVM.
>> We could compare the memory after a checkpoint; I had a draft patch to
>> do this before.
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>> If you want to test COLO currently, you can try the old version of COLO:
>>> https://github.com/zhangckid/qemu/tree/qemu-colo-18mar10-legacy
>>>
>>>
>>> Thanks
>>> Zhang Chen
>>>
>>> On Mon, Mar 19, 2018 at 10:08 AM, 李穗恒 <<email address hidden>
>>> <mailto:<email address hidden>>> wrote:
>>>
>>> Hi Zhang Chen,
>>> I follow the https://wiki.qemu.org/Features/COLO <
>>> https://wiki.qemu.org/Features/COLO>, And Vm no crash.
>>>
>>> But SVM rebooting constantly after print RESET, PVM normal startup.
>>>
>>> Secondary:
>>> {"timestamp": {"seconds": 1521421788, "microseconds": 541058},
>>> "event": "RESUME"}
>>> {"timestamp": {"seconds": 1521421808, "microseconds": 493484},
>>> "event": "STOP"}
>>> {"timestamp": {"seconds": 1521421808, "microseconds": 686466},
>>> "event": "RESUME"}
>>> {"timestamp": {"seconds": 1521421808, "microseconds": 696152},
>>> "event": "RESET", "data": {"guest": true}}
>>> {"timestamp": {"seconds": 1521421808, "microseconds": 740653},
>>> "event": "RESET", "data": {"guest": true}}
>>> {"timestamp": {"seconds": 1521421818, "microseconds": 742222},
>>> "event": "STOP"}
>>> {"timestamp": {"seconds": 1521421818, "microseconds": 969883},
>>> "event": "RESUME"}
>>> {"timestamp": {"seconds": 1521421818, "microseconds": 979986},
>>> "event": "RESET", "data": {"guest": true}}
>>> {"timestamp": {"seconds": 1521421819, "microseconds": 22652},
>>> "event": "RESET", "data": {"guest": true}}
>>>
>>>
>>> The command(I run two VM in sample machine):
>>>
>>> Primary:
>>> sudo /home/lee/Documents/qemu/x86_64-softmmu/qemu-system-x86_64
>>> -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -name primary -cpu
>>> qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet \
>>> -netdev tap,id=hn0,vhost=off,script=/e
>>> tc/qemu-ifup,downscript=/etc/qemu-ifdown -device
>>> rtl8139,id=e0,netdev=hn0 \
>>> -chardev socket,id=mirror0,host=192.168.0.33,port=9003,server,nowait
>>> \
>>> -chardev socket,id=compare1,host=192.168.0.33,port=9004,server,wait
>>> \
>>> -chardev socket,id=compare0,host=192.168.0.33,port=9001,server,nowait
>>> \
>>> ...
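
Li Zhijian's suggestion in the quoted reply, comparing SVM and PVM memory after a checkpoint, could be prototyped offline. The sketch below is not his draft patch, which is not shown in this thread. It assumes you have already obtained two raw guest-memory dumps of equal size by some means (the names `pvm.mem` and `svm.mem` and the helper `differing_pages` are hypothetical) and simply reports which 4 KiB pages differ.

```python
PAGE_SIZE = 4096  # assumed guest page size


def differing_pages(path_a, path_b, page_size=PAGE_SIZE):
    """Return the indices of pages whose contents differ between two dumps.

    If one file is shorter, its missing trailing pages count as differing.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            page_a = fa.read(page_size)
            page_b = fb.read(page_size)
            if not page_a and not page_b:
                break  # both files exhausted
            if page_a != page_b:
                diffs.append(index)
            index += 1
    return diffs


if __name__ == "__main__":
    import sys
    # Usage: python compare_mem.py pvm.mem svm.mem
    for page in differing_pages(sys.argv[1], sys.argv[2]):
        print(f"page {page} differs (offset {page * PAGE_SIZE:#x})")
```

A short list of differing pages right after a checkpoint would point at state that is not being synchronized, which is the kind of clue described in the reply.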

Revision history for this message
Zhang Chen (zhangckid) wrote :

Hi Suiheng,

This bug has been fixed in my latest patch.
Please retest it.
https://<email address hidden>/msg538383.html

github:
https://github.com/zhangckid/qemu/tree/qemu-colo-18jun1

Thanks
Zhang Chen

Revision history for this message
WANG Chao (chao.wang) wrote :

Hi, Zhang Chen

It seems virtio-blk isn't working.

I tested COLO FT against https://github.com/zhangckid/qemu/tree/qemu-colo-18jul22 and got the following errors at a very early stage:

On primary:
qemu-system-x86_64: Can't receive COLO message: Input/output error

On secondary:
qemu-system-x86_64: block.c:4893: bdrv_detach_aio_context: Assertion `!bs->walking_aio_notifiers' failed.

Run the test as follows:

1. Setup primary:

# qemu-img create -b centos6base.img -f qcow2 centos6sp.img
# qemu-system-x86_64 -machine dump-guest-core=off -accel kvm -m 128 \
-smp 2 -name primary -serial stdio \
-qmp unix://root/wangchao/pvm.monitor.sock,server,nowait -vnc :10 \
-netdev tap,id=hn0,vhost=off,script=no,downscript=no -drive \
if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/root/wangchao/images/centos6sp.img,children.0.driver=qcow2 \
-S -nodefaults

2. Setup secondary:

# qemu-img create -b centos6base.img -f qcow2 centos6sp.img
# qemu-img create -f qcow2 /dev/shm/active.img 20G
# qemu-img create -f qcow2 /dev/shm/hidden.img 20G
# qemu-system-x86_64 -machine dump-guest-core=off -accel kvm -m 128 \
-smp 2 -name secondary -serial stdio \
-qmp unix://root/wangchao/svm.monitor.sock,server,nowait -vnc :10 \
-netdev tap,id=hn0,vhost=off,script=no,downscript=no \
-drive if=none,id=secondary-disk0,file.filename=/root/wangchao/images/centos6sp.img,driver=qcow2,node-name=node0 \
-drive if=virtio,id=active-disk0,driver=replication,mode=secondary,top-id=active-disk0,file.driver=qcow2,file.file.filename=/dev/shm/active.img,file.backing.driver=qcow2,file.backing.file.filename=/dev/shm/hidden.img,file.backing.backing=secondary-disk0 \
-incoming tcp:0:8888 -nodefaults

3. Issue the following qmp:

On secondary:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'x.x.x.x', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device':'secondary-disk0', 'writable': true } }

On primary:
{'execute': 'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=x.x.x.x,file.port=8889,file.export=secondary-disk0,node-name=nbd_client0'}}
{'execute': 'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', 'node': 'nbd_client0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [{'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:x.x.x.x:8888'}}

4. Then secondary immediately crashed:
qemu-system-x86_64: block.c:4893: bdrv_detach_aio_context: Assertion `!bs->walking_aio_notifiers' failed.

(gdb) bt
#0 0x00007fb50d241277 in raise () from /lib64/libc.so.6
#1 0x00007fb50d242968 in abort () from /lib64/libc.so.6
#2 0x00007fb50d23a096 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007fb50d23a142 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000706ae9 in bdrv_detach_aio_context (bs=0x2e84000) at block.c:4893
#5 0x0000000000706ab8 in bdrv_detach_aio_context (bs=bs@entry=0x315d400) at block.c:4911
#6 0x0000000000706c16 in bdrv_set_aio_context (bs=0x315d400, new_co...


Revision history for this message
Zhang Chen (zhangckid) wrote : Re: [Bug 1754542] Re: colo: vm crash with segmentation fault

Hi Chao,

Yes, virtio-blk isn't supported by the current COLO; you can try "-drive
if=ide xxxxxxxx" instead.

Thanks
Zhang Chen
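
As a concrete reading of this advice, the `-drive` options from the report only need their `if=` property changed. The following is a minimal sketch (the helper name `use_ide_interface` is hypothetical, and it only handles the comma-separated `key=value` form used in this thread):

```python
def use_ide_interface(drive_opts):
    """Replace the if=virtio property with if=ide in a -drive option string."""
    parts = drive_opts.split(",")
    # Only the exact property "if=virtio" is rewritten; other values that
    # merely contain the word "virtio" are left untouched.
    return ",".join("if=ide" if part == "if=virtio" else part for part in parts)


if __name__ == "__main__":
    opts = ("if=virtio,id=primary-disk0,driver=quorum,"
            "read-pattern=fifo,vote-threshold=1")
    print("-drive", use_ide_interface(opts))
```

The same change applies to the secondary's `active-disk0` drive; the `if=none` drive for `secondary-disk0` stays as it is.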

On Fri, Jul 27, 2018 at 12:53 PM, WANG Chao <email address hidden>
wrote:

> Hi, Zhang Chen
>
> It seems virtio blk isn't working.
>
> I test coloft against https://github.com/zhangckid/qemu/tree/qemu-colo-
> 18jul22, got the following error on very early stage:
>
> On primary:
> qemu-system-x86_64: Can't receive COLO message: Input/output error
>
> On secondary:
> qemu-system-x86_64: block.c:4893: bdrv_detach_aio_context: Assertion
> `!bs->walking_aio_notifiers' failed.
>
> Run the test as follows:
>
> 1. Setup primary:
>
> # qemu-img create -b centos6base.img -f qcow2 centos6sp.img
> # qemu-system-x86_64 -machine dump-guest-core=off -accel kvm -m 128 \
> -smp 2 -name primary -serial stdio \
> -qmp unix://root/wangchao/pvm.monitor.sock,server,nowait -vnc :10 \
> -netdev tap,id=hn0,vhost=off,script=no,downscript=no -drive \
> if=virtio,id=primary-disk0,driver=quorum,read-pattern=
> fifo,vote-threshold=1,children.0.file.filename=/root/wangchao/images/
> centos6sp.img,children.0.driver=qcow2 \
> -S -nodefaults
>
> 2. Setup secondary:
>
> # qemu-img create -b centos6base.img -f qcow2 centos6sp.img
> # qemu-img create -f qcow2 /dev/shm/active.img 20G
> # qemu-img create -f qcow2 /dev/shm/hidden.img 20G
> # qemu-system-x86_64 -machine dump-guest-core=off -accel kvm -m 128 \
> -smp 2 -name secondary -serial stdio \
> -qmp unix://root/wangchao/svm.monitor.sock,server,nowait -vnc :10 \
> -netdev tap,id=hn0,vhost=off,script=no,downscript=no \
> -drive if=none,id=secondary-disk0,file.filename=/root/wangchao/
> images/centos6sp.img,driver=qcow2,node-name=node0 \
> -drive if=virtio,id=active-disk0,driver=replication,mode=
> secondary,top-id=active-disk0,file.driver=qcow2,file.file.
> filename=/dev/shm/active.img,file.backing.driver=qcow2,
> file.backing.file.filename=/dev/shm/hidden.img,file.
> backing.backing=secondary-disk0 \
> -incoming tcp:0:8888 -nodefaults
>
> 3. Issue the following qmp:
>
> On secondary:
> {'execute':'qmp_capabilities'}
> {'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet',
> 'data': {'host': 'x.x.x.x', 'port': '8889'} } } }
> {'execute': 'nbd-server-add', 'arguments': {'device':'secondary-disk0',
> 'writable': true } }
>
> On primary:
> {'execute': 'qmp_capabilities'}
> {'execute': 'human-monitor-command', 'arguments': {'command-line':
> 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.
> host=x.x.x.x,file.port=8889,file.export=secondary-disk0,
> node-name=nbd_client0'}}
> {'execute': 'x-blockdev-change', 'arguments':{'parent': 'primary-disk0',
> 'node': 'nbd_client0' } }
> {'execute': 'migrate-set-capabilities', 'arguments': {'capabilities':
> [{'capability': 'x-colo', 'state': true } ] } }
> {'execute': 'migrate', 'arguments': {'uri': 'tcp:x.x.x.x:8888'}}
>
> 4. Then secondary immediately crashed:
> qemu-system-x86_64: block.c:4893: bdrv_detach_aio_context: Assertion
> `!bs->walking_aio_notifiers' failed.
>
> (gdb) bt
> #0 0x00007fb50d241277 in raise () from /lib64/libc.so.6
> #1 0x00007fb50d242968 in abort () from /lib64/libc.so.6
> #2 0x00007fb50d23a096 in ...


Revision history for this message
WANG Chao (chao.wang) wrote :

Yes, ide works.

And by the way, how about other virtio devices or vhost-xxx? Are they supported by COLO?

Do you know the working set of devices? My preliminary test shows ide, e1000, rtl8139 work.

Thanks
WANG Chao

Revision history for this message
Zhang Chen (zhangckid) wrote :

Currently we support virtio-net, but we do not support any vhost-xxx devices.
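
The support status discussed in this thread can be turned into a quick pre-flight check on a command line. This is only a sketch of that idea: the function name `colo_warnings` is hypothetical, the rules encode just what was said here (virtio-blk and vhost are not supported, virtio-net is), and the matching is deliberately naive.

```python
import shlex


def colo_warnings(command_line):
    """Return warnings for options this thread reports as unsupported by COLO."""
    warnings = []
    # Join shell line continuations so shlex sees one logical line.
    flat = command_line.replace("\\\n", " ")
    for arg in shlex.split(flat):
        props = arg.split(",")
        if "if=virtio" in props or props[0].startswith("virtio-blk"):
            warnings.append(f"virtio-blk is not supported ({props[0]}): try if=ide")
        if "vhost=on" in props:
            warnings.append("vhost is enabled: use vhost=off")
        if props[0].startswith("vhost-"):
            warnings.append(f"vhost device {props[0]} is not supported")
    return warnings


if __name__ == "__main__":
    cmd = ("qemu-system-x86_64 -netdev tap,id=hn0,vhost=off "
           "-device virtio-net-pci,id=net-pci0,netdev=hn0 "
           "-drive if=virtio,id=primary-disk0,driver=quorum")
    for warning in colo_warnings(cmd):
        print("warning:", warning)
```

Devices that WANG Chao reported as working in his preliminary test (ide, e1000, rtl8139) produce no warnings here, but the check says nothing about devices that were not discussed.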

On Fri, Jul 27, 2018 at 6:41 PM WANG Chao <email address hidden>
wrote:

> Yes, ide works.
>
> And by the way, how about other virtio devices or vhost-xxx? Are they
> supported by COLO?
>
> Do you know the working set of devices? My preliminary test shows ide,
> e1000, rtl8139 work.
>
> Thanks
> WANG Chao

lee (lisuiheng)
Changed in qemu:
status: New → Fix Released
Revision history for this message
lee (lisuiheng) wrote :

Hi Zhang Chen ,

I tried COLO following https://wiki.qemu.org/Features/COLO.
It works well, but disk performance is slow:
only 10% of host performance.
Is virtio-blk supported by the current COLO?
Or is there any other way to improve disk performance?

Thanks
lee

Revision history for this message
Zhang Chen (zhangckid) wrote :

Hi Lee,

Can you describe the detailed steps of your disk performance test?
I want to look into it when I have time.

Thanks
Zhang Chen

On Wed, Nov 27, 2019 at 10:50 AM lee <email address hidden> wrote:
>
> Hi Zhang Chen ,
>
> I try colo follow https://wiki.qemu.org/Features/COLO.
> It work well. But disk performance slow.
> Only host performance 10%.
> Can virtio blk supported by current colo?
> Or is there any other way to improve disk performance.
>
> Thanks
> Zhang Chen

lee (lisuiheng) wrote :

Hi Zhang Chen,
I used sysbench to compare disk performance across Host, QEMU native VM (virtio-blk), QEMU native VM, and QEMU COLO.
The results are in the attachment below.
The QEMU native VM (virtio-blk) uses -device virtio-blk-pci.
The QEMU COLO setup follows https://wiki.qemu.org/Features/COLO.
Thanks
Lee
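
For anyone trying to reproduce this comparison, here is a minimal sketch of how the runs could be scripted. The thread does not say which sysbench test or options were used, so this is an assumption: it picks the `fileio` test with sysbench 1.0 option syntax, and the size, mode, and duration defaults are hypothetical. The `percent_of_baseline` helper only shows how a figure like "10% of host" would be derived from two measured throughputs.

```python
import shlex

# Hypothetical defaults: the thread does not state the sysbench options used.
def sysbench_fileio_cmd(phase, total_size="2G", mode="rndrw", seconds=60):
    """Return the argv for one sysbench fileio phase (sysbench 1.0 syntax)."""
    if phase not in ("prepare", "run", "cleanup"):
        raise ValueError(f"unknown phase: {phase}")
    argv = [
        "sysbench",
        "fileio",
        f"--file-total-size={total_size}",
        f"--file-test-mode={mode}",
    ]
    if phase == "run":
        # The time limit only matters for the measurement phase.
        argv.append(f"--time={seconds}")
    argv.append(phase)
    return argv


def percent_of_baseline(measured, baseline):
    """Express a measured throughput as a percentage of the host baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * measured / baseline


if __name__ == "__main__":
    # Print the commands to run inside each environment being compared.
    for phase in ("prepare", "run", "cleanup"):
        print(shlex.join(sysbench_fileio_cmd(phase)))
```

Running the same three phases on the host and inside each guest configuration, then passing the reported throughputs to `percent_of_baseline`, gives numbers that can be compared directly.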

On Wed, Nov 27, 2019 at 11:15 AM Zhang Chen <email address hidden> wrote:

> Hi Lee,
>
> Can you describe the detailed test steps you used to measure disk performance?
> I want to look into it when I have time.
>
> Thanks
> Zhang Chen
