qemu.git master -> qemu segfaults during tcp migration (and other modes when using MALLOC_PERTURB_=1)

Bug #1169375 reported by Lucas Meneghel Rodrigues
Affects  Status        Importance  Assigned to  Milestone
QEMU     Fix Released  Undecided   Unassigned

Bug Description

Relevant qemu.git master commit:

24a6e7f4d91e9ed5f8117ecb083431a23f8609a0

When trying to migrate a VM using the TCP protocol, a segfault happened:

21:45:07 INFO | Running qemu command (reformatted):
/home/lmr/Code/qemu/x86_64-softmmu/qemu-system-x86_64 \
    -S \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=hmp_id_hmp1,path=/tmp/monitor-hmp1-20130415-214507-8fDeX7Fj,server,nowait \
    -mon chardev=hmp_id_hmp1,mode=readline \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130415-214507-8fDeX7Fj,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20130415-214507-8fDeX7Fj,path=/tmp/seabios-20130415-214507-8fDeX7Fj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20130415-214507-8fDeX7Fj,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1 \
    -drive file='/home/lmr/Code/virt-test.git/shared/data/images/jeos-17-64.qcow2',if=none,id=virtio0 \
    -device virtio-blk-pci,drive=virtio0,bootindex=1 \
    -device virtio-net-pci,netdev=idr5RNof,mac='9a:42:43:44:45:46',id='idJVlBu3' \
    -netdev user,id=idr5RNof,hostfwd=tcp::5000-:22 \
    -m 1024 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :1 \
    -vga std \
    -rtc base=utc,clock=host,driftfix=none \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm \
    -incoming tcp:0:5200
21:45:08 INFO | [qemu output] qemu-system-x86_64: -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1: Bus 'virtio-pci-bus.0' is full
21:45:08 DEBUG| VM appears to be alive with PID 2002
21:45:08 DEBUG| (monitor hmp1) Sending command 'info cpus'
21:45:08 DEBUG| (monitor hmp1) Response to 'info cpus'
21:45:08 DEBUG| (monitor hmp1) * CPU #0: pc=0x00000000fffffff0 thread_id=2004
21:45:08 DEBUG| (monitor hmp1) CPU #1: pc=0x00000000fffffff0 thread_id=2005
21:45:09 DEBUG| (monitor hmp1) Sending command 'cont'
21:45:09 INFO | Migrating to tcp:0:5200
21:45:09 DEBUG| (monitor hmp1) Sending command 'migrate -d tcp:0:5200'
21:45:10 WARNI| Could not find (qemu) prompt after command 'screendump /dev/shm/scrdump-MDE7wl.ppm'. Output so far: ''
21:45:10 WARNI| VM 'virt-tests-vm1' produced an invalid screendump
21:45:10 INFO | [qemu output] qemu: warning: error while loading state section id 3
21:45:10 INFO | [qemu output] load of migration failed
21:45:10 INFO | [qemu output] /bin/sh: line 1: 1867 Segmentation fault /home/lmr/Code/qemu/x86_64-softmmu/qemu-system-x86_64 -S -name 'virt-tests-vm1' -nodefaults -chardev socket,id=hmp_id_hmp1,path=/tmp/monitor-hmp1-20130415-214454-pGmRwNvs,server,nowait -mon chardev=hmp_id_hmp1,mode=readline -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20130415-214454-pGmRwNvs,server,nowait -device isa-serial,chardev=serial_id_serial1 -chardev socket,id=seabioslog_id_20130415-214454-pGmRwNvs,path=/tmp/seabios-20130415-214454-pGmRwNvs,server,nowait -device isa-debugcon,chardev=seabioslog_id_20130415-214454-pGmRwNvs,iobase=0x402 -device ich9-usb-uhci1,id=usb1 -drive file='/home/lmr/Code/virt-test.git/shared/data/images/jeos-17-64.qcow2',if=none,id=virtio0 -device virtio-blk-pci,drive=virtio0,bootindex=1 -device virtio-net-pci,netdev=id33wvth,mac='9a:42:43:44:45:46',id='idavPVhj' -netdev user,id=id33wvth,hostfwd=tcp::5001-:22 -m 1024 -smp 2,maxcpus=2,cores=1,threads=1,so:

We missed this problem over the last couple of weeks because of issues in our test grid. It can be reproduced by running the default test set of virt-test, where TCP migration segfaults. By default, virt-test does not set MALLOC_PERTURB_=1; with MALLOC_PERTURB_=1 set, pretty much all migration modes fail.

Lucas Meneghel Rodrigues (lmr) wrote :

Problem fixed with this commit, recently pushed to master:

commit 7dda5dc82a776a39a7996020c188eb2a29187117
Author: Paolo Bonzini <email address hidden>
Date: Tue Apr 9 17:43:43 2013 +0200

    migration: initialize RAM to zero

    Using qemu_memalign only leaves the RAM zero by chance, because libc
    will usually use mmap to satisfy our huge requests. But memory will
    not be zero when using MALLOC_PERTURB_ with a nonzero value. In the
    case of incoming migration, this breaks a recently-introduced
    invariant (commit f1c7279, migration: do not sent zero pages in
    bulk stage, 2013-03-26).

    To fix this, use mmap ourselves to get a well-aligned, always zero
    block for the RAM. Mmap-ed memory is easy to "trim" at the sides.

    This also removes the need to do something special on valgrind
    (see commit c2a8238a, Support running QEMU on Valgrind, 2011-10-31),
    thus effectively reverts that patch.

    Reviewed-by: Juan Quintela <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>
    Reviewed-by: Markus Armbruster <email address hidden>
    Message-id: <email address hidden>
    Signed-off-by: Anthony Liguori <email address hidden>

I'll take the opportunity to also make MALLOC_PERTURB_=1 the default in virt-test. This will help avoid such regressions in the future.
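
For readers who want to see the idea from the commit message in isolation, here is a minimal sketch of getting a well-aligned, always-zero block from an anonymous mmap and trimming the unused sides. It is not the code from the commit: the helper names zeroed_aligned_alloc and zeroed_aligned_free are made up for illustration, and it assumes a POSIX system with MAP_ANONYMOUS and an alignment that is a power of two and at least one page.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages; returns 0 on overflow. */
static size_t round_up_to_page(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size > SIZE_MAX - page) {
        return 0;
    }
    return (size + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper (not QEMU's actual function): return `size` bytes of
 * zero-filled memory aligned to `align`, or NULL on failure. Anonymous
 * mappings are zero-filled by the kernel, so MALLOC_PERTURB_ cannot
 * affect their contents.
 */
static void *zeroed_aligned_alloc(size_t size, size_t align)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(align >= page && (align & (align - 1)) == 0);

    size_t len = round_up_to_page(size);
    if ((len == 0 && size != 0) || len > SIZE_MAX - align) {
        return NULL;
    }

    /* Over-allocate by `align` so an aligned start must exist inside. */
    size_t total = len + align;
    uint8_t *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }

    /* Distance from raw to the next address that is a multiple of align. */
    size_t offset = (align - ((uintptr_t)raw & (align - 1))) & (align - 1);
    uint8_t *aligned = raw + offset;

    /* "Trim" the sides: unmap the unused head and tail pages. */
    if (offset > 0) {
        munmap(raw, offset);
    }
    size_t tail = total - offset - len;
    if (tail > 0) {
        munmap(aligned + len, tail);
    }
    return aligned;
}

/* Release a block obtained from zeroed_aligned_alloc with the same size. */
static void zeroed_aligned_free(void *ptr, size_t size)
{
    if (ptr != NULL) {
        munmap(ptr, round_up_to_page(size));
    }
}
```

Because the pages come straight from the kernel rather than from malloc, the block reads as zero whether or not MALLOC_PERTURB_ is set, which is the property the incoming migration path relies on.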

Changed in qemu:
status: New → Fix Committed
Aurelien Jarno (aurel32)
Changed in qemu:
status: Fix Committed → Fix Released