Unable to boot after drive-mirror

Bug #1719282 reported by Farid Zarazvand
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

Hi,
I am using "drive-mirror" qmp block-job command to transfer VM disk image to other path (different physical disk on host).
Unfortunately after shutting down and starting from new image, VM is unable to boot and qrub enters rescue mode displaying following error:
```
error: file '/grub/i386-pc/normal.mod' not found.
Entering rescue mode...
grub rescue>
```

To investigate the problem, I compared both RAW images using linux "cmp -l" command and found out that they differ in 569028 bytes starting from address 185598977 to 252708864 which are located on /boot partition.

So I mounted /boot partition of mirrored RAW image on host OS and it seems that file-system is broken and grub folder is not recognized. But /boot on original RAW image has no problem.

Mirrored Image:
ls -l /mnt/vm-boot/
ls: cannot access /mnt/vm-boot/grub: Structure needs cleaning
total 38168
-rw-r--r-- 1 root root 157721 Oct 19 2016 config-3.16.0-4-amd64
-rw-r--r-- 1 root root 129281 Sep 20 2015 config-3.2.0-4-amd64
d????????? ? ? ? ? ? grub
-rw-r--r-- 1 root root 15739360 Nov 2 2016 initrd.img-3.16.0-4-amd64
-rw-r--r-- 1 root root 12115412 Oct 10 2015 initrd.img-3.2.0-4-amd64
drwxr-xr-x 2 root root 12288 Oct 7 2013 lost+found
-rw-r--r-- 1 root root 2679264 Oct 19 2016 System.map-3.16.0-4-amd64
-rw-r--r-- 1 root root 2114662 Sep 20 2015 System.map-3.2.0-4-amd64
-rw-r--r-- 1 root root 3126448 Oct 19 2016 vmlinuz-3.16.0-4-amd64
-rw-r--r-- 1 root root 2842592 Sep 20 2015 vmlinuz-3.2.0-4-amd64

Original Image:
ls /mnt/vm-boot/ -l
total 38173
-rw-r--r-- 1 root root 157721 Oct 19 2016 config-3.16.0-4-amd64
-rw-r--r-- 1 root root 129281 Sep 20 2015 config-3.2.0-4-amd64
drwxr-xr-x 5 root root 5120 Nov 2 2016 grub
-rw-r--r-- 1 root root 15739360 Nov 2 2016 initrd.img-3.16.0-4-amd64
-rw-r--r-- 1 root root 12115412 Oct 10 2015 initrd.img-3.2.0-4-amd64
drwxr-xr-x 2 root root 12288 Oct 7 2013 lost+found
-rw-r--r-- 1 root root 2679264 Oct 19 2016 System.map-3.16.0-4-amd64
-rw-r--r-- 1 root root 2114662 Sep 20 2015 System.map-3.2.0-4-amd64
-rw-r--r-- 1 root root 3126448 Oct 19 2016 vmlinuz-3.16.0-4-amd64
-rw-r--r-- 1 root root 2842592 Sep 20 2015 vmlinuz-3.2.0-4-amd64

ls /mnt/vm-boot/grub/ -l
total 2376
-rw-r--r-- 1 root root 48 Oct 7 2013 device.map
drwxr-xr-x 2 root root 1024 Oct 10 2015 fonts
-r--r--r-- 1 root root 9432 Nov 2 2016 grub.cfg
-rw-r--r-- 1 root root 1024 Oct 7 2013 grubenv
drwxr-xr-x 2 root root 6144 Aug 6 2016 i386-pc
drwxr-xr-x 2 root root 1024 Aug 6 2016 locale
-rw-r--r-- 1 root root 2400500 Aug 6 2016 unicode.pf2

qemu Version: 2.7.0-10

Host OS: Debian 8x64
Guest OS: Debian 8x64

QMP Commands log:
socat UNIX-CONNECT:/var/run/qemu-server/48016.qmp STDIO
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": "pve-qemu-kvm_2.7.0-10"}, "capabilities": []}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "drive-mirror",
  "arguments": {
    "device": "drive-ide0",
    "target": "/diskc/48016/vm-48016-disk-2.raw",
    "sync": "full",
    "mode": "absolute-paths",
    "speed": 0
  }
}
{"return": {}}
{"timestamp": {"seconds": 1506331591, "microseconds": 623095}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-ide0", "len": 269445758976, "offset": 269445758976, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1506332641, "microseconds": 245272}, "event": "SHUTDOWN"}
{"timestamp": {"seconds": 1506332641, "microseconds": 377751}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0", "len": 271707340800, "offset": 271707340800, "speed": 0, "type": "mirror"}}

Tags: drive-mirror
Revision history for this message
John Snow (jnsnow) wrote :

Do you have more information on the sequence of commands issued to QEMU? I see the drive-mirror invocation, but then I don't see what causes the shutdown or the BLOCK_JOB_COMPLETED event. Usually this is in response to a user command. I'm wondering if the exact sequence issued is safe.

Do you have a reproducer that I could try on my system to examine the behavior?

Also, 2.7 is a bit old at this point; do you have the ability to try a version currently supported by the upstream project? (2.9 or 2.10?)

Revision history for this message
Farid Zarazvand (farid67z) wrote :

In last try, vm shutdown before completing blockjob.
So i tried again and these are the exact qmp commands which i used:

Sequence of qmp commands:

socat UNIX-CONNECT:/var/run/qemu-server/48016.qmp STDIO
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": "pve-qemu-kvm_2.7.0-10"}, "capabilities": []}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "drive-mirror",
  "arguments": {
    "device": "drive-ide0",
    "target": "/diskb/48016/vm-48016-disk-1.raw",
    "sync": "full",
    "mode": "absolute-paths",
    "speed": 0
  }
}
{"return": {}}
{"timestamp": {"seconds": 1506434603, "microseconds": 633439}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-ide0", "len": 268479496192, "offset": 268479496192, "speed": 0, "type": "mirror"}}
{ "execute": 'block-job-complete', 'arguments': { 'device': 'drive-ide0' } }
{"return": {}}
{"timestamp": {"seconds": 1506494590, "microseconds": 735601}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0", "len": 278522167296, "offset": 278522167296, "speed": 0, "type": "mirror"}}

Then i poweroff VM and start it again from new image, but grub starts in rescue mode.

Revision history for this message
John Snow (jnsnow) wrote :

It *looks* to me at a glance like the sequence should be safe, but I don't have any hunches for what could be going wrong, or why.

Can you please post:

(1) The command-line used to launch QEMU on the source machine, and
(2) The command-line used to launch QEMU on the destination machine from the mirrored image?

Revision history for this message
Farid Zarazvand (farid67z) wrote :

There is no source or destination machine. I used drive-mirror to transfer VM Image to different physical disk on same machine ("mode": "absolute-paths"). After block-job-complete and shutting down vm, I start vm again with same command with different drive path pointing to mirrored image. "-drive 'file=MIRRORED_IMAGE_PATH..".

* Command line used to launch VM:
```
/usr/bin/kvm -id 48016 -chardev 'socket,id=qmp,path=/var/run/qemu-server/48016.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/48016.pid -daemonize -smbios 'type=1,uuid=7a4b5ebc-a230-4e57-8ebc-4979a7b5a378' -name srv34197 -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/48016.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/48016.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6f368eef312d' -drive 'file=/var/lib/vz/images/48016/vm-48016-disk-1.raw,if=none,id=drive-ide0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'file=/var/lib/vz/template/iso/sysresccd-v03.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap48016i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'rtl8139,mac=D6:89:56:3F:38:1F,netdev=net0,bus=pci.0,addr=0x12,id=net0' -netdev 'type=tap,id=net1,ifname=tap48016i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'rtl8139,mac=66:92:13:4A:6B:7E,netdev=net1,bus=pci.0,addr=0x13,id=net1'
```

Revision history for this message
John Snow (jnsnow) wrote :

OK, so we're only talking about migrating a disk and not a whole VM, I misunderstood. However... are you using qemu *2.7*? That's quite old! Before digging into this further I need to insist that you try on a supported release, either 4.0.1, 4.1.1, or 4.2.0.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.