file transfer over cifs to 64bit guest corrupts large files

Bug #1882241 reported by timsoft
This bug affects 2 people
Affects  Status   Importance  Assigned to  Milestone
QEMU     Expired  Undecided   Unassigned

Bug Description

QEMU 4.0, compiled from source.
The VM is started with:
qemu-system-x86_64 -cpu qemu64 -smp 4 -m 4G -drive file=/data/images/slack14.2_64bit_test.qcow2,format=qcow2 -cdrom /mnt/smb1/slackware/iso/slackware64-14.2-install-dvd.iso -boot c -net nic,macaddr=02:00:00:11:11:17,model=i82551 -net bridge,br=br0 -enable-kvm -k en-gb -display vnc=:3 -monitor telnet:localhost:7103,server,nowait,nodelay

Copying large files (e.g. 2.4 GB) from a CIFS mount in the guest, or just reading them there, corrupts the data every time. For smaller files (40-60 MB) corruption occurs more than 50% of the time. Tested by comparing md5sum output on the CIFS server or on the host machine against the guest VM.
Corruption is seen only with a 64-bit guest using CIFS with the i82551 emulated network device;
a 32-bit guest using CIFS with the same i82551 emulated network device shows no corruption.

Changing the emulated device to vmxnet3 removes the data corruption (command line below):

qemu-system-x86_64 -cpu qemu64 -smp 4 -m 4G -drive file=/data/images/slack14.2_64bit_test.qcow2,format=qcow2 -cdrom /mnt/smb1/slackware/iso/slackware64-14.2-install-dvd.iso -boot c -net nic,macaddr=02:00:00:11:11:17,model=vmxnet3 -net bridge,br=br0 -enable-kvm -k en-gb -display vnc=:3 -monitor telnet:localhost:7103,server,nowait,nodelay

This corruption is repeatable: I created a new VM, started it with the first command line above, installed 64-bit Linux, mounted the CIFS share, copied a 2.4 GB file to /tmp and then ran md5sum "filecopied".
The md5sum is different every time. Copying the same file to the host, or to a 32-bit guest with the same virtual network device and bridge, gives correct md5sums. The host physical network adapter is:
lspci|grep Ether
1e:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)

It is physically connected via gigabit Ethernet to the CIFS server (through a gigabit switch).
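
The repeat test described above could be scripted so that the failure rate is counted rather than judged by eye. This is only a sketch: the helper name count_bad_copies, its arguments and the run count are hypothetical, and the reference checksum is assumed to have been computed on the CIFS server or on the host.

```shell
# Sketch only: count_bad_copies is a hypothetical helper, not part of the report.
# Copies SRC to a scratch file RUNS times and prints how many of those
# copies have an MD5 sum different from the known-good REF_SUM.
count_bad_copies() {
    src=$1; ref_sum=$2; runs=$3
    bad=0
    i=0
    scratch=$(mktemp) || return 1
    while [ "$i" -lt "$runs" ]; do
        cp "$src" "$scratch"
        sum=$(md5sum "$scratch" | cut -d' ' -f1)
        [ "$sum" = "$ref_sum" ] || bad=$((bad + 1))
        i=$((i + 1))
    done
    rm -f "$scratch"
    echo "$bad"
}
```

Inside the guest it might be called as count_bad_copies /path/on/cifs/mount/bigfile "$ref_sum" 5, where the path is a placeholder. A result of 0 means every copy matched the reference.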

Tags: i82551
Stefan Hajnoczi (stefanha) wrote : Re: [Bug 1882241] [NEW] file transfer over cifs to 64bit guest corrupts large files

On Fri, Jun 05, 2020 at 12:30:39PM -0000, timsoft wrote:
> Public bug reported:
>
> qemu 4.0 compiled from source.
> vm called by
> qemu-system-x86_64 -cpu qemu64 -smp 4 -m 4G -drive file=/data/images/slack14.2_64bit_test.qcow2,format=qcow2 -cdrom /mnt/smb1/slackware/iso/slackware64-14.2-install-dvd.iso -boot c -net nic,macaddr=02:00:00:11:11:17,model=i82551 -net bridge,br=br0 -enable-kvm -k en-gb -display vnc=:3 -monitor telnet:localhost:7103,server,nowait,nodelay
>
> copying large files eg 2.4gb or reading them on a cifs mount in the guest causes corruption every time. For smaller files 40-60mb corruption is more than 50% of the time. tested by md5sum on cifs server, or on host machine vs. on guest vm.
> corruption is seen only with 64bit guest using cifs with i82551 emulated network device
> ie. 32bit guest using cifs with i82551 emulated network device gives no corruption.
>
> changing the emulated device to vmxnet3 removes the data corruption (see
> below)
>
> qemu-system-x86_64 -cpu qemu64 -smp 4 -m 4G -drive file=/data/images/slack14.2_64bit_test.qcow2,format=qcow2 -cdrom /mnt/smb1/slackware/iso/slackware64-14.2-install-dvd.iso -boot c -net nic,macaddr=02:00:00:11:11:17,model=vmxnet3 -net bridge,br=br0 -enable-kvm -k en-gb -display vnc=:3 -monitor telnet:localhost:7103,server,nowait,nodelay
>
> this corruption is repeatable. ie. I created new vm, call using top example, installed 64bit linux, mounted cifs share and copied 2.4gb file to /tmp then run md5sum "filecopied"
> the md5sum is different every time. copy same file to the host, or to a 32bit guest with the same virtual network device and bridge and md5sums are correct. The host physical network adapter is
> lspci|grep Ether
> 1e:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)
>
> physically connected via gigabit ethernet to cifs server (via gigabit
> switch)

Not a solution but some comments:

1. As a sanity-check you could try "nc <guest-ip> 1234 </path/to/file" on
   the host and "nc -l -p 1234 >/tmp/file" in the guest. Netcat simply
   sends/receives data over a TCP connection (it's a much simpler test
   than CIFS). Is the checksum okay?

2. I don't know the CIFS network protocol, but if Wireshark can dissect
   it then you could compare the flows between the vmxnet3 and the
   i82551. This is only feasible if Wireshark can produce an unencrypted
   conversation and the CIFS protocol doesn't have many protocol header
   fields that differ between two otherwise identical sessions.

3. virtio-net is the most widely used and high-performance NIC model.
   Other emulated NIC models are mainly there for very old guests that
   lack virtio guest drivers.
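
The netcat check from item 1 could be run roughly as follows. This is a sketch under assumptions: port 1234 comes from the comment above, the wrapper names guest_receive and host_send are hypothetical, and the "-l -p" listener syntax is that of traditional netcat (OpenBSD netcat expects "nc -l 1234" instead).

```shell
# Sketch only: guest_receive and host_send are hypothetical wrappers.
PORT=1234   # port used in the suggestion above

# Run in the guest: wait for one connection, write the payload to OUT,
# then print the MD5 sum of what was received.
guest_receive() {
    out=$1
    nc -l -p "$PORT" > "$out"
    md5sum "$out"
}

# Run on the host: print the reference MD5 sum, then send FILE to the guest.
host_send() {
    guest_ip=$1; file=$2
    md5sum "$file"
    nc "$guest_ip" "$PORT" < "$file"
}
```

Start guest_receive /tmp/file in the guest first, then host_send <guest-ip> /path/to/file on the host, and compare the two printed checksums.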

timsoft (tim-tree-of-life) wrote :

Thanks for the suggestion. I tried using netcat (nc) to transfer a large file from host to guest, and also from fileserver to guest, with the problematic i82551 emulated network adapter on the host, and the files transferred reliably (correct md5sum 3 out of 3 attempts).
I also tried md5sum of the same file mounted on the guest filesystem as before, and it still corrupts the data.
This seems to imply there is something in the CIFS implementation which reacts adversely with this particular combination of virtual network hardware. The fact that it works with the vmxnet3 emulated card would support that conclusion.
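
Since md5sum only shows that a copy differs, it may also help to see where a corrupted copy differs from a good one. A minimal sketch using cmp follows; the helper name diff_summary and both file arguments are hypothetical, and nothing in this report says what pattern the differences follow.

```shell
# Sketch only: diff_summary is a hypothetical helper.
# Prints the number of differing bytes between GOOD and BAD, followed by
# the 1-based offsets of the first and last differing byte.
# Prints a single 0 when cmp reports no differing bytes.
diff_summary() {
    good=$1; bad=$2
    cmp -l "$good" "$bad" | awk '
        NR == 1 { first = $1 }
                { last = $1 }
        END     { if (NR == 0) print "0"; else print NR, first, last }'
}
```

For example, diff_summary /tmp/file_via_nc /tmp/file_via_cifs would compare a copy received over netcat with one read from the CIFS mount (both paths are placeholders).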

Stefan Hajnoczi (stefanha) wrote : Re: [Bug 1882241] Re: file transfer over cifs to 64bit guest corrupts large files

On Wed, Jun 17, 2020 at 02:55:55PM -0000, timsoft wrote:
> thanks for the suggestion. I tried using netcat (nc) to transfer a large file from host to guest, and also from fileserver to guest with the problematic i82551 emulated network adapter on the host and the files transferred reliably. (correct md5sum 3 out of 3 attempts)
> I also tried md5sum of the same file mounted on the guest fs as before and it still corrupts the data.
> this seems to imply there is something in the cifs implementation which reacts adversely with this particular combination of virtual network hardware, the fact it works with the vmxnet3 emulated card, would support that conclusion.

I'm not sure if someone will look into it because the eepro100
(i82551) NIC device is old and not used much nowadays.

However, if someone does decide to investigate and wants to brainstorm
debugging ideas or needs help, feel free to contact me.

Thomas Huth (th-huth) wrote :

The QEMU project is currently moving its bug tracking to another system.
For this we need to know which bugs are still valid and which could be
closed already. Thus we are setting older bugs to "Incomplete" now.

If the bug has already been fixed in the latest upstream version of QEMU,
then please close this ticket as "Fix released".

If it is not fixed yet and you think that this bug report here is still
valid, then you have two options:

1) If you already have an account on gitlab.com, please open a new ticket
for this problem in our new tracker here:

    https://gitlab.com/qemu-project/qemu/-/issues

and then close this ticket here on Launchpad (or let it expire
automatically after 60 days). Please mention the URL of this bug ticket
on Launchpad in the new ticket on GitLab.

2) If you don't have an account on gitlab.com and don't intend to get
one, but still would like to keep this ticket opened, then please switch
the state back to "New" within the next 60 days (otherwise it will get
closed as "Expired"). We will then eventually migrate the ticket
automatically to the new system (but you won't be the reporter of the
bug in the new system and thus won't get notified on changes anymore).

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired