Wrong implementation of SSE4.1 pmovzxbw and similar instructions
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
QEMU |
Expired
|
Undecided
|
Unassigned |
Bug Description
QEMU 1.5.0 (and git version, as far as I can tell from the source code) has incorrect implementation of pmovzxbw and similar SSE4.1 instructions. The instruction zero-extends the first 8 8-bit elements of a vector to 16bit vector and puts them to another vector. The current implementation applies this operation only to the first element and zeros out the rest.
To verify, compile the attached program for SSE4.1 (g++ -msse4.1 cvtint.cc). On real hardware, it produces the following output:
$ ./a.out
1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0
On QEMU, the output is as follows:
$ ./a.out
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
QEMU is invoked as:
qemu-system-x86_64 \
-M pc -cpu Haswell,
-serial stdio -no-reboot \
-kernel vmlinuz -initrd initrd.img \
-netdev user,id=user.0 -device rtl8139,
-hda ubuntu-amd64.ext3 \
--append "rw console=tty root=/dev/sda"
Looking through old bug tickets... is this still an issue with the latest version of QEMU? Or could we close this ticket nowadays?