Bug #1198221 “jupp: destroys large files on ^K/sort” : Bugs : jupp

Thorsten Glaser (mirabilos) wrote on 2013-07-05:

#1

Attached: x.xz (183.2 KiB, application/octet-stream)

Thorsten Glaser (mirabilos) wrote on 2013-07-05:

#2

The error only occurs when *not* running under Valgrind, even with the same binary.

WTF?

Thorsten Glaser (mirabilos) wrote on 2013-07-05:

#3

This happens during the write part of ^K/ → probably bsavefd in bw.c (WTF?!?!?!)

Thorsten Glaser (mirabilos) wrote on 2013-07-09:

#4

To reproduce: run 'jupp x' (Enter), then press ^K / and give 'diff -u - x' (Enter) as the command.

This results in a diff…
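
The check from comment #4 can also be scripted, which may help when comparing many machines. What follows is only a minimal sketch under stated assumptions: it does not drive jupp at all, it merely pipes a buffer through an external filter and compares the result, which is what 'diff -u - x' does by hand. The helper name `roundtrip_ok` is hypothetical, and `cat` is assumed to be on the PATH.

```python
import subprocess


def roundtrip_ok(data: bytes, command=("cat",)) -> bool:
    """Pipe data through command and report whether it came back unchanged.

    Hypothetical helper for illustration; not part of jupp.
    """
    result = subprocess.run(
        list(command),
        input=data,               # fed to the child's stdin
        stdout=subprocess.PIPE,   # collect whatever the child prints
        check=True,               # raise if the child exits non-zero
    )
    return result.stdout == data


if __name__ == "__main__":
    # About 1 MB, i.e. much larger than a typical pipe buffer.
    payload = b"line\n" * 200_000
    print(roundtrip_ok(payload))
```

With an identity filter such as `cat`, any mismatch points at the transport rather than at the filter.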

Thorsten Glaser (mirabilos) wrote on 2013-07-09:

#5

3.1.24-1~wtf50+1 on squeeze works; scp’ing it to a sid system makes it fail; scp’ing jupp CVS HEAD to a squeeze system makes it work.

Thorsten Glaser (mirabilos) wrote on 2013-07-09:

#6

It’s almost certainly eglibc’s libc.so.6 that makes the difference between squeeze and wheezy; it’s definitely not the SSE3 memcpy issue.

Thorsten Glaser (mirabilos) wrote on 2013-07-09:

#7

Okay, to recap:

This happens on Debian wheezy/amd64, wheezy/i386, sid/i386 (with the amd64 kernel), but not lenny or squeeze, and not in Valgrind.

But it also doesn’t happen on Jupp’s (my friend, not the editor) box: Linux deimos 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux

It also doesn’t happen on another wheezy/amd64 box I have access to at work:

root@showcase:~ # uname -a; cat /proc/cpuinfo
Linux showcase.lan.tarent.de 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 13
model name      : QEMU Virtual CPU version (cpu64-rhel6)
stepping        : 3
microcode       : 0x1
cpu MHz         : 2999.554
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up nopl pni cx16 hypervisor lahf_lm
bogomips        : 5999.10
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

But it does happen on this one (i386 even):

root@evolvis:~ # uname -a; cat /proc/cpuinfo
Linux evolvis.org 3.2.0-4-686-pae #1 SMP Debian 3.2.39-2 i686 GNU/Linux
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 1.1.2
stepping        : 3
microcode       : 0x1
cpu MHz         : 2666.760
cache size      : 4096 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 hypervisor lahf_lm
bogomips        : 5333.52
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

I’d say it’s a virtualisation issue, were it not for my bare-metal desktop workstation (sid/i386 running an amd64 kernel) which does have Linux-kvm installed and in use, but where the issue occurs outside of any VM:

tglase@tglase:~ $ uname -a; sed '/^$/q' /proc/cpuinfo
Linux tglase.lan.tarent.de 3.9-1-amd64 #1 SMP Debian 3.9.6-1 i686 GNU/Linux
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz
stepping        : 5
microcode       : 0x11
cpu MHz         : 3060.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6128.71
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

I’m totally at a loss as to what to do.

I just wanted to confirm that changing libc versions makes it work by running “schroot -c squeeze” on the aforementioned sid/i386 box, but I seem to be going crazy, because now it always fails (both with the lenny-compiled jupp and with the sid-compiled one, for which I ln -s /lib/libncurses.so.5 libtinfo.so.5 and set LD_LIBRARY_PATH=. to make it run).

Meh. Let’s try one more box.

root@builds:~ # uname -a; sed '/^$/q' /proc/cpuinfo
Linux builds.lan.tarent.de 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 8
model name      : Six-Core AMD Opteron(tm) Processor 2431
stepping        : 0
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips        : 4800.52
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

It’s definitely not the kernel… but there’s something utterly strange going on.

When I run DIST=squeeze cowbuilder --bindmount $PWD --login on “builds” (the last machine), it works. Then I try another jupp binary and it doesn’t work. I try the first one again, and now it doesn’t work either, with absolutely nothing changed.

On another note, the error now also occurs in Valgrind, but Valgrind doesn’t detect anything out of the ordinary.

As a last-ditch test, I compiled jupp CVS HEAD on sid/i386, where it failed. Then I scp’d it and all of its dependencies (ld-linux, libc, libutil) to showcase, where it worked.

I’m somewhat sure this is not a bug in jupp itself, because it doesn’t happen on e.g. BSD either.

tags: added: help

Thorsten Glaser (mirabilos) wrote on 2013-07-09:

#8

It’s not an amd64 thing either… I also scp’d the binary to an i386 wheezy box, where it works.

root@evolvis-ci:~ # uname -a; cat /proc/cpuinfo
Linux evolvis-ci.lan.tarent.de 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : QEMU Virtual CPU version 1.1.2
stepping : 3
microcode : 0x1
cpu MHz : 3103.574
cache size : 4096 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic sep pge cmov mmx fxsr sse sse2 up pni popcnt hypervisor
bogomips : 6207.14
clflush size : 32
cache_alignment : 32
address sizes : 36 bits physical, 32 bits virtual
power management:

Thorsten Glaser (mirabilos) wrote on 2013-08-19:

#9

Confirmed as a data corruption bug, only affecting the ^K/ (pipe) command, masked by a timing issue (only apparent on the faster multi-CPU machines, hidden from debuggers).

Fixed in jupp25.
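
This report does not say what the actual fix was, so the following is purely a general illustration of why writes to a pipe can be timing-sensitive, not a description of jupp’s code. write(2) on a pipe is allowed to accept fewer bytes than requested (for instance on a non-blocking descriptor, or when a signal interrupts a large write), so code that ignores the return value only loses data when the reader falls behind. A minimal sketch, assuming a blocking descriptor; `write_all` and `demo` are hypothetical names.

```python
import os
import threading


def write_all(fd: int, data: bytes) -> int:
    """Write every byte of data to fd, retrying after short writes.

    Hypothetical helper for illustration; assumes fd is blocking.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write returns how many bytes were actually accepted.
        total += os.write(fd, view[total:])
    return total


def demo(size: int = 1 << 20) -> bool:
    """Push size bytes through a pipe and check they arrive intact."""
    read_fd, write_fd = os.pipe()
    received = bytearray()

    def reader() -> None:
        # Drain the pipe until the writer closes its end.
        while True:
            chunk = os.read(read_fd, 65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=reader)
    thread.start()
    payload = bytes(range(256)) * (size // 256)
    write_all(write_fd, payload)
    os.close(write_fd)  # signals end-of-file to the reader
    thread.join()
    os.close(read_fd)
    return bytes(received) == payload


if __name__ == "__main__":
    print(demo())
```

The payload is deliberately larger than a pipe buffer, since small writes tend to succeed in one go and would hide this class of problem.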

Changed in jupp:
status:	New → Fix Released
