OK, I think I've found the problem. The XFree86 binary does its own
object loading, and on sparc it fails to set the PROT_EXEC bit when
mapping executable code. This trips over a new check in the 2.4.28
kernel, which tests the executable bit and raises a segmentation fault.

Full rationale, explanation and proposed patch below.

Richard

I was looking through the changes between 2.4.27 and 2.4.28, and there is
a patch that adds a check that executed code is actually mapped as
executable. The relevant part of it is:

diff -urN linux-2.4.27/arch/sparc64/mm/fault.c linux-2.4.28/arch/sparc64/mm/fault.c
--- linux-2.4.27/arch/sparc64/mm/fault.c        2004-08-07 16:26:04.000000000 -0700
+++ linux-2.4.28/arch/sparc64/mm/fault.c        2004-11-17 03:54:21.156379721 -0800

@@ -404,6 +404,16 @@
         */
 good_area:
        si_code = SEGV_ACCERR;
+
+       /* If we took a ITLB miss on a non-executable page, catch
+        * that here.
+        */
+       if ((fault_code & FAULT_CODE_ITLB) && !(vma->vm_flags & VM_EXEC)) {
+               BUG_ON(address != regs->tpc);
+               BUG_ON(regs->tstate & TSTATE_PRIV);
+               goto bad_area;
+       }
+
        if (fault_code & FAULT_CODE_WRITE) {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;

Since this path ends in a SIGSEGV (note the SEGV_ACCERR at the top of
the hunk), I figured this was something that user space could actually
trigger.

Looking at the strace from the broken 2.4.28 run, we see two mmaps during
the loading of module pcidata. These correspond to the data and text
(code) sections of the binary.

mmap(NULL, 163840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x70272000
lseek(5, 229836, SEEK_SET)              = 229836
read(5, "\0pci_vendor_003d\0pci_vendor_0e11"..., 157024) = 157024
brk(0)                                  = 0x274000
brk(0x296000)                           = 0x296000
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7029a000
lseek(5, 380, SEEK_SET)                 = 380
read(5, "\201\303\340\10\220\20 \1\201\303\340\10\1\0\0\0\235\343"..., 1612) = 1612

Note that neither has PROT_EXEC set in the mmap. The second one is the
text section that really needs it.
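
To see the difference for yourself, here is a minimal sketch (my
assumptions: Linux only, since it reads /proc/self/maps, and the helper
name addr_is_executable is made up for illustration) that reports whether
a given address sits in a mapping with the execute bit set:

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, Linux-specific: look up addr in /proc/self/maps.
 * Returns 1 if the containing mapping has 'x' in its permissions,
 * 0 if it does not, and -1 if the address was not found or the
 * file could not be read.
 */
static int addr_is_executable(const void *addr)
{
    FILE *f;
    char line[512];
    int result = -1;

    assert(addr != NULL);
    f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5];

        /* Each line starts with "start-end perms", e.g. "7029a000-702bc000 rw-p" */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            result = (perms[2] == 'x');
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling this on a region mapped the way the strace shows
(PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) should normally return
0; after an mprotect() that adds PROT_EXEC it should return 1.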

Now, looking at the XFree86 code in
xc/programs/Xserver/hw/xfree86/loader/elfloader.c:

This gets memory for the data in one of two ways (chosen at compile
time):

xf86loadermalloc - actually a call to the glibc2 malloc
or
mmap.

The mmap path specifies PROT_EXEC, but I've disassembled the XFree86
binary and it seems to use the xf86loadermalloc option:

   77514:       90 00 40 08     add  %g1, %o0, %o0
   77518:       40 05 69 49     call  0x1d1a3c
   7751c:       d0 24 60 48     st  %o0, [ %l1 + 0x48 ]
   77520:       84 10 00 08     mov  %o0, %g2
   77524:       80 a2 20 00     cmp  %o0, 0
   77528:       02 80 00 77     be  0x77704

Apologies to those who don't read SPARC assembler!

The call at 77518 is a call to malloc (from the symbol table):

001d1a3c      DF *UND*  00000234  GLIBC_2.0   malloc

I'm guessing that malloc doesn't set PROT_EXEC (people generally don't
want it and it would create a security risk).

Now in the elfloader.c file there is a bit of conditional code for ia64
and OpenBSD that does an mprotect to add PROT_EXEC to the code.
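
For anyone who hasn't seen this idiom: mprotect works on whole pages, so
the start address has to be rounded down to a page boundary first. The
following is only a sketch of the general technique, not the code from
elfloader.c; the function names round_down_to_page and
make_range_executable are my own. Note also that POSIX only defines
mprotect for memory obtained from mmap; using it on malloc'd memory works
on Linux but is an assumption elsewhere.

```c
#define _DEFAULT_SOURCE   /* for getpagesize() and MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of its page (page_size must be a power of two). */
static uintptr_t round_down_to_page(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: add PROT_EXEC to every page touched by
 * [addr, addr + len). Write permission is kept because a loader
 * still has to apply relocations to the code it has read in.
 * Returns 0 on success, -1 on failure (errno set by mprotect).
 */
static int make_range_executable(void *addr, size_t len)
{
    uintptr_t page_size = (uintptr_t)getpagesize();
    uintptr_t start = round_down_to_page((uintptr_t)addr, page_size);
    uintptr_t end = (uintptr_t)addr + len;

    /* The kernel rounds the length up to a whole number of pages. */
    return mprotect((void *)start, (size_t)(end - start),
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

A system with a strict write-xor-execute policy may refuse the combined
PROT_WRITE|PROT_EXEC request, in which case the call returns -1.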

So it looks quite clear to me that we need to do the same for sparc,
i.e. apply the following patch (untested, I'm afraid):

--- xc/programs/Xserver/hw/xfree86/loader/elfloader.c.orig      2004-12-02 16:56:31.000000000 +0000
+++ xc/programs/Xserver/hw/xfree86/loader/elfloader.c   2004-12-02 16:57:42.000000000 +0000
@@ -893,7 +893,7 @@
            ErrorF( "ELFCreateGOT() Unable to reallocate memory!!!!\n" );
            return FALSE;
        }
-#   if defined(linux) && defined(__ia64__) || defined(__OpenBSD__)
+#   if defined(linux) && (defined(__ia64__) || defined(__sparc__)) || defined(__OpenBSD__)
        {
            unsigned long page_size = getpagesize();
            unsigned long round;

Anyone fancy compiling a new xserver binary?


On Thu, 2004-12-02 at 06:23, Jurzitza, Dieter wrote:
> Dear listmembers,
> I can confirm for my U60 that the XFree86-debug server comes up on 2.4.28. So I seem to be consistent with what Admar said and what Ron has been saying. What makes me wonder, though, is why does the binary loader work with 2.4.27 and does not work with 2.4.28.
> And, moreover, if it is a loader issue it seems more plausible to me that I can observe additional side effects on 2.4.28 not being related to X11 (like very long reaction times on ping / ssh requests, not settling a network connection for quite a while)
> A probably dumb question:
> is that binary loader a simple file? would it be possible to get that loader from another version (like Debian Woody), or is it buried deep down in the kernel?
> Thank you for your inputs,
> take care
> 
> 
> 
> Dieter Jurzitza
