Ubuntu

[armel] the ./H5detect test segfaults when built for armv7

Reported by Matthias Klose on 2010-09-10
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
gcc-4.4 (Ubuntu)
High
Yao Qi
hdf5 (Ubuntu)
High
Jani Monoses

Bug Description

Related branches

Matthias Klose (doko) wrote :

see with 4.4.4-14ubuntu1

Changed in hdf5 (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in gcc-4.4 (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Oliver Grawert (ogra) wrote :

david is tasked with fixing hdf5 in the ubuntu-arm team currently, assigning to him

Changed in hdf5 (Ubuntu):
assignee: nobody → David Sugar (dyfet)
milestone: none → ubuntu-10.10
Colin Watson (cjwatson) wrote :

hdf5 (1.8.4-patch1-2ubuntu1) maverick; urgency=low

  * On armel, build with -march=armv5t to avoid illegal instruction
    in H5detect test. LP: #635199.

 -- Matthias Klose <email address hidden> Fri, 10 Sep 2010 18:46:05 +0200

Changed in hdf5 (Ubuntu):
status: Confirmed → Fix Released

I think we only want to do this for building h5detect itself, and not to use armv5t for the package as a whole. In fact, I was doing a small patch to do something similar for h5detect alone.

Matthias Klose (doko) wrote :

thanks! reopening, and adjusting the milestone

Changed in hdf5 (Ubuntu):
milestone: ubuntu-10.10 → maverick-updates
status: Fix Released → Confirmed
Matthias Klose (doko) on 2010-09-17
tags: added: arm-mode
Changed in gcc-4.4 (Ubuntu):
assignee: nobody → Yao Qi (yao-codesourcery)
Yao Qi (yao-codesourcery) wrote :

It is not a toolchain issue here. In H5detect.c:ALIGNMENT, a testing to mis-aligned access is performed, as below,

        *((TYPE*)(_buf+align_g[_ano])) = _val; /*possible SIGBUS or SEGSEGV*/ \
        _val2 = *((TYPE*)(_buf+align_g[_ano])); /*possible SIGBUS or SEGSEGV*/ \

and SIGBUS and SIGSEGV is handled. However, on ARM, at least in thumb2, SIGILL is generated. This can be verified by this small case,
int main()
{
  char c[20];
  int i =0; float f;

   for (i = 0; i < 4; i++)
   {
      printf ("address is %x\n", (unsigned int) &c[i]);
      f = *((float*) &c[i]);
   }

return 0;
}

$ gcc align.c -o align
$ ./align
address is be8ebc20
address is be8ebc21
Illegal instruction

However, everything is OK with -march=armv5t,
$ gcc -march=armv5t align.c -o align
$ ./align
address is bedfbc20
address is bedfbc21
address is bedfbc22
address is bedfbc23

To fix this problem here, we may can either modify source code of hdf5 to handle SIGILL, or change kernel to use SIGBUS for unalinged access instead of SIGILL (http://www.spinics.net/lists/arm-kernel/msg93601.html).

Dave Martin (dave-martin-arm) wrote :

Hard-coding the alignment detection result for certain types on arm also works --- this allows hdf5 to be built for armv7+vfpv3, and seems reasonable since the alignment requirements for those types are common across all arm architecture versions.

Note that if we care about this package working on older kernels, we need to care about all the 64-bit integer types as well as floats, since some kernels (at least, <= 2.6.28) don't emulate LDRD/STRD in Thumb-2.

Shouldn't be an issue for maverick though, and shouldn't matter at all if the kernel is eventually fixed to generate SIGBUS.

On 22.09.2010 18:07, Dave Martin wrote:
> Shouldn't be an issue for maverick though, and shouldn't matter at all
> if the kernel is eventually fixed to generate SIGBUS.

could you open and track a kernel task?

Paul Brook (paul-codesourcery) wrote :

I believe the SIGILL is actually a red herring.

The underlying issue that the package appears to be testing for misaligned access support at runtime. On ARM you can't do this reliably. Behavior of misaligned accesses depends on both system configuration and instruction choice. i.e. it may veys from one system to annother, or even one access to annother within the same program.

You are correct in that the test is done at "runtime", but of course this actually only happens once at "configure" time for the package, which means it is using the runtime characteristics of a specific archive build machine, and at the time only. The behavior just happens to currently appear relatively "consistent" so far given the subset of ARM hardware and the given release that is being actively supported, but this is certainly not guaranteed to remain so. Hence I do happen to agree this methodology is itself flawed in the case of ARM.

Paul Brook (paul-codesourcery) wrote :

It's actually worse than that. The test can give incorrect results even for the current machine. e.g.:
void foo(float * p) // assume p is misaligned
{
*p = 0; // first store
frob(0);
*p = 0; // second store
}
It is possible for the first store to succeed and the second to cause an alignment fault (or vice-versa).

Tobin Davis (gruemaster) on 2011-01-19
Changed in hdf5 (Ubuntu):
assignee: David Sugar (dyfet) → Jani Monoses (jani)
Jani Monoses (jani) wrote :

If this is not also a toolchain issue then the gcc-4.4 bugtask can be closed.
The goal is here to have the package built for armv7 and using armv5 for the specific test only?
The original issue was prompted by a FTBFS which was solved by building for armv5.

Not on hardfloat. It still breaks regardless of arch.

On 19 January 2011 22:21, Jani Monoses <email address hidden> wrote:
> If this is not also a toolchain issue then the gcc-4.4 bugtask can be closed.
> The goal is here to have the package built for armv7 and using armv5 for the specific test only?
> The original issue was prompted by a FTBFS which was solved by building for armv5.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/635199
>
> Title:
>  [armel] the ./H5detect test segfaults when built for armv7
>
> Status in “gcc-4.4” package in Ubuntu:
>  Confirmed
> Status in “hdf5” package in Ubuntu:
>  Confirmed
>
> Bug description:
>  https://launchpad.net/ubuntu/+source/hdf5/1.8.4-patch1-2/+build/1725681
>
>  work around: build with -march=armv5t
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu/+source/gcc-4.4/+bug/635199/+subscribe
>

Jani Monoses (jani) wrote :

I am not sure how this bug can be solved, Paul's comments seem to indicate that there's no good test for this.
If it is a FTBFS with hardfloat (which is not yet supported in Ubuntu) Konstantinos can you offer a solution?
The app may rely on the behaviour it tests for so I am not sure whether making the test itself pass would solve much.
I am open to any suggestion as to how to fix this better (i.e. beyond Ubuntu FTBFS fix, which this initially was).

From what I read the kernel will keep SIGILL for the forseeable future as it is still relied on by some userspace.

Steve Langasek (vorlon) on 2011-02-15
tags: added: arm-porting-queue
Dave Martin (dave-martin-arm) wrote :

I still think it would be reasonable to hard-code the results for float alignment requirements on ARM.

On all ARM floating-point architecture versions, float and double must be size-aligned. I don't think there has ever been any exception to this rule, nor is any exception likely in the forseeable future IIUC. If the alignment requirements are relaxed someday, the hard-coded results may be suboptimal, but the package should still work.

Jani Monoses (jani) wrote :

I have just uploaded a new revision today restoring the build to armv7 and it worked. The previous failure may have been specific to the maverick toolchain?
Is there a reason not to close this bug?

Jani Monoses (jani) on 2011-02-16
Changed in gcc-4.4 (Ubuntu):
status: Confirmed → Invalid
Changed in hdf5 (Ubuntu):
status: Confirmed → Fix Released
Dave Martin (dave-martin-arm) wrote :

What is the kernel on the buildds?

If it is >= 2.6.36, the fix to generate SIGBUS for alignment faults will be present in the kernel. If the buildds are all >= 2.6.36, this may solve the problem. If some are older, we might randomly get build problems when the package gets rebuilt.

However, as discussed on IRC, it may be fundamentally incorrect to detect this kind of thing at build time, unless you're only going to run the resulting binaries on the same platform used to build them...

Dave Martin (dave-martin-arm) wrote :

There's also the problem that deferencing a float * or double * pointer won't necessarily generate an alignment fault.

This won't:

main () {

Dave Martin (dave-martin-arm) wrote :

(argh, hit "post" by accident)

This probably won't generate a fault, because only integer accesses are done to the target pointer in the generated code:

main () {
    printf("%f\n", *(double *)((char *)main + 2));
}

This probably will (but it depends on compiler options, and on what code the compiler actually generates):

main () {
    printf("%f\n", *(double *)((char *)main + 2) + *(double *)((char *)main + 6));
}

If the compiler loads the operands straight to VFP registers (needed because an actual floating-point computation is done), the above code will generate a SIGBUS. If the compiler loads the operands to integer registers first and then transfers them to the FPU, the code won't generate a SIGBUS. Optimisation can affect the result too. In my experiments, -O0 generally means that SIGBUS doesn't happen, since in this case floating-point operands are usually loaded to integer registers before being transferred to FPU registers.

Just tried this on armhf, but still it fails with the same error (admittedly the kernel there is 2.6.31, which explains it), I will try again using a .38-pre kernel once I get my hands on a working one.

Jani Monoses (jani) wrote :

Konstantinos I use the current omap4/natty kernel .35 and it worked, I don't know what version runs on the buildd though.

Yes it probably has to do with the kernel now. Sure the bug can be close,
but I'll be posting
any progress with it on armhf here anyway.

Konstantinos

On 18 February 2011 12:46, Jani Monoses <email address hidden> wrote:

> Konstantinos I use the current omap4/natty kernel .35 and it worked, I
> don't know what version runs on the buildd though.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/635199
>
> Title:
> [armel] the ./H5detect test segfaults when built for armv7
>
> Status in “gcc-4.4” package in Ubuntu:
> Invalid
> Status in “hdf5” package in Ubuntu:
> Fix Released
>
> Bug description:
> https://launchpad.net/ubuntu/+source/hdf5/1.8.4-patch1-2/+build/1725681
>
> work around: build with -march=armv5t
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu/+source/gcc-4.4/+bug/635199/+subscribe
>

Dave Martin (dave-martin-arm) wrote :

Maybe this bug can be closed, but in that case some bug should be kept open for tracking purposes -- the build-time test done by H5detect still appears wrong by design, and this probably needs to be addressed at some point.

Just tested building on armhf with a 2.6.38 kernel here. The bug is "fixed", as it it does not generate SIGILL. So the package is buildable on armhf, but I do agree with Dave here, the method to detect float-alignment should probably be changed.

Konstantinos

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers