nih: nih_alloc() doesn't align returned pointer for picky platforms like sparc

Bug #436758 reported by Michael Casadevall on 2009-09-25
16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Release Notes for Ubuntu
Undecided
Unassigned
libnih
Low
Unassigned
upstart
Low
Scott James Remnant (Canonical)
upstart (Ubuntu)
Low
Scott James Remnant (Canonical)
Declined for Karmic by Scott James Remnant (Canonical)

Bug Description

Binary package hint: upstart

On SPARC servers running karmic, upstart currently seems to segfault on startup, causing init to abort and the system to crash; this regression appears to have been introduced with upstart 0.6.3, but I haven't isolated the exact upload which broke uploads. In addition, when current versions of upstart are installed on SPARC, the system spontaneously seems to restart in the postinst for unclear reasons.

ProblemType: Bug
Architecture: amd64
Date: Fri Sep 25 13:23:42 2009
DistroRelease: Ubuntu 9.10
Package: upstart 0.6.3-3
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-10.34-generic
SourcePackage: upstart
Uname: Linux 2.6.31-10-generic x86_64

Changed in upstart (Ubuntu):
importance: Undecided → High

SPARC is not a supported architecture. Patches welcome, but I'm not going to spend any particular amount of time investigating this.

If you can get a core dump, that would probably be useful - one way would be to make the root filesystem mounted writable (ro to rw on kernel command-line) so that it can actually dump - it should end up in /

Changed in upstart (Ubuntu):
importance: High → Low
summary: - [sparc] upstart currently hangs on start
+ upstart doesn't boot on sparc
Changed in upstart (Ubuntu):
status: New → Incomplete
assignee: nobody → Michael Casadevall (mcasadevall)

I am able to reproduce this by cloning a sparc VM running jaunty and performing an update to karmic. The VM crashes in the middle of the update. The system is then not bootable without the same crash when starting init.

I was able to get to a shell (by adding init=/bin/bash to my boot options). It does not appear that init is leaving a core dump.

I was able to get some kernel-level debug information using the magic sysreq key. I have attached a long dump of this data.

Let me know if there are any further steps I can take to help fix this. While I am aware that sparc is not one of Ubuntu's supported architectures, there is a community of Ubuntu users running on sparc, and it would be useful to address his issue.

Michael tells me that "make check" in the Upstart source tree fails on SPARC, that would be a good place to start.

bluedream (wangjinfajimmy) wrote :

what's wrong with me?the error msg:
init:sreadahead main process terminated with status 1
init: caught abort core dumped
init:procps main process terminated with status 255

bluedream (wangjinfajimmy) wrote :

bluedream wrote 2 seconds ago: #5

what's wrong with me?the error msg:
init:sreadahead main process terminated with status 1
init: caught abort core dumped
init:procps main process terminated with status 255
------------------------------------------------------------------------------it's armrootfs

Steve Langasek (vorlon) wrote :

Added to <https://wiki.ubuntu.com/KarmicKoala/ReleaseNotes#Sparc%20not%20supported%20by%20Ubuntu%209.10>:

The upstart init system in Ubuntu 9.10 fails to work on the sparc architecture due to an undiagnosed SIGBUS error. Users of Ubuntu on sparc are advised to remain on Ubuntu 9.04 instead of upgrading to 9.10. Assistance in resolving this architecture-specific bug for Ubuntu 10.04 is welcome. (436758)

Changed in ubuntu-release-notes:
status: New → Fix Released
katre (katre50) wrote :

It appears my emails on this didnlt get copied to the bug report.

I was able to reproduce the error on sparc by building from source in a karmic chroot and running make check. This revealed that libnih's custom allocation code is producing memory blocks that are improperly aligned for Sparc. I am working on a patch using alignof which should fix the SIGBUS issue on sparc.

katre (katre50) wrote :

I have written a fix for the problem with allocation in nih/alloc.c. This uses the __alignof__ operator to fix the memory alignment in blocks returned from nih's allocator. This should be portable to other architectures but obviously requires testing. However it does fix the bus error when running "make check" in the current (0.6.3-10) upstart sources.

If someone can tell me how to use the upstart package I built in an install to see if it fixes those issues, I will be happy to test that as well.

Shouldn't it be something more like:

  __allocof__ (struct) * ((sizeof (struct) - 1) / __allocof__ (struct) + 1)

Not entirely sure how __allocof__ works?

We'll need you to follow http://upstart.ubuntu.com/wiki/CopyrightAssignment for this patch

On Thu, Oct 22, 2009 at 3:57 PM, Scott James Remnant <email address hidden>wrote:

> Shouldn't it be something more like:
>
> __allocof__ (struct) * ((sizeof (struct) - 1) / __allocof__ (struct) +
> 1)
>
> Not entirely sure how __allocof__ works?
>
>
I used __alignof__. While I had some trouble finding docs, it appears to
work fairly simply. When given a type, it will return the number of bytes of
padding needed to reach proper memory alignment. For example, the
NihAllocCtx struct has sizeof 20 on my system, and __alignof__(NihAllicCtx)
returns 4. In theory on x86 it would return 0 and hopefully the compiler
will be smart enough to optimize it out.

> We'll need you to follow
> http://upstart.ubuntu.com/wiki/CopyrightAssignment for this patch
>
>
I will do this immediately.

katre

On Thu, 2009-10-22 at 20:18 +0000, katre wrote:

> On Thu, Oct 22, 2009 at 3:57 PM, Scott James Remnant
> <email address hidden>wrote:
>
> > Shouldn't it be something more like:
> >
> > __allocof__ (struct) * ((sizeof (struct) - 1) / __allocof__ (struct) +
> > 1)
> >
> > Not entirely sure how __allocof__ works?
> >
> >
> I used __alignof__. While I had some trouble finding docs, it appears to
> work fairly simply. When given a type, it will return the number of bytes of
> padding needed to reach proper memory alignment. For example, the
> NihAllocCtx struct has sizeof 20 on my system, and __alignof__(NihAllicCtx)
> returns 4. In theory on x86 it would return 0 and hopefully the compiler
> will be smart enough to optimize it out.
>
Ah, sorry, I meant alignof. From the gcc documentation it says that it
returns on what byte boundary the object is aligned, not how much
padding it needs:

  The keyword `__alignof__' allows you to inquire about how an object is
  aligned, or the minimum alignment usually required by a type. Its
  syntax is just like `sizeof'.

   For example, if the target machine requires a `double' value to be
  aligned on an 8-byte boundary, then `__alignof__ (double)' is 8. This
  is true on many RISC machines. On more traditional machine designs,
  `__alignof__ (double)' is 4 or even 2.

   Some machines never actually require alignment; they allow reference
  to any data type even at an odd address. For these machines,
  `__alignof__' reports the smallest alignment that GCC will give the
  data type, usually as mandated by the target ABI.

This implies that you do need to divide as I've done, so (correcting the
code for my thinko):

__alignof__ (struct) * ((sizeof (struct) - 1) / __alignof__ (struct)
                        + 1)

ie. the number of alignment segments required to fit the entire
structure, plus padding

That being said, I couldn't find any difference between the results of
applying this code and not applying this code. The context block
contains entirely pointers, which are the largest alignment anyway.

faure% gcc -m64 alignof.c
faure% ./a.out
sizeof: 40
alignof: 8
aligned space: 40
faure% gcc -m32 alignof.c
faure% ./a.out
sizeof: 20
alignof: 4
aligned space: 20

(ie. aligned space is the same as sizeof)

Your code would just change the aligned size from 40 to 48 and 20 to 24,
which doesn't make any particular sense.

(Plus afaict, sizeof always includes the necessary padding to achieve
 alignment *anyway* - since otherwise malloc and arrays wouldn't work!)

Scott
--
Scott James Remnant
<email address hidden>

Follow on from that, it turns out (unsurprisingly) that it's the alignment of the returned pointer that matters; and that since we only know the size, not the type, we have to guess what the maximum possible alignment must be.

Code (and comments) in glibc suggest:

 max (sizeof (size_t) * 2, __alignof__ (long long))

This is the alignment size we want for NihAllocCtx, thus

#define NIH_ALLOC_CTX_ALIGN nih_max (sizeof (size_t) * 2, __alignof__ (long long))
#define NIH_ALLOC_CTX_SIZE (NIH_ALLOC_CTX_ALIGN * (sizeof (NihAllocCtx) / NIH_ALLOC_CTX_ALIGN + 1))

is probably right.

On the bright side, it means there's a spare 4 bytes in the NihAllocCtx structure ;-)

Changed in upstart (Ubuntu):
status: Incomplete → Triaged
assignee: Michael Casadevall (mcasadevall) → nobody
summary: - upstart doesn't boot on sparc
+ nih: nih_alloc() doesn't align returned pointer for picky platforms like
+ sparc
Changed in upstart:
status: New → Triaged
importance: Undecided → Low
Changed in libnih:
importance: Undecided → Low
status: New → Triaged
katre (katre50) wrote :

On Mon, Oct 26, 2009 at 8:36 AM, Scott James Remnant <email address hidden>wrote:

> Ah, sorry, I meant alignof. From the gcc documentation it says that it
> returns on what byte boundary the object is aligned, not how much
> padding it needs:
>
>
Okay, my bad. I wonder how I didn't find that in the gcc docs.

> This implies that you do need to divide as I've done, so (correcting the
> code for my thinko):
>
> __alignof__ (struct) * ((sizeof (struct) - 1) / __alignof__ (struct)
> + 1)
>
> ie. the number of alignment segments required to fit the entire
> structure, plus padding
>
>
Okay, this looks good.

> That being said, I couldn't find any difference between the results of
> applying this code and not applying this code. The context block
> contains entirely pointers, which are the largest alignment anyway.
>
>
The problem with allocated blocks seems to not be with the returned pointer
directly, but with accessing fields in the returned block. I am assuming
from this that there's some extra-picky alignment that the internal pointers
need, but I really don't know enough sparc internals to say.

Your code would just change the aligned size from 40 to 48 and 20 to 24,
> which doesn't make any particular sense.
>
>
Yeah, I have no idea why 20 causes a SIGBUS (but, again, only when accessing
fields in a struct, not when using the returned pointer itself) and 24 is
fine.

> (Plus afaict, sizeof always includes the necessary padding to achieve
> alignment *anyway* - since otherwise malloc and arrays wouldn't work!)
>
>
Good point.

katre

Changed in ubuntu-release-notes:
status: Fix Released → Fix Committed
Steve Langasek (vorlon) on 2009-10-28
Changed in ubuntu-release-notes:
status: Fix Committed → Fix Released
D.Springer (kepler1) wrote :

Does the attached .diff fix the problem?

D.Springer (kepler1) wrote :

OK..after applying the diff to the upstart source and rebuilding the deb, I no longer get the kernel panic. But right at that point I now get:

init: mountall process killed by BUS signal

...and boot halts at that point.

katre (katre50) wrote :

I need to update the patch to include Scott's updates, above. And it looks like mountall includes its own copy of nih, which would also need to be patched and rebuilt.

Is there a list of packages that include nih? And is there a compelling reason not to spin it into its own package, to handle issues like this?

On Fri, 2009-10-30 at 03:45 +0000, katre wrote:

> I need to update the patch to include Scott's updates, above. And it
> looks like mountall includes its own copy of nih, which would also need
> to be patched and rebuilt.
>
upstart, mountall and ureadahead all use it (in Ubuntu)

> Is there a list of packages that include nih? And is there a compelling
> reason not to spin it into its own package, to handle issues like this?
>
It has an unstable API at the moment, it's basically my "copy and paste"
code library that I've been increasingly trying to turn into a real
library, modernising as I go.

I do expect to release it as its own project in the near future, and
deal with having a stable API :-/

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

Changed in libnih:
status: Triaged → Fix Committed

Fixed in libnih 1.0.0, note that this fix will still need to be merged into software using libnih or the software updated to dynamically link to libnih instead.

Changed in libnih:
status: Fix Committed → Fix Released
Changed in libnih:
status: Fix Released → Fix Committed
Changed in upstart:
status: Triaged → Fix Committed
Changed in upstart (Ubuntu):
status: Triaged → Fix Committed

Please leave bug statuses alone

Changed in libnih:
status: Fix Committed → Fix Released
Changed in upstart:
status: Fix Committed → Triaged
Changed in upstart (Ubuntu):
status: Fix Committed → Triaged
Frans van Berckel (fberckel) wrote :

Does this mean upstart_0.6.3-11 got to boot on Sparc well?

Committed the change to trunk to use the external libnih which is fixed

Changed in upstart:
status: Triaged → Fix Committed
milestone: none → 0.6.5
Changed in upstart (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Scott James Remnant (scott)
Changed in upstart:
assignee: nobody → Scott James Remnant (scott)

0.6.5 2010-02-04 "Our last, best hope for victory"

 * Since libnih has been separated out into its own project, Upstart
   no longer includes its source and now uses the installed version by
   default.

 * The external nih-dbus-tool means that cross-compilation is trivial,
   the path to it may be overriden with NIH_DBUS_TOOL=... as an
   argument to configure. (Bug: #426740)

 * Developers may still build against a local libnih source tree by
   passing --with-local-libnih=/path/to/libnih to configure.

 * There is a new initctl "reload" command, with matching
   /sbin/reload symlink. This sends the SIGHUP signal to the running
   main process of the named job instance.

 * Event operator matches in jobs now support "!=" in addition to the
   usual "=", e.g.:

  start on net-device-added INTERFACE!=eth*

   (Bug: #513035)

 * Moved D-Bus system bus reconnection trigger from SIGHUP to SIGUSR1,
   since SIGHUP is already used for a forced configuration reload and
   causes Upstart to "forget" state.

 * Fixed bug where the default runlevel would be lost when an
   /etc/inittab file exists without an initdefault line. (Bug: #405847)

 * Fixed "Unhandled error" message from shutdown. (Bug: #426332)

 * Fixed "Unhandled error" assertion crash from Upstart child
   processes when failing to spawn a job. (Bug: #451917)

 * No longer holds /dev/console open, so the SAK SysRq key will not
   kill Upstart. (Bug: #486005)

 * Restored sync() call before reboot().

 * Added missing OPTIONS section to init(8) manpage. (Bug: #449883)

Changed in upstart:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.5-1

---------------
upstart (0.6.5-1) lucid; urgency=low

  * New upstream release:
    - libnih has been separated out into its own project.
    - "start on" and "stop on" now support != matches. LP: #513035.
    - Fixed crash in child when unable to spawn job. LP: #451917.
    - No longer holds /dev/console open so SAK won't kill init. LP: #486005.
    - Added missing OPTIONS section to init(8). LP: #449883.

  [ Scott James Remnant ]
  * Build-depend on libnih-dev, libnih-dbus-dev and nih-dbus-tool to use
    the separated out libnih.
    - This has the fix for LP: #436758.
    - Remove changelog.nih from the doc directory.
  * Bump udev build-dependency to 147 to match upstream.
  * udev/Makefile.am: Update to use external libnih

  [ Johan Kiviniemi ]
  * udev/upstart-udev-bridge.c: Change -device-remove to -device-removed to
    match -device-added and -device-changed. LP: #516698.
 -- Scott James Remnant <email address hidden> Thu, 04 Feb 2010 16:30:10 -0800

Changed in upstart (Ubuntu):
status: Fix Committed → Fix Released

Scott, Is it possible to SRU these fixes to karmic so anyone upgrading from jaunty will not break their box?

On Tue, 2010-03-09 at 18:30 +0000, Michael Casadevall wrote:

> Scott, Is it possible to SRU these fixes to karmic so anyone upgrading
> from jaunty will not break their box?
>
It would be hard, since the change was made after libnih was separated
out of Upstart's source code.

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

Quinn Plattel (qiet72) wrote :

Apparantly, even though a fix has been made, there is still no way to install or upgrade to karmic on sparc.

If you attempt upgrade, upstart will fail or crash the system.

If you try to install the system from CD, booting from CD will fail with a "Fast Data Access MMU miss" because not enough memory is allocated to hold both the kernel and the initrd on newer ubuntu releases - Ubuntu Hardy 8.04 was the last release to be able to boot from cd. Lucid and up have solved the memory issue of the cd bootloader, but it does not seem to load the sd driver so you cannot access any block devices even though you have the correct scsi/ide driver loaded.

The only solution left is to install Hardy and then do an LTS upgrade to Lucid - mind you that you might have an issue with apparmor so you should probably remove that package before upgrading.

Quinn

katre (katre50) wrote :

I've given up on trying to use Ubuntu on Sparc and have moved on to using
Debian, which is actually working and supported.

On Wed, Oct 5, 2011 at 9:34 AM, Quinn Plattel <email address hidden> wrote:

> Apparantly, even though a fix has been made, there is still no way to
> install or upgrade to karmic on sparc.
>
> If you attempt upgrade, upstart will fail or crash the system.
>
> If you try to install the system from CD, booting from CD will fail with
> a "Fast Data Access MMU miss" because not enough memory is allocated to
> hold both the kernel and the initrd on newer ubuntu releases - Ubuntu
> Hardy 8.04 was the last release to be able to boot from cd. Lucid and
> up have solved the memory issue of the cd bootloader, but it does not
> seem to load the sd driver so you cannot access any block devices even
> though you have the correct scsi/ide driver loaded.
>
> The only solution left is to install Hardy and then do an LTS upgrade to
> Lucid - mind you that you might have an issue with apparmor so you
> should probably remove that package before upgrading.
>
> Quinn
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/436758
>
> Title:
> nih: nih_alloc() doesn't align returned pointer for picky platforms
> like sparc
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/libnih/+bug/436758/+subscriptions
>

Scott Kitterman (kitterman) wrote :

Upgrading to Lucid would not be a good idea. Sparc is known not to work on Lucid. Debian is definitely a better option for sparc systems.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers