hal doesn't start on live CD

Bug #8723 reported by Matt Zimmerman
Affects: hal (Ubuntu)
Status: Fix Released
Importance: Critical
Assigned to: Alex de Landgraaf

Bug Description

I noticed that hald wasn't running when testing the live CD. A bit of stracing
(it would be nice if syslog were available) showed that the problem was that
/var/run/hal was owned by root:root, rather than hal:hal.

Revision history for this message
Alex de Landgraaf (alextreme) wrote :

It seems that previous versions of hal had an /etc/init.d/hal script (which the
wartylive init scripts called), but 0.2.98-1ubuntu4 doesn't.
Starting /usr/sbin/hald manually fixes this, but shouldn't hal start at boot
time anyway?

Also, ls -ld /var/run/hal on my version shows the directory to be owned by hal:hal.
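
For anyone who wants to script the ownership check instead of reading ls -ld output by eye, here is a minimal Python sketch. The function names owner_and_group and is_owned_by are made up for illustration and are not part of hal or the live CD scripts.

```python
import grp
import os
import pwd


def owner_and_group(path):
    """Return (user, group) for path, falling back to numeric ids as strings."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; report the number instead.
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # No group entry for this gid; report the number instead.
        group = str(st.st_gid)
    return user, group


def is_owned_by(path, user, group):
    """True if path is owned by exactly user:group."""
    return owner_and_group(path) == (user, group)
```

On a running system, is_owned_by("/var/run/hal", "hal", "hal") would be expected to return True if the directory has the ownership described above.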

Revision history for this message
Matt Zimmerman (mdz) wrote :

hal is now started by a dbus event.d script, so it should be indirectly started
by /etc/init.d/dbus-1, via /etc/dbus-1/event.d/hal

That's strange about the ownership; I'll verify the next time I look.

Revision history for this message
Matt Zimmerman (mdz) wrote :

Confirmed that the permissions seem to be correct after startup; something must
have changed them as I was trying to debug the problem. However, the problem
does still have to do with writing the pid file. See attachments:

3687 unlink("/var/run/hal/hald.pid") = -1 ENOENT (No such file or directory)
3687 open("/var/run/hal/hald.pid", O_WRONLY|O_CREAT|O_TRUNC|O_EXCL|O_LARGEFILE, 0644) = -1 EINVAL (Invalid argument)
3687 exit_group(1) = ?

and in the kernel log:

mini_fo: error in build_sto_structure: failed to create storage dir [1].
mini_fo: create: build_sto_structure failed [1].
mini_fo: error in build_sto_structure: failed to create storage dir [1].
mini_fo: create: build_sto_structure failed [1].
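
The unlink followed by an exclusive create seen in the strace can be approximated with a small Python probe, which may be handy as a test case against a given directory. This is only a sketch: create_pid_file and probe are hypothetical names, O_LARGEFILE is left out because it is not available on every platform, and writing the pid into the file is an assumption about what hald does after the open succeeds.

```python
import errno
import os


def create_pid_file(path, pid=None):
    """Mimic the unlink + exclusive-create sequence from the strace."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # same as the ENOENT on unlink in the trace; not an error
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_EXCL
    fd = os.open(path, flags, 0o644)
    try:
        # Assumption: the daemon writes its pid followed by a newline.
        value = os.getpid() if pid is None else pid
        os.write(fd, f"{value}\n".encode())
    finally:
        os.close(fd)


def probe(directory):
    """Return None if a pid file can be created in directory, else the errno name."""
    path = os.path.join(directory, "probe.pid")
    try:
        create_pid_file(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.unlink(path)  # clean up the probe file on success
    return None
```

If the overlay behaves as in the trace, probe("/var/run/hal") would presumably return "EINVAL", while the same call against a directory on the ramdisk should return None.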

Revision history for this message
Matt Zimmerman (mdz) wrote :

Created an attachment (id=322)
strace of hald failing

Revision history for this message
Matt Zimmerman (mdz) wrote :

Created an attachment (id=323)
dmesg when hal fails

Revision history for this message
Matt Zimmerman (mdz) wrote :

Can you provide a status update regarding this bug?

Revision history for this message
Alex de Landgraaf (alextreme) wrote :

It's an issue with sockets that can't be created on the overlay
filesystem. A temporary workaround would be to just mount --bind a directory in
/tmp as /var/run, so that these sockets are made directly on the ramdisk. I've
informed the mini_fo developers and I'm hoping to hear back from them soon:

3686 connect(3, {sa_family=AF_FILE, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)

Revision history for this message
Matt Zimmerman (mdz) wrote :

I'm not sure that the tmpfs approach is so simple; there tend to be
subdirectories in /var/run with specific permissions, which would need to be
preserved.

Did you hear anything back from the mini_fo guys? If not, we need to get a
workaround in ASAP so that we can test a new CD build with that included.

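One way the permissions concern could be handled, if /var/run were bind-mounted from a directory on the ramdisk, is to recreate the existing subdirectories with their modes and owners before mounting. The Python sketch below is an illustration only: mirror_dirs is a hypothetical helper, it copies directories and not files, and changing ownership to another user requires root.

```python
import os
import stat


def mirror_dirs(src, dst):
    """Recreate the directory tree of src under dst with the same mode and owner."""
    for root, _dirs, _files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        st = os.stat(root)
        try:
            # chown first, since changing the owner can clear setuid/setgid bits
            os.chown(target, st.st_uid, st.st_gid)
        except PermissionError:
            pass  # not running as root; keep the current owner
        os.chmod(target, stat.S_IMODE(st.st_mode))
```

A boot script could, for example, call mirror_dirs("/var/run", "/tmp/run") and then bind-mount the second directory over the first; the path /tmp/run is an assumption, not something taken from the live CD scripts.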
Revision history for this message
Alex de Landgraaf (alextreme) wrote :

They're working on it, that's all I know.

Another workaround is to just start hal in the boot script; this would be dead
simple.

Revision history for this message
Matt Zimmerman (mdz) wrote :

How would that help, if the problem is that it can't create its pid file in
/var/run?

Revision history for this message
Alex de Landgraaf (alextreme) wrote :

Because it's not an issue of the .pid file being created, it has to do with the
socket not being able to be created. This socket is then used to start hal. That
the .pid doesn't exist is a consequence of the problem, not the problem itself.

This bug didn't exist before, when hal was started with /etc/init.d/hal start.

Revision history for this message
Matt Zimmerman (mdz) wrote :

It's quite clear from the strace that the failure is happening when creating the
pid file. The socket you're referring to, .nscd_socket, has nothing to do with
the error, and is only present when nscd is in use.

Revision history for this message
Alex de Landgraaf (alextreme) wrote :

Yup, you've got a point there.

I still think that the /var/tmp workaround would work. Permissions and ownership
would be preserved. Anyway, I've been trying to catch a ghost, I'll write some
testcases over the weekend to help the mini_fo guys figure this one out.

Revision history for this message
Matt Zimmerman (mdz) wrote :

Any word from the mini_fo guys? We need a new image with a fix or workaround
for this bug as soon as possible; there is a lot of functionality tied to hal
which has not seen significant testing in the live CD environment due to this bug.

Revision history for this message
Alex de Landgraaf (alextreme) wrote :

Nothing from them, but if you wanted a new image with a fix you should have just
let me know directly instead of through Bugzilla. I'll generate a new image for
Warty RC this Wednesday, even though you should know that I'm swamped already...

Revision history for this message
Matt Zimmerman (mdz) wrote :

Upgrade really-RC bugs to critical

Revision history for this message
Matt Zimmerman (mdz) wrote :

hal is confirmed to be running on the latest live CD build.
