torque-server init script fails during installation and removal

Bug #223649 reported by Morten Kjeldgaard on 2008-04-28
Affects          Importance  Assigned to
torque (Ubuntu)  Undecided   Morten Kjeldgaard
Hardy            Undecided   Morten Kjeldgaard
Intrepid         Undecided   Morten Kjeldgaard
Jaunty           Undecided   Morten Kjeldgaard

Bug Description

When installing the torque-server package, the installation fails when the init script is being executed. The installer complains about missing directories.

When removing the package, apt-get also fails, because the init script fails. This can only be fixed by editing the torque-server init script, so it always returns 0.

Morten Kjeldgaard (mok0) wrote :

The attached patch fixes this problem, plus a couple of other packaging issues that were addressed after the initial upload to the NEW queue.

motu-sru, please allow an SRU for hardy for the torque package.

Morten Kjeldgaard (mok0) wrote :

Patch incorporating modifications according to discussion with Luca Falavigna on IRC

Luca Falavigna (dktrkranz) wrote :

ACK from motu-sru.
Please, do not ship linda override in the upload, it is just a cosmetic change.
Also, provide a good TEST CASE to speed up verification process. Thanks!

Changed in torque:
status: New → Confirmed
Martin Pitt (pitti) wrote :

Why did you remove the pid file check? There is no rationale for this in the changelog, nor a separate bug report about this. PID files are a good thing to not break processes in chroots, or processes started as different users, etc.

Changed in torque:
status: New → Incomplete

> Why did you remove the pid file check? There is no rationale for
> this in
> the changelog, nor a separate bug report about this. PID files are a
> good thing to not break processes in chroots, or processes started as
> different users, etc.

Yes, that's how I thought the --pidfile option worked too. However,
the purpose of this option is to specify a PID file created by the
server process itself, and because pbs_server does not do that, it is
not relevant in this case.

Cheers,
Morten

Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here

Changed in torque:
status: Incomplete → Fix Committed
Morten Kjeldgaard (mok0) wrote :

jd, please test the updated package in hardy-proposed and report back here!

Cheers,
Morten

Morten Kjeldgaard (mok0) on 2008-05-14
Changed in torque:
status: Confirmed → Fix Committed
Ted Cabeen (ted-cabeen) wrote :

Unfortunately, the --pidfile options are still present in this release, and they cause the initscript to break.

Also, /var/lib/torque/mom_priv/jobs, /var/lib/torque/spool and /var/lib/torque/undelivered need to be added to the dirs for torque-mom. (/var/lib/torque/spool and /var/lib/torque/undelivered need to have 1777 permissions)

Ted Cabeen (ted-cabeen) wrote :

Looking in more detail, they are gone from the server init script, but not the mom or scheduler one.

Also, the mom won't run properly unless it has a config file in /var/lib/torque/mom_priv/config. It can be empty, but it has to be present.

Finally, the postinst script should create the database, if it doesn't exist yet. Here's a sample script to do it:
/usr/sbin/pbs_server -t create && sleep 2 && /usr/bin/qterm
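
A postinst fragment along these lines might guard that command so it only runs when no database exists yet. This is a minimal sketch, not the packaged script: the function name is hypothetical, and the database path /var/lib/torque/server_priv/serverdb is an assumption that should be checked against the installed Torque version.

```sh
#!/bin/sh
# Sketch only: create the pbs_server database unless one already exists.
# create_serverdb_if_missing is a hypothetical helper name.
create_serverdb_if_missing() {
    db=$1   # assumed location: /var/lib/torque/server_priv/serverdb
    if [ -e "$db" ]; then
        echo "server database already present, skipping"
        return 0
    fi
    # The command sequence suggested in the comment above.
    /usr/sbin/pbs_server -t create && sleep 2 && /usr/bin/qterm
}

# Example call (needs root and an installed torque-server):
# create_serverdb_if_missing /var/lib/torque/server_priv/serverdb
```

The guard keeps a package upgrade or reconfiguration from re-initialising an existing server configuration.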

Morten Kjeldgaard (mok0) wrote :

Martin, please release the version in hardy-proposed into hardy-updates.

jd (jeff-dyke) wrote :

It's taken me a while to get back to this, but I'm having similar issues with the package from proposed. I've taken action on each of the comments by Ted: removing the --pidfile arguments, creating the database, running `touch /var/lib/torque/mom_priv/config` and changing permissions to 1777 on both /var/lib/torque/spool and /var/lib/torque/undelivered.

First I tried simply to install the -proposed version and it failed, so then I made the changes above and tried a reinstallation, and it continues to fail. It would seem, though, that I can start each of the services (mom, scheduler and server) and see them running in ps -ef. And since I'm brand new to this world I wanted to install the torque-gui, which seems to install, but I can't find where to launch it from (which is a side question).

In conclusion, I'd say the -proposed does not fix the bug at hand, even though i *may* have a working torque server

If there is anything else I can provide, please ask, I'd really like to use this.

All crash files added

jd (jeff-dyke) wrote :

After changing /var/lib/torque/server_name from torqueserver to the name of my machine, I am able to run qmgr and configure the server, as well as successfully configure nodes and have them display as 'free' via pbsnodes -a.

So it looks like i do have a working server and clients.

Morten Kjeldgaard (mok0) wrote :

Thanks for the very useful comments. I will fix these issues and make another upload shortly.

Changed in torque:
assignee: nobody → mok0
status: Fix Committed → In Progress

I looked here for a solution to a problem related to this bug, namely how to simply remove torque (specifically torque-mom, torque-scheduler and torque-server). The working solution is the one stated in the first post: "This can only be fixed by editing the torque-server init script, so it always returns 0." In practice, that means adding a line with "exit 0" to /etc/init.d/torque-mom, /etc/init.d/torque-scheduler and /etc/init.d/torque-server.

Darren Faulke (darren-alidaf) wrote :

I can't get any of this to work and I would like to just remove it by hand, as it interferes with just about everything I try to install now. apt-get remove doesn't work, so can anyone offer some hints to get rid of it? Ta

Ted Cabeen (ted-cabeen) wrote :

alidaf: To remove the packages, add the line "exit 0" near the top (line 3 or so is fine) of each of the torque scripts in /etc/init.d, as Giovanni describes above. Then you can remove/purge the packages with dpkg.
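
The workaround above could be scripted roughly as follows. This is a sketch that assumes GNU sed (for `sed -i` and the one-line `i` command); neutralize_init_script is a hypothetical helper name, and it inserts the line at position 2 rather than 3 so the shebang stays first.

```sh
#!/bin/sh
# Sketch only: make an init script return 0 immediately so that the
# package's prerm/postrm scripts no longer fail on it.
neutralize_init_script() {
    script=$1
    cp -p "$script" "$script.orig"    # keep a backup of the original
    sed -i '2i exit 0' "$script"      # GNU sed: insert "exit 0" before line 2
}

# Example (as root), followed by removing the packages with dpkg:
# for s in torque-server torque-mom torque-scheduler; do
#     neutralize_init_script "/etc/init.d/$s"
# done
# dpkg --purge torque-server torque-mom torque-scheduler
```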

Sebastian Kapfer (caci) wrote :

Ping? This package is a catastrophe.

Ted Cabeen (ted-cabeen) wrote :

It is pretty bad, but once you have it installed, it does work properly. What problems are you having?

Christian Hudon (chrish) wrote :

I still get this error with the version in hardy-proposed:

Setting up torque-server (2.1.8+dfsg-0ubuntu1.1) ...
 * Starting Torque batch queue server: PBS_Server: No such file or directory (2) in chk_file_sec, Security violation with "/var/lib/torque/spool/"
PBS_Server: No such file or directory (2) in chk_file_sec, Security violation with "/var/lib/torque/pbs_environment"
PBS_Server: PBS_Server, pbsd_init failed
invoke-rc.d: initscript torque-server, action "start" failed.
dpkg: error processing torque-server (--configure):
 subprocess post-installation script returned error exit status 3
Errors were encountered while processing:
 torque-server

A fix would really be appreciated. This makes the torque packages unusable.

Ted Cabeen (ted-cabeen) wrote :

If you want to fix it yourself, just create the following directories before you install the package:
/var/lib/torque
/var/lib/torque/server_priv
/var/lib/torque/server_priv/jobs
/var/lib/torque/server_priv/queues
/var/lib/torque/server_priv/accounting

On clients, you need the following:
/var/lib/torque/mom_priv
/var/lib/torque/mom_priv/jobs
/var/lib/torque/spool (with permissions 1777)
/var/lib/torque/undelivered (with permissions 1777)

Also, you'll need a /var/lib/torque/server_name file on your servers and clients listing the torque server DNS name.

That's everything I have listed in my puppet config. Ideally all of these should be put in the package.
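
For anyone without a puppet setup, the list above can be sketched as a small shell helper. The function names and the idea of passing the Torque directory as an argument are assumptions made for illustration; only the paths and the 1777 mode come from the comment above.

```sh
#!/bin/sh
# Sketch only: pre-create the directories listed above.
# make_torque_dirs ROOT server|client   (ROOT is normally /var/lib/torque)
make_torque_dirs() {
    root=$1
    role=$2
    case "$role" in
        server)
            mkdir -p "$root/server_priv/jobs" \
                     "$root/server_priv/queues" \
                     "$root/server_priv/accounting"
            ;;
        client)
            mkdir -p "$root/mom_priv/jobs" "$root/spool" "$root/undelivered"
            # World-writable with the sticky bit, as noted above.
            chmod 1777 "$root/spool" "$root/undelivered"
            ;;
        *)
            echo "usage: make_torque_dirs ROOT server|client" >&2
            return 1
            ;;
    esac
}

# Write the server's DNS name into ROOT/server_name.
write_server_name() {
    printf '%s\n' "$2" > "$1/server_name"
}

# Example (as root; the host name is a placeholder):
# make_torque_dirs /var/lib/torque server
# write_server_name /var/lib/torque torque.example.org
```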

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package torque - 2.1.8+dfsg-0ubuntu3

---------------
torque (2.1.8+dfsg-0ubuntu3) intrepid; urgency=low

  * Build for intrepid.

torque (2.1.8+dfsg-0ubuntu1.2) hardy-proposed; urgency=low

  * Fix remaining issues with init scripts (LP: #223649)
  * Add missing empty directories to package torque-client

 -- Morten Kjeldgaard <email address hidden> Wed, 27 Aug 2008 21:48:57 +0200

Changed in torque:
status: In Progress → Fix Released
Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in torque:
status: In Progress → Fix Committed
Christian Hudon (chrish) wrote :

The latest torque packages in hardy-proposed work for me.

Thanks!

Christian Hudon (chrish) wrote :

There's one small remaining problem with the torque-server init script: it's not idempotent. I found this out when running a "dpkg --configure torque-server" (to finish configuring the package) while the server was running.

I wasn't sure what was best, so I filed a separate bug report about this: https://bugs.launchpad.net/ubuntu/+source/torque/+bug/270574
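
One common way to make a "start" action idempotent is to treat an already-running daemon as success. The following is a generic sketch of that idea, not the fix that was applied to the package: start_once and is_running are hypothetical helpers, and it assumes pgrep (from procps) is available.

```sh
#!/bin/sh
# Sketch only: an idempotent "start" that succeeds if the daemon is
# already running instead of trying to launch a second copy.
is_running() {
    # -x matches the process name exactly.
    pgrep -x "$1" >/dev/null 2>&1
}

start_once() {
    name=$1
    shift
    if is_running "$name"; then
        echo "$name already running"
        return 0
    fi
    "$@"    # the actual start command
}

# Example: start_once pbs_server /usr/sbin/pbs_server
```

With this shape, running "dpkg --configure" while the server is up would see a zero exit status from the start action.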

Christian Hudon (chrish) wrote :

There's one issue with the packages in hardy-proposed. The following directories:

/var/lib/torque/mom_priv
/var/lib/torque/mom_priv/jobs

should be in the torque-mom package, not in the torque-client one. As it is, it makes torque-mom broken unless torque-client is also installed on the same machine (which is not always the case).

Christian Hudon (chrish) wrote :

Another issue. The /var/lib/torque/spool and /var/lib/torque/undelivered directories are not created with mode 1777 (see comment 21), which makes submitted jobs fail when torque tries to run them on the compute node.

Christian Hudon (chrish) wrote :

Yet another reason why the hardy-proposed package is not ready for prime time: the directory /var/lib/torque/server_priv/acl_svr is missing from the torque-server package. Without this directory, the operators and managers settings of the torque-server are not preserved across executions.

Christian Hudon (chrish) wrote :

Three other directories missing (all in /var/lib/torque/server_priv):

acl_hosts
acl_users
acl_groups

Luca Falavigna (dktrkranz) wrote :

Marking verification-failed for now, this will probably require a new fix for Intrepid too.

Martin Pitt (pitti) wrote :

Reopening for all releases, since this still isn't fixed. I removed the SRU from hardy-proposed, since it failed verification.

Changed in torque:
status: Fix Committed → Confirmed
status: Fix Released → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package torque - 2.3.6+dfsg-0ubuntu1

---------------
torque (2.3.6+dfsg-0ubuntu1) jaunty; urgency=low

  * New upstream release (LP: #235385).
  * Torque-server init script fails during installation and
    removal (LP: #223649) fixed, "set -e" removed from init script.
  * Package torque-scheduler 2.1.8+dfsg-0ubuntu1.1 failed to install
    (LP: #244440 LP: #270574) fixed, "set -e" removed from init script.
  * /etc/init.d/torque-mom not idempotent, and stop doesn't work
    (LP: #256998) Fixed, "set -e" removed from init script.
  * Torque-scheduler prints errors during package configuration
    (LP: #270653). Reason: missing dir sched_config. Fixed,
    package installs FIFO scheduler config file in
    /var/lib/torque/sched_priv/.
  * Package torque-mom 2.1.8+dfsg-0ubuntu1 failed to
    install/upgrade (LP: #276575 LP: #291674). Reason: missing directories.
    Fixed, /var/lib/torque/server_priv/jobs/,
    /var/lib/torque/server_priv/queues/,
    /var/lib/torque/server_priv/accounting/ and
    /var/lib/torque/mom_priv/jobs/ are now installed in their respective
    packages.
  * Package torque-gui missing most of the files it needs to run!
    (LP: #281360). Reason: missing *.tk etc. files from src/gui
    Fixed: /usr/lib/xpbs now shipped in package
  * changed patch system to quilt.

 -- Morten Kjeldgaard <email address hidden> Mon, 16 Feb 2009 17:32:28 +0100
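
The changelog above attributes several of the fixes to removing "set -e" from the init scripts. As a minimal illustration of the mechanism (not taken from the actual scripts): under "set -e" a shell exits as soon as any command fails, so one failing step makes the whole script return non-zero. run_and_report is a hypothetical helper used only for this demonstration.

```sh
#!/bin/sh
# Sketch only: compare a snippet's behaviour with and without "set -e".
run_and_report() {
    # Run the snippet in a child shell; print its output and exit status.
    out=$(sh -c "$1") && rc=0 || rc=$?
    printf '%s|status=%s\n' "$out" "$rc"
}

run_and_report 'set -e; false; echo reached'   # prints "|status=1"
run_and_report 'false; echo reached'           # prints "reached|status=0"
```

In the first call the shell stops at the failing command, which is the kind of non-zero exit that invoke-rc.d reported as a failed "start" action earlier in this report.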

Changed in torque:
status: Confirmed → Fix Released
UweBrauer (oub) wrote :

Hello

since I have a lot of problems with Kubuntu releases > 8.04, I want to stick with 8.04 for the moment. Where can I find a torque-server deb which works?

I tried to read all the posts and it seems there has been a release in hardy-proposed, but
I can't find it in updates.

thanks

Uwe Brauer

Alex Valavanis (valavanisalex) wrote :

Intrepid Ibex reached end-of-life on 30 April 2010 so I am closing the report. The bug has been fixed in newer releases of Ubuntu.

Changed in torque (Ubuntu Intrepid):
status: Confirmed → Invalid
Rolf Leggewie (r0lf) wrote :

Hardy has seen the end of its life and is no longer receiving any updates. Marking the Hardy task for this ticket as "Won't Fix".

Changed in torque (Ubuntu Hardy):
status: Confirmed → Won't Fix