procps runs too early in the boot process
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
| procps (Ubuntu) | | Medium | James Hunt | |
| Lucid | | Medium | James Hunt | |
| Maverick | | Medium | James Hunt | |
| Natty | | Medium | James Hunt | |
| Oneiric | | Medium | Unassigned | |
| Precise | | Medium | James Hunt | |
Bug Description
Binary package hint: upstart
The start on criteria for procps.conf is:
start on virtual-filesystems
This runs before some kernel modules are loaded, so procps applies the settings before they "exist". This is most noticeable with network and network-related jobs (nfs, bridge).
This bug may be considered a duplicate of LP Bug #690433. I am opening a new one anyway, however, because I think it's worth considering a more robust solution that would work for any possible kernel module.
Related branches
- Steve Langasek: Pending requested 2011-11-16
- Ubuntu branches: Pending requested 2011-11-16
-
Diff: 31 lines (+12/-1), 2 files modified
debian/changelog (+9/-0)
debian/upstart (+3/-1)
- Steve Langasek: Pending requested 2011-12-07
-
Diff: 32 lines (+13/-1), 2 files modified
debian/changelog (+9/-0)
debian/upstart (+4/-1)
Mark Russell (marrusl) wrote : | #1 |
James Hunt (jamesodhunt) wrote : | #2 |
This is an issue with the procps package, not Upstart.
affects: | upstart (Ubuntu) → procps (Ubuntu) |
Tom Ellis (tellis) wrote : | #3 |
In another customer case, I noticed this too which is the same as #690433:
net.bridge.
net.bridge.
net.bridge.
net.bridge.
These depend on the bridge module being loaded, and because of this bug the settings aren't applied at all. There is a workaround in the other bug that involves an additional upstart job to set the network-related sysctls, but this isn't a good solution, just a temporary workaround.
Changed in procps (Ubuntu Lucid): | |
status: | New → Triaged |
Changed in procps (Ubuntu Maverick): | |
status: | New → Triaged |
Changed in procps (Ubuntu Natty): | |
status: | New → Triaged |
Changed in procps (Ubuntu Oneiric): | |
status: | New → Triaged |
Changed in procps (Ubuntu Lucid): | |
importance: | Undecided → Medium |
Changed in procps (Ubuntu Maverick): | |
importance: | Undecided → Medium |
Changed in procps (Ubuntu Natty): | |
importance: | Undecided → Medium |
Changed in procps (Ubuntu Oneiric): | |
importance: | Undecided → Medium |
James Hunt (jamesodhunt) wrote : | #4 |
= Short Answer =
The immediate fix (for the majority of cases) is to modify /etc/init/
start on virtual-filesystems or started networking
= Long Answer (brace yourselves! :-) =
This is an interesting issue. The ideal is a generic solution to the problem but unfortunately this cannot be provided at this point in time. The following attempts to explain why...
== Summary of Current Behaviour ==
The sysctl(8) facility allows *kernel parameters* to be set at any time. The current procps.conf Upstart job is started as early as possible to achieve this. The job calls sysctl which sets the specific kernel parameters by reading from /etc/sysctl.conf and /etc/sysctl.d/* and writing these values to /proc/sys/*.
Therefore, /proc must be mounted read-write and /etc must be mounted at least read-only. These requirements are satisfied by the "start on" condition specified in procps.conf.
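The relationship between a sysctl key and the file it ends up in under /proc/sys can be sketched in a few lines of shell. This is only an illustration, not code from the procps job: the function names and the `PROC_SYS_ROOT` override are hypothetical, introduced so the mapping can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: map a sysctl key to the file under /proc/sys that backs it.
# PROC_SYS_ROOT is a hypothetical override for trying this against a
# scratch directory; it is not something procps itself reads.
sysctl_key_to_path() {
    # Dots in the key become directory separators.
    printf '%s/%s\n' "${PROC_SYS_ROOT:-/proc/sys}" "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeeds only if the running kernel currently exposes the key.
sysctl_key_exists() {
    [ -e "$(sysctl_key_to_path "$1")" ]
}
```

For example, `sysctl_key_to_path kernel.printk` prints `/proc/sys/kernel/printk`. A key whose file is not there yet cannot be written, which is the situation this bug is about.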
This works for all "built-in" kernel functionality, that is, all functionality that isn't provided by kernel modules. It should be
pointed out that some settings *must* be applied at this earliest point, for example "kernel.printk" (aka /proc/sys/
== The Two Types of Parameters ==
However, some kernel modules provide new kernel parameters. Note that there is a distinction between *kernel parameters* and *module parameters*:
- kernel parameters are set via sysctl(8) and sysctl can be called any
number of times to change these values at any time after the
functionality is available in the kernel.
- module parameters are set *once*: when the kernel module is loaded
(generally via modprobe(8)). Once they are set you cannot change them
until you reload the module.
But the picture isn't as clear-cut as that since some modules link their module parameters with kernel parameters such that when you load a module, you can specify values for the parameters, but you can *also* change these same parameters via sysctl after the module has been loaded. The "sunrpc" module is a good example of this:
sunrpc.
sunrpc.
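Because module-provided kernel parameters only appear once the module is loaded, it can be useful to see which configured keys are not present yet at a given point in the boot. The following is a rough sketch under stated assumptions: it assumes the usual `key = value` syntax with `#` or `;` comment lines, and both `missing_sysctl_keys` and the `PROC_SYS_ROOT` override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: list the keys in a sysctl config file that the running kernel
# does not expose yet (typically because the providing module is not loaded).
# PROC_SYS_ROOT is a hypothetical override used for dry runs.
missing_sysctl_keys() {
    conf=$1
    root=${PROC_SYS_ROOT:-/proc/sys}
    # Drop comment lines (# or ;) and blank lines before parsing.
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS= read -r line; do
        # Ignore anything that is not a "key = value" assignment.
        case $line in *=*) ;; *) continue ;; esac
        # The key is everything before the first "=", minus whitespace.
        key=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        path="$root/$(printf '%s' "$key" | tr '.' '/')"
        [ -e "$path" ] || printf '%s\n' "$key"
    done
}
```

Running `missing_sysctl_keys /etc/sysctl.conf` early in the boot and again after the network is up would show which settings the first sysctl pass could not have applied.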
Further, let's summarise the available methods for setting *kernel parameters* and *module parameters*:
=== Methods for Setting Parameters ===
== Kernel Parameters ==
(1) The "procps.conf" Upstart job (already covered above).
(2) Running sysctl(8) manually.
== Module Parameters ==
(1) The "module-
This is available when you install the bridge-utils package, which is
not installed by default. This job reads a list of modules and
optional parameters from "/etc/modules" and loads those modules by
calling modprobe(8) directly.
(2) The "/etc/modprobe.
As specified in modprobe.conf(5), these files can specify some quite
clever options. The crucial point though is that these files can
specify modules to load along with parameters for those modules
using the "options" keyword.
Note that there is *no* Upstart job that uses these files: there
doesn't need to be since any invocation ...
Changed in procps (Ubuntu Precise): | |
assignee: | nobody → James Hunt (jamesodhunt) |
Launchpad Janitor (janitor) wrote : | #5 |
This bug was fixed in the package procps - 1:3.2.8-11ubuntu2
---------------
procps (1:3.2.8-11ubuntu2) precise; urgency=low
* Make procps job run twice: as early as possible (for kernel
parameters such as kernel.printk) and then after all network
interfaces are up (to account for any kernel parameters relating
to recently loaded networking modules) (LP: #771372).
-- James Hunt <email address hidden> Wed, 16 Nov 2011 15:17:38 +0000
Changed in procps (Ubuntu Precise): | |
status: | Triaged → Fix Released |
Anders Kaseorg (andersk) wrote : | #6 |
Setting up procps (1:3.2.8-11ubuntu2) ...
Installing new version of config file /etc/init/
start: Unknown parameter: UPSTART_EVENTS
invoke-rc.d: initscript procps, action "start" failed.
dpkg: error processing procps (--configure):
subprocess installed post-installation script returned error exit status 1
Harry (harry33) wrote : | #7 |
Same error here.
It happens on both 32-bit and 64-bit setups.
Martin Pitt (pitti) wrote : | #8 |
Upgrade error was reported as bug 891369. I'll revert that upload for now to not break upgrades for too long.
Martin Pitt (pitti) wrote : | #9 |
Under the new precise aegis of reverting broken uploads if they don't make the situation worse, I did that now.
procps (1:3.2.8-11ubuntu3) precise; urgency=low
* debian/upstart: Revert previous upload, breaks upgrades. (LP: #891369)
-- Martin Pitt <email address hidden> Thu, 17 Nov 2011 06:04:10 +0100
Changed in procps (Ubuntu Precise): | |
status: | Fix Released → Triaged |
Steve Langasek (vorlon) wrote : | #10 |
(Re-)fixed in 1:3.2.8-11ubuntu4, but I forgot to add the bug number, sorry. Changelog entry:
procps (1:3.2.8-11ubuntu4) precise; urgency=low
.
* Reintroduce the patch from -11ubuntu2 to run sysctl twice, this time
with an upgrade-proof upstart job.
Changed in procps (Ubuntu Precise): | |
status: | Triaged → Fix Released |
Changed in procps (Ubuntu Lucid): | |
status: | Triaged → In Progress |
Changed in procps (Ubuntu Maverick): | |
status: | Triaged → In Progress |
Changed in procps (Ubuntu Natty): | |
status: | Triaged → In Progress |
Hello Mark, or anyone else affected,
Accepted procps into oneiric-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in procps (Ubuntu Oneiric): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed |
Martin Pitt (pitti) wrote : | #12 |
Hello Mark, or anyone else affected,
Accepted procps into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in procps (Ubuntu Natty): | |
status: | In Progress → Fix Committed |
Changed in procps (Ubuntu Lucid): | |
status: | In Progress → Fix Committed |
Martin Pitt (pitti) wrote : | #13 |
Hello Mark, or anyone else affected,
Accepted procps into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in procps (Ubuntu Maverick): | |
status: | In Progress → Fix Committed |
Martin Pitt (pitti) wrote : | #14 |
Hello Mark, or anyone else affected,
Accepted procps into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Simon Déziel (sdeziel) wrote : | #15 |
This does not work on my Lucid. The sysctl settings related to the bridge modules are not set to the values defined under /etc/sysctl.
tags: | added: verification-failed-lucid |
Tobin Davis (gruemaster) wrote : | #16 |
The package currently in maverick-proposed fails to install when doing a netinstall with d-i apt-setup/proposed boolean true in the preseed. I would assume this will fail once the package hits maverick-updates.
I have tested this on AMD64 and armel images. Here is the relevant log data:
Dec 5 21:05:48 in-target: Setting up procps (1:3.2.
Dec 5 21:05:48 in-target: Installing new version of config file /etc/init/
Dec 5 21:05:48 in-target: start: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/
Dec 5 21:05:48 in-target: dpkg: error processing procps (--configure):
Dec 5 21:05:48 in-target: subprocess installed post-installation script returned error exit status 1
Tobin Davis (gruemaster) wrote : | #17 |
Full syslog of the failing install on amd64.
Colin Watson (cjwatson) wrote : | #18 |
I think the problem is that the procps postinst in maverick uses 'start procps' rather than 'invoke-rc.d procps start || exit $?' as is used in precise. See bug 602896 for the fix.
This is a pre-existing bug in maverick, but it was dormant because the installer didn't need to upgrade procps in this phase of installation until now. I think the fix for bug 602896 needs to be backported to maverick-proposed. This bug was fixed in natty, and, oddly, lucid didn't suffer from it in the first place because it doesn't appear to start the procps job in the postinst at all.
tags: | added: verification-failed-maverick |
Changed in procps (Ubuntu Maverick): | |
status: | Fix Committed → In Progress |
Steve Langasek (vorlon) wrote : | #19 |
Simon,
> This does not work on my Lucid. The sysctl settings related to
> the bridge modules are not set to the values defined under
> /etc/sysctl.
Oh. Well, that's actually perfectly understandable, because the static-network-up event was only introduced in oneiric!
James, that means we need a different fix here for lucid/maverick/
Changed in procps (Ubuntu Lucid): | |
status: | Fix Committed → Triaged |
assignee: | nobody → James Hunt (jamesodhunt) |
Changed in procps (Ubuntu Maverick): | |
status: | In Progress → Triaged |
assignee: | nobody → James Hunt (jamesodhunt) |
Changed in procps (Ubuntu Natty): | |
status: | Fix Committed → Triaged |
assignee: | nobody → James Hunt (jamesodhunt) |
James Hunt (jamesodhunt) wrote : | #20 |
@Simon (and others running Lucid): Please could you try modifying your /etc/init/
instance $UPSTART_EVENTS
start on virtual-filesystems or stopped networking
Simon Déziel (sdeziel) wrote : | #21 |
@James, your suggestion worked. Note that I dropped the "env UPSTART_EVENTS=". Here is the job definition I used.
# grep -v "#" /etc/init/
description "set sysctls from /etc/sysctl.conf"
instance $UPSTART_EVENTS
start on virtual-filesystems or stopped networking
task
script
cat /etc/sysctl.
end script
By the way, that also fixes LP: #690433. Thank you !
Simon Déziel (sdeziel) wrote : | #22 |
James, I added back the "env UPSTART_EVENTS=" because manually starting the job complained that this variable was unknown. With and without it, the sysctl keys are updated, so that looks good on that side.
One thing that I noted in those 2 reboot tests is that my ntpd is exiting after successfully binding on a few interfaces. Could that be related to running procps twice ? Here is an extract of the ntpd error :
Dec 6 16:16:53 xeon ntpd[4630]: kernel time sync status 2040
Dec 6 16:16:54 xeon ntpd[4630]: Deleting interface #11 vnet0, fe80::fc54:
...
Dec 6 16:16:54 xeon ntpd[4630]: Deleting interface #22 br-vpn0, fe80::6075:
Dec 6 16:17:01 xeon CRON[4781]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 6 16:17:03 xeon ntpd[4630]: ntpd exiting on signal 15
Dec 6 16:17:03 xeon ntpdate[4807]: the NTP socket is in use, exiting
Dec 6 16:17:03 xeon ntpd[4834]: ntpd 4.2.4p8@1.1612-o Tue Apr 19 07:08:18 UTC 2011 (1)
Dec 6 16:17:03 xeon ntpd[4835]: precision = 1.000 usec
Dec 6 16:17:03 xeon ntpd[4835]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Dec 6 16:17:03 xeon ntpd[4835]: unable to bind to wildcard socket address 0.0.0.0 - another process may be running - EXITING
There has always been a "competition" between ntpd and ntpdate, but ntpd apparently did not have a problem before.
Simon Déziel (sdeziel) wrote : | #23 |
I found traces of the ntpd issue dating from November 10th so that's unrelated, sorry for the noise.
Simon Déziel (sdeziel) wrote : | #24 |
@James, the following upstart job that you suggested fixes the issue on Lucid. Would it be possible to have it included in the next update? Thanks.
# procps - set sysctls from /etc/sysctl.conf
#
# This task sets kernel sysctl variables from /etc/sysctl.conf and
# /etc/sysctl.d
description "set sysctls from /etc/sysctl.conf"
instance $UPSTART_EVENTS
env UPSTART_EVENTS=
start on virtual-filesystems or stopped networking
task
script
cat /etc/sysctl.
end script
Martin Pitt (pitti) wrote : | #25 |
Hello Mark, or anyone else affected,
Accepted procps into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
tags: | removed: verification-failed-lucid |
Changed in procps (Ubuntu Lucid): | |
status: | Triaged → Fix Committed |
Simon Déziel (sdeziel) wrote : | #26 |
The fix in lucid-proposed fixed my issue, many thanks.
tags: | added: verification-done-lucid |
Martin Pitt (pitti) wrote : | #27 |
Hello Mark, or anyone else affected,
Accepted procps into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in procps (Ubuntu Maverick): | |
status: | Triaged → Fix Committed |
Martin Pitt (pitti) wrote : | #28 |
Hello Mark, or anyone else affected,
Accepted procps into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in procps (Ubuntu Natty): | |
status: | Triaged → Fix Committed |
Andreas Ntaflos (daff) wrote : | #29 |
We applied the proposed fix to procps on a test machine but upon rebooting the problem remained, with the following message appearing in /var/log/boot.log:
init: procps (virtual-
We use /etc/sysctl.conf to disable frame filtering for bridges and vlans:
net.bridge.
net.bridge.
net.bridge.
net.bridge.
The machine is a Dell R710, but I don't think that matters. What other info can we provide?
Steve Langasek (vorlon) wrote : | #30 |
With the fix for bug #602896 included in a subsequent maverick SRU, this should no longer cause problems at install time. Resetting the verification status for maverick (this is no longer "failed"). Is someone willing/able to test this fix on maverick?
tags: | removed: verification-failed-maverick |
Peter Matulis (petermatulis) wrote : | #31 |
When will the fix for Lucid be released? The LTS fix should get out there in my opinion.
Clint Byrum (clint-fewbar) wrote : | #32 |
Peter, thanks for the bump. This just needed the verification-done flag for our process to catch it. I'll take a look at doing the sru-release tomorrow morning.
tags: | added: verification-done |
Peter Matulis (petermatulis) wrote : | #33 |
Clint, isn't this tag sufficient to get out the fix for Lucid:
verification-
?
Clint Byrum (clint-fewbar) wrote : Re: [Bug 771372] Re: procps runs too early in the boot process | #34 |
Excerpts from Peter Matulis's message of Fri Mar 09 12:50:44 UTC 2012:
> Clint, isn't this tag sufficient to get out the fix for Lucid:
>
> verification-
>
No, that tag is used for informational purposes only.
In the pending-sru report here:
http://
The only way we're going to see that it is ready for release is that the
bugs for a package are all green or purple. Green means verification-done,
purple means verification-done *and* verification-
by verification only being done in one of the releases.
We don't want to make the release-specific tags the way to get it
released, because we have to consider the impact of having it fixed in
one release, and not in the next, so purple is a signal to us on that
report that we need to think carefully about whether or not to release.
Clint Byrum (clint-fewbar) wrote : | #35 |
For instance, in this case, I think it's OK to release to lucid-updates in spite of the fact that maverick has not been verified, because precise is scheduled to be released soon and maverick has reached EOL. So in this case, there's little danger that a user will upgrade and then find that things have regressed in a dangerous way.
Peter Matulis (petermatulis) wrote : | #36 |
Thank you for that nice explanation.
Launchpad Janitor (janitor) wrote : | #37 |
This bug was fixed in the package procps - 1:3.2.8-1ubuntu4.2
---------------
procps (1:3.2.
* Make procps job run twice: as early as possible (for kernel
parameters such as kernel.printk) and then after all network
interfaces are up (to account for any kernel parameters relating
to recently loaded networking modules) (LP: #771372).
procps (1:3.2.
[ James Hunt ]
* Make procps job run twice: as early as possible (for kernel
parameters such as kernel.printk) and then after all network
interfaces are up (to account for any kernel parameters relating
to recently loaded networking modules) (LP: #771372).
-- James Hunt <email address hidden> Wed, 07 Dec 2011 14:53:24 +0000
Changed in procps (Ubuntu Lucid): | |
status: | Fix Committed → Fix Released |
Andreas Ntaflos (daff) wrote : | #38 |
Is this now really fixed in Lucid? As I mentioned in comment #29, we applied this update from lucid-proposed and continue to see this error in /var/log/boot.log:
init: procps (virtual-
The initial problem remains the same, the sysctl settings do not get applied in time.
We need to use this workaround in order to get VLAN trunks into Libvirt-managed virtual machines (KVM):
/etc/rc.local:
# At this point automatic startup of libvirt-bin.conf is disabled, we start it manually below
/sbin/sysctl -p /etc/sysctl.conf
/etc/init.
/etc/init.
exit 0
This is on Ubuntu 10.04.4, x86_64 on all of our Dell R710 servers.
tags: | removed: verification-done verification-done-lucid |
Clint Byrum (clint-fewbar) wrote : | #39 |
Andreas, it's likely that the update just fixed it for some use cases. However, I would have expected it to fix your particular use case, because your VLAN trunks should be up as soon as 'stopped networking' is emitted, since that is emitted as soon as 'ifup -a' exits.
The error you see is likely because there are modules loaded later that support your sysctl.conf settings, so some of them don't exist yet. It shouldn't cause serious issues, but if you want to debug it, perhaps add a 2> /run/sysctl.errors to see why it exited.
As for the libvirt issue, that's caused by libvirt's start on:
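The debugging idea above (keeping sysctl's error output instead of losing it) can be sketched as a small wrapper. This is an assumption-laden illustration rather than the packaged job: `apply_sysctl_files` and the `SYSCTL_BIN` override are hypothetical names, the log path is whatever the caller passes, and the only sysctl feature relied on is `sysctl -p FILE`.

```shell
#!/bin/sh
# Sketch: apply each sysctl file in turn and keep the error output.
# SYSCTL_BIN is a hypothetical override so the loop can be exercised
# without root; by default the real sysctl(8) is used.
apply_sysctl_files() {
    errlog=$1
    shift
    status=0
    for conf in "$@"; do
        # Skip files that are missing or unreadable.
        [ -r "$conf" ] || continue
        # "sysctl -p FILE" loads settings from FILE; stderr is appended to the log.
        if ! "${SYSCTL_BIN:-sysctl}" -p "$conf" >/dev/null 2>>"$errlog"; then
            printf 'failed: %s\n' "$conf" >>"$errlog"
            status=1
        fi
    done
    return "$status"
}
```

A call such as `apply_sysctl_files /run/sysctl.errors /etc/sysctl.conf` would leave the reason for a non-zero exit in the log file, assuming the chosen log directory exists and is writable at that point in the boot.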
start on runlevel [2345] and stopped networking
This presents a potential race condition, since 'stopped networking' may happen after runlevel [2345].
Your best bet for a workaround is probably to add another condition to procps so it reads:
start on virtual-filesystems or stopped networking or starting libvirt-bin
That will ensure that it runs before libvirt-bin is started.
I do actually think the fix for this is not complete, as we should still be trying to set these sysctl settings before any services that depend on them start. I think this will require a larger refactoring, which we've been discussing for the "Q" release cycle, so that there is a barrier between bootup activities and the starting of services; that barrier would be an abstract 'network-services' job.
Vincent Bernat (vbernat) wrote : | #40 |
The proposed fix is quite disruptive. For example, if a sysctl is set in /etc/network/
iface dmz.902 inet static
[...]
up sysctl -w net.ipv4.
up sysctl -w net.ipv4.
This setup worked fine before update and has worked for many years without surprise. It may seem odd to disable "all.rp_filter" in /etc/network/
There are other failing scenarios: the network coming up may trigger the start of a routing daemon that enables IP forwarding, which is then disabled again by the procps job. This can be quite racy.
In short, it seems wrong to modify sysctl settings in the middle of the boot. Other jobs/daemons may have altered the settings.
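One way to see which settings a second sysctl pass would overwrite is to compare the configured values with the live ones before applying anything. The sketch below is illustrative only: `sysctl_would_change` and `PROC_SYS_ROOT` are hypothetical names, and it assumes plain `key = value` lines with `#` or `;` comments.

```shell
#!/bin/sh
# Sketch: print the keys whose live value differs from the configured one,
# i.e. the settings a second sysctl run would overwrite.
# PROC_SYS_ROOT is a hypothetical override used for dry runs.
sysctl_would_change() {
    conf=$1
    root=${PROC_SYS_ROOT:-/proc/sys}
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS= read -r line; do
        case $line in *=*) ;; *) continue ;; esac
        key=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        # Normalise whitespace so "4 4 1 7" compares equal to tab-separated values.
        want=$(printf '%s' "${line#*=}" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')
        path="$root/$(printf '%s' "$key" | tr '.' '/')"
        # Keys that do not exist (yet) cannot be overwritten; skip them.
        [ -r "$path" ] || continue
        have=$(tr -s '[:space:]' ' ' <"$path" | sed -e 's/^ //' -e 's/ $//')
        [ "$want" = "$have" ] || printf '%s: %s -> %s\n' "$key" "$have" "$want"
    done
}
```

If a routing daemon has set net.ipv4.ip_forward to 1 while the config file says 0, the function reports `net.ipv4.ip_forward: 1 -> 0`, which is exactly the kind of mid-boot reversal described above.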
Steffen Neumann (sneumann) wrote : | #41 |
Hi, we're hit by exactly the problem Vincent described in #40.
This happened to us on 12.04 and only appeared after an upgrade on 24.9.2012.
Please see https:/
for some more details.
The problem does not appear on 12.10 anymore, maybe due to the above mentioned refactoring.
Nevertheless, not being able to run sysctl during network interface configuration
seems badly broken. Is there a workaround, should this be re-opened or added as a new bug ?
Yours,
Steffen and Paul.
tags: | added: bot-stop-nagging |
Vincent Ouwehand (4e0a8aa4) wrote : | #42 |
This bug seems to be back in Trusty Tahr. I ran into the exact issue described here just now, no more than an hour after updating the system.
Rolf Leggewie (r0lf) wrote : | #43 |
maverick has seen the end of its life and is no longer receiving any updates. Marking the maverick task for this ticket as "Won't Fix".
Changed in procps (Ubuntu Maverick): | |
status: | Fix Committed → Won't Fix |
Rolf Leggewie (r0lf) wrote : | #44 |
natty has seen the end of its life and is no longer receiving any updates. Marking the natty task for this ticket as "Won't Fix".
Changed in procps (Ubuntu Natty): | |
status: | Fix Committed → Won't Fix |
Rolf Leggewie (r0lf) wrote : | #45 |
oneiric has seen the end of its life and is no longer receiving any updates. Marking the oneiric task for this ticket as "Won't Fix".
Changed in procps (Ubuntu Oneiric): | |
status: | Fix Committed → Won't Fix |
Hua-Jung Chu (petertc-chu) wrote : | #46 |
10.04 has the same problem, and should be Won't Fix.
In one customer's case, they changed their procps.conf start on criteria to:
start on (virtual-filesystems or started portmap)
This allowed these settings to be applied properly:
sunrpc.udp_slot_table_entries = 128
sunrpc.tcp_slot_table_entries = 128
But all that does is work around the issue. If you add another module later and put related entries in sysctl.conf, you still can't be sure they will apply correctly without testing it out. If it doesn't work, you have to think about fiddling with jobs again (see Bug #690433 for example workarounds). It would be nice to find a way of automatically loading and/or reloading procps when appropriate.
For example, procps.conf could have start on criteria something like:
start on (virtual-filesystems or NEEDS_PROCPS=1)
or
start on (virtual-filesystems or KERNEL_MOD=1)
Of course it would still be up to those other jobs to export KERNEL_MOD and I'm probably overlooking something that makes this a no good, horrible idea. But it helps illustrate the point, I think.