test_tomcat_daemon smoke test failure on images with 3.13 kernel
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tomcat7 (Ubuntu) | | High | Unassigned | |
Bug Description
Since the 20140109 images, which introduced the 3.13 kernel, test_tomcat_daemon has been failing in smoke testing.
Impacted jobs are:
http://
and
http://
Reporting against tomcat although I am not entirely sure where exactly the issue is.
Steps:
1. Install a Tomcat Java server on a KVM (libvirt) guest, following:
http://
2. Log into the machine and run the test:
http://
or run the following
ubuntu@ubuntu:~$ sudo netstat -ltnp | grep java
tcp6 0 0 :::8080 :::* LISTEN 1100/java
Note: Downgrading the kernel to 3.12.0-7-generic makes the test pass, given a 60-second sleep for the daemon to start.
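Since the note above relies on a fixed 60 second sleep, a polling check may make the timing easier to measure. The following is a minimal sketch, not the actual smoke test: the helper name `wait_for_port` and its parameters are hypothetical, and it assumes the test only needs to know when something starts accepting connections on the port shown in the netstat output.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns the number of seconds waited, or None if the timeout expired.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Connection refused or timed out: the daemon is not up yet.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 8080, timeout=300.0)` on the affected guest would report how long the listener actually took to appear, rather than only pass or fail after a fixed sleep.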
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: tomcat7 7.0.47-1
ProcVersionSign
Uname: Linux 3.13.0-1-generic i686
ApportVersion: 2.13.1-0ubuntu1
Architecture: i386
Date: Tue Jan 14 16:38:54 2014
InstallationDate: Installed on 2014-01-10 (4 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Alpha i386 (20140110)
PackageArchitec
SourcePackage: tomcat7
UpgradeStatus: No upgrade log present (probably fresh install)
modified.
James Page (james-page) wrote : | #3 |
James Page (james-page) wrote : | #4 |
I suspect that this points to some sort of problem with entropy generation in /dev/random, which is what gets used by default.
Robie Basak (racb) wrote : | #5 |
If the problem is a shortage of entropy in /dev/random, then I suggest that tests replace /dev/random with a symlink to /dev/urandom instead. urandom should be good enough for testing purposes, since we're not testing the quality of entropy sources.
Does this resolve the issue?
Andy Whitcroft (apw) wrote : | #6 |
If the test @Robie suggests above resolves the issue, that would suggest the commit below (introduced to improve security) is the underlying change that makes this worse:
commit 40db23e5337d99f
Author: Theodore Ts'o <email address hidden>
Date: Sun Nov 3 00:15:05 2013 -0400
random: make add_timer_
Change add_timer_
the nonblocking pool first if it hasn't been fully initialized yet.
This matches the strategy we use in add_interrupt_
allows us to push the randomness where we need it the most during when
the system is first booting up, so that get_random_bytes() and
/dev/urandom become safe to use as soon as possible.
Signed-off-by: "Theodore Ts'o" <email address hidden>
Note that it has been suggested that the machine does come up if you wait long enough. If we accept that this is indeed entropy related, then an init job that stalls boot progress (preventing ssh etc.) while waiting on entropy also shuts off every source of entropy other than the network. This implies that pinging the machine during this phase, or increasing the rate of ssh attempts, might make it boot faster, which would also confirm this conjecture.
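One way to gather evidence for or against the entropy conjecture is to log the kernel's own entropy estimate while Tomcat is starting. The sketch below assumes the standard Linux procfs file `/proc/sys/kernel/random/entropy_avail`; the function names are hypothetical, and this is not part of the existing test.

```python
import time

# Standard Linux procfs file exposing the kernel's entropy estimate, in bits.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate in bits."""
    with open(path) as f:
        return int(f.read().strip())


def sample_entropy(samples, interval=1.0, path=ENTROPY_AVAIL):
    """Collect (elapsed_seconds, entropy_bits) pairs at a fixed interval."""
    start = time.monotonic()
    readings = []
    for i in range(samples):
        readings.append((time.monotonic() - start, read_entropy_avail(path)))
        # Sleep between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

If the readings stay near zero until the 8080 listener appears, and rise faster while the guest is being pinged, that would be consistent with the conjecture above.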
Changed in tomcat7 (Ubuntu):
status: New → Confirmed
importance: Undecided → High
James Page (james-page) wrote : | #7 |
Replacing the use in the test case is all well and good, but this is actually an issue that will impact users.
James Page (james-page) wrote : | #8 |
Using /dev/urandom does resolve the issue, but is it as secure?
Robie Basak (racb) wrote : | #9 |
> Using /dev/urandom does resolve the issue, but is it as secure?
There are more things that could happen to make it less secure now. My understanding: though the early entropy is going to /dev/urandom now, there may be more things that feed from /dev/urandom (thus using that entropy up), and there is nothing to hold Tomcat back to wait for more entropy like there was before.
Must Tomcat block everything while it is waiting for entropy, or does the system still boot?
Fundamentally, the issue is that the system needs an early entropy source, and VMs have little. The kernel decides what is safe and available to use, and so if Tomcat wants high quality entropy and the kernel now says to wait, it'll have to wait.
I wonder if there is a bigger-picture solution to this. What if, for example, an external source could optionally provide some entropy to the VM for early boot? cloud-init could then take it and feed the kernel, at least for the first-ever boot. Disadvantages: you have to trust the host more than you did before; the fed entropy would have been stored, and would thus be vulnerable to compromise; people may do it wrong. I'm no expert, though, and something like this definitely needs to be checked by an expert before doing it.
Stefan Bader (smb) wrote : | #10 |
James/Robie, where would one tweak which random source is used? I am currently playing around with it, but I don't understand why tweaking a certain new setting has the effect it seems to have...
So there is this new /proc/sys/
Btw, the system does boot for me. It is just the 8080 socket, which the test case checks for, that seems to appear only later.
Robie Basak (racb) wrote : | #11 |
Useful reading: http://
/dev/urandom does not block, at the cost of not necessarily providing the best-quality entropy. /dev/random is recommended for long-term cryptographic use, at the cost of blocking when the kernel doesn't have enough entropy available.
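The blocking difference described here can be probed directly by opening each device with `O_NONBLOCK`, so that a read which would otherwise block returns immediately. This is a rough sketch; the helper name `read_nonblocking` is hypothetical, and how often /dev/random would block depends on the kernel version in use.

```python
import os


def read_nonblocking(path, nbytes):
    """Try to read nbytes from path without blocking.

    Returns the bytes obtained (possibly fewer than requested), or None
    if the read would have blocked.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:
        # EAGAIN: the device has nothing to hand out right now.
        return None
    finally:
        os.close(fd)
```

On an entropy-starved guest, `read_nonblocking("/dev/random", 16)` returning None or a short result while `read_nonblocking("/dev/urandom", 16)` returns all 16 bytes would match the behaviour described above.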
If the problem is that in general use Tomcat takes longer to start listening on its socket (but is otherwise unaffected), then I think that the immediate problem could be fixed in the test case. It could use the symlink trick to simulate a system that does always have enough entropy available.
Question: should the test case be checking that Tomcat works eventually when enough entropy is provided, or that Tomcat starts listening quickly on an entropy-starved system?
An obvious secondary problem for users is "so how do I get enough entropy to get my VM running Tomcat to start listening faster, then?". The answer to this is the same as always - from the usual sources, including the option of an external hardware entropy source passed through to the VM, or from some other external source and fed in to the kernel from userspace.
A tertiary, perhaps blueprint-level item might be to make it easier for users to get entropy to their VMs, in order to make Ubuntu VM use better in general for all our users. This might involve us recommending a method and making it more automatic, for example via cloud-init.
SecureRandom generation is taking a lot longer than normal, meaning that the tomcat7 instance has not fully started before the tests are run.
We saw this before but I can't remember what caused it.