test_tomcat_daemon smoke test failure on images with 3.13 kernel

Bug #1269073 reported by Para Siva on 2014-01-14
This bug affects 2 people
Affects: tomcat7 (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

Starting with the 20140109 images, which introduced the 3.13 kernel, test_tomcat_daemon fails in smoke testing.

Impacted jobs are:
http://ci.ubuntu.com/smokeng/trusty/server/amd64/20140114/6057/tomcat-server/666899/
and
http://ci.ubuntu.com/smokeng/trusty/server/i386/20140114/6058/tomcat-server/666815/

Reporting against tomcat7, although I am not sure exactly where the issue lies.

Steps:
1. Install a Tomcat Java server in a KVM (libvirt) guest, following:
http://iso.qa.ubuntu.com/qatracker/testcases/1410/info

2. Log into the machine and run the test:
http://bazaar.launchpad.net/~ubuntu-server-dev/ubuntu-test-cases/server-tests-raring/view/head:/testsuites/tomcat-server/test_tomcat_daemon/test.py

Or run the following check manually:
ubuntu@ubuntu:~$ sudo netstat -ltnp | grep java
tcp6 0 0 :::8080 :::* LISTEN 1100/java

Note: Downgrading the kernel to 3.12.0-7-generic makes the test pass, with a 60-second sleep to let the daemon start.
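Rather than relying on a fixed 60-second sleep, a test could poll for the listener with a deadline, which tolerates slow starts without always paying the full wait. A minimal Python sketch, assuming the check is simply a TCP connect to port 8080 on the local machine; the helper name `wait_for_port` and its timeout values are hypothetical and not taken from the referenced test.py.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connect to (host, port) succeeds or `timeout` seconds pass.

    Returns True once something accepts the connection, False on timeout.
    (Hypothetical helper; the real smoke test may check differently.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some socket is listening on that port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the daemon is not listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080)` would return True as soon as Tomcat's connector is up, and False only after the full timeout has elapsed on an entropy-starved guest.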

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: tomcat7 7.0.47-1
ProcVersionSignature: Ubuntu 3.13.0-1.16-generic 3.13.0-rc7
Uname: Linux 3.13.0-1-generic i686
ApportVersion: 2.13.1-0ubuntu1
Architecture: i386
Date: Tue Jan 14 16:38:54 2014
InstallationDate: Installed on 2014-01-10 (4 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Alpha i386 (20140110)
PackageArchitecture: all
SourcePackage: tomcat7
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.tomcat7.tomcat.users.xml: [inaccessible: [Errno 13] Permission denied: '/etc/tomcat7/tomcat-users.xml']

Para Siva (psivaa) wrote :
description: updated
James Page (james-page) wrote :

SecureRandom generation is taking a lot longer than normal, so the tomcat7 instance has not fully started before the tests run.

We have seen this before, but I can't remember what caused it.

James Page (james-page) wrote :

I suspect that this points to some sort of problem with entropy generation in /dev/random, which is what SecureRandom uses by default.
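One way to check this suspicion on an affected guest is to watch the kernel's entropy estimate while Tomcat starts. A minimal Python sketch, assuming the standard Linux `/proc/sys/kernel/random/entropy_avail` interface; the helper names are illustrative. Persistently low readings during Tomcat startup would be consistent with, though not proof of, the /dev/random theory.

```python
import time
from pathlib import Path
from typing import List

# Linux exposes the kernel's current entropy estimate (in bits) here.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate in bits."""
    return int(Path(path).read_text().strip())


def sample_entropy(samples: int, interval: float = 1.0, path: str = ENTROPY_AVAIL) -> List[int]:
    """Read the entropy estimate `samples` times, `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(read_entropy_avail(path))
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

Running `sample_entropy(60)` right after `service tomcat7 restart` gives a one-minute trace to compare between the 3.12 and 3.13 kernels.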

Robie Basak (racb) wrote :

If the problem is a shortage of entropy in /dev/random, then I suggest that tests replace /dev/random with a symlink to /dev/urandom instead. urandom should be good enough for testing purposes, since we're not testing the quality of entropy sources.

Does this resolve the issue?
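A minimal Python sketch of the symlink idea, for a disposable test VM only. The helper name `symlinked` and the `.orig` backup suffix are hypothetical choices. It must run as root when applied to /dev/random, it only affects processes that open /dev/random after the swap (so Tomcat has to be restarted inside the block), and it restores the original device node on exit.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def symlinked(path: str, target: str) -> Iterator[None]:
    """Temporarily replace `path` with a symlink to `target`, restoring it on exit.

    Hypothetical test-only helper; never leave this in place on a real system.
    """
    backup = path + ".orig"
    os.rename(path, backup)  # keep the original node so it can be restored
    try:
        os.symlink(target, path)
        yield
    finally:
        if os.path.islink(path):
            os.unlink(path)
        os.rename(backup, path)  # put the original node back, even on error
```

For example, the test could restart tomcat7 and run its port check inside `with symlinked("/dev/random", "/dev/urandom"):`.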

Andy Whitcroft (apw) wrote :

If the test Robie suggests above resolves the issue, that might indicate that the commit below (introduced to improve security) is the underlying change that makes this worse:

  commit 40db23e5337d99fda05ee6cd18034b516f8f123d
  Author: Theodore Ts'o <email address hidden>
  Date: Sun Nov 3 00:15:05 2013 -0400

    random: make add_timer_randomness() fill the nonblocking pool first

    Change add_timer_randomness() so that it directs incoming entropy to
    the nonblocking pool first if it hasn't been fully initialized yet.
    This matches the strategy we use in add_interrupt_randomness(), which
    allows us to push the randomness where we need it the most during when
    the system is first booting up, so that get_random_bytes() and
    /dev/urandom become safe to use as soon as possible.

    Signed-off-by: "Theodore Ts'o" <email address hidden>

Note that it has been suggested that the machine does come up if you wait long enough. If we accept that this is indeed entropy-related, and an init job stalls boot progress (preventing ssh, etc.) while it waits for entropy, then every source of entropy other than the network will also be gone. This implies that pinging the machine during this phase, or increasing the rate of ssh attempts, might make it boot faster, which would also support this conjecture.

James Page (james-page) on 2014-01-16
Changed in tomcat7 (Ubuntu):
status: New → Confirmed
importance: Undecided → High
James Page (james-page) wrote :

Replacing the use in the test case is all well and good, but this is actually an issue that will affect users.

James Page (james-page) wrote :

Using /dev/urandom does resolve the issue, but is it as secure?

Robie Basak (racb) wrote :

> Using /dev/urandom does resolve the issue, but is it as secure?

There are now more ways for it to end up less secure. My understanding: although early entropy now goes to /dev/urandom first, there may be more consumers reading from /dev/urandom (thus using that entropy up), and nothing holds Tomcat back to wait for more entropy the way it was held back before.

Does Tomcat block everything while it waits for entropy, or does the system still boot?

Fundamentally, the issue is that the system needs an early entropy source, and VMs have few. The kernel decides what is safe and available to use, so if Tomcat wants high-quality entropy and the kernel now says to wait, it will have to wait.

I wonder if there is a bigger-picture solution to this. What if, for example, an external source could optionally provide some entropy to the VM for early boot? cloud-init could take it, for example, and feed it to the kernel, at least for the very first boot. Disadvantages: you have to trust the host more than you did before; the fed entropy would have to be stored, and would thus be vulnerable to compromise; and people may do it wrong. I'm no expert, though, and something like this definitely needs to be checked by an expert before anyone does it.

Stefan Bader (smb) wrote :

James/Robie, where would one tweak which random source is used? I'm currently playing around with it, but I don't understand why tweaking a certain new setting has the effect it seems to have...

There is a new /proc/sys/kernel/random/urandom_min_reseed_secs, which is said to be the time between reseeds of urandom (presumably from random). It defaults to 60s. If I change it to 5s, Tomcat seems to come up quicker. That would only make sense to me if whatever Tomcat is waiting for is already urandom.

By the way, the system does boot for me. It is just the 8080 socket (which the test case checks for) that seems to appear only later.
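Stefan's reseed-interval experiment can be scripted so the original value is restored afterwards. A minimal Python sketch, assuming the `/proc/sys/kernel/random/urandom_min_reseed_secs` file is present (as on the 3.13 kernel discussed here; it may be absent on other kernels); `set_proc_value` is an illustrative helper, and writing real /proc/sys entries requires root. It does the same as `sysctl -w kernel.random.urandom_min_reseed_secs=5`, but also returns the previous value.

```python
from pathlib import Path

# Present on the 3.13 kernel discussed here; may not exist on other kernels.
RESEED_SECS = "/proc/sys/kernel/random/urandom_min_reseed_secs"


def set_proc_value(path: str, value: int) -> int:
    """Write an integer to a /proc/sys-style file and return the previous value.

    Illustrative helper; requires root for real /proc/sys entries.
    """
    p = Path(path)
    previous = int(p.read_text().strip())
    p.write_text(f"{value}\n")
    return previous
```

For example, `old = set_proc_value(RESEED_SECS, 5)`, then restart tomcat7 and time the 8080 listener, then `set_proc_value(RESEED_SECS, old)`.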

Robie Basak (racb) wrote :

Useful reading: http://man7.org/linux/man-pages/man4/random.4.html and http://en.wikipedia.org/wiki//dev/random

/dev/urandom never blocks, at the cost of not always providing the best-quality entropy. /dev/random is recommended for long-term cryptographic use, but it blocks if the kernel doesn't have enough entropy available.
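The difference can be observed directly: opened with O_NONBLOCK, /dev/random on kernels of this era returns EAGAIN instead of waiting when the entropy estimate is too low (and may return fewer bytes than requested), while /dev/urandom always returns data. A minimal Python sketch; `read_nonblocking` is an illustrative helper, and later kernels changed the blocking behaviour of /dev/random.

```python
import os
from typing import Optional


def read_nonblocking(path: str, n: int) -> Optional[bytes]:
    """Try to read up to `n` bytes from `path` without blocking.

    Returns the bytes read (possibly fewer than `n`), or None if the read
    would have blocked (EAGAIN), e.g. /dev/random with too little entropy.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        return os.read(fd, n)
    except BlockingIOError:
        return None
    finally:
        os.close(fd)
```

On an entropy-starved guest, `read_nonblocking("/dev/random", 64)` returning None or a short result, while the same call on /dev/urandom returns 64 bytes, would illustrate exactly the trade-off described above.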

If the problem is that in general use Tomcat takes longer to start listening on its socket (but is otherwise unaffected), then I think that the immediate problem could be fixed in the test case. It could use the symlink trick to simulate a system that does always have enough entropy available.

Question: should the test case be checking that Tomcat works eventually when enough entropy is provided, or that Tomcat starts listening quickly on an entropy-starved system?

An obvious secondary problem for users is "so how do I get enough entropy for my VM running Tomcat to start listening faster?". The answer is the same as always: from the usual sources, including the option of an external hardware entropy source passed through to the VM, or from some other external source fed into the kernel from userspace.
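On feeding entropy to the kernel from userspace: simply writing bytes to /dev/random mixes them into the pool but does not raise the kernel's entropy estimate, so it would not unblock Tomcat. Crediting entropy requires the RNDADDENTROPY ioctl from `<linux/random.h>` and CAP_SYS_ADMIN. A minimal Python sketch, assuming the data comes from a genuinely unpredictable external source (crediting predictable data would undermine what /dev/random promises); the helper names are illustrative.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]) from <linux/random.h>
RNDADDENTROPY = 0x40085203


def pack_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Build struct rand_pool_info: int entropy_count (bits), int buf_size (bytes), then data."""
    return struct.pack("ii", entropy_bits, len(data)) + data


def credit_entropy(data: bytes, entropy_bits: int, device: str = "/dev/random") -> None:
    """Mix `data` into the kernel pool and credit `entropy_bits` bits (root only).

    Keep `data` under ~1000 bytes: fcntl.ioctl limits byte-string arguments to 1024 bytes.
    """
    with open(device, "wb", buffering=0) as f:
        fcntl.ioctl(f.fileno(), RNDADDENTROPY, pack_rand_pool_info(data, entropy_bits))
```

The entropy credit should be conservative, i.e. no more bits than the external source can honestly vouch for.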

A tertiary, perhaps blueprint-level, item might be to make it easier for users to get entropy into their VMs, improving the experience of running Ubuntu VMs in general for all our users. This might involve recommending a method and making it more automatic, for example via cloud-init.
