Kubernetes test suite fails on mainline kernel 4.11+

Bug #1741887 reported by Chris Glass on 2018-01-08
This bug affects 1 person
Affects              Importance   Assigned to
linux-gke (Ubuntu)   High         Joseph Salisbury
Artful               High         Joseph Salisbury
Bionic               High         Joseph Salisbury

Bug Description

The to-be-introduced 4.13-based kernel fails the Kubernetes test suite (https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-ubuntudev2-k8sbeta-default/15?log#log).

After rough bisecting (using kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/ ), it seems that a kernel change introduced in 4.11 breaks those tests: 4.10 is good, while 4.11 and later, including the 4.11-rc* kernels, are bad.

The minimal reproducer (doesn't have to be on a cloud image - my bionic desktop reproduces it just as well):

sudo apt install docker.io
sudo docker run -d gcr.io/kubernetes-e2e-test-images/hostexec-amd64:1.0
(note down returned hash)
sudo docker exec -it (first few chars of returned hash) /bin/sh
# (Inside the docker prompt)
timeout -t 1 cat

On kernels exhibiting the problem, the "timeout/cat" command above will terminate the container.

On kernels exhibiting the problem, "docker logs" shows:

# sudo docker logs (first few chars of returned hash)
/bin/sh: can't open /fifo: Interrupted system call

On kernels that do not exhibit the problem, the container does not terminate after the "timeout" command, and the shell returns to a normal command prompt.
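
The good/bad check described above can be scripted once the "timeout -t 1 cat" step has been run by hand. The following is a minimal sketch and not part of the original report: the function names verdict and check_container are hypothetical, and it assumes that the container's running state, as printed by "docker inspect", is enough to tell a good kernel from a bad one.

```shell
#!/bin/sh
# Sketch only: verdict and check_container are hypothetical helper names.

# Map the container's running state ("true"/"false", as printed by
# docker inspect) to a bisect verdict.
verdict() {
    case "$1" in
        true)  echo good ;;     # container survived the timeout/cat step
        false) echo bad ;;      # container was terminated
        *)     echo unknown ;;  # state could not be determined
    esac
}

# Show the logs of one container and print the verdict for it. On affected
# kernels the logs contain "can't open /fifo: Interrupted system call".
check_container() {
    id=$1
    running=$(sudo docker inspect -f '{{.State.Running}}' "$id") || running=unknown
    sudo docker logs "$id"
    verdict "$running"
}
```

After leaving the container shell, "check_container (first few chars of returned hash)" would print the logs followed by good, bad, or unknown.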

Changed in linux-gke (Ubuntu):
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
status: New → In Progress
tags: added: performing-bisect
Chris Glass (tribaal) wrote :

Just tried with v4.15-rc7 from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and the problem persists.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.10 final and v4.11-rc1. The kernel bisect will require testing of about 13 test kernels.

I built the first test kernel, up to the following commit:
caa59428971d5ad81d19512365c9ba580d83268c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1741887/caa59428971d5ad81d19512365c9ba580d83268c

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
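
For readers wondering where "about 13 test kernels" comes from: a bisect roughly halves the suspect commit range with each tested kernel, so the number of kernels grows with the base-2 logarithm of the range size. The sketch below illustrates that arithmetic. The helper name bisect_steps is hypothetical, the endpoints in the comments are taken from the bug description (4.10 good, 4.11-rc* bad), and the figure is only an estimate because git bisect works on a commit graph rather than a strictly linear list.

```shell
#!/bin/sh
# Sketch only: bisect_steps is a hypothetical helper name.

# Print roughly how many test kernels a bisect needs for a range of N
# commits, i.e. ceil(log2(N)), by halving the range until one commit remains.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# In a kernel checkout, the range size and the bisect itself would be
# obtained with (endpoints assumed from the bug description):
#   git rev-list --count v4.10..v4.11-rc1
#   git bisect start
#   git bisect bad v4.11-rc1
#   git bisect good v4.10
```

As an illustration, "bisect_steps 8192" prints 13. The value 8192 is a round number chosen for the example, not the measured size of the range.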

Chris Glass (tribaal) wrote :

Kernel from #2 is good.

Chris Glass (tribaal) wrote :

Kernel from #4 is good.

Chris Glass (tribaal) wrote :

kernel from #6 is bad.

Chris Glass (tribaal) wrote :

kernel from #8 is good.

Chris Glass (tribaal) wrote :

kernel in #10 is bad.

Chris Glass (tribaal) wrote :

Kernel in #12 is good.

Joseph Salisbury (jsalisbury) wrote :

I built the first test kernel, up to the following commit:
601109c5c74a10e6b89465cb6aa31a40d1efc8e3

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1741887/601109c5c74a10e6b89465cb6aa31a40d1efc8e3

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Joseph Salisbury (jsalisbury) wrote :

Comment #14 should read "next" test kernel, not "first".

Chris Glass (tribaal) wrote :

Kernel in #14 is good.

Chris Glass (tribaal) wrote :

Kernel in #17 is bad

Joseph Salisbury (jsalisbury) wrote :

f1ef09fde17f9b77ca1435a5b53a28b203afb81c was good.

Next kernel is ready:

http://kernel.ubuntu.com/~jsalisbury/lp1741887/d6cffbbe9a7e51eb705182965a189457c17ba8a3

Chris Glass (tribaal) wrote :

Kernel in #19 is bad.

Chris Glass (tribaal) wrote :

Kernel in #21 is bad

Chris Glass (tribaal) wrote :

Kernel in #23 is good.

Chris Glass (tribaal) wrote :

Kernel in #25 is good.

Chris Glass (tribaal) wrote :

Kernel in #27 was good.

Chris Glass (tribaal) wrote :

Looks like the failure there is due to the following commit in the linux kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c6c70f4455d1eda91065e93cc4f7eddf4499b105

Changed in linux-gke (Ubuntu Artful):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I noticed this bug was still open. I built a test kernel with a revert of commit c6c70f4455d.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1741887

Can you test this kernel and see if it resolves this bug?

It might also be worth testing the v4.16-rc4 mainline kernel to see if a fix already exists:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/

Changed in linux-gke (Ubuntu Artful):
status: In Progress → Incomplete
Changed in linux-gke (Ubuntu Bionic):
status: In Progress → Incomplete