Deeply layered Docker image problems

Bug #1702979 reported by Edward Vielmetti on 2017-07-07
This bug affects 1 person
Affects                    Status        Importance  Assigned to           Milestone
docker.io (Ubuntu)         Invalid       High        Unassigned
docker.io (Ubuntu Xenial)  Fix Released  Undecided   Michael Hudson-Doyle

Bug Description

Docker 1.12.x is built with a version of Go which contains a bug (https://bugs.launchpad.net/ubuntu/+source/golang-defaults/+bug/1661222) in syscall.Getpagesize. As a result, Docker builds that create images with many layers fail.

If this bug is fixed in Go (golang), these Docker builds complete cleanly.

If not, you get mysterious behavior like that reported here:

https://github.com/ros2/ci/pull/73
https://github.com/ros2/ci/issues/75

for which the only reasonable workaround for now is to recommend against the system Docker and to install the latest-and-greatest instead.

We've seen a second customer using ARM64 Docker 1.12.6 on Ubuntu running into errors that could be explained by this issue. Would it be possible to triage it? Fixing the upstream Go bug and recompiling may be enough to solve it.
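
For anyone checking a machine against this description, here is a minimal sketch that prints the page size the running kernel reports, so it can be compared with whatever value the Go toolchain assumed. The function names are hypothetical, and it assumes getconf is installed. How to read the value compiled into a given Docker binary is not covered here.

```sh
#!/bin/sh
# Print the page size, in bytes, that the running kernel reports.
kernel_page_size() {
    getconf PAGESIZE
}

# Succeed if the kernel page size equals the value given as $1.
# $1 stands in for the page size the Go toolchain assumed; it must be
# supplied by the caller (this script does not discover it).
page_size_matches() {
    [ "$(kernel_page_size)" -eq "$1" ]
}
```

Calling page_size_matches with the value Go is believed to use, and getting a non-zero exit status, would be consistent with the mismatch described in bug 1661222.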

Michael Hudson-Doyle (mwhudson) wrote :

Fixing the go bug in the version of Go in Xenial is highly non-trivial and I don't really want to go there. Two other ways of approaching this would be:

1) Use Adam's suggestion from bug 1661222 of patching Go to use the actually-correct-on-all-Ubuntu page size, then rebuilding docker.

2) Build docker with a newer version of Go. There is no newer version of Go in Xenial but we could in theory upload a newer version or build the newer version of Go as part of the docker build.

Option 1) would be very much easier; is it possible for you to test whether it actually helps?

Another option would be to steer your customers towards the docker snap, if that works better (I haven't tried it at all on arm64 but it does at least seem to be built there).

Changed in docker.io (Ubuntu):
status: New → Triaged
importance: Undecided → High

I can test this possible fix (repair Go, then rebuild Docker) next week; I am on holiday this week.

> 1) Use Adam's suggestion from bug 1661222 of patching Go to use the
> actually-correct-on-all-Ubuntu page size, then rebuilding docker.

On Sun, Jul 16, 2017 at 9:11 PM, Michael Hudson-Doyle <email address hidden> wrote:

> Fixing the go bug in the version of Go in Xenial is highly non-trivial
> and I don't really want to go there. Two other ways of approaching this
> would be:
>
> 1) Use Adam's suggestion from bug 1661222 of patching Go to use the
> actually-correct-on-all-Ubuntu page size, then rebuilding docker.
>
> 2) Build docker with a newer version of Go. There is no newer version of
> Go in Xenial but we could in theory upload a newer version or build the
> newer version of Go as part of the docker build.
>
> 1) would be very very much easier, is it possible for you to test if
> this actually helps?
>
> Another option would be to steer your customers towards the docker snap,
> if that works better (I haven't tried it at all on arm64 but it does at
> least seem to be built there).

--
Edward Vielmetti +1 734 330 2465
<email address hidden>

Raghuram Kota (rkota) wrote :

A build with option #1 suggested in comment #2 is now available in a PPA (many thanks to mwhudson!) and is ready for testing. To install:

$ sudo add-apt-repository ppa:mwhudson/devirt
$ sudo apt-get update

After that, either apt-get upgrade or apt-get install docker.io containerd runc should pull in the rebuilt versions.

Raghuram Kota (rkota) wrote :

Hi @Edward Vielmetti: Would it be possible for you to test the fix uploaded to the PPA by mwhudson (comment #4)? Thanks!

Thanks Raghuram - I'll put together a test based on the PPA.

On Mon, Jul 31, 2017 at 10:32 AM, Raghuram Kota <email address hidden> wrote:

> Hi @Edward Vielmetti : Would it be possible for you to test the fix
> uploaded to PPA by mwhudson (comm # 4 ?) ? Thx!

--
Edward Vielmetti +1 734 330 2465
<email address hidden>

Raghuram Kota (rkota) wrote :

Hi @Edward Vielmetti : Trying to follow up to see if you had a chance to test. Many thanks for the help!

Hi @rkota - I was able to successfully install this version of Docker from the PPA.

I'm working on a test setup that generates enough layers to prove out the bug, and will report back when I get that.

There's a CI system used by the ROS team, and they have identified a branch to test this on which has a Dockerfile with 50 layers, enough to trigger the bug.

https://github.com/ros2/ci/issues/75

is the open issue, if you're tracking across projects.

tags: added: id-59655279753fb7ac84fe2084

I've confirmed that the PPA provided by @mwhudson in #4 addresses the crash reported in the issue.

I've also identified a simple test Dockerfile that crashes on an unpatched system, and works properly on the new PPA.

https://gist.github.com/bdafb8e961f55b2533fee8fa5221d186 - rename as "Dockerfile", then run

$ docker build -t deep-files .

With the PPA, this runs to completion; without it, it crashes at build time with

Step 41 : RUN mkdir 40
error creating aufs mount to /var/lib/docker/aufs/mnt/787c80e88d99c4ed74305f16ce30395e3346cfa4629c3d078f9fab3c6e4e52f0: invalid argument

The Dockerfile is very simple; it just creates 100 directories, one layer at a time.
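
For readers who cannot reach the gist, a Dockerfile of the shape described can be generated with a short script. This is a sketch, not the gist's actual contents: the function name is hypothetical, the base image (ubuntu:16.04) is an assumption, and the directory numbering starting at 1 is inferred from the "Step 41 : RUN mkdir 40" line in the log above.

```sh
#!/bin/sh
# Print a Dockerfile that creates one directory per layer.
#   $1: number of RUN layers (default 100, as in the description above)
#   $2: base image (ASSUMPTION: the gist's base image is not stated here)
gen_deep_dockerfile() {
    layers="${1:-100}"
    base="${2:-ubuntu:16.04}"
    echo "FROM $base"
    i=1
    while [ "$i" -le "$layers" ]; do
        # Each RUN instruction adds one more image layer.
        echo "RUN mkdir $i"
        i=$((i + 1))
    done
}
```

To use it, redirect the output of gen_deep_dockerfile 100 into a file named Dockerfile in an empty directory, then run the docker build command shown above.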

Changed in docker.io (Ubuntu):
status: Triaged → Opinion
status: Opinion → Invalid
Changed in docker.io (Ubuntu Xenial):
status: New → Triaged
assignee: nobody → Michael Hudson-Doyle (mwhudson)

Hello Edward, or anyone else affected,

Accepted docker.io into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/docker.io/1.13.1-0ubuntu1~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in docker.io (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial

Hi Ed, are you able to test the version of docker.io that's now in xenial-proposed? If not, I can do it but it's always best to get the original reporter to check that problems are fixed!

This version can't be released to updates until the k8s people release their charms (see #1712954) but I think that's due today or tomorrow.

Michael - I'm able to test this, but I need to set up a fresh machine to make sure I didn't bodge one of the install steps.

Just to be clear, the xenial-proposed version is 1.13.1 based, and the earlier PPA version was 1.12.x based, so this is newer code than what I had first tested.

On 3 November 2017 at 15:27, Edward Vielmetti <email address hidden>
wrote:

> Michael - I'm able to test this, but I need to set up a fresh machine to
> make sure I didn't bodge one of the install steps.
>

Thanks.

> Just to be clear, the xenial-proposed version is 1.13.1 based, and the
> earlier PPA version was 1.12.x based, so this is newer code than what I
> had first tested.
>

Yes, that's a good point. I'd be interested in hearing about any
regressions that are caused by the newer version! (We don't have any
autopkgtests running on arm64 yet, unfortunately.)

We'll be moving on to 17.03.2-ce at some point not too far away hopefully.

Actually, I wanted to check some things for myself, so I installed a system with the version of docker.io 1.13 that was in proposed before this bug was fixed and verified that the 'deep-files' build failed in the same way as you described (thanks for the super simple test case!). I then installed 1.13.1-0ubuntu1~16.04.2 and verified that the build succeeded. I also ran the basic arm64 smoke tests, but I'd still be interested in the results of any testing you can do.
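
A before/after check like the one described here can be scripted by saving each build's output and looking for the error string reported earlier in this bug. The sketch below assumes the failure always includes the text "error creating aufs mount"; the function name is hypothetical.

```sh
#!/bin/sh
# Read a saved `docker build` log on stdin.
# Exit status 0 if the aufs mount error from this report is absent,
# non-zero if it is present.
build_log_ok() {
    ! grep -q 'error creating aufs mount'
}
```

For example, capture the output of the deep-files build into a file under each package version, then feed each file to build_log_ok on stdin. The log from the unfixed package would be expected to fail the check and the log from the fixed package to pass it.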

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package docker.io - 1.13.1-0ubuntu1~16.04.2

---------------
docker.io (1.13.1-0ubuntu1~16.04.2) xenial; urgency=medium

  * Rebuild with golang-1.6 1.6.2-0ubuntu5~16.04.4 which uses the correct page
    size on arm64 for use with Ubuntu kernels. (LP: #1702979)

docker.io (1.13.1-0ubuntu1~16.04.1) xenial; urgency=medium

  * Backport to Xenial. (LP: #1712954)
  * Install the service file with .install again, fixing service activation
    on install.
  * Use golang.org/x/net/context instead of stdlib context to enable building
    with Go 1.6.

 -- Michael Hudson-Doyle <email address hidden> Thu, 02 Nov 2017 11:43:13 +1300

Changed in docker.io (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for docker.io has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
