ubuntu focal on arm/v7 has broken ca-certificates

Bug #2069719 reported by Robert Sachunsky
This bug affects 1 person
Affects: cloud-images
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am trying to use the most recent ubuntu:20.04 build for armhf from Docker Hub:

https://hub.docker.com/layers/library/ubuntu/20.04/images/sha256-717a8dabf82777782587740c79fd8704d3ec06583be6eacc67242cd00b9a0fd1

which I believe corresponds to

https://git.launchpad.net/cloud-images/+oci/ubuntu-base/tree/oci/blobs/sha256/5b2a13b49acb61bc754c03841a405fb6416cbc3a2fd117c120ebaa26d22ccfbf?h=oci-focal-20.04&id=ab223cc200e8654f24b973f8d18d7c6be0c78d2c

Now, using this minimal Dockerfile:

```
FROM ubuntu:20.04

RUN apt-get update && apt-get -y install ca-certificates wget
RUN wget -O - https://helloworld.letsencrypt.org
```

and building this using
```
docker buildx build --platform linux/arm/v7 -f Dockerfile -t test-tls .
```
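
For context, a cross-platform build like this relies on QEMU user-mode emulation plus a buildx builder. A minimal host setup sketch, assuming the commonly used tonistiigi/binfmt image is how the emulators get registered:

```
# Register QEMU user-mode emulators for Arm targets (assumption: tonistiigi/binfmt is used)
docker run --privileged --rm tonistiigi/binfmt --install arm

# Create a docker-container builder and run the emulated arm/v7 build
docker buildx create --name crossbuilder --driver docker-container --use
docker buildx build --platform linux/arm/v7 -f Dockerfile -t test-tls --load .
```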

I always end up with a TLS certificate failure like so:

```
#0 building with "xenodochial_nobel" instance using docker-container driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 243B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...

#3 [auth] library/ubuntu:pull token for registry-1.docker.io
#3 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 1.0s

#4 [internal] load .dockerignore
#4 transferring context: 172B done
#4 DONE 0.0s

#5 [1/4] FROM docker.io/library/ubuntu:20.04@sha256:0b897358ff6624825fb50d20ffb605ab0eaea77ced0adb8c6a4b756513dec6fc
#5 resolve docker.io/library/ubuntu:20.04@sha256:0b897358ff6624825fb50d20ffb605ab0eaea77ced0adb8c6a4b756513dec6fc 0.0s done
#5 DONE 0.0s

#6 [2/4] RUN apt-get update && apt-get -y install ca-certificates wget
#6 CACHED

#7 [3/4] RUN wget -O - https://helloworld.letsencrypt.org
#7 0.208 --2024-06-18 10:43:13-- https://helloworld.letsencrypt.org/
#7 0.253 Resolving helloworld.letsencrypt.org (helloworld.letsencrypt.org)... 52.9.173.94
#7 0.323 Connecting to helloworld.letsencrypt.org (helloworld.letsencrypt.org)|52.9.173.94|:443... connected.
#7 0.937 ERROR: cannot verify helloworld.letsencrypt.org's certificate, issued by 'CN=R11,O=Let\'s Encrypt,C=US':
#7 0.937 Unable to locally verify the issuer's authority.
#7 0.939 To connect to helloworld.letsencrypt.org insecurely, use `--no-check-certificate'.
#7 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c wget -O - https://helloworld.letsencrypt.org" did not complete successfully: exit code: 5
```

This also affects many other certificates issued via Let's Encrypt, including hosts like apt.kitware.org, and so far I have not found a workaround (such as forcing update-ca-certificates). So essentially, this release is broken.
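
For illustration, "forcing update-ca-certificates" means something along these lines inside the arm/v7 container (a sketch; it did not resolve the failure):

```
# Rebuild the CA store from scratch (sketch of the attempted workaround)
update-ca-certificates --fresh

# Alternatively, regenerate the {hash}.0 symlinks directly
openssl rehash /etc/ssl/certs
```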

Note that this does _not_ happen on any of the other supported target architectures / platforms, or Ubuntu versions.

Tags: docker
Revision history for this message
John Chittum (jchittum) wrote :

Thank you for reporting. Definitely odd, as ca-certificates unpacks to the same directory for all platforms and should have the same content everywhere (it's _just_ certificates, nothing platform-specific there). We'll have to do a little investigation and let you know what we find.

Revision history for this message
John Chittum (jchittum) wrote :

So far, I've been unable to reproduce. My setup:

host:
Ubuntu 20.04
Docker version 24.0.7, build 24.0.7-0ubuntu2~20.04.1
docker-buildx 0.12.1-0ubuntu1~20.04.1

1. t4g.large arm64 ubuntu 20.04 VM on aws ec2
2. ssh in and:
sudo apt update && sudo apt install docker.io docker-buildx
3. copy in the minimal Dockerfile above
4. sudo docker buildx build --platform linux/arm/v7 -f Dockerfile . | tee build.out
```
[+] Building 13.6s (7/7) FINISHED docker:default
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 169B 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04 0.4s
 => CACHED [1/3] FROM docker.io/library/ubuntu:20.04@sha256:0b897358ff6624825fb50d20ffb605ab0eaea77ced0adb8c6a4b756513dec6fc 0.0s
 => => resolve docker.io/library/ubuntu:20.04@sha256:0b897358ff6624825fb50d20ffb605ab0eaea77ced0adb8c6a4b756513dec6fc 0.0s
 => [2/3] RUN apt-get update && apt-get -y install ca-certificates wget 12.5s
 => [3/3] RUN wget -O - https://helloworld.letsencrypt.org 0.5s
 => exporting to image 0.2s
 => => exporting layers 0.2s
 => => writing image sha256:e6c913231eb0dc5ab2476c9d562eae87c8782479c4ca35e08c59a0cdf9b32342
```

Can you provide more information? Build host version, docker version...


Revision history for this message
Tianon Gravi (tianon) wrote :

I wonder if this is related to / the same problem as https://github.com/debuerreotype/docker-debian-artifacts/issues/219 ? (Short version: for some reason we never figured out, running "update-ca-certificates" via QEMU user-mode emulation causes the rehash step to fail, but silently.)
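
If that is the same problem, a quick check is whether the {hash}.0 symlinks ever got created. A sketch, assuming QEMU binfmt handlers are registered on the host so `docker run --platform` works:

```
# Install ca-certificates under arm/v7 emulation and count the {hash}.0 symlinks it should produce
docker run --rm --platform linux/arm/v7 ubuntu:20.04 sh -c '
  apt-get update -qq && apt-get install -qqy ca-certificates >/dev/null
  echo "hash links: $(ls /etc/ssl/certs/*.0 2>/dev/null | wc -l)"
'
```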

Revision history for this message
John Chittum (jchittum) wrote :

That's a good possibility. I was trying to do a multi-platform build on my Ubuntu 24.04 x86 machine, but it was having none of it (getting the good old errors of apt on a mismatched platform), hence my move to arm64.

Revision history for this message
Robert Sachunsky (bertsky) wrote :

I just tried to bisect: it turns out that (by the above test) not a single one of the armhf images ever had a working setup, starting from the next-older focal-20240427 all the way down to focal-20200703 (which is the oldest release that I can still fetch today).
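
A sketch of how such a bisect can be scripted, repeating the wget test against dated focal tags (only the two tags mentioned above are shown; it assumes QEMU binfmt is set up on the host):

```
# Run the TLS test against dated focal tags under arm/v7 emulation
for tag in focal-20200703 focal-20240427; do
  docker run --rm --platform linux/arm/v7 ubuntu:$tag sh -c '
    apt-get update -qq && apt-get install -qqy ca-certificates wget >/dev/null 2>&1
    wget -q -O /dev/null https://helloworld.letsencrypt.org && echo OK || echo FAIL
  ' | sed "s/^/$tag: /"
done
```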

Revision history for this message
Robert Sachunsky (bertsky) wrote :

Oops, sorry, I did not receive your responses via email – just saw them now.

I agree it sounds implausible that just one architecture should be affected by this, but here I am trying to build a multi-platform image, and only one platform fails. (I don't know how QEMU works, but perhaps it has subtle bugs on arm/v7?)

I built a minimal working example repo under https://github.com/bertsky/test-ubuntu-multiplatform – please compare https://github.com/bertsky/test-ubuntu-multiplatform/actions/runs/9588890102/job/26441725212 (which fails at an arm/v7 wget) with https://github.com/bertsky/test-ubuntu-multiplatform/actions/runs/9588934364/job/26441858240 (which succeeds because I removed arm/v7 from the list).

Revision history for this message
Robert Sachunsky (bertsky) wrote :

So on the demo repo CI, you can see Ubuntu 22 / Docker v26.1.3 / buildx v0.15.0 / buildkit v0.13.2

But I get the same failure locally on Debian 10 / Docker v26.1.4 / buildx v0.14.0 / buildkit v0.13.2

And on Ubuntu 18 / Docker v24.0.2 / buildx v0.10.5 / buildkit v0.11.7

Revision history for this message
John Chittum (jchittum) wrote :

This may still be related to the issue tianon and I have brought up. GitHub Linux runners are x86 by default, and there appear to be issues with docker buildx and multi-platform builds on Debian and Ubuntu when using an x86 host to do armv7 builds.

I was trying to look up how to join the public beta for Linux arm64 GitHub runners. My gut is that if that repo were to execute on an arm64 host, it'd succeed.

I think it's safe to say it's not a `cloud-images` bug. There's definitely a bug _somewhere_, but I'm having lots of trouble finding the _where_.

As a workaround for multi-platform builds right now, building armv7 on armv8+ based nodes works. Cross-platform (emulated) builds seem broken, but the root cause is unknown.
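
A sketch of that workaround, pointing buildx at a remote arm64 host over SSH so the arm/v7 stages run natively (hostname and user are placeholders):

```
# Register a native arm64 machine as the builder, then build without emulation
docker buildx create --name native-arm --driver docker-container ssh://ubuntu@arm64-host.example.com
docker buildx build --builder native-arm --platform linux/arm/v7 -f Dockerfile -t test-tls .
```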

I'll see if I can spend a bit more time digging around to see whether a root cause pops up.

# NEXT STEPS #

* Try building with the public beta arm64 Linux images and see if it works
* Try building on a remote arm64 instance (all the public clouds have armv8+ variants now) -- this works for me now
* Spend time debugging deeper issues with buildx, qemu emulation, and ca-certificates
    * Based on stuck-stuck's comment in https://github.com/debuerreotype/docker-debian-artifacts/issues/219, it looks like the issue is reproducible with QEMU emulation plus installing ca-certificates. It may indeed be a bug between the ca-certificates python script and armv7 (there is a python script used to move and generate those {hash}.0 files... at least I think that's what the script does); see the sketch below.
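
For the last item, a small sketch of exercising the hashing step by hand inside the emulated container, to see whether it misbehaves under QEMU (paths assume the stock focal ca-certificates layout; the package's actual hook may work differently):

```
# Compute the subject hash for one Let's Encrypt root and compare against the installed symlinks
openssl x509 -hash -noout -in /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
ls -l /etc/ssl/certs | grep -i isrg

# Regenerate all {hash}.0 links in place, then retry the TLS check
openssl rehash /etc/ssl/certs && wget -O /dev/null https://helloworld.letsencrypt.org
```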
