Containerd cannot pull sandbox image from private registry

Bug #1851850 reported by Konstantinos Zagganas
This bug affects 1 person
Affects: Containerd Subordinate Charm
Status: Fix Released
Importance: Undecided
Assigned to: George Kraft
Milestone: 1.17

Bug Description

I have set up a k8s cluster with the 1.16 bundle. However, when I try to run a new job, I get the following error:
"Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedCreatePodSandBox 5s (x23 over 4m49s) kubelet, juju-810b36-athena-kube-0 Failed create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "172.16.0.239:5000/pause-amd64:3.1": failed to pull image "172.16.0.239:5000/pause-amd64:3.1": failed to resolve image "172.16.0.239:5000/pause-amd64:3.1": parse registry endpoint "172.16.0.239:5000": parse 172.16.0.239:5000: first path segment in URL cannot contain colon
"

If I change the sandbox_image in /etc/containerd/config.toml to k8s.gcr.io/pause-amd64:3.1, the sandbox image is pulled correctly, but I then get an ErrorImagePullOff for the image I am trying to run (which exists in the registry), with the same message about the URL as before.

It seems that the problem lies with the "172.16.0.239:5000" part.

I have placed the juju crashdump (juju-crashdump-6a4cc801-53d5-4e90-b9ce-c643c28d8412.tar.xz) at the URL below, because every attempt to attach it here failed due to the size of the file: http://andrea.imsi.athenarc.gr/juju-crashdump-6a4cc801-53d5-4e90-b9ce-c643c28d8412.tar.xz

description: updated
George Kraft (cynerva) wrote :

Thanks for the report. I can reproduce this by deploying Charmed Kubernetes 1.16 with the docker-registry charm related to containerd.

Changed in charm-containerd:
status: New → Confirmed
George Kraft (cynerva)
Changed in charm-containerd:
assignee: nobody → George Kraft (cynerva)
status: Confirmed → In Progress
George Kraft (cynerva) wrote :

This is caused by an errant value for plugins.cri.registry.mirrors."{{ registry.url }}".endpoint in our containerd config.toml [1]. It needs to include a scheme, e.g. https://.
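
To illustrate the kind of normalization the endpoint value needs, here is a minimal Python sketch. It is not the charm's actual fix; `ensure_scheme` and its `default_scheme` parameter are hypothetical names introduced only for this example. The "first path segment in URL cannot contain colon" message appears to come from Go's URL parser, which rejects a bare `host:port` string because it has no scheme.

```python
def ensure_scheme(endpoint: str, default_scheme: str = "https") -> str:
    """Return the endpoint with a URL scheme, adding default_scheme if none is present.

    Hypothetical helper for illustration only; not part of charm-containerd.
    """
    endpoint = endpoint.strip()
    if "://" in endpoint:
        # A scheme is already present, so leave the value untouched.
        return endpoint
    return f"{default_scheme}://{endpoint}"


if __name__ == "__main__":
    # A bare host:port value, as in the original report.
    print(ensure_scheme("172.16.0.239:5000"))
    # A value that already carries a scheme is passed through unchanged.
    print(ensure_scheme("http://172.16.0.239:5000"))
```

With this sketch, `ensure_scheme("172.16.0.239:5000")` returns `https://172.16.0.239:5000`, which is the form the endpoint entry needs.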

However, fixing that reveals a new error:

failed to get sandbox image "172.31.58.88:5000/pause-amd64:3.1": failed to pull image "172.31.58.88:5000/pause-amd64:3.1": failed to resolve image "172.31.58.88:5000/pause-amd64:3.1": no available registry endpoint: failed to do request: Head https://172.31.58.88:5000/v2/pause-amd64/manifests/3.1: remote error: tls: bad certificate

The docker-registry logs give a clearer message:

http: TLS handshake error from 172.31.43.76:52726: tls: client didn't provide a certificate

The registry is configured with mutual authentication enabled, meaning that containerd needs to provide a client certificate signed by the easyrsa charm. We already have containerd config entries that should handle this [2], but they seem to be ignored.
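
For readers unfamiliar with mutual authentication, the following Python sketch shows what "providing a client certificate" means from the client side. It uses only the standard library and is not how containerd does it; the function names and all file paths are assumptions made for illustration. The manifest path follows the `/v2/<name>/manifests/<tag>` form seen in the error above.

```python
import http.client
import ssl


def client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS context that verifies the server and optionally presents a client certificate."""
    # With ca_file=None the system trust store is used instead of a private CA.
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        # Without this call the client sends no certificate, and a registry
        # that requires mutual authentication rejects the handshake.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def head_manifest(host, port, name, tag, ctx):
    """Send the same kind of HEAD request seen in the error and return the HTTP status."""
    conn = http.client.HTTPSConnection(host, port, context=ctx, timeout=10)
    try:
        conn.request("HEAD", f"/v2/{name}/manifests/{tag}")
        return conn.getresponse().status
    finally:
        conn.close()


# Example usage (hypothetical paths, not taken from the charm):
# ctx = client_context("/path/to/ca.crt", "/path/to/client.crt", "/path/to/client.key")
# print(head_manifest("172.31.58.88", 5000, "pause-amd64", "3.1", ctx))
```

Calling `head_manifest` with a context that has no client certificate loaded would be expected to fail during the handshake against a registry configured this way, which matches the symptom described above.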

We ship containerd 1.2.6-0ubuntu1~18.04.2, but according to the containerd release notes [3], registry TLS support was not added until containerd 1.3.0. It looks like we will have to either update to a later version of containerd (and fix up the config) or disable mutual authentication in the docker-registry charm.
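
The version comparison above can be sketched in Python as follows. This is only an illustration: `upstream_version` and `supports_registry_tls` are hypothetical names, and the 1.3.0 threshold is taken from the statement above rather than checked independently.

```python
import re


def upstream_version(package_version: str) -> tuple:
    """Extract the leading major.minor.patch from a package version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognized version: {package_version!r}")
    return tuple(int(part) for part in match.groups())


def supports_registry_tls(package_version: str, minimum=(1, 3, 0)) -> bool:
    """Return True if the version is at least the assumed minimum for registry TLS config."""
    return upstream_version(package_version) >= minimum


if __name__ == "__main__":
    # The packaged version mentioned in this comment.
    print(supports_registry_tls("1.2.6-0ubuntu1~18.04.2"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, and the Ubuntu packaging suffix is ignored because only the leading three numbers are parsed.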

[1]: https://github.com/charmed-kubernetes/charm-containerd/blob/ae8510e9545f811bb3820bf4de589230fe9166db/templates/config.toml#L65
[2]: https://github.com/charmed-kubernetes/charm-containerd/blob/ae8510e9545f811bb3820bf4de589230fe9166db/templates/config.toml#L81-L82
[3]: https://github.com/containerd/containerd/releases

George Kraft (cynerva) wrote :
George Kraft (cynerva) wrote :

The above PR only fixes the URL parse error originally reported in this issue.

I've opened a new issue to address the `tls: bad certificate` error that you're likely to hit as well: https://bugs.launchpad.net/charm-containerd/+bug/1853653

George Kraft (cynerva)
Changed in charm-containerd:
status: In Progress → Fix Committed
Changed in charm-containerd:
milestone: none → 1.17
Changed in charm-containerd:
status: Fix Committed → Fix Released