cannot deploy bundle, cannot resolve URL, TLS handshake timeout

Bug #1906372 reported by Aurelien Lourot
This bug affects 6 people
Affects: Canonical Juju
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Maybe related to lp:1899793?

We have been seeing this on OSCI more than 5 times per day for the last 1 or 2 weeks on random bundles and charms:

ERROR cannot deploy bundle: cannot resolve URL "cs:~openstack-charmers-next/ceph-mon": cannot resolve charm URL "cs:~openstack-charmers-next/ceph-mon": cannot get "/~openstack-charmers-next/ceph-mon/meta/any?include=id&include=supported-series&include=published": Get "https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any?include=id&include=supported-series&include=published": net/http: TLS handshake timeout

It may not be a Juju issue, but it seems to correlate with a recent upgrade from Juju 2.7 to Juju 2.8.6 on OSCI.

Example:
https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-cinder-ceph/761552/1/7525/consoleText.test_charm_func_full_10591.txt

Tags: cdo-qa
Revision history for this message
Ian Booth (wallyworld) wrote :

Juju issues an HTTP GET request to the charm store

https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any?include=id&include=supported-series&include=published

to fetch metadata about the charm. This HTTP request is timing out.

It sure seems like there are underlying connectivity issues external to Juju here.
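
For illustration, a minimal Go sketch of the same metadata request using the standard library's default client (this is not Juju's actual client code); the default transport gives the TLS handshake 10 seconds, which is where "net/http: TLS handshake timeout" comes from:

---

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// http.DefaultClient uses http.DefaultTransport, whose TLSHandshakeTimeout
	// is 10 seconds; if the handshake takes longer, the request fails with
	// "net/http: TLS handshake timeout".
	url := "https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any?include=id&include=supported-series&include=published"

	resp, err := http.Get(url)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("%s (%d bytes)\n", resp.Status, len(body))
}

---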

Revision history for this message
Pen Gale (pengale) wrote :

This should get fixed when we move to the new shim API.

Changed in juju:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Alvaro Uria (aluria) wrote :

Hi,

We're consistently seeing this problem when running functional tests on an OpenStack charm. Please have a look at the attached PDF. In all but one of the tests, the issue was related to this bug. The charmstore times out after 10s.

Is there a "juju deploy" option to increase the timeout (eg. 15 or 20s)?

Revision history for this message
Pen Gale (pengale) wrote :

Bumping to high. It sounds like we might be able to increase a timeout value to fix this, and it feels like something that might be affecting production deployments, in addition to the test environment here.

Changed in juju:
importance: Medium → High
milestone: none → 2.8.8
Revision history for this message
Pen Gale (pengale) wrote :

Per conversation in sync, it is probably better to fix this by being smarter about re-using the TLS connection, rather than setting a higher timeout.

(If we bump up the timeout, we're probably going to run into other issues talking to the store down the line.)
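
A rough sketch of the connection-reuse idea, assuming a single shared net/http client with keep-alive connections so the handshake cost is paid once per host rather than once per request (a sketch of the approach, not the actual Juju change):

---

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// One shared client: idle keep-alive connections are reused for later
// requests to the same host, so the TLS handshake happens once rather
// than on every call.
var sharedClient = &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	},
	Timeout: 60 * time.Second,
}

func fetch(url string) error {
	resp, err := sharedClient.Get(url)
	if err != nil {
		return err
	}
	// Draining and closing the body lets the connection return to the
	// idle pool so it can be reused.
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	url := "https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any"
	for i := 0; i < 3; i++ {
		if err := fetch(url); err != nil {
			fmt.Println("request failed:", err)
		}
	}
}

---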

Revision history for this message
Alvaro Uria (aluria) wrote :

The OpenStack team also suggested an approach similar to tenacity.retry, where connection timeouts would be retried once (or more, if an argument like "--connection-retry" could be passed).
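
A small Go sketch of that suggestion, retrying only on timeout errors; the retry count and the "--connection-retry" flag are hypothetical, not existing Juju options:

---

package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
)

// fetchWithRetry retries the GET up to `retries` extra times, but only when
// the failure is a timeout (e.g. "net/http: TLS handshake timeout"). The
// retry count stands in for a hypothetical "--connection-retry" option.
func fetchWithRetry(url string, retries int) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= retries; attempt++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		var netErr net.Error
		if !errors.As(err, &netErr) || !netErr.Timeout() {
			break // only timeouts are retried
		}
	}
	return nil, lastErr
}

func main() {
	resp, err := fetchWithRetry("https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any", 1)
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}

---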

Revision history for this message
Alireza Nasri (sysnasri) wrote :

Is there a temporary workaround for this?

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Subscribed to field-high; this is affecting solutions-qa release testing.

Changed in juju:
milestone: 2.8.8 → 2.8.9
Revision history for this message
Joshua Genet (genet022) wrote :

Here's what we believe is another manifestation of this.
We run a Kubernetes test suite that's failing to pull an image.

---

containerd_2/var/log/syslog:Feb 9 11:20:23 juju-074d0d-7 containerd[41538]: time="2021-02-09T11:20:23.753972008Z" level=error msg="PullImage
"rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8"
failed" error="failed to pull and unpack image
"rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8": failed to resolve reference "rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8": failed to do request: Head https://rocks.canonical.com/v2/cdk/jujusolutions/jujud-operator/manifests/2.8.8: net/http: TLS handshake timeout"

---

Example run:
https://solutions.qa.canonical.com/testruns/testRun/c9df119d-7bc6-4ebc-8f7c-7240897c6f85

Juju status at the bottom of this page:
https://oil-jenkins.canonical.com/job/fce_build/9620/console

Juju model config and crashdump:
https://oil-jenkins.canonical.com/artifacts/c9df119d-7bc6-4ebc-8f7c-7240897c6f85/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-02-09-11.24.47.tar.gz

All artifacts:
https://oil-jenkins.canonical.com/artifacts/c9df119d-7bc6-4ebc-8f7c-7240897c6f85/index.html

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1906372] Re: cannot deploy bundle, cannot resolve URL, TLS handshake timeout

So this feels like something on the order of "your VMs are not getting enough entropy in order to generate private keys for TLS connections". I don't know that that is the case, but I worry that just doing retries on Juju's behalf won't make things better (as it just consumes more of whatever limited resource is causing TLS handshakes to fail).

Revision history for this message
Ian Booth (wallyworld) wrote :

What's interesting is that there are now two external services affected, called by two separate clients:

1. charm store (used by the juju client)
2. rocks (used by k8s itself, i.e. containerd)

Given that containerd is also affected, i.e. it connects to rocks to pull an image entirely outside of Juju, this doesn't look like a Juju issue per se, and really does appear to be an artifact of the deployment environment.

Revision history for this message
Michael Skalka (mskalka) wrote :

Ian,

I'm sorry, but I don't buy the "it's your environment" line here. The 2.8.7 client was blessed on Dec 11th, 2020, and we shut our CI off the following Wednesday for the holiday break. We started seeing this sporadically on Jan 6th [0], basically a day after turning our CI back on. Assuming you didn't release on a Friday, that was only a few days for this issue to present. Between the 2.8.7 release tests and today, nothing has changed within our test lab that could have caused this.

This has also been confirmed by the OpenStack engineering team [1] and at least one community member.

So either the Juju client has a defect, or the charm store is working poorly. Either way, it's a Juju issue.

0. https://solutions.qa.canonical.com/bugs/bugs/bug/1906372
1. https://bugs.launchpad.net/juju/+bug/1906372/comments/3

Revision history for this message
Ian Booth (wallyworld) wrote :

It's also containerd, independent of Juju.

Below, the containerd service is trying to pull an image from rocks.canonical.com and gets the TLS handshake error. Juju is not involved here.

containerd_2/var/log/syslog:Feb 9 11:20:23 juju-074d0d-7 containerd[41538]: time="2021-02-09T11:20:23.753972008Z" level=error msg="PullImage
"rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8"
failed" error="failed to pull and unpack image
"rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8": failed to resolve reference "rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.8.8": failed to do request: Head https://rocks.canonical.com/v2/cdk/jujusolutions/jujud-operator/manifests/2.8.8: net/http: TLS handshake timeout"

Ian Booth (wallyworld)
Changed in juju:
milestone: 2.8.9 → none
Revision history for this message
Nobuto Murata (nobuto) wrote :

K8s/containerd retries pulling images, doesn't it?
https://kubernetes.io/docs/concepts/containers/images/#imagepullbackoff

I'm not saying Juju is doing something wrong here, but having retries and backoffs in Juju when pulling resources from charmhub and such would make our lives much easier.
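
A sketch of what retry-with-backoff could look like on the client side, in the same spirit as Kubernetes' image-pull backoff; the attempt count and delays are illustrative only, not existing Juju behaviour:

---

package main

import (
	"fmt"
	"net/http"
	"time"
)

// getWithBackoff retries a failed GET with exponentially growing pauses
// (2s, 4s, 8s), loosely mirroring Kubernetes' image-pull backoff.
func getWithBackoff(url string) (*http.Response, error) {
	delay := 2 * time.Second
	var lastErr error
	for attempt := 1; attempt <= 4; attempt++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		if attempt < 4 {
			fmt.Printf("attempt %d failed (%v); retrying in %s\n", attempt, err, delay)
			time.Sleep(delay)
			delay *= 2
		}
	}
	return nil, lastErr
}

func main() {
	resp, err := getWithBackoff("https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/ceph-mon/meta/any")
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}

---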

tags: added: cdo-qa