Juju agent cannot download from FIPS enabled LXD containers in Openstack Deployment

Bug #2002841 reported by Moises Emilio Benzan Mora
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Undecided   Unassigned
lxd             New      Undecided   Unassigned

Bug Description

As the title suggests, when we have a FIPS-enabled deployment, the LXD subordinate units of the OpenStack layer cannot download the Juju agent binary from the Juju controller. Our best guess is that there is a TLS v1.3 cipher mismatch between what the internal Juju controller web server offers (the Go web server, which has a [known bug around NIST-approved ciphers](https://github.com/golang/go/issues/54072)) and what the FIPS-compliant LXD container can accept, leading to the error below.

From the LXD console.log of the respective container:
```
[ 4115.002314] cloud-init[979]: + echo Attempt 181 to download agent binaries from 'https://10.246.165.124:17070/model/acce9b42-a01e-4eb0-87fa-abc0506e8c6b/tools/2.9.37-ubuntu-amd64'...\n
[ 4115.002482] cloud-init[979]: Attempt 181 to download agent binaries from 'https://10.246.165.124:17070/model/acce9b42-a01e-4eb0-87fa-abc0506e8c6b/tools/2.9.37-ubuntu-amd64'...
[ 4115.002648] cloud-init[979]: + curl -sSf --connect-timeout 20 --noproxy * --insecure -o /var/lib/juju/tools/2.9.37-ubuntu-amd64/tools.tar.gz https://10.246.165.124:17070/model/acce9b42-a01e-4eb0-87fa-abc0506e8c6b/tools/2.9.37-ubuntu-amd64
[ 4115.039889] cloud-init[979]: curl: (35) error:0607B0C8:digital envelope routines:EVP_CipherInit_ex:disabled for FIPS

```

The container then enters this infinite loop and the deployment gets stuck. The Solutions QA team has a reproducer and an active environment with this bug, both of which can be made available on request.
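A quick way to exercise the cipher-mismatch hypothesis from inside an affected container is to force curl onto the suspect suite and compare it with a FIPS-approved one; this is only a sketch, with the address and agent version copied from the log above:

```
# Force the suspect TLS 1.3 suite; on a FIPS-enabled client this should fail
# with "disabled for FIPS" if the mismatch theory is right.
curl -sSf --insecure --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -o /dev/null \
  https://10.246.165.124:17070/model/acce9b42-a01e-4eb0-87fa-abc0506e8c6b/tools/2.9.37-ubuntu-amd64

# For comparison, a FIPS-approved suite against the same URL should succeed.
curl -sSf --insecure --tls13-ciphers TLS_AES_256_GCM_SHA384 -o /dev/null \
  https://10.246.165.124:17070/model/acce9b42-a01e-4eb0-87fa-abc0506e8c6b/tools/2.9.37-ubuntu-amd64
```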

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :

For a compact reproducer, create a Juju controller bootstrapped with the cloudinit userdata below and make this same userdata a model default, then create a unit that has an LXD subordinate (a sketch of the corresponding juju commands follows the userdata).

Cloudinit userdata:

```
write_files:
  - owner: root:root
    path: /tmp/enable-fips-updates.sh
    permissions: '0775'
    content: |
        #!/bin/bash

        function log () {
          echo "[FIPS-Updates] $1"
        }

        function with_retry () {
          RETRIES=0
          CMD=$1
          until [[ $RETRIES -eq 90 ]] || $CMD
          do
            RETRIES=$((RETRIES+10))
            sleep $RETRIES
          done
        }

        # Attach this unit to UA/Ubuntu Pro. Change this token in the future.
        log "Attaching unit to Ubuntu Pro/Ubuntu Advantage."
        pro attach <Ubuntu Pro Token>

        log "Enabling FIPS and FIPS-UPDATES"
        pro enable --assume-yes fips fips-updates

        with_retry "apt update --yes"
        with_retry "apt upgrade --yes"

preruncmd:
  - /tmp/enable-fips-updates.sh
power_state:
  delay: "+0"
  mode: reboot
  timeout: 40
  condition: True
```
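For reference, the surrounding juju commands would be roughly the following (a sketch only: the cloud name is a placeholder, config.yaml is assumed to wrap the userdata above under a `cloudinit-userdata: |` key, and ubuntu-lite stands in for any simple charm; Juju 3.x syntax shown):

```
# Bootstrap a Focal controller with the FIPS-enabling userdata,
# then make the same userdata the default for new models.
juju bootstrap <cloud> fips-test --config config.yaml --bootstrap-series focal
juju model-defaults --file config.yaml

# Deploy a simple application and place an extra unit in an LXD container;
# the containerised unit is where the agent download gets stuck.
juju add-model test
juju deploy ubuntu-lite --base ubuntu@20.04
juju add-unit ubuntu-lite --to lxd:0
```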

Revision history for this message
Juan M. Tirado (tiradojm) wrote :

What Juju version are you using?

Changed in juju:
status: New → Incomplete
Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :

We were using Juju 2.9.37

Changed in juju:
status: Incomplete → New
Revision history for this message
Joseph Phillips (manadart) wrote (last edit ):

Can you confirm:
- That the hosts can download agent blobs fine.
- What OS version the hosts are running in this instance.
- What OS version the guest LXD containers are running.

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :

Hosts and guest OS were Focal.

The hosts can indeed download the agent blobs fine.

We can provide an active environment with the bug reproduced on request if needed.

Revision history for this message
Joseph Phillips (manadart) wrote :

I've added LXD tentatively, mostly hoping for some insight.

If the hosts are FIPS-enabled and can download blobs, then the problem isn't particular to the ciphers in use; rather, it is that the requests have a different origin.

tags: added: lxd-provider
Changed in juju:
status: New → Triaged
Revision history for this message
Jeff Hillman (jhillman) wrote :

This issue also occurs for non-LXD units. If you install FIPS via curtin_userdata in MAAS, the same issue occurs when the machine does its final reboot and tries to fetch the juju agent.

The issue is that the self-signed certificate that juju created is using the TLS_CHACHA20_POLY1305_SHA256 cipher, which only works in non-FIPS mode.

It should instead use something like TLS_AES_256_GCM_SHA384, which is fully supported in FIPS mode.

This is blocking a customer deployment.

Revision history for this message
Henry Coggill (henrycoggill) wrote :

Certificates, self-signed or not, do not specify which ciphers the TLS connection uses; that is determined by the server's configuration, the client's configuration, and their mutual negotiation.
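For illustration, the negotiated cipher and the certificate's signature algorithm can be inspected separately (a sketch; the address is the controller from the original report):

```
# The cipher negotiated for this particular connection:
echo | openssl s_client -connect 10.246.165.124:17070 2>/dev/null | grep '^New,'

# The signature algorithm baked into the (self-signed) certificate itself:
echo | openssl s_client -connect 10.246.165.124:17070 2>/dev/null \
  | openssl x509 -noout -text | grep 'Signature Algorithm'
```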

Revision history for this message
Jeff Hillman (jhillman) wrote :

subscribed field-critical

Revision history for this message
John A Meinel (jameinel) wrote :

So I tried these steps to reproduce the issue, to make sure that we can fix it, but I was not able to reproduce it.
Note that I'm using LXD and not VMs, and it may be that FIPS needs to poke at my kernel to break me, but the ciphers I'm getting are happy:

Create a config file that will enable pro with my personal UA token (redacted for the innocent).

$ cat config.yaml
cloudinit-userdata: |
  preruncmd:
    - pro refresh config
    - pro attach $SUPERSECRETTOKEN
    - pro enable usg --assume-yes
    - pro enable fips-updates --assume-yes

# Bootstrap using that cloud init configuration
# note also, that you have to bootstrap with '--bootstrap-series' because
# juju 3.1.6 defaults to Jammy. And `pro enable fips-updates` doesn't do anything on Jammy
# because FIPS is not supported.
$ juju bootstrap lxd lxd --config config.yaml --bootstrap-series focal

# Go into that machine and ensure that pro did what I wanted:
$$ cat /var/log/cloud-init-output.log
...

Successfully processed your pro configuration.
Enabling default service esm-apps
Enabling default service esm-infra
Updating package lists
Ubuntu Pro: ESM Apps enabled
Updating package lists
Ubuntu Pro: ESM Infra enabled
This machine is now attached to 'Ubuntu Pro (Apps-only) - Virtual'

SERVICE       ENTITLED  STATUS    DESCRIPTION
anbox-cloud   yes       disabled  Scalable Android in the cloud
esm-apps      yes       enabled   Expanded Security Maintenance for Applications
esm-infra     yes       enabled   Expanded Security Maintenance for Infrastructure
fips          yes       disabled  NIST-certified core packages
fips-updates  yes       disabled  NIST-certified core packages with priority security updates
ros           yes       disabled  Security Updates for the Robot Operating System
usg           yes       disabled  Security compliance and audit tools

NOTICES
Operation in progress: pro attach

For a list of all Ubuntu Pro services, run 'pro status --all'
Enable services with: pro enable <service>

             Account: Canonical - staff
        Subscription: Ubuntu Pro (Apps-only) - Virtual
         Valid until: Fri Dec 31 23:59:59 3999 UTC
Technical support level: essential
One moment, checking your subscription first
Updating package lists
Ubuntu Security Guide enabled
Visit https://ubuntu.com/security/certifications/docs/usg for the next steps
One moment, checking your subscription first
Updating package lists
Installing FIPS Updates packages
FIPS Updates enabled
A reboot is required to complete install.
Please run `apt upgrade` to ensure all FIPS packages are updated to the correct
version.

+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt
...

# Now make sure that all the models that we create also get that configuration
$ juju model-defaults --file config.yaml

# It is plausible this could be added to bootstrap, but I didn't try that.

$ juju add-model test
$ juju model-config cloudinit-userdata
preruncmd:
  - pro refresh config
  - pro attach MYSECRET
  - pro enable usg --assume-yes
  - pro enable fips-updates --assume-yes

# Now deploy an application, also ensuring that we target the Focal version:
$ juju de...


Revision history for this message
John A Meinel (jameinel) wrote :

Though I get the same failure if I bootstrap 3.1.6 to jammy:
$ juju status -m controller
Model Controller Cloud/Region Version SLA Timestamp
controller lxd localhost/localhost 3.1.6 unsupported 16:01:24-04:00

App Version Status Scale Charm Channel Rev Exposed Message
controller active 1 juju-controller 3.1/stable 14 no

Unit Workload Agent Machine Public address Ports Message
controller/0* active idle 0 10.10.30.78

Machine State Address Inst id Base AZ Message
0 started 10.10.30.78 juju-245dcc-0 ubuntu@22.04 Running

$ openssl s_client -cipher "TLS_CHACHA20_POLY1305_SHA256" -connect 10.10.30.78:17070
Call to SSL_CONF_cmd(-cipher, TLS_CHACHA20_POLY1305_SHA256) failed
4027DA2E1C7F0000:error:0A0000B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match:../ssl/ssl_lib.c:2745:

Revision history for this message
John A Meinel (jameinel) wrote :

So at a minimum, this was user error. It seems that openssl s_client -cipher is for TLS v1.2; since we are testing TLS v1.3 you have to use:
$ openssl s_client -ciphersuites "TLS_CHACHA20_POLY1305_SHA256" -connect 54.91.83.198:17070

I was able to bootstrap to Jammy on AWS and saw that I was able to connect using chacha.

After bootstrapping a Jammy controller, I then added a model to test more VMs using the preload:
$ juju model-defaults --file config.yaml
$ juju add-model test
Added 'test' model on aws/us-east-1 with credential 'jam-aws' for user 'admin'
$ juju model-config cloudinit-userdata
preruncmd:
  - pro refresh config
  - pro attach MYSECRET
  - pro enable usg --assume-yes
  - pro enable fips-updates --assume-yes

Then as that machine started, I did confirm that the fips steps ran in /var/log/cloud-init-output.log
...
Visit https://ubuntu.com/security/certifications/docs/usg for the next steps
One moment, checking your subscription first
Updating package lists
Installing FIPS Updates packages
FIPS Updates enabled
A reboot is required to complete install.
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt

However, it successfully downloads the agent binaries and gets set up (it is slow, but it works).

Now it did say that I needed to reboot to take effect, but also:
$ openssl s_client -ciphersuites "TLS_CHACHA20_POLY1305_SHA256" 172.30.2.156:17070
...
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_CHACHA20_POLY1305_SHA256
...

I even created a Focal LXD container on that machine, and by default it negotiated:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_AES_128_GCM_SHA256
but if I forced it, I could get:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_CHACHA20_POLY1305_SHA256

Now, post reboot things did change:
ubuntu@ip-172-30-4-160:~$ openssl s_client -ciphersuites "TLS_CHACHA20_POLY1305_SHA256" 172.30.2.156:17070
CONNECTED(00000003)
140197754987776:error:0607B0C8:digital envelope routines:EVP_CipherInit_ex:disabled for FIPS:../crypto/evp/evp_enc.c:227:
140197754987776:error:14202006:SSL routines:derive_secret_key_and_iv:EVP lib:../ssl/tls13_enc.c:419:
---
no peer certificate available
---
No client certificate CA names sent
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 160 bytes and written 309 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_CHACHA20_POLY1305_SHA256
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---

Though in the LXC container:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_CHACHA20_POLY1305_SHA256

However, back in the host, if I don't force CHACHA:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_AES_128_GCM_SHA256
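A compact way to repeat this check from any vantage point is to try each TLS 1.3 suite in turn and print what the server actually negotiates (a sketch; the controller address is the one from the output above):

```
for suite in TLS_AES_128_GCM_SHA256 TLS_AES_256_GCM_SHA384 TLS_CHACHA20_POLY1305_SHA256; do
  printf '%-30s ' "$suite"
  echo | openssl s_client -ciphersuites "$suite" -connect 172.30.2.156:17070 2>/dev/null \
    | grep '^New,' || echo 'handshake failed'
done
```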

Revision history for this message
John A Meinel (jameinel) wrote :

To summarize my testing:

On AWS, if I set the fips preruncmd and then restart the instance, I do get the host machine refusing to connect over CHACHA, even though the server supports it (and it can be negotiated by a client inside an LXD instance inside that VM where I didn't set up the fips enablement.)

It's plausible that, because curtin works differently than preruncmd, the instance is coming up with the CHACHA cipher disabled, but even under those circumstances the juju server is happy to support TLS_AES_128_GCM_SHA256 or TLS_AES_256_GCM_SHA384.

I was unable to arrange a situation where the juju controller refused to negotiate TLS_AES_128_GCM_SHA256, and 90% of the time it seemed to prefer it (from the PoV of openssl s_client).

Note, I also tested with curl, and unless I forced the cipher on the VM that had fips enabled, it would successfully download the binaries:

ubuntu@ip-172-30-4-160:~$ curl -sSf --connect-timeout 20 --noproxy '*' --insecure -o xxx.tgz https://172.30.2.156:17070/model/c75854bd-8875-45d1-819c-91b782b4d77d/tools/3.1.6-ubuntu-amd64 ; echo $?
0

ubuntu@ip-172-30-4-160:~$ curl -sSf --connect-timeout 20 --noproxy '*' --insecure --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -o xxx.tgz https://172.30.2.156:17070/model/c75854bd-8875-45d1-819c-91b782b4d77d/tools/3.1.6-ubuntu-amd64 ; echo $?
curl: (35) error:0607B0C8:digital envelope routines:EVP_CipherInit_ex:disabled for FIPS
35

Revision history for this message
John A Meinel (jameinel) wrote :

Just as another summary:

on the host vm:
$ sudo pro enable fips-updates --assume-yes
$ curl --tls13-ciphers CHACHA # succeeds
$ sudo reboot now
$ curl --tls13-ciphers CHACHA # fails
$ curl # succeeds
$ lxc exec test-focal bash
$$ curl --tls13-ciphers CHACHA # succeeds
$$ sudo pro enable fips-updates --assume-yes
$$ sudo reboot now
$$ curl --tls13-ciphers CHACHA # fails
$$ curl # succeeds

So it actually fails only in the circumstances where I

a) enable fips in the container/VM that I'm currently running,
b) reboot that container/VM, and
c) force curl to use CHACHA only.

In all the other permutations curl is happy to download from the controller.

So we'll certainly need a clearer indication of how we are getting the system into this state, because we absolutely should be falling back to TLS_AES_128_GCM_SHA256 automatically.
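When the affected environment is available again, it would help to capture the actual FIPS state of the machine or container at the moment the download loop is running; a minimal sketch:

```
# 1 means the kernel is running in FIPS mode; 0 or a missing file means it is not.
cat /proc/sys/crypto/fips_enabled

# What Ubuntu Pro believes is enabled on this system.
pro status

# The TLS 1.3 suites the local openssl reports for TLS 1.3.
openssl ciphers -s -tls1_3
```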

Revision history for this message
Jeff Hillman (jhillman) wrote : Re: [Bug 2002841] Re: Juju agent cannot download from FIPS enabled LXD containers in Openstack Deployment

If I understand your process correctly, the reboot is what is putting you into a true FIPS environment, which is what happens when we use MAAS/curtin to FIPS-enable and deploy machines.

The CHACHA cipher is being rejected.

On Fri, Oct 6, 2023, 4:23 PM John A Meinel <email address hidden>
wrote:

> So at a minimum, this was user error. It seems that openssl s_client
> -cipher is for TLS v1.2, since we are testing TLS v1.3 you have to use:
> ...

Revision history for this message
John A Meinel (jameinel) wrote :

Because we determined that fips doesn't actually take effect until the machine is restarted, I switched up and wrote this config.yaml:

vpc-id: vpc-5aaf123f
cloudinit-userdata: |
  preruncmd:
    - pro refresh config
    - pro attach C14EW5D6fYetRNEKcSBztWHE8E1WWi
    - pro enable usg --assume-yes
    - pro enable fips-updates --assume-yes
    - if [ -e /var/lib/restarted ]; then echo already restarted; else touch /var/lib/restarted; echo restarting; reboot now; fi

It worked as expected, rebooting one time after installing fips, but otherwise moved forward and did the rest of the juju initialization steps. I bootstrapped with:

$ juju bootstrap aws jam-aws --config config.yaml --bootstrap-series focal --model-default config.yaml --agent-version 3.1.5

Which forced the controller to 3.1.5, but on the controller I could run:
curl -sSf --connect-timeout 20 --noproxy '*' --insecure --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -o xxx.tgz https://localhost:17070/model/c75854bd-8875-45d1-819c-91b782b4d77d/tools/3.1.5-ubuntu-amd64 ; echo $?
curl: (35) error:0607B0C8:digital envelope routines:EVP_CipherInit_ex:disabled for FIPS
35
$ curl -sSf --connect-timeout 20 --noproxy '*' --insecure -o xxx.tgz https://localhost:17070/model/c75854bd-8875-45d1-819c-91b782b4d77d/tools/3.1.5-ubuntu-amd64 ; echo $?
0

So I could see that curl on the controller machine did indeed not like CHACHA, but was happy to do other ciphers. Using that same setup (reboot just after installing and enabling fips) I then added a new model:

$ juju add-model test
$ juju model-config cloudinit-userdata
preruncmd:
  - pro refresh config
  - pro attach MYSECRET
  - pro enable usg --assume-yes
  - pro enable fips-updates --assume-yes
  - if [ -e /var/lib/restarted ]; then echo already restarted; else touch /var/lib/restarted; echo restarting; reboot now; fi
$ juju deploy ubuntu-lite --base ubuntu@20.04

I then watched as it got set up. FIPS enablement is a bit slow (about 5 minutes to get through all the steps before we get to the reboot).

I could see that it did get restarted as the new machine came up, before it downloaded the agent binaries
Installing FIPS Updates packages
FIPS Updates enabled
A reboot is required to complete install.
restarting
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt
+ echo machine-0:168ea4a8-5886-4a81-84ed-f05966f9f49c

(and at that line 'restarting' I was connected in another terminal over ssh, and I got kicked out and had to wait a bit before I could reconnect.)

And post reboot (and finishing initialization) I can confirm that forcing the cipher fails as desired:
$ curl -sSf --connect-timeout 20 --noproxy '*' --insecure --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -o xxx.tgz https://172.30.2.150:17070/model/db124619-d82d-4ca7-8777-b3998dc4252b/tools/3.1.5-ubuntu-amd64; echo $?
curl: (35) error:0607B0C8:digital envelope routines:EVP_CipherInit_ex:disabled for FIPS
35

I then went forward and added an LXD unit of the same application:
$ juju add-unit ubuntu-lite --to lxd:0

$ juju status
...
ubuntu-lite/1 waiting allocating 0/lxd/0 waiting for machine
...
0/lxd/0 pending pending ubuntu@20.04 ...


Revision history for this message
John A Meinel (jameinel) wrote :

While watching in the LXC container, it felt like I saw the commands post 'reboot now' echo even though it was supposed to be rebooting. So I changed my config.yaml to:

$ cat config.yaml
vpc-id: vpc-5aaf123f
cloudinit-userdata: |
  preruncmd:
    - pro refresh config
    - pro attach MYSECRET
    - pro enable usg --assume-yes
    - pro enable fips-updates --assume-yes
    - if [ -e /var/lib/restarted ]; then echo already restarted; else touch /var/lib/restarted; echo restarting; reboot now; sleep 30; fi

And initiated:
$ juju bootstrap aws jam-aws --config config.yaml --bootstrap-series focal --model-default config.yaml --agent-version 3.1.5

On the controller machine, it disconnects SSH right away:
FIPS Updates enabled
A reboot is required to complete install.
restarting
Connection to 54.84.245.226 closed by remote host.
Connection to 54.84.245.226 closed.

However, after doing that the bootstrap fails. Specifically I end up with:
A reboot is required to complete install.
2023-10-06 22:47:02.113978556+00:00
restarting
Cloud-init v. 23.2.2-0ubuntu0~20.04.1 running 'modules:final' at Fri, 06 Oct 2023 22:44:51 +0000. Up 21.40 seconds.

And with bootstrap --debug I see:
18:48:03 DEBUG juju.provider.common bootstrap.go:668 connection attempt for 52.23.188.21 failed: /var/lib/juju/nonce.txt does not exist

So it seems that if you reboot while cloud-init is running, it doesn't resume its script. (We are missing the install .../nonce.txt line, which allows bootstrap to continue.)
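On a machine stuck in that state, the quickest checks are whether cloud-init believes it finished and whether the nonce file was ever written; a sketch, assuming SSH access is still possible:

```
# Did cloud-init complete, or did the reboot cut it short?
cloud-init status --long

# The file juju's bootstrap polls for; if absent, the userdata script never resumed.
ls -l /var/lib/juju/nonce.txt

# The tail of the userdata output usually shows exactly where execution stopped.
tail -n 30 /var/log/cloud-init-output.log
```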

Revision history for this message
John A Meinel (jameinel) wrote :

So I found a trick to actually cause cloud-init to run again after rebooting.

$ cat config.yaml
vpc-id: vpc-5aaf123f
cloudinit-userdata: |
  preruncmd:
    - date --rfc-3339=ns
    - pro refresh config
    - pro attach MYSECRET
    - pro enable usg --assume-yes
    - pro enable fips-updates --assume-yes
    - date --rfc-3339=ns
    - if [ -e /var/lib/restarted ]; then echo already restarted; else touch /var/lib/restarted; echo resetting cloud init and restarting; cloud-init clean; reboot now; sleep 30; fi

It does mean that the machine ends up regenerating its SSH key twice. But at least now I'm sure that by the time the script runs, fips really is enabled.

Fetching Juju agent version 3.1.5 for amd64
+ n=1
+ true
+ echo Attempt 1 to download agent binaries from 'https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64'...\n
Attempt 1 to download agent binaries from 'https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64'...

+ curl -sSf --connect-timeout 20 --noproxy * --insecure -o /var/lib/juju/tools/3.1.5-ubuntu-amd64/tools.tar.gz https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),118(netdev)
+ echo Agent binaries downloaded successfully.

And then post reboot I see:

Building dependency tree...
Reading state information...
cpu-checker is already the newest version (0.7-1.1).
curl is already the newest version (7.68.0-1ubuntu2.19).
tmux is already the newest version (3.0a-2ubuntu0.4).
ubuntu-fan is already the newest version (0.12.13ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2023-10-06 23:18:25.060138285+00:00
Successfully processed your pro configuration.
This machine is already attached to 'Canonical - staff'
To use a different subscription first run: sudo pro detach.
One moment, checking your subscription first
Ubuntu Security Guide is already enabled.
See: sudo pro status
One moment, checking your subscription first
FIPS Updates is already enabled.
See: sudo pro status
2023-10-06 23:18:35.286029430+00:00
already restarted
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt

So it is clear that it *did* restart and came up with pro enabled. But note the later lines (in the container):

+ curl -sSf --connect-timeout 20 --noproxy * --insecure -o /var/lib/juju/tools/3.1.5-ubuntu-amd64/tools.tar.gz https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),118(netdev)
+ echo Agent binaries downloaded successfully.
Agent binaries downloaded successfully.

Revision history for this message
John A Meinel (jameinel) wrote :

Note that in another container on that same VM:

$ sudo lxc launch juju/ubuntu@20.04/amd64 test-focal
$ sudo lxc exec test-focal bash
root@test-focal:~#
root@test-focal:~# curl -sSf --connect-timeout 20 --noproxy * --insecure --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -o xxx.tgz https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64; echo $?
0

So CHACHA20 is refused on the VM itself, and refused in the container that is spawned by Juju (because its cloud-init also enables FIPS). But each container needs to have FIPS enabled for that to happen, because all the other containers happily allow it to work.

That negates the "it is the kernel of the host preventing it from working" theory, because if I don't pro enable fips and reboot, the container happily negotiates CHACHA.

Taking it one step further, the only situation I can think of is a proxy configured such that, if the remote side claims to support CHACHA, it hard-requires CHACHA to be used. *That* would cause the behavior we are seeing.

The Juju controller does advertise CHACHA, but it is happy to switch to any of the other TLS v1.3 ciphers, and all the other tools are also happy to switch ciphers.

But if I were a site that said "I must have FIPS", maybe I could create a proxy (possibly transparent/MITM) that forces cipher selection toward suites outside of FIPS support whenever the peer claims to support them, and so force things to break. I'm really stretching on that one, though.
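If such a proxy were in play, comparing the certificate actually received on port 17070 from the host and from inside a container would show it; a sketch reusing the controller address and container name from these tests, where differing fingerprints would indicate something terminating TLS in between:

```
# On the host:
echo | openssl s_client -connect 172.30.2.226:17070 2>/dev/null \
  | openssl x509 -noout -fingerprint -sha256

# Inside an LXD container on the same host:
sudo lxc exec test-focal -- sh -c 'echo | openssl s_client -connect 172.30.2.226:17070 2>/dev/null | openssl x509 -noout -fingerprint -sha256'
```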

Revision history for this message
John A Meinel (jameinel) wrote :

After much back and forth, we've only managed to replicate this on site, and ultimately found a workaround by injecting a '/root/.curlrc' file that contains:
--tls13-ciphers TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256

By removing TLS_CHACHA20_POLY1305_SHA256 from the list of preferred ciphers when cloud-init gets to the point where it needs to curl down the agent binaries, it doesn't try to use CHACHA and then fail, but instead continues onward.

Revision history for this message
Jeff Hillman (jhillman) wrote :

Unsubscribed field-critical. I don't believe this was a juju bug with respect to the CHACHA cipher; we were able to work around it by forcing the curl client to default to the TLS_AES_* cipher suites, and this allowed the juju binaries to be downloaded.

It is likely something specific to this environment causing that.

Revision history for this message
Jeff Hillman (jhillman) wrote :

Specifically, the fix was to add the following to the juju model config

```
      cloudinit-userdata: |
        preruncmd:
          - echo 'tls13-ciphers = TLS_AES_256_GCM_SHA384' > /root/.curlrc
          - echo 'tls13-ciphers = TLS_AES_256_GCM_SHA384' > /home/ubuntu/.curlrc
```
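With that in place, curl picks the pinned AES suite when the agent download runs; this can be spot-checked on an affected unit with curl's verbose output (a sketch; the address and model UUID are copied from the earlier comments and are illustrative only):

```
# The handshake line should now report TLS_AES_256_GCM_SHA384
# rather than TLS_CHACHA20_POLY1305_SHA256.
curl -v --insecure -o /dev/null \
  https://172.30.2.226:17070/model/4076c43f-10e8-458d-8362-6141da95feb0/tools/3.1.5-ubuntu-amd64 \
  2>&1 | grep 'SSL connection using'
```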

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :

SQA attempted to reproduce this error; however, using Juju 3.1.6 it was not possible. Using Juju 2.9.45 we could partially reproduce it: the agent did not communicate back to the server.

Here are the two test runs made, for reference:

Juju 3.1.6: https://solutions.qa.canonical.com/testruns/97df0598-d14b-4ddf-ad48-764c13a48b90
Artifacts for 3.1.6: https://oil-jenkins.canonical.com/artifacts/97df0598-d14b-4ddf-ad48-764c13a48b90/index.html

Juju 2.9.45: https://solutions.qa.canonical.com/testruns/d8071c86-c094-41b1-bbb5-c3feaa643b11
Artifacts for 2.9.45: https://oil-jenkins.canonical.com/artifacts/d8071c86-c094-41b1-bbb5-c3feaa643b11/index.html
