sunbeam terraform apply fails behind proxy

Bug #2032783 reported by eblock@nde.ag
This bug affects 2 people
Affects: OpenStack Snap
Status: Fix Committed
Importance: High
Assigned to: Unassigned

Bug Description

I wanted to try the sunbeam setup for the first time, following [1]. It's a virtual machine behind a proxy; the environment variables are set like this:

root@sunbeam:~# cat /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
HTTP_PROXY="http://<IP>:<PORT>"
HTTPS_PROXY="http://<IP>:<PORT>"
http_proxy="http://<IP>:<PORT>"
https_proxy="http://<IP>:<PORT>"
NO_PROXY="localhost, <our_domain>, 172.17.2.0/24"
no_proxy="localhost, <our_domain>, 172.17.2.0/24"

The initial steps (install openstack snap, prepare-node-script) seem to work with the proxy, but it fails at terraform apply:

---snip---
root@sunbeam:~# sudo sunbeam cluster bootstrap --accept-defaults
terraform apply failed:
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # juju_application.sunbeam-machine will be created
  + resource "juju_application" "sunbeam-machine" {
      + constraints = (known after apply)
      + id = (known after apply)
      + model = "controller"
      + name = "sunbeam-machine"
      + placement = (known after apply)
      + principal = (known after apply)
      + trust = false
      + units = 0

      + charm {
          + channel = "latest/edge"
          + name = "sunbeam-machine"
          + revision = (known after apply)
          + series = "jammy"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
juju_application.sunbeam-machine: Creating...
juju_application.sunbeam-machine: Still creating... [10s elapsed]
juju_application.sunbeam-machine: Still creating... [20s elapsed]
juju_application.sunbeam-machine: Still creating... [30s elapsed]

Error: resolving with preferred channel: Post "https://api.charmhub.io/v2/charms/refresh": dial tcp 185.125.188.54:443: i/o timeout

  with juju_application.sunbeam-machine,
  on main.tf line 29, in resource "juju_application" "sunbeam-machine":
  29: resource "juju_application" "sunbeam-machine" {

Error: Command '['/snap/openstack/236/bin/terraform', 'apply', '-auto-approve', '-no-color', '-parallelism=1']' returned non-zero exit status 1.
---snip---

This error is usually related to a direct attempt to reach the URL instead of going through the proxy. Is something else needed to honor the proxy?

[1] https://microstack.run/#get-started

Tags: proxy
Revision history for this message
eblock@nde.ag (eblock) wrote :

I also tried setting all the proxy variables in .local/share/juju/bootstrap-config.yaml (with and without the "http://" prefix), but to no avail. Do I need to restart anything before that can work? I stopped and started juju (snap stop juju, snap start juju), but it doesn't make a difference.

James Page (james-page)
tags: added: proxy
Revision history for this message
Chris (undrh2o) wrote :

I'm experiencing the same problem - has anyone looked into this or resolved it?

Revision history for this message
Billy Olsen (billy-olsen) wrote :

@undrh2o - unfortunately there's no official solution for sunbeam yet. However, proxy settings are something we'll be tackling in the next few months.

Revision history for this message
Chris (undrh2o) wrote :

I found a poor workaround - after the failure below I add the proxy to juju with

juju model-config http-proxy=http://10.158.100.3:8080
juju model-config https-proxy=http://10.158.100.3:8080

Then when I re-run "sunbeam cluster bootstrap", it proceeds past that point but fails later (still debugging why).
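(Since the terraform plan below shows model = "controller", I presume these settings need to be applied to that model specifically, e.g. with the -m flag:

juju model-config -m controller http-proxy=http://10.158.100.3:8080
juju model-config -m controller https-proxy=http://10.158.100.3:8080

Without -m, model-config acts on the currently active model.)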

####Failure###
#>sunbeam cluster bootstrap --role control --role compute --role storage --accept-defaults
⠴ Deploying Sunbeam Machine ... terraform apply failed:
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # juju_application.sunbeam-machine will be created
  + resource "juju_application" "sunbeam-machine" {
      + constraints = (known after apply)
      + id = (known after apply)
      + model = "controller"
      + name = "sunbeam-machine"
      + placement = (known after apply)
      + principal = (known after apply)
      + trust = false
      + units = 0

      + charm {
          + channel = "latest/edge"
          + name = "sunbeam-machine"
          + revision = (known after apply)
          + series = "jammy"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
juju_application.sunbeam-machine: Creating...
juju_application.sunbeam-machine: Still creating... [10s elapsed]
juju_application.sunbeam-machine: Still creating... [20s elapsed]
juju_application.sunbeam-machine: Still creating... [30s elapsed]

Error: resolving with preferred channel: Post "https://api.charmhub.io/v2/charms/refresh": dial tcp 185.125.188.58:443: i/o timeout

  with juju_application.sunbeam-machine,
  on main.tf line 29, in resource "juju_application" "sunbeam-machine":
  29: resource "juju_application" "sunbeam-machine" {

Error: Command '['/snap/openstack/316/bin/terraform', 'apply', '-auto-approve', '-no-color']' returned non-zero exit status 1.

Revision history for this message
Guillaume Boutry (gboutry) wrote :

You seem to have an issue with the terraform provider, which makes its own calls to the Charmhub API and therefore will not use the juju model's proxy configuration.

You may try setting the environment variables:
export HTTP_PROXY=...
export HTTPS_PROXY=...

We pass a copy of the environment variables (which we update) to every terraform command we issue.
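Note that sudo strips most environment variables by default, so if you run sunbeam via sudo (as in the transcript above) the variables may not reach terraform. A minimal sketch, assuming a placeholder proxy address:

export HTTP_PROXY=http://squid.internal:3128
export HTTPS_PROXY=http://squid.internal:3128
export NO_PROXY=localhost,127.0.0.1,172.17.2.0/24
sudo -E sunbeam cluster bootstrap --accept-defaults

(`sudo -E` preserves the caller's environment, provided the sudoers policy allows it.)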

A more robust solution will be provided in the next few months; proxy settings are on the roadmap.

Changed in snap-openstack:
status: New → Triaged
Revision history for this message
Chris (undrh2o) wrote :

@gboutry Those environment variables are already set on all the hosts. Is there somewhere else I need to set them?

Revision history for this message
Guillaume Boutry (gboutry) wrote (last edit ):

Hi @undrh2o, sunbeam 2023.2/candidate supports proxies now. Here you can find docs about enabling it:
https://microstack.run/docs/proxied-environment
https://microstack.run/docs/proxy-rules
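With that channel, `sunbeam cluster bootstrap` prompts for the proxy settings, and they can be changed later with `sunbeam proxy set`; roughly like this (the exact flag names are in the docs above, so treat these as assumptions):

sunbeam proxy set --http-proxy http://IP:PORT --https-proxy http://IP:PORT --no-proxy localhost,172.17.2.0/24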

Changed in snap-openstack:
importance: Undecided → High
status: Triaged → Fix Committed
Revision history for this message
eblock@nde.ag (eblock) wrote :

Thanks for making it support proxies. But I'm still struggling to create a cluster, although it seems to be a different issue now:

sunbeam@sunbeam:~$ sudo snap install openstack --channel=2023.2/edge
openstack (2023.2/edge) 2023.2 from Canonical✓ installed

# This prepare step needs to be executed as non-root user
sunbeam@sunbeam:~$ sudo sunbeam prepare-node-script | bash -x && newgrp snap_daemon
++ lsb_release -sc
+ '[' jammy '!=' jammy ']'
++ whoami
+ USER=sunbeam
[...]

# Bootstrapping fails with permission error
sunbeam@sunbeam:~$ sudo sunbeam cluster bootstrap --accept-defaults
⠋ Adding cloud to Juju client ... Error determining whether to skip the bootstrap process. Defaulting to not skip.
Traceback (most recent call last):
  File "/snap/openstack/479/lib/python3.10/site-packages/sunbeam/commands/juju.py", line 343, in is_skip
    juju_clouds = self.get_clouds(cloud_type, local=True)
  File "/snap/openstack/479/lib/python3.10/site-packages/sunbeam/commands/juju.py", line 115, in get_clouds
    clouds_from_juju_cmd = self._juju_cmd(*cmd)
  File "/snap/openstack/479/lib/python3.10/site-packages/sunbeam/commands/juju.py", line 92, in _juju_cmd
    process = subprocess.run(cmd, capture_output=True, text=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/snap/openstack/479/juju/bin/juju', 'clouds', '--client', '--format', 'json']' returned non-zero exit status 2.
ERROR stat .: permission denied

# User has sudo permissions:
sunbeam@sunbeam:~$ sudo cat /etc/sudoers.d/sunbeam
sunbeam ALL=(ALL) NOPASSWD:ALL

# Trying to bootstrap as root doesn't work either; it seems to never finish
root@sunbeam:~# sunbeam cluster bootstrap --accept-defaults
The authenticity of host '172.17.2.48 (172.17.2.48)' can't be established.
ED25519 key fingerprint is SHA256:SVLS6wRNCco7bAG/svW/mxdVT0g38FVumVKbfhpmh1U.
This host key is known by the following other names/addresses:
    /root/.ssh/known_hosts:1: [hashed name]
Bootstrapping Juju onto machine

It's unclear to me how this is supposed to work; I'd appreciate any pointers. Should I create a separate bug report for this (if it even is a bug)?

Revision history for this message
Guillaume Boutry (gboutry) wrote :

Can you file a different bug report?

And can you run `juju clouds --debug --client` to get a better error message?

Revision history for this message
eblock@nde.ag (eblock) wrote :

I started from scratch (from a snapshot of the VM), with user "ubuntu" instead of "sunbeam", and now I get this error:

---snip---
ubuntu@sunbeam:~$ sunbeam cluster bootstrap --accept-defaults
⠧ Bootstrapping Juju onto machine ... Error bootstrapping Juju
Traceback (most recent call last):
  File "/snap/openstack/479/lib/python3.10/site-packages/sunbeam/commands/juju.py", line 559, in run
    process = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/snap/openstack/479/juju/bin/juju', 'bootstrap', 'sunbeam', 'sunbeam-controller', '--config', 'juju-http-proxy="http://IP:PORT"', '--config', 'snap-http-proxy="http://IP:PORT"', '--config', 'juju-https-proxy="http://IP:PORT"', '--config', 'snap-https-proxy="http://IP:PORT"', '--config', 'juju-no-proxy="localhost,10.1.0.0/16,127.0.0.1, DOMAIN,10.152.183.0/24, 172.17.2.0/24",localhost, DOMAIN,.svc']' returned non-zero exit status 1.
Creating Juju controller "sunbeam-controller" on sunbeam/default
Looking for packaged Juju agent version 3.4.2 for amd64
WARNING Got error requesting "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": proxyconnect tcp: dial tcp: lookup "http: no such host
WARNING Got error requesting "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": proxyconnect tcp: dial tcp: lookup "http: no such host
WARNING Got error requesting "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": proxyconnect tcp: dial tcp: lookup "http: no such host
ERROR failed to bootstrap model: cannot read index data, attempt count exceeded: cannot access URL "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index2.sjson": proxyconnect tcp: dial tcp: lookup "http: no such host

Error: Command '['/snap/openstack/479/juju/bin/juju', 'bootstrap', 'sunbeam', 'sunbeam-controller', '--config', 'juju-http-proxy="http://IP:PORT"', '--config', 'snap-http-proxy="http://IP:PORT"', '--config', 'juju-https-proxy="http://IP:PORT"', '--config', 'snap-https-proxy="http://IP:PORT"', '--config', 'juju-no-proxy="localhost,10.1.0.0/16,127.0.0.1, DOMAIN,10.152.183.0/24, 172.17.2.0/24",localhost, DOMAIN,.svc']' returned non-zero exit status 1.
---snip---

This looks like a parsing error, doesn't it?

Revision history for this message
Guillaume Boutry (gboutry) wrote :

Yes, it looks like the parsed values include literal double quotes (`"`), which they shouldn't - that's also why juju reports `lookup "http: no such host`: the opening quote becomes part of the proxy host. Can you retry and make sure the environment it's reading from is not quoted?

How did you pass the proxy values? By answering the prompts with quoted (`"`) values?
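You can check what sunbeam will pick up with, e.g.:

grep -i proxy /etc/environment
env | grep -i proxy

If the values print with literal `"` characters, those quotes end up verbatim in the juju bootstrap command.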

Revision history for this message
eblock@nde.ag (eblock) wrote :

I removed all quotes from /etc/environment (I didn't have any prompts) and retried after logging out and in again; it still tries it with quotes:

Error: Command '['/snap/openstack/479/juju/bin/juju', 'bootstrap', 'sunbeam', 'sunbeam-controller', '--config', 'juju-http-proxy="http://IP:PORT"', '--config', 'snap-http-proxy="http:// ... and so on
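For completeness, the proxy lines in /etc/environment now read (same placeholders as before):

HTTP_PROXY=http://IP:PORT
HTTPS_PROXY=http://IP:PORT
http_proxy=http://IP:PORT
https_proxy=http://IP:PORT
NO_PROXY=localhost,DOMAIN,172.17.2.0/24
no_proxy=localhost,DOMAIN,172.17.2.0/24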

Is there any cache involved? How can I reset?

Revision history for this message
eblock@nde.ag (eblock) wrote :

If I run this command separately, I can create the controller and also bootstrap:

---snip---
ubuntu@sunbeam:~$ juju bootstrap sunbeam sunbeam-controller --config juju-http-proxy=http://IP:PORT --config snap-http-proxy=http://IP:PORT --config juju-https-proxy=http://IP:PORT --config snap-https-proxy=http://IP:PORT --config juju-no-proxy= ...

Creating Juju controller "sunbeam-controller" on sunbeam/default
Looking for packaged Juju agent version 3.4.2 for amd64
Located Juju agent version 3.4.2-ubuntu-amd64 at https://streams.canonical.com/juju/tools/agent/3.4.2/juju-3.4.2-linux-amd64.tgz
Installing Juju agent on bootstrap instance
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 172.17.2.48 to verify accessibility...

Bootstrap complete, controller "sunbeam-controller" is now available
Controller machines are in the "controller" model

Now you can run
        juju add-model <model-name>
to create a new model to deploy workloads

------

ubuntu@sunbeam:~$ sunbeam cluster bootstrap --accept-defaults
⠋ Deploying Sunbeam Machine ... terraform apply failed:

...
      + charm {
          + base = "ubuntu@22.04"
          + channel = "2023.2/stable"
          + name = "sunbeam-machine"
          + revision = (known after apply)
          + series = (known after apply)
        }
---snip---

I can see that terraform is aware of the proxy, but now this error shows up:

Unable to create application, got error: selecting releases: charm or bundle
not found for channel "2023.2/stable", base "amd64/ubuntu/22.04"
available releases are:
  channel "2023.2/candidate": available bases are: ubuntu@22.04
  channel "2023.2/beta": available bases are: ubuntu@22.04
  channel "2023.2/edge": available bases are: ubuntu@22.04
  channel "latest/edge": available bases are: ubuntu@22.04
  channel "2024.1/edge": available bases are: ubuntu@22.04

Is 2023.2/stable hard-coded somewhere? As mentioned in a previous comment, I used --channel=2023.2/edge when installing the openstack snap.

Revision history for this message
Guillaume Boutry (gboutry) wrote :

Is there any cache involved? <-- Yes, there's a cache, and resetting it is a bit involved. Have you tried re-running bootstrap and answering differently?

For the last issue, there's been a breaking change in sunbeam.

Now every channel of the `openstack` snap will deploy the stable charms by default; there's still some friction because sunbeam-machine has not gotten a stable release for 2023.2 YET.

The new way to test edge charms is to run: sunbeam -v cluster bootstrap -m /snap/openstack/current/etc/manifests/edge.yml

(/snap/openstack/current/etc/manifests/candidate.yml for candidate testing)

Revision history for this message
eblock@nde.ag (eblock) wrote :

Alright, I'm one step further after purging everything again. The sunbeam bootstrap still doesn't finish (at least not within an hour or so), and I do see a prompt asking for proxy settings. But when I run the 'juju bootstrap sunbeam sunbeam-controller ...' first, I get a little more progress with the sunbeam bootstrap command:

ubuntu@sunbeam:~$ sunbeam cluster bootstrap -m /snap/openstack/current/etc/manifests/edge.yml
Configure proxy for access to external network resources? [y/n] (y): y
Enter value for http_proxy: (http://IP:PORT): http://IP:PORT
Enter value for https_proxy: (http://IP:PORT): http://IP:PORT
Enter value for no_proxy: (localhost, 172.17.2.0/24): localhost, 172.17.2.0/24
Management networks shared by hosts (CIDRs, separated by comma) (172.17.2.0/24): 172.17.2.0/24
MetalLB address allocation range (supports multiple ranges, comma separated) (10.20.21.10-10.20.21.20): 10.20.21.10-10.20.21.20

But now it seems to be stuck here:
⠏ Adding MicroK8S unit to machine ...

In the syslog I see a lot of these messages:

Apr 24 07:38:25 sunbeam microk8s.daemon-kubelite[9327]: E0424 07:38:25.318497 9327 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

It's still running, though, so I'll wait for the result. But regarding the proxy settings, I still don't think they work as expected, given that I have to run the juju command separately.

Revision history for this message
eblock@nde.ag (eblock) wrote :

Just to update the status, the command timed out:

⠼ Adding MicroK8S unit to machine ... Timed out while waiting for units microk8s/0 to be ready
Error: Timed out while waiting for units microk8s/0 to be ready

There are lots of different error messages in the syslog, but maybe that's a different bug.

Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

Can you provide the output of the following commands:

sudo microk8s.kubectl get po -A
sudo snap services microk8s
sudo microk8s inspect # Upload the log file generated from this command

Revision history for this message
Guillaume Boutry (gboutry) wrote :

This might still be related. Can you list all the pods in kube-system and check their status?

`kubectl get pods -n kube-system`
`kubectl describe -n kube-system pod/...`

From the error message you've given, it looks like the CNI images were not pulled. This might be due to the proxy, or rate limiting (or something else).

MicroK8s reads its proxy configuration from /etc/environment.

Make sure your proxy allows these URLs: https://microstack.run/docs/proxy-access
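If /etc/environment was changed after MicroK8s was installed, containerd may need a restart to pick up the proxy; a sketch, assuming the default MicroK8s paths:

grep -i proxy /var/snap/microk8s/current/args/containerd-env
sudo snap restart microk8s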

Revision history for this message
eblock@nde.ag (eblock) wrote :

root@sunbeam:~# sudo microk8s.kubectl get po -A
NAMESPACE        NAME                                    READY   STATUS    RESTARTS   AGE
kube-system      coredns-864597b5fd-zl5h5                0/1     Pending   0          37m
kube-system      hostpath-provisioner-7df77bc496-w2mcd   0/1     Pending   0          32m
metallb-system   controller-5f7bb57799-7jdmf             0/1     Pending   0          32m

root@sunbeam:~# sudo snap services microk8s
Service                           Startup  Current   Notes
microk8s.daemon-apiserver-kicker  enabled  active    -
microk8s.daemon-apiserver-proxy   enabled  inactive  -
microk8s.daemon-cluster-agent     enabled  active    -
microk8s.daemon-containerd        enabled  active    -
microk8s.daemon-etcd              enabled  inactive  -
microk8s.daemon-flanneld          enabled  inactive  -
microk8s.daemon-k8s-dqlite        enabled  active    -
microk8s.daemon-kubelite          enabled  active    -

I'm not aware of any blacklist on our proxy, so that shouldn't be an issue. The report is attached. I'm wondering if I should have "predicted" the ClusterIP (10.152.183.1) and added that to no-proxy as well. In one of the previous attempts I *believe* I saw that it was added to no-proxy, but I'm not sure. I'm currently trying to tear it down with 'sunbeam cluster remove', but it's stuck as well. So maybe I'll need to roll back to the snapshot.

Revision history for this message
eblock@nde.ag (eblock) wrote :

Alright, it looks better now. I added the IP range 10.152.183.0/24 to the no_proxy variable, and this time I didn't have to run the juju command manually. Now the pods get created successfully:

root@sunbeam:~# kubectl get pods -A
NAMESPACE        NAME                                     READY   STATUS    RESTARTS   AGE
kube-system      hostpath-provisioner-7df77bc496-sksdg    1/1     Running   0          117s
kube-system      coredns-864597b5fd-8xgs4                 1/1     Running   0          3m43s
metallb-system   speaker-vvnfm                            1/1     Running   0          115s
kube-system      calico-node-xfhqc                        1/1     Running   0          3m43s
metallb-system   controller-5f7bb57799-cchdf              1/1     Running   0          115s
kube-system      calico-kube-controllers-77bd7c5b-k9wjq   1/1     Running   0          3m43s

⠏ Deploying OpenStack Control Plane to Kubernetes (this may take a while) ... waiting for services to come online (0/24)

This takes some time (as expected), but after around 35 minutes, the bootstrap is complete and all pods are running. I'm going to configure it now.
One thing I can't really grasp is whether the prepare-node script adds the ClusterIP to the no_proxy variable; I did find it in /etc/environment. But in the previous attempt it wasn't applied, at least that's how it seemed to me. So there are still a couple of open questions, but it seems to work in general.

Revision history for this message
eblock@nde.ag (eblock) wrote :

I'm getting closer, but the hypervisor doesn't start successfully. I believe it's still related to proxy settings, because it logs:

root@sunbeam:~# snap logs openstack-hypervisor
2024-04-24T09:54:37Z nova-compute[516904]: 2024-04-24 09:54:37.518 516904 WARNING keystoneauth.identity.generic.base [None req-7b5a8aab-d612-448e-9282-6e688f664c4c - - - - - -] Failed to discover available identity versions when contacting http://10.20.21.13/openstack-keystone/v3. Attempting to parse version from URL.: keystoneauth1.exceptions.http.ServiceUnavailable: Service Unavailable (HTTP 503)

That is the internal keystone IP, but I'm able to run other openstack commands against the public IP after I updated the proxy settings with 'sunbeam proxy set ...', adding 10.20.21.0/24 to no-proxy. But how can I tell the hypervisor to update its no-proxy settings?

Revision history for this message
Guillaume Boutry (gboutry) wrote :

It looks like it's getting a 503 from keystone, which does not indicate a proxy issue at first glance.

Can you check keystone logs with `kubectl logs -n openstack -c keystone keystone-0`?

Revision history for this message
eblock@nde.ag (eblock) wrote :

Yeah, I'm already familiar with this (keystone) error; it is a proxy issue, and adding the keystone IP to no_proxy resolves it as well. I started from scratch again, and now the hypervisor also got deployed successfully. I believe the proxy issue is resolved and the cloud is working - I just created an instance successfully. Now I just have to learn some more about the setup: where to look for log files, how to access instances, and how to set up a reasonable network configuration.
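For anyone else hitting this: the ranges I ended up needing in no_proxy were the management network, the Kubernetes service CIDR, and the MetalLB range, so something like (domain is a placeholder):

no_proxy=localhost,127.0.0.1,DOMAIN,172.17.2.0/24,10.152.183.0/24,10.20.21.0/24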
Thank you all for your help to figure this out, I really appreciate it!

Revision history for this message
Guillaume Boutry (gboutry) wrote :

Glad you managed to sort that out!

I opened a bug report specifically for this: https://bugs.launchpad.net/snap-openstack/+bug/2063424
