Deployed charm contains future revision code

Bug #2060271 reported by Christopher Bartz

This bug report will be marked for expiration in 15 days if no further activity occurs. (find out why)

6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Incomplete
High
Unassigned

Bug Description

We observed a bug in juju 3.1.6 / openstack cloud where a machine charm unit contained code that should not be included in the current revision of the charm.

The initial situation was a deployed github runner (charmhub.io/github-runner) charm A using the latest/stable revision 140. And next to this another github-runner charm application B is deployed in the same model on latest/edge revision 141.

The deployment steps were:

1. Deploy (using juju terraform provider) with a terraform plan that moves A to revision 141. The same terraform plan moves B from latest/edge rev 141 to 145.

2. After successfully applying, A's juju status shows that it's on revision 141 latest/stable, the charm is in active/idle state, and the juju debug log output shows that the correct charm file is being downloaded:

$ zgrep 'from API server' /var/log/juju/unit-xlarge-53-2024-03-29T08-52-54.644.log.gz
2024-03-28 10:51:53 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-141 from API server

3. ssh into a unit of A. The unit contains code introduced in rev 143.

Post-mortem analysis shows that if we run `juju download github-runner --revision 141`, the downloaded charm file contains the correct code (not code from a future revision).

It may also be worth mentioning that the charm rev 141 was promoted to stable just before the deployment (the deployment was on 28 March and the promotion of the rev to stable also https://github.com/canonical/github-runner-operator/releases/tag/rev141).

The same bug was observed in January and reported to the juju matrix channel: https://matrix.to/#/!xzmWHtGpPfVCXKivIh:ubuntu.com/$JH6ktghpkZD2cVxXTDUk4Y9qSUhvfph1VFqAGtFGPvA?via=ubuntu.com&via=matrix.org&via=mx.aouss.it

Tags: canonical-is
Revision history for this message
John A Meinel (jameinel) wrote :

`juju info github-runner` tells me:
channels: |
  latest/stable: 141 2024-03-28 (141) 20MB amd64 ubuntu@22.04
  latest/candidate: ↑
  latest/beta: 145 2024-03-28 (145) 20MB amd64 ubuntu@22.04
  latest/edge: 145 2024-03-25 (145) 20MB amd64 ubuntu@22.04

So I have the question about "140 is the latest/stable revision" as I can't see how that would be true.

Is the issue that there was an application A that at the time it was deployed, latest/stable was at revision 140, but now latest/stable is at 141?
And that at the point application B was deployed from latest/edge at 141, but it is now at 145?

Revision history for this message
Christopher Bartz (bartz) wrote :

Sorry, the channel information can be confusing. So in the original situation, Charm A was at rev 140 and Charm B was at rev 141.

Then the update in step 1 moved charm A to rev 141 and charm B to rev 145. The problem is that although juju gave the correct status output, charm A contained code that should not be present in rev 141.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.9 KiB)

I did try to reproduce with these steps:

snap refresh juju --channel 3.1
juju bootstrap lxd lxd31
juju add-model gh-test
juju deploy github-runner --channel latest/stable --revision 140 gh-stable
juju deploy github-runner --channel latest/edge --revision 141 gh-edge

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
gh-test lxd31 localhost/localhost 3.1.7 unsupported 10:39:25-04:00

App Version Status Scale Charm Channel Rev Exposed Message
gh-edge blocked 1 github-runner latest/edge 141 no Missing path configuration
gh-stable blocked 1 github-runner latest/stable 140 no Missing path configuration

Unit Workload Agent Machine Public address Ports Message
gh-edge/0* blocked idle 1 10.139.162.189 Missing path configuration
gh-stable/0* blocked idle 0 10.139.162.98 Missing path configuration

Machine State Address Inst id Base AZ Message
0 started 10.139.162.98 juju-c9bfb2-0 ubuntu@22.04 Running
1 started 10.139.162.189 juju-c9bfb2-1 ubuntu@22.04 Running

juju ssh 0
$$ sudo -i
$$ cd /var/log/juju
# grep -rnI "downloading "
machine-0.log:106:2024-04-05 14:37:14 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-140
unit-gh-stable-0.log:11:2024-04-05 14:37:14 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-140 from API server

juju ssh 1
$$ sudo -i
$$ cd /var/log/juju
root@juju-c9bfb2-1:/var/log/juju# grep -rnI "downloading "
machine-1.log:118:2024-04-05 14:37:21 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-141
unit-gh-edge-0.log:11:2024-04-05 14:37:21 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-141 from API server

So both of those certainly believe that they are downloading and installing the versions that you have asked for.

I downloaded r 140, 141, 143, 145 to try and figure out what the differences are. If I strip it down too much:
$ diff -x "venv" -x "*.so" -x "*.pyc" -ur github-runner_r140 github-runner_r141
diff -x venv -x '*.so' -x '*.pyc' -ur github-runner_r140/manifest.yaml github-runner_r141/manifest.yaml
--- github-runner_r140/manifest.yaml 2024-03-06 06:05:32.000000000 -0500
+++ github-runner_r141/manifest.yaml 2024-03-12 04:53:28.000000000 -0400
@@ -9,5 +9,5 @@
   - amd64
   channel: '22.04'
   name: ubuntu
-charmcraft-started-at: '2024-03-06T06:03:07.684053Z'
+charmcraft-started-at: '2024-03-12T04:51:25.680319Z'
 charmcraft-version: 2.5.5

Which seems to say that the only thing that differed between 140 and 141 was maybe a charmcraft.yaml file (that doesn't end up in the rendered charm) that maybe had different dependencies?

Pulling that back a bit I get:
--- github-runner_r140/venv/wheel/vendored/vendor.txt 2024-03-06 06:05:24.000000000 -0500
+++ github-runner_r141/venv/wheel/vendored/vendor.txt 2024-03-12 04:53:18.000000000 -0400
@@ -1 +1 @@
-packaging==23.2
+packaging==24.0
diff -x '*.so' -x '*.pyc' -ur github-runner_r14...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

root@juju-c9bfb2-0:/var/lib/juju/agents/unit-gh-stable-0/charm# grep "started-at" manifest.yaml
charmcraft-started-at: '2024-03-06T06:03:07.684053Z'
root@juju-c9bfb2-0:/var/lib/juju/agents/unit-gh-stable-0/charm# grep -r "Weak hash algorithm" venv/wheel
grep: venv/wheel/__pycache__/wheelfile.cpython-310.pyc: binary file matches
venv/wheel/wheelfile.py: "Weak hash algorithm ({}) is not permitted by PEP "

So gh-stable-0 should, indeed be at r140

root@juju-c9bfb2-1:/var/lib/juju/agents/unit-gh-edge-0/charm# grep "started-at" manifest.yaml
charmcraft-started-at: '2024-03-12T04:51:25.680319Z'
root@juju-c9bfb2-1:/var/lib/juju/agents/unit-gh-edge-0/charm# grep -r "Weak hash algorithm" venv/wheel
grep: venv/wheel/__pycache__/wheelfile.cpython-310.pyc: binary file matches
venv/wheel/wheelfile.py: f"Weak hash algorithm ({algorithm}) is not permitted by "

And this also puts gh-edge-0 as being correctly at r141

Trying to compare vs 143 I get:
$ diff -x "venv" -x "*.so" -x "*.pyc" -ur github-runner_r140 github-runner_r143
...
diff -x venv -x '*.so' -x '*.pyc' -ur github-runner_r140/src/runner_type.py github-runner_r143/src/runner_type.py
--- github-runner_r140/src/runner_type.py 2024-03-06 06:02:42.000000000 -0500
+++ github-runner_r143/src/runner_type.py 2024-03-21 03:09:30.000000000 -0400
@@ -37,6 +37,7 @@
     Attributes:
         app_name: Application name of the charm.
         issue_metrics: Whether to issue metrics.
+ labels: Custom runner labels.
         lxd_storage_path: Path to be used as LXD storage.
         name: Name of the runner.
         path: GitHub repository path in the format '<owner>/<repo>', or the GitHub organization
@@ -48,6 +49,7 @@

     app_name: str
     issue_metrics: bool
+ labels: tuple[str]
     lxd_storage_path: Path
     name: str
     path: GithubPath
Only in github-runner_r143/templates: openstack-userdata.sh.j2

So I could look for those things.
And vs 145:
$ diff -x "venv" -x "*.so" -x "*.pyc" -ur github-runner_r143 github-runner_r145
...
diff -x venv -x '*.so' -x '*.pyc' -ur github-runner_r143/scripts/build-openstack-image.sh github-runner_r145/scripts/build-openstack-image.sh
--- github-runner_r143/scripts/build-openstack-image.sh 2024-03-21 03:09:30.000000000 -0400
+++ github-runner_r145/scripts/build-openstack-image.sh 2024-03-25 03:27:12.000000000 -0400
@@ -146,6 +146,13 @@
 ln -s /usr/bin/python3 /usr/bin/python

 # Uninstall unattended-upgrades, to avoid lock errors when unattended-upgrades is active in the runner
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl stop apt-daily.timer
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl disable apt-daily.timer
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl mask apt-daily.service
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl stop apt-daily-upgrade.timer
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl disable apt-daily-upgrade.timer
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl mask apt-daily-upgrade.service
+DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl daemon-reload
 DEBIAN_FRONTEND=noninteractive /usr/bin/apt-get purge unattended-upgrades -yq

Revision history for this message
John A Meinel (jameinel) wrote :

If I then trigger a refresh of gh-stable:
$ juju refresh gh-stable
Added charm-hub charm "github-runner", revision 141 in channel latest/stable, to the model
no change to endpoints in space "alpha": cos-agent, debug-ssh

And internally it says:
root@juju-c9bfb2-0:~# cd /var/log/juju
root@juju-c9bfb2-0:/var/log/juju# grep -rnI "downloading "
machine-0.log:106:2024-04-05 14:37:14 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-140
machine-0.log:111:2024-04-05 15:03:57 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-141
unit-gh-stable-0.log:11:2024-04-05 14:37:14 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-140 from API server
unit-gh-stable-0.log:135:2024-04-05 15:03:57 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-141 from API server

(and 141 is the tip of stable)
root@juju-c9bfb2-0:/var/lib/juju/agents/unit-gh-stable-0/charm# grep "started-at" manifest.yaml
charmcraft-started-at: '2024-03-12T04:51:25.680319Z'
root@juju-c9bfb2-0:/var/lib/juju/agents/unit-gh-stable-0/charm# grep -r "Weak hash algorithm" venv/wheel
grep: venv/wheel/__pycache__/wheelfile.cpython-310.pyc: binary file matches
venv/wheel/wheelfile.py: f"Weak hash algorithm ({algorithm}) is not permitted by "

And that does have the 141 items, but
root@juju-c9bfb2-0:/var/lib/juju/agents/unit-gh-stable-0/charm# ls scripts/
build-image.sh repo_policy_compliance_service.py

Doesn't have "scripts/build-openstack-image.sh" so it cannot be 143 or 145.
That looks like 141 to me.

Revision history for this message
Tom Haddon (mthaddon) wrote :

John, you don't seem to have reproduced the issue there - it looks to me like that's doing what we'd want. But we are definitely seeing code from revisions later than 141 in the deployed model. I think you'd mentioned a command we could run to get the contents of ch:amd64/jammy/github-runner-141 from the controller in question to confirm if it's cached the incorrect copy?

tags: added: canonical-is
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.2 KiB)

And before triggering and edge refresh, I'm going to -edge and seeing what it has:
root@juju-c9bfb2-1:/var/log/juju# grep -rnI "downloading "
machine-1.log:118:2024-04-05 14:37:21 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-141
unit-gh-edge-0.log:11:2024-04-05 14:37:21 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-141 from API server

No indication that anything should have changed
root@juju-c9bfb2-1:/var/lib/juju/agents/unit-gh-edge-0/charm# grep -r "Weak hash algorithm" venv/wheel
grep: venv/wheel/__pycache__/wheelfile.cpython-310.pyc: binary file matches
venv/wheel/wheelfile.py: f"Weak hash algorithm ({algorithm}) is not permitted by "
root@juju-c9bfb2-1:/var/lib/juju/agents/unit-gh-edge-0/charm# ls scripts/
build-image.sh repo_policy_compliance_service.py

Still at 141.

Doing a refresh:
$ juju refresh gh-edge
Added charm-hub charm "github-runner", revision 145 in channel latest/edge, to the model
no change to endpoints in space "alpha": cos-agent, debug-ssh
root@juju-c9bfb2-1:/var/log/juju# grep -rnI "downloading "
machine-1.log:118:2024-04-05 14:37:21 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-141
machine-1.log:123:2024-04-05 15:08:21 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-145
machine-1.log:126:2024-04-05 15:08:25 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-145
machine-1.log:129:2024-04-05 15:08:28 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/github-runner-145
unit-gh-edge-0.log:11:2024-04-05 14:37:21 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-141 from API server
unit-gh-edge-0.log:159:2024-04-05 15:08:21 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-145 from API server
unit-gh-edge-0.log:165:2024-04-05 15:08:25 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-145 from API server
unit-gh-edge-0.log:171:2024-04-05 15:08:28 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/github-runner-145 from API server

In this case, we are triggering a download of Unit agent <- controller 3 times because the controller hasn't finished its download from charmhub.

After that:
root@juju-c9bfb2-1:/var/lib/juju/agents/unit-gh-edge-0/charm# ls scripts/
build-lxd-image.sh build-openstack-image.sh repo_policy_compliance_service.py

We are certainly at least at 143. catting build-openstack-image.sh:
# Uninstall unattended-upgrades, to avoid lock errors when unattended-upgrades is active in the runner
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl stop apt-daily.timer
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl disable apt-daily.timer
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl mask apt-daily.service
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl stop apt-daily-upgrade.timer
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl disable apt-daily-upgrade.timer
DEBIAN_FRONTEND=noninteractive /usr/bin/systemctl mask apt-daily-upgrade.service
DEBIAN_FRONTEND=noni...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

So far we haven't been able to reproduce this (neither myself nor Chris).
If we do get a way to reproduce, then we can take this on.

Changed in juju:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
John A Meinel (jameinel) wrote :

It took a bit, but here is a way to get the curl command for the charm content:

cd /var/lib/juju/agents/unit-*/
cat agent.conf

user=`awk '$1 == "tag:" {print($2)}' agent.conf`
password=`awk '$1 == "apipassword:" {print($2)}' agent.conf`
address=`grep -A1 apiaddresses agent.conf | awk '$1 == "-" {print($2)}'`
model=`awk '$1 == "model:" {print($2)}' agent.conf | sed 's/-/\//'`
charmurl="ch:amd64/jammy/github-runner-140"
filename="${charmurl##*/}.zip"

curl -o ${filename} -G --insecure --user "${user}:${password}" "https://${address}/${model}/charms?file=%2A" --data-urlencode "url=${charmurl}"

unzip -t ${filename}

Revision history for this message
John A Meinel (jameinel) wrote :

Maybe a bit cleaner this way:

# You can get this value from the log line about "downloading"
charmurl="ch:amd64/jammy/github-runner-140"
filename="/tmp/${charmurl##*/}.zip"

cd /var/lib/juju/agents/unit*

# all these values are read from agent.conf
user=`awk '$1 == "tag:" {print($2)}' agent.conf`
password=`awk '$1 == "apipassword:" {print($2)}' agent.conf`
address=`grep -A1 apiaddresses agent.conf | awk '$1 == "-" {print($2)}'`
model=`awk '$1 == "model:" {print($2)}' agent.conf | sed 's/-/\//'`

curl -G --insecure \
  -o ${filename} \
  --user "${user}:${password}" \
  --data-urlencode "url=${charmurl}" \
  --data-urlencode "file=*" \
  "https://${address}/${model}/charms"

# and test that it downloaded correctly
unzip -t ${filename}

Revision history for this message
Tom Haddon (mthaddon) wrote :

I've tried this and initially I got a 403 because curl doesn't respect NO_PROXY env variables, and this model has the following model config set (I mention it here just in case that's relevant):

```
$ juju model-config | grep proxy
apt-ftp-proxy default ""
apt-http-proxy default ""
apt-https-proxy default ""
apt-no-proxy default ""
ftp-proxy default ""
http-proxy default ""
https-proxy default ""
juju-ftp-proxy default ""
juju-http-proxy model http://squid.internal:3128
juju-https-proxy model http://squid.internal:3128
juju-no-proxy model 127.0.0.1,localhost,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.canonical.com,.launchpad.net,.internal,.jujucharms.com,.nip.io
no-proxy default 127.0.0.1,localhost,::1
proxy-ssh default false
snap-http-proxy default ""
snap-https-proxy default ""
snap-store-proxy default ""
snap-store-proxy-url default ""
```

Adding --noproxy '*' to the curl command I am able to download the file, and per the steps below can confirm this isn't version 141 of the charm (scripts/build-openstack-image.sh was only introduced in revision 143 https://github.com/canonical/github-runner-operator/compare/rev142...rev143):

```
root@juju-53f11a-prod-github-runner-626:/tmp# echo $charmurl
ch:amd64/jammy/github-runner-141
root@juju-53f11a-prod-github-runner-626:/tmp# curl --noproxy '*' -G --insecure -o ${filename} --user "${user}:${password}" --data-urlencode "file=*" --data-urlencode "url=${charmurl}
" "https://${address}/${model}/charms"
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 19.5M 100 19.5M 0 0 106M 0 --:--:-- --:--:-- --:--:-- 107M
root@juju-53f11a-prod-github-runner-626:/tmp# unzip -t github-runner-141.zip | grep 'testing: scripts'
    testing: scripts/build-openstack-image.sh OK
    testing: scripts/build-lxd-image.sh OK
    testing: scripts/repo_policy_compliance_service.py OK
```

Revision history for this message
Harry Pidcock (hpidcock) wrote :

What was the difference in the code that was on the unit vs what was in 141? Was it extra files (not in 141) or was it the contents of those files (with the same names in both revisions)?

From memory the deployer tries to be smart and modifies files/folders in-place, so possibly a file was removed in 141 and left behind from a previous revision?

Revision history for this message
Tom Haddon (mthaddon) wrote :

I thought it might be easiest to attach the contents of the charm directory on one of the units that says it's currently on revision 141 so you can see exactly what's in there. As far as I can tell it's exactly the same as revision 144.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.