juju storage mounting itself over itself

Bug #1830228 reported by Junien Fridrick
This bug affects 6 people
Affects             Status        Importance  Assigned to  Milestone
Canonical Juju      Fix Released  High        Ian Booth
Canonical Juju 3.1  Fix Released  High        Ian Booth
Canonical Juju 3.2  Fix Released  High        Ian Booth

Bug Description

Hi,

I have a model running juju 2.5.4, MAAS provider. We're using juju storage:

$ juju storage -m controller
Unit                 Storage id  Type        Pool    Size   Status    Message
controller-backup/0  location/0  filesystem  rootfs  20GiB  attached

But juju (or _something_) keeps mounting the storage over itself, i.e.:

/dev/sda1 on /srv/controller-backups type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /var/lib/juju/storage/rootfs/0/0 type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /srv/controller-backups type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /var/lib/juju/storage/rootfs/0/0 type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /srv/controller-backups type ext4 (rw,relatime,data=ordered)

full listing at https://pastebin.ubuntu.com/p/YwyQg4KRWp/

More info about the model at https://pastebin.canonical.com/p/cmrqvT9JPd/ (sorry, Canonical-only link).

There's nothing relevant in the juju logs: https://pastebin.canonical.com/p/f2wWb26XcN/ (note, all the logs are from 201*8*)

Thanks

Tags: canonical-is
Revision history for this message
Junien Fridrick (axino) wrote :

Note: this could be from an old version, as we upgraded to 2.5.4 fairly recently.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Junien Fridrick (axino),

This is fascinating! I wonder how it got into this state - I wish we had an audit log of the commands that were run...

Do you have any idea what a repro scenario would be?

Revision history for this message
Junien Fridrick (axino) wrote :

@anastasia-macmood,

Sadly, I don't know how to reproduce it. Is it normal that juju storage on MAAS just bind mounts the filesystem over itself? I don't see the point of doing that - it doesn't offer anything other than complexity, and it can potentially hide files.

Anyway, I figured I'd file this bug in case anyone had seen it before. We have a 2.6.2 controller that's not exhibiting this, although it was rebooted a few days ago and doesn't have anything mounted. So perhaps juju changed at some point and decided that, for now, mounting storage with the MAAS provider is not useful?

Revision history for this message
Joseph Phillips (manadart) wrote :
Revision history for this message
Benjamin Allot (ballot) wrote :

We have a jenkins unit randomly mounting its juju storage over itself once in a while.

I've currently disabled the jujud service to avoid it, but I'm happy to debug it if you want - just tell me what I can get you as a report.

I tried "juju-crashdump" but ...

$ juju-crashdump -o ballot/crashdump
Command "juju status --format=yaml" failed
Command "juju storage --format=yaml" failed
Traceback (most recent call last):
  File "/usr/local/bin/juju-crashdump", line 9, in <module>
    load_entry_point('jujucrashdump==0.0.0', 'console_scripts', 'juju-crashdump')()
  File "/usr/local/lib/python2.7/dist-packages/jujucrashdump/crashdump.py", line 407, in main
    filename = collector.collect()
  File "/usr/local/lib/python2.7/dist-packages/jujucrashdump/crashdump.py", line 286, in collect
    self.run_addons()
  File "/usr/local/lib/python2.7/dist-packages/jujucrashdump/crashdump.py", line 184, in run_addons
    juju_status = yaml.load(open('juju_status.yaml', 'r'))
IOError: [Errno 2] No such file or directory: 'juju_status.yaml'

Regards

Revision history for this message
Loïc Gomez (kotodama) wrote :

Hi,

I know there's not much data, but we're still having this issue with 2.9.18-ubuntu-amd64, on two distinct Jenkins units. For now we've disabled all juju agents on the unit going crazy.

Let us know if we can help in any way!

Thank you

Revision history for this message
Junien Fridrick (axino) wrote :

(To add to the comment above: we end up with a /proc/self/mountinfo of around 32k lines, which eventually makes the machine extremely slow, to the point where we get "BUG: soft lockup - CPU#2 stuck for 22s" messages on the console and the instance has to be rebooted.)

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.23
Revision history for this message
Harry Pidcock (hpidcock) wrote :

JUJU-345

Changed in juju:
milestone: 2.9.23 → 2.9.24
Changed in juju:
milestone: 2.9.24 → 2.9.25
Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 2.9.28 → 2.9-next
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 2.9-next → 2.9.40
Changed in juju:
milestone: 2.9.40 → 2.9.41
Changed in juju:
milestone: 2.9.41 → 2.9.42
Changed in juju:
milestone: 2.9.42 → 2.9.43
Changed in juju:
milestone: 2.9.43 → 2.9.44
Changed in juju:
milestone: 2.9.44 → 2.9.45
Revision history for this message
Steve Langasek (vorlon) wrote :

Just saw this issue on a juju-deployed instance. The 'ubuntu' charm seems to have, for some reason, a /srv/data that is a bind mount of the rootfs. And there were *16383* bind mounts of the same path under /var/lib/juju/storage.

Noticed because the number of mounts accumulates over time and drastically slows down ssh into the system.

Revision history for this message
Junien Fridrick (axino) wrote :

@vorlon Interesting - which Juju version did you see it with?

Thanks

Revision history for this message
Steve Langasek (vorlon) wrote :

Junien, this is juju 2.9.44-ubuntu-amd64 on the client with juju 2.60.2 on the server.

It's been suggested this may be a problem with the specific revision of the ubuntu charm that we happened to deploy and that the problem is hidden in the current version.

Revision history for this message
Niels Robin-Aubertin (nrobinaubertin) wrote :

It is happening for us too.

We have a postgresql machine unit becoming extremely slow and inspection of /proc/self/mountinfo (when logged in as `ssh ubuntu@IP`) shows a lot of lines like `1098 263120 252:1 /var/lib/juju/storage/rootfs/0/0 /var/snap/charmed-postgresql/common rw,relatime shared:1 - ext4 /dev/vda1 rw,discard,errors=remount-ro`.

I can't get an exact count because grep -c hangs, but if I limit it to the first 5k lines of the file, 4968 match '/var/lib/juju/storage'.

juju client version: 2.9.44-ubuntu-amd64
juju controller version: 2.9.44
ubuntu (postgres machine unit): 22.04.2 LTS
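A streaming count avoids the hang Niels describes when grep has to chew through a multi-gigabyte mountinfo. A minimal Python sketch (the path and pattern are the ones from the comments above; the function name and `limit` cap are mine, purely illustrative):

```python
def count_storage_mounts(mountinfo_path="/proc/self/mountinfo",
                         needle="/var/lib/juju/storage", limit=None):
    """Count mountinfo lines mentioning needle, reading lazily so a
    huge (32k+ line) file never has to fit in memory at once."""
    count = 0
    with open(mountinfo_path) as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break  # optional cap, mirroring the "first 5k lines" trick
            if needle in line:
                count += 1
    return count
```
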

Tom Haddon (mthaddon)
tags: added: canonical-is
Revision history for this message
Tom Haddon (mthaddon) wrote :

We've now seen the same thing in a different juju model using revision 189 from the 8.0/edge channel of the mysql charm.

Revision history for this message
John A Meinel (jameinel) wrote :

From the Juju side, we can see that we have code that calls AttachFilesystems, which had the contract that if source and target were stable, it should be a no-op.

As far as we can tell, the rootfs code has always been wrong with its check for 'is it already mounted'.

The assumption is that if you do:
 mount --bind $SOURCE $TARGET
that you would have
 df --output=source $TARGET
return (in the second line) $SOURCE

eg:
 $ mount --bind /var/lib/juju/data /srv/data
would then have:
 $ df --output=source /srv/data
 Filesystem
 /var/lib/juju/data

In practice, what you actually get is:
 Filesystem
 /dev/sda1

That is a bad assumption on our part, but has been a bad assumption for all of the life of juju.

The other issue that caused us to notice this is that there was a change between juju 2.9.30 and 2.9.44 (I have to dig to find the exact patch, but it was by Simon Richardson).

That patch caused us to be more persistent with mounts, because of some other issues (specifically the order that volumes/blocks show up was not guaranteed, so we needed to try again). That bug fix is causing us to wake up periodically (current period unknown) and essentially call 'mount --bind /source/ /target' over and over again.

To get a workaround, we just need to interrupt something.
Either

1) Put a `df` into the $PATH of `jujud` such that `df --output=source $TARGET` for a bind mount returns $SOURCE rather than SOURCE_DEVICE
2) Put a `mount` into the $PATH of `jujud` so that `mount --bind $SOURCE $TARGET` is a no-op when source is already bind mounted at $TARGET.

For (1) we still haven't found a command that would work for bind mounts, because reading /proc/mounts and 'df' always give the device, and we need to figure out the relative mount location.

https://unix.stackexchange.com/questions/18048/list-only-bind-mounts/346444#346444

seems to say that the information might be available from '/proc/self/mountinfo'.

From what I can tell, the issue is that you get 'mount the filesystem from this block device, taking into account this relative path' but you never have a way to unfold it unambiguously.

We did a test where '/home' was a mounted filesystem ('/dev/nvme0n1p3'); we then did 'mkdir a; mkdir b; mount --bind a b' and got:
198 29 259:3 / /home rw,relatime shared:117 - ext4 /dev/nvme0n1p3 rw
...
6314 198 259:3 /ian/a /home/ian/b rw,relatime shared:117 - ext4 /dev/nvme0n1p3 rw
6443 6314 259:3 /ian/a /home/ian/b rw,relatime shared:117 - ext4 /dev/nvme0n1p3 rw
6581 198 259:3 /ian/a /home/ian/a rw,relatime shared:117 - ext4 /dev/nvme0n1p3 rw

Which says that '/dev/nvme0n1p3' is mounted at '/home' from the '/' of its filesystem, and then mounted again at '/home/ian/b' with '/ian/a' as its relative filesystem.
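Mechanically, each mountinfo line splits at the '-' separator described in proc(5), with the root-within-filesystem in field 4 and the mount point in field 5. A small Python sketch of that parsing, and of spotting the duplicated entries shown above (function names are mine, not Juju's):

```python
def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into named fields (see proc(5)).
    Optional fields end at the '-' separator, so fstype/source follow it."""
    fields = line.split()
    sep = fields.index("-")
    return {
        "mount_id": fields[0],
        "parent_id": fields[1],
        "major_minor": fields[2],
        "root": fields[3],          # path within the filesystem, e.g. /ian/a
        "mount_point": fields[4],   # where it is mounted, e.g. /home/ian/b
        "fstype": fields[sep + 1],
        "source": fields[sep + 2],  # e.g. /dev/nvme0n1p3
    }

def duplicate_mounts(lines):
    """Return the (major:minor, root, mount point) keys that occur more
    than once - i.e. the same thing mounted over itself."""
    seen, dups = set(), set()
    for line in lines:
        if not line.strip():
            continue
        e = parse_mountinfo_line(line)
        key = (e["major_minor"], e["root"], e["mount_point"])
        if key in seen:
            dups.add(key)
        seen.add(key)
    return dups
```

Run over the four sample lines above, this flags only the repeated '/ian/a' → '/home/ian/b' entry; the self-mount at '/home/ian/a' appears once and is left alone.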

There are a few possibilities to give some answers, in increasing complexity but also increasing correctness.

1) 'mount --bind /home/ian/a /home/ian/b' looks at /proc/self/mountinfo and sees that there exists an exact entry for /home/ian/b already, and just refuses to do anything

2) 'mount --bind ...' looks at the same line, and sees that there is a relative directory mount (/ian/a) and a non-relative mount ('/'). It then tacks on '/ian/a' to '/h...


Revision history for this message
Tom Haddon (mthaddon) wrote :

Found some interesting logs in syslog: https://pastebin.canonical.com/p/vDHx8YNsQr/ (sorry, internal only). It looks to me like something happened here:
```
Sep 7 08:02:59 juju-9431ad-prod-events-db-7 snapd[247744]: storehelpers.go:773: cannot refresh:
Sep 7 08:02:59 juju-9431ad-prod-events-db-7 snapd[247744]: no snap revision available as specified: "charmed-postgresql"
```
and then shortly after that we get repeating messages like this:
```
Sep 7 08:31:56 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: start operation timed out. Terminating.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: State 'stop-sigterm' timed out. Killing.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Killing process 1888670 (snapd) with signal SIGKILL.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Main process exited, code=killed, status=9/KILL
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Failed with result 'timeout'.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: Failed to start Snap Daemon.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Consumed 3min 43.451s CPU time.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: Stopped Snap Daemon.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: snapd.service: Consumed 3min 43.451s CPU time.
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 systemd[1]: Starting Snap Daemon...
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 snapd[1889451]: overlord.go:272: Acquiring state lock file
Sep 7 08:33:27 juju-9431ad-prod-events-db-7 snapd[1889451]: overlord.go:277: Acquired state lock file
```
None since the instance was rebooted. I've given John access to the full syslog in case that's helpful.

Revision history for this message
Ian Booth (wallyworld) wrote :

I have a working prototype fix. I'll land something in the next day or so - there are a few things to tidy up first.

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Ian Booth (wallyworld)
Revision history for this message
Ian Booth (wallyworld) wrote :
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
John A Meinel (jameinel) wrote :

This is a 'mount' command that detects when it is being run in '--bind' mode, and otherwise just execs to the normal mount that can be found in $PATH.
It checks to see if the desired source and target appear to already exist, in which case it prints that the mount already exists and exits happily. In all other cases it just execs into mount and lets mount handle everything.

I did test on my machine, seeing that it did the right thing when asking to mount a relative path to another relative path, as well as absolute paths, as well as when there were nested mounts (a mounted to b, and a mounted to b/c).

I tested that I could install it as '/usr/local/bin/mount' and it would be executed and it would find the real mount.
I tested that it allowed mounting a directory to itself (which is allowed) but no further.

I'm not sure exactly where to put it in $PATH, but you should be able to look at the environment of the jujud process to find a location in PATH that jujud looks at but that wouldn't otherwise impact any other commands.
That said, it is programmed defensively to thunk to mount for just about everything, so is unlikely to break things if it was put in system paths.
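John's actual wrapper isn't attached here, but the behaviour he describes could be sketched roughly as follows. Everything in this sketch is an assumption (Python rather than whatever he used, `REAL_MOUNT`'s location, and a target-only check; his version also handles relative paths and matches the source):

```python
"""Rough sketch of the wrapper described above: intercept
'mount --bind SRC DST', skip the call when DST is already a mount
point, and exec the real mount for every other invocation."""
import os
import sys

REAL_MOUNT = "/bin/mount"  # assumed path of the real binary

def already_bound(dst, mountinfo="/proc/self/mountinfo"):
    """Return True if dst already appears as a mount point in mountinfo."""
    dst = os.path.realpath(dst)
    with open(mountinfo) as f:
        for line in f:
            fields = line.split()
            # fields[4] is the mount point; a fuller check would also
            # compare fields[3] (the root within the filesystem) to SRC.
            if len(fields) > 4 and fields[4] == dst:
                return True
    return False

def main(argv):
    if len(argv) >= 4 and argv[1] == "--bind" and already_bound(argv[3]):
        print("%s is already mounted; skipping" % argv[3])
        return 0
    # Thunk to the real mount for everything else.
    os.execv(REAL_MOUNT, [REAL_MOUNT] + argv[1:])

# Installed as e.g. /usr/local/sbin/mount, it would end with:
#   if __name__ == "__main__":
#       sys.exit(main(sys.argv))
```

Note the target-only check is looser than John's description: it would also skip a first self-mount of an existing mount point, so treat it as an illustration of the mountinfo lookup, not a drop-in replacement.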

Revision history for this message
Alex Lutay (taurus) wrote :

Thank you for the lightning fix in Juju 2.9.45!

Is it possible to include it in 3.1.6 as well?
The 3.1.6 is the primary Juju version for Data team: tests/demos/recommendations/...

Tnx!

Revision history for this message
Ian Booth (wallyworld) wrote :

Yes, it will be in 3.1.6 also :-)

Revision history for this message
Steve Langasek (vorlon) wrote :

Have deployed this workaround as /usr/local/sbin/mount; after 24 hours, there are no redundant mounts:

$ mount|grep srv
/dev/vda1 on /srv/data type ext4 (rw,relatime,discard,errors=remount-ro)
$ mount|grep juju
$

Changed in juju:
status: Fix Committed → Fix Released