Comment 6 for bug 1668123

Brad Marshall (brad-marshall) wrote :

> Sidenote) the 18GB of /var/lib/juju/db (with backups, of backups, of backups)
> was not helpful, I'll need to talk to sosreport people about that. This is
> what made the report so huge.

I did notice that, but I figured getting you all of the data was better than
fiddling around trying to not include that part, and maybe missing bits.
It'd be nice to have better control over it so we don't have to throw the juju
state db around if we don't need to.

> 1) It appears that deputy systemd was installed on the machine and
> subsequently upgraded:
> 2017-02-12 01:30:24 upgrade systemd:amd64 204-5ubuntu20.22 204-5ubuntu20.24

> However, there are no logs available as to what/who/why 20.22 deputy systemd
> was installed.

Interesting. I don't really know why it would have been installed.
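
To trace when the package first appeared, the dpkg history could be searched for its install and upgrade events. This is a minimal sketch, assuming the usual dpkg.log action-line layout (date, time, action, package:arch, old version, new version); the helper name package_history is hypothetical and not part of any tool.

```python
import re

# Matches dpkg.log action lines such as:
#   2017-02-12 01:30:24 upgrade systemd:amd64 204-5ubuntu20.22 204-5ubuntu20.24
# "status ..." and "configure ..." lines are deliberately not matched.
DPKG_ACTION = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<action>install|upgrade|remove|purge) "
    r"(?P<package>[^\s:]+)(?::(?P<arch>\S+))? (?P<old>\S+) (?P<new>\S+)$"
)


def package_history(lines, package):
    """Return (timestamp, action, old_version, new_version) tuples for one package."""
    events = []
    for line in lines:
        match = DPKG_ACTION.match(line.strip())
        # Compare the exact package name so systemd-services is not
        # reported when asking about systemd.
        if match and match.group("package") == package:
            events.append((
                f"{match.group('date')} {match.group('time')}",
                match.group("action"),
                match.group("old"),
                match.group("new"),
            ))
    return events
```

Feeding it the lines of /var/log/dpkg.log (and any rotated copies that still exist) would list every recorded install or upgrade of systemd. If the original install predates the oldest retained log, it will not show up.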

> 2) Have you tried to use snapd on trusty on that host? Has anything else
> tried to do that? (e.g. juju manual provider or some such?!)

No, I don't believe anyone has; I don't see any evidence of that.

> 3) To recover the system, you should $ apt remove systemd; and reboot.
> However that is the workaround

Ok, I'll organise removing the package and rebooting it.

> 4) Is this nested lxc? or errors inside the instances?
> E.g. from logs I see failures to start lxc instances, but I don't see logs
> for failing to start instances for some reason.

This is LXCs on a KVM. The errors are in /var/log/lxc; it's odd
that the sosreport didn't include it.

> 5) Why was lxc downgraded/upgraded/downgraded multiple times?

We were trying to work out if LP#1656280 was related somehow; the
errors were occurring before we did that.

> 6) Are the error messages from this machine? Whilst I do see that systemd is
> installed, and dsystemd cgroup is mounted, I am failing to find the logs for
> any lxc failures related to starting them.

The errors are in /var/log/lxc - see the next reply.

> Is there /var/log/lxc or some such that you could share privately? for
> some reason it was not part of the sosreport.

I've uploaded it to https://private-fileshare.canonical.com/~bradm/lp1668123/lp1668123-var-log-lxc.tar.gz

> cgmanager should not be interacting with dsystemd.
> systemd should not be present on this system (as hwe kernel is not in use, nor is snapd).
> lxc should work irrespective of dsystemd.

It's odd that stopping and starting cgmanager would let LXC work then.
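
One way to compare the state before and after a cgmanager restart would be to check whether a name=systemd cgroup hierarchy is mounted at each point. This is a minimal sketch, assuming the standard /proc/mounts field order (device, mount point, filesystem type, options); the function names are hypothetical.

```python
def cgroup_mounts(mounts_text):
    """Map each cgroup mount point to its list of mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            result[fields[1]] = fields[3].split(",")
    return result


def has_systemd_hierarchy(mounts_text):
    """True if any cgroup mount carries the name=systemd option."""
    return any(
        "name=systemd" in options
        for options in cgroup_mounts(mounts_text).values()
    )
```

Passing it the contents of /proc/mounts once before and once after restarting cgmanager would show whether the set of mounted hierarchies changes. It does not by itself explain why the restart helps.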

> I will set up trusty, with GA kernel, lxc1, deploy any charm (e.g. ubuntu),
> and install deputy systemd to try to reproduce this test case.

> I wonder if upstart systemd job should be neutered, unless snapd is present,
> and we are booted with hwe kernel.

It does sound like a good idea if we're going to have failures like the ones
we saw.