Deploying service to bootstrap node causes debug-log to spew messages

Bug #1211147 reported by Jonathan Davies
This bug affects 2 people
Affects    Status        Importance  Assigned to     Milestone
juju-core  Fix Released  High        Andrew Wilkins

Bug Description

In the unlikely event of deploying a service to the bootstrap node machine, debug-log starts uncontrollably spewing messages.

I believe this is because the remote syslogging starts a logging loop on itself.

Jonathan Davies (jpds)
description: updated
John A Meinel (jameinel) wrote :

There are some things that could easily end up on machine-0 until we get containers sorted out. So this is actually a reasonably big deal.

Changed in juju-core:
importance: Undecided → High
status: New → Triaged
Dave Cheney (dave-cheney) wrote :

I think this is broken even when we don't use hulk smash.

Dave Cheney (dave-cheney) wrote :

Ok, here is the skinny

On HP Cloud, unrelated to this issue, rsyslog is not writing log lines to all-machines.log properly: the log file has no \n delimiters.

On EC2 I've verified the problem and the fix:

sudo rm /etc/rsyslog.d/26-juju* && sudo service rsyslog restart

fixes the issue.

I'll open a separate bug for the strange rsyslog format on HP Cloud.

Andrew Wilkins (axwalk) wrote :

I think it's a simple matter of modifying the provisioner to first check if the machine being provisioned on is a state server. If it is, just don't write out the rsyslog forwarding config for added units.
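
A minimal sketch of the check proposed above, written in Python for brevity (juju-core itself is Go). The names `Machine`, `JOB_MANAGE_STATE`, and `forwarding_config_for` are hypothetical and are not juju-core's API, and the returned config text is a placeholder rather than the real template.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical job label marking a machine as a state server.
JOB_MANAGE_STATE = "manage-state"


@dataclass
class Machine:
    """Illustrative stand-in for a machine known to the provisioner."""
    machine_id: str
    jobs: set = field(default_factory=set)

    def is_state_server(self) -> bool:
        return JOB_MANAGE_STATE in self.jobs


def forwarding_config_for(machine: Machine, unit_name: str,
                          state_server_addr: str) -> Optional[str]:
    """Return rsyslog forwarding config for a unit, or None on a state server."""
    if machine.is_state_server():
        # Proposal from the comment above: write no forwarding config here.
        return None
    # Placeholder only: this is not the real juju-core template.
    return f"# forward logs for {unit_name} to {state_server_addr}\n"
```

With this shape, a unit placed on machine 0 gets `None` back and the caller simply skips writing the file, while units on other machines are unaffected.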

Andrew Wilkins (axwalk)
Changed in juju-core:
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
Andrew Wilkins (axwalk) wrote :

I think I may have misdiagnosed the issue. Looking into this again this morning, I couldn't see *why* there should be a feedback loop in the current setup.

Deploying a unit to the bootstrap machine just creates a new rsyslog conf file that forwards to the bootstrap node's UDP listener, which writes it to all-machines.log. The all-machines.log listener does not ever forward logs, so there should be no chance of a feedback loop.

I *think* the issue is a missing "& ~" between the two :syslogtag rules in 25-juju.conf. I put this in, and the all-machines.log file settles down. "& ~" tells rsyslog that any message matching the rule should not be processed any further. What I don't understand is why its absence causes the log to keep growing; I would just expect duplicates of all the remote messages.
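
A toy model of the duplicate-write effect, assuming rsyslog tests each message against every rule in order unless a discard action stops processing. The two tag filters are hypothetical and deliberately overlap; they are not the contents of 25-juju.conf. The model covers only the expected duplicates, not the runaway growth.

```python
from typing import Callable, List, Tuple

# A rule is (predicate on the syslog tag, stop_after_match).
# stop_after_match=True models a trailing "& ~" (discard after the action).
Rule = Tuple[Callable[[str], bool], bool]


def count_writes(tag: str, rules: List[Rule]) -> int:
    """Count how many times one message would be written to the log file."""
    writes = 0
    for matches, stop_after_match in rules:
        if matches(tag):
            writes += 1  # the rule's action: append to the log file
            if stop_after_match:
                break  # "& ~": no later rule sees this message
    return writes


# Two hypothetical, deliberately overlapping tag filters.
def is_juju(tag: str) -> bool:
    return "juju-" in tag


def is_local_juju(tag: str) -> bool:
    return tag.startswith("local-juju-")


without_discard = [(is_local_juju, False), (is_juju, False)]
with_discard = [(is_local_juju, True), (is_juju, True)]

print(count_writes("local-juju-machine-0", without_discard))  # 2
print(count_writes("local-juju-machine-0", with_discard))     # 1
```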

Andrew Wilkins (axwalk)
Changed in juju-core:
milestone: none → 1.13.2
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Changed in juju-core:
status: Fix Committed → Fix Released
John A Meinel (jameinel) wrote :

This was still reproduced by Liam using 1.13.2, so it looks like our original fix was not sufficient.

Changed in juju-core:
milestone: 1.13.2 → none
status: Fix Released → Triaged
John A Meinel (jameinel) wrote :

So I'm a bit confused by bug #1228239. A few open questions:
1) He uses 'juju bootstrap --upload-tools' for 1.13.2. I wouldn't think you would need --upload-tools for an officially released version.
2) This bug claims to have been fixed in exactly 1.13.2, which would mean the fix was incomplete. Unless he is running with a working tree that is actually pre 1.13.2 final release.
3) I'm unable to reproduce the issue with 1.14.1, at least on Amazon. He was running on "Prod Stack" which could mean that he has a slightly different Ubuntu image.

I'm going to see if I can somehow revert my tools to 1.13.2 and reproduce. (Which may leave us with 'it was fixed in a later version'.)

John A Meinel (jameinel) wrote :

I downloaded the official 1.13.2 tarball from here:
  https://launchpad.net/juju-core/trunk/1.13.2/+download/juju-core_1.13.2.tar.gz

I followed the same steps as described in bug #1228239 and was unable to reproduce the failure. I see 2 ping requests every minute, but nothing that would generate 1GB in 10 minutes.

My only guesses are that either Liam was using 1.13.2 "from source" and was using a pre-release version, or the image of Ubuntu they have available does something different with rsyslog.
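
For scale, the 1GB in 10 minutes figure works out to well over a megabyte per second. A quick sketch of the arithmetic follows; the 200-byte size for a ping log line is an assumption made purely for comparison and is not a measured value.

```python
def growth_rate(bytes_written: int, seconds: float) -> float:
    """Average log growth in bytes per second."""
    return bytes_written / seconds


# Reported spew: 1 GB (decimal) over 10 minutes.
reported = growth_rate(10**9, 10 * 60)

# Assumption for illustration only: each ping produces one 200-byte log line.
ASSUMED_LINE_BYTES = 200
ping_rate = growth_rate(2 * ASSUMED_LINE_BYTES, 60)  # 2 pings per minute

print(f"reported spew: {reported / 1e6:.2f} MB/s")
print(f"two pings per minute: {ping_rate:.2f} B/s")
```

Under that assumption the two rates differ by several orders of magnitude, so normal ping traffic cannot be mistaken for the spew.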

John A Meinel (jameinel) wrote :

I *was* able to reproduce this issue with juju-core 1.13.1 downloaded from here:
  https://launchpad.net/juju-core/trunk/1.13.1/+download/juju-core_1.13.1.tar.gz

So I'm going to mark this back as Fix Released in 1.13.2 and assume that Liam was just running a prerelease 1.13.2 version.

John A Meinel (jameinel) wrote :

Since I was able to reproduce this with an official 1.13.1 build and not with 1.13.2, I'll mark this as Fix Released for 1.13.2.

Changed in juju-core:
milestone: none → 1.13.2
status: Triaged → Fix Released
Liam Young (gnuoy) wrote :

I upgraded an environment to 1.14.1, then deployed a subordinate to the bootstrap node, and I got the crazy log spew again. I then did a fresh deploy of the environment on 1.14.1 and didn't hit the issue. This is concerning because, judging by the tools revno on an upgraded env, the bug appears to be fixed, yet it is still liable to explode.

John A Meinel (jameinel) wrote : Re: [Bug 1211147] Re: Deploying service to bootstrap node causes debug-log to spew messages

On 2013-10-08 16:27, Liam Young wrote:
> I have upgraded an environment to 1.14.1 and then deployed a
> subordinate to the bootstrap node and I got the crazy log spew
> again. I then did a fresh deploy of the environment on 1.14.1 and
> I didn't hit the issue. Obviously this is concerning because the
> bug appears to be fixed, when checking tools revno on an upgraded
> env, but is still liable to explode.
>

So the fix is in how we configure /etc/rsyslog when creating a new
instance. So it *won't* be fixed with upgrading to 1.14 (because the
rules were already written).

I'm fine documenting that the bug is fixed, but I don't know if it is
worth expending a lot of effort to fix it.

Given it only happens when deploying to the bootstrap node, you would
need to decide after the fact that you want to upgrade and then deploy
to machine 0.

Most times, if you want something on machine 0 you've decided that at
bootstrap time (because you don't want to create a new node).

I realize stuff like juju-gui are often deployed to machine 0 and done
at a later time. But all those circumstances work if you use newer
tools to start with.

John
=:->

Liam Young (gnuoy) wrote :

But doesn't the problem of upgrade tools not applying the fix still exist? If another issue with rsyslog is found, then upgrade tools will still fail to apply the fix. Shouldn't upgrade tools apply the syslog fix and HUP the process as part of the upgrade?
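
A minimal sketch of what such an upgrade step might look like, in Python for brevity. The function name and the injected `reload` callback are hypothetical; nothing here reflects how juju-core's upgrade actually works.

```python
from pathlib import Path
from typing import Callable


def ensure_rsyslog_config(path: Path, desired: str,
                          reload: Callable[[], None]) -> bool:
    """Rewrite the config if it differs from `desired`, then trigger a reload.

    Returns True when the file was changed. `reload` is injected so the
    caller decides how to signal rsyslog (for example a service restart).
    """
    current = path.read_text() if path.exists() else None
    if current == desired:
        return False  # already up to date; leave rsyslog alone
    path.write_text(desired)
    reload()
    return True
```

Because the step compares before writing, running it on every upgrade would be harmless on machines that already have the corrected rules.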
