juju-core

Deploying service to bootstrap node causes debug-log to spew messages

Bug #1211147 reported by Jonathan Davies on 2013-08-12

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
juju-core	Fix Released	High	Andrew Wilkins	juju-core 1.13.2

Bug Description

In the unlikely event of deploying a service to the bootstrap node (machine 0), debug-log starts spewing messages uncontrollably.

I believe this is because remote syslogging on the bootstrap node starts a logging loop on itself.

Related branches

lp:~axwalk/juju-core/lp1211147-machine-isstateserver

Merged into lp:~go-bot/juju-core/trunk at revision 1664

Juju Engineering: Pending requested 2013-08-15

lp:~axwalk/juju-core/lp1211147-terminate-rsyslog-rule-take2

Merged into lp:~go-bot/juju-core/trunk at revision 1682

Juju Engineering: Pending requested 2013-08-19

Jonathan Davies (jpds) on 2013-08-12

description:	updated

Revision history for this message

John A Meinel (jameinel) wrote on 2013-08-12:

There are some things that could easily end up on machine-0 until we get containers sorted out. So this is actually a reasonably big deal.

Changed in juju-core:
importance:	Undecided → High
status:	New → Triaged

Dave Cheney (dave-cheney) wrote on 2013-08-14:

I think this is broken even when we don't use hulk smash.

Dave Cheney (dave-cheney) wrote on 2013-08-14:

OK, here is the skinny.

On HP Cloud, unrelated to this issue, rsyslog is not writing log lines to all-machines.log properly: the log file has no \n delimiters.

On EC2 I've verified both the problem and the fix. Running

sudo rm /etc/rsyslog.d/26-juju* && sudo service rsyslog restart

fixes the issue.

I'll open a separate bug for the strange rsyslog format on HP Cloud.

Andrew Wilkins (axwalk) wrote on 2013-08-14:

I think it's a simple matter of modifying the provisioner to first check if the machine being provisioned on is a state server. If it is, just don't write out the rsyslog forwarding config for added units.
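
The check proposed above can be sketched roughly as follows. This is a minimal illustration in Python rather than juju-core's Go, and every name in it (`Machine`, `is_state_server`, `forwarding_conf_needed`) is hypothetical, not a juju-core API.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical stand-in for a provisioned machine; not a juju-core type."""
    id: str
    is_state_server: bool


def forwarding_conf_needed(machine: Machine) -> bool:
    """Return True if a unit deployed here should get an rsyslog forwarding config.

    Assumption: the state server already collects logs locally, so the
    forwarding config is skipped on that machine.
    """
    return not machine.is_state_server


bootstrap = Machine(id="0", is_state_server=True)
worker = Machine(id="1", is_state_server=False)

print(forwarding_conf_needed(bootstrap))  # False
print(forwarding_conf_needed(worker))     # True
```

The decision is kept in one small predicate so the provisioner would only need a single call at the point where it writes per-unit config.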

Andrew Wilkins (axwalk) on 2013-08-14

Changed in juju-core:
assignee:	nobody → Andrew Wilkins (axwalk)

Andrew Wilkins (axwalk) on 2013-08-14

Changed in juju-core:
status:	Triaged → In Progress

Andrew Wilkins (axwalk) wrote on 2013-08-19:

I think I may have misdiagnosed the issue. I looked into this again this morning and, thinking it through, I couldn't see *why* there should be a feedback loop in the current setup.

Deploying a unit to the bootstrap machine just creates a new rsyslog conf file that forwards to the bootstrap node's UDP listener, which writes it to all-machines.log. The all-machines.log listener does not ever forward logs, so there should be no chance of a feedback loop.

I *think* the issue is a missing "& ~" between the two :syslogtag rules in 25-juju.conf. I put this in, and the all-machines.log file settles down. "& ~" tells rsyslog that any messages that matched this rule should not be processed any further. What I don't understand is why its absence causes the log to keep growing; I would just expect duplicates of all the remote messages.
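
As a rough illustration of what "& ~" changes, here is a toy model of sequential rule processing in Python. It is not rsyslog and does not reproduce the contents of 25-juju.conf; the tag prefixes and sink names are made up for the example. The model only shows that, without a discard, a message matching both filters is handled twice, which is the duplication expected above. It does not explain the runaway growth.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    matches: Callable[[str], bool]  # filter on the message's syslog tag
    sink: str                       # hypothetical destination name
    discard_after: bool             # True models a trailing "& ~"


def route(tag: str, rules: List[Rule]) -> List[str]:
    """Return the sinks a message with this tag is written to, in rule order."""
    sinks = []
    for rule in rules:
        if rule.matches(tag):
            sinks.append(rule.sink)
            if rule.discard_after:
                break  # "& ~": stop processing this message
    return sinks


def make_rules(discard: bool) -> List[Rule]:
    # Two overlapping tag filters; the prefixes are invented for this sketch.
    return [
        Rule(lambda t: t.startswith("demo-"), "first", discard),
        Rule(lambda t: t.startswith("demo-unit"), "second", discard),
    ]


print(route("demo-unit-x", make_rules(discard=False)))  # ['first', 'second']
print(route("demo-unit-x", make_rules(discard=True)))   # ['first']
```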

Andrew Wilkins (axwalk) on 2013-08-20

Changed in juju-core:
milestone:	none → 1.13.2

Andrew Wilkins (axwalk) on 2013-08-20

Changed in juju-core:
status:	In Progress → Fix Committed

Dave Cheney (dave-cheney) on 2013-08-24

Changed in juju-core:
status:	Fix Committed → Fix Released

John A Meinel (jameinel) wrote on 2013-09-22:

This was still reproduced by Liam using 1.13.2, so it looks like our original fix was not sufficient.

Changed in juju-core:
milestone:	1.13.2 → none
status:	Fix Released → Triaged

John A Meinel (jameinel) wrote on 2013-09-22:

So I'm a bit confused by bug #1228239. A few open questions:
1) He uses 'juju bootstrap --upload-tools' with 1.13.2. I wouldn't think you would need --upload-tools for an officially released version.
2) This bug claims to have been fixed in exactly 1.13.2, which would mean the fix was incomplete, unless he is running with a working tree that actually predates the final 1.13.2 release.
3) I'm unable to reproduce the issue with 1.14.1, at least on Amazon. He was running on "Prod Stack", which could mean that he has a slightly different Ubuntu image.

I'm going to see if I can somehow revert my tools to 1.13.2 and reproduce. (Which may leave us with 'it was fixed in a later version'.)

John A Meinel (jameinel) wrote on 2013-09-22:

I downloaded the official 1.13.2 tarball from here:
https://launchpad.net/juju-core/trunk/1.13.2/+download/juju-core_1.13.2.tar.gz

I then followed the same steps as described in bug #1228239 and was unable to reproduce the failure. I see 2 ping requests every minute, but nothing that would generate 1 GB in 10 minutes.

My only guesses are that either Liam was using 1.13.2 "from source" and was using a pre-release version, or the image of Ubuntu they have available does something different with rsyslog.

John A Meinel (jameinel) wrote on 2013-09-22:

I *was* able to reproduce this issue with juju-core 1.13.1 downloaded from here:
https://launchpad.net/juju-core/trunk/1.13.1/+download/juju-core_1.13.1.tar.gz

So I'm going to mark this back as Fix Released in 1.13.2 and assume that Liam was just running a prerelease 1.13.2 version.

John A Meinel (jameinel) wrote on 2013-09-22:

#10

Since I was able to reproduce this with an official 1.13.1 build and not with 1.13.2, I'll mark this as Fix Released for 1.13.2.

Changed in juju-core:
milestone:	none → 1.13.2
status:	Triaged → Fix Released

Liam Young (gnuoy) wrote on 2013-10-08:

#11

I have upgraded an environment to 1.14.1 and then deployed a subordinate to the bootstrap node, and I got the crazy log spew again. I then did a fresh deployment of the environment on 1.14.1 and didn't hit the issue. Obviously this is concerning: judging by the tools revno, the bug appears to be fixed on an upgraded env, but it is still liable to explode.

John A Meinel (jameinel) wrote on 2013-10-09: Re: [Bug 1211147] Re: Deploying service to bootstrap node causes debug-log to spew messages

#12

On 2013-10-08 16:27, Liam Young wrote:
> I have upgraded an environment to 1.14.1 and then deployed a
> subordinate to the bootstrap node and I got the crazy log spew
> again. I then did a fresh deployment of the environment on 1.14.1
> and I didn't hit the issue. Obviously this is concerning because the
> bug appears to be fixed, when checking tools revno on an upgraded
> env, but is still liable to explode.
>

So the fix is in how we configure /etc/rsyslog when creating a new
instance. So it *won't* be fixed by upgrading to 1.14 (because the
rules were already written).

I'm fine documenting that the bug is fixed, but I don't know if it is
worth expending a lot of effort to fix it.

Given it only happens when deploying to the bootstrap node, you would
need to decide after the fact that you want to upgrade and then deploy
to machine 0.

Most times, if you want something on machine 0 you've decided that at
bootstrap time (because you don't want to create a new node).

I realize things like juju-gui are often deployed to machine 0 at a
later time. But all those circumstances work if you use newer tools
to start with.

John
=:->

Liam Young (gnuoy) wrote on 2013-10-10:

#13

But doesn't the problem of upgrade tools not applying the fix still exist? If another issue with rsyslog is found, then upgrade tools will still fail to apply the fix. Shouldn't upgrade tools apply the syslog fix and HUP the process as part of the upgrade?
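
A minimal sketch of what such an upgrade step might look like, in Python and under stated assumptions: `refresh_conf` and its `reload` hook are hypothetical names, not part of juju, and the real upgrade path may work quite differently. The idea is only to rewrite the config when it differs from the desired content and to trigger a reload (for example, a HUP) when something actually changed.

```python
import os
import tempfile
from typing import Callable


def refresh_conf(path: str, desired: str, reload: Callable[[], None]) -> bool:
    """Rewrite the config at `path` if it differs from `desired`, then reload.

    Returns True if the file was changed. `reload` is a caller-supplied hook
    (hypothetical; e.g. something that signals the logging daemon) and is
    called only when the file was rewritten.
    """
    try:
        with open(path) as f:
            current = f.read()
    except FileNotFoundError:
        current = None
    if current == desired:
        return False
    with open(path, "w") as f:
        f.write(desired)
    reload()
    return True


# Demonstration against a temporary directory; nothing under /etc is touched.
with tempfile.TemporaryDirectory() as d:
    conf = os.path.join(d, "example.conf")
    reloads = []
    print(refresh_conf(conf, "new rules\n", lambda: reloads.append("hup")))  # True
    print(refresh_conf(conf, "new rules\n", lambda: reloads.append("hup")))  # False
    print(reloads)  # ['hup']
```

Making the step idempotent means it could run on every upgrade without restarting the daemon needlessly.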

Duplicates of this bug

Bug #1228239
