Comment 4 for bug 1720155

John A Meinel (jameinel) wrote : Re: [Bug 1720155] Re: Juju agent is not communicating with the server.

Also, looking at http://ci.jujucharms.com/job/charm-storage/ I see only a
single success, at a906cf9; all earlier revisions are marked as failures, so
I'm not sure whether that is relevant or not.

There does seem to be something fishy going on with Mongo on Trusty. I see
the Ping Batcher operating:
2017-10-02 09:11:30 TRACE juju.state.presence pingbatcher.go:327 0xc42008b950 [cef56d] recorded 1 pings for 1 ping slot(s) and 1 fields in 0.010s

However, in the actual database there is no 'presence.pings' collection:
juju:PRIMARY> show collections
presence.beings
presence.seqs
system.indexes
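
For anyone poking at this from Python rather than the mongo shell, something
roughly like the following should show the same thing (a sketch only: the
port, database name, and lack of auth here are guesses about a local
controller, not something I've verified):

from pymongo import MongoClient

# Port 37017 is where juju's mongod normally listens; a real controller
# would also need auth/TLS options that are omitted here.
client = MongoClient('localhost', 37017)
db = client['presence']
# collection_names() on older pymongo, list_collection_names() on newer.
print(sorted(db.collection_names()))
# Expectation: 'presence.pings' alongside 'presence.beings' and
# 'presence.seqs', but on this Trusty controller it is missing.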

While in the shell I did some testing, and it seems that in Mongo 2.4
update(..., {upsert: true}) will *not* create a document that doesn't already
exist if one of the operations is a $bit operation. Specifically, doing:

juju:PRIMARY> db.presence.test.find()
juju:PRIMARY> db.presence.test.update({_id: "blah:1245"}, {$set: {"slot": 1245}, $bit: {"alive.0": {"or": 1}}}, {"upsert": true})
juju:PRIMARY> db.presence.test.find()
# nothing returned, nothing created
juju:PRIMARY> db.presence.test.update({_id: "blah:1245"}, {$set: {"slot": 1245}, $inc: {"alive.0": NumberInt(1)}}, {"upsert": true})
juju:PRIMARY> db.presence.test.find()
{ "_id" : "blah:1245", "alive" : { "0" : 1 }, "slot" : 1245 }
# $inc *did* create the document
juju:PRIMARY> db.presence.test.update({_id: "blah:1245"}, {$set: {"slot": 1245}, $bit: {"alive.0": {"or": 2}}}, {"upsert": true})
juju:PRIMARY> db.presence.test.find()
{ "_id" : "blah:1245", "alive" : { "0" : 3 }, "slot" : 1245 }
# And now the $bit operation does apply, because the document already exists.
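
If anyone wants to poke at this from Python, something like the following
should reproduce it against a scratch collection (a rough sketch only, not
the actual presence code; it assumes a pymongo version that can still talk
to a 2.4 server, and the host/port/collection names are made up):

from pymongo import MongoClient

coll = MongoClient('localhost', 27017).scratch.presence_test
coll.delete_many({'_id': 'blah:1245'})  # start from a clean slate

# Upsert using only $set + $bit: on this 2.4 server it did not create the
# missing document.
coll.update_one({'_id': 'blah:1245'},
                {'$set': {'slot': 1245}, '$bit': {'alive.0': {'or': 1}}},
                upsert=True)
print(coll.find_one({'_id': 'blah:1245'}))  # None on 2.4

# Upsert using $inc instead: the document does get created.
coll.update_one({'_id': 'blah:1245'},
                {'$set': {'slot': 1245}, '$inc': {'alive.0': 1}},
                upsert=True)
print(coll.find_one({'_id': 'blah:1245'}))  # alive.0 == 1

# Now that the document exists, the $bit "or" applies as expected.
coll.update_one({'_id': 'blah:1245'},
                {'$set': {'slot': 1245}, '$bit': {'alive.0': {'or': 2}}},
                upsert=True)
print(coll.find_one({'_id': 'blah:1245'}))  # alive.0 == 3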

I'll play with it a bit more, but I would have thought we had a Trusty-based
run of the full test suite, and that should also have failed, rather than
just a CI test failing.

On Mon, Oct 2, 2017 at 1:07 PM, John Meinel <email address hidden> wrote:

> I don't see any successes for multi-series-charm, so I'm not sure this
> patch is actually the cause there.
>
> The failure around 'agent-lost' or 'agent-is-not-communicating-with-the-server'
> could indeed be caused by the change from this patch, since it deals
> specifically with how we detect whether agents are alive and communicating.
>
> I haven't managed to actually run the test yet, but with just "bootstrap
> && deploy" I don't see the problem. I've been trying with Xenial and LXD;
> I may try Trusty to see if it makes a difference. (It's possible that Mongo
> 2.4 on Trusty doesn't support the $bit operation the way we think it does,
> even though the test suite was passing.)
>
> On Mon, Oct 2, 2017 at 12:07 PM, John Meinel <email address hidden>
> wrote:
>
>> Given the need to run against $ENV, I'm guessing this expects to have a
>> controller already bootstrapped? There shouldn't be anything in the patch
>> that changes agents (vs possibly changing how controllers operate).
>>
>> I'm missing something in how to run the tests manually. If I try:
>>
>> ./assess_storage lxd ~/dev/go/bin/juju ~/dev/tmp/juju_test
>>
>> Then it warns me that:
>> /home/jameinel/dev/go/src/github.com/juju/juju/acceptancetests/utility.py:88: UserWarning: Not a directory /home/jameinel/dev/tmp/jujutest
>>   warnings.warn('Not a directory {}'.format(maybe_dir))
>> and fails with:
>>   File "/home/jameinel/dev/go/src/github.com/juju/juju/acceptancetests/jujupy/configuration.py", line 37, in get_selected_environment
>>     'Environment "{}" does not exist.'.format(selected))
>> jujupy.configuration.NoSuchEnvironment: Environment "lxd" does not exist.
>>
>> However, if I first do "juju bootstrap lxd" I then get:
>> /home/jameinel/dev/go/src/github.com/juju/juju/acceptancetests/utility.py:94: UserWarning: Directory '/home/jameinel/dev/tmp/jujutest' has existing contents.
>>   'Directory {!r} has existing contents.'.format(maybe_dir))
>>
>> and a final failure of:
>> jujupy.configuration.NoSuchEnvironment: Environment "lxd" does not exist.
>>
>> Now, trying to look at the code, I see it trying to do:
>>
>> def get_juju_home():
>>     home = os.environ.get('JUJU_HOME')
>>     if home is None:
>>         home = os.path.join(os.environ.get('HOME'), '.juju')
>>     return home
>> ...
>> def get_environments_path(juju_home):
>>     return os.path.join(juju_home, 'environments.yaml')
>> ...
>> def get_environments():
>>     """Return the environments for juju."""
>>     home = get_juju_home()
>>     with open(get_environments_path(home)) as env:
>>         return yaml.safe_load(env)['environments']
>>
>> So it seems that this code assumes you have a leftover Juju 1.x setup
>> lying around with a matching environments.yaml file?
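>>
>> Presumably something like this would get past it, by giving jujupy a
>> Juju-1.x style environments.yaml to find (just a sketch; the 'type' key
>> inside the entry is a guess on my part, I haven't checked what else the
>> harness actually wants):
>>
>> import os
>> import yaml
>>
>> juju_home = os.environ.get('JUJU_HOME') or os.path.join(
>>     os.environ['HOME'], '.juju')
>> if not os.path.isdir(juju_home):
>>     os.makedirs(juju_home)
>> with open(os.path.join(juju_home, 'environments.yaml'), 'w') as f:
>>     yaml.safe_dump({'environments': {'lxd': {'type': 'lxd'}}}, f)
>>
>> # With that in place, get_environments()['lxd'] resolves, so
>> # get_selected_environment('lxd') should stop raising NoSuchEnvironment.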
>>
>>
>> On Mon, Oct 2, 2017 at 11:37 AM, John Meinel <email address hidden>
>> wrote:
>>
>>> Note that you have to be logged in to see the 'charm-storage' job.
>>> Otherwise those links come back as 404.
>>>
>>> I'm investigating this now.
>>>
>>> John
>>>
>>> On Thu, Sep 28, 2017 at 7:50 PM, Burton Swan <email address hidden>
>>> wrote:
>>>
>>>> ** Description changed:
>>>>
>>>> - Juju version: 2.3-alpha1
>>>> + Juju version: 2.3-alpha1 and 2.2 series.
>>>>
>>>> PR7868 is highly likely the cause of this issue:
>>>> https://github.com/juju/juju/pull/7868
>>>>
>>>> Using charm-storage test as an example:
>>>> http://ci.jujucharms.com/job/charm-storage/
>>>>
>>>> Re-run the last successful job, revision build 5743, commit a906cf9,
>>>> works well:
>>>> http://ci.jujucharms.com/job/charm-storage/2027/
>>>>
>>>> Then re-run the first failed job, revision build 5745, commit
>>>> 774f2f0, failed again:
>>>> http://ci.jujucharms.com/job/charm-storage/2028/
>>>> 2017-09-28 14:11:49 ERROR ('0', 'agent is not communicating with the
>>>> server', 1149.777357)
>>>>
>>>> commit 774f2f0 also triggered other failures, e.g.:
>>>> http://ci.jujucharms.com/job/multi-series-charm/
>>>>
>>>> This issue can also be manually reproduced on the latest Juju
>>>> 2.3-alpha1 by run:
>>>> $ ./assess_storage.py $ENV $JUJU_BIN $JUJU_DATA
>>>>
>>>> ** Description changed:
>>>>
>>>> Juju version: 2.3-alpha1 and 2.2 series.
>>>>
>>>> PR7868 is highly likely the cause of this issue:
>>>> https://github.com/juju/juju/pull/7868
>>>> +
>>>> + On Juju 2.2 series it is PR7863:
>>>> + https://github.com/juju/juju/pull/7863
>>>>
>>>> Using charm-storage test as an example:
>>>> http://ci.jujucharms.com/job/charm-storage/
>>>>
>>>> Re-run the last successful job, revision build 5743, commit a906cf9,
>>>> works well:
>>>> http://ci.jujucharms.com/job/charm-storage/2027/
>>>>
>>>> Then re-run the first failed job, revision build 5745, commit
>>>> 774f2f0, failed again:
>>>> http://ci.jujucharms.com/job/charm-storage/2028/
>>>> 2017-09-28 14:11:49 ERROR ('0', 'agent is not communicating with the
>>>> server', 1149.777357)
>>>>
>>>> commit 774f2f0 also triggered other failures, e.g.:
>>>> http://ci.jujucharms.com/job/multi-series-charm/
>>>>
>>>> This issue can also be manually reproduced on the latest Juju
>>>> 2.3-alpha1 by run:
>>>> $ ./assess_storage.py $ENV $JUJU_BIN $JUJU_DATA
>>>>