ElasticSearch fails at peer-relation-joined

Bug #1646904 reported by Simon Turner
This bug affects 1 person
Affects: elasticsearch (Juju Charms Collection)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Deploying the ElasticSearch charm (tried both 18 and 19) into AWS fails at peer-relation-joined (see below; this is for instance 2, but all 3 fail the same way). I'm new to Juju so it may be a rookie error, but since I can't see *why* Ansible is returning a non-zero exit code, I'm stumped.

unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined PLAY RECAP ********************************************************************
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined localhost : ok=2 changed=1 unreachable=0 failed=1
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined Traceback (most recent call last):
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined File "/var/lib/juju/agents/unit-elasticsearch-2/charm/hooks/peer-relation-joined", line 101, in <module>
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined hooks.execute(sys.argv)
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined File "/var/lib/juju/agents/unit-elasticsearch-2/charm/hooks/charmhelpers/contrib/ansible/__init__.py", line 171, in execute
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined self.playbook_path, tags=[hook_name])
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined File "/var/lib/juju/agents/unit-elasticsearch-2/charm/hooks/charmhelpers/contrib/ansible/__init__.py", line 116, in apply_playbook
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined subprocess.check_call(call, env=env)
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined raise CalledProcessError(retcode, cmd)
unit-elasticsearch-2: 16:38:50 INFO unit.elasticsearch/2.peer-relation-joined subprocess.CalledProcessError: Command '['ansible-playbook', '-c', 'local', 'playbook.yaml', '--tags', 'peer-relation-joined']' returned non-zero exit status 2
unit-elasticsearch-2: 16:38:50 ERROR juju.worker.uniter.operation hook "peer-relation-joined" failed: exit status 1
unit-elasticsearch-2: 16:38:56 INFO unit.elasticsearch/2.peer-relation-joined
unit-elasticsearch-2: 16:38:56 INFO unit.elasticsearch/2.peer-relation-joined PLAY [localhost] **************************************************************
unit-elasticsearch-2: 16:38:56 INFO unit.elasticsearch/2.peer-relation-joined
unit-elasticsearch-2: 16:38:56 INFO unit.elasticsearch/2.peer-relation-joined GATHERING FACTS ***************************************************************
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined ok: [localhost]
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined TASK: [Update config with peer hosts] *****************************************
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined ok: [localhost]
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined
unit-elasticsearch-2: 16:38:57 INFO unit.elasticsearch/2.peer-relation-joined TASK: [Wait until the local service is available] *****************************
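
The traceback above only reports that ansible-playbook exited non-zero, not which task failed or why. As a rough sketch (not part of the charm), the same command could be re-run by hand with extra verbosity and its full output captured. The command list is copied from the traceback; the helper names, the -vvv verbosity level, and the idea of running it from the charm directory outside a hook context are assumptions, and the playbook may depend on hook environment variables that are missing there.

```python
import subprocess


def playbook_command(tag, verbosity=3):
    """Build the ansible-playbook call seen in the traceback, plus -v flags.

    `tag` is the hook name, e.g. "peer-relation-joined".
    """
    cmd = ["ansible-playbook", "-c", "local", "playbook.yaml", "--tags", tag]
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)
    return cmd


def run_and_capture(cmd, cwd=None):
    """Run cmd and return (exit_code, combined stdout and stderr as text).

    Unlike subprocess.check_call, this does not raise on a non-zero exit,
    so the output of the failing task can be inspected.
    """
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return proc.returncode, output
```

On the unit, one would call run_and_capture(playbook_command("peer-relation-joined"), cwd=...) with cwd set to the charm directory shown in the traceback paths, then read the returned output for the failing task.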

Michael Nelson (michael.nelson) wrote :

Hi Simon. What juju version are you running? (I'm surprised, as there's a test, tests/02-deploy-three-units, which checks that this works, so I'm not sure whether it's related to juju versions or to something else that has changed.)

Looking at the tasks/peer-relations.yml, this will happen if a unit is unable to join the cluster. Given that all 3 failed, one possibility is that all 3 were waiting for the others to open port 9300 so they could join. If you get a chance, can you try both:

1) Deploying just 1 unit, then adding a second and third. If this works, then there may be a timing issue that we need to fix. If this fails, please try

2) Deploying 3 units as you did, but with the config option firewall_enabled set to false.
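
To check the port 9300 theory directly, a small probe run from one unit against its peers may help. This is only a sketch for Python 3: the function names are made up for illustration, and the peer addresses have to be supplied by hand (for example from juju status).

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    sock.close()
    return True


def unreachable_peers(peers, port=9300, timeout=3.0):
    """Return the subset of peer hosts whose port cannot be reached."""
    return [host for host in peers if not port_open(host, port, timeout)]
```

If unreachable_peers returns every peer on every unit, that would be consistent with all units waiting on each other; an empty list would point away from the firewall.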

Thanks

Simon Turner (srjturner) wrote :

Hi Michael,

Thanks. I tried a single unit and it failed. Then I realised that although juju reports the unit as "active", the elasticsearch process was in fact not running(!). Explicitly starting it on the host failed, and the logs revealed the message:

No up-and-running site-local (private) addresses found, got [name:lo (lo), name:eth0 (eth0)]

Not sure why Elasticsearch doesn't recognise "lo" as a loopback address in EC2, but after changing "network.host" in /etc/elasticsearch/elasticsearch.yml from ["_site", "_local_"] to ["0.0.0.0"], the process started OK.

Now, I appreciate that I can download the charm and edit /templates/elasticsearch.yml, but that's not ideal: my objective is to get a healthy deployment of the canonical-kubernetes bundle (which includes this charm). Would it make sense to make network.host a configuration option for the charm? Then (worst case) I can set it with juju config post-deployment.

Michael Nelson (michael.nelson) wrote :

Hi Simon.

The version of the charm which I use already has network.host set to 0.0.0.0 in templates/elasticsearch.yml:

http://bazaar.launchpad.net/~onlineservices-charmers/charms/trusty/elasticsearch/elasticsearch2/view/head:/templates/elasticsearch.yml

so I wasn't sure why you're seeing ["_site", "_local_"], until I checked jujucharms.com and saw that the current ES charm there uses a different branch, which has exactly ["_site", "_local_"]:

https://api.jujucharms.com/charmstore/v5/elasticsearch/archive/templates/elasticsearch.yml

which is the same as

http://bazaar.launchpad.net/~onlineservices-charmers/charms/trusty/elasticsearch/trunk/view/head:/templates/elasticsearch.yml

I'll try to find out why those changes aren't merged into the trunk branch. I'm not involved in the kubernetes work, but will ping those who are as well.

That said, I see no problem at all with updating the network.host to be a charm config option defaulting to the current value of 0.0.0.0. (You can set config options with the deploy too, rather than post-deploy).
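
As a rough illustration of what such an option could look like on the charm side (this is not the charm's actual code), the sketch below turns a config value into the network.host line for elasticsearch.yml. The option name "network-host", the comma-separated format, and the helper name are all hypothetical; only the 0.0.0.0 default comes from the discussion above.

```python
import json

# Default taken from the discussion above.
DEFAULT_NETWORK_HOST = "0.0.0.0"


def render_network_host(config):
    """Return the network.host line for elasticsearch.yml.

    `config` is a plain dict of charm config values. The hypothetical
    "network-host" key holds one address or a comma-separated list.
    """
    value = config.get("network-host") or DEFAULT_NETWORK_HOST
    hosts = [h.strip() for h in value.split(",") if h.strip()]
    if not hosts:
        # Fall back to the default if the option only held separators.
        hosts = [DEFAULT_NETWORK_HOST]
    # A JSON list is also a valid YAML flow sequence.
    return "network.host: " + json.dumps(hosts)
```

With an empty config this yields network.host: ["0.0.0.0"], matching the value that got the process started in the report.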

Pedro Guimarães (pguimaraes) wrote :

Saw that issue as well. It seems to be related to: https://bugs.launchpad.net/charm-elasticsearch/+bug/1714126.
