template for elasticsearch.yaml injects _site_ which is causing elasticsearch to fail to start

Bug #1809279 reported by Jeff Hillman
This bug affects 2 people
Affects: Elasticsearch Charm
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After performing the steps and resolving the issue mentioned in bug https://bugs.launchpad.net/elasticsearch-charm/+bug/1809275, Elasticsearch still fails to start with the error:

---

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: No up-and-running site-local (private) addresses found, got [name:lo (lo), name:eth0 (ens6)]

---
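For anyone trying to check which interface addresses would qualify here, the following is a minimal Python sketch. It assumes that "site-local (private)" corresponds to the IPv4 ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 (the ranges Java's InetAddress.isSiteLocalAddress() accepts for IPv4). The function names and sample addresses are hypothetical and are not taken from the charm or from this deployment.

```python
import ipaddress

# Assumption: "site-local" means the RFC 1918 private IPv4 ranges.
SITE_LOCAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_site_local(address: str) -> bool:
    """Return True if an IPv4 address falls in one of the ranges above."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        return False  # IPv6 is deliberately out of scope for this sketch
    return any(ip in network for network in SITE_LOCAL_NETWORKS)


def site_local_addresses(addresses):
    """Keep only the addresses that would count as site-local."""
    return [address for address in addresses if is_site_local(address)]


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    candidates = ["127.0.0.1", "192.168.1.5", "252.0.0.1"]
    print(site_local_addresses(candidates))
```

If a unit's interfaces carry only addresses outside these ranges, the filtered list is empty, which would be consistent with the "No up-and-running site-local (private) addresses found" message above.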

This is running on a KVM host and is clustered as part of the standard foundation cloud build ELK stack.

The peer-relation-join hook is explicitly what is failing. With just 1 unit it runs fine, but when a peer is added, playbook.yaml is run, /etc/elasticsearch/elasticsearch.yaml is updated, and "_site_" is added to the network.bind_host line.

After several troubleshooting attempts with various combinations, we found that if we remove the _site_ piece from that line, the service starts fine. We then changed the template in /var/lib/juju/agents/unit-elasticsearch-X/charm/template/elasticsearch.yaml to remove this.

Now the relation joins just fine and the service starts up.

elasticsearch-32 charm from charmstore
using the 6.x apt repo
bionic

Tags: cpe-onsite
Vern Hart (vern) wrote :

I encountered this issue again (same customer as Jeff).

It was working previously and then when I imported all new, stable charm versions, it stopped working. This site is an air-gapped deployment and all the charms are local copies.

I removed _site_ from the elasticsearch.yml template, but this didn't fix the problem. What appears to be happening is that each unit forms its own cluster and discovery is not finding the other units.

Then I checked the old charm to see if there were any changes and found this for the bind_host:

  network.bind_host: ["local", {{ unit_private_address }}]

After changing the template in the newly imported charm to this, it worked.
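To confirm which form of the line a unit actually ended up with, the rendered config can be scanned for the network.bind_host entry. This is a rough Python sketch that uses plain string handling rather than a YAML parser. The function names and the sample config strings are hypothetical, and the "_site_" sample is only a guess at what the rendered line looks like.

```python
def bind_host_value(config_text: str):
    """Return the raw value of the network.bind_host line, or None if absent."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out settings
        key, separator, value = stripped.partition(":")
        if separator and key.strip() == "network.bind_host":
            return value.strip()
    return None


def uses_site_placeholder(config_text: str) -> bool:
    """Return True if the bind_host line mentions the _site_ placeholder."""
    value = bind_host_value(config_text)
    return value is not None and "_site_" in value


if __name__ == "__main__":
    # Hypothetical rendered config lines, for illustration only.
    with_site = 'network.bind_host: ["_site_", "_local_"]'
    without_site = 'network.bind_host: ["local", 10.0.0.5]'
    print(uses_site_placeholder(with_site))
    print(uses_site_placeholder(without_site))
```

On a real unit the text would be read from the rendered elasticsearch.yml rather than from inline strings.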

Vern Hart (vern) wrote :

Is this being seen anywhere else?

One thing about this deployment that might be special is that the multiple elasticsearch nodes are in different subnets. Would that cause this problem?

Xav Paice (xavpaice) wrote :

I was able to reproduce this when running Elasticsearch in an LXD container using fan-out networking, with another node on bare metal (i.e. different subnets).

Changed in charm-elasticsearch:
status: New → Triaged
importance: Undecided → Medium
Eric Chen (eric-chen)
Changed in charm-elasticsearch:
status: Triaged → Won't Fix