etcd service doesn't start on secondary nodes.

Bug #1809389 reported by Tim Van Steenburgh
This bug affects 2 people
Affects     Status  Importance  Assigned to  Milestone
Etcd Charm  New     Undecided   Unassigned

Bug Description

Opened by Pekkari on 2018-11-14 11:15:34+00:00 at https://github.com/juju-solutions/layer-etcd/issues/144

------------------------------------------------------------

I'm currently trying to deploy an etcd cluster with Juju. Three nodes get
installed and one becomes active/idle; the rest are stuck requesting
certificates from a related easyrsa application:
Unit     Workload     Agent  Machine   Public address  Ports     Message
etcd/0*  active       idle   8/lxd/1   x.x.x.x         2379/tcp  Errored with 0 known peers
etcd/1   maintenance  idle   18/lxd/1  x.x.x.x                   Requesting tls certificates.
etcd/2   maintenance  idle   28/lxd/1  x.x.x.x                   Requesting tls certificates.

On the broken nodes I try to start the etcd service installed from the snap, and it never
succeeds. Connectivity between the nodes is fine, though:
# systemctl status snap.etcd.etcd
● snap.etcd.etcd.service - Service for snap application etcd.etcd
   Loaded: loaded (/etc/systemd/system/snap.etcd.etcd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit-hit) since Wed 2018-11-14 09:37:59 UTC; 1h 35min ago
  Process: 409798 ExecStart=/usr/bin/snap run etcd (code=exited, status=1/FAILURE)
 Main PID: 409798 (code=exited, status=1/FAILURE)

Nov 14 09:38:01 juju-d67a39-18-lxd-1 systemd[1]: Failed to start Service for snap application etcd.etcd.
Nov 14 09:38:01 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Failed with result 'start-limit-hit'.
Nov 14 09:38:03 juju-d67a39-18-lxd-1 systemd[1]: Stopped Service for snap application etcd.etcd.
Nov 14 09:38:03 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Start request repeated too quickly.
Nov 14 09:38:03 juju-d67a39-18-lxd-1 systemd[1]: Failed to start Service for snap application etcd.etcd.
Nov 14 09:38:03 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Failed with result 'start-limit-hit'.
Nov 14 09:38:05 juju-d67a39-18-lxd-1 systemd[1]: Stopped Service for snap application etcd.etcd.
Nov 14 09:38:05 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Start request repeated too quickly.
Nov 14 09:38:05 juju-d67a39-18-lxd-1 systemd[1]: Failed to start Service for snap application etcd.etcd.
Nov 14 09:38:05 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Failed with result 'start-limit-hit'.

# systemctl start snap.etcd.etcd
# systemctl status snap.etcd.etcd
● snap.etcd.etcd.service - Service for snap application etcd.etcd
   Loaded: loaded (/etc/systemd/system/snap.etcd.etcd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit-hit) since Wed 2018-11-14 11:13:56 UTC; 50s ago
  Process: 431109 ExecStart=/usr/bin/snap run etcd (code=exited, status=1/FAILURE)
 Main PID: 431109 (code=exited, status=1/FAILURE)

Nov 14 11:13:55 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Unit entered failed state.
Nov 14 11:13:55 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Failed with result 'exit-code'.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Service hold-off time over, scheduling restart.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: Stopped Service for snap application etcd.etcd.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Start request repeated too quickly.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: Failed to start Service for snap application etcd.etcd.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Unit entered failed state.
Nov 14 11:13:56 juju-d67a39-18-lxd-1 systemd[1]: snap.etcd.etcd.service: Failed with result 'start-limit-hit'.
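
The status output above contains only systemd's own messages (exit code and start-rate limiting), not the error printed by etcd itself. A minimal sketch of how that underlying error might be surfaced on a failing unit follows. It assumes root access on the unit, and show_unit_failure is a hypothetical helper name that appears nowhere in the charm:

```shell
# Sketch: surface the service's own error once systemd has stopped retrying.
# show_unit_failure is a hypothetical helper; run it as root on the failing unit.
show_unit_failure() {
  unit="${1:-snap.etcd.etcd}"
  # Clear the failed state so the start-rate limit no longer blocks a new attempt.
  systemctl reset-failed "$unit"
  # Try a single start; ignore the exit status so the journal is still printed.
  systemctl start "$unit" || true
  # Print the most recent journal entries for the unit, which normally
  # include whatever the service wrote to stdout/stderr before exiting.
  journalctl -u "$unit" --no-pager -n 50
}
```

Calling `show_unit_failure` with no argument targets snap.etcd.etcd, the service name shown in the output above.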

Thanks!
José.

====================== COMMENTS ============================

Comment created by tvansteenburgh on 2018-11-14 13:01:05+00:00

@Pekkari Please provide a bundle file so we can attempt to reproduce.

------------------------------------------------------------

Comment created by Cynerva on 2018-11-14 14:28:52+00:00

@Pekkari Can you check the status of the snap.etcd.etcd service on unit etcd/0? I suspect it's not running and you need to restart it.

The other units are likely failing because they can't contact the leader.
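
The check suggested here could be run from the Juju client along these lines. This is a sketch only: it assumes the `juju` CLI is configured for the affected model, and etcd_service is a hypothetical wrapper name; the unit and service names are the ones from the report.

```shell
# Sketch: query or restart the snap-installed etcd service on a given unit.
# etcd_service is a hypothetical wrapper around `juju ssh`.
etcd_service() {
  unit="${1:-etcd/0}"      # unit to inspect (etcd/0 is the leader in the report)
  action="${2:-status}"    # systemctl verb: status, restart, ...
  # Run systemctl on the unit through Juju's SSH access.
  juju ssh "$unit" "sudo systemctl $action snap.etcd.etcd"
}
```

`etcd_service` with no arguments shows the service status on etcd/0; `etcd_service etcd/0 restart` would restart it if it turns out not to be running.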

------------------------------------------------------------

Comment created by Pekkari on 2018-11-14 14:53:29+00:00

It was; the charm was in active/idle status, reporting "Errored on 0 peers" on the status line. The environment
has been torn down for now; I'll provide proper output when I can reproduce it with a shorter bundle.

Tags: field