Multinode node job failed to start etcd
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zun | Triaged | Wishlist | Unassigned |
kuryr-libnetwork | New | Undecided | Unassigned |
Bug Description
Description
===========
The Zun multinode job has been broken since September 28. One of the failed jobs:
Errors
======
In /console.html
----------------
...
2017-09-28 12:01:00.234797 | + /opt/stack/
2017-09-28 12:08:30.221956 | ERROR: the main setup script run by this job failed - exit code: 2
In /subnode-
-------
...
2017-09-28 12:08:28.707 | + lib/etcd3:
2017-09-28 12:08:28.758 | Job for <email address hidden> failed because the control process exited with error code. See "systemctl status <email address hidden>" and "journalctl -xe" for details.
In /subnode-
-------
Sep 28 12:08:28.735930 ubuntu-
Sep 28 12:08:28.753698 ubuntu-
...
Changed in zun:
status: New → Triaged
importance: Undecided → Critical

Changed in zun:
importance: Critical → Wishlist
I think you are correct in locating the patch that triggered this, but I'm wondering whether the original devstack setup makes sense. If I understand correctly, it starts two independent etcd processes, one on each node, and then points the clients on each node at their local etcd. So the expected coordination between the nodes never actually happens. Unless we want to create a real etcd cluster, I think the better fix for this bug is simply to stop starting etcd on subnodes.
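A minimal sketch of the proposed fix, assuming devstack's standard `disable_service` helper and that the service name is `etcd3` (matching the `lib/etcd3` script seen in the logs above; both are assumptions about the devstack version in use): the subnode's `local.conf` would disable the local etcd service so that only the primary node runs one.

```
# Hypothetical fragment for the subnode's local.conf (a sketch, not the
# actual job configuration). Assumes devstack's disable_service helper
# and the etcd3 service name from lib/etcd3.
[[local|localrc]]
disable_service etcd3

# Clients on the subnode would then need to be pointed at the primary
# node's etcd endpoint (e.g. via the relevant *_ETCD_HOST-style setting
# for each consuming service) instead of a local instance.
```

The alternative mentioned above, a real etcd cluster, would instead require starting etcd on every node with matching `--initial-cluster` membership flags, which is considerably more setup work than disabling the subnode instances.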