Comment 5 for bug 1843497

George Kraft (cynerva) wrote :

For now I'm moving forward with the assumption that your clusters ran etcd 2.3 at some point in the past. Some key points:

1. Etcd 2.3 stores data in /var/snap/etcd/current/etcd0.etcd/
2. Etcd 3.x stores data in /var/snap/etcd/current/
3. If you upgrade from etcd 2.3 to etcd 3.0, then the snap generates a "migration config"[1] that includes an adjusted data-dir field to keep the data in /var/snap/etcd/current/etcd0.etcd/
4. Usually, the etcd charm does not regenerate its configuration, even on upgrade-charm, so the "migration config" continues to be used.
5. However, etcd-449 includes a PR[2] that causes the config to be regenerated. When that happens, the data-dir is changed to /var/snap/etcd/current/ but the data is not moved. As far as etcd is concerned, all data is lost.
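To make the failure mode in points 1-5 concrete, here is a minimal sketch of how the condition could be detected: data still sits in the etcd0.etcd subdirectory while the configured data-dir points somewhere else. The function name and its parameters are hypothetical and not taken from the charm; only the directory names come from the points above.

```python
from pathlib import Path

# Subdirectory named in the comment as the etcd 2.3 data location.
LEGACY_SUBDIR = "etcd0.etcd"


def has_stranded_data(snap_current: Path, configured_data_dir: Path) -> bool:
    """Hypothetical check: True if legacy data exists but the config ignores it.

    snap_current is the snap's data root (/var/snap/etcd/current on a real unit).
    configured_data_dir is whatever data-dir the current etcd config names.
    """
    legacy_dir = snap_current / LEGACY_SUBDIR
    # The legacy directory only matters if it actually holds something.
    legacy_has_data = legacy_dir.is_dir() and any(legacy_dir.iterdir())
    points_elsewhere = configured_data_dir.resolve() != legacy_dir.resolve()
    return bool(legacy_has_data and points_elsewhere)
```

On an affected unit, calling has_stranded_data(Path("/var/snap/etcd/current"), Path("/var/snap/etcd/current")) would be expected to return True after the config has been regenerated, assuming the old data is still in place.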

This was a time bomb. The charm needs to be able to regenerate the etcd config as needed, but the etcd2->3 upgrade makes doing that a disaster. We haven't encountered this until now because of how rare it is for the etcd charm to actually regenerate its config.

I am still looking into solutions, but I think the charm needs to detect this case and complete the migration so that etcd no longer depends on a special migration config to function.
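As a rough illustration of what "completing the migration" might look like, the sketch below moves everything out of the etcd0.etcd subdirectory into the parent directory that a regenerated config would point at. This is one possible approach under stated assumptions, not the fix that was eventually chosen: the function name is hypothetical, it assumes etcd is stopped while it runs, and it refuses to proceed if the target already holds an entry with the same name (for example, if etcd has already started with an empty data-dir).

```python
import shutil
from pathlib import Path


def complete_migration(snap_current: Path, legacy_subdir: str = "etcd0.etcd") -> bool:
    """Hypothetical helper: move legacy data up into snap_current.

    Returns True if data was moved, False if there was nothing to migrate.
    Assumes the etcd service is stopped before this is called.
    """
    legacy_dir = snap_current / legacy_subdir
    if not legacy_dir.is_dir():
        return False
    entries = list(legacy_dir.iterdir())
    if not entries:
        return False

    # Never overwrite anything already present in the target directory.
    clashes = [e.name for e in entries if (snap_current / e.name).exists()]
    if clashes:
        raise FileExistsError(f"refusing to overwrite: {sorted(clashes)}")

    for entry in entries:
        shutil.move(str(entry), str(snap_current / entry.name))
    legacy_dir.rmdir()  # now empty, so a later check sees no legacy data
    return True
```

Raising on a name clash rather than merging is deliberate: if both locations contain data, choosing which copy wins is a decision for an operator, not for an automated hook.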

[1]: https://github.com/juju-solutions/etcd-snaps/blob/d53089eb425db715c5514186cd5ee108a8671332/bin/snap-wrap.sh#L38-L77
[2]: https://github.com/charmed-kubernetes/layer-etcd/pull/158