For now I'm moving forward with the assumption that your clusters ran etcd 2.3 at some point in the past. Some key points of information:
1. Etcd 2.3 stores data in /var/snap/etcd/current/etcd0.etcd/
2. Etcd 3.x stores data in /var/snap/etcd/current/
3. If you upgrade from etcd 2.3 to etcd 3.0, then the snap generates a "migration config"[1] that includes an adjusted data-dir field to keep the data in /var/snap/etcd/current/etcd0.etcd/
4. Usually, the etcd charm does not regenerate its configuration, even on upgrade-charm, so the "migration config" continues to be used.
5. However, etcd-449 includes a PR[2] that causes the config to be regenerated. When that happens, the data-dir is changed to /var/snap/etcd/current/ but the data is not moved. As far as etcd is concerned, all data is lost.
This was a time bomb. The charm needs to be able to regenerate the etcd config as needed, but the etcd2->3 upgrade makes doing that a disaster. We haven't encountered this until now because of how rare it is for the etcd charm to actually regenerate its config.
I am still looking into solutions, but I think the charm needs to detect this case and complete the migration so that it no longer depends on a special migration config to function.
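To make the "detect this case" part concrete, here is a rough sketch of the kind of check I have in mind. It is not charm code: the function names are made up for illustration, and it assumes the usual etcd layout where state lives in a member/ directory under the data-dir. It only reports the situation from point 5, where the legacy directory still holds data but the config no longer points at it.

```python
from pathlib import Path

# Paths taken from the points above; the helper names are hypothetical.
SNAP_DATA = Path("/var/snap/etcd/current")
LEGACY_SUBDIR = "etcd0.etcd"


def has_member_data(data_dir: Path) -> bool:
    """Return True if data_dir contains an etcd member/ directory.

    Assumption: etcd keeps its state under <data-dir>/member.
    """
    return (data_dir / "member").is_dir()


def legacy_data_stranded(configured_data_dir: Path,
                         snap_data: Path = SNAP_DATA) -> bool:
    """Return True if the legacy etcd 2.3 directory still holds data
    but the configured data-dir points somewhere else."""
    legacy = snap_data / LEGACY_SUBDIR
    if Path(configured_data_dir) == legacy:
        # The migration config is still in effect; nothing is stranded.
        return False
    return has_member_data(legacy)
```

A check like this would have to run before the config is regenerated, or at least before etcd restarts against the new data-dir. It deliberately does not move anything; how to complete the migration safely is still the open question.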
[1]: https://github.com/juju-solutions/etcd-snaps/blob/d53089eb425db715c5514186cd5ee108a8671332/bin/snap-wrap.sh#L38-L77
[2]: https://github.com/charmed-kubernetes/layer-etcd/pull/158