etcd snapshot default keys-version should be v3

Bug #1840933 reported by Mike Wilson
This bug affects 1 person
Affects: Etcd Charm
Status: Fix Released
Importance: Medium
Assigned to: Robert Gildein

Bug Description

etcd snapshots aren't saving Kubernetes data now that Kubernetes data is in a namespace.

12:51:12 <jk0ne> I was trying to do an upgrade, and etcd came back with 0 active peers (and apparently no data)
12:51:39 <jk0ne> I've tried to restore the data (I did make a snapshot) but I can't make it go.
14:14:51 <jk0ne> Here's where I'm at now. I've managed to restore the snapshot and get one etcd working.
14:15:18 <jk0ne> Unfortunately, the snapshot seems to have not restored the kubernetes data stored in etcd (all pods / deployments / etc. lost)
14:15:54 <jk0ne> (the snapshot doesn't include member/snap/db)

George Kraft (cynerva) wrote :

Was the snapshot action run with keys-version=v3?
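
For reference, a minimal sketch of how the option is passed to the action. It only builds the command line and does not run it. The `juju run-action ... --wait` form is Juju 2.x syntax and is an assumption about the client in use; the unit name `etcd/0` and the helper `snapshot_command` are hypothetical.

```python
import shlex
from typing import List


def snapshot_command(unit: str, keys_version: str) -> List[str]:
    """Build the argv for the etcd charm's snapshot action (hypothetical helper)."""
    # Assumption: the option accepts exactly these two values.
    if keys_version not in ("v2", "v3"):
        raise ValueError(f"unknown keys-version: {keys_version}")
    return ["juju", "run-action", unit, "snapshot",
            f"keys-version={keys_version}", "--wait"]


# Print the command for a hypothetical unit so it can be reviewed before running.
print(shlex.join(snapshot_command("etcd/0", "v3")))
```

Passing the version explicitly avoids depending on whatever default the installed charm revision uses.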

Jay Kuri (jk0ne) wrote :

I did not run the action with keys-version=v3. I was unaware of that option (it's not in the docs FYI).

I did some investigation yesterday and I found a few things.

The missing db file seems to be a bug in etcd itself, in versions before 3.1.8 (we have 3.0.17 in the affected environment):

https://github.com/etcd-io/etcd/issues/8331

I also did some testing with snapshots created via the documented process. If you shut down etcd manually on the unit and touch the member/snap/db file, as mentioned in the ticket above, the restore is able to proceed once etcd is restarted.
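
A minimal sketch of that workaround, for illustration only. The helper name `ensure_snap_db` is hypothetical, and the data directory is an assumption: pass whatever directory the affected unit actually uses. It should only be run while etcd is stopped.

```python
from pathlib import Path


def ensure_snap_db(data_dir: str) -> Path:
    """Create an empty member/snap/db under data_dir if it is missing."""
    db_path = Path(data_dir) / "member" / "snap" / "db"
    # Refuse to guess: a missing member/snap directory suggests the wrong data_dir.
    if not db_path.parent.is_dir():
        raise FileNotFoundError(f"no member/snap directory under {data_dir}")
    # Same effect as `touch`: creates an empty file, leaves existing content alone.
    db_path.touch(exist_ok=True)
    return db_path
```

The function deliberately does not create the parent directories, so a mistyped path fails loudly instead of leaving a stray file behind.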

Also, when the snapshot is restored, the cluster memberships are restored with it. This causes another etcd crash when restoring to a different environment (or to a rebuild of the existing one) where the etcd units have changed.

Testing with snapshots from a different environment (with the above workarounds), I was able to restore the data and verify that the k8s data was present.

George Kraft (cynerva) wrote :

> I was unaware of that option (it's not in the docs FYI)

I see it documented here: https://ubuntu.com/kubernetes/docs/backups

Etcd 3.x has two APIs, etcdv2 and etcdv3, each backed by its own separate database. By default, the snapshot action grabs data from the etcdv2 backend. The keys-version=v3 option tells it to grab data from the etcdv3 backend instead.

Any recent version of Kubernetes (1.13+) will be using the etcdv3 API, so you need keys-version=v3 to get the data from there.
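
As an illustration only (this is not the charm's code), a toy Python model of why a default snapshot of such a cluster comes back without Kubernetes data. `ToyEtcd`, `snapshot`, and the key name are hypothetical; the only behavior taken from the description above is that the two stores are separate and the old default reads v2.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEtcd:
    """Toy stand-in for an etcd 3.x member: two independent key stores."""
    v2: dict = field(default_factory=dict)
    v3: dict = field(default_factory=dict)


def snapshot(member: ToyEtcd, keys_version: str = "v2") -> dict:
    """Copy exactly one store, mirroring the old default of v2."""
    if keys_version not in ("v2", "v3"):
        raise ValueError(f"unknown keys-version: {keys_version}")
    return dict(getattr(member, keys_version))


member = ToyEtcd()
# Hypothetical key: recent Kubernetes writes through the etcdv3 API.
member.v3["/registry/pods/default/web-0"] = "pod-spec"

default_backup = snapshot(member)                 # reads the empty v2 store
v3_backup = snapshot(member, keys_version="v3")   # reads the store Kubernetes uses
print(len(default_backup), len(v3_backup))
```

In this model the default snapshot succeeds but is empty, which matches the symptom reported above: the backup ran without error, yet the pods and deployments were missing after restore.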

Mike Wilson (knobby) wrote :

Should we flip the default or check the config to see what version to snapshot?

Jay Kuri (jk0ne) wrote :

+1 on flip the default going forward and/or check config.

re: docs - I was looking at the charmstore:

https://jaas.ai/etcd

under 'Operational actions' which provides the snapshot commands but without the keys-version option mentioned.

George Kraft (cynerva) wrote :

> +1 on flip the default going forward and/or check config.

Yes, let's do that.

> re: docs - I was looking at the charmstore: ... under 'Operational actions' which provides the snapshot commands but without the keys-version option mentioned.

This section has been removed from the charm docs, which now link to our official documentation instead. I think doc-wise, we don't need to make any changes here.

summary: - etcd snapshots aren't saving other namespaces
+ etcd snapshot default key-version should be v3
Changed in charm-etcd:
importance: Undecided → Medium
status: New → Triaged
summary: - etcd snapshot default key-version should be v3
+ etcd snapshot default keys-version should be v3
Changed in charm-etcd:
status: Triaged → In Progress
assignee: nobody → Robert Gildein (rgildein)
George Kraft (cynerva) wrote :
Changed in charm-etcd:
status: In Progress → Fix Committed
George Kraft (cynerva)
Changed in charm-etcd:
milestone: none → 1.20+ck1
tags: added: backport-needed
George Kraft (cynerva) wrote :
tags: removed: backport-needed
Changed in charm-etcd:
status: Fix Committed → Fix Released
