Device stopped receiving OTA-updates due to outdated keyring

Bug #1579738 reported by Michael Zanetti
This bug affects 2 people
Affects: Canonical System Image
Status: Fix Released
Importance: Critical
Assigned to: Łukasz Zemczak
Milestone:

Bug Description

I have an arale device here which stopped giving me OTA updates. Trying to update it manually using system-image-cli, it turns out that it is so outdated that system-image-cli no longer has up-to-date signing keys and refuses to verify index.json. As a result, it refuses to update anything at all, so the device needs to be reflashed.

Those keys should not go stale after only about 2 months without updating a device.

Currently installed image:
phablet@ubuntu-phablet:~$ system-image-cli -i
current build number: 268
device name: arale
channel: ubuntu-touch/rc-proposed/meizu.en
last update: 2010-01-19 23:51:26
version version: 268
version ubuntu: 20160312
version device: 20160111-51982fc
version custom: 1452441600

Note that the last update date must be wrong. I updated it earlier this year (approx 2 months ago).

Here's the log from trying to update it with system-image-cli: http://paste.ubuntu.com/16317219/

(Also, it wouldn't hurt if system-image-cli printed human-readable error messages instead of stack traces.)

Tags: lt-blocker
Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report. Confirmed on devices with much more recent builds. The same error occurs on:
- ubuntu-touch/rc-proposed/bq-aquaris-pd.en frieza 97
- ubuntu-touch/rc-proposed/bq-aquaris.en krillin 325
- ubuntu-touch/rc-proposed/meizu.en arale 317

Changed in canonical-devices-system-image:
assignee: nobody → Łukasz Zemczak (sil2100)
importance: Undecided → Critical
milestone: none → 11
status: New → Confirmed
Łukasz Zemczak (sil2100) wrote :

Looking into this right now - I don't have much knowledge of how the keyring bits work, but I might have some theories already.

So what happened last week is that we had a key exchange on system-image. We got new keys signed and uploaded as the old ones were expiring. It seems that on each upgrade check system-image-cli checks whether the image keyrings are present on the filesystem. If they are, it uses them for GPG validation. If not, it downloads the keyrings from system-image. When I tested image upgrades I tested from a fresh flash, so s-i downloaded the new keyrings right away and everything worked. But on existing devices it doesn't work, as they still have the *old* keyrings in place, and s-i will only re-download them if they're removed or not present. Eh...

I asked Jean-Baptiste to confirm this theory and it indeed works after removing the keyrings. We need to somehow force the re-download of keyrings (at least by removing the old ones) in all our devices.

Łukasz Zemczak (sil2100) wrote :

One idea I have for forcing the re-download of keyrings is to put the old image-signing keyring on the blacklist. From what I can see, the s-i client has a mechanism whereby, when it sees a keyring on the blacklist, it downloads a new one from the system-image servers. Waiting for Steve to appear and confirm that's the right way to go.

tags: added: lt-blocker
Changed in canonical-devices-system-image:
status: Confirmed → In Progress
summary: - Device stopped receiving OTA-updates, system-image-cli outdated
+ Device stopped receiving OTA-updates
summary: - Device stopped receiving OTA-updates
+ Device stopped receiving OTA-updates due to outdated keyring
Łukasz Zemczak (sil2100) wrote :

Still looking around for other possibilities. Blacklisting might work, but it seems odd for such a standard case as key rotation. Poked Barry in case he knows more.

Barry Warsaw (barry) wrote :

Here's the way keyring updates should work: https://wiki.ubuntu.com/ImageBasedUpgrades/GPG

TL;DR:

First, we cannot download a new archive-master key since that's the root of the trust chain. But that key should never expire and we protect it zealously. If it ever does expire or get compromised, all devices would have to be reflashed.

The image-master key should also never expire, but if it does, we can download a new one and validate it against the archive-master.

The image-signing and device-signing keys can and do expire. The former is supposed to be regenerated every two years. (The webpage above indicates a 2013 update, but I'll bet it was regened for 2015).
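The trust chain described so far can be pictured as a small lookup table. This is only an illustrative sketch: `SIGNED_BY`, `trust_path` and `can_be_redownloaded` are names made up for this example and are not part of system-image, and device-signing is left out because the text above does not say which key validates it.

```python
# Hypothetical model of the key hierarchy: each key maps to the key
# that a freshly downloaded copy of it is validated against.
SIGNED_BY = {
    "archive-master": None,            # root of trust, shipped on the device
    "image-master": "archive-master",
    "image-signing": "image-master",
}


def trust_path(key):
    """Return the chain of keys from `key` up to the root of trust."""
    path = [key]
    while SIGNED_BY[path[-1]] is not None:
        path.append(SIGNED_BY[path[-1]])
    return path


def can_be_redownloaded(key):
    """A key can only be replaced over the air if something above it vouches for it."""
    return SIGNED_BY[key] is not None


print(trust_path("image-signing"))
print(can_be_redownloaded("archive-master"))
```

In this model the root key is the only one with nothing above it, which is why losing it would mean reflashing.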

Under what conditions does a new image-signing key get re-downloaded? First, if the key is cached and it is blacklisted, has a corrupt checksum, or fails to verify against the image-master, it will get redownloaded. Second, if the channels.json file's signature fails to validate against the image-signing key then we'll redownload it too. I believe GPG ensures that a key will not verify if it has expired, but we rely on GPG itself to tell us that. When we do download the new key, the key's metadata can contain an 'expiry' entry in the JSON, which we check against the device's timestamp. However, the 'expiry' key is optional. (I don't remember when we turned on 'expiry' or under what conditions that metadata key will be there.)

Of course, the newly downloaded image-signing key must also validate, and if all that fails, the device will refuse to update. Remember too that the index.json can be signed by either the image-signing or device-signing key. If either validates, the index.json is good.

What the traceback is telling me is that the index.json is failing to be validly signed by either the image-signing key or the device-signing key, *but* that the channels.json file was validly signed by the image-signing key. ISTM that's a broken configuration on the server.
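The re-download rules above can be summarised in a few lines of Python. This is a sketch of the described behaviour, not the actual system-image client code: `CachedKeyring` and the three functions are hypothetical names, and the 'expiry' value is assumed here to be a UTC timestamp in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedKeyring:
    """Hypothetical summary of the cached image-signing keyring's state."""
    blacklisted: bool = False
    checksum_ok: bool = True
    verifies_against_image_master: bool = True


def should_redownload_image_signing(cached: Optional[CachedKeyring],
                                    channels_signature_ok: bool) -> bool:
    """Decide whether the image-signing keyring would be fetched again."""
    if cached is None:
        return True  # nothing on disk yet, e.g. right after a fresh flash
    if (cached.blacklisted
            or not cached.checksum_ok
            or not cached.verifies_against_image_master):
        return True
    # Otherwise only a failed channels.json signature triggers a re-download.
    return not channels_signature_ok


def keyring_expired(metadata: dict, now: float) -> bool:
    """The 'expiry' entry is optional; a missing entry is not treated as expired."""
    expiry = metadata.get("expiry")
    return expiry is not None and expiry < now


def index_is_trusted(image_signing_ok: bool, device_signing_ok: bool) -> bool:
    """index.json is good if either signing key validates it."""
    return image_signing_ok or device_signing_ok
```

Note that in this model a healthy cached keyring plus a valid channels.json signature never leads to a re-download, no matter how index.json is signed.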

Łukasz Zemczak (sil2100) wrote :

Issue fixed! Thanks for your insight Barry! So, the problem was more or less what Barry said, but not really a misconfiguration on the server. Explanation: the channels.json file was signed with the old key and index.json with the new key. This worked in the case of a fresh flash or after removing the keyrings because the new keyring still *has* the old key in it (and it has not expired yet). This meant that, in the case of a fresh flash, we downloaded the new keyring, validated channels.json (since we had the old key in the keyring) and then validated index.json with the new key. For existing devices it was failing since, as Barry mentioned, we only re-download the keyring when we fail to validate channels.json. In this case we didn't re-download since channels.json was still signed with the old key! This led to a situation where we had 2 differently signed files.

What I did was make a no-change update to the channels on system-image to force channels.json to be regenerated and re-signed. Multiple people confirmed that the upgrades are now working again as system-image-cli is re-downloading the image-signing keyring.
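The three situations described here (fresh flash, existing device before the fix, existing device after channels.json was re-signed) can be replayed with a toy model. Everything below is a simplification for illustration: the key names, `check_for_update` and the set-based "keyrings" are invented for this sketch, the device-signing key is ignored, and a signature is treated as valid simply when the signing key is in the keyring.

```python
OLD_KEY, NEW_KEY = "old-image-signing", "new-image-signing"

# As described above, the new keyring still contains the old key.
OLD_KEYRING = frozenset({OLD_KEY})
NEW_KEYRING = frozenset({OLD_KEY, NEW_KEY})


def check_for_update(cached_keyring, channels_signed_by, index_signed_by,
                     server_keyring=NEW_KEYRING):
    """Return (update_ok, keyring_in_use) for one simplified update check."""
    # No keyring on disk (fresh flash): download it from the server.
    keyring = cached_keyring if cached_keyring is not None else server_keyring

    # A channels.json validation failure is what triggers a re-download.
    if channels_signed_by not in keyring:
        keyring = server_keyring
        if channels_signed_by not in keyring:
            return False, keyring

    # index.json is validated with whatever keyring we ended up with.
    return index_signed_by in keyring, keyring


# Fresh flash, before the fix: works.
print(check_for_update(None, OLD_KEY, NEW_KEY)[0])
# Existing device, before the fix: channels.json validates, so no re-download.
print(check_for_update(OLD_KEYRING, OLD_KEY, NEW_KEY)[0])
# Existing device, after channels.json was re-signed with the new key.
print(check_for_update(OLD_KEYRING, NEW_KEY, NEW_KEY)[0])
```

In this model only the second case fails, which matches the reports in this bug: the stale keyring is never refreshed as long as channels.json keeps validating against it.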

Need to update the docs about this case.

Phew!

Changed in canonical-devices-system-image:
status: In Progress → Fix Released
status: Fix Released → Fix Committed
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released