Walrus image re-cache after flush fails

Bug #693695 reported by iain MacDonnell on 2010-12-23
This bug affects 1 person
Assigned to: Neil Soman

Bug Description

I had been experiencing two issues with Walrus, both triggered by the maximum image cache size limit being exceeded.

Problem 1: Images that were flushed from the cache could not be downloaded from then on, unless they were deregistered then registered with a new ID. When the failure occurred, this exception showed:

15:49:11 ERROR [WalrusImageManager:Bukkit.6] edu.ucsb.eucalyptus.cloud.NotAuthorizedException: You are not authorized to perform this operation
edu.ucsb.eucalyptus.cloud.NotAuthorizedException: You are not authorized to perform this operation
        at edu.ucsb.eucalyptus.cloud.ws.WalrusImageManager.decryptImage(WalrusImageManager.java:248)
        at edu.ucsb.eucalyptus.cloud.ws.WalrusImageManager.cacheImage(WalrusImageManager.java:526)
        at edu.ucsb.eucalyptus.cloud.ws.WalrusImageManager.getDecryptedImage(WalrusImageManager.java:1112)

Problem 2: Sometimes I'd get into a state where trying to download a particular image would always result in a long wait, followed by the "Tired of waiting to cache image" condition.

The bug that causes problem 1 is silly .. in WalrusImageManager's decryptImage(), we find a section that is executed when caching is triggered as administrator, as appears to be the case when an attempt is made to download an image that's not currently cached:

                                        if(isAdministrator) {
                                                try {
                                                        boolean verified = false;
                                                        for(User user:Users.listAllUsers( )) {
                                                                for (X509Certificate cert : user.getAllX509Certificates()) {
                                                                        if(cert != null)
                                                                                verified = canVerifySignature(sigVerifier, cert, signature, verificationString);

                                                        if(!verified) {

It's great that we break out of the "cert" loop when verification succeeds, but unfortunately we forget to break out of the "user" loop! So we move on to the next user, who either doesn't have a cert, or his cert can't decrypt the image.

A quick fix is to add a second "if (verified) break;" at the bottom of the outer loop. If I were writing the code, I might have built a list with a single item for the non-administrator case, and then used one block of code to handle both cases.
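
A minimal sketch of the loop with both breaks in place, not the actual Walrus code. Everything here is a hypothetical stand-in: each "user" is just a list of certificate names, and a Predicate plays the role of canVerifySignature(). The same method can serve the non-administrator case by passing a list with a single user in it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class VerifyLoopSketch {
    // Hypothetical stand-in for the verification loop: usersCerts holds one
    // list of certificate names per user, canVerify replaces canVerifySignature().
    static boolean verifyAgainstUsers(List<List<String>> usersCerts, Predicate<String> canVerify) {
        boolean verified = false;
        for (List<String> certs : usersCerts) {
            for (String cert : certs) {
                if (cert != null) {
                    verified = canVerify.test(cert);
                }
                if (verified) break;   // existing break: leave the "cert" loop
            }
            if (verified) break;       // proposed second break: leave the "user" loop
        }
        return verified;
    }

    public static void main(String[] args) {
        // Administrator case: try every user's certificates.
        List<List<String>> allUsers = Arrays.asList(
                Arrays.asList("cert-alice"),
                Arrays.asList("cert-bob"),
                Arrays.asList("cert-carol"));
        System.out.println(verifyAgainstUsers(allUsers, c -> c.equals("cert-alice")));

        // Non-administrator case: a list containing only the requesting user.
        List<List<String>> onlyOwner = Arrays.asList(Arrays.asList("cert-bob"));
        System.out.println(verifyAgainstUsers(onlyOwner, c -> c.equals("cert-bob")));
    }
}
```

In this sketch, removing the second break lets a later user's failing certificate overwrite an earlier success, so the method would return false even though the first user's certificate verified.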

This leads to problem 2 - when signature verification fails, it throws a NotAuthorizedException. This is thrown back through cacheImage() to a semaphore-protected part of getDecryptedImage(). Unfortunately the exception is not caught there, so the semaphore is never released, and that image cannot be decrypted again until the CLC is restarted.

I suppose fixing problem 1 could prevent problem 2 from recurring, but really the semaphore-protected code should catch exceptions and release the semaphore when they occur. I wrapped the whole section in...

try {
      // ... the existing semaphore-protected section ...
} catch (Exception ex) {
      throw new EucalyptusCloudException("caught in semaphore-protected part of getDecryptedImage()", ex);
}

which seems to do the trick.
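
For comparison, a common way to guarantee the release on any exit path is a try/finally around the protected section. This is only a sketch under assumptions: it uses a plain java.util.concurrent.Semaphore, and the names imageSemaphore, cacheImage(boolean) and getDecryptedImage(boolean) are hypothetical stand-ins, not the real Walrus signatures or its actual semaphore bookkeeping.

```java
import java.util.concurrent.Semaphore;

class SemaphoreReleaseSketch {
    // Hypothetical per-image semaphore with a single permit.
    static final Semaphore imageSemaphore = new Semaphore(1);

    // Hypothetical stand-in for cacheImage(): optionally fails the way
    // signature verification does in the report.
    static void cacheImage(boolean fail) throws Exception {
        if (fail) {
            throw new Exception("You are not authorized to perform this operation");
        }
    }

    static void getDecryptedImage(boolean fail) throws Exception {
        imageSemaphore.acquire();
        try {
            cacheImage(fail);
        } finally {
            // Runs whether cacheImage() returns or throws, so the permit
            // is always handed back.
            imageSemaphore.release();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            getDecryptedImage(true);
        } catch (Exception ex) {
            System.out.println("first attempt failed: " + ex.getMessage());
        }
        // The second attempt can still acquire the semaphore.
        getDecryptedImage(false);
        System.out.println("second attempt succeeded");
    }
}
```

With the release in a finally block, a failed attempt no longer leaves the image locked until restart; the exception still propagates to the caller unchanged.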

iain MacDonnell (dseven) wrote :
Neil Soman (neilsoman) on 2010-12-23
Changed in eucalyptus:
status: New → Confirmed
Neil Soman (neilsoman) on 2010-12-23
Changed in eucalyptus:
assignee: nobody → Neil Soman (neilsoman)
Andy Grimm (agrimm) wrote :

This issue is now being tracked upstream at http://eucalyptus.atlassian.net/browse/EUCA-2749

Please watch that issue for further updates.
