SC: Maximum number of loop devices should be configurable

Bug #586134 reported by C de-Avillez
This bug affects 1 person
Affects               Status        Importance  Assigned to    Milestone
eucalyptus (Ubuntu)   Fix Released  High        Chris Cheney
Lucid                 Fix Released  High        Chris Cheney
Maverick              Fix Released  High        Chris Cheney

Bug Description

Eucalyptus versions:

ii eucalyptus-common 1.6.2-0ubuntu30.1
ii eucalyptus-java-common 1.6.2-0ubuntu30.1
ii eucalyptus-sc 1.6.2-0ubuntu30.1

Bug 430846 and bug 498174 partially dealt with this issue. However, on an SC (and perhaps also on the Walrus?) you currently cannot create more than 32 volumes -- trying to create the 33rd volume fails with the SC reporting:

ERROR [BlockStorage:pool-10-thread-5] com.eucalyptus.util.EucalyptusCloudException: Could not create loopback device for //var/lib/eucalyptus/volumes/vol-xxxxxxxxx. Please check the max loop value and permissions

This, of course, is an artificial restriction: consider an SC with a few terabytes of disk space, and euca-create-volume typically run with an allocation (-s) of a few tens of gigabytes. The system reaches saturation very quickly, while real disk utilisation is still (probably) only a few percent.

So either this should be configurable in /etc/eucalyptus/eucalyptus.*conf, or eucalyptus should dynamically create the needed /dev/loopNNN devices.

*Notes*:
(1) Setting this in the upstart script would allow for an initial setup, but would require the SC to be restarted for a new value to take effect, which would probably cause a temporary interruption of the volume service.
(2) I wonder how the system would behave on a reboot...
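For reference, the number of loop devices the kernel sets up is capped by the loop driver's max_loop parameter (the "max loop value" the error message refers to). A minimal sketch of raising it by hand, assuming loop is built as a module (if it is compiled into the kernel, max_loop=512 would have to go on the kernel command line instead):

# only safe while no loop devices are attached
modprobe -r loop
modprobe loop max_loop=512
# verify how many device nodes exist now
ls -1 /dev/loop* | wc -l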

======

IMPACT:
 * This bug affects people trying to create more than 32 SC volumes.

ADDRESSED:
 * This bug is addressed by creating more loopback devices (512) in the eucalyptus-sc upstart job.

REPRODUCE:
 * To reproduce this issue, try to create more than 32 SC volumes. You should not be able to do so.

REGRESSION POTENTIAL:
 * The chances for regression are relatively low.

======

Revision history for this message
C de-Avillez (hggdh2) wrote :

Proposing for lucid-updates. Changes will be made to /etc/init/eucalyptus-sc.conf (and, perhaps, to /etc/eucalyptus/eucalyptus.conf?) raising the number of loopback devices to 512 (256 maximum volumes, plus one snapshot per volume).
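A sketch of the kind of pre-start stanza this would mean for /etc/init/eucalyptus-sc.conf (an illustration of the approach only, not the exact change that was uploaded):

pre-start script
    # Make sure the SC has enough loop devices: 512 covers 256 volumes
    # plus one snapshot per volume. Block major is 7, minor == device index.
    for i in $(seq 0 511); do
        [ -e /dev/loop$i ] || mknod -m 660 /dev/loop$i b 7 $i
    done
end script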

Changed in eucalyptus (Ubuntu):
importance: Undecided → High
milestone: none → lucid-updates
Revision history for this message
C de-Avillez (hggdh2) wrote :

*Note*: per Dan, the NCs use one loopback device per VM *startup*. We currently default to 32 loopback devices in the upstart script, so we should be good for up to 32 *simultaneous* instance startups.
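A quick way to check how many loop devices a node has, and how many are currently attached (assuming the stock util-linux losetup):

# loop device nodes present
ls -1 /dev/loop* | wc -l
# loop devices currently attached to a backing file
losetup -a | wc -l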

Revision history for this message
C de-Avillez (hggdh2) wrote :

Marking Triaged, given the results of the teleconference.

Changed in eucalyptus (Ubuntu):
status: New → Triaged
Revision history for this message
C de-Avillez (hggdh2) wrote :

I have tested it on the test rig: updated the upstart script to allocate 512 /dev/loopnnn, and successfully created 50 volumes (the max we can create right now).

Revision history for this message
C de-Avillez (hggdh2) wrote :

New test, using resources provided by Etienne Goyer (many thanks, Etienne!): I successfully allocated 257 volumes. This is interesting in one aspect: I understood we could have a maximum of 256 volumes and 256 snapshots. It seems the real maximum is 512 (to be tested), but we may end up eating into the snapshot area if we go above 256 volumes (to be tested).

Changed in eucalyptus (Ubuntu):
assignee: nobody → Chris Cheney (ccheney)
Changed in eucalyptus (Ubuntu Lucid):
assignee: nobody → Chris Cheney (ccheney)
status: New → Triaged
importance: Undecided → High
Chris Cheney (ccheney)
description: updated
Revision history for this message
Chris Cheney (ccheney) wrote :

This appears to have worked for me up to 50 1 GB volumes (it actually allocated 52); then it started failing, giving me this error in the log file:

"Your proposed upload exceeds the maximum allowed object size. (edu.ucsb.eucalyptus.cloud.EntityTooLargeException)"

I have plenty of space on my physical disk so this might be some configuration option somewhere.

---

VOLUME vol-5EEE064E 1 cluster1 available 2010-06-07T15:46:45.163Z
VOLUME vol-5EDB064C 1 cluster1 available 2010-06-07T15:48:13.219Z
VOLUME vol-5F100652 1 cluster1 available 2010-06-07T15:48:11.546Z
VOLUME vol-5A140630 1 cluster1 available 2010-06-07T15:47:17.402Z
VOLUME vol-53E405FB 1 cluster1 available 2010-06-07T15:47:16.295Z
VOLUME vol-5A120630 1 cluster1 available 2010-06-07T15:47:15.132Z
VOLUME vol-59F0062B 1 cluster1 available 2010-06-07T15:47:19.435Z
VOLUME vol-5BA30654 1 cluster1 available 2010-06-07T15:47:13.564Z
VOLUME vol-598B0627 1 cluster1 available 2010-06-07T15:48:09.349Z
VOLUME vol-5FCA065B 1 cluster1 failed 2010-06-07T15:47:17.824Z
VOLUME vol-59900625 1 cluster1 unavailable 2010-06-07T15:48:17.719Z
VOLUME vol-59B4061D 1 cluster1 available 2010-06-07T15:47:10.239Z
VOLUME vol-5A4F0638 1 cluster1 failed 2010-06-07T15:47:13.013Z
VOLUME vol-59F6062E 1 cluster1 unavailable 2010-06-07T15:48:17.033Z
VOLUME vol-537005F7 1 cluster1 available 2010-06-07T15:48:15.373Z
VOLUME vol-59430623 1 cluster1 available 2010-06-07T15:48:10.004Z
VOLUME vol-5FFC065E 1 cluster1 available 2010-06-07T15:48:06.419Z
VOLUME vol-59890628 1 cluster1 available 2010-06-07T15:47:12.535Z
VOLUME vol-4D9B05C3 1 cluster1 available 2010-06-07T15:46:28.121Z
VOLUME vol-5F87065C 1 cluster1 available 2010-06-07T15:48:16.439Z
VOLUME vol-59820625 1 cluster1 available 2010-06-07T15:48:12.686Z
VOLUME vol-5985062A 1 cluster1 available 2010-06-07T15:47:20.595Z
VOLUME vol-591D0623 1 cluster1 available 2010-06-07T15:48:13.773Z
VOLUME vol-58960616 1 cluster1 available 2010-06-07T15:47:22.804Z
VOLUME vol-5A1A0633 1 cluster1 available 2010-06-07T15:47:18.268Z
VOLUME vol-5989061E 1 cluster1 available 2010-06-07T15:48:14.861Z
VOLUME vol-6066066F 1 cluster1 available 2010-06-07T15:47:23.346Z
VOLUME vol-597D0623 1 cluster1 available 2010-06-07T15:47:15.802Z
VOLUME vol-5E4F064C 1 cluster1 available 2010-06-07T15:47:09.079Z
VOLUME vol-59D20628 1 c...


Revision history for this message
Chris Cheney (ccheney) wrote :

The above issue was caused by my UEC being set to a 50 GB limit for volumes. I changed it to 200 GB under "Configuration -> Clusters -> Disk space reserved for volumes" and it then allowed me to create 194 of them.

It then started having problems with what appears to be a different but related bug: failed/unavailable volumes not being cleaned up properly. I am still testing that problem.

Revision history for this message
Chris Cheney (ccheney) wrote :

My removal errors are probably related to bug 517086, but the creation error appears to be new according to C de-Avillez; it is failing at vgcreate. An example follows:

12:32:54 DEBUG [SystemUtil:pool-10-thread-4] Running command: ///usr/lib/eucalyptus/euca_rootwrap dd if=/dev/zero of=//var/lib/eucalyptus/volumes/vol-613B0677 count=1 bs=1M seek=1027
12:32:54 INFO [ServiceSinkHandler:ReplyQueue.10] :1275931974448:eucalyptus:uid:admin:cb7fa02b-1b7c-4f2a-807d-f171acfa1876:MSG_SERVICED:CreateVolumeResponseType:165:
12:32:54 DEBUG [SystemUtil:pool-10-thread-5] Running command: ///usr/lib/eucalyptus/euca_rootwrap vgcreate vg-sGXOSQ.. /dev/loop92
12:32:54 DEBUG [SystemUtil:pool-10-thread-4] Running command: ///usr/lib/eucalyptus/euca_rootwrap losetup -f
12:32:54 INFO [LVM2Manager:pool-10-thread-4] losetup /dev/loop93 //var/lib/eucalyptus/volumes/vol-613B0677
12:32:54 INFO [LVM2Manager:pool-10-thread-4]
12:32:54 INFO [LVM2Manager:pool-10-thread-4]
12:32:54 DEBUG [SystemUtil:pool-10-thread-4] Running command: ///usr/lib/eucalyptus/euca_rootwrap pvcreate /dev/loop93
12:32:54 INFO [AOEManager:pool-10-thread-2] Trying e5.7
12:32:54 DEBUG [SystemUtil:pool-10-thread-4] Running command: ///usr/lib/eucalyptus/euca_rootwrap vgcreate vg-6E6tMQ.. /dev/loop93
12:32:54 INFO [AOEManager:pool-10-thread-1] Trying e5.8
12:32:54 INFO [BlockStorageStatistics:pool-10-thread-3] Service: StorageController Version: 1.6.2 Volumes: 86 Space Used: 91268055040
12:32:54 INFO [BlockStorageStatistics:pool-10-thread-3] Service: StorageController Version: 1.6.2 Volumes: 86 Space Used: 92341796864
12:32:54 INFO [ServiceSinkHandler:ReplyQueue.10] :1275931974956:eucalyptus:uid:admin:60f67a28-1c90-4c26-9a93-0426f5e83f85:MSG_SERVICED:CreateVolumeResponseType:50:
12:32:54 DEBUG [SystemUtil:pool-10-thread-3] Running command: ///usr/lib/eucalyptus/euca_rootwrap dd if=/dev/zero of=//var/lib/eucalyptus/volumes/vol-58B20615 count=1 bs=1M seek=1027
12:32:54 ERROR [SystemUtil:pool-10-thread-4] com.eucalyptus.util.ExecutionException: ///usr/lib/eucalyptus/euca_rootwrap vgcreate vg-6E6tMQ.. /dev/loop93
com.eucalyptus.util.ExecutionException: ///usr/lib/eucalyptus/euca_rootwrap vgcreate vg-6E6tMQ.. /dev/loop93
        at edu.ucsb.eucalyptus.util.SystemUtil.run(SystemUtil.java:91)
        at edu.ucsb.eucalyptus.storage.LVM2Manager.createVolumeGroup(LVM2Manager.java:170)
        at edu.ucsb.eucalyptus.storage.LVM2Manager.createLogicalVolume(LVM2Manager.java:390)
        at edu.ucsb.eucalyptus.storage.LVM2Manager.createVolume(LVM2Manager.java:435)
        at edu.ucsb.eucalyptus.cloud.ws.BlockStorage$VolumeCreator.run(BlockStorage.java:789)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
12:32:54 ERROR [BlockStorage:pool-10-thread-4] com.eucalyptus.util.EucalyptusCloudException: Unable to create volume group vg-6E6tMQ.. for /dev/loop93

Revision history for this message
Chris Cheney (ccheney) wrote :

I manually ran:

# /usr/lib/eucalyptus/euca_rootwrap vgcreate vg-6E6tMQ.. /dev/loop93
Volume group "vg-6E6tMQ.." successfully created

So I am not sure what is going on.
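For anyone debugging something similar, a couple of commands that show the loop/LVM state at the moment of the failure (assuming the stock util-linux and lvm2 tools; loop93 is just the device from the log above):

# is the loop device attached, and to which backing file?
losetup /dev/loop93
# which physical volumes / volume groups does LVM see right now?
pvs
vgs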

Revision history for this message
Chris Cheney (ccheney) wrote :

Actually, my volume removal errors seem to be different from the other bug, at least in several of the cases I have looked at. They are lvremove failures, similar to the vgcreate ones.

---

12:53:57 DEBUG [SystemUtil:pool-10-thread-5] Running command: ///usr/lib/eucalyptus/euca_rootwrap lvremove -f /dev/vg-qJp0gg../lv-XL682g..
12:53:57 INFO [AbstractClusterMessageDispatcher:New I/O client worker #2-26] :1275933237619:GetKeysType:uid:eucalyptus:eucalyptus:MSG_SENT:[GetKeysType serviceTag=self correlationId=9b4bb56a-6eab-4031-9993-39eb7bc5b6c4 userId=null effectiveUserId=null _return=false statusMessage=null]:
12:53:57 INFO [AbstractClusterMessageDispatcher:New I/O client worker #2-27] :1275933237647:DescribeNetworksType:uid:eucalyptus:eucalyptus:MSG_SENT:[DescribeNetworksType nameserver=10.0.0.16 clusterControllers=[10.0.0.16] correlationId=707c04e9-6b93-480c-94cd-137256854560 userId=eucalyptus effectiveUserId=eucalyptus _return=false statusMessage=null]:
12:53:57 INFO [ClusterUtil:New I/O client worker #2-26] ---------------------------------------------------------------
12:53:57 INFO [ClusterUtil:New I/O client worker #2-26] -> [ cluster1 ] Cluster certificate valid=true
12:53:57 INFO [ClusterUtil:New I/O client worker #2-26] -> [ cluster1 ] Node certificate valid=true
12:53:57 INFO [ClusterUtil:New I/O client worker #2-26] ---------------------------------------------------------------
12:53:57 INFO [AbstractClusterMessageDispatcher:New I/O client worker #2-30] :1275933237661:DescribePublicAddressesType:uid:eucalyptus:eucalyptus:MSG_SENT:[DescribePublicAddressesType correlationId=dbbc8856-e385-4809-a800-a97c56779e63 userId=eucalyptus effectiveUserId=eucalyptus _return=false statusMessage=null]:
12:53:57 INFO [AbstractClusterMessageDispatcher:New I/O client worker #2-30] :1275933237708:DescribePublicAddressesResponseType:uid:eucalyptus:eucalyptus:MSG_SENT:[DescribePublicAddressesResponseType addresses=[10.0.0.32, 10.0.0.33, 10.0.0.34, 10.0.0.35, 10.0.0.36, 10.0.0.37, 10.0.0.38, 10.0.0.39, 10.0.0.40, 10.0.0.41, 10.0.0.42, 10.0.0.43, 10.0.0.44, 10.0.0.45, 10.0.0.46, 10.0.0.47, 10.0.0.48] mapping=[, , , , , , , , , , , , , , , , ] correlationId=dbbc8856-e385-4809-a800-a97c56779e63 userId=eucalyptus effectiveUserId=null _return=true statusMessage=null]:
12:53:57 DEBUG [SystemUtil:pool-10-thread-3] Running command: ///usr/lib/eucalyptus/euca_rootwrap pvremove /dev/loop55
12:53:57 INFO [AbstractClusterMessageDispatcher:New I/O client worker #2-28] :1275933237712:DescribeResourcesType:uid:eucalyptus:eucalyptus:MSG_SENT:[DescribeResourcesType instanceTypes=[VmTypeInfo{name='m1.small', memory=192, disk=2, cores=1}, VmTypeInfo{name='c1.medium', memory=256, disk=5, cores=1}, VmTypeInfo{name='m1.large', memory=512, disk=10, cores=2}, VmTypeInfo{name='m1.xlarge', memory=1024, disk=20, cores=2}, VmTypeInfo{name='c1.xlarge', memory=2048, disk=20, cores=4}] correlationId=d97ca222-d730-40d5-8a2f-7bb640da5d20 userId=eucalyptus effectiveUserId=eucalyptus _return=false statusMessage=null]:
12:53:57 ERROR [SystemUtil:pool-10-thread-5] com.eucalyptus.util.ExecutionException: ///usr/lib/eucalyptus/euca_rootwrap lvremove -f /dev/vg-qJp0gg../lv-XL682g..
com....


Revision history for this message
Chris Cheney (ccheney) wrote :

So this bug regarding loop devices seems to be fixed, but a new bug 590929 has been opened about the LVM problems seen when creating/deleting volumes.

Revision history for this message
Steve Langasek (vorlon) wrote : Please test proposed package

Accepted into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!
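For reference, one way to enable the pocket and pull in the proposed package (a sketch; see the wiki page above for the full procedure):

# add the lucid-proposed pocket, e.g. in /etc/apt/sources.list.d/lucid-proposed.list:
deb http://archive.ubuntu.com/ubuntu/ lucid-proposed main restricted universe multiverse

# then update and install the SC package from -proposed:
sudo apt-get update
sudo apt-get install -t lucid-proposed eucalyptus-sc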

Changed in eucalyptus (Ubuntu Lucid):
status: Triaged → Fix Committed
tags: added: verification-needed
Revision history for this message
C de-Avillez (hggdh2) wrote :

Both Chris and I tested this. I ran two loops:

* allocating and deallocating 200 devices;
* allocating and deallocating 513 devices.

Both worked as expected. We *did* see errors, but they are unrelated to this bug (see bug 590929). As such, I consider this bug fixed.
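A sketch of how such a loop could look (not the exact script used; assumes euca2ools are configured and that the availability zone is named cluster1, as in the logs above):

#!/bin/sh
# allocate N 1 GB volumes, then deallocate them all again
N=200
for i in $(seq 1 $N); do
    euca-create-volume -s 1 -z cluster1
done
sleep 60   # give the SC time to finish creating the volumes
for vol in $(euca-describe-volumes | awk '/^VOLUME/ {print $2}'); do
    euca-delete-volume $vol
done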

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu30.2

---------------
eucalyptus (1.6.2-0ubuntu30.2) lucid-proposed; urgency=low

  * Revert: node/handlers_kvm.c: fix console bug (was only showing first 64K),
    LP: #566793
  * clc/modules/www/src/main/java/edu/ucsb/eucalyptus/admin/server/EucalyptusWebBackendImpl.java:
    - fix user enumeration and account brute force, LP: #579942
  * debian/eucalyptus-sc.upstart: Bump maximum number of loop devices for
    SC to 512, LP: #586134

eucalyptus (1.6.2-0ubuntu30.1) lucid-proposed; urgency=low

  Address LP: #565101
  * debian/eucalyptus.conf: set default JVM_MEM option
  * debian/eucalyptus-common.eucalyptus.upstart: use $JVM_MEM
    from eucalyptus.conf, or default to 512m
  * tools/eucalyptus.conf.5: document the JVM_MEM option

  Cherry-pick upstream commit r1223..1227:
  * node/handlers.c, node/handlers_kvm.c: handle situation where NC's
    do not detach pthreads, LP: #567371
  * node/handlers_kvm.c: fix console bug (was only showing first 64K),
    LP: #566793
  * clc/modules/storage-common/src/main/java/edu/ucsb/eucalyptus/storage/StorageManager.java,
    clc/modules/storage-common/src/main/java/edu/ucsb/eucalyptus/storage/fs/FileSystemStorageManager.java,
    clc/modules/walrus/src/main/java/edu/ucsb/eucalyptus/cloud/ws/WalrusImageManager.java,
    clc/modules/walrus/src/main/java/edu/ucsb/eucalyptus/cloud/ws/WalrusManager.java,
    clc/modules/wsstack/src/main/java/com/eucalyptus/ws/handlers/ServiceSinkHandler.java:
    - fix Walrus OOM errors (java heap), LP: #565101
 -- Chris Cheney <email address hidden> Fri, 04 Jun 2010 00:39:00 -0500

Changed in eucalyptus (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Please upload the lucid fix to maverick as soon as possible (SRU policy). Bumping priority.

Changed in eucalyptus (Ubuntu Maverick):
milestone: lucid-updates → maverick-alpha-2
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2+bzr1230-0ubuntu1

---------------
eucalyptus (1.6.2+bzr1230-0ubuntu1) maverick; urgency=low

  [ Colin Watson ]
  * debian/eucalyptus-cloud.eucalyptus-cloud-publication.upstart: Only start
    after avahi-daemon has started.

  [ Dave Walker (Daviey) ]
  * Merge upstream branch, 1.6.2 (r1230)
  * Switch to dpkg-source 3.0 (quilt) format
    - Extracted the following patches from our bzr branch, into flat patches.
  * debian/build-jars: Replaced asm2 with asm3-all to match new groovy dependency.
  * clc/modules/www/src/main/java/edu/ucsb/eucalyptus/admin/server/EucalyptusWebBackendImpl.java:
    - fix user enumeration and account brute force. Courtesy of Chris Cheney. (LP: #579942)
  * debian/eucalyptus-sc.upstart: Bump maximum number of loop devices for SC to 512. (LP: #586134)
 -- Dave Walker (Daviey) <email address hidden> Mon, 14 Jun 2010 13:48:17 +0100

Changed in eucalyptus (Ubuntu Maverick):
status: Triaged → Fix Released