Activity log for bug #1326870

Date Who What changed Old value New value Message
2014-06-05 16:15:11 Robert C Jennings bug added bug
2014-06-05 16:16:05 Robert C Jennings bug added subscriber Thor Nolen
2014-06-05 16:16:12 Robert C Jennings bug added subscriber Stefan Bader
2014-06-05 16:25:05 Launchpad Janitor linux-ec2 (Ubuntu): status New Confirmed
2014-06-11 08:46:45 Stefan Bader linux-ec2 (Ubuntu): assignee Stefan Bader (smb)
2014-06-11 08:46:55 Stefan Bader nominated for series Ubuntu Lucid
2014-06-11 08:46:55 Stefan Bader bug task added linux-ec2 (Ubuntu Lucid)
2014-06-11 08:47:05 Stefan Bader linux-ec2 (Ubuntu Lucid): assignee Stefan Bader (smb)
2014-06-11 08:47:11 Stefan Bader linux-ec2 (Ubuntu): assignee Stefan Bader (smb)
2014-06-11 08:47:17 Stefan Bader linux-ec2 (Ubuntu Lucid): importance Undecided Medium
2014-06-11 09:37:35 Stefan Bader linux-ec2 (Ubuntu Lucid): status New Confirmed
2014-06-11 09:37:56 Stefan Bader linux-ec2 (Ubuntu): status Confirmed Fix Released
2014-06-11 11:43:08 Stefan Bader description
Old value:
[Impact]
 * If a user detaches a volume before unmounting it, a race is hit (the kernel gets stuck detaching the volume) and new volumes are not recognized.
 * Stefan Bader suggested the following patch set to resolve the issue:
   * 0e34582699392d67910bd3919bc8fd9bedce115e blkfront: fixes for 'xm block-detach ... --force'
   * 5d7ed20e822ef82117a4d9928b030fa0247b789d blkfront: don't access freed struct xenbus_device
   * a66b5aebb7dc9e695dcb4b528906fd398b63f3d9 blkfront: Clean up vbd release
   * b70f5fa043b318659c936d8c3c696250e6528944 blkfront: Lock blkfront_info when closing
[Test Case]
This was originally seen with AMI ami-bffa6fd6[0] doing the following:
1. Launch an instance.
2. Attach a new volume to the instance using the API.
3. Mount the volume on the instance.
4. Detach the volume using the API.
5. Wait a few seconds (30 seconds? 60 seconds?).
6. Unmount the volume on the instance.
7. Wait for the volume to become available.
8. Delete the volume once it is available and go to step 2.
With about 135 iterations of these steps, the problem can be reproduced.
[0] That AMI is "lucid server release 20130124 instance-store amd64 us-east-1 ami-bffa6fd6 aki-88aa75e1 paravirtual" bffa6fd6
In the Ubuntu instance with a self-compiled 2.6.32 kernel with these patches applied, the kernel behaves as expected even with the user error.
[Regression Potential]
 * tbd, looking to see if Stefan Bader can help here.
[Other Info]
The root causes of the problem are:
 1. User error: the user first does a force detach, then unmounts.
    Correct usage is: first unmount, then detach.
 2. This 2.6.32 kernel has a race bug in the blkfront driver. When the race is hit, the instance kernel is stuck in the detaching code and hence does not recognize the newly attached volume.
$ lsb_release -rd
Description: Ubuntu 10.04.4 LTS
Release: 10.04
$ apt-cache policy linux-ec2
linux-ec2:
  Installed: 2.6.32.350.31
  Candidate: 2.6.32.364.45
  Version table:
     2.6.32.364.45 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        500 http://security.ubuntu.com/ubuntu/ lucid-security/main Packages
 *** 2.6.32.350.31 0
        100 /var/lib/dpkg/status
     2.6.32.305.6 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ lucid/main Packages
New value:
[Impact]
 * If a user detaches a volume before unmounting it, a race is hit (the kernel gets stuck detaching the volume) and new volumes are not recognized.
 * Stefan Bader suggested the following patch set to resolve the issue:
   * 0e34582699392d67910bd3919bc8fd9bedce115e blkfront: fixes for 'xm block-detach ... --force'
   * 5d7ed20e822ef82117a4d9928b030fa0247b789d blkfront: don't access freed struct xenbus_device
   * a66b5aebb7dc9e695dcb4b528906fd398b63f3d9 blkfront: Clean up vbd release
   * b70f5fa043b318659c936d8c3c696250e6528944 blkfront: Lock blkfront_info when closing
[Test Case]
This was originally seen with AMI ami-bffa6fd6[0] doing the following:
1. Launch an instance.
2. Attach a new volume to the instance using the API.
3. Mount the volume on the instance.
4. Detach the volume using the API.
5. Wait a few seconds (30 seconds? 60 seconds?).
6. Unmount the volume on the instance.
7. Wait for the volume to become available.
8. Delete the volume once it is available and go to step 2.
With about 135 iterations of these steps, the problem can be reproduced.
[0] That AMI is "lucid server release 20130124 instance-store amd64 us-east-1 ami-bffa6fd6 aki-88aa75e1 paravirtual" bffa6fd6
In the Ubuntu instance with a self-compiled 2.6.32 kernel with these patches applied, the kernel behaves as expected even with the user error.
[Regression Potential]
 * Since EC2 images in Lucid are based on a separate branch, we can rule out regressions on the generic/server images.
 * Code changes are limited to the xen-blkfront driver and to hot-adding/removing disk images. Only Xen guests using PV disks can be affected.
 * So there is potential for introducing new bugs into that process, but they should be detected during verification testing.
 * I would consider the risk of regressions as low.
[Other Info]
The root causes of the problem are:
 1. User error: the user first does a force detach, then unmounts.
    Correct usage is: first unmount, then detach.
 2. This 2.6.32 kernel has a race bug in the blkfront driver. When the race is hit, the instance kernel is stuck in the detaching code and hence does not recognize the newly attached volume.
$ lsb_release -rd
Description: Ubuntu 10.04.4 LTS
Release: 10.04
$ apt-cache policy linux-ec2
linux-ec2:
  Installed: 2.6.32.350.31
  Candidate: 2.6.32.364.45
  Version table:
     2.6.32.364.45 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        500 http://security.ubuntu.com/ubuntu/ lucid-security/main Packages
 *** 2.6.32.350.31 0
        100 /var/lib/dpkg/status
     2.6.32.305.6 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ lucid/main Packages
2014-06-11 12:57:51 Stefan Bader linux-ec2 (Ubuntu Lucid): status Confirmed Fix Committed
2014-06-19 16:38:49 Launchpad Janitor linux-ec2 (Ubuntu Lucid): status Fix Committed Fix Released
2014-06-19 16:38:49 Launchpad Janitor cve linked 2014-3145
2014-06-19 16:38:49 Launchpad Janitor cve linked 2014-3153
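The [Test Case] steps in the bug description can be sketched as a bash loop. This is a hypothetical sketch, not the reporter's actual script. It assumes the AWS CLI is installed and configured and that the script runs as root on the affected instance. The instance ID, availability zone, device names, 1 GiB volume size, mount point and the 120-second "device never appeared" timeout are all placeholders. It defaults to DRY_RUN=1, which only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction loop for the [Test Case] of bug #1326870.
# Assumes: AWS CLI installed and configured, run as root on the affected
# instance. Every ID, name, size and timeout below is a placeholder.
set -euo pipefail

INSTANCE_ID="${INSTANCE_ID:-i-00000000}"    # placeholder instance ID
AZ="${AZ:-us-east-1a}"                      # placeholder availability zone
API_DEVICE="${API_DEVICE:-/dev/sdf}"        # device name given to attach-volume
GUEST_DEVICE="${GUEST_DEVICE:-$API_DEVICE}" # name seen by the guest (may differ, e.g. /dev/xvdf)
MOUNTPOINT="${MOUNTPOINT:-/mnt/repro}"
ITERATIONS="${ITERATIONS:-150}"             # report: reproduces after ~135 loops
DETACH_WAIT="${DETACH_WAIT:-30}"            # report: "30 seconds? 60 seconds?"
DEVICE_TIMEOUT="${DEVICE_TIMEOUT:-120}"     # assumed limit for a new disk to appear
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print commands (safe default)

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Block until the guest sees the new disk; give up if the kernel never
# notices it, which is the symptom described under [Impact].
wait_for_guest_device() {
  local waited=0
  until [[ -b "$GUEST_DEVICE" ]]; do
    if (( waited >= DEVICE_TIMEOUT )); then
      echo "$GUEST_DEVICE did not appear: the blkfront race may have been hit" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

one_iteration() {
  local vol
  # Step 2: create a fresh volume and attach it through the API.
  if [[ "$DRY_RUN" == 1 ]]; then
    vol="vol-dryrun"
    echo "+ aws ec2 create-volume --size 1 --availability-zone $AZ"
  else
    vol="$(aws ec2 create-volume --size 1 --availability-zone "$AZ" \
      --query VolumeId --output text)"
    aws ec2 wait volume-available --volume-ids "$vol"
  fi
  run aws ec2 attach-volume --volume-id "$vol" \
    --instance-id "$INSTANCE_ID" --device "$API_DEVICE"
  run aws ec2 wait volume-in-use --volume-ids "$vol"

  # Step 3: mount it in the guest (a blank volume needs a filesystem first).
  if [[ "$DRY_RUN" != 1 ]]; then
    wait_for_guest_device
  fi
  run mkfs.ext4 -q "$GUEST_DEVICE"
  run mkdir -p "$MOUNTPOINT"
  run mount "$GUEST_DEVICE" "$MOUNTPOINT"

  # Steps 4-6: the wrong order that exposes the race: force detach, wait, unmount.
  run aws ec2 detach-volume --volume-id "$vol" --force
  run sleep "$DETACH_WAIT"
  run umount "$MOUNTPOINT" || echo "umount failed (device already gone?)" >&2

  # Steps 7-8: wait until the volume is available again, then delete it.
  run aws ec2 wait volume-available --volume-ids "$vol"
  run aws ec2 delete-volume --volume-id "$vol"
}

main() {
  local i
  for ((i = 1; i <= ITERATIONS; i++)); do
    echo "iteration $i"
    one_iteration
  done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

For example, saving it under a hypothetical name such as repro.sh and running `DRY_RUN=1 ITERATIONS=2 ./repro.sh` prints the commands for two iterations. `DRY_RUN=0` executes them. On an unpatched kernel the loop is expected to stop eventually in `wait_for_guest_device`. The correct usage from [Other Info] would instead run `umount` before `detach-volume` and drop `--force`.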