lvm snapshot on top of md raid 1 causes nearly 100% cpu usage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu | Fix Released | Undecided | Unassigned |
Bug Description
I have a machine with two SATA disks, with a partition from each formed into a single RAID 1 device:
limpid# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda5[0] sdc5[1]
488191104 blocks [2/2] [UU]
unused devices: <none>
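For reference, a two-way mirror like md0 above would typically be built with mdadm. The sketch below only prints the invocation rather than running it (actually creating the array needs root and would destroy data on the member partitions); the device names are taken from the /proc/mdstat output above:

```shell
# Dry run: print the mdadm command that would build a RAID 1 array
# like md0 above from sda5 and sdc5 (names taken from /proc/mdstat).
make_mirror() {
  # $1 = md device, remaining args = member partitions
  md=$1; shift
  echo mdadm --create "$md" --level=1 --raid-devices=$# "$@"
}
make_mirror /dev/md0 /dev/sda5 /dev/sdc5
```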
This is a physical volume in an LVM volume group:
limpid# pvs
PV VG Fmt Attr PSize PFree
/dev/md0 vg_main lvm2 a- 465.57G 160.81G
/dev/sdb5 vg2 lvm2 a- 232.41G 232.41G
limpid# vgs
VG #PV #LV #SN Attr VSize VFree
vg2 1 0 0 wz--n- 232.41G 232.41G
vg_main 1 4 0 wz--n- 465.57G 160.81G
I created snapshot volumes as backups of the root and home partitions:
limpid# lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
feisty_20070509 vg_main swi-a- 5.00G feisty_root 2.08
feisty_amd64 vg_main -wi-a- 10.00G
feisty_root vg_main owi-ao 35.00G
home vg_main owi-ao 250.00G
home_20070509 vg_main swi-a- 10.00G home 65.43
swap vg_main -wi-ao 9.77G
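Snapshots with the sizes and origins shown above would presumably have been made with lvcreate. As a sketch, the function below only prints the commands instead of executing them (running them requires root and the actual vg_main volume group):

```shell
# Dry run: print the lvcreate invocations for the two snapshots
# listed by lvs above (sizes and origin LVs taken from that output).
snap() {
  # $1 = snapshot size, $2 = snapshot name, $3 = origin LV in vg_main
  echo lvcreate --snapshot --size "$1" --name "$2" "/dev/vg_main/$3"
}
snap 5G  feisty_20070509 feisty_root
snap 10G home_20070509   home
```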
When these snapshots are active, nearly 100% of CPU time is spent in the kernel. It's not obvious what is using that time: top and htop don't show any process spending much time in the D state, nor any kernel thread using much CPU. This happens even when the machine is nearly quiescent, although disk I/O makes it worse.
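One rough way to confirm how much time is going to the kernel, even when no process shows it, is to sample /proc/stat twice and compare the system-tick delta to the total. This sketch assumes Linux's aggregate `cpu` line, whose first fields are user, nice, system, and idle ticks:

```shell
# Sample the aggregate "cpu" line of /proc/stat one second apart and
# report what fraction of the elapsed ticks were system (kernel) time.
read -r _ u1 n1 s1 i1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat
total=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) + (i2 - i1) ))
sys=$(( s2 - s1 ))
echo "kernel (system) share: $(( 100 * sys / total ))%"
```

On the machine described here, that percentage should sit near 100 while both snapshots are active.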
I shut down and rebooted and the problem recurred.
Removing one snapshot means that only one of the two cores is 100% busy. Removing both snapshots gets things back to normal. I can create a new snapshot and the CPU stays idle.
OK, I rebooted with only the snapshot of /home present and there's no apparent problem.
I should have mentioned before that when both cores were at 100%, the machine was nearly unusable: several seconds' delay to run a command like 'lvs'.