lvm snapshot on top of md raid 1 causes nearly 100% cpu usage

Bug #113713 reported by Martin Pool on 2007-05-10

Bug Description

I have a machine with two sata disks, with partitions formed into a single raid1 device:

limpid# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda5[0] sdc5[1]
      488191104 blocks [2/2] [UU]

unused devices: <none>

This is a physical volume for an lvm group:
limpid# pvs
  PV VG Fmt Attr PSize PFree
  /dev/md0 vg_main lvm2 a- 465.57G 160.81G
  /dev/sdb5 vg2 lvm2 a- 232.41G 232.41G
limpid# vgs
  VG #PV #LV #SN Attr VSize VFree
  vg2 1 0 0 wz--n- 232.41G 232.41G
  vg_main 1 4 0 wz--n- 465.57G 160.81G

I created snapshot volumes as backups of the root and home partitions:

limpid# lvs
  LV VG Attr LSize Origin Snap% Move Log Copy%
  feisty_20070509 vg_main swi-a- 5.00G feisty_root 2.08
  feisty_amd64 vg_main -wi-a- 10.00G
  feisty_root vg_main owi-ao 35.00G
  home vg_main owi-ao 250.00G
  home_20070509 vg_main swi-a- 10.00G home 65.43
  swap vg_main -wi-ao 9.77G

When these snapshots are active, nearly 100% of CPU time is spent in the kernel. It's not obvious what is using that time: top and htop don't show any process spending much time in D state, nor any kernel thread using a lot of CPU. This happens even when the machine is nearly quiescent, although doing disk IO makes it worse.

I shut down and rebooted and the problem recurred.

Removing one snapshot means that only one of the two cores is 100% busy. Removing both snapshots gets things back to normal. I can create a new snapshot and the cpu stays idle.
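The remove-and-recreate cycle described above can be sketched with the standard LVM tools (volume group and origin names are taken from the lvs listing above; the new snapshot name and size are illustrative):

```shell
# Remove the two existing snapshots (names from the lvs output above)
lvremove -f /dev/vg_main/feisty_20070509
lvremove -f /dev/vg_main/home_20070509

# Recreate a snapshot of the root volume; the size only needs to cover
# the changes expected on the origin while the snapshot exists
lvcreate --snapshot --size 5G --name feisty_20070510 /dev/vg_main/feisty_root
```

These commands require root and a live LVM setup, so they are a sketch of the reported steps rather than something to run verbatim.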

Martin Pool (mbp) wrote :

OK, I rebooted with only the snapshot of /home present and there's no apparent problem.

I should have mentioned before that when both cores were at 100% the machine was nearly unusable - several seconds delay to run a command like 'lvs'.

Martin Pool (mbp) wrote :

It looks like things are OK as long as the machine is fairly idle, but once I do some substantial IO it gets into this state and can't get out, even long after the work has finished.

Martin Pool (mbp) wrote :

After some time (10s of seconds) it does go back to idleness, but it still seems unreasonably long and disruptive.

Martin Pool (mbp) wrote :

A similar setup on a different machine with the same kernel but with the pv on a physical partition rather than an md array doesn't show this problem.

Juerd (ubuntu-juerd) wrote :

Just leaving a note that I'm currently experiencing exactly the same on a non-Ubuntu Debian sid system with a 2.6.18-xen kernel.

md3 : active raid1 sda6[0] sdc6[2] sdb6[1]
      151091200 blocks [3/3] [UUU]
  PV VG Fmt Attr PSize PFree
  /dev/md3 main lvm2 a- 144.09G 35.09G
  VG #PV #LV #SN Attr VSize VFree
  main 1 10 0 wz--n- 144.09G 35.09G

Happens with any snapshot, as soon as I start using it. Does not happen directly after snapshot creation. Snapshot read speed is around 2 MB/s.
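A throughput figure like the 2 MB/s above can be reproduced with a simple sequential read from the snapshot device (the device path is hypothetical; O_DIRECT keeps the page cache from inflating the number):

```shell
# Read 100 MiB from the snapshot device and report throughput.
# /dev/main/snap is a placeholder -- substitute your snapshot LV.
dd if=/dev/main/snap of=/dev/null bs=1M count=100 iflag=direct
```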

wangyb (yibin-wang) wrote :

I got a similar problem with LVM on top of RAID1:
OS: Ubuntu Hardy
kernel: 2.6.24-16-generic
RAID1: md3 on 2 sata partitions - /dev/sda6 /dev/sdb6
LVM: using md3 as PV
It's fine with the original LVM volume, but I/O becomes extremely slow on snapshots: a single 'rm file' operation takes dozens of seconds!
strace shows that it spends a very long time in
unlinkat(AT_FDCWD, "$filename", 0.......
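The per-syscall timing described above can be captured by asking strace to print how long each call takes (the filename is illustrative):

```shell
# -T appends the time spent in each syscall in angle brackets;
# -e trace= limits the output to the unlink family of calls
strace -T -e trace=unlink,unlinkat rm somefile
```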

Kees Cook (kees) wrote :

For the record, my entire system is LVM-on-MD-RAID1. While I sometimes see some level of slow-down, it's never to the degree you're seeing. However, I don't have long-running snapshots. They're always temporary, and they're of unmounted filesystems. Snapshots are, by nature, pretty slow. When you create them, any deltas between the origin and the snapshot need to be stored in the COW file for the snapshot, so operations that make lots of changes to the origin partition (like, say, "find", which touches the atime of ALL the inodes it visits) can send your COW file spinning for a while. There is a lot of IO done to manage a long-term snapshot, which is why they're not really recommended. They're good for snapshotting for backups, and then releasing them, though.

Check your IO levels with "iostat 5" (and ignore the first report, as that's just a running average). That may show the disk activity. You can map dm-* to names via /dev/mapper entries and their minor number. i.e. dm-19 is 254, 19 in /dev/mapper.
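The dm-to-name mapping described above can be read straight from the device minor numbers (output shown is an example layout, not taken from this system):

```shell
# Per-device IO statistics every 5 seconds; ignore the first sample,
# which is an average since boot
iostat 5

# Each /dev/mapper node lists its major,minor pair; dm-N in iostat
# corresponds to the entry whose minor number is N (major 254 here)
ls -l /dev/mapper
# e.g. brw-rw---- 1 root disk 254, 19 ... vg_main-home_20070509   (= dm-19)
```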

Did prior releases of the kernel behave better?

I know there's the potential for snapshots causing a lot of IO, but I
think what was happening here was beyond that. I had noatime on, and
wasn't running even read-only disk access. Also, I'm pretty sure the
disk access light was off while the CPU was at full utilization.

Other IO intensive operations such as updatedb or a raid1 check don't
pin the cpu and don't make the machine unresponsive.

> Did prior releases of the kernel behave better?

I've never had an Ubuntu kernel where this worked well, but I haven't
tried every release, and obviously am not using it at the moment. I'm
willing to retry it sometime.


Martin Pool (mbp) wrote :

fwiw I've tested this again on the same machine with current hardy (2.6.24-19-generic) and am not seeing the problem.
