Laggy & Finally Completely Frozen 12.04.2 on Large cp or rsync

Bug #1217229 reported by DiagonalArg
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

I have two pairs of RAIDed disks, and I am trying to copy 1/2 T of data from one pair to the other. First I tried "cp /from /to". It seemed to behave OK, so I left it overnight. When I came back in the morning, the machine was completely frozen and I had to use the power switch. Only about 150GB had been copied. I tried again with "nice -n 10 cp ..." and the same thing happened.

Then I discovered ionice and switched to rsync, using "ionice -n 7 rsync -cva /from /to". I also turned off the sleep/lock for the display. Again it seemed to behave OK, so I left it while I went for dinner. Two to three hours later, I found that all windows were frozen. I couldn't use REISUB, the reset switch _or_ the power switch; I had to power off using the switch on the power supply. After rebooting, I found that about 300G had been copied by this point.

I am trying for the 4th time now, again using rsync, with iotop running. The machine regularly freezes between rsync's copying one file and the next. If I try to switch to a window or type at the terminal, there is a freeze until the file being copied by rsync is finished. If I keep typing at the terminal (or in the text box in Firefox, as I am now), then things become smooth until I stop for a bit. Then the freezing begins again. (That's not completely true: tab completion at the terminal is a disaster, and cut & paste in this window is the same.)

It may be my imagination, but it seems that if I keep using the machine, getting it to respond to keyboard/mouse input, then it doesn't permanently freeze up. It's when I leave it to its own devices for some time that I have a problem.

The only difference between the last try and the first 3, is that after having a look at dmesg, I enabled IOMMU in the BIOS. Probably irrelevant, but I did also notice the "spurious 8259A interrupt: IRQ7" message in dmesg that has been reported in bug #75626.

I haven't changed anything with the I/O scheduler, which has been reported as causing I/O problems in bugs #427210 and #131094. So, it still reads: "noop [deadline] cfq"
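
For anyone comparing setups, here is a minimal sketch of how the active scheduler could be read out of a line like the one quoted above. It assumes the usual sysfs layout (/sys/block/<dev>/queue/scheduler) and that the kernel marks the active scheduler with square brackets. The function names active_scheduler and list_schedulers are made up for this illustration.

```shell
#!/bin/sh
# Print the active I/O scheduler from a line such as "noop [deadline] cfq".
# Assumption: the active scheduler is the one enclosed in square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# List the active scheduler for every block device that exposes one.
# Assumption: standard sysfs layout /sys/block/<dev>/queue/scheduler.
list_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip devices without a readable file
        dev=${f#/sys/block/}             # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Calling list_schedulers with no arguments prints one "device: scheduler" line per block device, which makes it easy to confirm that both RAID pairs use the same scheduler.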

Tyan S3970 with 2 CPU's (8 cores).
Two pairs of WD drives (500G & 2T)
TARO add-on storage card (used by the 2T drives)
Ubuntu 12.04.3 fully updated.
Essentially new install (During this process I only added iotop, atop, htop, acpi, md5deep & emacs)
500G drives are RAID-1; 2T drives are RAID-1 & encrypted.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.5.0-39-generic 3.5.0-39.60~precise1
ProcVersionSignature: Ubuntu 3.5.0-39.60~precise1-generic 3.5.7.17
Uname: Linux 3.5.0-39-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.25.
ApportVersion: 2.0.1-0ubuntu17.4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: dev 2178 F.... pulseaudio
 /dev/snd/controlC1: dev 2178 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xfe77c000 irq 17'
   Mixer name : 'Nvidia GPU 0b HDMI/DP'
   Components : 'HDA:10de000b,10de0101,00100200'
   Controls : 24
   Simple ctrls : 4
Card1.Amixer.info:
 Card hw:1 'Audigy2'/'SB Audigy 2 ZS [SB0353] (rev.4, serial:0x10031102) at 0xd880, irq 29'
   Mixer name : 'SigmaTel STAC9721,23'
   Components : 'AC97a:83847609'
   Controls : 210
   Simple ctrls : 46
Date: Tue Aug 27 19:30:21 2013
HibernationDevice: RESUME=UUID=f10a5427-b0f6-4042-b58b-7241771b6a9e
InstallationMedia: Ubuntu 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: empty empty
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-39-generic root=/dev/mapper/md2_crypt ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-39-generic N/A
 linux-backports-modules-3.5.0-39-generic N/A
 linux-firmware 1.79.6
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/20/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 'V2.05 '
dmi.board.asset.tag: empty
dmi.board.name: S3970
dmi.board.vendor: TYAN Computer Corporation
dmi.board.version: empty
dmi.chassis.asset.tag: empty
dmi.chassis.type: 3
dmi.chassis.vendor: empty
dmi.chassis.version: empty
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr'V2.05':bd11/20/2008:svnempty:pnempty:pvrempty:rvnTYANComputerCorporation:rnS3970:rvrempty:cvnempty:ct3:cvrempty:
dmi.product.name: empty
dmi.product.version: empty
dmi.sys.vendor: empty
---
ApportVersion: 2.12.1-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 3117 F.... pulseaudio
                      ubuntu 5017 F.... pulseaudio
 /dev/snd/controlC1: ubuntu 5017 F.... pulseaudio
CasperVersion: 1.336
DistroRelease: Ubuntu 13.10
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
LiveMediaBuild: Ubuntu 13.10 "Saucy Salamander" - Alpha amd64 (20130827)
MachineType: empty empty
MarkForUpload: True
Package: linux (not installed)
ProcFB: 0 nouveaufb
ProcKernelCmdLine: file=/cdrom/preseed/ubuntu.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcVersionSignature: Ubuntu 3.11.0-4.9-generic 3.11.0-rc7
RelatedPackageVersions:
 linux-restricted-modules-3.11.0-4-generic N/A
 linux-backports-modules-3.11.0-4-generic N/A
 linux-firmware 1.113
RfKill:

Tags: saucy
Uname: Linux 3.11.0-4-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
dmi.bios.date: 11/20/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 'V2.05 '
dmi.board.asset.tag: empty
dmi.board.name: S3970
dmi.board.vendor: TYAN Computer Corporation
dmi.board.version: empty
dmi.chassis.asset.tag: empty
dmi.chassis.type: 3
dmi.chassis.vendor: empty
dmi.chassis.version: empty
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr'V2.05':bd11/20/2008:svnempty:pnempty:pvrempty:rvnTYANComputerCorporation:rnS3970:rvrempty:cvnempty:ct3:cvrempty:
dmi.product.name: empty
dmi.product.version: empty
dmi.sys.vendor: empty

Revision history for this message
DiagonalArg (diagonalarg) wrote :

Ok, on this 4th go-round, while using firefox to look at this bug report, it killed all processes & logged me out. I was able to log back in, and found that I have only succeeded in copying 375G.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch)
tags: added: needs-crash-log regression-potential
removed: freezing heavy io
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
DiagonalArg (diagonalarg) wrote :

During the copy of a large file, iotop starts producing:

TID PRIO USER ... IO> COMMAND
446 be/3 root ... 99.99 % [jbd2/dm-0-8]

and

337 be/4 root ... 99.99% [md2_raid1]

dm-0 is the encryption of md2, which is the disk I'm copying to and on which the system is located.
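
To capture this without watching the screen, a rough sketch of a filter that picks out threads sitting near 100% in the IO column is given below. It assumes iotop-style text on stdin where the IO percentage is the last percentage before the command name, written either as "99.99 %" or "99.99%". The function name io_bound_threads and the default threshold of 90 are assumptions for illustration only.

```shell
#!/bin/sh
# Read iotop-style lines on stdin and print the command name of every thread
# whose IO percentage is at or above a threshold (default: 90).
# Assumption: the last percentage on the line is the IO column and the field
# right after it is the command name.
io_bound_threads() {
    awk -v limit="${1:-90}" '
        {
            io = -1; cmd = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%" && i > 1) {            # form "99.99 %"
                    io = $(i - 1) + 0; cmd = i + 1
                } else if ($i ~ /^[0-9.]+%$/) {      # form "99.99%"
                    v = $i; sub(/%$/, "", v)
                    io = v + 0; cmd = i + 1
                }
            }
            if (io >= limit && cmd > 0 && cmd <= NF) print $cmd
        }'
}
```

It could be fed from batch mode, for example "sudo iotop -b -o -n 1 | io_bound_threads 90", and the output appended to a log file so that the last entries survive a hard freeze.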

Also, while a large file is being copied and things freeze up - even if for 60-90 seconds - the screen starts to fade to half dark. During that time I can select windows but I can't type.

(I'll have to confirm that it's error-free, but I think I've got this 1/2 T. Unfortunately, I've got another 1/2 T and I'm none too confident about the stability of this system.)

Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
DiagonalArg (diagonalarg) wrote :

I am still working on this, but may also be struggling with another bug. This new bug seems to have been noticed in other contexts, so I have reported it here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1011914/comments/21

In basic outline, I can boot my RAID/LUKS pair of disks, but when I also attach the RAID pair of data disks, I get dropped into busybox. A Control-D produces stuff that looks like:

BUG: soft lockup - CPU#7 stuck for 23s! [kworker/7:0:469]
BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:0:469]
INFO: rcu_sched self-detected stall on CPU { 7} (t=15000 jiffies)
INFO: rcu_sched detected stalls on CPUs/tasks; { 7} (detected by 2, t=15004 jiffies)
INFO: Stall ended before state dump start

More info at the above link. I will also get your information from a recent nightly, but right now I'm struggling with 3 failing machines & 4 bugs.

I'll mention that, while working on this, I have upgraded the BIOS on the TARO SATA controller, so that it and the system BIOS are both at the most recent versions.
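
For tracking how often the lockup messages quoted earlier in this comment recur between boots, a small counting sketch follows. It only assumes that the kernel log text contains the phrases "BUG: soft lockup" and "rcu_sched ... detected stall" as they appear in those messages. The function name count_stalls is hypothetical.

```shell
#!/bin/sh
# Count kernel log lines that indicate a soft lockup or an RCU stall.
# Reads log text on stdin and prints "soft_lockups=N rcu_stalls=M".
# Assumption: the message wording matches the lines quoted in this report.
count_stalls() {
    awk '
        /BUG: soft lockup/                  { soft++ }
        /rcu_sched (self-)?detected stall/  { rcu++ }
        END { printf "soft_lockups=%d rcu_stalls=%d\n", soft, rcu }'
}
```

Typical use would be "dmesg | count_stalls", or pointing it at a saved log with "count_stalls < /var/log/kern.log".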

Revision history for this message
DiagonalArg (diagonalarg) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected saucy
description: updated
Revision history for this message
DiagonalArg (diagonalarg) wrote : BootDmesg.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : CRDA.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : CurrentDmesg.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : Lspci.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : Lsusb.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : ProcEnviron.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : ProcInterrupts.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : ProcModules.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : PulseList.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : UdevDb.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : UdevLog.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote : WifiSyslog.txt

apport information

Revision history for this message
DiagonalArg (diagonalarg) wrote :

I was trying to copy my data again, to see if it would freeze while copying, but I was finding the saucy nightly so buggy that after a bit of work, all the windows would freeze (though the mouse would continue to respond).

I'll look into what I can do with the mainline kernel, though I clearly won't be able to introduce it into saucy.

penalvch (penalvch)
tags: added: latest-bios-unknown
penalvch (penalvch)
tags: added: needs-upstream-testing
Revision history for this message
DiagonalArg (diagonalarg) wrote :

Update:

With the newest BIOS installed in this Adaptec "TARO" SATA card, 12.04 is improved to the point where "ionice -n 7 rsync <from> <to>" behaves properly. On the other hand, "diff -qr <from> <to>" (no ionice) crashes _hard_. (I had to turn off the machine at the power supply, and couldn't get it to POST on reboot without removing all drives and then reattaching them.)

I have also run memtest86+ for 24 hours with no sign of bad RAM.

This machine does not have access to the internet right now. Until it does, I will not be able to add more.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
penalvch (penalvch) wrote :

DiagonalArg, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
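
The tag naming described above can be sketched as a tiny helper, so the exact strings are not mistyped. This is only an illustration: the function name upstream_tags and the "fixed" keyword are made up here, and the tags still have to be added by hand in the Launchpad web interface.

```shell
#!/bin/sh
# Print the two tags to add after testing a mainline kernel.
# $1: kernel version as tested, e.g. v3.12
# $2: "fixed" if the bug is gone upstream; anything else means it still exists
upstream_tags() {
    version=$1
    case $2 in
        fixed) prefix=kernel-fixed-upstream ;;
        *)     prefix=kernel-bug-exists-upstream ;;
    esac
    # First the generic tag, then the version-specific one.
    printf '%s\n%s-%s\n' "$prefix" "$prefix" "$version"
}
```

For example, "upstream_tags v3.12 fixed" prints the generic tag followed by kernel-fixed-upstream-v3.12. In both cases the needs-upstream-testing tag is to be removed afterwards.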

Changed in linux (Ubuntu):
status: Expired → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired