natty beta-1, beta-2 hangs during copy from usb disk and from ftp

Bug #758384 reported by Chris Hermansen
This bug affects 2 people
Affects                Status    Importance   Assigned to   Milestone
libatasmart (Ubuntu)   Invalid   Undecided    Unassigned
linux (Ubuntu)         Invalid   Undecided    Unassigned

Bug Description

Natty beta-2, updated as of today, 21:00 Pacific Daylight Time
Toshiba Satellite A-70
Dell Inspiron 1501

When copying my Music directory from a USB hard drive to the internal disk, the system hangs after about 1-3 GB have been copied. Specifically, the screen is frozen, the mouse won't move, Ctrl-Alt-F1 / F2 etc. won't open a console, a brief push on the power switch doesn't bring up the shutdown dialog, etc.

The same copy works fine on a desktop machine running 10.10.

Nothing obvious in syslog after hard reset / boot. Here is the most recent syslog on the Toshiba, from just before to just after the hard reset / boot.

Apr 11 19:40:46 madrid avahi-daemon[702]: Registering new address record for fe80::211:f5ff:fe5f:f404 on wlan0.*.
Apr 11 19:40:55 madrid kernel: [ 96.528028] wlan0: no IPv6 routers present
Apr 11 19:40:59 madrid ntpdate[1571]: adjust time server 91.189.94.4 offset -0.103614 sec
Apr 11 20:17:01 madrid CRON[1752]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 11 20:55:03 madrid kernel: [ 4544.336058] usb 1-3: new high speed USB device using ehci_hcd and address 2
Apr 11 20:55:03 madrid kernel: [ 4544.571022] Initializing USB Mass Storage driver...
Apr 11 20:55:03 madrid kernel: [ 4544.571423] scsi2 : usb-storage 1-3:1.0
Apr 11 20:55:03 madrid kernel: [ 4544.573146] usbcore: registered new interface driver usb-storage
Apr 11 20:55:03 madrid kernel: [ 4544.573158] USB Mass Storage support registered.
Apr 11 20:55:04 madrid kernel: [ 4545.573094] scsi 2:0:0:0: Direct-Access SAMSUNG HM120JC YL10 PQ: 0 ANSI: 0 CCS
Apr 11 20:55:04 madrid kernel: [ 4545.575169] sd 2:0:0:0: Attached scsi generic sg2 type 0
Apr 11 20:55:04 madrid kernel: [ 4545.577326] sd 2:0:0:0: [sdb] 234441648 512-byte logical blocks: (120 GB/111 GiB)
Apr 11 20:55:04 madrid kernel: [ 4545.577937] sd 2:0:0:0: [sdb] Write Protect is off
Apr 11 20:55:04 madrid kernel: [ 4545.577950] sd 2:0:0:0: [sdb] Mode Sense: 00 14 00 00
Apr 11 20:55:04 madrid kernel: [ 4545.578916] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 11 20:55:04 madrid kernel: [ 4545.633693] sdb: sdb1 sdb2 < sdb5 >
Apr 11 20:55:04 madrid kernel: [ 4545.636815] sd 2:0:0:0: [sdb] Attached SCSI disk
Apr 11 20:55:05 madrid kernel: [ 4546.259875] EXT4-fs (sdb1): warning: maximal mount count reached, running e2fsck is recommended
Apr 11 20:55:05 madrid kernel: [ 4546.261456] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
Apr 11 21:08:12 madrid kernel: imklog 4.6.4, log source = /proc/kmsg started.
Apr 11 21:08:12 madrid rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="460" x-info="http://www.rsyslog.com"] (re)start
Apr 11 21:08:12 madrid rsyslogd: rsyslogd's groupid changed to 103
Apr 11 21:08:12 madrid rsyslogd: rsyslogd's userid changed to 101
Apr 11 21:08:12 madrid rsyslogd-2039: Could no open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Apr 11 21:08:12 madrid kernel: [ 0.000000] Initializing cgroup subsys cpuset
Apr 11 21:08:12 madrid kernel: [ 0.000000] Initializing cgroup subsys cpu

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I see a few other bugs talking about system hangs during heavy I/O involving USB drives (e.g. 755066) but none seems specifically related to this.

These two machines are set up for testing right now so I am free to try different things.

The source hard drive is the former system disk with Ubuntu 10.10 on it for the Toshiba, in an external USB enclosure.

I should mention that I've tried several cables.

I should also mention that the external USB enclosure has a green LED indicating drive ready and a red LED indicating I/O. When the system hangs, the red LED goes out but the green remains on.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Sorry one other thing!

The note in syslog re running e2fsck on the external drive - I'm not too anxious to try this in case things hang up in the middle of the fsck...

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I have now tried copying the Music directory by FTP. Eventually I start having I/O errors which are logged in syslog.

Here is the log info. All is well after boot, up to the point where the apt daemon shuts down. Then, after the FTP has been running for a while, there is an "ata2: lost interrupt (Status 0x58)". The system never fully recovers, but eventually I manage to get things shut down.

This is a very serious bug for me - a complete show-stopper for using natty... please help!

Apr 12 18:46:56 madrid NetworkManager[538]: <info> Policy set 'Auto 2WIRE061' (wlan0) as default for IPv4 routing and DNS.
Apr 12 18:46:56 madrid NetworkManager[538]: <info> Activation (wlan0) successful, device activated.
Apr 12 18:46:56 madrid NetworkManager[538]: <info> Activation (wlan0) Stage 5 of 5 (IP Configure Commit) complete.
Apr 12 18:47:05 madrid ntpdate[11278]: adjust time server 91.189.94.4 offset 0.005561 sec
Apr 12 18:51:15 madrid AptDaemon: INFO: Quiting due to inactivity
Apr 12 18:51:15 madrid AptDaemon: INFO: Shutdown was requested
Apr 12 19:10:40 madrid kernel: [ 1948.064036] ata2: lost interrupt (Status 0x58)
Apr 12 19:10:40 madrid kernel: [ 1948.100889] ata2: drained 65536 bytes to clear DRQ.
Apr 12 19:10:40 madrid kernel: [ 1948.120189] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 12 19:10:40 madrid kernel: [ 1948.120201] sr 1:0:0:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
Apr 12 19:10:40 madrid kernel: [ 1948.120229] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Apr 12 19:10:40 madrid kernel: [ 1948.120231] res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 12 19:10:40 madrid kernel: [ 1948.120237] ata2.00: status: { DRDY }
Apr 12 19:10:40 madrid kernel: [ 1948.120266] ata2: soft resetting link
Apr 12 19:10:40 madrid kernel: [ 1948.300413] ata2.00: configured for UDMA/33
Apr 12 19:10:40 madrid kernel: [ 1948.301245] ata2: EH complete

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Looking slightly further in syslog, after a number of repetitions of the above messages, I see the following:

Apr 12 19:45:53 madrid kernel: [ 4080.932059] INFO: task jbd2/sda1-8:203 blocked for more than 120 seconds.
Apr 12 19:45:53 madrid kernel: [ 4080.932072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 19:45:53 madrid kernel: [ 4080.932081] jbd2/sda1-8 D f6c9c000 0 203 2 0x00000000
Apr 12 19:45:53 madrid kernel: [ 4080.932099] f6ca9dc8 00000046 00000000 f6c9c000 f307c740 f6c9c000 f37f742c c183a8c0
Apr 12 19:45:53 madrid kernel: [ 4080.932123] 0dfc8bab 00000385 f37f7428 c183a8c0 c183a8c0 f5e068c0 f37f71a0 f50b0000
Apr 12 19:45:53 madrid kernel: [ 4080.932146] f6ca9d84 00000296 f37f71a0 f6ca9db8 c10b6786 f6df2238 f6ca9dc0 c1078088
Apr 12 19:45:53 madrid kernel: [ 4080.932169] Call Trace:
Apr 12 19:45:53 madrid kernel: [ 4080.932196] [<c10b6786>] ? delayacct_end+0x96/0xb0
Apr 12 19:45:53 madrid kernel: [ 4080.932214] [<c1078088>] ? ktime_get_ts+0xf8/0x120
Apr 12 19:45:53 madrid kernel: [ 4080.932231] [<c1507a4f>] io_schedule+0x5f/0xa0
Apr 12 19:45:53 madrid kernel: [ 4080.932243] [<c10e1d3a>] sync_page+0x3a/0x50
Apr 12 19:45:53 madrid kernel: [ 4080.932255] [<c150824d>] __wait_on_bit+0x4d/0x70
Apr 12 19:45:53 madrid kernel: [ 4080.932266] [<c10e1d00>] ? sync_page+0x0/0x50
Apr 12 19:45:53 madrid kernel: [ 4080.932277] [<c10e1ee5>] wait_on_page_bit+0x85/0x90
Apr 12 19:45:53 madrid kernel: [ 4080.932291] [<c106d3c0>] ? wake_bit_function+0x0/0x60
Apr 12 19:45:53 madrid kernel: [ 4080.932303] [<c10e1fcb>] filemap_fdatawait_range+0xdb/0x160
Apr 12 19:45:53 madrid kernel: [ 4080.932318] [<c125cc9e>] ? submit_bio+0x6e/0x100
Apr 12 19:45:53 madrid kernel: [ 4080.932333] [<c11ea70c>] ? jbd2_journal_file_buffer+0x4c/0x80
Apr 12 19:45:53 madrid kernel: [ 4080.932346] [<c10e2f97>] filemap_fdatawait+0x57/0x70
Apr 12 19:45:53 madrid kernel: [ 4080.932358] [<c11eadc7>] journal_finish_inode_data_buffers+0x57/0x130
Apr 12 19:45:53 madrid kernel: [ 4080.932371] [<c11eb2f2>] jbd2_journal_commit_transaction+0x452/0xf30
Apr 12 19:45:53 madrid kernel: [ 4080.932388] [<c102d8c8>] ? default_spin_lock_flags+0x8/0x10
Apr 12 19:45:53 madrid kernel: [ 4080.932402] [<c105dc07>] ? try_to_del_timer_sync+0x67/0xb0
Apr 12 19:45:53 madrid kernel: [ 4080.932417] [<c11ef8fe>] kjournald2+0x8e/0x1c0
Apr 12 19:45:53 madrid kernel: [ 4080.932429] [<c106d370>] ? autoremove_wake_function+0x0/0x50
Apr 12 19:45:53 madrid kernel: [ 4080.932442] [<c11ef870>] ? kjournald2+0x0/0x1c0
Apr 12 19:45:53 madrid kernel: [ 4080.932454] [<c106ce04>] kthread+0x74/0x80
Apr 12 19:45:53 madrid kernel: [ 4080.932465] [<c106cd90>] ? kthread+0x0/0x80
Apr 12 19:45:53 madrid kernel: [ 4080.932478] [<c100367e>] kernel_thread_helper+0x6/0x10

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Please help!!!

In Beta-2, my copy from USB drive still hangs partway through.

I will re-try with FTP tonight...

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I no longer believe this to be related to the USB or FTP.

I made several experiments last night and all ended in the system hanging up and needing a hard shutdown:

· using FTP to copy files over wireless;
· using FTP to copy files over wired network;
· booting with the noapic kernel parameter and using FTP to copy files over the wired network;
· using cp -rvp and watching the files come across.

In the end, the problem starts to feel like this: once a fair amount of data (more than a GB or so) has been copied over, the system becomes unstable. In fact, the system sometimes hangs AFTER the copy appears to be complete.

One thing that has changed in Beta 2 is that I no longer see messages of any kind in syslog related to this problem.

Anyway, for me this is a complete show-stopper with regard to using Natty, since I have no way of copying my data from Maverick days over - neither a USB drive nor FTP.

Please help!

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

A few more tests, working on my Dell Inspiron 1501 now.

Basically the problem continues. I tried using gftp and the system hung after transferring about 1-2 GB of files.

Thinking it might be a window manager thing, I used alt-F2 to get a console, then tried the transfer with wget. I got a lot of files, probably 5 GB, but eventually wget froze midway through a transfer.

No messages in syslog in either case.

Hard reset only way out.

Remains a show-stopper, on both computers.

summary: - system hangs during copy from usb disk
+ system hangs during copy from usb disk and from ftp
summary: - system hangs during copy from usb disk and from ftp
+ natty beta-1, beta-2 hangs during copy from usb disk and from ftp
Revision history for this message
Chris Hermansen (c-hermansen) wrote :

This seems to be the package involved but I can't be 100% certain because the system hangs....

affects: ubuntu → libatasmart (Ubuntu)
Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I have run GSmartControl on the hard drive in my Toshiba.

It passes the basic health check.

When I check the attributes tab, I see none of the attributes have failed. All show "old age" as their "type" with the exception of three pre-failures: Raw Read Error Rate, Spin-up Time and Reallocated Sector Count.

The error log tab shows 83 errors. The last five are all "interface CRC error, command aborted", except for one which is "[unknown type]".

So today I will get a new hard drive and see if that helps my problem. More later...

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Brand new hard drive in my Toshiba. Installed Natty Beta-2 and all updates to 23 April.

Used wget to copy over my Music directory.

What happens: every few GB, the system hangs for 1-5 minutes; the wget activity pauses in mid-transfer, and the screen and keyboard are unresponsive. The fans are running at a moderate speed. Eventually, the fans slow down and finally wget starts up again. The 40 GB transfer eventually finishes, but the system responds very slowly to keyboard and mouse input. Finally I get the option to reboot from the upper right menu. I click on it. The next morning, the reboot hasn't happened yet, though it looks as though the system is almost down - blank screen, unresponsive keyboard, but still a power light.

I do a hard reset and "everything seems OK".

In the log, lots of ata soft reset / lost interrupt messages e.g.

Apr 24 06:42:16 temuko kernel: [54591.008051] ata2: lost interrupt (Status 0x58)
Apr 24 06:42:16 temuko kernel: [54591.012007] ata2: drained 65536 bytes to clear DRQ.
Apr 24 06:42:16 temuko kernel: [54591.088075] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 24 06:42:16 temuko kernel: [54591.098936] sr 1:0:0:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
Apr 24 06:42:16 temuko kernel: [54591.098983] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Apr 24 06:42:16 temuko kernel: [54591.098987] res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 24 06:42:16 temuko kernel: [54591.120572] ata2.00: status: { DRDY }
Apr 24 06:42:16 temuko kernel: [54591.131328] ata2: soft resetting link
Apr 24 06:42:16 temuko kernel: [54591.308475] ata2.00: configured for PIO0
Apr 24 06:42:16 temuko kernel: [54591.308878] ata2: EH complete
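
To get a rough idea of how often these ATA errors recur, the log can be summarised with a few grep calls. The following is only a sketch: the function name count_ata_events is made up for illustration, the match strings are taken from the messages quoted above, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# count_ata_events FILE
# Summarise libata error-handling messages found in a syslog-style file.
# (Illustrative helper; the name is not from the bug report.)
count_ata_events() {
    log="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    printf 'lost interrupt: %s\n' "$(grep -c 'lost interrupt' "$log")"
    printf 'soft resets: %s\n' "$(grep -c 'soft resetting link' "$log")"
    # Print the transfer mode reported after the most recent reset, if any.
    grep -o 'configured for [A-Z0-9/]*' "$log" | tail -n 1
}
```

On Ubuntu it would typically be called as count_ata_events /var/log/syslog. The last line of output is worth comparing between runs: the earlier excerpt in this report ends with "configured for UDMA/33", while the one above ends with "configured for PIO0".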

I might try re-installing 10.10 on one of my spare drives to see if I get the same behaviour.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Just updated my Dell Inspiron 1501 to latest updates on beta-2 and tried to ftp my Music data over.

It hung after about 1-2 GB of transfer. Nothing in the logs.

Any suggestions as to what to do next would be very much appreciated... this makes natty a total no-go for me, since I cannot restore my home directories...

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I have discovered through trial and error that if I set up my copy so that, after every artist directory, I run a few "sync" commands followed by a "sleep 10", the copy succeeds, i.e. all files are copied without any errors.

This holds true on both the Toshiba Satellite A70's IDE drive (brand new 120 GB Western Digital) and the Dell Inspiron 1501's ATA drive (5-year-old 100 GB Toshiba).
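
The sync-and-sleep workaround described above could be scripted roughly as follows. This is a minimal sketch, not the exact commands used: the function name paced_copy and the PAUSE_SECS variable are invented for illustration, and it assumes each artist is a top-level subdirectory of the source (files sitting directly in the source directory are not copied).

```shell
#!/bin/sh
# Paced copy: copy one top-level directory at a time, flush pending
# writes to disk, then pause before starting the next directory.
# paced_copy and PAUSE_SECS are illustrative names, not from the report.

PAUSE_SECS="${PAUSE_SECS:-10}"

paced_copy() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    for dir in "$src"/*/; do
        # If there are no subdirectories the pattern stays literal; skip it.
        [ -d "$dir" ] || continue
        # Drop the trailing slash so cp creates "$dst/<name>" on any cp.
        dir="${dir%/}"
        cp -rp "$dir" "$dst"/ || return 1
        # "A few sync commands", then a rest period.
        sync
        sync
        sleep "$PAUSE_SECS"
    done
}
```

For example, paced_copy /media/usbdisk/Music "$HOME/Music" would copy each artist directory in turn (both paths are placeholders). A longer pause can be tried by setting PAUSE_SECS before the call.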

I am now officially at my wit's end. I have two laptops with the music files on them, but I am afraid to upgrade / install any other machines to Natty.

Please, if anyone is actually reading this, any ideas?

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

One more piece of data.

On the Toshiba Satellite A70, I re-installed my old 100MB testing drive (with which, under natty, my machine would hang while copying my Music directory either by ftp or from the usb hard drive) and installed a fresh copy of Debian.

This is running kernel Linux 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686. I'm not familiar with Debian but I did notice that it automatically configured an ext3 file system.

With this installation, I copy my Music directory with no errors and no system hangs, both from usb and from ftp.

From this process I am forced to conclude that there is definitely something wrong with the way natty is handling my hard drives on this Toshiba.

I don't know what else to try. A different kernel under natty?

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

A further data point.

Under Linux Mint 11 on the Toshiba, I have the same problem, which isn't too much of a surprise since it is derived from Ubuntu 11.04.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Still having this problem under natty and mint 11.

I can still get around it by copying one directory at a time and sleeping 10+ seconds between directories.

Is that a hardware problem? Two laptops, different vintages, different processors, from different manufacturers, different disks, including one new? Or a software problem?

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

ok!

On the Toshiba Satellite A70, this is a *** Hardware Problem ***

I found an article on how to disassemble this baby (why are laptops so hard to take apart?) and when I got down to the cpu cooler, there were a couple of large gray dust bunnies snarling and yapping at me.

I used up most of a can of dust-off (R) and reassembled with new heat sink goop and my first test was to copy over my Music directory... which succeeded!!!

So I am blithely assuming something similar is true with the Dell, and one day I will disassemble it for cleaning as well.
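
For anyone hitting similar hangs, one cheap check before taking a laptop apart is to watch the temperatures the kernel reports while a large copy is running. The sketch below assumes the machine exposes ACPI thermal zones under /sys/class/thermal, where each thermal_zone*/temp file holds millidegrees Celsius; the function name print_temps is made up, and this was not part of the testing described in this report.

```shell
#!/bin/sh
# print_temps [BASE_DIR]
# Print the temperature of every thermal zone found under BASE_DIR
# (default /sys/class/thermal). Values in sysfs are millidegrees Celsius.
# (Illustrative helper; not something used in the original report.)
print_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (or an unmatched pattern).
        [ -r "$zone/temp" ] || continue
        millideg=$(cat "$zone/temp")
        printf '%s: %s C\n' "${zone##*/}" "$((millideg / 1000))"
    done
}
```

Running it in a loop in a second console during the copy, e.g. while sleep 5; do print_temps; done, would show whether the temperature climbs steadily before a hang.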

Changed in libatasmart (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Invalid
Revision history for this message
Chris Hermansen (c-hermansen) wrote :

I have marked this bug as "invalid" since it seems not to be a natty problem after all, but rather a marginal hardware / dirty computer issue.
