System logs fill with cdrom errors (1 GB per 20 minutes)

Bug #64548 reported by kko
4
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
Medium
Unassigned

Bug Description

I am at a loss to diagnose what causes this. Faulty hardware, faulty kernel, faulty compact disc (except that I _think_ the drive was empty after I had removed the disc)?

I am using Ubuntu Breezy.

Today I noticed that my root partition was full, even though it was supposed to have 1 GB free. I suspected some error messages were flooding the log. I had been using Konqueror to convert music from music CD's (of various make and age) with two of my drives.

What I found in three logfiles (syslog, kern.log, messages), was these four messages:
Oct 7 14:23:54 mymachine kernel: [4303409.960000] cdrom: dropping to old style cdda (sense=4)
Oct 7 14:23:54 mymachine kernel: [4303409.960000] hdd: packet command error: status=0x51 { DriveReady SeekComplete Error }
Oct 7 14:23:54 mymachine kernel: [4303409.960000] hdd: packet command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
Oct 7 14:23:54 mymachine kernel: [4303409.960000] ide: failed opcode was: unknown

...and the last three of these messages kept repeating indefinitely, until the three files were all above 350 MB, totaling about 1 GB. This had been going on for 20 minutes, i.e. until the root partition filled.

The drive either fails to read a disc, or fails to identify that the drive is empty. Sometimes it also fails to eject nicely, having to use "sudo" to eject, but I don't know if this is connected. I had used "sudo eject /dev/hdd" on this drive twice. The first time it didn't eject, just opened and closed the tray. The second time it ejected. This _seems to_ possibly have triggered the errors.

hdparm -i (after the incident) shows this:
/dev/hdd:

 Model=DVD-ROM DDU1621, FwRev=VER S2.9, SerialNo=
 Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=yes, tPIO={min:227,w/IORDY:120}, tDMA={min:120,rec:150}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no

 * signifies the current active mode

hdparm (after the incident) shows this:
/dev/hdd:
 IO_support = 0 (default 16-bit)
 unmaskirq = 0 (off)
 using_dma = 0 (off)
 keepsettings = 0 (off)
 readonly = 0 (off)
 readahead = 256 (on)
 HDIO_GETGEO failed: Invalid argument

If I remember, I'll check this after a clean boot, and see if it's different. (I believe it is, it should be using dma.)

Revision history for this message
Caroline Ford (secretlondon) wrote :

Could you provide the information as requested in https://wiki.ubuntu.com/DebuggingKernelProblems?

Revision history for this message
kko (kko) wrote :
Revision history for this message
kko (kko) wrote :
Revision history for this message
kko (kko) wrote :
Revision history for this message
kko (kko) wrote :

hdparm -i after a clean boot:
/dev/hdd:

 Model=DVD-ROM DDU1621, FwRev=VER S2.9, SerialNo=
 Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=yes, tPIO={min:227,w/IORDY:120}, tDMA={min:120,rec:150}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no

 * signifies the current active mode

hdparm after a clean boot:
/dev/hdd:
 IO_support = 1 (32-bit)
 unmaskirq = 1 (on)
 using_dma = 0 (off)
 keepsettings = 0 (off)
 readonly = 0 (off)
 readahead = 256 (on)
 HDIO_GETGEO failed: Invalid argument

Revision history for this message
kko (kko) wrote :

Done. I must point out that this was a onetime event, and doesn't happen regularly. Once, before this incident, I accidentally kicked the machine into a hang - switching to VT 1 showed IIRC the same kind of errors flowing by at a huge pace.

The eject troubles I mentioned are probably related to the different kio_???? processes working / hanging in the background after Konqueror has been closed (I use _K_ubuntu Breezy, to be precise), since killing them re-enables eject without sudo. This is probably a different issue, although it might be among the possible causes that could trigger the error state.

Revision history for this message
kko (kko) wrote :

Is there more information I could provide ("Needs Info")?

Revision history for this message
kko (kko) wrote :

Reverting status to "Unconfirmed", since the information requested has been provided. If there's any more information I can provide, please let me know.

Changed in linux-source-2.6.12:
status: Needs Info → Unconfirmed
Revision history for this message
Brian Murray (brian-murray) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better. Error messages of this type are usually indicative of faulty hardware. Have you tried replacing the IDE cable or CD-ROM drive? Thanks in advance.

Changed in linux-source-2.6.12:
assignee: nobody → brian-murray
status: Unconfirmed → Needs Info
Revision history for this message
kko (kko) wrote :
Download full text (4.2 KiB)

Thanks for looking at this. I'm sorry I ended up writing a long response, I hope you'll bear with me.

No, I haven't tried changing the cable - I've detached and re-attached it since (not immediately, but days or weeks after the incident), though, but it may not have affected this bug. I have two drives attached to the cable, the one affected on this bug was the slave. As said, it was most probably caused by a faulty CD or by something that went wrong with the drive as a disc was ejected. (I really don't know what a "DriveReady SeekComplete Error" denotes. This is all guessing, but some process may have tried to access the disc, not recognizing it was ejected... or something else.)

The drive has - before and after this incident - otherwise worked well, without any problems whatsoever.

As this was a one-time incident, even if it was the hardware's fault, it would be rather hard to actually reproduce. Nevertheless, I tried for a while to tease both of my drives with reading faulty CDs and ejecting discs in mid-operation, but didn't get the same effect as here. (The most I got was a single "cdrom: dropping to single frame dma" and "cdrom: This disc doesn't have any tracks I recognize!" around the time of a manual eject while cdparanoia was doing its job but stuck on some scratch.)

I'll recount, for the record, a few incidents I faced, that share some common characteristics:
1) As mentioned, at one point I inadvertently kicked the machine, at which it started flooding I/O errors on all tty's, forcing a reboot.
2) Some time after that, I faced this bug, somewhat reminding me of that incident.
3) A bit later, still, I faced what I've reported in bug 64914 - very much reminiscent of incident number 1, but this time provoked by an 'e2fsck' on a hard disk drive. (Apparently this was the damage caused by incident 1. See the referenced report for details.)
4) After that incident, not wishing to risk my data on a potentially damaged disk, I got a new disk, did a fresh install of Dapper on that one, and put my old disk as a slave (hdb) to the new one. (After I wrote data to fill up the damaged partition, the drive - as I had hoped - noticed and repaired the fault, assigning backup sectors to replace the faulty ones.)

As I've switched to Dapper, I'm using a different kernel at the moment, and as both of these were one-time incidents, it would be hard reproducing them. (The other one was reproduceable, but I didn't want to leave the disk in its damaged condition and not try to fix it.)

So, what do I think is the "issue" here? The issue, in my opinion, is that:
- Hardware may fail, without prior notice,
- but is it possible for the OS to handle it more gracefully?

_If_ I understand correctly, this _is_ problematic for the OS, because if a hardware component "complains" about something, it may block the operating system / program being used from responding to the situation. If the hardware keeps malfunctioning and the OS / program can't react, this is an uncomfortable situation for the user.

In this case, reported here, I could and did use the machine until and after the root partition filled up, and was able to reboot nicely (which fixed the drive ;-)...

Read more...

Revision history for this message
kko (kko) wrote :

UPDATE: It is happening again as I write this, this time on Dapper, but with the same device.

This time the first message was "Mar 7 16:00:04 mymachine kernel: [17190709.372000] cdrom: dropping to old style cdda (sense=b)". The earlier "sense=4" is now "sense=b", but otherwise identical. However, this time there was nearly 15 minutes before the repeating error messages started. They started a while ago, when I started transferring the rest of my CDs to FLAC, this time through KAudioCreator, and seem identical to those I met earlier:

"Mar 7 16:28:12 mymachine kernel: [17192397.524000] hdd: packet command error: status=0x51 { DriveReady SeekComplete Error }
Mar 7 16:28:12 mymachine kernel: [17192397.524000] hdd: packet command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
Mar 7 16:28:12 mymachine kernel: [17192397.524000] ide: failed opcode was: unknown"

The transfer operation proceeds, however, correctly. Two questions that come in mind:
- Why did the drive drop to "old style cdda" half an hour ago? (I'll try to think if I was doing anything special then.)
- I'm now using both drives simultaneously. Can the affected drive's slave status have anything to do with this?

Revision history for this message
kko (kko) wrote :

The messages stopped when the transfer was finished. And this time it was only 250 MB in 20 minutes, in total for the 3 mentioned log files.

It looks like there may be something wrong with the drive, even though it appears to work. What puzzles me is this, however:
- Is there a reason the drive claims to have dropped to "old style cdda" all of a sudden?
- Does the fault only occur when using this "old style cdda"? Seems likely, I would say.
- Also, the output of 'hdparm' and 'hdparm -i' is differen this time. (I'll attach the output.)

Revision history for this message
kko (kko) wrote :

As this turned out to reproduce, there's now really two separate issues here:

1) What is the real cause of the DVD-drive's error, and why does it occur? Is it a genuine hardware error, and/or is it something the kernel can affect / handle?

2) The intelligence of the logging daemons, and whether or not something can / should be done to improve this. (Thinking about this can perhaps be postponed, but... anyway you like. :-)

Revision history for this message
kko (kko) wrote :

And some more information - I'll drop these bits here as I come by them:
- Turns out I had, and still have, on a background instance of Konsole (well, Yakuake), a process of 'cdparanoia' trying to read '/dev/hdd', and have had since the past four hours. ;-)
- During this time I've ejected the discs from both drives, closed the trays, put in new discs, and transferred the music with KAudioCreator, which again ejected the discs after it was finished, and closed the trays.
- The repeating errors seem to have occurred when both the processes used by KAudioCreator _and_ the 'cdparanoia' -process tried to access the same device simultaneously.
- The command 'lsof|grep hdd' gives "cdparanoi 6175 myuserid 3r BLK 22,64 4512 /dev/hdd".

Revision history for this message
kko (kko) wrote :

So, small wonder if the drive gets hiccups when given conflicting demands.

It seems possible I can now reproduce this (provided I find out what causes the drop to "old style cdda") - I'll have to see if I can get even the other drive to do the same. ;-)

Changed in linux-source-2.6.12:
assignee: brian-murray → nobody
status: Needs Info → Unconfirmed
Changed in linux-source-2.6.12:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Unconfirmed → Confirmed
Revision history for this message
kko (kko) wrote :

I'm dropping this information here, in case it turns out useful for you.

I saw you marked this as confirmed, so I tried to reproduce it again, with the current Dapper kernel (2.6.15-28.55) - this time without success. I only tried one known faulty disc though, (but the same as last time, IIRC,) so I can't be sure if something else would be needed to trigger the bug again.

My other (newer) drive performed without fault, and the drive that had previously experienced the trouble announced this in the logs: "cdrom: dropping to single frame dma" instead of "cdrom: dropping to old style cdda ...". In sum, this time the drive didn't drop to "old style cdda", which I think contributed to the issue originally. (With "single frame dma", if I tried to access the disk with two different programs, lsof would report both trying to access it, but neither program seemed able to read any data - once I quit one of these, the other started receiving data.)

I expect the issue with the flood of error messages would re-emerge if the drive again dropped to "old style cdda" for some reason. I also can't tell if the problem can still occur on other hardware - possibly yes.

Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are one of two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreicate your help and feedback.

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
kernel-janitor (kernel-janitor) wrote :

This bug report was marked as Confirmed a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.