kswapd Bad page state in process

Bug #351089 reported by Mihai Tanasescu
Affects          Status      Importance   Assigned to      Milestone
linux (Ubuntu)   Won't Fix   Medium       Colin Ian King

Bug Description

Hello all,

I think the bug I'm facing now is a regression of an older bug.
Short summary and system description:
Ubuntu 8.10 32-bit, 4 GB RAM, linux-server kernel image 2.6.27-11 (used for its PAE support, so that all RAM is visible).
Storage is set up like this:
3ware hardware RAID card - 2 x 1 TB HDD (hardware RAID-1), with LVM on top of this for snapshot functionality
Partitions:
/boot
LVM:
/ - reiserfs
/home - xfs
/storage - xfs
swap

What I was doing when I noticed the bug:
- copying files from the older hard drives to the new setup

The output is in the attached logs (from dmesg).

[ 8245.217063] Bad page state in process 'kswapd0'
[ 8245.217064] page:c12df294 flags:0x40000008 mapping:00000000 mapcount:-67108864 count:0
[ 8245.217072] Trying to fix it up, but a reboot is needed
[ 8245.217074] Backtrace:
[ 8245.217078] Pid: 199, comm: kswapd0 Tainted: P B 2.6.27-11-server #1
[ 8245.217082] [<c038aff6>] ? printk+0x1d/0x1f
[ 8245.217091] [<c019140c>] bad_page+0x7c/0xd0
[ 8245.217096] [<c01924c1>] free_hot_cold_page+0x1f1/0x200
[ 8245.217101] [<c01924fd>] __pagevec_free+0x2d/0x40
[ 8245.217105] [<c0197488>] shrink_page_list+0x308/0x450
[ 8245.217111] [<c0197753>] shrink_inactive_list+0x183/0x410
[ 8245.217135] [<c0132dbf>] ? wake_up_process+0xf/0x20
[ 8245.217143] [<f905cefa>] ? xfsbufd_wakeup+0x5a/0x60 [xfs]
[ 8245.217168] [<c0153588>] ? up_read+0x8/0x20
[ 8245.217174] [<c0197a4a>] shrink_zone+0x6a/0x130
[ 8245.217178] [<c019835e>] balance_pgdat+0x3ae/0x3d0
[ 8245.217183] [<c0197000>] ? isolate_pages_global+0x0/0x60
[ 8245.217189] [<c014f106>] ? finish_wait+0x16/0x70
[ 8245.217194] [<c0198449>] kswapd+0xc9/0x120
[ 8245.217198] [<c014f040>] ? autoremove_wake_function+0x0/0x50
[ 8245.217203] [<c0198380>] ? kswapd+0x0/0x120
[ 8245.217208] [<c014ecc1>] kthread+0x41/0x80
[ 8245.217212] [<c014ec80>] ? kthread+0x0/0x80
[ 8245.217217] [<c010abe7>] kernel_thread_helper+0x7/0x10
[ 8245.217222] =======================

Basically this is what appears and I think it relates to XFS.

I would be grateful if you guys could take a look and help.

Thanks,
Mihai

Revision history for this message
Mihai Tanasescu (mihai-duras) wrote :

I don't know if this has anything to do with it, but I was copying files from older NTFS drives to /storage (which is XFS) when by chance I ran dmesg and saw these messages appearing over and over.

I haven't yet tried to reproduce it by copying from another source.

Revision history for this message
Mihai Tanasescu (mihai-duras) wrote :

I repeated some tests.
I downloaded 1 GB from an FTP server to the /storage XFS partition -> no error.

I then started copying some other data (50 GB) from the older NTFS drive to the /storage partition on the new drive, and after 5-6 GB had been transferred the output shown above started appearing periodically in the logs.

Revision history for this message
Mihai Tanasescu (mihai-duras) wrote :

Again some tests:

NTFS -> reiserfs copying --> same error after 1-2 GB transferred; the transfer keeps working afterwards, but I get messages like that in the logs from time to time.
The same happens for NTFS -> XFS copying.
ReiserFS -> XFS copying generates similar messages, but only after 4-5 GB transferred.

I don't know what else I can do.
It seems like a kernel issue.

Revision history for this message
Andy Whitcroft (apw) wrote :

This is a kernel bug; moving to the correct package.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Mihai Tanasescu (mihai-duras) wrote :

Another comment; I hope it is useful.

I copied 5 GB worth of smaller files and got no errors in the logs.
It seems this happens when copying larger files (when I was getting it, I was transferring many files of 700 MB to 1 GB each).
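
A minimal sketch of how the large-file case could be exercised in a repeatable way. This is not what was run for this report: the helper name write_and_verify, the seed and the chunk size are all made up for illustration. It writes deterministic data to a file, reads it back and compares SHA-256 digests, so a mismatch would point at data being damaged somewhere between the page cache and the disk.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per write (arbitrary choice for this sketch)


def write_and_verify(path, size_bytes, seed=b"bug-351089"):
    """Write size_bytes of deterministic data to path, read it back, compare digests."""
    written = hashlib.sha256()
    block = hashlib.sha256(seed).digest()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            # Derive each chunk from the previous digest so the data is not constant.
            block = hashlib.sha256(block).digest()
            chunk = (block * (CHUNK // len(block)))[:min(CHUNK, remaining)]
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()


# Small demonstration in a temporary directory; a real run would point the
# path at the filesystem under test and use a size of several GB.
with tempfile.TemporaryDirectory() as tmp:
    print(write_and_verify(os.path.join(tmp, "probe.bin"), 4 * CHUNK))
```

Running it once with a size of a few GB and once with many small sizes, then checking dmesg after each run, would mirror the large-file versus small-file comparison described above.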

Revision history for this message
Mihai Tanasescu (mihai-duras) wrote :

I was just copying other files from the older external drive to the new internal ones and got the same error, but this time with ntfs-3g as the process:

[ 1115.110898] Trying to fix it up, but a reboot is needed
[ 1115.110899] Backtrace:
[ 1115.110904] Pid: 6636, comm: mount.ntfs-3g Tainted: P 2.6.27-11-server #1
[ 1115.110908] [<c038aff6>] ? printk+0x1d/0x1f
[ 1115.110917] [<c019140c>] bad_page+0x7c/0xd0
[ 1115.110922] [<c01915ae>] prep_new_page+0x14e/0x160
[ 1115.110926] [<c019176a>] buffered_rmqueue+0x1aa/0x2b0
[ 1115.110931] [<c0191933>] get_page_from_freelist+0xc3/0x160
[ 1115.110935] [<c0191a83>] __alloc_pages_internal+0xb3/0x460
[ 1115.110962] [<f8842244>] ? fuse_copy_do+0x14/0x80 [fuse]
[ 1115.110976] [<c019470a>] __do_page_cache_readahead+0xda/0x1e0
[ 1115.110981] [<c0194b5d>] ondemand_readahead+0x12d/0x140
[ 1115.110985] [<c0194c2d>] page_cache_sync_readahead+0x2d/0x40
[ 1115.110990] [<c018db91>] do_generic_file_read+0x241/0x4d0
[ 1115.110996] [<c018dec3>] generic_file_aio_read+0xa3/0x1d0
[ 1115.111000] [<c018b960>] ? file_read_actor+0x0/0xe0
[ 1115.111005] [<c01bad89>] do_sync_read+0xd9/0x120
[ 1115.111010] [<c023dfb4>] ? aa_file_permission+0x14/0xc0
[ 1115.111016] [<c014f040>] ? autoremove_wake_function+0x0/0x50
[ 1115.111023] [<c021c494>] ? security_file_permission+0x14/0x20
[ 1115.111028] [<c01bae1d>] ? rw_verify_area+0x4d/0xc0
[ 1115.111033] [<c01ba8ab>] ? fsnotify_access+0x6b/0x80
[ 1115.111037] [<c01bb53d>] vfs_read+0x9d/0x110
[ 1115.111041] [<c01bacb0>] ? do_sync_read+0x0/0x120
[ 1115.111046] [<c01bb633>] sys_pread64+0x83/0x90
[ 1115.111050] [<c0109f03>] sysenter_do_call+0x12/0x2f
[ 1115.111055] =======================

I hope everything I have provided helps with debugging.
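
Since these reports keep recurring, it may help to summarize them from a saved dmesg dump. The sketch below is a rough illustration only: the function name summarize and the regular expressions are assumptions based on the line layout seen in the two traces in this report, not a general dmesg parser. For each backtrace it records the process name and the first reliable frame (one without a "?") that follows bad_page.

```python
import re
from collections import Counter

COMM_RE = re.compile(r"comm: (\S+)")
# Matches frames such as "[<c019140c>] bad_page+0x7c/0xd0"; group 1 is set
# when the frame carries the "?" marker for an unreliable entry.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x")


def summarize(log_text):
    """Count reports by (process name, function that called bad_page)."""
    counts = Counter()
    comm = None
    saw_bad_page = False
    for line in log_text.splitlines():
        m = COMM_RE.search(line)
        if m:
            comm, saw_bad_page = m.group(1), False
            continue
        m = FRAME_RE.search(line)
        if not m or m.group(1):  # skip non-frame lines and "?" frames
            continue
        if m.group(2) == "bad_page":
            saw_bad_page = True
        elif saw_bad_page and comm is not None:
            counts[(comm, m.group(2))] += 1
            saw_bad_page = False
    return counts


# Excerpt of the two traces posted in this report.
sample = """\
[ 8245.217078] Pid: 199, comm: kswapd0 Tainted: P B 2.6.27-11-server #1
[ 8245.217082] [<c038aff6>] ? printk+0x1d/0x1f
[ 8245.217091] [<c019140c>] bad_page+0x7c/0xd0
[ 8245.217096] [<c01924c1>] free_hot_cold_page+0x1f1/0x200
[ 1115.110904] Pid: 6636, comm: mount.ntfs-3g Tainted: P 2.6.27-11-server #1
[ 1115.110908] [<c038aff6>] ? printk+0x1d/0x1f
[ 1115.110917] [<c019140c>] bad_page+0x7c/0xd0
[ 1115.110922] [<c01915ae>] prep_new_page+0x14e/0x160
"""

for (proc, caller), n in summarize(sample).items():
    print(proc, caller, n)
```

On the excerpt, the kswapd0 report is raised from free_hot_cold_page (a page being freed) and the mount.ntfs-3g report from prep_new_page (a page being allocated), so the bad page is noticed on both paths rather than in one filesystem's code.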

Revision history for this message
Colin Ian King (colin-king) wrote :

@Mihai,

Looks like you are getting "struct page" corruption, and as a result we see the "Bad page state" messages. It's of interest that you are seeing this in the following scenarios:

NTFS -> reiserfs
NTFS -> XFS
reiserfs -> XFS

Also, this error occurs after writing large files, and very probably most of the system memory has been exercised during the long-duration transfers. If just one file system had this problem I would suspect a specific filesystem error, but because it's happening on two different target filesystems I'm more concerned that this is a memory issue.

Can you boot the machine and run memtest for a few hours, just to rule out any memory issues before we start looking at kernel page problems and interactions with the filesystems?

Let me know the results, thanks, Colin
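
A small worked check of the mapcount value from the first report may support the memory theory. The sketch below rests on two assumptions about kernels of that era that are not stated anywhere in this report: that the printed mapcount is the stored _mapcount field plus one, and that a healthy free page stores -1 in that field. The helper name flipped_bits is made up for illustration.

```python
def flipped_bits(observed, expected, width=32):
    """Return the bit positions at which two width-bit two's-complement values differ."""
    mask = (1 << width) - 1
    diff = (observed ^ expected) & mask
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Value printed in the first "Bad page state" report.
printed_mapcount = -67108864

# Assumption: the printed figure is the stored field plus one, so the raw
# field is one less; assumption: a healthy free page stores -1.
raw_observed = printed_mapcount - 1
raw_expected = -1

print(hex(raw_observed & 0xFFFFFFFF))            # 0xfbffffff
print(flipped_bits(raw_observed, raw_expected))  # [26]
```

Under those assumptions the stored value differs from the expected one in a single bit (bit 26). That pattern is consistent with a hardware bit flip and with running memtest first, although on its own it does not rule out a kernel bug.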

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better. Could you please attach the resulting log file of: gnome-power-bugreport.sh & > gpm.log to the report? You might also want to take a look at the debugging instructions located at https://wiki.ubuntu.com/DebuggingGNOMEPowerManager and submit any other logs related to your problem. Thanks in advance.

Changed in linux (Ubuntu):
assignee: nobody → Colin King (colin-king)
status: Triaged → Incomplete
Revision history for this message
Colin Ian King (colin-king) wrote :

Oops, ignore comment #9, it got pasted in by mistake. :-(

Revision history for this message
Colin Ian King (colin-king) wrote :

@Mihai,

This bug report is being closed because we received no response to the previous inquiry for information. Please reopen if this is still an issue in the current Ubuntu release, Jaunty Jackalope 9.04. To reopen the bug, click on the current status, under the Status column, and change the status back to "New". Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix