Message-ID: <email address hidden>
Date: Thu, 29 Jul 2004 09:20:29 +0200
From: Javier Fernández-Sanguino Peña <email address hidden>
To: Horms <email address hidden>
Cc: <email address hidden>
Subject: Re: Bug#255175: kernel-image-2.4.26-1-686: system crash due to kernel bug
> I am not sure that would really help.
> Are you sure that it couldn't be a hardware problem.
I don't see any hardware problems in the log before the kernel oopses. If
there are hardware issues, then it's the kernel's fault that nothing gets
reported.
The only thing I can think of is that there might be some hard drive
problems which are not reported by the kernel, so that when it tries to use
the swap space it cannot read from or write to it, and this generates the
oopses. Isn't there a tool to test the swap space (besides 'mkswap -c')?
The one thing I'm surprised about is that the oopses vary somewhat in their
messages:
kernel BUG at mmap.c:1172!
kernel BUG at page_alloc.c:152!
kernel BUG at page_alloc.c:221!
Digging into the code for the first one, I find it in mm/mmap.c, in exit_mmap():
/* This is just debugging */
if (mm->map_count)
    BUG();
And the code for the page_alloc ones is:
mm/page_alloc.c:
84 static void FASTCALL(__free_pages_ok (struct page *page, unsigned int order));
85 static void __free_pages_ok (struct page *page, unsigned int order)
86 {
(...)
149 buddy1 = base + (page_idx ^ -mask);
150 buddy2 = base + page_idx;
151 if (BAD_RANGE(zone,buddy1))
152 BUG();
153 if (BAD_RANGE(zone,buddy2))
154 BUG();
(...)
203 static struct page * rmqueue(zone_t *zone, unsigned int order)
204 {
(...)
219 page = list_entry(curr, struct page, list);
220 if (BAD_RANGE(zone,page))
221 BUG();
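As an aside on the kernel listing just quoted: the definitions of BAD_RANGE and mask are not part of the excerpt, so the following is only a user-space model of what such checks presumably do, not kernel code. The names toy_zone, toy_bad_range and toy_buddy_index are hypothetical, and the buddy computation assumes that mask is ~0UL << order, so that -mask equals 1 << order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory zone: it owns the
 * page indices [start, start + size). This is not the kernel's zone_t. */
struct toy_zone {
    size_t start;
    size_t size;
};

/* Returns true when page index idx lies outside the zone, i.e. when a
 * range check of this kind would fire. */
static bool toy_bad_range(const struct toy_zone *z, size_t idx)
{
    return idx < z->start || idx - z->start >= z->size;
}

/* Index of the buddy of the block starting at idx for a given order:
 * flip the single bit that corresponds to the block size.
 * Assumes -mask == 1 << order, which the quoted excerpt does not show. */
static size_t toy_buddy_index(size_t idx, unsigned int order)
{
    return idx ^ ((size_t)1 << order);
}
```

Under these assumptions, the checks at lines 152, 154 and 221 fire whenever the page being handled falls outside the zone it is supposed to belong to.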
I don't have in-depth knowledge of the kernel, but I don't believe that
hardware issues can make the above code trigger those BUG() calls. It looks
to me as if the kernel is somehow not handling its swap definitions properly.
Can you suggest a way in which I could reproduce these errors and maybe
trace the kernel to see what's going on?
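One rough way to attempt a reproduction from user space would be to push memory use above the installed RAM so that pages are forced out to swap and read back. The sketch below is only an illustration of that idea and has not been tested against this bug; exercise_memory and CHUNK_BYTES are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (1024 * 1024)  /* 1 MiB per allocation */

/* Allocates n_chunks chunks of 1 MiB, fills each with a per-chunk byte
 * pattern so that every page is really touched, then reads everything
 * back. Returns the number of chunks whose contents did not match, or
 * -1 if an allocation failed. */
static long exercise_memory(size_t n_chunks)
{
    unsigned char **chunks;
    size_t i, j, allocated = 0;
    long mismatches = 0;

    if (n_chunks == 0)
        return 0;
    chunks = calloc(n_chunks, sizeof *chunks);
    if (chunks == NULL)
        return -1;

    for (i = 0; i < n_chunks; i++) {
        chunks[i] = malloc(CHUNK_BYTES);
        if (chunks[i] == NULL) {
            mismatches = -1;
            break;
        }
        memset(chunks[i], (int)(i & 0xff), CHUNK_BYTES);
        allocated++;
    }

    if (mismatches == 0) {
        for (i = 0; i < allocated; i++) {
            for (j = 0; j < CHUNK_BYTES; j++) {
                if (chunks[i][j] != (unsigned char)(i & 0xff)) {
                    mismatches++;
                    break;
                }
            }
        }
    }

    for (i = 0; i < allocated; i++)
        free(chunks[i]);
    free(chunks);
    return mismatches;
}
```

Calling exercise_memory() repeatedly with n_chunks somewhat above the amount of RAM in MiB should make swap usage go up and down. A nonzero return value would mean that data came back corrupted.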
> It seems to be rather intermittend and do not have any
> other reports of similar failures.
The "intermittency" might be related to the fact that it's a problem in the
cleanup of swap pages: when swap is not used, the problem does not show
up. For what it's worth, on my system:
$ free
             total       used       free     shared    buffers     cached
Mem:        386156     381800       4356          0      15712     258080
-/+ buffers/cache:     108008     278148
Swap:       979956       1088     978868
So swap is not usually used. The oopses seem to appear when cron jobs make
intensive use of the system and swap usage goes up and down.
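To correlate the oopses with swap activity, swap usage could be logged periodically while the cron jobs run. The helper below is a minimal sketch that extracts one value from text in /proc/meminfo format. It assumes the file contains lines such as "SwapTotal:" and "SwapFree:" with values in kB, and meminfo_kb is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Returns the number that follows `key` (e.g. "SwapFree:") at the start
 * of a line in `text`, or -1 if no such line is found. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;
    long value;

    if (klen == 0)
        return -1;
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match at the beginning of a line. */
        if ((p == text || p[-1] == '\n') &&
            sscanf(p + klen, "%ld", &value) == 1)
            return value;
        p += klen;
    }
    return -1;
}
```

Used swap is then SwapTotal minus SwapFree. With the figures from the free output above, that is 979956 - 978868 = 1088 kB.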
> Pending a way to reliably reproduce the problem,
> or at least some confirmation that it manifests on
> different hardware I have changed the severity to important.
I understand this, but I would appreciate some indication of how to debug
this issue myself if necessary and trace what the kernel is running into.
Regards
Javier