Comment 40 for bug 801840

Tomasz Kusmierz (wally-tm) wrote :

Boys / Girls

1.
To clear things up: AFAIK, when kernel version x.y.z is released, it is taken as the base for the next release, x.y.(z+1) -dev or -rc (whatever). So if 2.6.35 was stable, there had to be some pre-release of 2.6.36 that was still stable. And here is the catch: when pushing for a new kernel version all tricks are allowed (completely changing the queuing style, scheduler sequencing, anything), so at this point there will be a vast number of changes to the general design of the kernel !!! Releases 2.6.35 and 2.6.36 might be like black and white as far as I'm concerned. So IMHO it's better to narrow down where in the pre-releases of 2.6.36 the f***up was introduced.
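A rough sketch of how the narrowing could be organised, assuming a git clone of the mainline tree whose tags follow the usual vX.Y.Z and vX.Y.Z-rcN naming. The helper names (tag_key, tags_to_test) are made up for illustration, nothing here comes from the kernel tooling itself.

```python
import re

# Matches mainline-style tags only: v2.6.36 or v2.6.36-rc3.
# Stable-branch tags such as v2.6.35.1 are deliberately ignored.
TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?")


def tag_key(tag):
    """Return a sortable key for a tag, or None if the tag is not mainline-style."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    major, minor, patch, rc = m.groups()
    # A final release sorts after all of its own -rc tags.
    rc_rank = int(rc) if rc else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def tags_to_test(tags, good, bad):
    """List tags newer than `good` up to and including `bad`, oldest first."""
    lo, hi = tag_key(good), tag_key(bad)
    keyed = [(tag_key(t), t) for t in tags]
    in_range = [(k, t) for k, t in keyed if k is not None and lo < k <= hi]
    return [t for _, t in sorted(in_range)]
```

The tag list could come from something like `git tag -l 'v2.6.3*'`. Build and boot each returned tag in order, and the first one that throws the MCE marks the pre-release where the problem came in.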

2.
Bisect might be good, but from my experience commits to svn / cvs / git are made on a "best effort" basis, i.e. there is no guarantee that a given commit will work or even compile, so having problems compiling certain revisions of the repository tree does not surprise me.
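Revisions that do not compile need not stop a bisect: git can skip them, either by hand with `git bisect skip` or, under `git bisect run`, when the test script exits with code 125. A minimal sketch of that exit-code mapping follows; the function name and its two inputs are assumptions for illustration, and how you actually build and boot-test is up to you.

```python
# Exit codes understood by `git bisect run`:
#   0   -> this revision is good
#   125 -> this revision cannot be tested, skip it
#   any other code from 1 to 127 -> this revision is bad
BISECT_GOOD = 0
BISECT_BAD = 1
BISECT_SKIP = 125


def bisect_verdict(build_ok, boots_cleanly):
    """Map the outcome of one build-and-boot attempt to a bisect exit code."""
    if not build_ok:
        # A broken build says nothing about the MCE problem, so skip it.
        return BISECT_SKIP
    return BISECT_GOOD if boots_cleanly else BISECT_BAD
```

A wrapper script would finish with `sys.exit(bisect_verdict(...))`. Since the crash happens on the test machine itself, the boot check will most likely stay manual, but the skip rule still applies.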

3.
Since it crashes with the same error on every machine, search for the specific error string to see where it is generated, and pick some of the functions that might lead to it. Also, in all of the Asus Z7S MCE errors I've seen, the call stack looks exactly the same, so I would suggest searching in:
machine_check()
do_machine_check()
mce_reign()
for any changes between the offending and non-offending kernel versions.
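One possible way to do that search with plain git, sketched as small helpers that only build the command lines. The helper names are hypothetical, and the path arch/x86/kernel/cpu/mcheck/mce.c is my assumption about where the C functions live in these versions; adjust it if your tree differs.

```python
def grep_cmd(needle, rev, path):
    """Search one revision for a fixed string, e.g. part of the crash message."""
    return ["git", "grep", "-n", "-F", needle, rev, "--", path]


def diff_cmd(good, bad, path):
    """Show everything that changed in one file between two revisions."""
    return ["git", "diff", good, bad, "--", path]


def function_history_cmd(good, bad, func, path):
    """List the commits between two revisions that touched one function."""
    return ["git", "log", "-L", f":{func}:{path}", f"{good}..{bad}"]


# Assumed location of do_machine_check() and mce_reign(); not verified here.
MCE_PATH = "arch/x86/kernel/cpu/mcheck/mce.c"
```

For example, `subprocess.run(function_history_cmd("v2.6.35", "v2.6.36", "mce_reign", MCE_PATH))` inside a kernel clone would print only the commits that changed that function, which is a much shorter list than the whole release.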

4.
My bet is that this is something extremely silly / stupid like:
- checking cache coherency while in / straight after E0 state
- accessing RAM while in low power mode
- or actually giving a damn about errors reported by the machine (ECC machines DO REPORT ERRORS, but they also correct them and bail out properly without causing any harm to running operations; a good example is FB - DDR)
etc.