Comment 11 for bug 666211

Stefan Bader (smb) wrote :

Adding a few comments here as they come to my mind:

The "barrier based sync failed" message is purely informational and can be safely ignored.

Reading through the dmesg from comment #10 and comparing it to one gathered from booting a daily server instance:

[ 0.000000] Xen version: 3.0.3-rc5-8.1.14.f (mine was 3.0.3-rc5-8.el5)
...
[ 0.000000] trying to map vcpu_info 0 at ffff880003bc3020, mfn 124f8a, offset 32
[ 0.000000] register_vcpu_info failed: err=-38

Did not see this error in my log.
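
For reference, the err=-38 value can be decoded with a small sketch like the one below. It assumes the Linux asm-generic errno numbering, where 38 is ENOSYS ("Function not implemented"). The lookup table and the decode_err helper are illustrative only and list just a few values.

```python
import re

# Assumption: Linux asm-generic errno numbering; only a few values are listed.
LINUX_ERRNO = {1: "EPERM", 12: "ENOMEM", 22: "EINVAL", 38: "ENOSYS"}

def decode_err(line):
    """Return the symbolic errno name for an 'err=-N' field, or None if absent."""
    m = re.search(r"err=-(\d+)", line)
    if m is None:
        return None
    return LINUX_ERRNO.get(int(m.group(1)), "unknown")

print(decode_err("[ 0.000000] register_vcpu_info failed: err=-38"))
```

If that reading is right, the message would suggest the hypervisor simply does not implement the call, rather than that something went wrong on the guest side.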

[ 0.016933] CPU: Physical Processor ID: 0
[ 0.016939] CPU: Processor Core ID: 0

Not sure this is really relevant. The hw I booted on seemed to have only 2 CPUs and showed a warning about an unsupported number of siblings (4).

[ 0.103804] alloc irq_desc for 16 on node 0
[ 0.103806] alloc kstat_irqs on node 0

Did not see messages like this either, but I suspect my hw was AMD dual core while this might be Intel quad core.

[ 0.171411] intel_idle: MWAIT substates: 0x2220
[ 0.171413] intel_idle: does not run on family 6 model 23

This confirms the previous suspicion. At least it refuses to run here instead of crashing. After that, mostly normal things. The only strange thing is the device name in the "barrier based sync failed" message: sda1-8. Unfortunately, the way Xen works, there is no partition detection in the log, but this at least sounds like sda has 8 partitions...

The stack traces that follow look very much like something deadlocking on flushing. The jbd2 tasks are transactions; what I am not sure about is pgbouncer (what is it supposed to do?). However, it seems to involve aio, and I see there are two patches in 2.6.35-23.36 which address aio completion ordering (coming from 2.6.35.5 upstream stable).

Jay, do you know what pgbouncer is doing? Maybe it is something not used in the common images. If so, it probably makes sense to test with that newer kernel version.
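
A quick way to tell whether a given instance already runs a kernel at or past 2.6.35-23.36 is sketched below. It assumes the running version is available as an Ubuntu-style string; the sample value "2.6.35-22.35" and the helper names are hypothetical and not taken from the reporter's logs.

```python
import re

FIXED_IN = "2.6.35-23.36"  # version with the aio patches mentioned above

def version_key(version):
    """Turn '2.6.35-23.36' into (2, 6, 35, 23, 36) so versions can be ordered."""
    return tuple(int(n) for n in re.findall(r"\d+", version))

def has_aio_fixes(running):
    """True if the running kernel version is at or past FIXED_IN."""
    return version_key(running) >= version_key(FIXED_IN)

# Hypothetical running version, for illustration only.
print(has_aio_fixes("2.6.35-22.35"))
```

This only compares version numbers. It does not prove the aio patches are what fixes the hang, so an actual test boot is still needed.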