Comment 3 for bug 541371

Jeff Hill (johill-lanl) wrote:

> The output from inetstatShow some time later (see below) demonstrates that the
> network buffers were probably all being used up by requests from CA clients not
> being handled.

Since the receive side is full, this does point to some issue upstream of the IP kernel.

> PCB Proto Recv-Q Send-Q Local Address Foreign Address (state)
> 1d4c958 TCP 8056 0 164.54.3.153.5064 164.54.3.75.45183

It's significant that at least 8 of the server's receive threads are not keeping up with their requests. That might indicate that a global lock in the server or in the database was compromised or deadlocked, or that a high-priority thread (possibly a device driver or tNetTask) is using all of the CPU.
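
If CPU starvation is the suspect, the vxWorks shell can check it directly. A minimal sketch, assuming the standard taskShow and spyLib facilities are linked into the target (the report interval below is just an illustrative value):

    -> i                /* summary of all tasks: priority, status, PC */
    -> spy 10           /* periodic per-task CPU usage reports, every 10 seconds */
       ( let it run through a stall )
    -> spyStop          /* stop the reports */

If tNetTask or a driver task shows up pinned near 100% while the CAS tasks show ~0%, that would fit the starvation theory.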

A stack trace from one of the stuck receive threads in the server would be very useful information.
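
On vxWorks that trace can be pulled from the shell without stopping the IOC. A minimal sketch, where taskId is a placeholder for the stuck task's ID as reported by i:

    -> i                /* list tasks; note the ID of the suspect CAS receive task */
    -> ti taskId        /* task info: state, priority, registers, stack bounds */
    -> tt taskId        /* print that task's call stack */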

Furthermore, one would expect the send queue, and not the receive queue, to be full if something odd were going on with dbEvent.c. In the normal quiescent condition, which runs 24/7, the send queue is more or less empty. However, if the system is down on MBUFs, then one could imagine that the dbEvent.c issue occurred in the past, when a TCP output queue was temporarily stalled.
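
Whether the system really is down on MBUFs is also easy to check from the shell. A minimal sketch using the stock vxWorks 5.x network-show routines (assuming netShow support is linked in):

    -> mbufShow                 /* mbuf/cluster usage: free vs. total per size */
    -> netStackDataPoolShow     /* network data pool (clusters carrying packet data) */
    -> netStackSysPoolShow      /* network system pool (sockets, PCBs, routes) */

If the free counts there sit at or near zero while the Recv-Q stays pinned, that would support the stalled-consumer picture above.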

edited on: 2009-06-04 16:46