Comment 6 for bug 1827238

Blake Rouse (blake-rouse) wrote : Re: 2.6beta2: many nodes failed deployment with time out

Do you have a tcpdump of the network traffic? We need one to determine what the issue is. We need to inspect the HTTP headers that are being sent from nginx to the client, specifically the "Content-Length" header, to see if there is a mismatch there.

It is possible that the correct Content-Length is being sent to the client, but either the client is closing the HTTP connection too soon or a TCP reset is occurring. Either case will still be reported as a 200 response on the server side, because the response status was 200. The difference will be whether the actual amount of data sent to the client matches the Content-Length header of the HTTP response.

Knowing whether those two values mismatch will be the first step, because there are a few possible outcomes:

1. The server is reading the file incorrectly and setting a Content-Length that does not match the actual file length.
2. The server is sending the file and something causes the file handle to be closed, so the streaming stops partway through (I would expect an entry in the nginx error log if this were to occur, but maybe not).
3. The client is closing the connection before reading all the data being streamed from the server.
4. A TCP reset is occurring and breaking the TCP connection, giving the same result as #3 but caused by a TCP reset instead of a client disconnect.
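
As a rough client-side check for the outcomes above, something like the following Python sketch could fetch one of the boot resources and compare the Content-Length header against the number of body bytes actually received. This is only an illustration and not part of MAAS: the function names are made up here, and the URL to test has to be supplied by whoever runs it. It does not replace the tcpdump, since it cannot tell which side ended the connection.

```python
import urllib.request


def count_body_bytes(stream, chunk_size=64 * 1024):
    """Read a file-like object to EOF in chunks and return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def describe_transfer(declared, received):
    """Summarize how the declared Content-Length compares to bytes received."""
    if declared is None:
        return f"no Content-Length header; received {received} bytes"
    if received == declared:
        return f"complete: {received} bytes"
    return f"MISMATCH: Content-Length {declared}, received {received} bytes"


def check_url(url):
    """Download url and report whether the body matched Content-Length.

    The url is whatever boot resource is being tested (an assumption of
    this sketch; nothing here knows about MAAS URLs).
    """
    with urllib.request.urlopen(url) as response:
        header = response.headers.get("Content-Length")
        declared = int(header) if header is not None else None
        try:
            received = count_body_bytes(response)
        except ConnectionError as exc:
            # e.g. ConnectionResetError when a TCP reset arrives mid-transfer
            return f"connection error mid-transfer: {exc!r}"
    return describe_transfer(declared, received)
```

Calling check_url() with the URL of a file that a failing node was downloading, then comparing the reported byte counts with the file size on disk, should help narrow down which of the cases above applies.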

Also, please provide a listing of the whole tree structure of /var/lib/maas/boot-resources with file sizes in bytes, so the Content-Length can be compared to the actual file size.
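
If it helps, here is one possible way to produce that listing, as a small Python sketch. The function name is made up for illustration. It follows symlinked files so the size reported is that of the target, but it does not descend into symlinked directories, and it reports unreadable entries instead of failing.

```python
import os


def list_file_sizes(root):
    """Return sorted (relative_path, size_in_bytes) pairs for files under root.

    A size of None means the entry could not be read (for example a
    dangling symlink or a permissions problem).
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)  # follows symlinks to files
            except OSError:
                size = None
            entries.append((os.path.relpath(path, root), size))
    return sorted(entries)


if __name__ == "__main__":
    # Path taken from the request above; run as a user that can read it.
    for rel_path, size in list_file_sizes("/var/lib/maas/boot-resources"):
        print(f"{size if size is not None else 'unreadable':>14}  {rel_path}")
```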