I saw the same thing happen when we were running the playbooks on a 160-node cluster. Some tasks within the plays would intermittently fail with this error, forcing us to re-run the play with the --rejoin flag. At first I thought we were saturating the network. The problem doesn't seem to show up unless you run the plays on a big cluster. I'll keep an eye out for any of the theories stated above next time I'm running the playbooks on this cluster.