Comment 2 for bug 1790127

Revision history for this message
Kieran Forde (kieran-forde) wrote :

I've managed to find the root cause for this issue.
Basically since the reboot yesterday *some* users have been experiencing failure to reach metadata from their instances. I tracked this down to a missing PREROUTING rule in the 'qrouter' namespace for the users router. Without this rule an instance will never reach the metadata service.

There are several fixes/patches for this [1] which helps to populate the iptables rules earlier.
This patch only landed in queens and I've requested it to be backported to pike at least.

The delay in fixing this was due to the random nature of the failure which meant our monitoring didn't pick it up. Adding into this a misconfiguration of keepalived which had to be fixed and then the fact that only 2 lines were missing from the iptables rules! This made it hard to spot.

Initial tests showed metadata now reachable and gcerami is testing more completely now.

[1] https://review.openstack.org/#/c/524406/