Comment 31 for bug 1532823

Aleksandr Didenko (adidenko) wrote :

OK, so here's what's going on:

1) During the netconfig.pp task, nailgun-agent starts collecting info about the system. This can take some time (roughly 10-40 seconds).

2) Netconfig reconfigures NICs/IPs; in particular, it moves the admin IP from enp0s3 to the br-fw-admin bridge.

3) If nailgun-agent calls the _master_ip_and_mac() function at the exact moment when the admin IP is already down on enp0s3 and not yet up on br-fw-admin, it can't find the admin interface and falls back to the ohai_system_info() defaults:
https://github.com/openstack/fuel-nailgun-agent/blob/76f48ff6c6a3996a7800a34cd97c5bfd4539107f/agent#L775-L778

4) nailgun-agent then sends the wrong MAC and IP to nailgun. Because the rest of the agent's run takes some time, br-fw-admin is already configured by the time it reports, so the agent is able to connect to the master node and send the wrong info.

I suggest failing the nailgun-agent run if it can't find the admin MAC and IP. That is much safer than sending a random MAC/IP to nailgun as the node's new main MAC/IP. If we simply fail, nailgun-agent will be able to collect the correct info during the very next run.
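
To make the suggestion concrete, here is a rough sketch of the "fail instead of falling back" behaviour. It is not the actual agent code: find_admin_ip_and_mac, AdminInterfaceNotFound, the interface hash layout and the idea of matching against an admin network CIDR are all made up for illustration, and the real _master_ip_and_mac() may detect the admin interface differently.

```ruby
require 'ipaddr'

# Hypothetical error type for illustration; not part of nailgun-agent.
class AdminInterfaceNotFound < StandardError; end

# interfaces: array of hashes (assumed layout), e.g.
#   { name: 'br-fw-admin', mac: '52:54:00:aa:bb:cc', ips: ['10.20.0.3'] }
# admin_cidr: admin network in CIDR form, e.g. '10.20.0.0/24' (assumed input)
def find_admin_ip_and_mac(interfaces, admin_cidr)
  admin_net = IPAddr.new(admin_cidr)
  interfaces.each do |iface|
    iface[:ips].each do |ip|
      # First address that belongs to the admin network wins.
      return { ip: ip, mac: iface[:mac] } if admin_net.include?(IPAddr.new(ip))
    end
  end
  # No fallback to system defaults: abort this run so the next one can retry.
  raise AdminInterfaceNotFound, "no interface has an address in #{admin_cidr}"
end
```

If this is called with a snapshot taken in the middle of the netconfig move (neither enp0s3 nor br-fw-admin holds an admin address), it raises instead of returning some other interface's MAC/IP, so nothing wrong gets reported and the next agent run picks up br-fw-admin.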