Comment 0 for bug 1836253

william (wfelipew) wrote :

Sometimes on instance initialization, the metadata step fails.

In metadata-agent.log there are lots of 404s:
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404 len: 297 time: 0.0771070

In nova-api.log we see 404s too:
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404

After some debugging we found that the problem occurs when a new instance gets the same IP that was used by a deleted instance.
The problem is in the cache used by the method "_get_instance_and_tenant_id()" in "/neutron/agent/metadata/agent.py": it returns the port of the deleted instance (which had the same IP), so the wrong instance ID is sent to nova-api, which fails because that instance ID no longer exists.
This problem only occurs with caching enabled in the neutron metadata-agent.
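The mechanism can be illustrated with a toy sketch (this is not the agent's code, just a simulation of an IP-keyed cache whose entries are not invalidated when an instance is deleted):

```shell
# Toy illustration: the agent caches "fixed IP -> port/instance" lookups.
declare -A cache                    # ip -> instance id
cache[192.168.10.50]="instance-A"   # cached lookup result for the old VM

# instance-A is now deleted and a new VM is assigned 192.168.10.50,
# but until the TTL expires the cached entry still wins:
echo "${cache[192.168.10.50]}"      # still instance-A; nova-api answers 404
```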

Version: Queens

How to reproduce:
---
#!/bin/bash

computenodelist=(
  'computenode00.test.openstack.net'
  'computenode01.test.openstack.net'
  'computenode02.test.openstack.net'
  'computenode03.test.openstack.net'
)

validate_metadata(){
cat << EOF > /tmp/metadata
#!/bin/sh -x
if curl 192.168.10.2
then
 echo "ControllerNode00 - OK"
else
 echo "ControllerNode00 - ERROR"
fi
EOF

  source /root/admin-openrc
  openstack server delete "${node}" 2>/dev/null
  openstack server create --image cirros --nic net-id=internal --flavor Cirros --security-group default --user-data /tmp/metadata --availability-zone "nova:${node}" --wait "${node}" &> /dev/null

  i=0
  until [ $i -gt 3 ] || openstack console log show "${node}" | grep -q "ControllerNode00"
  do
    i=$((i+1))
    sleep 1
  done
  if openstack console log show "${node}" | grep -q "ControllerNode00 - OK"; then
        echo "Metadata Servers OK: ${node}"
  else
        echo "Metadata Servers ERROR: ${node}"
  fi

  rm /tmp/metadata
}

for node in "${computenodelist[@]}"
do
  export node
  validate_metadata
done
echo -e "\n"
---