Comment 3 for bug 1830763

Revision history for this message
Miguel Lavalle (minsel) wrote :

Looking at http://logs.openstack.org/14/668914/1/check/neutron-tempest-plugin-dvr-multinode-scenario/f2ce738/, we can see failure of test_connectivity_through_2_routers due to haproxy-metadata-proxy not being ready when an instance requests its metadata:

1) Instance b97e5dcc-7408-4ed4-8cef-95e65a31dec5, fixed IP 10.10.220.79. This instance landed in the Controller node and succeeds in getting metadata

2) Instance 62f21169-36c3-418d-a1cb-eec491810425, fixed IP 10.10.210.94. This instance landed in the Compute1 node and failed getting its metadata:

Sending discover...
Sending select for 10.10.210.94...
Lease of 10.10.210.94 obtained, lease time 86400
route: SIOCADDRT: Invalid argument
WARN: failed: route add -net "169.254.169.254/32" gw "10.10.210.254"
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "10.10.210.254"
cirros-ds 'net' up at 11.29
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 11.70. request failed
failed 2/20: up 17.21. request failed
failed 3/20: up 20.19. request failed
failed 4/20: up 25.53. request failed
failed 5/20: up 28.54. request failed
failed 6/20: up 33.91. request failed
failed 7/20: up 36.92. request failed
failed 8/20: up 42.31. request failed
failed 9/20: up 45.31. request failed
failed 10/20: up 50.67. request failed
failed 11/20: up 53.65. request failed
failed 12/20: up 59.04. request failed
failed 13/20: up 62.01. request failed
failed 14/20: up 67.37. request failed
failed 15/20: up 70.38. request failed
failed 16/20: up 75.77. request failed
failed 17/20: up 78.78. request failed
failed 18/20: up 84.15. request failed
failed 19/20: up 87.17. request failed
failed 20/20: up 92.54. request failed
failed to read iid from metadata. tried 20

3) The routers are c9d4b134-8064-46b9-94a7-d4a6fd775edb:

2019-07-05 09:20:43,293 4546 INFO [tempest.lib.common.rest_client] Request (NetworkConnectivityTest:test_connectivity_through_2_routers): 201 POST http://10.209.129.140:9696/v2.0/routers 4.268s
2019-07-05 09:20:43,293 4546 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router": {"external_gateway_info": {"network_id": "ace47216-82e5-47cc-b721-8ffec818f30f"}, "name": "tempest-ap1_rt-1161204222", "admin_state_up": true}}
    Response - Headers: {'content-type': 'application/json', 'content-length': '753', 'x-openstack-request-id': 'req-dbb3cd88-32e4-438b-8717-905e721a1a50', 'date': 'Fri, 05 Jul 2019 09:20:43 GMT', 'connection': 'close', 'status': '201', 'content-location': 'http://10.209.129.140:9696/v2.0/routers'}
        Body: b'{"router": {"id": "c9d4b134-8064-46b9-94a7-d4a6fd775edb"

and 5a4d9551-0650-4ed1-9d18-2aaff5a5bc1f:

019-07-05 09:20:43,424 4546 INFO [tempest.lib.common.rest_client] Request (NetworkConnectivityTest:test_connectivity_through_2_routers): 201 POST http://10.209.129.140:9696/v2.0/routers 0.130s
2019-07-05 09:20:43,424 4546 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router": {"external_gateway_info": {}, "name": "tempest-ap2_rt-29568820", "admin_state_up": true}}
    Response - Headers: {'content-type': 'application/json', 'content-length': '486', 'x-openstack-request-id': 'req-7bd24508-d790-4b6a-941f-45a481bbe662', 'date': 'Fri, 05 Jul 2019 09:20:43 GMT', 'connection': 'close', 'status': '201', 'content-location': 'http://10.209.129.140:9696/v2.0/routers'}
        Body: b'{"router": {"id": "5a4d9551-0650-4ed1-9d18-2aaff5a5bc1f

4) Instance that failed getting metadata (62f21169-36c3-418d-a1cb-eec491810425 in node Compute1) is ACTIVE according to Nova at:

2019-07-05 09:22:04,164 4546 INFO [tempest.lib.common.rest_client] Request (NetworkConnectivityTest:test_connectivity_through_2_routers): 200 GET http://10.209.129.140/compute/v2.1/servers/62f21169-36c3-418d-a1cb-eec491810425 0.385s
2019-07-05 09:22:04,164 4546 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'date': 'Fri, 05 Jul 2019 09:22:03 GMT', 'server': 'Apache/2.4.29 (Ubuntu)', 'content-length': '1525', 'content-type': 'application/json', 'openstack-api-version': 'compute 2.1', 'x-openstack-nova-api-version': '2.1', 'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', 'x-openstack-request-id': 'req-fdf8d58c-4c84-4a94-a645-2b97893bfc93', 'x-compute-request-id': 'req-fdf8d58c-4c84-4a94-a645-2b97893bfc93', 'connection': 'close', 'status': '200', 'content-location': 'http://10.209.129.140/compute/v2.1/servers/62f21169-36c3-418d-a1cb-eec491810425'}
        Body: b'{"server": {"id": "62f21169-36c3-418d-a1cb-eec491810425", "name": "tempest-server-test-1125157758", "status": "ACTIVE",

5) haproxy-metadata-proxys for the two routers in Compute1 ARE CREATED AFTER instance 62f21169-36c3-418d-a1cb-eec491810425 (landed on Compute1) became ACTIVE:

Jul 05 09:22:49.441033 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: DEBUG neutron.agent.metadata.driver [-] haproxy_cfg =
Jul 05 09:22:49.441033 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: global
Jul 05 09:22:49.441033 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: log /dev/log local0 debug
Jul 05 09:22:49.441033 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: log-tag haproxy-metadata-proxy-5a4d9551-0650-4ed1-9d18-2aaff5a5bc1f

and:

Jul 05 09:23:20.860992 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: DEBUG neutron.agent.metadata.driver [-] haproxy_cfg =
Jul 05 09:23:20.860992 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: global
Jul 05 09:23:20.860992 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: log /dev/log local0 debug
Jul 05 09:23:20.860992 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: log-tag haproxy-metadata-proxy-c9d4b134-8064-46b9-94a7-d4a6fd775edb

6) As a contrast, instance b97e5dcc-7408-4ed4-8cef-95e65a31dec5 (Controller node) became active, according to Nova, at:

2019-07-05 09:23:47,274 4546 INFO [tempest.lib.common.rest_client] Request (NetworkConnectivityTest:test_connectivity_through_2_routers): 200 GET http://10.209.129.140/compute/v2.1/servers/b97e5dcc-7408-4ed4-8cef-95e65a31dec5 0.641s
2019-07-05 09:23:47,274 4546 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'date': 'Fri, 05 Jul 2019 09:23:46 GMT', 'server': 'Apache/2.4.29 (Ubuntu)', 'content-length': '1523', 'content-type': 'application/json', 'openstack-api-version': 'compute 2.1', 'x-openstack-nova-api-version': '2.1', 'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', 'x-openstack-request-id': 'req-32af6871-e7c7-4bee-906a-78f3772787ed', 'x-compute-request-id': 'req-32af6871-e7c7-4bee-906a-78f3772787ed', 'connection': 'close', 'status': '200', 'content-location': 'http://10.209.129.140/compute/v2.1/servers/b97e5dcc-7408-4ed4-8cef-95e65a31dec5'}
        Body: b'{"server": {"id": "b97e5dcc-7408-4ed4-8cef-95e65a31dec5", "name": "tempest-server-test-554631732", "status": "ACTIVE"

whereas the corresponding haproxy-metadata-proxys were created at:

Jul 05 09:21:45.208061 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: DEBUG neutron.agent.metadata.driver [-] haproxy_cfg =
Jul 05 09:21:45.208061 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: global
Jul 05 09:21:45.208061 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: log /dev/log local0 debug
Jul 05 09:21:45.208061 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: log-tag haproxy-metadata-proxy-5a4d9551-0650-4ed1-9d18-2aaff5a5bc1f

and:

Jul 05 09:21:55.948830 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: DEBUG neutron.agent.metadata.driver [-] haproxy_cfg =
Jul 05 09:21:55.948830 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: global
Jul 05 09:21:55.948830 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: log /dev/log local0 debug
Jul 05 09:21:55.948830 ubuntu-bionic-rax-ord-0008698665 neutron-l3-agent[20762]: log-tag haproxy-metadata-proxy-c9d4b134-8064-46b9-94a7-d4a6fd775edb

7) It is worth noting that the L3 agent in Compute1 (where the failed instance landed) shows the following error when trying to retrieve from the neutron server data on one of the routers:

Jul 05 09:22:13.458957 ubuntu-bionic-rax-ord-0008698810 neutron-l3-agent[18812]: ERROR neutron.agent.l3.agent [-] Failed to fetch router information for 'c9d4b134-8064-46b9-94a7-d4a6fd775edb': RemoteError: Remote error: InvalidRequestError This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.IntegrityError) (1062, "Duplicate entry '8831ed85-9ccf-48a2-92eb-ab39d3d30e89-ubuntu-bionic-rax-ord-00086' for key 'PRIMARY'")