Comment 4 for bug 1953478

Slawek Kaplonski (slaweq) wrote:

I spent some time investigating that issue today and here is what I found so far.

In the nova-compute logs of the 2 failed cases which I checked, I noticed that the sequence of events was:

Nov 23 19:22:05.162249 ubuntu-focal-ovh-bhs1-0027471441 nova-compute[58433]: INFO nova.compute.manager [None req-aad28555-9fa1-441f-9239-2a1299e35298 tempest-ServersNegativeTestJSON-724065228 tempest-ServersNegativeTestJSON-724065228-project] [instance: 1a207544-1228-4ec1-ad99-4b6f7b6a5ea1] Shelving

Nov 23 19:22:09.112566 ubuntu-focal-ovh-bhs1-0027471441 nova-compute[58433]: INFO nova.compute.manager [None req-aad28555-9fa1-441f-9239-2a1299e35298 tempest-ServersNegativeTestJSON-724065228 tempest-ServersNegativeTestJSON-724065228-project] [instance: 1a207544-1228-4ec1-ad99-4b6f7b6a5ea1] Shelve offloading

Nov 23 19:22:12.795123 ubuntu-focal-ovh-bhs1-0027471441 nova-compute[58433]: INFO nova.compute.manager [None req-7861da7e-4733-4cda-9a8d-aa569ef301f6 tempest-ServersNegativeTestJSON-724065228 tempest-ServersNegativeTestJSON-724065228-project] [instance: 1a207544-1228-4ec1-ad99-4b6f7b6a5ea1] Unshelving

And after "Unshelving" log in the nova-compute, I saw in the neutron server logs that port was switched to active but there was no notification send to nova then (I don't know why):

Nov 23 19:22:42.686159 ubuntu-focal-ovh-bhs1-0027471439 neutron-server[95486]: DEBUG neutron.plugins.ml2.rpc [None req-9ab9f406-90d2-4bf0-9f6f-37f98968d32d None None] Device 46ceed9e-1262-47c2-b7dc-335a31f78b71 up at agent ovs-agent-ubuntu-focal-ovh-bhs1-0027471441 {{(pid=95486) update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:296}}
...
Nov 23 19:22:42.811808 ubuntu-focal-ovh-bhs1-0027471439 neutron-server[95486]: DEBUG neutron.db.provisioning_blocks [None req-9ab9f406-90d2-4bf0-9f6f-37f98968d32d None None] Provisioning complete for port 46ceed9e-1262-47c2-b7dc-335a31f78b71 triggered by entity L2. {{(pid=95486) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:139}}
Nov 23 19:22:42.812114 ubuntu-focal-ovh-bhs1-0027471439 neutron-server[95486]: DEBUG neutron_lib.callbacks.manager [None req-9ab9f406-90d2-4bf0-9f6f-37f98968d32d None None] Publish callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-3745000'] for port, provisioning_complete {{(pid=95486) _notify_loop /usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py:175}}
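
Based on these logs, my current guess (not confirmed yet) is that when the provisioning-complete callback (Ml2Plugin._port_provisioned above) runs, the port is already ACTIVE, so there is no status transition left to report and the notifier towards nova never fires. A minimal, purely illustrative sketch of that suspected path (this is not the actual neutron code, names are made up):

PORT_STATUS_ACTIVE = 'ACTIVE'


def port_provisioned(port, notify_nova):
    # Called once all provisioning blocks for the port are cleared
    # (the "Provisioning complete ... triggered by entity L2" log above).
    if port['status'] == PORT_STATUS_ACTIVE:
        # Suspected failure mode: the port never went back to DOWN (e.g.
        # the unshelve re-used the old binding on the same host), so there
        # is no DOWN -> ACTIVE transition and nothing is reported to nova.
        return
    port['status'] = PORT_STATUS_ACTIVE
    # It is this status transition that would normally end up as the
    # notification ("network-vif-plugged" external event) towards nova.
    notify_nova(port['id'], 'network-vif-plugged')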

I compared it with a case where the test passed, and there in the nova-compute logs I can see:

Dec 22 09:08:21.469791 ubuntu-focal-iweb-mtl01-0027794527 nova-compute[57704]: INFO nova.compute.manager [None req-a6373604-981c-4cca-b6a8-85abdc256ad0 tempest-ServersNegativeTestJSON-764009652 tempest-ServersNegativeTestJSON-764009652-project] [instance: 8b1d44ec-c81c-476f-95b0-624a8b62805c] Shelving

Dec 22 09:08:24.621978 ubuntu-focal-iweb-mtl01-0027794527 nova-compute[57704]: INFO nova.compute.manager [None req-a6373604-981c-4cca-b6a8-85abdc256ad0 tempest-ServersNegativeTestJSON-764009652 tempest-ServersNegativeTestJSON-764009652-project] [instance: 8b1d44ec-c81c-476f-95b0-624a8b62805c] Shelve offloading

But unshelving happened on the other node:
Dec 22 09:08:28.417211 ubuntu-focal-iweb-mtl01-0027794526 nova-compute[91278]: INFO nova.compute.manager [None req-dfd80af1-fdae-412b-bf4c-ae17882c70d8 tempest-ServersNegativeTestJSON-764009652 tempest-ServersNegativeTestJSON-764009652-project] [instance: 8b1d44ec-c81c-476f-95b0-624a8b62805c] Unshelving

In that case the notification is sent properly, as the binding of the port was changed.
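
To make the difference between the two cases explicit, here is a rough model of what I think happens (again only an illustration, not neutron code): only a host change rebinds the port and resets its status, so only then is there a transition for neutron to notify nova about.

def unbind_and_rebind(port, new_host):
    # Returns True if the rebinding produced a status transition that
    # neutron would later report to nova, False otherwise.
    if port['binding_host'] == new_host:
        # Same-host unshelve (the failing case): binding unchanged, the port
        # can stay ACTIVE the whole time, nothing left to notify about.
        return False
    port['binding_host'] = new_host
    port['status'] = 'DOWN'  # new binding starts DOWN until the agent wires it up
    return True


# Passing case: unshelve lands on a different node -> True (notification sent)
print(unbind_and_rebind({'binding_host': 'node-0027794527', 'status': 'ACTIVE'},
                        'node-0027794526'))
# Failing case: unshelve lands on the same node -> False (no notification)
print(unbind_and_rebind({'binding_host': 'node-0027471441', 'status': 'ACTIVE'},
                        'node-0027471441'))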

I think this may be the problem (maybe some race condition) when the instance is unshelved on the same node on which it was shelved.
I will continue investigating this.
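
For completeness, the reason a missing notification matters here (as I understand it, still to be double-checked): during unshelve, nova-compute waits for the "network-vif-plugged" external event from neutron and gives up after a timeout if it never arrives. Roughly (a toy sketch, not nova code, the timeout value is made up):

import queue

VIF_PLUG_TIMEOUT = 300  # seconds, illustrative value only


def wait_for_vif_plugged(events):
    # "events" stands in for the stream of external events nova receives
    # from neutron for this instance during unshelve.
    try:
        event = events.get(timeout=VIF_PLUG_TIMEOUT)
    except queue.Empty:
        # This is what the failed runs would hit: the port was marked
        # ACTIVE on the neutron side but the event was never sent.
        raise TimeoutError('timed out waiting for network-vif-plugged')
    return event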