error on nova boot failed to associate instance to baremetal node

Bug #1177596 reported by Robert Collins
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

To verify that you have this bug:
select * from nova_bm.bm_nodes where uuid='$SOMEUUID' \G
If the instance_uuid returned is not present in nova.instances, or is present but has deleted == id for that row, then the state is invalid: nova has deleted the instance but nova baremetal has not.

To fix:
update nova_bm.bm_nodes set instance_uuid=NULL where instance_uuid='thebaduuidreturnedbytheearlierquery';
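The verify step above can be scripted. The sketch below is only an illustration of the consistency check, using an in-memory SQLite database as a stand-in for the real nova and nova_bm MySQL schemas, with the tables reduced to just the columns the check needs:

```python
import sqlite3

# Toy stand-ins for the nova and nova_bm databases, used only to
# illustrate the check from the bug description.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE instances (uuid TEXT, deleted INTEGER, id INTEGER);
    CREATE TABLE bm_nodes (uuid TEXT, instance_uuid TEXT);
    -- inst-A is live; inst-B was soft-deleted (deleted == id)
    INSERT INTO instances VALUES ('inst-A', 0, 1), ('inst-B', 2, 2);
    INSERT INTO bm_nodes VALUES ('node-1', 'inst-A'), ('node-2', 'inst-B');
""")

def stale_associations(db):
    """Return bm_nodes rows whose instance_uuid points at a missing
    or deleted nova instance -- the inconsistent state this bug leaves."""
    return [row[0] for row in db.execute("""
        SELECT n.uuid FROM bm_nodes n
        LEFT JOIN instances i ON i.uuid = n.instance_uuid
        WHERE n.instance_uuid IS NOT NULL
          AND (i.uuid IS NULL OR i.deleted = i.id)
    """)]

print(stale_associations(db))  # -> ['node-2']
```

Each UUID the sketch reports is a candidate for the `update ... set instance_uuid=NULL` fix above.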

2013-05-07 23:26:57,899.899 30255 ERROR nova.compute.manager [req-2903f513-9892-41ce-aa5e-e6e7b656bee9 baa113f6f7994ddd9c7d86945768616e 0d4df5d4fee24f18b8b5f425eb81e0c4] [instance: 6d0128f8-a9fb-4036-974d-4caaa915d45d] Error: ['Traceback (most recent call last):\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 942, in _build_instance\n set_access_ip=set_access_ip)\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1204, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1200, in _spawn\n block_device_info)\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 237, in spawn\n \'task_state\': baremetal_states.BUILDING})\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/api.py", line 121, in bm_node_associate_and_update\n return IMPL.bm_node_associate_and_update(context, node_uuid, values)\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 97, in wrapper\n return f(*args, **kwargs)\n', ' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/sqlalchemy/api.py", line 214, in bm_node_associate_and_update\n \'n_uuid\': node_uuid})\n', 'NovaException: Failed to associate instance 6d0128f8-a9fb-4036-974d-4caaa915d45d to baremetal node a476f747-6f05-469f-8a40-a7e38b7499c0.\n']
2013-05-07 23:26:57,938.938 1475 ERROR nova.scheduler.filter_scheduler [req-2903f513-9892-41ce-aa5e-e6e7b656bee9 baa113f6f7994ddd9c7d86945768616e 0d4df5d4fee24f18b8b5f425eb81e0c4] [instance: 6d0128f8-a9fb-4036-974d-4caaa915d45d] Error from last host: ubuntu (node a476f747-6f05-469f-8a40-a7e38b7499c0): [u'Traceback (most recent call last):\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 942, in _build_instance\n set_access_ip=set_access_ip)\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1204, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', u' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1200, in _spawn\n block_device_info)\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 237, in spawn\n \'task_state\': baremetal_states.BUILDING})\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/api.py", line 121, in bm_node_associate_and_update\n return IMPL.bm_node_associate_and_update(context, node_uuid, values)\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 97, in wrapper\n return f(*args, **kwargs)\n', u' File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/sqlalchemy/api.py", line 214, in bm_node_associate_and_update\n \'n_uuid\': node_uuid})\n', u'NovaException: Failed to associate instance 6d0128f8-a9fb-4036-974d-4caaa915d45d to baremetal node a476f747-6f05-469f-8a40-a7e38b7499c0.\n']

Tags: baremetal
Revision history for this message
Robert Collins (lifeless) wrote :

 select * from bm_nodes;
+---------------------+---------------------+------------+---------+----+------+-----------+----------+-------------+---------------+-------------+--------------+------------------+--------------------------------------+------------+---------------+------------------------------------------------+-------------------------------------------------------+----------------------------------+---------+---------+--------------------------------------+---------------+
| created_at | updated_at | deleted_at | deleted | id | cpus | memory_mb | local_gb | pm_address | pm_user | pm_password | service_host | prov_mac_address | instance_uuid | task_state | terminal_port | image_path | pxe_config_path | deploy_key | root_mb | swap_mb | uuid | instance_name |
+---------------------+---------------------+------------+---------+----+------+-----------+----------+-------------+---------------+-------------+--------------+------------------+--------------------------------------+------------+---------------+------------------------------------------------+-------------------------------------------------------+----------------------------------+---------+---------+--------------------------------------+---------------+
| 2013-05-07 02:18:07 | 2013-05-07 22:14:01 | NULL | 0 | 1 | 1 | 512 | 10 | x.x.x.47 | Administrator | xxx | ubuntu | NULL | 0690eecc-ef18-4804-be34-206b122724f1 | error | NULL | /var/lib/nova/instances/instance-00000005/disk | /tftpboot/0690eecc-ef18-4804-be34-206b122724f1/config | QIZMMZKBKGABI3NI24JPQQO2A221ZCFO | 10240 | 1 | a3b04bf3-351f-4f7d-8d7b-878dec5b796d | foo |
| 2013-05-07 03:15:24 | 2013-05-07 03:29:25 | NULL | 0 | 2 | 1 | 512 | 10 | x.x.x.46 | Administrator | xxx | ubuntu | NULL | b74cbb15-edf8-4dbc-911c-430c0ff60a31 | building | NULL | /var/lib/nova/instances/instance-00000003/disk | /tftpboot/b74cbb15-edf8-4dbc-911c-430c0ff60a31/config | I0RJ5KCT69L1PCF1NGDQLNFQC8OASBEC | 10240 | 1 | 08bea2d7-e0ac-4224-8ee8-98495b93a13b | bmtest3 |
| 2013-05-07 03:27:40 | 2013-05-07 20:40:31 | NULL | 0 | 3 | 1 | 512 | 10 | x.x.x.45 | Administrator | xxx | ubuntu | NULL | 952b9558-c550-4b9d-9f9e-bcf152835161 | building | NULL | /var/lib/nova/instances/instance-00000004/disk | /tftpboot/952b9558-c550-4b9d-9f9e-bcf152835161/config | 4PRO1Q1Z249Z65C0NQYVSNK7BDGV52GN | 10240 | 1 | f665421b-65f4-4a6c-82d5-4e54bb647cff | bmtest-cmsj1 |
| 2013-05-07 22:27:31 | 2013-05-07 22:49:32 | NULL | 0 | 6 | 1 | 4096 | 20 | x.x.x.49 | Administrator | xxx | ubuntu | NULL | fdbb2896-0d00-465a-bb93-8fa7a89672e3 | building | NULL | /var/lib/nova/instances/instance-0000000c/disk | /tftpboot/fdbb2896-0d00-465a-bb93-8fa7a89672e3/config | 99MV3CH4CF9IERWH4WWKQD0H8VOMTUZ2 | 10240 | 1 | a476f74...


Revision history for this message
aeva black (tenbrae) wrote :

This error came from nova/virt/baremetal/db/sqlalchemy/api.py:

203     with session.begin():
204         query = model_query(context, models.BareMetalNode,
205                             session=session, read_deleted="no").\
206                         filter_by(uuid=node_uuid)
207
208         count = query.filter_by(instance_uuid=None).\
209                         update(values, synchronize_session=False)
210         if count != 1:
211             raise exception.NovaException(_(
212                 "Failed to associate instance %(i_uuid)s to baremetal node "
213                 "%(n_uuid)s.") % {'i_uuid': values['instance_uuid'],
214                                   'n_uuid': node_uuid})

It seems that the scheduler tried to allocate a new instance to node a476f747-6f05-469f-8a40-a7e38b7499c0, which is already associated to instance fdbb2896-0d00-465a-bb93-8fa7a89672e3. I'm not sure what could lead to this circumstance, but the error appears to be valid.
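The guarded UPDATE at lines 208-209 is effectively a compare-and-swap: it only modifies the row while instance_uuid is still NULL, so a node that already carries an association leaves count at 0 and trips the exception. A minimal sketch of the same claim pattern, using Python's sqlite3 in place of the real MySQL/SQLAlchemy stack:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bm_nodes (uuid TEXT, instance_uuid TEXT)")
db.execute("INSERT INTO bm_nodes VALUES ('node-1', NULL)")

def associate(db, node_uuid, instance_uuid):
    # Guarded UPDATE: claims the node only if no instance holds it yet,
    # mirroring filter_by(instance_uuid=None).update(...) in api.py.
    cur = db.execute(
        "UPDATE bm_nodes SET instance_uuid = ? "
        "WHERE uuid = ? AND instance_uuid IS NULL",
        (instance_uuid, node_uuid))
    if cur.rowcount != 1:
        raise RuntimeError(
            "Failed to associate instance %s to baremetal node %s."
            % (instance_uuid, node_uuid))

associate(db, "node-1", "inst-A")      # first claim succeeds
try:
    associate(db, "node-1", "inst-B")  # second claim hits the error path
except RuntimeError as e:
    print(e)
```

This is why the error persists across every subsequent `nova boot`: until something sets instance_uuid back to NULL, the guard can never match again.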

Revision history for this message
Robert Collins (lifeless) wrote :

Once I triggered this, all subsequent attempts to 'nova boot' spat it out. I had to reset my environment to a brand-new state to boot anything.

Revision history for this message
Robert Collins (lifeless) wrote :

Also, this only turned up when I had one (maybe more?) instances stuck in 'deleting', which there is another bug for. It may only be reproducible in that state.

Revision history for this message
aeva black (tenbrae) wrote :

It would be helpful to know how to reproduce this problem - the scheduler shouldn't have picked an already-allocated node - but I suspect it is related to one or both of these other bugs:

https://bugs.launchpad.net/nova/+bug/1178156
https://bugs.launchpad.net/nova/+bug/1177584

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

I bumped into this bug today

The logs are here: http://paste.openstack.org/show/37326/

Looking at the nova-api.log I can see this warning message: "instance's host ubuntu is down, deleting from database". I'm not familiar with the nova code, but looking at api.py, the _delete method calls the _local_delete method when the compute node is down, and I don't know whether _local_delete actually deletes anything from the nova_bm database.

if not is_up:
    # If compute node isn't up, just delete from DB
    self._local_delete(context, instance, bdms)
    if reservations:
        QUOTAS.commit(context,
                      reservations,
                      project_id=project_id)
        reservations = None
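If _local_delete only touches nova's own database, the association in nova_bm is never cleared, which would produce exactly the stale row this bug describes. A toy sketch of that dangling-reference scenario (table names borrowed from the bug; the functions are hypothetical stand-ins, not nova code):

```python
import sqlite3

# Hypothetical stand-ins: 'instances' for nova's DB, 'bm_nodes' for nova_bm.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE instances (uuid TEXT)")
db.execute("CREATE TABLE bm_nodes (uuid TEXT, instance_uuid TEXT)")
db.execute("INSERT INTO instances VALUES ('inst-A')")
db.execute("INSERT INTO bm_nodes VALUES ('node-1', 'inst-A')")

def local_delete(db, instance_uuid):
    # Mirrors the suspected behaviour: the instance row goes away,
    # but the baremetal association is never cleaned up.
    db.execute("DELETE FROM instances WHERE uuid = ?", (instance_uuid,))

def delete_with_cleanup(db, instance_uuid):
    # What a complete delete path would also need to do.
    db.execute("DELETE FROM instances WHERE uuid = ?", (instance_uuid,))
    db.execute("UPDATE bm_nodes SET instance_uuid = NULL "
               "WHERE instance_uuid = ?", (instance_uuid,))

local_delete(db, "inst-A")
dangling = db.execute("SELECT instance_uuid FROM bm_nodes").fetchone()[0]
print(dangling)  # 'inst-A' -- the node is still claimed by a deleted instance
```

With the node still claimed, the guarded UPDATE in bm_node_associate_and_update can never match, so every later boot fails.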

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

As Robert pointed out, after this error every "nova boot" will fail. The only way I found to get it working again is to change the database directly:

Check whether the instance UUID is still associated with your bm node's UUID, and if so set it to NULL.

mysql> select instance_uuid,uuid from bm_nodes ;
+--------------------------------------+--------------------------------------+
| instance_uuid | uuid |
+--------------------------------------+--------------------------------------+
| 4fc39c94-e87b-4042-9053-c7774436c34f | 1739bc00-f1ce-4a47-b8f9-fdeee14af152 |
| NULL | 04901946-f4d4-43a4-87a3-d1cd247ef24c |
| NULL | d4ccc706-8246-4839-8659-e2b72af5f1ff |
| NULL | 3df8e64c-6d59-43fe-abb6-48c5db7e7360 |
+--------------------------------------+--------------------------------------+
4 rows in set (0.00 sec)

mysql> update bm_nodes set instance_uuid=NULL where instance_uuid="4fc39c94-e87b-4042-9053-c7774436c34f";

Revision history for this message
Robert Collins (lifeless) wrote :

devananda says this is to be expected - it's part of the nova plumbing. What isn't desired or expected is for this to cause a failure at the 'nova boot' layer.

Revision history for this message
Robert Collins (lifeless) wrote :

I have triggered this when I had the virt power driver misconfigured (e.g. with the wrong user). nova boot will fail, and nova delete will then leave the instance_uuid set in the baremetal row.

Subsequent 'nova boot' attempts give the error reported in this bug.

Revision history for this message
Robert Collins (lifeless) wrote :

Joe and I looked at this in the Seattle sprint, it still happens :(.

description: updated
Revision history for this message
Joe Gordon (jogo) wrote :

nova baremetal is gone

Changed in nova:
status: Triaged → Won't Fix