Instance ends up with multiple IP addresses

Bug #1648851 reported by Michael Petersen
This bug affects 3 people
Affects             Status   Importance   Assigned to   Milestone
Mirantis OpenStack  New      Medium       Unassigned

Bug Description

MOS 9.0

When a nova-compute service was disabled, there were issues placing a few VMs. The node wasn't working properly, which resulted in the error below:

20161129/node-1/nova-conductor.log:2016-11-29T15:19:57.403069+02:00 err: 2016-11-29 15:19:57.402 15799 ERROR nova.scheduler.utils [req-3f545e8f-5a81-4826-93b7-62ed2cb5df5d a9d6111cdec443569e960ee6a25ad2da 0fdb73785f644987b7d76e3bf0952d47 - - -] [instance: f180c69e-49c7-4eda-9eab-81afe10160f7] Error from last host: node-4.data.lt (node node-4.data.lt): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance f180c69e-49c7-4eda-9eab-81afe10160f7 was re-scheduled: internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied\nfailed to initialize KVM: Permission denied\n\n']
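The root cause on that host was libvirt being unable to open /dev/kvm. A minimal diagnostic sketch (hypothetical, not taken from this report) that could be run on the failing compute node to check the device's ownership and permissions:

# Hypothetical check for the "Could not access KVM kernel module: Permission
# denied" failure: confirm /dev/kvm exists and is accessible to the user that
# qemu/libvirt runs as (run this as that user).
import os

DEV = '/dev/kvm'

if not os.path.exists(DEV):
    print('%s is missing -- is the kvm kernel module loaded?' % DEV)
else:
    st = os.stat(DEV)
    print('mode: %o  uid: %d  gid: %d'
          % (st.st_mode & 0o777, st.st_uid, st.st_gid))
    print('readable and writable by current user: %s'
          % os.access(DEV, os.R_OK | os.W_OK))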

The instance was then rescheduled to another node and ended up with multiple IP addresses.

Below is the nova show output for the instance, with some data removed, such as the full IP addresses.

root@node-1:~# nova show f180c69e-49c7-4eda-9eab-81afe10160f7
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | node-5                                                   |
| OS-EXT-SRV-ATTR:hostname             | testdoubleip                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | node-5                                                   |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000ace                                        |
| OS-EXT-SRV-ATTR:kernel_id            |                                                          |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                        |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                          |
| OS-EXT-SRV-ATTR:root_device_name     | /dev/vda                                                 |
| OS-EXT-SRV-ATTR:user_data            | -                                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2016-11-29T13:20:12.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| Public-Internet network              | *.*.150.175, *.*.150.179                                 |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         | True                                                     |
| created                              | 2016-11-29T13:19:48Z                                     |
| description                          | testdoubleip                                             |
| flavor                               | m1.small (2)                                             |
| hostId                               | 053c391872bca5448909951b1e8ec7d20106c0dde6c86727d3b0f557 |
| host_status                          | UP                                                       |
| id                                   | f180c69e-49c7-4eda-9eab-81afe10160f7                     |
| locked                               | False                                                    |
| metadata                             | {}                                                       |
| name                                 | testdoubleip                                             |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
+--------------------------------------+----------------------------------------------------------+

After maintenance on the node was completed, it was put back into rotation and the issue could no longer be replicated.
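
One way to confirm where the extra address came from is to list the Neutron ports bound to the instance UUID; a stale port left over from the failed build attempt would show up as a second entry. A minimal sketch using python-neutronclient (the auth URL and credentials are placeholders, not values from this environment):

# List all Neutron ports whose device_id is the instance UUID. A healthy
# single-NIC instance should have exactly one port; two ports, each with its
# own fixed IP, would match the two addresses shown by nova show above.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3',   # placeholder
    username='admin', password='secret',    # placeholders
    project_name='admin',
    user_domain_name='Default',
    project_domain_name='Default')
neutron = client.Client(session=session.Session(auth=auth))

instance_uuid = 'f180c69e-49c7-4eda-9eab-81afe10160f7'
for port in neutron.list_ports(device_id=instance_uuid)['ports']:
    ips = [ip['ip_address'] for ip in port['fixed_ips']]
    print(port['id'], port['status'], ips)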

tags: added: customer-found
Michael Petersen (mpetason) wrote :

I do not believe it is a duplicate, as it happened on more than one instance. The issue could be replicated until the nova-compute service was brought back into service.

Roman Podoliaka (rpodolyaka) wrote :

I don't get your point. How does the number of affected instances make this a different problem? We see essentially the same issue: VM failed to spawn on one of the compute nodes, then it was (automatically) rescheduled to another node and we see that it now has two ports in Neutron, while we expect to see only one.

Michael Petersen (mpetason) wrote :

My mistake. The other bug was discussing race conditions. I didn't think this was a race condition, as you were able to replicate the issue every time you tried to spin up an instance. The instances appeared to be scheduled to the same node, since it had the fewest used resources; the build would fail there and then get rescheduled. If it's a duplicate, then we can mark it as a duplicate.

Eugene Nikanorov (enikanorov) wrote :

Since the instance was rescheduled, that would explain the second IP address.
But that's really a bug: on rescheduling, or on spawn teardown due to an exception, the first Neutron port should have been deleted.
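
For illustration only (this is not Nova's actual code), the ordering described above would look roughly like this; allocate_port, deallocate_port, spawn, and reschedule are hypothetical names standing in for the real build, network, and scheduler calls:

# Illustrative sketch of the expected teardown ordering on a failed spawn.
def build_instance(instance, network_api, scheduler, spawn):
    """Illustrative only -- not Nova's actual reschedule path."""
    port = network_api.allocate_port(instance)   # first attempt's port
    try:
        spawn(instance, port)                    # may raise, e.g. the KVM error
    except Exception:
        # Tear down the failed attempt's networking *before* rescheduling, so
        # the retry starts clean and allocates a single fresh port. Skipping
        # this step leaks the first port (and its IP), which is exactly the
        # two-address symptom reported here.
        network_api.deallocate_port(instance, port)
        scheduler.reschedule(instance)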

Also, if it's a duplicate bug, please add it to the comments; otherwise Launchpad doesn't store it in the bug's history.

Changed in mos:
importance: Undecided → Medium