Lost node from previous deployment seen as bootstrap, but is not functional

Bug #1250137 reported by Tatyanka
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Won't Fix
Importance: Medium
Assigned to: Alexandr Notchenko
Milestone: 4.0

Bug Description

iso 4.0-22 Havana
Precondition:
1. Deploy an env with Ubuntu on KVM (1 controller + 1 compute + 2 ceph + rados), Neutron GRE (the deployment did not succeed)
2. Delete the failed env
3. Wait until the slave nodes are discovered again after deletion
4. Try to deploy a simple env on CentOS (1 controller/cinder + compute, Nova Flat DHCP)
5. Deployment hangs during CentOS installation (CentOS was successfully installed on one node; the second one stays in bootstrap)
SSH to the admin node and run cobbler list:
[root@nailgun ~]# cobbler list
distros:
   bootstrap
   centos-x86_64
   ubuntu_1204_x86_64

profiles:
   bootstrap
   centos-x86_64
   ubuntu_1204_x86_64

systems:
   default
   node-8
   node-9

repos:

images:

mgmtclasses:

packages:

SSH to node-9 and see that its hostname is node-4 and that Ubuntu is installed on it:

[root@nailgun ~]# ssh node-9
Warning: the RSA host key for 'node-9' differs from the key for the IP address '10.108.0.7'
Offending key for IP in /root/.ssh/known_hosts:2
Matching host key in /root/.ssh/known_hosts:5
Are you sure you want to continue connecting (yes/no)? yes
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.8.0-31-generic x86_64)

 * Documentation: https://help.ubuntu.com/
Last login: Mon Nov 11 15:30:18 2013 from 10.108.0.2
root@node-4:~#
The node's IP is:

eth0 Link encap:Ethernet HWaddr 64:0e:dd:b6:94:67
          inet addr:10.108.0.7 Bcast:10.108.0.255 Mask:255.255.255.0
          inet6 addr: fe80::660e:ddff:feb6:9467/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:22794 errors:0 dropped:3714 overruns:0 frame:0
          TX packets:11426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51768264 (51.7 MB) TX bytes:2875390 (2.8 MB)

There is also an error in nailgun on a PUT for this node:
2013-11-11 13:41:23 ERROR (logger) Response code '500 Internal Server Error' for PUT /api/nodes/ from 10.108.0.7:40075
2013-11-11 13:41:22 ERROR (logger) Traceback (most recent call last):
  File "/opt/nailgun/lib/python2.6/site-packages/web/application.py", line 239, in process
    return self.handle()
  File "/opt/nailgun/lib/python2.6/site-packages/web/application.py", line 230, in handle
    return self._delegate(fn, self.fvars, args)
  File "/opt/nailgun/lib/python2.6/site-packages/web/application.py", line 420, in _delegate
    return handle_class(cls)
  File "/opt/nailgun/lib/python2.6/site-packages/web/application.py", line 396, in handle_class
    return tocall(*args)
  File "<string>", line 2, in PUT
  File "/opt/nailgun/lib/python2.6/site-packages/nailgun/api/handlers/base.py", line 55, in content_json
    data = func(*args, **kwargs)
  File "/opt/nailgun/lib/python2.6/site-packages/nailgun/api/handlers/node.py", line 394, in PUT
    db().commit()
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/session.py", line 656, in commit
    self.transaction.commit()
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/session.py", line 314, in commit
    self._prepare_impl()
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/session.py", line 298, in _prepare_impl
    self.session.flush()
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/session.py", line 1583, in flush
    self._flush(objects)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/session.py", line 1654, in _flush
    flush_context.execute()
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/unitofwork.py", line 331, in execute
    rec.execute(self)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/unitofwork.py", line 475, in execute
    uow
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/persistence.py", line 59, in save_obj
    mapper, table, update)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
    execute(statement, params)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/opt/nailgun/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (IntegrityError) null value in column "mac" violates not-null constraint
 'UPDATE nodes SET meta=%(meta)s, mac=%(mac)s, ip=%(ip)s WHERE nodes.id = %(nodes_id)s' {'nodes_id': 2, 'mac': None, 'meta': '{"system": {"fqdn": "node-2.test.domain.local", "manufacturer": "KVM"}, "interfaces": [{"mac": "64:91:2C:52:6D:69", "max_speed": null, "name": "eth3", "current_speed": null}, {"mac": "64:AB:6F:11:92:95", "max_speed": null, "name": "eth2", "current_speed": null}, {"mac": "64:33:9D:A0:A0:BF", "max_speed": null, "name": "eth1", "current_speed": null}, {"mac": "64:4C:D6:E3:4B:22", "max_speed": null, "name": "eth0", "current_speed": null}], "disks": [{"model": null, "disk": "disk/by-path/pci-0000:00:09.0-virtio-pci-virtio6", "name": "vdc", "size": 21474836480}, {"model": null, "disk": "disk/by-path/pci-0000:00:08.0-virtio-pci-virtio5", "name": "vdb", "size": 21474836480}, {"model": null, "disk": "disk/by-path/pci-0000:00:07.0-virtio-pci-virtio4", "name": "vda", "size": 21474836480}], "cpu": {"real": 0, "total": 1, "spec": [{"model": "QEMU Virtual CPU version 1.0", "frequency": 3410}]}, "memory": {"slots": 1, "total": 1073741824, "maximum_capacity": 1073741824, "devices": [{"type": "RAM", "size": 1073741824}]}}', 'ip': u'192.168.0.3'}
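The constraint failure at the bottom of the traceback can be reproduced in isolation. Nailgun uses PostgreSQL, but SQLite shows the same behaviour; this is a minimal sketch with a stand-in table, not nailgun code:

```python
import sqlite3

# Minimal stand-in for the nodes table; the real schema lives in nailgun
# and runs on PostgreSQL, but NOT NULL enforcement behaves the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, mac TEXT NOT NULL)")
conn.execute("INSERT INTO nodes (id, mac) VALUES (2, '64:4C:D6:E3:4B:22')")

try:
    # Mirrors the failing UPDATE from the traceback: 'mac' is None.
    conn.execute("UPDATE nodes SET mac = ? WHERE id = ?", (None, 2))
except sqlite3.IntegrityError as e:
    print("IntegrityError:", e)
```

The agent payload in the log carries per-interface MACs in meta, but the top-level mac field is None, which is what the UPDATE tries to write.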

It seems that the node was not deleted properly.
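If the trigger is the agent reporting a null MAC for a half-deleted node, a defensive guard before the commit would at least avoid the 500. This is a hypothetical sketch; sanitize_agent_update and the dict shapes are assumptions for illustration, not the actual nailgun handler:

```python
def sanitize_agent_update(current, data):
    """Merge an agent payload into a node row, refusing to null out 'mac'.

    `current` is the node row as a dict, `data` is the agent payload.
    Hypothetical helper, not actual nailgun code.
    """
    cleaned = dict(data)
    if cleaned.get("mac") is None:
        # Keep the previously known MAC rather than violating the
        # NOT NULL constraint on nodes.mac at commit time.
        cleaned.pop("mac", None)
    return {**current, **cleaned}

node = {"id": 2, "mac": "64:4C:D6:E3:4B:22", "ip": "192.168.0.2"}
updated = sanitize_agent_update(node, {"mac": None, "ip": "192.168.0.3"})
print(updated["mac"])  # old MAC preserved
print(updated["ip"])   # new IP applied
```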

Tags: nailgun
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
importance: Undecided → Critical
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Node failed to reboot after cluster deletion. Later it was successfully discovered as a new node.

First, we should alert the user about 'new' pre-deployed nodes.
Second, we should not assume that mcollective on such a node is able to reboot it.

We need a design for this use case.

Changed in fuel:
importance: Critical → Medium
summary: - Inconsistent deployment of second environment after first one has been
- deleted
+ Lost node from previous deployment seen as bootstrap, but is not
+ functional
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Dmitry Pyzhov (lux-place) → Alexandr Notchenko (anotchenko)
Evgeniy L (rustyrobot)
Changed in fuel:
status: New → Confirmed
status: Confirmed → Triaged
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Not reproducible

Changed in fuel:
status: Triaged → Invalid
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Invalid → Won't Fix