IntegrityError while uploading leases - when there are reserved IP's on the dynamic range

Bug #1536604 reported by Andreas Hasenack
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
High
LaMont Jones

Bug Description

We upgraded MAAS 1.8.3+bzr4053-0ubuntu1~trusty1 to 1.9.0+bzr4533-0ubuntu1~trusty1 and started getting these errors in the logs:

maas.log:
Jan 21 11:22:36 atlas maas.lease_upload_service: [ERROR] Failed to upload leases: ('UNHANDLED', 'Unknown Error [atlas:pid=23547:cmd=UpdateLeases:ask=93f]')

regiond.log:
2016-01-21 11:25:36 [-] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the UpdateLeases command. [atlas:pid=23547:cmd=UpdateLeases:ask=9
4b]
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
            self.__bootstrap_inner()
          File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 763, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/twisted.py", line 791, in _worker
            return super(ThreadPool, self)._worker()
        --- <exception caught here> ---
          File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
            result = context.call(ctx, function, *args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
          File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/twisted.py", line 200, in wrapper
            return func(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 501, in call_within_transaction
            return func_outside_txn(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 328, in retrier
            return func(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
            return func(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/maasserver/rpc/leases.py", line 51, in update_leases
            StaticIPAddress.objects.update_leases(nodegroup, leases)
          File "/usr/lib/python2.7/dist-packages/maasserver/models/staticipaddress.py", line 506, in update_leases
            subnet=subnet)
          File "/usr/lib/python2.7/dist-packages/django/db/models/manager.py", line 157, in create
            return self.get_queryset().create(**kwargs)
          File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 319, in create
            obj.save(force_insert=True, using=self.db)
          File "/usr/lib/python2.7/dist-packages/maasserver/models/cleansave.py", line 38, in save
            return super(CleanSave, self).save(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/maasserver/models/timestampedmodel.py", line 70, in save
            return super(TimestampedModel, self).save(*args, **kwargs)
          File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 545, in save
            force_update=force_update, update_fields=update_fields)
          File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 573, in save_base
            updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
          File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 654, in _save_table
            result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
          File "/usr/lib/python2.7/dist-packages/django/db/models/base.py", line 687, in _do_insert
            using=using, raw=raw)
          File "/usr/lib/python2.7/dist-packages/django/db/models/manager.py", line 232, in _insert
            return insert_query(self.model, objs, fields, **kwargs)
          File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 1511, in insert_query
            return query.get_compiler(using=using).execute_sql(return_id)
          File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 898, in execute_sql
            cursor.execute(sql, params)
          File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
            return self.cursor.execute(sql, params)
          File "/usr/lib/python2.7/dist-packages/django/db/utils.py", line 99, in __exit__
            six.reraise(dj_exc_type, dj_exc_value, traceback)
          File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
            return self.cursor.execute(sql, params)
        django.db.utils.IntegrityError: duplicate key value violates unique constraint "maasserver_staticipaddress_ip_key"
        DETAIL: Key (ip)=(10.96.4.45) already exists.

I'm almost sure not all nodes were recommissioned after the upgrade. I'll do that now.

Tags: landscape

Related branches

tags: added: kanban-cross-team
tags: removed: kanban-cross-team
tags: added: landscape
Revision history for this message
LaMont Jones (lamont) wrote :

Please let us know if recommissioning fixes things. Also, a copy of /var/lib/dhcp/dhcpd.leases would be helpful.

Gavin Panella (allenap)
Changed in maas:
status: New → Incomplete
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Not all nodes were recommissioned yet. Attached is the current dhcp leases file. The backtrace is still happening on IP 10.96.4.45. That machine is still live.

Changed in maas:
status: Incomplete → New
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

That IP (10.96.4.45) is one amongst hundreds that exist in the maasserver_staticipaddress table. It was allocated quite a while ago:
maasdb=# select * from maasserver_staticipaddress where ip='10.96.4.45';
  id | created | updated | ip | alloc_type | user_id | hostname | subnet_id
------+-------------------------------+-------------------------------+------------+------------+---------+----------+-----------
 1787 | 2015-04-23 21:52:18.021611+00 | 2015-04-23 21:52:18.021611+00 | 10.96.4.45 | 4 | 7 | | 10
(1 row)

It's live, though:
Jan 25 17:34:44 atlas dhcpd: DHCPREQUEST for 10.96.4.45 from 00:16:3e:57:13:d8 (juju-machine-0-lxc-17) via br0
Jan 25 17:34:44 atlas dhcpd: DHCPACK on 10.96.4.45 to 00:16:3e:57:13:d8 (juju-machine-0-lxc-17) via br0

That table has tons of "old" ips, about 700 from just 2015.

Revision history for this message
Gavin Panella (allenap) wrote :

I've marked this Critical because I assume it's preventing proper operation of MAAS. If that's not the case then we can downgrade the importance.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Changed in maas:
milestone: none → 1.9.1
Changed in maas:
milestone: 1.9.1 → 1.9.2
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I think it got a bit worse. I enlisted a new node in MAAS, and it got the same name as an existing one (lds-smokers). Not only that, but when I try to commission it, it gets the same IP as the existing lds-smokers node:

root@atlas:/var/log/maas# arping -I br0 10.96.13.239
ARPING 10.96.13.239 from 10.96.0.10 br0
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 1.071ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.886ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.829ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.757ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.839ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.886ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.815ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.826ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.807ms
Unicast reply from 10.96.13.239 [52:54:00:7C:3B:79] 0.734ms
Unicast reply from 10.96.13.239 [2C:59:E5:3B:11:20] 770.306ms
Unicast reply from 10.96.13.239 [2C:59:E5:3B:11:20] 770.510ms
Unicast reply from 10.96.13.239 [2C:59:E5:3B:11:20] 0.584ms
Unicast reply from 10.96.13.239 [2C:59:E5:3B:11:20] 0.597ms

root@atlas:/var/log/maas# host 10.96.13.239
239.13.96.10.in-addr.arpa domain name pointer lds-smokers.scapestack.

LaMont Jones (lamont)
Changed in maas:
assignee: nobody → LaMont Jones (lamont)
Revision history for this message
LaMont Jones (lamont) wrote :

There appear to be two issues here, after examination:

1) validation of the dynamic range does not appear to check for the existence of reserved IPs in the new range. (or others, for that matter, I suspect)

2) update_leases does not deal with a lease being assigned to a reserved address with any grace at all. At the very minimum, it should throw an error and CONTINUE PROCESSING THE LEASES.

summary: - IntegrityError while uploading leases
+ IntegrityError while uploading leases - when there are reserved IP's on
+ the dynamic range
Changed in maas:
importance: Critical → High
Changed in maas:
milestone: 1.9.2 → 1.9.3
Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.