regiond violates not-null constraint

Bug #2044851 reported by Amjad Chami
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Status tracked in 3.6
3.4
Won't Fix
Medium
Unassigned
3.5
Won't Fix
Medium
Unassigned
3.6
Triaged
Medium
Unassigned

Bug Description

When commissioning new machines, most machines succeed but one fails because it seems to fail to post back its dns info

mass version: 3.4.0

When the postgresql-ha fails to serialize access due to concurrent update:

2023-11-26 22:46:11.696 UTC [178210] maas@maasdb ERROR: could not serialize access due to concurrent update
2023-11-26 22:46:11.696 UTC [178210] maas@maasdb STATEMENT: SELECT "maasserver_staticipaddress"."id", "maasserver_staticipaddress"."created", "maasserver_staticipaddress"."updated", "maasserver_staticipaddress"."ip", "maasserver_staticipaddress"."alloc_type", "maasserver_staticipaddress"."subnet_id", "maasserver_staticipaddress"."user_id", "maasserver_staticipaddress"."lease_time", "maasserver_staticipaddress"."temp_expires_on" FROM "maasserver_staticipaddress" WHERE ("maasserver_staticipaddress"."alloc_type" = 6 AND "maasserver_staticipaddress"."ip" = '10.244.40.160'::inet) LIMIT 21 FOR UPDATE

50-maas-commisioning-1 throws the following error:

django.db.utils.IntegrityError: null value in column "source" of relation "maasserver_dnspublication" violates not-null constraint
        DETAIL: Failing row contains (96, 96, 2023-11-26 22:46:19.67433+00, null).
        CONTEXT: SQL statement "INSERT INTO maasserver_dnspublication
            (serial, created, source)
          VALUES
            (nextval('maasserver_zone_serial_seq'), now(),
             substring(reason FOR 255))"
        PL/pgSQL function sys_dns_publish_update(text) line 3 at SQL statement
        SQL statement "SELECT sys_dns_publish_update(
                'ip ' || host(sip.ip) || ' unlinked from resource ' ||
                COALESCE(resource.name, 'NULL') || ' on zone ' || domain.name)"
        PL/pgSQL function sys_dns_dnsresource_ip_unlink() line 17 at PERFORM

run: https://solutions.qa.canonical.com/testruns/bfd935c2-6cc8-4143-9722-9b424a848998
artefacts: https://oil-jenkins.canonical.com/artifacts/bfd935c2-6cc8-4143-9722-9b424a848998/index.html
files messages found in: /10.244.40.30/var/log/postgresql/postgresql-14-ha.log , 10.244.40.30/var/snap/maas/common/log/regiond.log

description: updated
Revision history for this message
Jacopo Rota (r00ta) wrote :

A first update: the problem seems to be in the trigger that calls `sys_dns_publish_update`
```
    CREATE OR REPLACE FUNCTION sys_dns_dnsresource_ip_unlink()
    RETURNS trigger as $$
    DECLARE
      sip maasserver_staticipaddress;
      resource maasserver_dnsresource;
      domain maasserver_domain;
    BEGIN
      SELECT maasserver_staticipaddress.* INTO sip
      FROM maasserver_staticipaddress
      WHERE maasserver_staticipaddress.id = OLD.staticipaddress_id;
      SELECT maasserver_dnsresource.* INTO resource
      FROM maasserver_dnsresource
      WHERE maasserver_dnsresource.id = OLD.dnsresource_id;
      SELECT maasserver_domain.* INTO domain
      FROM maasserver_domain
      WHERE maasserver_domain.id = resource.domain_id;
      IF sip.ip IS NOT NULL THEN
          PERFORM sys_dns_publish_update(
            'ip ' || host(sip.ip) || ' unlinked from resource ' ||
            COALESCE(resource.name, 'NULL') || ' on zone ' || domain.name);
      END IF;
      RETURN OLD;
    END;
```

if `sip.ip` == NULL, then `'ip ' || host(sip.ip)` == NULL and we hit the not-null contraint. I need to dig deeper on why
```
      SELECT maasserver_staticipaddress.* INTO sip
      FROM maasserver_staticipaddress
      WHERE maasserver_staticipaddress.id = OLD.staticipaddress_id;
```
produces a NULL sip. Maybe a race condition on the ip that changed or something

Revision history for this message
Amjad Chami (amjad-chami) wrote :
Bill Wear (billwear)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 3.5.0
milestone: 3.5.0 → 3.4.x
Changed in maas:
milestone: 3.4.x → 3.5.x
Revision history for this message
Anton Troyanov (troyanov) wrote :

I don't think it is related to a null `sip` but rather a non-existing `maasserver_dnsresource` leading to an empty `domain.name`.

The only place where we have `sys_dns_dnsresource_ip_unlink` trigger is:
```
register_procedure(DNS_DNSRESOURCE_IP_UNLINK)
register_trigger(
    "maasserver_dnsresource_ip_addresses",
    "sys_dns_dnsresource_ip_unlink",
    "delete",
)
```

And the only thing that might lead to the failure is a null value of `domain.name` (since `sip.ip` is guarded):
```
IF sip.ip IS NOT NULL THEN
      PERFORM sys_dns_publish_update(
        'ip ' || host(sip.ip) || ' unlinked from resource ' ||
        COALESCE(resource.name, 'NULL') || ' on zone ' || domain.name);
  END IF;
```

While `name` itself cannot be null according to the schema, the following query can produce no results, because `OLD.dnsresource_id` is pointing to `maasserver_dnsresource` that was removed:
```
SELECT maasserver_dnsresource.* INTO resource
FROM maasserver_dnsresource
WHERE maasserver_dnsresource.id = OLD.dnsresource_id;
SELECT maasserver_domain.* INTO domain
FROM maasserver_domain
WHERE maasserver_domain.id = resource.domain_id;
```

Bcause trigger is executed when we *delete* row from `maasserver_dnsresource_ip_addresses`, but there are no FK relation, so `maasserver_dnsresource_ip_addresses` might end up having an orphaned `dnsresource_id`

Here is an example orphaned row found in [0]
```
select *
from maasserver_dnsresource_ip_addresses t1
left join maasserver_dnsresource t2 on t1.dnsresource_id = t2.id
where t2.name is null;

 id | dnsresource_id | staticipaddress_id | id | created | updated | name | domain_id | address_ttl
----+----------------+--------------------+----+---------+---------+------+-----------+-------------
 12 | 8 | 27 | | | | | |
(1 row)
```

[0] https://solutions.qa.canonical.com/testruns/859d8396-5fe0-4c85-9fbd-26cce6ffdee7

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.