Since upgrading from 2023.1 to 2023.2 (2c817b3d7f01de44023f195c6e8de8853683a54a), we are seeing problems with every record change to a zone. They appear to be related to how Designate sets or checks zone serial numbers.
Upon adding a new record to a zone (for example an A record), the change is quickly reflected in both of our user-facing DNS servers but Designate keeps this record and the SOA in a PENDING state. Some time later these records move into an ERROR state. An explicit 'designate-manage pool update' does not improve the situation.
On the user-facing DNS servers we have regular messages such as:
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: received control channel command 'showzone new.vm.mydomain.com '
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: loading NZD config from '_default.nzd' for zone 'new.vm.mydomain.com'
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: received control channel command 'modzone new.vm.mydomain.com { type slave; masters { 10.48.243.83 port 5354; 10.48.241.212 port 5354; 10.48.241.111 port 5354;}; file "slave.new.vm.mydomain.com.07644d3b-b142-4c6e-982e-da6d8dd0f982"; };'
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: updated zone new.vm.mydomain.com in view _default via modzone
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: client @0x7efc1c0013b8 10.48.243.83#41650: received notify for zone 'new.vm.mydomain.com'
Apr 02 10:06:26 ns1-mydomain.com named[1696326]: zone new.vm.mydomain.com/IN: notify from 10.48.243.83#41650: no serial
From Designate we see the following. The worker polls both nameservers for serial 1712050926, but each one returns the older serial 1712050149, so the poll is reported as unsuccessful:
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.937 291935 DEBUG designate.storage.sqlalchemy [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Fetched zone <Zone id:'07644d3b-b142-4c6e-982e-da6d8dd0f982' type:'PRIMARY' name:'new.vm.mydomain.com.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1712050926' action:'UPDATE' status:'ERROR'> _find_zones /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/storage/sqlalchemy/__init__.py:406
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.938 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Polling serial=1712050926 for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on ns=<PoolNameserver id:'af9ae98e-a7ea-4eb2-9487-e76c31cdad9f' host:'10.52.245.37' port:'53' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842'> __call__ /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:391
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.939 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Polling serial=1712050926 for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on ns=<PoolNameserver id:'d4cd165d-5962-4e96-a8d7-464667a96f51' host:'10.52.245.36' port:'53' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842'> __call__ /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:391
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.940 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Found serial=1712050149 for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on ns=<PoolNameserver id:'af9ae98e-a7ea-4eb2-9487-e76c31cdad9f' host:'10.52.245.37' port:'53' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842'> __call__ /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:406
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.942 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Found serial=1712050149 for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on ns=<PoolNameserver id:'d4cd165d-5962-4e96-a8d7-464667a96f51' host:'10.52.245.36' port:'53' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842'> __call__ /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:406
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.942 291935 DEBUG designate.worker.processing [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Finished Task(s): PollForZone, PollForZone in 0.004154s run /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/processing.py:77
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.942 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Results for polling zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE serial=1712050926 query=DNSQueryResult(positives=0, no_zones=0, consensus_serial=0, results=[1712050149, 1712050149]) parse_query_results /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:355
Apr 02 10:24:46 infra3-designate-container-1dd51fe2 designate-worker[291935]: 2024-04-02 10:24:46.943 291935 DEBUG designate.worker.tasks.zone [None req-0c581b93-b07a-4a25-b473-c884ff293c2b - - - - - -] Unsuccessful poll for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on attempt=2 _do_poll /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:501
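To make the DNSQueryResult line above easier to read, here is a simplified sketch of how such a poll could be summarised. It is not Designate's actual code: `summarize_poll`, `PollSummary`, and `threshold_percent` are hypothetical names, and the rule that a nameserver counts as positive when its serial is at least the expected one is an assumption drawn from the log output.

```python
import math
from typing import List, NamedTuple, Optional


class PollSummary(NamedTuple):
    positives: int         # nameservers already serving the expected serial
    no_zones: int          # nameservers that reported no zone (serial 0)
    consensus_serial: int  # lowest serial seen, or 0 if the threshold is not met


def summarize_poll(
    expected_serial: int,
    results: List[Optional[int]],
    threshold_percent: int = 100,  # hypothetical: share of nameservers that must agree
) -> PollSummary:
    positives = 0
    no_zones = 0
    lowest_seen = 0

    for serial in results:
        if serial is None:
            # The query failed or timed out; it counts for nothing.
            continue
        if serial == 0:
            no_zones += 1
            continue
        if serial >= expected_serial:
            positives += 1
        if lowest_seen == 0 or serial < lowest_seen:
            lowest_seen = serial

    # Number of nameservers that must have caught up for the poll to succeed.
    required = math.ceil(len(results) * threshold_percent / 100)
    consensus = lowest_seen if positives >= required else 0
    return PollSummary(positives, no_zones, consensus)


if __name__ == "__main__":
    # Values taken from the worker log above.
    expected = 1712050926
    found = [1712050149, 1712050149]
    print(summarize_poll(expected, found))
    print("serial lag:", expected - found[0])
```

With the values from the log, neither nameserver has reached the expected serial, so the sketch reports positives=0 and consensus_serial=0, the same as the worker. The serial the nameservers are serving is 777 lower than the one Designate is polling for.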
The zone does eventually move to ACTIVE, but this can take around half an hour. I'm not certain what triggers it, but I assume it is a periodic sync task:
Apr 02 10:29:02 infra2-designate-container-e370b9d5 designate-worker[289929]: 2024-04-02 10:29:02.285 289929 DEBUG designate.storage.sqlalchemy [None req-c47aae2a-e029-4ad3-a184-c455722740f6 - - - - - -] Fetched zone <Zone id:'07644d3b-b142-4c6e-982e-da6d8dd0f982' type:'PRIMARY' name:'new.vm.mydomain.com.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1712050926' action:'UPDATE' status:'ERROR'> _find_zones /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/storage/sqlalchemy/__init__.py:406
Apr 02 10:29:02 infra2-designate-container-e370b9d5 designate-worker[289929]: 2024-04-02 10:29:02.287 289929 INFO designate.worker.tasks.zone [None req-c47aae2a-e029-4ad3-a184-c455722740f6 - - - - - -] Could not find serial=1712050926 for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 action=UPDATE on enough nameservers
Apr 02 10:29:02 infra2-designate-container-e370b9d5 designate-worker[289929]: 2024-04-02 10:29:02.289 289929 DEBUG designate.worker.tasks.zone [None req-c47aae2a-e029-4ad3-a184-c455722740f6 - - - - - -] Updating status for zone_name=new.vm.mydomain.com. zone_id=07644d3b-b142-4c6e-982e-da6d8dd0f982 to action=UPDATE serial=1712050926 __call__ /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/tasks/zone.py:753
Apr 02 10:29:02 infra2-designate-container-e370b9d5 designate-worker[289929]: 2024-04-02 10:29:02.296 289929 DEBUG designate.worker.processing [None req-c47aae2a-e029-4ad3-a184-c455722740f6 - - - - - -] Finished Task(s): ZoneAction-Update in 156.227155s run /openstack/venvs/designate-28.1.0/lib/python3.10/site-packages/designate/worker/processing.py:77
I'd be happy to look through the logs for any other useful debug information. We're running bind9 1:9.18.18-0ubuntu0.22.04.2 and these are all Ubuntu Jammy systems.
This likely means that your designate-producer isn't configured properly. Is it running? Does it have all tasks enabled? Can you share some of the designate-producer logs?
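One rough way to check the "all tasks enabled" part is to look at the producer section of designate.conf. The sketch below assumes the setting lives under `[service:producer]` as `enabled_tasks`, and that leaving it unset means every task is enabled. Please verify both against the configuration reference for your release. The helper name `producer_enabled_tasks` and the sample config text are illustrative only.

```python
import configparser
from typing import List, Optional


def producer_enabled_tasks(conf_text: str) -> Optional[List[str]]:
    """Return the explicitly enabled producer tasks, or None if unset.

    Assumption: None (option missing or empty) means no restriction
    has been configured for the producer.
    """
    # strict=False tolerates repeated keys, and interpolation=None keeps
    # literal '%' characters in the file from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)

    raw = parser.get("service:producer", "enabled_tasks", fallback=None)
    if raw is None or not raw.strip():
        return None
    return [task.strip() for task in raw.split(",") if task.strip()]


if __name__ == "__main__":
    # Illustrative config text, not taken from the reporter's deployment.
    sample = """
[DEFAULT]
debug = True

[service:producer]
enabled_tasks = periodic_exists, delayed_notify
"""
    tasks = producer_enabled_tasks(sample)
    if tasks is None:
        print("enabled_tasks is unset")
    else:
        print("only these producer tasks are enabled:", tasks)
```

If this returns an explicit list, compare it with the tasks your release ships. A task missing from the list would not run in the producer.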