[2.x] psycopg2.IntegrityError: update or delete on table "maasserver_node" violates foreign key constraint "maasserver_event_node_id_xxx_fk_maasserver_node_id" on table "maasserver_event" DETAIL: Key (id)=(xx) is still referenced from table "maasserver_event".

Bug #1726474 reported by Jason Hobbs
Affects: MAAS — Status: Fix Released — Importance: Critical — Assigned to: Blake Rouse
Affects: MAAS 2.2 — Status: Won't Fix — Importance: Critical — Assigned to: Unassigned

Bug Description

A "node delete" API request failed with an integrity error:

psycopg2.IntegrityError: update or delete on table "maasserver_node" violates foreign key constraint "maasserver_event_node_id_61e4521ce30bf078_fk_maasserver_node_id" on table "maasserver_event"
DETAIL: Key (id)=(26) is still referenced from table "maasserver_event".

This is in an HA setup with 3 region controllers and 3 rack controllers. Logs are available here:

https://10.245.162.101/artifacts/f6a4b8b4-7ba6-49f0-b32a-c85569499618/cpe_cloud_232/infra-logs.tar

The integrity error stack trace can be seen in the regiond.log for 10.245.208.30.

Related branches

Changed in maas:
milestone: none → 2.3.0beta3
Changed in maas:
importance: Undecided → Critical
status: New → Triaged
summary: - psycopg2.IntegrityError: update or delete on table "maasserver_node"
- violates foreign key constraint
+ [2.x] psycopg2.IntegrityError: update or delete on table
+ "maasserver_node" violates foreign key constraint
"maasserver_event_node_id_xxx_fk_maasserver_node_id" on table
"maasserver_event" DETAIL: Key (id)=(xx) is still referenced from table
"maasserver_event".
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Would it be possible that you are deleting a node while it is deploying?

I believe what is happening is that Django is deleting the node, which also involves the cascade delete of all the events for that node.

1. Region A starts a transaction.
2. Region A selects all events that are related to the node.
3. Region A deletes all selected events.
4. Region B starts a transaction.
5. Region B adds a new event.
6. Region B commits transaction.
7. Region A deletes the node.
8. Region A commits the transaction (which fails, since Region B added new events)

So, back to the original question: was the node still deploying or rebooting, with events being created, when you chose to delete it?
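The interleaving above can be illustrated with a toy model. This is a sketch only; the table names match the bug, but the data structures and function are hypothetical stand-ins, not MAAS or Django code:

```python
# Toy model of the race between two regiond processes. All names here
# are hypothetical; this is an illustration, not MAAS code.

class IntegrityError(Exception):
    """Stand-in for psycopg2.IntegrityError."""

# Shared "database": a node table and an event table whose values are
# foreign keys into the node table (as in maasserver_event.node_id).
nodes = {26: "machine-26"}
events = {1: 26, 2: 26}  # event_id -> node_id

def delete_node_without_retry(node_id):
    # Steps 2-3: Region A selects the related events and deletes them.
    selected = [e for e, n in events.items() if n == node_id]
    for e in selected:
        del events[e]
    # Steps 4-6: Region B commits a new event in between (simulated
    # inline here; in reality this happens in a concurrent transaction).
    events[3] = node_id
    # Step 7: Region A deletes the node. The foreign-key check fails
    # because the freshly committed event still references it.
    if any(n == node_id for n in events.values()):
        raise IntegrityError(
            'Key (id)=(%d) is still referenced from table '
            '"maasserver_event".' % node_id)
    del nodes[node_id]
```

Calling `delete_node_without_retry(26)` raises the stand-in `IntegrityError` and leaves the node row behind, mirroring the failure in the report.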

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

In foundation engine, we add new nodes without knowing their MAC address (a workaround for bug 1707216) by taking the following approach:

1) add the node with a made-up MAC address and valid BMC credentials
2) MAAS tells the node to power on and PXE boot to perform commissioning
3) we delete the node before it starts PXE booting (right after the "add node" API call returns)
4) the node PXE boots and enlists
5) we poll the node listing, matching the new node by its power parameters (the IP address of its BMC)

So, yes, the node would have been actively booting when we issued the delete API call.
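For illustration, the workaround can be sketched as request construction against the MAAS 2.x REST API. Nothing is sent here; the endpoint paths follow the documented MAAS 2.0 API layout, and the host and parameter values are assumptions:

```python
# Sketch of the foundation-engine workaround as HTTP requests (built but
# not sent). Host and parameter values are hypothetical.

MAAS_URL = "http://maas.example.com:5240/MAAS/api/2.0"

def add_machine_request(mac, power_type, power_parameters):
    # 1) add the node with a made-up MAC and valid BMC credentials
    return ("POST", f"{MAAS_URL}/machines/", {
        "mac_addresses": mac,
        "power_type": power_type,
        "power_parameters": power_parameters,
    })

def delete_machine_request(system_id):
    # 3) delete the node right after the add call returns; this is the
    # call that raced with enlistment events in this bug
    return ("DELETE", f"{MAAS_URL}/machines/{system_id}/", None)

def list_machines_request():
    # 5) poll the machine listing, then match by power parameters
    return ("GET", f"{MAAS_URL}/machines/", None)
```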

Changed in maas:
status: Incomplete → New
Revision history for this message
Björn Tillenius (bjornt) wrote :

Normally the situation Blake described would be solved by cascading deletes. However, Django implements its own cascading logic instead of relying on the database, and it doesn't seem to retry in case of errors like these.

Ideally it should be fixed in Django, but maybe we could retry the request if it fails like this?

Revision history for this message
Blake Rouse (blake-rouse) wrote :

The reason Django implements its own cascade delete logic is so it can send the pre_delete and post_delete signal for all objects that are cascade deleted. I don't think changing that logic is the correct path forward as that might cause other fallout.
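The trade-off Blake describes can be sketched with a toy collector. The function and signal hooks below are hypothetical simplifications, not Django's actual `Collector`; the point is that an application-side cascade can invoke per-object hooks, which a database-side `ON DELETE CASCADE` would bypass:

```python
# Toy sketch of an application-side cascade delete that fires
# pre_delete/post_delete hooks for every cascaded object. Hypothetical
# names; not Django's implementation.

fired = []

def pre_delete(obj):
    fired.append(("pre_delete", obj["id"]))

def post_delete(obj):
    fired.append(("post_delete", obj["id"]))

def cascade_delete(node, events):
    # Collect every event related to the node, delete each one between
    # its signals, then delete the node itself. A database-side cascade
    # would remove the rows without ever calling these hooks.
    related = [e for e in events if e["node"] == node["id"]]
    for event in related:
        pre_delete(event)
        events.remove(event)
        post_delete(event)
    pre_delete(node)
    post_delete(node)
```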

I do think you're correct that we should retry in this case. We have a built-in retry mechanism for handling serialization issues like this; it needs to be extended to handle this case as well.

Revision history for this message
Björn Tillenius (bjornt) wrote :

Right. When I said it should be fixed in Django, I meant that it should expect serialization errors and automatically retry. But I don't see that happening so we'll have to fix it in our code.
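An application-layer retry could look roughly like the decorator below. This is a minimal sketch, not the fix that landed in MAAS: it assumes each attempt re-runs the delete in a fresh transaction (so the retry sees any events committed in the meantime), and it uses a local stand-in for `psycopg2.IntegrityError`:

```python
import functools
import time

class IntegrityError(Exception):
    """Stand-in for psycopg2.IntegrityError in this sketch."""

def retry_on_integrity_error(attempts=3, delay=0.0):
    # Re-run the wrapped operation when a concurrent insert makes the
    # cascade delete fail. For this to help, the wrapped function must
    # start a fresh transaction each time, so the new attempt also
    # collects the events added since the last one.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except IntegrityError:
                    if attempt == attempts - 1:
                        raise  # out of attempts; surface the error
                    time.sleep(delay)
        return wrapper
    return decorator
```

A delete that fails twice due to racing inserts and then succeeds would complete on the third attempt instead of surfacing a 500 to the API caller.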

Changed in maas:
status: New → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released