arm64 clustering hitting timeout (ADT failure)

Bug #1849530 reported by Kleber Sacilotto de Souza
This bug affects 3 people
Affects                    Status     Importance  Assigned to  Milestone
linux-bluefield (Ubuntu)   Invalid    Undecided   Unassigned
  Bionic                   Invalid    Undecided   Unassigned
lxd (Ubuntu)               Confirmed  Undecided   Unassigned
  Bionic                   Confirmed  High        Unassigned
tags: added: kernel-adt-failure
Changed in linux-bluefield (Ubuntu Bionic):
status: New → Confirmed
Thadeu Lima de Souza Cascardo (cascardo) wrote : Re: lxd 3.0.3-0ubuntu1~18.04.1 ADT test failure on arm64

This has been failing a lot with all of our kernels, and even when util-linux or shadow is the trigger.

It seems to fail on the clustering membership test, at least in the two failed instances I checked.

Compared to amd64, I see some nodes are offline on arm64 with "no heartbeat".

+ /usr/bin/lxc cluster list --verbose
+-------+-------------------------+----------+---------+----------------------------------+
| NAME  |           URL           | DATABASE |  STATE  |             MESSAGE              |
+-------+-------------------------+----------+---------+----------------------------------+
| node1 | https://10.1.1.101:8443 | YES      | ONLINE  | fully operational                |
+-------+-------------------------+----------+---------+----------------------------------+
| node2 | https://10.1.1.102:8443 | YES      | OFFLINE | no heartbeat since 51.32886019s  |
+-------+-------------------------+----------+---------+----------------------------------+
| node3 | https://10.1.1.103:8443 | YES      | OFFLINE | no heartbeat since 54.868743549s |
+-------+-------------------------+----------+---------+----------------------------------+
| node4 | https://10.1.1.104:8443 | NO       | ONLINE  | fully operational                |
+-------+-------------------------+----------+---------+----------------------------------+
| node5 | https://10.1.1.105:8443 | NO       | ONLINE  | fully operational                |
+-------+-------------------------+----------+---------+----------------------------------+
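
For anyone triaging more of these ADT logs, a rough sketch of how the table above could be scanned for offline members. This is not part of the LXD test suite; the helper names (`parse_cluster_list`, `heartbeat_age`, `offline_nodes`) are made up for illustration, and it assumes the five-column layout and the "no heartbeat since Ns" wording seen in this log.

```python
import re

# Sample rows copied from the `lxc cluster list --verbose` output in this report.
SAMPLE = """\
+ /usr/bin/lxc cluster list --verbose
+-------+-------------------------+----------+---------+----------------------------------+
| NAME | URL | DATABASE | STATE | MESSAGE |
| node1 | https://10.1.1.101:8443 | YES | ONLINE | fully operational |
| node2 | https://10.1.1.102:8443 | YES | OFFLINE | no heartbeat since 51.32886019s |
| node3 | https://10.1.1.103:8443 | YES | OFFLINE | no heartbeat since 54.868743549s |
| node4 | https://10.1.1.104:8443 | NO | ONLINE | fully operational |
| node5 | https://10.1.1.105:8443 | NO | ONLINE | fully operational |
"""


def parse_cluster_list(text):
    """Return one dict per data row of the table (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the traced command, borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "NAME":
            continue  # skip the header row and anything unexpected
        name, url, database, state, message = cells
        rows.append({"name": name, "url": url, "database": database,
                     "state": state, "message": message})
    return rows


def heartbeat_age(message):
    """Seconds since the last heartbeat, or None if the message has none."""
    match = re.search(r"no heartbeat since ([0-9.]+)s", message)
    return float(match.group(1)) if match else None


def offline_nodes(rows):
    """List (name, heartbeat age in seconds) for every OFFLINE member."""
    return [(row["name"], heartbeat_age(row["message"]))
            for row in rows if row["state"] == "OFFLINE"]


if __name__ == "__main__":
    for name, age in offline_nodes(parse_cluster_list(SAMPLE)):
        print(f"{name}: offline, last heartbeat {age:.1f}s ago")
```

Run against this log, it flags node2 and node3, both of which are database members.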

summary: - lxd 3.0.3-0ubuntu1~18.04.1 ADT test failure with linux-bluefield 5.0.0-1003.12
+ lxd 3.0.3-0ubuntu1~18.04.1 ADT test failure on arm64
Changed in linux-bluefield (Ubuntu):
status: New → Invalid
Changed in linux-bluefield (Ubuntu Bionic):
status: Confirmed → Invalid
Changed in lxd (Ubuntu Bionic):
importance: Undecided → High
status: New → Confirmed
Stéphane Graber (stgraber) wrote :

3.0.5 should be a bit more stable in that regard, when we get around to pushing it.
Until then, yes, armhf/arm64 will tend to be a bit more racy due to slow crypto on some CPUs.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lxd (Ubuntu):
status: New → Confirmed
summary: - lxd 3.0.3-0ubuntu1~18.04.1 ADT test failure on arm64
+ arm64 clustering hitting timeout (ADT failure)