dep8 test failing on machines with more than one interface on the default routes network

Bug #1734646 reported by Christian Ehrhardt 
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
heartbeat (Debian)  Fix Released  Unknown
heartbeat (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

From the Test:

Setting up cluster-glue (1.0.12-5ubuntu2) ...
Adding group `haclient' (GID 118) ...
Done.
Warning: The home dir /var/lib/pacemaker you specified can't be accessed: No such file or directory
Adding system user `hacluster' (UID 112) ...
Adding new user `hacluster' (UID 112) with group `haclient' ...
ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Not creating home directory `/var/lib/pacemaker'.
Created symlink /etc/systemd/system/multi-user.target.wants/logd.service → /lib/systemd/system/logd.service.
Setting up resource-agents (1:4.1.0~rc1-1) ...
resource-agents-deps.target is a disabled or a static unit, not starting it.
Setting up heartbeat (1:3.0.6-6) ...
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Setting up autopkgtest-satdep (0) ...
Processing triggers for libc-bin (2.26-0ubuntu2) ...
Processing triggers for systemd (235-2ubuntu3) ...
(Reading database ... 48537 files and directories currently installed.)
Removing autopkgtest-satdep (0) ...
autopkgtest [18:39:02]: test heartbeat: [-----------------------
● heartbeat.service - Heartbeat High Availability Cluster Communication and Membership
   Loaded: loaded (/lib/systemd/system/heartbeat.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2017-11-26 18:39:02 UTC; 1min 0s ago
  Process: 2187 ExecStart=/usr/lib/heartbeat/heartbeat -f (code=exited, status=6)
 Main PID: 2187 (code=exited, status=6)

Nov 26 18:39:02 autopkgtest systemd[1]: Started Heartbeat High Availability Cluster Communication and Membership.
Nov 26 18:39:02 autopkgtest heartbeat[2187]: Nov 26 18:39:02 autopkgtest heartbeat: [2187]: ERROR: Illegal directive [enc1] in /etc/ha.d//ha.cf
Nov 26 18:39:02 autopkgtest heartbeat[2187]: [2187]: ERROR: Illegal directive [enc1] in /etc/ha.d//ha.cf
Nov 26 18:39:02 autopkgtest heartbeat[2187]: Nov 26 18:39:02 autopkgtest heartbeat: [2187]: ERROR: Heartbeat not started: configuration error.
Nov 26 18:39:02 autopkgtest heartbeat[2187]: [2187]: ERROR: Heartbeat not started: configuration error.
Nov 26 18:39:02 autopkgtest heartbeat[2187]: Nov 26 18:39:02 autopkgtest heartbeat: [2187]: ERROR: Configuration error, heartbeat not started.
Nov 26 18:39:02 autopkgtest heartbeat[2187]: [2187]: ERROR: Configuration error, heartbeat not started.
Nov 26 18:39:02 autopkgtest systemd[1]: heartbeat.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Nov 26 18:39:02 autopkgtest systemd[1]: heartbeat.service: Failed with result 'exit-code'.

A few things are important here:
1. On some platforms (containers) this test was skipped so far, but it now runs and breaks, which makes this a blocker.
2. It seems to fail in (at least) the network handling. It appears to force a three-character network device name and then append a number, and the result is then not found.
  - x86: Illegal directive [ens2] in /etc/ha.d//ha.cf
  - s390x: ERROR: Illegal directive [enc1] in /etc/ha.d//ha.cf

This needs a repro on such platforms (it might work in a local autopkgtest but fail on LP infra) to decide on next steps.

Christian Ehrhardt  (paelzer) wrote :

Actually, this ran fine in the container, but the environment now has different networking and therefore fails on s390x.

Changed in heartbeat (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

Interesting, as this comes down to the same thing as in the resource-agents case:

  ip route show 0/0 | cut -d ' ' -f 5

In an s390x container that delivers a valid "eth0", but it does not work in the current LP infra.
In my other case the result was not suitable for a sed replacement.

Since s390x now runs in KVM, it could very well be "correct" for the device to be detected as "enc1".
Here heartbeat might be too old-fashioned and check for ethX, failing otherwise?
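
To make clear what that pipeline extracts, here is a minimal sketch that runs the same `cut` on a hard-coded sample line instead of the real `ip route show 0/0` output. The sample line (gateway address, `proto`, `metric`) is made up for illustration; only the `cut -d ' ' -f 5` part is taken from the test.

```shell
# Hypothetical sample of what `ip route show 0/0` prints with a single
# default route (addresses and metric are invented for illustration).
sample_routes='default via 10.0.0.1 dev eth0 proto dhcp metric 100'

# In a "default via <gw> dev <iface> ..." line the interface is field 5.
IFACE=$(printf '%s\n' "$sample_routes" | cut -d ' ' -f 5)
echo "IFACE=$IFACE"
```

With exactly one default route this yields a single word, which is what the test script expects.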

Christian Ehrhardt  (paelzer) wrote :

Hmm, it runs fine in:
- x86 local autopkgtest (qemu driver, so KVM as in LP)
- s390x container (eth0 device)
- s390x KVM Guest (enc2 device)

Here is a snippet of the s390x KVM execution (which should be closest to the one on LP):
$ sudo ./heartbeat
+ hostname
+ HOSTNAME=bionic-resource-agent
+ cut -d -f 5
+ ip route show 0/0
+ IFACE=enc2
+ cat
+ cat
+ cat
+ chmod 600 /etc/ha.d/authkeys
+ service heartbeat restart
+ sleep 60
+ service heartbeat status
● heartbeat.service - Heartbeat High Availability Cluster Communication and Membership
   Loaded: loaded (/lib/systemd/system/heartbeat.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-11-27 09:54:06 UTC; 1min 0s ago
 Main PID: 15853 (heartbeat)
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/heartbeat.service
           ├─15853 heartbeat: master control process
           ├─15866 heartbeat: FIFO reader
           ├─15867 heartbeat: write: bcast enc2
           └─15868 heartbeat: read: bcast enc2

Nov 27 09:54:37 bionic-resource-agent ip-request-resp(default)[16035]: received ip-request-resp IPaddr2::169.254.144.144/32/enc2 OK yes
Nov 27 09:54:37 bionic-resource-agent ResourceManager(default)[16049]: info: Acquiring resource group: bionic-resource-agent IPaddr2::169.254.144.
Nov 27 09:54:37 bionic-resource-agent /usr/lib/ocf/resource.d//heartbeat/IPaddr2(IPaddr2_169.254.144.144)[16126]: INFO: Resource is stopped
Nov 27 09:54:37 bionic-resource-agent ResourceManager(default)[16143]: info: Running /etc/ha.d//resource.d/IPaddr2 169.254.144.144/32/enc2 start
Nov 27 09:54:37 bionic-resource-agent IPaddr2(IPaddr2_169.254.144.144)[16222]: INFO: Adding inet address 169.254.144.144/32 to device enc2
Nov 27 09:54:37 bionic-resource-agent IPaddr2(IPaddr2_169.254.144.144)[16228]: INFO: Bringing device enc2 up
Nov 27 09:54:37 bionic-resource-agent IPaddr2(IPaddr2_169.254.144.144)[16234]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-
Nov 27 09:54:37 bionic-resource-agent /usr/lib/ocf/resource.d//heartbeat/IPaddr2(IPaddr2_169.254.144.144)[16239]: INFO: Success
Nov 27 09:54:47 bionic-resource-agent heartbeat[15853]: [15853]: info: Local Resource acquisition completed. (none)
Nov 27 09:54:47 bionic-resource-agent heartbeat[15853]: [15853]: info: local resource transition completed.
+ grep 169.254.144.144
+ ip a
    inet 169.254.144.144/32 scope global enc2

Christian Ehrhardt  (paelzer) wrote :

I need to find out what really happens in an LP test environment, but that is hard.
I checked whether a wrongly detected device might be the reason, but if that were the case the error would be:
     ERROR: glib: Get broadcast for interface enc2000 failed: No such device

But the error is about a directive, so it seems to be something different. A newline could explain that:
if I start a line with the device name, I get the error I see.
    ERROR: Illegal directive [enc2] in /etc/ha.d//ha.cf

So maybe "ip route show 0/0 | cut -d ' ' -f 5" returns multiple lines on bionic/s390x (and others).
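
The suspected effect can be sketched without touching a real system. The two route lines below are a made-up sample (not captured from LP), and the `bcast` line merely stands in for whatever the test writes into ha.cf using the variable.

```shell
# Hypothetical `ip route show 0/0` output with two default routes
# (invented sample; interface names chosen to match the logs above).
sample_routes='default via 10.0.0.1 dev enc1 proto dhcp metric 100
default via 10.0.0.1 dev enc2 proto dhcp metric 200'

# Same extraction as in the test: now IFACE holds two lines.
IFACE=$(printf '%s\n' "$sample_routes" | cut -d ' ' -f 5)

# Expanding the variable into a config line spills the second
# interface onto a line of its own.
conf=$(printf 'bcast %s\n' "$IFACE")
echo "$conf"
```

The second line of the result consists only of the device name, which would match a parser complaining about an illegal directive named after the device.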

Christian Ehrhardt  (paelzer) wrote :

Yeah, if there are multiple connections to the same network we can have multiple default routes.
That breaks the logic and kills all those tests.
Just picking one should be good enough.
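
One way to "just pick one" is to keep only the first line of the result. This is a sketch of the idea, not necessarily the exact change that was uploaded; the helper name `pick_default_iface` and the sample routes are hypothetical.

```shell
# Hypothetical helper: reads `ip route show 0/0`-style lines on stdin
# and prints the interface of the first default route only.
pick_default_iface() {
    cut -d ' ' -f 5 | head -n 1
}

# Invented sample with two default routes on the same network.
sample_routes='default via 10.0.0.1 dev enc1 proto dhcp metric 100
default via 10.0.0.1 dev enc2 proto dhcp metric 200'

IFACE=$(printf '%s\n' "$sample_routes" | pick_default_iface)
echo "IFACE=$IFACE"

# On a real system the input would come from ip itself:
#   IFACE=$(ip route show 0/0 | pick_default_iface)
```

Which of the default routes ends up first does not matter for this test, since any interface on the default route's network should do.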

summary: - dep8 test failing on many architectures
+ dep8 test failing on machines with more than one interface on the
+ default routes network
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package heartbeat - 1:3.0.6-6ubuntu1

---------------
heartbeat (1:3.0.6-6ubuntu1) bionic; urgency=medium

  * d/t/heartbeat: pick only one test device (LP: #1734646)

 -- Christian Ehrhardt <email address hidden> Mon, 27 Nov 2017 11:22:30 +0100

Changed in heartbeat (Ubuntu):
status: Confirmed → Fix Released
Christian Ehrhardt  (paelzer) wrote :

Kicked everything currently blocked on this in update-excuses so the new version can unblock those as well.

Christian Ehrhardt  (paelzer) wrote :

Fix reported to Debian and bug linked here

Changed in heartbeat (Debian):
status: Unknown → New
Changed in heartbeat (Debian):
status: New → Fix Released
Christian Ehrhardt  (paelzer) wrote :

Changes picked by Debian, new version already in b-proposed (not sure who kicked the sync).
