armhf lxd container does not start on arm64 system

Bug #1522026 reported by Martin Pitt on 2015-12-02
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
Auto Package Testing  Invalid       Undecided   Unassigned
lxc (Ubuntu)          Fix Released  High        Unassigned

Bug Description

I'm trying to set up armhf testing on an arm64 host, as that's what we have in Scalingstack (no armhf images yet). The host is Ubuntu 15.10, with lxd 0.20-0ubuntu4.1 (no PPA).

$ uname -a
Linux arm64-lxd-test 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 19:56:51 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux

$ lxc image list | grep arm
| ubuntu/xenial/armhf | a406edc85653 | no | ubuntu xenial armv7l (default) (20151202_04:37) | armv7l | 63.68MB | Dec 2, 2015 at 1:23pm (UTC) |

$ lxc launch ubuntu/xenial/armhf x1

Starting the container throws no error, and with debugging I don't see anything bad:

$ lxc start x1 --debug --verbose
DBUG[12-02|13:36:56] Fingering the daemon
DBUG[12-02|13:36:56] Raw response: {"type":"sync","status":"Success","status_code":200,"metadata":{"api_compat":1,"auth":"trusted","config":{"core.https_address":"10.43.41.223","images.remote_cache_expiry":"10"},"environment":{"addresses":["10.43.41.223"],"architectures":[4,3],"driver":"lxc","driver_version":"1.1.4","kernel":"Linux","kernel_architecture":"aarch64","kernel_version":"4.2.0-18-generic","server":"lxd","server_pid":1339,"server_version":"0.20","storage":"dir","storage_version":""}}}

DBUG[12-02|13:36:56] Pong received
DBUG[12-02|13:36:56] Raw response: {"type":"sync","status":"Success","status_code":200,"metadata":{"architecture":0,"config":{"volatile.base_image":"a406edc85653e7b3232ea1ae77e35b67dd42574cb4c7335e9b586a6b4ad6223c","volatile.eth0.hwaddr":"00:16:3e:38:aa:2c","volatile.last_state.idmap":"[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":100000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":100000,\"Nsid\":0,\"Maprange\":65536}]"},"devices":{},"ephemeral":false,"expanded_config":{"volatile.base_image":"a406edc85653e7b3232ea1ae77e35b67dd42574cb4c7335e9b586a6b4ad6223c","volatile.eth0.hwaddr":"00:16:3e:38:aa:2c","volatile.last_state.idmap":"[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":100000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":100000,\"Nsid\":0,\"Maprange\":65536}]"},"expanded_devices":{"eth0":{"hwaddr":"00:16:3e:38:aa:2c","nictype":"bridged","parent":"lxcbr0","type":"nic"}},"name":"x1","profiles":["default"],"status":{"status":"Stopped","status_code":102,"init":0,"ips":null}}}

DBUG[12-02|13:36:56] Putting {"action":"start","force":false,"timeout":-1}
 to http://unix.socket/1.0/containers/x1/state
DBUG[12-02|13:36:56] Raw response: {"type":"async","status":"OK","status_code":100,"operation":"/1.0/operations/f17b8722-1573-4af8-a365-bc450bce6654","resources":null,"metadata":null}

DBUG[12-02|13:36:56] 1.0/operations/f17b8722-1573-4af8-a365-bc450bce6654/wait
DBUG[12-02|13:36:57] Raw response: {"type":"sync","status":"Success","status_code":200,"metadata":{"created_at":"2015-12-02T13:36:56.76183Z","updated_at":"2015-12-02T13:36:57.059047Z","status":"Success","status_code":200,"resources":null,"metadata":null,"may_cancel":false}}

But the container is not running afterwards. I'm attaching /var/log/lxd/x1/lxc.log, but the most interesting bits are several

  WARN lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error

and

           NOTICE lxc_start - start.c:post_start:1265 - '/sbin/init' started with pid '2028'
           WARN lxc_start - start.c:signal_handler:310 - invalid pid for SIGCHLD
           DEBUG lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
           DEBUG lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
           DEBUG lxc_commands - commands.c:lxc_cmd_get_state:579 - 'x1' is in 'RUNNING' state
           DEBUG lxc_start - start.c:signal_handler:314 - container init process exited

cgmanager.service itself is active and running, though.

Is there some way to get a console for this, like we used to have with "lxc-start -n foo -F"?

Martin Pitt (pitti) wrote :
description: updated
Martin Pitt (pitti) wrote :

Some more observations:

- I get exactly the same failure with lxc launch'ing a trusty armhf instance.
- arm64 lxd images work fine (tested trusty and wily, there are no xenial ones yet)

So I went down a level and tried with LXC:

 sudo lxc-create -n x1armhf -t ubuntu -- -r xenial -a armhf

This also fails, but with some more info:

$ sudo lxc-start -n x1armhf -F -l debug -o /dev/stderr
[..]
      lxc-start 1449065480.085 NOTICE lxc_start - start.c:start:1254 - exec'ing '/sbin/init'
      lxc-start 1449065480.085 NOTICE lxc_start - start.c:post_start:1265 - '/sbin/init' started with pid '13393'
      lxc-start 1449065480.085 WARN lxc_start - start.c:signal_handler:310 - invalid pid for SIGCHLD
      lxc-start 1449065480.086 DEBUG lxc_start - start.c:signal_handler:314 - container init process exited
      lxc-start 1449065480.086 DEBUG lxc_start - start.c:__lxc_start:1207 - Container violated its seccomp policy
      lxc-start 1449065480.086 DEBUG lxc_start - start.c:__lxc_start:1215 - Pushing physical nics back to host namespace
      lxc-start 1449065480.086 DEBUG lxc_start - start.c:__lxc_start:1218 - Tearing down virtual network devices used by container
      lxc-start 1449065480.086 WARN lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
      lxc-start 1449065480.092 INFO lxc_error - error.c:lxc_error_set_and_log:55 - child <13393> ended on signal (31)
      lxc-start 1449065480.093 WARN lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'

and then it exits again (with code 0!), and there is no container running. Not sure if the "violated its seccomp policy" bit is interesting?

So one further step down: I directly downloaded and unpacked https://images.linuxcontainers.org/images/ubuntu/xenial/armhf/default/20151202_04:37/lxd.tar.xz:

$ sudo tar xpf lxd.tar.xz
$ sudo chroot rootfs/
# dpkg --print-architecture
armhf

nspawn fails too, with a different error message:

$ sudo systemd-nspawn -b -D rootfs/
Spawning container rootfs on /home/ubuntu/rootfs.
Press ^] three times within 1s to kill container.
Failed to create directory /home/ubuntu/rootfs/sys/fs/selinux: Read-only file system
Failed to create directory /home/ubuntu/rootfs/sys/fs/selinux: Read-only file system
/etc/localtime is not a symlink, not updating container timezone.
Container rootfs terminated by signal SYS.

In syslog I'm getting seccomp errors (from LXC and nspawn):

Dec 02 14:11:57 arm64-lxd-test audit[13536]: SECCOMP auid=1000 uid=0 gid=0 ses=1 pid=13536 comm="init" exe="/lib/systemd/systemd" sig=31 arch=40000028 syscall=45 compat=1 ip=0xf763abd6 code=0x0
Dec 02 14:15:03 arm64-lxd-test audit[25812]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=25812 comm="systemd" exe="/lib/systemd/systemd" sig=31 arch=40000028 syscall=45 compat=1 ip=0xf718fbd6 code=0x0
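
For reference, the fields in those SECCOMP records can be decoded mechanically. The sketch below is not part of the original report. It is a minimal Python helper (the function name and lookup tables are made up for illustration) that assumes the AUDIT_ARCH_* values from linux/audit.h, the 32-bit ARM EABI syscall numbering (where 45 is brk), and Linux signal numbering (where 31 is SIGSYS). The tables are deliberately tiny excerpts.

```python
import re

# Assumed values from the Linux UAPI header linux/audit.h (excerpt only).
AUDIT_ARCHES = {
    0x40000028: "arm",      # EM_ARM | little-endian flag
    0xC00000B7: "aarch64",  # EM_AARCH64 | 64-bit flag | little-endian flag
}

# Tiny excerpt of the 32-bit ARM EABI syscall table; extend as needed.
ARM_SYSCALLS = {45: "brk"}

# Linux signal numbers relevant here (arm and x86 numbering).
LINUX_SIGNALS = {31: "SIGSYS"}


def decode_seccomp_record(line):
    """Pull the key=value fields out of one SECCOMP audit line and name them."""
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    arch = int(fields["arch"], 16)   # audit prints arch in hex without 0x
    nr = int(fields["syscall"])      # syscall number is decimal
    sig = int(fields["sig"])
    arch_name = AUDIT_ARCHES.get(arch, hex(arch))
    # Only translate the syscall number when we know which table applies.
    if arch_name == "arm":
        syscall = ARM_SYSCALLS.get(nr, str(nr))
    else:
        syscall = str(nr)
    return {
        "comm": fields["comm"].strip('"'),
        "arch": arch_name,
        "syscall": syscall,
        "signal": LINUX_SIGNALS.get(sig, str(sig)),
        "compat": fields.get("compat") == "1",
    }
```

Applied to the first record above, this reports a 32-bit ARM brk call from "init" in compat mode, terminated with SIGSYS, which matches the "ended on signal (31)" line in the lxc-start log.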

Martin Pitt (pitti) wrote :

I'd love to try this on a trusty system too, but https://launchpad.net/~ubuntu-lxc/+archive/ubuntu/lxd-stable does not have packages for arm64.

Martin Pitt (pitti) wrote :

Ah, trusty-backports does not have arm64 either:

  * Arch-restrict all binaries to i386 amd64 armhf.
    The other architectures require a recent gccgo5 and golang 1.5 to build.

So I guess I'm better off doing this on wily?

Stéphane Graber (stgraber) wrote :

Oh, so the problem appears to be seccomp related... May well be the first time someone does armhf on armv8 and there's something wrong in LXC's seccomp or in libseccomp itself...

Stéphane Graber (stgraber) wrote :

Martin: can you try "lxc profile set default raw.lxc lxc.seccomp="? That should override our default seccomp profile in LXD and avoid that particular problem.

Stéphane Graber (stgraber) wrote :

Moving this one over to LXC. A quick look at the code (thanks Serge) seems to indicate that we only support personalities with seccomp for x86 and power. A similar code path must be added for arm on aarch64.
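
To illustrate the idea only (a hypothetical Python sketch, not LXC's actual C code; the table, the function name and the architecture labels are placeholders): as I understand it, a seccomp policy on a 64-bit host has to be applied for the 32-bit compat architecture as well as the native one, otherwise syscalls made by 32-bit binaries are not covered by the filter.

```python
# Hypothetical mapping: native architecture -> 32-bit compat architecture.
# The "aarch64" entry stands for the code path this comment says is missing.
COMPAT_ARCH = {
    "x86_64": "i386",
    "ppc64": "ppc",
    "aarch64": "arm",
}


def arches_for_policy(native):
    """Return every architecture a seccomp policy should be loaded for."""
    compat = COMPAT_ARCH.get(native)
    # Hosts without a known compat personality only need the native arch.
    return [native] if compat is None else [native, compat]
```

With such a table, an aarch64 host would load the policy for both "aarch64" and "arm", so an armhf container's syscalls would be evaluated against the rules instead of being treated as coming from an unknown architecture.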

Changed in lxd (Ubuntu):
status: New → Triaged
importance: Undecided → High
affects: lxd (Ubuntu) → lxc (Ubuntu)
Martin Pitt (pitti) wrote :

With "lxc profile set default raw.lxc lxc.seccomp=" I indeed get much further, thanks! It at least boots (although very slowly, 1.5 mins), and a lot of stuff fails:

● dev-hugepages.mount loaded failed failed Huge Pages File System
● sys-kernel-debug.mount loaded failed failed Debug File System
● cloud-init.service loaded failed failed Initial cloud-init job (metadata service crawler)
● console-setup.service loaded failed failed Set console keymap
● pollinate.service loaded failed failed Seed the pseudo random number generator on first boot
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
● systemd-setup-dgram-qlen.service loaded failed failed Increase datagram queue length
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

But that should be okay for now; it doesn't hurt much.

Martin Pitt (pitti) wrote :

I built a package with Serge's patch at https://github.com/hallyn/lxc/commit/c3863ddbb and confirm that this works nicely. Thanks Serge!

Changed in lxc (Ubuntu):
status: Triaged → Fix Committed
Martin Pitt (pitti) wrote :

Note to self: Test this patch on ppc64el with a powerpc (32 bit) container.

Stéphane Graber (stgraber) wrote :

Martin: that won't work.

You can't change endianness in a chroot, only in a VM.

So if you have a ppc64eb system (commonly known as ppc64), then you can have a ppc32eb (commonly known as powerpc) chroot or container.

But you cannot have a ppc32eb or ppc64eb chroot or container on a ppc64el system.
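
The rule above can be written down as a small check. This is a simplified sketch in Python with a made-up function name and table, assuming that a container can only run on a host of the same CPU family and endianness with at least the same word size. It ignores whether the host kernel actually has 32-bit compat support enabled.

```python
# (family, word size in bits, endianness) per Ubuntu-style architecture name.
ARCHES = {
    "ppc64":   ("power", 64, "big"),
    "powerpc": ("power", 32, "big"),
    "ppc64el": ("power", 64, "little"),
    "arm64":   ("arm", 64, "little"),
    "armhf":   ("arm", 32, "little"),
}


def can_host_container(host, guest):
    """True if a guest-arch chroot/container could run on a host-arch kernel."""
    h_family, h_bits, h_endian = ARCHES[host]
    g_family, g_bits, g_endian = ARCHES[guest]
    return (
        h_family == g_family      # same CPU family
        and h_endian == g_endian  # endianness cannot change without a VM
        and g_bits <= h_bits      # a 32-bit guest on a 64-bit host is fine
    )
```

For example, can_host_container("ppc64", "powerpc") is True, while can_host_container("ppc64el", "powerpc") is False because the endianness differs.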

Martin Pitt (pitti) wrote :

Ah, right. Then I'm afraid I can't test the patch on Power, as I only have access to LE machines.

Martin Pitt (pitti) on 2015-12-14
Changed in auto-package-testing:
status: New → Invalid
Changed in lxc (Ubuntu):
status: Fix Committed → Fix Released