Need support for rewriting mount sources

Bug #1580765 reported by Brian Candler
This bug affects 2 people
Affects        Status     Importance  Assigned to  Milestone
criu (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

The platform is two identical Intel DN2820 NUCs (Bay Trail), both running fully up-to-date Ubuntu 16.04.

* The command you ran and the error message as displayed to you

nsrc@nuc1:~$ lxc move sample nuc2:sample2
error: Error transferring container data: restore failed:
(00.133553) 1: Error (mount.c:2406): mnt: Can't mount at ./dev/.lxd-mounts: No such file or directory
(00.145850) Error (cr-restore.c:1352): 3438 killed by signal 9
(00.194417) Error (cr-restore.c:2182): Restoring FAILED.
nsrc@nuc1:~$

* Output of “lxc info” (*)

nsrc@nuc1:~$ lxc info
apicompat: 0
auth: trusted
environment:
  addresses:
  - 10.10.0.238:8443
  - 10.10.0.237:8443
  - '[2001:db8:100::238]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIGGTCCBAGgAwIBAgIQd50ovE3VaC8aSqnl0TNnFTANBgkqhkiG9w0BAQsFADA+
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMR4wHAYDVQQDDBVyb290QG51
    YzEud3MubnNyYy5vcmcwHhcNMTYwNTExMDcyNjE5WhcNMjYwNTA5MDcyNjE5WjA+
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMR4wHAYDVQQDDBVyb290QG51
    YzEud3MubnNyYy5vcmcwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQDb
    ob53ADqW1gFnSEwHKQLfIOE0wFhgw5BROHcBbZUMyvuomGDDLQb3sy/RFQ9zbb/8
    CBk7d1ozB3TZxP+yOGlvPztWQeX6TLj3gXg/k8Gd2q3yZxtNWX9mKoc2d4SLUyjj
    laKiHlM7QxR6Ei+GvKUMtoS2FoOPghB46uXblP+lYyiZ2QQ3Kf2vzziOrz0Gvnkv
    EW+L/7KHdR/sCnWT8nBOKMMdeZIZrqJoBoaPIjiVwL8/fySdeS9JsPMNiOEXRoug
    T6azVDZHwk6O0FMCzgabePPFcoJtHMsgq4lHZxd8n4OJQddrr4/x/3yT0E3J++h8
    LAth8iURYZjWcz1vOHnxKL29aPg5TbhRlBUT04a/XZ9kKjnOUJA2I4p6FbftkjrF
    yF0aZbaE0exA5yLXjrEyizIrfYHdVHS3WhSazt3gPamko88WOiKN/N0RcWUiFrHh
    tEk+ELx9DM5LQY8IUdjDCgCFg53jeWCDlzWWQbao6TKN89Q+3yBu5nDeOgjmn2H/
    Q/FY1OPnspH3+6IMtc8N2PVBEfrrnsvUYe0W7+/AAQqFb+c0AdC7IsmEqagpcFjH
    Mchj0EjLbme6iFUIo+Nwyfl6cwXezAPgzJv9dhtQUiPEV3nCAVAolpRuLTKeKTve
    bpXwLCcfKJOmtBEBK++/GGHN6016em8EoZhuhUscCwIDAQABo4IBETCCAQ0wDgYD
    VR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAw
    gdcGA1UdEQSBzzCBzIIQbnVjMS53cy5uc3JjLm9yZ4IcZmU4MDo6YzIzZjpkNWZm
    OmZlNjM6NjQxMS82NIIOMTAuMTAuMC4yMzgvMjSCDjEwLjEwLjAuMjM3LzMyghQy
    MDAxOmRiODoxMDA6OjIzOC82NIIcZmU4MDo6YzIzZjpkNWZmOmZlNjM6NjQxMS82
    NIIcZmU4MDo6ZmNkNTo5NGZmOmZlNTA6OTZjYS82NIIcZmU4MDo6ODQ2MTpkZGZm
    OmZlNmY6M2FhOC82NIIKZmU4MDo6MS82NDANBgkqhkiG9w0BAQsFAAOCAgEA1H/9
    mvHGVUEG9eBZDyQ56HPMQR0L3ZnKgjSY8e2Kk0cusE7M2zNz0ErMHniFKerIMq42
    ZNlE7leceuUONXMrRcobwJrtb3OxzjEAvSX2ymAG+yMvhlV5rllw8bkXEctp+0n/
    +IvoSwsaeM5rb9E9BQBvkv4x1aiNAcqXRo1+EBtgQABBU/ATl2Y4PqIJXyL0O6yl
    SpYZrS2w5vsi0kgdN3mdtLFErrqwdsq7sfXyWSFBNrjG0H9X3ThgHC8AD8O+BqgD
    7EwOewMHsmdkMhuOCe1VoiEY6ahaf/+Y5zdXF1HLWZoTMSKeyickCl3g/9U34se7
    ReuaoyJ8ShNEciBMz+h8jQRMvzlUzE/hle2O9CZqOH8gK9K4eoi0+pBQqPgL6MZf
    J+0sVEQH5gtd5lPasZ78Y464ZO05iuZHfTm55fS+K1+zPYksz6XpjMNgoPfQBSlU
    Xrh8yAhE+5ItTcg+d8J+ZZkVDA+RI8/28scMbRiPpC/oi9AgQ+n9Izi3Pr5QtC4i
    XW8RMD19lRT6Lcd8COzUas4InoK2Fd0g6tA4cXrbGcPYjBZNMsWEOxI5qye6hPWk
    LW/sUiiuvnvLYEpa9NvVMuliGXiI0hCZwoH0YqTc4amtl3sA/nFaU//6Skvzd/TC
    n/pIX6cnD5CElJLE1LwIz+KGxUmsEk80ugQsT1A=
    -----END CERTIFICATE-----
  driver: lxc
  driverversion: 2.0.0
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-22-generic
  server: lxd
  serverpid: 1955
  serverversion: 2.0.0
  storage: dir
  storageversion: ""
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
public: false
nsrc@nuc1:~$ lxc info nuc2:
apicompat: 0
auth: trusted
environment:
  addresses:
  - 10.10.0.239:8443
  - '[2001:db8:100::239]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIGRzCCBC+gAwIBAgIRANszGjXLHPsMykOckgF9VZ0wDQYJKoZIhvcNAQELBQAw
    PjEcMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEeMBwGA1UEAwwVcm9vdEBu
    dWMyLndzLm5zcmMub3JnMB4XDTE2MDUxMTA3MzY1MFoXDTI2MDUwOTA3MzY1MFow
    PjEcMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEeMBwGA1UEAwwVcm9vdEBu
    dWMyLndzLm5zcmMub3JnMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA
    v3IR6c/EoP+MWBHyd4/6rgpYGpPw30LQa8fI4JtwM097UvReHFU+qN5VryMDw0oU
    EPxVX9uWnevNpfNhGn8eaf1JPMK/JqpsAyR8Xd5aiHvJypl7cpXa6du3BDn4h642
    UBexyzLTPjDuUr7J/Md5a+J850ttLsMbRULUuCSBn5+ZPthMYpdV4NcNFfGndsqH
    VellP9WGqxeIZn8fEB7YGt1+ubGuMzdNhLC7FzuqUX1imudBrX16hcIZDyMtCL/Y
    4WosGDyR/jTuWU/+sTkkNtRDWDHGpCL3FaDwG2dnvKV2uxgAk5gLo2fJNrjOb/sa
    6Ixx3Ro6MHLEyd8a7xXco5q92MbO4PB2PGwVaeZdvAXEGeEA0GKgf/S0zipFHfr3
    Yz9yAvPsQr+eiB0wG2ekVkVuEioK52ebLTILXcYkE6sViVt4yvyGeMBYY2cc9U0q
    bg3Xb1YC0rzkRiwPLoywD7w/W+XZp2e3MSY7i2szDLpwRuXGANT+amYsbpWudE78
    ckLey/F+XiUOg4UV6hGIH0t8CJ20ePlkrupIV3W2d3KIC3CaeRpQrgz8iZHW9/uC
    SBn8tU66Ov9TsmVL3py3Kabgsw+T12/TZm1dli+sySWuYoh2L2/W/fN4ZvPxb5kI
    NeJZREtgXNn3BjkVd76fa0uOhBibLt6kYCD6o638TesCAwEAAaOCAT4wggE6MA4G
    A1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAA
    MIIBAwYDVR0RBIH7MIH4ghBudWMyLndzLm5zcmMub3JnghxmZTgwOjpjMjNmOmQ1
    ZmY6ZmU2NDo4OTIxLzY0gg4xMC4xMC4wLjIzOS8yNIIUMjAwMTpkYjg6MTAwOjoy
    MzkvNjSCHGZlODA6OmMyM2Y6ZDVmZjpmZTY0Ojg5MjEvNjSCHGZlODA6OjM0ZmI6
    MjFmZjpmZTJkOmI2NzkvNjSCHGZlODA6OmZjMmI6NDZmZjpmZTkyOjMzOTgvNjSC
    HGZlODA6OmZjZWM6NmNmZjpmZTI1Ojg3YTEvNjSCHGZlODA6OmUwNjI6MTdmZjpm
    ZTcwOmE5NjQvNjSCCmZlODA6OjEvNjQwDQYJKoZIhvcNAQELBQADggIBACwIK8DF
    MO9O747tvvDpdUSnuwMu5Z4sMlJZ/s4YB/mLBFMKKxihXkszIgzNmb6BARDmu21D
    VUWYFMGLtYxRZqWkMe2fn+xR/VftQewK8v8Q4oYIZCVklC3QuWcndH54DcRNsVHm
    E5d/TMfSE7QciAae96ecHOjBnYZdR6B6l9Mj1zuNNhfubCJy1dPEx+UVieEIhmkX
    UnKSBjF5Kr7WHa48/9a15zuDqk391sZunzJG3CTWG8DsY1MmH0M9cVzObmNyUGoD
    UuFNmfvSYYWnwyqE7+7Wp3ZJZQhGiaiZ9Rm67zyPg8tljq358BNGVx2Tj94wHRr4
    k5/f/Erja7QMHAIsXiyZ1Z3A75QCBCmUk+gaS0E5PvgqxYm+R1bqVBg++TbBXFGl
    7ytvFBUZjOF3zQ493H9SSfGQG7SCi05RgqnMXLLPsCbYQiNNkKOigGz8PoxBvdfu
    0vsxDLDx9N6tKcYAsEibvE7moNFeN+AVMjn4kJfP9YJ/JcxyB6R3Ka9ly/r66hvq
    BX9wT+XbZIE2ZmXkwTYwBs5qfOHThvSxP5E3xAUzV+hL26Fqo120R48qowJbU83O
    E028fWeLUBy+eioDvAmsNXOLf4rKcjiSFRJIHJ2hRd1mUEtZteOvKiqd+5Q5qY3r
    Bv6OSpImx6p5H/VwKZaJnSlh2t3shNZT5ghh
    -----END CERTIFICATE-----
  driver: lxc
  driverversion: 2.0.0
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-22-generic
  server: lxd
  serverpid: 2017
  serverversion: 2.0.0
  storage: dir
  storageversion: ""
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
public: false

* Output of “lxc info <container name>”

nsrc@nuc1:~$ lxc info sample
Name: sample
Architecture: x86_64
Created: 2016/05/11 19:53 UTC
Status: Stopped
Type: persistent
Profiles: br-lan
nsrc@nuc1:~$ lxc info nuc2:sample2
error: not found

* Output of “lxc config show --expanded <container name>”

nsrc@nuc1:~$ lxc config show --expanded sample
name: sample
profiles:
- br-lan
config:
  volatile.base_image: f4c4c60a6b752a381288ae72a1689a9da00f8e03b732c8d1b8a8fcd1a8890800
  volatile.eth0.hwaddr: 00:16:3e:19:84:90
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br-lan
    type: nic
  root:
    path: /
    type: disk
ephemeral: false

* Output of “dmesg” (*)

In attached tarball

* Output of “/proc/self/mountinfo” (*)

nsrc@nuc1:~$ cat /proc/self/mountinfo
18 23 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
19 23 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
20 23 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
21 20 0:14 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
22 23 0:18 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=393352k,mode=755
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
24 18 0:12 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
25 20 0:19 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
26 22 0:20 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
27 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
28 27 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/
29 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:11 - pstore pstore rw
30 27 0:24 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,blkio,nsroot=/
31 27 0:25 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,pids,nsroot=/
32 27 0:26 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,devices,nsroot=/
33 27 0:27 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls,net_prio,nsroot=/
34 27 0:28 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,cpu,cpuacct,nsroot=/
35 27 0:29 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,cpuset,nsroot=/
36 27 0:30 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,memory,nsroot=/
37 27 0:31 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,freezer,nsroot=/
38 27 0:32 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,perf_event,nsroot=/
39 27 0:33 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,hugetlb,nsroot=/
40 20 0:16 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw
41 18 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
42 20 0:34 / /dev/hugepages rw,relatime shared:25 - hugetlbfs hugetlbfs rw
43 19 0:35 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
44 18 0:36 / /sys/fs/fuse/connections rw,relatime shared:27 - fusectl fusectl rw
45 23 252:0 / /iso rw,relatime shared:28 - ext4 /dev/mapper/ganeti-iso rw,data=ordered
46 22 0:37 / /run/lxcfs/controllers rw,relatime shared:29 - tmpfs tmpfs rw,size=100k,mode=700
47 46 0:33 / /run/lxcfs/controllers/hugetlb rw,relatime shared:30 - cgroup hugetlb rw,hugetlb,nsroot=/
48 46 0:32 / /run/lxcfs/controllers/perf_event rw,relatime shared:31 - cgroup perf_event rw,perf_event,nsroot=/
49 46 0:31 / /run/lxcfs/controllers/freezer rw,relatime shared:32 - cgroup freezer rw,freezer,nsroot=/
50 46 0:30 / /run/lxcfs/controllers/memory rw,relatime shared:33 - cgroup memory rw,memory,nsroot=/
51 46 0:29 / /run/lxcfs/controllers/cpuset rw,relatime shared:34 - cgroup cpuset rw,cpuset,nsroot=/
52 46 0:28 / /run/lxcfs/controllers/cpu,cpuacct rw,relatime shared:35 - cgroup cpu,cpuacct rw,cpu,cpuacct,nsroot=/
53 46 0:27 / /run/lxcfs/controllers/net_cls,net_prio rw,relatime shared:36 - cgroup net_cls,net_prio rw,net_cls,net_prio,nsroot=/
54 46 0:26 / /run/lxcfs/controllers/devices rw,relatime shared:37 - cgroup devices rw,devices,nsroot=/
55 46 0:25 / /run/lxcfs/controllers/pids rw,relatime shared:38 - cgroup pids rw,pids,nsroot=/
56 46 0:24 / /run/lxcfs/controllers/blkio rw,relatime shared:39 - cgroup blkio rw,blkio,nsroot=/
57 46 0:22 / /run/lxcfs/controllers/name=systemd rw,relatime shared:40 - cgroup name=systemd rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/
58 23 0:38 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
59 23 8:2 /var/lib/lxd/shmounts /var/lib/lxd/shmounts rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
61 22 0:40 / /run/user/1000 rw,nosuid,nodev,relatime shared:43 - tmpfs tmpfs rw,size=393352k,mode=700,uid=1000,gid=1000
111 43 0:44 / /proc/sys/fs/binfmt_misc rw,relatime shared:42 - binfmt_misc binfmt_misc rw

nsrc@nuc2:~$ cat /proc/self/mountinfo
18 23 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
19 23 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
20 23 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=1947536k,nr_inodes=486884,mode=755
21 20 0:14 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
22 23 0:18 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=393352k,mode=755
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
24 18 0:12 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
25 20 0:19 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
26 22 0:20 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
27 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
28 27 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/
29 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:11 - pstore pstore rw
30 27 0:24 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,hugetlb,nsroot=/
31 27 0:25 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,cpu,cpuacct,nsroot=/
32 27 0:26 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,cpuset,nsroot=/
33 27 0:27 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,blkio,nsroot=/
34 27 0:28 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,net_cls,net_prio,nsroot=/
35 27 0:29 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event,nsroot=/
36 27 0:30 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,memory,nsroot=/
37 27 0:31 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,freezer,nsroot=/
38 27 0:32 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,pids,nsroot=/
39 27 0:33 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,devices,nsroot=/
40 19 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 rw,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
41 20 0:35 / /dev/hugepages rw,relatime shared:24 - hugetlbfs hugetlbfs rw
42 18 0:7 / /sys/kernel/debug rw,relatime shared:25 - debugfs debugfs rw
43 20 0:16 / /dev/mqueue rw,relatime shared:26 - mqueue mqueue rw
44 18 0:36 / /sys/fs/fuse/connections rw,relatime shared:27 - fusectl fusectl rw
45 23 252:0 / /iso rw,relatime shared:28 - ext4 /dev/mapper/ganeti-iso rw,data=ordered
46 22 0:37 / /run/lxcfs/controllers rw,relatime shared:29 - tmpfs tmpfs rw,size=100k,mode=700
47 46 0:33 / /run/lxcfs/controllers/devices rw,relatime shared:30 - cgroup devices rw,devices,nsroot=/
48 46 0:32 / /run/lxcfs/controllers/pids rw,relatime shared:31 - cgroup pids rw,pids,nsroot=/
49 46 0:31 / /run/lxcfs/controllers/freezer rw,relatime shared:32 - cgroup freezer rw,freezer,nsroot=/
50 46 0:30 / /run/lxcfs/controllers/memory rw,relatime shared:33 - cgroup memory rw,memory,nsroot=/
51 46 0:29 / /run/lxcfs/controllers/perf_event rw,relatime shared:34 - cgroup perf_event rw,perf_event,nsroot=/
52 46 0:28 / /run/lxcfs/controllers/net_cls,net_prio rw,relatime shared:35 - cgroup net_cls,net_prio rw,net_cls,net_prio,nsroot=/
53 46 0:27 / /run/lxcfs/controllers/blkio rw,relatime shared:36 - cgroup blkio rw,blkio,nsroot=/
54 46 0:26 / /run/lxcfs/controllers/cpuset rw,relatime shared:37 - cgroup cpuset rw,cpuset,nsroot=/
55 46 0:25 / /run/lxcfs/controllers/cpu,cpuacct rw,relatime shared:38 - cgroup cpu,cpuacct rw,cpu,cpuacct,nsroot=/
56 46 0:24 / /run/lxcfs/controllers/hugetlb rw,relatime shared:39 - cgroup hugetlb rw,hugetlb,nsroot=/
57 46 0:22 / /run/lxcfs/controllers/name=systemd rw,relatime shared:40 - cgroup name=systemd rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/
58 23 0:38 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
59 23 8:2 /var/lib/lxd/shmounts /var/lib/lxd/shmounts rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
60 22 0:39 / /run/user/1000 rw,nosuid,nodev,relatime shared:42 - tmpfs tmpfs rw,size=393352k,mode=700,uid=1000,gid=1000
61 22 0:40 / /run/user/0 rw,nosuid,nodev,relatime shared:43 - tmpfs tmpfs rw,size=393352k,mode=700

* Output of “lxc exec <container name> -- cat /proc/self/mountinfo”

nsrc@nuc1:~$ lxc exec sample -- cat /proc/self/mountinfo
error: Container is not running.

<< it had died after failed migration attempt >>

nsrc@nuc1:~$ lxc start sample
nsrc@nuc1:~$ lxc exec sample -- cat /proc/self/mountinfo
60 62 8:2 /var/lib/lxd/containers/sample/rootfs / rw,relatime master:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
109 60 0:39 / /dev rw,nodev,relatime - tmpfs none rw,size=492k,mode=755,uid=100000,gid=100000
110 60 0:42 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
112 60 0:43 / /sys rw,nodev,relatime - sysfs sysfs rw
113 110 0:44 / /proc/sys/fs/binfmt_misc rw,relatime master:42 - binfmt_misc binfmt_misc rw
114 112 0:36 / /sys/fs/fuse/connections rw,relatime master:27 - fusectl fusectl rw
115 112 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:11 - pstore pstore rw
116 112 0:7 / /sys/kernel/debug rw,relatime master:24 - debugfs debugfs rw
117 112 0:12 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime master:8 - securityfs securityfs rw
118 109 0:16 / /dev/mqueue rw,relatime master:23 - mqueue mqueue rw
119 109 8:2 /var/lib/lxd/devlxd /dev/lxd rw,relatime master:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
120 109 8:2 /var/lib/lxd/shmounts/sample /dev/.lxd-mounts rw,relatime master:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
121 110 0:38 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
122 110 0:38 /proc/diskstats /proc/diskstats rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
123 110 0:38 /proc/meminfo /proc/meminfo rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
124 110 0:38 /proc/stat /proc/stat rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
125 110 0:38 /proc/swaps /proc/swaps rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
126 110 0:38 /proc/uptime /proc/uptime rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
127 60 0:38 / /var/lib/lxcfs rw,nosuid,nodev,relatime master:41 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
128 109 0:6 /null /dev/null rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
129 109 0:6 /zero /dev/zero rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
130 109 0:6 /full /dev/full rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
131 109 0:6 /urandom /dev/urandom rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
132 109 0:6 /random /dev/random rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
133 109 0:6 /tty /dev/tty rw,nosuid,relatime master:2 - devtmpfs udev rw,size=1947556k,nr_inodes=486889,mode=755
134 109 0:14 /1 /dev/console rw,nosuid,noexec,relatime master:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
63 109 0:45 / /dev/pts rw,relatime - devpts devpts rw,gid=100005,mode=620,ptmxmode=666
64 109 0:46 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw,uid=100000,gid=100000
65 60 0:47 / /run rw,nosuid,nodev - tmpfs tmpfs rw,mode=755,uid=100000,gid=100000
66 65 0:48 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=5120k,uid=100000,gid=100000
67 112 0:49 / /sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,mode=755,uid=100000,gid=100000
68 67 0:50 /lxc/sample /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/lxc/sample
69 67 0:51 /lxc/sample /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer,nsroot=/lxc/sample
70 67 0:52 /lxc/sample /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct,nsroot=/lxc/sample
71 67 0:53 /lxc/sample /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids,nsroot=/lxc/sample
72 67 0:54 /lxc/sample /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices,nsroot=/lxc/sample
73 67 0:55 /lxc/sample /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb,nsroot=/lxc/sample
74 67 0:56 /lxc/sample /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory,nsroot=/lxc/sample
75 67 0:57 /lxc/sample /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio,nsroot=/lxc/sample
76 67 0:58 /lxc/sample /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio,nsroot=/lxc/sample
77 67 0:59 /lxc/sample /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event,nsroot=/lxc/sample
78 67 0:60 /lxc/sample /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,nsroot=/lxc/sample
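
For readers less used to the mountinfo format: field 4 of each line is the root of the mount within its filesystem (for a bind mount, the source path on the host) and field 5 is the mount point inside the container. A minimal sketch for pulling the source of one mount point out of output like the above; the function name `mount_root_for` is made up for illustration and is not part of LXD or CRIU:

```shell
# Print the "root" field (field 4) of every mountinfo line whose mount point
# (field 5) equals the first argument. Input is read from stdin.
mount_root_for() {
    awk -v target="$1" '$5 == target { print $4 }'
}

# Demo on one line copied from the container output above.
mount_root_for /dev/.lxd-mounts <<'EOF'
120 109 8:2 /var/lib/lxd/shmounts/sample /dev/.lxd-mounts rw,relatime master:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
EOF
```

For the line shown, the source path is /var/lib/lxd/shmounts/sample, so it includes the source container's name. On a live container the function could presumably be fed from `lxc exec sample -- cat /proc/self/mountinfo` instead of a pasted line.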

* Output of “uname -a” (*)

nsrc@nuc1:~$ uname -a
Linux nuc1.ws.nsrc.org 4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

nsrc@nuc2:~$ uname -a
Linux nuc2.ws.nsrc.org 4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

* The content of /var/log/lxd.log (*) [actually /var/log/lxd/lxd.log]
* The content of /etc/default/lxd-bridge (*)
* A tarball of /var/log/lxd/<container name>/ (*)

All these in tarballs

Tycho Andersen (tycho-s) wrote :
Tycho Andersen (tycho-s) wrote :

Hmm. This means that either the source of the mount or the mount target didn't exist. I would expect LXD to create /var/lib/lxd/shmounts on the target host when it starts a migration (you can verify that this happens by `inotifywait /var/lib/lxd/shmounts` which should tell you that the directory gets created).

So, the only other option is that the directory doesn't exist on the target filesystem. I'm not really sure how that could be given that it was mounted originally, unless the target filesystem didn't get transferred correctly somehow. Is there any chance you can poke at the filesystem on the target once it gets migrated to see?
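
The two possibilities above (missing mount source, missing mount target) can be checked separately. A rough sketch, assuming nothing about LXD internals: `check_mount_endpoints` is a hypothetical helper, and both paths are passed in by the caller.

```shell
# Report which side of a bind mount is missing. Returns 0 only when both
# the source and the target path exist.
check_mount_endpoints() {
    src=$1
    dst=$2
    status=0
    if [ ! -e "$src" ]; then
        echo "missing source: $src"
        status=1
    fi
    if [ ! -e "$dst" ]; then
        echo "missing target: $dst"
        status=1
    fi
    return "$status"
}

# Demo with a throwaway directory: the source exists, the target does not.
demo=$(mktemp -d)
check_mount_endpoints "$demo" "$demo/dev/.lxd-mounts" || echo "at least one endpoint is missing"
rmdir "$demo"
```

On the target host the arguments would presumably be something like /var/lib/lxd/shmounts/sample2 and /var/lib/lxd/containers/sample2/rootfs/dev/.lxd-mounts. Those paths are guesses based on the layout shown in this report, not something the error message states.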

Brian Candler (b-candler) wrote :

[I should add: these systems are using plain dir backend on ext4: no zfs, btrfs etc]

> Hmm. This means that either the source of the mount or the mount target
> didn't exist. I would expect LXD to create /var/lib/lxd/shmounts on the
> target host when it starts a migration (you can verify that this happens
> by `inotifywait /var/lib/lxd/shmounts` which should tell you that the
> directory gets created).

[On target system]
root@nuc2:~# inotifywait /var/lib/lxd/shmounts
Setting up watches.
Watches established.

[On source system]
root@nuc1:~# lxc move sample nuc2:sample2

It now just hangs here. I've waited 10+ minutes.

ps auxwww on source shows:

...
root 3830 0.0 0.2 141828 10540 pts/0 Sl+ 06:41 0:00 lxc move sample nuc2:sample2
root 3842 0.1 1.3 65464 52056 ? S 06:41 0:00 rsync -arvP --devices --numeric-ids --partial /var/lib/lxd/containers/sample/ localhost:/tmp/foo -e sh -c "nc -U /tmp/lxd_rsync_952127373"
root 3843 0.0 0.0 4508 752 ? S 06:41 0:00 sh -c nc -U /tmp/lxd_rsync_952127373 localhost rsync --server -vlogDtpre.iLsfx --partial --numeric-ids . /tmp/foo
root 3844 0.0 0.0 9052 956 ? S 06:41 0:00 nc -U /tmp/lxd_rsync_952127373
root 4486 0.0 0.0 0 0 ? S 06:44 0:00 [kworker/u4:0]
root 5875 3.5 0.1 22624 5192 pts/2 Ss 06:53 0:00 -bash
root 5888 0.0 0.0 37372 3280 pts/2 R+ 06:53 0:00 ps auxwww

root@nuc1:~# strace -p 3844
strace: Process 3844 attached
restart_syscall(<... resuming interrupted poll ...>

ps auxwww on target shows:

root 3391 0.0 0.0 6532 704 pts/0 S+ 06:41 0:00 inotifywait /var/lib/lxd/shmounts
root 3402 0.0 0.0 14052 912 ? S 06:41 0:00 rsync --server -vlogDtpre.iLsfx --numeric-ids --devices --partial . /tmp/lxd_restore_287818406/
root 3403 0.1 0.1 28908 5192 ? S 06:41 0:00 rsync --server -vlogDtpre.iLsfx --numeric-ids --devices --partial . /var/lib/lxd/containers/sample2/
root 3407 0.0 0.0 44328 3800 ? S 06:41 0:00 rsync --server -vlogDtpre.iLsfx --numeric-ids --devices --partial . /var/lib/lxd/containers/sample2/
root 3591 0.0 0.0 0 0 ? S 06:48 0:00 [kworker/u4:1]
root 3748 0.0 0.0 0 0 ? S 06:54 0:00 [kworker/1:0]
root 3757 7.0 0.1 22608 5140 pts/1 Ss 06:54 0:00 -bash
root 3770 0.0 0.0 37372 3384 pts/1 R+ 06:54 0:00 ps auxwww

root@nuc2:~# strace -p 3407
strace: Process 3407 attached
select(1, [0], [], [0], {55, 2239}

So it appears that the filesystem transfer has simply frozen.

(Note: these systems applied all updates yesterday; this gave a minor kernel update)

> So, the only other option is that the directory doesn't exist on the
> target filesystem. I'm not really sure how that could be given that it
> was mounted originally, unless the target filesystem didn't get
> transferred correctly somehow. Is there any chance you can poke at the
> filesystem on the target once it gets migrated to see?

I wasn't sure how I was going to do this, but fortunately because of the hang I can.

root@nuc2:~# ls /var/lib/lxd/con...

Brian Candler (b-candler) wrote :

The rsync --server processes remained on the target, so I've rebooted both boxes, and also deleted the (stopped) target container.

The container to be migrated is running, and has a shell open running a simple loop:
while [ 1 ]; do date; sleep 0.5; done

Same problem:

* The migration has frozen
* The files seem to have been more or less copied:

root@nuc2:~# du -sch /var/lib/lxd/containers/sample2/
664M /var/lib/lxd/containers/sample2/
664M total

* Filesystem diff is suspiciously similar!

--- f1 2016-05-17 07:07:19.000000000 +0100
+++ f2 2016-05-17 07:07:23.000000000 +0100
@@ -653,6 +653,7 @@
 ./rootfs/etc/network/if-up.d/upstart
 ./rootfs/etc/network/interfaces
 ./rootfs/etc/network/interfaces.d/50-cloud-init.cfg
+./rootfs/etc/network/interfaces.d/eth0.cfg
 ./rootfs/etc/networks
 ./rootfs/etc/newt/palette.original
 ./rootfs/etc/newt/palette.ubuntu
@@ -21780,6 +21781,7 @@
 ./rootfs/var/backups/shadow.bak
 ./rootfs/var/cache/apt/archives/lock
 ./rootfs/var/cache/apt/pkgcache.bin
+./rootfs/var/cache/apt/.pkgcache.bin.Fo4toa
 ./rootfs/var/cache/apt/srcpkgcache.bin
 ./rootfs/var/cache/debconf/config.dat
 ./rootfs/var/cache/debconf/config.dat-old
@@ -21834,120 +21836,38 @@
 ./rootfs/var/lib/apparmor/profiles/.apparmor.md5sums
 ./rootfs/var/lib/apt/extended_states
 ./rootfs/var/lib/apt/keyrings/ubuntu-archive-keyring.gpg
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_InRelease
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_main_binary-amd64_Packages
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_main_i18n_Translation-en
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_main_source_Sources
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_multiverse_binary-amd64_Packages
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_multiverse_i18n_Translation-en
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_multiverse_source_Sources
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_restricted_binary-amd64_Packages
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_restricted_i18n_Translation-en
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_restricted_source_Sources
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_universe_binary-amd64_Packages
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_universe_i18n_Translation-en
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial-backports_universe_source_Sources
 ./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_InRelease
 ./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_main_binary-amd64_Packages
 ./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_main_i18n_Translation-en
-./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_main_source_Sources
 ./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_multiverse_binary-amd64_Packages
 ./rootfs/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xe...

Tycho Andersen (tycho-s) wrote :

You're probably hitting https://github.com/lxc/lxd/issues/1944 now. I think LXD 2.0.1 has a fix for this if you want to enable xenial-proposed. Otherwise, just deleting the container and recreating it should fix that (until it modifies the disk in such a way that it breaks again :)

Brian Candler (b-candler) wrote :

Thank you. I have updated to lxd/xenial-proposed on both nodes. Now I am back to the original problem:

root@nuc1:~# lxc move sample nuc2:sample2
error: Error transferring container data: restore failed:
(00.173165) 1: Error (mount.c:2406): mnt: Can't mount at ./dev/.lxd-mounts: No such file or directory
(00.177785) Error (cr-restore.c:1352): 4619 killed by signal 9
(00.214312) Error (cr-restore.c:2182): Restoring FAILED.

On the target node, if I do "ls /var/lib/lxd/containers/sample2/rootfs/dev" repeatedly, then shortly after the migration starts I see the directory appear:

root@nuc2:~# ls /var/lib/lxd/containers/sample2/rootfs/dev
agpgart core kmem loop6 midi02 mixer2 pts ram13 ram5 rmidi1 smpte2 tty0 tty7
audio dsp loop0 loop7 midi03 mixer3 ram ram14 ram6 rmidi2 smpte3 tty1 tty8
audio1 dsp1 loop1 mapper midi1 mpu401data ram0 ram15 ram7 rmidi3 sndstat tty2 tty9
audio2 dsp2 loop2 mem midi2 mpu401stat ram1 ram16 ram8 sequencer stderr tty3 urandom
audio3 dsp3 loop3 midi0 midi3 null ram10 ram2 ram9 shm stdin tty4 zero
audioctl fd loop4 midi00 mixer port ram11 ram3 random smpte0 stdout tty5
console full loop5 midi01 mixer1 ptmx ram12 ram4 rmidi0 smpte1 tty tty6

... and then finally it vanishes again:

root@nuc2:~# ls /var/lib/lxd/containers/sample2/rootfs/dev
ls: cannot access '/var/lib/lxd/containers/sample2/rootfs/dev': No such file or directory

At least this problem is consistent. Trying again with inotify:

root@nuc2:~# inotifywait /var/lib/lxd/shmounts
Setting up watches.
Watches established.
/var/lib/lxd/shmounts/ CREATE,ISDIR sample2
root@nuc2:~#

Then same error. It seems that this directory appears and disappears very quickly:

root@nuc2:~# inotifywait /var/lib/lxd/shmounts; while ! ls /var/lib/lxd/shmounts/sample2 >/dev/null; do sleep 0.01; done; while ls /var/lib/lxd/shmounts/sample2; do sleep 0.01; done
Setting up watches.
Watches established.
/var/lib/lxd/shmounts/ CREATE,ISDIR sample2
ls: cannot access '/var/lib/lxd/shmounts/sample2': No such file or directory
root@nuc2:~#
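
The one-liner above polls until the directory appears and then until it vanishes. The same idea as a function with an upper bound on the number of polls, so it cannot spin forever if the directory never shows up; this is only a sketch, `watch_dir_lifecycle` is a made-up name, and like the original it relies on a `sleep` that accepts fractional seconds:

```shell
# Poll for a directory to appear and then to vanish again.
# $1 = directory to watch, $2 = maximum number of polls (default 1000).
watch_dir_lifecycle() {
    dir=$1
    polls_left=${2:-1000}
    # Phase 1: wait for the directory to show up.
    until [ -d "$dir" ]; do
        polls_left=$((polls_left - 1))
        [ "$polls_left" -gt 0 ] || { echo "never appeared"; return 1; }
        sleep 0.01
    done
    echo "appeared"
    # Phase 2: wait for it to go away again.
    while [ -d "$dir" ]; do
        polls_left=$((polls_left - 1))
        [ "$polls_left" -gt 0 ] || { echo "still present"; return 1; }
        sleep 0.01
    done
    echo "vanished"
}
```

It would be started on the target as `watch_dir_lifecycle /var/lib/lxd/shmounts/sample2` just before kicking off the migration. A 10 ms poll can still miss a very short-lived directory, so a "never appeared" result is not conclusive.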

Next I tried running strace on the target:

root@nuc2:~# strace -f -p [pid-of-lxd] 2>/tmp/strace.out

This worked once (i.e. it failed with the mount error). Unfortunately I then ran it again trying to catch longer strings:

root@nuc2:~# strace -f -s 128 -p [pid-of-lxd] 2>/tmp/strace.out

(which of course overwrote my strace file), and now I get a different error:

root@nuc1:~# lxc move sample nuc2:sample2
error: Error transferring container data: restore failed:
(00.173337) Error (cr-restore.c:2012): Can't attach to init: Operation not permitted
(00.194078) Error (cr-restore.c:2182): Restoring FAILED.

I am now getting this error on every migration with strace attached (even without -s 128, even after rebooting the nodes). I wish I had kept the first file :-(

Any other suggestions for what to look for? Are you interested in the strace from the "Can't attach to init" problem?

Brian Candler (b-candler) wrote :

Using inotifywait -mr on the target:

~~~
root@nuc2:~# inotifywait -mr /var/lib/lxd/shmounts
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/var/lib/lxd/shmounts/ CREATE,ISDIR sample2
/var/lib/lxd/shmounts/ OPEN,ISDIR sample2
/var/lib/lxd/shmounts/ ACCESS,ISDIR sample2
/var/lib/lxd/shmounts/ CLOSE_NOWRITE,CLOSE,ISDIR sample2
/var/lib/lxd/shmounts/sample2/ DELETE_SELF
/var/lib/lxd/shmounts/ DELETE,ISDIR sample2
~~~

Next a little more low-tech: an infinite loop trying to read the directory contents.

-----
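#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Poll a directory in a tight loop and list its entries whenever it exists. */
int main(int argc, char *argv[])
{
  if (argc < 2) {
    fprintf(stderr, "usage: %s <directory>\n", argv[0]);
    return 2;
  }
  while (1) {
    DIR *dirp = opendir(argv[1]);
    struct dirent *de;
    if (!dirp) {
      /* ENOENT only means the directory is not there (yet): keep polling. */
      if (errno != ENOENT) {
        perror("opendir");
        return 1;
      }
      continue;
    }
    /* Print every entry, including "." and "..". */
    while ((de = readdir(dirp)) != NULL)
      printf("%s\n", de->d_name);
    closedir(dirp);
    printf("---\n");
    fflush(stdout);
  }
}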
-----

Left this running on the target with argument "/var/lib/lxd/shmounts/sample2", then ran the migration. It showed that as long as that directory exists, it is empty (obviously this is a race, but it's a pretty quick loop).

So this confirms your theory that /dev does not exist - but doesn't help explain why not :-(

I then tried running the migration with both strace *and* the above loop running. I got a variation on the error that I have not seen before:

~~~
root@nuc1:~# lxc move sample nuc2:sample2
error: Error transferring container data: restore failed:
(00.130707) 1: Error (namespaces.c:1153): uns: send req error: Invalid argument
(00.130828) 1: Error (cgroup.c:1065): cg: couldn't set cgns prefix blkio//lxc/sample2/tasks: Invalid argument
(00.130889) 1: Error (cgroup.c:1089): cg: failed preparing cgns
(00.131423) PID: real 24376 virt 1
(00.132255) Error (cr-restore.c:2012): Can't attach to init: Operation not permitted
(00.191169) Error (cr-restore.c:2182): Restoring FAILED.
~~~

Brian Candler (b-candler) wrote :

I just thought: /var/lib/lxd/shmounts/sample/ is empty on the source machine. Is that expected?

root@nuc1:~# ls -A /var/lib/lxd/shmounts/
sample
root@nuc1:~# ls -A /var/lib/lxd/shmounts/sample/
root@nuc1:~#

Also:

root@nuc1:~# mount | grep lxd
/dev/sda2 on /var/lib/lxd/shmounts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
root@nuc1:~# mount | grep sda2
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/sda2 on /var/lib/lxd/shmounts type ext4 (rw,relatime,errors=remount-ro,data=ordered)

What?? sda2 is the root filesystem!
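A note on the duplicate sda2 entry: "mount" prints the backing device for bind mounts as well, so a directory on the root filesystem that is bind mounted somewhere (or onto itself) is listed with /dev/sda2 as its source. /proc/self/mountinfo has an extra fourth field, the root of the mount within its filesystem, which tells the two entries apart. Below is a minimal sketch of reading that field. The function names are made up for illustration, and the sample lines in the usage note are illustrative, not captured from these hosts.

```c
#include <stdio.h>
#include <string.h>

/*
 * Pull the "root" (4th) and "mount point" (5th) fields out of one line of
 * /proc/<pid>/mountinfo. Returns 0 on success, -1 if the line is malformed
 * or a field does not fit. Octal escapes such as \040 are left as-is.
 */
int parse_mountinfo_line(const char *line,
                         char *root, size_t rootlen,
                         char *mnt, size_t mntlen)
{
  char r[4096], m[4096];

  /* Fields: mount-ID parent-ID major:minor root mount-point options ... */
  if (sscanf(line, "%*d %*d %*u:%*u %4095s %4095s", r, m) != 2)
    return -1;
  if (strlen(r) >= rootlen || strlen(m) >= mntlen)
    return -1;
  strcpy(root, r);
  strcpy(mnt, m);
  return 0;
}

/*
 * A root other than "/" means only a subtree of the filesystem is mounted,
 * which is typically a bind mount (btrfs subvolumes also look like this).
 */
int is_subtree_mount(const char *root)
{
  return strcmp(root, "/") != 0;
}
```

Fed a line such as "97 25 8:2 /var/lib/lxd/shmounts /var/lib/lxd/shmounts rw,relatime shared:50 - ext4 /dev/sda2 rw", parse_mountinfo_line() yields a root of /var/lib/lxd/shmounts, and is_subtree_mount() reports it as a subtree mount, whereas the real root filesystem line has a root of "/".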

Tycho Andersen (tycho-s) wrote :

Hi, sorry for the delay. The reason things are failing when you try to strace the restore is that CRIU also tries to ptrace its children to restore certain things, and the kernel only allows one tracer per process at a time.
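To check whether a process already has a tracer before attaching another one, the kernel exposes the current tracer in the TracerPid field of /proc/<pid>/status. The following is a minimal sketch, not anything CRIU or LXD ships, and the function names are made up for illustration. Presumably "strace -f" has already followed the fork and attached to the restored init by the time CRIU tries to, which would match the "Can't attach to init: Operation not permitted" message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Extract the TracerPid value from the text of a /proc/<pid>/status file.
 * Returns the tracer's PID (0 means "not traced"), or -1 if the field
 * is missing.
 */
long parse_tracer_pid(const char *status_text)
{
  static const char key[] = "TracerPid:";
  const char *line = status_text;

  while (line && *line) {
    if (strncmp(line, key, sizeof(key) - 1) == 0)
      return strtol(line + sizeof(key) - 1, NULL, 10);
    /* Advance to the start of the next line, if there is one. */
    line = strchr(line, '\n');
    if (line)
      line++;
  }
  return -1;
}

/* Read /proc/<pid>/status and report who, if anyone, is tracing `pid`. */
long tracer_pid_of(pid_t pid)
{
  char path[64], buf[4096];
  FILE *f;
  size_t n;

  snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
  f = fopen(path, "r");
  if (!f)
    return -1;
  n = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[n] = '\0';
  return parse_tracer_pid(buf);
}
```

A non-zero result from tracer_pid_of() means a second attach to that process is expected to fail, whether it comes from strace or from CRIU.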

It is expected that /var/lib/lxd/shmounts/sample is empty. It's really just a directory that we use for passing mounts into containers, so unless you're in the middle of adding a mount to a container, it should be empty.

I just realized what your issue is though :). We should do something about preventing this in LXD for now.

The problem is that the source of the bind mount includes the container name, but you're changing the container name when you move it across hosts. So lxd makes /var/lib/lxd/shmounts/sample2 on the target, while criu has recorded (rightly so) on the source host that /var/lib/lxd/shmounts/sample is bound to /dev/.lxd-mounts. On restore, that recorded source does not exist on the target, hence the "No such file or directory".
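To make the mismatch concrete, here is a rough sketch of the kind of rewrite that would be needed: mapping a recorded mount source under the old container name to the same path under the new name. The function rewrite_mount_source() is hypothetical and is not part of CRIU or LXD. It only illustrates the path substitution, not how CRIU would apply it to its mount tree.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `src` is `base/old_name` or lies beneath it,
 * write the same path with `old_name` replaced by `new_name` into `out`.
 * Returns 0 on success, -1 if `src` does not match or `out` is too small.
 */
int rewrite_mount_source(const char *src, const char *base,
                         const char *old_name, const char *new_name,
                         char *out, size_t outlen)
{
  char old_prefix[4096];
  size_t plen;
  int n;

  n = snprintf(old_prefix, sizeof(old_prefix), "%s/%s", base, old_name);
  if (n < 0 || (size_t)n >= sizeof(old_prefix))
    return -1;
  plen = (size_t)n;

  if (strncmp(src, old_prefix, plen) != 0)
    return -1;
  /* Require a path-component boundary so "sample" does not match "sample2". */
  if (src[plen] != '\0' && src[plen] != '/')
    return -1;

  n = snprintf(out, outlen, "%s/%s%s", base, new_name, src + plen);
  if (n < 0 || (size_t)n >= outlen)
    return -1;
  return 0;
}
```

With base "/var/lib/lxd/shmounts", old name "sample" and new name "sample2", the recorded source /var/lib/lxd/shmounts/sample becomes /var/lib/lxd/shmounts/sample2, which is the directory lxd actually creates on the target.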

The real solution is to patch CRIU to allow some rewriting of the mount tree. We have this problem in this case, and in the non-cgroup-namespace case, where a cgroup is bind mounted into the container.

summary: - Live migration error: Can't mount at ./dev/.lxd-mounts
+ Need support for rewriting mount sources
Changed in criu (Ubuntu):
status: New → Confirmed

Brian Candler (b-candler) wrote :

That sounds like a plausible reason. Preventing a change of target name when migrating a running container would be a reasonable temporary solution.

So I retried, this time keeping the target container name the same. However it failed again, but with a different error:

root@nuc1:~# lxc move sample nuc2:sample
error: Error transferring container data: restore failed:
(00.213976) 1: Error (mount.c:2406): mnt: Can't mount at ./sys/kernel/debug: Invalid argument
(00.229795) Error (cr-restore.c:1352): 4333 killed by signal 9
(00.272166) Error (cr-restore.c:2182): Restoring FAILED.

Running again with inotifywait on the target:

root@nuc2:~# inotifywait -mr /var/lib/lxd/shmounts
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/var/lib/lxd/shmounts/ CREATE,ISDIR sample
/var/lib/lxd/shmounts/ OPEN,ISDIR sample
/var/lib/lxd/shmounts/ ACCESS,ISDIR sample
/var/lib/lxd/shmounts/ CLOSE_NOWRITE,CLOSE,ISDIR sample
/var/lib/lxd/shmounts/sample/ DELETE_SELF
/var/lib/lxd/shmounts/ DELETE,ISDIR sample

Let me know if you want this opened as a separate issue.

Tycho Andersen (tycho-s) wrote :

Hi,

I believe:

https://github.com/lxc/lxc/pull/1266

and

https://lists.openvz.org/pipermail/criu/2016-October/032680.html

will fix this, without the need for rewriting mount sources in CRIU itself.

Brian Candler (b-candler) wrote :

With the same nodes fully updated (kernel 4.4.0-53, lxd 2.0.8, criu 2.6-1ubuntu1~ubuntu16.04.2), I find that live migration now works - yay!

Note: this is only if I don't change the container name. If I do (e.g. "lxc move sample nuc1:foobar") then I get:

error: migration restore failed
(00.059324) Warn (criu/apparmor.c:421): apparmor namespace /sys/kernel/security/apparmor/policy/namespaces/lxd-sample_<var-lib-lxd> already exists, restoring into it
(00.063180) Warn (criu/cr-restore.c:853): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.168620) 1: Warn (criu/autofs.c:77): Failed to find pipe_ino option (old kernel?)
(00.182952) 1: Error (criu/mount.c:2517): mnt: Can't mount at ./dev/.lxd-mounts: No such file or directory
(00.201952) Error (criu/cr-restore.c:1024): 5637 killed by signal 9: Killed
(00.230527) Error (criu/cr-restore.c:1890): Restoring FAILED.
