Task modify-image : Run virt-customize on the provided image fails while uploading image

Bug #1743749 reported by Gabriele Cerami
This bug affects 5 people
Affects | Status    | Importance | Assigned to | Milestone
tripleo | Won't Fix | Critical   | yatin       |

Bug Description

The logs at

https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-pike-upload/1/console

show that the task that modifies the image fails during the Pike image upload.

Tags: ci quickstart
Alfredo Moralejo (amoralej) wrote :

We've seen this in all releases ([1], [2] and [3]).

Running the command with debug and trace enabled in a local reproducer, we get http://paste.openstack.org/show/646502/

Note the error:

[ 2.573801] general protection fault: 0000 [#1] SMP
[ 2.574091] Modules linked in:
[ 2.574091] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-693.11.6.el7.x86_64 #1
[ 2.574091] Hardware name: Red Hat KVM, BIOS 1.10.2-3.el7_4.1 04/01/2014
[ 2.574091] task: ffff88001eac8000 ti: ffff88001ead0000 task.ti: ffff88001ead0000
[ 2.574091] RIP: 0010:[<ffffffff8120a3f0>] [<ffffffff8120a3f0>] flush_old_exec+0x3b0/0x930
[ 2.574091] RSP: 0000:ffff88001ead3d10 EFLAGS: 00010246
[ 2.574091] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000049
[ 2.574091] RDX: 0000000000000000 RSI: ffff88001d558000 RDI: ffffffff81a7c100
[ 2.574091] RBP: ffff88001ead3d68 R08: 000000000001bfc0 R09: ffffffff8120a1e2
[ 2.574091] R10: ffff88001ead3b30 R11: 0000000000000007 R12: ffff88001d558000
[ 2.574091] R13: ffffffff81a7c100 R14: ffff88001eac8000 R15: ffff88001d53e400
[ 2.574091] FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
[ 2.574091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.574091] CR2: 0000000000000000 CR3: 00000000019fa000 CR4: 00000000003606f0
[ 2.574091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.574091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.574091] Call Trace:
[ 2.574091] [<ffffffff8126199c>] load_elf_binary+0x33c/0xe00
[ 2.574091] [<ffffffff812d6743>] ? ima_get_action+0x23/0x30
[ 2.574091] [<ffffffff812d5d7e>] ? process_measurement+0x8e/0x250
[ 2.574091] [<ffffffff812d6239>] ? ima_bprm_check+0x49/0x50
[ 2.574091] [<ffffffff81261660>] ? load_elf_library+0x220/0x220
[ 2.574091] [<ffffffff81209b9d>] search_binary_handler+0xed/0x300
[ 2.574091] [<ffffffff8120b1d6>] do_execve_common.isra.24+0x5b6/0x6c0
[ 2.574091] [<ffffffff81694d50>] ? rest_init+0x80/0x80
[ 2.574091] [<ffffffff8120b2f8>] do_execve+0x18/0x20
[ 2.574091] [<ffffffff8100202b>] run_init_process+0x2b/0x30
[ 2.574091] [<ffffffff81694d8d>] kernel_init+0x3d/0xf0
[ 2.574091] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 2.574091] [<ffffffff81694d50>] ? rest_init+0x80/0x80
[ 2.574091] Code: 00 3e 0f ab 08 48 8b 05 db e5 90 00 48 c1 e8 34 a8 01 74 19 65 8b 05 28 2c e0 7e a8 02 74 0e 31 d2 b8 01 00 00 00 b9 49 00 00 00 <0f> 30 bf 00 00 00 80 49 03 7c 24 58 48 8b 05 0d 7c 7f 00 72 0e
[ 2.574091] RIP [<ffffffff8120a3f0>] flush_old_exec+0x3b0/0x930
[ 2.574091] RSP <ffff88001ead3d10>
[ 2.966949] ---[ end trace 9f4f0bee8dbd524c ]---

Note that the first occurrences of this error appeared on January 10th, the same day some security patches were applied in RDO Cloud.

[1] https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-ocata-upload/1/consoleFull
[2] https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-pike-upload/1/consoleFull
[3] https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/1/consoleFull

Alfredo Moralejo (amoralej) wrote :

The same error is found when running libguestfs-test-tool in the undercloud.
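To tell this crash apart from other libguestfs failures when scanning CI or reproducer output, a small check along these lines may help. It is a sketch, not something taken from the CI jobs: `appliance_hit_gpf` is a hypothetical helper name, and it assumes the output of libguestfs-test-tool (or of virt-customize run with debug and trace enabled) has been saved to a file. It looks for the two markers in the trace above: the `general protection fault` line and the faulting function `flush_old_exec`.

```shell
# Hypothetical helper: succeed if a saved libguestfs log contains the
# appliance kernel oops from this report.
appliance_hit_gpf() {
    local log_file=$1
    # Both markers from the trace: the GPF line and the faulting function.
    grep -q 'general protection fault' "$log_file" &&
        grep -q 'flush_old_exec' "$log_file"
}
```

For example: `libguestfs-test-tool > test-tool.log 2>&1; appliance_hit_gpf test-tool.log && echo "appliance kernel oops"`. Requiring `flush_old_exec` as well keeps the match specific to this trace; a different oops would need a different pattern.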

Alfredo Moralejo (amoralej) wrote :

After some troubleshooting, we found that the issue is only hit with kernel 3.10.0-693.11.6.el7; with kernel-3.10.0-693.11.1 it works fine.

Also, the problematic kernel worked fine before January 10th, when some patches, including kernel and KVM updates, were applied in RDO Cloud.

Another finding is that it seems to happen only in instances whose CPU model is reported as:

model name : Intel Core Processor (Skylake, IBRS)
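The combination reported so far can be checked with a short sketch like the one below. The function name is hypothetical, and it only encodes what was observed in this thread (kernel 3.10.0-693.11.6.el7 inside an instance whose CPU model reports `Skylake, IBRS`). It assumes the appliance is built from the running kernel, which is the usual case but not guaranteed when several kernels are installed.

```shell
# Hypothetical helper: succeed if the kernel release and CPU model match
# the combination reported as failing in this bug.
is_affected_combo() {
    local kernel_release=$1 cpu_model=$2
    [[ $kernel_release == 3.10.0-693.11.6.el7* ]] &&
        [[ $cpu_model == *"Skylake, IBRS"* ]]
}

# Example on the instance (reads the first "model name" line):
# model=$(awk -F': ' '/^model name/ {print $2; exit}' /proc/cpuinfo)
# is_affected_combo "$(uname -r)" "$model" && echo "likely affected"
```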

Alfredo Moralejo (amoralej) wrote :

Versions of packages in the compute nodes:

kernel-3.10.0-693.11.6.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.7.x86_64
qemu-kvm-ev-2.9.0-16.el7_4.13.1.x86_64

yatin (yatinkarel) wrote :

We tried the following workaround, and it worked:

sudo rm -rf /var/tmp/.guestfs-*
export SUPERMIN_KERNEL_VERSION=3.10.0-693.el7.x86_64
export SUPERMIN_KERNEL=/boot/vmlinuz-$SUPERMIN_KERNEL_VERSION
export SUPERMIN_MODULES=/lib/modules/$SUPERMIN_KERNEL_VERSION

Now the following command passes: libguestfs-test-tool

So until a kernel fix is available, we can use this workaround.
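The workaround above can be wrapped in a function that refuses to point supermin at a kernel that is not installed. This is a sketch under a few assumptions: `use_supermin_kernel` and `KERNEL_ROOT` are hypothetical names (the latter only exists so the function can be tried against a scratch directory), `LIBGUESTFS_CACHEDIR` is the libguestfs appliance cache location and defaults here to `/var/tmp` as in the original commands, and the original cleanup used `sudo`, which may still be needed depending on who owns the cached appliance.

```shell
# Sketch of the SUPERMIN_* workaround as a function (hypothetical name).
# Usage: use_supermin_kernel [KERNEL_VERSION]
use_supermin_kernel() {
    # Default to the kernel that was reported as working in this bug.
    local version=${1:-3.10.0-693.el7.x86_64}
    # KERNEL_ROOT is a hypothetical override for testing; empty means "/".
    local root=${KERNEL_ROOT:-}
    local kernel="$root/boot/vmlinuz-$version"
    local modules="$root/lib/modules/$version"
    local cachedir=${LIBGUESTFS_CACHEDIR:-/var/tmp}

    # Fail early instead of letting supermin fail on a missing kernel.
    if [[ ! -f $kernel || ! -d $modules ]]; then
        echo "kernel $version not found under ${root:-/}" >&2
        return 1
    fi

    # Drop cached appliances so the next run rebuilds with this kernel.
    rm -rf "$cachedir"/.guestfs-*

    export SUPERMIN_KERNEL_VERSION=$version
    export SUPERMIN_KERNEL=$kernel
    export SUPERMIN_MODULES=$modules
}
```

Typical use is `use_supermin_kernel && libguestfs-test-tool`. The function exports into the calling shell, so it has to be called directly rather than inside `$(...)`.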

Richard Jones (rjones-redhat) wrote :

Sorry, my comment #5 was wrong. This is *not* the supermin bug; it's a new kernel bug of some kind.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/535293

Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → yatin (yatinkarel)
status: Triaged → In Progress
Richard Jones (rjones-redhat) wrote :

A bug was opened to track the kernel issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1535973

Rafael Folco (rafaelfolco) wrote :

Just for the record: I cannot reproduce the issue with a manual nested-KVM virt-customize run (CPU model: Intel Core Processor (Haswell); kernel: 3.10.0-693.11.6.el7.x86_64).

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/535293
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=6aac21081db4054b71257c396823b9af1d760311
Submitter: Zuul
Branch: master

commit 6aac21081db4054b71257c396823b9af1d760311
Author: yatin <email address hidden>
Date: Thu Jan 18 16:32:38 2018 +0530

    Set SUPERMIN_* variables to working kernel

    virt-customize(libguestfs) is not working with current
    kernel 3.10.0-693.11.6.el7 in centos.
    So until it's fixed in libguestfs or kernel, let's
    use working kernel 3.10.0-693.el7.x86_64.
    The kernel version is managed using following variable
    which can be overridden when required.
    Libguesfs requires following environment variables
    to use the overridden kernel.

    - SUPERMIN_KERNEL_VERSION: {{ libguestfs_kernel_override }}
    - SUPERMIN_KERNEL: /boot/vmlinuz-{{ libguestfs_kernel_override }}
    - SUPERMIN_MODULES: /lib/modules/{{ libguestfs_kernel_override }}

    Change-Id: I129ec6c48d801cd605c4befda5ee00a025480413
    Partial-Bug: #1743749

Matthias Runge (mrunge) wrote :

Unfortunately, this bug still happens for me.
I actually have two kernels installed on the undercloud:

[stack@111-mrunge--undercloud ~]$ rpm -q kernel
kernel-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-693.11.6.el7.x86_64

Alan Pevec (apevec) wrote :

@mrunge, you need to have kernel-3.10.0-693.el7 installed; see the CI scripts workaround:
https://review.rdoproject.org/r/#/c/11523/2/ci-scripts/tripleo-upstream/convert-upload-undercloud.sh@59

There is an implementation of this in the oooqe role: https://review.openstack.org/536366
The real fix will be in nested KVM, tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1535973
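Since the workaround only helps when that exact kernel package is present, a pre-check on the output of `rpm -q kernel` can make this failure mode explicit. A minimal sketch with a hypothetical function name; it compares whole lines, so `kernel-3.10.0-693.5.2.el7.x86_64` does not count as a match for `3.10.0-693.el7.x86_64`.

```shell
# Hypothetical helper: succeed if "kernel-<version>" appears as a full
# line in the given `rpm -q kernel` output.
has_kernel_package() {
    local wanted=$1 installed=$2
    # -x: whole-line match, -F: treat the version as a fixed string.
    grep -qxF "kernel-$wanted" <<<"$installed"
}

# Example:
# has_kernel_package 3.10.0-693.el7.x86_64 "$(rpm -q kernel)" ||
#     echo "kernel-3.10.0-693.el7 is not installed; the workaround will not apply" >&2
```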

tags: removed: promotion-blocker
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
yatin (yatinkarel) wrote :

Since CentOS 7.5 has been released, we need to reconsider the current workaround, as kernel-3.10.0-693.el7.x86_64 will no longer be available in the CentOS base repo.

I tested with CentOS 7.5 and see the issue on a VM on RDO Cloud, but I'm not sure whether it's the same issue or a new one.

[centos@ykarel-test-temp ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel Core Processor (Skylake, IBRS)
stepping : 3
microcode : 0x1
cpu MHz : 2599.996
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt ibpb ibrs arat spec_ctrl
bogomips : 5199.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

[centos@ykarel-test-temp ~]$ uname -a
Linux ykarel-test-temp.rdocloud 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[centos@ykarel-test-temp ~]$ rpm -q kernel
kernel-3.10.0-693.el7.x86_64
kernel-3.10.0-862.2.3.el7.x86_64
[centos@ykarel-test-temp ~]$ rpm -q qemu-kvm
qemu-kvm-1.5.3-156.el7.x86_64

[centos@ykarel-test-temp ~]$ libguestfs-test-tool
     ************************************************************
     * IMPORTANT NOTICE
     *
     * When reporting bugs, include the COMPLETE, UNEDITED
     * output below in your bug report.
     *
     ************************************************************
SUPERMIN_KERNEL_VERSION=3.10.0-862.2.3.el7.x86_64
SUPERMIN_MODULES=/lib/modules/3.10.0-862.2.3.el7.x86_64
SUPERMIN_KERNEL=/boot/vmlinuz-3.10.0-862.2.3.el7.x86_64
PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/centos/.local/bin:/home/centos/bin
XDG_RUNTIME_DIR=/run/user/1000
SELinux: Enforcing
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: libvirt
guestfs_get_backend_settings: []
guestfs_get_cachedir: /var/tmp
guestfs_get_hv: /usr/libexec/qemu-kvm
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_smp: 1
guestfs_get_sockdir: /run/user/1000
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.36.10rhel=7,release=6.el7.centos,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=libvirt
libguestfs: launch: tmpdir=/tmp/libguestfsd1jhpX
l...

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/567976

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/567976
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=071b871bad31425aa307245ebc6186db344927a9
Submitter: Zuul
Branch: master

commit 071b871bad31425aa307245ebc6186db344927a9
Author: yatin <email address hidden>
Date: Sat May 12 11:21:18 2018 +0530

    Revert the workarouds for kernel override

    Centos 7.5 is released with the new kernel so workaround applied
    for Centos 7.4 update kernel needs revert.
    The patch reverts the following 4 commits applied
    in order to override kernel. Some nested VMs(Skylake) have
    issues running virt-customize with kvm backend, so set
    LIBGUESTFS_BACKEND_SETTINGS=force_tcg to not use
    kvm accelaration. Setting this option drops performance
    for running virt-customize a little but can't find any other
    option until it's fixed in kernel/kvm.

    Revert "add option to turn on/off non default kernel"

    This reverts commit fec6fd069d074a73304c3e0498a00855df2fa4c4.

    Revert "Do not attempt install kernel when chrooted"

    This reverts commit d991c1033f3a6390720210d682e3c155959bbe71.

    Revert "Ensure libguestfs_kernel_override kernel is installed"

    This reverts commit ff0a5c9ac7149e89da2c01a8e184004bb536d43e.

    Revert "Set SUPERMIN_* variables to working kernel"

    This reverts commit 6aac21081db4054b71257c396823b9af1d760311.

    Change-Id: If46010d7ca14f9dde9a49173aa0b6de91c3826a8
    Related-Bug: #1743749

Changed in tripleo:
milestone: rocky-2 → rocky-3
Adam Huffman (adam-huffman) wrote :

I'm seeing this on Skylake nodes running CentOS 7.5 with the latest kernel (3.10.0-862.3.3).

Adam Huffman (adam-huffman) wrote :

The parameter LIBGUESTFS_BACKEND_SETTINGS=force_tcg doesn't solve this for me.

wes hayutin (weshayutin) wrote :

Closing as out of date; please reopen if needed.

Changed in tripleo:
status: In Progress → Won't Fix
Bogdan Dobrelya (bogdando) wrote :

I'm not happy with LIBGUESTFS_BACKEND_SETTINGS: force_tcg being enforced. It runs so slowly!
Could we at least provide a way for users to override it?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/595566

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Juan Antonio Osorio Robles (<email address hidden>) on branch: master
Review: https://review.openstack.org/595566

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/595566
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=850595c42a21d85fc56de73d9abc8cad3c14be0f
Submitter: Zuul
Branch: master

commit 850595c42a21d85fc56de73d9abc8cad3c14be0f
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Aug 23 11:49:54 2018 +0200

    Do not enforce libguestfs emulation mode

    Allow users to override the setting on their own risk
    of being hit by bug 1743749

    Related-Doc: http://libguestfs.org/guestfs.3.html#force_tcg

    Related-Bug: #1743749

    Change-Id: I5aa3f665d9e449eaa8e91441a3f46d322d5d43a4
    Signed-off-by: Bogdan Dobrelya <email address hidden>
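One way to let users opt out while keeping emulation as the default is to treat an unset variable differently from an empty one. The sketch below illustrates that idea and is not the code from the commit above: `pick_backend_settings` is a hypothetical name, and it assumes that an explicitly empty `LIBGUESTFS_BACKEND_SETTINGS` should mean "use KVM", while an unset one falls back to `force_tcg`.

```shell
# Hypothetical helper: print the libguestfs backend settings to use.
pick_backend_settings() {
    # ${VAR+set} expands to "set" whenever VAR is defined, even if empty,
    # so a user's explicit empty value is respected.
    if [[ -n ${LIBGUESTFS_BACKEND_SETTINGS+set} ]]; then
        printf '%s\n' "$LIBGUESTFS_BACKEND_SETTINGS"
    else
        printf '%s\n' force_tcg
    fi
}
```

A caller could then run `export LIBGUESTFS_BACKEND_SETTINGS="$(pick_backend_settings)"` before invoking virt-customize.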

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/634272

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/633444
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=8f8a7a19e0ecb0fb0627ecabe243c8bf7bbf1626
Submitter: Zuul
Branch: master

commit 8f8a7a19e0ecb0fb0627ecabe243c8bf7bbf1626
Author: Quique Llorente <email address hidden>
Date: Mon Jan 28 08:24:41 2019 +0100

    Use force_tcg by libguestfs is not ok

    Use force_tcg for virt-resize and virt-customize if the
    libguestfs-test-tool fails.

    Also dump virt-resize, virt-customize and libguestfs-test-tools to log files

    Related-Bug: #1743749

    Change-Id: Idb7b2be9900ef115f71379cfd7b9c3edf675119e
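The approach in this commit (fall back to `force_tcg` only when libguestfs-test-tool fails, and keep its output in a log file) can be sketched in shell as follows. This is not the code from the review: `choose_libguestfs_mode`, `TEST_TOOL` and `LOG_DIR` are hypothetical names, and the last two exist only so the logic can be exercised without libguestfs installed.

```shell
# Hypothetical sketch: run the libguestfs self-test once, log its output,
# and switch to TCG emulation only if the self-test fails.
choose_libguestfs_mode() {
    local test_tool=${TEST_TOOL:-libguestfs-test-tool}
    local log_dir=${LOG_DIR:-.}

    if "$test_tool" >"$log_dir/libguestfs-test-tool.log" 2>&1; then
        echo "libguestfs self-test passed; keeping KVM acceleration"
    else
        echo "libguestfs self-test failed; falling back to force_tcg"
        export LIBGUESTFS_BACKEND_SETTINGS=force_tcg
    fi
}
```

Call it in the current shell (not inside `$(...)`) before virt-resize or virt-customize so that the export is visible to them.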

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by Sorin Sbarnea (<email address hidden>) on branch: master
Review: https://review.openstack.org/634272

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/882138
Committed: https://opendev.org/openstack/tripleo-quickstart-extras/commit/bed8aa011577a09077c00c63f52a5490af7b22f5
Submitter: "Zuul (22348)"
Branch: master

commit bed8aa011577a09077c00c63f52a5490af7b22f5
Author: Cédric Jeanneret <email address hidden>
Date: Wed May 3 13:03:27 2023 +0200

    Remove emulation enforcing

    Lately this is making the Wallaby on CS9 line crumble. After some tests,
    it seems, at least on CS9, we're able to get rid of this option - and
    should, since it's crashing virt-customize.

    Change-Id: I4e3cbe4507cbe7d1471f75cb41af99f84725b3ad
    Closes-Bug: #2018356
    Related-Bug: #1743749
