overcloud-full image fails to build calling mkfs -t xfs, command failed for missing xfsprogs package

Bug #1818305 reported by wes hayutin on 2019-03-01
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Critical
Gabriele Cerami

Bug Description

https://logs.rdoproject.org/26/639026/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/ab2f2ca/logs/undercloud/home/zuul/overcloud_image_build.log.txt.gz#_2019-03-01_18_49_10

2019-03-01 18:49:06 | 2019-03-01 18:49:06.261 | INFO diskimage_builder.block_device.level0.localloop [-] New block device [/dev/loop0]
2019-03-01 18:49:06 | 2019-03-01 18:49:06.261 | INFO diskimage_builder.block_device.utils [-] Calling [sudo mkfs -t xfs -s size=4096 -L img-rootfs -m uuid=8fac897b-261b-4d0b-9364-52776b5f2616 -q /dev/loop0]
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | ERROR diskimage_builder.block_device.blockdevice [-] Create failed; rollback initiated
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | Traceback (most recent call last):
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/blockdevice.py", line 406, in cmd_create
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | node.create()
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/level2/mkfs.py", line 133, in create
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | exec_sudo(cmd)
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/utils.py", line 143, in exec_sudo
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | raise e
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | BlockDeviceSetupException: exec_sudo failed
2019-03-01 18:49:06 | 2019-03-01 18:49:06.296 | INFO diskimage_builder.block_device.level0.localloop [-] loopdev detach
2019-03-01 18:49:06 | 2019-03-01 18:49:06.297 | INFO diskimage_builder.block_device.utils [-] Calling [sudo losetup -d /dev/loop0]
2019-03-01 18:49:06 | 2019-03-01 18:49:06.328 | INFO diskimage_builder.block_device.level0.localloop [-] Remove image file [/tmp/dib_image.zFsUnNfg/image0.raw]
2019-03-01 18:49:06 | 2019-03-01 18:49:06.330 | ERROR diskimage_builder.block_device.blockdevice [-] Rollback complete, exiting
2019-03-01 18:49:06 | 2019-03-01 18:49:06.331 | Traceback (most recent call last):
2019-03-01 18:49:06 | 2019-03-01 18:49:06.331 | File "/usr/bin/dib-block-device", line 10, in <module>
2019-03-01 18:49:06 | 2019-03-01 18:49:06.331 | sys.exit(main())
2019-03-01 18:49:06 | 2019-03-01 18:49:06.332 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 120, in main
2019-03-01 18:49:06 | 2019-03-01 18:49:06.332 | return bdc.main()
2019-03-01 18:49:06 | 2019-03-01 18:49:06.332 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 115, in main
2019-03-01 18:49:06 | 2019-03-01 18:49:06.333 | self.args.func()
2019-03-01 18:49:06 | 2019-03-01 18:49:06.333 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 36, in cmd_create
2019-03-01 18:49:06 | 2019-03-01 18:49:06.333 | self.bd.cmd_create()
2019-03-01 18:49:06 | 2019-03-01 18:49:06.334 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/blockdevice.py", line 406, in cmd_create
2019-03-01 18:49:06 | 2019-03-01 18:49:06.334 | node.create()
2019-03-01 18:49:06 | 2019-03-01 18:49:06.334 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/level2/mkfs.py", line 133, in create
2019-03-01 18:49:06 | 2019-03-01 18:49:06.335 | exec_sudo(cmd)
2019-03-01 18:49:06 | 2019-03-01 18:49:06.335 | File "/usr/lib/python2.7/site-packages/diskimage_builder/block_device/utils.py", line 143, in exec_sudo
2019-03-01 18:49:06 | 2019-03-01 18:49:06.335 | raise e
2019-03-01 18:49:06 | 2019-03-01 18:49:06.335 | diskimage_builder.block_device.exception.BlockDeviceSetupException: exec_sudo failed
2019-03-01 18:49:07 | 2019-03-01 18:49:07.038 | INFO diskimage_builder.block_device.blockdevice [-] State already cleaned - no way to do anything here
2019-03-01 18:49:07 | 2019-03-01 18:49:07.189 | *** /tmp/dib_build.NlAK5lb7/mnt is not a directory
2019-03-01 18:49:10 | Exception occured while running the command
2019-03-01 18:49:10 | Traceback (most recent call last):
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 29, in run
2019-03-01 18:49:10 | super(Command, self).run(parsed_args)
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
2019-03-01 18:49:10 | return super(Command, self).run(parsed_args)
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
2019-03-01 18:49:10 | return_code = self.take_action(parsed_args) or 0
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_image.py", line 94, in take_action
2019-03-01 18:49:10 | manager.build()
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/tripleo_common/image/build.py", line 85, in build
2019-03-01 18:49:10 | elements, options, packages, extra_options)
2019-03-01 18:49:10 | File "/usr/lib/python2.7/site-packages/tripleo_common/image/image_builder.py", line 139, in build_image
2019-03-01 18:49:10 | raise subprocess.CalledProcessError(process.returncode, cmd)
2019-03-01 18:49:10 | CalledProcessError: Command '['disk-image-create', '-a', 'amd64', '-o', './overcloud-full', '-t', 'qcow2', '-p', 'yum-plugin-priorities,python-psutil,python-debtcollector,plotnetcfg,sos,device-mapper-multipath,openstack-heat-agents,os-net-config,jq,dbus-python', '--mkfs-options', '-s size=4096', '--min-tmpfs=7', 'centos7', 'selinux-permissive', 'baremetal', 'dhcp-all-interfaces', 'openvswitch', 'overcloud-agent', 'overcloud-base', 'overcloud-controller', 'overcloud-compute', 'overcloud-ceph-storage', 'puppet-modules', 'stable-interface-names', 'grub2', 'element-manifest', 'dynamic-login', 'iptables', 'enable-packages-install', 'override-pip-and-virtualenv', 'dracut-regenerate', 'remove-machine-id', 'remove-resolvconf', 'openssh', 'disable-nouveau']' returned non-zero exit status 1
2019-03-01 18:49:10 | Command '['disk-image-create', '-a', 'amd64', '-o', './overcloud-full', '-t', 'qcow2', '-p', 'yum-plugin-priorities,python-psutil,python-debtcollector,plotnetcfg,sos,device-mapper-multipath,openstack-heat-agents,os-net-config,jq,dbus-python', '--mkfs-options', '-s size=4096', '--min-tmpfs=7', 'centos7', 'selinux-permissive', 'baremetal', 'dhcp-all-interfaces', 'openvswitch', 'overcloud-agent', 'overcloud-base', 'overcloud-controller', 'overcloud-compute', 'overcloud-ceph-storage', 'puppet-modules', 'stable-interface-names', 'grub2', 'element-manifest', 'dynamic-login', 'iptables', 'enable-packages-install', 'override-pip-and-virtualenv', 'dracut-regenerate', 'remove-machine-id', 'remove-resolvconf', 'openssh', 'disable-nouveau']' returned non-zero exit status 1

wes hayutin (weshayutin) on 2019-03-01
summary: - overcloud-full image fails to build calling mkfs -t xfs
+ overcloud-full image fails to build calling mkfs -t xfs, exec sudo
+ failed
Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Gabriele Cerami (gcerami) wrote :

I'm trying to understand if DIB logs the results of the commands it calls somwhere. Without the logs we can only speculate. So let's speculate

the same command ran successfully previously. So it's not a change in the command configuration.

[sudo mkfs -t xfs -s size=4096 -L img-rootfs -m uuid=8a723932-d558-42d8-abe4-1e0473660bd4 -q /dev/loop0

- Sudo is not a problem here, another command runs sudo successfully just before
- /dev/loop0 should be present as stated in the logs above
- mkfs is a base command and should be present
- mkfs.xfs comes from xfsprogs that i already installed, as can be understood from the logs.
- xfs module should be present by default in all recent kernel, especially in RH based distros.

So I'm assuming the command is at least running , and then failing.
To my memory, I've never seen a mkfs fail if not for permission errors, or disk failure.

Gabriele Cerami (gcerami) wrote :

proposed https://review.openstack.org/641417 to increase the verbosity level until we have more information.

wes hayutin (weshayutin) wrote :

2019-03-06 21:23:59.825 | ++ diskimage_builder/lib/common-functions:tmpfs_check:26 : awk '/^MemTotal/ { print $2 }' /proc/meminfo
2019-03-06 21:23:59.829 | + diskimage_builder/lib/common-functions:tmpfs_check:26 : total_kB=16266216
2019-03-06 21:23:59.831 | + diskimage_builder/lib/common-functions:tmpfs_check:29 : RAM_NEEDED=14
2019-03-06 21:23:59.834 | + diskimage_builder/lib/common-functions:tmpfs_check:30 : '[' 16266216 -lt 14680064 ']'
2019-03-06 21:23:59.837 | + diskimage_builder/lib/common-functions:tmpfs_check:30 : return 0
2019-03-06 21:23:59.839 | + diskimage_builder/lib/common-functions:cleanup_image_dir:228 : timeout 120 sh -c 'while ! sudo umount -f /tmp/dib_image.vrRT1krJ; do sleep 1; done'
2019-03-06 21:23:59.971 | + diskimage_builder/lib/common-functions:cleanup_image_dir:233 : rm -rf --one-file-system /tmp/dib_image.vrRT1krJ
2019-03-06 21:23:59.975 | Build completed successfully
[zuul@standalone ~]$ ^C
[zuul@standalone ~]$ ls
containers-prepare-parameters.yaml ironic-python-agent.d overcloud-full.log standalone_container_prep.log undercloud-ansible-PSXWO1
containers-prepare-parameters.yaml.21442.2019-03-06@20:14:02~ ironic-python-agent.initramfs overcloud-full.qcow2 standalone-container-prep.sh undercloud-install-20190306201946.tar.bzip2
DLRN ironic-python-agent.kernel overcloud-full.vmlinuz standalone_deploy.log undercloud-passwords.conf
dlrn-venv ironic-python-agent.log overcloud-image-build.sh standalone_parameters.yaml workspace
get-overcloud-nodes.py ironic-python-agent.vmlinuz repo_setup.log standalone.sh
install_packages.sh overcloud-full.d repo_setup.sh tripleo-heat-installer-templates
install_packages.sh.log overcloud-full.initrd src tripleo-undercloud-passwords.yaml
[zuul@standalone ~]$ ls *.vmlinuz|qcow2
-bash: qcow2: command not found
[zuul@standalone ~]$ ls *.qcow2
overcloud-full.qcow2
[zuul@standalone ~]$ ls *.kernel
ironic-python-agent.kernel
[zuul@standalone ~]$

wes hayutin (weshayutin) wrote :

2019-03-06 21:35:34.825 | + diskimage_builder/lib/common-functions:tmpfs_check:25 : '[' -r /proc/meminfo ']'
2019-03-06 21:35:34.828 | ++ diskimage_builder/lib/common-functions:tmpfs_check:26 : awk '/^MemTotal/ { print $2 }' /proc/meminfo
2019-03-06 21:35:34.832 | + diskimage_builder/lib/common-functions:tmpfs_check:26 : total_kB=16266216
2019-03-06 21:35:34.834 | + diskimage_builder/lib/common-functions:tmpfs_check:29 : RAM_NEEDED=14
2019-03-06 21:35:34.837 | + diskimage_builder/lib/common-functions:tmpfs_check:30 : '[' 16266216 -lt 14680064 ']'
2019-03-06 21:35:34.839 | + diskimage_builder/lib/common-functions:tmpfs_check:30 : return 0
2019-03-06 21:35:34.842 | + diskimage_builder/lib/common-functions:cleanup_image_dir:228 : timeout 120 sh -c 'while ! sudo umount -f /tmp/dib_image.SyK3PTpB; do sleep 1; done'
2019-03-06 21:35:34.978 | + diskimage_builder/lib/common-functions:cleanup_image_dir:233 : rm -rf --one-file-system /tmp/dib_image.SyK3PTpB
2019-03-06 21:35:34.982 | Build completed successfully
[zuul@standalone test2]$ ls
ironic-python-agent.d ironic-python-agent.kernel ironic-python-agent.vmlinuz overcloud-full.initrd overcloud-full.qcow2 overcloud-image-build.sh
ironic-python-agent.initramfs ironic-python-agent.log overcloud-full.d overcloud-full.log overcloud-full.vmlinuz
[zuul@standalone test2]$

wes hayutin (weshayutin) wrote :
Ian Wienand (iwienand) wrote :

Agree with all this analysis, a head-to-head of a working v non working shows no difference [1]. One is dib 2.20.2 and other 2.20.3, only difference there is a change of a key for gentoo so it's not that. dib hasn't made any changes to this particular part of the code (block-device) for some time.

some thoughts

* -x logging should help see command output. dib could do better reporting in failure cases at the default level, however.
* working in /tmp ... slight chance that the loopback image that's made is too big for /tmp or similar? can set TMPDIR to a disk-based location
* possibly something hotplug-ish related that is noticing the new device, and doing ... something? to it. syslog might reveal something in this case?

[1] http://paste.openstack.org/show/747378/

Gabriele Cerami (gcerami) wrote :

And thanks to the increased verbosity, we have a winner, and the winner is

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/68c31f1/logs/undercloud/home/zuul/overcloud-full.log.txt.gz#_2019-03-07_00_11_53_451

2019-03-07 00:11:53.451 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: mkfs.xfs: No such file or directory exec_sudo /usr/lib/python2.7/site-packages/diskimage_builder/block_device/utils.py:135

xfsprogs is missing!
An analisys from yatin suggests that xfsprogs is not an essential package and was pulled as a dependency from docker. Now that we are removing docker everywhere, xfsprogs is not installed anymore.

Yatin also proposed https://review.openstack.org/641576 and seems to be working at least as a workaround until we have the final solution in place

Gabriele Cerami (gcerami) wrote :

proposed https://review.rdoproject.org/r/19148 sa a safe place where to pull the missing package

summary: - overcloud-full image fails to build calling mkfs -t xfs, exec sudo
- failed
+ overcloud-full image fails to build calling mkfs -t xfs, command failed
+ for missing xfsprogs package
Gabriele Cerami (gcerami) wrote :

https://review.rdoproject.org/r/19148 will be merged as a quick way to get the promotion rolling, but https://review.rdoproject.org/r/19150 will be the definitive solution, which would take too much time to enter the promotion instead.

Changed in tripleo:
status: Triaged → Fix Committed
wes hayutin (weshayutin) on 2019-03-11
Changed in tripleo:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers