fedora-atomic-24 root partition is not resized to flavor disk size on boot

Bug #1652706 reported by yatin
This bug affects 2 people
Affects: Magnum | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

I confirmed this by booting a swarm cluster on both Fedora Atomic 24 and Fedora Atomic 23 images.

Fedora-24 (partition is not grown)
=========
[fedora@sw-im4u7vsva-0-z4xw3t5p7jch-swarm-master-srf5iccviyjz ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 20G 0 disk
`-vda1 252:1 0 2.5G 0 part /sysroot
loop0 7:0 0 100G 0 loop
`-docker-252:1-98998-pool 253:0 0 100G 0 dm
  `-docker-252:1-98998-2bd20a476b66a7f3a7ee3819fbf68fcaa3ea32d45b779fe2a41d833119b13d47
                          253:1 0 10G 0 dm /var/lib/docker/devicemapper/mnt/2bd20a476b66a7f3a7ee3819fbf68fcaa3ea32d45b779fe2a41d833119b
loop1 7:1 0 2G 0 loop
`-docker-252:1-98998-pool 253:0 0 100G 0 dm
  `-docker-252:1-98998-2bd20a476b66a7f3a7ee3819fbf68fcaa3ea32d45b779fe2a41d833119b13d47
                          253:1 0 10G 0 dm /var/lib/docker/devicemapper/mnt/2bd20a476b66a7f3a7ee3819fbf68fcaa3ea32d45b779fe2a41d833119b

Fedora-23 (partition is grown successfully)
==========
[fedora@sw-w34tz6rvn-0-vwv34wgqo56h-swarm-master-6nkayxr6ww3y ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 300M 0 part /boot
`-vda2 252:2 0 19.7G 0 part
  |-atomicos-root 253:0 0 3G 0 lvm /sysroot
  |-atomicos-docker--pool_tmeta 253:1 0 24M 0 lvm
  | `-atomicos-docker--pool 253:3 0 6.7G 0 lvm
  | `-docker-253:0-6300779-89b50ae096a7617219d32be3a0d0340480ffc0fe9811c47895717e0728146c46 253:4 0 100G 0 dm
  `-atomicos-docker--pool_tdata 253:2 0 6.7G 0 lvm
    `-atomicos-docker--pool 253:3 0 6.7G 0 lvm
      `-docker-253:0-6300779-89b50ae096a7617219d32be3a0d0340480ffc0fe9811c47895717e0728146c46 253:4 0 100G 0 dm

The cloud-init log also has this:
2016-12-27 02:52:25,846 - util.py[WARNING]: Failed: growpart /dev/vda 1

Revision history for this message
yatin (yatinkarel) wrote :

Related bugs may be useful: https://bugzilla.redhat.com/show_bug.cgi?id=1327337
https://bugs.launchpad.net/cloud-utils/+bug/1587971

The fedora-atomic-23 image I prepared has:
[fedora@sw-dpdag7pwl-0-zujxqyrndvdr-swarm-master-uwjmaoyzgr7t ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 20G 0 disk
`-vda1 252:1 0 20G 0 part /sysroot
loop0 7:0 0 100G 0 loop
`-docker-252:1-512275-pool 253:0 0 100G 0 dm
  `-docker-252:1-512275-65065f4ed89b2a3f097eb50a42e34731e14d18613c2cec1cfabfb905ec748d16 253:1 0 100G 0 dm
loop1 7:1 0 2G 0 loop
`-docker-252:1-512275-pool 253:0 0 100G 0 dm
  `-docker-252:1-512275-65065f4ed89b2a3f097eb50a42e34731e14d18613c2cec1cfabfb905ec748d16 253:1 0 100G 0 dm

[fedora@sw-dpdag7pwl-0-zujxqyrndvdr-swarm-master-uwjmaoyzgr7t ~]$ rpm -qa|grep cloud
cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch
cloud-utils-growpart-0.27-14.fc23.noarch

[fedora@sw-dpdag7pwl-0-zujxqyrndvdr-swarm-master-uwjmaoyzgr7t ~]$ rpm -q util-linux libblkid libuuid libfdisk libmount
util-linux-2.27.1-2.fc23.x86_64
libblkid-2.27.1-2.fc23.x86_64
libuuid-2.27.1-2.fc23.x86_64
libfdisk-2.27.1-2.fc23.x86_64
libmount-2.27.1-2.fc23.x86_64

And fedora-atomic-24
====================

[fedora@k8-fpyllqmbha-0-vcsoeohltmyj-kube-master-gx5dylv42y7w ~]$ rpm -qa|grep cloud
cloud-init-0.7.6-8.20150813bzr1137.fc24.noarch
cloud-utils-growpart-0.27-16.fc24.noarch
[fedora@k8-fpyllqmbha-0-vcsoeohltmyj-kube-master-gx5dylv42y7w ~]$ rpm -q util-linux libblkid libuuid libfdisk libmount
util-linux-2.28.1-1.fc24.x86_64
libblkid-2.28.1-1.fc24.x86_64
libuuid-2.28.1-1.fc24.x86_64
libfdisk-2.28.1-1.fc24.x86_64
libmount-2.28.1-1.fc24.x86_64

[fedora@k8-fpyllqmbha-0-vcsoeohltmyj-kube-master-gx5dylv42y7w ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 20G 0 disk
`-vda1 252:1 0 2.5G 0 part /sysroot
vdb 252:16 0 5G 0 disk
|-docker-docker--pool_tmeta 253:0 0 8M 0 lvm
| `-docker-docker--pool 253:2 0 2G 0 lvm
`-docker-docker--pool_tdata 253:1 0 2G 0 lvm
  `-docker-docker--pool 253:2 0 2G 0 lvm

Revision history for this message
yatin (yatinkarel) wrote :

@Spyros, the image (fedora-atomic-24-k8s-1.2-docker-1.10.3-cvmfs.qcow2) that you shared doesn't have this issue. Can you have a look? Have you customized this image?
[fedora@k8-hjl7yci5wq-0-rz65k3c4cwss-kube-master-vjw5ldu3s3zp ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 20G 0 disk
`-vda1 252:1 0 20G 0 part /sysroot
vdb 252:16 0 5G 0 disk
|-docker-docker--pool_tmeta 253:0 0 8M 0 lvm
| `-docker-docker--pool 253:2 0 2G 0 lvm
| |-docker-252:1-524532-38f9fdfc66151d8aedc4d9ca1adb953b9aa9bb8f0330314aec7a422d2e407735 253:3 0 10G 0 dm /var/lib/docker/devicemapper
| |-docker-252:1-524532-ba85f3de641cacdca41d43652b51dfa7d23dbe3908b77c852fce7b398187efbb 253:4 0 10G 0 dm /var/lib/docker/devicemapper
| `-docker-252:1-524532-f9c8e43465fde1054878c4652e967d0738869090eb4d54966f0ec7c0f8cb3ec1 253:5 0 10G 0 dm /var/lib/docker/devicemapper
`-docker-docker--pool_tdata 253:1 0 2G 0 lvm
  `-docker-docker--pool 253:2 0 2G 0 lvm
    |-docker-252:1-524532-38f9fdfc66151d8aedc4d9ca1adb953b9aa9bb8f0330314aec7a422d2e407735 253:3 0 10G 0 dm /var/lib/docker/devicemapper
    |-docker-252:1-524532-ba85f3de641cacdca41d43652b51dfa7d23dbe3908b77c852fce7b398187efbb 253:4 0 10G 0 dm /var/lib/docker/devicemapper
    `-docker-252:1-524532-f9c8e43465fde1054878c4652e967d0738869090eb4d54966f0ec7c0f8cb3ec1 253:5 0 10G 0 dm /var/lib/docker/devicemapper

Revision history for this message
Spyros Trigazis (strigazi) wrote :

In the upstream Fedora Atomic 25 image there is no problem.

I'll have a look at the 24 image. The fedora-atomic-24-k8s-1.2-docker-1.10.3-cvmfs.qcow2 image is custom only in that it adds three CERN-specific packages.

Revision history for this message
Johannes Grassler (jgr-launchpad) wrote :

I can confirm the problem for 24. I set up a Devstack instance yesterday, which downloaded fedora-atomic-latest when deploying. /etc/fedora-release on one of the Kubernetes masters says that's 24:

  bash-4.3# cat /etc/fedora-release
  Fedora release 24 (Twenty Four)

growpart fails during cloud-init, and it fails again when attempting the resize manually later:

  bash-4.3# growpart /dev/vda 1
  attempt to resize /dev/vda failed. sfdisk output below:
  | Backup files:
  | MBR (offset 0, size 512): /tmp/growpart.iZ7Vdl/backup-vda-0x00000000.bak
  |
  | Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
  | Units: sectors of 1 * 512 = 512 bytes
  | Sector size (logical/physical): 512 bytes / 512 bytes
  | I/O size (minimum/optimal): 512 bytes / 512 bytes
  | Disklabel type: dos
  | Disk identifier: 0x00048e3c
  |
  | Old situation:
  |
  | Device Boot Start End Sectors Size Id Type
  | /dev/vda1 * 2048 4612095 4610048 2.2G 83 Linux
  |
  | >>> Script header accepted.
  | >>> Script header accepted.
  | >>> Script header accepted.
  | >>> Script header accepted.
  | >>> Created a new DOS disklabel with disk identifier 0x00048e3c.
  | Created a new partition 1 of type 'Linux' and of size 10 GiB.
  | /dev/vda2:
  | New situation:
  |
  | Device Boot Start End Sectors Size Id Type
  | /dev/vda1 * 2048 20971519 20969472 10G 83 Linux
  |
  | The partition table has been altered.
  | Calling ioctl() to re-read partition table.
  | Re-reading the partition table failed.: Device or resource busy
  | The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
  FAILED: failed to resize
  ***** WARNING: Resize failed, attempting to revert ******
  512+0 records in
  512+0 records out
  512 bytes copied, 0.0069897 s, 73.3 kB/s
  ***** Appears to have gone OK ****
  bash-4.3# echo $?
  2

MD5 sum of the image in question: 1db43e5c5f8fda49dc4324cbb391bc55

I haven't tried Fedora Atomic 25 yet, but that's my plan for tomorrow.

Revision history for this message
Johannes Grassler (jgr-launchpad) wrote :

For the benefit of those searching for "Clusters fail to deploy randomly" or "Clusters take ages to deploy": this disk space shortage causes a bit of a Heisenbug. When the Kubernetes master runs out of disk space, etcd will crash. And if that happens the Kubernetes Minions will never be able to reach that etcd instance. Consequently they will remain stuck in state CREATE_IN_PROGRESS until their wait conditions eventually time out.
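The chain described above can be confirmed quickly on an affected master. A minimal diagnostic sketch (the 95% threshold is an arbitrary illustration, and etcd is assumed to be listening on its default client port 2379 on localhost):

```shell
#!/bin/sh
# Diagnostic sketch (threshold and endpoint assumed): check whether the
# root filesystem has filled up and whether etcd still answers on its
# default client port.
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "root filesystem usage: ${usage}%"
if [ "$usage" -ge 95 ]; then
    echo "root filesystem nearly full -- etcd has likely crashed"
fi
curl -fsS http://127.0.0.1:2379/health || echo "etcd not reachable"
```

If the root filesystem is full and the health check fails, the minions' wait conditions will never be signalled and the stack eventually times out.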

Revision history for this message
Johannes Grassler (jgr-launchpad) wrote :

Just for the record (so nobody tries what I just tried in vain): Fedora Atomic 25 (fedora-atomic-25-20170106.qcow2, MD5 sum e2e19eb37b5b026254410c38569d3649) gets the resizing right and generally works for the Kubernetes master, so that's something. Unfortunately, the default /etc/sysconfig/etcd shipped with its flannel package no longer matches the expectations of
https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh#L76: a few variables, such as FLANNEL_ETCD, are missing, which breaks the sed invocations in configure-kubernetes-minion.sh and consequently renders the image useless for Kubernetes minions.
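The failure mode with the sed invocations can be sketched as follows. The file contents, variable rename, and endpoint values here are assumptions for illustration, not taken from the actual Magnum fragment or flannel package: an anchored `sed -i` substitution is a silent no-op when the packaged file no longer contains the line it targets.

```shell
#!/bin/sh
# Sketch (assumed file contents): why a missing FLANNEL_ETCD line makes
# the fragment's sed rewrite silently do nothing.
cfg=$(mktemp)

# Old-style sysconfig file: contains the line the fragment rewrites.
printf 'FLANNEL_ETCD="http://127.0.0.1:2379"\n' > "$cfg"
sed -i 's|^FLANNEL_ETCD=.*|FLANNEL_ETCD="http://10.0.0.5:2379"|' "$cfg"
grep -q '10.0.0.5' "$cfg" && echo "old file: endpoint updated"

# New-style file: the variable was renamed, so the anchored pattern
# matches nothing and the master endpoint is never configured.
printf 'FLANNEL_ETCD_ENDPOINTS="http://127.0.0.1:2379"\n' > "$cfg"
sed -i 's|^FLANNEL_ETCD=.*|FLANNEL_ETCD="http://10.0.0.5:2379"|' "$cfg"
grep -q '10.0.0.5' "$cfg" || echo "new file: sed was a no-op"

rm -f "$cfg"
```

Because sed exits 0 either way, the script carries on with an unconfigured flannel and the breakage only shows up later on the minions.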

Revision history for this message
Johannes Grassler (jgr-launchpad) wrote :

Alright, I found the problem and came up with an ad-hoc solution for it. Download https://fedorapeople.org/groups/magnum/fedora-atomic-latest.qcow2 (the image with checksum 1db43e5c5f8fda49dc4324cbb391bc55 I tested above) and apply the attached patch to the following paths in the image:

* /ostree/deploy/fedora-atomic/deploy/d9c8b8a31238e857f010c6fdc282f5f611d3c8af3e78caa891f7edb85822771b.0/
* /

(The latter is especially important, because it contains the running instance's actual root file system.)

With that in place, resizing should work.

The reason for the problem is in cloud-init's growpart utility. It parses the output from sfdisk to determine whether partition resizing succeeded. At some point the status/error messages from sfdisk changed subtly, breaking this mechanism. With my patch in place growpart can cope with both the old and the new message format. This will break again if the sfdisk output format changes at some point.
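A simplified sketch of the mechanism (this is not the actual growpart patch, and the exact old sfdisk wording is an assumption for illustration): growpart decides success by pattern-matching sfdisk's output, and when the kernel cannot immediately re-read the table on a busy device, both the old and the new phrasing of that message still mean the table was written successfully.

```shell
#!/bin/sh
# Sketch, not the real patch: a fixed growpart must treat both the old
# and the new "kernel could not re-read the table" wording as success
# (the table was written; only a partprobe/reboot is still needed).
resize_ok() {
    case "$1" in
        # Older sfdisk wording (assumed here for illustration).
        *"BLKRRPART: Device or resource busy"*) return 0 ;;
        # Newer sfdisk wording, as seen in the log above.
        *"Re-reading the partition table failed"*) return 0 ;;
    esac
    return 1
}

old='BLKRRPART: Device or resource busy'
new='Re-reading the partition table failed.: Device or resource busy'
resize_ok "$old" && echo "old format: treated as success"
resize_ok "$new" && echo "new format: treated as success"
```

The buggy growpart only recognized the old wording, so on Fedora 24's util-linux 2.28 it saw the new message, concluded the resize had failed, and reverted the (actually successful) partition table change.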

I'm fairly busy right now, so it'll be a while until I find the time to figure out how to best get this upstream and take care of it. If somebody wants to take that over in the meantime feel free to use my patch.

Revision history for this message
Spyros Trigazis (strigazi) wrote :

This image was used after the newton release and before ocata. Since we moved to f25, we don't have to fix this one.

The supported image for newton is fedora-atomic-newton (which is Fedora 23).
The supported image for ocata will be the current fedora-atomic-latest (soon fedora-atomic-ocata).

Please confirm that the problem doesn't appear in f25, and then we can close this bug as Won't Fix.

Revision history for this message
Johannes Grassler (jgr-launchpad) wrote :

It doesn't appear in F25, no. This is limited to F24.
