Monitoring of disk space is required on master node

Bug #1530921 reported by Vladimir
This bug affects 2 people
Affects             Status        Importance  Assigned to               Milestone
Fuel for OpenStack  Fix Released  High        Fuel Sustaining
8.0.x               Won't Fix     High        Fuel Python (Deprecated)
Mitaka              Won't Fix     High        Fuel Sustaining

Bug Description

Steps to reproduce:

1. Install Fuel master node
2. Create a cluster with Neutron VLAN network
    - Enable Ceph for volumes and images
    - Enable Ceilometer
3. On master node, run
    fuel-mirror create -P ubuntu -G ubuntu
    fuel-mirror apply --group ubuntu -P ubuntu --env 1
    fuel-mirror create -G mos -I /usr/share/fuel-mirror/ubuntu.yaml
    fuel-mirror apply --group mos -P ubuntu --env 1
4. Update Fuel UI page, go to the 'Settings' tab of the cluster and check that all links to the repositories are pointing to the master node.
5. Add 3 controller+mongoDB nodes
6. Add 2 compute nodes
7. Add 3 ceph nodes
8. Put the tagged management, storage and fixed networks on the same interface to which the public network is assigned
9. Deploy changes

Deploy failed with the following message:
Error
Provision has failed. Failed to execute hook 'shell' Failed to run command cd / && fa_build_image --image_build_dir /var/lib/fuel/ibp --log-file /var/log/fuel-agent-env-1.log --data_driver nailgun_build_image --input_data '{"image_data": {"/boot": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_1404_amd64-boot.img.gz", "format": "ext2"}, "/": {"container": "gzip", "uri": "http://10.109.0.2:8080/targetimages/env_1_ubuntu_1404_amd64.img.gz", "format": "ext4"}}, "output": "/var/www/nailgun/targetimages", "repos": [{"name": "ubuntu", "section": "main multiverse restricted universe", "uri": "http://10.109.0.2:8080/mirrors/ubuntu", "priority": null, "suite": "trusty", "type": "deb"}, {"name": "ubuntu-updates", "section": "main multiverse restricted universe", "uri": "http://10.109.0.2:8080/mirrors/ubuntu", "priority": null, "suite": "trusty-updates", "type": "deb"}, {"name": "ubuntu-security", "section": "main multiverse restricted universe", "uri": "http://10.109.0.2:8080/mirrors/ubuntu", "priority": null, "suite": "trusty-security", "type": "deb"}, {"name": "mos", "section": "main restricted", "uri": "http://10.109.0.2:8080/mirrors/mos-repos/ubuntu/8.0", "priority": 1000, "suite": "mos8.0", "type": "deb"}, {"name": "mos-updates", "section": "main restricted", "uri": "http://10.109.0.2:8080/mirrors/mos-repos/ubuntu/8.0", "priority": 1000, "suite": "mos8.0-updates", "type": "deb"}, {"name": "mos-security", "section": "main restricted", "uri": "http://10.109.0.2:8080/mirrors/mos-repos/ubuntu/8.0", "priority": 1000, "suite": "mos8.0-security", "type": "deb"}, {"name": "mos-holdback", "section": "main restricted", "uri": "http://10.109.0.2:8080/mirrors/mos-repos/ubuntu/8.0", "priority": 1000, "suite": "mos8.0", "type": "deb"}, {"name": "Auxiliary", "section": "main restricted", "uri": "http://10.109.0.2:8080/2015.1.0-8.0/ubuntu/auxiliary", "priority": 1150, "suite": "auxiliary", "type": "deb"}], "codename": "trusty"}'

Astute log:
2016-01-04 16:49:51 ERROR [103] Error running provisioning: Image build task failed. Please check build log here for details: /var/log/docker-logs/fuel-agent-env-1.log. Hint: restart deployment can help if no error in build log was found, trace:

tail -n 30 /var/log/docker-logs/fuel-agent-env-1.log:
2016-01-04 16:49:49.941 30685 DEBUG fuel_agent.utils.utils [-] Trying to execute command: losetup -a
2016-01-04 16:49:49.945 30685 DEBUG fuel_agent.utils.build [-] Loop device /dev/loop0 seems to be attached. Trying to detach.
2016-01-04 16:49:49.945 30685 DEBUG fuel_agent.utils.utils [-] Trying to execute command: losetup -d /dev/loop0
2016-01-04 16:49:49.948 30685 DEBUG fuel_agent.manager [-] Finally: removing temporary file: /var/lib/fuel/ibp/tmpLwEepf.fuel-agent-image
2016-01-04 16:49:49.954 30685 DEBUG fuel_agent.manager [-] Finally: detaching loop device: /dev/loop1
2016-01-04 16:49:49.954 30685 DEBUG fuel_agent.utils.build [-] Trying to figure out if loop device /dev/loop1 is attached
2016-01-04 16:49:49.954 30685 DEBUG fuel_agent.utils.utils [-] Trying to execute command: losetup -a
2016-01-04 16:49:49.958 30685 DEBUG fuel_agent.utils.build [-] Loop device /dev/loop1 seems to be attached. Trying to detach.
2016-01-04 16:49:49.958 30685 DEBUG fuel_agent.utils.utils [-] Trying to execute command: losetup -d /dev/loop1
2016-01-04 16:49:49.962 30685 DEBUG fuel_agent.manager [-] Finally: removing temporary file: /var/lib/fuel/ibp/tmpqCQA90.fuel-agent-image
2016-01-04 16:49:50.021 30685 ERROR fuel_agent.cmd.agent [-] Unexpected error while running command.
Command: chroot /var/lib/fuel/ibp/tmp0dQxJE.fuel-agent-image apt-get -y install acl anacron bash-completion bridge-utils bsdmainutils build-essential cloud-init curl daemonize debconf-utils gdisk grub-pc hpsa-dkms i40e-dkms linux-firmware linux-firmware-nonfree linux-headers-generic-lts-trusty linux-image-generic-lts-trusty lvm2 mcollective mdadm nailgun-agent nailgun-mcagents network-checker ntp openssh-client openssh-server puppet python-amqp ruby-augeas ruby-ipaddress ruby-json ruby-netaddr ruby-openstack ruby-shadow ruby-stomp telnet ubuntu-minimal ubuntu-standard uuid-runtime vim virt-what vlan
Exit code: 100
Stdout: ''
Stderr: "E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem. \n"
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent Traceback (most recent call last):
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/site-packages/fuel_agent/cmd/agent.py", line 101, in main
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent getattr(mgr, action)()
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/site-packages/fuel_agent/manager.py", line 944, in do_build_image
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent attempts=CONF.fetch_packages_attempts)
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/site-packages/fuel_agent/utils/build.py", line 115, in run_apt_get
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent stdout, stderr = utils.execute(*cmds, attempts=attempts)
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/site-packages/fuel_agent/utils/utils.py", line 133, in execute
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent stderr=stderr, cmd=command)
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent ProcessExecutionError: Unexpected error while running command.
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent Command: chroot /var/lib/fuel/ibp/tmp0dQxJE.fuel-agent-image apt-get -y install acl anacron bash-completion bridge-utils bsdmainutils build-essential cloud-init curl daemonize debconf-utils gdisk grub-pc hpsa-dkms i40e-dkms linux-firmware linux-firmware-nonfree linux-headers-generic-lts-trusty linux-image-generic-lts-trusty lvm2 mcollective mdadm nailgun-agent nailgun-mcagents network-checker ntp openssh-client openssh-server puppet python-amqp ruby-augeas ruby-ipaddress ruby-json ruby-netaddr ruby-openstack ruby-shadow ruby-stomp telnet ubuntu-minimal ubuntu-standard uuid-runtime vim virt-what vlan
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent Exit code: 100
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent Stdout: ''
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent Stderr: "E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem. \n"
2016-01-04 16:49:50.021 30685 TRACE fuel_agent.cmd.agent

Diagnostic snapshot available in attachment.

Revision history for this message
Vladimir (vushakov) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "361"
  build_id: "361"
  fuel-nailgun_sha: "53c72a9600158bea873eec2af1322a716e079ea0"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "7463551bc74841d1049869aaee777634fb0e5149"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "ba8063d34ff6419bddf2a82b1de1f37108d96082"
  fuel-ostf_sha: "889ddb0f1a4fa5f839fd4ea0c0017a3c181aa0c1"
  fuel-mirror_sha: "8adb10618bb72bb36bb018386d329b494b036573"
  fuelmenu_sha: "824f6d3ebdc10daf2f7195c82a8ca66da5abee99"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "07d5f1c3e1b352cb713852a3a96022ddb8fe2676"

Revision history for this message
Vladimir (vushakov) wrote :

Bug was reproduced on another configuration.
Diagnostic snapshot in attachment.

Artem Roma (aroma-x)
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 9.0
tags: added: area-library
Revision history for this message
Dmitry Bilunov (dbilunov) wrote :

This bug seems to be caused by insufficient free space inside /var/lib/fuel/ibp/: /var is 10G, while the fuel-agent-image files are sparse (non-preallocated) 8GB files.
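
The point about sparse files can be illustrated with a small sketch. It is not fuel-agent code; the helper name `sparse_file_sizes` is hypothetical. It shows that a non-preallocated file reports its full apparent size while reserving almost no disk space, which is why the build can start successfully and only hit "no space left" later, when real data is written into the image.

```python
import os
import tempfile


def sparse_file_sizes(apparent_size):
    """Create a sparse temp file; return (apparent_bytes, allocated_bytes).

    Hypothetical helper for illustration only.
    """
    with tempfile.NamedTemporaryFile() as image:
        # truncate() extends the file without writing any data, so most
        # Linux filesystems allocate no data blocks at this point.
        image.truncate(apparent_size)
        image.flush()
        info = os.stat(image.name)
        # On Linux, st_blocks is counted in 512-byte units.
        return info.st_size, info.st_blocks * 512


if __name__ == "__main__":
    apparent, allocated = sparse_file_sizes(8 * 1024 ** 2)
    print("apparent: %d bytes, allocated: %d bytes" % (apparent, allocated))
```

The free space reported by df only shrinks as blocks are actually written, so an 8GB sparse image can be created on a volume that cannot hold 8GB of data.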

Revision history for this message
Dmitry Bilunov (dbilunov) wrote :

kernel.log:

2016-01-04T19:10:00.513047+00:00 err: [ 5020.846379] loop: Write error at byte offset 925216768, length 4096.
2016-01-04T19:10:00.513066+00:00 warning: [ 5020.846412] EXT4-fs warning (device loop1): ext4_end_bio:332: I/O error -28 writing to inode 142710 (offset 8388608 size 4464640 starting block 225869)

fs/ext4/page-io.c:

                ext4_warning(inode->i_sb, "I/O error %d writing to inode %lu "
                             "(offset %llu size %ld starting block %llu)",
                             bio->bi_error, inode->i_ino,
                             (unsigned long long) io_end->offset,
                             (long) io_end->size,
                             (unsigned long long)
                             bi_sector >> (inode->i_blkbits - 9));

include/uapi/asm-generic/errno-base.h:

#define ENOSPC 28 /* No space left on device */
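
As a quick cross-check of the mapping above, the error number from the kernel log can be resolved from Python's standard `errno` module. This is only a convenience sketch; `describe_errno` is a hypothetical helper name, and the message text comes from the local C library.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for a positive errno value."""
    # The kernel log prints the value negated ("I/O error -28"),
    # so callers pass the absolute value.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    print(describe_errno(28))
```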

Revision history for this message
slava valyavskiy (slava-val-al) wrote :

The issue is related to the IBP feature, so fuel-python is the correct assignee here.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
tags: added: area-python feature-image-based
removed: area-library
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

I can confirm that the Fuel master node had run out of free space.

[10.109.0.2] out: /dev/mapper/os-var 9.5G 8.4G 640M 94% /var
[10.109.0.2] out: /dev/mapper/os-varlog 33G 159M 31G 1% /var/log

It seems that we should have a separate /var/www volume, and that 10G may not be enough. We store all our repos and images under /var/www. If fuel-master was upgraded, then previously existing repos will consume space too.

Bearing in mind that we need about 500M per existing cluster for IBP images, slightly less than 300M for each ubuntu bootstrap image (several could exist) and at least ~300M for the centos bootstrap image, in order to get 20 clusters deployed

/var/www should have at least 7G of free space, just to store all the images.

Moreover, as IBP and bootstrap images are always compressed, we still have space consumption spikes during the build. I'm not sure if 1G of free space is enough for building the base system, but 1.5G should be. So, to run multiple cluster deployments simultaneously, I can safely predict that 2G per cluster is near the lowest adequate requirement for free space.

In order to store:
0.5G per cluster + 1G for bootstrap images

In order to build simultaneously:
2G per cluster (which will be reduced to 500M)
2G for bootstrap images. They are going to be built at once.

So, for deploying 10 clusters at the same time, /var/www should have at least 20G of free space.
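
The arithmetic behind this estimate can be written out as a short sketch. The constants are the figures quoted in this comment; the function names are hypothetical. Note that the 20G figure matches the per-cluster build space alone (10 x 2G). Whether the extra 2G for bootstrap images was meant to be included is not stated, so the sketch makes it optional.

```python
# Figures quoted in the comment above, in gigabytes.
STORE_PER_CLUSTER_GB = 0.5   # compressed IBP images kept per cluster
STORE_BOOTSTRAP_GB = 1.0     # stored bootstrap images
BUILD_PER_CLUSTER_GB = 2.0   # peak space while one cluster image is built
BUILD_BOOTSTRAP_GB = 2.0     # peak space while bootstrap images are built


def storage_gb(clusters):
    """Space needed just to keep the finished images."""
    return clusters * STORE_PER_CLUSTER_GB + STORE_BOOTSTRAP_GB


def peak_build_gb(parallel_builds, include_bootstrap=False):
    """Peak free space needed while images are built at the same time."""
    total = parallel_builds * BUILD_PER_CLUSTER_GB
    if include_bootstrap:
        total += BUILD_BOOTSTRAP_GB
    return total


if __name__ == "__main__":
    print("store 10 clusters: %.1fG" % storage_gb(10))
    print("build 10 clusters: %.1fG" % peak_build_gb(10))
    print("build 10 clusters + bootstrap: %.1fG" % peak_build_gb(10, True))
```

Repository mirrors are not covered by this estimate and come on top of it.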

In addition to these 20G, the rest will be consumed by repos. So we need to reserve some additional space for repos (and for previously existing repos if fuel-master was upgraded).

To fix that, we need to introduce a /var/www volume of sufficient size here:

https://github.com/openstack/fuel-main/blob/8182e3f301f89c0bfa61ff45965a515d41c4ae0c/iso/ks.template#L297-L298

This is not Python related, but should be passed to the build team or somewhere else.

From the fuel-agent side, I can propose a free space check, say for 2G. It will make these situations easier to catch in the future, but it doesn't prevent running out of space in real time.
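
A minimal sketch of such a check, assuming a POSIX host. It is not the actual fuel-agent implementation: `InsufficientDiskSpace`, `free_bytes` and `ensure_free_space` are hypothetical names, and the 2G threshold is the figure suggested above.

```python
import os

TWO_GIB = 2 * 1024 ** 3  # threshold suggested in the comment above


class InsufficientDiskSpace(Exception):
    """Raised when a path has less free space than required."""


def free_bytes(path):
    """Bytes available to unprivileged processes on the filesystem of path."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def ensure_free_space(path, required_bytes=TWO_GIB):
    """Return the available bytes, or raise if below required_bytes."""
    available = free_bytes(path)
    if available < required_bytes:
        raise InsufficientDiskSpace(
            "%s has %d bytes free, %d required"
            % (path, available, required_bytes))
    return available


if __name__ == "__main__":
    try:
        print("free bytes:", ensure_free_space("/"))
    except InsufficientDiskSpace as exc:
        print("refusing to build image:", exc)
```

As noted above, this is a point-in-time check: another build or a mirror sync can still consume the space after the check has passed.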

> 4. Update Fuel UI page, go to the 'Settings' tab of the cluster and check that all links to the repositories are pointing to the master node.

This is the root cause. It seems that the ubuntu repos occupied slightly more than 3G (compared with what 8.0 BVT has).

Both snapshots have the same amount of free space:
[10.109.0.2] out: /dev/mapper/os-var 9.5G 8.4G 640M 94% /var

For fuel-snapshot-2016-01-04_16-58-01.tar.xz, fuel-agent tried to build images from the local mirror of the ubuntu repos.

But for fuel-snapshot-2016-01-05_08-13-15.tar.xz, fuel-agent tried to build images from external repos. (I think that the ubuntu repos were mirrored onto /var/www again, but not used.)

I can safely assume that root cause was the same.

summary: - Deploy failed: Provision has failed. Failed to execute hook 'shell'
+ FMN could run out of free space on /var if ubuntu mirror was created
Revision history for this message
Alexander Gordeev (a-gordeev) wrote : Re: FMN could run out of free space on /var if ubuntu mirror was created

Not sure if this bug deserves a fix. Could we just add a warning to our documents somewhere?

Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

So, by default, FMN has a limit on the number of cluster deployments: 2-3 simultaneous and 10 sequential.

With local mirror created, there's no chance even for a single cluster deploy. Pretty embarrassing.

should we increase those limitations somehow?

tags: added: area-build
removed: area-python
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel build team (fuel-build)
summary: - FMN could run out of free space on /var if ubuntu mirror was created
+ Provisioning failed when FMN run out of free space on /var if local
+ ubuntu mirror was created
summary: - Provisioning failed when FMN run out of free space on /var if local
+ Provisioning failed when FMN runs out of free space on /var if local
ubuntu mirror was created
Revision history for this message
Roman Vyalov (r0mikiam) wrote : Re: Provisioning failed when FMN runs out of free space on /var if local ubuntu mirror was created

Reassigning to fuel-python. Related bug https://bugs.launchpad.net/fuel/+bug/1526026 looks like a duplicate.

Changed in fuel:
assignee: Fuel build team (fuel-build) → Fuel Enhancements (fuel-enhancements-team)
status: Confirmed → New
tags: removed: area-build
tags: added: area-python
Ilya Kutukov (ikutukov)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

I think that the observed behaviour should at least be reflected in the documentation somewhere.

"Make sure that /var on the master node has at least 2G of free space prior to the deployment of a single cluster"

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Can we add a check before deploy and fail if there is not enough disk space for image build? I guess it should be enough.

Changed in fuel:
assignee: Fuel Enhancements (fuel-enhancements-team) → Fuel Python Team (fuel-python)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

So here are the options:
1) Add a pre-deployment check for disk space. It will become tricky in case of parallel deployments.
2) Write a good error message. It looks tricky because there is no way to find a root cause in dpkg logs.
3) Set up a monitoring system for disk space that will alert user in case of disk shortage. Looks like big feature.
4) Enhance documentation with requirements for disk space for image based provisioning.

I think 2+4 should be enough for fixing this bug. However we should think about 3rd option in future.

Revision history for this message
Ihor Kalnytskyi (ikalnytskyi) wrote :

My opinion is to mention this bug in release notes, and add note to docs for 8.0.

Nailgun runs in a separate container; it's not good to go to the host and check free space. Moreover, it won't solve the real problem, only the symptom. Today it's IBP; what will it be tomorrow? Plugin installation? fuel-mirror?

Definitely, users MUST manage free space on their own. A good solution is probably to set up some simple monitoring mechanism (like monit) and send notifications if free space is critically low. But that's a feature, obviously.
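
To make the monitoring idea concrete, here is a rough sketch of the kind of check such a mechanism would run periodically. It does not use monit or any Fuel API; the function names and the 80%/90% thresholds are assumptions chosen for illustration, and the percentage is computed as used/total, which can differ slightly from the Use% column printed by df.

```python
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero total size.
        return 0.0
    return 100.0 * usage.used / usage.total


def check_mounts(paths, warn_at=80.0, crit_at=90.0):
    """Return (level, path, percent) for every path at or over a threshold."""
    alerts = []
    for path in paths:
        percent = usage_percent(path)
        if percent >= crit_at:
            alerts.append(("CRITICAL", path, percent))
        elif percent >= warn_at:
            alerts.append(("WARNING", path, percent))
    return alerts


if __name__ == "__main__":
    # The mount points to watch are an assumption; on the master node
    # discussed here, /var would be the obvious candidate.
    for level, path, percent in check_mounts(["/"]):
        print("%s: %s is %.0f%% full" % (level, path, percent))
```

With thresholds like these, the /var volume shown earlier at 94% would have been reported as critical before the image build started.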

Revision history for this message
Ihor Kalnytskyi (ikalnytskyi) wrote :

I propose to set Won't Fix for 8.0, and to consider using monit for 9.0.

Revision history for this message
slava valyavskiy (slava-val-al) wrote :

I guess we can handle this case in the fuel-bootstrap-cli util, as it runs in the default userspace. But I agree with Igor that it's the admin's job to control disk space on FMN.

Dmitry Pyzhov (dpyzhov)
tags: added: need-bp
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.0 → 10.0
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Sustaining (fuel-sustaining-team)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

We should implement monit support either in Fuel core or in a separate plugin. Marking this bug as a new functionality request.

summary: - Provisioning failed when FMN runs out of free space on /var if local
- ubuntu mirror was created
+ Monitoring of disk space is required on master node
tags: removed: feature-image-based team-bugfix
Sulaco (fco-sendra)
Changed in fuel:
status: Confirmed → Fix Released