[provision] Large number of disks could fail ubuntu installation

Bug #1340414 reported by Roman Sokolkov
This bug affects 15 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Vladimir Kozhukalov

Bug Description

Description:
During Ubuntu installation, the "late_command" directive is used for a lot of things, e.g. disk partitioning, grub installation, etc.
But all of this is actually just a single shell string, and a pretty big one.

In our case we have ceph-osd nodes with 23 disks (1 OS, 4 journals, 18 OSDs). During Ubuntu installation the "late_command" directive fails and the node goes into a reboot loop (the nopxe flag was not set). After some investigation I found that late_command fails immediately with an 'Argument list too long' error. After reducing the number of disks or removing some parts of the late_command, it starts working. My assumption is that the string is too long and does not fit within some kernel limit (e.g. MAX_ARG_PAGES, http://www.linuxjournal.com/article/6060). How "late_command" is executed: http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/precise/preseed/precise/view/head:/preseed_command#L16
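
To get a feel for how quickly such a string grows, here is a minimal sketch. It assumes a per-argument limit of 32 pages of 4 KiB (the Linux MAX_ARG_STRLEN value on kernels 2.6.23 and later); the limit actually in force inside the installer environment is not confirmed in this report. The per-disk template and the disk names are hypothetical and are not the real Fuel late_command.

```python
# Assumption: Linux caps a single argv string at 32 pages (MAX_ARG_STRLEN);
# with 4 KiB pages that is 131072 bytes. The installer's real limit may differ.
ASSUMED_MAX_ARG_STRLEN = 32 * 4096


def build_late_command(disks, per_disk_template):
    """Join one shell fragment per disk into a single command string."""
    return " && ".join(per_disk_template.format(disk=disk) for disk in disks)


def fits_in_single_argument(command, limit=ASSUMED_MAX_ARG_STRLEN):
    """Return True if the command fits into one argv string (NUL included)."""
    return len(command.encode("utf-8")) + 1 <= limit


if __name__ == "__main__":
    # Hypothetical fragment; the real late_command does far more per disk.
    template = "parted -s /dev/{disk} mklabel gpt"
    disks = ["sd" + chr(ord("a") + i) for i in range(23)]
    command = build_late_command(disks, template)
    print(len(command), fits_in_single_argument(command))
```

The point of the sketch is only that the string length scales linearly with the number of disks and with the length of each device name, so either factor can push it over a fixed limit.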

Environment:
Fuel 5.0, Ubuntu, HA, Neutron+VLAN, Ceph on dedicated nodes

Steps to reproduce:
- Create an environment (HA or not) with dedicated Ceph nodes
- Assign a large number of disks to ceph-osd (at least 20)
- Deploy

Expected result:
- Ubuntu installed successfully

Actual result:
- Ubuntu installation is stuck at 100% in Fuel (reinstallation loop)

Possible solution:
- Rebuild the debian-installer kernel with an increased limit (needs more research)
- Move the Ceph partitioning out of late_command (to puppet?)

Details:
In /var/log/remote/node-X.domain.tld/finish-install.log:
2014-07-10T12:42:38.789606+00:00 notice: info: Running /usr/lib/finish-install.d/07preseed
2014-07-10T12:42:38.845174+00:00 notice: /bin/preseed_command: line 23: logger: Argument list too long
2014-07-10T12:42:38.846209+00:00 notice: warning: /usr/lib/finish-install.d/07preseed returned error code 2
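
A minimal sketch for spotting this failure in a collected finish-install.log. It assumes the line layout shown above (timestamp, level, message); the function name is hypothetical and this is not part of Fuel.

```python
import re

# Matches lines shaped like the log excerpt above, for example:
# "<timestamp> notice: /bin/preseed_command: line 23: logger: Argument list too long"
ARG_TOO_LONG = re.compile(
    r"^(?P<ts>\S+) \S+ (?P<source>.+?): Argument list too long\s*$"
)


def find_arg_list_errors(lines):
    """Return (timestamp, source) pairs for every 'Argument list too long' line."""
    hits = []
    for line in lines:
        match = ARG_TOO_LONG.match(line)
        if match:
            hits.append((match.group("ts"), match.group("source")))
    return hits
```

Feeding it the lines of /var/log/remote/node-X.domain.tld/finish-install.log (for example via `open(path)`) returns one entry per occurrence, which makes the reboot loop easy to confirm across several nodes.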

description: updated
Changed in fuel:
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.1
Dmitry Ilyin (idv1985)
summary: - Large number of disks could fail ubuntu installation
+ [library] Large number of disks could fail ubuntu installation
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kozhukalov (kozhukalov)
Changed in fuel:
status: New → Confirmed
Dmitry Ilyin (idv1985)
summary: - [library] Large number of disks could fail ubuntu installation
+ [provision] Large number of disks could fail ubuntu installation
tags: added: customer-found
Changed in fuel:
importance: Medium → High
Changed in fuel:
milestone: 5.1 → 6.0
importance: High → Medium
Vladimir Kozhukalov (kozhukalov) wrote :

To fix this bug we can put this long string into a script somewhere on the master node and then download the script via HTTP during the late preseed stage. That is a QUITE UGLY solution, and my suggestion is to fix this bug by using the image based provisioning scheme, which will be available starting with 6.0. Since nodes with such a large number of disks are quite rare, this bug is medium rather than high.
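
A minimal sketch of the "download a script" idea described above, under stated assumptions: the URL and the local path are hypothetical, and it assumes that `wget` and `sh` are available in the installer environment. It is not the fix that was eventually merged.

```python
import shlex


def short_late_command(script_url, local_path="/tmp/late_command.sh"):
    """Build a short command that downloads a script and runs it with sh."""
    url = shlex.quote(script_url)
    path = shlex.quote(local_path)
    return "wget -O {path} {url} && sh {path}".format(path=path, url=url)


if __name__ == "__main__":
    # Hypothetical master-node URL; the real location is not given in this report.
    print(short_late_command("http://10.20.0.2:8080/late/node-1.sh"))
```

The resulting string stays the same length no matter how many disks the node has, because the per-disk logic lives in the downloaded script rather than in the preseed value.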

tags: added: release-notes
Andrey Kirilochkin (andreika-mail) wrote :

have the same bug.

Dmitry Borodaenko (angdraug) wrote :

Bug marked in progress and attached to the image-based-provisioning blueprint.

Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: In Progress → Confirmed
milestone: 6.0 → 6.1
tags: added: volumes
Andrey Kirilochkin (andreika-mail) wrote :

Dmitriy Novakovskiy (dnovakovskiy) wrote :

I think a workaround is required for this; it's not enough to rely on image based provisioning (at least until it is claimed to be fully stable and replaces the default Cobbler-based mechanism). Around 60% of the installations that I'm aware of that are happening or about to happen with 6.0 in the near future use Ceph nodes with 20+ disks.

Either the ugly workaround that Vladimir has described, or something like "run a shell script to partition OSD disks and add them to OSD at the very end of deployment" would do.

Dmitry Borodaenko (angdraug) wrote :

+1 to comment #5, please document a way to recover from this.

Jon Skarpeteig (jskarpet) wrote :

I strongly disagree that downloading a script in the late command is an ugly hack. This is the way it's done in for instance The Foreman - and it's also a lot easier to read and maintain such scripts than a 2K one-liner in preseed late_command.

I also don't think image based is the way to go, as I'm more inclined to trust that the distro itself is able to configure itself and run on any system, than Fuel images being able to do the same for all kinds of hardware setups. Image based installation makes sense for very large installations and homogenous environments - but that's not really what Fuel is meant to solve.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Vladimir Kozhukalov (kozhukalov) → Fuel provisioning team (fuel-provisioning)
Dmitry Pyzhov (dpyzhov) wrote :

We are focused on image based provisioning in future releases and we do have limited support for it in 6.0. So the workaround here is to use image based provisioning.

However, we have another limitation in the volume manager. We can produce a workaround if required, but it will take some time.

The bug will be obsoleted by the combination of image based provisioning (production-ready in 6.1) and the volume manager refactoring (planned for 7.0).

https://blueprints.launchpad.net/fuel/+spec/volume-manager-refactoring

Changed in fuel:
milestone: 6.1 → 7.0
Dmitry Pyzhov (dpyzhov)
tags: added: module-volumes
removed: volumes
Dmitry Pyzhov (dpyzhov) wrote :

This bug is medium because right now we can deploy on about 28 disks per node. It should not affect the majority of our users.

Dmitry Pyzhov (dpyzhov) wrote :

Let's create a workaround for Ubuntu in fuel-agent in order to support more nodes in 6.1

Changed in fuel:
importance: Medium → High
milestone: 7.0 → 6.1
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel provisioning team (fuel-provisioning) → Vladimir Kozhukalov (kozhukalov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171668

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/171668
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=ffdd9f8d189d65209296072898b747c8ba0612fe
Submitter: Jenkins
Branch: master

commit ffdd9f8d189d65209296072898b747c8ba0612fe
Author: Vladimir Kozhukalov <email address hidden>
Date: Wed Apr 8 17:06:47 2015 +0300

    IBP Got rid of md for /boot

    Centos does use legacy grub, and it is
    only able to boot from md with metadata 0.9.
    So, we are limited to have not more than 28 disks
    on a node because current version of volume
    manager assumes /boot is spread over all disks and
    0.9 metadata does not support more than 28.
    To avoid this limitation we've got rid of using
    md for /boot completely.

    Change-Id: I08398453625a4e9136d67989c7ebea41cb9cb766
    Closes-Bug: #1340414
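
The 28-disk ceiling described in the commit message can be expressed as a simple check. This is a sketch only: the constant comes from the commit message above, while the function name and its use are hypothetical and not taken from the volume manager code.

```python
# From the commit message: md metadata 0.9 does not support more than 28 disks.
MD_090_MAX_DEVICES = 28


def boot_md_is_possible(disk_count, max_devices=MD_090_MAX_DEVICES):
    """Return True if /boot could still be an md array spread over all disks."""
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return disk_count <= max_devices
```

With /boot no longer placed on md, a check like this is no longer a constraint on the number of disks per node.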

Changed in fuel:
status: In Progress → Fix Committed
tags: added: release-notes-done
Pawel Stefanski (pejotes) wrote :

I have the same problem on Fuel 6.0, even with 12 SAS disks, due to the longer wwn/by-path addressing scheme with SAS topology. Are you planning any hotfix for this in 6.0?

api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-26_14-25-46
build_number: '58'
feature_groups:
- mirantis
- experimental
fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
      build_id: 2014-12-26_14-25-46
      build_number: '58'
      feature_groups:
      - mirantis
      fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
      fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
      production: docker
      release: '6.0'

2015-05-22T08:16:13.768219+00:00 notice: info: Running /usr/lib/finish-install.d/07preseed
2015-05-22T08:16:13.884784+00:00 notice: /bin/preseed_command: line 23: logger: Argument list too long
2015-05-22T08:16:13.885888+00:00 notice: warning: /usr/lib/finish-install.d/07preseed returned error code 2

and then reboot loop.

Alexander Gordeev (a-gordeev) wrote :

Hello Pawel Stefanski, could you provide us with sample udevadm output for one SAS disk?

the exact command is:
$ udevadm info --query=all --name=<name of SAS disk>

eg.:

$ udevadm info --query=all --name=/dev/sda

Pawel Stefanski (pejotes) wrote :

Hi Aleksandr!

Sorry for a huge delay, I didn't have access to this environment for a long time.

Here is the output of udevadm:

udevadm info --query=all --name=sdc
P: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/block/sdc
N: sdc
S: disk/by-id/ata-ST3000NM0033-9ZM178_Z1Y306VQ
S: disk/by-id/scsi-SATA_ST3000NM0033-9Z_Z1Y306VQ
S: disk/by-id/wwn-0x5000c5007a74ffa7
S: disk/by-path/pci-0000:01:00.0-sas-0x500163606789abe0-lun-0
E: DEVLINKS=/dev/disk/by-id/ata-ST3000NM0033-9ZM178_Z1Y306VQ /dev/disk/by-id/scsi-SATA_ST3000NM0033-9Z_Z1Y306VQ /dev/disk/by-id/wwn-0x5000c5007a74ffa7 /dev/disk/by-path/pci-0000:01:00.0-sas-0x500163606789abe0-lun-0
E: DEVNAME=/dev/sdc
E: DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/port-6:0/expander-6:0/port-6:0:0/end_device-6:0:0/target6:0:0/6:0:0:0/block/sdc
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=352
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=352
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=7200
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_MODEL=ST3000NM0033-9ZM178
E: ID_MODEL_ENC=ST3000NM0033-9ZM178\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_PART_TABLE_TYPE=gpt
E: ID_PATH=pci-0000:01:00.0-sas-0x500163606789abe0-lun-0
E: ID_PATH_TAG=pci-0000_01_00_0-sas-0x500163606789abe0-lun-0
E: ID_REVISION=SN04
E: ID_SCSI_COMPAT=SATA_ST3000NM0033-9Z_Z1Y306VQ
E: ID_SERIAL=ST3000NM0033-9ZM178_Z1Y306VQ
E: ID_SERIAL_SHORT=Z1Y306VQ
E: ID_TYPE=disk
E: ID_WWN=0x5000c5007a74ffa7
E: ID_WWN_WITH_EXTENSION=0x5000c5007a74ffa7
E: MAJOR=8
E: MINOR=32
E: SUBSYSTEM=block
E: UDEV_LOG=3
E: USEC_INITIALIZED=33428804

The machine is Quanta S200-X22TQ, 12bay in front connected to LSI HBA, SAS2008.
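
To compare how long the device names are on different hardware, output like the above can be parsed with a minimal sketch such as the following. The function names are hypothetical; which of these symlinks actually ends up in the generated late_command is not stated in this report.

```python
def parse_udevadm_info(text):
    """Split `udevadm info --query=all` output into name, symlinks, properties."""
    info = {"name": None, "symlinks": [], "properties": {}}
    for raw in text.splitlines():
        prefix, sep, value = raw.partition(": ")
        if not sep:
            continue  # skip blank or unexpected lines
        if prefix == "N":
            info["name"] = value
        elif prefix == "S":
            info["symlinks"].append(value)
        elif prefix == "E":
            key, _, val = value.partition("=")
            info["properties"][key] = val
    return info


def longest_symlink(info):
    """Return the longest /dev symlink, or None if the device has none."""
    links = ["/dev/" + link for link in info["symlinks"]]
    return max(links, key=len) if links else None
```

For the disk shown above, the by-path link with the SAS address is the longest name, which is consistent with the observation that SAS topologies hit the limit with fewer disks.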

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (stable/6.1)

Related fix proposed to branch: stable/6.1
Review: https://review.openstack.org/194961

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (stable/6.1)

Reviewed: https://review.openstack.org/194961
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=0e26e7d7cc153d179ec34985645dd23cdd239ddb
Submitter: Jenkins
Branch: stable/6.1

commit 5cc5f0c643aebecaf3bf4580535a3ea7c3334a6c
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:43:35 2015 -0700

    Removed streamlined patching backend pieces

    Change-Id: I955e76ccdbd12a9145f4e9b689f80bdf9fcaf929

commit 563c4b5c78ebfcb1f4f91047c2919f6270f9a1d4
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:30:30 2015 -0700

    Removed outdated patching guide

    Change-Id: I76180c277789ade9c5ebedd19fe2092847c0b7d9

commit 8d120c14bec1ab41d448683ad146a3053a57c4ee
Author: Irina Povolotskaya <email address hidden>
Date: Tue Jun 23 19:59:11 2015 +0300

    Add dual hypervisor ref arch into 6.1 docs

    Change-Id: I900c24c9de878eafadbfc995aa879b7f55737fac

commit feebd1592d3305b64bbdfd0bc5fe108190aef120
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:38:17 2015 +0300

    [OPs guide] Running Ceilometer section edits

    1. conf file extract is updated
    2. note is updated

    Closes-bug: 1467817
    Change-Id: I0217e164108e0ba6c1397045a5e57d13ff429223

commit 44a93f9dead7511a3461ec35248dbb689c81eafd
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:04:40 2015 +0300

    [RN6_1] Final changes

    1. capitalization
    2. 2014.2 to 2014.2.2
    3. general improvements

    Change-Id: I45057e90c90550559f66bc67ccdf97a559fd9000

commit bb41389cae58084285688853281516b659686422
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 16:45:35 2015 +0300

    Update patching decription

    Update patching description with
    the standard Linux commands.

    Change-Id: Ia1a8346639c468fdfce15a11d2430bf3a4731244

commit bf3018fae3f2e564413d33aba6cdebf8868f0b4e
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 15:55:49 2015 +0300

    [RN6_1] Clean up

    1. Rearranges sections
    2. Improves RST
    3. Changes titles order

    Change-Id: I6110bf515667d3d6ba08ad35ff5d593dbc96641e

commit 1c7e4457808e8f2d6c56fdf31252170972e444b9
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 15:26:28 2015 +0300

    Replaces VBOX screenshots

    This patch:
    - replaces VBOX screenshots
    - changes the link for Download Mirantis VirtualBox scripts
     to https://docs.mirantis.com/openstack/fuel/fuel-master/#downloads

    Change-Id: I58dede960c5c3355d39b07ff44b757403f6af02c
    Closes-Bug: #1467872

commit 0a568bf53fc0e25d1d692d5d74b4a7b4d983bbcc
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 14:01:55 2015 +0300

    6.1 --separate repos

    change wording and add links to the
    separate repos feature.

    Change-Id: Ib5d0778a0d8f1534f79ed2f553574cb69a3150b0

commit 95a188b21cbdd064d92696b7920e6a0105fe0c56
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 12:07:28 2015 +0300

    Corrects the output 'pcs status'

    Changes the example outputs to appropriate ones.

    Change-Id: Ib6d83...

Alexander Gordeev (a-gordeev) wrote :

Hello Pawel Stefanski,

Thank you for the information. Regarding the bug, it won't be fixed in 6.0/6.1 for classic provisioning.

Instead, you can use 'image based provisioning' to provision nodes with a relatively large number of disks. It is experimental in 6.0 and the default in 6.1.
