deployment MOS 9.0 ISO installation fails

Bug #1602229 reported by ivano
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
Fuel for OpenStack   Invalid  High        ivano
  Mitaka             Invalid  High        ivano

Bug Description

When trying to install Fuel using the MOS 9.0 ISO:
unicorn> md5 MirantisOpenStack-9.0.iso
MD5 (MirantisOpenStack-9.0.iso) = 07461ba42d5056830dd6f203e8fe9691
the installation fails with an error in atexit._run_exitfuncs (atexit.py) pointing to an error in sys.exitfunc. It is reproducible every time on an HP SL230s Gen8 bare-metal server.
A screen capture showing the terminal error, and the anaconda logs, are attached.
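A minimal sketch for ruling out a corrupted download before retrying, assuming a Linux host with GNU `md5sum` (macOS `md5 -q` is used as a fallback). The function name `verify_iso_md5` is hypothetical, and the expected hash should come from the checksum published with the release; the value quoted above is only what the reporter computed locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an ISO's MD5 with an expected value.
# Uses GNU md5sum when present, otherwise BSD/macOS "md5 -q".
verify_iso_md5() {
    local iso="${1:-}" expected="${2:-}" actual
    if command -v md5sum >/dev/null 2>&1; then
        # md5sum prints "<hash>  <file>"; keep only the hash field.
        actual="$(md5sum "$iso" | awk '{print $1}')"
    else
        # md5 -q prints the bare hash.
        actual="$(md5 -q "$iso")"
    fi
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $iso"
    else
        echo "MISMATCH: $iso (got '${actual}', expected '${expected}')" >&2
        return 1
    fi
}
```

Example: `verify_iso_md5 MirantisOpenStack-9.0.iso <published-md5>` prints `OK: MirantisOpenStack-9.0.iso` on a match and returns non-zero on a mismatch.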

anaconda-post-configure-sysconfig.log >>>

+ source /root/anaconda.cmdline.vars
++ initrd=initrd.img
++ net__ifnames=0
++ biosdevname=0
++ inst__repo=cdrom:LABEL=OpenStack_Fuel:/
++ inst__ks=cdrom:LABEL=OpenStack_Fuel:/ks.cfg
++ ip=10.20.0.2::10.20.0.1:255.255.255.0:fuel.domain.tld:eth0:off:::
++ nameserver=10.20.0.1
++ BOOT_IMAGE=vmlinuz
+ SOURCE=/tmp/source
+ mkdir -p /etc/fuel
+ cat
+ umount -f /tmp/source
umount: /tmp/source: not mounted
+ rm -rf /tmp/source
+ umount -f
Usage:
 umount [-hV]
 umount -a [options]
 umount [options] <source> | <directory>
Options:
 -a, --all unmount all filesystems
 -A, --all-targets unmount all mountpoins for the given device
                         in the current namespace
 -c, --no-canonicalize don't canonicalize paths
 -d, --detach-loop if mounted loop device, also free this loop device
     --fake dry run; skip the umount(2) syscall
 -f, --force force unmount (in case of an unreachable NFS system)
 -i, --internal-only don't call the umount.<type> helpers
 -n, --no-mtab don't write to /etc/mtab
 -l, --lazy detach the filesystem now, and cleanup all later
 -O, --test-opts <list> limit the set of filesystems (use with -a)
 -R, --recursive recursively unmount a target with all its children
 -r, --read-only In case unmounting fails, try to remount read-only
 -t, --types <list> limit the set of filesystem types
 -v, --verbose say what is being done
 -h, --help display this help and exit
 -V, --version output version information and exit
For more details see umount(8).
+ true
+ rm -rf

ivano (l-ivan) wrote :
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Maksim Malchuk (mmalchuk)
milestone: none → 10.0
tags: added: area-library customer-found team-bugfix
ivano (l-ivan) wrote :
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340918

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/340956

Changed in fuel:
importance: High → Critical
Maksim Malchuk (mmalchuk) wrote :

RCA:
1. The problem was introduced a long time ago, on Nov 19, 2015, in this commit:
https://github.com/openstack/fuel-main/commit/2a5b896bcf2048c7c7d050a43890dbe3816c63e9#diff-271600620e265f5cd9858ad050e629ccR457
when the kickstart script was split into pieces (snippets).

2. As a result, we ended up with this piece of code: https://github.com/openstack/fuel-main/blob/stable/mitaka/iso/ks.template#L443-L467
where the value of the ${FS} variable is lost.

3. Also, this line never had a workaround: https://github.com/openstack/fuel-main/blob/stable/mitaka/iso/ks.template#L461
unlike this line, which has one: https://github.com/openstack/fuel-main/blob/stable/mitaka/iso/ks.template#L464
Such a workaround can solve the unmount issues.

4. The unmount problem is specific to the hardware (HP iLO Gen8 in this case) and its configuration. We had never seen it before because QA did not test 'Installation of the Fuel master node via IPMI on certified servers'.
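Points 2 and 3 can be illustrated with a minimal sketch of a defensive unmount helper. This is not the code from ks.template or from the merged change; `safe_umount` is a hypothetical name, and bash plus util-linux `mountpoint(1)` are assumed. It skips empty targets (the lost ${FS} case that produced the bare `umount -f` usage dump in the log), skips paths that are not mounted (the `/tmp/source: not mounted` case), and applies an `|| true` style fallback, which is what the `+ true` after the failing bare `umount -f` in the trace suggests the existing workaround does.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the code from ks.template or the merged fix.
safe_umount() {
    local target="${1:-}"
    # An empty argument (e.g. a lost ${FS}) would turn into a bare
    # "umount -f", which only prints the usage text and fails.
    if [ -z "$target" ]; then
        echo "safe_umount: empty target, skipping" >&2
        return 0
    fi
    # util-linux mountpoint(1) exits 0 only if the path is a mount point;
    # any other status (not mounted, missing path, missing tool) skips.
    if ! mountpoint -q "$target" 2>/dev/null; then
        echo "safe_umount: $target is not mounted, skipping" >&2
        return 0
    fi
    # Force the unmount, but never let a failure abort the post-install phase.
    umount -f "$target" || true
}
```

In the snippet from the log this would be called as `safe_umount /tmp/source` and `safe_umount "${FS:-}"`. Checking the target first keeps the log free of the usage dump, while `|| true` still covers a device that vanishes between the check and the unmount.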

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/340918
Committed: https://git.openstack.org/cgit/openstack/fuel-main/commit/?id=acf0865eec99d92e71fdb570704b012c617bf499
Submitter: Jenkins
Branch: master

commit acf0865eec99d92e71fdb570704b012c617bf499
Author: Maksim Malchuk <email address hidden>
Date: Tue Jul 12 15:38:13 2016 +0300

    Fix umount issue in the postinstall scripts

    This change adds the workaround for unclean umounting the installation dirs.

    Change-Id: Ifccb0f6b341b9315dc9ace500ab855728fcbc98f
    Closes-Bug: #1602229
    Signed-off-by: Maksim Malchuk <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Maksim Malchuk (mmalchuk) wrote :

After deep research, and since QA also cannot reproduce this bug on hardware, we assume that this is a hardware problem during the installation process (the CD-ROM unexpectedly disappeared from the system). This problem should therefore be fixed in anaconda, which should handle such situations instead of halting the process.

As a workaround, the customer can simply reboot the server to continue the installation. As the screenshot shows, the process halted in the last phase of the installation, so a reboot should help. We need feedback from the customer on this.

Moving this bug to the MOS Linux team to fix the issue in anaconda.

Changed in fuel:
status: Fix Committed → Confirmed
assignee: Maksim Malchuk (mmalchuk) → MOS Linux (mos-linux)
importance: Critical → High
Artem Panchenko (apanchenko-8) wrote :

I tried to reproduce this issue on an HP ProLiant DL320e Gen8 v2, but the MOS 9.0 installation completed successfully. Here are my steps:

1. Connected to the server using the GUI remote console (IPMI).
2. Attached a virtual CD-ROM using the MOS 9.0 .iso file located on my PC.
3. Reset power and chose to boot from CD-ROM.
4. In the Fuel ISO menu, didn't change anything; all boot options were left at their defaults.
5. The CentOS installation took a few hours (slow internet connection); then the server rebooted and deployment of the Fuel master node started.

I also checked the mount points during OS provisioning (anaconda): the virtual CD-ROM was detected as /dev/sr0 and mounted to the filesystem.
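To repeat that check on the failing server, a small sketch that reads `/proc/mounts` from a shell on the installer. `cdrom_mount_point` is a hypothetical helper, and `/dev/sr0` is simply the device name observed above; it may differ on other hardware. It prints each mount point of the device and exits 0, or exits 1 if the device is not mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where a block device is mounted.
# Args: device (default /dev/sr0), mounts table (default /proc/mounts).
cdrom_mount_point() {
    local dev="${1:-/dev/sr0}" mounts="${2:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, fstype, options, dump, pass.
    # Print every mount point of the device; exit 1 if there is none.
    awk -v d="$dev" '$1 == d { print $2; found = 1 } END { exit !found }' "$mounts"
}
```

The second argument lets the helper read a saved copy of the mounts table, which makes it easy to compare a good run with a failing one.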

@Ivan, how many times did you try to install MOS using the virtual CD-ROM? Did you get the same error each time? Did you check the iLO event logs on your server? They contain messages like 'Scriptable virtual media ejected by: operator.' or 'Virtual media disconnected by: operator.', so they should show whether the virtual CD-ROM was unexpectedly disconnected (for example, due to a network issue) during the MOS installation.
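If the iLO event log can be exported as plain text (the export method and file name are assumptions here), a quick sketch for spotting the two messages quoted above. `find_media_events` is a hypothetical helper; it matches case-insensitively and exits non-zero when no such event is found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list virtual-media events in a plain-text export of
# the iLO event log, using the message texts quoted in the comment above.
find_media_events() {
    local log="${1:-}"
    # grep exits 1 when nothing matches, so callers can test the result.
    grep -iE 'scriptable virtual media ejected by|virtual media disconnected by' "$log"
}
```

For example, `find_media_events ilo_events.txt` (hypothetical file name) prints only the matching entries, so an unexpected disconnect during the install window stands out.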

Dmitry Teselkin (teselkin-d) wrote :

@ivano,

Could you please give us access to the server where you hit this issue? We could investigate it to decide whether patching anaconda is needed.

If the issue was caused by failing hardware or network-related problems, then adding 'workarounds' to anaconda is wrong: it might just hide other problems that will appear later. Removing the installation media in the middle of the process (which is what the issue looks like) is not 'normal operation', so there is little that can be done.

In any case, we would need access to the server to investigate further.

ivano (l-ivan) wrote :

@teselkin-d, the media was NOT manually removed mid-install, nor can you blame the network when the installation fails at exactly the same step several times. I am working with an SP customer and have no access to their lab myself. I witnessed the problem happen several times via WebEx. I don't think it is realistic at all to expect access to the ACTUAL hardware where the problem was observed in order to reproduce it...

ivano (l-ivan) wrote :

@apanchenko-8, pretty much the same as what you did. I tried several times. I will ask for the iLO logs and report back.

Dmitry Teselkin (teselkin-d) wrote :

@Ivan, is there any chance you could try installing CentOS 7 [0],[1] there, following the same steps you used when installing MOS?

[0] http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1511.iso
[1] http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso

Dmitry Teselkin (teselkin-d) wrote :

It is also possible that the firmware on the server is outdated. Could you please give more info about:
* BIOS version
* iLO version

tags: added: area-linux area-mos
removed: area-library team-bugfix
Changed in fuel:
status: Confirmed → Incomplete
assignee: MOS Linux (mos-linux) → ivano (l-ivan)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (stable/mitaka)

Reviewed: https://review.openstack.org/340956
Committed: https://git.openstack.org/cgit/openstack/fuel-main/commit/?id=70d1ff99a9e68dad14099d652f60441f9e220aea
Submitter: Jenkins
Branch: stable/mitaka

commit 70d1ff99a9e68dad14099d652f60441f9e220aea
Author: Maksim Malchuk <email address hidden>
Date: Tue Jul 12 15:38:13 2016 +0300

    Fix umount issue in the postinstall scripts

    This change adds the workaround for unclean umounting the installation dirs.

    Change-Id: Ifccb0f6b341b9315dc9ace500ab855728fcbc98f
    Closes-Bug: #1602229
    Signed-off-by: Maksim Malchuk <email address hidden>

Maksim Malchuk (mmalchuk) wrote :

This was merged as tech debt. Moving to Incomplete status as is.

Timur Nurlygayanov (tnurlygayanov) wrote :

So we merged the fix; could we also build a new ISO with the fix and test it in ivano's lab?

Maksim Malchuk (mmalchuk) wrote :

Timur, this fix does not solve the issue, as described above. A fix in anaconda is needed to handle this.

Denis Meltsaykin (dmeltsaykin) wrote :

Moving to Invalid after a month in Incomplete. Feel free to re-open it if you have new information.

Changed in fuel:
status: Incomplete → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-main 10.0.0rc1

This issue was fixed in the openstack/fuel-main 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-main 10.0.0

This issue was fixed in the openstack/fuel-main 10.0.0 release.
