ceph-deploy: non-zero return code on successful OSD deployment

Bug #1581279 reported by Miroslav Anashkin
This bug affects 1 person
Affects              Status         Importance   Assigned to      Milestone
Mirantis OpenStack   Fix Released   High         Rodion Tikunov

Bug Description

A customer using Fuel 5.1 has issues running ceph-deploy from shell scripts and handling its errors.
The message `Error in sys.exitfunc:` may be safely ignored if ceph-deploy is started manually, but not if an external script manages the ceph-deploy runs.

Steps to reproduce:
Deploy a new OSD with ceph-deploy.

Expected results:
ceph-deploy finishes a successful OSD deployment without spurious error messages.

Actual result:
The error message `Error in sys.exitfunc:` appears after a successful OSD deployment by ceph-deploy.

Impact:
ceph-deploy automation.

Environment:
Fuel 5.1
CentOS 6.5
Custom packages:
- ceph 0.80.9 from MOS 6.1 [1]
- ceph-deploy 1.5.20 from MOS 6.1 [2]

[1] http://fuel-repository.mirantis.com/fwm/6.1/centos/os/x86_64/Packages/ceph-0.80.9-0.mira2.x86_64.rpm

[2] http://fuel-repository.mirantis.com/fwm/6.1/centos/os/x86_64/Packages/ceph-deploy-1.5.20-0.mira1.noarch.rpm

Changed in mos:
milestone: none → 6.1-updates
assignee: nobody → MOS Ceph (mos-ceph)
description: updated
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Steps to reproduce:
> Deploy a new OSD with ceph-deploy

Adding OSDs with ceph-deploy works just fine for me. What am I doing wrong?

> Mirantis-built ceph-deploy package has one commit missing.

Please describe the actual problem: what command you (or the customer) ran,
what error message you got, etc. In general, "we need this patch" reports are
counter-productive (say "doctor, I've got a headache", not "doctor, I need aspirin").

Changed in mos:
status: New → Incomplete
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Adding an OSD works OK.

The customer is running OSD re-deployment from a shell script (planned HDD upgrade).
The script adds a new OSD, and the addition works OK.
But even if the OSD is added successfully, ceph-deploy prints the error message `Error in sys.exitfunc:`
This message prevents the shell script from determining the OSD deployment status correctly: the script treats a successful OSD deployment as failed.
Since the customer needs to re-deploy 3000+ OSDs, running the re-deployment manually with manual error verification is not an option.
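
The failure mode described here can be sketched as follows. This is a minimal illustration, not the customer's script: the function name `deploy_osd` and the `HOST:DISK` placeholder are assumptions, and the wrapper relies on the exit status alone, as a strict automation script would.

```python
import subprocess


def deploy_osd(command):
    """Run one deployment command and judge success by its exit status only.

    `command` is an argument list. For a real run it would look like
    ["ceph-deploy", "osd", "create", "HOST:DISK"], where HOST:DISK is a
    placeholder, not a value taken from this report.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # The wrapper has nothing but the exit status to go on, so a spurious
    # non-zero status marks a good deployment as failed.
    succeeded = result.returncode == 0
    return succeeded, result.stderr
```

With thousands of OSDs to process, a wrapper of this kind either stops or flags every run for manual review as soon as the tool exits non-zero, even when the OSD itself was created.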

Changed in mos:
status: Incomplete → New
description: updated
description: updated
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> ceph-deploy throws the error message `Error in sys.exitfunc:`

Please post the actual command and the error message, and explain the exact steps to reproduce the bug.

> This message is a blocker to shell script to catch the OSD deployment status

Quite a number of users (both humans and deployment tools) run ceph-deploy from scripts, and nobody else seems to experience such an error with ceph-deploy 1.5.20 (which is shipped with MOS 6.1/7.0/8.0) or newer.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Marking as Incomplete since neither the command that presumably fails nor the error message is known.

Changed in mos:
status: New → Incomplete
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Attached ceph-deploy command and its output.

Changed in mos:
status: Incomplete → New
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexei, please take a look at Miroslav's comment ^

Changed in mos:
status: New → Confirmed
assignee: MOS Ceph (mos-ceph) → Alexei Sheplyakov (asheplyakov)
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

ceph-deploy version 1.5.20 works just fine for me and lots of other people (including many Mirantis
customers); no "Error in sys.exitfunc:" error messages here. Therefore I'm marking
the bug as Incomplete.

Also, patching ceph-deploy for MOS 6.1 at this stage of its lifecycle is rather risky.

Roman,

you've marked the bug as Confirmed. This means someone other than the reporter (presumably you)
can reproduce it. Could you please enlighten me on how to reproduce this bug?

Changed in mos:
status: Confirmed → Incomplete
summary: - Include missing commit to ceph-deploy package
+ ceph-deploy: non-zero return code on successful OSD deployment
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Summary of this issue:
It was discovered that a newer fix for this bug is already integrated into the Mirantis-built ceph-deploy 1.5.20 package.
However, this fix does not appear to work with the CentOS 6.5 kernel 2.6.32-431.20.3 and Python 2.6.6-52 shipped with MOS 5.1.

Here are the updated steps to reproduce:
1. Deploy any Centos-based MOS 5.1 configuration with Ceph. You may use the VBox scripts to speed up the process.
2. Update ceph-deploy package to version 1.5.20 on any OpenStack node with Ceph installed.
http://fuel-repository.mirantis.com/fwm/6.1/centos/os/x86_64/Packages/ceph-deploy-1.5.20-0.mira1.noarch.rpm
3. Run `ceph-deploy --version` command.
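
A small probe can make the result of step 3 explicit. The sketch below is an assumption-laden helper, not part of ceph-deploy: the name `probe` is hypothetical, and the marker strings are simply the messages quoted in this report.

```python
import subprocess

# Messages quoted in this report; seeing one of them marks the bug.
SPURIOUS_MARKERS = ("Error in sys.exitfunc:", "Error in sys.excepthook:")


def probe(command):
    """Run `command` and return (exit status, list of spurious markers seen)."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Check both streams, since it is not stated where the message is written.
    output = result.stdout + result.stderr
    seen = [marker for marker in SPURIOUS_MARKERS if marker in output]
    return result.returncode, seen


# On an affected node the call would be: probe(["ceph-deploy", "--version"])
```

A non-zero status or a non-empty marker list from a command as simple as `ceph-deploy --version` would reproduce the problem without deploying an OSD.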

Changed in mos:
status: Incomplete → New
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Deploy any Centos-based MOS 5.1
> Milestone: Mirantis OpenStack 6.1-updates

One of these is inaccurate and should be updated.

description: updated
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Setting to incomplete since it's not clear which version(s) of MOS is (are) affected.

> 1. Deploy any Centos-based MOS 5.1 configuration with Ceph.
> 2. Update ceph-deploy package to version 1.5.20 on any OpenStack node with Ceph installed.

In this case the bug is invalid, since one should use the ceph-deploy shipped with MOS 5.1; everything else is not supported.

Changed in mos:
status: New → Incomplete
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Setting importance to `Wishlist' since the bug can't be reproduced with the packages shipped with Fuel 5.1.

description: updated
description: updated
Changed in mos:
importance: High → Wishlist
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

The ceph-deploy version shipped with 5.1 finishes with a non-zero exit code as well.
The error is different, as follows (below are the last several lines from a successful ceph-deploy osd create call):

###########################################################
[root@node-6 ~]# ceph-deploy osd create ${NODE_NAME}:${VDISK_DEVICE_NAME}${JOURNAL_PARTITION}
...
...
[ceph_deploy.osd][DEBUG ] Host node-6 is now ready for osd use.

Unhandled exception in thread started by

Error in sys.excepthook:

Original exception was:

[root@node-6 ~]#
###########################################################

So, the bug importance is still High.

Changed in mos:
importance: Wishlist → High
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Affected version:
MOS 5.1

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

This patch turns off the forced closure of stdout and stderr at exit from ceph-deploy.
If Python needs to report something at exit, let it report.
Forced stdout and stderr closure only masks the actual issues.
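
Why a forced closure hides diagnostics can be shown with plain Python file objects. This is a generic sketch, not the ceph-deploy code or the attached patch; `report_at_exit` is a hypothetical helper, and `io.StringIO` stands in for the process streams.

```python
import io


def report_at_exit(stream, message):
    """Try to write a late diagnostic; return True if it was delivered."""
    try:
        stream.write(message)
        stream.flush()
        return True
    except ValueError:
        # Python raises ValueError for I/O on a closed file object,
        # so the diagnostic is lost.
        return False


open_stream = io.StringIO()
closed_stream = io.StringIO()
closed_stream.close()  # stands in for a forced closure before exit

delivered_open = report_at_exit(open_stream, "late warning\n")
delivered_closed = report_at_exit(closed_stream, "late warning\n")
```

Anything the interpreter tries to report after the streams are closed meets the same fate, which is consistent with the bare `Error in sys.exitfunc:` line that carries no traceback.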

Many thanks to Bulat Gaifullin for this patch and for the explanation of wrong ways to code in Python.

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Attached is a patch to the Remoto library.
It makes the Connection module wait for its thread to finish after exiting from the thread, and so fixes the strange non-zero exit codes that occur if the Remoto thread has not finished its exit procedure when ceph-deploy exits.
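
The idea of waiting for a thread before leaving can be sketched with the standard `threading` module. The `Connection` class below is a hypothetical stand-in written for illustration; it is not Remoto's implementation, and the use of a daemon thread is an assumption.

```python
import threading
import time


class Connection:
    """Hypothetical stand-in for a connection that owns a background thread."""

    def __init__(self):
        self._stop = threading.Event()
        self.finished = False
        # daemon=True is an assumption: the interpreter does not wait for
        # daemon threads at exit, which is what makes the explicit join matter.
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Idle until asked to stop, then do some simulated cleanup work.
        self._stop.wait()
        time.sleep(0.05)
        self.finished = True

    def exit(self):
        self._stop.set()
        # Wait for the thread to complete its own shutdown before the
        # caller proceeds towards interpreter exit.
        self._thread.join()


conn = Connection()
conn.exit()
```

Without the `join()` call, the caller could reach interpreter shutdown while `_serve` is still running its cleanup, which is the kind of race the patch description refers to.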

Thanks to Bulat Gaifullin for debugging ceph-deploy and developing this patch.

Changed in mos:
status: Incomplete → Confirmed
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/centos6/ceph-deploy (6.1)

Fix proposed to branch: 6.1
Change author: Denis V. Meltsaykin <email address hidden>
Review: https://review.fuel-infra.org/23327

Changed in mos:
status: Confirmed → In Progress
Changed in mos:
assignee: Alexei Sheplyakov (asheplyakov) → MOS Maintenance (mos-maintenance)
Changed in mos:
assignee: MOS Maintenance (mos-maintenance) → Rodion Tikunov (rtikunov)
milestone: 6.1-updates → 6.1-mu-7
Revision history for this message
Rodion Tikunov (rtikunov) wrote :

Workaround:
set CEPH_DEPLOY_TEST=YES in the shell environment [0]

[0] http://docs.oracle.com/cd/E37670_01/E66514/html/section-npq_yrx_gt.html
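
For a script that drives ceph-deploy, the variable can be set for the child process only, leaving the caller's environment untouched. A minimal sketch, assuming a hypothetical helper named `run_with_workaround`; what the variable does inside ceph-deploy is not shown here.

```python
import os
import subprocess


def run_with_workaround(command):
    """Run `command` with CEPH_DEPLOY_TEST=YES set for that child only."""
    # Copy the current environment and add the variable to the copy.
    env = dict(os.environ, CEPH_DEPLOY_TEST="YES")
    return subprocess.run(command, env=env, capture_output=True, text=True)
```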

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

The workaround of setting CEPH_DEPLOY_TEST=YES only hides the error message, along with many other possible error messages that should not be hidden.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/centos6/ceph-deploy (6.1)

Reviewed: https://review.fuel-infra.org/23327
Submitter: Vitaly Sedelnik <email address hidden>
Branch: 6.1

Commit: 05e1d10f21b055f4bdd5d3cbaf0dabc2afd895e2
Author: Denis V. Meltsaykin <email address hidden>
Date: Fri Jul 15 16:54:41 2016

Fix stdout closure and wait for threads to finish

This patch turns off forced stdout and stderr closure at the exit from
ceph-deploy. Also it patches Remoto. It adds waiting for thread finish
to its Connection module after exit from thread and so, fixes the
strange non-zero exit codes, if Remoto thread did not finish exit
procedure on exit from ceph-deploy.

Change-Id: Icae7c43f9575e30e4a39f28fdca3223ef2d9f197
Closes-Bug: #1581279

Changed in mos:
status: In Progress → Fix Committed
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Verified in the local lab and in the problematic prod environment.

Changed in mos:
status: Fix Committed → Fix Released