Bug #1354384 “[fuel] HA deployment with primary-controller+ceph+...” : Bugs : Fuel for OpenStack

Vladimir Sharshov (vsharshov) wrote on 2014-08-08:

#1

Waiting for Nastya to reproduce the bug. Potentially related to https://bugs.launchpad.net/fuel/+bug/1353389 (the two may share a root cause in MCollective).

summary:	- HA deployment with controllers+ceph has failed
+ [astute] HA deployment with controllers+ceph has failed
tags:	added: astute mco
Changed in fuel:
status:	New → Triaged

Vladimir Sharshov (vsharshov) on 2014-08-11

summary:	- [astute] HA deployment with controllers+ceph has failed
+ [fuel] HA deployment with primary-controller+ceph+neutron has failed
tags:	added: ceph puppet removed: astute mco

Vladimir Sharshov (vsharshov) wrote on 2014-08-11:

#2

Results of investigating HA deployments:
* controller + neutron vlan -> success
* controller + ceph + simple network -> success
* controller + ceph + neutron vlan -> fail

Symptoms: the node with the primary-controller and ceph roles goes offline in the middle of the ceph deployment. After that, the node is unreachable via mco or ssh.

Logs: https://drive.google.com/file/d/0Bz7Fsls9aSjkOVhWdlVITEktY0k/edit?usp=sharing

Anastasia Palkina (apalkina) wrote on 2014-08-11:

#3

puppet_pr_controller.tar.gz (300.1 KiB, application/x-tar)

Vladimir Sharshov (vsharshov) wrote on 2014-08-11:

#4

Summary of errors from the Puppet log:

Fri Aug 08 16:09:13 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless (debug): 2014-08-08 16:09:13.812275 7f2172418700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Fri Aug 08 16:09:35 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[container]/Exec[hours_passed_container]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:09:37 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[hours_passed_account]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:09:39 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[hours_passed_object]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:17:18 +0000 2014 /Stage[main]/Mysql::Password/Exec[set_mysql_rootpw]/unless (debug): error: 'Access denied for user 'root'@'localhost' (using password: YES)'
Fri Aug 08 18:43:40 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][WARNING] ceph-disk: Error: Device is mounted: /dev/sdb2
Fri Aug 08 18:43:40 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][ERROR ] RuntimeError: command returned non-zero exit status: 1
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][WARNING] OSError: [Errno 16] Device or resource busy: '/var/lib/ceph/tmp/mnt.gH3HgL'
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][ERROR ] RuntimeError: command returned non-zero exit status: 1
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [ceph_deploy][ERROR ] GenericError: Failed to create 2 OSDs

The keys (public and private) are present in the output of ls /var/lib/astute/ceph/

Vladimir Sharshov (vsharshov) wrote on 2014-08-11:

#5

English version:

Error messages from the Puppet log:

Fri Aug 08 16:09:13 +0000 2014 /Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless (debug): 2014-08-08 16:09:13.812275 7f2172418700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Fri Aug 08 16:09:35 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[container]/Exec[hours_passed_container]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:09:37 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[hours_passed_account]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:09:39 +0000 2014 /Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[hours_passed_object]/returns (notice): TypeError: 'NoneType' object does not support item assignment
Fri Aug 08 16:17:18 +0000 2014 /Stage[main]/Mysql::Password/Exec[set_mysql_rootpw]/unless (debug): error: 'Access denied for user 'root'@'localhost' (using password: YES)'
Fri Aug 08 18:43:40 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][WARNING] ceph-disk: Error: Device is mounted: /dev/sdb2
Fri Aug 08 18:43:40 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][ERROR ] RuntimeError: command returned non-zero exit status: 1
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][WARNING] OSError: [Errno 16] Device or resource busy: '/var/lib/ceph/tmp/mnt.gH3HgL'
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [node-3][ERROR ] RuntimeError: command returned non-zero exit status: 1
Fri Aug 08 18:43:44 +0000 2014 /Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns (notice): [ceph_deploy][ERROR ] GenericError: Failed to create 2 OSDs

Both keys are present in /var/lib/astute/ceph/ on the problem node.
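
For anyone who wants to pull a similar summary out of a full Puppet log, here is a minimal sketch. The line layout (timestamp, resource path, level in parentheses, message) is inferred only from the excerpt above, and the function name and the list of error markers are hypothetical. Matching on the message text rather than the Puppet level is deliberate, since the failures above are logged at debug and notice level.

```python
import re

# Assumed layout, inferred from the excerpt above:
# "<timestamp> <resource path> (<level>): <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} \d{4}) "
    r"(?P<resource>/.+?) \((?P<level>\w+)\): (?P<message>.*)$"
)

# Hypothetical markers chosen to match the messages quoted in this comment.
ERROR_MARKERS = ("ERROR", "Error", "error:")


def extract_errors(lines):
    """Return (timestamp, resource, message) for lines that look like errors."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a log line in the assumed layout
        message = match.group("message")
        if any(marker in message for marker in ERROR_MARKERS):
            found.append(
                (match.group("timestamp"), match.group("resource"), message)
            )
    return found
```

Grouping the returned tuples by resource path would show at a glance that the last failures all come from Exec[ceph-deploy osd prepare].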

Vladimir Sharshov (vsharshov) wrote on 2014-08-11:

#6

The fix for Neutron (https://launchpad.net/bugs/1352203) did not help here. The last cluster was deployed with it.

Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → Fuel Library Team (fuel-library)

Dmitry Ilyin (idv1985) on 2014-08-11

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)

Vladimir Kuklin (vkuklin) on 2014-08-12

Changed in fuel:
assignee:	Dmitry Ilyin (idv1985) → Dmitry Borodaenko (dborodaenko)

Dmitry Ilyin (idv1985) wrote on 2014-08-12:

#7

Research shows that we have the same problem that we previously had with nova and cinder apis.
https://bugs.launchpad.net/oslo/+bug/1101404

The problem is that when rsyslog is restarted, for example when a new log config is added, neutron keeps sending messages to the old unix socket. The sends fail, neutron enters an endless loop consuming 100% of CPU, and the deployment fails. Sometimes nodes even go offline.

This can be fixed by something like this: http://paste.openstack.org/show/93408/
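
The paste is not reproduced here. As a rough illustration of the general idea only (reconnect to the syslog socket when a send fails, and give up after one retry instead of spinning), a sketch might look like the following. The class name and its behavior are assumptions for illustration, not the contents of the paste and not oslo or neutron code; /dev/log is the conventional syslog socket path on Linux.

```python
import socket


class ReconnectingSyslogSender:
    """Send datagrams to a Unix socket, reconnecting once if a send fails.

    Illustrative only: not the linked paste and not oslo code.
    """

    def __init__(self, address="/dev/log"):
        self.address = address
        self.sock = None

    def _connect(self):
        # A connected datagram socket is tied to the listener that existed
        # at connect time; it does not follow a recreated socket file.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            self.sock.connect(self.address)
        except OSError:
            self.sock.close()
            self.sock = None
            raise

    def send(self, data):
        """Return True if the datagram was sent, False if it was dropped."""
        for _attempt in range(2):  # one normal try plus one reconnect
            try:
                if self.sock is None:
                    self._connect()
                self.sock.send(data)
                return True
            except OSError:
                # Stale or missing socket: drop it and retry once.
                if self.sock is not None:
                    self.sock.close()
                    self.sock = None
        return False  # give up instead of looping forever
```

The bounded retry is the point of the sketch: a message may be lost while rsyslog is down, but the sender never busy-loops on a dead socket.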

But the MOS guys previously decided that it's a Python bug and should be fixed there: http://bugs.python.org/issue15179
https://github.com/eventlet/eventlet/issues/63

There are rumors that it's already fixed in CentOS:
http://hg.python.org/cpython/rev/99f0c0207faa
But this deployment is on Ubuntu.

Please port this fix either to Python or to Eventlet. The problem can also affect other OpenStack packages.

Changed in fuel:
status:	Triaged → Confirmed
assignee:	Dmitry Borodaenko (dborodaenko) → MOS Neutron (mos-neutron)

Roman Podoliaka (rpodolyaka) on 2014-08-12

Changed in fuel:
assignee:	MOS Neutron (mos-neutron) → MOS Linux (mos-linux)

Roman Podoliaka (rpodolyaka) wrote on 2014-08-12:

#8

If Dima is right, this should be marked as a duplicate of https://bugs.launchpad.net/mos/+bug/1342068

Andrew Woodward (xarses) wrote on 2014-08-12:

#9

does not reproduce with 423

Changed in fuel:
status:	Confirmed → Incomplete
