Fuel for OpenStack

"Timeout of deployment is exceeded" error without any other errors

Bug #1418991 reported by Leontii Istomin on 2015-02-06

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Invalid	Low	MOS Linux	Fuel for OpenStack 6.1

Bug Description

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: ed5270bf9c6c1234797e00bd7d4dd3213253a413
auth_required: true
build_id: 2015-01-29_22-55-01
build_number: '86'
feature_groups:
- mirantis
fuellib_sha: b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28
fuelmain_sha: ''
nailgun_sha: 6d8745abb64d392ccdf5d3a5fa8ca17ac1f57942
ostf_sha: c9100263140008abfcc2704732e98fbdfd644068
production: docker
python-fuelclient_sha: cb8928ce34f5ca88c0d6cecc6331488db75362ac
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: ed5270bf9c6c1234797e00bd7d4dd3213253a413
      build_id: 2015-01-29_22-55-01
      build_number: '86'
      feature_groups:
      - mirantis
      fuellib_sha: b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28
      fuelmain_sha: ''
      nailgun_sha: 6d8745abb64d392ccdf5d3a5fa8ca17ac1f57942
      ostf_sha: c9100263140008abfcc2704732e98fbdfd644068
      production: docker
      python-fuelclient_sha: cb8928ce34f5ca88c0d6cecc6331488db75362ac
      release: '6.1'

Baremetal, CentOS, HA, Neutron GRE, Ceilometer, Ceph for all, Debug, 6.1_86
Controllers: 3, Computes: 47

I got the following error during the deployment step:
Deployment has failed. Timeout of deployment is exceeded.

There was only one error in the Astute log:
2015-02-06 11:55:59 ERR [430] Timeout of deployment is exceeded.
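To confirm that this really is the only error, the Astute log can be filtered for ERR entries. The sketch below is a minimal illustration that assumes every line follows the layout of the single entry quoted above (`date time LEVEL [pid] message`); that layout, and the helper name `astute_errors`, are assumptions, not part of Fuel.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed line layout, inferred only from the entry quoted above:
# "2015-02-06 11:55:59 ERR [430] Timeout of deployment is exceeded."
ASTUTE_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<pid>\d+)\] (?P<msg>.*)$"
)


class AstuteEntry(NamedTuple):
    ts: str
    level: str
    pid: int
    msg: str


def astute_errors(lines: Iterable[str]) -> Iterator[AstuteEntry]:
    """Yield ERR entries; lines not matching the assumed layout are skipped."""
    for line in lines:
        m = ASTUTE_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERR":
            yield AstuteEntry(m.group("ts"), m.group("level"),
                              int(m.group("pid")), m.group("msg"))
```

Usage: pass an open log file, e.g. `list(astute_errors(open(path)))`, where `path` is wherever the Astute log lives in the snapshot. Skipping non-matching lines (rather than raising) keeps multi-line entries such as backtraces from aborting the scan.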

The diagnostic snapshot is here: https://drive.google.com/a/mirantis.com/file/d/0Bx4ptZV1Jt7hcTRQUG0wUEpaZEU/view?usp=sharing


Aleksandr Didenko (adidenko) wrote on 2015-02-06:

node-46.domain.tld got stuck during deployment, which caused this timeout.

puppet.log:
2015-02-06T11:01:20.214116+00:00 info: (/Stage[main]/Ceph/Service[ceph]) Starting to evaluate the resource
2015-02-06T11:01:20.215260+00:00 debug: Executing '/sbin/service ceph status'
2015-02-06T11:01:20.382134+00:00 debug: Executing '/sbin/chkconfig ceph'
2015-02-06T11:01:20.515140+00:00 debug: Executing '/sbin/service ceph status'
2015-02-06T11:01:20.682346+00:00 debug: Executing '/sbin/service ceph stop'
2015-02-06T11:01:20.849594+00:00 debug: Executing '/sbin/service ceph start'
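Because Puppet runs with `--evaltrace` (see the command line in ps.txt), each resource logs "Starting to evaluate the resource" when it begins. A resource that started but never reported completion is the one Puppet is stuck in, here `/Stage[main]/Ceph/Service[ceph]`. The sketch below automates that check; it assumes completion is logged as "(resource) Evaluated in N seconds", which is an assumption about the evaltrace format rather than something shown in this log, and `unfinished_resources` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# "Starting to evaluate" line as seen in puppet.log above.
EVAL_START = re.compile(r"\((?P<res>.+)\) Starting to evaluate the resource")
# Assumed completion message format (not shown in this log excerpt).
EVAL_DONE = re.compile(r"\((?P<res>.+)\) Evaluated in [\d.]+ seconds")


def unfinished_resources(lines: Iterable[str]) -> List[str]:
    """Return resources that started evaluating but never logged completion."""
    pending: List[str] = []  # kept in start order
    for line in lines:
        m = EVAL_START.search(line)
        if m:
            pending.append(m.group("res"))
            continue
        m = EVAL_DONE.search(line)
        if m and m.group("res") in pending:
            pending.remove(m.group("res"))
    return pending
```

The last element of the returned list is the most recently started resource that is still running.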

ps.txt:
root      9853  0.0  0.4 249676 138680 ?       Ss   10:55   0:07 /usr/bin/ruby /usr/bin/puppet apply /etc/puppet/modules/osnailyfacter/modular/legacy.pp --modulepath=/etc/puppet/modules --logdest syslog --trace --no-report --debug --evaltrace --logdest /var/log/puppet.log
root     14200  0.0  0.0  11564  1688 ?        Ss   11:01   0:00  \_ /bin/sh /sbin/service ceph start
root     14203  0.0  0.0  11432  1564 ?        S    11:01   0:00      \_ /bin/sh /etc/init.d/ceph start
root     14227  0.0  0.0  59276  7980 ?        S    11:01   0:00          \_ python /usr/sbin/ceph-disk activate-all
root     14237  0.0  0.0  14908   824 ?        D    11:01   0:00              \_ /bin/mount -t xfs -o noatime -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6082823d-5227-499e-8fcf-c7e92946e3f8 /var/lib/ceph/tmp/mnt.CVRsmY

root     11661  0.0  0.0  11300  1304 ?        S    10:56   0:00 /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb4
root     11674  0.0  0.0  59256  8008 ?        S    10:56   0:00  \_ python /usr/sbin/ceph-disk prepare --fs-type xfs --cluster ceph -- /dev/sdb4
root     11981  0.0  0.0   8344   700 ?        D    10:56   0:00      \_ /bin/umount -- /var/lib/ceph/tmp/mnt.URKkEe

kernel.log:
2015-02-06T11:00:05.438528+00:00 err: INFO: task umount:11981 blocked for more than 120 seconds.
2015-02-06T11:00:05.438655+00:00 err: Not tainted 2.6.32-504.1.3.el6.x86_64 #1
2015-02-06T11:00:05.488551+00:00 err: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
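The kernel's hung-task detector prints this message when a task stays in uninterruptible sleep longer than `/proc/sys/kernel/hung_task_timeout_secs` (120 seconds here). The reported PID 11981 is the stuck `umount` in ps.txt. A minimal sketch for pulling such reports out of kernel.log so they can be cross-checked against ps output; the helper name `hung_tasks` is hypothetical and the pattern is taken from the line above.

```python
import re
from typing import Iterable, List, Tuple

# Pattern of the hung-task report quoted above:
# "INFO: task umount:11981 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (command, pid, timeout_seconds) for every hung-task report found."""
    found = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            found.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return found
```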

'umount' and 'mount -t xfs' are stuck in the "D" (uninterruptible sleep) state for some reason.
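On a live node, processes in the D state (uninterruptible sleep, typically blocked inside the kernel, e.g. on disk I/O) can be listed straight from `/proc`, which gives the same information as the STAT column in ps.txt. This is a minimal Linux-only sketch; the helper names `parse_stat` and `d_state_processes` are hypothetical.

```python
import os
from typing import List, Tuple


def parse_stat(stat: str) -> Tuple[int, str, str]:
    """Parse pid, command name and state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so split after the first " (" and before the last ")".
    """
    pid_part, rest = stat.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state


def d_state_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process currently in state 'D'."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited after listdir(), or unexpected format
        if state == "D":
            result.append((pid, comm))
    return result
```

On node-46 at the time of the snapshot this would have been expected to report the `umount` and `mount` processes (PIDs 11981 and 14237).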

Changed in fuel:
assignee:	nobody → MOS Linux (mos-linux)

Aleksander Mogylchenko (amogylchenko) wrote on 2015-02-06:

Ok, I'm confused :)

Mount/unmount can get stuck in the D state for various reasons. What is the state of the disks, network mounts, and host nodes (since the tests run in KVM)? Why was it assigned to mos-linux without a proper investigation?

Changed in fuel:
assignee:	MOS Linux (mos-linux) → Aleksandr Didenko (adidenko)

Bogdan Dobrelya (bogdando) wrote on 2015-02-12:

@Aleksandr, the log snapshot provides information about mounts, running processes, all required logs, and atop binaries. Please investigate the issue.

Changed in fuel:
assignee:	Aleksandr Didenko (adidenko) → MOS Linux (mos-linux)
milestone:	none → 6.1
importance:	Undecided → High
status:	New → Confirmed

Aleksander Mogylchenko (amogylchenko) wrote on 2015-02-17:

Were you able to reproduce it at least once more?

Also, I'm getting errors with the provided archive:
amnk$ tar -xzf fuel-snapshot-2015-02-06_13-03-31.tgz
fuel-snapshot-2015-02-06_13-03-31/node-29.domain.tld/var/log/atop/atop_current: (Empty error message)
tar: Error exit delayed from previous errors.

amnk$ tar -tzf fuel-snapshot-2015-02-06_13-03-31.tgz > /dev/null
tar: Truncated input file (needed 13027840 bytes, only 0 available)
tar: Error exit delayed from previous errors.
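A truncated archive like this one can be caught before it is uploaded by reading every member to the end, which is what `tar -xzf` does above. The sketch uses only Python's standard `tarfile` module; the function name `archive_is_complete` is hypothetical, and treating any read error as "incomplete" is a simplifying assumption.

```python
import tarfile
import zlib


def archive_is_complete(path: str) -> bool:
    """Return False if the .tgz is truncated or corrupt, True otherwise."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    # Read in 1 MiB chunks so large files (e.g. atop binaries)
                    # are not loaded into memory at once.
                    while f.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
```

Reading member data, not just headers, matters here: the error above was raised while extracting `atop_current`, i.e. in the middle of a file's contents.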

Changed in fuel:
importance:	High → Low

Dina Belova (dbelova) wrote on 2015-02-18:

This has not been reproduced since then, but please don't be too happy about it :) We haven't had many redeployments since then :)

Leontii Istomin (listomin) wrote on 2015-04-15:

We still haven't reproduced the issue. Please mark this one as invalid.

Aleksander Mogylchenko (amogylchenko) on 2015-04-15

Changed in fuel:
status:	Confirmed → Invalid