"Timeout of deployment is exceeded" error without any other errors

Bug #1418991 reported by Leontii Istomin
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Low
Assigned to: MOS Linux

Bug Description

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: ed5270bf9c6c1234797e00bd7d4dd3213253a413
auth_required: true
build_id: 2015-01-29_22-55-01
build_number: '86'
feature_groups:
- mirantis
fuellib_sha: b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28
fuelmain_sha: ''
nailgun_sha: 6d8745abb64d392ccdf5d3a5fa8ca17ac1f57942
ostf_sha: c9100263140008abfcc2704732e98fbdfd644068
production: docker
python-fuelclient_sha: cb8928ce34f5ca88c0d6cecc6331488db75362ac
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: ed5270bf9c6c1234797e00bd7d4dd3213253a413
      build_id: 2015-01-29_22-55-01
      build_number: '86'
      feature_groups:
      - mirantis
      fuellib_sha: b2fbaa9ffb74fafe1f5c2c480944a78424e1ae28
      fuelmain_sha: ''
      nailgun_sha: 6d8745abb64d392ccdf5d3a5fa8ca17ac1f57942
      ostf_sha: c9100263140008abfcc2704732e98fbdfd644068
      production: docker
      python-fuelclient_sha: cb8928ce34f5ca88c0d6cecc6331488db75362ac
      release: '6.1'

Baremetal, Centos, HA, Neutron-gre, Ceilometer, Ceph-for-all, Debug, 6.1_86
Controllers: 3, Computes: 47

I got the following error during the deployment step:
Deployment has failed. Timeout of deployment is exceeded.

There was only one error in the Astute log:
2015-02-06 11:55:59 ERR [430] Timeout of deployment is exceeded.
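
For anyone checking a similar snapshot, here is a minimal sketch of filtering an Astute log for ERR entries. The line format is an assumption inferred from the single line quoted above, and `find_errors` is a hypothetical helper, not part of Astute or Fuel.

```python
import re

# Assumed format, inferred from the one line quoted above:
# "<date> <time> <LEVEL> [<pid>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \[(\d+)\] (.*)$"
)


def find_errors(lines):
    """Return (timestamp, pid, message) for every line logged at level ERR."""
    errors = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "ERR":
            errors.append((match.group(1), int(match.group(3)), match.group(4)))
    return errors


# The only sample line is the one quoted in this report.
sample = ["2015-02-06 11:55:59 ERR [430] Timeout of deployment is exceeded."]

for timestamp, pid, message in find_errors(sample):
    print(timestamp, pid, message)
```

Lines that do not match the assumed format are skipped silently, so multi-line tracebacks would need extra handling.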

The snapshot is here: https://drive.google.com/a/mirantis.com/file/d/0Bx4ptZV1Jt7hcTRQUG0wUEpaZEU/view?usp=sharing

Tags: scale
Aleksandr Didenko (adidenko) wrote :

node-46.domain.tld got stuck during deployment, which caused this timeout.

puppet.log:
2015-02-06T11:01:20.214116+00:00 info: (/Stage[main]/Ceph/Service[ceph]) Starting to evaluate the resource
2015-02-06T11:01:20.215260+00:00 debug: Executing '/sbin/service ceph status'
2015-02-06T11:01:20.382134+00:00 debug: Executing '/sbin/chkconfig ceph'
2015-02-06T11:01:20.515140+00:00 debug: Executing '/sbin/service ceph status'
2015-02-06T11:01:20.682346+00:00 debug: Executing '/sbin/service ceph stop'
2015-02-06T11:01:20.849594+00:00 debug: Executing '/sbin/service ceph start'

ps.txt:
root 9853 0.0 0.4 249676 138680 ? Ss 10:55 0:07 /usr/bin/ruby /usr/bin/puppet apply /etc/puppet/modules/osnailyfacter/modular/legacy.pp --modulepath=/etc/puppet/modules --logdest syslog --trace --no-report --debug --evaltrace --logdest /var/log/puppet.log
root 14200 0.0 0.0 11564 1688 ? Ss 11:01 0:00 \_ /bin/sh /sbin/service ceph start
root 14203 0.0 0.0 11432 1564 ? S 11:01 0:00 \_ /bin/sh /etc/init.d/ceph start
root 14227 0.0 0.0 59276 7980 ? S 11:01 0:00 \_ python /usr/sbin/ceph-disk activate-all
root 14237 0.0 0.0 14908 824 ? D 11:01 0:00 \_ /bin/mount -t xfs -o noatime -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6082823d-5227-499e-8fcf-c7e92946e3f8 /var/lib/ceph/tmp/mnt.CVRsmY

root 11661 0.0 0.0 11300 1304 ? S 10:56 0:00 /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb4
root 11674 0.0 0.0 59256 8008 ? S 10:56 0:00 \_ python /usr/sbin/ceph-disk prepare --fs-type xfs --cluster ceph -- /dev/sdb4
root 11981 0.0 0.0 8344 700 ? D 10:56 0:00 \_ /bin/umount -- /var/lib/ceph/tmp/mnt.URKkEe

kernel.log:
2015-02-06T11:00:05.438528+00:00 err: INFO: task umount:11981 blocked for more than 120 seconds.
2015-02-06T11:00:05.438655+00:00 err: Not tainted 2.6.32-504.1.3.el6.x86_64 #1
2015-02-06T11:00:05.488551+00:00 err: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

'umount' and 'mount -t xfs' are stuck in the "D" (uninterruptible sleep) state for some reason.
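
The check above can be sketched in a few lines of Python: pick out processes whose STAT column starts with "D" and see which of them the kernel's hung-task detector also reported. This is only an illustration under stated assumptions. It assumes `ps aux`-style columns (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and the hung-task message wording seen in kernel.log; the function names are hypothetical, and the sample lines are copied from the excerpts above.

```python
import re

# Sample lines copied from the ps.txt and kernel.log excerpts above.
PS_LINES = [
    r"root 14227 0.0 0.0 59276 7980 ? S 11:01 0:00 \_ python /usr/sbin/ceph-disk activate-all",
    r"root 14237 0.0 0.0 14908 824 ? D 11:01 0:00 \_ /bin/mount -t xfs -o noatime -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6082823d-5227-499e-8fcf-c7e92946e3f8 /var/lib/ceph/tmp/mnt.CVRsmY",
    r"root 11981 0.0 0.0 8344 700 ? D 10:56 0:00 \_ /bin/umount -- /var/lib/ceph/tmp/mnt.URKkEe",
]
KERNEL_LINES = [
    "2015-02-06T11:00:05.438528+00:00 err: INFO: task umount:11981 blocked for more than 120 seconds.",
]

# Matches the hung-task message wording seen in kernel.log.
HUNG_TASK_RE = re.compile(r"task (\S+):(\d+) blocked for more than (\d+) seconds")


def d_state_processes(ps_lines):
    """Return {pid: command} for processes whose STAT column starts with 'D'."""
    found = {}
    for line in ps_lines:
        fields = line.split(None, 10)
        # Skip header lines and anything too short to hold all 11 columns.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        if fields[7].startswith("D"):
            # Drop the "\_" tree decoration that ps adds in forest mode.
            found[int(fields[1])] = fields[10].lstrip("\\_ ")
    return found


def hung_task_pids(kernel_lines):
    """Return the set of PIDs named in hung-task messages."""
    pids = set()
    for line in kernel_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            pids.add(int(match.group(2)))
    return pids


blocked = d_state_processes(PS_LINES)
reported = hung_task_pids(KERNEL_LINES)
for pid, command in blocked.items():
    note = "also in kernel.log" if pid in reported else "not in the kernel.log excerpt"
    print(pid, command, "->", note)
```

On the excerpts quoted here, only the umount process (PID 11981) appears in both; the mount process may simply not have been blocked long enough when the kernel.log excerpt was taken.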

Changed in fuel:
assignee: nobody → MOS Linux (mos-linux)
Aleksander Mogylchenko (amogylchenko) wrote :

OK, I'm confused :)

Mount/umount can get stuck in the D state for various reasons. What is the state of the disks, network mounts, and host nodes (since the tests run in KVM)? Why was this assigned to mos-linux without proper investigation?

Changed in fuel:
assignee: MOS Linux (mos-linux) → Aleksandr Didenko (adidenko)
Bogdan Dobrelya (bogdando) wrote :

@Aleksandr, the log snapshot provides info about mounts and running processes, plus all required logs and atop binaries. Please investigate the issue.

Changed in fuel:
assignee: Aleksandr Didenko (adidenko) → MOS Linux (mos-linux)
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
Aleksander Mogylchenko (amogylchenko) wrote :

Were you able to reproduce it at least once more?

Also, I'm getting errors with the provided archive:
amnk$ tar -xzf fuel-snapshot-2015-02-06_13-03-31.tgz
fuel-snapshot-2015-02-06_13-03-31/node-29.domain.tld/var/log/atop/atop_current: (Empty error message)
tar: Error exit delayed from previous errors.

amnk$ tar -tzf fuel-snapshot-2015-02-06_13-03-31.tgz > /dev/null
tar: Truncated input file (needed 13027840 bytes, only 0 available)
tar: Error exit delayed from previous errors.
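
A truncated snapshot like this can also be detected before anyone starts digging into it. Below is a minimal sketch using Python's standard `tarfile` module that reads every member to the end and reports whether the archive is complete. `archive_is_readable` is a hypothetical helper name, and the snapshot is assumed to be a gzip-compressed tar, as the `.tgz` extension suggests.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every regular file in a .tgz archive can be read in full."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                extracted = archive.extractfile(member)
                # Read in 1 MiB chunks so large logs are not held in memory.
                while extracted.read(1 << 20):
                    pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        # A truncated or corrupted archive typically surfaces here.
        print(f"{path}: {exc}")
        return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        status = "ok" if archive_is_readable(name) else "damaged"
        print(name, status)
```

Running it against the uploaded file before attaching it to a bug would show whether the upload or the snapshot generation cut the archive short.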

Changed in fuel:
importance: High → Low
Dina Belova (dbelova) wrote :

This has not been reproduced since then, but please do not be too happy about it :) We have not had many redeployments since then :)

Leontii Istomin (listomin) wrote :

We still haven't reproduced the issue. Please mark this one as invalid.

Changed in fuel:
status: Confirmed → Invalid