[astute] Upload Cirros TestVM image failed with - import pkg_resources Error communicating with http://10.108.6.2:9292 [Errno 32] Broken pipe

Bug #1340593 reported by Andrey Sledzinskiy
This bug affects 2 people
Affects                    Status      Importance  Assigned to                 Milestone
Fuel for OpenStack         Confirmed   Medium      Fuel Library (Deprecated)
Fuel for OpenStack 5.0.x   Incomplete  Medium      Fuel Library (Deprecated)

Bug Description

{
    "build_id": "2014-07-10_00-39-56",
    "mirantis": "yes",
    "build_number": "112",
    "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
    "nailgun_sha": "f5ff82558f99bb6ca7d5e1617eddddf7142fe857",
    "production": "docker",
    "api": "1.0",
    "fuelmain_sha": "293015843304222ead899270449495af91b06aed",
    "astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806",
    "release": "5.0.1",
    "fuellib_sha": "364dee37435cbdc85d6b814a61f57800b83bf22d"
}
Steps:
1. Create a cluster with the following configuration: CentOS, HA, Flat nova-network, Ceph for volumes and for images; 3 controller+ceph nodes, 2 compute+ceph nodes, 1 ceph node
2. Deploy the cluster

Actual result: deployment failed with an error on controller-1 (node-1).

Running the image upload manually from the controller:

[root@node-1 ~]# /usr/bin/glance -N http://10.108.7.2:5000/v2.0/ -T admin -I admin -K admin image-create --name 'TestVM' --is-public true --container-format='bare' --disk-format='qcow2' --min-ram=64 --property murano_image_info='{"title": "Murano Demo", "type": "cirros.demo"}' --file '/opt/vm/cirros-x86_64-disk.img'
/usr/lib/python2.6/site-packages/eventlet/hubs/__init__.py:8: UserWarning: Module backports was already imported from /usr/lib64/python2.6/site-packages/backports/__init__.pyc, but /usr/lib/python2.6/site-packages is being added to sys.path
  import pkg_resources
Error communicating with http://10.108.6.2:9292 [Errno 32] Broken pipe
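
For reference, a minimal probe of the endpoint named in the error above (illustrative only; only the address and port come from the report, everything else is assumed and is not part of the deployment tooling). It checks whether the Glance API is accepting connections at all, which helps separate a service that is down or resetting connections from an upload that dies mid-transfer with EPIPE:

import socket

# Glance API endpoint taken from the "Broken pipe" error above.
HOST, PORT = '10.108.6.2', 9292

# Open a TCP connection and issue a bare HTTP request; Glance answers
# requests to "/" with its API version document, so any HTTP response
# at all means the endpoint is reachable and accepting connections.
sock = socket.create_connection((HOST, PORT), timeout=10)
sock.sendall('GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % HOST)
print(sock.recv(4096))
sock.close()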

Errors in the astute log:

 [402] Error running RPC method deploy: Upload cirros "TestVM" image failed, trace: ["/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/post_deploy_actions/upload_cirros_image.rb:91:in `process'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/post_deploy_actions.rb:29:in `block in process'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/post_deploy_actions.rb:29:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/post_deploy_actions.rb:29:in `process'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/orchestrator.rb:45:in `deploy'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/dispatcher.rb:105:in `deploy'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:126:in `dispatch_message'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:89:in `block in dispatch'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `call'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `block in each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:87:in `each_with_index'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:87:in `dispatch'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:72:in `block in perform_main_job'"]

2014-07-11 06:21:07 ERR

[402] 02353a31-fa07-42e5-9017-1c483e7a9838: Upload cirros "TestVM" image failed

2014-07-11 06:21:07 ERR

[402] 02353a31-fa07-42e5-9017-1c483e7a9838: cmd: /usr/bin/glance -N http://10.108.7.2:5000/v2.0/ -T admin -I admin -K admin image-create --name 'TestVM' --is-public true --container-format='bare' --disk-format='qcow2' --min-ram=64 --property murano_image_info='{"title": "Murano Demo", "type": "cirros.demo"}' --file '/opt/vm/cirros-x86_64-disk.img'
                                               mcollective error: 02353a31-fa07-42e5-9017-1c483e7a9838: MCollective agents '1' didn't respond within the allotted time.

2014-07-11 06:21:07 ERR

[402] MCollective agents '1' didn't respond within the allotted time.

Errors in the puppet log:

(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-1:/dev/vdb4 node-1:/dev/vdc4 returned 1 instead of one of [0]

Logs are attached

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
tags: added: ceph
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

There are also errors in the puppet log from node-1 before the ones quoted above:
2014-07-11T05:40:08.063114 node-1 ./node-1.test.domain.local/puppet-apply.log:2014-07-11T05:40:08.063114+00:00 debug: (/Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless) 2014-07-11 05:40:08.050066 7fc6f032f700 0 librados: client.admin initialization error (2) No such file or directory
2014-07-11T05:40:08.063882 node-1 ./node-1.test.domain.local/puppet-apply.log:2014-07-11T05:40:08.063882+00:00 debug: (/Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless) Error connecting to cluster: ObjectNotFound

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

The "ObjectNotFound" error is red herring, it's supposed to happen: Exec[ceph-deploy mon create] runs 'ceph mon stat' first to see if monitor already exists.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

The real problem seems to be that ceph-deploy has crashed with the following backtrace:

2014-07-11T06:15:31.573851+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) Traceback (most recent call last):
2014-07-11T06:15:31.574079+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) File "/usr/bin/ceph-deploy", line 21, in <module>
2014-07-11T06:15:31.574840+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) sys.exit(main())
2014-07-11T06:15:31.574962+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 83, in newfunc
2014-07-11T06:15:31.574962+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) return f(*a, **kw)
2014-07-11T06:15:31.575167+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 150, in main
2014-07-11T06:15:31.576201+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) return args.func(args)
2014-07-11T06:15:31.576315+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 371, in osd
2014-07-11T06:15:31.576315+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) activate(args, cfg)
2014-07-11T06:15:31.576596+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 278, in activate
2014-07-11T06:15:31.577323+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) cmd=cmd, ret=ret, out=out, err=err)
2014-07-11T06:15:31.577323+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) NameError: global name 'ret' is not defined
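
The NameError itself looks like a secondary bug in ceph-deploy's error reporting. A minimal sketch of the pattern (hypothetical names, assumed for illustration; this is not the actual ceph_deploy/osd.py source): the reporting path references ret/out/err, which are only bound when the remote command actually ran, so an earlier failure surfaces as NameError and masks the real activation error.

# Sketch only: shows how a reporting path that assumes ret/out/err exist
# turns an earlier failure into "NameError: global name 'ret' is not defined".

def run_remote(cmd):
    # Stand-in for remote execution; here it fails before producing a status.
    raise IOError("connection to OSD host lost")

def activate(cmd):
    try:
        out, err = run_remote(cmd)
    except IOError:
        out = err = None          # the real error is swallowed here
    # Reporting path written as if the command had run; since `ret` is never
    # assigned in this scope, the line below raises NameError instead of
    # reporting what actually went wrong.
    raise RuntimeError("command %r failed: ret=%r out=%r err=%r"
                       % (cmd, ret, out, err))

activate(["ceph-disk-activate", "--mark-init", "sysvinit", "/dev/vdb4"])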

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Across the whole cluster, 2 OSDs out of 12 failed to activate:
2014-07-11 06:18:46.948432 mon.0 10.108.7.3:6789/0 122 : [INF] osdmap e61: 12 osds: 10 up, 12 in

At the end of ceph.log on node-1, there are a number of slow-request warnings:
2014-07-11 06:19:38.746562 osd.9 10.108.7.7:6800/24689 2 : [WRN] slow request 30.154477 seconds old, received at 2014-07-11 06:19:08.592017: osd_op(client.4777.0:1 00109b68-f1b2-4e8d-90e4-ebb4b5eeffe0.rbd [stat] 3.82587b08 e61) v4 currently reached pg
...
2014-07-11 06:21:09.288881 osd.10 10.108.7.8:6802/24827 4 : [WRN] slow request 60.062250 seconds old, received at 2014-07-11 06:20:09.226564: osd_op(client.4783.0:14 rbd_data.12af32850276.0000000000000000 [write 0~8388608] 3.29b77010 e61) v4 currently reached pg

It could be that the Glance image upload timed out due to a Ceph OSD cluster rebalance following the failure of 2 OSDs.
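
To check that hypothesis, the relevant evidence is the osdmap "up" count dropping below the total and the slow-request warnings overlapping the upload window. A small helper (hypothetical, not part of Fuel or Ceph tooling) that pulls both out of a ceph.log:

# Scan a ceph.log for osdmap lines where fewer OSDs are "up" than exist, and
# for slow-request warnings, to see whether OSD failures and the resulting
# rebalance overlap with the Glance upload.
import re
import sys

OSDMAP = re.compile(r'osdmap e\d+: (\d+) osds: (\d+) up, (\d+) in')
SLOW = re.compile(r'slow request ([\d.]+) seconds old')

def scan(path):
    with open(path) as log:
        for line in log:
            when = line[:19]          # "YYYY-MM-DD HH:MM:SS" prefix of each entry
            m = OSDMAP.search(line)
            if m:
                total, up = int(m.group(1)), int(m.group(2))
                if up < total:
                    print('%s  %d of %d OSDs down' % (when, total - up, total))
            m = SLOW.search(line)
            if m:
                print('%s  slow request, %ss old' % (when, m.group(1)))

if __name__ == '__main__':
    scan(sys.argv[1] if len(sys.argv) > 1 else 'ceph.log')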

Changed in fuel:
importance: High → Medium
status: New → Incomplete
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Marking as Incomplete pending confirmation that this is actually reproducible and not due to an overloaded virtual test environment.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Cannot reproduce on {"build_id": "2014-07-14_00-31-14", "mirantis": "yes", "build_number": "122", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "4637ae61920b92f56154a03f7e717e3bd597525c", "production": "docker", "api": "1.0", "fuelmain_sha": "8a8faa68f7939d095dde60665f16b86d068f696d", "astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806", "release": "5.0.1", "fuellib_sha": "2d1e1369c13bc9771e9473086cb064d257a21fc2"}
(KVM)

Dmitry Ilyin (idv1985)
summary: - Upload Cirros TestVM image failed with - import pkg_resources Error
- communicating with http://10.108.6.2:9292 [Errno 32] Broken pipe
+ [astute] Upload Cirros TestVM image failed with - import pkg_resources
+ Error communicating with http://10.108.6.2:9292 [Errno 32] Broken pipe
Changed in fuel:
milestone: 5.1 → 6.0
Egor Kotko (ykotko)
Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Egor Kotko (ykotko) wrote :

{"build_id": "2014-09-08_13-06-43", "ostf_sha": "2969c1ad443b632e815bb1f01149c3800cd7aa3a", "build_number": "199", "api": "1.0", "nailgun_sha": "c18a21381843dffe807b254a4ff96eec259953cb", "production": "docker", "fuelmain_sha": "b1e46c8c3ac86a8eafac225507294226322160ea", "astute_sha": "6db5f5031b74e67b92fcac1f7998eaa296d68025", "feature_groups": ["mirantis"], "release": "5.0.2", "fuellib_sha": "c891490242d7d88f3c9582cf023e8f2db0b9d337"}

I have reproduced this issue in a Fuel CLI environment.

My steps to reproduce:
1. Deploy a lab via the Fuel CLI:
Ubuntu HA, Neutron VLAN, Ceph for volumes and images,
3 controllers, 2 computes, 2 Ceph nodes

Here is my env:
http://172.18.164.197:8000/

Revision history for this message
Egor Kotko (ykotko) wrote :

The correct ISO version for the previous comment:

{"build_id": "2014-09-12_00-01-11", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "4", "auth_required": true, "api": "1.0", "nailgun_sha": "d389bc6489fe296c9c210f7c65ac84e154a8b82b", "production": "docker", "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-12_00-01-11", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "4", "api": "1.0", "nailgun_sha": "d389bc6489fe296c9c210f7c65ac84e154a8b82b", "production": "docker", "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"}}}, "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"}

Revision history for this message
Egor Kotko (ykotko) wrote :
tags: added: release-notes
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Borodaenko (dborodaenko)
Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Fuel Library Team (fuel-library)