deployment fails at upload_cirros.rb step

Bug #1454364 reported by Amichay Polishuk
This bug affects 5 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Stanislaw Bogatkin

Bug Description

After the deployment reached 100% and was almost finished, I got the following message:

Deployment has failed. Method granular_deploy. Failed to execute hook 'shell'.
---
priority: 800
fail_on_error: true
type: shell
uids:
- '4'
parameters:
  retries: 3
  cmd: ruby /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb
  timeout: 180
  interval: 20
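
The `retries`, `timeout` and `interval` parameters of the hook above can be read as a retry loop around the command. The following is only a rough sketch of that reading, not Astute's actual implementation: the function name `run_hook` is hypothetical, and it assumes that `retries` is the total number of attempts, `timeout` is the per-attempt limit in seconds, and `interval` is the pause between attempts.

```python
import subprocess
import time


def run_hook(cmd, retries=3, timeout=180, interval=20, sleep=time.sleep):
    """Run `cmd` up to `retries` times and return True on the first zero exit.

    Assumption: `retries` counts attempts in total; Astute may count differently.
    """
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(cmd, shell=True, timeout=timeout)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            # An attempt that exceeds `timeout` seconds counts as a failure.
            pass
        if attempt < retries:
            # Pause between attempts, but not after the last one.
            sleep(interval)
    return False


# Hypothetical usage with the values from the hook above:
# ok = run_hook("ruby /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb")
# With fail_on_error: true, a False result would fail the whole deployment.
```

Under these assumptions, a command that keeps failing would hold up the deployment for at most three attempts plus two 20-second pauses before the hook is reported as failed.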

Fuel version :

{"build_id": "2015-05-08_11-08-49", "build_number": "395", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-05-08_11-08-49", "build_number": "395", "api": "1.0", "fuel-library_sha": "f385d6a58298c702f8d4f14c452dcffdc0b1e2a3", "nailgun_sha": "46f55c293e4540d31bcaa6ca3fba77235fb27537", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "af6c9c3799b9ec107bcdc6dbf035cafc034526ce", "astute_sha": "6a4dcd11c67af2917815f3678fb594c7412a4c97", "fuel-ostf_sha": "740ded337bb2a8a9b3d505026652512257375c01", "release": "6.1", "fuelmain_sha": "3eca5e8f7ca6a83faff5feeca92c21cff31c0af1"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "f385d6a58298c702f8d4f14c452dcffdc0b1e2a3", "nailgun_sha": "46f55c293e4540d31bcaa6ca3fba77235fb27537", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "af6c9c3799b9ec107bcdc6dbf035cafc034526ce", "astute_sha": "6a4dcd11c67af2917815f3678fb594c7412a4c97", "fuel-ostf_sha": "740ded337bb2a8a9b3d505026652512257375c01", "release": "6.1", "fuelmain_sha": "3eca5e8f7ca6a83faff5feeca92c21cff31c0af1"}

Diagnostic Snapshot :

https://drive.google.com/file/d/0BzuAt0EZGLAMVTMxNVk4Zk1qZGs/view?usp=sharing

Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Astute Team (fuel-astute)
importance: Undecided → High
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Fuel Library Team (fuel-library)
Alex Schultz (alex-schultz) wrote :

I have also replicated this on an Ubuntu deploy on VBox. 1 controller, 1 compute.

It looks as though the vip__public is not running or available as I cannot seem to access 172.16.0.2.

Deployment has failed. Method granular_deploy. Failed to execute hook 'shell'.
---
priority: 1100
fail_on_error: true
type: shell
uids:
- '1'
parameters:
  retries: 3
  cmd: ruby /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb
  timeout: 180
  interval: 20
.
Inspect Astute logs for the details

Changed in fuel:
status: New → Confirmed
Alex Schultz (alex-schultz) wrote :
summary: - Centos deployment Failed
+ deployment fails at upload_cirros.rb step
Alex Schultz (alex-schultz) wrote :

Just to add additional information, it appears that my vip__public is not running on the controller node [0]. Currently I cannot ping the gateway for the public network (172.16.0.0/24 for me), which caused vip__public to be brought down and is probably causing the cirros upload to fail, as it is attempting to publish to the vip__public IP address.

@amichayp, is the gateway available in your environment?

[0] http://paste.openstack.org/show/221175/

Changed in fuel:
status: Confirmed → Incomplete
Vladimir Kuklin (vkuklin) wrote :

Actually, we use the internal URL when making the glance call. Assigned to Stas Bogatkin to investigate the issue and fix it if it really happens.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
status: Incomplete → Confirmed
Aviram Bar-Haim (aviramb) wrote :

Alex, vip__public is Stopped in Amichay's CentOS env too, with no ping from outside.

[root@node-4 ~]# crm resource show
 Clone Set: clone_p_vrouter [p_vrouter]
     Started: [ node-4.domain.tld ]
 vip__management (ocf::fuel:ns_IPaddr2): Started
 vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started
 vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started
 vip__public (ocf::fuel:ns_IPaddr2): Stopped
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-4.domain.tld ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-4.domain.tld ]
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_p_dns [p_dns]
     Started: [ node-4.domain.tld ]
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-4.domain.tld ]
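
Spotting the one stopped resource in output like the listing above is easy to get wrong by eye. As a minimal sketch, the helper below (a hypothetical name, `stopped_resources`) scans text in the format shown and returns the primitive resources reported as `Stopped`. It assumes the one-line `name (ocf::...): State` layout seen here and ignores clone and master/slave sets.

```python
import re

# Matches lines such as " vip__public (ocf::fuel:ns_IPaddr2): Stopped".
_STOPPED = re.compile(r"^\s*(\S+)\s+\(ocf::[^)]*\):\s+Stopped\s*$")


def stopped_resources(crm_output):
    """Return the names of primitive resources listed as Stopped."""
    stopped = []
    for line in crm_output.splitlines():
        match = _STOPPED.match(line)
        if match:
            stopped.append(match.group(1))
    return stopped
```

Fed the listing above, this would single out `vip__public`, the resource discussed in this thread.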

However, the glance call does use the internal IP, yet it still fails:

[root@node-4 ~]# glance -d image-create --name 'TestVM' --is-public 'true' --container-format='bare' --disk-format='qcow2' --min-ram='64' --file '/opt/vm/cirros-x86_64-disk.img'
/usr/lib/python2.6/site-packages/glanceclient/client.py:26: DeprecationWarning: `version` keyword is being deprecated. Please pass the version as part of the URL. http://$HOST:$PORT/v$VERSION_NUMBER
  DeprecationWarning)
curl -i -X POST -H 'Accept-Encoding: gzip, deflate, compress' -H 'x-image-meta-container_format: bare' -H 'Accept: */*' -H 'X-Auth-Token: {SHA1}bc4cacea163762b4ec6d80f1fc04cc48d94d942f' -H 'x-image-meta-size: 13167616' -H 'x-image-meta-is_public: True' -H 'x-image-meta-min_ram: 64' -H 'User-Agent: python-glanceclient' -H 'Content-Type: application/octet-stream' -H 'x-image-meta-disk_format: qcow2' -H 'x-image-meta-name: TestVM' http://192.168.0.2:9292/v1/images
Request returned failure status 400.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/glanceclient/shell.py", line 637, in main
    args.func(client, args)
  File "/usr/lib/python2.6/site-packages/glanceclient/v1/shell.py", line 232, in do_image_create
    image = gc.images.create(**fields)
  File "/usr/lib/python2.6/site-packages/glanceclient/v1/images.py", line 288, in create
    data=image_data)
  File "/usr/lib/python2.6/site-packages/glanceclient/common/http.py", line 262, in post
    return self._request('POST', url, **kwargs)
  File "/usr/lib/python2.6/site-packages/glanceclient/common/http.py", line 227, in _request
    raise exc.from_response(resp, resp.content)
HTTPBadRequest: <html>
 <head>
  <title>400 Bad Request</title>
 </head>
 <body>
  <h1>400 Bad Request</h1>
  Client disconnected before sending all data to backend<br /><br />

 </body>
</html> (HTTP 400)
<html>
 <head>
  <title>4...


Bogdan Dobrelya (bogdando) wrote :

For the deployment to succeed, the public GW must be reachable from all controller nodes; there is a known issue regarding this UX. I'm marking this one as a duplicate. If you think it is not, let me know.

Aviram Bar-Haim (aviramb) wrote :

Vladimir, I see in the glance logs that swiftclient tries to reach the public VIP without success (http://paste.openstack.org/show/221619/). Maybe this causes the problem?

Aviram Bar-Haim (aviramb) wrote :

Bogdan, in one setup this was indeed a configuration issue that blocked access to the public GW, but we see this failure in another CentOS setup, where the public VIP is started.
In the second setup there is no ping to the public VIP from the primary controller (but there is from the other controllers) until iptables is shut down on one of the other controllers. After shutting down iptables, the primary controller could ping the public VIP, but glance image-create still fails, now with "500 Internal Server Error" instead of "400 Bad Request". I suspect that there is more than one bug here and that only one of them is a duplicate.

diagnostic snapshot from the second cluster: https://drive.google.com/file/d/0B26TG8UdpL-EcE5JZFJwd1RHSDA/view?usp=sharing

Stanislaw Bogatkin (sbogatkin) wrote :

I removed the duplicate mark from this bug because it can be easily fixed. I'll prepare a fix today.

Changed in fuel:
status: Confirmed → Triaged
Aviram Bar-Haim (aviramb) wrote :

I think this is related too: https://bugs.launchpad.net/fuel/+bug/1452715
What we saw today is:
single controller - installation finished, but with no ping to the VIP from outside until iptables was shut down
3 controllers - installation failed on image upload

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/182719

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/182719
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=915e996fc24db41b6d511c113ba026231cc1aba4
Submitter: Jenkins
Branch: master

commit 915e996fc24db41b6d511c113ba026231cc1aba4
Author: Stanislaw Bogatkin <email address hidden>
Date: Wed May 13 18:30:26 2015 +0300

    Add endpoint type parameter

    To get ability to change swift endpoint type, this parameter
    was added.

    Upstream Change-Id: I25ce5d9d119804f1aca5b5567be46750050ebacd
    Change-Id: I356b5d95ec2242722dbcf9ec1fc1cb7be65db867
    Closes-bug: #1454364
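
The commit above makes the swift endpoint type configurable, presumably so that glance can reach swift through an internal address instead of the public VIP that was unreachable in this thread. As an illustration only, the sketch below selects a URL from a Keystone v2-style catalog entry by endpoint type. The function name `pick_endpoint`, the port 8080 and the `AUTH_example` path are hypothetical; the two addresses are reused from the comments above purely as examples.

```python
# Endpoint types used in Keystone v2-style service catalogs.
VALID_TYPES = ("publicURL", "internalURL", "adminURL")


def pick_endpoint(endpoints, endpoint_type="publicURL"):
    """Return the URL of the requested endpoint type from a catalog entry."""
    if endpoint_type not in VALID_TYPES:
        raise ValueError("unknown endpoint type: %r" % (endpoint_type,))
    return endpoints[endpoint_type]


# Hypothetical catalog entry for swift; port and path are assumptions.
swift_endpoints = {
    "publicURL": "http://172.16.0.2:8080/v1/AUTH_example",
    "internalURL": "http://192.168.0.2:8080/v1/AUTH_example",
    "adminURL": "http://192.168.0.2:8080/v1/AUTH_example",
}

# With the default type the client goes through the public VIP;
# with "internalURL" it stays on the management network.
public_url = pick_endpoint(swift_endpoints)
internal_url = pick_endpoint(swift_endpoints, "internalURL")
```

In this sketch, a client left on the default `publicURL` depends on the public VIP being up, which matches the failure pattern reported here when `vip__public` was stopped.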

Changed in fuel:
status: In Progress → Fix Committed