[systests] deploy_bonding_ha_balance_slb failed, there is no installed cirros-testvm on the controllers

Bug #1391584 reported by Tatyanka
This bug affects 2 people
Affects             Status     Importance  Assigned to      Milestone
Fuel for OpenStack  Invalid    Medium      Dennis Dmitriev
5.1.x               Won't Fix  Medium      Dennis Dmitriev
6.0.x               Won't Fix  Medium      Dennis Dmitriev
6.1.x               Invalid    Medium      Dennis Dmitriev

Bug Description

[root@nailgun ~]# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1.1"
  api: "1.0"
  build_number: "32"
  build_id: "2014-11-10_21-00-23"
  astute_sha: "702af3db6f5bca92525bc8322d7d5d7675ec857e"
  fuellib_sha: "e5b3de834a400d98d8c6ba416249832a0c16076c"
  ostf_sha: "64cb59c681658a7a55cc2c09d079072a41beb346"
  nailgun_sha: "c1d83acf28d20b2e96d24611924cb25b501af343"
  fuelmain_sha: "05c9d133a11efe21b78cf72a5b3001b5853e1ea0"

Deployment failed with error:
http://paste.openstack.org/show/131952/

On controller:
 * Documentation: https://help.ubuntu.com/
New release '14.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Tue Nov 11 04:36:24 2014 from 10.108.60.2
root@node-4:~# . /root/openrc && /usr/bin/glance image-create --name 'TestVM' --is-public true --container-format='bare' --disk-format='qcow2' --min-ram=64 --property murano_image_info='{"title": "Murano Demo", "type": "cirros.demo"}' --file '/usr/share/cirros-testvm/cirros-x86_64-disk.img'
[Errno 2] No such file or directory: u'/usr/share/cirros-testvm/cirros-x86_64-disk.img'

After I ran 'apt-get install cirros-testvm' on the controller, the same glance command worked fine:
+------------------------------+-------------------------------------------------+
| Property                     | Value                                           |
+------------------------------+-------------------------------------------------+
| Property 'murano_image_info' | {"title": "Murano Demo", "type": "cirros.demo"} |
| checksum                     | 64d7c1cd2b6f60c92c14662941cb7913                |
| container_format             | bare                                            |
| created_at                   | 2014-11-11T04:54:14                             |
| deleted                      | False                                           |
| deleted_at                   | None                                            |
| disk_format                  | qcow2                                           |
| id                           | 2e952eca-7062-4447-b26b-8396191d775b            |
| is_public                    | True                                            |
| min_disk                     | 0                                               |
| min_ram                      | 64                                              |
| name                         | TestVM                                          |
| owner                        | b813afb40d424bec945107c488903d75                |
| protected                    | False                                           |
| size                         | 13167616                                        |
| status                       | active                                          |
| updated_at                   | 2014-11-11T04:54:15                             |
| virtual_size                 | None                                            |
+------------------------------+-------------------------------------------------+

Tags: system-tests
Tatyanka (tatyana-leontovich) wrote :
description: updated
Changed in fuel:
status: New → Confirmed
Aleksandr Didenko (adidenko) wrote :

Puppet installs the cirros-testvm package on the primary controller, and Astute also runs the image upload on the primary controller (node-1), so the cirros-testvm package is not the problem. The problem is:

04:35:36 Error communicating with http://10.108.61.2:9292 [Errno 32] Broken pipe

Here is the corresponding record from the HAProxy log on node-1:

2014-11-11T04:35:36.545182+00:00 info: 10.108.61.3:37007 [11/Nov/2014:04:35:33.719] glance-api glance-api/node-4 0/0/0/-1/2829 502 204 - - SH-- 68/0/0/0/0 0/0 "POST /v1/images HTTP/1.1"

As we can see, HAProxy logged a "502 Bad Gateway" error. Here is part of glance-api.log from the server that was handling this request (node-4):

2014-11-11T04:35:38.020647+00:00 info: 2014-11-11 04:35:38.020 15642 INFO glance.wsgi.server [3f687a27-501c-4a53-96ba-07c81bc6a8eb f419725e6811497dbcf3550718c0e14c b813afb40d424bec945107c488903d75 - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 405, in handle_one_response
    write(''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 349, in write
    _writelines(towrite)
  File "/usr/lib/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 307, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 293, in send
    total_sent += fd.send(data[total_sent:], flags)
error: [Errno 32] Broken pipe
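
For reference, the HAProxy record quoted above can be decoded field by field. The following is a minimal sketch, assuming the record follows HAProxy's standard HTTP log layout; the regular expression and the helper name `parse_haproxy_http` are illustrative and not part of Fuel. The two fields of interest are the response timer (the fourth value in `0/0/0/-1/2829`), where -1 means no complete response headers were received from the server, and the termination state `SH--`, where `S` means the session was aborted on the server side and `H` means the proxy was still waiting for response headers.

```python
import re

# Standard HAProxy HTTP-mode log layout (after the syslog prefix).
# Assumption: no captured request/response headers are present in the record.
HAPROXY_HTTP_RE = re.compile(
    r'(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?-?\d+) '
    r'(?P<status>\d{3}) (?P<bytes>\+?\d+) \S+ \S+ (?P<term_state>\S{4}) '
    r'\S+ \S+ "(?P<request>[^"]*)"'
)


def parse_haproxy_http(line):
    """Split one HAProxy HTTP log record into named fields (hypothetical helper)."""
    match = HAPROXY_HTTP_RE.search(line)
    if match is None:
        raise ValueError("not an HAProxy HTTP log record")
    rec = match.groupdict()
    for key in ("tq", "tw", "tc", "tr", "tt", "status"):
        rec[key] = int(rec[key])
    return rec


line = ('2014-11-11T04:35:36.545182+00:00 info: 10.108.61.3:37007 '
        '[11/Nov/2014:04:35:33.719] glance-api glance-api/node-4 '
        '0/0/0/-1/2829 502 204 - - SH-- 68/0/0/0/0 0/0 '
        '"POST /v1/images HTTP/1.1"')
rec = parse_haproxy_http(line)

# tr == -1 together with a termination state starting with "SH" indicates the
# backend closed the connection before sending its response headers.
server_aborted_early = rec["tr"] == -1 and rec["term_state"].startswith("SH")
print(rec["server"], rec["status"], rec["term_state"], server_aborted_early)
```

Read this way, the record says that node-4 dropped the connection before answering the POST, which matches the Broken pipe traceback in glance-api.log.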

Bogdan Dobrelya (bogdando) wrote :

@Aleksandr, there was also an earlier error associated with the same request ID:
2014-11-11T04:35:37.637217+00:00 err: 2014-11-11 04:35:37.635 15642 ERROR swiftclient [3f687a27-501c-4a53-96ba-07c81bc6a8eb f419725e6811497dbcf3550718c0e14c b813afb40d424bec945107c488903d75 - - -] Container HEAD failed: http://10.108.61.2:8080/v1/AUTH_750fa318e28c4448b32628f860e9d0f9/glance 404 Not Found
2014-11-11 04:35:37.635 15642 TRACE swiftclient Traceback (most recent call last):
2014-11-11 04:35:37.635 15642 TRACE swiftclient File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1110, in _retry
2014-11-11 04:35:37.635 15642 TRACE swiftclient rv = func(self.url, self.token, *args, **kwargs)
2014-11-11 04:35:37.635 15642 TRACE swiftclient File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 581, in head_container
2014-11-11 04:35:37.635 15642 TRACE swiftclient http_response_content=body)
2014-11-11 04:35:37.635 15642 TRACE swiftclient ClientException: Container HEAD failed: http://10.108.61.2:8080/v1/AUTH_750fa318e28c4448b32628f860e9d0f9/glance 404 Not Found
2014-11-11 04:35:37.635 15642 TRACE swiftclient
2014-11-11T04:35:37.735910+00:00 debug: 2014-11-11 04:35:37.735 15642 DEBUG swiftclient [3f687a27-501c-4a53-96ba-07c81bc6a8eb f419725e6811497dbcf3550718c0e14c b813afb40d424bec945107c488903d75 - - -] REQ: curl -i http://10.108.61.2:8080/v1/AUTH_750fa318e28c4448b32628f860e9d0f9/glance -X PUT -H "Content-Length: 0" -H "X-Auth-Token: b9e4797e8c784b05bf40dfe36d8eb33c" http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:74
2014-11-11T04:35:37.736993+00:00 debug: 2014-11-11 04:35:37.736 15642 DEBUG swiftclient [3f687a27-501c-4a53-96ba-07c81bc6a8eb f419725e6811497dbcf3550718c0e14c b813afb40d424bec945107c488903d75 - - -] RESP STATUS: 201

Also, there are multiple galeracheck failures (http://pastebin.com/EdA5DA9k) in the logs around the same time. As you can see, status 0 is followed by signal 13 (Broken pipe) many times. Perhaps the galera check script should detect such flapping better and mark the MySQL backend as down?
Anyway, it looks like glance-api was hit by network flapping and could not handle the reconnect properly.
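
To illustrate what "detecting such flapping" could mean, here is a minimal sketch. It assumes the check results are available as an ordered sequence of outcome strings; the function name `is_flapping`, the window size, and the transition threshold are hypothetical and are not taken from the galera check script.

```python
def is_flapping(results, window=10, max_transitions=3):
    """Return True if the most recent results changed state too often.

    results: ordered outcomes of the health check, oldest first.
    window: how many of the latest results to inspect (assumed value).
    max_transitions: state changes at or above this count as flapping
    (assumed value).
    """
    recent = list(results)[-window:]
    transitions = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return transitions >= max_transitions


# Alternating outcomes like the ones in the paste: four state changes.
history = ["status 0", "signal 13", "status 0", "signal 13", "status 0"]
print(is_flapping(history))

# A steady backend never changes state.
print(is_flapping(["status 0"] * 5))
```

A check built along these lines could report the backend as down while it is flapping, instead of reporting whichever result happened to come last.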

Bogdan Dobrelya (bogdando) wrote :

It looks like I was wrong about network flapping: there is no proof of it in the logs. It is still not clear why galeracheck sometimes reports signal 13 and whether that is normal, but it should not be related to the glance-api issue anyway.

Aleksandr Didenko (adidenko) wrote :

@Bogdan, you're not wrong. It looks like the network setup is the problem here. I tried to run the same system test locally and also got problems with the bond. Current scheme:

        raw_data = {
            'mac': None,
            'mode': 'balance-slb',
            'name': 'ovs-bond0',
            'slaves': [
                {'name': 'eth4'},
                {'name': 'eth3'},
                {'name': 'eth2'},
                {'name': 'eth1'}
            ],
        }

        interfaces = {
            'eth0': ['fuelweb_admin'],
            'ovs-bond0': [
                'public',
                'management',
                'storage',
                'private'
            ]
        }

I.e. the NAT-forwarded interface eth1 ('public' network) is in the same bond as the isolated virtual networks, which may cause problems with public network/IP accessibility.

I suggest changing it to:

        raw_data = {
            'mac': None,
            'mode': 'balance-slb',
            'name': 'ovs-bond0',
            'slaves': [
                {'name': 'eth4'},
                {'name': 'eth3'},
                {'name': 'eth2'},
            ],
            'state': None,
            'type': 'bond',
            'assigned_networks': []
        }

        interfaces = {
            'eth0': ['fuelweb_admin'],
            'eth1': ['public'],
            'ovs-bond0': [
                'management',
                'storage',
                'private'
            ]
        }

I.e. leave the NAT-forwarded eth0 (admin) and eth1 (public) intact and test bonding on the remaining 3 interfaces.
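
The difference between the two schemes can be expressed as a small consistency check. This is only a sketch: `check_bond_layout` is a hypothetical helper, not part of the system tests, and the rule that 'public' must stay off the bond reflects the suggestion in this comment rather than any Fuel requirement.

```python
def check_bond_layout(raw_data, interfaces):
    """Return a list of problems found in a bond/network assignment.

    Hypothetical helper: the rules encode the suggestion from this comment.
    """
    problems = []
    bond_name = raw_data['name']
    slaves = [slave['name'] for slave in raw_data['slaves']]

    # A NIC enslaved to the bond should not carry networks of its own.
    for nic in slaves:
        if nic in interfaces:
            problems.append('%s is a bond slave but also has networks assigned' % nic)

    # The NAT-forwarded 'public' network should not ride on the bond.
    if 'public' in interfaces.get(bond_name, []):
        problems.append("'public' is carried by %s" % bond_name)

    return problems


current_bond = {
    'name': 'ovs-bond0',
    'mode': 'balance-slb',
    'slaves': [{'name': 'eth4'}, {'name': 'eth3'}, {'name': 'eth2'}, {'name': 'eth1'}],
}
current_interfaces = {
    'eth0': ['fuelweb_admin'],
    'ovs-bond0': ['public', 'management', 'storage', 'private'],
}

proposed_bond = {
    'name': 'ovs-bond0',
    'mode': 'balance-slb',
    'slaves': [{'name': 'eth4'}, {'name': 'eth3'}, {'name': 'eth2'}],
}
proposed_interfaces = {
    'eth0': ['fuelweb_admin'],
    'eth1': ['public'],
    'ovs-bond0': ['management', 'storage', 'private'],
}

print(check_bond_layout(current_bond, current_interfaces))
print(check_bond_layout(proposed_bond, proposed_interfaces))
```

The current scheme is flagged because 'public' is assigned to ovs-bond0, while the proposed scheme passes both rules.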

summary: - 5.1.1 deploy_bonding_ha_balance_slb failed, there is no installed
+ [systests] deploy_bonding_ha_balance_slb failed, there is no installed
cirros-testvm on the controllers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/134890

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Aleksandr Didenko (adidenko)
status: Triaged → In Progress
Aleksandr Didenko (adidenko) wrote :

It still looks like we are using an invalid virtual environment for balance-slb testing. We should use a virtual switch for this (OVS on the host system side), plug our bond interfaces into that switch, and configure it for bonding.

Dennis Dmitriev (ddmitriev) wrote :

I've faced random network connectivity issues through the networks attached to the 'ovs-bond0' interface, which is configured in the following way:

* on the slave nodes:
   - eth0 dedicated to 'admin' network;
   - eth1 dedicated to 'public' network;
   - eth2, eth3, eth4 combined into balance-slb bond interface 'ovs-bond0' with tagged networks 'management', 'storage' and 'private';

* on the host:
   - eth2, eth3, eth4 attached to the same libvirt network (the same dobr* bridge on the host)

Dennis Dmitriev (ddmitriev) wrote :

This no longer reproduces.
The Jenkins jobs that perform bond testing on CI were configured to use INTERFACE_MODEL=e1000.

ISO version: {u'build_id': u'2015-02-15_22-54-44', u'ostf_sha': u'f9c37d0876141e1550eb4e703a8e500cd463282f', u'build_number': u'126', u'auth_required': True, u'nailgun_sha': u'1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666', u'production': u'docker', u'api': u'1.0', u'python-fuelclient_sha': u'61431ed16fc00039a269424bdbaa410277eff609', u'astute_sha': u'1f87a9b9a47de7498b4061d15a8c7fb9435709d5', u'fuelmain_sha': u'2054229e275d08898b5d079a6625ffcc79ae23b8', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'release_versions': {u'2014.2-6.1': {u'VERSION': {u'build_id': u'2015-02-15_22-54-44', u'ostf_sha': u'f9c37d0876141e1550eb4e703a8e500cd463282f', u'build_number': u'126', u'api': u'1.0', u'nailgun_sha': u'1e3a40dd8a17abe1d38f42da1e0dc1a6d4572666', u'production': u'docker', u'python-fuelclient_sha': u'61431ed16fc00039a269424bdbaa410277eff609', u'astute_sha': u'1f87a9b9a47de7498b4061d15a8c7fb9435709d5', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'fuelmain_sha': u'2054229e275d08898b5d079a6625ffcc79ae23b8', u'fuellib_sha': u'7f8d4382abfcd4338964182ebfea1d539f963e66'}}}, u'fuellib_sha': u'7f8d4382abfcd4338964182ebfea1d539f963e66'}

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-main (master)

Change abandoned by Aleksandr Didenko (<email address hidden>) on branch: master
Review: https://review.openstack.org/134890
Reason: No longer needed
