Failed to schedule_create_volume: No valid host was found.

Bug #1333210 reported by Anastasia Palkina
Affects: Fuel for OpenStack | Status: Invalid | Importance: High | Assigned to: Aleksandr Didenko
Affects: Fuel for OpenStack 5.0.x | Status: Invalid | Importance: High | Assigned to: Aleksandr Didenko

Bug Description

"build_id": "2014-06-23_00-31-14",
"mirantis": "yes",
"build_number": "265",
"ostf_sha": "429c373fb79b1073aa336bc62c6aad45a8f93af6",
"nailgun_sha": "eaabb2c9bbe8e921aaa231960dcda74a7bc86213",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "4394ca9be6540d652cc3919556633d9381e0db64",
"astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
"release": "5.1",
"fuellib_sha": "dc2713b3ba20ccff2816cf61e8481fe2f17ed69b"

1. Create new environment (CentOS, simple mode)
2. Choose GRE segmentation
3. Add controller, compute, cinder
4. Click "Select all", then click the "Configure interfaces" button
5. Select the interfaces with the "Storage" and "Management" networks and click the "Bond interfaces" button (see result on screen)
6. Select mode "LACP balance TCP"
7. Save changes
8. Start deployment. It was successful.
9. Start OSTF tests. Some of them failed because the instance and volume are in 'error' state (see screen)
10. There are errors in /var/log/docker-logs/remote/node-6.domain.tld/cinder-scheduler.log:

2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.
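The scheduler failures in step 10 can be located with a quick grep. A minimal self-contained sketch (the sample line below is copied from this report; on a real deployment, point $log at /var/log/docker-logs/remote/&lt;node&gt;/cinder-scheduler.log instead of a temp file):

```shell
# Minimal sketch: count scheduler failures in a cinder-scheduler log.
# The sample line is copied from this report; the temp file stands in for
# the real log path on the controller.
log=$(mktemp)
cat > "$log" <<'EOF'
2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.
EOF
grep -c 'No valid host was found' "$log"
```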

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Looks like cinder-volume on the cinder node is unable to connect to MySQL on the controller.

cinder-scheduler.log on node-6 (controller):
2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.

cinder-volume.log on node-8 (cinder):
2014-06-23T10:30:12.173519+01:00 warning: 2014-06-23 09:30:12.149 20448 WARNING cinder.openstack.common.db.sqlalchemy.session [req-390721ea-9d7e-40c2-826f-192f4781bc5e - - - - -] SQL connection failed. infinite attempts left.
...
2014-06-23T12:16:14.147116+01:00 warning: 2014-06-23 11:16:14.150 20448 WARNING cinder.openstack.common.db.sqlalchemy.session [req-390721ea-9d7e-40c2-826f-192f4781bc5e - - - - -] SQL connection failed. infinite attempts left.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

node-8 wasn't able to connect to node-6 from the very beginning of deployment:
Mon Jun 23 09:29:47 +0000 2014 /Stage[main]/Cinder::Base/Exec[cinder-manage db_sync] (err): Failed to call refresh: Command exceeded timeout

Changed in fuel:
status: New → Incomplete
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Looks like a network problem. If it's not a one-off issue, the real bug here is that deployment didn't fail after cinder db_sync failed on node-8.

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Dmitry, what kind of logs are we looking for? How can we confirm this issue?

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

We need to confirm that there wasn't a general network connectivity problem between node-6 (controller) and node-8 (cinder) in this environment. If the test environment is still available, the standard network connectivity troubleshooting routine is needed: ping node-6 from node-8, telnet to node-6 on port 3306, etc. If the connectivity problem is specific to MySQL, we need to investigate its root cause. If it's a generic network failure, we need to look at the test environment where this happened and try to reproduce it again; it is likely a one-off failure that can be ignored.
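The routine described above can be scripted. A hedged sketch using bash's /dev/tcp redirection (probe_port is a hypothetical helper, not part of Fuel; hostnames and ports mirror this report):

```shell
# Hypothetical helper, not part of Fuel: probe a TCP port using bash's
# /dev/tcp device, with a timeout so an unreachable host fails fast
# instead of hanging like the telnet attempts in this report.
probe_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK: ${host}:${port} is reachable"
  else
    echo "FAIL: ${host}:${port} is unreachable"
    return 1
  fi
}

# On node-8, the routine would look like:
#   ping -c 3 node-6
#   probe_port node-6 3306   # MySQL
#   probe_port node-6 5672   # RabbitMQ
```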

If there is a real problem behind this bug report, we need to create a new bug for the problem of cinder db_sync failure not being reported as deployment failure. Otherwise, we can hijack this bug report for investigating that.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Revision history for this message
Anastasia Palkina (apalkina) wrote :

I reproduced this bug again.

There are errors in puppet.log on cinder node (node-3):

2014-06-24 12:43:05 ERR (/Stage[main]/Cinder::Base/Exec[cinder-manage db_sync]) Command exceeded timeout
2014-06-24 12:43:03 ERR (/Stage[main]/Cinder::Base/Exec[cinder-manage db_sync]) Failed to call refresh: Command exceeded timeout
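For context, the timeout in these messages comes from Puppet's Exec resource, which defaults to 300 seconds. A hypothetical sketch of where that limit lives (the resource title matches the log above; the attribute values are illustrative, and raising the timeout would not fix the underlying connectivity problem):

```puppet
# Illustrative only: the Exec resource behind the log messages above.
# Puppet's Exec timeout defaults to 300 seconds; db_sync hits that limit
# when the cinder node cannot reach MySQL on the controller.
exec { 'cinder-manage db_sync':
  command => 'cinder-manage db sync',
  path    => ['/usr/bin', '/bin'],
  timeout => 300, # default; exceeded here because MySQL was unreachable
}
```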

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

It's a virtual environment issue. The balance-tcp bond for eth2+eth3 does not work in this environment:

[root@node-3 ~]# ovs-appctl bond/show
---- ovs-bond0 ----
bond_mode: balance-tcp
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 9987 ms
lacp_status: negotiated

slave eth2: disabled
        may_enable: false

slave eth3: disabled
        may_enable: false
        hash 204: 0 kB load

[root@node-2 ~]# telnet 192.168.0.1 3306
Trying 192.168.0.1...
telnet: connect to address 192.168.0.1: No route to host

[root@node-2 ~]# telnet 192.168.0.1 5672
Trying 192.168.0.1...
telnet: connect to address 192.168.0.1: Connection timed out

As soon as I switch bond to active-backup like this:

ovs-vsctl del-port ovs-bond0
ovs-vsctl add-bond br-ovs-bond0 ovs-bond0 eth2 eth3 bond_mode=active-backup

the management network starts working again, and we can access port 3306 and other ports just fine.

Or when I simply bring eth3 down on all VMs (which makes the bond run over eth2 on all VMs), it works too:

[root@node-2 ~]# ovs-appctl bond/show
---- ovs-bond0 ----
bond_mode: balance-tcp
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 3470 ms
lacp_status: negotiated

slave eth2: enabled
        active slave
        may_enable: true
        hash 9: 0 kB load
        hash 101: 0 kB load
        hash 189: 0 kB load

slave eth3: disabled
        may_enable: false

[root@node-2 ~]# telnet 192.168.0.1 3306
Trying 192.168.0.1...
Connected to 192.168.0.1.
Escape character is '^]'

Marking this bug as invalid: in order to test balance-tcp+LACP bonds, we need to make sure we have a virtual network setup that properly supports such a bonding scheme.
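The failure signature above (lacp_status negotiated, but every slave disabled) can be checked mechanically. A minimal sketch that inspects `ovs-appctl bond/show` output; the sample text is inlined from this report, so the command itself is not run here:

```shell
# Sketch: detect a bond with no enabled slaves from `ovs-appctl bond/show`
# output. The sample is inlined from this report; on a real node use:
#   bond_show=$(ovs-appctl bond/show ovs-bond0)
bond_show='---- ovs-bond0 ----
bond_mode: balance-tcp
lacp_status: negotiated
slave eth2: disabled
slave eth3: disabled'

if echo "$bond_show" | grep -q '^slave .*: enabled'; then
  echo "bond has at least one enabled slave"
else
  echo "no enabled slaves: bond carries no traffic"
fi
```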

Changed in fuel:
status: Incomplete → Invalid
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced this bug without bonding on ISO #83, version 5.0.1

"build_id": "2014-07-02_00-31-14",
"mirantis": "yes",
"build_number": "83",
"ostf_sha": "d0fe60e0eba61685008b86d101f459fc2d3bb654",
"nailgun_sha": "63a852bc402c079083a8cd0896c44254a1adcdbc",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "5691d5e83e95738a5fc3557e5641579ad05a3a03",
"astute_sha": "644d279970df3daa5f5a2d2ccf8b4d22d53386ff",
"release": "5.0.1",
"fuellib_sha": "37eb307f55d5f35559a13227a3a75c57059d6469"

1. Create new environment (CentOS, simple mode)
2. Choose Ceph for images
3. Add controller, compute and 3 ceph
4. Start deployment. It was successful
5. Start OSTF tests
6. Test "Create volume and attach it to instance" failed with this error: There are no cinder nodes or ceph storage for volume but ceph nodes are available
7. Login to Horizon
8. Manually created a volume; it has status "error"
9. There is an error in the log on the controller (node-1):

[root@fuel ~]# less /var/log/remote/node-1.domain.tld/cinder-cinder.scheduler.flows.create_volume.log | grep err
2014-07-02T13:02:31.192909+00:00 err: ERROR: Failed to schedule_create_volume: No valid host was found.

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

It works as expected.

storage:
  osd_pool_size: "2"
  objects_ceph: false
  volumes_ceph: false
  images_ceph: true
  volumes_lvm: true
  ephemeral_ceph: false

So Ceph is used for images only (Glance), and you don't have any cinder nodes in your environment. This is why it's impossible to create volumes.
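A sketch of the decision being explained here, with values mirroring the storage settings above (the variable names are illustrative, and cinder_nodes=0 reflects this environment, which has no cinder role nodes):

```shell
# Illustrative only: which backend can serve volumes, given the settings above.
volumes_ceph=false   # from the storage settings in this report
volumes_lvm=true
cinder_nodes=0       # no node with the cinder role in this environment

if [ "$volumes_ceph" = "true" ]; then
  echo "volumes go to Ceph (RBD backend)"
elif [ "$volumes_lvm" = "true" ] && [ "$cinder_nodes" -gt 0 ]; then
  echo "volumes go to LVM on the cinder nodes"
else
  echo "no usable volume backend: scheduler reports 'No valid host was found'"
fi
```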
