Failed to schedule_create_volume: No valid host was found.

Bug #1333210 reported by Anastasia Palkina
Affects: Fuel for OpenStack | Status: Invalid | Importance: High | Assigned to: Aleksandr Didenko
Affects: Fuel for OpenStack 5.0.x | Status: Invalid | Importance: High | Assigned to: Aleksandr Didenko

Bug Description

"build_id": "2014-06-23_00-31-14",
"mirantis": "yes",
"build_number": "265",
"ostf_sha": "429c373fb79b1073aa336bc62c6aad45a8f93af6",
"nailgun_sha": "eaabb2c9bbe8e921aaa231960dcda74a7bc86213",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "4394ca9be6540d652cc3919556633d9381e0db64",
"astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
"release": "5.1",
"fuellib_sha": "dc2713b3ba20ccff2816cf61e8481fe2f17ed69b"

1. Create new environment (CentOS, simple mode)
2. Choose GRE segmentation
3. Add controller, compute, cinder
4. Click "Select all", then click the "Configure interfaces" button
5. Select the interfaces with the "Storage" and "Management" networks and click the "Bond interfaces" button (see result on screen)
6. Select mode "LACP balance TCP"
7. Save changes
8. Start deployment. It was successful.
9. Start OSTF tests. Some of them failed because the instance and volume are in 'error' state (see screen)
10. There are errors in /var/log/docker-logs/remote/node-6.domain.tld/cinder-scheduler.log:

2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.
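The scheduler failures in step 10 can be located with a quick grep. A minimal self-contained sketch (the sample line below is copied from this report; on a real deployment, point $log at /var/log/docker-logs/remote/&lt;node&gt;/cinder-scheduler.log instead of a temp file):

```shell
# Minimal sketch: count scheduler failures in a cinder-scheduler log.
# The sample line is copied from this report; the temp file stands in for
# the real log path on the controller.
log=$(mktemp)
cat > "$log" <<'EOF'
2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.
EOF
grep -c 'No valid host was found' "$log"
```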

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Looks like cinder-volume on the cinder node is unable to connect to MySQL on the controller.

cinder-scheduler.log on node-6 (controller):
2014-06-23T11:40:49.896511+01:00 err: 2014-06-23 10:40:49.900 19518 ERROR cinder.scheduler.flows.create_volume [req-4fe88578-cdb1-44e6-b9e0-f49ba4ea779e 8de99de14f9b4ab383e53930d8f4d147 605023f8bac74b97b084d3d732d685bf - - -] Failed to schedule_create_volume: No valid host was found.

cinder-volume.log on node-8 (cinder):
2014-06-23T10:30:12.173519+01:00 warning: 2014-06-23 09:30:12.149 20448 WARNING cinder.openstack.common.db.sqlalchemy.session [req-390721ea-9d7e-40c2-826f-192f4781bc5e - - - - -] SQL connection failed. infinite attempts left.
...
2014-06-23T12:16:14.147116+01:00 warning: 2014-06-23 11:16:14.150 20448 WARNING cinder.openstack.common.db.sqlalchemy.session [req-390721ea-9d7e-40c2-826f-192f4781bc5e - - - - -] SQL connection failed. infinite attempts left.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

node-8 wasn't able to connect to node-6 from the very beginning of deployment:
Mon Jun 23 09:29:47 +0000 2014 /Stage[main]/Cinder::Base/Exec[cinder-manage db_sync] (err): Failed to call refresh: Command exceeded timeout

Changed in fuel:
status: New → Incomplete
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Looks like a network problem. If it's not a one-off issue, the real bug here is that deployment didn't fail after cinder db_sync failed on node-8.

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Dmitry, what kind of logs are we looking for? How can we confirm this issue?

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

We need to confirm that there wasn't a general network connectivity problem between node-6 (controller) and node-8 (cinder) in this environment. If the test environment is still available, the standard network connectivity troubleshooting routine is needed: ping node-6 from node-8, telnet to node-6 on port 3306, etc. If the connectivity problem is specific to MySQL, we need to investigate its root cause. If it's a generic network failure, we need to look at the test environment where this happened and try to reproduce it again; it is likely a one-off failure that can be ignored.
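The routine described above can be scripted. A hedged sketch using bash's /dev/tcp redirection (probe_port is a hypothetical helper, not part of Fuel; hostnames and ports mirror this report):

```shell
# Hypothetical helper, not part of Fuel: probe a TCP port using bash's
# /dev/tcp device, with a timeout so an unreachable host fails fast
# instead of hanging like the telnet attempts in this report.
probe_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK: ${host}:${port} is reachable"
  else
    echo "FAIL: ${host}:${port} is unreachable"
    return 1
  fi
}

# On node-8, the routine would look like:
#   ping -c 3 node-6
#   probe_port node-6 3306   # MySQL
#   probe_port node-6 5672   # RabbitMQ
```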

If there is a real problem behind this bug report, we need to create a new bug for the problem of cinder db_sync failure not being reported as deployment failure. Otherwise, we can hijack this bug report for investigating that.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Revision history for this message
Anastasia Palkina (apalkina) wrote :

I reproduced this bug again.

There are errors in puppet.log on cinder node (node-3):

2014-06-24 12:43:05 ERR (/Stage[main]/Cinder::Base/Exec[cinder-manage db_sync]) Command exceeded timeout
2014-06-24 12:43:03 ERR (/Stage[main]/Cinder::Base/Exec[cinder-manage db_sync]) Failed to call refresh: Command exceeded timeout
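For context, the timeout in these messages comes from Puppet's Exec resource, which defaults to 300 seconds. A hypothetical sketch of where that limit lives (the resource title matches the log above; the attribute values are illustrative, and raising the timeout would not fix the underlying connectivity problem):

```puppet
# Illustrative only: the Exec resource behind the log messages above.
# Puppet's Exec timeout defaults to 300 seconds; db_sync hits that limit
# when the cinder node cannot reach MySQL on the controller.
exec { 'cinder-manage db_sync':
  command => 'cinder-manage db sync',
  path    => ['/usr/bin', '/bin'],
  timeout => 300, # default; exceeded here because MySQL was unreachable
}
```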

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

It's a virtual environment issue. The balance-tcp bond for eth2+eth3 does not work in this environment:

[root@node-3 ~]# ovs-appctl bond/show
---- ovs-bond0 ----
bond_mode: balance-tcp
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 9987 ms
lacp_status: negotiated

slave eth2: disabled
        may_enable: false

slave eth3: disabled
        may_enable: false
        hash 204: 0 kB load

[root@node-2 ~]# telnet 192.168.0.1 3306
Trying 192.168.0.1...
telnet: connect to address 192.168.0.1: No route to host

[root@node-2 ~]# telnet 192.168.0.1 5672
Trying 192.168.0.1...
telnet: connect to address 192.168.0.1: Connection timed out

As soon as I switch bond to active-backup like this:

ovs-vsctl del-port ovs-bond0
ovs-vsctl add-bond br-ovs-bond0 ovs-bond0 eth2 eth3 bond_mode=active-backup

the management network starts working again, and we can access port 3306 and other ports just fine.

Or when I simply bring eth3 down on all VMs (which makes the bond run over eth2 on all VMs), it works too:

[root@node-2 ~]# ovs-appctl bond/show
---- ovs-bond0 ----
bond_mode: balance-tcp
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 3470 ms
lacp_status: negotiated

slave eth2: enabled
        active slave
        may_enable: true
        hash 9: 0 kB load
        hash 101: 0 kB load
        hash 189: 0 kB load

slave eth3: disabled
        may_enable: false

[root@node-2 ~]# telnet 192.168.0.1 3306
Trying 192.168.0.1...
Connected to 192.168.0.1.
Escape character is '^]'

Marking this bug as invalid: in order to test balance-tcp+LACP bonds, we need to make sure we have a virtual network setup that properly supports such a bonding scheme.
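The failure signature above (lacp_status negotiated, but every slave disabled) can be checked mechanically. A minimal sketch that inspects `ovs-appctl bond/show` output; the sample text is inlined from this report, so the command itself is not run here:

```shell
# Sketch: detect a bond with no enabled slaves from `ovs-appctl bond/show`
# output. The sample is inlined from this report; on a real node use:
#   bond_show=$(ovs-appctl bond/show ovs-bond0)
bond_show='---- ovs-bond0 ----
bond_mode: balance-tcp
lacp_status: negotiated
slave eth2: disabled
slave eth3: disabled'

if echo "$bond_show" | grep -q '^slave .*: enabled'; then
  echo "bond has at least one enabled slave"
else
  echo "no enabled slaves: bond carries no traffic"
fi
```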

Changed in fuel:
status: Incomplete → Invalid
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced this bug without bonding on ISO #83, version 5.0.1

"build_id": "2014-07-02_00-31-14",
"mirantis": "yes",
"build_number": "83",
"ostf_sha": "d0fe60e0eba61685008b86d101f459fc2d3bb654",
"nailgun_sha": "63a852bc402c079083a8cd0896c44254a1adcdbc",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "5691d5e83e95738a5fc3557e5641579ad05a3a03",
"astute_sha": "644d279970df3daa5f5a2d2ccf8b4d22d53386ff",
"release": "5.0.1",
"fuellib_sha": "37eb307f55d5f35559a13227a3a75c57059d6469"

1. Create new environment (CentOS, simple mode)
2. Choose Ceph for images
3. Add controller, compute and 3 ceph
4. Start deployment. It was successful
5. Start OSTF tests
6. Test "Create volume and attach it to instance" failed with this error: There are no cinder nodes or ceph storage for volume but ceph nodes are available
7. Login to Horizon
8. Manually created a volume; it has status "error"
9. There is an error in the log on the controller (node-1):

[root@fuel ~]# less /var/log/remote/node-1.domain.tld/cinder-cinder.scheduler.flows.create_volume.log | grep err
2014-07-02T13:02:31.192909+00:00 err: ERROR: Failed to schedule_create_volume: No valid host was found.

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

It works as expected.

storage:
  osd_pool_size: "2"
  objects_ceph: false
  volumes_ceph: false
  images_ceph: true
  volumes_lvm: true
  ephemeral_ceph: false

So Ceph is used for images only (Glance), and you don't have any cinder nodes in your environment. This is why it's impossible to create volumes.
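A sketch of the decision being explained here, with values mirroring the storage settings above (the variable names are illustrative, and cinder_nodes=0 reflects this environment, which has no cinder role nodes):

```shell
# Illustrative only: which backend can serve volumes, given the settings above.
volumes_ceph=false   # from the storage settings in this report
volumes_lvm=true
cinder_nodes=0       # no node with the cinder role in this environment

if [ "$volumes_ceph" = "true" ]; then
  echo "volumes go to Ceph (RBD backend)"
elif [ "$volumes_lvm" = "true" ] && [ "$cinder_nodes" -gt 0 ]; then
  echo "volumes go to LVM on the cinder nodes"
else
  echo "no usable volume backend: scheduler reports 'No valid host was found'"
fi
```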
