manila (neutron) gate jobs are timing out

Bug #1528597 reported by Matt Riedemann
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: newton-1

Bug Description

Seen here:

http://logs.openstack.org/65/256865/5/gate/gate-manila-tempest-dsvm-neutron-multibackend/76e7926/console.html#_2015-12-21_17_26_11_797

But logstash also shows the manila jobs with the neutron backend are timing out regularly:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Killed%5C%22%20AND%20message:%5C%22timeout%20-s%209%5C%22%20AND%20tags:%5C%22console%5C%22%20AND%20project:%5C%22openstack/manila%5C%22%20AND%20voting:%5C%221%5C%22

90 hits in 7 days, master and stable/liberty, check and gate, all failures.
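
For readability, the decoded query behind that logstash link is:

message:"Killed" AND message:"timeout -s 9" AND tags:"console" AND project:"openstack/manila" AND voting:"1"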

There is a test in there taking 421 seconds:

http://logs.openstack.org/65/256865/5/gate/gate-manila-tempest-dsvm-neutron-multibackend/76e7926/console.html#_2015-12-21_16_58_16_198

manila_tempest_tests.tests.api.test_security_services_mapping_negative.SecServicesMappingNegativeTest.test_delete_ss_from_sn_used_by_share_server [421.032172s] ... ok

It should also be noted that most of the failures are on OVH nodes, which are the slowest that infra uses, but there are also some timeouts on HP Cloud nodes.

It looks like the overall job timeout is set to 70 minutes:

http://logs.openstack.org/65/256865/5/gate/gate-manila-tempest-dsvm-neutron-multibackend/76e7926/console.html#_2015-12-21_16_20_30_813

export DEVSTACK_GATE_TIMEOUT=70
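
If simply raising the job timeout were the chosen workaround, it would be a one-line change to that export in the devstack-gate job definition; a rough sketch (the new value is only an illustration, not an agreed-upon number):

export DEVSTACK_GATE_TIMEOUT=90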

Matt Riedemann (mriedem)
Changed in manila:
status: New → Confirmed
Revision history for this message
Valeriy Ponomaryov (vponomaryov) wrote :

Looking at the results from the job run you used, we can see that there are "connectivity" timeouts:

http://logs.openstack.org/65/256865/5/gate/gate-manila-tempest-dsvm-neutron-multibackend/76e7926/logs/screen-m-shr.txt.gz?level=TRACE#_2015-12-21_16_56_49_193

The best approach to fixing this bug is to avoid using Nova. The problem is not the timeout itself but failures in Nova; without the job timeout we would simply see failed tests instead. Either way, something Manila depends on is breaking.

There is also ongoing work on the LXD driver, which is expected to replace the "Generic" driver that relies on Nova and Cinder: https://review.openstack.org/#/c/245751/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Valeriy, if the instance is volume-backed then this is probably the same issue as we hit in the gate with test_volume_boot_pattern and ssh timeouts:

http://status.openstack.org/elastic-recheck/index.html#1355573
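
For reference, an elastic-recheck fingerprint is just a small YAML query file; a sketch of what one for the manila timeout signature could look like, reusing the logstash terms above (the file name and exact terms are assumptions, not the actual fingerprint for bug 1355573):

# queries/1528597.yaml (hypothetical)
query: >-
  message:"Killed" AND message:"timeout -s 9" AND
  tags:"console" AND project:"openstack/manila" AND voting:"1"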

Revision history for this message
Valeriy Ponomaryov (vponomaryov) wrote :

Matt,

Manila's instances are booted from an image, not a volume, but we see SSH timeouts as well. So there is definitely a more generic problem with SSH connectivity that keeps being carried over from release to release.

Changed in manila:
milestone: none → newton-1
Changed in manila:
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

We haven't seen this failure signature in the gate in a long time, so it may be worth dropping this one and re-opening it later if it recurs.

Changed in manila:
status: Confirmed → Invalid