multinode-oooq-container scenarios and Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response code received

Bug #1793665 reported by chandan kumar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Juan Antonio Osorio Robles

Bug Description

In most of the fs016 periodic jobs once tempest run finishes, the validate-tempest role fails to generate tempest.html result. It makes very hard to user to debug and go to check tempest.log file for the tempest errors.
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/2723047/job-output.txt.gz#_2018-09-20_02_35_29_598462

2018-09-20 02:35:29.598462 | primary | TASK [validate-tempest : Generate testrepository.subunit results file] *********
2018-09-20 02:35:29.615521 | primary | Thursday 20 September 2018 02:35:29 +0000 (0:00:00.672) 0:18:53.370 ****
2018-09-20 02:35:30.688127 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail && testr last --subunit | subunit-1to2 > /home/zuul/tempest/testrepository.subunit", "delta": "0:00:00.381209", "end": "2018-09-20 02:35:30.669697", "msg": "non-zero return code", "rc": 127, "start": "2018-09-20 02:35:30.288488", "stderr": "/bin/sh: testr: command not found", "stderr_lines": ["/bin/sh: testr: command not found"], "stdout": "", "stdout_lines": []}
2018-09-20 02:35:30.688196 | primary | ...ignoring
2018-09-20 02:35:30.712744 | primary |
2018-09-20 02:35:30.712900 | primary | TASK [validate-tempest : Generate HTML results file] ***************************
2018-09-20 02:35:30.733215 | primary | Thursday 20 September 2018 02:35:30 +0000 (0:00:01.117) 0:18:54.488 ****
2018-09-20 02:35:31.726561 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail &&\n subunit2html $(find /home/zuul/tempest/.testrepository -name [0-9] | head -1) /home/zuul/tempest/tempest.html 2>&1 >> tempest.log", "delta": "0:00:00.360371", "end": "2018-09-20 02:35:31.706626", "msg": "non-zero return code", "rc": 1, "start": "2018-09-20 02:35:31.346255", "stderr": "find: ‘/home/zuul/tempest/.testrepository’: No such file or directory", "stderr_lines": ["find: ‘/home/zuul/tempest/.testrepository’: No such file or directory"], "stdout": "Traceback (most recent call last):\n File \"/usr/bin/subunit2html\", line 10, in <module>\n sys.exit(main())\n File \"/usr/lib/python2.7/site-packages/os_testr/subunit2html.py\", line 740, in main\n stream = open(subunit_file, 'rb')\nIOError: [Errno 2] No such file or directory: '/home/zuul/tempest/tempest.html'", "stdout_lines": ["Traceback (most recent call last):", " File \"/usr/bin/subunit2html\", line 10, in <module>", " sys.exit(main())", " File \"/usr/lib/python2.7/site-packages/os_testr/subunit2html.py\", line 740, in main", " stream = open(subunit_file, 'rb')", "IOError: [Errno 2] No such file or directory: '/home/zuul/tempest/tempest.html'"]}

I think we need to seperate the html generation part. otherwise it is very hard to go through logs and find issue.

Tags: alert ci
summary: - Fs020 periodic jobs fails to generate tempest results once tempest run
- finishes
+ Fs016/17 periodic jobs fails to generate tempest results once tempest
+ run finishes
Revision history for this message
Martin Kopec (mkopec) wrote : Re: Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response code received
summary: - Fs016/17 periodic jobs fails to generate tempest results once tempest
- run finishes
+ Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response
+ code received
Revision history for this message
Marios Andreou (marios-b) wrote :

duplicating comment from https://bugs.launchpad.net/tripleo/+bug/1796040 just now since it will be marked duplicate for this one:

master gate check hitting this today too http://logs.openstack.org/98/604298/20/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/037bd06/job-output.txt.gz#_2018-10-04_04_39_49_234612 2018-10-04 04:39:48.087051

primary | TASK [validate-tempest : Exit with tempest result code if configured] **********
2018-10-04 04:39:48.127135 | primary | Thursday 04 October 2018 04:39:48 +0000 (0:00:02.239) 0:04:26.387 ******
2018-10-04 04:39:49.234612 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "tail -10 tempest.log; exit 1", "delta": "0:00:00.012430", "end": "2018-10-04 04:39:49.184002", "msg": "non-zero return code", "rc": 1, "start": "2018-10-04 04:39:49.171572", "stderr": "", "stderr_lines": [], "stdout": "2018-10-04 04:39:18 | - Skipped: 0\n2018-10-04 04:39:18 | - Expected Fail: 0\n2018-10-04 04:39:18 | - Unexpected Success: 0\n2018-10-04 04:39:18 | - Failed: 1\n2018-10-04 04:39:18 | Sum of execute time for each test: 23.6150 sec.\n2018-10-04 04:39:18 | \n2018-10-04 04:39:18 | ==============\n2018-10-04 04:39:18 | Worker Balance\n2018-10-04 04:39:18 | ==============\n2018-10-04 04:39:18 | - Worker 0 (1 tests) => 0:00:23.614971", "stdout_lines": ["2018-10-04 04:39:18 | - Skipped: 0", "2018-10-04 04:39:18 | - Expected Fail: 0", "2018-10-04 04:39:18 | - Unexpected Success: 0", "2018-10-04 04:39:18 | - Failed: 1", "2018-10-04 04:39:18 | Sum of execute time for each test: 23.6150 sec.", "2018-10-04 04:39:18 | ", "2018-10-04 04:39:18 | ==============", "2018-10-04 04:39:18 | Worker Balance", "2018-10-04 04:39:18 | ==============", "2018-10-04 04:39:18 | - Worker 0 (1 tests) => 0:00:23.614971"]}

Revision history for this message
Martin Kopec (mkopec) wrote :
summary: - Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response
- code received
+ Jobs fails on validate-tempest: Unexpected response code received
tags: added: alert
tags: added: ci
Revision history for this message
Marios Andreou (marios-b) wrote : Re: Jobs fails on validate-tempest: Unexpected response code received
Revision history for this message
Marios Andreou (marios-b) wrote :

arx thinks this will fix it 11:52 #oooq: < arxcruz> marios|rover: https://review.openstack.org/605419 should fix it

summary: - Jobs fails on validate-tempest: Unexpected response code received
+ multinode-oooq-container scenarios and Fs016/17/20 periodic jobs fails
+ on validate-tempest: Unexpected response code received
Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
wes hayutin (weshayutin) wrote :
Download full text (4.0 KiB)

http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/containers/nova/nova-compute.log.txt.gz#_2018-10-04_06_46_11_197

2018-10-04 06:46:11.197 8 ERROR nova.compute.manager [req-acf88fb6-a57f-42e6-a5fe-4ed401bf778e - - - - -] Error updating resources for node centos-7-ovh-gra1-0002589272.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider centos-7-ovh-gra1-0002589272.localdomain
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager Traceback (most recent call last):
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7753, in _update_available_resource_for_node
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager rt.update_available_resource(context, nodename, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 725, in update_available_resource
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return f(*args, **kwargs)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 802, in _update_available_resource
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update(context, cn, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return attempt.get(self._wrap_exception)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 977, in _update
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 916, in _update_to_placement
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager context, compute_node.uuid, name=compute_node.hypervisor_hostname)
2018-10-04 06:46:11.197 8 ERROR nova.compu...

Read more...

Revision history for this message
Marios Andreou (marios-b) wrote :
Download full text (6.8 KiB)

thanks weshay for pointer here are the relevant failing tests - there are two different traces afaics - but both traces have a "No valid host was found. ', u'code': 500, u'created': u'2018-10-04T04:41:14" thanks arxcruz via irc relevant error from nova: http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/extra/errors.txt.gz#_2018-10-04_06_36_10_504 and here http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/containers/nova/nova-compute.log.txt.gz#_2018-10-04_06_45_08_354

so this looks to be a nova error.

Tempest traces fyi/fwiw 2 types:

2018-10-04 04:50:16 | {0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops [28.607288s] ... FAILED

  2018-10-04 04:41:18 | {0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern [29.796705s] ... FAILED
  2018-10-04 04:41:18 |
  2018-10-04 04:41:18 | Captured traceback:
  2018-10-04 04:41:18 | ~~~~~~~~~~~~~~~~~~~
  2018-10-04 04:41:18 | Traceback (most recent call last):
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
  2018-10-04 04:41:18 | return f(*func_args, **func_kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 104, in test_volume_boot_pattern
  2018-10-04 04:41:18 | security_group=security_group)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 64, in _boot_instance_from_resource
  2018-10-04 04:41:18 | return self.create_server(image_id='', **create_kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 214, in create_server
  2018-10-04 04:41:18 | image_id=image_id, **kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 258, in create_test_server
  2018-10-04 04:41:18 | server['id'])
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2018-10-04 04:41:18 | self.force_reraise()
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2018-10-04 04:41:18 | six.reraise(self.type_, self.value, self.tb)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 229, in create_test_server
  2018-10-04 04:41:18 | clients.servers_client, server['id'], wait_until)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 76, in wait_for_server_status
  2018-10-04 04:41:18 | server_id=server_id)
  2018-10-04 04:41:18 | tempest.exceptions.BuildErrorException: Server 967aa07e-804d-4405-bda0-322d87b25b18 failed to build and is in ERROR status
  2018-10-04 04:41:18 | Details: {u'message': u'No valid host was found. ', u'code': 500, u'created': u'2...

Read more...

Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/607987

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/607952
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Submitter: Zuul
Branch: master

commit f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Thu Oct 4 15:52:40 2018 +0300

    Fix placement region setting

    We were using a deprecated interfce to set this value. This uses the
    correct one.

    Closes-Bug: #1793665
    Change-Id: Ib7717911aba3267f855ac6682b0144bfe92034fb

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.0.0

This issue was fixed in the openstack/tripleo-heat-templates 10.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/607987

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.