multinode-oooq-container scenarios and Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response code received

Bug #1793665 reported by chandan kumar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Juan Antonio Osorio Robles

Bug Description

In most of the fs016 periodic jobs once tempest run finishes, the validate-tempest role fails to generate tempest.html result. It makes very hard to user to debug and go to check tempest.log file for the tempest errors.
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/2723047/job-output.txt.gz#_2018-09-20_02_35_29_598462

2018-09-20 02:35:29.598462 | primary | TASK [validate-tempest : Generate testrepository.subunit results file] *********
2018-09-20 02:35:29.615521 | primary | Thursday 20 September 2018 02:35:29 +0000 (0:00:00.672) 0:18:53.370 ****
2018-09-20 02:35:30.688127 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail && testr last --subunit | subunit-1to2 > /home/zuul/tempest/testrepository.subunit", "delta": "0:00:00.381209", "end": "2018-09-20 02:35:30.669697", "msg": "non-zero return code", "rc": 127, "start": "2018-09-20 02:35:30.288488", "stderr": "/bin/sh: testr: command not found", "stderr_lines": ["/bin/sh: testr: command not found"], "stdout": "", "stdout_lines": []}
2018-09-20 02:35:30.688196 | primary | ...ignoring
2018-09-20 02:35:30.712744 | primary |
2018-09-20 02:35:30.712900 | primary | TASK [validate-tempest : Generate HTML results file] ***************************
2018-09-20 02:35:30.733215 | primary | Thursday 20 September 2018 02:35:30 +0000 (0:00:01.117) 0:18:54.488 ****
2018-09-20 02:35:31.726561 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail &&\n subunit2html $(find /home/zuul/tempest/.testrepository -name [0-9] | head -1) /home/zuul/tempest/tempest.html 2>&1 >> tempest.log", "delta": "0:00:00.360371", "end": "2018-09-20 02:35:31.706626", "msg": "non-zero return code", "rc": 1, "start": "2018-09-20 02:35:31.346255", "stderr": "find: ‘/home/zuul/tempest/.testrepository’: No such file or directory", "stderr_lines": ["find: ‘/home/zuul/tempest/.testrepository’: No such file or directory"], "stdout": "Traceback (most recent call last):\n File \"/usr/bin/subunit2html\", line 10, in <module>\n sys.exit(main())\n File \"/usr/lib/python2.7/site-packages/os_testr/subunit2html.py\", line 740, in main\n stream = open(subunit_file, 'rb')\nIOError: [Errno 2] No such file or directory: '/home/zuul/tempest/tempest.html'", "stdout_lines": ["Traceback (most recent call last):", " File \"/usr/bin/subunit2html\", line 10, in <module>", " sys.exit(main())", " File \"/usr/lib/python2.7/site-packages/os_testr/subunit2html.py\", line 740, in main", " stream = open(subunit_file, 'rb')", "IOError: [Errno 2] No such file or directory: '/home/zuul/tempest/tempest.html'"]}

I think we need to seperate the html generation part. otherwise it is very hard to go through logs and find issue.

Tags: alert ci
summary: - Fs020 periodic jobs fails to generate tempest results once tempest run
- finishes
+ Fs016/17 periodic jobs fails to generate tempest results once tempest
+ run finishes
Revision history for this message
Martin Kopec (mkopec) wrote : Re: Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response code received
summary: - Fs016/17 periodic jobs fails to generate tempest results once tempest
- run finishes
+ Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response
+ code received
Revision history for this message
Marios Andreou (marios-b) wrote :

duplicating comment from https://bugs.launchpad.net/tripleo/+bug/1796040 just now since it will be marked duplicate for this one:

master gate check hitting this today too http://logs.openstack.org/98/604298/20/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/037bd06/job-output.txt.gz#_2018-10-04_04_39_49_234612 2018-10-04 04:39:48.087051

primary | TASK [validate-tempest : Exit with tempest result code if configured] **********
2018-10-04 04:39:48.127135 | primary | Thursday 04 October 2018 04:39:48 +0000 (0:00:02.239) 0:04:26.387 ******
2018-10-04 04:39:49.234612 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "tail -10 tempest.log; exit 1", "delta": "0:00:00.012430", "end": "2018-10-04 04:39:49.184002", "msg": "non-zero return code", "rc": 1, "start": "2018-10-04 04:39:49.171572", "stderr": "", "stderr_lines": [], "stdout": "2018-10-04 04:39:18 | - Skipped: 0\n2018-10-04 04:39:18 | - Expected Fail: 0\n2018-10-04 04:39:18 | - Unexpected Success: 0\n2018-10-04 04:39:18 | - Failed: 1\n2018-10-04 04:39:18 | Sum of execute time for each test: 23.6150 sec.\n2018-10-04 04:39:18 | \n2018-10-04 04:39:18 | ==============\n2018-10-04 04:39:18 | Worker Balance\n2018-10-04 04:39:18 | ==============\n2018-10-04 04:39:18 | - Worker 0 (1 tests) => 0:00:23.614971", "stdout_lines": ["2018-10-04 04:39:18 | - Skipped: 0", "2018-10-04 04:39:18 | - Expected Fail: 0", "2018-10-04 04:39:18 | - Unexpected Success: 0", "2018-10-04 04:39:18 | - Failed: 1", "2018-10-04 04:39:18 | Sum of execute time for each test: 23.6150 sec.", "2018-10-04 04:39:18 | ", "2018-10-04 04:39:18 | ==============", "2018-10-04 04:39:18 | Worker Balance", "2018-10-04 04:39:18 | ==============", "2018-10-04 04:39:18 | - Worker 0 (1 tests) => 0:00:23.614971"]}

Revision history for this message
Martin Kopec (mkopec) wrote :
summary: - Fs016/17/20 periodic jobs fails on validate-tempest: Unexpected response
- code received
+ Jobs fails on validate-tempest: Unexpected response code received
tags: added: alert
tags: added: ci
Revision history for this message
Marios Andreou (marios-b) wrote : Re: Jobs fails on validate-tempest: Unexpected response code received
Revision history for this message
Marios Andreou (marios-b) wrote :

arx thinks this will fix it 11:52 #oooq: < arxcruz> marios|rover: https://review.openstack.org/605419 should fix it

summary: - Jobs fails on validate-tempest: Unexpected response code received
+ multinode-oooq-container scenarios and Fs016/17/20 periodic jobs fails
+ on validate-tempest: Unexpected response code received
Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
wes hayutin (weshayutin) wrote :
Download full text (4.0 KiB)

http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/containers/nova/nova-compute.log.txt.gz#_2018-10-04_06_46_11_197

2018-10-04 06:46:11.197 8 ERROR nova.compute.manager [req-acf88fb6-a57f-42e6-a5fe-4ed401bf778e - - - - -] Error updating resources for node centos-7-ovh-gra1-0002589272.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider centos-7-ovh-gra1-0002589272.localdomain
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager Traceback (most recent call last):
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7753, in _update_available_resource_for_node
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager rt.update_available_resource(context, nodename, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 725, in update_available_resource
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return f(*args, **kwargs)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 802, in _update_available_resource
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update(context, cn, startup=startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager return attempt.get(self._wrap_exception)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 977, in _update
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 916, in _update_to_placement
2018-10-04 06:46:11.197 8 ERROR nova.compute.manager context, compute_node.uuid, name=compute_node.hypervisor_hostname)
2018-10-04 06:46:11.197 8 ERROR nova.compu...

Read more...

Revision history for this message
Marios Andreou (marios-b) wrote :
Download full text (6.8 KiB)

thanks weshay for pointer here are the relevant failing tests - there are two different traces afaics - but both traces have a "No valid host was found. ', u'code': 500, u'created': u'2018-10-04T04:41:14" thanks arxcruz via irc relevant error from nova: http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/extra/errors.txt.gz#_2018-10-04_06_36_10_504 and here http://logs.openstack.org/81/602681/3/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/16ad793/logs/subnode-2/var/log/containers/nova/nova-compute.log.txt.gz#_2018-10-04_06_45_08_354

so this looks to be a nova error.

Tempest traces fyi/fwiw 2 types:

2018-10-04 04:50:16 | {0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops [28.607288s] ... FAILED

  2018-10-04 04:41:18 | {0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern [29.796705s] ... FAILED
  2018-10-04 04:41:18 |
  2018-10-04 04:41:18 | Captured traceback:
  2018-10-04 04:41:18 | ~~~~~~~~~~~~~~~~~~~
  2018-10-04 04:41:18 | Traceback (most recent call last):
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
  2018-10-04 04:41:18 | return f(*func_args, **func_kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 104, in test_volume_boot_pattern
  2018-10-04 04:41:18 | security_group=security_group)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 64, in _boot_instance_from_resource
  2018-10-04 04:41:18 | return self.create_server(image_id='', **create_kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 214, in create_server
  2018-10-04 04:41:18 | image_id=image_id, **kwargs)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 258, in create_test_server
  2018-10-04 04:41:18 | server['id'])
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2018-10-04 04:41:18 | self.force_reraise()
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2018-10-04 04:41:18 | six.reraise(self.type_, self.value, self.tb)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 229, in create_test_server
  2018-10-04 04:41:18 | clients.servers_client, server['id'], wait_until)
  2018-10-04 04:41:18 | File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 76, in wait_for_server_status
  2018-10-04 04:41:18 | server_id=server_id)
  2018-10-04 04:41:18 | tempest.exceptions.BuildErrorException: Server 967aa07e-804d-4405-bda0-322d87b25b18 failed to build and is in ERROR status
  2018-10-04 04:41:18 | Details: {u'message': u'No valid host was found. ', u'code': 500, u'created': u'2...

Read more...

Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/607987

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/607952
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Submitter: Zuul
Branch: master

commit f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Thu Oct 4 15:52:40 2018 +0300

    Fix placement region setting

    We were using a deprecated interfce to set this value. This uses the
    correct one.

    Closes-Bug: #1793665
    Change-Id: Ib7717911aba3267f855ac6682b0144bfe92034fb

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.0.0

This issue was fixed in the openstack/tripleo-heat-templates 10.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/607987

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers