Unexpected API Error on large scale: MessagingTimeout

Bug #1598156 reported by Andrew Kalach
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Fix Released
High
Andrew Kalach
9.x
Fix Released
High
Andrew Kalach

Bug Description

Description
===========
During 9.0RC2 Certification Testing Sutie run, the following tasks:

  CinderVolumes.create_and_attach_volume
  NovaServers.boot_and_live_migrate_server

failed with error:

Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. <class 'oslo_messaging.exceptions.MessagingTimeout'> (HTTP 500) (Request-ID: req-4bffec57-e890-486d-b197-2edf5aa19e97)

Steps to reproduce
==================
Install Rally on masternode and run specified rally tasks (see attachment):

1. cd /opt/stack
2. source .venv/bin/activate
3. rally task start <task YAML file>

Expected result
===============
Task run without errors

Actual result
=============
The following error logs:

2016-06-27T14:15:36.672815+00:00 err: 2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] Unexpected exception in API method
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/volumes.py", line 316, in create
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions volume_id, device)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 170, in wrapped
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions return func(self, context, target, *args, **kwargs)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 159, in inner
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions return function(self, context, instance, *args, **kwargs)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 140, in inner
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions return f(self, context, instance, *args, **kw)
2016-06-27 14:15:36.673 21518 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packag
2016-06-27T14:15:36.676579+00:00 info: 2016-06-27 14:15:36.681 21518 INFO nova.api.openstack.wsgi [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.exceptions.MessagingTimeout'>

Environment
===========
Scale 200-node lab (ENV-10):
  3 controllers
  199 compute nodes
  20 ceph nodes
  DVR enabled
  XVLAN
Image RC2 build:
  fuel-9.0-mos-495-2016-06-16_18-18-00.iso

Diagnostic snapshot:
  http://mos-scale-share.mirantis.com/fuel-snapshot-2016-06-28_13-13-50.tar.gz

Revision history for this message
Andrew Kalach (akndex) wrote :
Revision history for this message
Andrew Kalach (akndex) wrote :
Revision history for this message
Andrew Kalach (akndex) wrote :
description: updated
Dina Belova (dbelova)
tags: added: area-nova
Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
status: New → Confirmed
milestone: none → 9.1
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :
Download full text (3.7 KiB)

I see that nova-compute fails to reply to an RPC call:

fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.465 25594 DEBUG oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] The reply ec15e0ead09e46569f1145672d2ecd56 cannot be sent reply_d5de362462464ed39adce35fe468a538 reply queue don't exist, retrying... reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:130
fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.716 25594 DEBUG oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] REPLY msg_id: ec15e0ead09e46569f1145672d2ecd56 size: 789 reply queue: reply_d5de362462464ed39adce35fe468a538 time elapsed: 60.044688078s _send_reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:96
fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.719 25594 DEBUG oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] The reply ec15e0ead09e46569f1145672d2ecd56 cannot be sent reply_d5de362462464ed39adce35fe468a538 reply queue don't exist, retrying... reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:130
fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.970 25594 DEBUG oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] REPLY msg_id: ec15e0ead09e46569f1145672d2ecd56 size: 789 reply queue: reply_d5de362462464ed39adce35fe468a538 time elapsed: 60.298709969s _send_reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:96
fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.973 25594 WARNING oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] reply_d5de362462464ed39adce35fe468a538 doesn't exists, drop reply to ec15e0ead09e46569f1145672d2ecd56
fuel-snapshot-2016-06-28_13-13-50/node-184/var/log/nova/nova-compute.log.1:2016-06-27 14:15:36.974 25594 INFO oslo_messaging._drivers.amqpdriver [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] The reply ec15e0ead09e46569f1145672d2ecd56 cannot be sent reply_d5de362462464ed39adce35fe468a538 reply queue don't exist after 60 sec abandoning...

which eventually times out on the nova-api side:

2016-06-27T14:15:36.676579+00:00 info: 2016-06-27 14:15:36.681 21518 INFO nova.api.openstack.wsgi [req-4bffec57-e890-486d-b197-2edf5aa19e97 eb078fc45f524ec3a33276a9ebc4a307 82306fa61eae4bb99a308d72c919afe8 - - -] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.exceptions.Messaging...

Read more...

Changed in mos:
assignee: MOS Nova (mos-nova) → MOS Oslo (mos-oslo)
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Andrew, please reproduce the problem and attach more logs. Namely, we need:
 * /var/log/rabbitmq/* logs from all controllers
 * /var/log/pacemaker.log* from all controllers
 * /var/log/remote/node-X.domain.tld/lrmd.log from master node, where node-X is hostname of each controllers

Also, we need all the logs you can find for the _client side_. For example, in the bug nova-api sent a request to nova-compute and it timed out. It would help if we had all the nova-api.log files from the nova-api where MessageTimeout occurred, not just nova-api.log.6.gz.

It might be even better if you give me access to an env where issue occurred, it would help in case I have forgotten to mention some log file.

Changed in mos:
status: Confirmed → Incomplete
assignee: MOS Oslo (mos-oslo) → Andrew Kalach (akndex)
Revision history for this message
Andrew Kalach (akndex) wrote :

*UPDATE*

The same error appeared during scale testing on MOS9.1

Full logs (diagnostic snapshot) canbe found at:

  http://mos-scale-share.mirantis.com/fuel-snapshot-2016-08-17_13-25-23.tar.gz

Controller logs (nova, rabbitmq, pacemaker, etc):

  http://mos-scale-share.mirantis.com/controller-logs-160818-e10.tar.gz

We tested on scale lab ENV-10
(contact me via email or slack for lab access details)

Andrew Kalach (akndex)
Changed in mos:
status: Incomplete → Confirmed
tags: added: area-scale
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.