Comment 8 for bug 1362863

Sam Morrison (sorrison) wrote :

Configs in nova and cinder:
rabbit_hosts="mq:5671,mq1:5671,mq2:5671,mq3:5671"
rabbit_ha_queues=True
rabbit_userid=nova
rabbit_password=XXXXXXX
rabbit_durable_queues=True
rabbit_use_ssl=True

Some related package versions:

apt-cache policy librabbitmq1
librabbitmq1:
  Installed: 0.4.1-1
  Candidate: 0.4.1-1

apt-cache policy python-amqp
python-amqp:
  Installed: 1.3.3-1ubuntu1
  Candidate: 1.3.3-1ubuntu1

apt-cache policy python-amqplib
python-amqplib:
  Installed: 1.0.2-1
  Candidate: 1.0.2-1

apt-cache policy python-kombu
python-kombu:
  Installed: 3.0.7-1ubuntu1
  Candidate: 3.0.7-1ubuntu1

The reply_xxxx queue in RabbitMQ that has unacked messages does have a consumer. Using the consumer's port number, I traced it back to the offending API host and found that it belongs to a nova-api process with an established connection to rabbit. I then ran strace on that process to see what it is doing. All it shows is:

gettimeofday({1409785170, 966963}, NULL) = 0
gettimeofday({1409785170, 967066}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=39432335262744583}}}, 1023, 482100) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 7, {EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|0xe0a1800, {u32=32681, u64=22396489217114025}}) = 0
accept(7, 0x7fffdf911350, [16]) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=7, u64=39432335262744583}}) = 0
gettimeofday({1409785188, 431971}, NULL) = 0
gettimeofday({1409785188, 432035}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=39432335262744583}}}, 1023, 464636) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 7, {EPOLLWRNORM|EPOLLERR|EPOLLONESHOT|0xe0a1800, {u32=32681, u64=22396489217114025}}) = 0
accept(7, 0x7fffdf911350, [16]) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=7, u64=39432335262744583}}) = 0
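To make the strace output above easier to read, here is a rough sketch that pulls out which descriptor epoll reports as ready and how the epoll timeout changes between the two wakeups. It assumes the u32 field of the epoll data holds the file descriptor (the accept(7, ...) call right after each wakeup suggests it does); the regexes and the summarize helper are only for illustration and are not part of nova or strace.

```python
import re

# Two wakeups copied from the strace output above (epoll_ctl lines omitted).
STRACE = """\
gettimeofday({1409785170, 967066}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=39432335262744583}}}, 1023, 482100) = 1
accept(7, 0x7fffdf911350, [16]) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1409785188, 432035}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=7, u64=39432335262744583}}}, 1023, 464636) = 1
accept(7, 0x7fffdf911350, [16]) = -1 EAGAIN (Resource temporarily unavailable)
"""

TIME_RE = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")
EPOLL_RE = re.compile(r"epoll_wait\(\d+, \{(.*)\}, \d+, (-?\d+)\) = \d+")
READY_FD_RE = re.compile(r"u32=(\d+)")


def summarize(trace):
    """Return (timestamps in microseconds, epoll timeouts in ms, ready fds)."""
    times_us, timeouts_ms, ready = [], [], set()
    for line in trace.splitlines():
        m = TIME_RE.match(line)
        if m:
            times_us.append(int(m.group(1)) * 1000000 + int(m.group(2)))
            continue
        m = EPOLL_RE.match(line)
        if m:
            timeouts_ms.append(int(m.group(2)))
            # Assumption: u32 in the epoll data is the ready file descriptor.
            ready.update(int(fd) for fd in READY_FD_RE.findall(m.group(1)))
    return times_us, timeouts_ms, ready


times_us, timeouts_ms, ready_fds = summarize(STRACE)
elapsed_ms = (times_us[1] - times_us[0]) // 1000
timeout_drop_ms = timeouts_ms[0] - timeouts_ms[1]
print(ready_fds, elapsed_ms, timeout_drop_ms)
```

In this excerpt the only ready descriptor is 7, the listening socket that accept() is called on, and the epoll timeout shrinks by 17464 ms over 17464 ms of wall-clock time. That would be consistent with a timer counting down to a fixed deadline while nothing is ever reported as readable on the rabbit connection, though two wakeups are not much to go on.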

Neither the API host nor the rabbit and DB hosts are under any load.

This happens to us multiple times a day, so it would be easy for me to gather more information. I just don't know how to debug further.
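For anyone who wants to spot the same condition, the check for reply queues with unacked messages can be sketched roughly as below. It assumes tab-separated output from "rabbitmqctl list_queues name messages_unacknowledged consumers"; the stuck_reply_queues helper and the sample listing are made up for illustration and are not real data from our brokers.

```python
def stuck_reply_queues(listing):
    """Return (name, unacked, consumers) for reply_ queues holding unacked messages."""
    stuck = []
    for line in listing.splitlines():
        parts = line.split("\t")
        # Skip banner lines and queues that are not RPC reply queues.
        if len(parts) != 3 or not parts[0].startswith("reply_"):
            continue
        name, unacked, consumers = parts[0], int(parts[1]), int(parts[2])
        if unacked > 0:
            stuck.append((name, unacked, consumers))
    return stuck


# Hypothetical sample in the shape rabbitmqctl prints (not real output).
SAMPLE = (
    "Listing queues ...\n"
    "reply_aaaa\t0\t1\n"
    "reply_bbbb\t3\t1\n"
    "compute\t0\t4\n"
    "...done.\n"
)
print(stuck_reply_queues(SAMPLE))
```

A queue that shows up here with a non-zero consumer count matches what I'm seeing: the consumer exists, but the messages delivered to it are never acked.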