Too many metadata agents on compute node in DVR

Bug #1575724 reported by Ilya Shakhat
This bug affects 4 people
Affects: Mirantis OpenStack (status tracked in 10.0.x)

Series   Status          Importance   Assigned to
10.0.x   Fix Committed   High         Sergey Kolekonov
7.0.x    Won't Fix       High         MOS Maintenance
8.0.x    Won't Fix       High         MOS Maintenance
9.x      Fix Released    High         Sergey Kolekonov

Bug Description

Detailed bug description:
  There are N (N = number of cores) neutron-metadata-agent processes on each compute node. Each process holds its own connection to RabbitMQ, resulting in N * number_of_computes total connections (almost 10,000 at 200 nodes with 48 cores each).
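The connection count above is a simple product of node count and per-node worker count; a back-of-the-envelope sketch (assuming, as described, one AMQP connection per worker and one worker per core):

```python
# Estimate cluster-wide RabbitMQ connections from metadata-agent workers.
# Assumption (per the report): one connection per worker, one worker per core.
def total_connections(compute_nodes: int, cores_per_node: int) -> int:
    return compute_nodes * cores_per_node

print(total_connections(200, 48))  # 9600 connections cluster-wide
```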

Steps to reproduce:
  Deploy MOS with Neutron DVR. Boot instances on the compute node (see comment #3 for the reasoning).

Expected results:
  I'd expect fewer metadata-agent processes, since having one per core looks like overkill.

Actual result:
  One process per core.

Reproducibility:
  Always; just deploy.

Workaround:
  None.

Impact:
  Too many connections to RabbitMQ at large scale.

Description of the environment:
- Operating system: Ubuntu
- Versions of components: MOS 9.0
- Reference architecture: MOS classic, Neutron DVR
- Network model: VxLAN
- Related projects installed: none
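The report lists no workaround, but for reference, the Neutron metadata agent exposes a `metadata_workers` option in metadata_agent.ini that caps the worker count instead of defaulting to one per core. A sketch (whether the deployed version honors it for this case is an assumption, not confirmed in this report):

```ini
# /etc/neutron/metadata_agent.ini
[DEFAULT]
# Cap the number of metadata-agent worker processes (and hence
# per-node RabbitMQ connections) instead of spawning one per core.
metadata_workers = 2
```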

Revision history for this message
Ilya Shakhat (shakhat) wrote :

root@node-2:~# lsof -i 4 | grep 5673 | grep neutron-m
neutron-m 48784 neutron 10u IPv4 189433 0t0 TCP messaging-node-2.domain.tld:58859->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48784 neutron 13u IPv4 13134201 0t0 TCP messaging-node-2.domain.tld:33659->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48795 neutron 10u IPv4 12755678 0t0 TCP messaging-node-2.domain.tld:33796->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48797 neutron 10u IPv4 13024039 0t0 TCP messaging-node-2.domain.tld:48403->messaging-node-138.domain.tld:5673 (ESTABLISHED)
neutron-m 48797 neutron 12u IPv4 1328637 0t0 TCP messaging-node-2.domain.tld:59400->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48798 neutron 10u IPv4 12610008 0t0 TCP messaging-node-2.domain.tld:51428->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48798 neutron 12u IPv4 1315489 0t0 TCP messaging-node-2.domain.tld:59454->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48799 neutron 10u IPv4 13024033 0t0 TCP messaging-node-2.domain.tld:33744->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48800 neutron 10u IPv4 12607908 0t0 TCP messaging-node-2.domain.tld:51398->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48800 neutron 12u IPv4 1791292 0t0 TCP messaging-node-2.domain.tld:39521->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48801 neutron 10u IPv4 12904059 0t0 TCP messaging-node-2.domain.tld:48406->messaging-node-138.domain.tld:5673 (ESTABLISHED)
neutron-m 48802 neutron 11u IPv4 3129010 0t0 TCP messaging-node-2.domain.tld:59729->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48802 neutron 12u IPv4 3126438 0t0 TCP messaging-node-2.domain.tld:59730->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48803 neutron 10u IPv4 13118538 0t0 TCP messaging-node-2.domain.tld:48412->messaging-node-138.domain.tld:5673 (ESTABLISHED)
neutron-m 48803 neutron 12u IPv4 1724330 0t0 TCP messaging-node-2.domain.tld:39476->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48805 neutron 10u IPv4 12904111 0t0 TCP messaging-node-2.domain.tld:40052->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48805 neutron 11u IPv4 1763638 0t0 TCP messaging-node-2.domain.tld:39466->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48805 neutron 12u IPv4 1790597 0t0 TCP messaging-node-2.domain.tld:39467->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48807 neutron 11u IPv4 949184 0t0 TCP messaging-node-2.domain.tld:59391->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48807 neutron 12u IPv4 1332763 0t0 TCP messaging-node-2.domain.tld:59392->messaging-node-140.domain.tld:5673 (ESTABLISHED)
neutron-m 48808 neutron 10u IPv4 12904058 0t0 TCP messaging-node-2.domain.tld:48404->messaging-node-138.domain.tld:5673 (ESTABLISHED)
neutron-m 48808 neutron 12u IPv4 2038553 0t0 TCP messaging-node-2.domain.tld:43495->messa...


Revision history for this message
Ilya Shakhat (shakhat) wrote :

root@node-2:~# ps aux | grep metadata-agent
neutron 48784 0.0 0.0 190480 50016 ? Ss Apr25 1:25 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48795 0.0 0.0 190716 46780 ? S Apr25 0:41 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48796 0.0 0.0 188936 44176 ? S Apr25 0:00 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48797 0.0 0.0 191456 47372 ? S Apr25 0:43 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48798 0.0 0.0 190968 46832 ? S Apr25 0:42 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48799 0.0 0.0 190748 46640 ? S Apr25 0:40 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48800 0.0 0.0 190968 46856 ? S Apr25 0:43 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48801 0.0 0.0 191192 47280 ? S Apr25 0:41 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48802 0.0 0.0 190460 46440 ? S Apr25 0:41 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48803 0.0 0.0 191184 47284 ? S Apr25 0:43 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48804 0.0 0.0 188936 44132 ? S Apr25 0:00 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48805 0.0 0.0 190972 46872 ? S Apr25 0:43 /usr/bin/python2.7 /usr/bin/neutron-metadata-agent --config-file=/etc/neutron/metadata_agent.ini --log-file=/var/log/neutron/neutron-metadata-agent.log --config-file=/etc/neutron/neutron.conf
neutron 48806 0.0 0.0 188936 44132 ? ...

Changed in mos:
milestone: none → 9.0
assignee: nobody → MOS Neutron (mos-neutron)
Revision history for this message
Ilya Shakhat (shakhat) wrote :

Child processes open connections only after receiving a request from an instance, and the connection then stays open forever. With a large number of instances, every metadata-agent worker eventually ends up holding a connection to RabbitMQ.

2016-04-25 12:19:15.039 48784 DEBUG oslo_service.service [req-e457cbd0-8ae0-4735-a6a0-7675b04af2bc - - - - -] Started child 48840 _start_child /usr/lib/python2.7/dist-packages/oslo_service/service.py:475
2016-04-25 12:19:15.041 48840 DEBUG neutron.callbacks.manager [-] Notify callbacks for process, after_create _notify_loop /usr/lib/python2.7/dist-packages/neutron/callbacks/manager.py:138
2016-04-25 12:19:15.042 48840 INFO eventlet.wsgi.server [-] (48840) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2016-04-25 17:03:46.379 48840 DEBUG eventlet.wsgi.server [-] (48840) accepted '' server /usr/lib/python2.7/dist-packages/eventlet/wsgi.py:867
2016-04-25 17:03:46.380 48840 DEBUG neutron.agent.metadata.agent [-] Request: GET /openstack/2013-10-17/vendor_data.json HTTP/1.0
2016-04-25 17:03:46.381 48840 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "('neutron.agent.metadata.agent.MetadataProxyHandler._get_router_networks', '42d52119-d1aa-43d0-b82f-8a425a04df97')" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:212
2016-04-25 17:03:46.381 48840 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "('neutron.agent.metadata.agent.MetadataProxyHandler._get_router_networks', '42d52119-d1aa-43d0-b82f-8a425a04df97')" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:225
2016-04-25 17:03:46.382 48840 DEBUG oslo.messaging._drivers.pool [-] Pool creating new connection create /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/pool.py:109
2016-04-25 17:03:46.387 48840 DEBUG oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.69:5673 __init__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:539
2016-04-25 17:03:46.396 48840 DEBUG oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.0.69:5673 via [amqp] client __init__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:566
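The lazy-connection behavior visible in the log above can be sketched as follows. This is a deliberately simplified illustration; the class and method names are hypothetical and do not match Neutron's internals:

```python
# Per-worker lazy-connection pattern: each worker opens its AMQP
# connection on the first request it serves and never closes it,
# so a busy node converges to one connection per worker.
class Broker:
    def __init__(self):
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1
        return object()  # stand-in for a real AMQP connection

class Worker:
    def __init__(self, broker):
        self.broker = broker
        self.connection = None  # nothing opened until first request

    def handle_request(self, request):
        if self.connection is None:
            self.connection = self.broker.connect()  # stays open forever
        return self.connection

broker = Broker()
workers = [Worker(broker) for _ in range(48)]  # one worker per core
for w in workers:
    for _ in range(10):                  # many requests per worker...
        w.handle_request("GET /metadata")
print(broker.open_connections)           # ...but one connection each: 48
```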

Ilya Shakhat (shakhat)
tags: added: scale
Ilya Shakhat (shakhat)
description: updated
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

We need to limit this, obviously.

Changed in mos:
importance: Undecided → High
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

By the way, we need to limit it on controllers as well in DVR mode.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

This needs to be fixed in the 9.0 release.

Revision history for this message
Sergey Kolekonov (skolekonov) wrote :
Revision history for this message
Sergey Kolekonov (skolekonov) wrote :
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

Both patches are merged.

Dina Belova (dbelova)
tags: added: area-nova
Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Setting as Won't Fix for 7.0 and 8.0, as we don't change the design of already shipped products. Also, the bug does not clearly describe the user impact; nobody has analyzed the consequences of having "too many connections".
