Comment 6 for bug 1680195

Michele Baldessari (michele) wrote :

So I poked at the environment Sagi gave me, and this does indeed seem to be a neutron-l3-ha thing. Brent Eagles and John Eckersberg did most of the work looking at the env.

You can ping the floating IP from inside the proper namespace on the node with the active l3 router:
[root@overcloud-controller-2 neutron]# ip netns exec qdhcp-760bbb48-b24b-4d6e-8ac8-db477de6019a ping 10.0.0.102
PING 10.0.0.102 (10.0.0.102) 56(84) bytes of data.
64 bytes from 10.0.0.102: icmp_seq=1 ttl=63 time=4.11 ms
64 bytes from 10.0.0.102: icmp_seq=2 ttl=63 time=0.986 ms

But you can't from outside.

Another thing worth noting is how much ceilometer and swift seem to hammer the controllers in general. They seem to monopolize the CPU quite a bit. Namely, ceilometer generates more notifications than it consumes:
the incoming notification rate is 4 and the outgoing notification rate is 2 (meaning twice as many messages arrive as we are able to process)

There are about 190k unconsumed notifications in rabbit:
sqlite> select name, messages from queues order by messages desc limit 10;
notifications.info|187502
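
A rough back-of-the-envelope sketch of what those numbers imply, using only the rates and queue depth quoted above. The time unit of the rates is not stated here, so the result is in "unit-times", and it assumes both rates stay constant, which is an assumption rather than something measured:

```python
# Figures quoted in this comment; the time unit of the rates is not stated.
incoming_rate = 4   # notifications published per unit time
outgoing_rate = 2   # notifications consumed per unit time
backlog = 187502    # messages sitting in notifications.info


def net_growth(incoming, outgoing):
    """Messages added to the queue per unit time."""
    return incoming - outgoing


def time_to_accumulate(backlog, incoming, outgoing):
    """Unit-times needed to build the backlog at constant rates.

    Returns None when the queue is not growing.
    """
    growth = net_growth(incoming, outgoing)
    if growth <= 0:
        return None
    return backlog / growth


print(net_growth(incoming_rate, outgoing_rate))
print(time_to_accumulate(backlog, incoming_rate, outgoing_rate))
```

In other words, as long as the consumer only keeps up with half of the incoming traffic, the queue never drains and just keeps growing.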

And swift seems to generate a huge number of connections towards port 6001 (swift-container):
[root@overcloud-controller-0 ceilometer]# ss -antp dport = 6001 | wc -l
5748

Also, swift-container generates roughly two orders of magnitude more syscalls than any other process on the system:

[root@overcloud-controller-0 ~]# stap -v syscalls_by_pid.stp
Collecting data... Type Ctrl-C to exit and display results
#SysCalls PID
1038113 105149
12662 105160

[root@overcloud-controller-0 ~]# ps -f 105149
UID PID PPID C STIME TTY STAT TIME CMD
swift 105149 1 27 May14 ? Rs 3720:13 /usr/bin/python2 /usr/bin/swift-container-replicator /etc/swift/container-server.conf
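
To put a number on "two orders of magnitude", here is a small sketch that parses the two rows of stap output pasted above and compares them. It assumes the output format is exactly as shown (a `#`-prefixed header, then whitespace-separated count and PID); only the two rows from this comment are included, and `parse_counts` is just a helper name made up for illustration:

```python
import math

# The two rows pasted above; the real output may list more processes.
STAP_OUTPUT = """\
#SysCalls PID
1038113 105149
12662 105160
"""


def parse_counts(text):
    """Return (syscalls, pid) tuples, highest syscall count first."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        count, pid = line.split()
        rows.append((int(count), int(pid)))
    return sorted(rows, reverse=True)


rows = parse_counts(STAP_OUTPUT)
(top_count, top_pid), (next_count, next_pid) = rows[0], rows[1]
ratio = top_count / next_count
print(f"PID {top_pid} made {ratio:.0f}x the syscalls of PID {next_pid} "
      f"(about 10^{math.log10(ratio):.1f})")
```

For these two rows the ratio comes out at roughly 82x, so a bit under two full orders of magnitude.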

While I don't think ceilometer/swift hogging the CPU is the main culprit, we will need to investigate this at some point.