So I poked at the environment Sagi gave me and this does indeed seem to be a neutron-l3-ha issue. Brent Eagles and John Eckersberg did most of the work looking at the env.
You can ping the floating ip from inside the proper namespace on the node with the active l3 router:
[root@overcloud-controller-2 neutron]# ip netns exec qdhcp-760bbb48-b24b-4d6e-8ac8-db477de6019a ping 10.0.0.102
PING 10.0.0.102 (10.0.0.102) 56(84) bytes of data.
64 bytes from 10.0.0.102: icmp_seq=1 ttl=63 time=4.11 ms
64 bytes from 10.0.0.102: icmp_seq=2 ttl=63 time=0.986 ms
But you can't from outside.
Another thing worth noting is how much ceilometer and swift seem to hammer the controllers in general. They monopolize the CPU quite a bit. Namely, ceilometer generates more notifications than it consumes:
the rate of incoming notifications is 4 while the outgoing (consumed) rate is 2, meaning we receive twice as many messages as we are able to process.
The backlog of unconsumed notifications in rabbit is about 190k:
sqlite> select name, messages from queues order by messages desc limit 10;
notifications.info|187502
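A quick sanity check on those numbers (the per-second unit for the rates is my assumption; the note above only gives the raw rates):

```python
# Rough backlog arithmetic for the figures quoted above.
incoming_rate = 4    # notifications arriving (assumed per second)
consume_rate = 2     # notifications ceilometer actually processes
backlog = 187502     # messages sitting in notifications.info

growth = incoming_rate - consume_rate      # net queue growth per second
hours = backlog / growth / 3600            # time to build up the backlog
print(growth, round(hours))                # -> 2 26
```

So at a steady 2 msg/s of net growth, that queue represents roughly a day of accumulated backlog.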
And swift seems to create a huge number of connections towards port 6001 (swift-container):
[root@overcloud-controller-0 ceilometer]# ss -antp dport = 6001 | wc -l
5748
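If the `ss` dport filter is unavailable, the same tally can be done by parsing plain `ss -ant` output. This is only a sketch: it assumes the stock iproute2 column layout where the peer address:port is the fifth whitespace-separated field.

```python
from collections import Counter

def count_by_dport(ss_lines):
    """Tally connections per destination port from `ss -ant`-style lines."""
    ports = Counter()
    for line in ss_lines:
        fields = line.split()
        # Skip the header row and anything that doesn't look like a socket line.
        if len(fields) < 5 or fields[0] == "State":
            continue
        peer = fields[4]
        port = peer.rsplit(":", 1)[-1]   # works for IPv4 and [v6]:port forms
        ports[port] += 1
    return ports

# Hypothetical sample lines, for illustration only:
sample = [
    "State  Recv-Q Send-Q Local Address:Port  Peer Address:Port",
    "ESTAB  0      0      192.0.2.10:34512    192.0.2.11:6001",
    "ESTAB  0      0      192.0.2.10:34514    192.0.2.11:6001",
    "ESTAB  0      0      192.0.2.10:52100    192.0.2.12:5672",
]
print(count_by_dport(sample)["6001"])  # -> 2
```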
Also, in terms of syscalls, swift-container generates two orders of magnitude more syscalls than any other process on the system:
[root@overcloud-controller-0 ~]# stap -v syscalls_by_pid.stp
Collecting data... Type Ctrl-C to exit and display results
#SysCalls PID
1038113 105149
12662 105160
[root@overcloud-controller-0 ~]# ps -f 105149
UID PID PPID C STIME TTY STAT TIME CMD
swift 105149 1 27 May14 ? Rs 3720:13 /usr/bin/python2 /usr/bin/swift-container-replicator /etc/swift/container-server.conf
While I don't think ceilometer/swift hogging the CPU is the main culprit, we will need to investigate this at some point.