2022-08-31 08:56:51 |
norman shen |
description |
We are observing that the state reported for neutron-dhcp-agent deviates from its "real state". By real state, I mean that all hosted dnsmasq processes are running and configured.
For example, suppose agent A hosts 1,000 networks. If I reboot agent A, all dnsmasq processes are gone and the DHCP agent tries to restart every dnsmasq, which introduces a long delay between the agent starting and the agent handling new RabbitMQ messages. Oddly, openstack network agent list still shows the agent as up and running, which IMO is inconsistent. |
We are observing that the state reported for neutron-dhcp-agent deviates from its "real state". By real state, I mean that all hosted dnsmasq processes are running and configured.
For example, suppose agent A hosts 1,000 networks. If I reboot agent A, all dnsmasq processes are gone and the DHCP agent tries to restart every dnsmasq, which introduces a long delay between the agent starting and the agent handling new RabbitMQ messages. Oddly, openstack network agent list still shows the agent as up and running, which IMO is inconsistent. I think that in this situation, openstack network agent list should report the corresponding agent as down. |
|
2022-09-01 00:26:19 |
norman shen |
description |
We are observing that the state reported for neutron-dhcp-agent deviates from its "real state". By real state, I mean that all hosted dnsmasq processes are running and configured.
For example, suppose agent A hosts 1,000 networks. If I reboot agent A, all dnsmasq processes are gone and the DHCP agent tries to restart every dnsmasq, which introduces a long delay between the agent starting and the agent handling new RabbitMQ messages. Oddly, openstack network agent list still shows the agent as up and running, which IMO is inconsistent. I think that in this situation, openstack network agent list should report the corresponding agent as down. |
We have a deployment with 4 servers, all of which act as both network and compute nodes. The hosts are in the same rack and, to make things worse, the power supply is not very stable, so occasionally all physical servers lose power at the same time. After such a reboot, we found that virtual machines (especially CentOS ones) can lose their IP addresses, because a rebooting virtual machine may not wait for the DHCP agents to be ready.
We are observing that the state reported for neutron-dhcp-agent deviates from its "real state". By real state, I mean that all hosted dnsmasq processes are running and configured.
For example, suppose agent A hosts 1,000 networks. If I reboot agent A, all dnsmasq processes are gone and the DHCP agent tries to restart every dnsmasq, which introduces a long delay between the agent starting and the agent handling new RabbitMQ messages. Oddly, openstack network agent list still shows the agent as up and running, which IMO is inconsistent. I think that in this situation, openstack network agent list should report the corresponding agent as down. |
|