dhcp-agent performance issue
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Fix Released
|
High
|
Kevin Benton |
Bug Description
Single controller[48 GB RAM, 16vCPU, 120GB Disk]
1 Network Node
100 ESX hypervisors distributed in 10 nova-compute nodes
100 KVM hypervisors
Test:
1. Create /16 network
2. Template to spawn 500 instances on network created in step 1
Observation:
1. Tried spinning 500 instances on KVM and 500 instances on ESX hypervisor using CirrOS qemu & vmdk image respectively
2. Stack creation went through fine, but randomly few VM's missed dhcp IP on first boot
3. After forcing sudo udhcpc from guest OS, VM's was able to receive IP
4. Tried tcpdump on network dhcp port on network node, I could able to see only dhcp request coming from guest VM but there was no dhcp offer or ack from dhcp agent.
Assume this issue is due to scale test on this setup.
Please let me know if you need any particular logs or details to debug this issue, I can work and re-create issue and collect logs.
Logs:
Network node:
stack@network:
commit 171351d5fb55f96
Merge: 0a5a3cf 8f15d72
Author: Jenkins <email address hidden>
Date: Fri Feb 19 07:25:52 2016 +0000
Merge "Adopt Grafana to plot Neutron Failure Rates"
stack@network:
stack@network:
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
stack@network:
stack@network:
ii isc-dhcp-client 4.2.4-7ubuntu12.3 amd64 ISC DHCP client
ii isc-dhcp-common 4.2.4-7ubuntu12.3 amd64 common files used by all the isc-dhcp* packages
stack@network:
nobody 32359 0.0 0.0 28212 2324 ? S Feb19 0:00 dnsmasq --no-hosts --no-resolv --strict-order --except-
=== sshd host keys ===
-----BEGIN SSH HOST KEY KEYS-----
ssh-rsa AAAAB3NzaC1yc2E
ssh-dss AAAAB3NzaC1kc3M
-----END SSH HOST KEY KEYS-----
=== network info ===
if-info: lo,up,127.
if-info: eth0,up,
=== datasource: configdrive local ===
instance-id: 11a83228-
name: kvm-218-admin-3
availability-zone: nova
local-hostname: kvm-218-
launch-index: 0
=== cirros: current=0.3.4 uptime=191.83 ===
route: fscanf
=== pinging gateway failed, debugging connection ===
############ debug start ##############
### /etc/init.d/sshd start
Starting dropbear sshd: OK
route: fscanf
### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:EC:77:DC
inet6 addr: fe80::f816:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:84 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:3840 (3.7 KiB) TX bytes:1150 (1.1 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1020 (1020.0 B) TX bytes:1020 (1020.0 B)
### route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
route: fscanf
### cat /etc/resolv.conf
cat: can't open '/etc/resolv.conf': No such file or directory
### gateway not found
/sbin/cirros-
### pinging nameservers
### uname -a
Linux kvm-218-admin-3 3.2.0-80-virtual #116-Ubuntu SMP Mon Mar 23 17:28:52 UTC 2015 x86_64 GNU/Linux
### lsmod
Module Size Used by Not tainted
nls_iso8859_1 12713 0
nls_cp437 16991 0
vfat 17585 0
fat 61512 1 vfat
isofs 40259 0
ip_tables 27473 0
x_tables 29891 1 ip_tables
pcnet32 42119 0
8139cp 27360 0
ne2k_pci 13691 0
8390 18856 1 ne2k_pci
e1000 108589 0
acpiphp 24231 0
### dmesg | tail
[ 2.581084] EXT3-fs (vda): mounted filesystem with ordered data mode
[ 2.746639] ISO 9660 Extensions: Microsoft Joliet Level 3
[ 2.790563] ISO 9660 Extensions: RRIP_1991A
[ 2.955492] kjournald starting. Commit interval 5 seconds
[ 2.973249] EXT3-fs (vda): using internal journal
[ 2.973330] EXT3-fs (vda): mounted filesystem with ordered data mode
[ 5.386057] EXT3-fs (vda): using internal journal
[ 7.067755] ISO 9660 Extensions: Microsoft Joliet Level 3
[ 7.151524] ISO 9660 Extensions: RRIP_1991A
[ 19.008702] eth0: no IPv6 routers present
### tail -n 25 /var/log/messages
Feb 3 11:31:14 cirros kern.info kernel: [ 2.267494] acpiphp: Slot [25] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.267701] acpiphp: Slot [26] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.267899] acpiphp: Slot [27] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.268298] acpiphp: Slot [28] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.268528] acpiphp: Slot [29] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.268721] acpiphp: Slot [30] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.268907] acpiphp: Slot [31] registered
Feb 3 11:31:14 cirros kern.info kernel: [ 2.302428] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
Feb 3 11:31:14 cirros kern.info kernel: [ 2.302460] e1000: Copyright (c) 1999-2006 Intel Corporation.
Feb 3 11:31:14 cirros kern.info kernel: [ 2.325543] ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
Feb 3 11:31:14 cirros kern.info kernel: [ 2.348426] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
Feb 3 11:31:14 cirros kern.info kernel: [ 2.375341] pcnet32: pcnet32.c:v1.35 21.Apr.2008 <email address hidden>
Feb 3 11:31:14 cirros kern.info kernel: [ 2.400792] ip_tables: (C) 2000-2006 Netfilter Core Team
Feb 3 11:31:14 cirros kern.info kernel: [ 2.579114] kjournald starting. Commit interval 5 seconds
Feb 3 11:31:14 cirros kern.info kernel: [ 2.581084] EXT3-fs (vda): mounted filesystem with ordered data mode
Feb 3 11:31:14 cirros kern.debug kernel: [ 2.746639] ISO 9660 Extensions: Microsoft Joliet Level 3
Feb 3 11:31:14 cirros kern.debug kernel: [ 2.790563] ISO 9660 Extensions: RRIP_1991A
Feb 3 11:31:14 cirros kern.info kernel: [ 2.955492] kjournald starting. Commit interval 5 seconds
Feb 3 11:31:14 cirros kern.info kernel: [ 2.973249] EXT3-fs (vda): using internal journal
Feb 3 11:31:14 cirros kern.info kernel: [ 2.973330] EXT3-fs (vda): mounted filesystem with ordered data mode
Feb 3 11:31:14 cirros kern.info kernel: [ 5.386057] EXT3-fs (vda): using internal journal
Feb 3 11:31:15 cirros kern.debug kernel: [ 7.067755] ISO 9660 Extensions: Microsoft Joliet Level 3
Feb 3 11:31:15 cirros kern.debug kernel: [ 7.151524] ISO 9660 Extensions: RRIP_1991A
Feb 3 11:31:27 cirros kern.debug kernel: [ 19.008702] eth0: no IPv6 routers present
Feb 3 11:34:19 cirros authpriv.info dropbear[261]: Running in background
############ debug end ##############
____ ____ ____
/ __/ __ ____ ____ / __ \/ __/
/ /__ / // __// __// /_/ /\ \
\___//_//_/ /_/ \____/___/
http://
login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root.
kvm-218-admin-3 login:
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
can you check system logs for dnsmasq messages? logs should show pretty much all requests/responses.
You'll be able to see if it is performance issue or something else. Also note process ids on those messages.
Right now dnsmasq gets restarted with each VM being spawned after its config files are renewed.
That might be the reason.