LXC container network slow because of kernel debug messages

Bug #1785365 reported by Satish Patel
This bug affects 1 person
Affects: OpenStack-Ansible · Status: Fix Released · Importance: Undecided · Assigned to: Unassigned

Bug Description

This is one of the most bizarre issues I have ever seen; stay with me on this one. I was on Pike, upgraded to Queens, and found that Horizon had become painfully slow. I talked to Kevin (cloudnull) and James about this and we had a pretty good discussion on how to troubleshoot it. I tried every single thing to find out why the Horizon GUI was slow, but I failed to pinpoint the issue: I didn't find a single error in any log, and everything looked good. Out of frustration I destroyed all the containers and started to re-deploy, and I hit lots of errors about repo mirror timeouts, and you know what I found...!!

Doing some curl tests to compare download speed.

Physical infra host (it took less than 1 second):

[root@ostack-infra-03 ~]# curl http://mirror.cc.columbia.edu/pub/linux/centos/7.5.1804/updates/x86_64/repodata/0d7e660988dcc434ec5dec72067655f9b0ef44e6164d3fb85bda2bd1b09534db-primary.sqlite.bz2 -o /tmp/foo
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 4409k 100 4409k 0 0 16.2M 0 --:--:-- --:--:-- --:--:-- 16.1M

One of the containers on the same host (it took 1 minute 23 seconds):

[root@ostack-infra-03 ~]# lxc-attach -n ostack-infra-03_neutron_server_container-daea5dc3
[root@ostack-infra-03-neutron-server-container-daea5dc3 ~]# curl http://mirror.cc.columbia.edu/pub/linux/centos/7.5.1804/updates/x86_64/repodata/0d7e660988dcc434ec5dec72067655f9b0ef44e6164d3fb85bda2bd1b09534db-primary.sqlite.bz2 -o /tmp/foo
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 4409k 100 4409k 0 0 53882 0 0:01:23 0:01:23 --:--:-- 92945

Now adjust the kernel console logging level:

echo "3 4 1 3" > /proc/sys/kernel/printk
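For context, the four numbers in /proc/sys/kernel/printk are console_loglevel, default_message_loglevel, minimum_console_loglevel, and default_console_loglevel; only the first one matters here. A sketch of inspecting the value and making the change survive a reboot (the sysctl.d file name is my own choice, any fragment in that directory works):

```shell
# Show the current four printk values:
#   console_loglevel  default_message_loglevel
#   minimum_console_loglevel  default_console_loglevel
cat /proc/sys/kernel/printk

# Drop console_loglevel to 3 (KERN_ERR), so warning/info/debug
# messages are no longer written synchronously to the console:
echo "3 4 1 3" > /proc/sys/kernel/printk

# Persist across reboots:
echo "kernel.printk = 3 4 1 3" > /etc/sysctl.d/99-printk.conf
sysctl --system
```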

Same test in the same container (it took less than 1 second):

[root@ostack-infra-03 ~]# lxc-attach -n ostack-infra-03_neutron_server_container-daea5dc3
[root@ostack-infra-03-neutron-server-container-daea5dc3 ~]# curl http://mirror.cc.columbia.edu/pub/linux/centos/7.5.1804/updates/x86_64/repodata/0d7e660988dcc434ec5dec72067655f9b0ef44e6164d3fb85bda2bd1b09534db-primary.sqlite.bz2 -o /tmp/foo
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 4409k 100 4409k 0 0 11.9M 0 --:--:-- --:--:-- --:--:-- 11.9M

I am using CentOS 7.5, and it looks like this is what was causing my Horizon slowness. Can someone explain what was going on here? And if this is the right fix, please make this setting the default.

I was inches away from switching my deployment from OSA to TripleO out of frustration :(

Revision history for this message
Kevin Carter (kevin-carter) wrote :

Can you set the kernel log level in grub, update grub, and then reboot to see if the same problem happens? I'd have to really dig into this, but I don't think we define that parameter or set the kernel log level anywhere in OSA. See https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt (specifically "loglevel").
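On CentOS 7 that test would look roughly like the following (this assumes a BIOS system; UEFI installs write the grub config to a different path under /boot/efi):

```shell
# Append loglevel=3 to the kernel command line in /etc/default/grub:
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 loglevel=3"/' /etc/default/grub

# Regenerate the grub config and reboot to pick it up:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
```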

Revision history for this message
Satish Patel (satish-txt) wrote :

Kevin,

I will try your test, but in CentOS 7.5 the latest kernel is 3.10.x (it's old, believe me), and I am seeing the following logs in dmesg:

http://paste.openstack.org/show/727336/

Looks like OSA is configuring rsyslog incorrectly; it seems to be using an Ubuntu-style syslog layout, not the Red Hat style:

[root@ostack-infra-01 ~]# cat /etc/rsyslog.d/50-default.conf
auth,authpriv.* /var/log/auth.log
*.*;local7,auth,authpriv,cron,daemon,mail,news.none -/var/log/syslog
cron.* /var/log/cron.log
daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
user.* -/var/log/user.log

mail.err /var/log/mail.err

news.crit /var/log/news/news.crit
news.err /var/log/news/news.err
news.notice -/var/log/news/news.notice

*.emerg :omusrmsg:*
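For comparison, the stock CentOS 7 /etc/rsyslog.conf routes messages along these lines (quoted from memory of the distro default, not from an OSA deployment), with kernel messages landing in /var/log/messages rather than a dedicated kern.log:

```
*.info;mail.none;authpriv.none;cron.none    /var/log/messages
authpriv.*                                  /var/log/secure
mail.*                                      -/var/log/maillog
cron.*                                      /var/log/cron
*.emerg                                     :omusrmsg:*
uucp,news.crit                              /var/log/spooler
local7.*                                    /var/log/boot.log
```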

Revision history for this message
Satish Patel (satish-txt) wrote :

Also, I found something here which is possibly related to my issue:

https://bugzilla.kernel.org/show_bug.cgi?id=82471

https://github.com/coreos/bugs/issues/158

Revision history for this message
Satish Patel (satish-txt) wrote :

Later I found that an iptables CHECKSUM --checksum-fill rule for port 80 was causing the whole issue.
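For anyone hitting the same thing: the CHECKSUM target lives in the mangle table, so something along these lines will show and remove the rule (the exact rule spec below is illustrative, not copied from this deployment):

```shell
# Find any CHECKSUM rules (the mangle table is where --checksum-fill lives):
iptables -t mangle -S | grep -i CHECKSUM

# A rule of this shape was the culprit here:
#   -A POSTROUTING -p tcp --dport 80 -j CHECKSUM --checksum-fill
# Delete it by repeating the same rule spec with -D:
iptables -t mangle -D POSTROUTING -p tcp --dport 80 -j CHECKSUM --checksum-fill
```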

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Please have a look at https://review.openstack.org/#/c/589463/ for a gate fix and an explanation of how to fix things in your deployment.

Changed in openstack-ansible:
status: New → Fix Committed
Mohammed Naser (mnaser)
Changed in openstack-ansible:
status: Fix Committed → Fix Released