heat-cfntools on Centos7 are about 5 times slower than Centos6

Bug #1386666 reported by David Hill
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-init
Expired
Medium
Unassigned

Bug Description

Hi guys,

     heat-cfntools on Centos7 are about 5 times slower than on Centos6. I've looked at an instance
being spawned and it seems that the slow part is everyting related to rpm manipulation.
For instance, listing available packages seems to take forever within the tools and if I run the
exact same command, it's instantaneous. I`m still investigating the issue but if anybody else
encoutered this issue, perhaps there's already a fix for it somewhere.

Dave

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I`m still trying to understand... is it possible that systemd throttle cloud-init?

Revision history for this message
David Hill (david-hill-ubisoft) wrote :
tags: added: icehouse
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Log:
WARNING [2014-10-28 13:00:54,994] Unable to retrieve remote metadata : No credentials!
WARNING [2014-10-28 13:00:54,994] Unable to open local metadata : /var/cache/heat-cfntools/last_metadata
WARNING [2014-10-28 13:00:54,994] Unable to open local metadata : /var/cache/heat-cfntools/last_metadata
INFO [2014-10-28 13:00:54,994] Opened local metadata /var/lib/heat-cfntools/cfn-init-data
WARNING [2014-10-28 13:00:54,995] Unable to open local metadata : /var/cache/heat-cfntools/last_metadata
WARNING [2014-10-28 13:00:54,995] Unable to open local metadata : /var/cache/heat-cfntools/last_metadata
DEBUG [2014-10-28 13:00:54,995] Running command: rpm -q httpd
DEBUG [2014-10-28 13:01:20,211] Return code of 1 after executing: '['su', 'root', '-c', u'rpm -q httpd']'
stdout: 'package httpd is not installed
'
stderr: ''
DEBUG [2014-10-28 13:01:20,211] Running command: yum -y --showduplicates list available httpd
DEBUG [2014-10-28 13:01:53,062] Running command: rpm -q wordpress
DEBUG [2014-10-28 13:02:18,096] Return code of 1 after executing: '['su', 'root', '-c', u'rpm -q wordpress']'
stdout: 'package wordpress is not installed
'
stderr: ''
DEBUG [2014-10-28 13:02:18,096] Running command: yum -y --showduplicates list available wordpress
INFO [2014-10-28 13:02:43,383] Installing packages: yum -y install httpd wordpress
DEBUG [2014-10-28 13:02:43,384] Running command: yum -y install httpd wordpress
INFO [2014-10-28 13:03:38,743] Enabling service httpd
DEBUG [2014-10-28 13:03:38,743] Running command: /bin/systemctl enable httpd.service
DEBUG [2014-10-28 13:04:03,843] Running command: /bin/systemctl status httpd.service
DEBUG [2014-10-28 13:04:28,895] Return code of 3 after executing: '['su', 'root', '-c', u'/bin/systemctl status httpd.service']'
stdout: 'httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: inactive (dead)

'
stderr: ''
INFO [2014-10-28 13:04:28,895] Starting service httpd
DEBUG [2014-10-28 13:04:28,895] Running command: /bin/systemctl start httpd.service

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

It appears to be a "slow su" issue...

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Do you see the same delays when you run these commands manually on a server which hasn't had the cfn-init run on it?

Changed in heat-cfntools:
status: New → Incomplete
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

No I don't see the same behavior:
https://bugzilla.redhat.com/show_bug.cgi?id=1158052

[root@localhost ~]# time su -c "/bin/systemctl status httpd.service"
httpd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

real 0m0.023s
user 0m0.003s
sys 0m0.011s

Also, the hostname doesn't change....

Changed in heat-cfntools:
status: Incomplete → New
Changed in heat-cfntools:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Could you please try the following? In your server resource set the following properties:

      user_data_format: RAW
      user_data: |
        #!/bin/sh
        # script running various yum and systemctl commands with and without su and printing the timings

If this demonstrates the slowness then this rules out heat-cfntools and we can assign this bug to its appropriate component - maybe cloud-init.

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I don't understand your example... it will do nothing except if I add stuff?

Like "time yum check-update"
         "time rpm -qa"

I know that my VM not using heat-cfntools spawns as fast as Centos 6.5 ... so that's why I'm suspecting heat-cfntools and if you look at the logs, you clearly see that every "su" command are slow ... and always takes around 26~ seconds to execute.

RPM = slow
DEBUG [2014-10-28 13:01:53,062] Running command: rpm -q wordpress
DEBUG [2014-10-28 13:02:18,096] Return code of 1 after executing: '['su', 'root', '-c', u'rpm -q wordpress']'

yum = slow
DEBUG [2014-10-28 13:02:18,096] Running command: yum -y --showduplicates list available wordpress
INFO [2014-10-28 13:02:43,383] Installing packages: yum -y install httpd

systemctl = slow
DEBUG [2014-10-28 13:04:03,843] Running command: /bin/systemctl status httpd.service
 DEBUG [2014-10-28 13:04:28,895] Return code of 3 after executing: '['su', 'root', '-c', u'/bin/systemctl status httpd.service']'
 stdout: 'httpd.service - The Apache HTTP Server
    Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
    Active: inactive (dead)

and

 INFO [2014-10-28 13:03:38,743] Enabling service httpd
 DEBUG [2014-10-28 13:03:38,743] Running command: /bin/systemctl enable httpd.service
 DEBUG [2014-10-28 13:04:03,843] Running command: /bin/systemctl status httpd.service

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I'm wondering, since the hostname doesn't change with heat-cfntools, if there's no entry in /etc/hosts for localhost, could su be slower?

Revision history for this message
Steve Baker (steve-stevebaker) wrote : Re: [Bug 1386666] Re: heat-cfntools on Centos7 are about 5 times slower than Centos6

On 30/10/14 11:46, David Hill wrote:
> I don't understand your example... it will do nothing except if I add
> stuff?
>
> Like "time yum check-update"
> "time rpm -qa"
>
> I know that my VM not using heat-cfntools spawns as fast as Centos 6.5
> ... so that's why I'm suspecting heat-cfntools and if you look at the
> logs, you clearly see that every "su" command are slow ... and always
> takes around 26~ seconds to execute.
>
> RPM = slow
> DEBUG [2014-10-28 13:01:53,062] Running command: rpm -q wordpress
> DEBUG [2014-10-28 13:02:18,096] Return code of 1 after executing: '['su', 'root', '-c', u'rpm -q wordpress']'
>
> yum = slow
> DEBUG [2014-10-28 13:02:18,096] Running command: yum -y --showduplicates list available wordpress
> INFO [2014-10-28 13:02:43,383] Installing packages: yum -y install httpd
>
> systemctl = slow
> DEBUG [2014-10-28 13:04:03,843] Running command: /bin/systemctl status httpd.service
> DEBUG [2014-10-28 13:04:28,895] Return code of 3 after executing: '['su', 'root', '-c', u'/bin/systemctl status httpd.service']'
> stdout: 'httpd.service - The Apache HTTP Server
> Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
> Active: inactive (dead)
>
> and
>
> INFO [2014-10-28 13:03:38,743] Enabling service httpd
> DEBUG [2014-10-28 13:03:38,743] Running command: /bin/systemctl enable httpd.service
> DEBUG [2014-10-28 13:04:03,843] Running command: /bin/systemctl status httpd.service
>
Yes, given the above I'm suggesting you try:

       user_data_format: RAW
       user_data: |
         #!/bin/sh
         time su root -c rpm -q wordpress
         time su root -c yum -y install httpd
         time su root -c /bin/systemctl status httpd.service
         # etc

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I'm doing the test right now and see this:
[root@localhost yum.repos.d]# ps aufxg | grep yum
root 9387 0.0 0.0 112640 964 pts/0 S+ 15:20 0:00 \_ grep --color=auto yum
root 9278 0.0 0.1 180676 2272 ? S 15:18 0:00 \_ su root -c yum -y --showduplicates list available mariadb-galera-server
root 9343 0.1 1.1 422656 22588 ? S 15:18 0:00 \_ /usr/bin/python /usr/bin/yum -y --showduplicates list available mariadb-galera-server

It takes forever ... It's been like this for 2 minutes now. I'm doing the time thing and I'll get back to you.

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Oct 30 17:05:32 localhost cloud-init: time su root -c 'rpm -q wordpress' > /tmp/allo
Oct 30 17:05:32 localhost cloud-init: real 0m25.043s
Oct 30 17:05:32 localhost cloud-init: user 0m0.007s
Oct 30 17:05:32 localhost cloud-init: sys 0m0.016s
Oct 30 17:05:32 localhost cloud-init: time su root -c 'yum -y install httpd' >> /tmp/allo
Oct 30 17:05:32 localhost cloud-init: real 0m31.398s
Oct 30 17:05:32 localhost cloud-init: user 0m1.750s
Oct 30 17:05:32 localhost cloud-init: sys 0m0.287s
Oct 30 17:05:32 localhost cloud-init: time su root -c '/bin/systemctl status httpd.service' >> /tmp/allo
Oct 30 17:05:32 localhost cloud-init: real 0m25.052s
Oct 30 17:05:32 localhost cloud-init: user 0m0.004s
Oct 30 17:05:32 localhost cloud-init: sys 0m0.010s

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Could it be due to SELinux?

AVC Report
========================================================
# date time comm subj syscall class permission obj event
========================================================
1. 10/30/2014 16:58:56 systemd-hostnam system_u:system_r:systemd_hostnamed_t:s0 2 file read system_u:object_r:file_t:s0 denied 338
2. 10/30/2014 16:59:24 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 360
3. 10/30/2014 16:59:49 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 367
4. 10/30/2014 17:00:25 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 374
5. 10/30/2014 17:00:35 sshd system_u:system_r:sshd_t:s0-s0:c0.c1023 2 file read system_u:object_r:cloud_log_t:s0 denied 387
6. 10/30/2014 17:00:35 sshd system_u:system_r:sshd_t:s0-s0:c0.c1023 2 file read system_u:object_r:cloud_log_t:s0 denied 393
7. 10/30/2014 17:00:35 sshd system_u:system_r:sshd_t:s0-s0:c0.c1023 2 file read write system_u:object_r:cloud_log_t:s0 denied 394
8. 10/30/2014 17:00:50 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 406
9. 10/30/2014 17:01:16 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 420
10. 10/30/2014 17:02:18 useradd system_u:system_r:useradd_t:s0 2 file read write system_u:object_r:cloud_log_t:s0 denied 425
11. 10/30/2014 17:02:37 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 432
12. 10/30/2014 17:03:02 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 439
13. 10/30/2014 17:03:27 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 446
14. 10/30/2014 17:04:11 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 458
15. 10/30/2014 17:04:36 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 465
16. 10/30/2014 17:05:04 useradd system_u:system_r:useradd_t:s0 2 file read write system_u:object_r:cloud_log_t:s0 denied 468
17. 10/30/2014 17:05:07 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 475

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Forget the SELinux theory... It doesn't match the logs...

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

You can now split out your user_data script into its own file and you will have a demonstration test case which only requires nova.

So instructions for demonstrating the slowness would be:

  nova boot --user-data path/to/script.sh ...
  nova console-log <server id>

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Same results.

affects: heat-cfntools → cloud-init
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

An strace of rpm -q would be interesting

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Well, since it does the same with rpm, systemctl and yum... Perhaps it would be better to strace su ?
Or perhaps remove "su" from the command line arguments?

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

sure, try with and without su. The script should be running as root already anyway.

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

There... The culprit is "su" when called from cloud-init because when I python, import os, os.system('su -c "echo test"'), it's very fast...

Here is the output of cloud-init.log:

Oct 31 18:00:52 localhost cloud-init: time su root -c 'rpm -q wordpress' > /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m25.038s
Oct 31 18:00:52 localhost cloud-init: user 0m0.005s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.019s
Oct 31 18:00:52 localhost cloud-init: time su root -c 'yum -y install httpd' >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m31.585s
Oct 31 18:00:52 localhost cloud-init: user 0m1.723s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.308s
Oct 31 18:00:52 localhost cloud-init: time su root -c '/bin/systemctl status httpd.service' >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m25.052s
Oct 31 18:00:52 localhost cloud-init: user 0m0.005s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.009s
Oct 31 18:00:52 localhost cloud-init: time su root -c 'rpm -e wordpress' > /tmp/allo
Oct 31 18:00:52 localhost cloud-init: error: package wordpress is not installed
Oct 31 18:00:52 localhost cloud-init: real 0m25.031s
Oct 31 18:00:52 localhost cloud-init: user 0m0.007s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.009s
Oct 31 18:00:52 localhost cloud-init: time su root -c 'yum -y remove httpd' >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m28.463s
Oct 31 18:00:52 localhost cloud-init: user 0m0.466s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.081s
Oct 31 18:00:52 localhost cloud-init: time rpm -q wordpress >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m0.011s
Oct 31 18:00:52 localhost cloud-init: user 0m0.003s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.004s
Oct 31 18:00:52 localhost cloud-init: time yum -y install httpd >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m6.347s
Oct 31 18:00:52 localhost cloud-init: user 0m1.378s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.220s
Oct 31 18:00:52 localhost cloud-init: time /bin/systemctl status httpd.service >> /tmp/allo
Oct 31 18:00:52 localhost cloud-init: real 0m0.005s
Oct 31 18:00:52 localhost cloud-init: user 0m0.000s
Oct 31 18:00:52 localhost cloud-init: sys 0m0.004s

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Oct 31 19:52:43 localhost cloud-init: time strace -ffffffffff su root -c 'echo test' >> /tmp/allo 2>&1
Oct 31 19:52:43 localhost cloud-init: real 0m25.071s
Oct 31 19:52:43 localhost cloud-init: user 0m0.011s
Oct 31 19:52:43 localhost cloud-init: sys 0m0.025s

The issue is exactly there:
20:11:46 sendto(6, "BEGIN\r\n", 7, MSG_NOSIGNAL, NULL, 0) = 7
20:11:46 poll([{fd=6, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=6, revents=POLLOUT}])
20:11:46 sendmsg(6, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1\0\0\0\0\1\0\0\0n\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 128}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 128
20:11:46 clock_gettime(CLOCK_MONOTONIC, {726, 288363024}) = 0
20:11:46 poll([{fd=6, events=POLLIN}], 1, 25000) = 1 ([{fd=6, revents=POLLIN}])
20:11:46 recvmsg(6, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\1\n\0\0\0\1\0\0\0=\0\0\0\6\1s\0\5\0\0\0:1.32\0\0\0"..., 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 260
20:11:46 recvmsg(6, 0x7fff6b6f0370, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
20:11:46 sendmsg(6, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1a\0\0\0\2\0\0\0\222\0\0\0\1\1o\0\27\0\0\0/org/fre"..., 168}, {"\0\0\0\0\266%\0\0\2\0\0\0su\0\0\v\0\0\0unspecified\0"..., 97}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 265
20:11:46 clock_gettime(CLOCK_MONOTONIC, {726, 304256758}) = 0
20:11:46 poll([{fd=6, events=POLLIN}], 1, 25000) = 0 (Timeout)
20:12:11 clock_gettime(CLOCK_MONOTONIC, {751, 330416732}) = 0
20:12:11 socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 7
20:12:11 connect(7, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
20:12:11 sendto(7, "<83>Oct 31 20:12:11 su: pam_syst"..., 285, MSG_NOSIGNAL, NULL, 0) = 285
20:12:11 close(6) = 0
20:12:11 getuid() = 0

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I think the issue is selinux :
16. 10/31/2014 20:11:46 ? system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 0 (null) (null) (null) unset 656

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

require {
        type systemd_logind_t;
        type cloud_log_t;
        type sshd_t;
        type useradd_t;
        type systemd_hostnamed_t;
        class file { read write };
}

#============= sshd_t ==============
allow sshd_t cloud_log_t:file { read write };

#============= systemd_hostnamed_t ==============
files_read_isid_type_files(systemd_hostnamed_t)

#============= systemd_logind_t ==============
systemd_dbus_chat_logind(systemd_logind_t)

#============= useradd_t ==============
allow useradd_t cloud_log_t:file { read write };

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Well, I have a puzzling situation here... I use the same image in Grizzly and it works flawlessly.

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Could this be related to the fact that we're calling su when already root?

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Why are we calling "su" anyways? Since we're root ... shouldn't we call "sudo" at the very least?

Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.