Reduce memory usage (better defaults and CI config)

Bug #1566755 reported by Steven Hardy
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: (blank)

Bug Description

We keep running into RAM usage limitations in CI, yet some services that we don't even need are using a lot of memory, such as ceilometer in the undercloud:

[root@instack ~]# ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' | head -n10
         0.00 Mb COMMAND
      1523.11 Mb /usr/bin/python2 /usr/bin/ceilometer-agent-notification --logfile /var/log/ceilometer/agent-notification.log
      1435.20 Mb /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock --port=3306
      1225.60 Mb /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@instack -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -rabbit tcp_listeners [{"192.0.2.1",5672}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/<email address hidden>"} -rabbit sasl_error_logger {file,"/<email address hidden>"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@instack-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@instack" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672
      1137.90 Mb /usr/bin/python2 /usr/bin/swift-object-replicator /etc/swift/object-server.conf
      1025.62 Mb /opt/sensu/embedded/bin/ruby /opt/sensu/bin/sensu-client -b -c /etc/sensu/config.json -d /etc/sensu/conf.d -e /etc/sensu/extensions -p /var/run/sensu/sensu-client.pid -l /var/log/sensu/sensu-client.log -L info
       620.76 Mb /usr/bin/python2 /usr/bin/ceilometer-collector --logfile /var/log/ceilometer/collector.log
       375.66 Mb keystone-admin -DFOREGROUND
       375.66 Mb keystone-main -DFOREGROUND
       297.25 Mb /usr/bin/python -Es /usr/sbin/tuned -l -P

There's probably also scope for investigating the high memory usage of rabbit and mysql, and we should consider whether every service (e.g. ceilometer) actually needs to be running for every overcloud job.

Interestingly (on this just-started, idle undercloud), heat isn't the main offender. Further testing to be performed after some deployments ;)

Steven Hardy (shardy) wrote :

Note that the above script uses ps, which turns out to provide misleading results. top also shows at least mysql and rabbit eating a lot of memory, but further analysis is required.
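One likely reason the numbers mislead: the ps man page describes the `size` column as a very rough estimate of the swap space a process would need if it dirtied all of its writable pages, not the memory it actually holds. A minimal sketch of the same report based on resident set size (`rss`, in KiB) follows. The helper name `rss_report` is made up for illustration, and RSS still counts shared pages once per process, so treat the result as an upper bound rather than a precise figure.

```shell
# Hypothetical helper: reads "rss pid user command..." lines on stdin
# (with a header row) and prints the RSS in MiB followed by the command.
rss_report() {
    awk 'NR > 1 {
        printf("%13.2f MiB ", $1 / 1024)                 # rss is reported in KiB
        for (x = 4; x <= NF; x++) printf("%s ", $x)      # fields 4+ are the command line
        print ""
    }'
}

# Top ten processes by resident memory, largest first.
ps -eo rss,pid,user,command --sort -rss | rss_report | head -n10
```

Unlike the original one-liner, this skips the header row instead of printing it as a 0.00 entry.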

Bogdan Dobrelya (bogdando) wrote :
Changed in tripleo:
milestone: none → pike-2
importance: Undecided → High
status: New → Triaged
tags: added: deployment-time
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Ben Nemec (bnemec) wrote :

I'm re-targeting this to queens. I don't think we want to be making big changes to the CI configuration at this point in the pike cycle, and for the instances where we have to (mostly job timeouts), there are separate bugs open to track them individually.

Changed in tripleo:
milestone: pike-rc1 → queens-1
Alex Schultz (alex-schultz) wrote :

I'm going to close this as a bug because we provide a low-memory environment file that is used in CI. https://github.com/openstack/tripleo-heat-templates/blob/master/environments/low-memory-usage.yaml

I agree that this should be a thing but it needs to be more actionable for bug reports. It might be better to convert this to a blueprint so that we can expand on a better way to have this be configurable or detect low memory environments automatically. If there are specific items we can address then it would be a good idea to raise a new targeted bug.
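As a rough illustration of what automatic detection could look like, the sketch below checks `MemTotal` in `/proc/meminfo` and, when the host is below a threshold, prints the extra `-e` argument for the low-memory environment file. Everything here is an assumption made for illustration rather than existing TripleO behaviour: the helper name `is_low_memory`, the 8192 MiB default threshold, and the packaged template path.

```shell
# Hypothetical helper: succeeds when MemTotal in the given meminfo file
# is below the threshold (in MiB). Both defaults are assumptions.
is_low_memory() {
    meminfo=${1:-/proc/meminfo}
    threshold_mib=${2:-8192}
    awk -v limit="$threshold_mib" '
        $1 == "MemTotal:" { found = 1; low = ($2 / 1024 < limit) }   # value is in KiB
        END { exit !(found && low) }
    ' "$meminfo"
}

# Assumed install location of the packaged tripleo-heat-templates.
THT=/usr/share/openstack-tripleo-heat-templates

if is_low_memory; then
    # Print the extra argument a deploy wrapper could append.
    echo "-e $THT/environments/low-memory-usage.yaml"
fi
```

Total RAM is only one possible signal. A real implementation would need to decide which host's memory matters and what threshold is appropriate, which is the kind of detail a blueprint could settle.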

Changed in tripleo:
status: Triaged → Fix Released