storing multiple snapshots causing significant memory consumption

Bug #1412961 reported by Patrick Crews on 2015-01-20
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Undecided
Unassigned

Bug Description

Running randomized image creation tests via nova and heat shows significant memory consumption tied to the number of snapshots.

The post-workload value is with ~30 snapshots in the system.

Not sure if this is due to my devstack install / configuration, but this seemed worth writing up for additional scrutiny.

KiB Mem: 37066812 total, 6883732 used, 30183080 free, 244800 buffers
KiB Mem: 37066812 total, 35717836 used, 1348976 free, 29860 buffers <- post workload

Tests:
# Create a vm / heat stack / set of vm's
# repeatedly snapshot them and monitor memory usage as the number of stored images increases.

# My tests:
clone https://github.com/pcrews/rannsaka
cd rannsaka/rannsaka
# expecting at least one stack to snapshot / does not create it in the test
python rannsaka.py --host=http://192.168.0.5 --requests 500 -w 2 --test-file=locust_files/heat_basic_stress.py

Have tried many test variations with varying numbers of servers / stacks, but memory consumption appears tied to the total number of images stored in the system:

[2015-02-10 15:09:34,861] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 30216604 K free memory
[2015-02-10 15:09:49,724] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 26717180 K free memory
[2015-02-10 15:09:59,242] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 27261024 K free memory
[2015-02-10 15:12:35,072] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 23324884 K free memory
[2015-02-10 15:12:38,312] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 22932740 K free memory
[2015-02-10 15:12:41,728] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 22703904 K free memory
[2015-02-10 15:13:00,152] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 20729056 K free memory
[2015-02-10 15:15:38,137] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 16715384 K free memory
[2015-02-10 15:15:44,306] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 16581624 K free memory
[2015-02-10 15:16:05,172] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 15414400 K free memory
[2015-02-10 15:16:11,367] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 14560096 K free memory
[2015-02-10 15:19:17,026] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 15 || 10001072 K free memory
[2015-02-10 15:19:25,819] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 15 || 8794308 K free memory
[2015-02-10 15:20:59,007] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 17 || 7489788 K free memory
[2015-02-10 15:21:17,554] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 17 || 5997816 K free memory

NOTE: comments 2-5 were when I thought this was a heat-bug and are not particularly relevant.

Patrick Crews (patrick-crews) wrote :
Download full text (4.2 KiB)

the engine logs have a large number of these messages:
/usr/lib/python2.7/dist-packages/amqp/channel.py:608: DeprecationWarning: auto_delete exchanges has been deprecated
  'auto_delete exchanges has been deprecated'))

2015-01-20 09:43:21.418 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.
2015-01-20 09:43:21.419 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 104] Connection reset by peer
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 440, in _ensured
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 512, in __call__
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 695, in _consume
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=1)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 90, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 303, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(channel, args)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 506, in _close
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit self._x_close_ok()
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 534, in _x_close_ok
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((10, 51))
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 62, in _send_method
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content,
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 227, in writ...

Read more...

Patrick Crews (patrick-crews) wrote :
Download full text (3.9 KiB)

Further testing has shown that this can occur with just a single heat user performing snapshot / restore operations.
Also tested with *just* create_snapshot operations and memory usage still appears significant, though less than when restore operations are also used in testing.

top - 17:09:20 up 2:26, 2 users, load average: 0.13, 0.23, 0.22
Tasks: 371 total, 3 running, 368 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.1 us, 0.3 sy, 0.0 ni, 98.0 id, 0.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 37066820 total, 20059448 used, 17007372 free, 192664 buffers
KiB Swap: 37738492 total, 68 used, 37738424 free. 13992420 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24980 libvirt+ 20 0 3917968 989.2m 9424 S 0.0 2.7 0:19.24 qemu-system-x86
 2179 rabbitmq 20 0 2381912 172444 2516 S 1.3 0.5 2:22.75 beam.smp
10769 stack 20 0 339112 143740 3896 S 0.0 0.4 0:12.13 nova-api
10766 stack 20 0 339260 143668 3872 S 0.0 0.4 0:12.47 nova-api
 1475 root 20 0 422924 142980 1184 S 0.0 0.4 0:01.05 zfs-fuse
10767 stack 20 0 338632 142916 3896 S 0.0 0.4 0:11.58 nova-api
10768 stack 20 0 337696 142312 3880 S 0.0 0.4 0:10.94 nova-api
 8310 mysql 20 0 4165360 117316 7964 S 0.0 0.3 1:03.81 mysqld
10918 stack 20 0 276152 88792 3756 S 0.0 0.2 0:22.74 nova-conductor
10915 stack 20 0 275072 87812 3752 S 0.0 0.2 0:23.10 nova-conductor
10749 stack 20 0 190252 87764 5536 S 1.3 0.2 1:10.21 nova-api
10917 stack 20 0 274476 87384 3748 S 0.0 0.2 0:23.05 nova-conductor...

Read more...

Patrick Crews (patrick-crews) wrote :

NOTE: Have also tested snapshot + create/delete server operations against nova directly and have not been able to reproduce this - appears tied to heat snapshot ops.

Thomas Herve (therve) wrote :

Heat is just using nova APIs. It's unclear what's the problem here.

Changed in heat:
status: New → Incomplete
Patrick Crews (patrick-crews) wrote :

As noted in a previous comment, tests directly against nova's snapshot api could not reproduce this behavior. It is only when using heat's snapshot api that this kind of resource consumption is observed.

Any recommendations for additional information to help pinpoint the problem would be most welcome.

Patrick Crews (patrick-crews) wrote :

This appears to be tied to the number of snapshots.
Have only tested with a single stack, but memory consumption increases as the number of stack snapshots increases.
Will run further tests to determine if this is a snapshots-per-stack issue or a total-number-of-snapshots thing as well as further delving into what exactly is happening.

As previously noted, any recommendations or requests for additional information are most welcome:

[2015-02-10 15:09:34,861] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 30216604 K free memory
[2015-02-10 15:09:49,724] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 26717180 K free memory
[2015-02-10 15:09:59,242] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 1 || 27261024 K free memory
[2015-02-10 15:12:35,072] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 23324884 K free memory
[2015-02-10 15:12:38,312] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 22932740 K free memory
[2015-02-10 15:12:41,728] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 22703904 K free memory
[2015-02-10 15:13:00,152] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 5 || 20729056 K free memory
[2015-02-10 15:15:38,137] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 16715384 K free memory
[2015-02-10 15:15:44,306] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 16581624 K free memory
[2015-02-10 15:16:05,172] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 15414400 K free memory
[2015-02-10 15:16:11,367] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 10 || 14560096 K free memory
[2015-02-10 15:19:17,026] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 15 || 10001072 K free memory
[2015-02-10 15:19:25,819] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 15 || 8794308 K free memory
[2015-02-10 15:20:59,007] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 17 || 7489788 K free memory
[2015-02-10 15:21:17,554] testbox/INFO/stdout: 0x7f9c6f819f90: SNAP_COUNT: 17 || 5997816 K free memory

This appears to be total snapshots (at least for a single tenant_id / user).
Tested with 2 stacks, viewed approx. same memory consumption pattern as total (across both stacks) number of stacks matched up with the previous test:

[2015-02-10 17:17:20,593] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-1 || SNAP_COUNT: 6 || 14828760 K free memory
[2015-02-10 17:17:51,097] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 6 || 13532360 K free memory
[2015-02-10 17:18:00,956] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 6 || 13135784 K free memory
[2015-02-10 17:18:10,719] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-1 || SNAP_COUNT: 7 || 12037352 K free memory
[2015-02-10 17:18:24,877] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 6 || 11025924 K free memory
[2015-02-10 17:18:51,377] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 7 || 10931172 K free memory
[2015-02-10 17:19:06,170] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-1 || SNAP_COUNT: 8 || 11024068 K free memory
[2015-02-10 17:19:17,746] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 7 || 8998664 K free memory
[2015-02-10 17:19:18,608] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 7 || 8324752 K free memory
[2015-02-10 17:19:33,749] testbox/INFO/stdout: 0x7f66847ebe50: STACK: test-stack-0 || SNAP_COUNT: 7 || 7679832 K free memory

summary: - heat snapshot and restore operations causing significant memory
- consumption
+ storing multiple heat snapshots causing significant memory consumption
Changed in heat:
status: Incomplete → New
Patrick Crews (patrick-crews) wrote :

Have updated bug description to better reflect what is happening.
Have reset the bug to New now that more information has been obtained about what is happening.

affects: heat → glance
summary: - storing multiple heat snapshots causing significant memory consumption
+ storing multiple snapshots causing significant memory consumption
description: updated
Erno Kuvaja (jokke) wrote :

Hi Patrick,

Any indication that this would be glance issue? By the details provided and specially your comment #5 does not suggest so.

Any details welcome how did you pinpointed the issue to Glance.

Changed in glance:
status: New → Incomplete
Patrick Crews (patrick-crews) wrote :

Comment #5 does not hold true - tests of just storing nova image snapshots produced the same memory consumption - ~30+GB at 15+ snapshots.

Changed in glance:
status: Incomplete → New
Changed in glance:
status: New → Incomplete
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers