storing multiple snapshots causing significant memory consumption
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Glance | Incomplete | Undecided | Unassigned |
Bug Description
Running randomized image creation tests via nova and heat shows significant memory consumption tied to the number of snapshots.
The post-workload value below was captured with ~30 snapshots in the system.
Not sure if this is due to my devstack install / configuration, but this seemed worth writing up for additional scrutiny.
KiB Mem: 37066812 total, 6883732 used, 30183080 free, 244800 buffers <- pre workload
KiB Mem: 37066812 total, 35717836 used, 1348976 free, 29860 buffers <- post workload
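For rough scale (assuming the ~30-snapshot figure above): 35717836 - 6883732 = 28834104 KiB of additional used memory, i.e. roughly 27.5 GiB, or on the order of 1 GiB per snapshot stored.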
Tests:
# Create a VM / heat stack / set of VMs
# Repeatedly snapshot them and monitor memory usage as the number of stored images increases.
# My tests:
clone https:/
cd rannsaka/rannsaka
# expects at least one existing stack to snapshot (the test does not create it)
python rannsaka.py --host=http://
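For reference, the core operation being exercised looks roughly like the loop below. This is a minimal sketch, not rannsaka itself (which randomizes the requests); the credentials, endpoint and server name are placeholders for a devstack install, and it assumes python-novaclient is available.

    # snapshot_loop.py - repeatedly snapshot one server and log free memory
    import time
    from novaclient import client as nova_client

    def mem_free_kib():
        # MemFree value from /proc/meminfo, in KiB
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemFree:'):
                    return int(line.split()[1])

    # Placeholder credentials / endpoint for a local devstack install
    nova = nova_client.Client('2', 'admin', 'secret', 'demo',
                              'http://127.0.0.1:5000/v2.0')
    server = nova.servers.find(name='test-server')  # pre-existing VM

    for i in range(30):
        nova.servers.create_image(server, 'snap-%03d' % i)
        time.sleep(30)  # give the snapshot time to upload to glance
        print('snapshot %d stored, MemFree: %d KiB' % (i, mem_free_kib()))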
Have tried many test variations with varying numbers of servers / stacks, but memory consumption appears tied to the total number of images stored in the system:
[2015-02-10 15:09:34,861] testbox/
[2015-02-10 15:09:49,724] testbox/
[2015-02-10 15:09:59,242] testbox/
[2015-02-10 15:12:35,072] testbox/
[2015-02-10 15:12:38,312] testbox/
[2015-02-10 15:12:41,728] testbox/
[2015-02-10 15:13:00,152] testbox/
[2015-02-10 15:15:38,137] testbox/
[2015-02-10 15:15:44,306] testbox/
[2015-02-10 15:16:05,172] testbox/
[2015-02-10 15:16:11,367] testbox/
[2015-02-10 15:19:17,026] testbox/
[2015-02-10 15:19:25,819] testbox/
[2015-02-10 15:20:59,007] testbox/
[2015-02-10 15:21:17,554] testbox/
NOTE: comments #2-#5 were written when I thought this was a heat bug and are not particularly relevant.
Patrick Crews (patrick-crews) wrote : | #1 |
Patrick Crews (patrick-crews) wrote : | #2 |
Further testing has shown that this can occur with just a single heat user performing snapshot / restore operations.
Also tested with *just* create_snapshot operations and memory usage still appears significant, though less than when restore operations are also used in testing.
top - 17:09:20 up 2:26, 2 users, load average: 0.13, 0.23, 0.22
Tasks: 371 total, 3 running, 368 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.1 us, 0.3 sy, 0.0 ni, 98.0 id, 0.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 37066820 total, 20059448 used, 17007372 free, 192664 buffers
KiB Swap: 37738492 total, 68 used, 37738424 free. 13992420 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24980 libvirt+ 20 0 3917968 989.2m 9424 S 0.0 2.7 0:19.24 qemu-system-x86
2179 rabbitmq 20 0 2381912 172444 2516 S 1.3 0.5 2:22.75 beam.smp
10769 stack 20 0 339112 143740 3896 S 0.0 0.4 0:12.13 nova-api
10766 stack 20 0 339260 143668 3872 S 0.0 0.4 0:12.47 nova-api
1475 root 20 0 422924 142980 1184 S 0.0 0.4 0:01.05 zfs-fuse
10767 stack 20 0 338632 142916 3896 S 0.0 0.4 0:11.58 nova-api
10768 stack 20 0 337696 142312 3880 S 0.0 0.4 0:10.94 nova-api
8310 mysql 20 0 4165360 117316 7964 S 0.0 0.3 1:03.81 mysqld
10918 stack 20 0 276152 88792 3756 S 0.0 0.2 0:22.74 nova-conductor
10915 stack 20 0 275072 87812 3752 S 0.0 0.2 0:23.10 nova-conductor
10749 stack 20 0 190252 87764 5536 S 1.3 0.2 1:10.21 nova-api
10917 stack 20 0 274476 87384 3748 S 0.0 0.2 0:23.05 nova-conductor...
Patrick Crews (patrick-crews) wrote : | #3 |
NOTE: Have also tested snapshot + create/delete server operations against nova directly and have not been able to reproduce this - appears tied to heat snapshot ops.
Thomas Herve (therve) wrote : | #4 |
Heat is just using nova APIs. It's unclear what the problem is here.
Changed in heat: | |
status: | New → Incomplete |
Patrick Crews (patrick-crews) wrote : | #5 |
As noted in a previous comment, tests directly against nova's snapshot api could not reproduce this behavior. It is only when using heat's snapshot api that this kind of resource consumption is observed.
Any recommendations for additional information to help pinpoint the problem would be most welcome.
Patrick Crews (patrick-crews) wrote : | #6 |
This appears to be tied to the number of snapshots.
Have only tested with a single stack, but memory consumption increases as the number of stack snapshots increases.
Will run further tests to determine if this is a snapshots-per-stack issue or a total-number-of-snapshots issue.
As previously noted, any recommendations or requests for additional information are most welcome:
[2015-02-10 15:09:34,861] testbox/
[2015-02-10 15:09:49,724] testbox/
[2015-02-10 15:09:59,242] testbox/
[2015-02-10 15:12:35,072] testbox/
[2015-02-10 15:12:38,312] testbox/
[2015-02-10 15:12:41,728] testbox/
[2015-02-10 15:13:00,152] testbox/
[2015-02-10 15:15:38,137] testbox/
[2015-02-10 15:15:44,306] testbox/
[2015-02-10 15:16:05,172] testbox/
[2015-02-10 15:16:11,367] testbox/
[2015-02-10 15:19:17,026] testbox/
[2015-02-10 15:19:25,819] testbox/
[2015-02-10 15:20:59,007] testbox/
[2015-02-10 15:21:17,554] testbox/
Patrick Crews (patrick-crews) wrote : Re: storing multiple heat snapshots causing significant memory consumption | #7 |
This appears to be tied to the total number of snapshots (at least for a single tenant_id / user).
Tested with 2 stacks and observed approximately the same memory consumption pattern once the total number of snapshots (across both stacks) matched the previous test:
[2015-02-10 17:17:20,593] testbox/
[2015-02-10 17:17:51,097] testbox/
[2015-02-10 17:18:00,956] testbox/
[2015-02-10 17:18:10,719] testbox/
[2015-02-10 17:18:24,877] testbox/
[2015-02-10 17:18:51,377] testbox/
[2015-02-10 17:19:06,170] testbox/
[2015-02-10 17:19:17,746] testbox/
[2015-02-10 17:19:18,608] testbox/
[2015-02-10 17:19:33,749] testbox/
summary: |
- heat snapshot and restore operations causing significant memory consumption
+ storing multiple heat snapshots causing significant memory consumption
Changed in heat: | |
status: | Incomplete → New |
Patrick Crews (patrick-crews) wrote : | #8 |
Have updated bug description to better reflect what is happening.
Have reset the bug to New now that more information has been obtained about what is happening.
affects: | heat → glance |
summary: |
- storing multiple heat snapshots causing significant memory consumption
+ storing multiple snapshots causing significant memory consumption
description: | updated |
Erno Kuvaja (jokke) wrote : | #9 |
Hi Patrick,
Is there any indication that this is a Glance issue? The details provided, and especially your comment #5, do not suggest so.
Any details on how you pinpointed the issue to Glance would be welcome.
Changed in glance: | |
status: | New → Incomplete |
Patrick Crews (patrick-crews) wrote : | #10 |
Comment #5 no longer holds true: tests that only store nova image snapshots produced the same memory consumption, ~30+ GB at 15+ snapshots.
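(For anyone trying to narrow this down further: a quick way to see which service actually grows is to sum the resident set size of its processes between snapshots. A minimal sketch, assuming a Linux host; the "glance-api" pattern is just an example and can be swapped for nova-api, heat-engine, etc.)

    # rss_by_name.py - total RSS (KiB) of processes whose cmdline matches a pattern
    import os
    import sys

    def rss_kib(pid):
        # VmRSS line from /proc/<pid>/status, in KiB
        try:
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        return int(line.split()[1])
        except IOError:
            pass
        return 0

    def matches(pid, pattern):
        try:
            with open('/proc/%s/cmdline' % pid) as f:
                return pattern in f.read().replace('\0', ' ')
        except IOError:
            return False

    if __name__ == '__main__':
        pattern = sys.argv[1] if len(sys.argv) > 1 else 'glance-api'
        pids = [p for p in os.listdir('/proc') if p.isdigit()]
        total = sum(rss_kib(p) for p in pids if matches(p, pattern))
        print('%s total RSS: %d KiB' % (pattern, total))

Running it before and after a batch of snapshots (e.g. python rss_by_name.py glance-api) should show whether the growth is in glance, nova, or elsewhere.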
Changed in glance: | |
status: | Incomplete → New |
Changed in glance: | |
status: | New → Incomplete |
The engine logs have a large number of these messages:
/usr/lib/python2.7/dist-packages/amqp/channel.py:608: DeprecationWarning: auto_delete exchanges has been deprecated
  'auto_delete exchanges has been deprecated'))
2015-01-20 09:43:21.418 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.
2015-01-20 09:43:21.419 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 104] Connection reset by peer
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 440, in _ensured
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 512, in __call__
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 695, in _consume
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return self.connection.drain_events(timeout=1)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return self.transport.drain_events(self.connection, **kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 90, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return connection.drain_events(**kwargs)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 303, in drain_events
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     return amqp_method(channel, args)
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 506, in _close
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     self._x_close_ok()
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 534, in _x_close_ok
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((10, 51))
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 62, in _send_method
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2015-01-20 09:43:21.419 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 227, in writ...