significant performance degradation when ceilometer middleware for swift proxy is used

Bug #1337761 reported by Hisashi Osanai
Affects: Ceilometer
Status: Invalid
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

[Description]
I measured swift's performance with the ceilometer middleware for the swift proxy enabled.

The results are as follows:
PUT: confirmed 20-50% degradation
GET: confirmed 20-80% degradation
DELETE: no problem observed in the data

I think it is very difficult to accept this degradation.
What do you think?

[Version details]
Icehouse (swift): 1.13.1-1.el6
swift-bench: 1.1-dev

[Crystal clear details to reproduce the bug]
swift-bench

[Test environment details]
- 4 x clients CentOS 6.5 (2.93GHz/15GB) on Xen
- 1 x Proxy Node CentOS 6.5 (2.93GHz/24GB) on Xen
- 3 x Storage Nodes CentOS 6.5 (2.93GHz/12GB) on Xen

[Actual results]
Without ceilometer middleware for swift proxy
- PUT (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.03   0.03   0.02   0.02   0.02   0.03
16KB         0.14   0.13   0.10   0.08   0.10   0.10
64KB         0.43   0.46   0.28   0.32   0.38   0.40
256KB        1.93   1.48   0.75   1.00   1.24   1.32

- GET (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.19   0.20   0.16   0.07   0.07   0.09
16KB         0.87   0.89   0.27   0.38   0.31   0.37
64KB         1.76   2.04   0.77   1.09   0.95   0.95
256KB        5.65   2.66   2.27   2.39   2.29   2.98

- DELETE (requests/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          16.00  8.60   5.73   7.35   8.03   9.33
16KB         12.80  7.55   6.25   9.33   7.93   9.80
64KB         16.50  7.60   7.30   6.60   7.45   9.38
256KB        9.30   9.15   8.23   5.80   7.38   9.33

With ceilometer middleware for swift proxy
- PUT (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.03   0.02   0.01   0.01   0.01   0.02
16KB         0.10   0.07   0.05   0.05   0.06   0.06
64KB         0.35   0.33   0.20   0.21   0.23   0.25
256KB        1.53   1.23   0.69   0.76   0.81   0.95

- GET (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.06   0.04   0.02   0.03   0.03   0.03
16KB         0.23   0.18   0.11   0.11   0.12   0.14
64KB         0.91   0.68   0.39   0.44   0.43   0.48
256KB        2.65   1.95   1.27   1.33   1.59   1.70

- DELETE (requests/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          14.80  9.55   5.25   6.28   6.93   8.58
16KB         16.30  10.20  5.83   6.68   7.95   9.95
64KB         12.20  9.90   5.83   6.83   7.70   8.90
256KB        8.20   7.95   5.58   6.13   7.73   7.68

[Expected results]
I have no precise target, but less than 10% degradation would be acceptable to me.

I think this problem is related to the following logic. The call to
"publish_sample" (especially its communication with the message bus)
happens inside the measured request path, so it would be better to
move this method out of that path.

- swift_middleware.py
 96     def iter_response(iterable):
        ...
115         self.publish_sample(env,
116                             input_proxy.bytes_received,
117                             bytes_sent)
        ...
126     else:
127         return iter_response(iterable)
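
For illustration, here is a minimal, self-contained sketch of that pattern (hypothetical names; it paraphrases the quoted logic rather than the upstream source). Because publish_sample runs at the end of the response iterator, the client's request does not complete until the publish does:

    import time

    def publish_sample(bytes_sent):
        # Stand-in for the real publisher; in the middleware this is a
        # synchronous RPC cast to the message bus.
        time.sleep(0.05)

    def iter_response(iterable):
        bytes_sent = 0
        for chunk in iterable:
            bytes_sent += len(chunk)
            yield chunk
        # Runs after the last body chunk but before the request finishes,
        # so the client also waits out the publish.
        publish_sample(bytes_sent)

    body = b"".join(iter_response([b"hello ", b"world"]))
    print(body)  # does not print until publish_sample has returned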

Revision history for this message
Chuck Thier (cthier) wrote :

Changing bug to ceilometer since it is a bug in the ceilometer middleware

affects: swift → ceilometer
Revision history for this message
Hisashi Osanai (osanai-hisashi) wrote :

I measured swift's performance with the ceilometer middleware for the
proxy enabled, but with the following three lines in swift_middleware.py
commented out:

115         self.publish_sample(env,
116                             input_proxy.bytes_received,
117                             bytes_sent)

The purpose of this measurement is to set a target for improving this problem.

The result of this measurement: the worst case is about 20% degradation for both PUT and GET.

Summary:
- PUT (% relative to the run without the ceilometer middleware = 100%)
ObjectSize   Concurrency
             1     2     4     8     16    32
4KB          80    99    99    113   114   98
16KB         82    95    86    117   94    100
64KB         110   106   135   92    92    93
256KB        71    114   97    90    91    91

- GET (% relative to the run without the ceilometer middleware = 100%)
ObjectSize   Concurrency
             1     2     4     8     16    32
4KB          108   84    39*   88    168   116
16KB         107   89    70    79    101   112
64KB         66*   43*   91    87    124   113
256KB        81    80    82    134   118   98

The values marked (*) seem to have some problems (logic? environment?).

Result:
- PUT (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.03   0.03   0.02   0.03   0.03   0.03
16KB         0.12   0.12   0.08   0.09   0.09   0.10
64KB         0.47   0.48   0.38   0.29   0.35   0.37
256KB        1.38   1.68   0.73   0.90   1.14   1.20

- GET (MB/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          0.21   0.16   0.06   0.06   0.11   0.11
16KB         0.93   0.79   0.19   0.30   0.32   0.42
64KB         1.16   0.87   0.71   0.95   1.17   1.08
256KB        4.58   2.14   1.87   3.22   2.69   2.91

- DELETE (requests/s)
ObjectSize   Concurrency
             1      2      4      8      16     32
4KB          14.10  11.50  7.80   7.08   7.60   9.13
16KB         11.30  8.85   5.60   8.18   9.50   9.50
64KB         17.00  6.95   6.15   10.03  9.53   10.43
256KB        14.10  7.75   7.13   7.45   9.33   9.00

description: updated
Changed in ceilometer:
assignee: nobody → Keisuke Yamamoto (keisuk-yamamoto)
status: New → In Progress
gordon chung (chungg)
Changed in ceilometer:
importance: Undecided → High
Revision history for this message
Chris Dent (cdent) wrote :

To try and learn something I had a look at the middleware code and had a few thoughts (some of which are addressed in other comments above):

* It would be good to have benchmark numbers for 5 different setups:
    * middleware not present at all
    * middleware present in normal form
    * middleware present but not publishing any samples (but still counting)
    * middleware present, publishing artificial samples (but not counting request and response sizes)
    * middleware present, but doing a no-op (just calling the app and returning the iterator)
* My (perhaps obvious) guess is that the slowness comes from counting the size of the request and response, and that this duplicates behavior swift is already doing within itself.

One invasive way to optimize would be to move the counting behavior into swift (where presumably it already has abstractions for the size of its inputs and outputs) and have it set response headers that the middleware could read to then publish as samples.

That crosses a boundary that presumably would not be ideal to cross, but this is for the sake of optimization.

A couple of other thoughts:

* Some requests will include a content-length header. If that's present it could be used instead of the InputProxy.
* It appears that swift/obj/server.py sets the content-length header on responses to GET requests. If that is present it should be used.

So, basically, what could happen is that headers are used if present, and if not, we fall back to the more expensive way of doing things?
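
A rough sketch of that fallback (a hypothetical helper for discussion, not a proposed patch):

    def response_size(headers, counted_bytes):
        # Prefer the Content-Length header set by the application; fall
        # back to the bytes counted by the InputProxy-style wrapper only
        # when the header is missing or malformed.
        length = headers.get('Content-Length')
        if length is not None:
            try:
                return int(length)
            except ValueError:
                pass
        return counted_bytes

    print(response_size({'Content-Length': '4096'}, 0))  # -> 4096
    print(response_size({}, 1234))                       # -> 1234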

Revision history for this message
Chris Dent (cdent) wrote :

I've done some testing with just paying attention to the headers provided by HTTP and it works pretty well, with one limitation: If the proxied server (the internals of swift itself) has not set a content-length then no data is recorded. This is not too problematic as it is set on the important requests: getting an object from the server.

I'll be linking a potential patchset to this for discussion soon.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ceilometer (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/109368

Revision history for this message
Chris Dent (cdent) wrote :

Although the code linked in the provided review is probably more correct, casual testing with swift-bench at small object sizes (4KB) shows it doesn't have a significant impact on performance. The cost of publishing the sample is the major contributor.

At larger sizes (4MB), the relative cost of publishing is much less significant.

In the tests above, is the meter database MySQL, MongoDB, or something else?

Revision history for this message
Chris Dent (cdent) wrote :

I've done some further exploration to try and narrow down where time is lost. The general areas are:

* Counting the size of the request and response.
* Processing samples through the publishing pipeline to a sink.
* Publishing at the chosen sink.

In other testing it's become pretty clear that the first is not a huge factor, so today I did some testing to see which of the second and third has the most impact. To do this I created:

* a "drop" publisher which does nothing with any inputs, it just passes
* a replica of the default pipeline.yaml that uses the drop publisher by default
* a tiny pipeline.yaml with one meter and one sink (rpc)
* a tiny pipeline.yaml with one meter and one sink (drop)

and then compared those settings against a swift-bench run with this conf:

[bench]
auth = http://localhost:8080/auth/v1.0
user = test:tester
key = testing
concurrency = 1
object_size = 4096
num_objects = 1000
num_gets = 1000
delete = yes
auth_version = 1.0
use_proxy = yes
num_containers = 10

I used a concurrency of 1 and a non-large object size to avoid too many other variables. I have more complete results, but the numbers for 1000 GETs are the quickest way to show the difference:

limited pipeline, rpc: 65.2/s
regular pipeline, rpc: 68.5/s
limited pipeline, rpc, pre-prepared client: 70.7/s
limited pipeline, rpc, no-cast: 137.1/s
regular pipeline, drop: 151.9/s
limited pipeline, drop: 153.7/s

"no-cast" goes into oslo.messaging make the cast method on the context do nothing
"pre-pepared" tries to see if there is anything to gain by having an exist context before doing casts

As you can see here, the size of the pipeline doesn't impact the number of requests per second by that much, but using rpc does.

These are _not_ the results I was hoping for. They suggest that the quickest way to improve things would be to have the publishing happen on a greenthread that is async from the main thread running the wsgi middleware.
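
For reference, the "drop" publisher mentioned in the list above could look something like this minimal sketch (the method name and signature are assumed from the description, not copied from the ceilometer publisher interface):

    class DropPublisher(object):
        """Discard every sample.

        Useful for separating the cost of running the pipeline from
        the cost of the transport (rpc/notifications) behind it.
        """

        def __init__(self, parsed_url=None):
            self.parsed_url = parsed_url

        def publish_samples(self, context, samples):
            # Intentionally a no-op: in a benchmark, any time spent
            # beyond this point is pipeline overhead, not publishing.
            pass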

Revision history for this message
Chris Dent (cdent) wrote :

It may be worthwhile tracking this change: https://review.openstack.org/#/c/80225/

Since the notifications are async and no reply from the other end is required, this should allow a significant speedup.

Revision history for this message
Hisashi Osanai (osanai-hisashi) wrote :

In #2, I commented out the three lines and got good measurement results, so I am confident that the main cause of this degradation is synchronous publishing. I have also heard that Mr. Yamamoto has been working on a fix for this bug using async publishing.

Revision history for this message
Hisashi Osanai (osanai-hisashi) wrote :

answer for #6:
> In the tests above is the meter database in mysql or mongodb or something else?

I use mongodb for ceilometer.

Revision history for this message
Chris Dent (cdent) wrote :

"And I heard that Mr. Yamamoto has been making fix for this bug with async publishing."

That's why I linked: https://review.openstack.org/#/c/80225/

Publishing with notifications instead of an rpc cast will allow async publishing.
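
For anyone following along, a rough sketch of notification-based publishing with oslo.messaging (the API has shifted between releases, so treat the exact calls and option values as assumptions):

    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    notifier = oslo_messaging.Notifier(transport,
                                       publisher_id='ceilometer.swift',
                                       driver='messaging',
                                       topics=['notifications'])

    # A notification is fire-and-forget: no reply is awaited, so the
    # request path is not blocked on the collector.
    notifier.info({}, 'objectstore.http.request',
                  {'bytes_received': 123, 'bytes_sent': 4096})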

Revision history for this message
Hisashi Osanai (osanai-hisashi) wrote :

Thanks, I understand. The fix in https://review.openstack.org/#/c/80225/ is a common one, while Mr. Yamamoto's is specific to the swift middleware. For Icehouse it is better to have a change that affects fewer modules, like the swift middleware fix, but going forward 80225 is the preferable architecture if both have the same performance. Anyway, I am looking forward to having both modules for further measurement.

Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → juno-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/110257

Revision history for this message
Chris Dent (cdent) wrote :

There's discussion related to this bug on the mailing list: http://lists.openstack.org/pipermail/openstack-dev/2014-July/041522.html which eventually leads to one potential solution: rewrite using oslo middleware.

Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-3 → none
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → juno-rc1
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-rc1 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ceilometer (master)

Change abandoned by Chris Dent (<email address hidden>) on branch: master
Review: https://review.openstack.org/109368
Reason: The swift middleware is the site of much need for change, and this change in particular is pretty minor compared to those other needs.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Keisuke Yamamoto (<email address hidden>) on branch: master
Review: https://review.openstack.org/110257
Reason: This problem will be fixed by the BP (https://review.openstack.org/#/c/142129/). I am withdrawing this patch as I think the BP is the better approach.
Thanks to all for your time.

Changed in ceilometer:
assignee: Keisuke Yamamoto (keisuk-yamamoto) → nobody
Revision history for this message
gordon chung (chungg) wrote :

we've released ceilometermiddleware; this package uses oslo.messaging to send a single notification to a queue rather than 2-3 messages via rpc...

i'm marking this with a lower severity for now just in case... will close if nothing else is raised.
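
for anyone wanting to try it, wiring the released package into the proxy looks roughly like the following proxy-server.conf fragment (recalled from the package's README; verify the option names against the release you install):

    [pipeline:main]
    pipeline = catch_errors cache authtoken keystoneauth ceilometer proxy-server

    [filter:ceilometer]
    paste.filter_factory = ceilometermiddleware.swift:filter_factory
    # oslo.messaging transport used to emit the single notification
    url = rabbit://guest:guest@localhost:5672/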

Changed in ceilometer:
importance: High → Low
status: In Progress → Incomplete
gordon chung (chungg)
Changed in ceilometer:
status: Incomplete → Invalid