I didn't see anything in here that was particularly surprising -- these were all things we knew we were doing, but having ballpark numbers is nice.
Some initial thoughts:
* We may see some tail-latency improvements from using tcmalloc, given that the cases where we ended up in malloc seemed quite expensive (up to 65us).
* ControlObject::set can cost 16us on mutex locking alone. Queued connections allocate memory too (which can also block). We really need to get COs out of the engine.
* SideChain writing costs ~8-16us just to signal the sidechain thread.
* Why is SoundDeviceNetwork open when no shoutcast stream is enabled? Do we need to be calling gettimeofday from the engine thread?
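
To make the ControlObject point concrete, here is a rough sketch of what a mutex-free write path could look like. It is an illustration only, not the existing ControlObject API: `AtomicControlValue` and `exampleUsage` are hypothetical names, and it assumes the value fits in a `double` and that `std::atomic<double>` is lock-free on the target platform (worth checking with `isLockFree()`).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a control value shared between the engine
// thread and other threads. This is NOT Mixxx's ControlObject API.
class AtomicControlValue {
  public:
    explicit AtomicControlValue(double initial = 0.0) : m_value(initial) {}

    // Write path: a single atomic store, with no mutex and no allocation.
    void set(double value) {
        m_value.store(value, std::memory_order_release);
    }

    // Read path: a single atomic load.
    double get() const {
        return m_value.load(std::memory_order_acquire);
    }

    // Reports whether the platform implements the atomic without an
    // internal lock. If this returns false, the mutex cost is still there.
    bool isLockFree() const {
        return m_value.is_lock_free();
    }

  private:
    std::atomic<double> m_value;
};

// Minimal usage: set a value and read it back.
inline void exampleUsage() {
    AtomicControlValue gain(0.5);
    gain.set(1.0);
    assert(gain.get() == 1.0);
}
```

This only covers the value itself. Notifying listeners without a queued connection (and its allocation) would need a separate mechanism, e.g. having the other thread poll the value.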