innodb_memcache OOM when reading >8k values

Bug #1655403 reported by Andrew N
Affects          Status   Importance  Assigned to  Milestone
MySQL Server     Unknown  Unknown
Percona Server   (moved to https://jira.percona.com/projects/PS; status tracked in 5.7)
  5.5            Invalid  Undecided   Unassigned
  5.6            Invalid  Undecided   Unassigned
  5.7            Triaged  High        Unassigned

Bug Description

Running Percona Server 5.7 HEAD, with VARCHAR(65000) in place of
VARCHAR(1024) for demo_test.c2 in the innodb_memcached_config.sql shipped
with the distribution, percona-server uses up all system memory and
triggers the Linux OOM killer when a single memcache client retrieves many
values larger than ~8k. The memory for the values isn't freed after the
GET command returns, but it's not a true memory leak: all of the
additional memory is released when the memcache TCP connection closes.

The issue seems to be that values over a certain threshold trigger the
call stack below, allocating memory in a heap that belongs to the read_tpl
in innodb_engine.c, and that memory isn't freed until the read_tpl is
destroyed when the connection closes. (Presumably the ~8k threshold is the
point at which InnoDB stores the column off-page, since the stack goes
through btr_rec_copy_externally_stored_field.)

    mem_heap_alloc
    btr_rec_copy_externally_stored_field
    ib_read_tuple
    ib_cursor_read_row
    innodb_api_search
    innodb_get
    process_get_command
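
To make the lifecycle concrete, here is a small self-contained sketch of
the allocation pattern as I understand it. The arena below is a stand-in
for InnoDB's mem_heap_t, not the real implementation: every allocation is
chained onto the heap, nothing is released per request, and the memory
only comes back when the whole heap is destroyed, which is what
ib_cb_tuple_delete ultimately does for the read tuple's heap.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy arena allocator: allocations survive until the whole arena
       is freed, like mem_heap_alloc() against the read tuple's heap. */
    typedef struct block {
        struct block *next;
        size_t        size;
    } block_t;

    typedef struct {
        block_t *head;
        size_t   total;
    } arena_t;

    static void *arena_alloc(arena_t *a, size_t n)
    {
        block_t *b = malloc(sizeof(block_t) + n);
        b->next = a->head;
        b->size = n;
        a->head = b;
        a->total += n;
        return b + 1; /* freed only by arena_free(), never individually */
    }

    static void arena_free(arena_t *a) /* analogue of ib_cb_tuple_delete() */
    {
        while (a->head) {
            block_t *b = a->head;
            a->head = b->next;
            free(b);
        }
        a->total = 0;
    }

    int main(void)
    {
        arena_t heap = { NULL, 0 }; /* stand-in for read_tpl's heap */

        /* Each GET of a >8k value copies the off-page column into the
           heap; nothing is returned when the request finishes. */
        for (int get = 1; get <= 5; get++) {
            memset(arena_alloc(&heap, 64000), 'a', 64000);
            printf("after GET %d: heap holds %zu bytes\n", get, heap.total);
        }

        arena_free(&heap); /* only now does the memory come back */
        printf("after free: heap holds %zu bytes\n", heap.total);
        return 0;
    }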

The patch below is working for us, and I'm happy to update it as needed to
get it merged. I'm hoping that would go a lot faster if you could point me
to detailed documentation, or put me in touch with someone who understands
the relationship between InnoDB cursors, tuples, and heaps and where and
how it would be safe to release the block allocated when
btr_rec_copy_externally_stored_field calls mem_heap_alloc.

diff --git a/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c b/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
index 60fc372..db88525 100644
--- a/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
+++ b/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
@@ -1637,6 +1637,10 @@ innodb_release(
 		item_release(def_eng, (hash_item *) item);
 		conn_data->use_default_mem = false;
 	}
+	if (conn_data->read_tpl) {
+		ib_cb_tuple_delete(conn_data->read_tpl);
+		conn_data->read_tpl = NULL;
+	}
 
 	return;
 }
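
For reference, my reading of the mem heap API in
storage/innobase/include/mem0mem.h is that allocations can only be
released at heap granularity via mem_heap_empty/mem_heap_free (there is
also mem_heap_free_top for the most recent allocation, but nothing
addressable by pointer), which is why the patch frees the whole tuple
rather than the single block. An illustrative snippet, only compilable
inside the server tree and not code taken from the plugin:

    #include "mem0mem.h" /* InnoDB mem heap API */

    /* Illustrative only: there is no per-pointer free for heap
       allocations, so a block copied into the tuple's heap can only be
       reclaimed by emptying or destroying that heap. */
    static void heap_lifecycle_sketch(void)
    {
        mem_heap_t *heap = mem_heap_create(1024);
        byte       *big  = (byte *) mem_heap_alloc(heap, 64000);

        (void) big;           /* no way to free just this block */

        mem_heap_empty(heap); /* drop every allocation, keep the heap */
        mem_heap_free(heap);  /* destroy the heap itself */
    }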

Here is the script we're using to reproduce the error. Run it and watch
mysqld's memory usage grow.

    #!/usr/bin/env ruby

    require 'socket'

    # Hex-encode a binary string, zero-padding each byte to two digits.
    def bin_to_hex(s)
      s.each_byte.map { |b| format('%02x', b) }.join
    end

    class MemcacheClient
      def initialize
        @s = TCPSocket.new 'localhost', 11211
      end

      def get(key)
        @s.write("get #{key}\r\n")
        info = @s.gets
        if info.start_with?('VALUE')
          count = info.split[-1].to_i
          value = @s.read(count)
          value += @s.gets # \r\n
          value += @s.gets # END
        end
        info + (value || "")
      end

      def set(key, value)
        value = value.to_s
        @s.write("set #{key} 0 0 #{value.bytesize}\r\n#{value}\r\n")
        @s.gets
      end
    end

    def test_write_and_read(size=64000)
      memcache_client = MemcacheClient.new
      random = Random.new

      1.upto(1_000_000) do |n|
        k = bin_to_hex(random.bytes(20))
        v = 'a' * size

        puts n
        $stdout.flush

        memcache_client.set(k, v)
        ret = memcache_client.get(k)
        if !ret.include?(v)
          raise "Error: mismatch: #{ret.inspect} != #{v.inspect}"
        end
      end
    end
    test_write_and_read

Tags: upstream
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1046
