Transaction log blending two distinct UPDATES in a single transaction incorrectly

Bug #655352 reported by Patrick Crews
Affects: Drizzle
Status: Fix Released
Importance: Undecided
Assigned to: Joe Daly
Milestone: 2010-10-11

Bug Description

The transaction log appears to be merging / blending two distinct UPDATES (against the same table) within a single transaction with bad results.

From the test case:
These two UPDATES -

  UPDATE `c` SET `col_int_not_null` = 1 WHERE `col_int` BETWEEN 7 AND 108 ORDER BY `col_bigint`,`col_bigint_key`,`col_bigint_not_null`,`col_bigint_not_null_key`,`col_char_10`,`col_char_1024`,`col_char_1024_key`,`col_char_1024_not_null`,`col_char_1024_not_null_key`,`col_char_10_key`,`col_char_10_not_null`,`col_char_10_not_null_key`,`col_enum`,`col_enum_key`,`col_enum_not_null`,`col_enum_not_null_key`,`col_int`,`col_int_key`,`col_int_not_null`,`col_int_not_null_key`,`col_text`,`col_text_key`,`col_text_not_null`,`col_text_not_null_key`,`pk` LIMIT 7 ;

  UPDATE `c` SET `col_int_not_null_key` = 10 WHERE `col_char_10_not_null_key` >= 'p' ORDER BY `col_bigint`,`col_bigint_key`,`col_bigint_not_null`,`col_bigint_not_null_key`,`col_char_10`,`col_char_1024`,`col_char_1024_key`,`col_char_1024_not_null`,`col_char_1024_not_null_key`,`col_char_10_key`,`col_char_10_not_null`,`col_char_10_not_null_key`,`col_enum`,`col_enum_key`,`col_enum_not_null`,`col_enum_not_null_key`,`col_int`,`col_int_key`,`col_int_not_null`,`col_int_not_null_key`,`col_text`,`col_text_key`,`col_text_not_null`,`col_text_not_null_key`,`pk` LIMIT 5 ;

end up in the transaction log like this: all changes are merged into the first UPDATE, and the new values from the second UPDATE are attributed to col_int_not_null instead of col_int_not_null_key.
statement {
  type: UPDATE
  START_TIMESTAMP
  END_TIMESTAMP
  update_header {
    table_metadata {
      schema_name: "test"
      table_name: "c"
    }
    key_field_metadata {
      type: INTEGER
      name: "pk"
    }
    set_field_metadata {
      type: INTEGER
      name: "col_int_not_null"
    }
  }
  update_data {
    segment_id: 1
    end_segment: true
    record {
      key_value: "11"
      after_value: "1"
      is_null: false
    }
    record {
      key_value: "7"
      after_value: "1"
      is_null: false
    }
    record {
      key_value: "13"
      after_value: "1"
      is_null: false
    }

############# This should be for col_int_not_null_key : ( ###################################
    record {
      key_value: "3"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "7"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "8"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "9"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "12"
      after_value: "10"
      is_null: false
    }
  }
}
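For contrast, with correct behavior the second UPDATE would be expected to land in its own statement with its own header, roughly like this (reconstructed from the records above; only the first of its five records is shown):

```text
statement {
  type: UPDATE
  START_TIMESTAMP
  END_TIMESTAMP
  update_header {
    table_metadata {
      schema_name: "test"
      table_name: "c"
    }
    key_field_metadata {
      type: INTEGER
      name: "pk"
    }
    set_field_metadata {
      type: INTEGER
      name: "col_int_not_null_key"
    }
  }
  update_data {
    segment_id: 1
    end_segment: true
    record {
      key_value: "3"
      after_value: "10"
      is_null: false
    }
    # ... remaining records (keys 7, 8, 9, 12) as above ...
  }
}
```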


Changed in drizzle:
status: New → Confirmed
Changed in drizzle:
assignee: nobody → David Shrewsbury (dshrews)
Joe Daly (skinny.moey) wrote :

This looks like the check in getUpdateStatement() here:

    else
    {
      const message::UpdateHeader &update_header= statement->update_header();
      string old_table_name= update_header.table_metadata().table_name();

      string current_table_name;
      (void) in_table->getShare()->getTableName(current_table_name);
      if (current_table_name.compare(old_table_name))
      {
        finalizeStatementMessage(*statement, in_session);
        statement= in_session->getStatementMessage();
      }
      else
      {
        /* carry forward the existing segment id */
        const message::UpdateData &current_data= statement->update_data();
        *next_segment_id= current_data.segment_id();
      }
    }

needs some improvement: it should also check which fields are updated and call finalizeStatementMessage() if the SET columns differ between the two UPDATEs.
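A sketch of the stricter reuse test (the names and structures below are illustrative stand-ins, not Drizzle's actual message API): only reuse the pending update_header when both the table name and the SET column list match; otherwise the caller would finalize the statement and start a new one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for the parts of message::UpdateHeader we need:
// the table name plus the names from the repeated set_field_metadata.
struct HeaderView
{
  std::string table_name;
  std::vector<std::string> set_fields;
};

// True when a new UPDATE may share the pending statement's update_header.
// vector operator== compares sizes first, so a differing column count is
// rejected before any column names are compared.
static bool canReuseHeader(const HeaderView &pending, const HeaderView &incoming)
{
  return pending.table_name == incoming.table_name
      && pending.set_fields == incoming.set_fields;
}
```

With the two UPDATEs from the test case, the pending set_fields would be {"col_int_not_null"} and the incoming set_fields {"col_int_not_null_key"}, so canReuseHeader() returns false and the second UPDATE gets its own statement.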

Joe Daly (skinny.moey)
Changed in drizzle:
assignee: David Shrewsbury (dshrews) → Joe Daly (skinny.moey)
Joe Daly (skinny.moey) wrote :

I'm wondering if the best fix for this would be to remove the optimization of reusing the update header and allowing multiple updates in one statement. Alternatively, we could check the fields of the previous update and compare them to the new update: if they are equal, combine them; if not, create a new statement.

Patrick Crews (patrick-crews) wrote : Re: [Bug 655352] Re: Transaction log blending two distinct UPDATES in a single transaction incorrectly

I'd lean in favor of not reusing the update header as it seems simplest / cleanest.

David Shrewsbury (dshrews) wrote :

Hmm, although that should work, we do lose the "compression" of combining multiple Statements into one. I wonder how much of a performance hit we would take if we were to compare columns? A good short-circuit would be to first compare the number of columns before comparing the column names themselves.

-Dave


Joe Daly (skinny.moey) wrote :

Patrick, if you have a chance to run my linked branch through the randgen tests that saw this failure, that would be great.

Changed in drizzle:
status: Confirmed → Fix Committed
Joe Daly (skinny.moey)
Changed in drizzle:
status: Fix Committed → Fix Released
milestone: none → 2010-10-11
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released