Transaction log blending two distinct UPDATES in a single transaction incorrectly

Bug #655352 reported by Patrick Crews
Affects: Drizzle
Status: Fix Released
Importance: Undecided
Assigned to: Joe Daly
Milestone: 2010-10-11

Bug Description

The transaction log appears to be merging / blending two distinct UPDATES (against the same table) within a single transaction with bad results.

From the test case:
These two UPDATES -

  UPDATE `c` SET `col_int_not_null` = 1 WHERE `col_int` BETWEEN 7 AND 108 ORDER BY `col_bigint`,`col_bigint_key`,`col_bigint_not_null`,`col_bigint_not_null_key`,`col_char_10`,`col_char_1024`,`col_char_1024_key`,`col_char_1024_not_null`,`col_char_1024_not_null_key`,`col_char_10_key`,`col_char_10_not_null`,`col_char_10_not_null_key`,`col_enum`,`col_enum_key`,`col_enum_not_null`,`col_enum_not_null_key`,`col_int`,`col_int_key`,`col_int_not_null`,`col_int_not_null_key`,`col_text`,`col_text_key`,`col_text_not_null`,`col_text_not_null_key`,`pk` LIMIT 7 ;

  UPDATE `c` SET `col_int_not_null_key` = 10 WHERE `col_char_10_not_null_key` >= 'p' ORDER BY `col_bigint`,`col_bigint_key`,`col_bigint_not_null`,`col_bigint_not_null_key`,`col_char_10`,`col_char_1024`,`col_char_1024_key`,`col_char_1024_not_null`,`col_char_1024_not_null_key`,`col_char_10_key`,`col_char_10_not_null`,`col_char_10_not_null_key`,`col_enum`,`col_enum_key`,`col_enum_not_null`,`col_enum_not_null_key`,`col_int`,`col_int_key`,`col_int_not_null`,`col_int_not_null_key`,`col_text`,`col_text_key`,`col_text_not_null`,`col_text_not_null_key`,`pk` LIMIT 5 ;

end up in the transaction log like this: all changes are merged into the first UPDATE, and the new values from the second UPDATE are attributed to col_int_not_null instead of col_int_not_null_key.
statement {
  type: UPDATE
  START_TIMESTAMP
  END_TIMESTAMP
  update_header {
    table_metadata {
      schema_name: "test"
      table_name: "c"
    }
    key_field_metadata {
      type: INTEGER
      name: "pk"
    }
    set_field_metadata {
      type: INTEGER
      name: "col_int_not_null"
    }
  }
  update_data {
    segment_id: 1
    end_segment: true
    record {
      key_value: "11"
      after_value: "1"
      is_null: false
    }
    record {
      key_value: "7"
      after_value: "1"
      is_null: false
    }
    record {
      key_value: "13"
      after_value: "1"
      is_null: false
    }

############# This should be for col_int_not_null_key : ( ###################################
    record {
      key_value: "3"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "7"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "8"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "9"
      after_value: "10"
      is_null: false
    }
    record {
      key_value: "12"
      after_value: "10"
      is_null: false
    }
  }
}
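For contrast, with correct behavior the second UPDATE would be expected to land in its own statement with its own header, roughly like this (reconstructed from the records above; only the first of its five records is shown):

```text
statement {
  type: UPDATE
  START_TIMESTAMP
  END_TIMESTAMP
  update_header {
    table_metadata {
      schema_name: "test"
      table_name: "c"
    }
    key_field_metadata {
      type: INTEGER
      name: "pk"
    }
    set_field_metadata {
      type: INTEGER
      name: "col_int_not_null_key"
    }
  }
  update_data {
    segment_id: 1
    end_segment: true
    record {
      key_value: "3"
      after_value: "10"
      is_null: false
    }
    # ... remaining records (keys 7, 8, 9, 12) as above ...
  }
}
```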


Changed in drizzle:
status: New → Confirmed
Changed in drizzle:
assignee: nobody → David Shrewsbury (dshrews)
Joe Daly (skinny.moey) wrote :

This looks like the check in getUpdateStatement() here:

    else
    {
      const message::UpdateHeader &update_header= statement->update_header();
      string old_table_name= update_header.table_metadata().table_name();

      string current_table_name;
      (void) in_table->getShare()->getTableName(current_table_name);
      if (current_table_name.compare(old_table_name))
      {
        finalizeStatementMessage(*statement, in_session);
        statement= in_session->getStatementMessage();
      }
      else
      {
        /* carry forward the existing segment id */
        const message::UpdateData &current_data= statement->update_data();
        *next_segment_id= current_data.segment_id();
      }
    }

needs some improvement: it should also check which fields are updated and call finalizeStatementMessage() if the SET columns differ between the two UPDATEs.
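A sketch of the stricter reuse test (the names and structures below are illustrative stand-ins, not Drizzle's actual message API): only reuse the pending update_header when both the table name and the SET column list match; otherwise the caller would finalize the statement and start a new one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for the parts of message::UpdateHeader we need:
// the table name plus the names from the repeated set_field_metadata.
struct HeaderView
{
  std::string table_name;
  std::vector<std::string> set_fields;
};

// True when a new UPDATE may share the pending statement's update_header.
// vector operator== compares sizes first, so a differing column count is
// rejected before any column names are compared.
static bool canReuseHeader(const HeaderView &pending, const HeaderView &incoming)
{
  return pending.table_name == incoming.table_name
      && pending.set_fields == incoming.set_fields;
}
```

With the two UPDATEs from the test case, the pending set_fields would be {"col_int_not_null"} and the incoming set_fields {"col_int_not_null_key"}, so canReuseHeader() returns false and the second UPDATE gets its own statement.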

Joe Daly (skinny.moey)
Changed in drizzle:
assignee: David Shrewsbury (dshrews) → Joe Daly (skinny.moey)
Joe Daly (skinny.moey) wrote :

I'm wondering if the best fix for this would be to remove the optimization of reusing the update header and allowing multiple updates in one statement. Alternatively, we could check the fields of the previous update and compare them to the new update: if they are equal, combine them; if not, create a new statement.

Patrick Crews (patrick-crews) wrote : Re: [Bug 655352] Re: Transaction log blending two distinct UPDATES in a single transaction incorrectly

I'd lean in favor of not reusing the update header as it seems simplest / cleanest.

David Shrewsbury (dshrews) wrote :

Hmm, although that should work, we do lose the "compression" of combining multiple Statements into one. I wonder how much of a performance hit we would take if we were to compare columns? A good short-circuit would be to first compare the number of columns before comparing the column names themselves.

-Dave


Joe Daly (skinny.moey) wrote :

Patrick, if you have a chance to run my linked branch through the randgen tests that saw this failure, that would be great.

Changed in drizzle:
status: Confirmed → Fix Committed
Joe Daly (skinny.moey)
Changed in drizzle:
status: Fix Committed → Fix Released
milestone: none → 2010-10-11
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released