Row mismatch between index/table cause init auth to fail
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion |
Fix Committed
|
High
|
Justin Du |
Bug Description
Initialize authorization is failing because of a problem with index maintenance on the OBJECTS metadata table.
After running full regressions, the number of rows in the OBJECTS table is less than the number of rows in the OBJECTS table's index. The number of rows should match.
How to recreate:
Run full regressions on the work station (either debug or release) – everything works
Re-run regressions and all catman1 tests failed because initialize authorization fails:
initialize authorization;
*** ERROR[1001] An internal error occurred in module ../sqlcomp/
The initialize authorization code first performs an insert .. select -> this returns 238 rows (it reads from the OBJECTS table)
Then the code performs a count(*) on the newly inserted table -> this returns 246 rows (it reads from the OBJECTS table index)
Note that initialize authorization creates five tables, and then performs the insert … select
When initialize authorization fails it removes the 5 tables.
If I log onto sqlci and perform similar commands:
A select that mimics what the insert .. select performs and
A select with count(*)
select count(*)
from "_MD_".objects o
where o.object_type in ('VI','
;
(EXPR)
-------
--- 1 row(s) selected.
select
object_uid,
object_owner
from "_MD_".objects o
where o.object_type in ('VI','
;
OBJECT_UID OBJECT_OWNER
-------
614181175319
822472716130
. . .
1210903123425
1210903123425
1210903123425
--- 233 row(s) selected.
Note that the difference in row counts in these queries and the row counts from initialize authorization is because the 5 privilege manager table that no longer exist (246 – 5 = 241 & 238 – 5 = 233)
If I don’t include object_owner in the select list, then 241 rows are returned as expected.
Comparing the returned UID’s from both requests, the following rows were not returned in the second select:
OBJECT_TYPE SCHEMA_NAME OBJECT_NAME
----------- -------
BT SCH SKC
BT SCH DM2C
BT SCH T062A
BT SCH T062B
BT SCH SKC
BT SCH DM2C
BT SCH T062A
BT SCH T062B
--- 8 row(s) selected.
The returned rows have UID’s in the same range:
83939089253157639
839390892531
839390892531
839390892531
861903851304
861903851304
861903851304
861903851304
Now I did a select count(*) on the table versus the index (read all the rows, not just a subset):
Set parserflags 1;
select count(*) from table (index_table objects_uniq_idx);
..
(EXPR)
-------
--- 1 row(s) selected.
>>select count(*) from objects;
(EXPR)
-------
There is a difference – there are more rows in the index than in the table. This seems to be the problem encountered by initialize authorization.
Hopefully, the table names above can lead to which statements are actually failing to do index maintenance and the problem can be more easily recreated.
------> Mike
All the tables below come from compGeneral/
ps - What’s the significance of the UIDs being in the same range?
------> Selva
This might be yet another issue that we might need to understand better. Couple of days ago, I ran executor/TEST106 in my workspace because jenkens had some failures with this test. For some reason, this left the metadata for the table TEST106 inconsistent though the test passed. I couldn’t drop T106B (I think) because it was not able to delete all the rows from SB_HISTOGRAM_
I initialized Trafodion again and ran full regressions again. This time similar problem happens in T42 table in compGeneral/
------> Sandhya
This is troubling though – I don’t recall this issue being reported where our dev regressions left the metadata in inconsistent state so easily . We recently made some changes that changed the plan for the delete statement that gets executed during a drop table – it was to avoid an error 73 issue when concurrent ddls were performed. I wonder if there is some issue with that CQD and the plan causing index maintenance on the metadata tables to not work correctly. Or perhaps some other change causing this regression. I am just guessing here – no concrete clues. As Mike says this may be difficult to pin point.
tags: | added: sql-exe |
Changed in trafodion: | |
importance: | Undecided → High |
milestone: | none → r1.1 |
Changed in trafodion: | |
assignee: | nobody → Justin Du (justin-du-2) |
Changed in trafodion: | |
status: | New → In Progress |
Changed in trafodion: | |
status: | In Progress → Fix Committed |
This problem may be the result of CQD HIDE_INDEXES 'ALL' in the test (compGeneral/ TEST062) . In fact the problem can be reproduced with the following scripts:
1 Run this on a new sqlci session:
userId varchar(20) character set UCS2 not casespecific not null
store by (name);
drop table skc;
create table skc(name varchar(20) character set UCS2 not casespecific not null,
)
drop table dm2c;
salary float (40) not null)
store by (name);
create table dm2c( name char(30) character set UCS2 not null,
control query default HIDE_INDEXES 'ALL';
drop table skc;
drop table dm2c;
2. Run this script on the same sqlci session or new one:
set parserflags 1;
set schema "_MD_";
cqd query_cache '0';
control query shape join(scan(path 'OBJECTS'), scan(path 'OBJECTS_ UNIQ_IDX' ));
select object_uid, object_type from OBJECTS where object_uid not in
(select "OBJECT_UID@" from table(index_table OBJECTS_UNIQ_IDX));
control query shape join(scan(path 'OBJECTS_ UNIQ_IDX' ), scan(path 'OBJECTS'));
select "OBJECT_UID@", object_type, object_name from table(index_table OBJECTS_UNIQ_IDX)
where "OBJECT_UID@" not in (select object_uid from OBJECTS);
The second select statement would return one or 2 rows.
The user CQD should not affect the metadata queries, so none of the select statements should return any row.