Compiler returns internal error from ../optimizer/opt.cpp line 6907

Bug #1438372 reported by Weishiun Tsai
This bug affects 1 person
Affects: Trafodion
Status: Fix Released
Importance: High
Assigned to: Justin Du

Bug Description

We are seeing an alarming rate of random compiler internal errors from ../optimizer/opt.cpp, line 6907:

*** ERROR[2235] Compiler Internal Error: Cannot produce a plan in optimizer pass one, originated from file ../optimizer/opt.cpp at line 6907.

The error is often returned from a simple DDL operation, such as a create table statement or a create index statement. Sometimes it is also returned from the update statistics statement. Unfortunately, there is no reliable way to reproduce this problem. The operations that hit this error were very basic, and rerunning the same tests often did not reproduce it. However, in the last full SQL regression run on the v0327 build installed on a 6-node cluster, we saw a total of 223 occurrences of this error, which is quite alarming.

-bash-4.1$ grep "Compiler Internal Error" */*.log | grep 2235 | grep 6907 | wc -l
223
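
To see which test suites contribute most of these hits, the same filter can be broken down per suite directory. The following is only a minimal sketch: it assumes the input lines have the `suite/file.log:message` shape that `grep` prints when given several files, and the function name `count_by_suite` is made up for illustration.

```python
import re
from collections import Counter

# Matches the exact error tracked in this bug: code 2235 raised at opt.cpp line 6907.
PATTERN = re.compile(r"ERROR\[2235\] Compiler Internal Error.*opt\.cpp at line 6907")


def count_by_suite(grep_lines):
    """Count matching errors per suite directory.

    Each input line is assumed to look like grep output over several
    files: 'suite/file.log:message'.
    """
    counts = Counter()
    for line in grep_lines:
        path, sep, message = line.partition(":")
        if sep and PATTERN.search(message):
            suite = path.split("/", 1)[0]  # first path component, e.g. 'arkcase'
            counts[suite] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example use: grep "Compiler Internal Error" */*.log | python count_by_suite.py
    for suite, n in count_by_suite(sys.stdin).most_common():
        print(suite, n)
```

The per-suite totals should add up to the number printed by the `wc -l` pipeline above.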

This bug report is filed to see if some analysis can be done from the source code to figure out why the compiler is returning this internal error at such a high rate.

Here are some examples of the SQL operations returning this error from the v0327 run:

-----------------------------------

arkcase/arkt0149.log

SQL>create table b4table2
(a int not null, b int not null, c varchar(100) not null,
primary key (a));
--- SQL operation complete.

SQL>create index b4tab1x
on b4table1 (b);
*** ERROR[2235] Compiler Internal Error: Cannot produce a plan in optimizer pass one, originated from file ../optimizer/opt.cpp at line 6907. [2015-03-28 20:53:48]
*** ERROR[8804] The provided input statement does not exist in the current context. [2015-03-28 20:53:48]
*** ERROR[4082] Object TRAFODION.ARKCASE_ARKT0149.B4TABLE1 does not exist or is inaccessible. [2015-03-28 20:53:48]

-----------------------------------

arkcase/arkt1198.log

SQL>Update Statistics For Table BTA1P001 on every column;
*** ERROR[2235] Compiler Internal Error: Cannot produce a plan in optimizer pass one, originated from file ../optimizer/opt.cpp at line 6907. [2015-03-28 23:43:30]
*** ERROR[8804] The provided input statement does not exist in the current context. [2015-03-28 23:43:30]
*** ERROR[4082] Object TRAFODION.ARKCASE_ARKT1198.BTA1P001 does not exist or is inaccessible. [2015-03-28 23:43:30]
*** ERROR[2235] Compiler Internal Error: Cannot produce a plan in optimizer pass one, originated from file ../optimizer/opt.cpp at line 6907. [2015-03-28 23:43:30]
*** ERROR[8804] The provided input statement does not exist in the current context. [2015-03-28 23:43:30]
*** ERROR[4082] Object TRAFODION.ARKCASE_ARKT1198.BTA1P001 does not exist or is inaccessible. [2015-03-28 23:43:30]
*** ERROR[4082] Object TRAFODION.ARKCASE_ARKT1198.BTA1P001 does not exist or is inaccessible. [2015-03-28 23:43:30]

-----------------------------------

ddl/tab002.log

SQL>create table Female_actors (
f_no int not null not droppable,
f_name varchar(30) not null,
f_realname varchar(50) default null,
f_birthday date constraint md1 check (f_birthday > date '1900-01-01'),
primary key (f_no)
)
;
*** ERROR[2235] Compiler Internal Error: Cannot produce a plan in optimizer pass one, originated from file ../optimizer/opt.cpp at line 6907. [2015-03-29 06:50:34]
*** ERROR[8804] The provided input statement does not exist in the current context. [2015-03-29 06:50:34]
*** ERROR[4082] Object TRAFODION.DDL_TAB002.FEMALE_ACTORS does not exist or is inaccessible. [2015-03-29 06:50:34]
*** ERROR[1029] Object TRAFODION.DDL_TAB002.FEMALE_ACTORS could not be created. [2015-03-29 06:50:34]

Tags: sql-cmp
Sandhya Sundaresan (sandhya-sundaresan) wrote :

We believe this is fixed with the fix made for : https://bugs.launchpad.net/trafodion/+bug/1426479

Change-Id: Idd0388b626ecd3dbf3c3d1e75c1d6dc30b0ce021

Apr 1, 2015 4:34 PM

Changed in trafodion:
assignee: nobody → Sandhya Sundaresan (sandhya-sundaresan)
status: New → Triaged
Changed in trafodion:
status: Triaged → In Progress
assignee: Sandhya Sundaresan (sandhya-sundaresan) → Justin Du (justin-du-2)
Justin Du (justin-du-2) wrote :

Hans reproduced the problem with the following statements and provided this analysis:

First the good news, we have a reproducible test case:

sqlci
  insert into "_MD_".defaults values('QUERY_CACHE','0','testing LP1438372');
  exit;

sqlci
  cqd optimization_level '2';
  create table t12(a int);
  -- error 2235

  -- don’t forget to delete the row in the defaults table!
  delete from "_MD_".defaults where attribute = 'QUERY_CACHE';

Here are the CQD settings I’m seeing in a couple of these cores:

SQL statement:
select O.catalog_name, O.schema_name, O.object_name, I.keytag, I.is_unique, I.is_explicit, I.key_colcount, I.nonkey_colcount, T.num_salt_partns
from TRAFODION."_MD_".INDEXES I, TRAFODION."_MD_".OBJECTS O , TRAFODION."_MD_".TABLES T
where I.base_table_uid = 82816710866407165 and I.index_uid = O.object_uid and O.valid_def = 'Y' and I.index_uid = T.table_uid
for read committed access order by 1,2,3

This query is issued from CmpSeabaseDDL::getSeabaseUserTableDesc().

OPTIMIZATION_LEVEL: 2
HASH_JOINS: OFF
MERGE_JOINS: OFF
NESTED_JOINS: ON

When I try this in a standalone session with these CQD settings I can reproduce the error 2235. As mentioned before, that’s a “feature”. I think the reason why this happens only occasionally is that if we compile the query with the default optimization level first, then we are getting a cached plan when we later reduce the level to 2, so we don’t see the bug. I can reproduce that as well. If, however, the optimization level is 2 the first time we compile this, then we get error 2235.
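
The masking effect described above can be illustrated with a toy model. This is not Trafodion code: the cache keyed only by query text, the default level of 3, and all class and function names are assumptions made for the sketch. It shows only why a plan cached at the default level hides a failure that a first compile at level 2 exposes, which is also why the repro script turns QUERY_CACHE off.

```python
DEFAULT_LEVEL = 3  # assumed default optimization level for this toy model


class PlanError(Exception):
    """Stands in for ERROR[2235] in this toy model."""


def optimize(query, level):
    # Assumption for illustration: at the reduced level the optimizer
    # cannot find a plan for this query.
    if level < DEFAULT_LEVEL:
        raise PlanError("Cannot produce a plan in optimizer pass one")
    return f"plan({query})"


class ToyCompiler:
    def __init__(self, cache_enabled=True):
        self.cache_enabled = cache_enabled
        self.cache = {}  # keyed by query text only (toy assumption)

    def compile(self, query, level):
        if self.cache_enabled and query in self.cache:
            return self.cache[query]  # cache hit: the optimizer never runs
        plan = optimize(query, level)
        if self.cache_enabled:
            self.cache[query] = plan
        return plan


if __name__ == "__main__":
    warm = ToyCompiler()
    warm.compile("metadata query", DEFAULT_LEVEL)
    print(warm.compile("metadata query", 2))  # served from cache, no error

    cold = ToyCompiler()
    try:
        cold.compile("metadata query", 2)
    except PlanError as exc:
        print("error:", exc)  # first compile at level 2 fails
```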

I think Wei-Shiun hit the nail on the head when he mentioned his OPTIMIZATION_LEVEL test suite below. What we saw in the core and Wei-Shiun’s observation led to the reproducible test script above.

Is this something you could look into, Justin or Sandhya, how the optimization level 2 makes its way into the arkcmp?

Just FYI, here are the gdb commands I used:

set print elements 1000
f 4
p input_str
p cmpCurrentContext->schemaDB_->defaults_->currentDefaults_[OPTIMIZATION_LEVEL]
p cmpCurrentContext->schemaDB_->defaults_->currentDefaults_[HASH_JOINS]
p cmpCurrentContext->schemaDB_->defaults_->currentDefaults_[MERGE_JOINS]
p cmpCurrentContext->schemaDB_->defaults_->currentDefaults_[NESTED_JOINS]

Thanks,

Hans

Justin Du (justin-du-2) wrote :

Added a temporary fix by adding back the hold and set of the CQD optimization level to 3.

A permanent fix would be to completely block user CQDs from being propagated to secondary compilers, i.e. the embedded metadata compiler and the external compiler.
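
As a rough illustration of that idea (a sketch, not the actual change), a secondary compiler could start from the system defaults and accept a session setting only if it appears on an explicit allow-list. The function name, the allow-list, and the dictionary representation are all hypothetical. The bug report does not say which settings, if any, should still pass through, so the list is empty here.

```python
# Hypothetical allow-list of settings a secondary compiler may inherit
# from the user session. Empty means "block everything".
PROPAGATED_TO_SECONDARY = frozenset()


def defaults_for_secondary(system_defaults, user_cqds,
                           allowed=PROPAGATED_TO_SECONDARY):
    """Return the settings a secondary compiler would start with.

    system_defaults and user_cqds are plain dicts of name -> value;
    user settings not named in `allowed` are dropped.
    """
    effective = dict(system_defaults)
    for name, value in user_cqds.items():
        if name in allowed:
            effective[name] = value
    return effective


if __name__ == "__main__":
    system = {"OPTIMIZATION_LEVEL": "3", "HASH_JOINS": "ON"}
    session = {"OPTIMIZATION_LEVEL": "2", "HASH_JOINS": "OFF"}
    # The metadata compiler keeps the system values regardless of the session.
    print(defaults_for_secondary(system, session))
```

With this shape, the user's `cqd optimization_level '2'` would still apply to the user's own statements but would never reach the internal metadata queries.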

Will lower the importance to "high" after delivering the temp fix.

Justin Du (justin-du-2) wrote :

The temporary fix was merged. Now lowering the importance to "high" and continuing the work for R1.2.

Changed in trafodion:
importance: Critical → High
milestone: r1.1 → r1.2
Weishiun Tsai (wei-shiun-tsai) wrote :

Ran through the entire regression test suite on the v0416 build installed on a 4-node cluster without seeing this problem again. The temporary workaround did address this particular use case. Will verify again once the actual fix is in.

-bash-4.1$ grep "Compiler Internal Error" */*.log | grep 2235 | grep 6907 | wc -l
0

Justin Du (justin-du-2) wrote :

Recent check-in C1625 addressed this particular issue, which is that user-entered CQDs affect internal (metadata) queries.

Changed in trafodion:
status: In Progress → Fix Committed
Weishiun Tsai (wei-shiun-tsai) wrote :

Ran through the entire SQL regression test with the v0523 build installed both on a Cloudera cluster and a Hortonworks cluster. Did not see this particular error. This case will be closed.

-bash-4.1$ grep "Compiler Internal Error" */*.log | grep 2235 | grep 6907 | wc -l
0

Changed in trafodion:
status: Fix Committed → Fix Released