UNEXPECTED_MESSAGE_RECIEVED

Bug #1160730 reported by Alvin Peng
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Stado
New
Undecided
Unassigned

Bug Description

create table a(x int, y int);
create table b(x int, y int);
insert into a values(1,1);
insert into a values(1,2);
insert into b values(1,3);
insert into b values(1,4);

with "xdb.use_load_for_step=false" config in stado.config.

Stado -> select * from a,b;
SQLException: ERROR: Node 1 has aborted execution, cause is: org.postgresql.stado.exception.XDBServerException : Insert Failed :

During debug, below exception is got:

org.postgresql.stado.exception.XDBServerException: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(MSG_ABORT , MSG_SEND_DATA_ACK)

The cause seems to be:

###ProducerSender.java:

public void sendToNodes(ResultSet aResultSet, StepDetail aStepDetail,
            Object jdbcLock, int sourceNodeId, int requestId)
    throws SQLException, XDBServerException {
...
            try {
                logger.info("Finishing Inserts");
                finishInserts(aStepDetail.getDestType());
                logger.info("Finished Inserts");
            } catch (Exception e) {
                logger.catching(e);
                XDBServerException ex = new XDBServerException(
                        ErrorMessageRepository.INSERT_FAILED, e,
                        ErrorMessageRepository.INSERT_FAILED_CODE);
                logger.throwing(ex);
                throw ex;
            }
...

in finishInserts(aStepDetail.getDestType());

###LocalConnector.java:

        protected void send(NodeMessage message) throws Exception {
            final String method = "send";
            logger.entering(method, new Object[] { message });
            try {

                BlockingQueue<NodeMessage> destination = msgQTable.get(message
                        .getTargetNodeID());
                destination.offer(message);

            } finally {
                logger.exiting(method);
            }
        }

msgQTable is : {1=[]}
message is: NodeMessage[Type:MSG_SEND_DATA, From:1, To: [2], Session: 1, Request: 0]

So destination is null, then java.lang.NullPointerException is got, and MSG_ABORT is generated.

Related branches

Revision history for this message
Alvin Peng (pengalvin) wrote :

with "xdb.use_load_for_step=true" (default config) config in stado.config.

Stado -> select count(*) from sf_commercerecord;
+----------+
| count(*) |
+----------+
| 3000000 |
+----------+
1 row(s).

Stado -> select count(*) from sf_wofcust201205;
+----------+
| count(*) |
+----------+
| 25009727 |
+----------+
1 row(s).

Stado -> update sf_commercerecord set status=integraltype where exists ( select * from sf_wofcust201205 where sf_commercerecord.mobileno=sf_wofcust201205.mobileno and sf_commercerecord.feecode=sf_wofcust201205.feecode );
SQLException: ERROR: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(1 , 41 )

For small tables, there seems to be no problem:

Stado -> select count(*) from sf_commercerecord;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
1 row(s).

Stado -> select count(*) from sf_wofcust201205;
+----------+
| count(*) |
+----------+
| 82 |
+----------+
1 row(s).

Stado -> update sf_commercerecord set status=integraltype where exists ( select * from sf_wofcust201205 where sf_commercerecord.mobileno=sf_wofcust201205.mobileno and sf_commercerecord.feecode=sf_wofcust201205.feecode);
1 row(s) affected

Revision history for this message
Andrei Martsinchyk (andrei-martsinchyk) wrote :

Hi Alvin,

The problem is not reproduced in my environment, and look like misconfiguration.
Could you tell more about your setup?
How many nodes you have? Are you using XDBAgents? Are there any entries like xdb.connector.x.y=z in your configuration files?

Changed in stado:
status: New → Incomplete
Revision history for this message
Alvin Peng (pengalvin) wrote :
Download full text (3.7 KiB)

Hi Andrei,

In my testing environment. I have two logical nodes on the same phisical node:
xdb.nodecount=2
xdb.node.1.dbhost=127.0.0.1
xdb.node.2.dbhost=127.0.0.1

####1 xdb.use_load_for_step=false ; NOT using XDBAgents

Stado -> select * from a,b;
+---------------+
| x | y | x | y |
+---------------+
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 1 | 1 | 4 |
| 1 | 2 | 1 | 4 |
+---------------+
4 row(s).

Stado -> update sf_commercerecord set status=integraltype
Stado -> where exists (
Stado -> select * from sf_wofcust201205
Stado -> where
Stado -> sf_commercerecord.mobileno=sf_wofcust201205.mobileno
Stado -> and sf_commercerecord.feecode=sf_wofcust201205.feecode
Stado -> );
SQLException: ERROR: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(1 , 41 )

In PostgreSQL's log, there is error like: "ERROR: portal "C_49" does not exist"

####2 xdb.use_load_for_step=true; NOT using XDBAgents

Stado -> select * from a,b;
+---------------+
| x | y | x | y |
+---------------+
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 1 | 1 | 4 |
| 1 | 2 | 1 | 4 |
+---------------+
4 row(s).

Stado -> update sf_commercerecord set status=integraltype
Stado -> where exists (
Stado -> select * from sf_wofcust201205
Stado -> where
Stado -> sf_commercerecord.mobileno=sf_wofcust201205.mobileno
Stado -> and sf_commercerecord.feecode=sf_wofcust201205.feecode
Stado -> );
SQLException: ERROR: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(1 , 41 )

If Using XDBAgents, I have below configuration:

# config with different ports, otherwise agent init failed
xdb.node.1.port=6455
xdb.node.1.host=172.17.0.14
xdb.node.2.port=6456
xdb.node.2.host=172.17.0.14

# coordinator and agents are on the same server, but starts separately
xdb.coordinator.host=172.17.0.14
xdb.coordinator.port=6454

xdb.connector.0.1=1
xdb.connector.1.0=1
xdb.connector.0.2=1
xdb.connector.2.0=1

## starts the agents
./gs-server.sh -a
./gs-agent.sh -n 1 2

####3 xdb.use_load_for_step=true; Using XDBAgents:

Stado -> select * from a,b;
+---------------+
| x | y | x | y |
+---------------+
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 1 | 1 | 4 |
| 1 | 2 | 1 | 4 |
+---------------+
4 row(s).

Stado -> update sf_commercerecord set status=integraltype
Stado -> where exists (
Stado -> select * from sf_wofcust201205
Stado -> where
Stado -> sf_commercerecord.mobileno=sf_wofcust201205.mobileno
Stado -> and sf_commercerecord.feecode=sf_wofcust201205.feecode
Stado -> );
SQLException: ERROR: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(1 , 41 )

####4 xdb.use_load_for_step=false; Using XDBAgents:

Stado -> select * from a,b;
SQLException: ERROR: Node 1 has aborted execution, cause is: org.postgresql.stado.exception.XDBServerException : Insert Failed :

Stado -> select * from a,b;
SQLException: ERROR: Node 1 has aborted execution, cause is: org.postgresql.stado.exception.XDBServerException : Insert Failed :

Stado -> select * from a,b;
+---------------+
| x | y | x | y |
+---------------+
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 1 | 1 | 4 |
| 1 | 2 | 1 | 4 |
+---------------+
4 row(s).

Stado -> select * from a,b;
SQLException: ERROR: Node 1 has aborted execution, cause is: org.postgresql.stado.exception.XDBServerException : Inser...

Read more...

Revision history for this message
Andrei Martsinchyk (andrei-martsinchyk) wrote :

OK, I will try to reconstruct your configuration.

Changed in stado:
status: Incomplete → Opinion
status: Opinion → In Progress
Changed in stado:
status: In Progress → Fix Committed
Revision history for this message
Andrei Martsinchyk (andrei-martsinchyk) wrote :

The problem with cross join is fixed. I could not reproduce the problem with update, just hoping it is of the same nature.
If it is still happening, please reopen the bug and provide more details:
Distribution of tables involved, what are the distribution keys, how many rows are on each node;
Relevant postgres logs, if available.

Revision history for this message
Alvin Peng (pengalvin) wrote :

Stado -> update sf_commercerecord set status=integraltype where exists ( select * from sf_wofcust201205 where sf_commercerecord.mobileno=sf_wofcust201205.mobileno and sf_commercerecord.feecode=sf_wofcust201205.feecode );
SQLException: ERROR: UNEXPECTED_MESSAGE_RECIEVED (GOT, EXPECTED) :(1 , 41 )

If xdb.use_load_for_step=true, postgres logs:
ERROR: portal "C_6" does not exist
ERROR: portal "C_6" does not exist
ERROR: portal "C_6" does not exist
ERROR: portal "C_6" does not exist

If xdb.use_load_for_step=false, postgres logs:
ERROR: portal "C_14" does not exist
ERROR: portal "C_14" does not exist
ERROR: portal "C_14" does not exist
ERROR: portal "C_14" does not exist
ERROR: canceling statement due to user request
STATEMENT: INSERT INTO "TMPTsf_commercerecord_" ("status_new", "id_old") VALUES (2, '1342356489155')
ERROR: current transaction is aborted, commands ignored until end of transaction block
STATEMENT: DROP TABLE "TMPTsf_commercerecord_"

TABLEs:

create table sf_commercerecord
  (
    id int8 not null,
    mobileno varchar(11) not null,
    product integer not null ,
    employeeid varchar(20) not null ,
    netportid varchar(20) not null ,
    commercetype char(4) not null ,
    monthtime varchar(6) not null ,
    feecode integer not null ,
    integraltype smallint not null ,
    empintegral float not null ,
    netintegral float not null ,
    expreward float ,
    yxplanid integer not null ,
    lastid int8 not null ,
    status smallint not null,
    primary key (id)
  ) partitioning key mobileno on all;

create index idx_sfcommercerecord on sf_commercerecord (commercetype,monthtime,lastid,status);
create index idx_sfcommercerecord_employeeid on sf_commercerecord (employeeid, commercetype);
create index idx_sfcommercerecord_netportid on sf_commercerecord (netportid, commercetype);

create table sf_wofcust201205
  (
    mobileno varchar(11) not null ,
    feecode integer not null ,
    recamt float not null
  ) partitioning key mobileno on all;

Stado -> execute direct on all 'select count(*) from sf_commercerecord';
+----------------+
| count | Node |
+----------------+
| 1497023 | 1 |
| 1502977 | 2 |
+----------------+
2 row(s).

Stado -> execute direct on all 'select count(*) from sf_wofcust201205';
+-----------------+
| count | Node |
+-----------------+
| 9981280 | 1 |
| 10018720 | 2 |
+-----------------+
2 row(s).

Changed in stado:
status: Fix Committed → Incomplete
status: Incomplete → New
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.