Trafodion

Not all logs on all nodes are processed by event log reader UDF

Bug #1412630 reported by gaoruixian on 2015-01-20

This bug affects 1 person

Affects     Status         Importance   Assigned to   Milestone
Trafodion   Fix Released   Critical     Hans Zeller   Trafodion r1.0

Bug Description

select * from udf(event_log_reader('f')) should return all records from the logs on all nodes; however, the logs on some nodes do not appear to be processed.

Tried on centos-mapr1.hpl.hp.com:37800

SQL>select * from udf(event_log_reader('f')) where cpu=0 and log_file_name='master_exec_0_7476.log';

--- 0 row(s) selected.

SQL>select distinct cpu from udf(event_log_reader('f')) where log_file_name='master_exec_0_7476.log';

         CPU
         -----------
                      2
                      5
                      3
                      4

--- 4 row(s) selected.

The result does not include cpu 0 and cpu 1, but we do have logs on those nodes.

Check node1 (centos-mapr2):

          [trafodion@centos-mapr2 logs]$ ll master*.log
          -rw-r--r-- 1 trafodion trafodion 258 Jan 18 20:35 master_exec_0_7476.log
          -rw-r--r-- 1 trafodion trafodion 258 Jan 18 20:30 master_exec_0_7487.log
          -rw-r--r-- 1 trafodion trafodion 136 Jan 18 19:42 master_exec_0_7851.log
          -rw-r--r-- 1 trafodion trafodion 5281 Jan 18 21:09 master_exec_1_15592.log
          -rw-r--r-- 1 trafodion trafodion 7216 Jan 18 20:35 master_exec_1_15605.log
          -rw-r--r-- 1 trafodion trafodion 259 Jan 18 21:25 master_exec_3_2078.log
          -rw-r--r-- 1 trafodion trafodion 258 Jan 18 20:27 master_exec_4_15691.log
          -rw-r--r-- 1 trafodion trafodion 130 Jan 18 20:03 master_exec_5_11066.log

cat master_exec_0_7476.log:

2015-01-19 04:29:53,454, INFO, SQL.ESP, Node Number: 0, CPU: 1, PIN: 3309, Process Name: $Z0102PJ,,, An ESP process is launched.
2015-01-19 04:35:23,103, INFO, SQL.ESP, Node Number: 0, CPU: 1, PIN: 6333, Process Name: $Z01055Y,,, An ESP process is launched.
2015-01-19 05:31:34,150, INFO, SQL.ESP, Node Number: 0, CPU: 1, PIN: 30036, Process Name: $Z010PI6,,, An ESP process is launched.

Sandhya Sundaresan (sandhya-sundaresan) on 2015-01-20

Changed in trafodion:
assignee:	nobody → Hans Zeller (hans-zeller)
importance:	Undecided → High
milestone:	none → r1.0

Aruna Sadashiva (aruna-sadashiva) on 2015-01-20

Changed in trafodion:
importance:	High → Critical

Hans Zeller (hans-zeller) on 2015-01-20

Changed in trafodion:
status:	New → Confirmed

Hans Zeller (hans-zeller) wrote on 2015-01-20:

The issue is that we start the tdm_udrserv processes on random CPUs instead of co-locating them with the ESPs. The query plan ensures we get one ESP per node, and that seems to work in this case, but then each of these ESPs starts its UDR server on a random CPU, so we return some data multiple times and other data not at all. In the example below, three UDR servers run on node 2 and none run on nodes 0 or 1:

[trafodion@centos-mapr3 ~]$ sqps | grep tdm_arkesp
[$Z020HEQ] 000,00031849 001 GEN ES--A-- $Z000QZZ $Z000PH2 tdm_arkesp
[$Z020HEQ] 001,00000807 001 GEN ES--A-- $Z0100N2 $Z000PH2 tdm_arkesp
[$Z020HEQ] 002,00015935 001 GEN ES--A-- $Z020D0A $Z000PH2 tdm_arkesp
[$Z020HEQ] 003,00001904 001 GEN ES--A-- $Z0301JE $Z000PH2 tdm_arkesp
[$Z020HEQ] 004,00024644 001 GEN ES--A-- $Z040K44 $Z000PH2 tdm_arkesp
[$Z020HEQ] 005,00002529 001 GEN ES--A-- $Z050229 $Z000PH2 tdm_arkesp
[trafodion@centos-mapr3 ~]$ sqps | grep tdm_udrserv
[$Z020HH1] 002,00015941 001 GEN ES--A-- $Z020D0G $Z000QZZ tdm_udrserv
[$Z020HH1] 002,00015943 001 GEN ES--A-- $Z020D0I $Z0100N2 tdm_udrserv
[$Z020HH1] 002,00015944 001 GEN ES--A-- $Z020D0J $Z050229 tdm_udrserv
[$Z020HH1] 003,00001911 001 GEN ES--A-- $Z0301JL $Z020D0A tdm_udrserv
[$Z020HH1] 004,00024650 001 GEN ES--A-- $Z040K4A $Z0301JE tdm_udrserv
[$Z020HH1] 005,00002536 001 GEN ES--A-- $Z05022G $Z040K44 tdm_udrserv
[trafodion@centos-mapr3 ~]$

The fix is to co-locate each tdm_udrserv with its local ESP. That should not cause any load balancing issues, since we already made sure that the ESPs are evenly balanced. On the contrary, if the UDR servers perform fairly heavy work, this ensures that they are as evenly balanced as the ESPs.
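
To make the placement problem easier to spot, here is a minimal Python sketch (not part of the actual fix) that reads sqps output and reports which nodes have no UDR server and which UDR servers run on a different node than their parent ESP. It assumes the column layout seen in the output above: monitor name, nid,pid, priority, type, state, process name, parent name, program. That layout is inferred from this report, not taken from sqps documentation, and the function names parse_sqps and udr_placement are made up for illustration.

```python
from collections import Counter

# Sample rows copied from the sqps output in this report.
SQPS_OUTPUT = """\
[$Z020HEQ] 000,00031849 001 GEN ES--A-- $Z000QZZ $Z000PH2 tdm_arkesp
[$Z020HEQ] 001,00000807 001 GEN ES--A-- $Z0100N2 $Z000PH2 tdm_arkesp
[$Z020HEQ] 002,00015935 001 GEN ES--A-- $Z020D0A $Z000PH2 tdm_arkesp
[$Z020HEQ] 003,00001904 001 GEN ES--A-- $Z0301JE $Z000PH2 tdm_arkesp
[$Z020HEQ] 004,00024644 001 GEN ES--A-- $Z040K44 $Z000PH2 tdm_arkesp
[$Z020HEQ] 005,00002529 001 GEN ES--A-- $Z050229 $Z000PH2 tdm_arkesp
[$Z020HH1] 002,00015941 001 GEN ES--A-- $Z020D0G $Z000QZZ tdm_udrserv
[$Z020HH1] 002,00015943 001 GEN ES--A-- $Z020D0I $Z0100N2 tdm_udrserv
[$Z020HH1] 002,00015944 001 GEN ES--A-- $Z020D0J $Z050229 tdm_udrserv
[$Z020HH1] 003,00001911 001 GEN ES--A-- $Z0301JL $Z020D0A tdm_udrserv
[$Z020HH1] 004,00024650 001 GEN ES--A-- $Z040K4A $Z0301JE tdm_udrserv
[$Z020HH1] 005,00002536 001 GEN ES--A-- $Z05022G $Z040K44 tdm_udrserv
"""


def parse_sqps(text):
    """Return {process_name: (node_id, parent_name, program)}.

    Assumed layout (inferred from the report, hypothetical otherwise):
    [monitor] nid,pid priority type state name parent program
    """
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip shell prompts and anything that is not an 8-column process row.
        if len(fields) != 8 or "," not in fields[1]:
            continue
        node_id = int(fields[1].split(",")[0])
        procs[fields[5]] = (node_id, fields[6], fields[7])
    return procs


def udr_placement(procs):
    """Compare each tdm_udrserv node with the node of its parent ESP."""
    esp_nodes = {
        name: node
        for name, (node, _parent, prog) in procs.items()
        if prog == "tdm_arkesp"
    }
    udr_per_node = Counter()
    misplaced = []
    for name, (node, parent, prog) in procs.items():
        if prog != "tdm_udrserv":
            continue
        udr_per_node[node] += 1
        if parent in esp_nodes and esp_nodes[parent] != node:
            misplaced.append((name, node, parent, esp_nodes[parent]))
    # Nodes that run an ESP but no UDR server: their logs are never read.
    uncovered = sorted(set(esp_nodes.values()) - set(udr_per_node))
    return udr_per_node, misplaced, uncovered


if __name__ == "__main__":
    per_node, misplaced, uncovered = udr_placement(parse_sqps(SQPS_OUTPUT))
    print("UDR servers per node:", dict(per_node))
    print("Nodes with an ESP but no UDR server:", uncovered)
    for name, node, parent, parent_node in misplaced:
        print(f"{name} runs on node {node}, parent ESP {parent} is on node {parent_node}")
```

For the sample above, the nodes without a UDR server are 0 and 1, which are the same CPUs missing from the select distinct cpu result in the bug description. With the fix applied, the expectation is that no node is left uncovered and no UDR server is reported as misplaced.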

Hans Zeller (hans-zeller) on 2015-01-20

Changed in trafodion:
status:	Confirmed → In Progress

Hans Zeller (hans-zeller) wrote on 2015-01-21:

Fix committed in https://review.trafodion.org/#/c/991/

Changed in trafodion:
status:	In Progress → Fix Committed

gaoruixian (ruixian-gao) wrote on 2015-01-26:

The fix has been verified on the traf_0122 build on centos-mapr1.

Changed in trafodion:
status:	Fix Committed → Fix Released
