The customer performed a Contrail ISSU (in-service software upgrade) in a lab environment. The ISSU logs are listed below.
=============================================
-rw-r--r-- 1 root root 10984 Aug 29 05:18 issu_contrail_generate_conf_2017_08_29_05_18_25_896844.log
-rw-r--r-- 1 root root 32121 Aug 29 05:29 issu_contrail_migrate_config_2017_08_29_05_19_00_206072.log
-rw-r--r-- 1 root root 7337 Aug 29 09:53 install_pkg_node_2017_08_29_09_52_05_948152.log
-rw-r--r-- 1 root root 2984 Aug 29 09:54 create_install_repo_node_2017_08_29_09_54_05_434516.log
-rw-r--r-- 1 root root 361021 Aug 29 09:56 create_install_repo_node_2017_08_29_09_54_32_088583.log
-rw-r--r-- 1 root root 38913 Aug 29 09:59 migrate_compute_kernel_node_2017_08_29_09_57_40_635039.log
-rw-r--r-- 1 root root 5892 Aug 29 10:12 issu_contrail_switch_collector_in_compute_node_revert_2017_08_29_10_12_32_966090.log
-rw-r--r-- 1 root root 46070 Aug 29 10:21 issu_contrail_migrate_compute_node_2017_08_29_10_20_39_185370.log
-rw-r--r-- 1 root root 537 Aug 29 10:21 update_post_issu_vrouter_param_node_custom_2017_08_29_10_21_53_900064.log
-rw-r--r-- 1 root root 448 Aug 29 10:25 reboot_node_2017_08_29_10_22_17_384617.log
-rw-r--r-- 1 root root 38913 Aug 29 10:34 migrate_compute_kernel_node_2017_08_29_10_33_07_187075.log
-rw-r--r-- 1 root root 5154 Aug 29 10:42 issu_contrail_switch_collector_in_compute_node_revert_2017_08_29_10_42_16_737673.log
-rw-r--r-- 1 root root 82139 Aug 29 10:47 issu_contrail_migrate_compute_node_2017_08_29_10_46_26_017647.log
-rw-r--r-- 1 root root 537 Aug 29 10:47 update_post_issu_vrouter_param_node_custom_2017_08_29_10_47_21_828548.log
-rw-r--r-- 1 root root 456 Aug 29 10:51 reboot_node_2017_08_29_10_47_34_063729.log
=============================================
After the ISSU, many tor-agent processes crashed. During this time, vrouter-agent also crashed.
According to the customer, the ISSU synchronization process was stuck in a loop. During that time, they repeatedly restarted supervisor-config on all controller nodes.
They then applied the countermeasure patch, which resolved the supervisor-config loop.
The issue was recovered around Aug 31 00:44, and the supervisor-config process was up around Aug 31 00:51.
After that, at 00:57 (UTC), tor-agent crashed again, and vrouter-agent crashed repeatedly and eventually timed out.
Core files from the affected nodes:
-rw------- 1 root root 192475136 Aug 30 08:27 core.contrail-tor-ag.3867.lab3adp-00004nn.1504081664
-rw------- 1 root root 389111808 Aug 30 08:27 core.contrail-tor-ag.3871.lab3adp-00004nn.1504081666
-rw------- 1 root root 240373760 Aug 30 08:27 core.contrail-tor-ag.3869.lab3adp-00004nn.1504081671
~~~ snip ~~~
-rw------- 1 root root 43044864 Aug 30 08:28 core.contrail-tor-ag.6328.lab3adp-00004nn.1504081680
-rw------- 1 root root 43003904 Aug 30 08:28 core.contrail-tor-ag.6472.lab3adp-00004nn.1504081680
-rw------- 1 root root 42926080 Aug 30 08:28 core.contrail-tor-ag.6502.lab3adp-00004nn.1504081680
-rw------- 1 root root 2315395072 Aug 30 08:28 core.contrail-vroute.3874.lab3adp-00004nn.1504081678
-rw------- 1 root root 112717824 Aug 31 00:56 core.contrail-tor-ag.6602.lab3adp-00004nn.1504140985
-rw------- 1 root root 50020352 Aug 31 00:56 core.contrail-tor-ag.6620.lab3adp-00004nn.1504140990
-rw------- 1 root root 53112832 Aug 31 00:56 core.contrail-tor-ag.6613.lab3adp-00004nn.1504140990
~~~ snip ~~~
-rw------- 1 root root 44822528 Aug 31 00:57 core.contrail-tor-ag.6587.lab3adp-00004nn.1504141049
-rw------- 1 root root 43237376 Aug 31 00:57 core.contrail-tor-ag.21168.lab3adp-00004nn.1504141049
-rw------- 1 root root 48709632 Aug 31 00:57 core.contrail-tor-ag.21271.lab3adp-00004nn.1504141050
-rw------- 1 root root 62373888 Aug 30 08:27 core.contrail-tor-ag.3918.lab3adp-00005nn.1504081672
-rw------- 1 root root 53563392 Aug 30 08:27 core.contrail-tor-ag.3919.lab3adp-00005nn.1504081672
-rw------- 1 root root 60043264 Aug 30 08:27 core.contrail-tor-ag.3929.lab3adp-00005nn.1504081672
-rw------- 1 root root 53788672 Aug 30 08:27 core.contrail-tor-ag.3928.lab3adp-00005nn.1504081672
~~~ snip ~~~
-rw------- 1 root root 43008000 Aug 30 08:28 core.contrail-tor-ag.10771.lab3adp-00005nn.1504081680
-rw------- 1 root root 43036672 Aug 30 08:28 core.contrail-tor-ag.10819.lab3adp-00005nn.1504081680
-rw------- 1 root root 43270144 Aug 30 08:28 core.contrail-tor-ag.10410.lab3adp-00005nn.1504081684
-rw------- 1 root root 99872768 Aug 31 00:56 core.contrail-tor-ag.11297.lab3adp-00005nn.1504140990
-rw------- 1 root root 48185344 Aug 31 00:57 core.contrail-tor-ag.10901.lab3adp-00005nn.1504141046
-rw------- 1 root root 48037888 Aug 31 00:57 core.contrail-tor-ag.10843.lab3adp-00005nn.1504141046
~~~ snip ~~~
-rw------- 1 root root 42999808 Aug 31 00:57 core.contrail-tor-ag.25303.lab3adp-00005nn.1504141049
-rw------- 1 root root 43163648 Aug 31 00:57 core.contrail-tor-ag.25029.lab3adp-00005nn.1504141049
-rw------- 1 root root 43028480 Aug 31 00:57 core.contrail-tor-ag.25299.lab3adp-00005nn.1504141051
=========================
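A note that may help when lining up the timeline: the trailing number in each core file name appears to be a Unix epoch timestamp (for example, 1504140985 corresponds to Aug 31 00:56:25 UTC, which agrees with the listing). The process names "contrail-tor-ag" and "contrail-vroute" are presumably contrail-tor-agent and contrail-vrouter-agent, truncated because the kernel limits the process name to 15 characters. Below is a minimal Python sketch for counting crashes per host, process and minute. It assumes the name layout core.<process>.<pid>.<hostname>.<epoch>, which is inferred from the listing and not confirmed against the core_pattern on the nodes; the function names are hypothetical, and hostnames containing dots would not parse correctly.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed layout (inferred from the listing, not confirmed):
#   core.<process name>.<pid>.<hostname>.<epoch seconds>
CORE_RE = re.compile(
    r"^core\.(?P<proc>.+)\.(?P<pid>\d+)\.(?P<host>[^.]+)\.(?P<epoch>\d+)$"
)


def parse_core_name(name):
    """Return (process, pid, host, crash time in UTC), or None if no match."""
    m = CORE_RE.match(name)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
    return m.group("proc"), int(m.group("pid")), m.group("host"), when


def crashes_per_minute(names):
    """Count core files per (host, process, minute in UTC)."""
    counts = Counter()
    for name in names:
        parsed = parse_core_name(name)
        if parsed is None:
            continue  # skip names that do not follow the assumed layout
        proc, _pid, host, when = parsed
        counts[(host, proc, when.strftime("%Y-%m-%d %H:%M"))] += 1
    return counts


if __name__ == "__main__":
    # A few names copied from the listing above.
    sample = [
        "core.contrail-tor-ag.3867.lab3adp-00004nn.1504081664",
        "core.contrail-vroute.3874.lab3adp-00004nn.1504081678",
        "core.contrail-tor-ag.6602.lab3adp-00004nn.1504140985",
        "core.contrail-tor-ag.11297.lab3adp-00005nn.1504140990",
    ]
    for (host, proc, minute), n in sorted(crashes_per_minute(sample).items()):
        print(f"{minute} UTC  {host}  {proc}  {n} core(s)")
```

Running this over the full output of ls on each node's core directory should show whether the tor-agent and vrouter-agent crashes cluster in the same minutes on both nodes.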
The customer restarted supervisor-vrouter on the TSN yesterday. The issue (vrouter-agent timeouts and crashes) recovered temporarily. However, the regular vrouter-agent crashes have recurred.
They have now rolled back to the v2.21 environment.
-Regards,
Mehul Patel