Comment 0 for bug 1681909

== Comment: #0 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-07 05:00:29 ==
---Problem Description---

Ubuntu 17.04: the dump is not captured on the remote host when kdump over ssh is configured on firestone.

---Steps to Reproduce---

1. Configure kdump.
2. Check whether kdump is operational using "# kdump-config show".
3. Install the "kernel-debuginfo" and "kernel-debuginfo-common" rpms.
4. Set up a passwordless ssh connection and generate an RSA key.
# ssh-keygen -t rsa
5. Verify that id_rsa and id_rsa.pub are created under /root/.ssh/
6. Edit /etc/default/kdump-tools and add the entries below.
SSH="ubuntu@9.114.15.239"
SSH_KEY=/root/.ssh/id_rsa
7. Propagate RSA key.
# kdump-config propagate
8. Restart kdump service.
# kdump-config load
9. Trigger a crash using the commands below.
# echo "1" > /proc/sys/kernel/sysrq
# echo "c" > /proc/sysrq-trigger
10. Verify that the dump is available on the remote server in the configured path.
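
Before triggering the crash in step 9, it may help to confirm that key-based login really works without a prompt, since the kdump kernel cannot answer one. The following is only a sketch: check_ssh_key_login and the SSH_CMD override are hypothetical names introduced here for illustration and are not part of kdump-tools. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): returns 0 only if a
# non-interactive, key-based ssh login to the target succeeds.
# SSH_CMD is an assumed override so the client binary can be swapped out.
check_ssh_key_login() {
    key=$1
    target=$2
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    # ConnectTimeout=10 bounds the wait when the host is unreachable.
    "${SSH_CMD:-ssh}" -i "$key" -o BatchMode=yes -o ConnectTimeout=10 \
        "$target" true
}

# Example with the values from step 6 (run as root on the crashing host):
# check_ssh_key_login /root/.ssh/id_rsa ubuntu@9.114.15.239 && echo "key login OK"
```

If the function returns non-zero, the dump kernel would most likely hit the same failure when it tries to create the remote directory.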

Machine details
===========

$ ipmitool -I lanplus -H 9.47.70.3 -U ADMIN -P admin sol activate

$ ssh ubuntu@9.47.70.29

PW: shriya101

Attaching logs

== Comment: #1 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-07 05:01:42 ==

== Comment: #5 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-07 23:19:46 ==
Hi,

Attaching the logs.

Network info:

root@ltc-firep3:~# hwinfo --network
36: None 00.0: 10700 Loopback
  [Created at net.126]
  Unique ID: ZsBS.GQNx7L4uPNA
  SysFS ID: /class/net/lo
  Hardware Class: network interface
  Model: "Loopback network interface"
  Device File: lo
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown

37: None 00.0: 10701 Ethernet
  [Created at net.126]
  Unique ID: 2lHw.ndpeucax6V1
  Parent ID: mIXc.aXC4wIvegH8
  SysFS ID: /class/net/enP33p3s0f2
  SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.2
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "tg3"
  Driver Modules: "tg3"
  Device File: enP33p3s0f2
  HW Address: 98:be:94:03:18:4a
  Permanent HW Address: 98:be:94:03:18:4a
  Link detected: no
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #15 (Ethernet controller)

38: None 00.0: 10701 Ethernet
  [Created at net.126]
  Unique ID: 7Onn.ndpeucax6V1
  Parent ID: sx0U.aXC4wIvegH8
  SysFS ID: /class/net/enP33p3s0f0
  SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.0
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "tg3"
  Driver Modules: "tg3"
  Device File: enP33p3s0f0
  HW Address: 98:be:94:03:18:48
  Permanent HW Address: 98:be:94:03:18:48
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #16 (Ethernet controller)

39: None 00.0: 10701 Ethernet
  [Created at net.126]
  Unique ID: VwX_.ndpeucax6V1
  Parent ID: DUng.aXC4wIvegH8
  SysFS ID: /class/net/enP33p3s0f3
  SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.3
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "tg3"
  Driver Modules: "tg3"
  Device File: enP33p3s0f3
  HW Address: 98:be:94:03:18:4b
  Permanent HW Address: 98:be:94:03:18:4b
  Link detected: no
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #25 (Ethernet controller)

40: None 00.0: 10701 Ethernet
  [Created at net.126]
  Unique ID: bZ1s.ndpeucax6V1
  Parent ID: J7HY.aXC4wIvegH8
  SysFS ID: /class/net/enP33p3s0f1
  SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.1
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "tg3"
  Driver Modules: "tg3"
  Device File: enP33p3s0f1
  HW Address: 98:be:94:03:18:49
  Permanent HW Address: 98:be:94:03:18:49
  Link detected: no
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #4 (Ethernet controller)
root@ltc-firep3:~#

Thanks,
Pavithra

== Comment: #6 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-07 23:20:47 ==

== Comment: #7 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-07 23:21:27 ==

== Comment: #8 - Urvashi Jawere <email address hidden> - 2017-03-08 02:48:15 ==
I can see some errors in syslog:

auxiliary
Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 114.15.239:/home/ubuntu/test IN SOA: failed-auxiliary
Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN DS: failed-auxiliary
Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN SOA: failed-auxiliary
Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN A: failed-auxiliary
Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: Server 9.12.16.2 does not support DNSSEC, downgrading to non-DNSSEC mode.
Mar 7 04:57:44 ltc-firep3 kdump-config: /root/.ssh/id_rsa failed to be sent to ubuntu@9.114.15.239:/home/ubuntu/test
Mar 7 04:58:04 ltc-firep3 systemd[1]: Reloading.
Mar 7 04:59:15 ltc-firep3 systemd[1]: Reloading.
Mar 7 04:59:16 ltc-firep3 kdump-config: propagated ssh key /root/.ssh/id_rsa to server ubuntu@9.114.15.239
.
.
.

Mar 7 05:06:55 ltc-firep3 systemd[1]: Started Accounts Service.
Mar 7 05:06:56 ltc-firep3 kdump-tools[3498]: Starting kdump-tools: Modified cmdline:root=UUID=1e76cfd5-988c-46f4-bdc4-39fe1ed01152 ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0 elfcorehdr=155136K
Mar 7 05:06:57 ltc-firep3 kdump-tools[3498]: * loaded kdump kernel
Mar 7 05:06:57 ltc-firep3 kdump-tools: /sbin/kexec -p --command-line="root=UUID=1e76cfd5-988c-46f4-bdc4-39fe1ed01152 ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
Mar 7 05:06:57 ltc-firep3 kdump-tools: loaded kdump kernel
Mar 7 05:06:57 ltc-firep3 systemd[1]: Started Kernel crash dump capture service.
Mar 7 05:06:57 ltc-firep3 apport[3584]: ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/linux-image-4.10.0-9-generic-201703060521.crash'
Mar 7 05:06:57 ltc-firep3 apport[3584]: ...done.

== Comment: #18 - Hari Krishna Bathini <email address hidden> - 2017-03-28 06:55:20 ==
Looks like the tg3 module was not needed after all. The interesting thing is
that even after enP34p1s0f0 is up (ifup) and network-online.target is reached,
the network was not really active. It took about 30 seconds after reaching
network-online.target for the network to become active, even on a normal boot.
Adding this wait time to the kdump script, before saving the dump, ensured that
the vmcore is captured successfully. Attaching the log.

Not sure why enP34p1s0f0 takes that long to configure/initialize. Even so,
if this delay is inevitable it should be part of ifup/network-online.target,
so that the network is pingable once network-online.target is reached.
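
As an alternative to a fixed wait, the script could poll until the remote host actually answers and give up after a limit. This is a rough sketch under that assumption, not what kdump-config does today; wait_until is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command once per second until it
# succeeds or until the timeout (in seconds) has elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example: wait up to 60 seconds for the dump server to answer one ping.
# wait_until 60 ping -c 1 -W 1 9.114.15.239
```

On a host where the link comes up quickly this returns right away, while the slow case seen here would still get the time it needs.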

Thanks
Hari

== Comment: #19 - Hari Krishna Bathini <email address hidden> - 2017-03-28 07:01:52 ==
The workaround snippet that adds a delay to the kdump script:

--- kdump-config.orig 2017-03-28 03:35:17.753542107 -0500
+++ kdump-config 2017-03-28 06:59:22.887576623 -0500
@@ -761,6 +761,7 @@
  KDUMP_DMESGFILE="$KDUMP_STAMPDIR/dmesg.$KDUMP_STAMP"
  ERROR=0

+ sleep 30
  ssh -i $KDUMP_SSH_KEY $KDUMP_REMOTE_HOST mkdir -p $KDUMP_STAMPDIR
  ERROR=$?
  # If remote connections fails, no need to continue

---

Thanks
Hari

== Comment: #20 - PAVITHRA R. PRAKASH <email address hidden> - 2017-03-30 01:33:56 ==
(In reply to comment #19)
> The workaround snippet adding delay in kdump script:
>
>
> --- kdump-config.orig 2017-03-28 03:35:17.753542107 -0500
> +++ kdump-config 2017-03-28 06:59:22.887576623 -0500
> @@ -761,6 +761,7 @@
> KDUMP_DMESGFILE="$KDUMP_STAMPDIR/dmesg.$KDUMP_STAMP"
> ERROR=0
>
> + sleep 30
> ssh -i $KDUMP_SSH_KEY $KDUMP_REMOTE_HOST mkdir -p $KDUMP_STAMPDIR
> ERROR=$?
> # If remote connections fails, no need to continue
>
> ---
>
> Thanks
> Hari

With the above workaround, the dump was captured successfully on the remote host.

Thanks,
Pavithra

== Comment: #22 - Hari Krishna Bathini <email address hidden> - 2017-04-10 22:14:27 ==
(In reply to comment #18)
> Created attachment 117088 [details]
> Console log of successful dump capture after adding a time delay of 'sleep
> 30'
>
> Looks like tg3 module was not needed after all. Interesting thing though is
> even after enP34p1s0f0 is up (ifup) and network.online target is reached,
> network was not really active. It took about 30 seconds, after reaching
> network.online target, for the network to be active, even on a normal boot.
> Adding this wait time in kdump script, before saving dump, ensured that
> vmcore is captured successful. Attaching the log for the same..
>
> Not sure why enP34p1s0f0 is taking that long to configure/initialize. Even
> so,
> this delay should be part of ifup/network-online.target if it is inevitable,
> so that network is pingable after network-online.target

Hi Canonical,

Since this falls outside the realm of kdump, should we add a NET_WAIT_TIME field
to the /etc/default/kdump-tools file that defaults to 0 but can be changed when the
user sees timing troubles?
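
A rough sketch of how the script might honor such a field, placed before the first ssh call. NET_WAIT_TIME is only the field proposed here and does not exist in kdump-tools; net_wait is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical: NET_WAIT_TIME would be read from /etc/default/kdump-tools.
# An unset or empty value is treated as 0, so the default behavior is unchanged.
net_wait() {
    wait_time=${NET_WAIT_TIME:-0}
    # Accept only non-negative integers; anything else is reported and rejected.
    case $wait_time in
        *[!0-9]*)
            echo "invalid NET_WAIT_TIME: $wait_time" >&2
            return 1
            ;;
    esac
    if [ "$wait_time" -gt 0 ]; then
        sleep "$wait_time"
    fi
    return 0
}

# Example: with NET_WAIT_TIME=30 this reproduces the 'sleep 30' workaround.
# NET_WAIT_TIME=30; net_wait
```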

Thanks
Hari