The system is moving to emergency mode while booting up.

Bug #1852057 reported by bugproxy
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Invalid
High
Canonical Server
boot (Ubuntu)
Invalid
Undecided
Ubuntu on IBM Power Systems Bug Triage
cloud-init (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

---Problem Description---
I installed Ubuntu 16.04 on a P8 HMC, installed cloud-init. Captured and deployed the image, There are two issues I see:

1. the deployed image fails to boot up and the system is moving to Emergency mode.
2. Ip get assigned and it is pingable but can't ssh to it.

sshd service is down. When I try to start the service, the system restarts.

Contact Information = <email address hidden>

---uname output---
Linux final-ubuntu 4.4.0-166-generic #195-Ubuntu SMP Tue Oct 1 09:35:07 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P8

---boot type---
CDROM / ISO image

---Install repository type---
CDROM

---Point of failure---
Failed during boot of installer
== Comment: #1 - Aman K. Sinha <email address hidden> - 2019-11-06 20:48:55 ==

== Comment: #5 - Aman K. Sinha <email address hidden> - 2019-11-08 02:47:06 ==
I checked for openssh-server on both the Base VM and deployed VM. They both have openssh-server

root@rough-try:~# dpkg -l | grep openssh
ii openssh-client 1:7.2p2-4ubuntu2.8 ppc64el secure shell (SSH) client, for secure access to remote machines
ii openssh-server 1:7.2p2-4ubuntu2.8 ppc64el secure shell (SSH) server, for secure access from remote machines
ii openssh-sftp-server 1:7.2p2-4ubuntu2.8 ppc64el secure shell (SSH) sftp server module, for SFTP access from remote machines

I am using latest version of Ubuntu 16.04

root@rough-try:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Not using Ubuntu 18 because our custom cloud-init module supports Ubuntu 16.04 on novalink

== Comment: #8 - Aman K. Sinha <email address hidden> - 2019-11-11 00:38:08 ==
I am able to login after maually making the ssh services up. But after every reboot or the session logout, the ssh service goes down and for login I have to go to the HMC and make it up manually

== Comment: #9 - SEETEENA THOUFEEK <email address hidden> - 2019-11-11 03:03:58 ==
Provided this steps following documentation and ask to start the system.. ssh not enabled after the reboot and system going to emergency mode.

sudo systemctl enable ssh

https://askubuntu.com/questions/892447/start-ssh-automatically-on-boot

We will need to involve distro team further.

-

Revision history for this message
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-182308 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → boot (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
Revision history for this message
Frank Heimes (fheimes) wrote :

Hello and thanks for opening this bug and sharing the logs via the sosreport.
The cloud-init logs as well as the syslog show several error, like:
"
2019-11-04 09:27:36,346 - url_helper.py[DEBUG]: [0/1] open 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' with {'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/19.2-36-g059d049c-0ubuntu2~16.04.1'}, 'method': 'GET', 'url': 'http://169.254.169.254/2009-04-04/meta-data/instance-id', 'timeout': 50.0} configuration
2019-11-04 09:28:26,389 - url_helper.py[WARNING]: Calling 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' failed [50/120s]: request error [HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x3fffb30ebe10>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=50.0)'))]
"
and
"
2019-11-04 09:27:36,346 - url_helper.py[DEBUG]: [0/1] open 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' with {'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/19.2-36-g059d049c-0ubuntu2~16.04.1'}, 'method': 'GET', 'url': 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id', 'timeout': 50.0} configuration
2019-11-04 09:28:26,389 - url_helper.py[WARNING]: Calling 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' failed [50/120s]: request error [HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x3fffb30ebe10>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=50.0)'))]"

This let's me assume that cloud-init does not seem to be able to find the meta data service, or there is a network/connection problem (due to the timeout above).

Please make sure and double check if the connectivity is given.

Btw. is there a particular reason why you are using cloud-init in this case? Since a regular ISO install of 16.04 on P8 via virtual cdrom shall not 'require' the use of cloud.init.

Changed in ubuntu-power-systems:
status: New → Triaged
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)
Revision history for this message
Ryan Harper (raharper) wrote :

cloud-init.log also shows that the instance is not responding to DHCP requests:

2019-11-04 09:22:24,758 - util.py[DEBUG]: Running command ['ip', 'link', 'set', 'dev', 'ibmveth20', 'up'] with allowed return codes [0] (shell=False, capture=True)
2019-11-04 09:22:24,760 - util.py[DEBUG]: Running command ['/var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhclient.pid', 'ibmveth20', '-sf', '/bin/true'] with allowed return codes [0] (shell=False, capture=True)
2019-11-04 09:27:25,611 - util.py[DEBUG]: All files appeared after 0 seconds: ['/var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhclient.pid', '/var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhcp.leases']
2019-11-04 09:27:25,611 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhclient.pid (quiet=False)
2019-11-04 09:27:25,611 - util.py[DEBUG]: Read 6 bytes from /var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhclient.pid
2019-11-04 09:27:25,611 - util.py[DEBUG]: Reading from /proc/10050/stat (quiet=True)
2019-11-04 09:27:25,611 - util.py[DEBUG]: Read 309 bytes from /proc/10050/stat
2019-11-04 09:27:25,611 - dhcp.py[DEBUG]: killing dhclient with pid=10050
2019-11-04 09:27:25,612 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhcp.leases (quiet=False)
2019-11-04 09:27:25,612 - util.py[DEBUG]: Read 0 bytes from /var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhcp.leases
2019-11-04 09:27:25,612 - util.py[DEBUG]:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 83, in obtain_lease
    leases = maybe_perform_dhcp_discovery(self.iface)
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 139, in maybe_perform_dhcp_discovery
    return dhcp_discovery(dhclient_path, nic, tdir)
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 231, in dhcp_discovery
    return parse_dhcp_lease_file(lease_file)
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 156, in parse_dhcp_lease_file
    'Cannot parse empty dhcp lease file {0}'.format(lease_file))
cloudinit.net.dhcp.InvalidDHCPLeaseFileError: Cannot parse empty dhcp lease file /var/tmp/cloud-init/cloud-init-dhcp-shaa3l4j/dhcp.leases

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOpenStack.py", line 130, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 85, in obtain_lease
    raise NoDHCPLeaseError()
cloudinit.net.dhcp.NoDHCPLeaseError

Revision history for this message
Ryan Harper (raharper) wrote :

This does not appear to be an issue with cloud-init. I'm marking invalid. If we find more information indicating cloud-init is not doing something correctly, please re-open the cloud-init task (Set status to New).

Changed in cloud-init (Ubuntu):
status: New → Invalid
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-11-11 10:49 EDT-------
"2019-11-04 09:27:36,346 - url_helper.py[DEBUG]: [0/1] open 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' with {'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/19.2-36-g059d049c-0ubuntu2~16.04.1'}, 'method': 'GET', 'url': 'http://169.254.169.254/2009-04-04/meta-data/instance-id', 'timeout': 50.0} configuration
2019-11-04 09:28:26,389 - url_helper.py[WARNING]: Calling 'http://xxx.xxx.xxx.xxx/2009-04-04/meta-data/instance-id' failed [50/120s]: request error [HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x3fffb30ebe10>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=50.0)'))]"

This log are quite old, after that I configured cloud-init .

Since I am testing latest cloud-init with some custom module. I installed cloud-init after installing Ubuntu 16.04.

Revision history for this message
Frank Heimes (fheimes) wrote :

Can you give us a day/time index that indicate when the logs get relevant (so after everything was properly configured by you)? That helps excluding everything else ...

Revision history for this message
Patricia Domingues (patriciasd) wrote :

I'm marking as incomplete while we are waiting a response

Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-11-18 04:27 EDT-------
I don't remember the exact time but it is somewhere between 5th or 6th Nov,2019. You can exclude the earlier logs.

Revision history for this message
Frank Heimes (fheimes) wrote :
Download full text (4.4 KiB)

So now I ignored all log entries that are Nov 4th or before - well, there are actually only a few logs that have log entries with timestamp Nov 5th or later.

The interesting ones (with Nov 5 ff entries) are syslog and cloud-init - auth has Nov 5 entries, too but is not really relevant. cloud-init-output also has Nov 5 entries but is fine.

First of all I couldn't find a cloud-init issue with timestamp Nov 5th or later.
The following block tells me that cloud-init was able to find a config drive and was able to read it (even if it didn't had a user_data entry - which is fine):

2019-11-06 06:04:27,346 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/network_data.json (quiet=False)
2019-11-06 06:04:27,347 - util.py[DEBUG]: Read 463 bytes from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/network_data.json
2019-11-06 06:04:27,347 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/vendor_data.json (quiet=False)
2019-11-06 06:04:27,347 - util.py[DEBUG]: Read 2 bytes from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/vendor_data.json
2019-11-06 06:04:27,347 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/meta_data.json (quiet=False)
2019-11-06 06:04:27,348 - util.py[DEBUG]: Read 1054 bytes from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/meta_data.json
2019-11-06 06:04:27,348 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/user_data (quiet=False)
2019-11-06 06:04:27,348 - openstack.py[DEBUG]: Failed reading optional path /run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/user_data due to: [Errno 2] No such file or directory: '/run/cloud-init/tmp/tmpswvqg6v_/openstack/2018-08-27/user_data'
2019-11-06 06:04:27,348 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/openstack/content/0000 (quiet=False)
2019-11-06 06:04:27,349 - util.py[DEBUG]: Read 606 bytes from /run/cloud-init/tmp/tmpswvqg6v_/openstack/content/0000
2019-11-06 06:04:27,349 - util.py[DEBUG]: Reading from /run/cloud-init/tmp/tmpswvqg6v_/ec2/latest/meta-data.json (quiet=False)
2019-11-06 06:04:27,350 - util.py[DEBUG]: Read 581 bytes from /run/cloud-init/tmp/tmpswvqg6v_/ec2/latest/meta-data.json
2019-11-06 06:04:27,350 - util.py[DEBUG]: Running command ['umount', '/run/cloud-init/tmp/tmpswvqg6v_'] with allowed return codes [0] (shell=False, capture=True)
2019-11-06 06:04:27,364 - util.py[DEBUG]: Reading from /var/lib/cloud/data/instance-id (quiet=False)
2019-11-06 06:04:27,371 - util.py[DEBUG]: Read 20 bytes from /var/lib/cloud/data/instance-id
2019-11-06 06:04:27,373 - atomic_helper.py[DEBUG]: Atomically writing to file /run/cloud-init/instance-data.json (via temporary file /run/cloud-init/tmpuulu3ae2) - w: [644] 3272 bytes/chars
2019-11-06 06:04:27,373 - atomic_helper.py[DEBUG]: Atomically writing to file /run/cloud-init/instance-data-sensitive.json (via temporary file /run/cloud-init/tmpbft3zc1b) - w: [600] 3272 bytes/chars
2019-11-06 06:04:27,373 - handlers.py[DEBUG]: finish: init-local/search-ConfigDrive: SUCCESS: found local data from DataSourceConfigDrive

Then I had a look at syslog and came across some o...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-11-25 09:25 EDT-------
I can get you the very latest version of RSCT for a test version. But I don't think we need to latest to test this - could you also try provisioning the system without RSCT?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-11-27 07:46 EDT-------
RSCT packages are required for one of my custom module that is being run with cloud-init while booting up. I was using RSCT package 3.2.4. I can try provisioning a new system without it and try cloud-init part is working fine or not.

Revision history for this message
Frank Heimes (fheimes) wrote :

Due to the outcome of the todays call I reduce the priority of this ticket.

Changed in ubuntu-power-systems:
importance: Critical → Medium
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-12-02 09:17 EDT-------
Reducing the severity while working the recreate.

tags: added: severity-high
removed: severity-critical
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Medium → High
Changed in ubuntu-power-systems:
importance: High → Medium
importance: Medium → High
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-02-10 10:11 EDT-------
Hi Aman - a straight xenial install has worked for me. I think we were trying to find out what is different with your setup. I think you were trying a different ISO? Did it work ok? Can we close this one now?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-02-10 10:52 EDT-------
Hi Michael, I am able to boot it as of now. I think we can close this bug as of now.

Revision history for this message
Frank Heimes (fheimes) wrote :

Okay - I close it on Launchpad now.
If this problem - for whatever reason - comes up again, we can simply open a new LP ticket.

Changed in boot (Ubuntu):
status: New → Invalid
Changed in ubuntu-power-systems:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.