Sometimes instances are not getting an IP

Bug #1852206 reported by David
This bug affects 5 people
Affects: MicroStack
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

+ System information:

Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

+ microstack snap version: --beta (171)

+ How to reproduce it:
As the bug title says, this only happens occasionally, but I captured the logs when it did. This is the deployed server that is failing:

ubuntu@test-2:~$ microstack.openstack server list
+--------------------------------------+------------------------------------+--------+-----------------------------------+------------+------------+
| ID                                   | Name                               | Status | Networks                          | Image      | Flavor     |
+--------------------------------------+------------------------------------+--------+-----------------------------------+------------+------------+
| 94d9b703-7556-426c-babb-fedd446da073 | hackfest_simplecharm_ns-1-mgmtVM-1 | ACTIVE | test=192.168.222.48, 10.20.20.136 | ubuntu1604 | mgmtVM-flv |
+--------------------------------------+------------------------------------+--------+-----------------------------------+------------+------------+
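
Note that Neutron believes the instance is ACTIVE with both addresses allocated, so the gap is between the allocation and the guest actually receiving a lease. A way to cross-check this from the host (standard OpenStack CLI calls through the snap alias, using the IDs from the listing above):

# Show the Neutron ports bound to the affected server
microstack.openstack port list --server 94d9b703-7556-426c-babb-fedd446da073

# Check whether the guest ever obtained a lease
microstack.openstack console log show hackfest_simplecharm_ns-1-mgmtVM-1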

+ Logs:
  - systemctl status snap.microstack.*: https://pastebin.canonical.com/p/FZ7kh3hPgv/
  - journal.log: https://pastebin.canonical.com/p/cCMWg7FDH5/
  - microstack.openstack console log show: https://pastebin.canonical.com/p/rSvHbDGXz2/

Pen Gale (pengale) wrote:

I have seen this once on AWS as well, and more frequently in the gate.

I'm not sure what's breaking; the issue has proven difficult to reproduce in kvm instances running on our development hardware.

Thank you for the bug report!

Changed in microstack:
importance: Undecided → High
status: New → Confirmed
David Coronel (davecore) wrote:

I think I'm hitting this bug. Here's a pastebin of my cirros instance log: https://paste.ubuntu.com/p/RVTTHk7tSK/

Interesting parts:

Starting network...
udhcpc (v1.23.2) started
Sending discover...
Sending discover...
Sending discover...
[ 175.592033] random: nonblocking pool is initialized
Usage: /sbin/cirros-dhcpc <up|down>
No lease, failing
WARN: /etc/rc3.d/S40-network failed
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.10. request failed
failed 2/20: up 183.11. request failed
failed 3/20: up 185.12. request failed
failed 4/20: up 187.14. request failed
failed 5/20: up 189.15. request failed
failed 6/20: up 191.17. request failed
failed 7/20: up 193.19. request failed
failed 8/20: up 195.20. request failed
failed 9/20: up 197.22. request failed
failed 10/20: up 199.23. request failed
failed 11/20: up 201.24. request failed
failed 12/20: up 203.25. request failed
failed 13/20: up 205.27. request failed
failed 14/20: up 207.29. request failed
failed 15/20: up 209.30. request failed
failed 16/20: up 211.32. request failed
failed 17/20: up 213.34. request failed
failed 18/20: up 215.35. request failed
failed 19/20: up 217.36. request failed
failed 20/20: up 219.36. request failed
failed to read iid from metadata. tried 20
failed to get instance-id of datasource

=== network info ===
if-info: lo,up,127.0.0.1,8,,
if-info: eth0,up,,8,fe80::f816:3eff:fe4e:a6f6/64,
ip-route6:fe80::/64 dev eth0 metric 256
ip-route6:unreachable default dev lo metric -1 error -101
ip-route6:ff00::/8 dev eth0 metric 256
ip-route6:unreachable default dev lo metric -1 error -101

############ debug start ##############
### /etc/init.d/sshd start
Top of dropbear init script
Starting dropbear sshd: failed to get instance-id of datasource
FAIL
### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:4E:A6:F6
          inet6 addr: fe80::f816:3eff:fe4e:a6f6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:68 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4886 (4.7 KiB) TX bytes:1332 (1.3 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1020 (1020.0 B) TX bytes:1020 (1020.0 B)
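
The guest sends DHCP discovers but never receives an offer, so the host-side DHCP path looks suspect. Some checks that should narrow it down (standard Neutron and iproute2 commands; the qdhcp namespace name is a placeholder for the real network ID):

# Confirm the Neutron DHCP agent is alive
microstack.openstack network agent list

# Find the DHCP namespace and confirm dnsmasq is running inside it
ip netns list
sudo ip netns exec qdhcp-<network-id> ip addr
ps aux | grep dnsmasq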

David Coronel (davecore) wrote:

I found that snap.microstack.neutron-metadata-agent.service was active (running) but showed a Python traceback in its status: https://paste.ubuntu.com/p/jZpkYR2cFr/

I restarted the service and it looked happier: https://paste.ubuntu.com/p/mM9cs9Zgsv/
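
For reference, the check and restart were along these lines (using the service name shipped by the snap):

# Inspect the metadata agent status for tracebacks
systemctl status snap.microstack.neutron-metadata-agent.service

# Restart only the metadata agent
sudo systemctl restart snap.microstack.neutron-metadata-agent.service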

But when I launch an instance, it still can't get a lease: https://paste.ubuntu.com/p/HGDcyzBkCh/

Interesting parts:

Starting network...
udhcpc (v1.23.2) started
Sending discover...
Sending discover...
Sending discover...
Usage: /sbin/cirros-dhcpc <up|down>
No lease, failing
WARN: /etc/rc3.d/S40-network failed
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.16. request failed
failed 2/20: up 183.17. request failed
failed 3/20: up 185.19. request failed
failed 4/20: up 187.20. request failed
failed 5/20: up 189.22. request failed
failed 6/20: up 191.24. request failed
failed 7/20: up 193.25. request failed
failed 8/20: up 195.27. request failed
failed 9/20: up 197.29. request failed
failed 10/20: up 199.30. request failed
failed 11/20: up 201.32. request failed
failed 12/20: up 203.33. request failed
failed 13/20: up 205.35. request failed
failed 14/20: up 207.36. request failed
failed 15/20: up 209.38. request failed
failed 16/20: up 211.40. request failed
failed 17/20: up 213.41. request failed
failed 18/20: up 215.43. request failed
failed 19/20: up 217.45. request failed
failed 20/20: up 219.46. request failed
failed to read iid from metadata. tried 20
failed to get instance-id of datasource

=== network info ===
if-info: lo,up,127.0.0.1,8,,
if-info: eth0,up,,8,fe80::f816:3eff:fe04:c70f/64,
ip-route6:fe80::/64 dev eth0 metric 256
ip-route6:unreachable default dev lo metric -1 error -101
ip-route6:ff00::/8 dev eth0 metric 256
ip-route6:unreachable default dev lo metric -1 error -101

### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:04:C7:0F
          inet6 addr: fe80::f816:3eff:fe04:c70f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:35 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3500 (3.4 KiB) TX bytes:1332 (1.3 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1020 (1020.0 B) TX bytes:1020 (1020.0 B)

David Coronel (davecore) wrote:

A restart of microstack fixed the issue for me (sudo snap restart microstack).
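
In full, roughly (the status check is just to confirm all services came back up, matching the systemctl pattern referenced in the bug description):

# Restart every service in the microstack snap
sudo snap restart microstack

# Verify the services are active again
systemctl status snap.microstack.* --no-pager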

Tytus Kurek (tkurek) wrote:

I am experiencing a similar issue. Neither re-launching the instance, nor restarting MicroStack, nor even completely re-installing MicroStack helps. On AWS it does NOT work at all.

Pen Gale (pengale) wrote:

I'm now running into this issue on the MAAS node where I do most of my development work. The logs from the cirros image are the same: the network attempts to start and fails. I'm still not sure what the root cause is, but the issue seems to have gotten worse (possibly due to an Ubuntu package update), and I'm doing another round of in-depth troubleshooting.
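
To see where the discovers are being dropped, my next step is to capture DHCP traffic on both ends of the path; something like the following, where the tap device and namespace names are placeholders for the instance's real port and network IDs:

# On the hypervisor: watch DHCP traffic leaving the instance's tap device
sudo tcpdump -i tap<port-id-prefix> -n port 67 or port 68

# Inside the DHCP namespace: see whether the requests ever arrive
sudo ip netns exec qdhcp-<network-id> tcpdump -i any -n port 67 or port 68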
