[AWS] t3 instance types fail deployment when storage is attached

Bug #1798001 reported by james beedy
Affects         Status        Importance  Assigned to  Milestone
Canonical Juju  Fix Released  Critical    Ian Booth
2.3             Fix Released  Critical    Ian Booth
2.4             Fix Released  Critical    Ian Booth

Bug Description

t3 instance types fail to deploy when storage is attached; see [0].

[0] https://paste.ubuntu.com/p/tgDQsmtmHb/

Revision history for this message
Ian Booth (wallyworld) wrote :

Can we get any errors reported via the AWS console? What does juju debug-log indicate? Maybe the instance type has a limit on how many volumes can be attached?

Revision history for this message
james beedy (jamesbeedy) wrote :

@wallyworld `juju debug-log` does not contain any log messages; it seems the agent never starts.

AWS states that the volume attachment limit for this instance type is 28; see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

I'm looking for errors in AWS, still digging...
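
When debug-log is empty, the agent logs on the machine itself are the next place to look (a sketch; assumes the machine is still reachable over SSH and uses the standard Juju log paths on Ubuntu, with machine 1 as an illustrative target):

    # Check whether the machine agent started at all:
    juju ssh 1 sudo tail /var/log/juju/machine-1.log
    # Check whether provisioning itself completed:
    juju ssh 1 sudo tail /var/log/cloud-init-output.log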

Revision history for this message
james beedy (jamesbeedy) wrote :

To replicate, using juju 2.4.4:

`juju deploy postgresql --storage pgdata=ebs,10G`
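
After that command, these should show where things stall (a sketch; exact status strings vary slightly across Juju versions):

    # The unit hangs before install because the agent is waiting on storage:
    juju status postgresql
    # The volume is created in EC2, but Juju never sees it become attached:
    juju storage --format yaml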

Revision history for this message
Ian Booth (wallyworld) wrote :

Digging into the storage processing code, it looks like the problem is that the EBS volumes are being exposed as NVMe block devices. This is relatively new behaviour that was previously confined to c5 and m5 instance types but now appears to be more widely implemented. The issue is that the block device names become unpredictable, which breaks how Juju determines whether a volume has become attached to a machine. This in turn blocks the initialisation of the unit agent, since it waits for storage to become attached before installing the charm.
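
For illustration, this is roughly what the renaming looks like on a Nitro-based instance (illustrative volume IDs; the nvme numbering depends on kernel enumeration order, not on the requested device name):

    # The volume was attached via the EC2 API as "/dev/xvdf", but the kernel
    # exposes it as an NVMe device; the EBS volume ID survives in the serial:
    lsblk -o NAME,SERIAL
    # NAME     SERIAL
    # nvme0n1  vol0123456789abcdef0    <- root volume
    # nvme1n1  vol0fedcba9876543210    <- the volume requested as xvdf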

I need to look further to find a tasteful solution that works in all cases.

Changed in juju:
milestone: none → 2.4.5
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
Ian Booth (wallyworld) wrote :

The only real way to fix this, regardless of how EC2 behaviour might change underneath us, is to punt and assume an NVMe block device link is valid for a newly attached machine volume, even though it may not be. This is because we have no real way of querying how an attached volume will be exposed on a machine instance. This has no practical effect other than potentially printing an incorrect device link when printing storage volume information in YAML. The device name (eg xvdf) is accurate. Updating the Juju model after the volume is recorded as attached would be messy, because we don't want to encode AWS-specific behaviour at that layer.
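
Concretely, the asymmetry is visible from the CLI (a sketch with an illustrative volume ID; assumes the aws CLI is configured):

    # EC2 records the attachment under the requested device name...
    aws ec2 describe-volumes --volume-ids vol-0fedcba9876543210 \
        --query 'Volumes[0].Attachments[0].Device'
    # "/dev/xvdf"
    # ...but no API call reports which /dev/nvme*n1 node that became on
    # the instance, so the device link has to be taken on trust.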

I deployed a 2.4.4 controller, deployed postgresql, and observed the storage issue. I then upgraded the controller with the above fix and observed the storage come good; postgresql became active.

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Ian Booth (wallyworld)
Revision history for this message
Ian Booth (wallyworld) wrote :
Changed in juju:
milestone: 2.4.5 → 2.5-beta1
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released