Storage provisioner timeouts spawning extra volumes on AWS

Bug #1479546 reported by Adam Israel
Affects: juju-core
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins
Milestone: 1.25.0

Bug Description

I'm testing integration of the new storage support in 1.24, and I've run into an issue deploying to Amazon where it appears the storage provisioner is timing out and then re-requesting a new volume be created.

I attempted to create a three-node cluster; by the time I killed the environment, it had created 12 SSD volumes according to Amazon's console, but none of the units had gotten past the allocating state.

Environment:
Vagrant VM w/trusty
Juju 1.24.3-trusty-amd64

Steps to recreate:
juju switch amazon
juju bootstrap
juju deploy -n3 --repository=/charms local:trusty/cassandra --storage data=ebs-ssd,10G --constraints "instance-type=i2.8xlarge"

debug-log: http://pastebin.ubuntu.com/11962738/

The charm used (though I suspect it's more the instance type+storage request at fault): https://code.launchpad.net/~aisrael/charms/trusty/cassandra/storage

A screenshot of the Volumes console, several minutes after the juju environment was destroyed:
http://i.imgur.com/xGf6Pqz.png

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Storage is a bit "experimental" in 1.24; 1.25 is much more solid. Can you try with master? I'll see if I can repro with 1.24 later today.

Revision history for this message
Adam Israel (aisrael) wrote :

Yep, I'll build master and give it another try and report back.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

BTW: just took a look at the charm changes, and wanted to point out that mounting in "storage-attached" is not enough. If the agent is restarted, "storage-attached" is not re-run. I would suggest using "type: filesystem" for the storage, unless you need to be able to control the filesystem type and other bits.
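
For reference, a minimal sketch of what the storage stanza in metadata.yaml could look like with that change (the "data" store name matches your deploy command; the rest is illustrative):

    storage:
      data:
        type: filesystem

With that, Juju formats and mounts the store for you, rather than handing the charm a raw block device as "type: block" does.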

Andrew Wilkins (axwalk)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Andrew Wilkins (axwalk)
milestone: none → 1.25.1
Revision history for this message
Adam Israel (aisrael) wrote :

Thanks for the tip! I figured the storage-attached hook would need to do something for the mount to be persistent -- adding the mount to fstab, perhaps.

I went with block initially because I do want to control the filesystem type (ext3/4 vs. xfs, perhaps), as well as various mount options, like noatime. It all adds a little complexity, but goes a long way toward benchmarking and performance-tuning. Is that something "type: filesystem" can do?

Still working on getting master built; I'll have more on that when I start my day.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

So there is a serious issue here: if the provisioning of one volume fails, other volumes cannot be provisioned. I will look into making this more robust.

The other issue is fixed in 1.25: destroying a volume will first ensure the volume is detached.

Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :

> I went with block initially because I do want to control the filesystem type (ext3/4 vs. xfs, perhaps), as well as various mount options, like noatime. It all adds a little complexity, but goes a long way toward benchmarking and performance-tuning. Is that something "type: filesystem" can do?

Not at the moment. All you get with "type: filesystem" is ext4. Juju will make sure it's formatted and mounted before installing, and will ensure the mount is maintained whenever the agent restarts. Control over filesystems and mount options was originally in the plan, but was removed due to complexity around required vs. preferred options. If you find yourself repeating this work a lot, it may be worth us reconsidering that position.

Adding an fstab entry would be fine. Just be careful about persistent naming; block device names can change across machine reboots. I suggest using the filesystem UUID.
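
For a block store, the storage-attached hook might do something along these lines (a rough sketch; the mount point and ext4 are illustrative assumptions, and storage-get is the hook tool that reports where the device was attached):

    device=$(storage-get location)              # e.g. /dev/xvdf
    mkfs.ext4 -q "$device"
    uuid=$(blkid -s UUID -o value "$device")    # UUID is stable across reboots
    mkdir -p /srv/cassandra
    echo "UUID=$uuid /srv/cassandra ext4 defaults,noatime 0 2" >> /etc/fstab
    mount /srv/cassandra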

Revision history for this message
John George (jog) wrote :
Revision history for this message
Andrew Wilkins (axwalk) wrote :

John, that's a separate issue; that error does not involve the storage provisioner. We probably need to wait a bit longer for the volume to be associated.

Revision history for this message
Adam Israel (aisrael) wrote :

Hi Andrew,

Confirming that 1.25 trunk fixes the spawning of extra volumes, but it still times out while attempting to create one.

Ian Booth (wallyworld)
Changed in juju-core:
milestone: 1.25.1 → 1.25.0
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Adam, if you could test again with master that would be great. It should be fixed now.

Changed in juju-core:
status: In Progress → Fix Committed
Revision history for this message
Adam Israel (aisrael) wrote :

Hey Andrew,

I pulled master this morning and re-ran. I'm still seeing timeouts occur, unfortunately. `juju debug-log` started spewing:

machine-0: 2015-08-10 15:26:18 ERROR juju.worker.storageprovisioner storageprovisioner.go:173 processing pending volumes: creating volumes: creating volumes from source "ebs": attaching vol-746fec95 to i-b7e05865: timed out waiting for volume vol-746fec95 to become available

The AWS console shows that I have three standard 8GB volumes. I checked again a couple of minutes later and a new volume had appeared: my 10GB SSD. I let that spin for several minutes, waiting for Amazon to bring the volume online, and when I checked again it was being deleted and re-initialized.

juju switch amazon
juju bootstrap
juju deploy -n3 --repository=/charms local:trusty/cassandra --storage data=ebs-ssd,10G --constraints "instance-type=i2.8xlarge"

I ran this once, deploying three units, and a second time with just one. I see the SSD being initialized and then deleted. Here's the relevant debug-log from a one-unit deployment:

machine-0: 2015-08-10 16:02:50 ERROR juju.state.unit unit.go:738 unit cassandra/0 cannot get assigned machine: unit "cassandra/0" is not assigned to a machine
machine-0: 2015-08-10 16:03:08 ERROR juju.worker.storageprovisioner volumes.go:407 failed to create volume 0: cannot attach to non-running instance i-7a9a22a8
machine-0: 2015-08-10 16:03:44 ERROR juju.worker.storageprovisioner volumes.go:407 failed to create volume 0: attaching vol-800a8961 to i-7a9a22a8: timed out waiting for volume vol-800a8961 to become available
machine-0: 2015-08-10 16:04:50 ERROR juju.worker.storageprovisioner volumes.go:407 failed to create volume 0: attaching vol-5c098abd to i-7a9a22a8: timed out waiting for volume vol-5c098abd to become available
machine-0: 2015-08-10 16:06:56 ERROR juju.worker.storageprovisioner volumes.go:407 failed to create volume 0: attaching vol-6e088b8f to i-7a9a22a8: timed out waiting for volume vol-6e088b8f to become available
machine-0: 2015-08-10 16:11:01 ERROR juju.worker.storageprovisioner volumes.go:407 failed to create volume 0: attaching vol-1f0c8ffe to i-7a9a22a8: timed out waiting for volume vol-1f0c8ffe to become available

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Indeed, confirmed there's still something not quite right. Investigating.

Changed in juju-core:
status: Fix Committed → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :
Revision history for this message
Adam Israel (aisrael) wrote :

Hey Andrew,

Success! The volume mounted on the first try, and I've been able to use it as a block device, formatting and mounting it.

I do get this message in debug-log, immediately after the volume switched to "in-use", after each time the config-changed hook finished, and periodically while idle:

machine-1[3941]: 2015-08-11 06:09:23 ERROR juju.worker.storageprovisioner volumes.go:481 attaching volume: querying instance details: Credential must have exactly 5 slash-delimited elements, e.g. keyid/date/region/service/term, got 'not' (AuthFailure)

Everything seems to be working, other than that message. Thanks for all your work tracking this down!

Revision history for this message
Adam Israel (aisrael) wrote :

Side note:

$ juju destroy-environment -y amazon
ERROR failed to destroy environment "amazon"

If the environment is unusable, then you may run

    juju destroy-environment --force

to forcefully destroy the environment. Upon doing so, review
your environment provider console for any resources that need
to be cleaned up. Using force will also by-pass destroy-envrionment block.

ERROR environment destruction failed: destroying environment: failed to destroy environment: Environment cannot be destroyed until all persistent volumes have been destroyed.
Run "juju storage list" to display persistent storage volumes.

--

There's a typo: destroy-envrionment should be destroy-environment

Also, it's not clear from `juju storage help` or any of the storage subcommands *how* to destroy a volume. I found a workaround mentioned on the WIP storage doc page (run destroy-environment with --force). Is it safe to assume that functionality is targeted for 1.25.0, and that the -detach hook, when fired, should unmount the device?

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Great, thanks for confirming.

The error message you noted is due to lp:1483492, which I just noticed today as well. It's not actively harmful, it's just log spam.

The destroy-environment restriction should be removed soon. At the moment you must destroy the machines that the volumes are attached to, which causes the volumes to be destroyed as well. Since we destroy all volumes when the environment is destroyed anyway, I'll be removing that check from destroy-environment.
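
In the meantime, the sequence is roughly this (service name and machine IDs per your deployment; adjust as needed):

    juju destroy-service cassandra       # remove the units first
    juju destroy-machine 1 2 3           # destroying the machines destroys their volumes
    juju storage list                    # confirm no volumes remain
    juju destroy-environment -y amazon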

Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Revision history for this message
Andrew Wilkins (axwalk) wrote :

FYI, on master destroy-environment will no longer be prevented by the presence of volumes.

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released