Failed deployment: FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/block/bcache0/bcache0p1/slaves'

Bug #1811117 reported by Ashley Lai
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
curtin           Fix Released  Critical    Unassigned
curtin (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Bundle deployment failed, and juju status shows the message:
  Failed deployment: 'cloudinit' running modules for final

Jan 9 06:35:17 beartic cloud-init[2993]: Traceback (most recent call last):
Jan 9 06:35:17 beartic cloud-init[2993]: File "/curtin/curtin/commands/main.py", line 201, in main
Jan 9 06:35:17 beartic cloud-init[2993]: ret = args.func(args)
Jan 9 06:35:17 beartic cloud-init[2993]: File "/curtin/curtin/commands/block_meta.py", line 58, in block_meta
Jan 9 06:35:17 beartic cloud-init[2993]: meta_custom(args)
Jan 9 06:35:17 beartic cloud-init[2993]: File "/curtin/curtin/commands/block_meta.py", line 1471, in meta_custom
Jan 9 06:35:17 beartic cloud-init[2993]: clear_holders.clear_holders(disk_paths)
Jan 9 06:35:17 beartic cloud-init[2993]: File "/curtin/curtin/block/clear_holders.py", line 587, in clear_holders
Jan 9 06:35:17 beartic cloud-init[2993]: shutdown_function(dev_info['device'])
Jan 9 06:35:17 beartic cloud-init[2993]: File "/curtin/curtin/block/clear_holders.py", line 134, in shutdown_bcache
Jan 9 06:35:17 beartic cloud-init[2993]: os.listdir(os.path.join(device, 'slaves'))]
Jan 9 06:35:17 beartic cloud-init[2993]: FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/block/bcache0/bcache0p1/slaves'
Jan 9 06:35:17 beartic cloud-init[2993]: [Errno 2] No such file or directory: '/sys/class/block/bcache0/bcache0p1/slaves'
Jan 9 06:35:17 beartic cloud-init[2993]: builtin command failed
Jan 9 06:35:17 beartic cloud-init[2993]: finish: cmd-install/stage-partitioning/builtin: FAIL: running 'curtin block-meta custom'
Jan 9 06:35:17 beartic cloud-init[2993]: builtin took 4.751 seconds
Jan 9 06:35:18 beartic cloud-init[2993]: stage_partitioning took 4.754 seconds
Jan 9 06:35:18 beartic cloud-init[2993]: finish: cmd-install/stage-partitioning: FAIL: configuring storage
Jan 9 06:35:18 beartic cloud-init[2993]: curtin: Installation failed with exception: Unexpected error while running command.
Jan 9 06:35:18 beartic cloud-init[2993]: Command: ['curtin', 'block-meta', 'custom']
Jan 9 06:35:18 beartic cloud-init[2993]: Exit code: 3
Jan 9 06:35:18 beartic cloud-init[2993]: Reason: -

Ryan Harper (raharper) wrote :

FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/block/bcache0/bcache0p1/slaves'
Jan 9 06:35:17 beartic cloud-init[2993]: [Errno 2] No such file or directory: '/sys/class/block/bcache0/bcache0p1/slaves'

Can you provide the curtin configuration and the install log (verbose)?

https://discourse.maas.io/t/getting-curtin-debug-logs/169

I didn't think bcache devices could be partitioned directly, so this looks really strange.
Nonetheless, it appears to be doing a wipe, so curtin needs to handle this.
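In the traceback, `shutdown_bcache` builds a list from `os.listdir(os.path.join(device, 'slaves'))`, and that path does not exist when `device` is the sysfs directory of a partition such as `bcache0p1`. A minimal sketch of tolerant probing, assuming a hypothetical helper named `list_slaves()` (this is not curtin's actual code):

```python
import os


def list_slaves(sysfs_path):
    """Return the sorted device names under <sysfs_path>/slaves, or [].

    Hypothetical helper. A partition's sysfs directory, e.g.
    /sys/class/block/bcache0/bcache0p1, has no 'slaves' entry, so
    os.listdir() raises FileNotFoundError there. Treat that as
    "no slaves" instead of aborting the whole storage configuration.
    """
    try:
        return sorted(os.listdir(os.path.join(sysfs_path, 'slaves')))
    except FileNotFoundError:
        return []
```

With this, probing `/sys/class/block/bcache0` returns its backing and cache devices, while probing `/sys/class/block/bcache0/bcache0p1` returns an empty list rather than raising.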

Changed in curtin:
status: New → Incomplete
Jason Hobbs (jason-hobbs) wrote :

Another repro. This is the rsyslog output, which I think includes the curtin logs, with verbose on:
http://paste.ubuntu.com/p/NvzCGHJZ9x/

This is the curtin config:
http://paste.ubuntu.com/p/KY89jHJG8S/

summary: - Failed deployment: 'cloudinit' running modules for final
+ Failed deployment: FileNotFoundError: [Errno 2] No such file or
+ directory: '/sys/class/block/bcache0/bcache0p1/slaves'
Changed in curtin:
status: Incomplete → New
Ryan Harper (raharper) wrote :

Hrm, something must be creating a partition on bcache0 after the install, though. What software runs after an initial install?

Note: curtin will fix this; I'm just curious, as partitioning a bcache device is an odd thing to do.

Changed in curtin:
importance: Undecided → High
status: New → Confirmed
Jason Hobbs (jason-hobbs) wrote :

I think that bcache device is getting passed to Ceph as an OSD device, and Ceph partitions OSD devices.

Jason Hobbs (jason-hobbs) wrote :

Subscribed field-critical, as this is causing a very high number of Solutions QA test failures and we don't have a workaround for it.

Ryan Harper (raharper)
Changed in curtin:
status: Confirmed → In Progress
Ryan Harper (raharper)
Changed in curtin:
importance: High → Critical
Chad Smith (chad.smith)
Changed in curtin:
assignee: nobody → Ryan Harper (raharper)
Ryan Harper (raharper) wrote :

I've got the branch to address this up in a PPA:

ppa:raharper/bugfixes

curtin - 18.2-0ubuntu5~clear-holders-bcache-partitions-lp1811117-ppa5

Available for xenial and bionic.

Please test.

Ryan Harper (raharper) wrote :

Interesting: partitioning of bcache devices only works on newer kernels; the Xenial GA kernel (4.4.x) does not support it.

That will make this a bit more tricky.

Jason Hobbs (jason-hobbs) wrote :

I don't think that matters to us; we use the HWE 4.15.x kernel to get the bcache consistent-naming fixes.

Jason Hobbs (jason-hobbs) wrote :

We're testing with your ppa now and hitting an error when trying to deploy VMs:

Jan 17 12:08:49 juju-1 cloud-init[1318]: An error occured handling 'vda': RuntimeError - Cannot create disk tag udev rule for /dev/vda [id=vda], missing 'serial' or 'wwn' value

Full log: http://paste.ubuntu.com/p/QdrjHCw8v6/

This seems like an unrelated issue but it's blocking us getting to the point where we reproduce this.

Ryan Harper (raharper) wrote :

Doesn't MAAS create serials for KVM disks?

I'll apply a workaround and update the package in the PPA.

Ryan Harper (raharper) wrote :

I've updated the PPA with a workaround for that issue.

curtin - 18.2-0ubuntu9~clear-holders-bcache-partitions-lp1811117-ppa9
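The later upstream changelog entry "dname: relax dname req for disk serial/wwn presence for compatibility" suggests the shape of this workaround: a disk without a serial or WWN, such as a KVM virtio disk like `/dev/vda`, no longer aborts the install. A minimal sketch of such a relaxed check, assuming a hypothetical `disk_tag_id()` helper that receives the disk's properties as a dict; curtin's real implementation may differ:

```python
import logging

LOG = logging.getLogger(__name__)


def disk_tag_id(disk_id, info):
    """Return (key, value) for a stable disk identifier, or None.

    Hypothetical relaxed check: instead of raising a RuntimeError when a
    disk has neither a 'serial' nor a 'wwn' value, log a warning and
    return None so the caller can skip writing the disk tag udev rule.
    """
    for key in ('serial', 'wwn'):
        value = info.get(key)
        if value:  # ignore missing or empty values
            return key, value
    LOG.warning("Cannot create disk tag udev rule for %s: missing 'serial' "
                "or 'wwn' value; skipping", disk_id)
    return None
```

The trade-off is that a disk without a stable identifier simply gets no persistent name, rather than failing the whole deployment.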

Jason Hobbs (jason-hobbs) wrote :

The workaround and the fix look to be working for us.

Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 81bf02ed to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=81bf02ed

Changed in curtin:
status: In Progress → Fix Committed
Ante Karamatić (ivoks)
Changed in curtin (Ubuntu):
milestone: none → xenial-updates
milestone: xenial-updates → none
Gábor Mészáros (gabor.meszaros) wrote :

It's affecting one of our customer deployments, which runs on Xenial. Is the fix planned to be released for Xenial, and if so, when?

tags: added: 4010 field-hige
tags: added: field-high
removed: field-hige
tags: removed: field-high
Ryan Harper (raharper) wrote :

There's a related bug that's also being looked at, to confirm whether this fix resolves the issues reported there.

https://bugs.launchpad.net/curtin/+bug/1796292

I've updated the PPA with one additional fix for dealing with bcache devices that have partitions.

I'd like confirmation from QA (and/or field) that it addresses the issue.

With that in place we can start the SRU (and yes, back to Xenial).

Gábor Mészáros (gabor.meszaros) wrote :

The version published in the ppa looks good so far.

Ryan Harper (raharper) wrote :

Thanks for the confirmation.

Commit 81bf02ed was not sufficient to fix the issue. We'll be landing an additional change to ensure that partitions on bcache devices are recognized as partitions (not as actual bcache devices), since shutting them down is handled differently.
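One way to tell a partition on a bcache device apart from the bcache device itself is the sysfs `partition` attribute, which the kernel exposes only in a partition's directory. The sketch here is a hypothetical illustration of that check (the helper name `classify_block_device()` is an assumption, not curtin's API); it does not show how curtin then shuts each kind of device down.

```python
import os


def classify_block_device(sysfs_path):
    """Classify a sysfs block-device directory as a partition or a disk.

    Hypothetical helper. Only a partition's sysfs directory contains a
    'partition' attribute file, so its presence distinguishes bcache0p1
    from bcache0. A partition's directory is nested inside its parent
    device's directory, so the parent is the dirname of the resolved path.
    """
    if os.path.isfile(os.path.join(sysfs_path, 'partition')):
        parent = os.path.dirname(os.path.realpath(sysfs_path))
        return 'partition', parent
    return 'disk', sysfs_path
```

For the path in the traceback, `/sys/class/block/bcache0/bcache0p1` would be classified as a partition of the device at `/sys/class/block/bcache0` (after symlink resolution), so a caller could route it to partition handling instead of bcache shutdown.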

Ryan Harper (raharper) wrote :

The committed fix did not completely resolve the issue.

Changed in curtin:
status: Fix Committed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 18.2-10-g7afd77fa-0ubuntu1

---------------
curtin (18.2-10-g7afd77fa-0ubuntu1) disco; urgency=medium

  * New upstream snapshot.
    - Support for multi-layers images fsimage-layered:// URI
      [Jean-Baptiste Lallement]
    - dname: relax dname req for disk serial/wwn presence for compatibility
      (LP: #1735839)
    - flake8: fix some E117 over-indented issues [Paride Legovini]
    - bcache: ensure partitions on bcache devices are detected as partition
    - vmtest: bump skip_by_date out a year for trusty bcache bug
    - Fix typo in doc/topics/integration-testing.rst. [Paride Legovini]
    - flake8: Fix two issues found with new version of flake8
    - clear-holders: handle FileNotFound when probing for bcache device slaves
      (LP: #1811117)
    - vmtests: network mtu fix-by bump to post 19.04 release
    - vmtest: Fix bug preventing explicit disabling of system_upgrade.

 -- Ryan Harper <email address hidden> Wed, 27 Feb 2019 16:43:21 -0600

Changed in curtin (Ubuntu):
status: New → Fix Released
Ryan Harper (raharper) wrote :

Just to close the loop on comment #17, the SRU includes this commit:

https://git.launchpad.net/curtin/commit/?id=b473fd507e6e6ae76c30df3199fdd3adc697d825

which was the additional change needed to completely resolve the issue.

Joshua Powers (powersj)
Changed in curtin:
status: In Progress → Fix Released
assignee: Ryan Harper (raharper) → nobody