[2.3.x] Preseed fails to render when bcache backed partition is reformatted.

Bug #1799161 reported by Vladimir Grevtsev
This bug affects 1 person
Affects  Status         Importance  Assigned to    Milestone
MAAS     Fix Released   Critical    Newell Jensen
2.3      Fix Committed  Critical    Newell Jensen
2.4      Fix Committed  Critical    Newell Jensen

Bug Description

=== Environment ===
OS: Xenial 16.04.4, Linux BLRKECROSINF32 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
MAAS version: 2.3.5
MAAS packages installed: https://pastebin.canonical.com/p/PtwDfKQDP4/

=== Problem summary ===

While deploying a Juju bundle ("magpie" in this case), Juju is unable to acquire a machine from MAAS: https://pastebin.canonical.com/p/sWSfrFyGxk/
At the same time, the MAAS regiond log contains the following records:

2018-10-19 13:34:28 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/maasserver/api/machines.py", line 561, in deploy
    get_curtin_merged_config(machine)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 399, in get_curtin_merged_config
    yaml_config = get_curtin_yaml_config(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 387, in get_curtin_yaml_config
    storage_config = compose_curtin_storage_config(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 640, in compose_curtin_storage_config
    return [generator.generate()]
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 68, in generate
    self._generate_bcache_operations()
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 505, in _generate_bcache_operations
    self._generate_bcache_operation(filesystem_group)
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 514, in _generate_bcache_operation
    "backing_device": filesystem_group.get_bcache_backing_filesystem(
AttributeError: 'NoneType' object has no attribute 'get_parent'

Full log: https://pastebin.canonical.com/p/RTrPCDkQ59/
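
The traceback above suggests that get_bcache_backing_filesystem() returned None and that .get_parent() was then called on the result. The following is a minimal sketch of that failure mode, not the MAAS code: the Filesystem and BcacheGroup classes, the backing_device_name() helper, the device names and the "bcache-backing" type string are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-ins for illustration only; these are NOT the MAAS models.
class Filesystem:
    def __init__(self, fstype, parent):
        self.fstype = fstype
        self._parent = parent

    def get_parent(self):
        return self._parent


class BcacheGroup:
    def __init__(self, name, filesystems):
        self.name = name
        self.filesystems = filesystems

    def get_bcache_backing_filesystem(self):
        # Assumption: the backing device is found by its filesystem type.
        for fs in self.filesystems:
            if fs.fstype == "bcache-backing":
                return fs
        # Nothing matched, e.g. because the partition now carries ext4.
        return None


def backing_device_name(group):
    # Mirrors the shape of the failing expression in the traceback:
    # no check for None before calling get_parent().
    return group.get_bcache_backing_filesystem().get_parent()
```

With a group whose only filesystem is of type "ext4", backing_device_name() raises an AttributeError with the same message as in the log, because the lookup returns None.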

Manual deployment via the GUI produces the same error messages in the logs: "Logs > Installation output" shows only "System is booting", while similar records appear in the regiond log.

CLI:

ubuntu@BLRKECROSINF31:~$ maas maasadmin machine deploy xpywff --debug
400 BAD REQUEST

       Content-Type: text/plain; charset=utf-8
               Date: Mon, 22 Oct 2018 13:20:40 GMT
             Server: TwistedWeb/16.0.0
             Status: 400
  Transfer-Encoding: chunked
               Vary: Cookie
    X-Frame-Options: SAMEORIGIN

Failed to render preseed: 'NoneType' object has no attribute 'get_parent'
ubuntu@BLRKECROSINF31:~$

Output of "maas admin machine read ....": https://pastebin.canonical.com/p/qB8s5m5HSz/


tags: added: field-high
tags: added: field-medium
removed: field-high
Vladimir Grevtsev (vlgrevtsev) wrote :

+ ~field-medium team subscription

Vladimir Grevtsev (vlgrevtsev) wrote :

Changed to field-high again since this issue may become a blocker in the customer environment.

description: updated
tags: added: field-high
removed: field-medium
Andres Rodriguez (andreserl) wrote :

Hi Vladimir,

Can you please describe how your storage has been configured? Please provide a screenshot and the output of: maas <user> machine get-curtin-config <system-id>

Changed in maas:
status: New → Incomplete
Vladimir Grevtsev (vlgrevtsev) wrote :

Hi Andres,

ubuntu@BLRKECROSINF31:~$ maas maasadmin machine get-curtin-config xpywff
Machine BLRKECRSDCLD301 is not in a deployment state.

Storage configuration: please see the attached document.

In the meantime, I modified /usr/lib/python3/dist-packages/maasserver/preseed_storage.py [1] and tried to call "machine deploy" once again. The output is: [2]

[1] https://pastebin.canonical.com/p/X3wdx8PSts/
[2] https://pastebin.canonical.com/p/zrty7Tk7Kn/

So I believe the problem is in bcache0/bcache1, because all the others passed that code successfully.

Vladimir Grevtsev (vlgrevtsev) wrote :

Tried to put the machine into the "Deploying" state via the GUI and ran "get-curtin-config" again; it failed with "list index out of range": https://pastebin.canonical.com/p/4PqJtthWnZ/

Andres Rodriguez (andreserl) wrote : Re: [Bug 1799161] Re: [2.3.x] "Failed to render preseed: 'NoneType' object has no attribute 'get_parent'" on Juju acquisition event

Vladimir,

Are you configuring anything in /etc/maas/preseeds/curtin_userdata?

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Vladimir Grevtsev (vlgrevtsev) wrote : Re: [2.3.x] "Failed to render preseed: 'NoneType' object has no attribute 'get_parent'" on Juju acquisition event

No, we don't.

But it looks like I found a mistake in the storage configuration on the server.

Before:

- partition from spin drive created
- added format options to partition
- set mountpoint for partition
- create bcache device, set "backing device" to partition, "caching device" to appropriate part of NVMe drive

After:

- partition from spin drive created
- bcache device created, set "backing device" to partition, "caching device" to NVMe
- added format options for bcache device
- set mountpoint for bcache device

The difference is that in the first run (where the bug was originally reported) we created ext4 filesystems on the partitions on the spinning drives, while the bcache devices were left untouched. The problem, I guess, was something like this: the bcache device was created without any filesystem, which caused problems while rendering the curtin config.

I will try to share step-by-step MAAS CLI configuration commands to reproduce this case. I believe some additional diagnostic output (or some kind of check) should be added as part of this.
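
As an illustration of the kind of check suggested above, here is a rough sketch that names the offending bcache device instead of failing with a bare AttributeError. It is not the fix that landed in MAAS. BcacheSpec, find_broken_bcaches() and the "bcache-backing" type string are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BcacheSpec:
    """Hypothetical, simplified view of one bcache device."""
    name: str
    # Filesystem type currently recorded on the backing partition,
    # or None if the partition has no filesystem at all.
    backing_fstype: Optional[str]


def find_broken_bcaches(specs: List[BcacheSpec]) -> List[str]:
    """Return one readable message per bcache whose backing partition
    is no longer marked as a bcache backing device."""
    problems = []
    for spec in specs:
        if spec.backing_fstype != "bcache-backing":
            problems.append(
                f"{spec.name}: backing partition is formatted as "
                f"{spec.backing_fstype!r}, expected 'bcache-backing'"
            )
    return problems


if __name__ == "__main__":
    specs = [
        BcacheSpec("bcache0", "ext4"),            # "Before" ordering
        BcacheSpec("bcache1", "bcache-backing"),  # "After" ordering
    ]
    for message in find_broken_bcaches(specs):
        print(message)
```

Running a check like this before rendering the storage config would let the deploy request report which device is misconfigured, rather than surfacing only "Failed to render preseed".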

Vladimir Grevtsev (vlgrevtsev) wrote :

Reproducer: https://pastebin.canonical.com/p/YVdYQr3tDg/ (after executing this set of commands the node cannot be deployed; the error from the bug description is shown; "maas %user% bcaches read %node_id%" also throws the same exception)

Valid configuration (succeeded only after deleting the bcaches manually via the GUI): https://pastebin.canonical.com/p/w8F475Mn5F/

Changed in maas:
milestone: none → 2.5.0rc1
Changed in maas:
status: Incomplete → Confirmed
importance: Undecided → High
Changed in maas:
importance: High → Critical
Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Changed in maas:
status: Confirmed → In Progress
Changed in maas:
status: In Progress → Fix Committed
Newell Jensen (newell-jensen) wrote :

Vladimir,

Can you please test the latest fix that has landed in master and report back whether it clears up the issue for you?

Thanks,

Newell

Changed in maas:
status: Fix Committed → Fix Released
summary: - [2.3.x] "Failed to render preseed: 'NoneType' object has no attribute
- 'get_parent'" on Juju acquisition event
+ [2.3.x] Preseed fails to render when bcache backed partition is
+ reformatted.