[SRU] maas-cluster-controller doesn't have images for provisioning

Bug #1068843 reported by Diogo Matsubara
This bug affects 2 people
Affects                    Status         Importance   Assigned to      Milestone
MAAS                       Fix Released   Critical     Julian Edwards
  1.2                      Fix Released   Critical     Julian Edwards
maas (Ubuntu)              Fix Released   High         Unassigned
  Quantal                  Fix Released   Undecided    Unassigned
  Raring                   Fix Released   High         Unassigned
python-tx-tftp (Ubuntu)    Fix Released   Undecided    Unassigned
  Quantal                  Fix Released   High         Unassigned
  Raring                   Fix Released   Undecided    Unassigned

Bug Description

Context is the QA lab

In one VM I have the region-controller (RC) where:

10.98.0.90 is the api/webui
192.168.21.5 is the RC's provisioning server

On another VM I have a cluster-controller (CC) where:

192.168.20.5 is the CC's provisioning server
10.98.0.91 is the interface connected to the RC network

The 21.x and 20.x networks don't have access to each other.

When I install the CC, I set the API URL to http://10.98.0.90/MAAS

When the CC is registered to the RC and the dhcpd.conf file is written, I get this:

subnet 192.168.20.0 netmask 255.255.255.0 {
       next-server 192.168.21.5;
       filename "pxelinux.0";
       option subnet-mask 255.255.255.0;
       option broadcast-address 192.168.20.255;
       option domain-name-servers 192.168.21.5;
       option routers 192.168.20.1;
       range dynamic-bootp 192.168.20.10 192.168.20.20;
}

When the nodes in the 20.x network are booted, they can reach the DHCP server but can't reach the provisioning server.
If I change the next-server value to 192.168.20.5, the nodes can't find the images.

Looks like when the CC is registered, the images in the RC should be sent to the CC, so it can serve them to the nodes in its network.
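
The mismatch in the dhcpd.conf stanza above can be checked mechanically. What follows is only a minimal sketch using Python's standard ipaddress module; the function name next_server_in_subnet is made up for illustration and is not part of MAAS. It tests whether a next-server address lies inside the subnet being served. An address outside the subnet is not wrong in general, but in this lab the 20.x and 21.x networks cannot reach each other, so it is a useful warning sign.

```python
import ipaddress


def next_server_in_subnet(subnet, netmask, next_server):
    """Return True if next_server is an address inside subnet/netmask."""
    network = ipaddress.ip_network(f"{subnet}/{netmask}")
    return ipaddress.ip_address(next_server) in network


# Values taken from the dhcpd.conf stanza in this report.
# 192.168.21.5 is the RC's provisioning address, 192.168.20.5 the CC's.
print(next_server_in_subnet("192.168.20.0", "255.255.255.0", "192.168.21.5"))
print(next_server_in_subnet("192.168.20.0", "255.255.255.0", "192.168.20.5"))
```

The first call reports that the generated next-server is outside the subnet the nodes live on; the second shows that the CC's own address would be inside it.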

= python-tx-tftp =

[Impact]
Enables the backend to know which address a request came in on. This is an important fix for MAAS, which needs to know the IP address a request came from in order to serve TFTP correctly.
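
To illustrate why a backend would want this information, here is a rough sketch of a backend that chooses which server address to advertise based on the local address a request arrived on. The RequestContext type and the server_address_for function are hypothetical names invented for this example; this is not the python-tx-tftp or MAAS API.

```python
from typing import NamedTuple, Tuple


class RequestContext(NamedTuple):
    """Hypothetical per-request context holding (host, port) pairs."""
    local: Tuple[str, int]   # address the request arrived on
    remote: Tuple[str, int]  # address of the booting node


def server_address_for(ctx: RequestContext) -> str:
    """Advertise the address the node already reached us on.

    A node that reached this server on ctx.local can also reach that
    same address for the later stages of its boot.
    """
    host, _port = ctx.local
    return host


# Example addresses based on the lab networks described in this report.
ctx = RequestContext(local=("192.168.20.5", 69), remote=("192.168.20.10", 2070))
print(server_address_for(ctx))
```

Without the local address, a backend serving several networks has no way to tell which of its own addresses is reachable from the node that asked.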

[Test Case]
1. Install the MAAS region controller on one machine. (Make sure iSCSI is not running.)
2. Install the MAAS cluster controller on a second machine. (iSCSI will be running, managed by MAAS.)
3. Boot a client machine.
If enlistment fails, it is because the machine tried to access the region controller instead of the cluster controller. If the machine enlists successfully, we can be confident that the bug is resolved.

[Regression Potential]
Minimal. This simply passes a new argument and does not affect the code. This has been tested with MAAS, and the MAAS team is committed to fixing any issues for this package from this point forward.

Julian Edwards (julian-edwards) wrote :

We knew about this but there was no bug until now, so thanks :)

See also bug 1067961 which is the same problem but with iSCSI

Changed in maas:
status: New → Triaged
importance: Undecided → High
tags: added: scaling
Changed in maas:
importance: High → Critical
Julian Edwards (julian-edwards) wrote :

Gavin and I just talked about how to fix this. We looked at various ways of pushing files out from the region controller to the clusters and they all have varying orders of complexity. Given the timescales involved we need a simple solution.

So, we'll make maas-import-pxe-files able to run on the cluster controller such that it pulls kernel/initrd/ephemeral files from a squid proxy running on the region controller. The maas-import-pxe-files script sets up iscsi targets when it runs.

This also means the boot image reporting job has to run on all the clusters, and the database needs to identify which cluster is missing its boot files.

Points to note:
 * check that squid's max object size is big enough to cache the large ISOs
 * Ensure that during PXE boot we point the iSCSI config in kernel params to the cluster's address

Changed in maas:
milestone: none → 12.10-stabilization
Julian Edwards (julian-edwards) wrote :

Further discussion:

maas-import-pxe-files can always be run as a celerybeat job which is targeted at all workers. Each worker would be sent an optional squid proxy address in the job parameters. If the proxy is not set, then download from t'internet. The squid proxy address can be a region controller setting.

This means that the import could also be initiated remotely from the UI/API etc.
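
A minimal sketch of the job body described above, with the celery wiring left out. It assumes that the script's downloader honours the conventional http_proxy environment variable; the function names build_import_env and import_pxe_files are invented for this example and are not taken from the MAAS source.

```python
import os
import subprocess


def build_import_env(proxy_url=None, base_env=None):
    """Return the environment to run the import script with.

    If proxy_url is given, downloads are pointed at it via http_proxy
    (an assumption about the downloader); otherwise the environment is
    left alone and the script downloads directly from the internet.
    """
    env = dict(os.environ if base_env is None else base_env)
    if proxy_url:
        env["http_proxy"] = proxy_url
    return env


def import_pxe_files(proxy_url=None, run=subprocess.check_call):
    """Run maas-import-pxe-files, optionally through a proxy.

    `run` is injectable so that the sketch can be exercised on a
    machine where the script is not installed.
    """
    return run(["maas-import-pxe-files"], env=build_import_env(proxy_url))
```

In a real deployment a function like import_pxe_files would be registered as a celery task and the proxy address passed in the job parameters, as described in the comment above.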

Julian Edwards (julian-edwards) wrote :

Work items to do all this:

* add celery job that wraps maas-import-pxe-files
* add settings page setting to set proxy url
* add settings page setting to say whether to run maas-import-pxe-files
* amend celery job to take proxy url
* celery job to abort if setting is False
* celerybeat schedule for m-i-p-f

* BootImage schema to reference NodeGroup
* report_boot_images to say which nodegroup it is on
* report_boot_images to run on all celeryds
* change beat schedule to run on all workers
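
The second group of work items can be sketched as follows: once every reported boot image carries a reference to the cluster (nodegroup) it lives on, finding the clusters with no images is a set difference. The BootImage fields and the clusters_missing_images helper are hypothetical and do not reflect the actual MAAS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootImage:
    """Hypothetical record: one boot image reported by one cluster."""
    nodegroup: str
    architecture: str
    release: str


def clusters_missing_images(nodegroups, reported):
    """Return, sorted, the nodegroups that have reported no boot images."""
    have_images = {image.nodegroup for image in reported}
    return sorted(set(nodegroups) - have_images)


# Illustrative data only: cluster-a has reported an image, cluster-b has not.
reported = [BootImage("cluster-a", "amd64", "quantal")]
print(clusters_missing_images(["cluster-a", "cluster-b"], reported))
```

With the sample data, only cluster-b is returned, which is the kind of answer the region controller needs in order to warn about a cluster that cannot provision nodes yet.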

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Julian Edwards (julian-edwards)
Julian Edwards (julian-edwards) wrote :

See also bug 1070318

James Page (james-page)
Changed in maas (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in maas:
milestone: 12.10-stabilization → none
status: In Progress → Fix Released
Changed in maas:
status: Fix Released → In Progress
Changed in maas:
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-tx-tftp - 0.1~bzr31-0ubuntu7

---------------
python-tx-tftp (0.1~bzr31-0ubuntu7) raring; urgency=low

  [ Julian Edwards ]
  * Add d/p/02-ip-context.patch (LP: #1068843)
 -- Andres Rodriguez <email address hidden> Wed, 31 Oct 2012 10:10:04 +0100

Changed in python-tx-tftp (Ubuntu Raring):
status: New → Fix Released
Changed in python-tx-tftp (Ubuntu Quantal):
importance: Undecided → High
description: updated
summary: - maas-cluster-controller doesn't have images for provisioning
+ [SRU] maas-cluster-controller doesn't have images for provisioning
Dave Walker (davewalker) wrote : Please test proposed package

Hello Diogo, or anyone else affected,

Accepted into quantal-proposed. The package will build now and be available in a few hours in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in maas (Ubuntu Quantal):
status: New → Fix Committed
Changed in python-tx-tftp (Ubuntu Quantal):
status: New → Fix Committed
tags: added: verification-needed
tags: added: verification-done
removed: verification-needed
Chris Halse Rogers (raof) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-tx-tftp - 0.1~bzr31-0ubuntu6.1

---------------
python-tx-tftp (0.1~bzr31-0ubuntu6.1) quantal-proposed; urgency=low

  [ Julian Edwards ]
  * Add d/p/02-ip-context.patch: Allow the backend to know from what
    address the request came from. (LP: #1068843)
 -- Andres Rodriguez <email address hidden> Fri, 23 Nov 2012 14:10:55 -0500

Changed in python-tx-tftp (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.2+bzr1349+dfsg-0ubuntu1

---------------
maas (1.2+bzr1349+dfsg-0ubuntu1) raring; urgency=low

  * New upstream bugfix release. Fixes:
    - The DNS configuration is not created if maas-dns is installed after
      the DNS config has been set up (LP: #1085865).
    - IPMI detection ends up with power_address of 0.0.0.0 (LP: #1064224)
    - Main page slow to load with many nodes (LP: #1066775)
    - maas-cluster-controller doesn't have images for
      provisioning (LP: #1068843)
    - Filestorage is unique to each appserver instance (LP: #1069734)
    - import_pxe_files does not include quantal (LP: #1069850)
    - maas-cli nodes new incomplete documentation (LP: #1070522)
    - DNS forward zone ends up with nonsensical entries (LP: #1070765)
    - The hostname of a node can still be changed once the node is in
      use. (LP: #1070774)
    - The zone name (attached to a cluster controller) can still be changed
      when it contains in-use nodes and DNS is managed. (LP: #1070775)
    - Duplicated prefix in the url used by the CLI (LP: #1075597)
    - Not importing Quantal boot images (LP: #1077180)
    - Nodes are deployed with wrong domain name. (LP: #1078744)
    - src/maasserver/api.py calls request.data.getlist with a 'default'
      parameter. That parameter is not supported by Django 1.3. (LP: #1080673)
    - API calls that return a node leak private data (LP: #1034318)
    - MAAS hostnames should be 5 easily disambiguated characters (LP: #1058998)
    - URI in API description wrong when accessing machine via alternative
      interface. (LP: #1059645)
    - Oops when renaming nodegroup w/o interface (LP: #1077075)
    - Error in log when using 'Start node' button: MAASAPINotFound: No user
      data available for this node. (LP: #1069603)

  [ Raphaël Badin ]
  * debian/maas-dns.postinst: Call write_dns_config (LP: #1085865).
  * debian/maas-dns.postinst: fix permissions and group ownership of
    file /etc/bind/maas/named.conf.rndc.maas. (LP: #1066935)

  [ Julian Edwards ]
  * debian/maas-region-controller.install: Remove installation of maas-gc; it
    is no longer required as upstream no longer stores files in the filesystem.
    (LP: #1069734)
  * debian/maas-cluster-controller.postinst: Ensure that /etc/maas/pserv.yaml
    is updated when reconfiguring. (LP: #1081212)

  [ Andres Rodriguez ]
  * debian/control:
    - maas-cluster-controller Conflicts with tftpd-hpa (LP: #1076028)
    - maas-dns: Conflicts with dnsmasq
    - Drop Dependency on rabbitmq-server for maas-cluster-controller.
      (LP: #1072744)
    - Add conflicts/replaces for maas-region-controller to
      maas-cluster-controller.
  * debian/maas-cluster-controller.config: If URL has been detected, add
    /MAAS if it doesn't contain it. This helps upgrades from versions where
    DEFAULT_MAAS_URL didn't use /MAAS.
  * Install maas-import-pxe-files and related files with
    maas-cluster-controller, as well as configure tgtd, as
    maas-region-controller no longer stores images. Thanks to Jeroen
    Vermuelen.

  [ Gavin Panella ]
  * debian/extras/99-maas: squashfs image download is no longer needed.
  * debian/maas-clu...


Changed in maas (Ubuntu Raring):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.2+bzr1373+dfsg-0ubuntu1

---------------
maas (1.2+bzr1373+dfsg-0ubuntu1) quantal-proposed; urgency=low

  * MAAS Stable Release Update (LP: #1109283):
    This SRU brings a new upstream release of MAAS that removes
    the usage of a cobbler code copy, 'maas-provision' as well as
    several bug fixes. Exception has been granted by the Technical
    Board to proceed. More information can be found in:
    https://lists.ubuntu.com/archives/ubuntu-devel-announce/2013-February/001012.html

  [ Andres Rodriguez ]
  * debian/control:
    - Change Conflicts/Replaces for Breaks/Replaces.
    - Conflicts on tftpd-hpa and dnsmasq.
    - Do not pre-depends, but Depends on ${misc:Depends} for 'maas'.

  [ Steve Langasek ]
  * postinst scripts are never called with 'reconfigure' as the script
    argument. Remove references to this (mythical) invocation.
  * always call 'set -e' from maintainer scripts instead of passing 'sh -e'
    as the interpreter, so that scripts will behave correctly when run via
    'sh -x'.
  * invoke-rc.d is never allowed to not exist - simplify scripts (and make
    them better policy-compliant) by invoking unconditionally. (The only
    possible exception is in the postrm, where it's *theoretically* possible
    for invoke-rc.d to be missing if the user has completely stripped
    down their system; that's a fairly unreasonable corner case, but we
    might as well be correct if it ever happens.)
  * db_get+db_set is a no-op; don't call db_set to push back a value we just
    got from db_get.
  * Omit superfluous calls to 'exit 0' at the end of each script.
  * Remove maas-cluster-controller prerm script, which called debconf for no
    reason.
  * Don't invoke debconf in the postrm script either, debhelper already does
    this for us.
  * Other miscellaneous maintainer script fixes
  * debian/maas-common.postinst: call adduser and addgroup unconditionally;
    the tools are already designed to DTRT, we don't need to check for the
    user/group existence before calling them nor should we worry about
    calling them only once on first install.
  * debian/maas-common.postrm: delete the maas group, not just the user,
    as the comment in the code implies we should do.
 -- Andres Rodriguez <email address hidden> Thu, 07 Mar 2013 14:22:35 -0500

Changed in maas (Ubuntu Quantal):
status: Fix Committed → Fix Released