Bug #1352923: MAAS 1.8 requires arbitrary high-numbered port connections between cluster and region controllers

Nick Moffitt (nick-moffitt) wrote on 2014-08-05:

#1

ProcEnviron.txt (128 bytes, text/plain; charset="utf-8")

Launchpad Janitor (janitor) wrote on 2014-08-05:

#2

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status:	New → Confirmed

James Troup (elmo) on 2014-08-05

affects:

maas (Ubuntu) → maas

Julian Edwards (julian-edwards) on 2014-08-06

Changed in maas:
status:	Confirmed → Triaged
importance:	Undecided → High
milestone:	none → 1.6.1

Gavin Panella (allenap) wrote on 2014-08-06:

#3

A few thoughts:

- We may be able to use SO_REUSEPORT to have all event-loops in the
  region listening on the same port, but then it's unclear how we would
  ensure that each cluster has a connection to each and every
  event-loop, which is the model we rely upon.

- We /could/ change the logic so that the region initiates connections
  with clusters. The clusters could listen on a fixed port, and because
  there's only one of them (...so far), it would make firewalls simpler
  to configure. But, if we do ever run multiple cluster processes then
  we're back to the same problem.

- We could restrict the ports used to a narrow range.
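For reference, a minimal sketch of the SO_REUSEPORT idea from the first bullet, assuming Linux and Python's standard socket module. The helper name listen_shared is hypothetical and this is not MAAS code. It shows two listening sockets sharing one port; the kernel then decides which socket receives each incoming connection, which is why a cluster could not choose a particular event-loop to connect to.

```python
import socket


def listen_shared(port, host="127.0.0.1"):
    """Open a listening TCP socket that may share `port` with other sockets.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT has to be set before bind() on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


# Port 0 asks the kernel to pick a free port for the first "event-loop".
first = listen_shared(0)
port = first.getsockname()[1]

# A second "event-loop" binds the very same port without an error.
second = listen_shared(port)
```

A client connecting to that port reaches either `first` or `second`, but has no say in which one, so the "connection to each and every event-loop" model would need some extra mechanism on top.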

Before we spend more time on this, can you explain why opening up a
large port range is problematic? The ports need only to be open between
the cluster controllers and the region controllers, not to the whole
cluster network.

My hand-wavy thoughts on using a narrow range, which is probably the
easiest thing to implement:

- A narrow port range isn't a significant security improvement over
  using a wide range as far as I can tell, unless the range is beneath
  1024, when we can assume that only a root or root-authorised user
  could attempt to impersonate an event-loop in the region.

- The clusters will only connect to ports on the region controllers that
  have been advertised (the clusters request the list of ports via the
  web API). This also narrows the chance of impersonation.

- The RPC connection between region and cluster is not yet over TLS, but
  will be for 1.7. Each end will authenticate the other via certificate.
  (The code is in place, we just need to figure out the upgrade workflow
  to introduce certificates.)

- On the region machines, other processes could be listening on high
  ports, and these would be exposed to the cluster controllers. These
  could be selectively blocked, or reconfigured to only listen on a
  local-only interface (e.g. 127.0.0.1). Granted, this isn't as nice as
  poking small holes in the firewall, but it's doable.
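A minimal sketch of the narrow-range option, assuming each event-loop simply tries the ports in a fixed range in order and keeps the first one it can bind. The function name bind_in_range and the range 20000-20009 are hypothetical, chosen only for illustration, and this is not the MAAS implementation.

```python
import errno
import socket


def bind_in_range(first_port, last_port, host="127.0.0.1"):
    """Return a listening socket on the first free port in the range.

    Hypothetical helper: ports are tried in ascending order, inclusive.
    """
    for port in range(first_port, last_port + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as error:
            sock.close()
            if error.errno != errno.EADDRINUSE:
                raise  # anything other than "port taken" is a real failure
            continue  # port already claimed, try the next one
        sock.listen(5)
        return sock
    raise OSError(
        errno.EADDRINUSE,
        "no free port in range %d-%d" % (first_port, last_port),
    )


# Two event-loops on the same machine end up on different ports,
# both inside the range a firewall rule would need to cover.
loop_a = bind_in_range(20000, 20009)
loop_b = bind_in_range(20000, 20009)
```

The size of the range caps the number of event-loops per machine, which is the trade-off for a firewall rule that names a small, documented set of ports.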

Nick Moffitt (nick-moffitt) wrote on 2014-08-06: Re: [Bug 1352923] Re: MAAS 1.5 requires arbitrary high-numbered port connections between cluster and region controllers

#4

Gavin Panella:
> Before we spend more time on this, can you explain why opening up a
> large port range is problematic? The ports need only to be open between
> the cluster controllers and the region controllers, not to the whole
> cluster network.

I don't have a unified or coherent way of explaining this, but I'll try
to outline what I see as some of the reasoning involved here and hope
that others more articulate on the subject can chime in.

1. Principle of Least Access

    The reason we have firewalls is to allow desired traffic through
    while keeping out undesired traffic.  The costs for disallowing
    desired traffic are felt immediately, and (depending on
    administration of the firewalls) easily remedied.  The costs for
    allowing undesired traffic through are subtle and hidden, and
    magnify with time.

    As a result, the most sensible policy is to DENY by default and then
    allow through specific host/port combinations.  Occasionally it is
    necessary to use entire network ranges instead of hosts, but that is
    a considered thing and has no alternatives (a central monitoring
    system, for example, needs to be able to probe all machines on a
    given network).

    Fortunately we never have to open large ranges of ports.  Protocols
    such as the old NFS portmap used to require arbitrary UDP
    connections, and that's one of the many reasons nobody uses NFS any
    more.  Protocols and connections are identified by port numbers,
    even if they're narrow ranges or de facto standards (such as 6667
    for IRC and 6697 for IRC over SSL).

    It is reasonable that the burden of proof be on the request to open
    access, as that is the operation that comes with risk.
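    A minimal sketch of the DENY-by-default policy described above, using Python's standard ipaddress module. The rule table, the addresses (taken from the documentation ranges) and the port numbers are all hypothetical and chosen only for illustration.

```python
import ipaddress

# Each rule allows one (source network, destination network, port)
# combination. A /32 network stands for a single host.
RULES = [
    # A specific host/port combination.
    (ipaddress.ip_network("192.0.2.10/32"),
     ipaddress.ip_network("198.51.100.5/32"), 5240),
    # The considered exception: a monitoring host probing a whole network.
    (ipaddress.ip_network("192.0.2.20/32"),
     ipaddress.ip_network("203.0.113.0/24"), 161),
]


def is_allowed(src, dst, port, rules=RULES):
    """Return True only if some rule explicitly allows this traffic."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    for src_net, dst_net, allowed_port in rules:
        if src in src_net and dst in dst_net and port == allowed_port:
            return True
    return False  # nothing matched: DENY by default
```

    A rule covering an arbitrary high-numbered range would need one entry per possible port, or a wildcard, which is exactly what this policy tries to avoid.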

2. Bound Ports
    
    We can be confident that a process has laid claim to a port
    (privileged or no) and that unprivileged processes won't be able to
    set up connections without first disabling the official one and any
    monitoring we perform.

    You mentioned privileged ports (sub-1024) which add a handy extra
    layer of constraints on all this, but the fact that the ports are
    unprivileged doesn't mean that there's no benefit to limiting them.
    If the apache2 process holds port 8080 on example.canonical.com,
    someone who exploits ntpd and gains access as the ntp user won't be
    able to kick apache off of that port without further effort.
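    A minimal sketch of that point, assuming Linux and Python's standard socket module: once one process is listening on an unprivileged port, a second plain bind() to the same address and port is refused. The variable names are hypothetical, and both sockets live in one process here purely to keep the example self-contained.

```python
import errno
import socket

# The "official" service claims a port (port 0 lets the kernel pick one).
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
port = holder.getsockname()[1]

# A second socket tries to claim the same port while the first holds it.
intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    intruder.bind(("127.0.0.1", port))
    result = "bound"
except OSError as error:
    # Record the symbolic errno name, e.g. "EADDRINUSE".
    result = errno.errorcode[error.errno]
finally:
    intruder.close()
```

    With a fixed, documented port, the service's own listener is what protects it; with arbitrary ports, there is no particular port to hold on to.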

3. Pervasiveness of the Approach

    Network firewalls are something we've had to defend less and less as
    Canonical has gained experience on customer sites.  Even if there
    were no rational basis for the setup, its pervasiveness means that
    it's something MAAS needs to support.

    The role of the cluster controller is such that it is MAAS's
    representative in a variety of walled-off networks, and it cannot
    assume that it has unrestricted access to the region controller.
    The firewalls between MAAS controllers may not be under the control
    of the people running the MAAS infrastructure, and changes to
    filtering rules on such a site may be vastly more difficult than it
    is for us internally.

    What happens if an ISP filters bittorrent ports, and a cluster
    controller must cross that barrier to reach a region controller that
    landed on port 6881?  What if it lands on port 8080 and an
    intercepting HTTP proxy redirects it?  What if it lands on a port
    that trips a company's IDS, and nobody was able to whitelist it
    there because it wasn't listed in the documentation as a required
    port?  
    
    There are lots of nasty surprises that can crop up on customer
    networks, and it seems worthwhile to avoid them by telling the user
    "traffic goes from $HOSTS to $OTHER_HOSTS on $PORTS."

-- 
Nick Moffitt

Gavin Panella (allenap) wrote on 2014-08-06: Re: MAAS 1.5 requires arbitrary high-numbered port connections between cluster and region controllers

#5

Thanks Nick, that's good stuff.

Christian Reis (kiko) wrote on 2014-09-03:

#6

We should put serious thought into making a single port whitelist the only thing which is required to communicate between cluster and region. I'm not sure whether making that 80 or 443 is feasible or significantly better than requiring a random unprivileged port.

Christian Reis (kiko) on 2014-10-02

Changed in maas:
assignee:	nobody → Gavin Panella (allenap)

Julian Edwards (julian-edwards) on 2014-10-08

Changed in maas:
milestone:	1.7.0 → next

Gavin Panella (allenap) on 2015-04-30

Changed in maas:
assignee:	Gavin Panella (allenap) → nobody

Andres Rodriguez (andreserl) on 2015-05-15

Changed in maas:
milestone:	next → 1.8.0
importance:	High → Critical

Andres Rodriguez (andreserl) on 2015-05-15

summary:

- MAAS 1.5 requires arbitrary high-numbered port connections between
+ MAAS 1.8 requires arbitrary high-numbered port connections between
cluster and region controllers

ubuntudotcom1 (ubuntudotcom1) on 2015-05-15

Changed in maas:
assignee:	nobody → ubuntudotcom1 (ubuntudotcom1)

ubuntudotcom1 (ubuntudotcom1) wrote on 2015-05-15:

#7

Hmm, maybe there is a way we can leverage multicast to help out here.

Gavin Panella (allenap) on 2015-05-21

Changed in maas:
assignee:	ubuntudotcom1 (ubuntudotcom1) → Gavin Panella (allenap)
status:	Triaged → In Progress

Andres Rodriguez (andreserl) on 2015-05-22

Changed in maas:
status:	In Progress → Fix Committed

Andres Rodriguez (andreserl) on 2015-06-22

Changed in maas:
status:	Fix Committed → Fix Released
