MRE updates of rabbitmq-server for Jammy, Focal

Bug #2060248 reported by Mitchell Dzurick
Affects                          Status       Importance  Assigned to
rabbitmq-server (Ubuntu)         New          Undecided   Mitchell Dzurick
rabbitmq-server (Ubuntu Focal)   In Progress  Undecided   Unassigned
rabbitmq-server (Ubuntu Jammy)   In Progress  Undecided   Unassigned

Bug Description

This bug tracks an update for the rabbitmq-server package in Ubuntu.

This bug tracks an update to the following versions:

 * Focal (20.04): rabbitmq-server 3.8.3
 * Jammy (22.04): rabbitmq-server 3.9.27

(NOTE) - Jammy is only updating to 3.9.27 because 3.9.28 requires Erlang 24.3. If Erlang updates in the future, then we can upgrade further.
(NOTE) - Focal is only updating to 3.8.3 from 3.8.2 because 3.8.4 requires etcd v3.4.

This is the first MRE of rabbitmq-server.

Upstream has a very rapid release cadence with micro releases that contain many bug fixes that would be good to bring into our LTS releases.

One major hurdle is the lack of proper dep8 tests. A limited suite of dep8 tests was created for this MRE, and is planned to be integrated into newer releases once approved.

rabbitmq-server is a complicated package, and the new dep8 tests will not be able to cover everything. Therefore our OpenStack charms CI/CD ran the new version to provide more confidence in the package, and to at least verify that our workflow works. The results of these runs can be found at https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/915836.

In addition to this, only Jammy has GitHub workflows to build and test the package; the results can be found at https://github.com/mitchdz/rabbitmq-server-3-9-27-tests/actions/runs/8955069098/job/24595393599.

Reviewing the changes, there is only one change I want to bring to attention: version 3.9.23 (https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.9.23) introduces the following change:
Nodes now default to 65536 concurrent client connections instead of using the effective kernel open file handle limit
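
As a minimal sketch of the extra configuration step this implies (described in detail in the Jammy notices below), one could use a systemd drop-in for the rabbitmq-server unit. The 100K value, file path, and unit name here are illustrative assumptions, not part of this MRE:

``` sh
# Hedged sketch: raise the OS open file handle limit and set ERL_MAX_PORTS
# to match for a systemd-managed rabbitmq-server unit.
# DROPIN_DIR defaults to a scratch directory here; on a real system it
# would be /etc/systemd/system/rabbitmq-server.service.d
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/limits.conf" <<'EOF'
[Service]
# steps 1-2: set the maximum open file handle limit to the chosen value
LimitNOFILE=100000
# step 3: let the Erlang runtime allocate that many ports (connections)
Environment=ERL_MAX_PORTS=100000
EOF
echo "wrote $DROPIN_DIR/limits.conf"
# then: systemctl daemon-reload && systemctl restart rabbitmq-server
```

On a real node the `systemctl` commands at the end would apply the drop-in; they are left as a comment since this is only a sketch.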

------------------------------------------------------------------------------

Jammy Changes:
  - Notices:
    + Nodes now default to 65536 concurrent client connections instead of
      using the effective kernel open file handle limit. Users who want to
      override this default, that is, have nodes that should support more
      concurrent connections and open files, now have to perform an additional
      configuration step:

      1. Pick a new limit value they would like to use, for instance, 100K
      2. Set the maximum open file handle limit (for example, via `systemd`
         or similar tooling) for the OS user used by RabbitMQ to 100K
      3. Set the ERL_MAX_PORTS environment variable to 100K

      This change was introduced because of a change in several Linux
      distributions: they now use a default open file handle limit so high
      that it causes significant (say, 1.5 GiB) memory preallocation by the
      Erlang runtime.
  - Updates:
    + Free disk space monitor robustness improvements.
    + `raft.adaptive_failure_detector.poll_interval` exposes aten()'s
      poll_interval setting to RabbitMQ users. Increasing it can reduce the
      probability of false positives in clusters where inter-node
      communication links are used at close to maximum capacity. The default
      is `5000` (5 seconds).
    + When both `disk_free_limit.relative` and `disk_free_limit.absolute`,
      or both `vm_memory_high_watermark.relative` and
      `vm_memory_high_watermark.absolute` are set, the absolute settings will
      now take precedence.
    + New key supported by `rabbitmqctl list_queues`:
      `effective_policy_definition` that returns merged definitions of regular
      and operator policies effective for the queue.
    + New HTTP API endpoint, `GET /api/config/effective`, returns effective
      node configuration. This is an HTTP API counterpart of
      `rabbitmq-diagnostics environment`.
    + Force GC after definition import to reduce peak memory load by mostly
      idle nodes that import a lot of definitions.
    + A way to configure an authentication timeout, much like in some other
      protocols RabbitMQ supports.
    + Windows installer: service startup is now optional. More environment
      variables are respected by the installer.
    + In environments where DNS resolution is not yet available at the time
      RabbitMQ nodes boot and try to perform peer discovery, such as CoreDNS
      with default caching interval of 30s on Kubernetes, nodes now will
      retry hostname resolution (including of their own host) several times
      with a wait interval.
    + Prometheus plugin now exposes one more metric,
      `process_start_time_seconds`: the moment of node process startup in
      seconds.
    + Reduce log noise when `sysctl` cannot be accessed by node memory
      monitor.
    + Shovels now handle consumer delivery timeouts gracefully and restart.
    + Optimization: internal message GUID is no longer generated for quorum
      queues and streams, as they are specific to classic queues.
    + Two more AMQP 1.0 connection lifecycle events are now logged.
    + TLS configuration for inter-node stream replication connections now can
      use function references and definitions.
    + Stream protocol connection logging is now less verbose.
    + Max stream segment size is now limited to 3 GiB to avoid a potential
      stream position overflow.
    + Logging messages that use microseconds now use "us" for the SI symbol to
      be compatible with more tools.
    + Consul peer discovery now supports client-side TLS options, much like
      its Kubernetes and etcd peers.
    + A minor quorum queue optimization.
    + 40 to 50% throughput improvement for some workloads where AMQP 0-9-1
      clients consumed from a [stream](https://rabbitmq.com/stream.html).
    + Configuration of fallback secrets for Shovel and Federation credential
      obfuscation. This feature allows for secret rotation during rolling
      cluster node restarts.
    + Reduced memory footprint of individual consumer acknowledgements of
      quorum queue consumers.
    + `rabbitmq-diagnostics status` now reports crypto library (OpenSSL,
      LibreSSL, etc) used by the runtime, as well as its version details.
    + With a lot of busy quorum queues, nodes hosting a moderate number of
      leader replicas could experience growing memory footprint of one of the
      Raft implementation processes.
    + Re-introduced key file log rotation settings. Some log rotation settings
      were left behind during the migration to the standard runtime logger
      starting with 3.9.0. Now some key settings have been re-introduced.
    + Cleaned up some compiler options that are no longer relevant.
    + Quorum queues: better forward compatibility with RabbitMQ 3.10.
    + Significantly faster queue re-import from definitions on subsequent node
      restarts. Initial definition import still takes the same amount of time
      as before.
    + Significantly faster exchange re-import from definitions on subsequent
      node restarts. Initial definition import still takes the same amount of
      time as before.
    + RabbitMQ nodes will now filter out certain log messages related to
      connections, channels, and queue leader replicas receiving internal
      protocol messages sent to this node before a restart. These messages
      usually raise more questions and cause confusion than help.
    + More Erlang 24.3's `eldap` library compatibility improvements.
    + Restart of a node that hosted one or more stream leaders resulted in
      their consumers not "re-attaching" to the newly elected leader.
    + Large fanouts experienced a performance regression when streams were not
      enabled using a feature flag.
    + Stream management plugin did not support mixed version clusters.
    + Stream deletion did not result in a `basic.cancel` being sent to AMQP
      0-9-1 consumers.
    + Stream clients did not receive a correct stream unavailability error in
      some cases.
    + It is again possible to clear user tags and update the password in a
      single operation.
    + Forward compatibility with Erlang 25.
    + File handle cache efficiency improvements.
    + Unknown stream properties (e.g. those requested by a node that runs a
      newer version) are now handled gracefully.
    + Temporary hostname resolution issues (attempts that fail with
      `nxdomain`) are now handled more gracefully and with a delay of
      several seconds.
    + Build time compatibility with Elixir 1.13.
    + `auth_oauth2.additional_scopes_key` in `rabbitmq.conf` was not converted
       correctly during configuration translation and thus had no effect.
    + Adapt to a breaking Erlang 24.3 LDAP client change.
    + Shovels now can be declared with `delete-after` parameter set to `0`.
      Such shovels will immediately stop instead of erroring and failing to
      start after a node restart.
    + Support for Consul 1.1 response code changes
      when an operation is attempted on a non-existent health check.
  - Bug Fixes:
    + Classic queues with Single Active Consumer enabled could run into an
      exception.
    + When a global parameter was cleared,
      nodes emitted an internal event of the wrong type.
    + Fixed a type analyzer definition.
    + LDAP server password could end up in the logs in certain types of
      exceptions.
    + `rabbitmq-diagnostics status` now handles server responses where free
      disk space is not yet computed. This is the case with nodes early in the
      boot process.
    + Management UI links now include "noopener" and "noreferrer" attributes
      to protect them against reverse tabnabbing. Note that since management
      UI only includes a small number of external links to trusted resources,
      reverse tabnabbing is unlikely to affect most users. However, it can
      show up in security scanner results and become an issue in environments
      where a modified version of RabbitMQ is offered as a service.
    + Plugin could stop in environments where no static Shovels were defined
      and a specific sequence of events happens at the same time.
    + When installation directory was overridden, the plugins directory did
      not respect the updated base installation path.
    + Intra-cluster communication link metric collector could run into an
      exception when peer connection has just been re-established, e.g. after
      a peer node restart.
    + When a node was put into maintenance mode, it closed all MQTT client
      connections cluster-wide instead of just local client connections.
    + Reduced log noise from exceptions connections could run into when a
      client was closing its connection end concurrently with other activity.
    + `rabbitmq-env-conf.bat` on Windows could fail to load when its path
      contained spaces.
    + Stream declaration could run into an exception when stream parameters
      failed validation.
    + Some counters on the Overview page have been moved to global counters
      introduced in RabbitMQ 3.9.
    + Avoid an exception when MQTT client closes TCP connection before server
      could fully process a `CONNECT` frame sent earlier by the same client.
    + Channels on connections to mixed clusters that had 3.8 nodes in them
      could run into an exception.
    + Inter-node cluster link statistics did not have any data when TLS was
      enabled for them.
    + Quorum queues now correctly propagate errors when a `basic.get` (polling
      consumption) operation hits a timeout.
    + Stream consumer that used AMQP 0-9-1 instead of a stream protocol
      client, and disconnected, leaked a file handle.
    + Max frame size and client heartbeat parameters for RabbitMQ stream
      clients were not correctly set when taken from `rabbitmq.conf`.
    + Removed a duplicate exchange decorator set operation.
    + Node restarts could result in a hashing ring inconsistency.
    + Avoid seeding default user in old clusters that still use the deprecated
      `management.load_definitions` option.
    + Streams could run into an exception or fetch stale stream position data
      in some scenarios.
    + `rabbitmqctl set_log_level` did not have any effect on logging via
      `amq.rabbitmq.log`.
    + `rabbitmq-diagnostics status` is now more resilient and won't fail if
      free disk space monitoring repeatedly fails (gets disabled) on the node.
    + CLI tools failed to run on Erlang 25 because an old version of Elixir
      (compiled on Erlang 21) was used in the release pipeline. Erlang 25 no
      longer loads modules compiled on Erlang 21 or older.
    + Default log level used a four-character severity abbreviation instead of
      more common longer format, for example, `warn` instead of `warning`.
    + `rabbitmqctl set_log_level` documentation clarification.
    + Nodes now make sure that maintenance mode status table exists after node
      boot as long as the feature flag is enabled.
    + "In flight" messages directed to an exchange that has just been deleted
      will be silently dropped or returned back to the publisher instead of
      causing an exception.
    + `rabbitmq-upgrade await_online_synchronized_mirror` is now a no-op in
      single node clusters.
    + One metric that was exposed via CLI tools and management plugin's HTTP
      API was not exposed via Prometheus scraping API.
    + Stream delivery rate could drop if concurrent stream consumers consumed
      in a way that made them reach the end of the stream often.
    + If a cluster that had streams enabled was upgraded with a jump of
      multiple patch releases, stream state could fail an upgrade.
    + When a policy contained keys unsupported by a particular queue
      type, and later updated or superseded by a higher priority policy,
      effective optional argument list could become inconsistent (policy
      would not have the expected effect).
    + Priority queues could run into an exception in some cases.
    + Maintenance mode could run into a timeout during queue leadership
      transfer.
    + Prometheus collector could run into an exception early on node's
      schema database sync.
    + Connection data transfer rate units were incorrectly displayed when
      rate was less than 1 kiB per second.
    + `rabbitmqadmin` now correctly loads TLS-related keys from its
      configuration file.
    + Corrected a help message for node memory usage tool tip.
* Added new dep8 tests:
  - d/t/hello-world
  - d/t/publish-subscribe
  - d/t/rpc
  - d/t/work-queue
* Remove patches fixed upstream:
  - d/p/lp1999816-fix-rabbitmqctl-status-disk-free-timeout.patch
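
As an illustration of the limit-precedence change listed above: with a configuration like the following sketch (values are made up), the absolute settings now win.

```
# rabbitmq.conf sketch -- when both relative and absolute forms are set,
# the absolute values now take precedence
disk_free_limit.relative = 1.0
disk_free_limit.absolute = 2GB
vm_memory_high_watermark.relative = 0.4
vm_memory_high_watermark.absolute = 1GB
```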

------------------------------------------------------------------------------

Focal Changes:
* New upstream version 3.8.3 (LP: #2060248).
 - Updates:
   + Some Proxy protocol errors are now logged at debug level.
     This reduces log noise in environments where TCP load balancers and
     proxies perform health checks by opening a TCP connection but never
     sending any data.
   + Quorum queue deletion operation no longer supports the "if unused" and
     "if empty" options. They are typically used for transient queues and
     don't make much sense for quorum ones.
   + Do not treat applications that do not depend on rabbit as plugins.
     This is especially important for applications that should not be stopped
     before rabbit is stopped.
   + RabbitMQ nodes will now gracefully shutdown when receiving a `SIGTERM`
     signal. Previously the runtime would invoke a default handler that
     terminates the VM giving RabbitMQ no chance to execute its shutdown
     steps.
   + Every cluster now features a persistent internal cluster ID that can be
     used by core features or plugins. Unlike the human-readable cluster name,
     the value cannot be overridden by the user.
   + Speedup execution of boot steps by a factor of 2N, where N is the number
     of attributes per step.
   + New health checks that can be used to determine if it's a good moment to
     shut down a node for an upgrade.

     ``` sh
     # Exits with a non-zero code if target node hosts leader replica of at
     # least one queue that has out-of-sync mirror.
     rabbitmq-diagnostics check_if_node_is_mirror_sync_critical

     # Exits with a non-zero code if one or more quorum queues will lose
     # online quorum should target node be shut down
     rabbitmq-diagnostics check_if_node_is_quorum_critical
     ```
   + Management and Management Agent Plugins:
     * An undocumented "automagic login" feature on the login form was
       removed.
     * A new `POST /login` endpoint can be used by custom management UI login
       forms to authenticate the user and set the cookie.
     * A new `POST /rebalance/queues` endpoint that is the HTTP API equivalent
       of `rabbitmq-queues rebalance`
     * Warning about a missing `handle.exe` in `PATH` on Windows is now only
       logged every 10 minutes.
     * `rabbitmqadmin declare queue` now supports a new `queue_type` parameter
       to simplify declaration of quorum queues.
     * HTTP API request log entries now include the acting user.
     * Content Security Policy headers are now also set for static assets such
       as JavaScript files.
   + Prometheus Plugin:
     * Add option to aggregate metrics for channels, queues & connections.
       Metrics are now aggregated by default (safe by default).
   + Kubernetes Peer Discovery Plugin:
     * The plugin will now notify Kubernetes API of node startup and peer
       stop/unavailability events. This new behaviour can be disabled in
       the plugin's configuration.
   + Federation Plugin:
     * "Command" operations such as binding propagation now use a separate
       channel for all links, preventing latency spikes for asynchronous
       operations (such as message publishing) (a head-of-line blocking
       problem).
   + Auth Backend OAuth 2 Plugin:
     * Additional scopes can be fetched from a predefined JWT token field.
       Those scopes will be combined with the standard scopes field.
   + Trust Store Plugin:
     * HTTPS certificate provider will no longer terminate if upstream
       service response contains invalid JSON.
   + MQTT Plugin:
     * Avoid blocking when registering or unregistering a client ID.
   + AMQP 1.0 Client Plugin:
     * Handle heartbeat in `close_sent/2`.
 - Bug Fixes:
   + Reduced scheduled GC activity in connection socket writer to one run per
     1 GiB of data transferred, with an option to change the value or disable
     scheduled run entirely.
   + Eliminated an inefficiency in recovery of quorum queues with a backlog of
     messages.
   + In a case where a node hosting a quorum queue replica went offline and
     was removed from the cluster, and later came back, quorum queues could
     enter a loop of Raft leader elections.
   + Quorum queues with dead lettering could fail to recover.
   + The node now can recover even if virtual host recovery terms file was
     corrupted.
   + Autoheal could fail to finish if one of its state transitions initiated
     by a remote node timed out.
   + Syslog client is now started even when Syslog logging is configured only
     for some log sinks.
   + Policies that quorum queues ignored were still listed as applied to them.
   + If a quorum queue leader rebalancing operation timed out, CLI tools
     failed with an exception instead of a sensible internal API response.
   + Handle timeout error on the rebalance function.
   + Handle and raise protocol error for absent queues assumed to be alive.
   + `rabbitmq-diagnostics status` failed to display the results when executed
     against a node that had high VM watermark set as an absolute value
     (using `vm_memory_high_watermark.absolute`).
   + Management and Management Agent Plugins:
     * Consumer section on individual page was unintentionally hidden.
   + Management and Management Agent Plugins:
     * Fix queue-type select by adding unsafe-inline CSP policy.
   + Etcd Peer Discovery Plugin:
     * Only run healthcheck when backend is configured.
   + Federation Plugin:
     * Use vhost to delete federated exchange.
* Added new dep8 tests:
  - d/t/smoke-test
  - d/t/hello-world
  - d/t/publish-subscribe
  - d/t/rpc
  - d/t/work-queue
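
For reference, the Prometheus metric aggregation change above is driven by a single `rabbitmq.conf` knob; a sketch (the value shown is the aggregated default, and setting it to `true` restores per-object metrics):

```
# rabbitmq.conf sketch -- metrics are aggregated by default in 3.8.3
prometheus.return_per_object_metrics = false
```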

description: updated
Changed in rabbitmq-server (Ubuntu):
assignee: nobody → Mitchell Dzurick (mitchdz)
Changed in rabbitmq-server (Ubuntu Focal):
status: New → In Progress
Changed in rabbitmq-server (Ubuntu Jammy):
status: New → In Progress
no longer affects: rabbitmq-server (Ubuntu Mantic)
no longer affects: rabbitmq-server (Ubuntu Noble)
Revision history for this message
Paride Legovini (paride) wrote:

Focal MRE uploaded:

Uploading rabbitmq-server_3.8.3-0ubuntu0.1.dsc
Uploading rabbitmq-server_3.8.3.orig.tar.xz
Uploading rabbitmq-server_3.8.3-0ubuntu0.1.debian.tar.xz
Uploading rabbitmq-server_3.8.3-0ubuntu0.1_source.buildinfo
Uploading rabbitmq-server_3.8.3-0ubuntu0.1_source.changes

Revision history for this message
Mitchell Dzurick (mitchdz) wrote:

Thanks Paride!

Just a heads up - let's please block both jammy/focal in the -proposed pocket until our openstack team can run the CI/CD with these versions.
