diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/changelog xen-4.11.3+24-g14b62ab3e5/debian/changelog --- xen-4.11.3+24-g14b62ab3e5/debian/changelog 2020-03-09 15:17:56.000000000 +0000 +++ xen-4.11.3+24-g14b62ab3e5/debian/changelog 2022-07-13 14:07:29.000000000 +0100 @@ -1,3 +1,100 @@ +xen (4.11.3+24-g14b62ab3e5-1ubuntu2.3) focal-security; urgency=medium + + * SECURITY UPDATE: CVE-2020-0543, CVE-2020-11739, CVE-2020-11740, + CVE-2020-11741, CVE-2020-11742, CVE-2020-11743, CVE-2020-15563, + CVE-2020-15565, CVE-2020-15566, CVE-2020-25595, CVE-2020-25596, + CVE-2020-25597, CVE-2020-25599, CVE-2020-25600, CVE-2020-25601, + CVE-2020-25602, CVE-2020-25603, CVE-2020-25604, CVE-2020-27670, + CVE-2020-27671, CVE-2020-27672, CVE-2020-27674, CVE-2020-28368, + CVE-2020-29040, CVE-2020-29479, CVE-2020-29480, CVE-2020-29481, + CVE-2020-29482, CVE-2020-29483, CVE-2020-29484, CVE-2020-29485, + CVE-2020-29486, CVE-2020-29566, CVE-2020-29570, CVE-2020-29571, + CVE-2021-0089, CVE-2021-26313, CVE-2021-26933, CVE-2021-27379, + CVE-2021-28689, CVE-2021-28690, CVE-2021-28692, CVE-2021-28694, + CVE-2021-28695, CVE-2021-28696, CVE-2021-28697, CVE-2021-28698, + CVE-2021-28699, CVE-2021-28701, CVE-2021-28704, CVE-2021-28705, + CVE-2021-28706, CVE-2021-28707, CVE-2021-28708, CVE-2021-28709, + CVE-2022-23034, CVE-2022-23035, CVE-2022-26356, CVE-2022-26357, + CVE-2022-26358, CVE-2022-26359, CVE-2022-26360, CVE-2022-26361, + CVE-2022-26362, CVE-2022-26363 and CVE-2022-26364 (LP: #1970507). + - Also fixes CVE-2018-3639 on Arm systems. + - debian/patches/*.patch: New patches from upstream security advisories. + Some were backported to this version. + - debian/patches/evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch, + debian/patches/xen-evtchn-rework-per-event-channel-lock.patch: + New patches from stable-4.11 branch in upstream Git needed to apply + xen-events-access-last_priority-and-last_vcpu_id-together.patch. + - debian/patches/xen-events-access-last_priority-and-last_vcpu_id-together.patch: + New patch from stable-4.11 branch in upstream Git needed to apply + fix_event_channel_race.patch. + - debian/patches/x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch: + New backported patch from master branch in upstream Git needed to apply + 0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch. + - debian/patches/xen-split-parameter-related-definitions-in-own-header-file.patch: + New backported patch from master branch in upstream Git needed to apply + x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch. + - debian/patches/fix_event_channel_race.patch: New patch from stable-4.11 + branch in upstream Git needed to apply xsa358-4.14.patch. + - debian/patches/AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch: + New patch from stable-4.11 branch in upstream Git needed to apply + xsa378-4.11-6.patch. + - debian/patches/xen-arm-Simplify-alternative-patching-of-non-writable-region.patch: + New patch from stable-4.12 branch in upstream Git needed to apply + xen-arm-alternatives-Add-dynamic-patching-feature.patch. + - debian/patches/xen-arm64-entry-Use-named-label-in-guest_sync.patch, + debian/patches/xen-arm-alternatives-Add-dynamic-patching-feature.patch: + New backported patches from stable-4.12 branch in upstream Git needed to + apply xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch. 
+ - debian/patches/xen-arm64-Add-generic-assembly-macros.patch: New backported + patch from stable-4.12 branch in upstream Git needed to apply + xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch. + - debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-probing.patch, + debian/patches/xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch, + debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch, + debian/patches/xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch: + New backported patches from stable-4.12 branch in upstream Git that fix + CVE-2018-3639 for Arm systems, and some of these are needed to apply + xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch. + - debian/patches/VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch: + New backported patch from stable-4.12 branch in upstream Git needed to + apply xsa400-4.12-04.patch. + - debian/patches/amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch: + New backported patch from stable-4.12 branch in upstream Git needed to + apply xsa400-4.12-10.patch. + - debian/patches/x86-feature-Generalise-synth-and-introduce-a-bug-word.patch, + debian/patches/x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch: + New backported patches from stable-4.13 branch in upstream Git needed to + apply xsa402-4.13-4.patch. + - debian/patches/x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch: + New backported patch from stable-4.13 branch in upstream Git needed to + apply xsa402-4.13-5.patch. + * debian/not-installed: Do not install systemd-specific files. + * debian/source/lintian-overrides: Override debhelper-but-no-misc-depends + warnings for transitional packages. + * debian/control: Add ${misc:Depends} to dependencies of xen-doc. + * debian/control: Remove build dependency on autotools-dev. 
+ + -- Luís Infante da Câmara Wed, 13 Jul 2022 14:07:29 +0100 + +xen (4.11.3+24-g14b62ab3e5-1ubuntu2.2) focal; urgency=medium + + * Fix FTBFS on armhf/arm64 due to missing : + - d/p/lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch + + -- Mauricio Faria de Oliveira Thu, 07 Jul 2022 13:53:37 -0300 + +xen (4.11.3+24-g14b62ab3e5-1ubuntu2.1) focal; urgency=medium + + * Add support for zstd compressed kernels for Dom0/DomU on x86 (LP: #1956166) + - d/p/lp1956166-0001-introduce-unaligned.h.patch + - d/p/lp1956166-0002-lib-introduce-xxhash.patch + - d/p/lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch + - d/p/lp1956166-0004-libxenguest-add-get_unaligned_le32.patch + - d/p/lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch + - d/control: add libzstd-dev as build-dep + + -- Mauricio Faria de Oliveira Mon, 04 Jul 2022 16:02:20 -0300 + xen (4.11.3+24-g14b62ab3e5-1ubuntu2) focal; urgency=medium * Update: Building hypervisor with cf-protection enabled diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/control xen-4.11.3+24-g14b62ab3e5/debian/control --- xen-4.11.3+24-g14b62ab3e5/debian/control 2020-02-28 13:14:00.000000000 +0000 +++ xen-4.11.3+24-g14b62ab3e5/debian/control 2022-07-13 14:06:12.000000000 +0100 @@ -4,11 +4,10 @@ XSBC-Original-Maintainer: Debian Xen Team Uploaders: Guido Trotter , Bastian Blank , Ian Jackson Section: admin -Standards-Version: 3.9.4 +Standards-Version: 4.5.0 Build-Depends: debhelper (>= 10), dh-exec, - autotools-dev, dpkg-dev (>= 1.16.0~), rdfind, lsb-release, @@ -34,6 +33,7 @@ ocaml-native-compilers | ocaml-nox, ocaml-findlib, lmodern, + libzstd-dev, XS-Python-Version: current Homepage: https://xenproject.org/ Vcs-Browser: https://salsa.debian.org/xen-team/debian-xen diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/libxen-4.11.bug-control xen-4.11.3+24-g14b62ab3e5/debian/libxen-4.11.bug-control --- xen-4.11.3+24-g14b62ab3e5/debian/libxen-4.11.bug-control 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/libxen-4.11.bug-control 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,2 @@ +# autogenerated, do not edit +Submit-As: src:xen diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.install xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.install --- xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.install 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.install 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,9 @@ +# autogenerated, do not edit +usr/lib/*/libxenctrl.so.* +usr/lib/*/libxenguest.so.* +usr/lib/*/libxenlight.so.* +usr/lib/*/libxenstat.so.* +usr/lib/*/libxenvchan.so.* +usr/lib/*/libxlutil.so.* +usr/lib/xen-4.11/lib/*/libfsimage* +usr/lib/xen-4.11/lib/*/fs diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.lintian-overrides xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.lintian-overrides --- xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.lintian-overrides 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/libxenmisc4.11.lintian-overrides 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,8 @@ +# autogenerated, do not edit +no-symbols-control-file usr/lib/*/lib*.so.4.11.0 +# ^ the ABI changes every Xen release and every Debian release anyway +# and we do not upload to Debian packages based on Xen upstream +# versions which are at least an rc with a stable ABI. + +package-name-doesnt-match-sonames +# ^ yes, this is a portmanteau package. They all change at once. 
diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/not-installed xen-4.11.3+24-g14b62ab3e5/debian/not-installed --- xen-4.11.3+24-g14b62ab3e5/debian/not-installed 2020-02-28 13:14:00.000000000 +0000 +++ xen-4.11.3+24-g14b62ab3e5/debian/not-installed 2022-06-17 08:58:55.000000000 +0100 @@ -34,3 +34,7 @@ # If someone wants this, suggestions from ocaml experts on what # to ship where would be welcome. usr/local/lib/ocaml + +# systemd-specific files are not installed in this version +usr/lib/modules-load.d +usr/lib/systemd diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-SUPPORT.md-Document-speculative-attacks-status-of-no.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-SUPPORT.md-Document-speculative-attacks-status-of-no.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-SUPPORT.md-Document-speculative-attacks-status-of-no.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-SUPPORT.md-Document-speculative-attacks-status-of-no.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,55 @@ +From 4e37e21f6e71752fb69c27ab9f1417a5d19ebedb Mon Sep 17 00:00:00 2001 +From: Ian Jackson +Date: Tue, 9 Mar 2021 15:00:47 +0000 +Subject: [PATCH 1/2] SUPPORT.md: Document speculative attacks status of + non-shim 32-bit PV + +This documents, but does not fix, XSA-370. + +Reported-by: Jann Horn +Signed-off-by: Ian Jackson +Signed-off-by: George Dunlap +Acked-by: Jan Beulich +--- + +NB that the security team does not consider the security support +status of un-shimmed 32-bit PV guests in this patch to be particularly +useful. However, we do not consider ourselves to have the authority to decide +to completely de-support 32-bit PV guests without community consultation. + +The support status in this patch should therefore be considered +transitional. A permanent support status is proposed in a subsequent +patch in this series. + +v2: +- Fix double 'be' +- Don't mention user -> kernel attacks, which have nothing to do with Xen +--- + SUPPORT.md | 11 ++++++++++- + 1 file changed, 10 insertions(+), 1 deletion(-) + +diff --git a/SUPPORT.md b/SUPPORT.md +index 7db4568f1a..6dcd93e22f 100644 +--- a/SUPPORT.md ++++ b/SUPPORT.md +@@ -84,7 +84,16 @@ Traditional Xen PV guest + + No hardware requirements + +- Status: Supported ++ Status, x86_64: Supported ++ Status, x86_32, shim: Supported ++ Status, x86_32, without shim: Supported, with caveats ++ ++Due to architectural limitations, ++32-bit PV guests must be assumed to be able to read arbitrary host memory ++using speculative execution attacks. ++Advisories will continue to be issued ++for new vulnerabilities related to un-shimmed 32-bit PV guests ++enabling denial-of-service attacks or privilege escalation attacks. 
+ + ### x86/HVM + +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-ocaml-xenstored-ignore-transaction-id-for-un-w.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-ocaml-xenstored-ignore-transaction-id-for-un-w.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-ocaml-xenstored-ignore-transaction-id-for-un-w.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-ocaml-xenstored-ignore-transaction-id-for-un-w.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,43 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: ignore transaction id for [un]watch +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Instead of ignoring the transaction id for XS_WATCH and XS_UNWATCH +commands as it is documented in docs/misc/xenstore.txt, it is tested +for validity today. + +Really ignore the transaction id for XS_WATCH and XS_UNWATCH. + +This is part of XSA-115. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index 74c69f869c..0a0e43d1f0 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -492,12 +492,19 @@ let retain_op_in_history ty = + | Xenbus.Xb.Op.Reset_watches + | Xenbus.Xb.Op.Invalid -> false + ++let maybe_ignore_transaction = function ++ | Xenbus.Xb.Op.Watch | Xenbus.Xb.Op.Unwatch -> fun tid -> ++ if tid <> Transaction.none then ++ debug "Ignoring transaction ID %d for watch/unwatch" tid; ++ Transaction.none ++ | _ -> fun x -> x ++ + (** + * Nothrow guarantee. + *) + let process_packet ~store ~cons ~doms ~con ~req = + let ty = req.Packet.ty in +- let tid = req.Packet.tid in ++ let tid = maybe_ignore_transaction ty req.Packet.tid in + let rid = req.Packet.rid in + try + let fct = function_of_type ty in diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-xenstore-allow-removing-child-of-a-node-exceed.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-xenstore-allow-removing-child-of-a-node-exceed.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-xenstore-allow-removing-child-of-a-node-exceed.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-tools-xenstore-allow-removing-child-of-a-node-exceed.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,157 @@ +From e92f3dfeaae21a335e666c9247954424e34e5c56 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:37 +0200 +Subject: [PATCH 01/10] tools/xenstore: allow removing child of a node + exceeding quota + +An unprivileged user of Xenstore is not allowed to write nodes with a +size exceeding a global quota, while privileged users like dom0 are +allowed to write such nodes. The size of a node is the needed space +to store all node specific data, this includes the names of all +children of the node. + +When deleting a node its parent has to be modified by removing the +name of the to be deleted child from it. + +This results in the strange situation that an unprivileged owner of a +node might not succeed in deleting that node in case its parent is +exceeding the quota of that unprivileged user (it might have been +written by dom0), as the user is not allowed to write the updated +parent node. + +Fix that by not checking the quota when writing a node for the +purpose of removing a child's name only. 
+ +The same applies to transaction handling: a node being read during a +transaction is written to the transaction specific area and it should +not be tested for exceeding the quota, as it might not be owned by +the reader and presumably the original write would have failed if the +node is owned by the reader. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 20 +++++++++++--------- + tools/xenstore/xenstored_core.h | 3 ++- + tools/xenstore/xenstored_transaction.c | 2 +- + 3 files changed, 14 insertions(+), 11 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 97ceabf9642d..b43e1018babd 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -417,7 +417,8 @@ static struct node *read_node(struct connection *conn, const void *ctx, + return node; + } + +-int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node) ++int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, ++ bool no_quota_check) + { + TDB_DATA data; + void *p; +@@ -427,7 +428,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node) + + node->num_perms*sizeof(node->perms[0]) + + node->datalen + node->childlen; + +- if (domain_is_unprivileged(conn) && ++ if (!no_quota_check && domain_is_unprivileged(conn) && + data.dsize >= quota_max_entry_size) { + errno = ENOSPC; + return errno; +@@ -455,14 +456,15 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node) + return 0; + } + +-static int write_node(struct connection *conn, struct node *node) ++static int write_node(struct connection *conn, struct node *node, ++ bool no_quota_check) + { + TDB_DATA key; + + if (access_node(conn, node, NODE_ACCESS_WRITE, &key)) + return errno; + +- return write_node_raw(conn, &key, node); ++ return write_node_raw(conn, &key, node, no_quota_check); + } + + static enum xs_perm_type perm_for_conn(struct connection *conn, +@@ -999,7 +1001,7 @@ static struct node *create_node(struct connection *conn, const void *ctx, + /* We write out the nodes down, setting destructor in case + * something goes wrong. 
*/ + for (i = node; i; i = i->parent) { +- if (write_node(conn, i)) { ++ if (write_node(conn, i, false)) { + domain_entry_dec(conn, i); + return NULL; + } +@@ -1039,7 +1041,7 @@ static int do_write(struct connection *conn, struct buffered_data *in) + } else { + node->data = in->buffer + offset; + node->datalen = datalen; +- if (write_node(conn, node)) ++ if (write_node(conn, node, false)) + return errno; + } + +@@ -1115,7 +1117,7 @@ static int remove_child_entry(struct connection *conn, struct node *node, + size_t childlen = strlen(node->children + offset); + memdel(node->children, offset, childlen + 1, node->childlen); + node->childlen -= childlen + 1; +- return write_node(conn, node); ++ return write_node(conn, node, true); + } + + +@@ -1254,7 +1256,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + node->num_perms = num; + domain_entry_inc(conn, node); + +- if (write_node(conn, node)) ++ if (write_node(conn, node, false)) + return errno; + + fire_watches(conn, in, name, false); +@@ -1514,7 +1516,7 @@ static void manual_node(const char *name, const char *child) + if (child) + node->childlen = strlen(child) + 1; + +- if (write_node(NULL, node)) ++ if (write_node(NULL, node, false)) + barf_perror("Could not create initial node %s", name); + talloc_free(node); + } +diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h +index 56a279cfbb47..3cb1c235a101 100644 +--- a/tools/xenstore/xenstored_core.h ++++ b/tools/xenstore/xenstored_core.h +@@ -149,7 +149,8 @@ void send_ack(struct connection *conn, enum xsd_sockmsg_type type); + char *canonicalize(struct connection *conn, const void *ctx, const char *node); + + /* Write a node to the tdb data base. */ +-int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node); ++int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, ++ bool no_quota_check); + + /* Get this node, checking we have permissions. */ + struct node *get_node(struct connection *conn, +diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c +index 2824f7b359b8..e87897573469 100644 +--- a/tools/xenstore/xenstored_transaction.c ++++ b/tools/xenstore/xenstored_transaction.c +@@ -276,7 +276,7 @@ int access_node(struct connection *conn, struct node *node, + i->check_gen = true; + if (node->generation != NO_GENERATION) { + set_tdb_key(trans_name, &local_key); +- ret = write_node_raw(conn, &local_key, node); ++ ret = write_node_raw(conn, &local_key, node, true); + if (ret) + goto err; + i->ta_node = true; +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-x86-mm-Refactor-map_pages_to_xen-to-have-only-a-sing.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-x86-mm-Refactor-map_pages_to_xen-to-have-only-a-sing.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-x86-mm-Refactor-map_pages_to_xen-to-have-only-a-sing.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0001-x86-mm-Refactor-map_pages_to_xen-to-have-only-a-sing.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,94 @@ +From edbe70427e17743351f1b739ea1536acd757ae6c Mon Sep 17 00:00:00 2001 +From: Wei Liu +Date: Sat, 11 Jan 2020 21:57:41 +0000 +Subject: [PATCH 1/3] x86/mm: Refactor map_pages_to_xen to have only a single + exit path + +We will soon need to perform clean-ups before returning. + +No functional change. + +This is part of XSA-345. 
+ +Reported-by: Hongyan Xia +Signed-off-by: Wei Liu +Signed-off-by: Hongyan Xia +Signed-off-by: George Dunlap +Acked-by: Jan Beulich +--- + xen/arch/x86/mm.c | 17 +++++++++++------ + 1 file changed, 11 insertions(+), 6 deletions(-) + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 626768a950..79a3fac3cc 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -5194,6 +5194,7 @@ int map_pages_to_xen( + l2_pgentry_t *pl2e, ol2e; + l1_pgentry_t *pl1e, ol1e; + unsigned int i; ++ int rc = -ENOMEM; + + #define flush_flags(oldf) do { \ + unsigned int o_ = (oldf); \ +@@ -5214,7 +5215,8 @@ int map_pages_to_xen( + l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt); + + if ( !pl3e ) +- return -ENOMEM; ++ goto out; ++ + ol3e = *pl3e; + + if ( cpu_has_page1gb && +@@ -5302,7 +5304,7 @@ int map_pages_to_xen( + + pl2e = alloc_xen_pagetable(); + if ( pl2e == NULL ) +- return -ENOMEM; ++ goto out; + + for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) + l2e_write(pl2e + i, +@@ -5331,7 +5333,7 @@ int map_pages_to_xen( + + pl2e = virt_to_xen_l2e(virt); + if ( !pl2e ) +- return -ENOMEM; ++ goto out; + + if ( ((((virt >> PAGE_SHIFT) | mfn_x(mfn)) & + ((1u << PAGETABLE_ORDER) - 1)) == 0) && +@@ -5374,7 +5376,7 @@ int map_pages_to_xen( + { + pl1e = virt_to_xen_l1e(virt); + if ( pl1e == NULL ) +- return -ENOMEM; ++ goto out; + } + else if ( l2e_get_flags(*pl2e) & _PAGE_PSE ) + { +@@ -5401,7 +5403,7 @@ int map_pages_to_xen( + + pl1e = alloc_xen_pagetable(); + if ( pl1e == NULL ) +- return -ENOMEM; ++ goto out; + + for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) + l1e_write(&pl1e[i], +@@ -5545,7 +5547,10 @@ int map_pages_to_xen( + + #undef flush_flags + +- return 0; ++ rc = 0; ++ ++ out: ++ return rc; + } + + int populate_pt_range(unsigned long virt, unsigned long nr_mfns) +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,72 @@ +From: George Dunlap +Subject: SUPPORT.md: Un-shimmed 32-bit PV guests are no longer supported + +The support status of 32-bit guests doesn't seem particularly useful. + +With it changed to fully unsupported outside of PV-shim, adjust the PV32 +Kconfig default accordingly. + +Reported-by: Jann Horn +Signed-off-by: George Dunlap +Signed-off-by: Jan Beulich +--- + +NB this patch should be considered a proposal to the community. It +will not become effective until three weeks after the XSA-370 embargo +lifts, and only if there are no objections raised before that point. + +TBD: Should we also default opt_pv32 to false when not running in shim + mode? + +The (forward) dependency on PV_SHIM isn't very useful especially when +configuring from scratch - we may want to re-order items down the road, +such that the prompt for PV_SHIM occurs ahead of that for PV32. Yet then +this conflicts with PV_SHIM also depending on GUEST. + +v3: +- Add Kconfig adjustment. 
+ +v2: +- Port over changes in patch 1 + +--- a/SUPPORT.md ++++ b/SUPPORT.md +@@ -86,14 +86,7 @@ No hardware requirements + + Status, x86_64: Supported + Status, x86_32, shim: Supported +- Status, x86_32, without shim: Supported, with caveats +- +-Due to architectural limitations, +-32-bit PV guests must be assumed to be able to read arbitrary host memory +-using speculative execution attacks. +-Advisories will continue to be issued +-for new vulnerabilities related to un-shimmed 32-bit PV guests +-enabling denial-of-service attacks or privilege escalation attacks. ++ Status, x86_32, without shim: Supported, not security supported + + ### x86/HVM + +--- a/xen/arch/x86/Kconfig ++++ b/xen/arch/x86/Kconfig +@@ -56,7 +56,7 @@ config PV + config PV32 + bool "Support for 32bit PV guests" + depends on PV +- default y ++ default PV_SHIM + ---help--- + The 32bit PV ABI uses Ring1, an area of the x86 architecture which + was deprecated and mostly removed in the AMD64 spec. As a result, +@@ -67,7 +67,10 @@ config PV32 + reduction, or performance reasons. Backwards compatibility can be + provided via the PV Shim mechanism. + +- If unsure, say Y. ++ Note that outside of PV Shim, 32-bit PV guests are not security ++ supported anymore. ++ ++ If unsure, use the default setting. + + config PV_LINEAR_PT + bool "Support for PV linear pagetables" diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-ocaml-xenstored-check-privilege-for-XS_IS_DOMA.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-ocaml-xenstored-check-privilege-for-XS_IS_DOMA.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-ocaml-xenstored-check-privilege-for-XS_IS_DOMA.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-ocaml-xenstored-check-privilege-for-XS_IS_DOMA.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,30 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: check privilege for XS_IS_DOMAIN_INTRODUCED +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +The Xenstore command XS_IS_DOMAIN_INTRODUCED should be possible for privileged +domains only (the only user in the tree is the xenpaging daemon). + +This is part of XSA-115. 
+ +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index 0a0e43d1f0..f374abe998 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -166,7 +166,9 @@ let do_setperms con t domains cons data = + let do_error con t domains cons data = + raise Define.Unknown_operation + +-let do_isintroduced con t domains cons data = ++let do_isintroduced con _t domains _cons data = ++ if not (Connection.is_dom0 con) ++ then raise Define.Permission_denied; + let domid = + match (split None '\000' data) with + | domid :: _ -> int_of_string domid diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-xenstore-ignore-transaction-id-for-un-watch.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-xenstore-ignore-transaction-id-for-un-watch.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-xenstore-ignore-transaction-id-for-un-watch.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-tools-xenstore-ignore-transaction-id-for-un-watch.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,86 @@ +From e8076f73de65c4816f69d6ebf75839c706145fcd Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:38 +0200 +Subject: [PATCH 02/10] tools/xenstore: ignore transaction id for [un]watch + +Instead of ignoring the transaction id for XS_WATCH and XS_UNWATCH +commands as it is documented in docs/misc/xenstore.txt, it is tested +for validity today. + +Really ignore the transaction id for XS_WATCH and XS_UNWATCH. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 26 ++++++++++++++++---------- + 1 file changed, 16 insertions(+), 10 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index b43e1018babd..bb2f9fd4e76e 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -1268,13 +1268,17 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + static struct { + const char *str; + int (*func)(struct connection *conn, struct buffered_data *in); ++ unsigned int flags; ++#define XS_FLAG_NOTID (1U << 0) /* Ignore transaction id. 
*/ + } const wire_funcs[XS_TYPE_COUNT] = { + [XS_CONTROL] = { "CONTROL", do_control }, + [XS_DIRECTORY] = { "DIRECTORY", send_directory }, + [XS_READ] = { "READ", do_read }, + [XS_GET_PERMS] = { "GET_PERMS", do_get_perms }, +- [XS_WATCH] = { "WATCH", do_watch }, +- [XS_UNWATCH] = { "UNWATCH", do_unwatch }, ++ [XS_WATCH] = ++ { "WATCH", do_watch, XS_FLAG_NOTID }, ++ [XS_UNWATCH] = ++ { "UNWATCH", do_unwatch, XS_FLAG_NOTID }, + [XS_TRANSACTION_START] = { "TRANSACTION_START", do_transaction_start }, + [XS_TRANSACTION_END] = { "TRANSACTION_END", do_transaction_end }, + [XS_INTRODUCE] = { "INTRODUCE", do_introduce }, +@@ -1296,7 +1300,7 @@ static struct { + + static const char *sockmsg_string(enum xsd_sockmsg_type type) + { +- if ((unsigned)type < XS_TYPE_COUNT && wire_funcs[type].str) ++ if ((unsigned int)type < ARRAY_SIZE(wire_funcs) && wire_funcs[type].str) + return wire_funcs[type].str; + + return "**UNKNOWN**"; +@@ -1311,7 +1315,14 @@ static void process_message(struct connection *conn, struct buffered_data *in) + enum xsd_sockmsg_type type = in->hdr.msg.type; + int ret; + +- trans = transaction_lookup(conn, in->hdr.msg.tx_id); ++ if ((unsigned int)type >= XS_TYPE_COUNT || !wire_funcs[type].func) { ++ eprintf("Client unknown operation %i", type); ++ send_error(conn, ENOSYS); ++ return; ++ } ++ ++ trans = (wire_funcs[type].flags & XS_FLAG_NOTID) ++ ? NULL : transaction_lookup(conn, in->hdr.msg.tx_id); + if (IS_ERR(trans)) { + send_error(conn, -PTR_ERR(trans)); + return; +@@ -1320,12 +1331,7 @@ static void process_message(struct connection *conn, struct buffered_data *in) + assert(conn->transaction == NULL); + conn->transaction = trans; + +- if ((unsigned)type < XS_TYPE_COUNT && wire_funcs[type].func) +- ret = wire_funcs[type].func(conn, in); +- else { +- eprintf("Client unknown operation %i", type); +- ret = ENOSYS; +- } ++ ret = wire_funcs[type].func(conn, in); + if (ret) + send_error(conn, ret); + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-x86-mm-Refactor-modify_xen_mappings-to-have-one-exit.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-x86-mm-Refactor-modify_xen_mappings-to-have-one-exit.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-x86-mm-Refactor-modify_xen_mappings-to-have-one-exit.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0002-x86-mm-Refactor-modify_xen_mappings-to-have-one-exit.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,68 @@ +From 7101786be91dce650b6e79f1374c580c731bb348 Mon Sep 17 00:00:00 2001 +From: Wei Liu +Date: Sat, 11 Jan 2020 21:57:42 +0000 +Subject: [PATCH 2/3] x86/mm: Refactor modify_xen_mappings to have one exit + path + +We will soon need to perform clean-ups before returning. + +No functional change. + +This is part of XSA-345. + +Reported-by: Hongyan Xia +Signed-off-by: Wei Liu +Signed-off-by: Hongyan Xia +Signed-off-by: George Dunlap +Acked-by: Jan Beulich +--- + xen/arch/x86/mm.c | 12 +++++++++--- + 1 file changed, 9 insertions(+), 3 deletions(-) + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 79a3fac3cc..8ed3ecacbe 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -5577,6 +5577,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + l1_pgentry_t *pl1e; + unsigned int i; + unsigned long v = s; ++ int rc = -ENOMEM; + + /* Set of valid PTE bits which may be altered. 
*/ + #define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT) +@@ -5618,7 +5619,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + /* PAGE1GB: shatter the superpage and fall through. */ + pl2e = alloc_xen_pagetable(); + if ( !pl2e ) +- return -ENOMEM; ++ goto out; ++ + for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ ) + l2e_write(pl2e + i, + l2e_from_pfn(l3e_get_pfn(*pl3e) + +@@ -5673,7 +5675,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + /* PSE: shatter the superpage and try again. */ + pl1e = alloc_xen_pagetable(); + if ( !pl1e ) +- return -ENOMEM; ++ goto out; ++ + for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) + l1e_write(&pl1e[i], + l1e_from_pfn(l2e_get_pfn(*pl2e) + i, +@@ -5802,7 +5805,10 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + flush_area(NULL, FLUSH_TLB_GLOBAL); + + #undef FLAGS_MASK +- return 0; ++ rc = 0; ++ ++ out: ++ return rc; + } + + #undef flush_area +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-ocaml-xenstored-unify-watch-firing.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-ocaml-xenstored-unify-watch-firing.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-ocaml-xenstored-unify-watch-firing.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-ocaml-xenstored-unify-watch-firing.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,29 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: unify watch firing +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +This will make it easier insert additional checks in a follow-up patch. +All watches are now fired from a single function. + +This is part of XSA-115. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml +index be9c62f27f..d7432c6597 100644 +--- a/tools/ocaml/xenstored/connection.ml ++++ b/tools/ocaml/xenstored/connection.ml +@@ -210,8 +210,7 @@ let fire_watch watch path = + end else + path + in +- let data = Utils.join_by_null [ new_path; watch.token; "" ] in +- send_reply watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data ++ fire_single_watch { watch with path = new_path } + + (* Search for a valid unused transaction id. *) + let rec valid_transaction_id con proposed_id = diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-xenstore-fix-node-accounting-after-failed-node.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-xenstore-fix-node-accounting-after-failed-node.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-xenstore-fix-node-accounting-after-failed-node.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-tools-xenstore-fix-node-accounting-after-failed-node.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,104 @@ +From b8c6dbb67ebb449126023446a7d209eedf966537 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:39 +0200 +Subject: [PATCH 03/10] tools/xenstore: fix node accounting after failed node + creation + +When a node creation fails the number of nodes of the domain should be +the same as before the failed node creation. In case of failure when +trying to create a node requiring to create one or more intermediate +nodes as well (e.g. 
when /a/b/c/d is to be created, but /a/b isn't +existing yet) it might happen that the number of nodes of the creating +domain is not reset to the value it had before. + +So move the quota accounting out of construct_node() and into the node +write loop in create_node() in order to be able to undo the accounting +in case of an error in the intermediate node destructor. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Paul Durrant +Acked-by: Julien Grall +--- + tools/xenstore/xenstored_core.c | 37 ++++++++++++++++++++++----------- + 1 file changed, 25 insertions(+), 12 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index bb2f9fd4e76e..db9b9ca7957d 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -925,11 +925,6 @@ static struct node *construct_node(struct connection *conn, const void *ctx, + if (!parent) + return NULL; + +- if (domain_entry(conn) >= quota_nb_entry_per_domain) { +- errno = ENOSPC; +- return NULL; +- } +- + /* Add child to parent. */ + base = basename(name); + baselen = strlen(base) + 1; +@@ -962,7 +957,6 @@ static struct node *construct_node(struct connection *conn, const void *ctx, + node->children = node->data = NULL; + node->childlen = node->datalen = 0; + node->parent = parent; +- domain_entry_inc(conn, node); + return node; + + nomem: +@@ -982,6 +976,9 @@ static int destroy_node(void *_node) + key.dsize = strlen(node->name); + + tdb_delete(tdb_ctx, key); ++ ++ domain_entry_dec(talloc_parent(node), node); ++ + return 0; + } + +@@ -998,18 +995,34 @@ static struct node *create_node(struct connection *conn, const void *ctx, + node->data = data; + node->datalen = datalen; + +- /* We write out the nodes down, setting destructor in case +- * something goes wrong. */ ++ /* ++ * We write out the nodes bottom up. ++ * All new created nodes will have i->parent set, while the final ++ * node will be already existing and won't have i->parent set. ++ * New nodes are subject to quota handling. ++ * Initially set a destructor for all new nodes removing them from ++ * TDB again and undoing quota accounting for the case of an error ++ * during the write loop. ++ */ + for (i = node; i; i = i->parent) { +- if (write_node(conn, i, false)) { +- domain_entry_dec(conn, i); ++ /* i->parent is set for each new node, so check quota. */ ++ if (i->parent && ++ domain_entry(conn) >= quota_nb_entry_per_domain) { ++ errno = ENOSPC; + return NULL; + } +- talloc_set_destructor(i, destroy_node); ++ if (write_node(conn, i, false)) ++ return NULL; ++ ++ /* Account for new node, set destructor for error case. 
*/ ++ if (i->parent) { ++ domain_entry_inc(conn, i); ++ talloc_set_destructor(i, destroy_node); ++ } + } + + /* OK, now remove destructors so they stay around */ +- for (i = node; i; i = i->parent) ++ for (i = node; i->parent; i = i->parent) + talloc_set_destructor(i, NULL); + return node; + } +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-x86-mm-Prevent-some-races-in-hypervisor-mapping-upda.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-x86-mm-Prevent-some-races-in-hypervisor-mapping-upda.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-x86-mm-Prevent-some-races-in-hypervisor-mapping-upda.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0003-x86-mm-Prevent-some-races-in-hypervisor-mapping-upda.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,249 @@ +From e7bbc4a0b5af76a82f0dcf4afcbf1509b020eb73 Mon Sep 17 00:00:00 2001 +From: Hongyan Xia +Date: Sat, 11 Jan 2020 21:57:43 +0000 +Subject: [PATCH 3/3] x86/mm: Prevent some races in hypervisor mapping updates + +map_pages_to_xen will attempt to coalesce mappings into 2MiB and 1GiB +superpages if possible, to maximize TLB efficiency. This means both +replacing superpage entries with smaller entries, and replacing +smaller entries with superpages. + +Unfortunately, while some potential races are handled correctly, +others are not. These include: + +1. When one processor modifies a sub-superpage mapping while another +processor replaces the entire range with a superpage. + +Take the following example: + +Suppose L3[N] points to L2. And suppose we have two processors, A and +B. + +* A walks the pagetables, get a pointer to L2. +* B replaces L3[N] with a 1GiB mapping. +* B Frees L2 +* A writes L2[M] # + +This is race exacerbated by the fact that virt_to_xen_l[21]e doesn't +handle higher-level superpages properly: If you call virt_xen_to_l2e +on a virtual address within an L3 superpage, you'll either hit a BUG() +(most likely), or get a pointer into the middle of a data page; same +with virt_xen_to_l1 on a virtual address within either an L3 or L2 +superpage. + +So take the following example: + +* A reads pl3e and discovers it to point to an L2. +* B replaces L3[N] with a 1GiB mapping +* A calls virt_to_xen_l2e() and hits the BUG_ON() # + +2. When two processors simultaneously try to replace a sub-superpage +mapping with a superpage mapping. + +Take the following example: + +Suppose L3[N] points to L2. And suppose we have two processors, A and B, +both trying to replace L3[N] with a superpage. + +* A walks the pagetables, get a pointer to pl3e, and takes a copy ol3e pointing to L2. +* B walks the pagetables, gets a pointre to pl3e, and takes a copy ol3e pointing to L2. +* A writes the new value into L3[N] +* B writes the new value into L3[N] +* A recursively frees all the L1's under L2, then frees L2 +* B recursively double-frees all the L1's under L2, then double-frees L2 # + +Fix this by grabbing a lock for the entirety of the mapping update +operation. + +Rather than grabbing map_pgdir_lock for the entire operation, however, +repurpose the PGT_locked bit from L3's page->type_info as a lock. +This means that rather than locking the entire address space, we +"only" lock a single 512GiB chunk of hypervisor address space at a +time. 
+ +There was a proposal for a lock-and-reverify approach, where we walk +the pagetables to the point where we decide what to do; then grab the +map_pgdir_lock, re-verify the information we collected without the +lock, and finally make the change (starting over again if anything had +changed). Without being able to guarantee that the L2 table wasn't +freed, however, that means every read would need to be considered +potentially unsafe. Thinking carefully about that is probably +something that wants to be done on public, not under time pressure. + +This is part of XSA-345. + +Reported-by: Hongyan Xia +Signed-off-by: Hongyan Xia +Signed-off-by: George Dunlap +Reviewed-by: Jan Beulich +--- + xen/arch/x86/mm.c | 92 +++++++++++++++++++++++++++++++++++++++++++++-- + 1 file changed, 89 insertions(+), 3 deletions(-) + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 8ed3ecacbe..4ff24de73d 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -2153,6 +2153,50 @@ void page_unlock(struct page_info *page) + current_locked_page_set(NULL); + } + ++/* ++ * L3 table locks: ++ * ++ * Used for serialization in map_pages_to_xen() and modify_xen_mappings(). ++ * ++ * For Xen PT pages, the page->u.inuse.type_info is unused and it is safe to ++ * reuse the PGT_locked flag. This lock is taken only when we move down to L3 ++ * tables and below, since L4 (and above, for 5-level paging) is still globally ++ * protected by map_pgdir_lock. ++ * ++ * PV MMU update hypercalls call map_pages_to_xen while holding a page's page_lock(). ++ * This has two implications: ++ * - We cannot reuse reuse current_locked_page_* for debugging ++ * - To avoid the chance of deadlock, even for different pages, we ++ * must never grab page_lock() after grabbing l3t_lock(). This ++ * includes any page_lock()-based locks, such as ++ * mem_sharing_page_lock(). ++ * ++ * Also note that we grab the map_pgdir_lock while holding the ++ * l3t_lock(), so to avoid deadlock we must avoid grabbing them in ++ * reverse order. ++ */ ++static void l3t_lock(struct page_info *page) ++{ ++ unsigned long x, nx; ++ ++ do { ++ while ( (x = page->u.inuse.type_info) & PGT_locked ) ++ cpu_relax(); ++ nx = x | PGT_locked; ++ } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x ); ++} ++ ++static void l3t_unlock(struct page_info *page) ++{ ++ unsigned long x, nx, y = page->u.inuse.type_info; ++ ++ do { ++ x = y; ++ BUG_ON(!(x & PGT_locked)); ++ nx = x & ~PGT_locked; ++ } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x ); ++} ++ + /* + * PTE flags that a guest may change without re-validating the PTE. + * All other bits affect translation, caching, or Xen's safety. 
+@@ -5184,6 +5228,23 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v) + flush_area_local((const void *)v, f) : \ + flush_area_all((const void *)v, f)) + ++#define L3T_INIT(page) (page) = ZERO_BLOCK_PTR ++ ++#define L3T_LOCK(page) \ ++ do { \ ++ if ( locking ) \ ++ l3t_lock(page); \ ++ } while ( false ) ++ ++#define L3T_UNLOCK(page) \ ++ do { \ ++ if ( locking && (page) != ZERO_BLOCK_PTR ) \ ++ { \ ++ l3t_unlock(page); \ ++ (page) = ZERO_BLOCK_PTR; \ ++ } \ ++ } while ( false ) ++ + int map_pages_to_xen( + unsigned long virt, + mfn_t mfn, +@@ -5195,6 +5256,7 @@ int map_pages_to_xen( + l1_pgentry_t *pl1e, ol1e; + unsigned int i; + int rc = -ENOMEM; ++ struct page_info *current_l3page; + + #define flush_flags(oldf) do { \ + unsigned int o_ = (oldf); \ +@@ -5210,13 +5272,20 @@ int map_pages_to_xen( + } \ + } while (0) + ++ L3T_INIT(current_l3page); ++ + while ( nr_mfns != 0 ) + { +- l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt); ++ l3_pgentry_t *pl3e, ol3e; + ++ L3T_UNLOCK(current_l3page); ++ ++ pl3e = virt_to_xen_l3e(virt); + if ( !pl3e ) + goto out; + ++ current_l3page = virt_to_page(pl3e); ++ L3T_LOCK(current_l3page); + ol3e = *pl3e; + + if ( cpu_has_page1gb && +@@ -5550,6 +5619,7 @@ int map_pages_to_xen( + rc = 0; + + out: ++ L3T_UNLOCK(current_l3page); + return rc; + } + +@@ -5578,6 +5648,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + unsigned int i; + unsigned long v = s; + int rc = -ENOMEM; ++ struct page_info *current_l3page; + + /* Set of valid PTE bits which may be altered. */ + #define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT) +@@ -5586,11 +5657,22 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + ASSERT(IS_ALIGNED(s, PAGE_SIZE)); + ASSERT(IS_ALIGNED(e, PAGE_SIZE)); + ++ L3T_INIT(current_l3page); ++ + while ( v < e ) + { +- l3_pgentry_t *pl3e = virt_to_xen_l3e(v); ++ l3_pgentry_t *pl3e; ++ ++ L3T_UNLOCK(current_l3page); + +- if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ) ++ pl3e = virt_to_xen_l3e(v); ++ if ( !pl3e ) ++ goto out; ++ ++ current_l3page = virt_to_page(pl3e); ++ L3T_LOCK(current_l3page); ++ ++ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ) + { + /* Confirm the caller isn't trying to create new mappings. */ + ASSERT(!(nf & _PAGE_PRESENT)); +@@ -5808,9 +5890,13 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf) + rc = 0; + + out: ++ L3T_UNLOCK(current_l3page); + return rc; + } + ++#undef L3T_LOCK ++#undef L3T_UNLOCK ++ + #undef flush_area + + int destroy_xen_mappings(unsigned long s, unsigned long e) +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-ocaml-xenstored-introduce-permissions-for-spec.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-ocaml-xenstored-introduce-permissions-for-spec.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-ocaml-xenstored-introduce-permissions-for-spec.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-ocaml-xenstored-introduce-permissions-for-spec.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,117 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: introduce permissions for special watches +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +The special watches "@introduceDomain" and "@releaseDomain" should be +allowed for privileged callers only, as they allow to gain information +about presence of other guests on the host. 
So send watch events for +those watches via privileged connections only. + +Start to address this by treating the special watches as regular nodes +in the tree, which gives them normal semantics for permissions. A later +change will restrict the handling, so that they can't be listed, etc. + +This is part of XSA-115. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index f374abe998..c3c8ea2f4b 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -414,7 +414,7 @@ let do_introduce con t domains cons data = + else try + let ndom = Domains.create domains domid mfn port in + Connections.add_domain cons ndom; +- Connections.fire_spec_watches cons "@introduceDomain"; ++ Connections.fire_spec_watches cons Store.Path.introduce_domain; + ndom + with _ -> raise Invalid_Cmd_Args + in +@@ -433,7 +433,7 @@ let do_release con t domains cons data = + Domains.del domains domid; + Connections.del_domain cons domid; + if fire_spec_watches +- then Connections.fire_spec_watches cons "@releaseDomain" ++ then Connections.fire_spec_watches cons Store.Path.release_domain + else raise Invalid_Cmd_Args + + let do_resume con t domains cons data = +diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml +index 6375a1c889..98d368d52f 100644 +--- a/tools/ocaml/xenstored/store.ml ++++ b/tools/ocaml/xenstored/store.ml +@@ -214,6 +214,11 @@ let rec lookup node path fct = + + let apply rnode path fct = + lookup rnode path fct ++ ++let introduce_domain = "@introduceDomain" ++let release_domain = "@releaseDomain" ++let specials = List.map of_string [ introduce_domain; release_domain ] ++ + end + + (* The Store.t type *) +diff --git a/tools/ocaml/xenstored/utils.ml b/tools/ocaml/xenstored/utils.ml +index b252db799b..e8c9fe4e94 100644 +--- a/tools/ocaml/xenstored/utils.ml ++++ b/tools/ocaml/xenstored/utils.ml +@@ -88,19 +88,17 @@ let read_file_single_integer filename = + Unix.close fd; + int_of_string (Bytes.sub_string buf 0 sz) + +-let path_complete path connection_path = +- if String.get path 0 <> '/' then +- connection_path ^ path +- else +- path +- ++(* @path may be guest data and needs its length validating. 
@connection_path ++ * is generated locally in xenstored and always of the form "/local/domain/$N/" *) + let path_validate path connection_path = +- if String.length path = 0 || String.length path > 1024 then +- raise Define.Invalid_path +- else +- let cpath = path_complete path connection_path in +- if String.get cpath 0 <> '/' then +- raise Define.Invalid_path +- else +- cpath ++ let len = String.length path in ++ ++ if len = 0 || len > 1024 then raise Define.Invalid_path; ++ ++ let abs_path = ++ match String.get path 0 with ++ | '/' | '@' -> path ++ | _ -> connection_path ^ path ++ in + ++ abs_path +diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml +index 49fc18bf19..32c3b1c0f1 100644 +--- a/tools/ocaml/xenstored/xenstored.ml ++++ b/tools/ocaml/xenstored/xenstored.ml +@@ -287,6 +287,8 @@ let _ = + let quit = ref false in + + Logging.init_xenstored_log(); ++ List.iter (fun path -> ++ Store.write store Perms.Connection.full_rights path "") Store.Path.specials; + + let filename = Paths.xen_run_stored ^ "/db" in + if cf.restart && Sys.file_exists filename then ( +@@ -339,7 +341,7 @@ let _ = + let (notify, deaddom) = Domains.cleanup domains in + List.iter (Connections.del_domain cons) deaddom; + if deaddom <> [] || notify then +- Connections.fire_spec_watches cons "@releaseDomain" ++ Connections.fire_spec_watches cons Store.Path.release_domain + ) + else + let c = Connections.find_domain_by_port cons port in diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-xenstore-simplify-and-rename-check_event_node.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-xenstore-simplify-and-rename-check_event_node.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-xenstore-simplify-and-rename-check_event_node.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0004-tools-xenstore-simplify-and-rename-check_event_node.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,55 @@ +From 318aa75bd0c05423e717ad0b64adb204282025db Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:40 +0200 +Subject: [PATCH 04/10] tools/xenstore: simplify and rename check_event_node() + +There is no path which allows to call check_event_node() without a +event name. So don't let the result depend on the name being NULL and +add an assert() covering that case. + +Rename the function to check_special_event() to better match the +semantics. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_watch.c | 12 +++++------- + 1 file changed, 5 insertions(+), 7 deletions(-) + +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index 7dedca60dfd6..f2f1bed47cc6 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -47,13 +47,11 @@ struct watch + char *node; + }; + +-static bool check_event_node(const char *node) ++static bool check_special_event(const char *name) + { +- if (!node || !strstarts(node, "@")) { +- errno = EINVAL; +- return false; +- } +- return true; ++ assert(name); ++ ++ return strstarts(name, "@"); + } + + /* Is child a subnode of parent, or equal? */ +@@ -87,7 +85,7 @@ static void add_event(struct connection *conn, + unsigned int len; + char *data; + +- if (!check_event_node(name)) { ++ if (!check_special_event(name)) { + /* Can this conn load node, or see that it doesn't exist? 
*/ + struct node *node = get_node(conn, ctx, name, XS_PERM_READ); + /* +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-ocaml-xenstored-avoid-watch-events-for-nodes-w.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-ocaml-xenstored-avoid-watch-events-for-nodes-w.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-ocaml-xenstored-avoid-watch-events-for-nodes-w.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-ocaml-xenstored-avoid-watch-events-for-nodes-w.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,389 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: avoid watch events for nodes without access +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Today watch events are sent regardless of the access rights of the +node the event is sent for. This enables any guest to e.g. setup a +watch for "/" in order to have a detailed record of all Xenstore +modifications. + +Modify that by sending only watch events for nodes that the watcher +has a chance to see otherwise (either via direct reads or by querying +the children of a node). This includes cases where the visibility of +a node for a watcher is changing (permissions being removed). + +Permissions for nodes are looked up either in the old (pre +transaction/command) or current trees (post transaction). If +permissions are changed multiple times in a transaction only the final +version is checked, because considering a transaction atomic the +individual permission changes would not be noticable to an outside +observer. + +Two trees are only needed for set_perms: here we can either notice the +node disappearing (if we loose permission), appearing +(if we gain permission), or changing (if we preserve permission). + +RM needs to only look at the old tree: in the new tree the node would be +gone, or could have different permissions if it was recreated (the +recreation would get its own watch fired). + +Inside a tree we lookup the watch path's parent, and then the watch path +child itself. This gets us 4 sets of permissions in worst case, and if +either of these allows a watch, then we permit it to fire. The +permission lookups are done without logging the failures, otherwise we'd +get confusing errors about permission denied for some paths, but a watch +still firing. The actual result is logged in xenstored-access log: + + 'w event ...' as usual if watch was fired + 'w notfired...' if the watch was not fired, together with path and + permission set to help in troubleshooting + +Adding a watch bypasses permission checks and always fires the watch +once immediately. This is consistent with the specification, and no +information is gained (the watch is fired both if the path exists or +doesn't, and both if you have or don't have access, i.e. it reflects the +path a domain gave it back to that domain). + +There are some semantic changes here: + + * Write+rm in a single transaction of the same path is unobservable + now via watches: both before and after a transaction the path + doesn't exist, thus both tree lookups come up with the empty + permission set, and noone, not even Dom0 can see this. This is + consistent with transaction atomicity though. + * Similar to above if we temporarily grant and then revoke permission + on a path any watches fired inbetween are ignored as well + * There is a new log event (w notfired) which shows the permission set + of the path, and the path. 
+ * Watches on paths that a domain doesn't have access to are now not + seen, which is the purpose of the security fix. + +This is part of XSA-115. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml +index d7432c6597..1389d971c2 100644 +--- a/tools/ocaml/xenstored/connection.ml ++++ b/tools/ocaml/xenstored/connection.ml +@@ -196,11 +196,36 @@ let list_watches con = + con.watches [] in + List.concat ll + +-let fire_single_watch watch = ++let dbg fmt = Logging.debug "connection" fmt ++let info fmt = Logging.info "connection" fmt ++ ++let lookup_watch_perm path = function ++| None -> [] ++| Some root -> ++ try Store.Path.apply root path @@ fun parent name -> ++ Store.Node.get_perms parent :: ++ try [Store.Node.get_perms (Store.Node.find parent name)] ++ with Not_found -> [] ++ with Define.Invalid_path | Not_found -> [] ++ ++let lookup_watch_perms oldroot root path = ++ lookup_watch_perm path oldroot @ lookup_watch_perm path (Some root) ++ ++let fire_single_watch_unchecked watch = + let data = Utils.join_by_null [watch.path; watch.token; ""] in + send_reply watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data + +-let fire_watch watch path = ++let fire_single_watch (oldroot, root) watch = ++ let abspath = get_watch_path watch.con watch.path |> Store.Path.of_string in ++ let perms = lookup_watch_perms oldroot root abspath in ++ if List.exists (Perms.has watch.con.perm READ) perms then ++ fire_single_watch_unchecked watch ++ else ++ let perms = perms |> List.map (Perms.Node.to_string ~sep:" ") |> String.concat ", " in ++ let con = get_domstr watch.con in ++ Logging.watch_not_fired ~con perms (Store.Path.to_string abspath) ++ ++let fire_watch roots watch path = + let new_path = + if watch.is_relative && path.[0] = '/' + then begin +@@ -210,7 +235,7 @@ let fire_watch watch path = + end else + path + in +- fire_single_watch { watch with path = new_path } ++ fire_single_watch roots { watch with path = new_path } + + (* Search for a valid unused transaction id. 
*) + let rec valid_transaction_id con proposed_id = +diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml +index ae7692819d..020b875dcd 100644 +--- a/tools/ocaml/xenstored/connections.ml ++++ b/tools/ocaml/xenstored/connections.ml +@@ -135,25 +135,26 @@ let del_watch cons con path token = + watch + + (* path is absolute *) +-let fire_watches cons path recurse = ++let fire_watches ?oldroot root cons path recurse = + let key = key_of_path path in + let path = Store.Path.to_string path in ++ let roots = oldroot, root in + let fire_watch _ = function + | None -> () +- | Some watches -> List.iter (fun w -> Connection.fire_watch w path) watches ++ | Some watches -> List.iter (fun w -> Connection.fire_watch roots w path) watches + in + let fire_rec x = function + | None -> () + | Some watches -> +- List.iter (fun w -> Connection.fire_single_watch w) watches ++ List.iter (Connection.fire_single_watch roots) watches + in + Trie.iter_path fire_watch cons.watches key; + if recurse then + Trie.iter fire_rec (Trie.sub cons.watches key) + +-let fire_spec_watches cons specpath = ++let fire_spec_watches root cons specpath = + iter cons (fun con -> +- List.iter (fun w -> Connection.fire_single_watch w) (Connection.get_watches con specpath)) ++ List.iter (Connection.fire_single_watch (None, root)) (Connection.get_watches con specpath)) + + let set_target cons domain target_domain = + let con = find_domain cons domain in +diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml +index ea6033195d..99c7bc5e13 100644 +--- a/tools/ocaml/xenstored/logging.ml ++++ b/tools/ocaml/xenstored/logging.ml +@@ -161,6 +161,8 @@ let xenstored_log_nb_lines = ref 13215 + let xenstored_log_nb_chars = ref (-1) + let xenstored_logger = ref (None: logger option) + ++let debug_enabled () = !xenstored_log_level = Debug ++ + let set_xenstored_log_destination s = + xenstored_log_destination := log_destination_of_string s + +@@ -204,6 +206,7 @@ type access_type = + | Commit + | Newconn + | Endconn ++ | Watch_not_fired + | XbOp of Xenbus.Xb.Op.operation + + let string_of_tid ~con tid = +@@ -217,6 +220,7 @@ let string_of_access_type = function + | Commit -> "commit " + | Newconn -> "newconn " + | Endconn -> "endconn " ++ | Watch_not_fired -> "w notfired" + + | XbOp op -> match op with + | Xenbus.Xb.Op.Debug -> "debug " +@@ -331,3 +335,7 @@ let xb_answer ~tid ~con ~ty data = + | _ -> false, Debug + in + if print then access_logging ~tid ~con ~data (XbOp ty) ~level ++ ++let watch_not_fired ~con perms path = ++ let data = Printf.sprintf "EPERM perms=[%s] path=%s" perms path in ++ access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info +diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml +index 3ea193ea14..23b80aba3d 100644 +--- a/tools/ocaml/xenstored/perms.ml ++++ b/tools/ocaml/xenstored/perms.ml +@@ -79,9 +79,9 @@ let of_string s = + let string_of_perm perm = + Printf.sprintf "%c%u" (char_of_permty (snd perm)) (fst perm) + +-let to_string permvec = ++let to_string ?(sep="\000") permvec = + let l = ((permvec.owner, permvec.other) :: permvec.acl) in +- String.concat "\000" (List.map string_of_perm l) ++ String.concat sep (List.map string_of_perm l) + + end + +@@ -132,8 +132,8 @@ let check_owner (connection:Connection.t) (node:Node.t) = + then Connection.is_owner connection (Node.get_owner node) + else true + +-(* check if the current connection has the requested perm on the current node *) +-let check (connection:Connection.t) request (node:Node.t) = 
++(* check if the current connection lacks the requested perm on the current node *) ++let lacks (connection:Connection.t) request (node:Node.t) = + let check_acl domainid = + let perm = + if List.mem_assoc domainid (Node.get_acl node) +@@ -154,11 +154,19 @@ let check (connection:Connection.t) request (node:Node.t) = + info "Permission denied: Domain %d has write only access" domainid; + false + in +- if !activate ++ !activate + && not (Connection.is_dom0 connection) + && not (check_owner connection node) + && not (List.exists check_acl (Connection.get_owners connection)) ++ ++(* check if the current connection has the requested perm on the current node. ++* Raises an exception if it doesn't. *) ++let check connection request node = ++ if lacks connection request node + then raise Define.Permission_denied + ++(* check if the current connection has the requested perm on the current node *) ++let has connection request node = not (lacks connection request node) ++ + let equiv perm1 perm2 = + (Node.to_string perm1) = (Node.to_string perm2) +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index c3c8ea2f4b..3cd0097db9 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -56,15 +56,17 @@ let split_one_path data con = + | path :: "" :: [] -> Store.Path.create path (Connection.get_path con) + | _ -> raise Invalid_Cmd_Args + +-let process_watch ops cons = ++let process_watch t cons = ++ let oldroot = t.Transaction.oldroot in ++ let newroot = Store.get_root t.store in ++ let ops = Transaction.get_paths t |> List.rev in + let do_op_watch op cons = +- let recurse = match (fst op) with +- | Xenbus.Xb.Op.Write -> false +- | Xenbus.Xb.Op.Mkdir -> false +- | Xenbus.Xb.Op.Rm -> true +- | Xenbus.Xb.Op.Setperms -> false ++ let recurse, oldroot, root = match (fst op) with ++ | Xenbus.Xb.Op.Write|Xenbus.Xb.Op.Mkdir -> false, None, newroot ++ | Xenbus.Xb.Op.Rm -> true, None, oldroot ++ | Xenbus.Xb.Op.Setperms -> false, Some oldroot, newroot + | _ -> raise (Failure "huh ?") in +- Connections.fire_watches cons (snd op) recurse in ++ Connections.fire_watches ?oldroot root cons (snd op) recurse in + List.iter (fun op -> do_op_watch op cons) ops + + let create_implicit_path t perm path = +@@ -205,7 +207,7 @@ let reply_ack fct con t doms cons data = + fct con t doms cons data; + Packet.Ack (fun () -> + if Transaction.get_id t = Transaction.none then +- process_watch (Transaction.get_paths t) cons ++ process_watch t cons + ) + + let reply_data fct con t doms cons data = +@@ -353,14 +355,17 @@ let transaction_replay c t doms cons = + Connection.end_transaction c tid None + ) + +-let do_watch con t domains cons data = ++let do_watch con t _domains cons data = + let (node, token) = + match (split None '\000' data) with + | [node; token; ""] -> node, token + | _ -> raise Invalid_Cmd_Args + in + let watch = Connections.add_watch cons con node token in +- Packet.Ack (fun () -> Connection.fire_single_watch watch) ++ Packet.Ack (fun () -> ++ (* xenstore.txt says this watch is fired immediately, ++ implying even if path doesn't exist or is unreadable *) ++ Connection.fire_single_watch_unchecked watch) + + let do_unwatch con t domains cons data = + let (node, token) = +@@ -391,7 +396,7 @@ let do_transaction_end con t domains cons data = + if not success then + raise Transaction_again; + if commit then begin +- process_watch (List.rev (Transaction.get_paths t)) cons; ++ process_watch t cons; + match t.Transaction.ty with + | Transaction.No -> + () (* no need 
to record anything *) +@@ -414,7 +419,7 @@ let do_introduce con t domains cons data = + else try + let ndom = Domains.create domains domid mfn port in + Connections.add_domain cons ndom; +- Connections.fire_spec_watches cons Store.Path.introduce_domain; ++ Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.introduce_domain; + ndom + with _ -> raise Invalid_Cmd_Args + in +@@ -433,7 +438,7 @@ let do_release con t domains cons data = + Domains.del domains domid; + Connections.del_domain cons domid; + if fire_spec_watches +- then Connections.fire_spec_watches cons Store.Path.release_domain ++ then Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.release_domain + else raise Invalid_Cmd_Args + + let do_resume con t domains cons data = +@@ -501,6 +506,8 @@ let maybe_ignore_transaction = function + Transaction.none + | _ -> fun x -> x + ++ ++let () = Printexc.record_backtrace true + (** + * Nothrow guarantee. + *) +@@ -542,7 +549,8 @@ let process_packet ~store ~cons ~doms ~con ~req = + (* Put the response on the wire *) + send_response ty con t rid response + with exn -> +- error "process packet: %s" (Printexc.to_string exn); ++ let bt = Printexc.get_backtrace () in ++ error "process packet: %s. %s" (Printexc.to_string exn) bt; + Connection.send_error con tid rid "EIO" + + let do_input store cons doms con = +diff --git a/tools/ocaml/xenstored/transaction.ml b/tools/ocaml/xenstored/transaction.ml +index 23e7ccff1b..9e9e28db9b 100644 +--- a/tools/ocaml/xenstored/transaction.ml ++++ b/tools/ocaml/xenstored/transaction.ml +@@ -82,6 +82,7 @@ type t = { + start_count: int64; + store: Store.t; (* This is the store that we change in write operations. *) + quota: Quota.t; ++ oldroot: Store.Node.t; + mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list; + mutable operations: (Packet.request * Packet.response) list; + mutable read_lowpath: Store.Path.t option; +@@ -123,6 +124,7 @@ let make ?(internal=false) id store = + start_count = !counter; + store = if id = none then store else Store.copy store; + quota = Quota.copy store.Store.quota; ++ oldroot = Store.get_root store; + paths = []; + operations = []; + read_lowpath = None; +@@ -137,6 +139,8 @@ let make ?(internal=false) id store = + let get_store t = t.store + let get_paths t = t.paths + ++let get_root t = Store.get_root t.store ++ + let is_read_only t = t.paths = [] + let add_wop t ty path = t.paths <- (ty, path) :: t.paths + let add_operation ~perm t request response = +diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml +index 32c3b1c0f1..e9f471846f 100644 +--- a/tools/ocaml/xenstored/xenstored.ml ++++ b/tools/ocaml/xenstored/xenstored.ml +@@ -341,7 +341,9 @@ let _ = + let (notify, deaddom) = Domains.cleanup domains in + List.iter (Connections.del_domain cons) deaddom; + if deaddom <> [] || notify then +- Connections.fire_spec_watches cons Store.Path.release_domain ++ Connections.fire_spec_watches ++ (Store.get_root store) ++ cons Store.Path.release_domain + ) + else + let c = Connections.find_domain_by_port cons port in diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-xenstore-check-privilege-for-XS_IS_DOMAIN_INTR.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-xenstore-check-privilege-for-XS_IS_DOMAIN_INTR.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-xenstore-check-privilege-for-XS_IS_DOMAIN_INTR.patch 1970-01-01 01:00:00.000000000 +0100 +++ 
xen-4.11.3+24-g14b62ab3e5/debian/patches/0005-tools-xenstore-check-privilege-for-XS_IS_DOMAIN_INTR.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,115 @@ +From c625fae44aedc246776b52eb1173cf847a3d4d80 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:41 +0200 +Subject: [PATCH 05/10] tools/xenstore: check privilege for + XS_IS_DOMAIN_INTRODUCED + +The Xenstore command XS_IS_DOMAIN_INTRODUCED should be possible for +privileged domains only (the only user in the tree is the xenpaging +daemon). + +Instead of having the privilege test for each command introduce a +per-command flag for that purpose. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 24 ++++++++++++++++++------ + tools/xenstore/xenstored_domain.c | 7 ++----- + 2 files changed, 20 insertions(+), 11 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index db9b9ca7957d..6afd58431111 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -1283,8 +1283,10 @@ static struct { + int (*func)(struct connection *conn, struct buffered_data *in); + unsigned int flags; + #define XS_FLAG_NOTID (1U << 0) /* Ignore transaction id. */ ++#define XS_FLAG_PRIV (1U << 1) /* Privileged domain only. */ + } const wire_funcs[XS_TYPE_COUNT] = { +- [XS_CONTROL] = { "CONTROL", do_control }, ++ [XS_CONTROL] = ++ { "CONTROL", do_control, XS_FLAG_PRIV }, + [XS_DIRECTORY] = { "DIRECTORY", send_directory }, + [XS_READ] = { "READ", do_read }, + [XS_GET_PERMS] = { "GET_PERMS", do_get_perms }, +@@ -1294,8 +1296,10 @@ static struct { + { "UNWATCH", do_unwatch, XS_FLAG_NOTID }, + [XS_TRANSACTION_START] = { "TRANSACTION_START", do_transaction_start }, + [XS_TRANSACTION_END] = { "TRANSACTION_END", do_transaction_end }, +- [XS_INTRODUCE] = { "INTRODUCE", do_introduce }, +- [XS_RELEASE] = { "RELEASE", do_release }, ++ [XS_INTRODUCE] = ++ { "INTRODUCE", do_introduce, XS_FLAG_PRIV }, ++ [XS_RELEASE] = ++ { "RELEASE", do_release, XS_FLAG_PRIV }, + [XS_GET_DOMAIN_PATH] = { "GET_DOMAIN_PATH", do_get_domain_path }, + [XS_WRITE] = { "WRITE", do_write }, + [XS_MKDIR] = { "MKDIR", do_mkdir }, +@@ -1304,9 +1308,11 @@ static struct { + [XS_WATCH_EVENT] = { "WATCH_EVENT", NULL }, + [XS_ERROR] = { "ERROR", NULL }, + [XS_IS_DOMAIN_INTRODUCED] = +- { "IS_DOMAIN_INTRODUCED", do_is_domain_introduced }, +- [XS_RESUME] = { "RESUME", do_resume }, +- [XS_SET_TARGET] = { "SET_TARGET", do_set_target }, ++ { "IS_DOMAIN_INTRODUCED", do_is_domain_introduced, XS_FLAG_PRIV }, ++ [XS_RESUME] = ++ { "RESUME", do_resume, XS_FLAG_PRIV }, ++ [XS_SET_TARGET] = ++ { "SET_TARGET", do_set_target, XS_FLAG_PRIV }, + [XS_RESET_WATCHES] = { "RESET_WATCHES", do_reset_watches }, + [XS_DIRECTORY_PART] = { "DIRECTORY_PART", send_directory_part }, + }; +@@ -1334,6 +1340,12 @@ static void process_message(struct connection *conn, struct buffered_data *in) + return; + } + ++ if ((wire_funcs[type].flags & XS_FLAG_PRIV) && ++ domain_is_unprivileged(conn)) { ++ send_error(conn, EACCES); ++ return; ++ } ++ + trans = (wire_funcs[type].flags & XS_FLAG_NOTID) + ? 
NULL : transaction_lookup(conn, in->hdr.msg.tx_id); + if (IS_ERR(trans)) { +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index 1eae703ef680..0e2926e2a3d0 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -377,7 +377,7 @@ int do_introduce(struct connection *conn, struct buffered_data *in) + if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) + return EINVAL; + +- if (domain_is_unprivileged(conn) || !conn->can_write) ++ if (!conn->can_write) + return EACCES; + + domid = atoi(vec[0]); +@@ -445,7 +445,7 @@ int do_set_target(struct connection *conn, struct buffered_data *in) + if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) + return EINVAL; + +- if (domain_is_unprivileged(conn) || !conn->can_write) ++ if (!conn->can_write) + return EACCES; + + domid = atoi(vec[0]); +@@ -480,9 +480,6 @@ static struct domain *onearg_domain(struct connection *conn, + if (!domid) + return ERR_PTR(-EINVAL); + +- if (domain_is_unprivileged(conn)) +- return ERR_PTR(-EACCES); +- + return find_connected_domain(domid); + } + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-ocaml-xenstored-add-xenstored.conf-flag-to-tur.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-ocaml-xenstored-add-xenstored.conf-flag-to-tur.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-ocaml-xenstored-add-xenstored.conf-flag-to-tur.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-ocaml-xenstored-add-xenstored.conf-flag-to-tur.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,84 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: add xenstored.conf flag to turn off watch + permission checks +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +There are flags to turn off quotas and the permission system, so add one +that turns off the newly introduced watch permission checks as well. + +This is part of XSA-115. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml +index 1389d971c2..698f721345 100644 +--- a/tools/ocaml/xenstored/connection.ml ++++ b/tools/ocaml/xenstored/connection.ml +@@ -218,7 +218,7 @@ let fire_single_watch_unchecked watch = + let fire_single_watch (oldroot, root) watch = + let abspath = get_watch_path watch.con watch.path |> Store.Path.of_string in + let perms = lookup_watch_perms oldroot root abspath in +- if List.exists (Perms.has watch.con.perm READ) perms then ++ if Perms.can_fire_watch watch.con.perm perms then + fire_single_watch_unchecked watch + else + let perms = perms |> List.map (Perms.Node.to_string ~sep:" ") |> String.concat ", " in +diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in +index 6579b84448..d5d4f00de8 100644 +--- a/tools/ocaml/xenstored/oxenstored.conf.in ++++ b/tools/ocaml/xenstored/oxenstored.conf.in +@@ -44,6 +44,16 @@ conflict-rate-limit-is-aggregate = true + # Activate node permission system + perms-activate = true + ++# Activate the watch permission system ++# When this is enabled unprivileged guests can only get watch events ++# for xenstore entries that they would've been able to read. ++# ++# When this is disabled unprivileged guests may get watch events ++# for xenstore entries that they cannot read. 
The watch event contains ++# only the entry name, not the value. ++# This restores behaviour prior to XSA-115. ++perms-watch-activate = true ++ + # Activate quota + quota-activate = true + quota-maxentity = 1000 +diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml +index 23b80aba3d..ee7fee6bda 100644 +--- a/tools/ocaml/xenstored/perms.ml ++++ b/tools/ocaml/xenstored/perms.ml +@@ -20,6 +20,7 @@ let info fmt = Logging.info "perms" fmt + open Stdext + + let activate = ref true ++let watch_activate = ref true + + type permty = READ | WRITE | RDWR | NONE + +@@ -168,5 +169,9 @@ let check connection request node = + (* check if the current connection has the requested perm on the current node *) + let has connection request node = not (lacks connection request node) + ++let can_fire_watch connection perms = ++ not !watch_activate ++ || List.exists (has connection READ) perms ++ + let equiv perm1 perm2 = + (Node.to_string perm1) = (Node.to_string perm2) +diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml +index e9f471846f..30fc874327 100644 +--- a/tools/ocaml/xenstored/xenstored.ml ++++ b/tools/ocaml/xenstored/xenstored.ml +@@ -95,6 +95,7 @@ let parse_config filename = + ("conflict-max-history-seconds", Config.Set_float Define.conflict_max_history_seconds); + ("conflict-rate-limit-is-aggregate", Config.Set_bool Define.conflict_rate_limit_is_aggregate); + ("perms-activate", Config.Set_bool Perms.activate); ++ ("perms-watch-activate", Config.Set_bool Perms.watch_activate); + ("quota-activate", Config.Set_bool Quota.activate); + ("quota-maxwatch", Config.Set_int Define.maxwatch); + ("quota-transaction", Config.Set_int Define.maxtransaction); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-xenstore-rework-node-removal.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-xenstore-rework-node-removal.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-xenstore-rework-node-removal.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0006-tools-xenstore-rework-node-removal.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,217 @@ +From 461c880600175c06e23a63e62d9f1ccab755d708 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:42 +0200 +Subject: [PATCH 06/10] tools/xenstore: rework node removal + +Today a Xenstore node is being removed by deleting it from the parent +first and then deleting itself and all its children. This results in +stale entries remaining in the data base in case e.g. a memory +allocation is failing during processing. This would result in the +rather strange behavior to be able to read a node (as its still in the +data base) while not being visible in the tree view of Xenstore. + +Fix that by deleting the nodes from the leaf side instead of starting +at the root. + +As fire_watches() is now called from _rm() the ctx parameter needs a +const attribute. + +This is part of XSA-115. 
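
The leaf-first strategy described above is easiest to picture on a plain in-memory tree. The following standalone C sketch is illustrative only; tnode, mknode and delete_subtree are invented names, not xenstored code. Each child is destroyed before its parent and only then dropped from the parent's child list, so stopping part-way never leaves an entry that still exists but is unreachable from the root, which mirrors the consistency property the patch aims for.

    /* Illustrative sketch only, not xenstored code: post-order removal
     * of a small in-memory tree, children before parent. */
    #include <stdio.h>
    #include <stdlib.h>

    struct tnode {
        const char *name;
        struct tnode *child[8];
        unsigned int nchild;
    };

    static struct tnode *mknode(const char *name)
    {
        struct tnode *n = calloc(1, sizeof(*n));

        if (n)
            n->name = name;
        return n;
    }

    /* Delete the subtree rooted at n, leaves first. */
    static void delete_subtree(struct tnode *n)
    {
        while (n->nchild) {
            delete_subtree(n->child[n->nchild - 1]);
            n->nchild--;    /* forget the child only once it is gone */
        }
        printf("removed %s\n", n->name);
        free(n);
    }

    int main(void)
    {
        struct tnode *root = mknode("/local/domain/1");
        struct tnode *c;

        if (!root)
            return 1;
        if ((c = mknode("/local/domain/1/data")))
            root->child[root->nchild++] = c;
        if ((c = mknode("/local/domain/1/control")))
            root->child[root->nchild++] = c;

        delete_subtree(root);
        return 0;
    }

The real daemon additionally has to re-read each child from the database and write the shrunken child list back to disk, but the ordering argument is the same: the parent keeps pointing at a child until that child has really been deleted.
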
+ +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 99 ++++++++++++++++---------------- + tools/xenstore/xenstored_watch.c | 4 +- + tools/xenstore/xenstored_watch.h | 2 +- + 3 files changed, 54 insertions(+), 51 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 6afd58431111..1cb729a2cd5f 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -1087,74 +1087,76 @@ static int do_mkdir(struct connection *conn, struct buffered_data *in) + return 0; + } + +-static void delete_node(struct connection *conn, struct node *node) +-{ +- unsigned int i; +- char *name; +- +- /* Delete self, then delete children. If we crash, then the worst +- that can happen is the children will continue to take up space, but +- will otherwise be unreachable. */ +- delete_node_single(conn, node); +- +- /* Delete children, too. */ +- for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) { +- struct node *child; +- +- name = talloc_asprintf(node, "%s/%s", node->name, +- node->children + i); +- child = name ? read_node(conn, node, name) : NULL; +- if (child) { +- delete_node(conn, child); +- } +- else { +- trace("delete_node: Error deleting child '%s/%s'!\n", +- node->name, node->children + i); +- /* Skip it, we've already deleted the parent. */ +- } +- talloc_free(name); +- } +-} +- +- + /* Delete memory using memmove. */ + static void memdel(void *mem, unsigned off, unsigned len, unsigned total) + { + memmove(mem + off, mem + off + len, total - off - len); + } + +- +-static int remove_child_entry(struct connection *conn, struct node *node, +- size_t offset) ++static void remove_child_entry(struct connection *conn, struct node *node, ++ size_t offset) + { + size_t childlen = strlen(node->children + offset); ++ + memdel(node->children, offset, childlen + 1, node->childlen); + node->childlen -= childlen + 1; +- return write_node(conn, node, true); ++ if (write_node(conn, node, true)) ++ corrupt(conn, "Can't update parent node '%s'", node->name); + } + +- +-static int delete_child(struct connection *conn, +- struct node *node, const char *childname) ++static void delete_child(struct connection *conn, ++ struct node *node, const char *childname) + { + unsigned int i; + + for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) { + if (streq(node->children+i, childname)) { +- return remove_child_entry(conn, node, i); ++ remove_child_entry(conn, node, i); ++ return; + } + } + corrupt(conn, "Can't find child '%s' in %s", childname, node->name); +- return ENOENT; + } + ++static int delete_node(struct connection *conn, struct node *parent, ++ struct node *node) ++{ ++ char *name; ++ ++ /* Delete children. */ ++ while (node->childlen) { ++ struct node *child; ++ ++ name = talloc_asprintf(node, "%s/%s", node->name, ++ node->children); ++ child = name ? read_node(conn, node, name) : NULL; ++ if (child) { ++ if (delete_node(conn, node, child)) ++ return errno; ++ } else { ++ trace("delete_node: Error deleting child '%s/%s'!\n", ++ node->name, node->children); ++ /* Quit deleting. 
*/ ++ errno = ENOMEM; ++ return errno; ++ } ++ talloc_free(name); ++ } ++ ++ delete_node_single(conn, node); ++ delete_child(conn, parent, basename(node->name)); ++ talloc_free(node); ++ ++ return 0; ++} + + static int _rm(struct connection *conn, const void *ctx, struct node *node, + const char *name) + { +- /* Delete from parent first, then if we crash, the worst that can +- happen is the child will continue to take up space, but will +- otherwise be unreachable. */ ++ /* ++ * Deleting node by node, so the result is always consistent even in ++ * case of a failure. ++ */ + struct node *parent; + char *parentname = get_parent(ctx, name); + +@@ -1165,11 +1167,13 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node, + if (!parent) + return (errno == ENOMEM) ? ENOMEM : EINVAL; + +- if (delete_child(conn, parent, basename(name))) +- return EINVAL; +- +- delete_node(conn, node); +- return 0; ++ /* ++ * Fire the watches now, when we can still see the node permissions. ++ * This fine as we are single threaded and the next possible read will ++ * be handled only after the node has been really removed. ++ */ ++ fire_watches(conn, ctx, name, true); ++ return delete_node(conn, parent, node); + } + + +@@ -1207,7 +1211,6 @@ static int do_rm(struct connection *conn, struct buffered_data *in) + if (ret) + return ret; + +- fire_watches(conn, in, name, true); + send_ack(conn, XS_RM); + + return 0; +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index f2f1bed47cc6..f0bbfe7a6dc6 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -77,7 +77,7 @@ static bool is_child(const char *child, const char *parent) + * Temporary memory allocations are done with ctx. + */ + static void add_event(struct connection *conn, +- void *ctx, ++ const void *ctx, + struct watch *watch, + const char *name) + { +@@ -121,7 +121,7 @@ static void add_event(struct connection *conn, + * Check whether any watch events are to be sent. + * Temporary memory allocations are done with ctx. + */ +-void fire_watches(struct connection *conn, void *ctx, const char *name, ++void fire_watches(struct connection *conn, const void *ctx, const char *name, + bool recurse) + { + struct connection *i; +diff --git a/tools/xenstore/xenstored_watch.h b/tools/xenstore/xenstored_watch.h +index c72ea6a68542..54d4ea7e0d41 100644 +--- a/tools/xenstore/xenstored_watch.h ++++ b/tools/xenstore/xenstored_watch.h +@@ -25,7 +25,7 @@ int do_watch(struct connection *conn, struct buffered_data *in); + int do_unwatch(struct connection *conn, struct buffered_data *in); + + /* Fire all watches: recurse means all the children are affected (ie. rm). 
*/ +-void fire_watches(struct connection *conn, void *tmp, const char *name, ++void fire_watches(struct connection *conn, const void *tmp, const char *name, + bool recurse); + + void conn_delete_all_watches(struct connection *conn); +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0007-tools-xenstore-fire-watches-only-when-removing-a-spe.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0007-tools-xenstore-fire-watches-only-when-removing-a-spe.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0007-tools-xenstore-fire-watches-only-when-removing-a-spe.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0007-tools-xenstore-fire-watches-only-when-removing-a-spe.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,118 @@ +From 6ca2e14b43aecc79effc1a0cd528a4aceef44d42 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:43 +0200 +Subject: [PATCH 07/10] tools/xenstore: fire watches only when removing a + specific node + +Instead of firing all watches for removing a subtree in one go, do so +only when the related node is being removed. + +The watches for the top-most node being removed include all watches +including that node, while watches for nodes below that are only fired +if they are matching exactly. This avoids firing any watch more than +once when removing a subtree. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 11 ++++++----- + tools/xenstore/xenstored_watch.c | 13 ++++++++----- + tools/xenstore/xenstored_watch.h | 4 ++-- + 3 files changed, 16 insertions(+), 12 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 1cb729a2cd5f..d7c025616ead 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -1118,8 +1118,8 @@ static void delete_child(struct connection *conn, + corrupt(conn, "Can't find child '%s' in %s", childname, node->name); + } + +-static int delete_node(struct connection *conn, struct node *parent, +- struct node *node) ++static int delete_node(struct connection *conn, const void *ctx, ++ struct node *parent, struct node *node) + { + char *name; + +@@ -1131,7 +1131,7 @@ static int delete_node(struct connection *conn, struct node *parent, + node->children); + child = name ? read_node(conn, node, name) : NULL; + if (child) { +- if (delete_node(conn, node, child)) ++ if (delete_node(conn, ctx, node, child)) + return errno; + } else { + trace("delete_node: Error deleting child '%s/%s'!\n", +@@ -1143,6 +1143,7 @@ static int delete_node(struct connection *conn, struct node *parent, + talloc_free(name); + } + ++ fire_watches(conn, ctx, node->name, true); + delete_node_single(conn, node); + delete_child(conn, parent, basename(node->name)); + talloc_free(node); +@@ -1172,8 +1173,8 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node, + * This fine as we are single threaded and the next possible read will + * be handled only after the node has been really removed. 
+ */ +- fire_watches(conn, ctx, name, true); +- return delete_node(conn, parent, node); ++ fire_watches(conn, ctx, name, false); ++ return delete_node(conn, ctx, parent, node); + } + + +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index f0bbfe7a6dc6..3836675459fa 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -122,7 +122,7 @@ static void add_event(struct connection *conn, + * Temporary memory allocations are done with ctx. + */ + void fire_watches(struct connection *conn, const void *ctx, const char *name, +- bool recurse) ++ bool exact) + { + struct connection *i; + struct watch *watch; +@@ -134,10 +134,13 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name, + /* Create an event for each watch. */ + list_for_each_entry(i, &connections, list) { + list_for_each_entry(watch, &i->watches, list) { +- if (is_child(name, watch->node)) +- add_event(i, ctx, watch, name); +- else if (recurse && is_child(watch->node, name)) +- add_event(i, ctx, watch, watch->node); ++ if (exact) { ++ if (streq(name, watch->node)) ++ add_event(i, ctx, watch, name); ++ } else { ++ if (is_child(name, watch->node)) ++ add_event(i, ctx, watch, name); ++ } + } + } + } +diff --git a/tools/xenstore/xenstored_watch.h b/tools/xenstore/xenstored_watch.h +index 54d4ea7e0d41..1b3c80d3dda1 100644 +--- a/tools/xenstore/xenstored_watch.h ++++ b/tools/xenstore/xenstored_watch.h +@@ -24,9 +24,9 @@ + int do_watch(struct connection *conn, struct buffered_data *in); + int do_unwatch(struct connection *conn, struct buffered_data *in); + +-/* Fire all watches: recurse means all the children are affected (ie. rm). */ ++/* Fire all watches: !exact means all the children are affected (ie. rm). */ + void fire_watches(struct connection *conn, const void *tmp, const char *name, +- bool recurse); ++ bool exact); + + void conn_delete_all_watches(struct connection *conn); + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0008-tools-xenstore-introduce-node_perms-structure.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0008-tools-xenstore-introduce-node_perms-structure.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0008-tools-xenstore-introduce-node_perms-structure.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0008-tools-xenstore-introduce-node_perms-structure.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,289 @@ +From 2d4f410899bf59e112c107f371c3d164f8a592f8 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:44 +0200 +Subject: [PATCH 08/10] tools/xenstore: introduce node_perms structure + +There are several places in xenstored using a permission array and the +size of that array. Introduce a new struct node_perms containing both. + +This is part of XSA-115. 
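
The structure introduced here simply keeps the permission array and its element count together, so routines such as perm_for_conn() take one argument instead of a separate pointer/length pair that could get out of step. A minimal standalone sketch of the lookup pattern follows; perm_set, perm_entry and perms_for_domain are invented names rather than the actual xenstored types. As in the patched code, entry 0 describes the owner and the default, and later entries are per-domain overrides.

    /* Illustrative sketch only, not xenstored code: a permission array
     * bundled with its length, and a lookup in the style of
     * perm_for_conn(). */
    #include <stdio.h>

    enum perm_type { PERM_NONE = 0, PERM_READ = 1, PERM_WRITE = 2 };

    struct perm_entry {
        unsigned int domid;
        unsigned int perms;           /* bitmask of enum perm_type */
    };

    struct perm_set {                 /* plays the role of node_perms */
        unsigned int num;
        const struct perm_entry *p;
    };

    static unsigned int perms_for_domain(const struct perm_set *set,
                                         unsigned int domid)
    {
        unsigned int i;

        if (set->p[0].domid == domid)     /* owner gets everything */
            return PERM_READ | PERM_WRITE;

        for (i = 1; i < set->num; i++)    /* per-domain overrides */
            if (set->p[i].domid == domid)
                return set->p[i].perms;

        return set->p[0].perms;           /* default for everyone else */
    }

    int main(void)
    {
        static const struct perm_entry acl[] = {
            { 0, PERM_NONE },             /* owned by dom0; others: none */
            { 5, PERM_READ },             /* domid 5 may read */
        };
        const struct perm_set set = { 2, acl };

        printf("dom5 perms %u, dom7 perms %u\n",
               perms_for_domain(&set, 5), perms_for_domain(&set, 7));
        return 0;
    }

Carrying the pair around as one value is also what lets the later patches in this series attach the same kind of permission set to the special "@releaseDomain" and "@introduceDomain" paths and reuse the same lookup for them.
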
+ +Signed-off-by: Juergen Gross +Acked-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 79 +++++++++++++++---------------- + tools/xenstore/xenstored_core.h | 8 +++- + tools/xenstore/xenstored_domain.c | 12 ++--- + 3 files changed, 50 insertions(+), 49 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index d7c025616ead..fe9943113b9f 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -401,14 +401,14 @@ static struct node *read_node(struct connection *conn, const void *ctx, + /* Datalen, childlen, number of permissions */ + hdr = (void *)data.dptr; + node->generation = hdr->generation; +- node->num_perms = hdr->num_perms; ++ node->perms.num = hdr->num_perms; + node->datalen = hdr->datalen; + node->childlen = hdr->childlen; + + /* Permissions are struct xs_permissions. */ +- node->perms = hdr->perms; ++ node->perms.p = hdr->perms; + /* Data is binary blob (usually ascii, no nul). */ +- node->data = node->perms + node->num_perms; ++ node->data = node->perms.p + node->perms.num; + /* Children is strings, nul separated. */ + node->children = node->data + node->datalen; + +@@ -425,7 +425,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, + struct xs_tdb_record_hdr *hdr; + + data.dsize = sizeof(*hdr) +- + node->num_perms*sizeof(node->perms[0]) ++ + node->perms.num * sizeof(node->perms.p[0]) + + node->datalen + node->childlen; + + if (!no_quota_check && domain_is_unprivileged(conn) && +@@ -437,12 +437,13 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, + data.dptr = talloc_size(node, data.dsize); + hdr = (void *)data.dptr; + hdr->generation = node->generation; +- hdr->num_perms = node->num_perms; ++ hdr->num_perms = node->perms.num; + hdr->datalen = node->datalen; + hdr->childlen = node->childlen; + +- memcpy(hdr->perms, node->perms, node->num_perms*sizeof(node->perms[0])); +- p = hdr->perms + node->num_perms; ++ memcpy(hdr->perms, node->perms.p, ++ node->perms.num * sizeof(*node->perms.p)); ++ p = hdr->perms + node->perms.num; + memcpy(p, node->data, node->datalen); + p += node->datalen; + memcpy(p, node->children, node->childlen); +@@ -468,8 +469,7 @@ static int write_node(struct connection *conn, struct node *node, + } + + static enum xs_perm_type perm_for_conn(struct connection *conn, +- struct xs_permissions *perms, +- unsigned int num) ++ const struct node_perms *perms) + { + unsigned int i; + enum xs_perm_type mask = XS_PERM_READ|XS_PERM_WRITE|XS_PERM_OWNER; +@@ -478,16 +478,16 @@ static enum xs_perm_type perm_for_conn(struct connection *conn, + mask &= ~XS_PERM_WRITE; + + /* Owners and tools get it all... 
*/ +- if (!domain_is_unprivileged(conn) || perms[0].id == conn->id +- || (conn->target && perms[0].id == conn->target->id)) ++ if (!domain_is_unprivileged(conn) || perms->p[0].id == conn->id ++ || (conn->target && perms->p[0].id == conn->target->id)) + return (XS_PERM_READ|XS_PERM_WRITE|XS_PERM_OWNER) & mask; + +- for (i = 1; i < num; i++) +- if (perms[i].id == conn->id +- || (conn->target && perms[i].id == conn->target->id)) +- return perms[i].perms & mask; ++ for (i = 1; i < perms->num; i++) ++ if (perms->p[i].id == conn->id ++ || (conn->target && perms->p[i].id == conn->target->id)) ++ return perms->p[i].perms & mask; + +- return perms[0].perms & mask; ++ return perms->p[0].perms & mask; + } + + /* +@@ -534,7 +534,7 @@ static int ask_parents(struct connection *conn, const void *ctx, + return 0; + } + +- *perm = perm_for_conn(conn, node->perms, node->num_perms); ++ *perm = perm_for_conn(conn, &node->perms); + return 0; + } + +@@ -580,8 +580,7 @@ struct node *get_node(struct connection *conn, + node = read_node(conn, ctx, name); + /* If we don't have permission, we don't have node. */ + if (node) { +- if ((perm_for_conn(conn, node->perms, node->num_perms) & perm) +- != perm) { ++ if ((perm_for_conn(conn, &node->perms) & perm) != perm) { + errno = EACCES; + node = NULL; + } +@@ -757,16 +756,15 @@ const char *onearg(struct buffered_data *in) + return in->buffer; + } + +-static char *perms_to_strings(const void *ctx, +- struct xs_permissions *perms, unsigned int num, ++static char *perms_to_strings(const void *ctx, const struct node_perms *perms, + unsigned int *len) + { + unsigned int i; + char *strings = NULL; + char buffer[MAX_STRLEN(unsigned int) + 1]; + +- for (*len = 0, i = 0; i < num; i++) { +- if (!xs_perm_to_string(&perms[i], buffer, sizeof(buffer))) ++ for (*len = 0, i = 0; i < perms->num; i++) { ++ if (!xs_perm_to_string(&perms->p[i], buffer, sizeof(buffer))) + return NULL; + + strings = talloc_realloc(ctx, strings, char, +@@ -945,13 +943,13 @@ static struct node *construct_node(struct connection *conn, const void *ctx, + goto nomem; + + /* Inherit permissions, except unprivileged domains own what they create */ +- node->num_perms = parent->num_perms; +- node->perms = talloc_memdup(node, parent->perms, +- node->num_perms * sizeof(node->perms[0])); +- if (!node->perms) ++ node->perms.num = parent->perms.num; ++ node->perms.p = talloc_memdup(node, parent->perms.p, ++ node->perms.num * sizeof(*node->perms.p)); ++ if (!node->perms.p) + goto nomem; + if (domain_is_unprivileged(conn)) +- node->perms[0].id = conn->id; ++ node->perms.p[0].id = conn->id; + + /* No children, no data */ + node->children = node->data = NULL; +@@ -1228,7 +1226,7 @@ static int do_get_perms(struct connection *conn, struct buffered_data *in) + if (!node) + return errno; + +- strings = perms_to_strings(node, node->perms, node->num_perms, &len); ++ strings = perms_to_strings(node, &node->perms, &len); + if (!strings) + return errno; + +@@ -1239,13 +1237,12 @@ static int do_get_perms(struct connection *conn, struct buffered_data *in) + + static int do_set_perms(struct connection *conn, struct buffered_data *in) + { +- unsigned int num; +- struct xs_permissions *perms; ++ struct node_perms perms; + char *name, *permstr; + struct node *node; + +- num = xs_count_strings(in->buffer, in->used); +- if (num < 2) ++ perms.num = xs_count_strings(in->buffer, in->used); ++ if (perms.num < 2) + return EINVAL; + + /* First arg is node name. 
*/ +@@ -1256,21 +1253,21 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + return errno; + + permstr = in->buffer + strlen(in->buffer) + 1; +- num--; ++ perms.num--; + +- perms = talloc_array(node, struct xs_permissions, num); +- if (!perms) ++ perms.p = talloc_array(node, struct xs_permissions, perms.num); ++ if (!perms.p) + return ENOMEM; +- if (!xs_strings_to_perms(perms, num, permstr)) ++ if (!xs_strings_to_perms(perms.p, perms.num, permstr)) + return errno; + + /* Unprivileged domains may not change the owner. */ +- if (domain_is_unprivileged(conn) && perms[0].id != node->perms[0].id) ++ if (domain_is_unprivileged(conn) && ++ perms.p[0].id != node->perms.p[0].id) + return EPERM; + + domain_entry_dec(conn, node); + node->perms = perms; +- node->num_perms = num; + domain_entry_inc(conn, node); + + if (write_node(conn, node, false)) +@@ -1545,8 +1542,8 @@ static void manual_node(const char *name, const char *child) + barf_perror("Could not allocate initial node %s", name); + + node->name = name; +- node->perms = &perms; +- node->num_perms = 1; ++ node->perms.p = &perms; ++ node->perms.num = 1; + node->children = (char *)child; + if (child) + node->childlen = strlen(child) + 1; +diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h +index 3cb1c235a101..193d93142636 100644 +--- a/tools/xenstore/xenstored_core.h ++++ b/tools/xenstore/xenstored_core.h +@@ -109,6 +109,11 @@ struct connection + }; + extern struct list_head connections; + ++struct node_perms { ++ unsigned int num; ++ struct xs_permissions *p; ++}; ++ + struct node { + const char *name; + +@@ -120,8 +125,7 @@ struct node { + #define NO_GENERATION ~((uint64_t)0) + + /* Permissions. */ +- unsigned int num_perms; +- struct xs_permissions *perms; ++ struct node_perms perms; + + /* Contents. 
*/ + unsigned int datalen; +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index 0e2926e2a3d0..dc51cdfa9aa7 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -657,12 +657,12 @@ void domain_entry_inc(struct connection *conn, struct node *node) + if (!conn) + return; + +- if (node->perms && node->perms[0].id != conn->id) { ++ if (node->perms.p && node->perms.p[0].id != conn->id) { + if (conn->transaction) { + transaction_entry_inc(conn->transaction, +- node->perms[0].id); ++ node->perms.p[0].id); + } else { +- d = find_domain_by_domid(node->perms[0].id); ++ d = find_domain_by_domid(node->perms.p[0].id); + if (d) + d->nbentry++; + } +@@ -683,12 +683,12 @@ void domain_entry_dec(struct connection *conn, struct node *node) + if (!conn) + return; + +- if (node->perms && node->perms[0].id != conn->id) { ++ if (node->perms.p && node->perms.p[0].id != conn->id) { + if (conn->transaction) { + transaction_entry_dec(conn->transaction, +- node->perms[0].id); ++ node->perms.p[0].id); + } else { +- d = find_domain_by_domid(node->perms[0].id); ++ d = find_domain_by_domid(node->perms.p[0].id); + if (d && d->nbentry) + d->nbentry--; + } +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0009-tools-xenstore-allow-special-watches-for-privileged-.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0009-tools-xenstore-allow-special-watches-for-privileged-.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0009-tools-xenstore-allow-special-watches-for-privileged-.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0009-tools-xenstore-allow-special-watches-for-privileged-.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,237 @@ +From cddf74031b3c8a108e8fd7db0bf56e9c2809d3e2 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:45 +0200 +Subject: [PATCH 09/10] tools/xenstore: allow special watches for privileged + callers only + +The special watches "@introduceDomain" and "@releaseDomain" should be +allowed for privileged callers only, as they allow to gain information +about presence of other guests on the host. So send watch events for +those watches via privileged connections only. + +In order to allow for disaggregated setups where e.g. driver domains +need to make use of those special watches add support for calling +"set permissions" for those special nodes, too. + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + docs/misc/xenstore.txt | 5 +++ + tools/xenstore/xenstored_core.c | 27 ++++++++------ + tools/xenstore/xenstored_core.h | 2 ++ + tools/xenstore/xenstored_domain.c | 60 +++++++++++++++++++++++++++++++ + tools/xenstore/xenstored_domain.h | 5 +++ + tools/xenstore/xenstored_watch.c | 4 +++ + 6 files changed, 93 insertions(+), 10 deletions(-) + +diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt +index 6f8569d5760f..32969eb3fecd 100644 +--- a/docs/misc/xenstore.txt ++++ b/docs/misc/xenstore.txt +@@ -170,6 +170,9 @@ SET_PERMS ||+? + n no access + See http://wiki.xen.org/wiki/XenBus section + `Permissions' for details of the permissions system. ++ It is possible to set permissions for the special watch paths ++ "@introduceDomain" and "@releaseDomain" to enable receiving those ++ watches in unprivileged domains. + + ---------- Watches ---------- + +@@ -194,6 +197,8 @@ WATCH ||? 
+ @releaseDomain occurs on any domain crash or + shutdown, and also on RELEASE + and domain destruction ++ events are sent to privileged callers or explicitly ++ via SET_PERMS enabled domains only. + + When a watch is first set up it is triggered once straight + away, with equal to . Watches may be triggered +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index fe9943113b9f..720bec269dd3 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -468,8 +468,8 @@ static int write_node(struct connection *conn, struct node *node, + return write_node_raw(conn, &key, node, no_quota_check); + } + +-static enum xs_perm_type perm_for_conn(struct connection *conn, +- const struct node_perms *perms) ++enum xs_perm_type perm_for_conn(struct connection *conn, ++ const struct node_perms *perms) + { + unsigned int i; + enum xs_perm_type mask = XS_PERM_READ|XS_PERM_WRITE|XS_PERM_OWNER; +@@ -1245,22 +1245,29 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + if (perms.num < 2) + return EINVAL; + +- /* First arg is node name. */ +- /* We must own node to do this (tools can do this too). */ +- node = get_node_canonicalized(conn, in, in->buffer, &name, +- XS_PERM_WRITE | XS_PERM_OWNER); +- if (!node) +- return errno; +- + permstr = in->buffer + strlen(in->buffer) + 1; + perms.num--; + +- perms.p = talloc_array(node, struct xs_permissions, perms.num); ++ perms.p = talloc_array(in, struct xs_permissions, perms.num); + if (!perms.p) + return ENOMEM; + if (!xs_strings_to_perms(perms.p, perms.num, permstr)) + return errno; + ++ /* First arg is node name. */ ++ if (strstarts(in->buffer, "@")) { ++ if (set_perms_special(conn, in->buffer, &perms)) ++ return errno; ++ send_ack(conn, XS_SET_PERMS); ++ return 0; ++ } ++ ++ /* We must own node to do this (tools can do this too). */ ++ node = get_node_canonicalized(conn, in, in->buffer, &name, ++ XS_PERM_WRITE | XS_PERM_OWNER); ++ if (!node) ++ return errno; ++ + /* Unprivileged domains may not change the owner. */ + if (domain_is_unprivileged(conn) && + perms.p[0].id != node->perms.p[0].id) +diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h +index 193d93142636..f3da6bbc943d 100644 +--- a/tools/xenstore/xenstored_core.h ++++ b/tools/xenstore/xenstored_core.h +@@ -165,6 +165,8 @@ struct node *get_node(struct connection *conn, + struct connection *new_connection(connwritefn_t *write, connreadfn_t *read); + void check_store(void); + void corrupt(struct connection *conn, const char *fmt, ...); ++enum xs_perm_type perm_for_conn(struct connection *conn, ++ const struct node_perms *perms); + + /* Is this a valid node name? 
*/ + bool is_valid_nodename(const char *node); +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index dc51cdfa9aa7..7afabe0ae084 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -41,6 +41,9 @@ static evtchn_port_t virq_port; + + xenevtchn_handle *xce_handle = NULL; + ++static struct node_perms dom_release_perms; ++static struct node_perms dom_introduce_perms; ++ + struct domain + { + struct list_head list; +@@ -589,6 +592,59 @@ void restore_existing_connections(void) + { + } + ++static int set_dom_perms_default(struct node_perms *perms) ++{ ++ perms->num = 1; ++ perms->p = talloc_array(NULL, struct xs_permissions, perms->num); ++ if (!perms->p) ++ return -1; ++ perms->p->id = 0; ++ perms->p->perms = XS_PERM_NONE; ++ ++ return 0; ++} ++ ++static struct node_perms *get_perms_special(const char *name) ++{ ++ if (!strcmp(name, "@releaseDomain")) ++ return &dom_release_perms; ++ if (!strcmp(name, "@introduceDomain")) ++ return &dom_introduce_perms; ++ return NULL; ++} ++ ++int set_perms_special(struct connection *conn, const char *name, ++ struct node_perms *perms) ++{ ++ struct node_perms *p; ++ ++ p = get_perms_special(name); ++ if (!p) ++ return EINVAL; ++ ++ if ((perm_for_conn(conn, p) & (XS_PERM_WRITE | XS_PERM_OWNER)) != ++ (XS_PERM_WRITE | XS_PERM_OWNER)) ++ return EACCES; ++ ++ p->num = perms->num; ++ talloc_free(p->p); ++ p->p = perms->p; ++ talloc_steal(NULL, perms->p); ++ ++ return 0; ++} ++ ++bool check_perms_special(const char *name, struct connection *conn) ++{ ++ struct node_perms *p; ++ ++ p = get_perms_special(name); ++ if (!p) ++ return false; ++ ++ return perm_for_conn(conn, p) & XS_PERM_READ; ++} ++ + static int dom0_init(void) + { + evtchn_port_t port; +@@ -610,6 +666,10 @@ static int dom0_init(void) + + xenevtchn_notify(xce_handle, dom0->port); + ++ if (set_dom_perms_default(&dom_release_perms) || ++ set_dom_perms_default(&dom_introduce_perms)) ++ return -1; ++ + return 0; + } + +diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h +index 56ae01597475..259183962a9c 100644 +--- a/tools/xenstore/xenstored_domain.h ++++ b/tools/xenstore/xenstored_domain.h +@@ -65,6 +65,11 @@ void domain_watch_inc(struct connection *conn); + void domain_watch_dec(struct connection *conn); + int domain_watch(struct connection *conn); + ++/* Special node permission handling. */ ++int set_perms_special(struct connection *conn, const char *name, ++ struct node_perms *perms); ++bool check_perms_special(const char *name, struct connection *conn); ++ + /* Write rate limiting */ + + #define WRL_FACTOR 1000 /* for fixed-point arithmetic */ +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index 3836675459fa..f4e289362eb6 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -133,6 +133,10 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name, + + /* Create an event for each watch. 
*/ + list_for_each_entry(i, &connections, list) { ++ /* introduce/release domain watches */ ++ if (check_special_event(name) && !check_perms_special(name, i)) ++ continue; ++ + list_for_each_entry(watch, &i->watches, list) { + if (exact) { + if (streq(name, watch->node)) +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/0010-tools-xenstore-avoid-watch-events-for-nodes-without-.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/0010-tools-xenstore-avoid-watch-events-for-nodes-without-.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/0010-tools-xenstore-avoid-watch-events-for-nodes-without-.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/0010-tools-xenstore-avoid-watch-events-for-nodes-without-.patch 2022-05-26 17:34:05.000000000 +0100 @@ -0,0 +1,375 @@ +From e57b7687b43b033fe45e755e285efbe67bc71921 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Thu, 11 Jun 2020 16:12:46 +0200 +Subject: [PATCH 10/10] tools/xenstore: avoid watch events for nodes without + access + +Today watch events are sent regardless of the access rights of the +node the event is sent for. This enables any guest to e.g. setup a +watch for "/" in order to have a detailed record of all Xenstore +modifications. + +Modify that by sending only watch events for nodes that the watcher +has a chance to see otherwise (either via direct reads or by querying +the children of a node). This includes cases where the visibility of +a node for a watcher is changing (permissions being removed). + +This is part of XSA-115. + +Signed-off-by: Juergen Gross +[julieng: Handle rebase conflict] +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +--- + tools/xenstore/xenstored_core.c | 28 +++++----- + tools/xenstore/xenstored_core.h | 15 ++++-- + tools/xenstore/xenstored_domain.c | 6 +-- + tools/xenstore/xenstored_transaction.c | 21 +++++++- + tools/xenstore/xenstored_watch.c | 75 +++++++++++++++++++------- + tools/xenstore/xenstored_watch.h | 2 +- + 6 files changed, 104 insertions(+), 43 deletions(-) + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 720bec269dd3..1c2845454560 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -358,8 +358,8 @@ static void initialize_fds(int sock, int *p_sock_pollfd_idx, + * If it fails, returns NULL and sets errno. + * Temporary memory allocations will be done with ctx. + */ +-static struct node *read_node(struct connection *conn, const void *ctx, +- const char *name) ++struct node *read_node(struct connection *conn, const void *ctx, ++ const char *name) + { + TDB_DATA key, data; + struct xs_tdb_record_hdr *hdr; +@@ -494,7 +494,7 @@ enum xs_perm_type perm_for_conn(struct connection *conn, + * Get name of node parent. + * Temporary memory allocations are done with ctx. + */ +-static char *get_parent(const void *ctx, const char *node) ++char *get_parent(const void *ctx, const char *node) + { + char *parent; + char *slash = strrchr(node + 1, '/'); +@@ -566,10 +566,10 @@ static int errno_from_parents(struct connection *conn, const void *ctx, + * If it fails, returns NULL and sets errno. + * Temporary memory allocations are done with ctx. 
+ */ +-struct node *get_node(struct connection *conn, +- const void *ctx, +- const char *name, +- enum xs_perm_type perm) ++static struct node *get_node(struct connection *conn, ++ const void *ctx, ++ const char *name, ++ enum xs_perm_type perm) + { + struct node *node; + +@@ -1056,7 +1056,7 @@ static int do_write(struct connection *conn, struct buffered_data *in) + return errno; + } + +- fire_watches(conn, in, name, false); ++ fire_watches(conn, in, name, node, false, NULL); + send_ack(conn, XS_WRITE); + + return 0; +@@ -1078,7 +1078,7 @@ static int do_mkdir(struct connection *conn, struct buffered_data *in) + node = create_node(conn, in, name, NULL, 0); + if (!node) + return errno; +- fire_watches(conn, in, name, false); ++ fire_watches(conn, in, name, node, false, NULL); + } + send_ack(conn, XS_MKDIR); + +@@ -1141,7 +1141,7 @@ static int delete_node(struct connection *conn, const void *ctx, + talloc_free(name); + } + +- fire_watches(conn, ctx, node->name, true); ++ fire_watches(conn, ctx, node->name, node, true, NULL); + delete_node_single(conn, node); + delete_child(conn, parent, basename(node->name)); + talloc_free(node); +@@ -1165,13 +1165,14 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node, + parent = read_node(conn, ctx, parentname); + if (!parent) + return (errno == ENOMEM) ? ENOMEM : EINVAL; ++ node->parent = parent; + + /* + * Fire the watches now, when we can still see the node permissions. + * This fine as we are single threaded and the next possible read will + * be handled only after the node has been really removed. + */ +- fire_watches(conn, ctx, name, false); ++ fire_watches(conn, ctx, name, node, false, NULL); + return delete_node(conn, ctx, parent, node); + } + +@@ -1237,7 +1238,7 @@ static int do_get_perms(struct connection *conn, struct buffered_data *in) + + static int do_set_perms(struct connection *conn, struct buffered_data *in) + { +- struct node_perms perms; ++ struct node_perms perms, old_perms; + char *name, *permstr; + struct node *node; + +@@ -1273,6 +1274,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + perms.p[0].id != node->perms.p[0].id) + return EPERM; + ++ old_perms = node->perms; + domain_entry_dec(conn, node); + node->perms = perms; + domain_entry_inc(conn, node); +@@ -1280,7 +1282,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + if (write_node(conn, node, false)) + return errno; + +- fire_watches(conn, in, name, false); ++ fire_watches(conn, in, name, node, false, &old_perms); + send_ack(conn, XS_SET_PERMS); + + return 0; +diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h +index f3da6bbc943d..e050b27cbdde 100644 +--- a/tools/xenstore/xenstored_core.h ++++ b/tools/xenstore/xenstored_core.h +@@ -152,15 +152,17 @@ void send_ack(struct connection *conn, enum xsd_sockmsg_type type); + /* Canonicalize this path if possible. */ + char *canonicalize(struct connection *conn, const void *ctx, const char *node); + ++/* Get access permissions. */ ++enum xs_perm_type perm_for_conn(struct connection *conn, ++ const struct node_perms *perms); ++ + /* Write a node to the tdb data base. */ + int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, + bool no_quota_check); + +-/* Get this node, checking we have permissions. */ +-struct node *get_node(struct connection *conn, +- const void *ctx, +- const char *name, +- enum xs_perm_type perm); ++/* Get a node from the tdb data base. 
*/ ++struct node *read_node(struct connection *conn, const void *ctx, ++ const char *name); + + struct connection *new_connection(connwritefn_t *write, connreadfn_t *read); + void check_store(void); +@@ -171,6 +173,9 @@ enum xs_perm_type perm_for_conn(struct connection *conn, + /* Is this a valid node name? */ + bool is_valid_nodename(const char *node); + ++/* Get name of parent node. */ ++char *get_parent(const void *ctx, const char *node); ++ + /* Tracing infrastructure. */ + void trace_create(const void *data, const char *type); + void trace_destroy(const void *data, const char *type); +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index 7afabe0ae084..711a11b18ad6 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -206,7 +206,7 @@ static int destroy_domain(void *_domain) + unmap_interface(domain->interface); + } + +- fire_watches(NULL, domain, "@releaseDomain", false); ++ fire_watches(NULL, domain, "@releaseDomain", NULL, false, NULL); + + wrl_domain_destroy(domain); + +@@ -244,7 +244,7 @@ static void domain_cleanup(void) + } + + if (notify) +- fire_watches(NULL, NULL, "@releaseDomain", false); ++ fire_watches(NULL, NULL, "@releaseDomain", NULL, false, NULL); + } + + /* We scan all domains rather than use the information given here. */ +@@ -410,7 +410,7 @@ int do_introduce(struct connection *conn, struct buffered_data *in) + /* Now domain belongs to its connection. */ + talloc_steal(domain->conn, domain); + +- fire_watches(NULL, in, "@introduceDomain", false); ++ fire_watches(NULL, in, "@introduceDomain", NULL, false, NULL); + } else if ((domain->mfn == mfn) && (domain->conn != conn)) { + /* Use XS_INTRODUCE for recreating the xenbus event-channel. */ + if (domain->port) +diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c +index e87897573469..a7d8c5d475ec 100644 +--- a/tools/xenstore/xenstored_transaction.c ++++ b/tools/xenstore/xenstored_transaction.c +@@ -114,6 +114,9 @@ struct accessed_node + /* Generation count (or NO_GENERATION) for conflict checking. */ + uint64_t generation; + ++ /* Original node permissions. */ ++ struct node_perms perms; ++ + /* Generation count checking required? */ + bool check_gen; + +@@ -260,6 +263,15 @@ int access_node(struct connection *conn, struct node *node, + i->node = talloc_strdup(i, node->name); + if (!i->node) + goto nomem; ++ if (node->generation != NO_GENERATION && node->perms.num) { ++ i->perms.p = talloc_array(i, struct xs_permissions, ++ node->perms.num); ++ if (!i->perms.p) ++ goto nomem; ++ i->perms.num = node->perms.num; ++ memcpy(i->perms.p, node->perms.p, ++ i->perms.num * sizeof(*i->perms.p)); ++ } + + introduce = true; + i->ta_node = false; +@@ -368,9 +380,14 @@ static int finalize_transaction(struct connection *conn, + talloc_free(data.dptr); + if (ret) + goto err; +- } else if (tdb_delete(tdb_ctx, key)) ++ fire_watches(conn, trans, i->node, NULL, false, ++ i->perms.p ? &i->perms : NULL); ++ } else { ++ fire_watches(conn, trans, i->node, NULL, false, ++ i->perms.p ? 
&i->perms : NULL); ++ if (tdb_delete(tdb_ctx, key)) + goto err; +- fire_watches(conn, trans, i->node, false); ++ } + } + + if (i->ta_node && tdb_delete(tdb_ctx, ta_key)) +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index f4e289362eb6..71c108ea99f1 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -85,22 +85,6 @@ static void add_event(struct connection *conn, + unsigned int len; + char *data; + +- if (!check_special_event(name)) { +- /* Can this conn load node, or see that it doesn't exist? */ +- struct node *node = get_node(conn, ctx, name, XS_PERM_READ); +- /* +- * XXX We allow EACCES here because otherwise a non-dom0 +- * backend driver cannot watch for disappearance of a frontend +- * xenstore directory. When the directory disappears, we +- * revert to permissions of the parent directory for that path, +- * which will typically disallow access for the backend. +- * But this breaks device-channel teardown! +- * Really we should fix this better... +- */ +- if (!node && errno != ENOENT && errno != EACCES) +- return; +- } +- + if (watch->relative_path) { + name += strlen(watch->relative_path); + if (*name == '/') /* Could be "" */ +@@ -117,12 +101,60 @@ static void add_event(struct connection *conn, + talloc_free(data); + } + ++/* ++ * Check permissions of a specific watch to fire: ++ * Either the node itself or its parent have to be readable by the connection ++ * the watch has been setup for. In case a watch event is created due to ++ * changed permissions we need to take the old permissions into account, too. ++ */ ++static bool watch_permitted(struct connection *conn, const void *ctx, ++ const char *name, struct node *node, ++ struct node_perms *perms) ++{ ++ enum xs_perm_type perm; ++ struct node *parent; ++ char *parent_name; ++ ++ if (perms) { ++ perm = perm_for_conn(conn, perms); ++ if (perm & XS_PERM_READ) ++ return true; ++ } ++ ++ if (!node) { ++ node = read_node(conn, ctx, name); ++ if (!node) ++ return false; ++ } ++ ++ perm = perm_for_conn(conn, &node->perms); ++ if (perm & XS_PERM_READ) ++ return true; ++ ++ parent = node->parent; ++ if (!parent) { ++ parent_name = get_parent(ctx, node->name); ++ if (!parent_name) ++ return false; ++ parent = read_node(conn, ctx, parent_name); ++ if (!parent) ++ return false; ++ } ++ ++ perm = perm_for_conn(conn, &parent->perms); ++ ++ return perm & XS_PERM_READ; ++} ++ + /* + * Check whether any watch events are to be sent. + * Temporary memory allocations are done with ctx. ++ * We need to take the (potential) old permissions of the node into account ++ * as a watcher losing permissions to access a node should receive the ++ * watch event, too. + */ + void fire_watches(struct connection *conn, const void *ctx, const char *name, +- bool exact) ++ struct node *node, bool exact, struct node_perms *perms) + { + struct connection *i; + struct watch *watch; +@@ -134,8 +166,13 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name, + /* Create an event for each watch. 
*/ + list_for_each_entry(i, &connections, list) { + /* introduce/release domain watches */ +- if (check_special_event(name) && !check_perms_special(name, i)) +- continue; ++ if (check_special_event(name)) { ++ if (!check_perms_special(name, i)) ++ continue; ++ } else { ++ if (!watch_permitted(i, ctx, name, node, perms)) ++ continue; ++ } + + list_for_each_entry(watch, &i->watches, list) { + if (exact) { +diff --git a/tools/xenstore/xenstored_watch.h b/tools/xenstore/xenstored_watch.h +index 1b3c80d3dda1..03094374f379 100644 +--- a/tools/xenstore/xenstored_watch.h ++++ b/tools/xenstore/xenstored_watch.h +@@ -26,7 +26,7 @@ int do_unwatch(struct connection *conn, struct buffered_data *in); + + /* Fire all watches: !exact means all the children are affected (ie. rm). */ + void fire_watches(struct connection *conn, const void *tmp, const char *name, +- bool exact); ++ struct node *node, bool exact, struct node_perms *perms); + + void conn_delete_all_watches(struct connection *conn); + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch 2022-06-01 21:10:15.000000000 +0100 @@ -0,0 +1,124 @@ +From 696d142276e277264a9c6fcdd4f00edc8a6ce292 Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Thu, 9 Apr 2020 10:11:50 +0200 +Subject: [PATCH] AMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() + callers + +amd_iommu_get_paging_mode() expects a count, not a "maximum possible" +value. Prior to b4f042236ae0 dropping the reference, the use of our mis- +named "max_page" in amd_iommu_domain_init() may have lead to such a +misunderstanding. In an attempt to avoid such confusion in the future, +rename the function's parameter and - while at it - convert it to an +inline function. + +Also replace a literal 4 by an expression tying it to a wider use +constant, just like amd_iommu_quarantine_init() does. 
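A minimal standalone sketch of the level computation (an illustration only, assuming the AMD IOMMU's 512-entry page tables; it is not part of the patch below) shows why a frame count rather than a maximum frame index has to be passed:

    #include <stdio.h>

    #define PTES_PER_TABLE 512ul   /* assumption: 9 address bits per level */

    /* Mirrors the shape of the loop in amd_iommu_get_paging_mode():
     * number of levels needed to map 'frames' page frames, i.e. frame
     * indexes 0 .. frames - 1. */
    static unsigned int paging_mode(unsigned long frames)
    {
        unsigned int level = 1;

        while ( frames > PTES_PER_TABLE )
        {
            /* Round up to a whole number of tables, then go one level up. */
            frames = (frames + PTES_PER_TABLE - 1) / PTES_PER_TABLE;
            ++level;
        }

        return level;
    }

    int main(void)
    {
        /* A domain with 513 frames must be able to map frame index 512. */
        printf("count 513     -> level %u\n", paging_mode(513)); /* prints 2 */
        /* Passing the maximum index (512) instead under-sizes the tables. */
        printf("max index 512 -> level %u\n", paging_mode(512)); /* prints 1 */
        return 0;
    }

As the boundary case in the sketch suggests, an argument that is one too small can yield a page-table hierarchy one level too shallow, which is the off-by-one being corrected here.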
+ +Fixes: ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain") +Fixes: b4f042236ae0 ("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables") +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper +master commit: b75b3c62fe4afe381c6f74a07f614c0b39fe2f5d +master date: 2020-03-16 11:24:29 +0100 +--- + xen/drivers/passthrough/amd/iommu_map.c | 6 ++--- + xen/drivers/passthrough/amd/pci_amd_iommu.c | 23 ++++--------------- + xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 17 +++++++++++++- + 3 files changed, 23 insertions(+), 23 deletions(-) + +diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c +index 21fbea0467..aa382dbabd 100644 +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -745,9 +745,9 @@ void amd_iommu_share_p2m(struct domain *d) + int __init amd_iommu_quarantine_init(struct domain *d) + { + struct domain_iommu *hd = dom_iommu(d); +- unsigned long max_gfn = +- PFN_DOWN((1ul << DEFAULT_DOMAIN_ADDRESS_WIDTH) - 1); +- unsigned int level = amd_iommu_get_paging_mode(max_gfn); ++ unsigned long end_gfn = ++ 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT); ++ unsigned int level = amd_iommu_get_paging_mode(end_gfn); + uint64_t *table; + + if ( hd->arch.root_table ) +diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c +index 0b641ff75c..983ece5981 100644 +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -218,22 +218,6 @@ static int __must_check allocate_domain_resources(struct domain_iommu *hd) + return rc; + } + +-int amd_iommu_get_paging_mode(unsigned long entries) +-{ +- int level = 1; +- +- BUG_ON( !entries ); +- +- while ( entries > PTE_PER_TABLE_SIZE ) +- { +- entries = PTE_PER_TABLE_ALIGN(entries) >> PTE_PER_TABLE_SHIFT; +- if ( ++level > 6 ) +- return -ENOMEM; +- } +- +- return level; +-} +- + static int amd_iommu_domain_init(struct domain *d) + { + struct domain_iommu *hd = dom_iommu(d); +@@ -246,9 +230,10 @@ static int amd_iommu_domain_init(struct domain *d) + * physical address space we give it, but this isn't known yet so use 4 + * unilaterally. + */ +- hd->arch.paging_mode = is_hvm_domain(d) +- ? IOMMU_PAGING_MODE_LEVEL_4 +- : amd_iommu_get_paging_mode(get_upper_mfn_bound()); ++ hd->arch.paging_mode = amd_iommu_get_paging_mode( ++ is_hvm_domain(d) ++ ? 
1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT) ++ : get_upper_mfn_bound() + 1); + + return 0; + } +diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +index c42688fe51..22d6614169 100644 +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -51,7 +51,6 @@ void get_iommu_features(struct amd_iommu *iommu); + int amd_iommu_init(void); + int amd_iommu_update_ivrs_mapping_acpi(void); + +-int amd_iommu_get_paging_mode(unsigned long entries); + int amd_iommu_quarantine_init(struct domain *d); + + /* mapping functions */ +@@ -168,6 +167,22 @@ static inline unsigned long region_to_pages(unsigned long addr, unsigned long si + return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT; + } + ++static inline int amd_iommu_get_paging_mode(unsigned long max_frames) ++{ ++ int level = 1; ++ ++ BUG_ON(!max_frames); ++ ++ while ( max_frames > PTE_PER_TABLE_SIZE ) ++ { ++ max_frames = PTE_PER_TABLE_ALIGN(max_frames) >> PTE_PER_TABLE_SHIFT; ++ if ( ++level > 6 ) ++ return -ENOMEM; ++ } ++ ++ return level; ++} ++ + static inline struct page_info* alloc_amd_iommu_pgtable(void) + { + struct page_info *pg; +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch 2022-06-15 19:47:32.000000000 +0100 @@ -0,0 +1,169 @@ +From 1ecb1ee4d8475475c3ccf72f6654644b242ce856 Mon Sep 17 00:00:00 2001 +From: Paul Durrant +Date: Mon, 29 Oct 2018 13:47:24 +0100 +Subject: [PATCH] amd-iommu: get rid of pointless IOMMU_PAGING_MODE_LEVEL_X + definitions + +The levels are absolute numbers such that IOMMU_PAGING_MODE_LEVEL_X +evaluates to X (for the valid range of 0 - 7) so simply use numbers in +the code. + +No functional change. 
+ +NOTE: This patch also adds emacs boilerplate to amd-iommu-defs.h + +Signed-off-by: Paul Durrant +Acked-by: Brian Woods +--- + xen/drivers/passthrough/amd/iommu_map.c | 26 +++++++++----------- + xen/drivers/passthrough/amd/pci_amd_iommu.c | 4 +-- + xen/include/asm-x86/hvm/svm/amd-iommu-defs.h | 21 ++++++++-------- + 3 files changed, 23 insertions(+), 28 deletions(-) + +diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c +index d03a6d72b9..6a2c877d34 100644 +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -40,7 +40,7 @@ static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn) + u64 *table, *pte; + + table = map_domain_page(_mfn(l1_mfn)); +- pte = table + pfn_to_pde_idx(gfn, IOMMU_PAGING_MODE_LEVEL_1); ++ pte = table + pfn_to_pde_idx(gfn, 1); + write_atomic(pte, 0); + unmap_domain_page(table); + } +@@ -103,7 +103,7 @@ static bool_t set_iommu_pde_present(u32 *pde, unsigned long next_mfn, + /* FC bit should be enabled in PTE, this helps to solve potential + * issues with ATS devices + */ +- if ( next_level == IOMMU_PAGING_MODE_LEVEL_0 ) ++ if ( next_level == 0 ) + set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry, + IOMMU_PTE_FC_MASK, IOMMU_PTE_FC_SHIFT, &entry); + full = (uint64_t)entry << 32; +@@ -137,8 +137,7 @@ static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long gfn, + + pde = (u32*)(table + pfn_to_pde_idx(gfn, pde_level)); + +- need_flush = set_iommu_pde_present(pde, next_mfn, +- IOMMU_PAGING_MODE_LEVEL_0, iw, ir); ++ need_flush = set_iommu_pde_present(pde, next_mfn, 0, iw, ir); + unmap_domain_page(table); + return need_flush; + } +@@ -458,8 +457,7 @@ static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn, + } + + /* setup super page mapping, next level = 0 */ +- set_iommu_pde_present((u32*)pde, first_mfn, +- IOMMU_PAGING_MODE_LEVEL_0, ++ set_iommu_pde_present((u32*)pde, first_mfn, 0, + !!(flags & IOMMUF_writable), + !!(flags & IOMMUF_readable)); + +@@ -486,25 +484,24 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long gfn, + table = hd->arch.root_table; + level = hd->arch.paging_mode; + +- BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 || +- level > IOMMU_PAGING_MODE_LEVEL_6 ); ++ BUG_ON( table == NULL || level < 1 || level > 6 ); + + /* + * A frame number past what the current page tables can represent can't + * possibly have a mapping. 
+ */ + if ( pfn >> (PTE_PER_TABLE_SHIFT * level) ) + return 0; + + next_table_mfn = mfn_x(page_to_mfn(table)); + +- if ( level == IOMMU_PAGING_MODE_LEVEL_1 ) ++ if ( level == 1 ) + { + pt_mfn[level] = next_table_mfn; + return 0; + } + +- while ( level > IOMMU_PAGING_MODE_LEVEL_1 ) ++ while ( level > 1 ) + { + unsigned int next_level = level - 1; + pt_mfn[level] = next_table_mfn; +@@ -622,8 +619,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn, + } + + /* Install 4k mapping first */ +- need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, +- IOMMU_PAGING_MODE_LEVEL_1, ++ need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, 1, + !!(flags & IOMMUF_writable), + !!(flags & IOMMUF_readable)); + +@@ -646,8 +642,8 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn, + goto out; + } + +- for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2; +- merge_level <= hd->arch.paging_mode; merge_level++ ) ++ for ( merge_level = 2; merge_level <= hd->arch.paging_mode; ++ merge_level++ ) + { + if ( pt_mfn[merge_level] == 0 ) + break; +@@ -777,7 +773,7 @@ void amd_iommu_share_p2m(struct domain *d) + hd->arch.root_table = p2m_table; + + /* When sharing p2m with iommu, paging mode = 4 */ +- hd->arch.paging_mode = IOMMU_PAGING_MODE_LEVEL_4; ++ hd->arch.paging_mode = 4; + AMD_IOMMU_DEBUG("Share p2m table with iommu: p2m table = %#lx\n", + mfn_x(pgd_mfn)); + } +diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h +index 1f19cd3d27..a217245249 100644 +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h +@@ -35,8 +35,7 @@ + PAGE_SIZE * (PTE_PER_TABLE_ALIGN(entries) >> PTE_PER_TABLE_SHIFT) + + #define amd_offset_level_address(offset, level) \ +- ((u64)(offset) << (12 + (PTE_PER_TABLE_SHIFT * \ +- (level - IOMMU_PAGING_MODE_LEVEL_1)))) ++ ((uint64_t)(offset) << (12 + (PTE_PER_TABLE_SHIFT * ((level) - 1)))) + + #define PCI_MIN_CAP_OFFSET 0x40 + #define PCI_MAX_CAP_BLOCKS 48 +@@ -446,14 +445,6 @@ + + /* Paging modes */ + #define IOMMU_PAGING_MODE_DISABLED 0x0 +-#define IOMMU_PAGING_MODE_LEVEL_0 0x0 +-#define IOMMU_PAGING_MODE_LEVEL_1 0x1 +-#define IOMMU_PAGING_MODE_LEVEL_2 0x2 +-#define IOMMU_PAGING_MODE_LEVEL_3 0x3 +-#define IOMMU_PAGING_MODE_LEVEL_4 0x4 +-#define IOMMU_PAGING_MODE_LEVEL_5 0x5 +-#define IOMMU_PAGING_MODE_LEVEL_6 0x6 +-#define IOMMU_PAGING_MODE_LEVEL_7 0x7 + + /* Flags */ + #define IOMMU_CONTROL_DISABLED 0 +@@ -494,3 +485,13 @@ + #define IOMMU_REG_BASE_ADDR_HIGH_SHIFT 0 + + #endif /* _ASM_X86_64_AMD_IOMMU_DEFS_H */ ++ ++/* ++ * Local variables: ++ * mode: C ++ * c-file-style: "BSD" ++ * c-basic-offset: 4 ++ * tab-width: 4 ++ * indent-tabs-mode: nil ++ * End: ++ */ +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch 2022-06-01 11:07:17.000000000 +0100 @@ -0,0 +1,41 @@ +From 2a730d5b6ad1ea95c3d67fa12ab0091d32b29505 Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Tue, 1 Dec 2020 17:03:12 +0100 +Subject: [PATCH] evtchn/fifo: use stable fields when recording "last queue" + 
information + +Both evtchn->priority and evtchn->notify_vcpu_id could change behind the +back of evtchn_fifo_set_pending(), as for it - in the case of +interdomain channels - only the remote side's per-channel lock is held. +Neither the queue's priority nor the vCPU's vcpu_id fields have similar +properties, so they seem better suited for the purpose. In particular +they reflect the respective evtchn fields' values at the time they were +used to determine queue and vCPU. + +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall +Reviewed-by: Paul Durrant +master commit: 6f6f07b64cbe90e54f8e62b4d6f2404cf5306536 +master date: 2020-10-02 08:37:35 +0200 +--- + xen/common/event_fifo.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c +index 45c024739d..98742ba9cb 100644 +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -224,8 +224,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + /* Moved to a different queue? */ + if ( old_q != q ) + { +- evtchn->last_vcpu_id = evtchn->notify_vcpu_id; +- evtchn->last_priority = evtchn->priority; ++ evtchn->last_vcpu_id = v->vcpu_id; ++ evtchn->last_priority = q->priority; + + spin_unlock_irqrestore(&old_q->lock, flags); + spin_lock_irqsave(&q->lock, flags); +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/fix_event_channel_race.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/fix_event_channel_race.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/fix_event_channel_race.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/fix_event_channel_race.patch 2022-05-30 12:44:59.000000000 +0100 @@ -0,0 +1,196 @@ +diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c +index b1951a29ad..0a90a8404d 100644 +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -65,38 +65,6 @@ static void evtchn_fifo_init(struct domain *d, struct evtchn *evtchn) + d->domain_id, evtchn->port); + } + +-static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, +- struct evtchn *evtchn, +- unsigned long *flags) +-{ +- struct vcpu *v; +- struct evtchn_fifo_queue *q, *old_q; +- unsigned int try; +- union evtchn_fifo_lastq lastq; +- +- for ( try = 0; try < 3; try++ ) +- { +- lastq.raw = read_atomic(&evtchn->fifo_lastq); +- v = d->vcpu[lastq.last_vcpu_id]; +- old_q = &v->evtchn_fifo->queue[lastq.last_priority]; +- +- spin_lock_irqsave(&old_q->lock, *flags); +- +- v = d->vcpu[lastq.last_vcpu_id]; +- q = &v->evtchn_fifo->queue[lastq.last_priority]; +- +- if ( old_q == q ) +- return old_q; +- +- spin_unlock_irqrestore(&old_q->lock, *flags); +- } +- +- gprintk(XENLOG_WARNING, +- "dom%d port %d lost event (too many queue changes)\n", +- d->domain_id, evtchn->port); +- return NULL; +-} +- + static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link) + { + event_word_t new, old; +@@ -168,6 +136,9 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + event_word_t *word; + unsigned long flags; + bool_t was_pending; ++ struct evtchn_fifo_queue *q, *old_q; ++ unsigned int try; ++ bool linked = true; + + port = evtchn->port; + word = evtchn_fifo_word_from_port(d, port); +@@ -182,17 +153,67 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + return; + } + ++ /* ++ * Lock all queues related to the event channel (in case of a queue change ++ * this might be two). 
++ * It is mandatory to do that before setting and testing the PENDING bit ++ * and to hold the current queue lock until the event has been put into the ++ * list of pending events in order to avoid waking up a guest without the ++ * event being visibly pending in the guest. ++ */ ++ for ( try = 0; try < 3; try++ ) ++ { ++ union evtchn_fifo_lastq lastq; ++ const struct vcpu *old_v; ++ ++ lastq.raw = read_atomic(&evtchn->fifo_lastq); ++ old_v = d->vcpu[lastq.last_vcpu_id]; ++ ++ q = &v->evtchn_fifo->queue[evtchn->priority]; ++ old_q = &old_v->evtchn_fifo->queue[lastq.last_priority]; ++ ++ if ( q == old_q ) ++ spin_lock_irqsave(&q->lock, flags); ++ else if ( q < old_q ) ++ { ++ spin_lock_irqsave(&q->lock, flags); ++ spin_lock(&old_q->lock); ++ } ++ else ++ { ++ spin_lock_irqsave(&old_q->lock, flags); ++ spin_lock(&q->lock); ++ } ++ ++ lastq.raw = read_atomic(&evtchn->fifo_lastq); ++ old_v = d->vcpu[lastq.last_vcpu_id]; ++ if ( q == &v->evtchn_fifo->queue[evtchn->priority] && ++ old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) ++ break; ++ ++ if ( q != old_q ) ++ spin_unlock(&old_q->lock); ++ spin_unlock_irqrestore(&q->lock, flags); ++ } ++ + was_pending = guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word); + ++ /* If we didn't get the lock bail out. */ ++ if ( try == 3 ) ++ { ++ gprintk(XENLOG_WARNING, ++ "%pd port %u lost event (too many queue changes)\n", ++ d, evtchn->port); ++ goto done; ++ } ++ + /* + * Link the event if it unmasked and not already linked. + */ + if ( !guest_test_bit(d, EVTCHN_FIFO_MASKED, word) && + !guest_test_bit(d, EVTCHN_FIFO_LINKED, word) ) + { +- struct evtchn_fifo_queue *q, *old_q; + event_word_t *tail_word; +- bool_t linked = 0; + + /* + * Control block not mapped. The guest must not unmask an +@@ -203,25 +224,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + { + printk(XENLOG_G_WARNING + "%pv has no FIFO event channel control block\n", v); +- goto done; ++ goto unlock; + } + +- /* +- * No locking around getting the queue. This may race with +- * changing the priority but we are allowed to signal the +- * event once on the old priority. +- */ +- q = &v->evtchn_fifo->queue[evtchn->priority]; +- +- old_q = lock_old_queue(d, evtchn, &flags); +- if ( !old_q ) +- goto done; +- + if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) ) +- { +- spin_unlock_irqrestore(&old_q->lock, flags); +- goto done; +- } ++ goto unlock; + + /* + * If this event was a tail, the old queue is now empty and +@@ -240,8 +247,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + lastq.last_priority = q->priority; + write_atomic(&evtchn->fifo_lastq, lastq.raw); + +- spin_unlock_irqrestore(&old_q->lock, flags); +- spin_lock_irqsave(&q->lock, flags); ++ spin_unlock(&old_q->lock); ++ old_q = q; + } + + /* +@@ -254,6 +261,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + * If the queue is empty (i.e., we haven't linked to the new + * event), head must be updated. 
+ */ ++ linked = false; + if ( q->tail ) + { + tail_word = evtchn_fifo_word_from_port(d, q->tail); +@@ -262,15 +270,19 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + if ( !linked ) + write_atomic(q->head, port); + q->tail = port; ++ } + +- spin_unlock_irqrestore(&q->lock, flags); ++ unlock: ++ if ( q != old_q ) ++ spin_unlock(&old_q->lock); ++ spin_unlock_irqrestore(&q->lock, flags); + +- if ( !linked +- && !guest_test_and_set_bit(d, q->priority, +- &v->evtchn_fifo->control_block->ready) ) +- vcpu_mark_events_pending(v); +- } + done: ++ if ( !linked && ++ !guest_test_and_set_bit(d, q->priority, ++ &v->evtchn_fifo->control_block->ready) ) ++ vcpu_mark_events_pending(v); ++ + if ( !was_pending ) + evtchn_check_pollers(d, port); + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0001-introduce-unaligned.h.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0001-introduce-unaligned.h.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0001-introduce-unaligned.h.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0001-introduce-unaligned.h.patch 2022-07-13 14:06:12.000000000 +0100 @@ -0,0 +1,284 @@ +From 3453f57b52a84a522b864a5d01773e0911a2184e Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Mon, 18 Jan 2021 12:09:13 +0100 +Subject: [PATCH 1/5] introduce unaligned.h + +Rather than open-coding commonly used constructs in yet more places when +pulling in zstd decompression support (and its xxhash prereq), pull out +the custom bits into a commonly used header (for the hypervisor build; +the tool stack and stubdom builds of libxenguest will still remain in +need of similarly taking care of). For now this is limited to x86, where +custom logic isn't needed (considering this is going to be used in init +code only, even using alternatives patching to use MOVBE doesn't seem +worthwhile). + +For Arm64 with CONFIG_ACPI=y (due to efi-dom0.c's re-use of xz/crc32.c) +drop the not really necessary inclusion of xz's private.h. + +No change in generated code. + +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper + +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Origin: backport, http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=7c9f81687ad611515474b1c17afc2f79f19faef5 +[backport: xen/common/lzo.c: refresh 2 context lines.] 
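On x86 the helpers introduced below reduce to plain dereferences plus byte-order conversion, since the commit notes that no custom logic is needed there. An architecture with strict alignment requirements would have to supply its own asm/unaligned.h; a plausible byte-wise variant (an illustrative sketch, not code taken from this patch) could look like:

    #include <stdint.h>

    /* Assemble a little-endian 32-bit value one byte at a time, so the
     * access never requires the pointer to be 4-byte aligned. */
    static inline uint32_t get_unaligned_le32_bytewise(const void *p)
    {
        const uint8_t *b = p;

        return (uint32_t)b[0] |
               ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) |
               ((uint32_t)b[3] << 24);
    }

    static inline void put_unaligned_le32_bytewise(uint32_t val, void *p)
    {
        uint8_t *b = p;

        b[0] = val;
        b[1] = val >> 8;
        b[2] = val >> 16;
        b[3] = val >> 24;
    }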
+--- + xen/common/lz4/defs.h | 9 ++-- + xen/common/lzo.c | 7 ++- + xen/common/unlzo.c | 19 ++------ + xen/common/xz/crc32.c | 2 - + xen/common/xz/private.h | 23 +++------- + xen/include/asm-x86/unaligned.h | 6 +++ + xen/include/xen/unaligned.h | 79 +++++++++++++++++++++++++++++++++ + 7 files changed, 104 insertions(+), 41 deletions(-) + create mode 100644 xen/include/asm-x86/unaligned.h + create mode 100644 xen/include/xen/unaligned.h + +diff --git a/xen/common/lz4/defs.h b/xen/common/lz4/defs.h +index d886a4e122b8..4fbea2ac3dd4 100644 +--- a/xen/common/lz4/defs.h ++++ b/xen/common/lz4/defs.h +@@ -10,18 +10,21 @@ + + #ifdef __XEN__ + #include +-#endif ++#include ++#else + +-static inline u16 INIT get_unaligned_le16(const void *p) ++static inline u16 get_unaligned_le16(const void *p) + { + return le16_to_cpup(p); + } + +-static inline u32 INIT get_unaligned_le32(const void *p) ++static inline u32 get_unaligned_le32(const void *p) + { + return le32_to_cpup(p); + } + ++#endif ++ + /* + * Detects 64 bits mode + */ +diff --git a/xen/common/lzo.c b/xen/common/lzo.c +index 74831cb26836..f1cd1b58d27f 100644 +--- a/xen/common/lzo.c ++++ b/xen/common/lzo.c +@@ -97,13 +97,12 @@ + #ifdef __XEN__ + #include + #include ++#include ++#else ++#define get_unaligned_le16(_p) (*(u16 *)(_p)) + #endif + + #include +-#define get_unaligned(_p) (*(_p)) +-#define put_unaligned(_val,_p) (*(_p)=_val) +-#define get_unaligned_le16(_p) (*(u16 *)(_p)) +-#define get_unaligned_le32(_p) (*(u32 *)(_p)) + + static noinline size_t + lzo1x_1_do_compress(const unsigned char *in, size_t in_len, +diff --git a/xen/common/unlzo.c b/xen/common/unlzo.c +index 5ae6cf911e86..11f64fcf3b26 100644 +--- a/xen/common/unlzo.c ++++ b/xen/common/unlzo.c +@@ -34,30 +34,19 @@ + + #ifdef __XEN__ + #include +-#endif ++#include ++#else + +-#if 1 /* ndef CONFIG_??? */ +-static inline u16 INIT get_unaligned_be16(void *p) ++static inline u16 get_unaligned_be16(const void *p) + { + return be16_to_cpup(p); + } + +-static inline u32 INIT get_unaligned_be32(void *p) ++static inline u32 get_unaligned_be32(const void *p) + { + return be32_to_cpup(p); + } +-#else +-#include +- +-static inline u16 INIT get_unaligned_be16(void *p) +-{ +- return be16_to_cpu(__get_unaligned(p, 2)); +-} + +-static inline u32 INIT get_unaligned_be32(void *p) +-{ +- return be32_to_cpu(__get_unaligned(p, 4)); +-} + #endif + + static const unsigned char lzop_magic[] = { +diff --git a/xen/common/xz/crc32.c b/xen/common/xz/crc32.c +index af08ae2cf6e2..0708b6163812 100644 +--- a/xen/common/xz/crc32.c ++++ b/xen/common/xz/crc32.c +@@ -15,8 +15,6 @@ + * but they are bigger and use more memory for the lookup table. + */ + +-#include "private.h" +- + XZ_EXTERN uint32_t INITDATA xz_crc32_table[256]; + + XZ_EXTERN void INIT xz_crc32_init(void) +diff --git a/xen/common/xz/private.h b/xen/common/xz/private.h +index 7ea24892297f..511343fcc234 100644 +--- a/xen/common/xz/private.h ++++ b/xen/common/xz/private.h +@@ -13,34 +13,23 @@ + #ifdef __XEN__ + #include + #include +-#endif +- +-#define get_le32(p) le32_to_cpup((const uint32_t *)(p)) ++#include ++#else + +-#if 1 /* ndef CONFIG_??? 
*/ +-static inline u32 INIT get_unaligned_le32(void *p) ++static inline u32 get_unaligned_le32(const void *p) + { + return le32_to_cpup(p); + } + +-static inline void INIT put_unaligned_le32(u32 val, void *p) ++static inline void put_unaligned_le32(u32 val, void *p) + { + *(__force __le32*)p = cpu_to_le32(val); + } +-#else +-#include +- +-static inline u32 INIT get_unaligned_le32(void *p) +-{ +- return le32_to_cpu(__get_unaligned(p, 4)); +-} + +-static inline void INIT put_unaligned_le32(u32 val, void *p) +-{ +- __put_unaligned(cpu_to_le32(val), p, 4); +-} + #endif + ++#define get_le32(p) le32_to_cpup((const uint32_t *)(p)) ++ + #define false 0 + #define true 1 + +diff --git a/xen/include/asm-x86/unaligned.h b/xen/include/asm-x86/unaligned.h +new file mode 100644 +index 000000000000..6070801d4afd +--- /dev/null ++++ b/xen/include/asm-x86/unaligned.h +@@ -0,0 +1,6 @@ ++#ifndef __ASM_UNALIGNED_H__ ++#define __ASM_UNALIGNED_H__ ++ ++#include ++ ++#endif /* __ASM_UNALIGNED_H__ */ +diff --git a/xen/include/xen/unaligned.h b/xen/include/xen/unaligned.h +new file mode 100644 +index 000000000000..eef7ec73b658 +--- /dev/null ++++ b/xen/include/xen/unaligned.h +@@ -0,0 +1,79 @@ ++/* ++ * This header can be used by architectures where unaligned accesses work ++ * without faulting, and at least reasonably efficiently. Other architectures ++ * will need to have a custom asm/unaligned.h. ++ */ ++#ifndef __ASM_UNALIGNED_H__ ++#error "xen/unaligned.h should not be included directly - include asm/unaligned.h instead" ++#endif ++ ++#ifndef __XEN_UNALIGNED_H__ ++#define __XEN_UNALIGNED_H__ ++ ++#include ++#include ++ ++#define get_unaligned(p) (*(p)) ++#define put_unaligned(val, p) (*(p) = (val)) ++ ++static inline uint16_t get_unaligned_be16(const void *p) ++{ ++ return be16_to_cpup(p); ++} ++ ++static inline void put_unaligned_be16(uint16_t val, void *p) ++{ ++ *(__force __be16*)p = cpu_to_be16(val); ++} ++ ++static inline uint32_t get_unaligned_be32(const void *p) ++{ ++ return be32_to_cpup(p); ++} ++ ++static inline void put_unaligned_be32(uint32_t val, void *p) ++{ ++ *(__force __be32*)p = cpu_to_be32(val); ++} ++ ++static inline uint64_t get_unaligned_be64(const void *p) ++{ ++ return be64_to_cpup(p); ++} ++ ++static inline void put_unaligned_be64(uint64_t val, void *p) ++{ ++ *(__force __be64*)p = cpu_to_be64(val); ++} ++ ++static inline uint16_t get_unaligned_le16(const void *p) ++{ ++ return le16_to_cpup(p); ++} ++ ++static inline void put_unaligned_le16(uint16_t val, void *p) ++{ ++ *(__force __le16*)p = cpu_to_le16(val); ++} ++ ++static inline uint32_t get_unaligned_le32(const void *p) ++{ ++ return le32_to_cpup(p); ++} ++ ++static inline void put_unaligned_le32(uint32_t val, void *p) ++{ ++ *(__force __le32*)p = cpu_to_le32(val); ++} ++ ++static inline uint64_t get_unaligned_le64(const void *p) ++{ ++ return le64_to_cpup(p); ++} ++ ++static inline void put_unaligned_le64(uint64_t val, void *p) ++{ ++ *(__force __le64*)p = cpu_to_le64(val); ++} ++ ++#endif /* __XEN_UNALIGNED_H__ */ +-- +2.34.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0002-lib-introduce-xxhash.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0002-lib-introduce-xxhash.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0002-lib-introduce-xxhash.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0002-lib-introduce-xxhash.patch 2022-07-13 14:06:12.000000000 +0100 @@ -0,0 +1,888 @@ +From 7253046d49a835c7fc13de1bd3529ff66dd2e1df Mon Sep 17 00:00:00 2001 
+From: Jan Beulich +Date: Mon, 18 Jan 2021 12:10:34 +0100 +Subject: [PATCH 2/5] lib: introduce xxhash + +Taken from Linux at commit d89775fc929c ("lib/: replace HTTP links with +HTTPS ones"), but split into separate 32-bit and 64-bit sources, since +the immediate consumer (zstd) will need only the latter. + +Note that the building of this code is restricted to x86 for now because +of the need to sort asm/unaligned.h for Arm. + +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper + +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Origin: backport, http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=35d2960ae65f28106fdc5c2130f5f08fadca0e4c +[backport: additional changes: Makefile and Rules.mk, + based on much larger/unneeded commits, respectively: + commit f301f9a9e84f ("lib: collect library files in an archive") + commit fea2fab96356 ("libx86: introduce a libx86 shared library") + - xen/lib/Makefile: add objects xen/lib/xxhash{32,64}.o + - xen/Rules.mk: add dir xen/lib/] +--- + xen/Rules.mk | 1 + + xen/include/xen/xxhash.h | 259 ++++++++++++++++++++++++++++++++++ + xen/lib/Makefile | 2 + + xen/lib/xxhash32.c | 259 ++++++++++++++++++++++++++++++++++ + xen/lib/xxhash64.c | 294 +++++++++++++++++++++++++++++++++++++++ + 5 files changed, 815 insertions(+) + create mode 100644 xen/include/xen/xxhash.h + create mode 100644 xen/lib/Makefile + create mode 100644 xen/lib/xxhash32.c + create mode 100644 xen/lib/xxhash64.c + +diff --git a/xen/Rules.mk b/xen/Rules.mk +index 5337e206ee17..47c954425d69 100644 +--- a/xen/Rules.mk ++++ b/xen/Rules.mk +@@ -36,6 +36,7 @@ TARGET := $(BASEDIR)/xen + # Note that link order matters! + ALL_OBJS-y += $(BASEDIR)/common/built_in.o + ALL_OBJS-y += $(BASEDIR)/drivers/built_in.o ++ALL_OBJS-$(CONFIG_X86) += $(BASEDIR)/lib/built_in.o + ALL_OBJS-y += $(BASEDIR)/xsm/built_in.o + ALL_OBJS-y += $(BASEDIR)/arch/$(TARGET_ARCH)/built_in.o + ALL_OBJS-$(CONFIG_CRYPTO) += $(BASEDIR)/crypto/built_in.o +diff --git a/xen/include/xen/xxhash.h b/xen/include/xen/xxhash.h +new file mode 100644 +index 000000000000..6f2237cbcf8e +--- /dev/null ++++ b/xen/include/xen/xxhash.h +@@ -0,0 +1,259 @@ ++/* ++ * xxHash - Extremely Fast Hash algorithm ++ * Copyright (C) 2012-2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at: ++ * - xxHash homepage: https://cyan4973.github.io/xxHash/ ++ * - xxHash source repository: https://github.com/Cyan4973/xxHash ++ */ ++ ++/* ++ * Notice extracted from xxHash homepage: ++ * ++ * xxHash is an extremely fast Hash algorithm, running at RAM speed limits. ++ * It also successfully passes all tests from the SMHasher suite. ++ * ++ * Comparison (single thread, Windows Seven 32 bits, using SMHasher on a Core 2 ++ * Duo @3GHz) ++ * ++ * Name Speed Q.Score Author ++ * xxHash 5.4 GB/s 10 ++ * CrapWow 3.2 GB/s 2 Andrew ++ * MumurHash 3a 2.7 GB/s 10 Austin Appleby ++ * SpookyHash 2.0 GB/s 10 Bob Jenkins ++ * SBox 1.4 GB/s 9 Bret Mulvey ++ * Lookup3 1.2 GB/s 9 Bob Jenkins ++ * SuperFastHash 1.2 GB/s 1 Paul Hsieh ++ * CityHash64 1.05 GB/s 10 Pike & Alakuijala ++ * FNV 0.55 GB/s 5 Fowler, Noll, Vo ++ * CRC32 0.43 GB/s 9 ++ * MD5-32 0.33 GB/s 10 Ronald L. Rivest ++ * SHA1-32 0.28 GB/s 10 ++ * ++ * Q.Score is a measure of quality of the hash function. ++ * It depends on successfully passing SMHasher test set. ++ * 10 is a perfect score. ++ * ++ * A 64-bits version, named xxh64 offers much better speed, ++ * but for 64-bits applications only. ++ * Name Speed on 64 bits Speed on 32 bits ++ * xxh64 13.8 GB/s 1.9 GB/s ++ * xxh32 6.8 GB/s 6.0 GB/s ++ */ ++ ++#ifndef __XENXXHASH_H__ ++#define __XENXXHASH_H__ ++ ++#include ++ ++/*-**************************** ++ * Simple Hash Functions ++ *****************************/ ++ ++/** ++ * xxh32() - calculate the 32-bit hash of the input with a given seed. ++ * ++ * @input: The data to hash. ++ * @length: The length of the data to hash. ++ * @seed: The seed can be used to alter the result predictably. ++ * ++ * Speed on Core 2 Duo @ 3 GHz (single thread, SMHasher benchmark) : 5.4 GB/s ++ * ++ * Return: The 32-bit hash of the data. ++ */ ++uint32_t xxh32(const void *input, size_t length, uint32_t seed); ++ ++/** ++ * xxh64() - calculate the 64-bit hash of the input with a given seed. ++ * ++ * @input: The data to hash. ++ * @length: The length of the data to hash. ++ * @seed: The seed can be used to alter the result predictably. ++ * ++ * This function runs 2x faster on 64-bit systems, but slower on 32-bit systems. ++ * ++ * Return: The 64-bit hash of the data. ++ */ ++uint64_t xxh64(const void *input, size_t length, uint64_t seed); ++ ++/** ++ * xxhash() - calculate wordsize hash of the input with a given seed ++ * @input: The data to hash. ++ * @length: The length of the data to hash. ++ * @seed: The seed can be used to alter the result predictably. 
++ * ++ * If the hash does not need to be comparable between machines with ++ * different word sizes, this function will call whichever of xxh32() ++ * or xxh64() is faster. ++ * ++ * Return: wordsize hash of the data. ++ */ ++ ++static inline unsigned long xxhash(const void *input, size_t length, ++ uint64_t seed) ++{ ++#if BITS_PER_LONG == 64 ++ return xxh64(input, length, seed); ++#else ++ return xxh32(input, length, seed); ++#endif ++} ++ ++/*-**************************** ++ * Streaming Hash Functions ++ *****************************/ ++ ++/* ++ * These definitions are only meant to allow allocation of XXH state ++ * statically, on stack, or in a struct for example. ++ * Do not use members directly. ++ */ ++ ++/** ++ * struct xxh32_state - private xxh32 state, do not use members directly ++ */ ++struct xxh32_state { ++ uint32_t total_len_32; ++ uint32_t large_len; ++ uint32_t v1; ++ uint32_t v2; ++ uint32_t v3; ++ uint32_t v4; ++ uint32_t mem32[4]; ++ uint32_t memsize; ++}; ++ ++/** ++ * struct xxh32_state - private xxh64 state, do not use members directly ++ */ ++struct xxh64_state { ++ uint64_t total_len; ++ uint64_t v1; ++ uint64_t v2; ++ uint64_t v3; ++ uint64_t v4; ++ uint64_t mem64[4]; ++ uint32_t memsize; ++}; ++ ++/** ++ * xxh32_reset() - reset the xxh32 state to start a new hashing operation ++ * ++ * @state: The xxh32 state to reset. ++ * @seed: Initialize the hash state with this seed. ++ * ++ * Call this function on any xxh32_state to prepare for a new hashing operation. ++ */ ++void xxh32_reset(struct xxh32_state *state, uint32_t seed); ++ ++/** ++ * xxh32_update() - hash the data given and update the xxh32 state ++ * ++ * @state: The xxh32 state to update. ++ * @input: The data to hash. ++ * @length: The length of the data to hash. ++ * ++ * After calling xxh32_reset() call xxh32_update() as many times as necessary. ++ * ++ * Return: Zero on success, otherwise an error code. ++ */ ++int xxh32_update(struct xxh32_state *state, const void *input, size_t length); ++ ++/** ++ * xxh32_digest() - produce the current xxh32 hash ++ * ++ * @state: Produce the current xxh32 hash of this state. ++ * ++ * A hash value can be produced at any time. It is still possible to continue ++ * inserting input into the hash state after a call to xxh32_digest(), and ++ * generate new hashes later on, by calling xxh32_digest() again. ++ * ++ * Return: The xxh32 hash stored in the state. ++ */ ++uint32_t xxh32_digest(const struct xxh32_state *state); ++ ++/** ++ * xxh64_reset() - reset the xxh64 state to start a new hashing operation ++ * ++ * @state: The xxh64 state to reset. ++ * @seed: Initialize the hash state with this seed. ++ */ ++void xxh64_reset(struct xxh64_state *state, uint64_t seed); ++ ++/** ++ * xxh64_update() - hash the data given and update the xxh64 state ++ * @state: The xxh64 state to update. ++ * @input: The data to hash. ++ * @length: The length of the data to hash. ++ * ++ * After calling xxh64_reset() call xxh64_update() as many times as necessary. ++ * ++ * Return: Zero on success, otherwise an error code. ++ */ ++int xxh64_update(struct xxh64_state *state, const void *input, size_t length); ++ ++/** ++ * xxh64_digest() - produce the current xxh64 hash ++ * ++ * @state: Produce the current xxh64 hash of this state. ++ * ++ * A hash value can be produced at any time. It is still possible to continue ++ * inserting input into the hash state after a call to xxh64_digest(), and ++ * generate new hashes later on, by calling xxh64_digest() again. 
++ * ++ * Return: The xxh64 hash stored in the state. ++ */ ++uint64_t xxh64_digest(const struct xxh64_state *state); ++ ++/*-************************** ++ * Utils ++ ***************************/ ++ ++/** ++ * xxh32_copy_state() - copy the source state into the destination state ++ * ++ * @src: The source xxh32 state. ++ * @dst: The destination xxh32 state. ++ */ ++void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src); ++ ++/** ++ * xxh64_copy_state() - copy the source state into the destination state ++ * ++ * @src: The source xxh64 state. ++ * @dst: The destination xxh64 state. ++ */ ++void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src); ++ ++#endif /* __XENXXHASH_H__ */ +diff --git a/xen/lib/Makefile b/xen/lib/Makefile +new file mode 100644 +index 000000000000..922e09439a80 +--- /dev/null ++++ b/xen/lib/Makefile +@@ -0,0 +1,2 @@ ++obj-$(CONFIG_X86) += xxhash32.o ++obj-$(CONFIG_X86) += xxhash64.o +diff --git a/xen/lib/xxhash32.c b/xen/lib/xxhash32.c +new file mode 100644 +index 000000000000..e8d403e5ced6 +--- /dev/null ++++ b/xen/lib/xxhash32.c +@@ -0,0 +1,259 @@ ++/* ++ * xxHash - Extremely Fast Hash algorithm ++ * Copyright (C) 2012-2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). 
++ * ++ * You can contact the author at: ++ * - xxHash homepage: https://cyan4973.github.io/xxHash/ ++ * - xxHash source repository: https://github.com/Cyan4973/xxHash ++ */ ++ ++#include ++#include ++#include ++#include ++#include ++ ++/*-************************************* ++ * Macros ++ **************************************/ ++#define xxh_rotl32(x, r) ((x << r) | (x >> (32 - r))) ++ ++#ifdef __LITTLE_ENDIAN ++# define XXH_CPU_LITTLE_ENDIAN 1 ++#else ++# define XXH_CPU_LITTLE_ENDIAN 0 ++#endif ++ ++/*-************************************* ++ * Constants ++ **************************************/ ++static const uint32_t PRIME32_1 = 2654435761U; ++static const uint32_t PRIME32_2 = 2246822519U; ++static const uint32_t PRIME32_3 = 3266489917U; ++static const uint32_t PRIME32_4 = 668265263U; ++static const uint32_t PRIME32_5 = 374761393U; ++ ++/*-************************** ++ * Utils ++ ***************************/ ++void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src) ++{ ++ memcpy(dst, src, sizeof(*dst)); ++} ++ ++/*-*************************** ++ * Simple Hash Functions ++ ****************************/ ++static uint32_t xxh32_round(uint32_t seed, const uint32_t input) ++{ ++ seed += input * PRIME32_2; ++ seed = xxh_rotl32(seed, 13); ++ seed *= PRIME32_1; ++ return seed; ++} ++ ++uint32_t xxh32(const void *input, const size_t len, const uint32_t seed) ++{ ++ const uint8_t *p = (const uint8_t *)input; ++ const uint8_t *b_end = p + len; ++ uint32_t h32; ++ ++ if (len >= 16) { ++ const uint8_t *const limit = b_end - 16; ++ uint32_t v1 = seed + PRIME32_1 + PRIME32_2; ++ uint32_t v2 = seed + PRIME32_2; ++ uint32_t v3 = seed + 0; ++ uint32_t v4 = seed - PRIME32_1; ++ ++ do { ++ v1 = xxh32_round(v1, get_unaligned_le32(p)); ++ p += 4; ++ v2 = xxh32_round(v2, get_unaligned_le32(p)); ++ p += 4; ++ v3 = xxh32_round(v3, get_unaligned_le32(p)); ++ p += 4; ++ v4 = xxh32_round(v4, get_unaligned_le32(p)); ++ p += 4; ++ } while (p <= limit); ++ ++ h32 = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) + ++ xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18); ++ } else { ++ h32 = seed + PRIME32_5; ++ } ++ ++ h32 += (uint32_t)len; ++ ++ while (p + 4 <= b_end) { ++ h32 += get_unaligned_le32(p) * PRIME32_3; ++ h32 = xxh_rotl32(h32, 17) * PRIME32_4; ++ p += 4; ++ } ++ ++ while (p < b_end) { ++ h32 += (*p) * PRIME32_5; ++ h32 = xxh_rotl32(h32, 11) * PRIME32_1; ++ p++; ++ } ++ ++ h32 ^= h32 >> 15; ++ h32 *= PRIME32_2; ++ h32 ^= h32 >> 13; ++ h32 *= PRIME32_3; ++ h32 ^= h32 >> 16; ++ ++ return h32; ++} ++ ++/*-************************************************** ++ * Advanced Hash Functions ++ ***************************************************/ ++void xxh32_reset(struct xxh32_state *statePtr, const uint32_t seed) ++{ ++ /* use a local state for memcpy() to avoid strict-aliasing warnings */ ++ struct xxh32_state state; ++ ++ memset(&state, 0, sizeof(state)); ++ state.v1 = seed + PRIME32_1 + PRIME32_2; ++ state.v2 = seed + PRIME32_2; ++ state.v3 = seed + 0; ++ state.v4 = seed - PRIME32_1; ++ memcpy(statePtr, &state, sizeof(state)); ++} ++ ++int xxh32_update(struct xxh32_state *state, const void *input, const size_t len) ++{ ++ const uint8_t *p = (const uint8_t *)input; ++ const uint8_t *const b_end = p + len; ++ ++ if (input == NULL) ++ return -EINVAL; ++ ++ state->total_len_32 += (uint32_t)len; ++ state->large_len |= (len >= 16) | (state->total_len_32 >= 16); ++ ++ if (state->memsize + len < 16) { /* fill in tmp buffer */ ++ memcpy((uint8_t *)(state->mem32) + state->memsize, input, len); ++ state->memsize += 
(uint32_t)len; ++ return 0; ++ } ++ ++ if (state->memsize) { /* some data left from previous update */ ++ const uint32_t *p32 = state->mem32; ++ ++ memcpy((uint8_t *)(state->mem32) + state->memsize, input, ++ 16 - state->memsize); ++ ++ state->v1 = xxh32_round(state->v1, get_unaligned_le32(p32)); ++ p32++; ++ state->v2 = xxh32_round(state->v2, get_unaligned_le32(p32)); ++ p32++; ++ state->v3 = xxh32_round(state->v3, get_unaligned_le32(p32)); ++ p32++; ++ state->v4 = xxh32_round(state->v4, get_unaligned_le32(p32)); ++ p32++; ++ ++ p += 16-state->memsize; ++ state->memsize = 0; ++ } ++ ++ if (p <= b_end - 16) { ++ const uint8_t *const limit = b_end - 16; ++ uint32_t v1 = state->v1; ++ uint32_t v2 = state->v2; ++ uint32_t v3 = state->v3; ++ uint32_t v4 = state->v4; ++ ++ do { ++ v1 = xxh32_round(v1, get_unaligned_le32(p)); ++ p += 4; ++ v2 = xxh32_round(v2, get_unaligned_le32(p)); ++ p += 4; ++ v3 = xxh32_round(v3, get_unaligned_le32(p)); ++ p += 4; ++ v4 = xxh32_round(v4, get_unaligned_le32(p)); ++ p += 4; ++ } while (p <= limit); ++ ++ state->v1 = v1; ++ state->v2 = v2; ++ state->v3 = v3; ++ state->v4 = v4; ++ } ++ ++ if (p < b_end) { ++ memcpy(state->mem32, p, (size_t)(b_end-p)); ++ state->memsize = (uint32_t)(b_end-p); ++ } ++ ++ return 0; ++} ++ ++uint32_t xxh32_digest(const struct xxh32_state *state) ++{ ++ const uint8_t *p = (const uint8_t *)state->mem32; ++ const uint8_t *const b_end = (const uint8_t *)(state->mem32) + ++ state->memsize; ++ uint32_t h32; ++ ++ if (state->large_len) { ++ h32 = xxh_rotl32(state->v1, 1) + xxh_rotl32(state->v2, 7) + ++ xxh_rotl32(state->v3, 12) + xxh_rotl32(state->v4, 18); ++ } else { ++ h32 = state->v3 /* == seed */ + PRIME32_5; ++ } ++ ++ h32 += state->total_len_32; ++ ++ while (p + 4 <= b_end) { ++ h32 += get_unaligned_le32(p) * PRIME32_3; ++ h32 = xxh_rotl32(h32, 17) * PRIME32_4; ++ p += 4; ++ } ++ ++ while (p < b_end) { ++ h32 += (*p) * PRIME32_5; ++ h32 = xxh_rotl32(h32, 11) * PRIME32_1; ++ p++; ++ } ++ ++ h32 ^= h32 >> 15; ++ h32 *= PRIME32_2; ++ h32 ^= h32 >> 13; ++ h32 *= PRIME32_3; ++ h32 ^= h32 >> 16; ++ ++ return h32; ++} ++ +diff --git a/xen/lib/xxhash64.c b/xen/lib/xxhash64.c +new file mode 100644 +index 000000000000..ba6bcf152d6f +--- /dev/null ++++ b/xen/lib/xxhash64.c +@@ -0,0 +1,294 @@ ++/* ++ * xxHash - Extremely Fast Hash algorithm ++ * Copyright (C) 2012-2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at: ++ * - xxHash homepage: https://cyan4973.github.io/xxHash/ ++ * - xxHash source repository: https://github.com/Cyan4973/xxHash ++ */ ++ ++#include ++#include ++#include ++#include ++#include ++ ++/*-************************************* ++ * Macros ++ **************************************/ ++#define xxh_rotl64(x, r) ((x << r) | (x >> (64 - r))) ++ ++#ifdef __LITTLE_ENDIAN ++# define XXH_CPU_LITTLE_ENDIAN 1 ++#else ++# define XXH_CPU_LITTLE_ENDIAN 0 ++#endif ++ ++/*-************************************* ++ * Constants ++ **************************************/ ++static const uint64_t PRIME64_1 = 11400714785074694791ULL; ++static const uint64_t PRIME64_2 = 14029467366897019727ULL; ++static const uint64_t PRIME64_3 = 1609587929392839161ULL; ++static const uint64_t PRIME64_4 = 9650029242287828579ULL; ++static const uint64_t PRIME64_5 = 2870177450012600261ULL; ++ ++/*-************************** ++ * Utils ++ ***************************/ ++void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src) ++{ ++ memcpy(dst, src, sizeof(*dst)); ++} ++ ++/*-*************************** ++ * Simple Hash Functions ++ ****************************/ ++static uint64_t xxh64_round(uint64_t acc, const uint64_t input) ++{ ++ acc += input * PRIME64_2; ++ acc = xxh_rotl64(acc, 31); ++ acc *= PRIME64_1; ++ return acc; ++} ++ ++static uint64_t xxh64_merge_round(uint64_t acc, uint64_t val) ++{ ++ val = xxh64_round(0, val); ++ acc ^= val; ++ acc = acc * PRIME64_1 + PRIME64_4; ++ return acc; ++} ++ ++uint64_t xxh64(const void *input, const size_t len, const uint64_t seed) ++{ ++ const uint8_t *p = (const uint8_t *)input; ++ const uint8_t *const b_end = p + len; ++ uint64_t h64; ++ ++ if (len >= 32) { ++ const uint8_t *const limit = b_end - 32; ++ uint64_t v1 = seed + PRIME64_1 + PRIME64_2; ++ uint64_t v2 = seed + PRIME64_2; ++ uint64_t v3 = seed + 0; ++ uint64_t v4 = seed - PRIME64_1; ++ ++ do { ++ v1 = xxh64_round(v1, get_unaligned_le64(p)); ++ p += 8; ++ v2 = xxh64_round(v2, get_unaligned_le64(p)); ++ p += 8; ++ v3 = xxh64_round(v3, get_unaligned_le64(p)); ++ p += 8; ++ v4 = xxh64_round(v4, get_unaligned_le64(p)); ++ p += 8; ++ } while (p <= limit); ++ ++ h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) + ++ xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18); ++ h64 = xxh64_merge_round(h64, v1); ++ h64 = xxh64_merge_round(h64, v2); ++ h64 = xxh64_merge_round(h64, v3); ++ h64 = xxh64_merge_round(h64, v4); ++ ++ } else { ++ h64 = seed + PRIME64_5; ++ } ++ ++ h64 += (uint64_t)len; ++ ++ while (p + 8 <= b_end) { ++ const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p)); ++ ++ h64 ^= k1; ++ h64 = 
xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4; ++ p += 8; ++ } ++ ++ if (p + 4 <= b_end) { ++ h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1; ++ h64 = xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3; ++ p += 4; ++ } ++ ++ while (p < b_end) { ++ h64 ^= (*p) * PRIME64_5; ++ h64 = xxh_rotl64(h64, 11) * PRIME64_1; ++ p++; ++ } ++ ++ h64 ^= h64 >> 33; ++ h64 *= PRIME64_2; ++ h64 ^= h64 >> 29; ++ h64 *= PRIME64_3; ++ h64 ^= h64 >> 32; ++ ++ return h64; ++} ++ ++/*-************************************************** ++ * Advanced Hash Functions ++ ***************************************************/ ++void xxh64_reset(struct xxh64_state *statePtr, const uint64_t seed) ++{ ++ /* use a local state for memcpy() to avoid strict-aliasing warnings */ ++ struct xxh64_state state; ++ ++ memset(&state, 0, sizeof(state)); ++ state.v1 = seed + PRIME64_1 + PRIME64_2; ++ state.v2 = seed + PRIME64_2; ++ state.v3 = seed + 0; ++ state.v4 = seed - PRIME64_1; ++ memcpy(statePtr, &state, sizeof(state)); ++} ++ ++int xxh64_update(struct xxh64_state *state, const void *input, const size_t len) ++{ ++ const uint8_t *p = (const uint8_t *)input; ++ const uint8_t *const b_end = p + len; ++ ++ if (input == NULL) ++ return -EINVAL; ++ ++ state->total_len += len; ++ ++ if (state->memsize + len < 32) { /* fill in tmp buffer */ ++ memcpy(((uint8_t *)state->mem64) + state->memsize, input, len); ++ state->memsize += (uint32_t)len; ++ return 0; ++ } ++ ++ if (state->memsize) { /* tmp buffer is full */ ++ uint64_t *p64 = state->mem64; ++ ++ memcpy(((uint8_t *)p64) + state->memsize, input, ++ 32 - state->memsize); ++ ++ state->v1 = xxh64_round(state->v1, get_unaligned_le64(p64)); ++ p64++; ++ state->v2 = xxh64_round(state->v2, get_unaligned_le64(p64)); ++ p64++; ++ state->v3 = xxh64_round(state->v3, get_unaligned_le64(p64)); ++ p64++; ++ state->v4 = xxh64_round(state->v4, get_unaligned_le64(p64)); ++ ++ p += 32 - state->memsize; ++ state->memsize = 0; ++ } ++ ++ if (p + 32 <= b_end) { ++ const uint8_t *const limit = b_end - 32; ++ uint64_t v1 = state->v1; ++ uint64_t v2 = state->v2; ++ uint64_t v3 = state->v3; ++ uint64_t v4 = state->v4; ++ ++ do { ++ v1 = xxh64_round(v1, get_unaligned_le64(p)); ++ p += 8; ++ v2 = xxh64_round(v2, get_unaligned_le64(p)); ++ p += 8; ++ v3 = xxh64_round(v3, get_unaligned_le64(p)); ++ p += 8; ++ v4 = xxh64_round(v4, get_unaligned_le64(p)); ++ p += 8; ++ } while (p <= limit); ++ ++ state->v1 = v1; ++ state->v2 = v2; ++ state->v3 = v3; ++ state->v4 = v4; ++ } ++ ++ if (p < b_end) { ++ memcpy(state->mem64, p, (size_t)(b_end-p)); ++ state->memsize = (uint32_t)(b_end - p); ++ } ++ ++ return 0; ++} ++ ++uint64_t xxh64_digest(const struct xxh64_state *state) ++{ ++ const uint8_t *p = (const uint8_t *)state->mem64; ++ const uint8_t *const b_end = (const uint8_t *)state->mem64 + ++ state->memsize; ++ uint64_t h64; ++ ++ if (state->total_len >= 32) { ++ const uint64_t v1 = state->v1; ++ const uint64_t v2 = state->v2; ++ const uint64_t v3 = state->v3; ++ const uint64_t v4 = state->v4; ++ ++ h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) + ++ xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18); ++ h64 = xxh64_merge_round(h64, v1); ++ h64 = xxh64_merge_round(h64, v2); ++ h64 = xxh64_merge_round(h64, v3); ++ h64 = xxh64_merge_round(h64, v4); ++ } else { ++ h64 = state->v3 + PRIME64_5; ++ } ++ ++ h64 += (uint64_t)state->total_len; ++ ++ while (p + 8 <= b_end) { ++ const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p)); ++ ++ h64 ^= k1; ++ h64 = xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4; ++ p += 8; ++ } ++ ++ if (p + 4 <= 
b_end) { ++ h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1; ++ h64 = xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3; ++ p += 4; ++ } ++ ++ while (p < b_end) { ++ h64 ^= (*p) * PRIME64_5; ++ h64 = xxh_rotl64(h64, 11) * PRIME64_1; ++ p++; ++ } ++ ++ h64 ^= h64 >> 33; ++ h64 *= PRIME64_2; ++ h64 ^= h64 >> 29; ++ h64 *= PRIME64_3; ++ h64 ^= h64 >> 32; ++ ++ return h64; ++} +-- +2.34.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch 2022-07-13 14:06:12.000000000 +0100 @@ -0,0 +1,6404 @@ +From 95becb20279ede2cc0b87e0311f43911997a53e7 Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Mon, 18 Jan 2021 12:12:23 +0100 +Subject: [PATCH 3/5] x86/Dom0: support zstd compressed kernels + +Taken from Linux at commit 1c4dd334df3a ("lib: decompress_unzstd: Limit +output size") for unzstd.c (renamed from decompress_unzstd.c) and +36f9ff9e03de ("lib: Fix fall-through warnings for Clang") for zstd/, +with bits from linux/zstd.h merged into suitable other headers. + +To limit the editing necessary, introduce ptrdiff_t. + +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper + +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Origin: backport, http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=d6627cf1b63ce57a6a7e2c1800dbc50eed742c32 +[backport: xen/common/Makefile: remove 'lzo' from list, + and refresh 1 context line.] +--- + xen/common/Makefile | 2 +- + xen/common/decompress.c | 3 + + xen/common/unzstd.c | 308 ++++ + xen/common/zstd/bitstream.h | 380 +++++ + xen/common/zstd/decompress.c | 2496 ++++++++++++++++++++++++++++++ + xen/common/zstd/entropy_common.c | 243 +++ + xen/common/zstd/error_private.h | 110 ++ + xen/common/zstd/fse.h | 575 +++++++ + xen/common/zstd/fse_decompress.c | 324 ++++ + xen/common/zstd/huf.h | 212 +++ + xen/common/zstd/huf_decompress.c | 960 ++++++++++++ + xen/common/zstd/mem.h | 151 ++ + xen/common/zstd/zstd_common.c | 74 + + xen/common/zstd/zstd_internal.h | 372 +++++ + xen/include/asm-arm/types.h | 6 + + xen/include/asm-x86/types.h | 6 + + xen/include/xen/decompress.h | 2 +- + 17 files changed, 6222 insertions(+), 2 deletions(-) + create mode 100644 xen/common/unzstd.c + create mode 100644 xen/common/zstd/bitstream.h + create mode 100644 xen/common/zstd/decompress.c + create mode 100644 xen/common/zstd/entropy_common.c + create mode 100644 xen/common/zstd/error_private.h + create mode 100644 xen/common/zstd/fse.h + create mode 100644 xen/common/zstd/fse_decompress.c + create mode 100644 xen/common/zstd/huf.h + create mode 100644 xen/common/zstd/huf_decompress.c + create mode 100644 xen/common/zstd/mem.h + create mode 100644 xen/common/zstd/zstd_common.c + create mode 100644 xen/common/zstd/zstd_internal.h + +diff --git a/xen/common/Makefile b/xen/common/Makefile +index 24d4752ccc55..c4dceff97842 100644 +--- a/xen/common/Makefile ++++ b/xen/common/Makefile +@@ -66,7 +66,7 @@ obj-bin-y += warning.init.o + obj-$(CONFIG_XENOPROF) += xenoprof.o + obj-y += xmalloc_tlsf.o + +-obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o) ++obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 unzstd 
earlycpio,$(n).init.o) + + + obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o) +diff --git a/xen/common/decompress.c b/xen/common/decompress.c +index 9d6e0c4ab075..79e60f4802d5 100644 +--- a/xen/common/decompress.c ++++ b/xen/common/decompress.c +@@ -31,5 +31,8 @@ int __init decompress(void *inbuf, unsigned int len, void *outbuf) + if ( len >= 2 && !memcmp(inbuf, "\x02\x21", 2) ) + return unlz4(inbuf, len, NULL, NULL, outbuf, NULL, error); + ++ if ( len >= 4 && !memcmp(inbuf, "\x28\xb5\x2f\xfd", 4) ) ++ return unzstd(inbuf, len, NULL, NULL, outbuf, NULL, error); ++ + return 1; + } +diff --git a/xen/common/unzstd.c b/xen/common/unzstd.c +new file mode 100644 +index 000000000000..a10761642764 +--- /dev/null ++++ b/xen/common/unzstd.c +@@ -0,0 +1,308 @@ ++// SPDX-License-Identifier: GPL-2.0 ++ ++/* ++ * Important notes about in-place decompression ++ * ++ * At least on x86, the kernel is decompressed in place: the compressed data ++ * is placed to the end of the output buffer, and the decompressor overwrites ++ * most of the compressed data. There must be enough safety margin to ++ * guarantee that the write position is always behind the read position. ++ * ++ * The safety margin for ZSTD with a 128 KB block size is calculated below. ++ * Note that the margin with ZSTD is bigger than with GZIP or XZ! ++ * ++ * The worst case for in-place decompression is that the beginning of ++ * the file is compressed extremely well, and the rest of the file is ++ * uncompressible. Thus, we must look for worst-case expansion when the ++ * compressor is encoding uncompressible data. ++ * ++ * The structure of the .zst file in case of a compresed kernel is as follows. ++ * Maximum sizes (as bytes) of the fields are in parenthesis. ++ * ++ * Frame Header: (18) ++ * Blocks: (N) ++ * Checksum: (4) ++ * ++ * The frame header and checksum overhead is at most 22 bytes. ++ * ++ * ZSTD stores the data in blocks. Each block has a header whose size is ++ * a 3 bytes. After the block header, there is up to 128 KB of payload. ++ * The maximum uncompressed size of the payload is 128 KB. The minimum ++ * uncompressed size of the payload is never less than the payload size ++ * (excluding the block header). ++ * ++ * The assumption, that the uncompressed size of the payload is never ++ * smaller than the payload itself, is valid only when talking about ++ * the payload as a whole. It is possible that the payload has parts where ++ * the decompressor consumes more input than it produces output. Calculating ++ * the worst case for this would be tricky. Instead of trying to do that, ++ * let's simply make sure that the decompressor never overwrites any bytes ++ * of the payload which it is currently reading. ++ * ++ * Now we have enough information to calculate the safety margin. We need ++ * - 22 bytes for the .zst file format headers; ++ * - 3 bytes per every 128 KiB of uncompressed size (one block header per ++ * block); and ++ * - 128 KiB (biggest possible zstd block size) to make sure that the ++ * decompressor never overwrites anything from the block it is currently ++ * reading. 
++ * ++ * We get the following formula: ++ * ++ * safety_margin = 22 + uncompressed_size * 3 / 131072 + 131072 ++ * <= 22 + (uncompressed_size >> 15) + 131072 ++ */ ++ ++#include "decompress.h" ++ ++#include "zstd/entropy_common.c" ++#include "zstd/fse_decompress.c" ++#include "zstd/huf_decompress.c" ++#include "zstd/zstd_common.c" ++#include "zstd/decompress.c" ++ ++/* 128MB is the maximum window size supported by zstd. */ ++#define ZSTD_WINDOWSIZE_MAX (1 << ZSTD_WINDOWLOG_MAX) ++/* ++ * Size of the input and output buffers in multi-call mode. ++ * Pick a larger size because it isn't used during kernel decompression, ++ * since that is single pass, and we have to allocate a large buffer for ++ * zstd's window anyway. The larger size speeds up initramfs decompression. ++ */ ++#define ZSTD_IOBUF_SIZE (1 << 17) ++ ++static int INIT handle_zstd_error(size_t ret, void (*error)(const char *x)) ++{ ++ const int err = ZSTD_getErrorCode(ret); ++ ++ if (!ZSTD_isError(ret)) ++ return 0; ++ ++ switch (err) { ++ case ZSTD_error_memory_allocation: ++ error("ZSTD decompressor ran out of memory"); ++ break; ++ case ZSTD_error_prefix_unknown: ++ error("Input is not in the ZSTD format (wrong magic bytes)"); ++ break; ++ case ZSTD_error_dstSize_tooSmall: ++ case ZSTD_error_corruption_detected: ++ case ZSTD_error_checksum_wrong: ++ error("ZSTD-compressed data is corrupt"); ++ break; ++ default: ++ error("ZSTD-compressed data is probably corrupt"); ++ break; ++ } ++ return -1; ++} ++ ++/* ++ * Handle the case where we have the entire input and output in one segment. ++ * We can allocate less memory (no circular buffer for the sliding window), ++ * and avoid some memcpy() calls. ++ */ ++static int INIT decompress_single(const u8 *in_buf, long in_len, u8 *out_buf, ++ long out_len, unsigned int *in_pos, ++ void (*error)(const char *x)) ++{ ++ const size_t wksp_size = ZSTD_DCtxWorkspaceBound(); ++ void *wksp = large_malloc(wksp_size); ++ ZSTD_DCtx *dctx = ZSTD_initDCtx(wksp, wksp_size); ++ int err; ++ size_t ret; ++ ++ if (dctx == NULL) { ++ error("Out of memory while allocating ZSTD_DCtx"); ++ err = -1; ++ goto out; ++ } ++ /* ++ * Find out how large the frame actually is, there may be junk at ++ * the end of the frame that ZSTD_decompressDCtx() can't handle. ++ */ ++ ret = ZSTD_findFrameCompressedSize(in_buf, in_len); ++ err = handle_zstd_error(ret, error); ++ if (err) ++ goto out; ++ in_len = (long)ret; ++ ++ ret = ZSTD_decompressDCtx(dctx, out_buf, out_len, in_buf, in_len); ++ err = handle_zstd_error(ret, error); ++ if (err) ++ goto out; ++ ++ if (in_pos != NULL) ++ *in_pos = in_len; ++ ++ err = 0; ++out: ++ if (wksp != NULL) ++ large_free(wksp); ++ return err; ++} ++ ++STATIC int INIT unzstd(unsigned char *in_buf, unsigned int in_len, ++ int (*fill)(void*, unsigned int), ++ int (*flush)(void*, unsigned int), ++ unsigned char *out_buf, ++ unsigned int *in_pos, ++ void (*error)(const char *x)) ++{ ++ ZSTD_inBuffer in; ++ ZSTD_outBuffer out; ++ ZSTD_frameParams params; ++ void *in_allocated = NULL; ++ void *out_allocated = NULL; ++ void *wksp = NULL; ++ size_t wksp_size; ++ ZSTD_DStream *dstream; ++ int err; ++ size_t ret; ++ /* ++ * ZSTD decompression code won't be happy if the buffer size is so big ++ * that its end address overflows. When the size is not provided, make ++ * it as big as possible without having the end address overflow. 
++ */ ++ unsigned long out_len = ULONG_MAX - (unsigned long)out_buf; ++ ++ if (fill == NULL && flush == NULL) ++ /* ++ * We can decompress faster and with less memory when we have a ++ * single chunk. ++ */ ++ return decompress_single(in_buf, in_len, out_buf, out_len, ++ in_pos, error); ++ ++ /* ++ * If in_buf is not provided, we must be using fill(), so allocate ++ * a large enough buffer. If it is provided, it must be at least ++ * ZSTD_IOBUF_SIZE large. ++ */ ++ if (in_buf == NULL) { ++ in_allocated = large_malloc(ZSTD_IOBUF_SIZE); ++ if (in_allocated == NULL) { ++ error("Out of memory while allocating input buffer"); ++ err = -1; ++ goto out; ++ } ++ in_buf = in_allocated; ++ in_len = 0; ++ } ++ /* Read the first chunk, since we need to decode the frame header. */ ++ if (fill != NULL) ++ in_len = fill(in_buf, ZSTD_IOBUF_SIZE); ++ if ((int)in_len < 0) { ++ error("ZSTD-compressed data is truncated"); ++ err = -1; ++ goto out; ++ } ++ /* Set the first non-empty input buffer. */ ++ in.src = in_buf; ++ in.pos = 0; ++ in.size = in_len; ++ /* Allocate the output buffer if we are using flush(). */ ++ if (flush != NULL) { ++ out_allocated = large_malloc(ZSTD_IOBUF_SIZE); ++ if (out_allocated == NULL) { ++ error("Out of memory while allocating output buffer"); ++ err = -1; ++ goto out; ++ } ++ out_buf = out_allocated; ++ out_len = ZSTD_IOBUF_SIZE; ++ } ++ /* Set the output buffer. */ ++ out.dst = out_buf; ++ out.pos = 0; ++ out.size = out_len; ++ ++ /* ++ * We need to know the window size to allocate the ZSTD_DStream. ++ * Since we are streaming, we need to allocate a buffer for the sliding ++ * window. The window size varies from 1 KB to ZSTD_WINDOWSIZE_MAX ++ * (8 MB), so it is important to use the actual value so as not to ++ * waste memory when it is smaller. ++ */ ++ ret = ZSTD_getFrameParams(¶ms, in.src, in.size); ++ err = handle_zstd_error(ret, error); ++ if (err) ++ goto out; ++ if (ret != 0) { ++ error("ZSTD-compressed data has an incomplete frame header"); ++ err = -1; ++ goto out; ++ } ++ if (params.windowSize > ZSTD_WINDOWSIZE_MAX) { ++ error("ZSTD-compressed data has too large a window size"); ++ err = -1; ++ goto out; ++ } ++ ++ /* ++ * Allocate the ZSTD_DStream now that we know how much memory is ++ * required. ++ */ ++ wksp_size = ZSTD_DStreamWorkspaceBound(params.windowSize); ++ wksp = large_malloc(wksp_size); ++ dstream = ZSTD_initDStream(params.windowSize, wksp, wksp_size); ++ if (dstream == NULL) { ++ error("Out of memory while allocating ZSTD_DStream"); ++ err = -1; ++ goto out; ++ } ++ ++ /* ++ * Decompression loop: ++ * Read more data if necessary (error if no more data can be read). ++ * Call the decompression function, which returns 0 when finished. ++ * Flush any data produced if using flush(). ++ */ ++ if (in_pos != NULL) ++ *in_pos = 0; ++ do { ++ /* ++ * If we need to reload data, either we have fill() and can ++ * try to get more data, or we don't and the input is truncated. ++ */ ++ if (in.pos == in.size) { ++ if (in_pos != NULL) ++ *in_pos += in.pos; ++ in_len = fill ? fill(in_buf, ZSTD_IOBUF_SIZE) : -1; ++ if ((int)in_len < 0) { ++ error("ZSTD-compressed data is truncated"); ++ err = -1; ++ goto out; ++ } ++ in.pos = 0; ++ in.size = in_len; ++ } ++ /* Returns zero when the frame is complete. */ ++ ret = ZSTD_decompressStream(dstream, &out, &in); ++ err = handle_zstd_error(ret, error); ++ if (err) ++ goto out; ++ /* Flush all of the data produced if using flush(). 
*/ ++ if (flush != NULL && out.pos > 0) { ++ if (out.pos != flush(out.dst, out.pos)) { ++ error("Failed to flush()"); ++ err = -1; ++ goto out; ++ } ++ out.pos = 0; ++ } ++ } while (ret != 0); ++ ++ if (in_pos != NULL) ++ *in_pos += in.pos; ++ ++ err = 0; ++out: ++ if (in_allocated != NULL) ++ large_free(in_allocated); ++ if (out_allocated != NULL) ++ large_free(out_allocated); ++ if (wksp != NULL) ++ large_free(wksp); ++ return err; ++} +diff --git a/xen/common/zstd/bitstream.h b/xen/common/zstd/bitstream.h +new file mode 100644 +index 000000000000..2b06d4551f03 +--- /dev/null ++++ b/xen/common/zstd/bitstream.h +@@ -0,0 +1,380 @@ ++/* ++ * bitstream ++ * Part of FSE library ++ * header file (to include) ++ * Copyright (C) 2013-2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++#ifndef BITSTREAM_H_MODULE ++#define BITSTREAM_H_MODULE ++ ++/* ++* This API consists of small unitary functions, which must be inlined for best performance. ++* Since link-time-optimization is not available for all compilers, ++* these functions are defined into a .h to be included. ++*/ ++ ++/*-**************************************** ++* Dependencies ++******************************************/ ++#include "error_private.h" /* error codes and messages */ ++#include "mem.h" /* unaligned access routines */ ++ ++/*========================================= ++* Target specific ++=========================================*/ ++#define STREAM_ACCUMULATOR_MIN_32 25 ++#define STREAM_ACCUMULATOR_MIN_64 57 ++#define STREAM_ACCUMULATOR_MIN ((U32)(ZSTD_32bits() ? 
STREAM_ACCUMULATOR_MIN_32 : STREAM_ACCUMULATOR_MIN_64)) ++ ++/*-****************************************** ++* bitStream encoding API (write forward) ++********************************************/ ++/* bitStream can mix input from multiple sources. ++* A critical property of these streams is that they encode and decode in **reverse** direction. ++* So the first bit sequence you add will be the last to be read, like a LIFO stack. ++*/ ++typedef struct { ++ size_t bitContainer; ++ int bitPos; ++ char *startPtr; ++ char *ptr; ++ char *endPtr; ++} BIT_CStream_t; ++ ++ZSTD_STATIC size_t BIT_initCStream(BIT_CStream_t *bitC, void *dstBuffer, size_t dstCapacity); ++ZSTD_STATIC void BIT_addBits(BIT_CStream_t *bitC, size_t value, unsigned nbBits); ++ZSTD_STATIC void BIT_flushBits(BIT_CStream_t *bitC); ++ZSTD_STATIC size_t BIT_closeCStream(BIT_CStream_t *bitC); ++ ++/* Start with initCStream, providing the size of buffer to write into. ++* bitStream will never write outside of this buffer. ++* `dstCapacity` must be >= sizeof(bitD->bitContainer), otherwise @return will be an error code. ++* ++* bits are first added to a local register. ++* Local register is size_t, hence 64-bits on 64-bits systems, or 32-bits on 32-bits systems. ++* Writing data into memory is an explicit operation, performed by the flushBits function. ++* Hence keep track how many bits are potentially stored into local register to avoid register overflow. ++* After a flushBits, a maximum of 7 bits might still be stored into local register. ++* ++* Avoid storing elements of more than 24 bits if you want compatibility with 32-bits bitstream readers. ++* ++* Last operation is to close the bitStream. ++* The function returns the final size of CStream in bytes. ++* If data couldn't fit into `dstBuffer`, it will return a 0 ( == not storable) ++*/ ++ ++/*-******************************************** ++* bitStream decoding API (read backward) ++**********************************************/ ++typedef struct { ++ size_t bitContainer; ++ unsigned bitsConsumed; ++ const char *ptr; ++ const char *start; ++} BIT_DStream_t; ++ ++typedef enum { ++ BIT_DStream_unfinished = 0, ++ BIT_DStream_endOfBuffer = 1, ++ BIT_DStream_completed = 2, ++ BIT_DStream_overflow = 3 ++} BIT_DStream_status; /* result of BIT_reloadDStream() */ ++/* 1,2,4,8 would be better for bitmap combinations, but slows down performance a bit ... :( */ ++ ++ZSTD_STATIC size_t BIT_initDStream(BIT_DStream_t *bitD, const void *srcBuffer, size_t srcSize); ++ZSTD_STATIC size_t BIT_readBits(BIT_DStream_t *bitD, unsigned nbBits); ++ZSTD_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t *bitD); ++ZSTD_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t *bitD); ++ ++/* Start by invoking BIT_initDStream(). ++* A chunk of the bitStream is then stored into a local register. ++* Local register size is 64-bits on 64-bits systems, 32-bits on 32-bits systems (size_t). ++* You can then retrieve bitFields stored into the local register, **in reverse order**. ++* Local register is explicitly reloaded from memory by the BIT_reloadDStream() method. ++* A reload guarantee a minimum of ((8*sizeof(bitD->bitContainer))-7) bits when its result is BIT_DStream_unfinished. ++* Otherwise, it can be less than that, so proceed accordingly. ++* Checking if DStream has reached its end can be performed with BIT_endOfDStream(). 
++*/ ++ ++/*-**************************************** ++* unsafe API ++******************************************/ ++ZSTD_STATIC void BIT_addBitsFast(BIT_CStream_t *bitC, size_t value, unsigned nbBits); ++/* faster, but works only if value is "clean", meaning all high bits above nbBits are 0 */ ++ ++ZSTD_STATIC void BIT_flushBitsFast(BIT_CStream_t *bitC); ++/* unsafe version; does not check buffer overflow */ ++ ++ZSTD_STATIC size_t BIT_readBitsFast(BIT_DStream_t *bitD, unsigned nbBits); ++/* faster, but works only if nbBits >= 1 */ ++ ++/*-************************************************************** ++* Internal functions ++****************************************************************/ ++ZSTD_STATIC unsigned BIT_highbit32(register U32 val) { return 31 - __builtin_clz(val); } ++ ++/*===== Local Constants =====*/ ++static const unsigned BIT_mask[] = {0, 1, 3, 7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF, ++ 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF, 0x1FFFF, ++ 0x3FFFF, 0x7FFFF, 0xFFFFF, 0x1FFFFF, 0x3FFFFF, 0x7FFFFF, 0xFFFFFF, 0x1FFFFFF, 0x3FFFFFF}; /* up to 26 bits */ ++ ++/*-************************************************************** ++* bitStream encoding ++****************************************************************/ ++/*! BIT_initCStream() : ++ * `dstCapacity` must be > sizeof(void*) ++ * @return : 0 if success, ++ otherwise an error code (can be tested using ERR_isError() ) */ ++ZSTD_STATIC size_t BIT_initCStream(BIT_CStream_t *bitC, void *startPtr, size_t dstCapacity) ++{ ++ bitC->bitContainer = 0; ++ bitC->bitPos = 0; ++ bitC->startPtr = (char *)startPtr; ++ bitC->ptr = bitC->startPtr; ++ bitC->endPtr = bitC->startPtr + dstCapacity - sizeof(bitC->ptr); ++ if (dstCapacity <= sizeof(bitC->ptr)) ++ return ERROR(dstSize_tooSmall); ++ return 0; ++} ++ ++/*! BIT_addBits() : ++ can add up to 26 bits into `bitC`. ++ Does not check for register overflow ! */ ++ZSTD_STATIC void BIT_addBits(BIT_CStream_t *bitC, size_t value, unsigned nbBits) ++{ ++ bitC->bitContainer |= (value & BIT_mask[nbBits]) << bitC->bitPos; ++ bitC->bitPos += nbBits; ++} ++ ++/*! BIT_addBitsFast() : ++ * works only if `value` is _clean_, meaning all high bits above nbBits are 0 */ ++ZSTD_STATIC void BIT_addBitsFast(BIT_CStream_t *bitC, size_t value, unsigned nbBits) ++{ ++ bitC->bitContainer |= value << bitC->bitPos; ++ bitC->bitPos += nbBits; ++} ++ ++/*! BIT_flushBitsFast() : ++ * unsafe version; does not check buffer overflow */ ++ZSTD_STATIC void BIT_flushBitsFast(BIT_CStream_t *bitC) ++{ ++ size_t const nbBytes = bitC->bitPos >> 3; ++ ZSTD_writeLEST(bitC->ptr, bitC->bitContainer); ++ bitC->ptr += nbBytes; ++ bitC->bitPos &= 7; ++ bitC->bitContainer >>= nbBytes * 8; /* if bitPos >= sizeof(bitContainer)*8 --> undefined behavior */ ++} ++ ++/*! BIT_flushBits() : ++ * safe version; check for buffer overflow, and prevents it. ++ * note : does not signal buffer overflow. This will be revealed later on using BIT_closeCStream() */ ++ZSTD_STATIC void BIT_flushBits(BIT_CStream_t *bitC) ++{ ++ size_t const nbBytes = bitC->bitPos >> 3; ++ ZSTD_writeLEST(bitC->ptr, bitC->bitContainer); ++ bitC->ptr += nbBytes; ++ if (bitC->ptr > bitC->endPtr) ++ bitC->ptr = bitC->endPtr; ++ bitC->bitPos &= 7; ++ bitC->bitContainer >>= nbBytes * 8; /* if bitPos >= sizeof(bitContainer)*8 --> undefined behavior */ ++} ++ ++/*! 
BIT_closeCStream() : ++ * @return : size of CStream, in bytes, ++ or 0 if it could not fit into dstBuffer */ ++ZSTD_STATIC size_t BIT_closeCStream(BIT_CStream_t *bitC) ++{ ++ BIT_addBitsFast(bitC, 1, 1); /* endMark */ ++ BIT_flushBits(bitC); ++ ++ if (bitC->ptr >= bitC->endPtr) ++ return 0; /* doesn't fit within authorized budget : cancel */ ++ ++ return (bitC->ptr - bitC->startPtr) + (bitC->bitPos > 0); ++} ++ ++/*-******************************************************** ++* bitStream decoding ++**********************************************************/ ++/*! BIT_initDStream() : ++* Initialize a BIT_DStream_t. ++* `bitD` : a pointer to an already allocated BIT_DStream_t structure. ++* `srcSize` must be the *exact* size of the bitStream, in bytes. ++* @return : size of stream (== srcSize) or an errorCode if a problem is detected ++*/ ++ZSTD_STATIC size_t BIT_initDStream(BIT_DStream_t *bitD, const void *srcBuffer, size_t srcSize) ++{ ++ if (srcSize < 1) { ++ memset(bitD, 0, sizeof(*bitD)); ++ return ERROR(srcSize_wrong); ++ } ++ ++ if (srcSize >= sizeof(bitD->bitContainer)) { /* normal case */ ++ bitD->start = (const char *)srcBuffer; ++ bitD->ptr = (const char *)srcBuffer + srcSize - sizeof(bitD->bitContainer); ++ bitD->bitContainer = ZSTD_readLEST(bitD->ptr); ++ { ++ BYTE const lastByte = ((const BYTE *)srcBuffer)[srcSize - 1]; ++ bitD->bitsConsumed = lastByte ? 8 - BIT_highbit32(lastByte) : 0; /* ensures bitsConsumed is always set */ ++ if (lastByte == 0) ++ return ERROR(GENERIC); /* endMark not present */ ++ } ++ } else { ++ bitD->start = (const char *)srcBuffer; ++ bitD->ptr = bitD->start; ++ bitD->bitContainer = *(const BYTE *)(bitD->start); ++ switch (srcSize) { ++ case 7: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[6]) << (sizeof(bitD->bitContainer) * 8 - 16); ++ /* fallthrough */ ++ case 6: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[5]) << (sizeof(bitD->bitContainer) * 8 - 24); ++ /* fallthrough */ ++ case 5: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[4]) << (sizeof(bitD->bitContainer) * 8 - 32); ++ /* fallthrough */ ++ case 4: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[3]) << 24; ++ /* fallthrough */ ++ case 3: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[2]) << 16; ++ /* fallthrough */ ++ case 2: bitD->bitContainer += (size_t)(((const BYTE *)(srcBuffer))[1]) << 8; ++ /* fallthrough */ ++ default:; ++ } ++ { ++ BYTE const lastByte = ((const BYTE *)srcBuffer)[srcSize - 1]; ++ bitD->bitsConsumed = lastByte ? 8 - BIT_highbit32(lastByte) : 0; ++ if (lastByte == 0) ++ return ERROR(GENERIC); /* endMark not present */ ++ } ++ bitD->bitsConsumed += (U32)(sizeof(bitD->bitContainer) - srcSize) * 8; ++ } ++ ++ return srcSize; ++} ++ ++ZSTD_STATIC size_t BIT_getUpperBits(size_t bitContainer, U32 const start) { return bitContainer >> start; } ++ ++ZSTD_STATIC size_t BIT_getMiddleBits(size_t bitContainer, U32 const start, U32 const nbBits) { return (bitContainer >> start) & BIT_mask[nbBits]; } ++ ++ZSTD_STATIC size_t BIT_getLowerBits(size_t bitContainer, U32 const nbBits) { return bitContainer & BIT_mask[nbBits]; } ++ ++/*! BIT_lookBits() : ++ * Provides next n bits from local register. ++ * local register is not modified. ++ * On 32-bits, maxNbBits==24. ++ * On 64-bits, maxNbBits==56. 
++ * @return : value extracted ++ */ ++ZSTD_STATIC size_t BIT_lookBits(const BIT_DStream_t *bitD, U32 nbBits) ++{ ++ U32 const bitMask = sizeof(bitD->bitContainer) * 8 - 1; ++ return ((bitD->bitContainer << (bitD->bitsConsumed & bitMask)) >> 1) >> ((bitMask - nbBits) & bitMask); ++} ++ ++/*! BIT_lookBitsFast() : ++* unsafe version; only works only if nbBits >= 1 */ ++ZSTD_STATIC size_t BIT_lookBitsFast(const BIT_DStream_t *bitD, U32 nbBits) ++{ ++ U32 const bitMask = sizeof(bitD->bitContainer) * 8 - 1; ++ return (bitD->bitContainer << (bitD->bitsConsumed & bitMask)) >> (((bitMask + 1) - nbBits) & bitMask); ++} ++ ++ZSTD_STATIC void BIT_skipBits(BIT_DStream_t *bitD, U32 nbBits) { bitD->bitsConsumed += nbBits; } ++ ++/*! BIT_readBits() : ++ * Read (consume) next n bits from local register and update. ++ * Pay attention to not read more than nbBits contained into local register. ++ * @return : extracted value. ++ */ ++ZSTD_STATIC size_t BIT_readBits(BIT_DStream_t *bitD, U32 nbBits) ++{ ++ size_t const value = BIT_lookBits(bitD, nbBits); ++ BIT_skipBits(bitD, nbBits); ++ return value; ++} ++ ++/*! BIT_readBitsFast() : ++* unsafe version; only works only if nbBits >= 1 */ ++ZSTD_STATIC size_t BIT_readBitsFast(BIT_DStream_t *bitD, U32 nbBits) ++{ ++ size_t const value = BIT_lookBitsFast(bitD, nbBits); ++ BIT_skipBits(bitD, nbBits); ++ return value; ++} ++ ++/*! BIT_reloadDStream() : ++* Refill `bitD` from buffer previously set in BIT_initDStream() . ++* This function is safe, it guarantees it will not read beyond src buffer. ++* @return : status of `BIT_DStream_t` internal register. ++ if status == BIT_DStream_unfinished, internal register is filled with >= (sizeof(bitD->bitContainer)*8 - 7) bits */ ++ZSTD_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t *bitD) ++{ ++ if (bitD->bitsConsumed > (sizeof(bitD->bitContainer) * 8)) /* should not happen => corruption detected */ ++ return BIT_DStream_overflow; ++ ++ if (bitD->ptr >= bitD->start + sizeof(bitD->bitContainer)) { ++ bitD->ptr -= bitD->bitsConsumed >> 3; ++ bitD->bitsConsumed &= 7; ++ bitD->bitContainer = ZSTD_readLEST(bitD->ptr); ++ return BIT_DStream_unfinished; ++ } ++ if (bitD->ptr == bitD->start) { ++ if (bitD->bitsConsumed < sizeof(bitD->bitContainer) * 8) ++ return BIT_DStream_endOfBuffer; ++ return BIT_DStream_completed; ++ } ++ { ++ U32 nbBytes = bitD->bitsConsumed >> 3; ++ BIT_DStream_status result = BIT_DStream_unfinished; ++ if (bitD->ptr - nbBytes < bitD->start) { ++ nbBytes = (U32)(bitD->ptr - bitD->start); /* ptr > start */ ++ result = BIT_DStream_endOfBuffer; ++ } ++ bitD->ptr -= nbBytes; ++ bitD->bitsConsumed -= nbBytes * 8; ++ bitD->bitContainer = ZSTD_readLEST(bitD->ptr); /* reminder : srcSize > sizeof(bitD) */ ++ return result; ++ } ++} ++ ++/*! BIT_endOfDStream() : ++* @return Tells if DStream has exactly reached its end (all bits consumed). ++*/ ++ZSTD_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t *DStream) ++{ ++ return ((DStream->ptr == DStream->start) && (DStream->bitsConsumed == sizeof(DStream->bitContainer) * 8)); ++} ++ ++#endif /* BITSTREAM_H_MODULE */ +diff --git a/xen/common/zstd/decompress.c b/xen/common/zstd/decompress.c +new file mode 100644 +index 000000000000..3d3ef136e5c2 +--- /dev/null ++++ b/xen/common/zstd/decompress.c +@@ -0,0 +1,2496 @@ ++/** ++ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. ++ * All rights reserved. ++ * ++ * This source code is licensed under the BSD-style license found in the ++ * LICENSE file in the root directory of https://github.com/facebook/zstd. 
++ * An additional grant of patent rights can be found in the PATENTS file in the ++ * same directory. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ */ ++ ++/* *************************************************************** ++* Tuning parameters ++*****************************************************************/ ++/*! ++* MAXWINDOWSIZE_DEFAULT : ++* maximum window size accepted by DStream, by default. ++* Frames requiring more memory will be rejected. ++*/ ++#ifndef ZSTD_MAXWINDOWSIZE_DEFAULT ++#define ZSTD_MAXWINDOWSIZE_DEFAULT ((1 << ZSTD_WINDOWLOG_MAX) + 1) /* defined within zstd.h */ ++#endif ++ ++/*-******************************************************* ++* Dependencies ++*********************************************************/ ++#include "fse.h" ++#include "huf.h" ++#include "mem.h" /* low level memory routines */ ++#include "zstd_internal.h" ++#include /* memcpy, memmove, memset */ ++ ++#define ZSTD_PREFETCH(ptr) __builtin_prefetch(ptr, 0, 0) ++ ++/*-************************************* ++* Macros ++***************************************/ ++#define ZSTD_isError ERR_isError /* for inlining */ ++#define FSE_isError ERR_isError ++#define HUF_isError ERR_isError ++ ++/*_******************************************************* ++* Memory operations ++**********************************************************/ ++static void INIT ZSTD_copy4(void *dst, const void *src) { memcpy(dst, src, 4); } ++ ++/*-************************************************************* ++* Context management ++***************************************************************/ ++typedef enum { ++ ZSTDds_getFrameHeaderSize, ++ ZSTDds_decodeFrameHeader, ++ ZSTDds_decodeBlockHeader, ++ ZSTDds_decompressBlock, ++ ZSTDds_decompressLastBlock, ++ ZSTDds_checkChecksum, ++ ZSTDds_decodeSkippableHeader, ++ ZSTDds_skipFrame ++} ZSTD_dStage; ++ ++typedef struct { ++ FSE_DTable LLTable[FSE_DTABLE_SIZE_U32(LLFSELog)]; ++ FSE_DTable OFTable[FSE_DTABLE_SIZE_U32(OffFSELog)]; ++ FSE_DTable MLTable[FSE_DTABLE_SIZE_U32(MLFSELog)]; ++ HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */ ++ U64 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32 / 2]; ++ U32 rep[ZSTD_REP_NUM]; ++} ZSTD_entropyTables_t; ++ ++struct ZSTD_DCtx_s { ++ const FSE_DTable *LLTptr; ++ const FSE_DTable *MLTptr; ++ const FSE_DTable *OFTptr; ++ const HUF_DTable *HUFptr; ++ ZSTD_entropyTables_t entropy; ++ const void *previousDstEnd; /* detect continuity */ ++ const void *base; /* start of curr segment */ ++ const void *vBase; /* virtual start of previous segment if it was just before curr one */ ++ const void *dictEnd; /* end of previous segment */ ++ size_t expected; ++ ZSTD_frameParams fParams; ++ blockType_e bType; /* used in ZSTD_decompressContinue(), to transfer blockType between header decoding and block decoding stages */ ++ ZSTD_dStage stage; ++ U32 litEntropy; ++ U32 fseEntropy; ++ struct xxh64_state xxhState; ++ size_t headerSize; ++ U32 dictID; ++ const BYTE *litPtr; ++ ZSTD_customMem customMem; ++ size_t litSize; ++ size_t rleSize; ++ BYTE litBuffer[ZSTD_BLOCKSIZE_ABSOLUTEMAX + WILDCOPY_OVERLENGTH]; ++ BYTE headerBuffer[ZSTD_FRAMEHEADERSIZE_MAX]; ++}; /* typedef'd to ZSTD_DCtx within "zstd.h" */ ++ ++size_t INIT 
ZSTD_DCtxWorkspaceBound(void) { return ZSTD_ALIGN(sizeof(ZSTD_stack)) + ZSTD_ALIGN(sizeof(ZSTD_DCtx)); } ++ ++size_t INIT ZSTD_decompressBegin(ZSTD_DCtx *dctx) ++{ ++ dctx->expected = ZSTD_frameHeaderSize_prefix; ++ dctx->stage = ZSTDds_getFrameHeaderSize; ++ dctx->previousDstEnd = NULL; ++ dctx->base = NULL; ++ dctx->vBase = NULL; ++ dctx->dictEnd = NULL; ++ dctx->entropy.hufTable[0] = (HUF_DTable)((HufLog)*0x1000001); /* cover both little and big endian */ ++ dctx->litEntropy = dctx->fseEntropy = 0; ++ dctx->dictID = 0; ++ ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.rep) == sizeof(repStartValue)); ++ memcpy(dctx->entropy.rep, repStartValue, sizeof(repStartValue)); /* initial repcodes */ ++ dctx->LLTptr = dctx->entropy.LLTable; ++ dctx->MLTptr = dctx->entropy.MLTable; ++ dctx->OFTptr = dctx->entropy.OFTable; ++ dctx->HUFptr = dctx->entropy.hufTable; ++ return 0; ++} ++ ++ZSTD_DCtx *INIT ZSTD_createDCtx_advanced(ZSTD_customMem customMem) ++{ ++ ZSTD_DCtx *dctx; ++ ++ if (!customMem.customAlloc || !customMem.customFree) ++ return NULL; ++ ++ dctx = (ZSTD_DCtx *)ZSTD_malloc(sizeof(ZSTD_DCtx), customMem); ++ if (!dctx) ++ return NULL; ++ memcpy(&dctx->customMem, &customMem, sizeof(customMem)); ++ ZSTD_decompressBegin(dctx); ++ return dctx; ++} ++ ++ZSTD_DCtx *INIT ZSTD_initDCtx(void *workspace, size_t workspaceSize) ++{ ++ ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); ++ return ZSTD_createDCtx_advanced(stackMem); ++} ++ ++size_t INIT ZSTD_freeDCtx(ZSTD_DCtx *dctx) ++{ ++ if (dctx == NULL) ++ return 0; /* support free on NULL */ ++ ZSTD_free(dctx, dctx->customMem); ++ return 0; /* reserved as a potential error code in the future */ ++} ++ ++void INIT ZSTD_copyDCtx(ZSTD_DCtx *dstDCtx, const ZSTD_DCtx *srcDCtx) ++{ ++ size_t const workSpaceSize = (ZSTD_BLOCKSIZE_ABSOLUTEMAX + WILDCOPY_OVERLENGTH) + ZSTD_frameHeaderSize_max; ++ memcpy(dstDCtx, srcDCtx, sizeof(ZSTD_DCtx) - workSpaceSize); /* no need to copy workspace */ ++} ++ ++STATIC size_t ZSTD_findFrameCompressedSize(const void *src, size_t srcSize); ++STATIC size_t ZSTD_decompressBegin_usingDict(ZSTD_DCtx *dctx, const void *dict, ++ size_t dictSize); ++ ++static void ZSTD_refDDict(ZSTD_DCtx *dstDCtx, const ZSTD_DDict *ddict); ++ ++/*-************************************************************* ++* Decompression section ++***************************************************************/ ++ ++/*! ZSTD_isFrame() : ++ * Tells if the content of `buffer` starts with a valid Frame Identifier. ++ * Note : Frame Identifier is 4 bytes. If `size < 4`, @return will always be 0. ++ * Note 2 : Legacy Frame Identifiers are considered valid only if Legacy Support is enabled. ++ * Note 3 : Skippable Frame Identifiers are considered valid. */ ++unsigned INIT ZSTD_isFrame(const void *buffer, size_t size) ++{ ++ if (size < 4) ++ return 0; ++ { ++ U32 const magic = ZSTD_readLE32(buffer); ++ if (magic == ZSTD_MAGICNUMBER) ++ return 1; ++ if ((magic & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) ++ return 1; ++ } ++ return 0; ++} ++ ++/** ZSTD_frameHeaderSize() : ++* srcSize must be >= ZSTD_frameHeaderSize_prefix. 
++* @return : size of the Frame Header */ ++static size_t INIT ZSTD_frameHeaderSize(const void *src, size_t srcSize) ++{ ++ if (srcSize < ZSTD_frameHeaderSize_prefix) ++ return ERROR(srcSize_wrong); ++ { ++ BYTE const fhd = ((const BYTE *)src)[4]; ++ U32 const dictID = fhd & 3; ++ U32 const singleSegment = (fhd >> 5) & 1; ++ U32 const fcsId = fhd >> 6; ++ return ZSTD_frameHeaderSize_prefix + !singleSegment + ZSTD_did_fieldSize[dictID] + ZSTD_fcs_fieldSize[fcsId] + (singleSegment && !fcsId); ++ } ++} ++ ++/** ZSTD_getFrameParams() : ++* decode Frame Header, or require larger `srcSize`. ++* @return : 0, `fparamsPtr` is correctly filled, ++* >0, `srcSize` is too small, result is expected `srcSize`, ++* or an error code, which can be tested using ZSTD_isError() */ ++size_t INIT ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, size_t srcSize) ++{ ++ const BYTE *ip = (const BYTE *)src; ++ ++ if (srcSize < ZSTD_frameHeaderSize_prefix) ++ return ZSTD_frameHeaderSize_prefix; ++ if (ZSTD_readLE32(src) != ZSTD_MAGICNUMBER) { ++ if ((ZSTD_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { ++ if (srcSize < ZSTD_skippableHeaderSize) ++ return ZSTD_skippableHeaderSize; /* magic number + skippable frame length */ ++ memset(fparamsPtr, 0, sizeof(*fparamsPtr)); ++ fparamsPtr->frameContentSize = ZSTD_readLE32((const char *)src + 4); ++ fparamsPtr->windowSize = 0; /* windowSize==0 means a frame is skippable */ ++ return 0; ++ } ++ return ERROR(prefix_unknown); ++ } ++ ++ /* ensure there is enough `srcSize` to fully read/decode frame header */ ++ { ++ size_t const fhsize = ZSTD_frameHeaderSize(src, srcSize); ++ if (srcSize < fhsize) ++ return fhsize; ++ } ++ ++ { ++ BYTE const fhdByte = ip[4]; ++ size_t pos = 5; ++ U32 const dictIDSizeCode = fhdByte & 3; ++ U32 const checksumFlag = (fhdByte >> 2) & 1; ++ U32 const singleSegment = (fhdByte >> 5) & 1; ++ U32 const fcsID = fhdByte >> 6; ++ U32 const windowSizeMax = 1U << ZSTD_WINDOWLOG_MAX; ++ U32 windowSize = 0; ++ U32 dictID = 0; ++ U64 frameContentSize = 0; ++ if ((fhdByte & 0x08) != 0) ++ return ERROR(frameParameter_unsupported); /* reserved bits, which must be zero */ ++ if (!singleSegment) { ++ BYTE const wlByte = ip[pos++]; ++ U32 const windowLog = (wlByte >> 3) + ZSTD_WINDOWLOG_ABSOLUTEMIN; ++ if (windowLog > ZSTD_WINDOWLOG_MAX) ++ return ERROR(frameParameter_windowTooLarge); /* avoids issue with 1 << windowLog */ ++ windowSize = (1U << windowLog); ++ windowSize += (windowSize >> 3) * (wlByte & 7); ++ } ++ ++ switch (dictIDSizeCode) { ++ default: /* impossible */ ++ case 0: break; ++ case 1: ++ dictID = ip[pos]; ++ pos++; ++ break; ++ case 2: ++ dictID = ZSTD_readLE16(ip + pos); ++ pos += 2; ++ break; ++ case 3: ++ dictID = ZSTD_readLE32(ip + pos); ++ pos += 4; ++ break; ++ } ++ switch (fcsID) { ++ default: /* impossible */ ++ case 0: ++ if (singleSegment) ++ frameContentSize = ip[pos]; ++ break; ++ case 1: frameContentSize = ZSTD_readLE16(ip + pos) + 256; break; ++ case 2: frameContentSize = ZSTD_readLE32(ip + pos); break; ++ case 3: frameContentSize = ZSTD_readLE64(ip + pos); break; ++ } ++ if (!windowSize) ++ windowSize = (U32)frameContentSize; ++ if (windowSize > windowSizeMax) ++ return ERROR(frameParameter_windowTooLarge); ++ fparamsPtr->frameContentSize = frameContentSize; ++ fparamsPtr->windowSize = windowSize; ++ fparamsPtr->dictID = dictID; ++ fparamsPtr->checksumFlag = checksumFlag; ++ } ++ return 0; ++} ++ ++/** ZSTD_getFrameContentSize() : ++* compatible with legacy mode ++* @return : decompressed size of the 
single frame pointed to be `src` if known, otherwise ++* - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined ++* - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid magic number, srcSize too small) */ ++unsigned long long INIT ZSTD_getFrameContentSize(const void *src, size_t srcSize) ++{ ++ { ++ ZSTD_frameParams fParams; ++ if (ZSTD_getFrameParams(&fParams, src, srcSize) != 0) ++ return ZSTD_CONTENTSIZE_ERROR; ++ if (fParams.windowSize == 0) { ++ /* Either skippable or empty frame, size == 0 either way */ ++ return 0; ++ } else if (fParams.frameContentSize != 0) { ++ return fParams.frameContentSize; ++ } else { ++ return ZSTD_CONTENTSIZE_UNKNOWN; ++ } ++ } ++} ++ ++/** ZSTD_findDecompressedSize() : ++ * compatible with legacy mode ++ * `srcSize` must be the exact length of some number of ZSTD compressed and/or ++ * skippable frames ++ * @return : decompressed size of the frames contained */ ++unsigned long long INIT ZSTD_findDecompressedSize(const void *src, size_t srcSize) ++{ ++ { ++ unsigned long long totalDstSize = 0; ++ while (srcSize >= ZSTD_frameHeaderSize_prefix) { ++ const U32 magicNumber = ZSTD_readLE32(src); ++ ++ if ((magicNumber & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { ++ size_t skippableSize; ++ if (srcSize < ZSTD_skippableHeaderSize) ++ return ERROR(srcSize_wrong); ++ skippableSize = ZSTD_readLE32((const BYTE *)src + 4) + ZSTD_skippableHeaderSize; ++ if (srcSize < skippableSize) { ++ return ZSTD_CONTENTSIZE_ERROR; ++ } ++ ++ src = (const BYTE *)src + skippableSize; ++ srcSize -= skippableSize; ++ continue; ++ } ++ ++ { ++ unsigned long long const ret = ZSTD_getFrameContentSize(src, srcSize); ++ if (ret >= ZSTD_CONTENTSIZE_ERROR) ++ return ret; ++ ++ /* check for overflow */ ++ if (totalDstSize + ret < totalDstSize) ++ return ZSTD_CONTENTSIZE_ERROR; ++ totalDstSize += ret; ++ } ++ { ++ size_t const frameSrcSize = ZSTD_findFrameCompressedSize(src, srcSize); ++ if (ZSTD_isError(frameSrcSize)) { ++ return ZSTD_CONTENTSIZE_ERROR; ++ } ++ ++ src = (const BYTE *)src + frameSrcSize; ++ srcSize -= frameSrcSize; ++ } ++ } ++ ++ if (srcSize) { ++ return ZSTD_CONTENTSIZE_ERROR; ++ } ++ ++ return totalDstSize; ++ } ++} ++ ++/** ZSTD_decodeFrameHeader() : ++* `headerSize` must be the size provided by ZSTD_frameHeaderSize(). ++* @return : 0 if success, or an error code, which can be tested using ZSTD_isError() */ ++static size_t INIT ZSTD_decodeFrameHeader(ZSTD_DCtx *dctx, const void *src, size_t headerSize) ++{ ++ size_t const result = ZSTD_getFrameParams(&(dctx->fParams), src, headerSize); ++ if (ZSTD_isError(result)) ++ return result; /* invalid header */ ++ if (result > 0) ++ return ERROR(srcSize_wrong); /* headerSize too small */ ++ if (dctx->fParams.dictID && (dctx->dictID != dctx->fParams.dictID)) ++ return ERROR(dictionary_wrong); ++ if (dctx->fParams.checksumFlag) ++ xxh64_reset(&dctx->xxhState, 0); ++ return 0; ++} ++ ++typedef struct { ++ blockType_e blockType; ++ U32 lastBlock; ++ U32 origSize; ++} blockProperties_t; ++ ++/*! 
ZSTD_getcBlockSize() : ++* Provides the size of compressed block from block header `src` */ ++size_t INIT ZSTD_getcBlockSize(const void *src, size_t srcSize, blockProperties_t *bpPtr) ++{ ++ if (srcSize < ZSTD_blockHeaderSize) ++ return ERROR(srcSize_wrong); ++ { ++ U32 const cBlockHeader = ZSTD_readLE24(src); ++ U32 const cSize = cBlockHeader >> 3; ++ bpPtr->lastBlock = cBlockHeader & 1; ++ bpPtr->blockType = (blockType_e)((cBlockHeader >> 1) & 3); ++ bpPtr->origSize = cSize; /* only useful for RLE */ ++ if (bpPtr->blockType == bt_rle) ++ return 1; ++ if (bpPtr->blockType == bt_reserved) ++ return ERROR(corruption_detected); ++ return cSize; ++ } ++} ++ ++static size_t INIT ZSTD_copyRawBlock(void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ ++ if (srcSize > dstCapacity) ++ return ERROR(dstSize_tooSmall); ++ memcpy(dst, src, srcSize); ++ return srcSize; ++} ++ ++static size_t INIT ZSTD_setRleBlock(void *dst, size_t dstCapacity, const void *src, size_t srcSize, size_t regenSize) ++{ ++ if (srcSize != 1) ++ return ERROR(srcSize_wrong); ++ if (regenSize > dstCapacity) ++ return ERROR(dstSize_tooSmall); ++ memset(dst, *(const BYTE *)src, regenSize); ++ return regenSize; ++} ++ ++/*! ZSTD_decodeLiteralsBlock() : ++ @return : nb of bytes read from src (< srcSize ) */ ++size_t INIT ZSTD_decodeLiteralsBlock(ZSTD_DCtx *dctx, const void *src, size_t srcSize) /* note : srcSize < BLOCKSIZE */ ++{ ++ if (srcSize < MIN_CBLOCK_SIZE) ++ return ERROR(corruption_detected); ++ ++ { ++ const BYTE *const istart = (const BYTE *)src; ++ symbolEncodingType_e const litEncType = (symbolEncodingType_e)(istart[0] & 3); ++ ++ switch (litEncType) { ++ case set_repeat: ++ if (dctx->litEntropy == 0) ++ return ERROR(dictionary_corrupted); ++ /* fallthrough */ ++ case set_compressed: ++ if (srcSize < 5) ++ return ERROR(corruption_detected); /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need up to 5 for case 3 */ ++ { ++ size_t lhSize, litSize, litCSize; ++ U32 singleStream = 0; ++ U32 const lhlCode = (istart[0] >> 2) & 3; ++ U32 const lhc = ZSTD_readLE32(istart); ++ switch (lhlCode) { ++ case 0: ++ case 1: ++ default: /* note : default is impossible, since lhlCode into [0..3] */ ++ /* 2 - 2 - 10 - 10 */ ++ singleStream = !lhlCode; ++ lhSize = 3; ++ litSize = (lhc >> 4) & 0x3FF; ++ litCSize = (lhc >> 14) & 0x3FF; ++ break; ++ case 2: ++ /* 2 - 2 - 14 - 14 */ ++ lhSize = 4; ++ litSize = (lhc >> 4) & 0x3FFF; ++ litCSize = lhc >> 18; ++ break; ++ case 3: ++ /* 2 - 2 - 18 - 18 */ ++ lhSize = 5; ++ litSize = (lhc >> 4) & 0x3FFFF; ++ litCSize = (lhc >> 22) + (istart[4] << 10); ++ break; ++ } ++ if (litSize > ZSTD_BLOCKSIZE_ABSOLUTEMAX) ++ return ERROR(corruption_detected); ++ if (litCSize + lhSize > srcSize) ++ return ERROR(corruption_detected); ++ ++ if (HUF_isError( ++ (litEncType == set_repeat) ++ ? (singleStream ? HUF_decompress1X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr) ++ : HUF_decompress4X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr)) ++ : (singleStream ++ ? 
HUF_decompress1X2_DCtx_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, ++ dctx->entropy.workspace, sizeof(dctx->entropy.workspace)) ++ : HUF_decompress4X_hufOnly_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, ++ dctx->entropy.workspace, sizeof(dctx->entropy.workspace))))) ++ return ERROR(corruption_detected); ++ ++ dctx->litPtr = dctx->litBuffer; ++ dctx->litSize = litSize; ++ dctx->litEntropy = 1; ++ if (litEncType == set_compressed) ++ dctx->HUFptr = dctx->entropy.hufTable; ++ memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH); ++ return litCSize + lhSize; ++ } ++ ++ case set_basic: { ++ size_t litSize, lhSize; ++ U32 const lhlCode = ((istart[0]) >> 2) & 3; ++ switch (lhlCode) { ++ case 0: ++ case 2: ++ default: /* note : default is impossible, since lhlCode into [0..3] */ ++ lhSize = 1; ++ litSize = istart[0] >> 3; ++ break; ++ case 1: ++ lhSize = 2; ++ litSize = ZSTD_readLE16(istart) >> 4; ++ break; ++ case 3: ++ lhSize = 3; ++ litSize = ZSTD_readLE24(istart) >> 4; ++ break; ++ } ++ ++ if (lhSize + litSize + WILDCOPY_OVERLENGTH > srcSize) { /* risk reading beyond src buffer with wildcopy */ ++ if (litSize + lhSize > srcSize) ++ return ERROR(corruption_detected); ++ memcpy(dctx->litBuffer, istart + lhSize, litSize); ++ dctx->litPtr = dctx->litBuffer; ++ dctx->litSize = litSize; ++ memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH); ++ return lhSize + litSize; ++ } ++ /* direct reference into compressed stream */ ++ dctx->litPtr = istart + lhSize; ++ dctx->litSize = litSize; ++ return lhSize + litSize; ++ } ++ ++ case set_rle: { ++ U32 const lhlCode = ((istart[0]) >> 2) & 3; ++ size_t litSize, lhSize; ++ switch (lhlCode) { ++ case 0: ++ case 2: ++ default: /* note : default is impossible, since lhlCode into [0..3] */ ++ lhSize = 1; ++ litSize = istart[0] >> 3; ++ break; ++ case 1: ++ lhSize = 2; ++ litSize = ZSTD_readLE16(istart) >> 4; ++ break; ++ case 3: ++ lhSize = 3; ++ litSize = ZSTD_readLE24(istart) >> 4; ++ if (srcSize < 4) ++ return ERROR(corruption_detected); /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need lhSize+1 = 4 */ ++ break; ++ } ++ if (litSize > ZSTD_BLOCKSIZE_ABSOLUTEMAX) ++ return ERROR(corruption_detected); ++ memset(dctx->litBuffer, istart[lhSize], litSize + WILDCOPY_OVERLENGTH); ++ dctx->litPtr = dctx->litBuffer; ++ dctx->litSize = litSize; ++ return lhSize + 1; ++ } ++ default: ++ return ERROR(corruption_detected); /* impossible */ ++ } ++ } ++} ++ ++typedef union { ++ FSE_decode_t realData; ++ U32 alignedBy4; ++} FSE_decode_t4; ++ ++static const FSE_decode_t4 LL_defaultDTable[(1 << LL_DEFAULTNORMLOG) + 1] = { ++ {{LL_DEFAULTNORMLOG, 1, 1}}, /* header : tableLog, fastMode, fastMode */ ++ {{0, 0, 4}}, /* 0 : base, symbol, bits */ ++ {{16, 0, 4}}, ++ {{32, 1, 5}}, ++ {{0, 3, 5}}, ++ {{0, 4, 5}}, ++ {{0, 6, 5}}, ++ {{0, 7, 5}}, ++ {{0, 9, 5}}, ++ {{0, 10, 5}}, ++ {{0, 12, 5}}, ++ {{0, 14, 6}}, ++ {{0, 16, 5}}, ++ {{0, 18, 5}}, ++ {{0, 19, 5}}, ++ {{0, 21, 5}}, ++ {{0, 22, 5}}, ++ {{0, 24, 5}}, ++ {{32, 25, 5}}, ++ {{0, 26, 5}}, ++ {{0, 27, 6}}, ++ {{0, 29, 6}}, ++ {{0, 31, 6}}, ++ {{32, 0, 4}}, ++ {{0, 1, 4}}, ++ {{0, 2, 5}}, ++ {{32, 4, 5}}, ++ {{0, 5, 5}}, ++ {{32, 7, 5}}, ++ {{0, 8, 5}}, ++ {{32, 10, 5}}, ++ {{0, 11, 5}}, ++ {{0, 13, 6}}, ++ {{32, 16, 5}}, ++ {{0, 17, 5}}, ++ {{32, 19, 5}}, ++ {{0, 20, 5}}, ++ {{32, 22, 5}}, ++ {{0, 23, 5}}, ++ {{0, 25, 4}}, ++ {{16, 25, 4}}, ++ {{32, 26, 5}}, ++ {{0, 28, 6}}, ++ {{0, 30, 6}}, ++ {{48, 0, 4}}, ++ {{16, 1, 4}}, ++ {{32, 
2, 5}}, ++ {{32, 3, 5}}, ++ {{32, 5, 5}}, ++ {{32, 6, 5}}, ++ {{32, 8, 5}}, ++ {{32, 9, 5}}, ++ {{32, 11, 5}}, ++ {{32, 12, 5}}, ++ {{0, 15, 6}}, ++ {{32, 17, 5}}, ++ {{32, 18, 5}}, ++ {{32, 20, 5}}, ++ {{32, 21, 5}}, ++ {{32, 23, 5}}, ++ {{32, 24, 5}}, ++ {{0, 35, 6}}, ++ {{0, 34, 6}}, ++ {{0, 33, 6}}, ++ {{0, 32, 6}}, ++}; /* LL_defaultDTable */ ++ ++static const FSE_decode_t4 ML_defaultDTable[(1 << ML_DEFAULTNORMLOG) + 1] = { ++ {{ML_DEFAULTNORMLOG, 1, 1}}, /* header : tableLog, fastMode, fastMode */ ++ {{0, 0, 6}}, /* 0 : base, symbol, bits */ ++ {{0, 1, 4}}, ++ {{32, 2, 5}}, ++ {{0, 3, 5}}, ++ {{0, 5, 5}}, ++ {{0, 6, 5}}, ++ {{0, 8, 5}}, ++ {{0, 10, 6}}, ++ {{0, 13, 6}}, ++ {{0, 16, 6}}, ++ {{0, 19, 6}}, ++ {{0, 22, 6}}, ++ {{0, 25, 6}}, ++ {{0, 28, 6}}, ++ {{0, 31, 6}}, ++ {{0, 33, 6}}, ++ {{0, 35, 6}}, ++ {{0, 37, 6}}, ++ {{0, 39, 6}}, ++ {{0, 41, 6}}, ++ {{0, 43, 6}}, ++ {{0, 45, 6}}, ++ {{16, 1, 4}}, ++ {{0, 2, 4}}, ++ {{32, 3, 5}}, ++ {{0, 4, 5}}, ++ {{32, 6, 5}}, ++ {{0, 7, 5}}, ++ {{0, 9, 6}}, ++ {{0, 12, 6}}, ++ {{0, 15, 6}}, ++ {{0, 18, 6}}, ++ {{0, 21, 6}}, ++ {{0, 24, 6}}, ++ {{0, 27, 6}}, ++ {{0, 30, 6}}, ++ {{0, 32, 6}}, ++ {{0, 34, 6}}, ++ {{0, 36, 6}}, ++ {{0, 38, 6}}, ++ {{0, 40, 6}}, ++ {{0, 42, 6}}, ++ {{0, 44, 6}}, ++ {{32, 1, 4}}, ++ {{48, 1, 4}}, ++ {{16, 2, 4}}, ++ {{32, 4, 5}}, ++ {{32, 5, 5}}, ++ {{32, 7, 5}}, ++ {{32, 8, 5}}, ++ {{0, 11, 6}}, ++ {{0, 14, 6}}, ++ {{0, 17, 6}}, ++ {{0, 20, 6}}, ++ {{0, 23, 6}}, ++ {{0, 26, 6}}, ++ {{0, 29, 6}}, ++ {{0, 52, 6}}, ++ {{0, 51, 6}}, ++ {{0, 50, 6}}, ++ {{0, 49, 6}}, ++ {{0, 48, 6}}, ++ {{0, 47, 6}}, ++ {{0, 46, 6}}, ++}; /* ML_defaultDTable */ ++ ++static const FSE_decode_t4 OF_defaultDTable[(1 << OF_DEFAULTNORMLOG) + 1] = { ++ {{OF_DEFAULTNORMLOG, 1, 1}}, /* header : tableLog, fastMode, fastMode */ ++ {{0, 0, 5}}, /* 0 : base, symbol, bits */ ++ {{0, 6, 4}}, ++ {{0, 9, 5}}, ++ {{0, 15, 5}}, ++ {{0, 21, 5}}, ++ {{0, 3, 5}}, ++ {{0, 7, 4}}, ++ {{0, 12, 5}}, ++ {{0, 18, 5}}, ++ {{0, 23, 5}}, ++ {{0, 5, 5}}, ++ {{0, 8, 4}}, ++ {{0, 14, 5}}, ++ {{0, 20, 5}}, ++ {{0, 2, 5}}, ++ {{16, 7, 4}}, ++ {{0, 11, 5}}, ++ {{0, 17, 5}}, ++ {{0, 22, 5}}, ++ {{0, 4, 5}}, ++ {{16, 8, 4}}, ++ {{0, 13, 5}}, ++ {{0, 19, 5}}, ++ {{0, 1, 5}}, ++ {{16, 6, 4}}, ++ {{0, 10, 5}}, ++ {{0, 16, 5}}, ++ {{0, 28, 5}}, ++ {{0, 27, 5}}, ++ {{0, 26, 5}}, ++ {{0, 25, 5}}, ++ {{0, 24, 5}}, ++}; /* OF_defaultDTable */ ++ ++/*! 
ZSTD_buildSeqTable() : ++ @return : nb bytes read from src, ++ or an error code if it fails, testable with ZSTD_isError() ++*/ ++static size_t INIT ZSTD_buildSeqTable(FSE_DTable *DTableSpace, const FSE_DTable **DTablePtr, ++ symbolEncodingType_e type, U32 max, U32 maxLog, const void *src, ++ size_t srcSize, const FSE_decode_t4 *defaultTable, ++ U32 flagRepeatTable, void *workspace, size_t workspaceSize) ++{ ++ const void *const tmpPtr = defaultTable; /* bypass strict aliasing */ ++ switch (type) { ++ case set_rle: ++ if (!srcSize) ++ return ERROR(srcSize_wrong); ++ if ((*(const BYTE *)src) > max) ++ return ERROR(corruption_detected); ++ FSE_buildDTable_rle(DTableSpace, *(const BYTE *)src); ++ *DTablePtr = DTableSpace; ++ return 1; ++ case set_basic: *DTablePtr = (const FSE_DTable *)tmpPtr; return 0; ++ case set_repeat: ++ if (!flagRepeatTable) ++ return ERROR(corruption_detected); ++ return 0; ++ default: /* impossible */ ++ case set_compressed: { ++ U32 tableLog; ++ S16 *norm = (S16 *)workspace; ++ size_t const spaceUsed32 = ALIGN(sizeof(S16) * (MaxSeq + 1), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(GENERIC); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ { ++ size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); ++ if (FSE_isError(headerSize)) ++ return ERROR(corruption_detected); ++ if (tableLog > maxLog) ++ return ERROR(corruption_detected); ++ FSE_buildDTable_wksp(DTableSpace, norm, max, tableLog, workspace, workspaceSize); ++ *DTablePtr = DTableSpace; ++ return headerSize; ++ } ++ } ++ } ++} ++ ++size_t INIT ZSTD_decodeSeqHeaders(ZSTD_DCtx *dctx, int *nbSeqPtr, const void *src, size_t srcSize) ++{ ++ const BYTE *const istart = (const BYTE *const)src; ++ const BYTE *const iend = istart + srcSize; ++ const BYTE *ip = istart; ++ ++ /* check */ ++ if (srcSize < MIN_SEQUENCES_SIZE) ++ return ERROR(srcSize_wrong); ++ ++ /* SeqHead */ ++ { ++ int nbSeq = *ip++; ++ if (!nbSeq) { ++ *nbSeqPtr = 0; ++ return 1; ++ } ++ if (nbSeq > 0x7F) { ++ if (nbSeq == 0xFF) { ++ if (ip + 2 > iend) ++ return ERROR(srcSize_wrong); ++ nbSeq = ZSTD_readLE16(ip) + LONGNBSEQ, ip += 2; ++ } else { ++ if (ip >= iend) ++ return ERROR(srcSize_wrong); ++ nbSeq = ((nbSeq - 0x80) << 8) + *ip++; ++ } ++ } ++ *nbSeqPtr = nbSeq; ++ } ++ ++ /* FSE table descriptors */ ++ if (ip + 4 > iend) ++ return ERROR(srcSize_wrong); /* minimum possible size */ ++ { ++ symbolEncodingType_e const LLtype = (symbolEncodingType_e)(*ip >> 6); ++ symbolEncodingType_e const OFtype = (symbolEncodingType_e)((*ip >> 4) & 3); ++ symbolEncodingType_e const MLtype = (symbolEncodingType_e)((*ip >> 2) & 3); ++ ip++; ++ ++ /* Build DTables */ ++ { ++ size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr, LLtype, MaxLL, LLFSELog, ip, iend - ip, ++ LL_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); ++ if (ZSTD_isError(llhSize)) ++ return ERROR(corruption_detected); ++ ip += llhSize; ++ } ++ { ++ size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr, OFtype, MaxOff, OffFSELog, ip, iend - ip, ++ OF_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); ++ if (ZSTD_isError(ofhSize)) ++ return ERROR(corruption_detected); ++ ip += ofhSize; ++ } ++ { ++ size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr, MLtype, MaxML, MLFSELog, ip, iend - ip, ++ ML_defaultDTable, dctx->fseEntropy, 
dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); ++ if (ZSTD_isError(mlhSize)) ++ return ERROR(corruption_detected); ++ ip += mlhSize; ++ } ++ } ++ ++ return ip - istart; ++} ++ ++typedef struct { ++ size_t litLength; ++ size_t matchLength; ++ size_t offset; ++ const BYTE *match; ++} seq_t; ++ ++typedef struct { ++ BIT_DStream_t DStream; ++ FSE_DState_t stateLL; ++ FSE_DState_t stateOffb; ++ FSE_DState_t stateML; ++ size_t prevOffset[ZSTD_REP_NUM]; ++ const BYTE *base; ++ size_t pos; ++ uPtrDiff gotoDict; ++} seqState_t; ++ ++FORCE_NOINLINE ++size_t ZSTD_execSequenceLast7(BYTE *op, BYTE *const oend, seq_t sequence, const BYTE **litPtr, const BYTE *const litLimit, const BYTE *const base, ++ const BYTE *const vBase, const BYTE *const dictEnd) ++{ ++ BYTE *const oLitEnd = op + sequence.litLength; ++ size_t const sequenceLength = sequence.litLength + sequence.matchLength; ++ BYTE *const oMatchEnd = op + sequenceLength; /* risk : address space overflow (32-bits) */ ++ BYTE *const oend_w = oend - WILDCOPY_OVERLENGTH; ++ const BYTE *const iLitEnd = *litPtr + sequence.litLength; ++ const BYTE *match = oLitEnd - sequence.offset; ++ ++ /* check */ ++ if (oMatchEnd > oend) ++ return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */ ++ if (iLitEnd > litLimit) ++ return ERROR(corruption_detected); /* over-read beyond lit buffer */ ++ if (oLitEnd <= oend_w) ++ return ERROR(GENERIC); /* Precondition */ ++ ++ /* copy literals */ ++ if (op < oend_w) { ++ ZSTD_wildcopy(op, *litPtr, oend_w - op); ++ *litPtr += oend_w - op; ++ op = oend_w; ++ } ++ while (op < oLitEnd) ++ *op++ = *(*litPtr)++; ++ ++ /* copy Match */ ++ if (sequence.offset > (size_t)(oLitEnd - base)) { ++ /* offset beyond prefix */ ++ if (sequence.offset > (size_t)(oLitEnd - vBase)) ++ return ERROR(corruption_detected); ++ match = dictEnd - (base - match); ++ if (match + sequence.matchLength <= dictEnd) { ++ memmove(oLitEnd, match, sequence.matchLength); ++ return sequenceLength; ++ } ++ /* span extDict & currPrefixSegment */ ++ { ++ size_t const length1 = dictEnd - match; ++ memmove(oLitEnd, match, length1); ++ op = oLitEnd + length1; ++ sequence.matchLength -= length1; ++ match = base; ++ } ++ } ++ while (op < oMatchEnd) ++ *op++ = *match++; ++ return sequenceLength; ++} ++ ++static seq_t INIT ZSTD_decodeSequence(seqState_t *seqState) ++{ ++ seq_t seq; ++ ++ U32 const llCode = FSE_peekSymbol(&seqState->stateLL); ++ U32 const mlCode = FSE_peekSymbol(&seqState->stateML); ++ U32 const ofCode = FSE_peekSymbol(&seqState->stateOffb); /* <= maxOff, by table construction */ ++ ++ U32 const llBits = LL_bits[llCode]; ++ U32 const mlBits = ML_bits[mlCode]; ++ U32 const ofBits = ofCode; ++ U32 const totalBits = llBits + mlBits + ofBits; ++ ++ static const U32 LL_base[MaxLL + 1] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, ++ 20, 22, 24, 28, 32, 40, 48, 64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000, 0x10000}; ++ ++ static const U32 ML_base[MaxML + 1] = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ++ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 39, 41, ++ 43, 47, 51, 59, 67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803, 0x1003, 0x2003, 0x4003, 0x8003, 0x10003}; ++ ++ static const U32 OF_base[MaxOff + 1] = {0, 1, 1, 5, 0xD, 0x1D, 0x3D, 0x7D, 0xFD, 0x1FD, ++ 0x3FD, 0x7FD, 0xFFD, 0x1FFD, 0x3FFD, 0x7FFD, 0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, ++ 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD, 0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 
0x7FFFFFD, 0xFFFFFFD}; ++ ++ /* sequence */ ++ { ++ size_t offset; ++ if (!ofCode) ++ offset = 0; ++ else { ++ offset = OF_base[ofCode] + BIT_readBitsFast(&seqState->DStream, ofBits); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */ ++ if (ZSTD_32bits()) ++ BIT_reloadDStream(&seqState->DStream); ++ } ++ ++ if (ofCode <= 1) { ++ offset += (llCode == 0); ++ if (offset) { ++ size_t temp = (offset == 3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset]; ++ temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */ ++ if (offset != 1) ++ seqState->prevOffset[2] = seqState->prevOffset[1]; ++ seqState->prevOffset[1] = seqState->prevOffset[0]; ++ seqState->prevOffset[0] = offset = temp; ++ } else { ++ offset = seqState->prevOffset[0]; ++ } ++ } else { ++ seqState->prevOffset[2] = seqState->prevOffset[1]; ++ seqState->prevOffset[1] = seqState->prevOffset[0]; ++ seqState->prevOffset[0] = offset; ++ } ++ seq.offset = offset; ++ } ++ ++ seq.matchLength = ML_base[mlCode] + ((mlCode > 31) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0); /* <= 16 bits */ ++ if (ZSTD_32bits() && (mlBits + llBits > 24)) ++ BIT_reloadDStream(&seqState->DStream); ++ ++ seq.litLength = LL_base[llCode] + ((llCode > 15) ? BIT_readBitsFast(&seqState->DStream, llBits) : 0); /* <= 16 bits */ ++ if (ZSTD_32bits() || (totalBits > 64 - 7 - (LLFSELog + MLFSELog + OffFSELog))) ++ BIT_reloadDStream(&seqState->DStream); ++ ++ /* ANS state update */ ++ FSE_updateState(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */ ++ FSE_updateState(&seqState->stateML, &seqState->DStream); /* <= 9 bits */ ++ if (ZSTD_32bits()) ++ BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */ ++ FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */ ++ ++ seq.match = NULL; ++ ++ return seq; ++} ++ ++FORCE_INLINE ++size_t ZSTD_execSequence(BYTE *op, BYTE *const oend, seq_t sequence, const BYTE **litPtr, const BYTE *const litLimit, const BYTE *const base, ++ const BYTE *const vBase, const BYTE *const dictEnd) ++{ ++ BYTE *const oLitEnd = op + sequence.litLength; ++ size_t const sequenceLength = sequence.litLength + sequence.matchLength; ++ BYTE *const oMatchEnd = op + sequenceLength; /* risk : address space overflow (32-bits) */ ++ BYTE *const oend_w = oend - WILDCOPY_OVERLENGTH; ++ const BYTE *const iLitEnd = *litPtr + sequence.litLength; ++ const BYTE *match = oLitEnd - sequence.offset; ++ ++ /* check */ ++ if (oMatchEnd > oend) ++ return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */ ++ if (iLitEnd > litLimit) ++ return ERROR(corruption_detected); /* over-read beyond lit buffer */ ++ if (oLitEnd > oend_w) ++ return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, base, vBase, dictEnd); ++ ++ /* copy Literals */ ++ ZSTD_copy8(op, *litPtr); ++ if (sequence.litLength > 8) ++ ZSTD_wildcopy(op + 8, (*litPtr) + 8, ++ sequence.litLength - 8); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */ ++ op = oLitEnd; ++ *litPtr = iLitEnd; /* update for next sequence */ ++ ++ /* copy Match */ ++ if (sequence.offset > (size_t)(oLitEnd - base)) { ++ /* offset beyond prefix */ ++ if (sequence.offset > (size_t)(oLitEnd - vBase)) ++ return ERROR(corruption_detected); ++ match = dictEnd + (match - base); ++ if (match + sequence.matchLength <= dictEnd) { ++ memmove(oLitEnd, match, sequence.matchLength); ++ return sequenceLength; ++ } ++ /* span extDict & currPrefixSegment */ ++ { ++ size_t const length1 = dictEnd - match; ++ 
memmove(oLitEnd, match, length1); ++ op = oLitEnd + length1; ++ sequence.matchLength -= length1; ++ match = base; ++ if (op > oend_w || sequence.matchLength < MINMATCH) { ++ U32 i; ++ for (i = 0; i < sequence.matchLength; ++i) ++ op[i] = match[i]; ++ return sequenceLength; ++ } ++ } ++ } ++ /* Requirement: op <= oend_w && sequence.matchLength >= MINMATCH */ ++ ++ /* match within prefix */ ++ if (sequence.offset < 8) { ++ /* close range match, overlap */ ++ static const U32 dec32table[] = {0, 1, 2, 1, 4, 4, 4, 4}; /* added */ ++ static const int dec64table[] = {8, 8, 8, 7, 8, 9, 10, 11}; /* subtracted */ ++ int const sub2 = dec64table[sequence.offset]; ++ op[0] = match[0]; ++ op[1] = match[1]; ++ op[2] = match[2]; ++ op[3] = match[3]; ++ match += dec32table[sequence.offset]; ++ ZSTD_copy4(op + 4, match); ++ match -= sub2; ++ } else { ++ ZSTD_copy8(op, match); ++ } ++ op += 8; ++ match += 8; ++ ++ if (oMatchEnd > oend - (16 - MINMATCH)) { ++ if (op < oend_w) { ++ ZSTD_wildcopy(op, match, oend_w - op); ++ match += oend_w - op; ++ op = oend_w; ++ } ++ while (op < oMatchEnd) ++ *op++ = *match++; ++ } else { ++ ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength - 8); /* works even if matchLength < 8 */ ++ } ++ return sequenceLength; ++} ++ ++static size_t INIT ZSTD_decompressSequences(ZSTD_DCtx *dctx, void *dst, size_t maxDstSize, const void *seqStart, size_t seqSize) ++{ ++ const BYTE *ip = (const BYTE *)seqStart; ++ const BYTE *const iend = ip + seqSize; ++ BYTE *const ostart = (BYTE * const)dst; ++ BYTE *const oend = ostart + maxDstSize; ++ BYTE *op = ostart; ++ const BYTE *litPtr = dctx->litPtr; ++ const BYTE *const litEnd = litPtr + dctx->litSize; ++ const BYTE *const base = (const BYTE *)(dctx->base); ++ const BYTE *const vBase = (const BYTE *)(dctx->vBase); ++ const BYTE *const dictEnd = (const BYTE *)(dctx->dictEnd); ++ int nbSeq; ++ ++ /* Build Decoding Tables */ ++ { ++ size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, seqSize); ++ if (ZSTD_isError(seqHSize)) ++ return seqHSize; ++ ip += seqHSize; ++ } ++ ++ /* Regen sequences */ ++ if (nbSeq) { ++ seqState_t seqState; ++ dctx->fseEntropy = 1; ++ { ++ U32 i; ++ for (i = 0; i < ZSTD_REP_NUM; i++) ++ seqState.prevOffset[i] = dctx->entropy.rep[i]; ++ } ++ CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend - ip), corruption_detected); ++ FSE_initDState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr); ++ FSE_initDState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr); ++ FSE_initDState(&seqState.stateML, &seqState.DStream, dctx->MLTptr); ++ ++ for (; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq;) { ++ nbSeq--; ++ { ++ seq_t const sequence = ZSTD_decodeSequence(&seqState); ++ size_t const oneSeqSize = ZSTD_execSequence(op, oend, sequence, &litPtr, litEnd, base, vBase, dictEnd); ++ if (ZSTD_isError(oneSeqSize)) ++ return oneSeqSize; ++ op += oneSeqSize; ++ } ++ } ++ ++ /* check if reached exact end */ ++ if (nbSeq) ++ return ERROR(corruption_detected); ++ /* save reps for next block */ ++ { ++ U32 i; ++ for (i = 0; i < ZSTD_REP_NUM; i++) ++ dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); ++ } ++ } ++ ++ /* last literal segment */ ++ { ++ size_t const lastLLSize = litEnd - litPtr; ++ if (lastLLSize > (size_t)(oend - op)) ++ return ERROR(dstSize_tooSmall); ++ memcpy(op, litPtr, lastLLSize); ++ op += lastLLSize; ++ } ++ ++ return op - ostart; ++} ++ ++FORCE_INLINE seq_t ZSTD_decodeSequenceLong_generic(seqState_t *seqState, int const longOffsets) ++{ ++ seq_t seq; ++ ++ U32 const 
llCode = FSE_peekSymbol(&seqState->stateLL); ++ U32 const mlCode = FSE_peekSymbol(&seqState->stateML); ++ U32 const ofCode = FSE_peekSymbol(&seqState->stateOffb); /* <= maxOff, by table construction */ ++ ++ U32 const llBits = LL_bits[llCode]; ++ U32 const mlBits = ML_bits[mlCode]; ++ U32 const ofBits = ofCode; ++ U32 const totalBits = llBits + mlBits + ofBits; ++ ++ static const U32 LL_base[MaxLL + 1] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, ++ 20, 22, 24, 28, 32, 40, 48, 64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000, 0x10000}; ++ ++ static const U32 ML_base[MaxML + 1] = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ++ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 39, 41, ++ 43, 47, 51, 59, 67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803, 0x1003, 0x2003, 0x4003, 0x8003, 0x10003}; ++ ++ static const U32 OF_base[MaxOff + 1] = {0, 1, 1, 5, 0xD, 0x1D, 0x3D, 0x7D, 0xFD, 0x1FD, ++ 0x3FD, 0x7FD, 0xFFD, 0x1FFD, 0x3FFD, 0x7FFD, 0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, ++ 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD, 0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD}; ++ ++ /* sequence */ ++ { ++ size_t offset; ++ if (!ofCode) ++ offset = 0; ++ else { ++ if (longOffsets) { ++ int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN); ++ offset = OF_base[ofCode] + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits); ++ if (ZSTD_32bits() || extraBits) ++ BIT_reloadDStream(&seqState->DStream); ++ if (extraBits) ++ offset += BIT_readBitsFast(&seqState->DStream, extraBits); ++ } else { ++ offset = OF_base[ofCode] + BIT_readBitsFast(&seqState->DStream, ofBits); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */ ++ if (ZSTD_32bits()) ++ BIT_reloadDStream(&seqState->DStream); ++ } ++ } ++ ++ if (ofCode <= 1) { ++ offset += (llCode == 0); ++ if (offset) { ++ size_t temp = (offset == 3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset]; ++ temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */ ++ if (offset != 1) ++ seqState->prevOffset[2] = seqState->prevOffset[1]; ++ seqState->prevOffset[1] = seqState->prevOffset[0]; ++ seqState->prevOffset[0] = offset = temp; ++ } else { ++ offset = seqState->prevOffset[0]; ++ } ++ } else { ++ seqState->prevOffset[2] = seqState->prevOffset[1]; ++ seqState->prevOffset[1] = seqState->prevOffset[0]; ++ seqState->prevOffset[0] = offset; ++ } ++ seq.offset = offset; ++ } ++ ++ seq.matchLength = ML_base[mlCode] + ((mlCode > 31) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0); /* <= 16 bits */ ++ if (ZSTD_32bits() && (mlBits + llBits > 24)) ++ BIT_reloadDStream(&seqState->DStream); ++ ++ seq.litLength = LL_base[llCode] + ((llCode > 15) ? 
BIT_readBitsFast(&seqState->DStream, llBits) : 0); /* <= 16 bits */ ++ if (ZSTD_32bits() || (totalBits > 64 - 7 - (LLFSELog + MLFSELog + OffFSELog))) ++ BIT_reloadDStream(&seqState->DStream); ++ ++ { ++ size_t const pos = seqState->pos + seq.litLength; ++ seq.match = seqState->base + pos - seq.offset; /* single memory segment */ ++ if (seq.offset > pos) ++ seq.match += seqState->gotoDict; /* separate memory segment */ ++ seqState->pos = pos + seq.matchLength; ++ } ++ ++ /* ANS state update */ ++ FSE_updateState(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */ ++ FSE_updateState(&seqState->stateML, &seqState->DStream); /* <= 9 bits */ ++ if (ZSTD_32bits()) ++ BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */ ++ FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */ ++ ++ return seq; ++} ++ ++static seq_t INIT ZSTD_decodeSequenceLong(seqState_t *seqState, unsigned const windowSize) ++{ ++ if (ZSTD_highbit32(windowSize) > STREAM_ACCUMULATOR_MIN) { ++ return ZSTD_decodeSequenceLong_generic(seqState, 1); ++ } else { ++ return ZSTD_decodeSequenceLong_generic(seqState, 0); ++ } ++} ++ ++FORCE_INLINE ++size_t INIT ZSTD_execSequenceLong(BYTE *op, BYTE *const oend, seq_t sequence, const BYTE **litPtr, ++ const BYTE *const litLimit, const BYTE *const base, ++ const BYTE *const vBase, const BYTE *const dictEnd) ++{ ++ BYTE *const oLitEnd = op + sequence.litLength; ++ size_t const sequenceLength = sequence.litLength + sequence.matchLength; ++ BYTE *const oMatchEnd = op + sequenceLength; /* risk : address space overflow (32-bits) */ ++ BYTE *const oend_w = oend - WILDCOPY_OVERLENGTH; ++ const BYTE *const iLitEnd = *litPtr + sequence.litLength; ++ const BYTE *match = sequence.match; ++ ++ /* check */ ++ if (oMatchEnd > oend) ++ return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */ ++ if (iLitEnd > litLimit) ++ return ERROR(corruption_detected); /* over-read beyond lit buffer */ ++ if (oLitEnd > oend_w) ++ return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, base, vBase, dictEnd); ++ ++ /* copy Literals */ ++ ZSTD_copy8(op, *litPtr); ++ if (sequence.litLength > 8) ++ ZSTD_wildcopy(op + 8, (*litPtr) + 8, ++ sequence.litLength - 8); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */ ++ op = oLitEnd; ++ *litPtr = iLitEnd; /* update for next sequence */ ++ ++ /* copy Match */ ++ if (sequence.offset > (size_t)(oLitEnd - base)) { ++ /* offset beyond prefix */ ++ if (sequence.offset > (size_t)(oLitEnd - vBase)) ++ return ERROR(corruption_detected); ++ if (match + sequence.matchLength <= dictEnd) { ++ memmove(oLitEnd, match, sequence.matchLength); ++ return sequenceLength; ++ } ++ /* span extDict & currPrefixSegment */ ++ { ++ size_t const length1 = dictEnd - match; ++ memmove(oLitEnd, match, length1); ++ op = oLitEnd + length1; ++ sequence.matchLength -= length1; ++ match = base; ++ if (op > oend_w || sequence.matchLength < MINMATCH) { ++ U32 i; ++ for (i = 0; i < sequence.matchLength; ++i) ++ op[i] = match[i]; ++ return sequenceLength; ++ } ++ } ++ } ++ /* Requirement: op <= oend_w && sequence.matchLength >= MINMATCH */ ++ ++ /* match within prefix */ ++ if (sequence.offset < 8) { ++ /* close range match, overlap */ ++ static const U32 dec32table[] = {0, 1, 2, 1, 4, 4, 4, 4}; /* added */ ++ static const int dec64table[] = {8, 8, 8, 7, 8, 9, 10, 11}; /* subtracted */ ++ int const sub2 = dec64table[sequence.offset]; ++ op[0] = match[0]; ++ op[1] = match[1]; ++ op[2] = 
match[2]; ++ op[3] = match[3]; ++ match += dec32table[sequence.offset]; ++ ZSTD_copy4(op + 4, match); ++ match -= sub2; ++ } else { ++ ZSTD_copy8(op, match); ++ } ++ op += 8; ++ match += 8; ++ ++ if (oMatchEnd > oend - (16 - MINMATCH)) { ++ if (op < oend_w) { ++ ZSTD_wildcopy(op, match, oend_w - op); ++ match += oend_w - op; ++ op = oend_w; ++ } ++ while (op < oMatchEnd) ++ *op++ = *match++; ++ } else { ++ ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength - 8); /* works even if matchLength < 8 */ ++ } ++ return sequenceLength; ++} ++ ++static size_t INIT ZSTD_decompressSequencesLong(ZSTD_DCtx *dctx, void *dst, size_t maxDstSize, const void *seqStart, size_t seqSize) ++{ ++ const BYTE *ip = (const BYTE *)seqStart; ++ const BYTE *const iend = ip + seqSize; ++ BYTE *const ostart = (BYTE * const)dst; ++ BYTE *const oend = ostart + maxDstSize; ++ BYTE *op = ostart; ++ const BYTE *litPtr = dctx->litPtr; ++ const BYTE *const litEnd = litPtr + dctx->litSize; ++ const BYTE *const base = (const BYTE *)(dctx->base); ++ const BYTE *const vBase = (const BYTE *)(dctx->vBase); ++ const BYTE *const dictEnd = (const BYTE *)(dctx->dictEnd); ++ unsigned const windowSize = dctx->fParams.windowSize; ++ int nbSeq; ++ ++ /* Build Decoding Tables */ ++ { ++ size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, seqSize); ++ if (ZSTD_isError(seqHSize)) ++ return seqHSize; ++ ip += seqHSize; ++ } ++ ++ /* Regen sequences */ ++ if (nbSeq) { ++#define STORED_SEQS 4 ++#define STOSEQ_MASK (STORED_SEQS - 1) ++#define ADVANCED_SEQS 4 ++ seq_t *sequences = (seq_t *)dctx->entropy.workspace; ++ int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS); ++ seqState_t seqState; ++ int seqNb; ++ ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.workspace) >= sizeof(seq_t) * STORED_SEQS); ++ dctx->fseEntropy = 1; ++ { ++ U32 i; ++ for (i = 0; i < ZSTD_REP_NUM; i++) ++ seqState.prevOffset[i] = dctx->entropy.rep[i]; ++ } ++ seqState.base = base; ++ seqState.pos = (size_t)(op - base); ++ seqState.gotoDict = (uPtrDiff)dictEnd - (uPtrDiff)base; /* cast to avoid undefined behaviour */ ++ CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend - ip), corruption_detected); ++ FSE_initDState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr); ++ FSE_initDState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr); ++ FSE_initDState(&seqState.stateML, &seqState.DStream, dctx->MLTptr); ++ ++ /* prepare in advance */ ++ for (seqNb = 0; (BIT_reloadDStream(&seqState.DStream) <= BIT_DStream_completed) && seqNb < seqAdvance; seqNb++) { ++ sequences[seqNb] = ZSTD_decodeSequenceLong(&seqState, windowSize); ++ } ++ if (seqNb < seqAdvance) ++ return ERROR(corruption_detected); ++ ++ /* decode and decompress */ ++ for (; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && seqNb < nbSeq; seqNb++) { ++ seq_t const sequence = ZSTD_decodeSequenceLong(&seqState, windowSize); ++ size_t const oneSeqSize = ++ ZSTD_execSequenceLong(op, oend, sequences[(seqNb - ADVANCED_SEQS) & STOSEQ_MASK], &litPtr, litEnd, base, vBase, dictEnd); ++ if (ZSTD_isError(oneSeqSize)) ++ return oneSeqSize; ++ ZSTD_PREFETCH(sequence.match); ++ sequences[seqNb & STOSEQ_MASK] = sequence; ++ op += oneSeqSize; ++ } ++ if (seqNb < nbSeq) ++ return ERROR(corruption_detected); ++ ++ /* finish queue */ ++ seqNb -= seqAdvance; ++ for (; seqNb < nbSeq; seqNb++) { ++ size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[seqNb & STOSEQ_MASK], &litPtr, litEnd, base, vBase, dictEnd); ++ if (ZSTD_isError(oneSeqSize)) ++ return oneSeqSize; ++ op += oneSeqSize; ++ } ++ ++ 
/* save reps for next block */ ++ { ++ U32 i; ++ for (i = 0; i < ZSTD_REP_NUM; i++) ++ dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); ++ } ++ } ++ ++ /* last literal segment */ ++ { ++ size_t const lastLLSize = litEnd - litPtr; ++ if (lastLLSize > (size_t)(oend - op)) ++ return ERROR(dstSize_tooSmall); ++ memcpy(op, litPtr, lastLLSize); ++ op += lastLLSize; ++ } ++ ++ return op - ostart; ++} ++ ++static size_t INIT ZSTD_decompressBlock_internal(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ /* blockType == blockCompressed */ ++ const BYTE *ip = (const BYTE *)src; ++ ++ if (srcSize >= ZSTD_BLOCKSIZE_ABSOLUTEMAX) ++ return ERROR(srcSize_wrong); ++ ++ /* Decode literals section */ ++ { ++ size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize); ++ if (ZSTD_isError(litCSize)) ++ return litCSize; ++ ip += litCSize; ++ srcSize -= litCSize; ++ } ++ if (sizeof(size_t) > 4) /* do not enable prefetching on 32-bit x86, as it's detrimental to performance */ ++ /* likely because of register pressure */ ++ /* if that's the correct cause, then 32-bit ARM should be affected differently */ ++ /* it would be good to test this on real ARM hardware, to see if the prefetch version improves speed */ ++ if (dctx->fParams.windowSize > (1 << 23)) ++ return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize); ++ return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize); ++} ++ ++static void INIT ZSTD_checkContinuity(ZSTD_DCtx *dctx, const void *dst) ++{ ++ if (dst != dctx->previousDstEnd) { /* not contiguous */ ++ dctx->dictEnd = dctx->previousDstEnd; ++ dctx->vBase = (const char *)dst - ((const char *)(dctx->previousDstEnd) - (const char *)(dctx->base)); ++ dctx->base = dst; ++ dctx->previousDstEnd = dst; ++ } ++} ++ ++size_t INIT ZSTD_decompressBlock(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ ++ size_t dSize; ++ ZSTD_checkContinuity(dctx, dst); ++ dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize); ++ dctx->previousDstEnd = (char *)dst + dSize; ++ return dSize; ++} ++ ++/** ZSTD_insertBlock() : ++ insert `src` block into `dctx` history. Useful to track uncompressed blocks. 
*/ ++size_t INIT ZSTD_insertBlock(ZSTD_DCtx *dctx, const void *blockStart, size_t blockSize) ++{ ++ ZSTD_checkContinuity(dctx, blockStart); ++ dctx->previousDstEnd = (const char *)blockStart + blockSize; ++ return blockSize; ++} ++ ++size_t INIT ZSTD_generateNxBytes(void *dst, size_t dstCapacity, BYTE byte, size_t length) ++{ ++ if (length > dstCapacity) ++ return ERROR(dstSize_tooSmall); ++ memset(dst, byte, length); ++ return length; ++} ++ ++/** ZSTD_findFrameCompressedSize() : ++ * compatible with legacy mode ++ * `src` must point to the start of a ZSTD frame, ZSTD legacy frame, or skippable frame ++ * `srcSize` must be at least as large as the frame contained ++ * @return : the compressed size of the frame starting at `src` */ ++size_t INIT ZSTD_findFrameCompressedSize(const void *src, size_t srcSize) ++{ ++ if (srcSize >= ZSTD_skippableHeaderSize && (ZSTD_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { ++ return ZSTD_skippableHeaderSize + ZSTD_readLE32((const BYTE *)src + 4); ++ } else { ++ const BYTE *ip = (const BYTE *)src; ++ const BYTE *const ipstart = ip; ++ size_t remainingSize = srcSize; ++ ZSTD_frameParams fParams; ++ ++ size_t const headerSize = ZSTD_frameHeaderSize(ip, remainingSize); ++ if (ZSTD_isError(headerSize)) ++ return headerSize; ++ ++ /* Frame Header */ ++ { ++ size_t const ret = ZSTD_getFrameParams(&fParams, ip, remainingSize); ++ if (ZSTD_isError(ret)) ++ return ret; ++ if (ret > 0) ++ return ERROR(srcSize_wrong); ++ } ++ ++ ip += headerSize; ++ remainingSize -= headerSize; ++ ++ /* Loop on each block */ ++ while (1) { ++ blockProperties_t blockProperties; ++ size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSize, &blockProperties); ++ if (ZSTD_isError(cBlockSize)) ++ return cBlockSize; ++ ++ if (ZSTD_blockHeaderSize + cBlockSize > remainingSize) ++ return ERROR(srcSize_wrong); ++ ++ ip += ZSTD_blockHeaderSize + cBlockSize; ++ remainingSize -= ZSTD_blockHeaderSize + cBlockSize; ++ ++ if (blockProperties.lastBlock) ++ break; ++ } ++ ++ if (fParams.checksumFlag) { /* Frame content checksum */ ++ if (remainingSize < 4) ++ return ERROR(srcSize_wrong); ++ ip += 4; ++ remainingSize -= 4; ++ } ++ ++ return ip - ipstart; ++ } ++} ++ ++/*! 
ZSTD_decompressFrame() : ++* @dctx must be properly initialized */ ++static size_t INIT ZSTD_decompressFrame(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void **srcPtr, size_t *srcSizePtr) ++{ ++ const BYTE *ip = (const BYTE *)(*srcPtr); ++ BYTE *const ostart = (BYTE * const)dst; ++ BYTE *const oend = ostart + dstCapacity; ++ BYTE *op = ostart; ++ size_t remainingSize = *srcSizePtr; ++ ++ /* check */ ++ if (remainingSize < ZSTD_frameHeaderSize_min + ZSTD_blockHeaderSize) ++ return ERROR(srcSize_wrong); ++ ++ /* Frame Header */ ++ { ++ size_t const frameHeaderSize = ZSTD_frameHeaderSize(ip, ZSTD_frameHeaderSize_prefix); ++ if (ZSTD_isError(frameHeaderSize)) ++ return frameHeaderSize; ++ if (remainingSize < frameHeaderSize + ZSTD_blockHeaderSize) ++ return ERROR(srcSize_wrong); ++ CHECK_F(ZSTD_decodeFrameHeader(dctx, ip, frameHeaderSize)); ++ ip += frameHeaderSize; ++ remainingSize -= frameHeaderSize; ++ } ++ ++ /* Loop on each block */ ++ while (1) { ++ size_t decodedSize; ++ blockProperties_t blockProperties; ++ size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSize, &blockProperties); ++ if (ZSTD_isError(cBlockSize)) ++ return cBlockSize; ++ ++ ip += ZSTD_blockHeaderSize; ++ remainingSize -= ZSTD_blockHeaderSize; ++ if (cBlockSize > remainingSize) ++ return ERROR(srcSize_wrong); ++ ++ switch (blockProperties.blockType) { ++ case bt_compressed: decodedSize = ZSTD_decompressBlock_internal(dctx, op, oend - op, ip, cBlockSize); break; ++ case bt_raw: decodedSize = ZSTD_copyRawBlock(op, oend - op, ip, cBlockSize); break; ++ case bt_rle: decodedSize = ZSTD_generateNxBytes(op, oend - op, *ip, blockProperties.origSize); break; ++ case bt_reserved: ++ default: return ERROR(corruption_detected); ++ } ++ ++ if (ZSTD_isError(decodedSize)) ++ return decodedSize; ++ if (dctx->fParams.checksumFlag) ++ xxh64_update(&dctx->xxhState, op, decodedSize); ++ op += decodedSize; ++ ip += cBlockSize; ++ remainingSize -= cBlockSize; ++ if (blockProperties.lastBlock) ++ break; ++ } ++ ++ if (dctx->fParams.checksumFlag) { /* Frame content checksum verification */ ++ U32 const checkCalc = (U32)xxh64_digest(&dctx->xxhState); ++ U32 checkRead; ++ if (remainingSize < 4) ++ return ERROR(checksum_wrong); ++ checkRead = ZSTD_readLE32(ip); ++ if (checkRead != checkCalc) ++ return ERROR(checksum_wrong); ++ ip += 4; ++ remainingSize -= 4; ++ } ++ ++ /* Allow caller to get size read */ ++ *srcPtr = ip; ++ *srcSizePtr = remainingSize; ++ return op - ostart; ++} ++ ++static const void *ZSTD_DDictDictContent(const ZSTD_DDict *ddict); ++static size_t ZSTD_DDictDictSize(const ZSTD_DDict *ddict); ++ ++static size_t INIT ZSTD_decompressMultiFrame(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize, ++ const ZSTD_DDict *ddict) ++{ ++ void *const dststart = dst; ++ ++ if (ddict) { ++ if (dict) { ++ /* programmer error, these two cases should be mutually exclusive */ ++ return ERROR(GENERIC); ++ } ++ ++ dict = ZSTD_DDictDictContent(ddict); ++ dictSize = ZSTD_DDictDictSize(ddict); ++ } ++ ++ while (srcSize >= ZSTD_frameHeaderSize_prefix) { ++ U32 magicNumber; ++ ++ magicNumber = ZSTD_readLE32(src); ++ if (magicNumber != ZSTD_MAGICNUMBER) { ++ if ((magicNumber & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { ++ size_t skippableSize; ++ if (srcSize < ZSTD_skippableHeaderSize) ++ return ERROR(srcSize_wrong); ++ skippableSize = ZSTD_readLE32((const BYTE *)src + 4) + ZSTD_skippableHeaderSize; ++ if (srcSize < skippableSize) { ++ return ERROR(srcSize_wrong); ++ } ++ 
++ src = (const BYTE *)src + skippableSize; ++ srcSize -= skippableSize; ++ continue; ++ } else { ++ return ERROR(prefix_unknown); ++ } ++ } ++ ++ if (ddict) { ++ /* we were called from ZSTD_decompress_usingDDict */ ++ ZSTD_refDDict(dctx, ddict); ++ } else { ++ /* this will initialize correctly with no dict if dict == NULL, so ++ * use this in all cases but ddict */ ++ CHECK_F(ZSTD_decompressBegin_usingDict(dctx, dict, dictSize)); ++ } ++ ZSTD_checkContinuity(dctx, dst); ++ ++ { ++ const size_t res = ZSTD_decompressFrame(dctx, dst, dstCapacity, &src, &srcSize); ++ if (ZSTD_isError(res)) ++ return res; ++ /* don't need to bounds check this, ZSTD_decompressFrame will have ++ * already */ ++ dst = (BYTE *)dst + res; ++ dstCapacity -= res; ++ } ++ } ++ ++ if (srcSize) ++ return ERROR(srcSize_wrong); /* input not entirely consumed */ ++ ++ return (BYTE *)dst - (BYTE *)dststart; ++} ++ ++size_t INIT ZSTD_decompress_usingDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize) ++{ ++ return ZSTD_decompressMultiFrame(dctx, dst, dstCapacity, src, srcSize, dict, dictSize, NULL); ++} ++ ++size_t INIT ZSTD_decompressDCtx(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ ++ return ZSTD_decompress_usingDict(dctx, dst, dstCapacity, src, srcSize, NULL, 0); ++} ++ ++/*-************************************** ++* Advanced Streaming Decompression API ++* Bufferless and synchronous ++****************************************/ ++size_t INIT ZSTD_nextSrcSizeToDecompress(ZSTD_DCtx *dctx) { return dctx->expected; } ++ ++ZSTD_nextInputType_e INIT ZSTD_nextInputType(ZSTD_DCtx *dctx) ++{ ++ switch (dctx->stage) { ++ default: /* should not happen */ ++ case ZSTDds_getFrameHeaderSize: ++ case ZSTDds_decodeFrameHeader: return ZSTDnit_frameHeader; ++ case ZSTDds_decodeBlockHeader: return ZSTDnit_blockHeader; ++ case ZSTDds_decompressBlock: return ZSTDnit_block; ++ case ZSTDds_decompressLastBlock: return ZSTDnit_lastBlock; ++ case ZSTDds_checkChecksum: return ZSTDnit_checksum; ++ case ZSTDds_decodeSkippableHeader: ++ case ZSTDds_skipFrame: return ZSTDnit_skippableFrame; ++ } ++} ++ ++int INIT ZSTD_isSkipFrame(ZSTD_DCtx *dctx) { return dctx->stage == ZSTDds_skipFrame; } /* for zbuff */ ++ ++/** ZSTD_decompressContinue() : ++* @return : nb of bytes generated into `dst` (necessarily <= `dstCapacity) ++* or an error code, which can be tested using ZSTD_isError() */ ++size_t INIT ZSTD_decompressContinue(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ ++ /* Sanity check */ ++ if (srcSize != dctx->expected) ++ return ERROR(srcSize_wrong); ++ if (dstCapacity) ++ ZSTD_checkContinuity(dctx, dst); ++ ++ switch (dctx->stage) { ++ case ZSTDds_getFrameHeaderSize: ++ if (srcSize != ZSTD_frameHeaderSize_prefix) ++ return ERROR(srcSize_wrong); /* impossible */ ++ if ((ZSTD_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */ ++ memcpy(dctx->headerBuffer, src, ZSTD_frameHeaderSize_prefix); ++ dctx->expected = ZSTD_skippableHeaderSize - ZSTD_frameHeaderSize_prefix; /* magic number + skippable frame length */ ++ dctx->stage = ZSTDds_decodeSkippableHeader; ++ return 0; ++ } ++ dctx->headerSize = ZSTD_frameHeaderSize(src, ZSTD_frameHeaderSize_prefix); ++ if (ZSTD_isError(dctx->headerSize)) ++ return dctx->headerSize; ++ memcpy(dctx->headerBuffer, src, ZSTD_frameHeaderSize_prefix); ++ if (dctx->headerSize > ZSTD_frameHeaderSize_prefix) { ++ dctx->expected = dctx->headerSize - 
ZSTD_frameHeaderSize_prefix; ++ dctx->stage = ZSTDds_decodeFrameHeader; ++ return 0; ++ } ++ dctx->expected = 0; /* not necessary to copy more */ ++ /* fallthrough */ ++ ++ case ZSTDds_decodeFrameHeader: ++ memcpy(dctx->headerBuffer + ZSTD_frameHeaderSize_prefix, src, dctx->expected); ++ CHECK_F(ZSTD_decodeFrameHeader(dctx, dctx->headerBuffer, dctx->headerSize)); ++ dctx->expected = ZSTD_blockHeaderSize; ++ dctx->stage = ZSTDds_decodeBlockHeader; ++ return 0; ++ ++ case ZSTDds_decodeBlockHeader: { ++ blockProperties_t bp; ++ size_t const cBlockSize = ZSTD_getcBlockSize(src, ZSTD_blockHeaderSize, &bp); ++ if (ZSTD_isError(cBlockSize)) ++ return cBlockSize; ++ dctx->expected = cBlockSize; ++ dctx->bType = bp.blockType; ++ dctx->rleSize = bp.origSize; ++ if (cBlockSize) { ++ dctx->stage = bp.lastBlock ? ZSTDds_decompressLastBlock : ZSTDds_decompressBlock; ++ return 0; ++ } ++ /* empty block */ ++ if (bp.lastBlock) { ++ if (dctx->fParams.checksumFlag) { ++ dctx->expected = 4; ++ dctx->stage = ZSTDds_checkChecksum; ++ } else { ++ dctx->expected = 0; /* end of frame */ ++ dctx->stage = ZSTDds_getFrameHeaderSize; ++ } ++ } else { ++ dctx->expected = 3; /* go directly to next header */ ++ dctx->stage = ZSTDds_decodeBlockHeader; ++ } ++ return 0; ++ } ++ case ZSTDds_decompressLastBlock: ++ case ZSTDds_decompressBlock: { ++ size_t rSize; ++ switch (dctx->bType) { ++ case bt_compressed: rSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize); break; ++ case bt_raw: rSize = ZSTD_copyRawBlock(dst, dstCapacity, src, srcSize); break; ++ case bt_rle: rSize = ZSTD_setRleBlock(dst, dstCapacity, src, srcSize, dctx->rleSize); break; ++ case bt_reserved: /* should never happen */ ++ default: return ERROR(corruption_detected); ++ } ++ if (ZSTD_isError(rSize)) ++ return rSize; ++ if (dctx->fParams.checksumFlag) ++ xxh64_update(&dctx->xxhState, dst, rSize); ++ ++ if (dctx->stage == ZSTDds_decompressLastBlock) { /* end of frame */ ++ if (dctx->fParams.checksumFlag) { /* another round for frame checksum */ ++ dctx->expected = 4; ++ dctx->stage = ZSTDds_checkChecksum; ++ } else { ++ dctx->expected = 0; /* ends here */ ++ dctx->stage = ZSTDds_getFrameHeaderSize; ++ } ++ } else { ++ dctx->stage = ZSTDds_decodeBlockHeader; ++ dctx->expected = ZSTD_blockHeaderSize; ++ dctx->previousDstEnd = (char *)dst + rSize; ++ } ++ return rSize; ++ } ++ case ZSTDds_checkChecksum: { ++ U32 const h32 = (U32)xxh64_digest(&dctx->xxhState); ++ U32 const check32 = ZSTD_readLE32(src); /* srcSize == 4, guaranteed by dctx->expected */ ++ if (check32 != h32) ++ return ERROR(checksum_wrong); ++ dctx->expected = 0; ++ dctx->stage = ZSTDds_getFrameHeaderSize; ++ return 0; ++ } ++ case ZSTDds_decodeSkippableHeader: { ++ memcpy(dctx->headerBuffer + ZSTD_frameHeaderSize_prefix, src, dctx->expected); ++ dctx->expected = ZSTD_readLE32(dctx->headerBuffer + 4); ++ dctx->stage = ZSTDds_skipFrame; ++ return 0; ++ } ++ case ZSTDds_skipFrame: { ++ dctx->expected = 0; ++ dctx->stage = ZSTDds_getFrameHeaderSize; ++ return 0; ++ } ++ default: ++ return ERROR(GENERIC); /* impossible */ ++ } ++} ++ ++static size_t INIT ZSTD_refDictContent(ZSTD_DCtx *dctx, const void *dict, size_t dictSize) ++{ ++ dctx->dictEnd = dctx->previousDstEnd; ++ dctx->vBase = (const char *)dict - ((const char *)(dctx->previousDstEnd) - (const char *)(dctx->base)); ++ dctx->base = dict; ++ dctx->previousDstEnd = (const char *)dict + dictSize; ++ return 0; ++} ++ ++/* ZSTD_loadEntropy() : ++ * dict : must point at beginning of a valid zstd dictionary ++ * @return : 
size of entropy tables read */ ++static size_t INIT ZSTD_loadEntropy(ZSTD_entropyTables_t *entropy, const void *const dict, size_t const dictSize) ++{ ++ const BYTE *dictPtr = (const BYTE *)dict; ++ const BYTE *const dictEnd = dictPtr + dictSize; ++ ++ if (dictSize <= 8) ++ return ERROR(dictionary_corrupted); ++ dictPtr += 8; /* skip header = magic + dictID */ ++ ++ { ++ size_t const hSize = HUF_readDTableX4_wksp(entropy->hufTable, dictPtr, dictEnd - dictPtr, entropy->workspace, sizeof(entropy->workspace)); ++ if (HUF_isError(hSize)) ++ return ERROR(dictionary_corrupted); ++ dictPtr += hSize; ++ } ++ ++ { ++ short offcodeNCount[MaxOff + 1]; ++ U32 offcodeMaxValue = MaxOff, offcodeLog; ++ size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd - dictPtr); ++ if (FSE_isError(offcodeHeaderSize)) ++ return ERROR(dictionary_corrupted); ++ if (offcodeLog > OffFSELog) ++ return ERROR(dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); ++ dictPtr += offcodeHeaderSize; ++ } ++ ++ { ++ short matchlengthNCount[MaxML + 1]; ++ unsigned matchlengthMaxValue = MaxML, matchlengthLog; ++ size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd - dictPtr); ++ if (FSE_isError(matchlengthHeaderSize)) ++ return ERROR(dictionary_corrupted); ++ if (matchlengthLog > MLFSELog) ++ return ERROR(dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); ++ dictPtr += matchlengthHeaderSize; ++ } ++ ++ { ++ short litlengthNCount[MaxLL + 1]; ++ unsigned litlengthMaxValue = MaxLL, litlengthLog; ++ size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd - dictPtr); ++ if (FSE_isError(litlengthHeaderSize)) ++ return ERROR(dictionary_corrupted); ++ if (litlengthLog > LLFSELog) ++ return ERROR(dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); ++ dictPtr += litlengthHeaderSize; ++ } ++ ++ if (dictPtr + 12 > dictEnd) ++ return ERROR(dictionary_corrupted); ++ { ++ int i; ++ size_t const dictContentSize = (size_t)(dictEnd - (dictPtr + 12)); ++ for (i = 0; i < 3; i++) { ++ U32 const rep = ZSTD_readLE32(dictPtr); ++ dictPtr += 4; ++ if (rep == 0 || rep >= dictContentSize) ++ return ERROR(dictionary_corrupted); ++ entropy->rep[i] = rep; ++ } ++ } ++ ++ return dictPtr - (const BYTE *)dict; ++} ++ ++static size_t INIT ZSTD_decompress_insertDictionary(ZSTD_DCtx *dctx, const void *dict, size_t dictSize) ++{ ++ if (dictSize < 8) ++ return ZSTD_refDictContent(dctx, dict, dictSize); ++ { ++ U32 const magic = ZSTD_readLE32(dict); ++ if (magic != ZSTD_DICT_MAGIC) { ++ return ZSTD_refDictContent(dctx, dict, dictSize); /* pure content mode */ ++ } ++ } ++ dctx->dictID = ZSTD_readLE32((const char *)dict + 4); ++ ++ /* load entropy tables */ ++ { ++ size_t const eSize = ZSTD_loadEntropy(&dctx->entropy, dict, dictSize); ++ if (ZSTD_isError(eSize)) ++ return ERROR(dictionary_corrupted); ++ dict = (const char *)dict + eSize; ++ dictSize -= eSize; ++ } ++ dctx->litEntropy = dctx->fseEntropy = 1; ++ ++ /* reference dictionary content */ ++ return 
ZSTD_refDictContent(dctx, dict, dictSize); ++} ++ ++size_t INIT ZSTD_decompressBegin_usingDict(ZSTD_DCtx *dctx, const void *dict, size_t dictSize) ++{ ++ CHECK_F(ZSTD_decompressBegin(dctx)); ++ if (dict && dictSize) ++ CHECK_E(ZSTD_decompress_insertDictionary(dctx, dict, dictSize), dictionary_corrupted); ++ return 0; ++} ++ ++/* ====== ZSTD_DDict ====== */ ++ ++struct ZSTD_DDict_s { ++ void *dictBuffer; ++ const void *dictContent; ++ size_t dictSize; ++ ZSTD_entropyTables_t entropy; ++ U32 dictID; ++ U32 entropyPresent; ++ ZSTD_customMem cMem; ++}; /* typedef'd to ZSTD_DDict within "zstd.h" */ ++ ++size_t INIT ZSTD_DDictWorkspaceBound(void) { return ZSTD_ALIGN(sizeof(ZSTD_stack)) + ZSTD_ALIGN(sizeof(ZSTD_DDict)); } ++ ++static const void *INIT ZSTD_DDictDictContent(const ZSTD_DDict *ddict) { return ddict->dictContent; } ++ ++static size_t INIT ZSTD_DDictDictSize(const ZSTD_DDict *ddict) { return ddict->dictSize; } ++ ++static void INIT ZSTD_refDDict(ZSTD_DCtx *dstDCtx, const ZSTD_DDict *ddict) ++{ ++ ZSTD_decompressBegin(dstDCtx); /* init */ ++ if (ddict) { /* support refDDict on NULL */ ++ dstDCtx->dictID = ddict->dictID; ++ dstDCtx->base = ddict->dictContent; ++ dstDCtx->vBase = ddict->dictContent; ++ dstDCtx->dictEnd = (const BYTE *)ddict->dictContent + ddict->dictSize; ++ dstDCtx->previousDstEnd = dstDCtx->dictEnd; ++ if (ddict->entropyPresent) { ++ dstDCtx->litEntropy = 1; ++ dstDCtx->fseEntropy = 1; ++ dstDCtx->LLTptr = ddict->entropy.LLTable; ++ dstDCtx->MLTptr = ddict->entropy.MLTable; ++ dstDCtx->OFTptr = ddict->entropy.OFTable; ++ dstDCtx->HUFptr = ddict->entropy.hufTable; ++ dstDCtx->entropy.rep[0] = ddict->entropy.rep[0]; ++ dstDCtx->entropy.rep[1] = ddict->entropy.rep[1]; ++ dstDCtx->entropy.rep[2] = ddict->entropy.rep[2]; ++ } else { ++ dstDCtx->litEntropy = 0; ++ dstDCtx->fseEntropy = 0; ++ } ++ } ++} ++ ++static size_t INIT ZSTD_loadEntropy_inDDict(ZSTD_DDict *ddict) ++{ ++ ddict->dictID = 0; ++ ddict->entropyPresent = 0; ++ if (ddict->dictSize < 8) ++ return 0; ++ { ++ U32 const magic = ZSTD_readLE32(ddict->dictContent); ++ if (magic != ZSTD_DICT_MAGIC) ++ return 0; /* pure content mode */ ++ } ++ ddict->dictID = ZSTD_readLE32((const char *)ddict->dictContent + 4); ++ ++ /* load entropy tables */ ++ CHECK_E(ZSTD_loadEntropy(&ddict->entropy, ddict->dictContent, ddict->dictSize), dictionary_corrupted); ++ ddict->entropyPresent = 1; ++ return 0; ++} ++ ++static ZSTD_DDict *INIT ZSTD_createDDict_advanced(const void *dict, size_t dictSize, unsigned byReference, ZSTD_customMem customMem) ++{ ++ if (!customMem.customAlloc || !customMem.customFree) ++ return NULL; ++ ++ { ++ ZSTD_DDict *const ddict = (ZSTD_DDict *)ZSTD_malloc(sizeof(ZSTD_DDict), customMem); ++ if (!ddict) ++ return NULL; ++ ddict->cMem = customMem; ++ ++ if ((byReference) || (!dict) || (!dictSize)) { ++ ddict->dictBuffer = NULL; ++ ddict->dictContent = dict; ++ } else { ++ void *const internalBuffer = ZSTD_malloc(dictSize, customMem); ++ if (!internalBuffer) { ++ ZSTD_freeDDict(ddict); ++ return NULL; ++ } ++ memcpy(internalBuffer, dict, dictSize); ++ ddict->dictBuffer = internalBuffer; ++ ddict->dictContent = internalBuffer; ++ } ++ ddict->dictSize = dictSize; ++ ddict->entropy.hufTable[0] = (HUF_DTable)((HufLog)*0x1000001); /* cover both little and big endian */ ++ /* parse dictionary content */ ++ { ++ size_t const errorCode = ZSTD_loadEntropy_inDDict(ddict); ++ if (ZSTD_isError(errorCode)) { ++ ZSTD_freeDDict(ddict); ++ return NULL; ++ } ++ } ++ ++ return ddict; ++ } ++} ++ ++/*! 
ZSTD_initDDict() : ++* Create a digested dictionary, to start decompression without startup delay. ++* `dict` content is referenced rather than copied into the DDict (this build passes byReference=1). ++* Consequently, `dict` must remain valid for the lifetime of the `ZSTD_DDict` */ ++ZSTD_DDict *INIT ZSTD_initDDict(const void *dict, size_t dictSize, void *workspace, size_t workspaceSize) ++{ ++ ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); ++ return ZSTD_createDDict_advanced(dict, dictSize, 1, stackMem); ++} ++ ++size_t INIT ZSTD_freeDDict(ZSTD_DDict *ddict) ++{ ++ if (ddict == NULL) ++ return 0; /* support free on NULL */ ++ { ++ ZSTD_customMem const cMem = ddict->cMem; ++ ZSTD_free(ddict->dictBuffer, cMem); ++ ZSTD_free(ddict, cMem); ++ return 0; ++ } ++} ++ ++/*! ZSTD_getDictID_fromDict() : ++ * Provides the dictID stored within the dictionary. ++ * If @return == 0, the dictionary is not conformant with the Zstandard specification. ++ * It can still be loaded, but as a content-only dictionary. */ ++unsigned INIT ZSTD_getDictID_fromDict(const void *dict, size_t dictSize) ++{ ++ if (dictSize < 8) ++ return 0; ++ if (ZSTD_readLE32(dict) != ZSTD_DICT_MAGIC) ++ return 0; ++ return ZSTD_readLE32((const char *)dict + 4); ++} ++ ++/*! ZSTD_getDictID_fromDDict() : ++ * Provides the dictID of the dictionary loaded into `ddict`. ++ * If @return == 0, the dictionary is not conformant to the Zstandard specification, or empty. ++ * Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */ ++unsigned INIT ZSTD_getDictID_fromDDict(const ZSTD_DDict *ddict) ++{ ++ if (ddict == NULL) ++ return 0; ++ return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize); ++} ++ ++/*! ZSTD_getDictID_fromFrame() : ++ * Provides the dictID required to decompress the frame stored within `src`. ++ * If @return == 0, the dictID could not be decoded. ++ * This could be for one of the following reasons : ++ * - The frame does not require a dictionary to be decoded (most common case). ++ * - The frame was built with dictID intentionally removed. Whatever dictionary is necessary is hidden information. ++ * Note : this use case also happens when using a non-conformant dictionary. ++ * - `srcSize` is too small, and as a result, the frame header could not be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`). ++ * - This is not a Zstandard frame. ++ * When identifying the exact failure cause, it's possible to use ZSTD_getFrameParams(), which will provide a more precise error code. */ ++unsigned INIT ZSTD_getDictID_fromFrame(const void *src, size_t srcSize) ++{ ++ ZSTD_frameParams zfp = {0, 0, 0, 0}; ++ size_t const hError = ZSTD_getFrameParams(&zfp, src, srcSize); ++ if (ZSTD_isError(hError)) ++ return 0; ++ return zfp.dictID; ++} ++ ++/*! ZSTD_decompress_usingDDict() : ++* Decompression using a pre-digested Dictionary ++* Use dictionary without significant overhead. 
*/ ++size_t INIT ZSTD_decompress_usingDDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const ZSTD_DDict *ddict) ++{ ++ /* pass content and size in case legacy frames are encountered */ ++ return ZSTD_decompressMultiFrame(dctx, dst, dstCapacity, src, srcSize, NULL, 0, ddict); ++} ++ ++/*===================================== ++* Streaming decompression ++*====================================*/ ++ ++typedef enum { zdss_init, zdss_loadHeader, zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage; ++ ++/* *** Resource management *** */ ++struct ZSTD_DStream_s { ++ ZSTD_DCtx *dctx; ++ ZSTD_DDict *ddictLocal; ++ const ZSTD_DDict *ddict; ++ ZSTD_frameParams fParams; ++ ZSTD_dStreamStage stage; ++ char *inBuff; ++ size_t inBuffSize; ++ size_t inPos; ++ size_t maxWindowSize; ++ char *outBuff; ++ size_t outBuffSize; ++ size_t outStart; ++ size_t outEnd; ++ size_t blockSize; ++ BYTE headerBuffer[ZSTD_FRAMEHEADERSIZE_MAX]; /* tmp buffer to store frame header */ ++ size_t lhSize; ++ ZSTD_customMem customMem; ++ void *legacyContext; ++ U32 previousLegacyVersion; ++ U32 legacyVersion; ++ U32 hostageByte; ++}; /* typedef'd to ZSTD_DStream within "zstd.h" */ ++ ++size_t INIT ZSTD_DStreamWorkspaceBound(size_t maxWindowSize) ++{ ++ size_t const blockSize = MIN(maxWindowSize, ZSTD_BLOCKSIZE_ABSOLUTEMAX); ++ size_t const inBuffSize = blockSize; ++ size_t const outBuffSize = maxWindowSize + blockSize + WILDCOPY_OVERLENGTH * 2; ++ return ZSTD_DCtxWorkspaceBound() + ZSTD_ALIGN(sizeof(ZSTD_DStream)) + ZSTD_ALIGN(inBuffSize) + ZSTD_ALIGN(outBuffSize); ++} ++ ++static ZSTD_DStream *INIT ZSTD_createDStream_advanced(ZSTD_customMem customMem) ++{ ++ ZSTD_DStream *zds; ++ ++ if (!customMem.customAlloc || !customMem.customFree) ++ return NULL; ++ ++ zds = (ZSTD_DStream *)ZSTD_malloc(sizeof(ZSTD_DStream), customMem); ++ if (zds == NULL) ++ return NULL; ++ memset(zds, 0, sizeof(ZSTD_DStream)); ++ memcpy(&zds->customMem, &customMem, sizeof(ZSTD_customMem)); ++ zds->dctx = ZSTD_createDCtx_advanced(customMem); ++ if (zds->dctx == NULL) { ++ ZSTD_freeDStream(zds); ++ return NULL; ++ } ++ zds->stage = zdss_init; ++ zds->maxWindowSize = ZSTD_MAXWINDOWSIZE_DEFAULT; ++ return zds; ++} ++ ++ZSTD_DStream *INIT ZSTD_initDStream(size_t maxWindowSize, void *workspace, size_t workspaceSize) ++{ ++ ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); ++ ZSTD_DStream *zds = ZSTD_createDStream_advanced(stackMem); ++ if (!zds) { ++ return NULL; ++ } ++ ++ zds->maxWindowSize = maxWindowSize; ++ zds->stage = zdss_loadHeader; ++ zds->lhSize = zds->inPos = zds->outStart = zds->outEnd = 0; ++ ZSTD_freeDDict(zds->ddictLocal); ++ zds->ddictLocal = NULL; ++ zds->ddict = zds->ddictLocal; ++ zds->legacyVersion = 0; ++ zds->hostageByte = 0; ++ ++ { ++ size_t const blockSize = MIN(zds->maxWindowSize, ZSTD_BLOCKSIZE_ABSOLUTEMAX); ++ size_t const neededOutSize = zds->maxWindowSize + blockSize + WILDCOPY_OVERLENGTH * 2; ++ ++ zds->inBuff = (char *)ZSTD_malloc(blockSize, zds->customMem); ++ zds->inBuffSize = blockSize; ++ zds->outBuff = (char *)ZSTD_malloc(neededOutSize, zds->customMem); ++ zds->outBuffSize = neededOutSize; ++ if (zds->inBuff == NULL || zds->outBuff == NULL) { ++ ZSTD_freeDStream(zds); ++ return NULL; ++ } ++ } ++ return zds; ++} ++ ++ZSTD_DStream *INIT ZSTD_initDStream_usingDDict(size_t maxWindowSize, const ZSTD_DDict *ddict, void *workspace, size_t workspaceSize) ++{ ++ ZSTD_DStream *zds = ZSTD_initDStream(maxWindowSize, workspace, workspaceSize); ++ if (zds) { ++ zds->ddict = 
ddict; ++ } ++ return zds; ++} ++ ++size_t INIT ZSTD_freeDStream(ZSTD_DStream *zds) ++{ ++ if (zds == NULL) ++ return 0; /* support free on null */ ++ { ++ ZSTD_customMem const cMem = zds->customMem; ++ ZSTD_freeDCtx(zds->dctx); ++ zds->dctx = NULL; ++ ZSTD_freeDDict(zds->ddictLocal); ++ zds->ddictLocal = NULL; ++ ZSTD_free(zds->inBuff, cMem); ++ zds->inBuff = NULL; ++ ZSTD_free(zds->outBuff, cMem); ++ zds->outBuff = NULL; ++ ZSTD_free(zds, cMem); ++ return 0; ++ } ++} ++ ++/* *** Initialization *** */ ++ ++size_t INIT ZSTD_DStreamInSize(void) { return ZSTD_BLOCKSIZE_ABSOLUTEMAX + ZSTD_blockHeaderSize; } ++size_t INIT ZSTD_DStreamOutSize(void) { return ZSTD_BLOCKSIZE_ABSOLUTEMAX; } ++ ++size_t INIT ZSTD_resetDStream(ZSTD_DStream *zds) ++{ ++ zds->stage = zdss_loadHeader; ++ zds->lhSize = zds->inPos = zds->outStart = zds->outEnd = 0; ++ zds->legacyVersion = 0; ++ zds->hostageByte = 0; ++ return ZSTD_frameHeaderSize_prefix; ++} ++ ++/* ***** Decompression ***** */ ++ ++ZSTD_STATIC size_t INIT ZSTD_limitCopy(void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++{ ++ size_t const length = MIN(dstCapacity, srcSize); ++ memcpy(dst, src, length); ++ return length; ++} ++ ++size_t INIT ZSTD_decompressStream(ZSTD_DStream *zds, ZSTD_outBuffer *output, ZSTD_inBuffer *input) ++{ ++ const char *const istart = (const char *)(input->src) + input->pos; ++ const char *const iend = (const char *)(input->src) + input->size; ++ const char *ip = istart; ++ char *const ostart = (char *)(output->dst) + output->pos; ++ char *const oend = (char *)(output->dst) + output->size; ++ char *op = ostart; ++ U32 someMoreWork = 1; ++ ++ while (someMoreWork) { ++ switch (zds->stage) { ++ case zdss_init: ++ ZSTD_resetDStream(zds); /* transparent reset on starting decoding a new frame */ ++ /* fallthrough */ ++ ++ case zdss_loadHeader: { ++ size_t const hSize = ZSTD_getFrameParams(&zds->fParams, zds->headerBuffer, zds->lhSize); ++ if (ZSTD_isError(hSize)) ++ return hSize; ++ if (hSize != 0) { /* need more input */ ++ size_t const toLoad = hSize - zds->lhSize; /* if hSize!=0, hSize > zds->lhSize */ ++ if (toLoad > (size_t)(iend - ip)) { /* not enough input to load full header */ ++ memcpy(zds->headerBuffer + zds->lhSize, ip, iend - ip); ++ zds->lhSize += iend - ip; ++ input->pos = input->size; ++ return (MAX(ZSTD_frameHeaderSize_min, hSize) - zds->lhSize) + ++ ZSTD_blockHeaderSize; /* remaining header bytes + next block header */ ++ } ++ memcpy(zds->headerBuffer + zds->lhSize, ip, toLoad); ++ zds->lhSize = hSize; ++ ip += toLoad; ++ break; ++ } ++ ++ /* check for single-pass mode opportunity */ ++ if (zds->fParams.frameContentSize && zds->fParams.windowSize /* skippable frame if == 0 */ ++ && (U64)(size_t)(oend - op) >= zds->fParams.frameContentSize) { ++ size_t const cSize = ZSTD_findFrameCompressedSize(istart, iend - istart); ++ if (cSize <= (size_t)(iend - istart)) { ++ size_t const decompressedSize = ZSTD_decompress_usingDDict(zds->dctx, op, oend - op, istart, cSize, zds->ddict); ++ if (ZSTD_isError(decompressedSize)) ++ return decompressedSize; ++ ip = istart + cSize; ++ op += decompressedSize; ++ zds->dctx->expected = 0; ++ zds->stage = zdss_init; ++ someMoreWork = 0; ++ break; ++ } ++ } ++ ++ /* Consume header */ ++ ZSTD_refDDict(zds->dctx, zds->ddict); ++ { ++ size_t const h1Size = ZSTD_nextSrcSizeToDecompress(zds->dctx); /* == ZSTD_frameHeaderSize_prefix */ ++ CHECK_F(ZSTD_decompressContinue(zds->dctx, NULL, 0, zds->headerBuffer, h1Size)); ++ { ++ size_t const h2Size = 
ZSTD_nextSrcSizeToDecompress(zds->dctx); ++ CHECK_F(ZSTD_decompressContinue(zds->dctx, NULL, 0, zds->headerBuffer + h1Size, h2Size)); ++ } ++ } ++ ++ zds->fParams.windowSize = MAX(zds->fParams.windowSize, 1U << ZSTD_WINDOWLOG_ABSOLUTEMIN); ++ if (zds->fParams.windowSize > zds->maxWindowSize) ++ return ERROR(frameParameter_windowTooLarge); ++ ++ /* Buffers are preallocated, but double check */ ++ { ++ size_t const blockSize = MIN(zds->maxWindowSize, ZSTD_BLOCKSIZE_ABSOLUTEMAX); ++ size_t const neededOutSize = zds->maxWindowSize + blockSize + WILDCOPY_OVERLENGTH * 2; ++ if (zds->inBuffSize < blockSize) { ++ return ERROR(GENERIC); ++ } ++ if (zds->outBuffSize < neededOutSize) { ++ return ERROR(GENERIC); ++ } ++ zds->blockSize = blockSize; ++ } ++ zds->stage = zdss_read; ++ } ++ /* fallthrough */ ++ ++ case zdss_read: { ++ size_t const neededInSize = ZSTD_nextSrcSizeToDecompress(zds->dctx); ++ if (neededInSize == 0) { /* end of frame */ ++ zds->stage = zdss_init; ++ someMoreWork = 0; ++ break; ++ } ++ if ((size_t)(iend - ip) >= neededInSize) { /* decode directly from src */ ++ const int isSkipFrame = ZSTD_isSkipFrame(zds->dctx); ++ size_t const decodedSize = ZSTD_decompressContinue(zds->dctx, zds->outBuff + zds->outStart, ++ (isSkipFrame ? 0 : zds->outBuffSize - zds->outStart), ip, neededInSize); ++ if (ZSTD_isError(decodedSize)) ++ return decodedSize; ++ ip += neededInSize; ++ if (!decodedSize && !isSkipFrame) ++ break; /* this was just a header */ ++ zds->outEnd = zds->outStart + decodedSize; ++ zds->stage = zdss_flush; ++ break; ++ } ++ if (ip == iend) { ++ someMoreWork = 0; ++ break; ++ } /* no more input */ ++ zds->stage = zdss_load; ++ /* pass-through */ ++ } ++ /* fallthrough */ ++ ++ case zdss_load: { ++ size_t const neededInSize = ZSTD_nextSrcSizeToDecompress(zds->dctx); ++ size_t const toLoad = neededInSize - zds->inPos; /* should always be <= remaining space within inBuff */ ++ size_t loadedSize; ++ if (toLoad > zds->inBuffSize - zds->inPos) ++ return ERROR(corruption_detected); /* should never happen */ ++ loadedSize = ZSTD_limitCopy(zds->inBuff + zds->inPos, toLoad, ip, iend - ip); ++ ip += loadedSize; ++ zds->inPos += loadedSize; ++ if (loadedSize < toLoad) { ++ someMoreWork = 0; ++ break; ++ } /* not enough input, wait for more */ ++ ++ /* decode loaded input */ ++ { ++ const int isSkipFrame = ZSTD_isSkipFrame(zds->dctx); ++ size_t const decodedSize = ZSTD_decompressContinue(zds->dctx, zds->outBuff + zds->outStart, zds->outBuffSize - zds->outStart, ++ zds->inBuff, neededInSize); ++ if (ZSTD_isError(decodedSize)) ++ return decodedSize; ++ zds->inPos = 0; /* input is consumed */ ++ if (!decodedSize && !isSkipFrame) { ++ zds->stage = zdss_read; ++ break; ++ } /* this was just a header */ ++ zds->outEnd = zds->outStart + decodedSize; ++ zds->stage = zdss_flush; ++ /* pass-through */ ++ } ++ } ++ /* fallthrough */ ++ ++ case zdss_flush: { ++ size_t const toFlushSize = zds->outEnd - zds->outStart; ++ size_t const flushedSize = ZSTD_limitCopy(op, oend - op, zds->outBuff + zds->outStart, toFlushSize); ++ op += flushedSize; ++ zds->outStart += flushedSize; ++ if (flushedSize == toFlushSize) { /* flush completed */ ++ zds->stage = zdss_read; ++ if (zds->outStart + zds->blockSize > zds->outBuffSize) ++ zds->outStart = zds->outEnd = 0; ++ break; ++ } ++ /* cannot complete flush */ ++ someMoreWork = 0; ++ break; ++ } ++ default: ++ return ERROR(GENERIC); /* impossible */ ++ } ++ } ++ ++ /* result */ ++ input->pos += (size_t)(ip - istart); ++ output->pos += (size_t)(op - ostart); ++ { ++ 
size_t nextSrcSizeHint = ZSTD_nextSrcSizeToDecompress(zds->dctx); ++ if (!nextSrcSizeHint) { /* frame fully decoded */ ++ if (zds->outEnd == zds->outStart) { /* output fully flushed */ ++ if (zds->hostageByte) { ++ if (input->pos >= input->size) { ++ zds->stage = zdss_read; ++ return 1; ++ } /* can't release hostage (not present) */ ++ input->pos++; /* release hostage */ ++ } ++ return 0; ++ } ++ if (!zds->hostageByte) { /* output not fully flushed; keep last byte as hostage; will be released when all output is flushed */ ++ input->pos--; /* note : pos > 0, otherwise, impossible to finish reading last block */ ++ zds->hostageByte = 1; ++ } ++ return 1; ++ } ++ nextSrcSizeHint += ZSTD_blockHeaderSize * (ZSTD_nextInputType(zds->dctx) == ZSTDnit_block); /* preload header of next block */ ++ if (zds->inPos > nextSrcSizeHint) ++ return ERROR(GENERIC); /* should never happen */ ++ nextSrcSizeHint -= zds->inPos; /* already loaded*/ ++ return nextSrcSizeHint; ++ } ++} +diff --git a/xen/common/zstd/entropy_common.c b/xen/common/zstd/entropy_common.c +new file mode 100644 +index 000000000000..bcdb57982ba5 +--- /dev/null ++++ b/xen/common/zstd/entropy_common.c +@@ -0,0 +1,243 @@ ++/* ++ * Common functions of New Generation Entropy library ++ * Copyright (C) 2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). 
++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++ ++/* ************************************* ++* Dependencies ++***************************************/ ++#include "error_private.h" /* ERR_*, ERROR */ ++#include "fse.h" ++#include "huf.h" ++#include "mem.h" ++ ++/*=== Version ===*/ ++unsigned INIT FSE_versionNumber(void) { return FSE_VERSION_NUMBER; } ++ ++/*=== Error Management ===*/ ++unsigned INIT FSE_isError(size_t code) { return ERR_isError(code); } ++ ++unsigned INIT HUF_isError(size_t code) { return ERR_isError(code); } ++ ++/*-************************************************************** ++* FSE NCount encoding-decoding ++****************************************************************/ ++size_t INIT FSE_readNCount(short *normalizedCounter, unsigned *maxSVPtr, unsigned *tableLogPtr, const void *headerBuffer, size_t hbSize) ++{ ++ const BYTE *const istart = (const BYTE *)headerBuffer; ++ const BYTE *const iend = istart + hbSize; ++ const BYTE *ip = istart; ++ int nbBits; ++ int remaining; ++ int threshold; ++ U32 bitStream; ++ int bitCount; ++ unsigned charnum = 0; ++ int previous0 = 0; ++ ++ if (hbSize < 4) ++ return ERROR(srcSize_wrong); ++ bitStream = ZSTD_readLE32(ip); ++ nbBits = (bitStream & 0xF) + FSE_MIN_TABLELOG; /* extract tableLog */ ++ if (nbBits > FSE_TABLELOG_ABSOLUTE_MAX) ++ return ERROR(tableLog_tooLarge); ++ bitStream >>= 4; ++ bitCount = 4; ++ *tableLogPtr = nbBits; ++ remaining = (1 << nbBits) + 1; ++ threshold = 1 << nbBits; ++ nbBits++; ++ ++ while ((remaining > 1) & (charnum <= *maxSVPtr)) { ++ if (previous0) { ++ unsigned n0 = charnum; ++ while ((bitStream & 0xFFFF) == 0xFFFF) { ++ n0 += 24; ++ if (ip < iend - 5) { ++ ip += 2; ++ bitStream = ZSTD_readLE32(ip) >> bitCount; ++ } else { ++ bitStream >>= 16; ++ bitCount += 16; ++ } ++ } ++ while ((bitStream & 3) == 3) { ++ n0 += 3; ++ bitStream >>= 2; ++ bitCount += 2; ++ } ++ n0 += bitStream & 3; ++ bitCount += 2; ++ if (n0 > *maxSVPtr) ++ return ERROR(maxSymbolValue_tooSmall); ++ while (charnum < n0) ++ normalizedCounter[charnum++] = 0; ++ if ((ip <= iend - 7) || (ip + (bitCount >> 3) <= iend - 4)) { ++ ip += bitCount >> 3; ++ bitCount &= 7; ++ bitStream = ZSTD_readLE32(ip) >> bitCount; ++ } else { ++ bitStream >>= 2; ++ } ++ } ++ { ++ int const max = (2 * threshold - 1) - remaining; ++ int count; ++ ++ if ((bitStream & (threshold - 1)) < (U32)max) { ++ count = bitStream & (threshold - 1); ++ bitCount += nbBits - 1; ++ } else { ++ count = bitStream & (2 * threshold - 1); ++ if (count >= threshold) ++ count -= max; ++ bitCount += nbBits; ++ } ++ ++ count--; /* extra accuracy */ ++ remaining -= count < 0 ? -count : count; /* -1 means +1 */ ++ normalizedCounter[charnum++] = (short)count; ++ previous0 = !count; ++ while (remaining < threshold) { ++ nbBits--; ++ threshold >>= 1; ++ } ++ ++ if ((ip <= iend - 7) || (ip + (bitCount >> 3) <= iend - 4)) { ++ ip += bitCount >> 3; ++ bitCount &= 7; ++ } else { ++ bitCount -= (int)(8 * (iend - 4 - ip)); ++ ip = iend - 4; ++ } ++ bitStream = ZSTD_readLE32(ip) >> (bitCount & 31); ++ } ++ } /* while ((remaining>1) & (charnum<=*maxSVPtr)) */ ++ if (remaining != 1) ++ return ERROR(corruption_detected); ++ if (bitCount > 32) ++ return ERROR(corruption_detected); ++ *maxSVPtr = charnum - 1; ++ ++ ip += (bitCount + 7) >> 3; ++ return ip - istart; ++} ++ ++/*! HUF_readStats() : ++ Read compact Huffman tree, saved by HUF_writeCTable(). ++ `huffWeight` is destination buffer. 
++ `rankStats` is assumed to be a table of at least HUF_TABLELOG_MAX U32. ++ @return : size read from `src` , or an error Code . ++ Note : Needed by HUF_readCTable() and HUF_readDTableX?() . ++*/ ++size_t INIT HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) ++{ ++ U32 weightTotal; ++ const BYTE *ip = (const BYTE *)src; ++ size_t iSize; ++ size_t oSize; ++ ++ if (!srcSize) ++ return ERROR(srcSize_wrong); ++ iSize = ip[0]; ++ /* memset(huffWeight, 0, hwSize); */ /* is not necessary, even though some analyzer complain ... */ ++ ++ if (iSize >= 128) { /* special header */ ++ oSize = iSize - 127; ++ iSize = ((oSize + 1) / 2); ++ if (iSize + 1 > srcSize) ++ return ERROR(srcSize_wrong); ++ if (oSize >= hwSize) ++ return ERROR(corruption_detected); ++ ip += 1; ++ { ++ U32 n; ++ for (n = 0; n < oSize; n += 2) { ++ huffWeight[n] = ip[n / 2] >> 4; ++ huffWeight[n + 1] = ip[n / 2] & 15; ++ } ++ } ++ } else { /* header compressed with FSE (normal case) */ ++ if (iSize + 1 > srcSize) ++ return ERROR(srcSize_wrong); ++ oSize = FSE_decompress_wksp(huffWeight, hwSize - 1, ip + 1, iSize, 6, workspace, workspaceSize); /* max (hwSize-1) values decoded, as last one is implied */ ++ if (FSE_isError(oSize)) ++ return oSize; ++ } ++ ++ /* collect weight stats */ ++ memset(rankStats, 0, (HUF_TABLELOG_MAX + 1) * sizeof(U32)); ++ weightTotal = 0; ++ { ++ U32 n; ++ for (n = 0; n < oSize; n++) { ++ if (huffWeight[n] >= HUF_TABLELOG_MAX) ++ return ERROR(corruption_detected); ++ rankStats[huffWeight[n]]++; ++ weightTotal += (1 << huffWeight[n]) >> 1; ++ } ++ } ++ if (weightTotal == 0) ++ return ERROR(corruption_detected); ++ ++ /* get last non-null symbol weight (implied, total must be 2^n) */ ++ { ++ U32 const tableLog = BIT_highbit32(weightTotal) + 1; ++ if (tableLog > HUF_TABLELOG_MAX) ++ return ERROR(corruption_detected); ++ *tableLogPtr = tableLog; ++ /* determine last weight */ ++ { ++ U32 const total = 1 << tableLog; ++ U32 const rest = total - weightTotal; ++ U32 const verif = 1 << BIT_highbit32(rest); ++ U32 const lastWeight = BIT_highbit32(rest) + 1; ++ if (verif != rest) ++ return ERROR(corruption_detected); /* last value must be a clean power of 2 */ ++ huffWeight[oSize] = (BYTE)lastWeight; ++ rankStats[lastWeight]++; ++ } ++ } ++ ++ /* check tree construction validity */ ++ if ((rankStats[1] < 2) || (rankStats[1] & 1)) ++ return ERROR(corruption_detected); /* by construction : at least 2 elts of rank 1, must be even */ ++ ++ /* results */ ++ *nbSymbolsPtr = (U32)(oSize + 1); ++ return iSize + 1; ++} +diff --git a/xen/common/zstd/error_private.h b/xen/common/zstd/error_private.h +new file mode 100644 +index 000000000000..d07bf3cb9b55 +--- /dev/null ++++ b/xen/common/zstd/error_private.h +@@ -0,0 +1,110 @@ ++/** ++ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. ++ * All rights reserved. ++ * ++ * This source code is licensed under the BSD-style license found in the ++ * LICENSE file in the root directory of https://github.com/facebook/zstd. ++ * An additional grant of patent rights can be found in the PATENTS file in the ++ * same directory. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). 
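HUF_readStats_wksp() above never stores the weight of the last symbol: the recorded weights must sum (each counted as 2^(w-1)) to just below a power of two, and the gap up to that power of two pins down the missing weight. A standalone arithmetic illustration, independent of the library:

    /* Recovering the implied last Huffman weight, as done above. */
    #include <stdio.h>

    static unsigned highbit32(unsigned v)   /* index of the highest set bit */
    {
        unsigned r = 0;
        while (v >>= 1)
            r++;
        return r;
    }

    int main(void)
    {
        unsigned w[4] = { 3, 3, 2, 2 };     /* weights read from the header */
        unsigned total = 0, i;

        for (i = 0; i < 4; i++)
            total += (1u << w[i]) >> 1;     /* 4 + 4 + 2 + 2 = 12 */

        {
            unsigned tableLog   = highbit32(total) + 1;      /* 4           */
            unsigned rest       = (1u << tableLog) - total;  /* 16 - 12 = 4 */
            unsigned lastWeight = highbit32(rest) + 1;       /* 3           */

            /* 'rest' must itself be a power of two -- exactly the
             * corruption check performed above. */
            printf("tableLog=%u lastWeight=%u\n", tableLog, lastWeight);
        }
        return 0;
    }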
++ */ ++ ++/* Note : this module is expected to remain private, do not expose it */ ++ ++#ifndef ERROR_H_MODULE ++#define ERROR_H_MODULE ++ ++/* **************************************** ++* Dependencies ++******************************************/ ++#include /* size_t */ ++ ++/** ++ * enum ZSTD_ErrorCode - zstd error codes ++ * ++ * Functions that return size_t can be checked for errors using ZSTD_isError() ++ * and the ZSTD_ErrorCode can be extracted using ZSTD_getErrorCode(). ++ */ ++typedef enum { ++ ZSTD_error_no_error, ++ ZSTD_error_GENERIC, ++ ZSTD_error_prefix_unknown, ++ ZSTD_error_version_unsupported, ++ ZSTD_error_parameter_unknown, ++ ZSTD_error_frameParameter_unsupported, ++ ZSTD_error_frameParameter_unsupportedBy32bits, ++ ZSTD_error_frameParameter_windowTooLarge, ++ ZSTD_error_compressionParameter_unsupported, ++ ZSTD_error_init_missing, ++ ZSTD_error_memory_allocation, ++ ZSTD_error_stage_wrong, ++ ZSTD_error_dstSize_tooSmall, ++ ZSTD_error_srcSize_wrong, ++ ZSTD_error_corruption_detected, ++ ZSTD_error_checksum_wrong, ++ ZSTD_error_tableLog_tooLarge, ++ ZSTD_error_maxSymbolValue_tooLarge, ++ ZSTD_error_maxSymbolValue_tooSmall, ++ ZSTD_error_dictionary_corrupted, ++ ZSTD_error_dictionary_wrong, ++ ZSTD_error_dictionaryCreation_failed, ++ ZSTD_error_maxCode ++} ZSTD_ErrorCode; ++ ++/* **************************************** ++* Compiler-specific ++******************************************/ ++#define ERR_STATIC static __attribute__((unused)) ++ ++/*-**************************************** ++* Customization (error_public.h) ++******************************************/ ++typedef ZSTD_ErrorCode ERR_enum; ++#define PREFIX(name) ZSTD_error_##name ++ ++/*-**************************************** ++* Error codes handling ++******************************************/ ++#define ERROR(name) ((size_t)-PREFIX(name)) ++ ++ERR_STATIC unsigned INIT ERR_isError(size_t code) { return (code > ERROR(maxCode)); } ++ ++ERR_STATIC ERR_enum INIT ERR_getErrorCode(size_t code) ++{ ++ if (!ERR_isError(code)) ++ return (ERR_enum)0; ++ return (ERR_enum)(0 - code); ++} ++ ++/** ++ * ZSTD_isError() - tells if a size_t function result is an error code ++ * @code: The function result to check for error. ++ * ++ * Return: Non-zero iff the code is an error. ++ */ ++static __attribute__((unused)) unsigned int INIT ZSTD_isError(size_t code) ++{ ++ return code > (size_t)-ZSTD_error_maxCode; ++} ++ ++/** ++ * ZSTD_getErrorCode() - translates an error function result to a ZSTD_ErrorCode ++ * @functionResult: The result of a function for which ZSTD_isError() is true. ++ * ++ * Return: The ZSTD_ErrorCode corresponding to the functionResult or 0 ++ * if the functionResult isn't an error. ++ */ ++static __attribute__((unused)) ZSTD_ErrorCode INIT ZSTD_getErrorCode( ++ size_t functionResult) ++{ ++ if (!ZSTD_isError(functionResult)) ++ return (ZSTD_ErrorCode)0; ++ return (ZSTD_ErrorCode)(0 - functionResult); ++} ++ ++#endif /* ERROR_H_MODULE */ +diff --git a/xen/common/zstd/fse.h b/xen/common/zstd/fse.h +new file mode 100644 +index 000000000000..b86717c34d0f +--- /dev/null ++++ b/xen/common/zstd/fse.h +@@ -0,0 +1,575 @@ ++/* ++ * FSE : Finite State Entropy codec ++ * Public Prototypes declaration ++ * Copyright (C) 2013-2016, Yann Collet. 
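error_private.h above encodes ZSTD_ErrorCode values as the topmost values of size_t, so a single size_t return can carry either a byte count or an error. A self-contained mirror of that encoding (the identifiers below are local to the snippet, not the header's):

    /* Standalone mirror of the ERROR()/ZSTD_isError() convention above. */
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { E_no_error, E_GENERIC, E_srcSize_wrong, E_maxCode } err_t;

    #define ERR(name) ((size_t)-(E_##name))   /* wraps to SIZE_MAX - n + 1 */

    static unsigned is_error(size_t code)
    {
        return code > ERR(maxCode);           /* only the last few values */
    }

    static err_t get_error(size_t code)
    {
        return is_error(code) ? (err_t)(0 - code) : E_no_error;
    }

    int main(void)
    {
        size_t const r = ERR(srcSize_wrong);
        printf("is_error=%u code=%d\n", is_error(r), (int)get_error(r)); /* 1, 2 */
        return 0;
    }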
++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++#ifndef FSE_H ++#define FSE_H ++ ++/*-***************************************** ++* Dependencies ++******************************************/ ++#include /* size_t, ptrdiff_t */ ++ ++/*-***************************************** ++* FSE_PUBLIC_API : control library symbols visibility ++******************************************/ ++#define FSE_PUBLIC_API ++ ++/*------ Version ------*/ ++#define FSE_VERSION_MAJOR 0 ++#define FSE_VERSION_MINOR 9 ++#define FSE_VERSION_RELEASE 0 ++ ++#define FSE_LIB_VERSION FSE_VERSION_MAJOR.FSE_VERSION_MINOR.FSE_VERSION_RELEASE ++#define FSE_QUOTE(str) #str ++#define FSE_EXPAND_AND_QUOTE(str) FSE_QUOTE(str) ++#define FSE_VERSION_STRING FSE_EXPAND_AND_QUOTE(FSE_LIB_VERSION) ++ ++#define FSE_VERSION_NUMBER (FSE_VERSION_MAJOR * 100 * 100 + FSE_VERSION_MINOR * 100 + FSE_VERSION_RELEASE) ++FSE_PUBLIC_API unsigned FSE_versionNumber(void); /**< library version number; to be used when checking dll version */ ++ ++/*-***************************************** ++* Tool functions ++******************************************/ ++FSE_PUBLIC_API size_t FSE_compressBound(size_t size); /* maximum compressed size */ ++ ++/* Error Management */ ++FSE_PUBLIC_API unsigned FSE_isError(size_t code); /* tells if a return value is an error code */ ++ ++/*-***************************************** ++* FSE detailed API ++******************************************/ ++/*! ++FSE_compress() does the following: ++1. count symbol occurrence from source[] into table count[] ++2. normalize counters so that sum(count[]) == Power_of_2 (2^tableLog) ++3. save normalized counters to memory buffer using writeNCount() ++4. 
build encoding table 'CTable' from normalized counters ++5. encode the data stream using encoding table 'CTable' ++ ++FSE_decompress() does the following: ++1. read normalized counters with readNCount() ++2. build decoding table 'DTable' from normalized counters ++3. decode the data stream using decoding table 'DTable' ++ ++The following API allows targeting specific sub-functions for advanced tasks. ++For example, it's possible to compress several blocks using the same 'CTable', ++or to save and provide normalized distribution using external method. ++*/ ++ ++/* *** COMPRESSION *** */ ++/*! FSE_optimalTableLog(): ++ dynamically downsize 'tableLog' when conditions are met. ++ It saves CPU time, by using smaller tables, while preserving or even improving compression ratio. ++ @return : recommended tableLog (necessarily <= 'maxTableLog') */ ++FSE_PUBLIC_API unsigned FSE_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue); ++ ++/*! FSE_normalizeCount(): ++ normalize counts so that sum(count[]) == Power_of_2 (2^tableLog) ++ 'normalizedCounter' is a table of short, of minimum size (maxSymbolValue+1). ++ @return : tableLog, ++ or an errorCode, which can be tested using FSE_isError() */ ++FSE_PUBLIC_API size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const unsigned *count, size_t srcSize, unsigned maxSymbolValue); ++ ++/*! FSE_NCountWriteBound(): ++ Provides the maximum possible size of an FSE normalized table, given 'maxSymbolValue' and 'tableLog'. ++ Typically useful for allocation purpose. */ ++FSE_PUBLIC_API size_t FSE_NCountWriteBound(unsigned maxSymbolValue, unsigned tableLog); ++ ++/*! FSE_writeNCount(): ++ Compactly save 'normalizedCounter' into 'buffer'. ++ @return : size of the compressed table, ++ or an errorCode, which can be tested using FSE_isError(). */ ++FSE_PUBLIC_API size_t FSE_writeNCount(void *buffer, size_t bufferSize, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog); ++ ++/*! Constructor and Destructor of FSE_CTable. ++ Note that FSE_CTable size depends on 'tableLog' and 'maxSymbolValue' */ ++typedef unsigned FSE_CTable; /* don't allocate that. It's only meant to be more restrictive than void* */ ++ ++/*! FSE_compress_usingCTable(): ++ Compress `src` using `ct` into `dst` which must be already allocated. ++ @return : size of compressed data (<= `dstCapacity`), ++ or 0 if compressed data could not fit into `dst`, ++ or an errorCode, which can be tested using FSE_isError() */ ++FSE_PUBLIC_API size_t FSE_compress_usingCTable(void *dst, size_t dstCapacity, const void *src, size_t srcSize, const FSE_CTable *ct); ++ ++/*! ++Tutorial : ++---------- ++The first step is to count all symbols. FSE_count() does this job very fast. ++Result will be saved into 'count', a table of unsigned int, which must be already allocated, and have 'maxSymbolValuePtr[0]+1' cells. ++'src' is a table of bytes of size 'srcSize'. All values within 'src' MUST be <= maxSymbolValuePtr[0] ++maxSymbolValuePtr[0] will be updated, with its real value (necessarily <= original value) ++FSE_count() will return the number of occurrence of the most frequent symbol. ++This can be used to know if there is a single symbol within 'src', and to quickly evaluate its compressibility. ++If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()). ++ ++The next step is to normalize the frequencies. ++FSE_normalizeCount() will ensure that sum of frequencies is == 2 ^'tableLog'. 
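The compression-side flow listed above (count, normalize, save the normalized table, then encode with the CTable) maps directly onto the declarations in this header. A sketch of the first three steps follows; note that this import is only exercised for decompression in Xen, so whether the compression entry points are built at all is an assumption here, and the 256-entry stack buffers are purely illustrative.

    /* Illustrative use of the compression-side helpers declared above. */
    #include "fse.h"

    static size_t save_ncount(void *dst, size_t dstCapacity,
                              const unsigned char *src, size_t srcSize)
    {
        unsigned count[FSE_MAX_SYMBOL_VALUE + 1] = { 0 };
        short norm[FSE_MAX_SYMBOL_VALUE + 1];
        unsigned maxSymbol = FSE_MAX_SYMBOL_VALUE, tableLog;
        size_t i;

        for (i = 0; i < srcSize; i++)            /* 1. histogram the input */
            count[src[i]]++;
        while (maxSymbol && !count[maxSymbol])   /* shrink to the real alphabet */
            maxSymbol--;

        tableLog = FSE_optimalTableLog(0, srcSize, maxSymbol);  /* 0 = default */

        /* 2. normalize so the counts sum to 2^tableLog */
        if (FSE_isError(FSE_normalizeCount(norm, tableLog, count,
                                           srcSize, maxSymbol)))
            return (size_t)-1;                   /* simplified error return */

        /* 3. save the normalized counters compactly */
        return FSE_writeNCount(dst, dstCapacity, norm, maxSymbol, tableLog);
    }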
++It also guarantees a minimum of 1 to any Symbol with frequency >= 1. ++You can use 'tableLog'==0 to mean "use default tableLog value". ++If you are unsure of which tableLog value to use, you can ask FSE_optimalTableLog(), ++which will provide the optimal valid tableLog given sourceSize, maxSymbolValue, and a user-defined maximum (0 means "default"). ++ ++The result of FSE_normalizeCount() will be saved into a table, ++called 'normalizedCounter', which is a table of signed short. ++'normalizedCounter' must be already allocated, and have at least 'maxSymbolValue+1' cells. ++The return value is tableLog if everything proceeded as expected. ++It is 0 if there is a single symbol within distribution. ++If there is an error (ex: invalid tableLog value), the function will return an ErrorCode (which can be tested using FSE_isError()). ++ ++'normalizedCounter' can be saved in a compact manner to a memory area using FSE_writeNCount(). ++'buffer' must be already allocated. ++For guaranteed success, buffer size must be at least FSE_headerBound(). ++The result of the function is the number of bytes written into 'buffer'. ++If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError(); ex : buffer size too small). ++ ++'normalizedCounter' can then be used to create the compression table 'CTable'. ++The space required by 'CTable' must be already allocated, using FSE_createCTable(). ++You can then use FSE_buildCTable() to fill 'CTable'. ++If there is an error, both functions will return an ErrorCode (which can be tested using FSE_isError()). ++ ++'CTable' can then be used to compress 'src', with FSE_compress_usingCTable(). ++Similar to FSE_count(), the convention is that 'src' is assumed to be a table of char of size 'srcSize' ++The function returns the size of compressed data (without header), necessarily <= `dstCapacity`. ++If it returns '0', compressed data could not fit into 'dst'. ++If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()). ++*/ ++ ++/* *** DECOMPRESSION *** */ ++ ++/*! FSE_readNCount(): ++ Read compactly saved 'normalizedCounter' from 'rBuffer'. ++ @return : size read from 'rBuffer', ++ or an errorCode, which can be tested using FSE_isError(). ++ maxSymbolValuePtr[0] and tableLogPtr[0] will also be updated with their respective values */ ++FSE_PUBLIC_API size_t FSE_readNCount(short *normalizedCounter, unsigned *maxSymbolValuePtr, unsigned *tableLogPtr, const void *rBuffer, size_t rBuffSize); ++ ++/*! Constructor and Destructor of FSE_DTable. ++ Note that its size depends on 'tableLog' */ ++typedef unsigned FSE_DTable; /* don't allocate that. It's just a way to be more restrictive than void* */ ++ ++/*! FSE_buildDTable(): ++ Builds 'dt', which must be already allocated, using FSE_createDTable(). ++ return : 0, or an errorCode, which can be tested using FSE_isError() */ ++FSE_PUBLIC_API size_t FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize); ++ ++/*! FSE_decompress_usingDTable(): ++ Decompress compressed source `cSrc` of size `cSrcSize` using `dt` ++ into `dst` which must be already allocated. ++ @return : size of regenerated data (necessarily <= `dstCapacity`), ++ or an errorCode, which can be tested using FSE_isError() */ ++FSE_PUBLIC_API size_t FSE_decompress_usingDTable(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, const FSE_DTable *dt); ++ ++/*! 
++Tutorial : ++---------- ++(Note : these functions only decompress FSE-compressed blocks. ++ If block is uncompressed, use memcpy() instead ++ If block is a single repeated byte, use memset() instead ) ++ ++The first step is to obtain the normalized frequencies of symbols. ++This can be performed by FSE_readNCount() if it was saved using FSE_writeNCount(). ++'normalizedCounter' must be already allocated, and have at least 'maxSymbolValuePtr[0]+1' cells of signed short. ++In practice, that means it's necessary to know 'maxSymbolValue' beforehand, ++or size the table to handle worst case situations (typically 256). ++FSE_readNCount() will provide 'tableLog' and 'maxSymbolValue'. ++The result of FSE_readNCount() is the number of bytes read from 'rBuffer'. ++Note that 'rBufferSize' must be at least 4 bytes, even if useful information is less than that. ++If there is an error, the function will return an error code, which can be tested using FSE_isError(). ++ ++The next step is to build the decompression tables 'FSE_DTable' from 'normalizedCounter'. ++This is performed by the function FSE_buildDTable(). ++The space required by 'FSE_DTable' must be already allocated using FSE_createDTable(). ++If there is an error, the function will return an error code, which can be tested using FSE_isError(). ++ ++`FSE_DTable` can then be used to decompress `cSrc`, with FSE_decompress_usingDTable(). ++`cSrcSize` must be strictly correct, otherwise decompression will fail. ++FSE_decompress_usingDTable() result will tell how many bytes were regenerated (<=`dstCapacity`). ++If there is an error, the function will return an error code, which can be tested using FSE_isError(). (ex: dst buffer too small) ++*/ ++ ++/* *** Dependency *** */ ++#include "bitstream.h" ++ ++/* ***************************************** ++* Static allocation ++*******************************************/ ++/* FSE buffer bounds */ ++#define FSE_NCOUNTBOUND 512 ++#define FSE_BLOCKBOUND(size) (size + (size >> 7)) ++#define FSE_COMPRESSBOUND(size) (FSE_NCOUNTBOUND + FSE_BLOCKBOUND(size)) /* Macro version, useful for static allocation */ ++ ++/* It is possible to statically allocate FSE CTable/DTable as a table of FSE_CTable/FSE_DTable using below macros */ ++#define FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) (1 + (1 << (maxTableLog - 1)) + ((maxSymbolValue + 1) * 2)) ++#define FSE_DTABLE_SIZE_U32(maxTableLog) (1 + (1 << maxTableLog)) ++ ++/* ***************************************** ++* FSE advanced API ++*******************************************/ ++/* FSE_count_wksp() : ++ * Same as FSE_count(), but using an externally provided scratch buffer. ++ * `workSpace` size must be table of >= `1024` unsigned ++ */ ++size_t FSE_count_wksp(unsigned *count, unsigned *maxSymbolValuePtr, const void *source, size_t sourceSize, unsigned *workSpace); ++ ++/* FSE_countFast_wksp() : ++ * Same as FSE_countFast(), but using an externally provided scratch buffer. ++ * `workSpace` must be a table of minimum `1024` unsigned ++ */ ++size_t FSE_countFast_wksp(unsigned *count, unsigned *maxSymbolValuePtr, const void *src, size_t srcSize, unsigned *workSpace); ++ ++/*! FSE_count_simple ++ * Same as FSE_countFast(), but does not use any additional memory (not even on stack). ++ * This function is unsafe, and will segfault if any value within `src` is `> *maxSymbolValuePtr` (presuming it's also the size of `count`). 
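The decompression tutorial above reduces to three calls: read the normalized counters, build the DTable, decode. A hedged composition of the declared entry points (the stack buffers and simplified error returns are illustrative; within this patch the equivalent wiring lives in FSE_decompress_wksp() in fse_decompress.c, using a caller-provided workspace):

    /* Sketch of the readNCount -> buildDTable -> decompress flow above. */
    #include "fse.h"

    static size_t fse_decode_block(void *dst, size_t dstCapacity,
                                   const void *cSrc, size_t cSrcSize)
    {
        short ncount[FSE_MAX_SYMBOL_VALUE + 1];
        unsigned maxSymbol = FSE_MAX_SYMBOL_VALUE, tableLog;
        FSE_DTable dt[FSE_DTABLE_SIZE_U32(FSE_MAX_TABLELOG)];  /* ~16 KiB */
        unsigned short wksp[FSE_MAX_SYMBOL_VALUE + 1];         /* build scratch */

        /* 1. read the compactly saved normalized counters */
        size_t const hSize = FSE_readNCount(ncount, &maxSymbol, &tableLog,
                                            cSrc, cSrcSize);
        if (FSE_isError(hSize) || tableLog > FSE_MAX_TABLELOG)
            return (size_t)-1;                                 /* simplified */

        /* 2. build the decoding table from the counters */
        if (FSE_isError(FSE_buildDTable_wksp(dt, ncount, maxSymbol, tableLog,
                                             wksp, sizeof(wksp))))
            return (size_t)-1;

        /* 3. decode the payload that follows the saved table */
        return FSE_decompress_usingDTable(dst, dstCapacity,
                                          (const char *)cSrc + hSize,
                                          cSrcSize - hSize, dt);
    }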
++*/ ++size_t FSE_count_simple(unsigned *count, unsigned *maxSymbolValuePtr, const void *src, size_t srcSize); ++ ++unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus); ++/**< same as FSE_optimalTableLog(), which used `minus==2` */ ++ ++size_t FSE_buildCTable_raw(FSE_CTable *ct, unsigned nbBits); ++/**< build a fake FSE_CTable, designed for a flat distribution, where each symbol uses nbBits */ ++ ++size_t FSE_buildCTable_rle(FSE_CTable *ct, unsigned char symbolValue); ++/**< build a fake FSE_CTable, designed to compress always the same symbolValue */ ++ ++/* FSE_buildCTable_wksp() : ++ * Same as FSE_buildCTable(), but using an externally allocated scratch buffer (`workSpace`). ++ * `wkspSize` must be >= `(1<= BIT_DStream_completed ++ ++When it's done, verify decompression is fully completed, by checking both DStream and the relevant states. ++Checking if DStream has reached its end is performed by : ++ BIT_endOfDStream(&DStream); ++Check also the states. There might be some symbols left there, if some high probability ones (>50%) are possible. ++ FSE_endOfDState(&DState); ++*/ ++ ++/* ***************************************** ++* FSE unsafe API ++*******************************************/ ++static unsigned char FSE_decodeSymbolFast(FSE_DState_t *DStatePtr, BIT_DStream_t *bitD); ++/* faster, but works only if nbBits is always >= 1 (otherwise, result will be corrupted) */ ++ ++/* ***************************************** ++* Implementation of inlined functions ++*******************************************/ ++typedef struct { ++ int deltaFindState; ++ U32 deltaNbBits; ++} FSE_symbolCompressionTransform; /* total 8 bytes */ ++ ++ZSTD_STATIC void FSE_initCState(FSE_CState_t *statePtr, const FSE_CTable *ct) ++{ ++ const void *ptr = ct; ++ const U16 *u16ptr = (const U16 *)ptr; ++ const U32 tableLog = ZSTD_read16(ptr); ++ statePtr->value = (ptrdiff_t)1 << tableLog; ++ statePtr->stateTable = u16ptr + 2; ++ statePtr->symbolTT = ((const U32 *)ct + 1 + (tableLog ? (1 << (tableLog - 1)) : 1)); ++ statePtr->stateLog = tableLog; ++} ++ ++/*! 
FSE_initCState2() : ++* Same as FSE_initCState(), but the first symbol to include (which will be the last to be read) ++* uses the smallest state value possible, saving the cost of this symbol */ ++ZSTD_STATIC void FSE_initCState2(FSE_CState_t *statePtr, const FSE_CTable *ct, U32 symbol) ++{ ++ FSE_initCState(statePtr, ct); ++ { ++ const FSE_symbolCompressionTransform symbolTT = ((const FSE_symbolCompressionTransform *)(statePtr->symbolTT))[symbol]; ++ const U16 *stateTable = (const U16 *)(statePtr->stateTable); ++ U32 nbBitsOut = (U32)((symbolTT.deltaNbBits + (1 << 15)) >> 16); ++ statePtr->value = (nbBitsOut << 16) - symbolTT.deltaNbBits; ++ statePtr->value = stateTable[(statePtr->value >> nbBitsOut) + symbolTT.deltaFindState]; ++ } ++} ++ ++ZSTD_STATIC void FSE_encodeSymbol(BIT_CStream_t *bitC, FSE_CState_t *statePtr, U32 symbol) ++{ ++ const FSE_symbolCompressionTransform symbolTT = ((const FSE_symbolCompressionTransform *)(statePtr->symbolTT))[symbol]; ++ const U16 *const stateTable = (const U16 *)(statePtr->stateTable); ++ U32 nbBitsOut = (U32)((statePtr->value + symbolTT.deltaNbBits) >> 16); ++ BIT_addBits(bitC, statePtr->value, nbBitsOut); ++ statePtr->value = stateTable[(statePtr->value >> nbBitsOut) + symbolTT.deltaFindState]; ++} ++ ++ZSTD_STATIC void FSE_flushCState(BIT_CStream_t *bitC, const FSE_CState_t *statePtr) ++{ ++ BIT_addBits(bitC, statePtr->value, statePtr->stateLog); ++ BIT_flushBits(bitC); ++} ++ ++/* ====== Decompression ====== */ ++ ++typedef struct { ++ U16 tableLog; ++ U16 fastMode; ++} FSE_DTableHeader; /* sizeof U32 */ ++ ++typedef struct { ++ unsigned short newState; ++ unsigned char symbol; ++ unsigned char nbBits; ++} FSE_decode_t; /* size == U32 */ ++ ++ZSTD_STATIC void FSE_initDState(FSE_DState_t *DStatePtr, BIT_DStream_t *bitD, const FSE_DTable *dt) ++{ ++ const void *ptr = dt; ++ const FSE_DTableHeader *const DTableH = (const FSE_DTableHeader *)ptr; ++ DStatePtr->state = BIT_readBits(bitD, DTableH->tableLog); ++ BIT_reloadDStream(bitD); ++ DStatePtr->table = dt + 1; ++} ++ ++ZSTD_STATIC BYTE FSE_peekSymbol(const FSE_DState_t *DStatePtr) ++{ ++ FSE_decode_t const DInfo = ((const FSE_decode_t *)(DStatePtr->table))[DStatePtr->state]; ++ return DInfo.symbol; ++} ++ ++ZSTD_STATIC void FSE_updateState(FSE_DState_t *DStatePtr, BIT_DStream_t *bitD) ++{ ++ FSE_decode_t const DInfo = ((const FSE_decode_t *)(DStatePtr->table))[DStatePtr->state]; ++ U32 const nbBits = DInfo.nbBits; ++ size_t const lowBits = BIT_readBits(bitD, nbBits); ++ DStatePtr->state = DInfo.newState + lowBits; ++} ++ ++ZSTD_STATIC BYTE FSE_decodeSymbol(FSE_DState_t *DStatePtr, BIT_DStream_t *bitD) ++{ ++ FSE_decode_t const DInfo = ((const FSE_decode_t *)(DStatePtr->table))[DStatePtr->state]; ++ U32 const nbBits = DInfo.nbBits; ++ BYTE const symbol = DInfo.symbol; ++ size_t const lowBits = BIT_readBits(bitD, nbBits); ++ ++ DStatePtr->state = DInfo.newState + lowBits; ++ return symbol; ++} ++ ++/*! 
FSE_decodeSymbolFast() : ++ unsafe, only works if no symbol has a probability > 50% */ ++ZSTD_STATIC BYTE FSE_decodeSymbolFast(FSE_DState_t *DStatePtr, BIT_DStream_t *bitD) ++{ ++ FSE_decode_t const DInfo = ((const FSE_decode_t *)(DStatePtr->table))[DStatePtr->state]; ++ U32 const nbBits = DInfo.nbBits; ++ BYTE const symbol = DInfo.symbol; ++ size_t const lowBits = BIT_readBitsFast(bitD, nbBits); ++ ++ DStatePtr->state = DInfo.newState + lowBits; ++ return symbol; ++} ++ ++ZSTD_STATIC unsigned FSE_endOfDState(const FSE_DState_t *DStatePtr) { return DStatePtr->state == 0; } ++ ++/* ************************************************************** ++* Tuning parameters ++****************************************************************/ ++/*!MEMORY_USAGE : ++* Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.) ++* Increasing memory usage improves compression ratio ++* Reduced memory usage can improve speed, due to cache effect ++* Recommended max value is 14, for 16KB, which nicely fits into Intel x86 L1 cache */ ++#ifndef FSE_MAX_MEMORY_USAGE ++#define FSE_MAX_MEMORY_USAGE 14 ++#endif ++#ifndef FSE_DEFAULT_MEMORY_USAGE ++#define FSE_DEFAULT_MEMORY_USAGE 13 ++#endif ++ ++/*!FSE_MAX_SYMBOL_VALUE : ++* Maximum symbol value authorized. ++* Required for proper stack allocation */ ++#ifndef FSE_MAX_SYMBOL_VALUE ++#define FSE_MAX_SYMBOL_VALUE 255 ++#endif ++ ++/* ************************************************************** ++* template functions type & suffix ++****************************************************************/ ++#define FSE_FUNCTION_TYPE BYTE ++#define FSE_FUNCTION_EXTENSION ++#define FSE_DECODE_TYPE FSE_decode_t ++ ++/* *************************************************************** ++* Constants ++*****************************************************************/ ++#define FSE_MAX_TABLELOG (FSE_MAX_MEMORY_USAGE - 2) ++#define FSE_MAX_TABLESIZE (1U << FSE_MAX_TABLELOG) ++#define FSE_MAXTABLESIZE_MASK (FSE_MAX_TABLESIZE - 1) ++#define FSE_DEFAULT_TABLELOG (FSE_DEFAULT_MEMORY_USAGE - 2) ++#define FSE_MIN_TABLELOG 5 ++ ++#define FSE_TABLELOG_ABSOLUTE_MAX 15 ++#if FSE_MAX_TABLELOG > FSE_TABLELOG_ABSOLUTE_MAX ++#error "FSE_MAX_TABLELOG > FSE_TABLELOG_ABSOLUTE_MAX is not supported" ++#endif ++ ++#define FSE_TABLESTEP(tableSize) ((tableSize >> 1) + (tableSize >> 3) + 3) ++ ++#endif /* FSE_H */ +diff --git a/xen/common/zstd/fse_decompress.c b/xen/common/zstd/fse_decompress.c +new file mode 100644 +index 000000000000..cc51206df614 +--- /dev/null ++++ b/xen/common/zstd/fse_decompress.c +@@ -0,0 +1,324 @@ ++/* ++ * FSE : Finite State Entropy decoder ++ * Copyright (C) 2013-2015, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. 
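The tuning macros at the end of fse.h above pin the decoder's footprint: FSE_MAX_MEMORY_USAGE = 14 caps the working memory at 2^14 bytes, which gives a maximum tableLog of 12 and a DTable of 4097 32-bit cells. A standalone check of that arithmetic (the values are recomputed here, not read from the build):

    /* Footprint arithmetic for the FSE tuning macros above. */
    #include <stdio.h>

    #define MAX_MEMORY_USAGE 14                     /* as in fse.h            */
    #define MAX_TABLELOG (MAX_MEMORY_USAGE - 2)     /* 12                     */
    #define DTABLE_SIZE_U32(tl) (1 + (1 << (tl)))   /* header + 2^tl cells    */

    int main(void)
    {
        printf("max tableLog   : %d\n", MAX_TABLELOG);                   /* 12    */
        printf("DTable entries : %d\n", DTABLE_SIZE_U32(MAX_TABLELOG));  /* 4097  */
        printf("DTable bytes   : %zu\n",
               DTABLE_SIZE_U32(MAX_TABLELOG) * sizeof(unsigned));        /* 16388 */
        return 0;
    }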
++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++ ++/* ************************************************************** ++* Compiler specifics ++****************************************************************/ ++#define FORCE_INLINE static always_inline ++ ++/* ************************************************************** ++* Includes ++****************************************************************/ ++#include "bitstream.h" ++#include "fse.h" ++#include "zstd_internal.h" ++#include ++#include /* memcpy, memset */ ++ ++/* ************************************************************** ++* Error Management ++****************************************************************/ ++#define FSE_isError ERR_isError ++#define FSE_STATIC_ASSERT(c) \ ++ { \ ++ enum { FSE_static_assert = 1 / (int)(!!(c)) }; \ ++ } /* use only *after* variable declarations */ ++ ++/* ************************************************************** ++* Templates ++****************************************************************/ ++/* ++ designed to be included ++ for type-specific functions (template emulation in C) ++ Objective is to write these functions only once, for improved maintenance ++*/ ++ ++/* safety checks */ ++#ifndef FSE_FUNCTION_EXTENSION ++#error "FSE_FUNCTION_EXTENSION must be defined" ++#endif ++#ifndef FSE_FUNCTION_TYPE ++#error "FSE_FUNCTION_TYPE must be defined" ++#endif ++ ++/* Function names */ ++#define FSE_CAT(X, Y) X##Y ++#define FSE_FUNCTION_NAME(X, Y) FSE_CAT(X, Y) ++#define FSE_TYPE_NAME(X, Y) FSE_CAT(X, Y) ++ ++/* Function templates */ ++ ++size_t INIT FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize) ++{ ++ void *const tdPtr = dt + 1; /* because *dt is unsigned, 32-bits aligned on 32-bits */ ++ FSE_DECODE_TYPE *const tableDecode = (FSE_DECODE_TYPE *)(tdPtr); ++ U16 *symbolNext = (U16 *)workspace; ++ ++ U32 const maxSV1 = maxSymbolValue + 1; ++ U32 const tableSize = 1 << tableLog; ++ U32 highThreshold = tableSize - 1; ++ ++ /* Sanity Checks */ ++ if (workspaceSize < sizeof(U16) * (FSE_MAX_SYMBOL_VALUE + 1)) ++ return ERROR(tableLog_tooLarge); ++ if (maxSymbolValue > FSE_MAX_SYMBOL_VALUE) ++ return ERROR(maxSymbolValue_tooLarge); ++ if (tableLog > FSE_MAX_TABLELOG) ++ return 
ERROR(tableLog_tooLarge); ++ ++ /* Init, lay down lowprob symbols */ ++ { ++ FSE_DTableHeader DTableH; ++ DTableH.tableLog = (U16)tableLog; ++ DTableH.fastMode = 1; ++ { ++ S16 const largeLimit = (S16)(1 << (tableLog - 1)); ++ U32 s; ++ for (s = 0; s < maxSV1; s++) { ++ if (normalizedCounter[s] == -1) { ++ tableDecode[highThreshold--].symbol = (FSE_FUNCTION_TYPE)s; ++ symbolNext[s] = 1; ++ } else { ++ if (normalizedCounter[s] >= largeLimit) ++ DTableH.fastMode = 0; ++ symbolNext[s] = normalizedCounter[s]; ++ } ++ } ++ } ++ memcpy(dt, &DTableH, sizeof(DTableH)); ++ } ++ ++ /* Spread symbols */ ++ { ++ U32 const tableMask = tableSize - 1; ++ U32 const step = FSE_TABLESTEP(tableSize); ++ U32 s, position = 0; ++ for (s = 0; s < maxSV1; s++) { ++ int i; ++ for (i = 0; i < normalizedCounter[s]; i++) { ++ tableDecode[position].symbol = (FSE_FUNCTION_TYPE)s; ++ position = (position + step) & tableMask; ++ while (position > highThreshold) ++ position = (position + step) & tableMask; /* lowprob area */ ++ } ++ } ++ if (position != 0) ++ return ERROR(GENERIC); /* position must reach all cells once, otherwise normalizedCounter is incorrect */ ++ } ++ ++ /* Build Decoding table */ ++ { ++ U32 u; ++ for (u = 0; u < tableSize; u++) { ++ FSE_FUNCTION_TYPE const symbol = (FSE_FUNCTION_TYPE)(tableDecode[u].symbol); ++ U16 nextState = symbolNext[symbol]++; ++ tableDecode[u].nbBits = (BYTE)(tableLog - BIT_highbit32((U32)nextState)); ++ tableDecode[u].newState = (U16)((nextState << tableDecode[u].nbBits) - tableSize); ++ } ++ } ++ ++ return 0; ++} ++ ++/*-******************************************************* ++* Decompression (Byte symbols) ++*********************************************************/ ++size_t INIT FSE_buildDTable_rle(FSE_DTable *dt, BYTE symbolValue) ++{ ++ void *ptr = dt; ++ FSE_DTableHeader *const DTableH = (FSE_DTableHeader *)ptr; ++ void *dPtr = dt + 1; ++ FSE_decode_t *const cell = (FSE_decode_t *)dPtr; ++ ++ DTableH->tableLog = 0; ++ DTableH->fastMode = 0; ++ ++ cell->newState = 0; ++ cell->symbol = symbolValue; ++ cell->nbBits = 0; ++ ++ return 0; ++} ++ ++size_t INIT FSE_buildDTable_raw(FSE_DTable *dt, unsigned nbBits) ++{ ++ void *ptr = dt; ++ FSE_DTableHeader *const DTableH = (FSE_DTableHeader *)ptr; ++ void *dPtr = dt + 1; ++ FSE_decode_t *const dinfo = (FSE_decode_t *)dPtr; ++ const unsigned tableSize = 1 << nbBits; ++ const unsigned tableMask = tableSize - 1; ++ const unsigned maxSV1 = tableMask + 1; ++ unsigned s; ++ ++ /* Sanity checks */ ++ if (nbBits < 1) ++ return ERROR(GENERIC); /* min size */ ++ ++ /* Build Decoding Table */ ++ DTableH->tableLog = (U16)nbBits; ++ DTableH->fastMode = 1; ++ for (s = 0; s < maxSV1; s++) { ++ dinfo[s].newState = 0; ++ dinfo[s].symbol = (BYTE)s; ++ dinfo[s].nbBits = (BYTE)nbBits; ++ } ++ ++ return 0; ++} ++ ++FORCE_INLINE size_t FSE_decompress_usingDTable_generic(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const FSE_DTable *dt, ++ const unsigned fast) ++{ ++ BYTE *const ostart = (BYTE *)dst; ++ BYTE *op = ostart; ++ BYTE *const omax = op + maxDstSize; ++ BYTE *const olimit = omax - 3; ++ ++ BIT_DStream_t bitD; ++ FSE_DState_t state1; ++ FSE_DState_t state2; ++ ++ /* Init */ ++ CHECK_F(BIT_initDStream(&bitD, cSrc, cSrcSize)); ++ ++ FSE_initDState(&state1, &bitD, dt); ++ FSE_initDState(&state2, &bitD, dt); ++ ++#define FSE_GETSYMBOL(statePtr) fast ? 
FSE_decodeSymbolFast(statePtr, &bitD) : FSE_decodeSymbol(statePtr, &bitD) ++ ++ /* 4 symbols per loop */ ++ for (; (BIT_reloadDStream(&bitD) == BIT_DStream_unfinished) & (op < olimit); op += 4) { ++ op[0] = FSE_GETSYMBOL(&state1); ++ ++ if (FSE_MAX_TABLELOG * 2 + 7 > sizeof(bitD.bitContainer) * 8) /* This test must be static */ ++ BIT_reloadDStream(&bitD); ++ ++ op[1] = FSE_GETSYMBOL(&state2); ++ ++ if (FSE_MAX_TABLELOG * 4 + 7 > sizeof(bitD.bitContainer) * 8) /* This test must be static */ ++ { ++ if (BIT_reloadDStream(&bitD) > BIT_DStream_unfinished) { ++ op += 2; ++ break; ++ } ++ } ++ ++ op[2] = FSE_GETSYMBOL(&state1); ++ ++ if (FSE_MAX_TABLELOG * 2 + 7 > sizeof(bitD.bitContainer) * 8) /* This test must be static */ ++ BIT_reloadDStream(&bitD); ++ ++ op[3] = FSE_GETSYMBOL(&state2); ++ } ++ ++ /* tail */ ++ /* note : BIT_reloadDStream(&bitD) >= FSE_DStream_partiallyFilled; Ends at exactly BIT_DStream_completed */ ++ while (1) { ++ if (op > (omax - 2)) ++ return ERROR(dstSize_tooSmall); ++ *op++ = FSE_GETSYMBOL(&state1); ++ if (BIT_reloadDStream(&bitD) == BIT_DStream_overflow) { ++ *op++ = FSE_GETSYMBOL(&state2); ++ break; ++ } ++ ++ if (op > (omax - 2)) ++ return ERROR(dstSize_tooSmall); ++ *op++ = FSE_GETSYMBOL(&state2); ++ if (BIT_reloadDStream(&bitD) == BIT_DStream_overflow) { ++ *op++ = FSE_GETSYMBOL(&state1); ++ break; ++ } ++ } ++ ++ return op - ostart; ++} ++ ++size_t INIT FSE_decompress_usingDTable(void *dst, size_t originalSize, const void *cSrc, size_t cSrcSize, const FSE_DTable *dt) ++{ ++ const void *ptr = dt; ++ const FSE_DTableHeader *DTableH = (const FSE_DTableHeader *)ptr; ++ const U32 fastMode = DTableH->fastMode; ++ ++ /* select fast mode (static) */ ++ if (fastMode) ++ return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 1); ++ return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 0); ++} ++ ++size_t INIT FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, unsigned maxLog, void *workspace, size_t workspaceSize) ++{ ++ const BYTE *const istart = (const BYTE *)cSrc; ++ const BYTE *ip = istart; ++ unsigned tableLog; ++ unsigned maxSymbolValue = FSE_MAX_SYMBOL_VALUE; ++ size_t NCountLength; ++ ++ FSE_DTable *dt; ++ short *counting; ++ size_t spaceUsed32 = 0; ++ ++ FSE_STATIC_ASSERT(sizeof(FSE_DTable) == sizeof(U32)); ++ ++ dt = (FSE_DTable *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += FSE_DTABLE_SIZE_U32(maxLog); ++ counting = (short *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(short) * (FSE_MAX_SYMBOL_VALUE + 1), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ ++ /* normal FSE decoding mode */ ++ NCountLength = FSE_readNCount(counting, &maxSymbolValue, &tableLog, istart, cSrcSize); ++ if (FSE_isError(NCountLength)) ++ return NCountLength; ++ // if (NCountLength >= cSrcSize) return ERROR(srcSize_wrong); /* too small input size; supposed to be already checked in NCountLength, only remaining ++ // case : NCountLength==cSrcSize */ ++ if (tableLog > maxLog) ++ return ERROR(tableLog_tooLarge); ++ ip += NCountLength; ++ cSrcSize -= NCountLength; ++ ++ CHECK_F(FSE_buildDTable_wksp(dt, counting, maxSymbolValue, tableLog, workspace, workspaceSize)); ++ ++ return FSE_decompress_usingDTable(dst, dstCapacity, ip, cSrcSize, dt); /* always return, even if it is an error code */ ++} +diff --git a/xen/common/zstd/huf.h 
b/xen/common/zstd/huf.h +new file mode 100644 +index 000000000000..a9d522c7bb7b +--- /dev/null ++++ b/xen/common/zstd/huf.h +@@ -0,0 +1,212 @@ ++/* ++ * Huffman coder, part of New Generation Entropy library ++ * header file ++ * Copyright (C) 2013-2016, Yann Collet. ++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++#ifndef HUF_H_298734234 ++#define HUF_H_298734234 ++ ++/* *** Dependencies *** */ ++#include /* size_t */ ++ ++/* *** Tool functions *** */ ++#define HUF_BLOCKSIZE_MAX (128 * 1024) /**< maximum input size for a single block compressed with HUF_compress */ ++size_t HUF_compressBound(size_t size); /**< maximum compressed size (worst case) */ ++ ++/* Error Management */ ++unsigned HUF_isError(size_t code); /**< tells if a return value is an error code */ ++ ++/* *** Advanced function *** */ ++ ++/** HUF_compress4X_wksp() : ++* Same as HUF_compress2(), but uses externally allocated `workSpace`, which must be a table of >= 1024 unsigned */ ++size_t HUF_compress4X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, ++ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ ++ ++/* *** Dependencies *** */ ++#include "mem.h" /* U32 */ ++ ++/* *** Constants *** */ ++#define HUF_TABLELOG_MAX 12 /* max configured tableLog (for static allocation); can be modified up to HUF_ABSOLUTEMAX_TABLELOG */ ++#define HUF_TABLELOG_DEFAULT 11 /* tableLog by default, when not specified */ ++#define HUF_SYMBOLVALUE_MAX 255 ++ ++#define HUF_TABLELOG_ABSOLUTEMAX 15 /* absolute limit of HUF_MAX_TABLELOG. 
Beyond that value, code does not work */ ++#if (HUF_TABLELOG_MAX > HUF_TABLELOG_ABSOLUTEMAX) ++#error "HUF_TABLELOG_MAX is too large !" ++#endif ++ ++/* **************************************** ++* Static allocation ++******************************************/ ++/* HUF buffer bounds */ ++#define HUF_CTABLEBOUND 129 ++#define HUF_BLOCKBOUND(size) (size + (size >> 8) + 8) /* only true if incompressible pre-filtered with fast heuristic */ ++#define HUF_COMPRESSBOUND(size) (HUF_CTABLEBOUND + HUF_BLOCKBOUND(size)) /* Macro version, useful for static allocation */ ++ ++/* static allocation of HUF's Compression Table */ ++#define HUF_CREATE_STATIC_CTABLE(name, maxSymbolValue) \ ++ U32 name##hb[maxSymbolValue + 1]; \ ++ void *name##hv = &(name##hb); \ ++ HUF_CElt *name = (HUF_CElt *)(name##hv) /* no final ; */ ++ ++/* static allocation of HUF's DTable */ ++typedef U32 HUF_DTable; ++#define HUF_DTABLE_SIZE(maxTableLog) (1 + (1 << (maxTableLog))) ++#define HUF_CREATE_STATIC_DTABLEX2(DTable, maxTableLog) HUF_DTable DTable[HUF_DTABLE_SIZE((maxTableLog)-1)] = {((U32)((maxTableLog)-1) * 0x01000001)} ++#define HUF_CREATE_STATIC_DTABLEX4(DTable, maxTableLog) HUF_DTable DTable[HUF_DTABLE_SIZE(maxTableLog)] = {((U32)(maxTableLog)*0x01000001)} ++ ++/* The workspace must have alignment at least 4 and be at least this large */ ++#define HUF_COMPRESS_WORKSPACE_SIZE (6 << 10) ++#define HUF_COMPRESS_WORKSPACE_SIZE_U32 (HUF_COMPRESS_WORKSPACE_SIZE / sizeof(U32)) ++ ++/* The workspace must have alignment at least 4 and be at least this large */ ++#define HUF_DECOMPRESS_WORKSPACE_SIZE (3 << 10) ++#define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32)) ++ ++/* **************************************** ++* Advanced decompression functions ++******************************************/ ++size_t HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); /**< decodes RLE and uncompressed */ ++size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< considers RLE and uncompressed as errors */ ++size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< single-symbol decoder */ ++size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< double-symbols decoder */ ++ ++/* **************************************** ++* HUF detailed API ++******************************************/ ++/*! ++HUF_compress() does the following: ++1. count symbol occurrence from source[] into table count[] using FSE_count() ++2. (optional) refine tableLog using HUF_optimalTableLog() ++3. build Huffman table from count using HUF_buildCTable() ++4. save Huffman table to memory buffer using HUF_writeCTable_wksp() ++5. encode the data stream using HUF_compress4X_usingCTable() ++ ++The following API allows targeting specific sub-functions for advanced tasks. ++For example, it's possible to compress several blocks using the same 'CTable', ++or to save and regenerate 'CTable' using external methods. 
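The static-allocation helpers and workspace-size constants above are enough to call the declared 4-stream decoder without any dynamic allocation. A hypothetical wrapper follows; the function name and its buffers are illustrative, while the macros and the HUF_decompress4X_DCtx_wksp() signature are the header's own.

    /* Hypothetical caller of the 4-stream Huffman decoder declared above. */
    #include "huf.h"

    static size_t huf_decode_block(void *dst, size_t dstSize,
                                   const void *cSrc, size_t cSrcSize)
    {
        /* DTable sized for the configured HUF_TABLELOG_MAX (12). */
        HUF_CREATE_STATIC_DTABLEX2(dtable, HUF_TABLELOG_MAX);
        U32 wksp[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];   /* 3 KiB scratch */

        /* This variant also accepts RLE and uncompressed inputs, per the
         * comment on its declaration above. */
        return HUF_decompress4X_DCtx_wksp(dtable, dst, dstSize,
                                          cSrc, cSrcSize, wksp, sizeof(wksp));
    }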
++*/ ++/* FSE_count() : find it within "fse.h" */ ++unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue); ++typedef struct HUF_CElt_s HUF_CElt; /* incomplete type */ ++size_t HUF_writeCTable_wksp(void *dst, size_t maxDstSize, const HUF_CElt *CTable, unsigned maxSymbolValue, unsigned huffLog, void *workspace, size_t workspaceSize); ++size_t HUF_compress4X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); ++ ++typedef enum { ++ HUF_repeat_none, /**< Cannot use the previous table */ ++ HUF_repeat_check, /**< Can use the previous table but it must be checked. Note : The previous table must have been constructed by HUF_compress{1, ++ 4}X_repeat */ ++ HUF_repeat_valid /**< Can use the previous table and it is asumed to be valid */ ++} HUF_repeat; ++/** HUF_compress4X_repeat() : ++* Same as HUF_compress4X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none. ++* If it uses hufTable it does not modify hufTable or repeat. ++* If it doesn't, it sets *repeat = HUF_repeat_none, and it sets hufTable to the table used. ++* If preferRepeat then the old table will always be used if valid. */ ++size_t HUF_compress4X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, ++ size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, ++ int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ ++ ++/** HUF_buildCTable_wksp() : ++ * Same as HUF_buildCTable(), but using externally allocated scratch buffer. ++ * `workSpace` must be aligned on 4-bytes boundaries, and be at least as large as a table of 1024 unsigned. ++ */ ++size_t HUF_buildCTable_wksp(HUF_CElt *tree, const U32 *count, U32 maxSymbolValue, U32 maxNbBits, void *workSpace, size_t wkspSize); ++ ++/*! HUF_readStats() : ++ Read compact Huffman tree, saved by HUF_writeCTable(). ++ `huffWeight` is destination buffer. ++ @return : size read from `src` , or an error Code . ++ Note : Needed by HUF_readCTable() and HUF_readDTableXn() . */ ++size_t HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, ++ void *workspace, size_t workspaceSize); ++ ++/** HUF_readCTable() : ++* Loading a CTable saved with HUF_writeCTable() */ ++size_t HUF_readCTable_wksp(HUF_CElt *CTable, unsigned maxSymbolValue, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); ++ ++/* ++HUF_decompress() does the following: ++1. select the decompression algorithm (X2, X4) based on pre-computed heuristics ++2. build Huffman table from save, using HUF_readDTableXn() ++3. decode 1 or 4 segments in parallel using HUF_decompressSXn_usingDTable ++*/ ++ ++/** HUF_selectDecoder() : ++* Tells which decoder is likely to decode faster, ++* based on a set of pre-determined metrics. ++* @return : 0==HUF_decompress4X2, 1==HUF_decompress4X4 . 
++* Assumption : 0 < cSrcSize < dstSize <= 128 KB */ ++U32 HUF_selectDecoder(size_t dstSize, size_t cSrcSize); ++ ++size_t HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); ++size_t HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); ++ ++size_t HUF_decompress4X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); ++size_t HUF_decompress4X2_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); ++size_t HUF_decompress4X4_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); ++ ++/* single stream variants */ ++ ++size_t HUF_compress1X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, ++ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ ++size_t HUF_compress1X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); ++/** HUF_compress1X_repeat() : ++* Same as HUF_compress1X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none. ++* If it uses hufTable it does not modify hufTable or repeat. ++* If it doesn't, it sets *repeat = HUF_repeat_none, and it sets hufTable to the table used. ++* If preferRepeat then the old table will always be used if valid. */ ++size_t HUF_compress1X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, ++ size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, ++ int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ ++ ++size_t HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); ++size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< single-symbol decoder */ ++size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< double-symbols decoder */ ++ ++size_t HUF_decompress1X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, ++ const HUF_DTable *DTable); /**< automatic selection of sing or double symbol decoder, based on DTable */ ++size_t HUF_decompress1X2_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); ++size_t HUF_decompress1X4_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); ++ ++#endif /* HUF_H_298734234 */ +diff --git a/xen/common/zstd/huf_decompress.c b/xen/common/zstd/huf_decompress.c +new file mode 100644 +index 000000000000..341619e64246 +--- /dev/null ++++ b/xen/common/zstd/huf_decompress.c +@@ -0,0 +1,960 @@ ++/* ++ * Huffman decoder, part of New Generation Entropy library ++ * Copyright (C) 2013-2016, Yann Collet. 
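HUF_selectDecoder() above is the piece that picks between the two table layouts at run time: 0 selects the single-symbol (X2) decoder, 1 the double-symbol (X4) decoder, based on size heuristics. The sketch below shows how a caller combines it with the _DCtx_wksp variants declared earlier; this is presumably what the 4X wrapper in huf_decompress.c does, but the wrapper here is hypothetical.

    /* Hypothetical dispatch over the decoder heuristic declared above. */
    #include "huf.h"

    static size_t huf_decode_auto(HUF_DTable *dctx, void *dst, size_t dstSize,
                                  const void *cSrc, size_t cSrcSize,
                                  void *wksp, size_t wkspSize)
    {
        U32 const algo = HUF_selectDecoder(dstSize, cSrcSize);  /* 0 = X2, 1 = X4 */

        return algo
            ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize,
                                          wksp, wkspSize)
            : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize,
                                          wksp, wkspSize);
    }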
++ * ++ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) ++ * ++ * Redistribution and use in source and binary forms, with or without ++ * modification, are permitted provided that the following conditions are ++ * met: ++ * ++ * * Redistributions of source code must retain the above copyright ++ * notice, this list of conditions and the following disclaimer. ++ * * Redistributions in binary form must reproduce the above ++ * copyright notice, this list of conditions and the following disclaimer ++ * in the documentation and/or other materials provided with the ++ * distribution. ++ * ++ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ++ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT ++ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ++ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT ++ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ++ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ++ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, ++ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY ++ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ++ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE ++ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ * ++ * You can contact the author at : ++ * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy ++ */ ++ ++/* ************************************************************** ++* Compiler specifics ++****************************************************************/ ++#define FORCE_INLINE static always_inline ++ ++/* ************************************************************** ++* Dependencies ++****************************************************************/ ++#include "bitstream.h" /* BIT_* */ ++#include "fse.h" /* header compression */ ++#include "huf.h" ++#include ++#include /* memcpy, memset */ ++ ++/* ************************************************************** ++* Error Management ++****************************************************************/ ++#define HUF_STATIC_ASSERT(c) \ ++ { \ ++ enum { HUF_static_assert = 1 / (int)(!!(c)) }; \ ++ } /* use only *after* variable declarations */ ++ ++/*-***************************/ ++/* generic DTableDesc */ ++/*-***************************/ ++ ++typedef struct { ++ BYTE maxTableLog; ++ BYTE tableType; ++ BYTE tableLog; ++ BYTE reserved; ++} DTableDesc; ++ ++static DTableDesc INIT HUF_getDTableDesc(const HUF_DTable *table) ++{ ++ DTableDesc dtd; ++ memcpy(&dtd, table, sizeof(dtd)); ++ return dtd; ++} ++ ++/*-***************************/ ++/* single-symbol decoding */ ++/*-***************************/ ++ ++typedef struct { ++ BYTE byte; ++ BYTE nbBits; ++} HUF_DEltX2; /* single-symbol decoding */ ++ ++size_t INIT HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) ++{ ++ U32 tableLog = 0; ++ U32 nbSymbols = 0; ++ size_t iSize; ++ void *const dtPtr = DTable + 1; ++ HUF_DEltX2 *const dt = (HUF_DEltX2 *)dtPtr; ++ ++ U32 *rankVal; 
++ BYTE *huffWeight; ++ size_t spaceUsed32 = 0; ++ ++ rankVal = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; ++ huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ ++ HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable)); ++ /* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... */ ++ ++ iSize = HUF_readStats_wksp(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); ++ if (HUF_isError(iSize)) ++ return iSize; ++ ++ /* Table header */ ++ { ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ if (tableLog > (U32)(dtd.maxTableLog + 1)) ++ return ERROR(tableLog_tooLarge); /* DTable too small, Huffman tree cannot fit in */ ++ dtd.tableType = 0; ++ dtd.tableLog = (BYTE)tableLog; ++ memcpy(DTable, &dtd, sizeof(dtd)); ++ } ++ ++ /* Calculate starting value for each rank */ ++ { ++ U32 n, nextRankStart = 0; ++ for (n = 1; n < tableLog + 1; n++) { ++ U32 const curr = nextRankStart; ++ nextRankStart += (rankVal[n] << (n - 1)); ++ rankVal[n] = curr; ++ } ++ } ++ ++ /* fill DTable */ ++ { ++ U32 n; ++ for (n = 0; n < nbSymbols; n++) { ++ U32 const w = huffWeight[n]; ++ U32 const length = (1 << w) >> 1; ++ U32 u; ++ HUF_DEltX2 D; ++ D.byte = (BYTE)n; ++ D.nbBits = (BYTE)(tableLog + 1 - w); ++ for (u = rankVal[w]; u < rankVal[w] + length; u++) ++ dt[u] = D; ++ rankVal[w] += length; ++ } ++ } ++ ++ return iSize; ++} ++ ++static BYTE INIT HUF_decodeSymbolX2(BIT_DStream_t *Dstream, const HUF_DEltX2 *dt, const U32 dtLog) ++{ ++ size_t const val = BIT_lookBitsFast(Dstream, dtLog); /* note : dtLog >= 1 */ ++ BYTE const c = dt[val].byte; ++ BIT_skipBits(Dstream, dt[val].nbBits); ++ return c; ++} ++ ++#define HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr) *ptr++ = HUF_decodeSymbolX2(DStreamPtr, dt, dtLog) ++ ++#define HUF_DECODE_SYMBOLX2_1(ptr, DStreamPtr) \ ++ if (ZSTD_64bits() || (HUF_TABLELOG_MAX <= 12)) \ ++ HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr) ++ ++#define HUF_DECODE_SYMBOLX2_2(ptr, DStreamPtr) \ ++ if (ZSTD_64bits()) \ ++ HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr) ++ ++FORCE_INLINE size_t HUF_decodeStreamX2(BYTE *p, BIT_DStream_t *const bitDPtr, BYTE *const pEnd, const HUF_DEltX2 *const dt, const U32 dtLog) ++{ ++ BYTE *const pStart = p; ++ ++ /* up to 4 symbols at a time */ ++ while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) && (p <= pEnd - 4)) { ++ HUF_DECODE_SYMBOLX2_2(p, bitDPtr); ++ HUF_DECODE_SYMBOLX2_1(p, bitDPtr); ++ HUF_DECODE_SYMBOLX2_2(p, bitDPtr); ++ HUF_DECODE_SYMBOLX2_0(p, bitDPtr); ++ } ++ ++ /* closer to the end */ ++ while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) && (p < pEnd)) ++ HUF_DECODE_SYMBOLX2_0(p, bitDPtr); ++ ++ /* no more data to retrieve from bitstream, hence no need to reload */ ++ while (p < pEnd) ++ HUF_DECODE_SYMBOLX2_0(p, bitDPtr); ++ ++ return pEnd - pStart; ++} ++ ++static size_t INIT HUF_decompress1X2_usingDTable_internal(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ BYTE *op = (BYTE *)dst; ++ BYTE *const oend = op + dstSize; ++ const void *dtPtr = DTable + 1; ++ const HUF_DEltX2 *const dt = (const HUF_DEltX2 *)dtPtr; ++ BIT_DStream_t bitD; ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ U32 const dtLog = 
dtd.tableLog; ++ ++ { ++ size_t const errorCode = BIT_initDStream(&bitD, cSrc, cSrcSize); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ ++ HUF_decodeStreamX2(op, &bitD, oend, dt, dtLog); ++ ++ /* check */ ++ if (!BIT_endOfDStream(&bitD)) ++ return ERROR(corruption_detected); ++ ++ return dstSize; ++} ++ ++size_t INIT HUF_decompress1X2_usingDTable(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ if (dtd.tableType != 0) ++ return ERROR(GENERIC); ++ return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); ++} ++ ++size_t INIT HUF_decompress1X2_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ const BYTE *ip = (const BYTE *)cSrc; ++ ++ size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); ++ if (HUF_isError(hSize)) ++ return hSize; ++ if (hSize >= cSrcSize) ++ return ERROR(srcSize_wrong); ++ ip += hSize; ++ cSrcSize -= hSize; ++ ++ return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSize, DCtx); ++} ++ ++static size_t INIT HUF_decompress4X2_usingDTable_internal(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ /* Check */ ++ if (cSrcSize < 10) ++ return ERROR(corruption_detected); /* strict minimum : jump table + 1 byte per stream */ ++ ++ { ++ const BYTE *const istart = (const BYTE *)cSrc; ++ BYTE *const ostart = (BYTE *)dst; ++ BYTE *const oend = ostart + dstSize; ++ const void *const dtPtr = DTable + 1; ++ const HUF_DEltX2 *const dt = (const HUF_DEltX2 *)dtPtr; ++ ++ /* Init */ ++ BIT_DStream_t bitD1; ++ BIT_DStream_t bitD2; ++ BIT_DStream_t bitD3; ++ BIT_DStream_t bitD4; ++ size_t const length1 = ZSTD_readLE16(istart); ++ size_t const length2 = ZSTD_readLE16(istart + 2); ++ size_t const length3 = ZSTD_readLE16(istart + 4); ++ size_t const length4 = cSrcSize - (length1 + length2 + length3 + 6); ++ const BYTE *const istart1 = istart + 6; /* jumpTable */ ++ const BYTE *const istart2 = istart1 + length1; ++ const BYTE *const istart3 = istart2 + length2; ++ const BYTE *const istart4 = istart3 + length3; ++ const size_t segmentSize = (dstSize + 3) / 4; ++ BYTE *const opStart2 = ostart + segmentSize; ++ BYTE *const opStart3 = opStart2 + segmentSize; ++ BYTE *const opStart4 = opStart3 + segmentSize; ++ BYTE *op1 = ostart; ++ BYTE *op2 = opStart2; ++ BYTE *op3 = opStart3; ++ BYTE *op4 = opStart4; ++ U32 endSignal; ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ U32 const dtLog = dtd.tableLog; ++ ++ if (length4 > cSrcSize) ++ return ERROR(corruption_detected); /* overflow */ ++ { ++ size_t const errorCode = BIT_initDStream(&bitD1, istart1, length1); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD2, istart2, length2); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD3, istart3, length3); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD4, istart4, length4); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ ++ /* 16-32 symbols per loop (4-8 symbols per stream) */ ++ endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4); ++ for (; (endSignal == BIT_DStream_unfinished) && (op4 < (oend - 7));) { ++ 
HUF_DECODE_SYMBOLX2_2(op1, &bitD1); ++ HUF_DECODE_SYMBOLX2_2(op2, &bitD2); ++ HUF_DECODE_SYMBOLX2_2(op3, &bitD3); ++ HUF_DECODE_SYMBOLX2_2(op4, &bitD4); ++ HUF_DECODE_SYMBOLX2_1(op1, &bitD1); ++ HUF_DECODE_SYMBOLX2_1(op2, &bitD2); ++ HUF_DECODE_SYMBOLX2_1(op3, &bitD3); ++ HUF_DECODE_SYMBOLX2_1(op4, &bitD4); ++ HUF_DECODE_SYMBOLX2_2(op1, &bitD1); ++ HUF_DECODE_SYMBOLX2_2(op2, &bitD2); ++ HUF_DECODE_SYMBOLX2_2(op3, &bitD3); ++ HUF_DECODE_SYMBOLX2_2(op4, &bitD4); ++ HUF_DECODE_SYMBOLX2_0(op1, &bitD1); ++ HUF_DECODE_SYMBOLX2_0(op2, &bitD2); ++ HUF_DECODE_SYMBOLX2_0(op3, &bitD3); ++ HUF_DECODE_SYMBOLX2_0(op4, &bitD4); ++ endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4); ++ } ++ ++ /* check corruption */ ++ if (op1 > opStart2) ++ return ERROR(corruption_detected); ++ if (op2 > opStart3) ++ return ERROR(corruption_detected); ++ if (op3 > opStart4) ++ return ERROR(corruption_detected); ++ /* note : op4 supposed already verified within main loop */ ++ ++ /* finish bitStreams one by one */ ++ HUF_decodeStreamX2(op1, &bitD1, opStart2, dt, dtLog); ++ HUF_decodeStreamX2(op2, &bitD2, opStart3, dt, dtLog); ++ HUF_decodeStreamX2(op3, &bitD3, opStart4, dt, dtLog); ++ HUF_decodeStreamX2(op4, &bitD4, oend, dt, dtLog); ++ ++ /* check */ ++ endSignal = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4); ++ if (!endSignal) ++ return ERROR(corruption_detected); ++ ++ /* decoded size */ ++ return dstSize; ++ } ++} ++ ++size_t INIT HUF_decompress4X2_usingDTable(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ if (dtd.tableType != 0) ++ return ERROR(GENERIC); ++ return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); ++} ++ ++size_t INIT HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ const BYTE *ip = (const BYTE *)cSrc; ++ ++ size_t const hSize = HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); ++ if (HUF_isError(hSize)) ++ return hSize; ++ if (hSize >= cSrcSize) ++ return ERROR(srcSize_wrong); ++ ip += hSize; ++ cSrcSize -= hSize; ++ ++ return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx); ++} ++ ++/* *************************/ ++/* double-symbols decoding */ ++/* *************************/ ++typedef struct { ++ U16 sequence; ++ BYTE nbBits; ++ BYTE length; ++} HUF_DEltX4; /* double-symbols decoding */ ++ ++typedef struct { ++ BYTE symbol; ++ BYTE weight; ++} sortedSymbol_t; ++ ++/* HUF_fillDTableX4Level2() : ++ * `rankValOrigin` must be a table of at least (HUF_TABLELOG_MAX + 1) U32 */ ++static void INIT HUF_fillDTableX4Level2(HUF_DEltX4 *DTable, U32 sizeLog, const U32 consumed, const U32 *rankValOrigin, const int minWeight, ++ const sortedSymbol_t *sortedSymbols, const U32 sortedListSize, U32 nbBitsBaseline, U16 baseSeq) ++{ ++ HUF_DEltX4 DElt; ++ U32 rankVal[HUF_TABLELOG_MAX + 1]; ++ ++ /* get pre-calculated rankVal */ ++ memcpy(rankVal, rankValOrigin, sizeof(rankVal)); ++ ++ /* fill skipped values */ ++ if (minWeight > 1) { ++ U32 i, skipSize = rankVal[minWeight]; ++ ZSTD_writeLE16(&(DElt.sequence), baseSeq); ++ DElt.nbBits = (BYTE)(consumed); ++ DElt.length = 1; ++ for (i = 0; i < skipSize; i++) ++ DTable[i] = DElt; ++ } ++ ++ /* fill DTable */ ++ { ++ U32 s; ++ for (s = 0; s < sortedListSize; s++) { /* note : 
sortedSymbols already skipped */ ++ const U32 symbol = sortedSymbols[s].symbol; ++ const U32 weight = sortedSymbols[s].weight; ++ const U32 nbBits = nbBitsBaseline - weight; ++ const U32 length = 1 << (sizeLog - nbBits); ++ const U32 start = rankVal[weight]; ++ U32 i = start; ++ const U32 end = start + length; ++ ++ ZSTD_writeLE16(&(DElt.sequence), (U16)(baseSeq + (symbol << 8))); ++ DElt.nbBits = (BYTE)(nbBits + consumed); ++ DElt.length = 2; ++ do { ++ DTable[i++] = DElt; ++ } while (i < end); /* since length >= 1 */ ++ ++ rankVal[weight] += length; ++ } ++ } ++} ++ ++typedef U32 rankVal_t[HUF_TABLELOG_MAX][HUF_TABLELOG_MAX + 1]; ++typedef U32 rankValCol_t[HUF_TABLELOG_MAX + 1]; ++ ++static void INIT HUF_fillDTableX4(HUF_DEltX4 *DTable, const U32 targetLog, const sortedSymbol_t *sortedList, ++ const U32 sortedListSize, const U32 *rankStart, ++ rankVal_t rankValOrigin, const U32 maxWeight, const U32 nbBitsBaseline) ++{ ++ U32 rankVal[HUF_TABLELOG_MAX + 1]; ++ const int scaleLog = nbBitsBaseline - targetLog; /* note : targetLog >= srcLog, hence scaleLog <= 1 */ ++ const U32 minBits = nbBitsBaseline - maxWeight; ++ U32 s; ++ ++ memcpy(rankVal, rankValOrigin, sizeof(rankVal)); ++ ++ /* fill DTable */ ++ for (s = 0; s < sortedListSize; s++) { ++ const U16 symbol = sortedList[s].symbol; ++ const U32 weight = sortedList[s].weight; ++ const U32 nbBits = nbBitsBaseline - weight; ++ const U32 start = rankVal[weight]; ++ const U32 length = 1 << (targetLog - nbBits); ++ ++ if (targetLog - nbBits >= minBits) { /* enough room for a second symbol */ ++ U32 sortedRank; ++ int minWeight = nbBits + scaleLog; ++ if (minWeight < 1) ++ minWeight = 1; ++ sortedRank = rankStart[minWeight]; ++ HUF_fillDTableX4Level2(DTable + start, targetLog - nbBits, nbBits, rankValOrigin[nbBits], minWeight, sortedList + sortedRank, ++ sortedListSize - sortedRank, nbBitsBaseline, symbol); ++ } else { ++ HUF_DEltX4 DElt; ++ ZSTD_writeLE16(&(DElt.sequence), symbol); ++ DElt.nbBits = (BYTE)(nbBits); ++ DElt.length = 1; ++ { ++ U32 const end = start + length; ++ U32 u; ++ for (u = start; u < end; u++) ++ DTable[u] = DElt; ++ } ++ } ++ rankVal[weight] += length; ++ } ++} ++ ++size_t INIT HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) ++{ ++ U32 tableLog, maxW, sizeOfSort, nbSymbols; ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ U32 const maxTableLog = dtd.maxTableLog; ++ size_t iSize; ++ void *dtPtr = DTable + 1; /* force compiler to avoid strict-aliasing */ ++ HUF_DEltX4 *const dt = (HUF_DEltX4 *)dtPtr; ++ U32 *rankStart; ++ ++ rankValCol_t *rankVal; ++ U32 *rankStats; ++ U32 *rankStart0; ++ sortedSymbol_t *sortedSymbol; ++ BYTE *weightList; ++ size_t spaceUsed32 = 0; ++ ++ HUF_STATIC_ASSERT((sizeof(rankValCol_t) & 3) == 0); ++ ++ rankVal = (rankValCol_t *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += (sizeof(rankValCol_t) * HUF_TABLELOG_MAX) >> 2; ++ rankStats = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_MAX + 1; ++ rankStart0 = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_MAX + 2; ++ sortedSymbol = (sortedSymbol_t *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; ++ weightList = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= 
(spaceUsed32 << 2); ++ ++ rankStart = rankStart0 + 1; ++ memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); ++ ++ HUF_STATIC_ASSERT(sizeof(HUF_DEltX4) == sizeof(HUF_DTable)); /* if compiler fails here, assertion is wrong */ ++ if (maxTableLog > HUF_TABLELOG_MAX) ++ return ERROR(tableLog_tooLarge); ++ /* memset(weightList, 0, sizeof(weightList)); */ /* is not necessary, even though some analyzer complain ... */ ++ ++ iSize = HUF_readStats_wksp(weightList, HUF_SYMBOLVALUE_MAX + 1, rankStats, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); ++ if (HUF_isError(iSize)) ++ return iSize; ++ ++ /* check result */ ++ if (tableLog > maxTableLog) ++ return ERROR(tableLog_tooLarge); /* DTable can't fit code depth */ ++ ++ /* find maxWeight */ ++ for (maxW = tableLog; rankStats[maxW] == 0; maxW--) { ++ } /* necessarily finds a solution before 0 */ ++ ++ /* Get start index of each weight */ ++ { ++ U32 w, nextRankStart = 0; ++ for (w = 1; w < maxW + 1; w++) { ++ U32 curr = nextRankStart; ++ nextRankStart += rankStats[w]; ++ rankStart[w] = curr; ++ } ++ rankStart[0] = nextRankStart; /* put all 0w symbols at the end of sorted list*/ ++ sizeOfSort = nextRankStart; ++ } ++ ++ /* sort symbols by weight */ ++ { ++ U32 s; ++ for (s = 0; s < nbSymbols; s++) { ++ U32 const w = weightList[s]; ++ U32 const r = rankStart[w]++; ++ sortedSymbol[r].symbol = (BYTE)s; ++ sortedSymbol[r].weight = (BYTE)w; ++ } ++ rankStart[0] = 0; /* forget 0w symbols; this is beginning of weight(1) */ ++ } ++ ++ /* Build rankVal */ ++ { ++ U32 *const rankVal0 = rankVal[0]; ++ { ++ int const rescale = (maxTableLog - tableLog) - 1; /* tableLog <= maxTableLog */ ++ U32 nextRankVal = 0; ++ U32 w; ++ for (w = 1; w < maxW + 1; w++) { ++ U32 curr = nextRankVal; ++ nextRankVal += rankStats[w] << (w + rescale); ++ rankVal0[w] = curr; ++ } ++ } ++ { ++ U32 const minBits = tableLog + 1 - maxW; ++ U32 consumed; ++ for (consumed = minBits; consumed < maxTableLog - minBits + 1; consumed++) { ++ U32 *const rankValPtr = rankVal[consumed]; ++ U32 w; ++ for (w = 1; w < maxW + 1; w++) { ++ rankValPtr[w] = rankVal0[w] >> consumed; ++ } ++ } ++ } ++ } ++ ++ HUF_fillDTableX4(dt, maxTableLog, sortedSymbol, sizeOfSort, rankStart0, rankVal, maxW, tableLog + 1); ++ ++ dtd.tableLog = (BYTE)maxTableLog; ++ dtd.tableType = 1; ++ memcpy(DTable, &dtd, sizeof(dtd)); ++ return iSize; ++} ++ ++static U32 INIT HUF_decodeSymbolX4(void *op, BIT_DStream_t *DStream, const HUF_DEltX4 *dt, const U32 dtLog) ++{ ++ size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */ ++ memcpy(op, dt + val, 2); ++ BIT_skipBits(DStream, dt[val].nbBits); ++ return dt[val].length; ++} ++ ++static U32 INIT HUF_decodeLastSymbolX4(void *op, BIT_DStream_t *DStream, const HUF_DEltX4 *dt, const U32 dtLog) ++{ ++ size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */ ++ memcpy(op, dt + val, 1); ++ if (dt[val].length == 1) ++ BIT_skipBits(DStream, dt[val].nbBits); ++ else { ++ if (DStream->bitsConsumed < (sizeof(DStream->bitContainer) * 8)) { ++ BIT_skipBits(DStream, dt[val].nbBits); ++ if (DStream->bitsConsumed > (sizeof(DStream->bitContainer) * 8)) ++ /* ugly hack; works only because it's the last symbol. 
Note : can't easily extract nbBits from just this symbol */ ++ DStream->bitsConsumed = (sizeof(DStream->bitContainer) * 8); ++ } ++ } ++ return 1; ++} ++ ++#define HUF_DECODE_SYMBOLX4_0(ptr, DStreamPtr) ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog) ++ ++#define HUF_DECODE_SYMBOLX4_1(ptr, DStreamPtr) \ ++ if (ZSTD_64bits() || (HUF_TABLELOG_MAX <= 12)) \ ++ ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog) ++ ++#define HUF_DECODE_SYMBOLX4_2(ptr, DStreamPtr) \ ++ if (ZSTD_64bits()) \ ++ ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog) ++ ++FORCE_INLINE size_t HUF_decodeStreamX4(BYTE *p, BIT_DStream_t *bitDPtr, BYTE *const pEnd, const HUF_DEltX4 *const dt, const U32 dtLog) ++{ ++ BYTE *const pStart = p; ++ ++ /* up to 8 symbols at a time */ ++ while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p < pEnd - (sizeof(bitDPtr->bitContainer) - 1))) { ++ HUF_DECODE_SYMBOLX4_2(p, bitDPtr); ++ HUF_DECODE_SYMBOLX4_1(p, bitDPtr); ++ HUF_DECODE_SYMBOLX4_2(p, bitDPtr); ++ HUF_DECODE_SYMBOLX4_0(p, bitDPtr); ++ } ++ ++ /* closer to end : up to 2 symbols at a time */ ++ while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p <= pEnd - 2)) ++ HUF_DECODE_SYMBOLX4_0(p, bitDPtr); ++ ++ while (p <= pEnd - 2) ++ HUF_DECODE_SYMBOLX4_0(p, bitDPtr); /* no need to reload : reached the end of DStream */ ++ ++ if (p < pEnd) ++ p += HUF_decodeLastSymbolX4(p, bitDPtr, dt, dtLog); ++ ++ return p - pStart; ++} ++ ++static size_t INIT HUF_decompress1X4_usingDTable_internal(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ BIT_DStream_t bitD; ++ ++ /* Init */ ++ { ++ size_t const errorCode = BIT_initDStream(&bitD, cSrc, cSrcSize); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ ++ /* decode */ ++ { ++ BYTE *const ostart = (BYTE *)dst; ++ BYTE *const oend = ostart + dstSize; ++ const void *const dtPtr = DTable + 1; /* force compiler to not use strict-aliasing */ ++ const HUF_DEltX4 *const dt = (const HUF_DEltX4 *)dtPtr; ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ HUF_decodeStreamX4(ostart, &bitD, oend, dt, dtd.tableLog); ++ } ++ ++ /* check */ ++ if (!BIT_endOfDStream(&bitD)) ++ return ERROR(corruption_detected); ++ ++ /* decoded size */ ++ return dstSize; ++} ++ ++size_t INIT HUF_decompress1X4_usingDTable(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ if (dtd.tableType != 1) ++ return ERROR(GENERIC); ++ return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); ++} ++ ++size_t INIT HUF_decompress1X4_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ const BYTE *ip = (const BYTE *)cSrc; ++ ++ size_t const hSize = HUF_readDTableX4_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); ++ if (HUF_isError(hSize)) ++ return hSize; ++ if (hSize >= cSrcSize) ++ return ERROR(srcSize_wrong); ++ ip += hSize; ++ cSrcSize -= hSize; ++ ++ return HUF_decompress1X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, DCtx); ++} ++ ++static size_t INIT HUF_decompress4X4_usingDTable_internal(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ if (cSrcSize < 10) ++ return ERROR(corruption_detected); /* strict minimum : jump table + 1 byte per stream */ ++ ++ { ++ const BYTE *const istart = (const BYTE *)cSrc; ++ BYTE *const ostart = (BYTE *)dst; ++ BYTE *const oend = ostart + dstSize; ++ const 
void *const dtPtr = DTable + 1; ++ const HUF_DEltX4 *const dt = (const HUF_DEltX4 *)dtPtr; ++ ++ /* Init */ ++ BIT_DStream_t bitD1; ++ BIT_DStream_t bitD2; ++ BIT_DStream_t bitD3; ++ BIT_DStream_t bitD4; ++ size_t const length1 = ZSTD_readLE16(istart); ++ size_t const length2 = ZSTD_readLE16(istart + 2); ++ size_t const length3 = ZSTD_readLE16(istart + 4); ++ size_t const length4 = cSrcSize - (length1 + length2 + length3 + 6); ++ const BYTE *const istart1 = istart + 6; /* jumpTable */ ++ const BYTE *const istart2 = istart1 + length1; ++ const BYTE *const istart3 = istart2 + length2; ++ const BYTE *const istart4 = istart3 + length3; ++ size_t const segmentSize = (dstSize + 3) / 4; ++ BYTE *const opStart2 = ostart + segmentSize; ++ BYTE *const opStart3 = opStart2 + segmentSize; ++ BYTE *const opStart4 = opStart3 + segmentSize; ++ BYTE *op1 = ostart; ++ BYTE *op2 = opStart2; ++ BYTE *op3 = opStart3; ++ BYTE *op4 = opStart4; ++ U32 endSignal; ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ U32 const dtLog = dtd.tableLog; ++ ++ if (length4 > cSrcSize) ++ return ERROR(corruption_detected); /* overflow */ ++ { ++ size_t const errorCode = BIT_initDStream(&bitD1, istart1, length1); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD2, istart2, length2); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD3, istart3, length3); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ { ++ size_t const errorCode = BIT_initDStream(&bitD4, istart4, length4); ++ if (HUF_isError(errorCode)) ++ return errorCode; ++ } ++ ++ /* 16-32 symbols per loop (4-8 symbols per stream) */ ++ endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4); ++ for (; (endSignal == BIT_DStream_unfinished) & (op4 < (oend - (sizeof(bitD4.bitContainer) - 1)));) { ++ HUF_DECODE_SYMBOLX4_2(op1, &bitD1); ++ HUF_DECODE_SYMBOLX4_2(op2, &bitD2); ++ HUF_DECODE_SYMBOLX4_2(op3, &bitD3); ++ HUF_DECODE_SYMBOLX4_2(op4, &bitD4); ++ HUF_DECODE_SYMBOLX4_1(op1, &bitD1); ++ HUF_DECODE_SYMBOLX4_1(op2, &bitD2); ++ HUF_DECODE_SYMBOLX4_1(op3, &bitD3); ++ HUF_DECODE_SYMBOLX4_1(op4, &bitD4); ++ HUF_DECODE_SYMBOLX4_2(op1, &bitD1); ++ HUF_DECODE_SYMBOLX4_2(op2, &bitD2); ++ HUF_DECODE_SYMBOLX4_2(op3, &bitD3); ++ HUF_DECODE_SYMBOLX4_2(op4, &bitD4); ++ HUF_DECODE_SYMBOLX4_0(op1, &bitD1); ++ HUF_DECODE_SYMBOLX4_0(op2, &bitD2); ++ HUF_DECODE_SYMBOLX4_0(op3, &bitD3); ++ HUF_DECODE_SYMBOLX4_0(op4, &bitD4); ++ ++ endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4); ++ } ++ ++ /* check corruption */ ++ if (op1 > opStart2) ++ return ERROR(corruption_detected); ++ if (op2 > opStart3) ++ return ERROR(corruption_detected); ++ if (op3 > opStart4) ++ return ERROR(corruption_detected); ++ /* note : op4 already verified within main loop */ ++ ++ /* finish bitStreams one by one */ ++ HUF_decodeStreamX4(op1, &bitD1, opStart2, dt, dtLog); ++ HUF_decodeStreamX4(op2, &bitD2, opStart3, dt, dtLog); ++ HUF_decodeStreamX4(op3, &bitD3, opStart4, dt, dtLog); ++ HUF_decodeStreamX4(op4, &bitD4, oend, dt, dtLog); ++ ++ /* check */ ++ { ++ U32 const endCheck = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4); ++ if (!endCheck) ++ return ERROR(corruption_detected); ++ } ++ ++ /* decoded size */ ++ return dstSize; ++ } ++} ++ ++size_t INIT HUF_decompress4X4_usingDTable(void 
*dst, size_t dstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc dtd = HUF_getDTableDesc(DTable); ++ if (dtd.tableType != 1) ++ return ERROR(GENERIC); ++ return HUF_decompress4X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); ++} ++ ++size_t INIT HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ const BYTE *ip = (const BYTE *)cSrc; ++ ++ size_t hSize = HUF_readDTableX4_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); ++ if (HUF_isError(hSize)) ++ return hSize; ++ if (hSize >= cSrcSize) ++ return ERROR(srcSize_wrong); ++ ip += hSize; ++ cSrcSize -= hSize; ++ ++ return HUF_decompress4X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx); ++} ++ ++/* ********************************/ ++/* Generic decompression selector */ ++/* ********************************/ ++ ++size_t INIT HUF_decompress1X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ return dtd.tableType ? HUF_decompress1X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable) ++ : HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable); ++} ++ ++size_t INIT HUF_decompress4X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable) ++{ ++ DTableDesc const dtd = HUF_getDTableDesc(DTable); ++ return dtd.tableType ? HUF_decompress4X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable) ++ : HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable); ++} ++ ++typedef struct { ++ U32 tableTime; ++ U32 decode256Time; ++} algo_time_t; ++static const algo_time_t algoTime[16 /* Quantization */][3 /* single, double, quad */] = { ++ /* single, double, quad */ ++ {{0, 0}, {1, 1}, {2, 2}}, /* Q==0 : impossible */ ++ {{0, 0}, {1, 1}, {2, 2}}, /* Q==1 : impossible */ ++ {{38, 130}, {1313, 74}, {2151, 38}}, /* Q == 2 : 12-18% */ ++ {{448, 128}, {1353, 74}, {2238, 41}}, /* Q == 3 : 18-25% */ ++ {{556, 128}, {1353, 74}, {2238, 47}}, /* Q == 4 : 25-32% */ ++ {{714, 128}, {1418, 74}, {2436, 53}}, /* Q == 5 : 32-38% */ ++ {{883, 128}, {1437, 74}, {2464, 61}}, /* Q == 6 : 38-44% */ ++ {{897, 128}, {1515, 75}, {2622, 68}}, /* Q == 7 : 44-50% */ ++ {{926, 128}, {1613, 75}, {2730, 75}}, /* Q == 8 : 50-56% */ ++ {{947, 128}, {1729, 77}, {3359, 77}}, /* Q == 9 : 56-62% */ ++ {{1107, 128}, {2083, 81}, {4006, 84}}, /* Q ==10 : 62-69% */ ++ {{1177, 128}, {2379, 87}, {4785, 88}}, /* Q ==11 : 69-75% */ ++ {{1242, 128}, {2415, 93}, {5155, 84}}, /* Q ==12 : 75-81% */ ++ {{1349, 128}, {2644, 106}, {5260, 106}}, /* Q ==13 : 81-87% */ ++ {{1455, 128}, {2422, 124}, {4174, 124}}, /* Q ==14 : 87-93% */ ++ {{722, 128}, {1891, 145}, {1936, 146}}, /* Q ==15 : 93-99% */ ++}; ++ ++/** HUF_selectDecoder() : ++* Tells which decoder is likely to decode faster, ++* based on a set of pre-determined metrics. ++* @return : 0==HUF_decompress4X2, 1==HUF_decompress4X4 . 
++* Assumption : 0 < cSrcSize < dstSize <= 128 KB */ ++U32 INIT HUF_selectDecoder(size_t dstSize, size_t cSrcSize) ++{ ++ /* decoder timing evaluation */ ++ U32 const Q = (U32)(cSrcSize * 16 / dstSize); /* Q < 16 since dstSize > cSrcSize */ ++ U32 const D256 = (U32)(dstSize >> 8); ++ U32 const DTime0 = algoTime[Q][0].tableTime + (algoTime[Q][0].decode256Time * D256); ++ U32 DTime1 = algoTime[Q][1].tableTime + (algoTime[Q][1].decode256Time * D256); ++ DTime1 += DTime1 >> 3; /* advantage to algorithm using less memory, for cache eviction */ ++ ++ return DTime1 < DTime0; ++} ++ ++typedef size_t (*decompressionAlgo)(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); ++ ++size_t INIT HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ /* validation checks */ ++ if (dstSize == 0) ++ return ERROR(dstSize_tooSmall); ++ if (cSrcSize > dstSize) ++ return ERROR(corruption_detected); /* invalid */ ++ if (cSrcSize == dstSize) { ++ memcpy(dst, cSrc, dstSize); ++ return dstSize; ++ } /* not compressed */ ++ if (cSrcSize == 1) { ++ memset(dst, *(const BYTE *)cSrc, dstSize); ++ return dstSize; ++ } /* RLE */ ++ ++ { ++ U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); ++ return algoNb ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); ++ } ++} ++ ++size_t INIT HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ /* validation checks */ ++ if (dstSize == 0) ++ return ERROR(dstSize_tooSmall); ++ if ((cSrcSize >= dstSize) || (cSrcSize <= 1)) ++ return ERROR(corruption_detected); /* invalid */ ++ ++ { ++ U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); ++ return algoNb ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); ++ } ++} ++ ++size_t INIT HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) ++{ ++ /* validation checks */ ++ if (dstSize == 0) ++ return ERROR(dstSize_tooSmall); ++ if (cSrcSize > dstSize) ++ return ERROR(corruption_detected); /* invalid */ ++ if (cSrcSize == dstSize) { ++ memcpy(dst, cSrc, dstSize); ++ return dstSize; ++ } /* not compressed */ ++ if (cSrcSize == 1) { ++ memset(dst, *(const BYTE *)cSrc, dstSize); ++ return dstSize; ++ } /* RLE */ ++ ++ { ++ U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); ++ return algoNb ? HUF_decompress1X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); ++ } ++} +diff --git a/xen/common/zstd/mem.h b/xen/common/zstd/mem.h +new file mode 100644 +index 000000000000..288320069654 +--- /dev/null ++++ b/xen/common/zstd/mem.h +@@ -0,0 +1,151 @@ ++/** ++ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. ++ * All rights reserved. ++ * ++ * This source code is licensed under the BSD-style license found in the ++ * LICENSE file in the root directory of https://github.com/facebook/zstd. ++ * An additional grant of patent rights can be found in the PATENTS file in the ++ * same directory. 
++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ */ ++ ++#ifndef MEM_H_MODULE ++#define MEM_H_MODULE ++ ++/*-**************************************** ++* Dependencies ++******************************************/ ++#include /* memcpy */ ++#include /* size_t, ptrdiff_t */ ++#include ++ ++/*-**************************************** ++* Compiler specifics ++******************************************/ ++#define ZSTD_STATIC static inline ++ ++/*-************************************************************** ++* Basic Types ++*****************************************************************/ ++typedef uint8_t BYTE; ++typedef uint16_t U16; ++typedef int16_t S16; ++typedef uint32_t U32; ++typedef int32_t S32; ++typedef uint64_t U64; ++typedef int64_t S64; ++typedef ptrdiff_t iPtrDiff; ++typedef uintptr_t uPtrDiff; ++ ++/*-************************************************************** ++* Memory I/O ++*****************************************************************/ ++ZSTD_STATIC unsigned ZSTD_32bits(void) { return sizeof(size_t) == 4; } ++ZSTD_STATIC unsigned ZSTD_64bits(void) { return sizeof(size_t) == 8; } ++ ++#if defined(__LITTLE_ENDIAN) ++#define ZSTD_LITTLE_ENDIAN 1 ++#else ++#define ZSTD_LITTLE_ENDIAN 0 ++#endif ++ ++ZSTD_STATIC unsigned ZSTD_isLittleEndian(void) { return ZSTD_LITTLE_ENDIAN; } ++ ++ZSTD_STATIC U16 ZSTD_read16(const void *memPtr) { return get_unaligned((const U16 *)memPtr); } ++ ++ZSTD_STATIC U32 ZSTD_read32(const void *memPtr) { return get_unaligned((const U32 *)memPtr); } ++ ++ZSTD_STATIC U64 ZSTD_read64(const void *memPtr) { return get_unaligned((const U64 *)memPtr); } ++ ++ZSTD_STATIC size_t ZSTD_readST(const void *memPtr) { return get_unaligned((const size_t *)memPtr); } ++ ++ZSTD_STATIC void ZSTD_write16(void *memPtr, U16 value) { put_unaligned(value, (U16 *)memPtr); } ++ ++ZSTD_STATIC void ZSTD_write32(void *memPtr, U32 value) { put_unaligned(value, (U32 *)memPtr); } ++ ++ZSTD_STATIC void ZSTD_write64(void *memPtr, U64 value) { put_unaligned(value, (U64 *)memPtr); } ++ ++/*=== Little endian r/w ===*/ ++ ++ZSTD_STATIC U16 ZSTD_readLE16(const void *memPtr) { return get_unaligned_le16(memPtr); } ++ ++ZSTD_STATIC void ZSTD_writeLE16(void *memPtr, U16 val) { put_unaligned_le16(val, memPtr); } ++ ++ZSTD_STATIC U32 ZSTD_readLE24(const void *memPtr) { return ZSTD_readLE16(memPtr) + (((const BYTE *)memPtr)[2] << 16); } ++ ++ZSTD_STATIC void ZSTD_writeLE24(void *memPtr, U32 val) ++{ ++ ZSTD_writeLE16(memPtr, (U16)val); ++ ((BYTE *)memPtr)[2] = (BYTE)(val >> 16); ++} ++ ++ZSTD_STATIC U32 ZSTD_readLE32(const void *memPtr) { return get_unaligned_le32(memPtr); } ++ ++ZSTD_STATIC void ZSTD_writeLE32(void *memPtr, U32 val32) { put_unaligned_le32(val32, memPtr); } ++ ++ZSTD_STATIC U64 ZSTD_readLE64(const void *memPtr) { return get_unaligned_le64(memPtr); } ++ ++ZSTD_STATIC void ZSTD_writeLE64(void *memPtr, U64 val64) { put_unaligned_le64(val64, memPtr); } ++ ++ZSTD_STATIC size_t ZSTD_readLEST(const void *memPtr) ++{ ++ if (ZSTD_32bits()) ++ return (size_t)ZSTD_readLE32(memPtr); ++ else ++ return (size_t)ZSTD_readLE64(memPtr); ++} ++ ++ZSTD_STATIC void ZSTD_writeLEST(void *memPtr, size_t val) ++{ ++ if (ZSTD_32bits()) ++ ZSTD_writeLE32(memPtr, (U32)val); ++ else ++ ZSTD_writeLE64(memPtr, (U64)val); 
++} ++ ++/*=== Big endian r/w ===*/ ++ ++ZSTD_STATIC U32 ZSTD_readBE32(const void *memPtr) { return get_unaligned_be32(memPtr); } ++ ++ZSTD_STATIC void ZSTD_writeBE32(void *memPtr, U32 val32) { put_unaligned_be32(val32, memPtr); } ++ ++ZSTD_STATIC U64 ZSTD_readBE64(const void *memPtr) { return get_unaligned_be64(memPtr); } ++ ++ZSTD_STATIC void ZSTD_writeBE64(void *memPtr, U64 val64) { put_unaligned_be64(val64, memPtr); } ++ ++ZSTD_STATIC size_t ZSTD_readBEST(const void *memPtr) ++{ ++ if (ZSTD_32bits()) ++ return (size_t)ZSTD_readBE32(memPtr); ++ else ++ return (size_t)ZSTD_readBE64(memPtr); ++} ++ ++ZSTD_STATIC void ZSTD_writeBEST(void *memPtr, size_t val) ++{ ++ if (ZSTD_32bits()) ++ ZSTD_writeBE32(memPtr, (U32)val); ++ else ++ ZSTD_writeBE64(memPtr, (U64)val); ++} ++ ++/* function safe only for comparisons */ ++ZSTD_STATIC U32 ZSTD_readMINMATCH(const void *memPtr, U32 length) ++{ ++ switch (length) { ++ default: ++ case 4: return ZSTD_read32(memPtr); ++ case 3: ++ if (ZSTD_isLittleEndian()) ++ return ZSTD_read32(memPtr) << 8; ++ else ++ return ZSTD_read32(memPtr) >> 8; ++ } ++} ++ ++#endif /* MEM_H_MODULE */ +diff --git a/xen/common/zstd/zstd_common.c b/xen/common/zstd/zstd_common.c +new file mode 100644 +index 000000000000..a35c4a5f14a3 +--- /dev/null ++++ b/xen/common/zstd/zstd_common.c +@@ -0,0 +1,74 @@ ++/** ++ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. ++ * All rights reserved. ++ * ++ * This source code is licensed under the BSD-style license found in the ++ * LICENSE file in the root directory of https://github.com/facebook/zstd. ++ * An additional grant of patent rights can be found in the PATENTS file in the ++ * same directory. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ */ ++ ++/*-************************************* ++* Dependencies ++***************************************/ ++#include "error_private.h" ++#include "zstd_internal.h" /* declaration of ZSTD_isError, ZSTD_getErrorName, ZSTD_getErrorCode, ZSTD_getErrorString, ZSTD_versionNumber */ ++ ++/*=************************************************************** ++* Custom allocator ++****************************************************************/ ++ ++#define stack_push(stack, size) \ ++ ({ \ ++ void *const ptr = ZSTD_PTR_ALIGN((stack)->ptr); \ ++ (stack)->ptr = (char *)ptr + (size); \ ++ (stack)->ptr <= (stack)->end ? 
ptr : NULL; \ ++ }) ++ ++ZSTD_customMem INIT ZSTD_initStack(void *workspace, size_t workspaceSize) ++{ ++ ZSTD_customMem stackMem = {ZSTD_stackAlloc, ZSTD_stackFree, workspace}; ++ ZSTD_stack *stack = (ZSTD_stack *)workspace; ++ /* Verify preconditions */ ++ if (!workspace || workspaceSize < sizeof(ZSTD_stack) || workspace != ZSTD_PTR_ALIGN(workspace)) { ++ ZSTD_customMem error = {NULL, NULL, NULL}; ++ return error; ++ } ++ /* Initialize the stack */ ++ stack->ptr = workspace; ++ stack->end = (char *)workspace + workspaceSize; ++ stack_push(stack, sizeof(ZSTD_stack)); ++ return stackMem; ++} ++ ++void *INIT ZSTD_stackAllocAll(void *opaque, size_t *size) ++{ ++ ZSTD_stack *stack = (ZSTD_stack *)opaque; ++ *size = (BYTE const *)stack->end - (BYTE *)ZSTD_PTR_ALIGN(stack->ptr); ++ return stack_push(stack, *size); ++} ++ ++void *INIT ZSTD_stackAlloc(void *opaque, size_t size) ++{ ++ ZSTD_stack *stack = (ZSTD_stack *)opaque; ++ return stack_push(stack, size); ++} ++void INIT ZSTD_stackFree(void *opaque, void *address) ++{ ++ (void)opaque; ++ (void)address; ++} ++ ++void *INIT ZSTD_malloc(size_t size, ZSTD_customMem customMem) { return customMem.customAlloc(customMem.opaque, size); } ++ ++void INIT ZSTD_free(void *ptr, ZSTD_customMem customMem) ++{ ++ if (ptr != NULL) ++ customMem.customFree(customMem.opaque, ptr); ++} +diff --git a/xen/common/zstd/zstd_internal.h b/xen/common/zstd/zstd_internal.h +new file mode 100644 +index 000000000000..7f8e5529ebfa +--- /dev/null ++++ b/xen/common/zstd/zstd_internal.h +@@ -0,0 +1,372 @@ ++/** ++ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. ++ * All rights reserved. ++ * ++ * This source code is licensed under the BSD-style license found in the ++ * LICENSE file in the root directory of https://github.com/facebook/zstd. ++ * An additional grant of patent rights can be found in the PATENTS file in the ++ * same directory. ++ * ++ * This program is free software; you can redistribute it and/or modify it under ++ * the terms of the GNU General Public License version 2 as published by the ++ * Free Software Foundation. This program is dual-licensed; you may select ++ * either version 2 of the GNU General Public License ("GPL") or BSD license ++ * ("BSD"). ++ */ ++ ++#ifndef ZSTD_CCOMMON_H_MODULE ++#define ZSTD_CCOMMON_H_MODULE ++ ++/*-******************************************************* ++* Compiler specifics ++*********************************************************/ ++#define FORCE_INLINE static always_inline ++#define FORCE_NOINLINE static noinline INIT ++ ++/*-************************************* ++* Dependencies ++***************************************/ ++#include "error_private.h" ++#include "mem.h" ++#include ++#include ++ ++#define ALIGN(x, a) ((x + (a) - 1) & ~((a) - 1)) ++#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a))) ++ ++typedef enum { ++ ZSTDnit_frameHeader, ++ ZSTDnit_blockHeader, ++ ZSTDnit_block, ++ ZSTDnit_lastBlock, ++ ZSTDnit_checksum, ++ ZSTDnit_skippableFrame ++} ZSTD_nextInputType_e; ++ ++/** ++ * struct ZSTD_frameParams - zstd frame parameters stored in the frame header ++ * @frameContentSize: The frame content size, or 0 if not present. ++ * @windowSize: The window size, or 0 if the frame is a skippable frame. ++ * @dictID: The dictionary id, or 0 if not present. ++ * @checksumFlag: Whether a checksum was used. 
++ */ ++typedef struct { ++ unsigned long long frameContentSize; ++ unsigned int windowSize; ++ unsigned int dictID; ++ unsigned int checksumFlag; ++} ZSTD_frameParams; ++ ++/** ++ * struct ZSTD_inBuffer - input buffer for streaming ++ * @src: Start of the input buffer. ++ * @size: Size of the input buffer. ++ * @pos: Position where reading stopped. Will be updated. ++ * Necessarily 0 <= pos <= size. ++ */ ++typedef struct ZSTD_inBuffer_s { ++ const void *src; ++ size_t size; ++ size_t pos; ++} ZSTD_inBuffer; ++ ++/** ++ * struct ZSTD_outBuffer - output buffer for streaming ++ * @dst: Start of the output buffer. ++ * @size: Size of the output buffer. ++ * @pos: Position where writing stopped. Will be updated. ++ * Necessarily 0 <= pos <= size. ++ */ ++typedef struct ZSTD_outBuffer_s { ++ void *dst; ++ size_t size; ++ size_t pos; ++} ZSTD_outBuffer; ++ ++typedef struct ZSTD_CCtx_s ZSTD_CCtx; ++typedef struct ZSTD_DCtx_s ZSTD_DCtx; ++ ++typedef struct ZSTD_CDict_s ZSTD_CDict; ++typedef struct ZSTD_DDict_s ZSTD_DDict; ++ ++typedef struct ZSTD_CStream_s ZSTD_CStream; ++typedef struct ZSTD_DStream_s ZSTD_DStream; ++ ++/*-************************************* ++* shared macros ++***************************************/ ++#define MIN(a, b) ((a) < (b) ? (a) : (b)) ++#define MAX(a, b) ((a) > (b) ? (a) : (b)) ++#define CHECK_F(f) \ ++ { \ ++ size_t const errcod = f; \ ++ if (ERR_isError(errcod)) \ ++ return errcod; \ ++ } /* check and Forward error code */ ++#define CHECK_E(f, e) \ ++ { \ ++ size_t const errcod = f; \ ++ if (ERR_isError(errcod)) \ ++ return ERROR(e); \ ++ } /* check and send Error code */ ++#define ZSTD_STATIC_ASSERT(c) \ ++ { \ ++ enum { ZSTD_static_assert = 1 / (int)(!!(c)) }; \ ++ } ++ ++/*-************************************* ++* Common constants ++***************************************/ ++#define ZSTD_MAGICNUMBER 0xFD2FB528 /* >= v0.8.0 */ ++#define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50U ++ ++#define ZSTD_OPT_NUM (1 << 12) ++#define ZSTD_DICT_MAGIC 0xEC30A437 /* v0.7+ */ ++ ++#define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1) ++#define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) ++ ++#define ZSTD_WINDOWLOG_MAX_32 27 ++#define ZSTD_WINDOWLOG_MAX_64 27 ++#define ZSTD_WINDOWLOG_MAX \ ++ ((unsigned int)(sizeof(size_t) == 4 \ ++ ? 
ZSTD_WINDOWLOG_MAX_32 \ ++ : ZSTD_WINDOWLOG_MAX_64)) ++#define ZSTD_WINDOWLOG_MIN 10 ++#define ZSTD_HASHLOG_MAX ZSTD_WINDOWLOG_MAX ++#define ZSTD_HASHLOG_MIN 6 ++#define ZSTD_CHAINLOG_MAX (ZSTD_WINDOWLOG_MAX+1) ++#define ZSTD_CHAINLOG_MIN ZSTD_HASHLOG_MIN ++#define ZSTD_HASHLOG3_MAX 17 ++#define ZSTD_SEARCHLOG_MAX (ZSTD_WINDOWLOG_MAX-1) ++#define ZSTD_SEARCHLOG_MIN 1 ++/* only for ZSTD_fast, other strategies are limited to 6 */ ++#define ZSTD_SEARCHLENGTH_MAX 7 ++/* only for ZSTD_btopt, other strategies are limited to 4 */ ++#define ZSTD_SEARCHLENGTH_MIN 3 ++#define ZSTD_TARGETLENGTH_MIN 4 ++#define ZSTD_TARGETLENGTH_MAX 999 ++ ++#define ZSTD_REP_NUM 3 /* number of repcodes */ ++#define ZSTD_REP_CHECK (ZSTD_REP_NUM) /* number of repcodes to check by the optimal parser */ ++#define ZSTD_REP_MOVE (ZSTD_REP_NUM - 1) ++#define ZSTD_REP_MOVE_OPT (ZSTD_REP_NUM) ++static const U32 repStartValue[ZSTD_REP_NUM] = {1, 4, 8}; ++ ++/* for static allocation */ ++#define ZSTD_FRAMEHEADERSIZE_MAX 18 ++#define ZSTD_FRAMEHEADERSIZE_MIN 6 ++static const size_t ZSTD_frameHeaderSize_prefix = 5; ++static const size_t ZSTD_frameHeaderSize_min = ZSTD_FRAMEHEADERSIZE_MIN; ++static const size_t ZSTD_frameHeaderSize_max = ZSTD_FRAMEHEADERSIZE_MAX; ++/* magic number + skippable frame length */ ++static const size_t ZSTD_skippableHeaderSize = 8; ++ ++#define ZSTD_BLOCKSIZE_ABSOLUTEMAX (128 * 1024) ++ ++#if 0 /* These don't seem to be usable - not sure what their purpose is. */ ++#define KB *(1 << 10) ++#define MB *(1 << 20) ++#define GB *(1U << 30) ++#endif ++ ++#define BIT7 128 ++#define BIT6 64 ++#define BIT5 32 ++#define BIT4 16 ++#define BIT1 2 ++#define BIT0 1 ++ ++#define ZSTD_WINDOWLOG_ABSOLUTEMIN 10 ++static const size_t ZSTD_fcs_fieldSize[4] = {0, 2, 4, 8}; ++static const size_t ZSTD_did_fieldSize[4] = {0, 1, 2, 4}; ++ ++#define ZSTD_BLOCKHEADERSIZE 3 /* C standard doesn't allow `static const` variable to be init using another `static const` variable */ ++static const size_t ZSTD_blockHeaderSize = ZSTD_BLOCKHEADERSIZE; ++typedef enum { bt_raw, bt_rle, bt_compressed, bt_reserved } blockType_e; ++ ++#define MIN_SEQUENCES_SIZE 1 /* nbSeq==0 */ ++#define MIN_CBLOCK_SIZE (1 /*litCSize*/ + 1 /* RLE or RAW */ + MIN_SEQUENCES_SIZE /* nbSeq==0 */) /* for a non-null block */ ++ ++#define HufLog 12 ++typedef enum { set_basic, set_rle, set_compressed, set_repeat } symbolEncodingType_e; ++ ++#define LONGNBSEQ 0x7F00 ++ ++#define MINMATCH 3 ++#define EQUAL_READ32 4 ++ ++#define Litbits 8 ++#define MaxLit ((1 << Litbits) - 1) ++#define MaxML 52 ++#define MaxLL 35 ++#define MaxOff 28 ++#define MaxSeq MAX(MaxLL, MaxML) /* Assumption : MaxOff < MaxLL,MaxML */ ++#define MLFSELog 9 ++#define LLFSELog 9 ++#define OffFSELog 8 ++ ++static const U32 LL_bits[MaxLL + 1] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; ++static const S16 LL_defaultNorm[MaxLL + 1] = {4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1, -1, -1, -1, -1}; ++#define LL_DEFAULTNORMLOG 6 /* for static allocation */ ++static const U32 LL_defaultNormLog = LL_DEFAULTNORMLOG; ++ ++static const U32 ML_bits[MaxML + 1] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ++ 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; ++static const S16 ML_defaultNorm[MaxML + 1] = {1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ++ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1}; ++#define ML_DEFAULTNORMLOG 6 /* for static allocation */ ++static const U32 ML_defaultNormLog = ML_DEFAULTNORMLOG; ++ ++static const S16 OF_defaultNorm[MaxOff + 1] = {1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1}; ++#define OF_DEFAULTNORMLOG 5 /* for static allocation */ ++static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG; ++ ++/*-******************************************* ++* Shared functions to include for inlining ++*********************************************/ ++ZSTD_STATIC void ZSTD_copy8(void *dst, const void *src) { ++ /* ++ * zstd relies heavily on gcc being able to analyze and inline this ++ * memcpy() call, since it is called in a tight loop. Preboot mode ++ * is compiled in freestanding mode, which stops gcc from analyzing ++ * memcpy(). Use __builtin_memcpy() to tell gcc to analyze this as a ++ * regular memcpy(). ++ */ ++ __builtin_memcpy(dst, src, 8); ++} ++/*! ZSTD_wildcopy() : ++* custom version of memcpy(), can copy up to 7 bytes too many (8 bytes if length==0) */ ++#define WILDCOPY_OVERLENGTH 8 ++ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length) ++{ ++ const BYTE* ip = (const BYTE*)src; ++ BYTE* op = (BYTE*)dst; ++ BYTE* const oend = op + length; ++#if defined(GCC_VERSION) && GCC_VERSION >= 70000 && GCC_VERSION < 70200 ++ /* ++ * Work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388. ++ * Avoid the bad case where the loop only runs once by handling the ++ * special case separately. This doesn't trigger the bug because it ++ * doesn't involve pointer/integer overflow. ++ */ ++ if (length <= 8) ++ return ZSTD_copy8(dst, src); ++#endif ++ do { ++ ZSTD_copy8(op, ip); ++ op += 8; ++ ip += 8; ++ } while (op < oend); ++} ++ ++/*-******************************************* ++* Private interfaces ++*********************************************/ ++typedef struct ZSTD_stats_s ZSTD_stats_t; ++ ++typedef struct { ++ U32 off; ++ U32 len; ++} ZSTD_match_t; ++ ++typedef struct { ++ U32 price; ++ U32 off; ++ U32 mlen; ++ U32 litlen; ++ U32 rep[ZSTD_REP_NUM]; ++} ZSTD_optimal_t; ++ ++typedef struct seqDef_s { ++ U32 offset; ++ U16 litLength; ++ U16 matchLength; ++} seqDef; ++ ++typedef struct { ++ seqDef *sequencesStart; ++ seqDef *sequences; ++ BYTE *litStart; ++ BYTE *lit; ++ BYTE *llCode; ++ BYTE *mlCode; ++ BYTE *ofCode; ++ U32 longLengthID; /* 0 == no longLength; 1 == Lit.longLength; 2 == Match.longLength; */ ++ U32 longLengthPos; ++ /* opt */ ++ ZSTD_optimal_t *priceTable; ++ ZSTD_match_t *matchTable; ++ U32 *matchLengthFreq; ++ U32 *litLengthFreq; ++ U32 *litFreq; ++ U32 *offCodeFreq; ++ U32 matchLengthSum; ++ U32 matchSum; ++ U32 litLengthSum; ++ U32 litSum; ++ U32 offCodeSum; ++ U32 log2matchLengthSum; ++ U32 log2matchSum; ++ U32 log2litLengthSum; ++ U32 log2litSum; ++ U32 log2offCodeSum; ++ U32 factor; ++ U32 staticPrices; ++ U32 cachedPrice; ++ U32 cachedLitLength; ++ const BYTE *cachedLiterals; ++} seqStore_t; ++ ++const seqStore_t *ZSTD_getSeqStore(const ZSTD_CCtx *ctx); ++void ZSTD_seqToCodes(const seqStore_t *seqStorePtr); ++int ZSTD_isSkipFrame(ZSTD_DCtx *dctx); ++ ++/*= Custom memory allocation functions */ ++typedef void *(*ZSTD_allocFunction)(void *opaque, size_t size); ++typedef void (*ZSTD_freeFunction)(void *opaque, void *address); ++typedef struct { ++ ZSTD_allocFunction customAlloc; ++ ZSTD_freeFunction customFree; ++ void *opaque; ++} ZSTD_customMem; ++ ++void *ZSTD_malloc(size_t size, ZSTD_customMem customMem); ++void 
ZSTD_free(void *ptr, ZSTD_customMem customMem); ++ ++/*====== stack allocation ======*/ ++ ++typedef struct { ++ void *ptr; ++ const void *end; ++} ZSTD_stack; ++ ++#define ZSTD_ALIGN(x) ALIGN(x, sizeof(size_t)) ++#define ZSTD_PTR_ALIGN(p) PTR_ALIGN(p, sizeof(size_t)) ++ ++ZSTD_customMem ZSTD_initStack(void *workspace, size_t workspaceSize); ++ ++void *ZSTD_stackAllocAll(void *opaque, size_t *size); ++void *ZSTD_stackAlloc(void *opaque, size_t size); ++void ZSTD_stackFree(void *opaque, void *address); ++ ++/*====== common function ======*/ ++ ++ZSTD_STATIC U32 ZSTD_highbit32(U32 val) { return 31 - __builtin_clz(val); } ++ ++/* hidden functions */ ++ ++/* ZSTD_invalidateRepCodes() : ++ * ensures next compression will not use repcodes from previous block. ++ * Note : only works with regular variant; ++ * do not use with extDict variant ! */ ++void ZSTD_invalidateRepCodes(ZSTD_CCtx *cctx); ++ ++size_t ZSTD_freeCCtx(ZSTD_CCtx *cctx); ++size_t ZSTD_freeDCtx(ZSTD_DCtx *dctx); ++size_t ZSTD_freeCDict(ZSTD_CDict *cdict); ++size_t ZSTD_freeDDict(ZSTD_DDict *cdict); ++size_t ZSTD_freeCStream(ZSTD_CStream *zcs); ++size_t ZSTD_freeDStream(ZSTD_DStream *zds); ++ ++#endif /* ZSTD_CCOMMON_H_MODULE */ +diff --git a/xen/include/asm-arm/types.h b/xen/include/asm-arm/types.h +index 30f95078cb0a..47696916d740 100644 +--- a/xen/include/asm-arm/types.h ++++ b/xen/include/asm-arm/types.h +@@ -61,6 +61,12 @@ typedef unsigned long size_t; + #endif + typedef signed long ssize_t; + ++#if defined(__PTRDIFF_TYPE__) ++typedef __PTRDIFF_TYPE__ ptrdiff_t; ++#else ++typedef signed long ptrdiff_t; ++#endif ++ + #endif /* __ASSEMBLY__ */ + + #endif /* __ARM_TYPES_H__ */ +diff --git a/xen/include/asm-x86/types.h b/xen/include/asm-x86/types.h +index fdf4f7dcc0bb..781713204876 100644 +--- a/xen/include/asm-x86/types.h ++++ b/xen/include/asm-x86/types.h +@@ -39,6 +39,12 @@ typedef unsigned long size_t; + #endif + typedef signed long ssize_t; + ++#if defined(__PTRDIFF_TYPE__) ++typedef __PTRDIFF_TYPE__ ptrdiff_t; ++#else ++typedef signed long ptrdiff_t; ++#endif ++ + #endif /* __ASSEMBLY__ */ + + #endif /* __X86_TYPES_H__ */ +diff --git a/xen/include/xen/decompress.h b/xen/include/xen/decompress.h +index b2955faa4bfb..f5bc17f2b63e 100644 +--- a/xen/include/xen/decompress.h ++++ b/xen/include/xen/decompress.h +@@ -31,7 +31,7 @@ typedef int decompress_fn(unsigned char *inbuf, unsigned int len, + * dependent). + */ + +-decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4; ++decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4, unzstd; + + int decompress(void *inbuf, unsigned int len, void *outbuf); + +-- +2.34.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0004-libxenguest-add-get_unaligned_le32.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0004-libxenguest-add-get_unaligned_le32.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0004-libxenguest-add-get_unaligned_le32.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0004-libxenguest-add-get_unaligned_le32.patch 2022-07-13 14:06:12.000000000 +0100 @@ -0,0 +1,112 @@ +From 7a763977f5deab7444fd1375d459757ebea64a16 Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Tue, 26 Jan 2021 14:14:39 +0100 +Subject: [PATCH 4/5] libxenguest: add get_unaligned_le32() + +Abstract xc_dom_check_gzip()'s reading of the uncompressed size into a +helper re-usable, in particular, by other decompressor code. 
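(Illustration only, not part of the patch text: the helper described above is a plain byte-wise little-endian load. A minimal C sketch of the idea, using a hypothetical name le32_from_bytes() rather than the patch's get_unaligned_le32(), together with the blob/ziplen variables from xc_dom_check_gzip():)

    #include <stdint.h>
    #include <stddef.h>

    /* Assemble a 32-bit value from four unaligned little-endian bytes. */
    static inline unsigned int le32_from_bytes(const uint8_t *buf)
    {
        return ((unsigned int)buf[3] << 24) | (buf[2] << 16) |
               (buf[1] << 8) | buf[0];
    }

    /* A gzip stream stores its uncompressed length in its last four bytes,
     * so a caller holding blob/ziplen recovers it with:
     *     size_t unziplen = le32_from_bytes(blob + ziplen - 4);
     */

(The same load is what the zstd support in the following patch reuses for the size field appended to a compressed kernel.)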
+ +Sadly in the mini-os case this conflicts with other functions of the +same name (and purpose), which can't be easily replaced individually. +Yet it was requested that no full set of helpers be introduced at this +point in the release cycle. Hence the awkward XG_NEED_UNALIGNED. + +Requested-by: Ian Jackson +Signed-off-by: Jan Beulich +Reviewed-by: Ian Jackson +Release-Acked-by: Ian Jackson + +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Origin: backport, http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=d8099d94dfaa3573bd86ebfc457cbc8f70a3ecda +[backport: + - rename 'tools/libs/guest/xg_*' to 'tools/libxc/xc_*'; + - xc_dom_core.c: refresh 2 context lines/includes; + - xc_dom_decompress_lz4.c: refresh 1 context line (xg/xc)] +--- + tools/libxc/xc_dom_core.c | 5 ++--- + tools/libxc/xc_dom_decompress_lz4.c | 1 + + tools/libxc/xg_private.h | 9 +++++++++ + xen/common/lz4/defs.h | 5 ----- + 4 files changed, 12 insertions(+), 8 deletions(-) + +diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c +index 9bd04cb2d554..7250d95bc0af 100644 +--- a/tools/libxc/xc_dom_core.c ++++ b/tools/libxc/xc_dom_core.c +@@ -31,6 +31,7 @@ + #include + #include + ++#define XG_NEED_UNALIGNED + #include "xg_private.h" + #include "xc_dom.h" + #include "_paths.h" +@@ -325,7 +326,6 @@ int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz) + + size_t xc_dom_check_gzip(xc_interface *xch, void *blob, size_t ziplen) + { +- unsigned char *gzlen; + size_t unziplen; + + if ( ziplen < 6 ) +@@ -337,8 +337,7 @@ size_t xc_dom_check_gzip(xc_interface *xch, void *blob, size_t ziplen) + /* not gzipped */ + return 0; + +- gzlen = blob + ziplen - 4; +- unziplen = (size_t)gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | gzlen[0]; ++ unziplen = get_unaligned_le32(blob + ziplen - 4); + if ( unziplen > XC_DOM_DECOMPRESS_MAX ) + { + xc_dom_printf +diff --git a/tools/libxc/xc_dom_decompress_lz4.c b/tools/libxc/xc_dom_decompress_lz4.c +index b6a33f27a87d..31689c7375ae 100644 +--- a/tools/libxc/xc_dom_decompress_lz4.c ++++ b/tools/libxc/xc_dom_decompress_lz4.c +@@ -3,6 +3,7 @@ + #include + #include + ++#define XG_NEED_UNALIGNED + #include "xg_private.h" + #include "xc_dom_decompress.h" + +diff --git a/tools/libxc/xg_private.h b/tools/libxc/xg_private.h +index f0a4b2c61699..aa35f82cb36b 100644 +--- a/tools/libxc/xg_private.h ++++ b/tools/libxc/xg_private.h +@@ -48,6 +48,15 @@ char *xc_inflate_buffer(xc_interface *xch, + unsigned long in_size, + unsigned long *out_size); + ++#if !defined(__MINIOS__) || defined(XG_NEED_UNALIGNED) ++ ++static inline unsigned int get_unaligned_le32(const uint8_t *buf) ++{ ++ return ((unsigned int)buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]; ++} ++ ++#endif /* !__MINIOS__ || XG_NEED_UNALIGNED */ ++ + unsigned long csum_page (void * page); + + #define _PAGE_PRESENT 0x001 +diff --git a/xen/common/lz4/defs.h b/xen/common/lz4/defs.h +index 4fbea2ac3dd4..10609f5a5317 100644 +--- a/xen/common/lz4/defs.h ++++ b/xen/common/lz4/defs.h +@@ -18,11 +18,6 @@ static inline u16 get_unaligned_le16(const void *p) + return le16_to_cpup(p); + } + +-static inline u32 get_unaligned_le32(const void *p) +-{ +- return le32_to_cpup(p); +-} +- + #endif + + /* +-- +2.34.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch --- 
xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch 2022-07-13 14:06:12.000000000 +0100 @@ -0,0 +1,717 @@ +From 8560749745e41507d9a0a50181fdd92a65ec50c8 Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Tue, 26 Jan 2021 14:16:34 +0100 +Subject: [PATCH 5/5] libxenguest: support zstd compressed kernels + +This follows the logic used for other decompression methods utilizing an +external library, albeit here we can't ignore the 32-bit size field +appended to the compressed image - its presence causes decompression to +fail. Leverage the field instead to allocate the output buffer in one +go, i.e. without incrementally realloc()ing. + +As far as configure.ac goes, I'm pretty sure there is a better (more +"standard") way of using PKG_CHECK_MODULES(). The construct also gets +put next to the other decompression library checks, albeit I think they +all ought to be x86-specific (e.g. placed in the existing case block a +few lines down). + +Note that, where possible, instead of #ifdef-ing xen/*.h inclusions, +they get removed. + +Signed-off-by: Jan Beulich +Acked-by: Wei Liu +Reviewed-by: Ian Jackson +Release-Acked-by: Ian Jackson + +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Origin: backport, http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=8169f82049efb5b2044b33aa482ba3a136b7804d +[backport: + - rename 'tools/libs/guest/xg_*' to 'tools/libxc/xc_*'; + - tools/configure: drop file, removed in our package; + - tools/configure.ac, hunk 1: refresh 2 context lines; + - tools/libxc/Makefile: s/SRCS-y/GUEST_SRCS-y/; s/xg/xc/;] +--- + README | 2 + + tools/configure.ac | 2 + + .../guest/xg_dom_decompress_unsafe_zstd.c | 45 ++++++++++ + tools/libxc/Makefile | 1 + + tools/libxc/xc_dom_bzimageloader.c | 90 +++++++++++++++++++ + tools/libxc/xc_dom_decompress_unsafe.h | 2 + + xen/common/zstd/decompress.c | 67 +++++++++----- + xen/common/zstd/error_private.h | 5 -- + xen/common/zstd/fse.h | 5 -- + xen/common/zstd/fse_decompress.c | 2 - + xen/common/zstd/huf.h | 3 - + xen/common/zstd/huf_decompress.c | 2 - + xen/common/zstd/mem.h | 2 + + xen/common/zstd/zstd_internal.h | 4 + + xen/include/xen/unaligned.h | 2 + + xen/lib/xxhash64.c | 2 + + 16 files changed, 197 insertions(+), 39 deletions(-) + create mode 100644 tools/libs/guest/xg_dom_decompress_unsafe_zstd.c + +diff --git a/README b/README +index faf51dc7b657..a0953e5a73c5 100644 +--- a/README ++++ b/README +@@ -83,6 +83,8 @@ disabled at compile time: + * 16-bit x86 assembler, loader and compiler for qemu-traditional / rombios + (dev86 rpm or bin86 & bcc debs) + * Development install of liblzma for rombios ++ * Development install of libbz2, liblzma, liblzo2, and libzstd for DomU ++ kernel decompression. + + Second, you need to acquire a suitable kernel for use in domain 0. If + possible you should use a kernel provided by your OS distributor. 
If +diff --git a/tools/configure.ac b/tools/configure.ac +index 0826af8cbc40..ed46fa12c9d9 100644 +--- a/tools/configure.ac ++++ b/tools/configure.ac +@@ -366,6 +366,8 @@ AC_CHECK_LIB([lzma], [lzma_stream_decoder], [zlib="$zlib -DHAVE_LZMA -llzma"]) + AC_CHECK_HEADER([lzo/lzo1x.h], [ + AC_CHECK_LIB([lzo2], [lzo1x_decompress], [zlib="$zlib -DHAVE_LZO1X -llzo2"]) + ]) ++PKG_CHECK_MODULES([libzstd], [libzstd], ++ [zlib="$zlib -DHAVE_ZSTD $libzstd_CFLAGS $libzstd_LIBS"], [true]) + AC_SUBST(zlib) + AS_IF([test "x$enable_blktap2" = "xyes"], [ + AC_CHECK_LIB([aio], [io_setup], [], [AC_MSG_ERROR([Could not find libaio])]) +diff --git a/tools/libs/guest/xg_dom_decompress_unsafe_zstd.c b/tools/libs/guest/xg_dom_decompress_unsafe_zstd.c +new file mode 100644 +index 000000000000..52558d2ffc5b +--- /dev/null ++++ b/tools/libs/guest/xg_dom_decompress_unsafe_zstd.c +@@ -0,0 +1,45 @@ ++#include ++#include ++#include ++#include ++#include ++#include ++ ++#include "xg_private.h" ++#include "xg_dom_decompress_unsafe.h" ++ ++typedef uint8_t u8; ++ ++typedef uint16_t __u16; ++typedef uint32_t __u32; ++typedef uint64_t __u64; ++ ++typedef uint16_t __le16; ++typedef uint32_t __le32; ++typedef uint64_t __le64; ++ ++typedef uint16_t __be16; ++typedef uint32_t __be32; ++typedef uint64_t __be64; ++ ++#define __attribute_const__ ++#define __force ++#define always_inline ++#define noinline ++ ++#undef ERROR ++ ++#define __BYTEORDER_HAS_U64__ ++#define __TYPES_H__ /* xen/types.h guard */ ++#include "../../xen/include/xen/byteorder/little_endian.h" ++#define __ASM_UNALIGNED_H__ /* asm/unaligned.h guard */ ++#include "../../xen/include/xen/unaligned.h" ++#include "../../xen/include/xen/xxhash.h" ++#include "../../xen/lib/xxhash64.c" ++#include "../../xen/common/unzstd.c" ++ ++int xc_try_zstd_decode( ++ struct xc_dom_image *dom, void **blob, size_t *size) ++{ ++ return xc_dom_decompress_unsafe(unzstd, dom, blob, size); ++} +diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile +index d26bf8dfac9c..46e1033c3892 100644 +--- a/tools/libxc/Makefile ++++ b/tools/libxc/Makefile +@@ -100,6 +100,7 @@ GUEST_SRCS-y += xc_dom_decompress_unsafe_bzip2.c + GUEST_SRCS-y += xc_dom_decompress_unsafe_lzma.c + GUEST_SRCS-y += xc_dom_decompress_unsafe_lzo1x.c + GUEST_SRCS-y += xc_dom_decompress_unsafe_xz.c ++GUEST_SRCS-y += xc_dom_decompress_unsafe_zstd.c + endif + + -include $(XEN_TARGET_ARCH)/Makefile +diff --git a/tools/libxc/xc_dom_bzimageloader.c b/tools/libxc/xc_dom_bzimageloader.c +index a7d70cc7c6df..ceb8a6411702 100644 +--- a/tools/libxc/xc_dom_bzimageloader.c ++++ b/tools/libxc/xc_dom_bzimageloader.c +@@ -589,6 +589,85 @@ static int xc_try_lzo1x_decode( + + #endif + ++#if defined(HAVE_ZSTD) ++ ++#include ++ ++static int xc_try_zstd_decode( ++ struct xc_dom_image *dom, void **blob, size_t *size) ++{ ++ size_t outsize, insize, actual; ++ unsigned char *outbuf; ++ ++ /* Magic, descriptor byte, and trailing size field. 
*/ ++ if ( *size <= 9 ) ++ { ++ DOMPRINTF("ZSTD: insufficient input data"); ++ return -1; ++ } ++ ++ insize = *size - 4; ++ outsize = get_unaligned_le32(*blob + insize); ++ ++ if ( xc_dom_kernel_check_size(dom, outsize) ) ++ { ++ DOMPRINTF("ZSTD: output too large"); ++ return -1; ++ } ++ ++ outbuf = malloc(outsize); ++ if ( !outbuf ) ++ { ++ DOMPRINTF("ZSTD: failed to alloc memory"); ++ return -1; ++ } ++ ++ actual = ZSTD_decompress(outbuf, outsize, *blob, insize); ++ ++ if ( ZSTD_isError(actual) ) ++ { ++ DOMPRINTF("ZSTD: error: %s", ZSTD_getErrorName(actual)); ++ free(outbuf); ++ return -1; ++ } ++ ++ if ( actual != outsize ) ++ { ++ DOMPRINTF("ZSTD: got 0x%zx bytes instead of 0x%zx", ++ actual, outsize); ++ free(outbuf); ++ return -1; ++ } ++ ++ if ( xc_dom_register_external(dom, outbuf, outsize) ) ++ { ++ DOMPRINTF("ZSTD: error registering stream output"); ++ free(outbuf); ++ return -1; ++ } ++ ++ DOMPRINTF("%s: ZSTD decompress OK, 0x%zx -> 0x%zx", ++ __FUNCTION__, insize, outsize); ++ ++ *blob = outbuf; ++ *size = outsize; ++ ++ return 0; ++} ++ ++#else /* !defined(HAVE_ZSTD) */ ++ ++static int xc_try_zstd_decode( ++ struct xc_dom_image *dom, void **blob, size_t *size) ++{ ++ xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, ++ "%s: ZSTD decompress support unavailable\n", ++ __FUNCTION__); ++ return -1; ++} ++ ++#endif ++ + #else /* __MINIOS__ */ + + int xc_try_bzip2_decode(struct xc_dom_image *dom, void **blob, size_t *size); +@@ -736,6 +815,17 @@ static int xc_dom_probe_bzimage_kernel(struct xc_dom_image *dom) + return -EINVAL; + } + } ++ else if ( check_magic(dom, "\x28\xb5\x2f\xfd", 4) ) ++ { ++ ret = xc_try_zstd_decode(dom, &dom->kernel_blob, &dom->kernel_size); ++ if ( ret < 0 ) ++ { ++ xc_dom_panic(dom->xch, XC_INVALID_KERNEL, ++ "%s unable to ZSTD decompress kernel", ++ __FUNCTION__); ++ return -EINVAL; ++ } ++ } + else if ( check_magic(dom, "\135\000", 2) ) + { + ret = xc_try_lzma_decode(dom, &dom->kernel_blob, &dom->kernel_size); +diff --git a/tools/libxc/xc_dom_decompress_unsafe.h b/tools/libxc/xc_dom_decompress_unsafe.h +index 64f68864b165..22ab68da6e5b 100644 +--- a/tools/libxc/xc_dom_decompress_unsafe.h ++++ b/tools/libxc/xc_dom_decompress_unsafe.h +@@ -18,3 +18,5 @@ int xc_try_lzo1x_decode(struct xc_dom_image *dom, void **blob, size_t *size) + __attribute__((visibility("internal"))); + int xc_try_xz_decode(struct xc_dom_image *dom, void **blob, size_t *size) + __attribute__((visibility("internal"))); ++int xc_try_zstd_decode(struct xc_dom_image *dom, void **blob, size_t *size) ++ __attribute__((visibility("internal"))); +diff --git a/xen/common/zstd/decompress.c b/xen/common/zstd/decompress.c +index 3d3ef136e5c2..b0249108145c 100644 +--- a/xen/common/zstd/decompress.c ++++ b/xen/common/zstd/decompress.c +@@ -33,7 +33,6 @@ + #include "huf.h" + #include "mem.h" /* low level memory routines */ + #include "zstd_internal.h" +-#include /* memcpy, memmove, memset */ + + #define ZSTD_PREFETCH(ptr) __builtin_prefetch(ptr, 0, 0) + +@@ -99,9 +98,12 @@ struct ZSTD_DCtx_s { + BYTE headerBuffer[ZSTD_FRAMEHEADERSIZE_MAX]; + }; /* typedef'd to ZSTD_DCtx within "zstd.h" */ + +-size_t INIT ZSTD_DCtxWorkspaceBound(void) { return ZSTD_ALIGN(sizeof(ZSTD_stack)) + ZSTD_ALIGN(sizeof(ZSTD_DCtx)); } ++STATIC size_t INIT ZSTD_DCtxWorkspaceBound(void) ++{ ++ return ZSTD_ALIGN(sizeof(ZSTD_stack)) + ZSTD_ALIGN(sizeof(ZSTD_DCtx)); ++} + +-size_t INIT ZSTD_decompressBegin(ZSTD_DCtx *dctx) ++STATIC size_t INIT ZSTD_decompressBegin(ZSTD_DCtx *dctx) + { + dctx->expected = ZSTD_frameHeaderSize_prefix; + 
dctx->stage = ZSTDds_getFrameHeaderSize; +@@ -121,7 +123,7 @@ size_t INIT ZSTD_decompressBegin(ZSTD_DCtx *dctx) + return 0; + } + +-ZSTD_DCtx *INIT ZSTD_createDCtx_advanced(ZSTD_customMem customMem) ++STATIC ZSTD_DCtx *INIT ZSTD_createDCtx_advanced(ZSTD_customMem customMem) + { + ZSTD_DCtx *dctx; + +@@ -136,7 +138,7 @@ ZSTD_DCtx *INIT ZSTD_createDCtx_advanced(ZSTD_customMem customMem) + return dctx; + } + +-ZSTD_DCtx *INIT ZSTD_initDCtx(void *workspace, size_t workspaceSize) ++STATIC ZSTD_DCtx *INIT ZSTD_initDCtx(void *workspace, size_t workspaceSize) + { + ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); + return ZSTD_createDCtx_advanced(stackMem); +@@ -150,11 +152,13 @@ size_t INIT ZSTD_freeDCtx(ZSTD_DCtx *dctx) + return 0; /* reserved as a potential error code in the future */ + } + ++#ifdef BUILD_DEAD_CODE + void INIT ZSTD_copyDCtx(ZSTD_DCtx *dstDCtx, const ZSTD_DCtx *srcDCtx) + { + size_t const workSpaceSize = (ZSTD_BLOCKSIZE_ABSOLUTEMAX + WILDCOPY_OVERLENGTH) + ZSTD_frameHeaderSize_max; + memcpy(dstDCtx, srcDCtx, sizeof(ZSTD_DCtx) - workSpaceSize); /* no need to copy workspace */ + } ++#endif + + STATIC size_t ZSTD_findFrameCompressedSize(const void *src, size_t srcSize); + STATIC size_t ZSTD_decompressBegin_usingDict(ZSTD_DCtx *dctx, const void *dict, +@@ -166,6 +170,7 @@ static void ZSTD_refDDict(ZSTD_DCtx *dstDCtx, const ZSTD_DDict *ddict); + * Decompression section + ***************************************************************/ + ++#ifdef BUILD_DEAD_CODE + /*! ZSTD_isFrame() : + * Tells if the content of `buffer` starts with a valid Frame Identifier. + * Note : Frame Identifier is 4 bytes. If `size < 4`, @return will always be 0. +@@ -184,6 +189,7 @@ unsigned INIT ZSTD_isFrame(const void *buffer, size_t size) + } + return 0; + } ++#endif + + /** ZSTD_frameHeaderSize() : + * srcSize must be >= ZSTD_frameHeaderSize_prefix. +@@ -206,7 +212,7 @@ static size_t INIT ZSTD_frameHeaderSize(const void *src, size_t srcSize) + * @return : 0, `fparamsPtr` is correctly filled, + * >0, `srcSize` is too small, result is expected `srcSize`, + * or an error code, which can be tested using ZSTD_isError() */ +-size_t INIT ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, size_t srcSize) ++STATIC size_t INIT ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, size_t srcSize) + { + const BYTE *ip = (const BYTE *)src; + +@@ -291,6 +297,7 @@ size_t INIT ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, s + return 0; + } + ++#ifdef BUILD_DEAD_CODE + /** ZSTD_getFrameContentSize() : + * compatible with legacy mode + * @return : decompressed size of the single frame pointed to be `src` if known, otherwise +@@ -367,6 +374,7 @@ unsigned long long INIT ZSTD_findDecompressedSize(const void *src, size_t srcSiz + return totalDstSize; + } + } ++#endif /* BUILD_DEAD_CODE */ + + /** ZSTD_decodeFrameHeader() : + * `headerSize` must be the size provided by ZSTD_frameHeaderSize(). +@@ -393,7 +401,7 @@ typedef struct { + + /*! ZSTD_getcBlockSize() : + * Provides the size of compressed block from block header `src` */ +-size_t INIT ZSTD_getcBlockSize(const void *src, size_t srcSize, blockProperties_t *bpPtr) ++STATIC size_t INIT ZSTD_getcBlockSize(const void *src, size_t srcSize, blockProperties_t *bpPtr) + { + if (srcSize < ZSTD_blockHeaderSize) + return ERROR(srcSize_wrong); +@@ -431,7 +439,7 @@ static size_t INIT ZSTD_setRleBlock(void *dst, size_t dstCapacity, const void *s + + /*! 
ZSTD_decodeLiteralsBlock() : + @return : nb of bytes read from src (< srcSize ) */ +-size_t INIT ZSTD_decodeLiteralsBlock(ZSTD_DCtx *dctx, const void *src, size_t srcSize) /* note : srcSize < BLOCKSIZE */ ++STATIC size_t INIT ZSTD_decodeLiteralsBlock(ZSTD_DCtx *dctx, const void *src, size_t srcSize) /* note : srcSize < BLOCKSIZE */ + { + if (srcSize < MIN_CBLOCK_SIZE) + return ERROR(corruption_detected); +@@ -795,7 +803,7 @@ static size_t INIT ZSTD_buildSeqTable(FSE_DTable *DTableSpace, const FSE_DTable + } + } + +-size_t INIT ZSTD_decodeSeqHeaders(ZSTD_DCtx *dctx, int *nbSeqPtr, const void *src, size_t srcSize) ++STATIC size_t INIT ZSTD_decodeSeqHeaders(ZSTD_DCtx *dctx, int *nbSeqPtr, const void *src, size_t srcSize) + { + const BYTE *const istart = (const BYTE *const)src; + const BYTE *const iend = istart + srcSize; +@@ -1481,6 +1489,7 @@ static void INIT ZSTD_checkContinuity(ZSTD_DCtx *dctx, const void *dst) + } + } + ++#ifdef BUILD_DEAD_CODE + size_t INIT ZSTD_decompressBlock(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) + { + size_t dSize; +@@ -1498,8 +1507,9 @@ size_t INIT ZSTD_insertBlock(ZSTD_DCtx *dctx, const void *blockStart, size_t blo + dctx->previousDstEnd = (const char *)blockStart + blockSize; + return blockSize; + } ++#endif /* BUILD_DEAD_CODE */ + +-size_t INIT ZSTD_generateNxBytes(void *dst, size_t dstCapacity, BYTE byte, size_t length) ++STATIC size_t INIT ZSTD_generateNxBytes(void *dst, size_t dstCapacity, BYTE byte, size_t length) + { + if (length > dstCapacity) + return ERROR(dstSize_tooSmall); +@@ -1512,7 +1522,7 @@ size_t INIT ZSTD_generateNxBytes(void *dst, size_t dstCapacity, BYTE byte, size_ + * `src` must point to the start of a ZSTD frame, ZSTD legacy frame, or skippable frame + * `srcSize` must be at least as large as the frame contained + * @return : the compressed size of the frame starting at `src` */ +-size_t INIT ZSTD_findFrameCompressedSize(const void *src, size_t srcSize) ++STATIC size_t INIT ZSTD_findFrameCompressedSize(const void *src, size_t srcSize) + { + if (srcSize >= ZSTD_skippableHeaderSize && (ZSTD_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) { + return ZSTD_skippableHeaderSize + ZSTD_readLE32((const BYTE *)src + 4); +@@ -1709,12 +1719,12 @@ static size_t INIT ZSTD_decompressMultiFrame(ZSTD_DCtx *dctx, void *dst, size_t + return (BYTE *)dst - (BYTE *)dststart; + } + +-size_t INIT ZSTD_decompress_usingDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize) ++STATIC size_t INIT ZSTD_decompress_usingDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize) + { + return ZSTD_decompressMultiFrame(dctx, dst, dstCapacity, src, srcSize, dict, dictSize, NULL); + } + +-size_t INIT ZSTD_decompressDCtx(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++STATIC size_t INIT ZSTD_decompressDCtx(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) + { + return ZSTD_decompress_usingDict(dctx, dst, dstCapacity, src, srcSize, NULL, 0); + } +@@ -1723,9 +1733,12 @@ size_t INIT ZSTD_decompressDCtx(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, + * Advanced Streaming Decompression API + * Bufferless and synchronous + ****************************************/ +-size_t INIT ZSTD_nextSrcSizeToDecompress(ZSTD_DCtx *dctx) { return dctx->expected; } ++STATIC size_t INIT ZSTD_nextSrcSizeToDecompress(ZSTD_DCtx *dctx) ++{ ++ return 
dctx->expected; ++} + +-ZSTD_nextInputType_e INIT ZSTD_nextInputType(ZSTD_DCtx *dctx) ++STATIC ZSTD_nextInputType_e INIT ZSTD_nextInputType(ZSTD_DCtx *dctx) + { + switch (dctx->stage) { + default: /* should not happen */ +@@ -1745,7 +1758,7 @@ int INIT ZSTD_isSkipFrame(ZSTD_DCtx *dctx) { return dctx->stage == ZSTDds_skipFr + /** ZSTD_decompressContinue() : + * @return : nb of bytes generated into `dst` (necessarily <= `dstCapacity) + * or an error code, which can be tested using ZSTD_isError() */ +-size_t INIT ZSTD_decompressContinue(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) ++STATIC size_t INIT ZSTD_decompressContinue(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize) + { + /* Sanity check */ + if (srcSize != dctx->expected) +@@ -1971,7 +1984,7 @@ static size_t INIT ZSTD_decompress_insertDictionary(ZSTD_DCtx *dctx, const void + return ZSTD_refDictContent(dctx, dict, dictSize); + } + +-size_t INIT ZSTD_decompressBegin_usingDict(ZSTD_DCtx *dctx, const void *dict, size_t dictSize) ++STATIC size_t INIT ZSTD_decompressBegin_usingDict(ZSTD_DCtx *dctx, const void *dict, size_t dictSize) + { + CHECK_F(ZSTD_decompressBegin(dctx)); + if (dict && dictSize) +@@ -1991,7 +2004,9 @@ struct ZSTD_DDict_s { + ZSTD_customMem cMem; + }; /* typedef'd to ZSTD_DDict within "zstd.h" */ + ++#ifdef BUILD_DEAD_CODE + size_t INIT ZSTD_DDictWorkspaceBound(void) { return ZSTD_ALIGN(sizeof(ZSTD_stack)) + ZSTD_ALIGN(sizeof(ZSTD_DDict)); } ++#endif + + static const void *INIT ZSTD_DDictDictContent(const ZSTD_DDict *ddict) { return ddict->dictContent; } + +@@ -2023,6 +2038,7 @@ static void INIT ZSTD_refDDict(ZSTD_DCtx *dstDCtx, const ZSTD_DDict *ddict) + } + } + ++#ifdef BUILD_DEAD_CODE + static size_t INIT ZSTD_loadEntropy_inDDict(ZSTD_DDict *ddict) + { + ddict->dictID = 0; +@@ -2090,6 +2106,7 @@ ZSTD_DDict *INIT ZSTD_initDDict(const void *dict, size_t dictSize, void *workspa + ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); + return ZSTD_createDDict_advanced(dict, dictSize, 1, stackMem); + } ++#endif /* BUILD_DEAD_CODE */ + + size_t INIT ZSTD_freeDDict(ZSTD_DDict *ddict) + { +@@ -2103,6 +2120,7 @@ size_t INIT ZSTD_freeDDict(ZSTD_DDict *ddict) + } + } + ++#ifdef BUILD_DEAD_CODE + /*! ZSTD_getDictID_fromDict() : + * Provides the dictID stored within dictionary. + * if @return == 0, the dictionary is not conformant with Zstandard specification. +@@ -2145,11 +2163,12 @@ unsigned INIT ZSTD_getDictID_fromFrame(const void *src, size_t srcSize) + return 0; + return zfp.dictID; + } ++#endif /* BUILD_DEAD_CODE */ + + /*! ZSTD_decompress_usingDDict() : + * Decompression using a pre-digested Dictionary + * Use dictionary without significant overhead. 
*/ +-size_t INIT ZSTD_decompress_usingDDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const ZSTD_DDict *ddict) ++STATIC size_t INIT ZSTD_decompress_usingDDict(ZSTD_DCtx *dctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const ZSTD_DDict *ddict) + { + /* pass content and size in case legacy frames are encountered */ + return ZSTD_decompressMultiFrame(dctx, dst, dstCapacity, src, srcSize, NULL, 0, ddict); +@@ -2186,7 +2205,7 @@ struct ZSTD_DStream_s { + U32 hostageByte; + }; /* typedef'd to ZSTD_DStream within "zstd.h" */ + +-size_t INIT ZSTD_DStreamWorkspaceBound(size_t maxWindowSize) ++STATIC size_t INIT ZSTD_DStreamWorkspaceBound(size_t maxWindowSize) + { + size_t const blockSize = MIN(maxWindowSize, ZSTD_BLOCKSIZE_ABSOLUTEMAX); + size_t const inBuffSize = blockSize; +@@ -2216,7 +2235,7 @@ static ZSTD_DStream *INIT ZSTD_createDStream_advanced(ZSTD_customMem customMem) + return zds; + } + +-ZSTD_DStream *INIT ZSTD_initDStream(size_t maxWindowSize, void *workspace, size_t workspaceSize) ++STATIC ZSTD_DStream *INIT ZSTD_initDStream(size_t maxWindowSize, void *workspace, size_t workspaceSize) + { + ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize); + ZSTD_DStream *zds = ZSTD_createDStream_advanced(stackMem); +@@ -2249,6 +2268,7 @@ ZSTD_DStream *INIT ZSTD_initDStream(size_t maxWindowSize, void *workspace, size_ + return zds; + } + ++#ifdef BUILD_DEAD_CODE + ZSTD_DStream *INIT ZSTD_initDStream_usingDDict(size_t maxWindowSize, const ZSTD_DDict *ddict, void *workspace, size_t workspaceSize) + { + ZSTD_DStream *zds = ZSTD_initDStream(maxWindowSize, workspace, workspaceSize); +@@ -2257,6 +2277,7 @@ ZSTD_DStream *INIT ZSTD_initDStream_usingDDict(size_t maxWindowSize, const ZSTD_ + } + return zds; + } ++#endif + + size_t INIT ZSTD_freeDStream(ZSTD_DStream *zds) + { +@@ -2279,10 +2300,12 @@ size_t INIT ZSTD_freeDStream(ZSTD_DStream *zds) + + /* *** Initialization *** */ + ++#ifdef BUILD_DEAD_CODE + size_t INIT ZSTD_DStreamInSize(void) { return ZSTD_BLOCKSIZE_ABSOLUTEMAX + ZSTD_blockHeaderSize; } + size_t INIT ZSTD_DStreamOutSize(void) { return ZSTD_BLOCKSIZE_ABSOLUTEMAX; } ++#endif + +-size_t INIT ZSTD_resetDStream(ZSTD_DStream *zds) ++STATIC size_t INIT ZSTD_resetDStream(ZSTD_DStream *zds) + { + zds->stage = zdss_loadHeader; + zds->lhSize = zds->inPos = zds->outStart = zds->outEnd = 0; +@@ -2300,7 +2323,7 @@ ZSTD_STATIC size_t INIT ZSTD_limitCopy(void *dst, size_t dstCapacity, const void + return length; + } + +-size_t INIT ZSTD_decompressStream(ZSTD_DStream *zds, ZSTD_outBuffer *output, ZSTD_inBuffer *input) ++STATIC size_t INIT ZSTD_decompressStream(ZSTD_DStream *zds, ZSTD_outBuffer *output, ZSTD_inBuffer *input) + { + const char *const istart = (const char *)(input->src) + input->pos; + const char *const iend = (const char *)(input->src) + input->size; +diff --git a/xen/common/zstd/error_private.h b/xen/common/zstd/error_private.h +index d07bf3cb9b55..906d537e0844 100644 +--- a/xen/common/zstd/error_private.h ++++ b/xen/common/zstd/error_private.h +@@ -19,11 +19,6 @@ + #ifndef ERROR_H_MODULE + #define ERROR_H_MODULE + +-/* **************************************** +-* Dependencies +-******************************************/ +-#include /* size_t */ +- + /** + * enum ZSTD_ErrorCode - zstd error codes + * +diff --git a/xen/common/zstd/fse.h b/xen/common/zstd/fse.h +index b86717c34d0f..5761e09f17ff 100644 +--- a/xen/common/zstd/fse.h ++++ b/xen/common/zstd/fse.h +@@ -40,11 +40,6 @@ + #ifndef FSE_H + #define FSE_H + 
+-/*-***************************************** +-* Dependencies +-******************************************/ +-#include /* size_t, ptrdiff_t */ +- + /*-***************************************** + * FSE_PUBLIC_API : control library symbols visibility + ******************************************/ +diff --git a/xen/common/zstd/fse_decompress.c b/xen/common/zstd/fse_decompress.c +index cc51206df614..6c61e9002e62 100644 +--- a/xen/common/zstd/fse_decompress.c ++++ b/xen/common/zstd/fse_decompress.c +@@ -48,8 +48,6 @@ + #include "bitstream.h" + #include "fse.h" + #include "zstd_internal.h" +-#include +-#include /* memcpy, memset */ + + /* ************************************************************** + * Error Management +diff --git a/xen/common/zstd/huf.h b/xen/common/zstd/huf.h +index a9d522c7bb7b..a498e0de2871 100644 +--- a/xen/common/zstd/huf.h ++++ b/xen/common/zstd/huf.h +@@ -40,9 +40,6 @@ + #ifndef HUF_H_298734234 + #define HUF_H_298734234 + +-/* *** Dependencies *** */ +-#include /* size_t */ +- + /* *** Tool functions *** */ + #define HUF_BLOCKSIZE_MAX (128 * 1024) /**< maximum input size for a single block compressed with HUF_compress */ + size_t HUF_compressBound(size_t size); /**< maximum compressed size (worst case) */ +diff --git a/xen/common/zstd/huf_decompress.c b/xen/common/zstd/huf_decompress.c +index 341619e64246..f6aca709a6dd 100644 +--- a/xen/common/zstd/huf_decompress.c ++++ b/xen/common/zstd/huf_decompress.c +@@ -48,8 +48,6 @@ + #include "bitstream.h" /* BIT_* */ + #include "fse.h" /* header compression */ + #include "huf.h" +-#include +-#include /* memcpy, memset */ + + /* ************************************************************** + * Error Management +diff --git a/xen/common/zstd/mem.h b/xen/common/zstd/mem.h +index 288320069654..2acae6a8edc8 100644 +--- a/xen/common/zstd/mem.h ++++ b/xen/common/zstd/mem.h +@@ -20,9 +20,11 @@ + /*-**************************************** + * Dependencies + ******************************************/ ++#ifdef __XEN__ + #include /* memcpy */ + #include /* size_t, ptrdiff_t */ + #include ++#endif + + /*-**************************************** + * Compiler specifics +diff --git a/xen/common/zstd/zstd_internal.h b/xen/common/zstd/zstd_internal.h +index 7f8e5529ebfa..caa7aab40699 100644 +--- a/xen/common/zstd/zstd_internal.h ++++ b/xen/common/zstd/zstd_internal.h +@@ -28,8 +28,10 @@ + ***************************************/ + #include "error_private.h" + #include "mem.h" ++#ifdef __XEN__ + #include + #include ++#endif + + #define ALIGN(x, a) ((x + (a) - 1) & ~((a) - 1)) + #define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a))) +@@ -95,8 +97,10 @@ typedef struct ZSTD_DStream_s ZSTD_DStream; + /*-************************************* + * shared macros + ***************************************/ ++#ifndef MIN + #define MIN(a, b) ((a) < (b) ? (a) : (b)) + #define MAX(a, b) ((a) > (b) ? 
(a) : (b)) ++#endif + #define CHECK_F(f) \ + { \ + size_t const errcod = f; \ +diff --git a/xen/include/xen/unaligned.h b/xen/include/xen/unaligned.h +index eef7ec73b658..0a2b16d05d92 100644 +--- a/xen/include/xen/unaligned.h ++++ b/xen/include/xen/unaligned.h +@@ -10,8 +10,10 @@ + #ifndef __XEN_UNALIGNED_H__ + #define __XEN_UNALIGNED_H__ + ++#ifdef __XEN__ + #include + #include ++#endif + + #define get_unaligned(p) (*(p)) + #define put_unaligned(val, p) (*(p) = (val)) +diff --git a/xen/lib/xxhash64.c b/xen/lib/xxhash64.c +index ba6bcf152d6f..481e76fbcf4c 100644 +--- a/xen/lib/xxhash64.c ++++ b/xen/lib/xxhash64.c +@@ -38,11 +38,13 @@ + * - xxHash source repository: https://github.com/Cyan4973/xxHash + */ + ++#ifdef __XEN__ + #include + #include + #include + #include + #include ++#endif + + /*-************************************* + * Macros +-- +2.34.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch 2022-07-13 14:06:46.000000000 +0100 @@ -0,0 +1,50 @@ +Bug-Ubuntu: https://bugs.launchpad.net/bugs/1956166 +Description: Fix FTBFS on armhf/arm64 due to + Patch lp1956166-0001-introduce-unaligned.h.patch builds fine + on x86, but it fails to build from source on armhf and arm64: + . + lzo.c:100:10: fatal error: asm/unaligned.h: No such file or directory + . + The header is only available on x86, but arm + also builds lzo.c in Xen 4.11 ('obj-y' in xen/common/Makefile). + This isn't the case in Xen 4.15 (original release of the patch), + where lzo.c is obj-$(CONFIG_X86)-based. + . + So, make the lzo.c changes in the patch more conditional to x86, + keeping the (local) previous unaligned macro definitions on arm. + . + This keeps the spirit of the upstream patch (which is x86-only), + and is backwards compatible with 4.11 code. 
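Illustrative aside (a paraphrased sketch, not the literal hunk below): the include/macro layout this fix produces in xen/common/lzo.c keeps the asm/unaligned.h path, whose header exists only under asm-x86, guarded by CONFIG_X86, while non-x86 builds (armhf/arm64) retain the plain-dereference fallbacks Xen 4.11 already carried locally:

    #ifdef CONFIG_X86
    # ifdef __XEN__
    #  include <asm/unaligned.h>             /* present on x86 only */
    # else
    #  define get_unaligned_le16(_p)  (*(u16 *)(_p))
    # endif
    #endif

    #ifndef CONFIG_X86                       /* armhf/arm64: keep the old local macros */
    # define get_unaligned(_p)        (*(_p))
    # define put_unaligned(_val, _p)  (*(_p) = (_val))
    # define get_unaligned_le16(_p)   (*(u16 *)(_p))
    # define get_unaligned_le32(_p)   (*(u32 *)(_p))
    #endif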
+Author: Mauricio Faria de Oliveira +Forwarded: not-needed +Last-Update: 2022-07-07 +--- +This patch header follows DEP-3: http://dep.debian.net/deps/dep3/ +Index: xen/xen/common/lzo.c +=================================================================== +--- xen.orig/xen/common/lzo.c ++++ xen/xen/common/lzo.c +@@ -97,12 +97,23 @@ + #ifdef __XEN__ + #include + #include ++#endif ++ ++#ifdef CONFIG_X86 ++#ifdef __XEN__ + #include + #else + #define get_unaligned_le16(_p) (*(u16 *)(_p)) + #endif ++#endif + + #include ++#ifndef CONFIG_X86 ++#define get_unaligned(_p) (*(_p)) ++#define put_unaligned(_val,_p) (*(_p)=_val) ++#define get_unaligned_le16(_p) (*(u16 *)(_p)) ++#define get_unaligned_le32(_p) (*(u32 *)(_p)) ++#endif + + static noinline size_t + lzo1x_1_do_compress(const unsigned char *in, size_t in_len, diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/series xen-4.11.3+24-g14b62ab3e5/debian/patches/series --- xen-4.11.3+24-g14b62ab3e5/debian/patches/series 2020-03-09 14:46:02.000000000 +0000 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/series 2022-07-13 14:06:46.000000000 +0100 @@ -52,3 +52,162 @@ 1000-flags-fcs-protect-none.patch 1001-strip-note-gnu-property.patch +xen-split-parameter-related-definitions-in-own-header-file.patch +xsa312-4.11.patch +xsa313-1.patch +xsa313-2.patch +xsa314-4.13.patch +xsa316-xen.patch +xsa317.patch +xsa318.patch +xsa319.patch +xsa320-4.11-1.patch +xsa320-4.11-2.patch +xsa320-4.11-3.patch +xsa328-4.11-1.patch +xsa328-4.11-2.patch +xsa321-4.11-1.patch +xsa321-4.11-2.patch +xsa321-4.11-3.patch +xsa321-4.11-4.patch +xsa321-4.11-5.patch +xsa321-4.11-6.patch +xsa321-4.11-7.patch +0001-tools-xenstore-allow-removing-child-of-a-node-exceed.patch +0002-tools-xenstore-ignore-transaction-id-for-un-watch.patch +0003-tools-xenstore-fix-node-accounting-after-failed-node.patch +0004-tools-xenstore-simplify-and-rename-check_event_node.patch +0005-tools-xenstore-check-privilege-for-XS_IS_DOMAIN_INTR.patch +0006-tools-xenstore-rework-node-removal.patch +0007-tools-xenstore-fire-watches-only-when-removing-a-spe.patch +0008-tools-xenstore-introduce-node_perms-structure.patch +0009-tools-xenstore-allow-special-watches-for-privileged-.patch +0010-tools-xenstore-avoid-watch-events-for-nodes-without-.patch +0001-tools-ocaml-xenstored-ignore-transaction-id-for-un-w.patch +0002-tools-ocaml-xenstored-check-privilege-for-XS_IS_DOMA.patch +0003-tools-ocaml-xenstored-unify-watch-firing.patch +0004-tools-ocaml-xenstored-introduce-permissions-for-spec.patch +0005-tools-ocaml-xenstored-avoid-watch-events-for-nodes-w.patch +0006-tools-ocaml-xenstored-add-xenstored.conf-flag-to-tur.patch +xsa322-4.12-c.patch +xsa322-4.11-o.patch +xsa323.patch +xsa324.patch +xsa325-4.14.patch +xsa327.patch +xsa330.patch +xsa333.patch +xsa336-4.11.patch +xsa337-4.12-1.patch +xsa337-4.12-2.patch +xsa338.patch +xsa339.patch +xsa340.patch +xsa342-4.13.patch +xsa343-4.11-1.patch +xsa343-4.11-2.patch +xsa343-4.11-3.patch +xsa344-4.11-1.patch +xsa344-4.11-2.patch +0001-x86-mm-Refactor-map_pages_to_xen-to-have-only-a-sing.patch +0002-x86-mm-Refactor-modify_xen_mappings-to-have-one-exit.patch +0003-x86-mm-Prevent-some-races-in-hypervisor-mapping-upda.patch +xsa346-4.11-1.patch +xsa346-4.11-2.patch +xsa347-4.11-1.patch +xsa347-4.11-2.patch +xsa348-4.11.patch +xsa351-arm.patch +xsa351-x86-4.11-1.patch +xsa351-x86-4.11-2.patch +xsa352.patch +xsa353.patch +xsa355.patch +evtchn-fifo-use-stable-fields-when-recording-last-queue-information.patch +xen-evtchn-rework-per-event-channel-lock.patch 
+xen-events-access-last_priority-and-last_vcpu_id-together.patch +fix_event_channel_race.patch +xsa358-4.14.patch +xsa359.patch +xsa364.patch +xsa366-4.11.patch +xsa373-4.11-1.patch +0001-SUPPORT.md-Document-speculative-attacks-status-of-no.patch +x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch +0002-SUPPORT.md-Un-shimmed-32-bit-PV-guests-are-no-longer.patch +xsa373-4.11-2.patch +xsa373-4.11-3.patch +xsa373-4.11-4.patch +xsa373-4.11-5.patch +xsa375-4.12.patch +xsa377-4.11.patch +xsa378-4.11-0a.patch +xsa378-4.11-0b.patch +xsa378-4.11-0c.patch +xsa378-4.11-1.patch +xsa378-4.11-2.patch +xsa378-4.11-3.patch +xsa378-4.11-4.patch +xsa378-4.11-5.patch +AMD-IOMMU-fix-off-by-one-in-amd_iommu_get_paging_mode-callers.patch +xsa378-4.11-6.patch +xsa378-4.11-7.patch +xsa378-4.11-8.patch +xsa379-4.12.patch +xsa380-4.11-1.patch +xsa380-4.11-2.patch +xsa380-3.patch +xsa382.patch +xsa384-4.11.patch +amd-iommu-get-rid-of-pointless-IOMMU_PAGING_MODE_LEVEL_X-definitions.patch +xsa385-4.12.patch +xsa388-4.14-1.patch +xsa388-4.14-2.patch +xsa389-4.12.patch +xsa394-4.12.patch +xsa395-4.14.patch +xsa397-4.12.patch +xsa398-4.12-1-xen-arm-Introduce-new-Arm-processors.patch +xsa398-4.12-2-xen-arm-move-errata-CSV2-check-earlier.patch +xsa398-4.12-3-xen-arm-Add-ECBHB-and-CLEARBHB-ID-fields.patch +xen-arm64-entry-Use-named-label-in-guest_sync.patch +xen-arm-Add-ARCH_WORKAROUND_2-probing.patch +xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch +xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch +xen-arm64-Add-generic-assembly-macros.patch +xen-arm-Simplify-alternative-patching-of-non-writable-region.patch +xen-arm-alternatives-Add-dynamic-patching-feature.patch +xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch +xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch +xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch +xsa398-4.12-6-x86-spec-ctrl-Cease-using-thunk-lfence-on-AMD.patch +xsa399-4.12.patch +xsa400-4.12-00.patch +xsa400-4.12-01.patch +xsa400-4.12-02.patch +xsa400-4.12-03.patch +VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch +xsa400-4.12-04.patch +xsa400-4.12-05.patch +xsa400-4.12-06.patch +xsa400-4.12-07.patch +xsa400-4.12-08.patch +xsa400-4.12-09.patch +xsa400-4.12-10.patch +xsa400-4.12-11.patch +xsa401-4.13-1.patch +xsa401-4.13-2.patch +xsa402-4.13-1.patch +xsa402-4.13-2.patch +xsa402-4.13-3.patch +x86-feature-Generalise-synth-and-introduce-a-bug-word.patch +x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch +xsa402-4.13-4.patch +xsa402-4.13-5.patch +x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch +lp1956166-0001-introduce-unaligned.h.patch +lp1956166-0002-lib-introduce-xxhash.patch +lp1956166-0003-x86-Dom0-support-zstd-compressed-kernels.patch +lp1956166-0004-libxenguest-add-get_unaligned_le32.patch +lp1956166-0005-libxenguest-support-zstd-compressed-kernels.patch +lp1956166-0006-fix-ftbfs-arm-lzo-unaligned.h.patch diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/VT-d-dont-pass-bridge-devices-to-domain_context_mapping_one.patch 2022-06-06 12:26:23.000000000 +0100 @@ -0,0 +1,63 @@ +From 
b9063ce924bb37986762d33a48c174348c38b61a Mon Sep 17 00:00:00 2001 +From: Jan Beulich +Date: Thu, 5 Mar 2020 11:16:46 +0100 +Subject: [PATCH] VT-d: don't pass bridge devices to + domain_context_mapping_one() +MIME-Version: 1.0 +Content-Type: text/plain; charset=utf8 +Content-Transfer-Encoding: 8bit + +When passed a non-NULL pdev, the function does an owner check when it +finds an already existing context mapping. Bridges, however, don't get +passed through to guests, and hence their owner is always going to be +Dom0, leading to the assigment of all but one of the function of multi- +function PCI devices behind bridges to fail. + +Reported-by: Marek Marczykowski-Górecki +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +Reviewed-by: Kevin Tian +master commit: a4d457fd59f4ebfb524aec82cb6a3030087914ca +master date: 2020-01-22 16:39:58 +0100 +--- + xen/drivers/passthrough/vtd/iommu.c | 14 ++++++++++++-- + 1 file changed, 12 insertions(+), 2 deletions(-) + +diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c +index 576e72eba1..77ba8e14a6 100644 +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -1536,18 +1536,28 @@ static int domain_context_mapping(struct domain *domain, u8 devfn, + if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 ) + break; + ++ /* ++ * Mapping a bridge should, if anything, pass the struct pci_dev of ++ * that bridge. Since bridges don't normally get assigned to guests, ++ * their owner would be the wrong one. Pass NULL instead. ++ */ + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pci_get_pdev(seg, bus, devfn)); ++ NULL); + + /* + * Devices behind PCIe-to-PCI/PCIx bridge may generate different + * requester-id. It may originate from devfn=0 on the secondary bus + * behind the bridge. Map that id as well if we didn't already. ++ * ++ * Somewhat similar as for bridges, we don't want to pass a struct ++ * pci_dev here - there may not even exist one for this (secbus,0,0) ++ * tuple. If there is one, without properly working device groups it ++ * may again not have the correct owner. + */ + if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && + (secbus != pdev->bus || pdev->devfn != 0) ) + ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0, +- pci_get_pdev(seg, secbus, 0)); ++ NULL); + + break; + +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-AMD-Fix-handling-of-x87-exception-pointers-on-Fam17h-hardware.patch 2022-06-16 10:27:25.000000000 +0100 @@ -0,0 +1,190 @@ +From d2a95f1c3ef96f47840ab172278293e55c4fc430 Mon Sep 17 00:00:00 2001 +From: Andrew Cooper +Date: Thu, 27 Dec 2018 15:14:01 +0000 +Subject: [PATCH] x86/AMD: Fix handling of x87 exception pointers on Fam17h + hardware + +AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring +FOP/FIP/FDP if an x87 exception isn't pending. This causes an information +leak, CVE-2006-1056, and worked around by several OSes, including Xen. AMD +Fam17h CPUs no longer have this leak, and advertise so in a CPUID bit. 
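Illustrative aside (not part of the upstream patch): the new feature lands in CPUID leaf 0x80000008, EBX bit 2, as reflected in the libxl table change below. A stand-alone sketch of probing such a bit from C; the cpuid() wrapper is local to the example and not a Xen or libxl interface:

    #include <stdbool.h>
    #include <stdint.h>

    static void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                      uint32_t *ecx, uint32_t *edx)
    {
        asm volatile ( "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "0" (leaf), "2" (0) );
    }

    static bool cpu_has_rstr_fp_err_ptrs(void)
    {
        uint32_t eax, ebx, ecx, edx;

        cpuid(0x80000000u, &eax, &ebx, &ecx, &edx);
        if ( eax < 0x80000008u )
            return false;              /* extended leaf not implemented */

        cpuid(0x80000008u, &eax, &ebx, &ecx, &edx);
        return ebx & (1u << 2);        /* rstr-fp-err-ptrs */
    }

On hardware where this reports the bit as absent (pre-Fam17h AMD), the patch sets the new X86_BUG_FPU_PTRS synthetic bit and Xen keeps applying the fnclex/ffree/fildl workaround on the (F)XRSTOR path.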
+ +Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose to all +guests by default. While adjusting libxl's cpuid table, add CLZERO which +looks to have been omitted previously. + +Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it +on AMD hardware where RSTR_FP_ERR_PTRS is not advertised. Optimise the +conditions for the workaround paths. + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +--- + tools/libxl/libxl_cpuid.c | 3 +++ + tools/misc/xen-cpuid.c | 1 + + xen/arch/x86/cpu/amd.c | 7 +++++++ + xen/arch/x86/i387.c | 16 +++++++--------- + xen/arch/x86/xstate.c | 7 +++---- + xen/include/asm-x86/cpufeature.h | 3 +++ + xen/include/asm-x86/cpufeatures.h | 2 ++ + xen/include/public/arch-x86/cpufeatureset.h | 1 + + 8 files changed, 27 insertions(+), 13 deletions(-) + +diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c +index f1c6ce2076..953a3bbd8c 100644 +--- a/tools/libxl/libxl_cpuid.c ++++ b/tools/libxl/libxl_cpuid.c +@@ -246,7 +246,11 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str) + + {"invtsc", 0x80000007, NA, CPUID_REG_EDX, 8, 1}, + ++ {"clzero", 0x80000008, NA, CPUID_REG_EBX, 0, 1}, ++ {"rstr-fp-err-ptrs", 0x80000008, NA, CPUID_REG_EBX, 2, 1}, ++ {"wbnoinvd", 0x80000008, NA, CPUID_REG_EBX, 9, 1}, + {"ibpb", 0x80000008, NA, CPUID_REG_EBX, 12, 1}, ++ + {"nc", 0x80000008, NA, CPUID_REG_ECX, 0, 8}, + {"apicidsize", 0x80000008, NA, CPUID_REG_ECX, 12, 4}, + +diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c +index be6a8d27a5..f51facffb6 100644 +--- a/tools/misc/xen-cpuid.c ++++ b/tools/misc/xen-cpuid.c +@@ -135,7 +135,10 @@ static const char *str_e7d[32] = + static const char *str_e8b[32] = + { + [ 0] = "clzero", ++ [ 2] = "rstr-fp-err-ptrs", + ++ /* [ 8] */ [ 9] = "wbnoinvd", ++ + [12] = "ibpb", + }; + +diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c +index a2f83c79a5..fec2830c6a 100644 +--- a/xen/arch/x86/cpu/amd.c ++++ b/xen/arch/x86/cpu/amd.c +@@ -569,6 +569,13 @@ static void init_amd(struct cpuinfo_x86 *c) + wrmsr_amd_safe(0xc001100d, l, h & ~1); + } + ++ /* ++ * Older AMD CPUs don't save/load FOP/FIP/FDP unless an FPU exception ++ * is pending. Xen works around this at (F)XRSTOR time. ++ */ ++ if (!cpu_has(c, X86_FEATURE_RSTR_FP_ERR_PTRS)) ++ setup_force_cpu_cap(X86_BUG_FPU_PTRS); ++ + /* + * Attempt to set lfence to be Dispatch Serialising. This MSR almost + * certainly isn't virtualised (and Xen at least will leak the real +diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c +index 88178485cb..677f571792 100644 +--- a/xen/arch/x86/i387.c ++++ b/xen/arch/x86/i387.c +@@ -43,20 +43,18 @@ static inline void fpu_fxrstor(struct vcpu *v) + const typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt; + + /* +- * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception ++ * Some CPUs don't save/restore FDP/FIP/FOP unless an exception + * is pending. Clear the x87 state here by setting it to fixed + * values. The hypervisor data segment can be sometimes 0 and + * sometimes new user value. Both should be ok. Use the FPU saved + * data block as a safe address because it should be in L1. 
+ */ +- if ( !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) && +- boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) +- { ++ if ( cpu_bug_fpu_ptrs && ++ !(fpu_ctxt->fsw & ~fpu_ctxt->fcw & 0x003f) ) + asm volatile ( "fnclex\n\t" + "ffree %%st(7)\n\t" /* clear stack tag */ + "fildl %0" /* load to clear state */ + : : "m" (*fpu_ctxt) ); +- } + + /* + * FXRSTOR can fault if passed a corrupted data block. We handle this +@@ -169,11 +167,11 @@ static inline void fpu_fxsave(struct vcpu *v) + : "=m" (*fpu_ctxt) : "R" (fpu_ctxt) ); + + /* +- * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception +- * is pending. ++ * Some CPUs don't save/restore FDP/FIP/FOP unless an exception is ++ * pending. In this case, the restore side will arrange safe values, ++ * and there is no point trying to collect FCS/FDS in addition. + */ +- if ( !(fpu_ctxt->fsw & 0x0080) && +- boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) ++ if ( cpu_bug_fpu_ptrs && !(fpu_ctxt->fsw & 0x0080) ) + return; + + /* +diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c +index 3293ef834f..10016a05d0 100644 +--- a/xen/arch/x86/xstate.c ++++ b/xen/arch/x86/xstate.c +@@ -369,15 +369,14 @@ void xrstor(struct vcpu *v, uint64_t mask) + unsigned int faults, prev_faults; + + /* +- * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception ++ * Some CPUs don't save/restore FDP/FIP/FOP unless an exception + * is pending. Clear the x87 state here by setting it to fixed + * values. The hypervisor data segment can be sometimes 0 and + * sometimes new user value. Both should be ok. Use the FPU saved + * data block as a safe address because it should be in L1. + */ +- if ( (mask & ptr->xsave_hdr.xstate_bv & X86_XCR0_FP) && +- !(ptr->fpu_sse.fsw & ~ptr->fpu_sse.fcw & 0x003f) && +- boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) ++ if ( cpu_bug_fpu_ptrs && ++ !(ptr->fpu_sse.fsw & ~ptr->fpu_sse.fcw & 0x003f) ) + asm volatile ( "fnclex\n\t" /* clear exceptions */ + "ffree %%st(7)\n\t" /* clear stack tag */ + "fildl %0" /* load to clear state */ +diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h +index 7e1ff17ad4..00d22caac7 100644 +--- a/xen/include/asm-x86/cpufeature.h ++++ b/xen/include/asm-x86/cpufeature.h +@@ -117,6 +117,9 @@ + #define cpu_has_no_xpti boot_cpu_has(X86_FEATURE_NO_XPTI) + #define cpu_has_xen_lbr boot_cpu_has(X86_FEATURE_XEN_LBR) + ++/* Bugs. */ ++#define cpu_bug_fpu_ptrs boot_cpu_has(X86_BUG_FPU_PTRS) ++ + enum _cache_type { + CACHE_TYPE_NULL = 0, + CACHE_TYPE_DATA = 1, +diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h +index ab3650f73b..91eccf5161 100644 +--- a/xen/include/asm-x86/cpufeatures.h ++++ b/xen/include/asm-x86/cpufeatures.h +@@ -43,5 +43,7 @@ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */ + #define X86_NR_BUG 1 + #define X86_BUG(x) ((FSCAPINTS + X86_NR_SYNTH) * 32 + (x)) + ++#define X86_BUG_FPU_PTRS X86_BUG( 0) /* (F)X{SAVE,RSTOR} doesn't save/restore FOP/FIP/FDP. */ ++ + /* Total number of capability words, inc synth and bug words. 
*/ + #define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */ +diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h +index f2ec470179..48d8d1f4e2 100644 +--- a/xen/include/public/arch-x86/cpufeatureset.h ++++ b/xen/include/public/arch-x86/cpufeatureset.h +@@ -237,6 +237,8 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF Read Only interface */ + + /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */ + XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */ ++XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */ ++XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */ + XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */ + + /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */ +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-cpu-intel-Clear-cache-self-snoop-capability-in-CPUs-with-known-errata.patch 2022-06-16 22:03:12.000000000 +0100 @@ -0,0 +1,100 @@ +From f2663ca2e5203bfa082b1d6d2721ad369e00426a Mon Sep 17 00:00:00 2001 +From: Ricardo Neri +Date: Fri, 19 Jul 2019 13:50:38 +0200 +Subject: [PATCH] x86/cpu/intel: Clear cache self-snoop capability in CPUs with + known errata + +From: Ricardo Neri + +Processors which have self-snooping capability can handle conflicting +memory type across CPUs by snooping its own cache. However, there exists +CPU models in which having conflicting memory types still leads to +unpredictable behavior, machine check errors, or hangs. + +Clear this feature on affected CPUs to prevent its use. + +Suggested-by: Alan Cox +Signed-off-by: Ricardo Neri +[Linux commit 1e03bff3600101bd9158d005e4313132e55bdec8] + +Strip Yonah - as per ark.intel.com it doesn't look to be 64-bit capable. +Call the new function on the boot CPU only. Don't clear the CPU feature +flag itself, as it is exposed to guests (who could otherwise observe it +disappear after migration). + +Requested-by: Andrew Cooper +Signed-off-by: Jan Beulich +--- + xen/arch/x86/cpu/intel.c | 35 ++++++++++++++++++++++++++++++- + xen/include/asm-x86/cpufeatures.h | 1 + + 2 files changed, 35 insertions(+), 1 deletion(-) + +diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c +index 0dd8f98607..5356a6ae10 100644 +--- a/xen/arch/x86/cpu/intel.c ++++ b/xen/arch/x86/cpu/intel.c +@@ -15,6 +15,36 @@ + + #include "cpu.h" + ++/* ++ * Processors which have self-snooping capability can handle conflicting ++ * memory type across CPUs by snooping its own cache. However, there exists ++ * CPU models in which having conflicting memory types still leads to ++ * unpredictable behavior, machine check errors, or hangs. Clear this ++ * feature to prevent its use on machines with known erratas. 
++ */ ++static void __init check_memory_type_self_snoop_errata(void) ++{ ++ if (!boot_cpu_has(X86_FEATURE_SS)) ++ return; ++ ++ switch (boot_cpu_data.x86_model) { ++ case 0x0f: /* Merom */ ++ case 0x16: /* Merom L */ ++ case 0x17: /* Penryn */ ++ case 0x1d: /* Dunnington */ ++ case 0x1e: /* Nehalem */ ++ case 0x1f: /* Auburndale / Havendale */ ++ case 0x1a: /* Nehalem EP */ ++ case 0x2e: /* Nehalem EX */ ++ case 0x25: /* Westmere */ ++ case 0x2c: /* Westmere EP */ ++ case 0x2a: /* SandyBridge */ ++ return; ++ } ++ ++ setup_force_cpu_cap(X86_FEATURE_XEN_SELFSNOOP); ++} ++ + /* + * Set caps in expected_levelling_cap, probe a specific masking MSR, and set + * caps in levelling_caps if it is found, or clobber the MSR index if missing. +@@ -257,8 +287,11 @@ static void early_init_intel(struct cpuinfo_x86 *c) + (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4)) + paddr_bits = 36; + +- if (c == &boot_cpu_data) ++ if (c == &boot_cpu_data) { ++ check_memory_type_self_snoop_errata(); ++ + intel_init_levelling(); ++ } + + ctxt_switch_levelling(NULL); + } +diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h +index 996f89df9a..57f3e61fd5 100644 +--- a/xen/include/asm-x86/cpufeatures.h ++++ b/xen/include/asm-x86/cpufeatures.h +@@ -38,6 +38,7 @@ XEN_CPUFEATURE(SC_MSR_PV, (FSCAPINTS+0)*32+16) /* MSR_SPEC_CTRL used by Xe + XEN_CPUFEATURE(SC_VERW_PV, X86_SYNTH(23)) /* VERW used by Xen for PV */ + XEN_CPUFEATURE(SC_VERW_HVM, X86_SYNTH(24)) /* VERW used by Xen for HVM */ + XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */ ++XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(26)) /* SELFSNOOP gets used by Xen itself */ + + /* Bug words follow the synthetic words. */ + #define X86_NR_BUG 1 +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-feature-Generalise-synth-and-introduce-a-bug-word.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-feature-Generalise-synth-and-introduce-a-bug-word.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-feature-Generalise-synth-and-introduce-a-bug-word.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-feature-Generalise-synth-and-introduce-a-bug-word.patch 2022-06-16 09:36:03.000000000 +0100 @@ -0,0 +1,96 @@ +From 6408ae3f80287e194cd66218f28edcec939b6fca Mon Sep 17 00:00:00 2001 +From: Andrew Cooper +Date: Thu, 27 Dec 2018 15:13:55 +0000 +Subject: [PATCH] x86/feature: Generalise synth and introduce a bug word + +Future changes are going to want to use cpu_bug_* in a mannor similar to +Linux. Introduce one bug word, and generalise the calculation of +NCAPINTS. + +Signed-off-by: Andrew Cooper +Acked-by: Jan Beulich +--- + xen/include/asm-x86/cpufeatures.h | 67 ++++++++++++++++++------------- + 1 file changed, 38 insertions(+), 29 deletions(-) + +diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h +index 57f3e61fd5..ab3650f73b 100644 +--- a/xen/include/asm-x86/cpufeatures.h ++++ b/xen/include/asm-x86/cpufeatures.h +@@ -4,35 +4,44 @@ + + #include + ++/* Number of capability words covered by the featureset words. */ + #define FSCAPINTS FEATURESET_NR_ENTRIES + +-#define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */ ++/* Synthetic words follow the featureset words. */ ++#define X86_NR_SYNTH 1 ++#define X86_SYNTH(x) (FSCAPINTS * 32 + (x)) + +-/* Other features, Xen-defined mapping. 
*/ +-/* This range is used for feature bits which conflict or are synthesized */ +-XEN_CPUFEATURE(CONSTANT_TSC, (FSCAPINTS+0)*32+ 0) /* TSC ticks at a constant rate */ +-XEN_CPUFEATURE(NONSTOP_TSC, (FSCAPINTS+0)*32+ 1) /* TSC does not stop in C states */ +-XEN_CPUFEATURE(ARAT, (FSCAPINTS+0)*32+ 2) /* Always running APIC timer */ +-XEN_CPUFEATURE(ARCH_PERFMON, (FSCAPINTS+0)*32+ 3) /* Intel Architectural PerfMon */ +-XEN_CPUFEATURE(TSC_RELIABLE, (FSCAPINTS+0)*32+ 4) /* TSC is known to be reliable */ +-XEN_CPUFEATURE(XTOPOLOGY, (FSCAPINTS+0)*32+ 5) /* cpu topology enum extensions */ +-XEN_CPUFEATURE(CPUID_FAULTING, (FSCAPINTS+0)*32+ 6) /* cpuid faulting */ +-XEN_CPUFEATURE(CLFLUSH_MONITOR, (FSCAPINTS+0)*32+ 7) /* clflush reqd with monitor */ +-XEN_CPUFEATURE(APERFMPERF, (FSCAPINTS+0)*32+ 8) /* APERFMPERF */ +-XEN_CPUFEATURE(MFENCE_RDTSC, (FSCAPINTS+0)*32+ 9) /* MFENCE synchronizes RDTSC */ +-XEN_CPUFEATURE(XEN_SMEP, (FSCAPINTS+0)*32+10) /* SMEP gets used by Xen itself */ +-XEN_CPUFEATURE(XEN_SMAP, (FSCAPINTS+0)*32+11) /* SMAP gets used by Xen itself */ +-XEN_CPUFEATURE(LFENCE_DISPATCH, (FSCAPINTS+0)*32+12) /* lfence set as Dispatch Serialising */ +-XEN_CPUFEATURE(IND_THUNK_LFENCE,(FSCAPINTS+0)*32+13) /* Use IND_THUNK_LFENCE */ +-XEN_CPUFEATURE(IND_THUNK_JMP, (FSCAPINTS+0)*32+14) /* Use IND_THUNK_JMP */ +-XEN_CPUFEATURE(XEN_IBPB, (FSCAPINTS+0)*32+15) /* IBRSB || IBPB */ +-XEN_CPUFEATURE(SC_MSR_PV, (FSCAPINTS+0)*32+16) /* MSR_SPEC_CTRL used by Xen for PV */ +-XEN_CPUFEATURE(SC_MSR_HVM, (FSCAPINTS+0)*32+17) /* MSR_SPEC_CTRL used by Xen for HVM */ +-XEN_CPUFEATURE(SC_RSB_PV, (FSCAPINTS+0)*32+18) /* RSB overwrite needed for PV */ +-XEN_CPUFEATURE(SC_RSB_HVM, (FSCAPINTS+0)*32+19) /* RSB overwrite needed for HVM */ +-XEN_CPUFEATURE(NO_XPTI, (FSCAPINTS+0)*32+20) /* XPTI mitigation not in use */ +-XEN_CPUFEATURE(SC_MSR_IDLE, (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */ +-XEN_CPUFEATURE(XEN_LBR, (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */ +-XEN_CPUFEATURE(SC_VERW_PV, (FSCAPINTS+0)*32+23) /* VERW used by Xen for PV */ +-XEN_CPUFEATURE(SC_VERW_HVM, (FSCAPINTS+0)*32+24) /* VERW used by Xen for HVM */ +-XEN_CPUFEATURE(SC_VERW_IDLE, (FSCAPINTS+0)*32+25) /* VERW used by Xen for idle */ ++/* Synthetic features */ ++XEN_CPUFEATURE(CONSTANT_TSC, X86_SYNTH( 0)) /* TSC ticks at a constant rate */ ++XEN_CPUFEATURE(NONSTOP_TSC, X86_SYNTH( 1)) /* TSC does not stop in C states */ ++XEN_CPUFEATURE(ARAT, X86_SYNTH( 2)) /* Always running APIC timer */ ++XEN_CPUFEATURE(ARCH_PERFMON, X86_SYNTH( 3)) /* Intel Architectural PerfMon */ ++XEN_CPUFEATURE(TSC_RELIABLE, X86_SYNTH( 4)) /* TSC is known to be reliable */ ++XEN_CPUFEATURE(XTOPOLOGY, X86_SYNTH( 5)) /* cpu topology enum extensions */ ++XEN_CPUFEATURE(CPUID_FAULTING, X86_SYNTH( 6)) /* cpuid faulting */ ++XEN_CPUFEATURE(CLFLUSH_MONITOR, X86_SYNTH( 7)) /* clflush reqd with monitor */ ++XEN_CPUFEATURE(APERFMPERF, X86_SYNTH( 8)) /* APERFMPERF */ ++XEN_CPUFEATURE(MFENCE_RDTSC, X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */ ++XEN_CPUFEATURE(XEN_SMEP, X86_SYNTH(10)) /* SMEP gets used by Xen itself */ ++XEN_CPUFEATURE(XEN_SMAP, X86_SYNTH(11)) /* SMAP gets used by Xen itself */ ++XEN_CPUFEATURE(LFENCE_DISPATCH, X86_SYNTH(12)) /* lfence set as Dispatch Serialising */ ++XEN_CPUFEATURE(IND_THUNK_LFENCE, X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */ ++XEN_CPUFEATURE(IND_THUNK_JMP, X86_SYNTH(14)) /* Use IND_THUNK_JMP */ ++XEN_CPUFEATURE(XEN_IBPB, X86_SYNTH(15)) /* IBRSB || IBPB */ ++XEN_CPUFEATURE(SC_MSR_PV, X86_SYNTH(16)) /* MSR_SPEC_CTRL used 
by Xen for PV */ ++XEN_CPUFEATURE(SC_MSR_HVM, X86_SYNTH(17)) /* MSR_SPEC_CTRL used by Xen for HVM */ ++XEN_CPUFEATURE(SC_RSB_PV, X86_SYNTH(18)) /* RSB overwrite needed for PV */ ++XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM */ ++XEN_CPUFEATURE(NO_XPTI, X86_SYNTH(20)) /* XPTI mitigation not in use */ ++XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */ ++XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */ ++XEN_CPUFEATURE(SC_VERW_PV, X86_SYNTH(23)) /* VERW used by Xen for PV */ ++XEN_CPUFEATURE(SC_VERW_HVM, X86_SYNTH(24)) /* VERW used by Xen for HVM */ ++XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */ ++ ++/* Bug words follow the synthetic words. */ ++#define X86_NR_BUG 1 ++#define X86_BUG(x) ((FSCAPINTS + X86_NR_SYNTH) * 32 + (x)) ++ ++/* Total number of capability words, inc synth and bug words. */ ++#define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */ +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/x86-pv-Options-to-disable-and-or-compile-out-32bit-PV-support.patch 2022-06-16 16:13:08.000000000 +0100 @@ -0,0 +1,207 @@ +From 68d757df8dd23b88bebfb6a56c9f51df59de969f Mon Sep 17 00:00:00 2001 +From: Andrew Cooper +Date: Fri, 17 Apr 2020 12:39:40 +0100 +Subject: [PATCH] x86/pv: Options to disable and/or compile out 32bit PV + support +MIME-Version: 1.0 +Content-Type: text/plain; charset=utf8 +Content-Transfer-Encoding: 8bit + +This is the start of some performance and security-hardening improvements, +based on the fact that 32bit PV guests are few and far between these days. + +Ring1 is full of architectural corner cases, such as counting as supervisor +from a paging point of view. This accounts for a substantial performance hit +on processors from the last 8 years (adjusting SMEP/SMAP on every privilege +transition), and the gap is only going to get bigger with new hardware +features. + +Signed-off-by: Andrew Cooper +Reviewed-by: Wei Liu +Reviewed-by: Roger Pau Monné +Acked-by: Jan Beulich +--- + docs/misc/xen-command-line.markdown | 12 ++++++++++- + xen/arch/x86/Kconfig | 16 +++++++++++++++ + xen/arch/x86/pv/domain.c | 34 +++++++++++++++++++++++++++++++ + xen/arch/x86/setup.c | 9 ++++++-- + xen/include/asm-x86/pv/domain.h | 6 ++++++ + xen/include/xen/param.h | 9 ++++++++ + 6 files changed, 83 insertions(+), 3 deletions(-) + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index acd0b3d994..ee12b0f53f 100644 +--- a/docs/misc/xen-command-line.markdown ++++ b/docs/misc/xen-command-line.markdown +@@ -1592,7 +1592,17 @@ The following resources are available: + CDP, one COS will corespond two CBMs other than one with CAT, due to the + sum of CBMs is fixed, that means actual `cos_max` in use will automatically + reduce to half when CDP is enabled. +- ++ ++### pv ++ = List of [ 32= ] ++ ++ Applicability: x86 ++ ++Controls for aspects of PV guest support. ++ ++* The `32` boolean controls whether 32bit PV guests can be created. It ++ defaults to `true`, and is ignored when `CONFIG_PV32` is compiled out. 
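Illustrative aside on how the two knobs combine, assuming Xen's usual boolean command-line syntax (the GRUB fragment is an example, not taken from the patch): with CONFIG_PV32=y, creation of 32-bit PV guests can still be switched off for a given boot, e.g.

    multiboot2 /boot/xen.gz dom0_mem=4G,max:4G pv=no-32
    # equivalently: pv=32=false

With CONFIG_PV32 compiled out, the `32` setting is parsed but ignored, and no_config_param() logs an informational "CONFIG_PV32 disabled - ignoring ..." message instead.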
++ + ### pv-linear-pt (x86) + > `= ` + +diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig +index a69be983d6..96432f1f69 100644 +--- a/xen/arch/x86/Kconfig ++++ b/xen/arch/x86/Kconfig +@@ -37,6 +37,22 @@ config PV + config PV + def_bool y + ++config PV32 ++ bool "Support for 32bit PV guests" ++ depends on PV ++ default y ++ ---help--- ++ The 32bit PV ABI uses Ring1, an area of the x86 architecture which ++ was deprecated and mostly removed in the AMD64 spec. As a result, ++ it occasionally conflicts with newer x86 hardware features, causing ++ overheads for Xen to maintain backwards compatibility. ++ ++ People may wish to disable 32bit PV guests for attack surface ++ reduction, or performance reasons. Backwards compatibility can be ++ provided via the PV Shim mechanism. ++ ++ If unsure, say Y. ++ + config PV_LINEAR_PT + bool "Support for PV linear pagetables" + depends on PV +diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c +index 43da5c179f..3579dc063e 100644 +--- a/xen/arch/x86/pv/domain.c ++++ b/xen/arch/x86/pv/domain.c +@@ -16,6 +16,38 @@ + #include + #include + ++#ifdef CONFIG_PV32 ++int8_t __read_mostly opt_pv32 = -1; ++#endif ++ ++static __init int parse_pv(const char *s) ++{ ++ const char *ss; ++ int val, rc = 0; ++ ++ do { ++ ss = strchr(s, ','); ++ if ( !ss ) ++ ss = strchr(s, '\0'); ++ ++ if ( (val = parse_boolean("32", s, ss)) >= 0 ) ++ { ++#ifdef CONFIG_PV32 ++ opt_pv32 = val; ++#else ++ no_config_param("PV32", "pv", s, ss); ++#endif ++ } ++ else ++ rc = -EINVAL; ++ ++ s = ss + 1; ++ } while ( *ss ); ++ ++ return rc; ++} ++custom_param("pv", parse_pv); ++ + static __read_mostly enum { + PCID_OFF, + PCID_ALL, +@@ -161,6 +193,8 @@ int switch_compat(struct domain *d) + + BUILD_BUG_ON(offsetof(struct shared_info, vcpu_info) != 0); + ++ if ( !opt_pv32 ) ++ return -EOPNOTSUPP; + if ( is_hvm_domain(d) || d->tot_pages != 0 ) + return -EACCES; + if ( is_pv_32bit_domain(d) ) +diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c +index eb56d78c2f..9e9576344c 100644 +--- a/xen/arch/x86/setup.c ++++ b/xen/arch/x86/setup.c +@@ -53,6 +53,7 @@ + #include + #include + #include ++#include + + /* opt_nosmp: If true, secondary processors are ignored. 
*/ + static bool __initdata opt_nosmp; +@@ -1807,8 +1808,12 @@ void arch_get_xen_caps(xen_capabilities_info_t *info) + + snprintf(s, sizeof(s), "xen-%d.%d-x86_64 ", major, minor); + safe_strcat(*info, s); +- snprintf(s, sizeof(s), "xen-%d.%d-x86_32p ", major, minor); +- safe_strcat(*info, s); ++ ++ if ( opt_pv32 ) ++ { ++ snprintf(s, sizeof(s), "xen-%d.%d-x86_32p ", major, minor); ++ safe_strcat(*info, s); ++ } + if ( hvm_enabled ) + { + snprintf(s, sizeof(s), "hvm-%d.%d-x86_32 ", major, minor); +diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h +index 7a69bfb303..df9716ff26 100644 +--- a/xen/include/asm-x86/pv/domain.h ++++ b/xen/include/asm-x86/pv/domain.h +@@ -21,6 +21,12 @@ + #ifndef __X86_PV_DOMAIN_H__ + #define __X86_PV_DOMAIN_H__ + ++#ifdef CONFIG_PV32 ++extern int8_t opt_pv32; ++#else ++# define opt_pv32 false ++#endif ++ + /* + * PCID values for the address spaces of 64-bit pv domains: + * +diff --git a/xen/include/xen/param.h b/xen/include/xen/param.h +index d4578cd27f..a1dc3ba8f0 100644 +--- a/xen/include/xen/param.h ++++ b/xen/include/xen/param.h +@@ -2,6 +2,8 @@ + #define _XEN_PARAM_H + + #include ++#include ++#include + + /* + * Used for kernel command line parameter setup +@@ -116,4 +118,13 @@ extern const struct kernel_param __param_start[], __param_end[]; + string_param(_name, _var); \ + string_runtime_only_param(_name, _var) + ++static inline void no_config_param(const char *cfg, const char *param, ++ const char *s, const char *e) ++{ ++ int len = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s); ++ ++ printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%*s' setting\n", ++ cfg, param, len, s); ++} ++ + #endif /* _XEN_PARAM_H */ +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Add-generic-assembly-macros.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Add-generic-assembly-macros.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Add-generic-assembly-macros.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Add-generic-assembly-macros.patch 2022-06-19 23:08:45.000000000 +0100 @@ -0,0 +1,66 @@ +From bb2e9fc7df592753e1fd73b4fec21c375cd3e2e1 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:39 +0100 +Subject: [PATCH] xen/arm64: Add generic assembly macros + +Add assembly macros to simplify assembly code: + - adr_cpu_info: Get the address to the current cpu_info structure + - ldr_this_cpu: Load a per-cpu value + +This is part of XSA-263. 
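A rough C analogue of what the new adr_cpu_info macro computes, given as an illustration only (the assembly added below is authoritative; this assumes the usual Xen/arm layout of a STACK_SIZE-aligned stack with struct cpu_info at its top):

    /* Round sp up to the top of the current stack, then step back over the
     * cpu_info structure that lives there. */
    static inline struct cpu_info *cpu_info_from_sp_sketch(unsigned long sp)
    {
        unsigned long stack_top = (sp + STACK_SIZE) & ~(STACK_SIZE - 1);

        return (struct cpu_info *)(stack_top - sizeof(struct cpu_info));
    }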
+ +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/include/asm-arm/arm64/macros.h | 25 +++++++++++++++++++++++++ + xen/include/asm-arm/macros.h | 2 +- + 2 files changed, 26 insertions(+), 1 deletion(-) + create mode 100644 xen/include/asm-arm/arm64/macros.h + +diff --git a/xen/include/asm-arm/arm64/macros.h b/xen/include/asm-arm/arm64/macros.h +new file mode 100644 +index 0000000000..9c5e676b37 +--- /dev/null ++++ b/xen/include/asm-arm/arm64/macros.h +@@ -0,0 +1,25 @@ ++#ifndef __ASM_ARM_ARM64_MACROS_H ++#define __ASM_ARM_ARM64_MACROS_H ++ ++ /* ++ * @dst: Result of get_cpu_info() ++ */ ++ .macro adr_cpu_info, dst ++ add \dst, sp, #STACK_SIZE ++ and \dst, \dst, #~(STACK_SIZE - 1) ++ sub \dst, \dst, #CPUINFO_sizeof ++ .endm ++ ++ /* ++ * @dst: Result of READ_ONCE(per_cpu(sym, smp_processor_id())) ++ * @sym: The name of the per-cpu variable ++ * @tmp: scratch register ++ */ ++ .macro ldr_this_cpu, dst, sym, tmp ++ ldr \dst, =per_cpu__\sym ++ mrs \tmp, tpidr_el2 ++ ldr \dst, [\dst, \tmp] ++ .endm ++ ++#endif /* __ASM_ARM_ARM64_MACROS_H */ ++ +diff --git a/xen/include/asm-arm/macros.h b/xen/include/asm-arm/macros.h +index 5d837cb38b..1d4bb41d15 100644 +--- a/xen/include/asm-arm/macros.h ++++ b/xen/include/asm-arm/macros.h +@@ -8,7 +8,7 @@ + #if defined (CONFIG_ARM_32) + # include + #elif defined(CONFIG_ARM_64) +-/* No specific ARM64 macros for now */ ++# include + #else + # error "unknown ARM variant" + #endif +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-entry-Use-named-label-in-guest_sync.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-entry-Use-named-label-in-guest_sync.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-entry-Use-named-label-in-guest_sync.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-entry-Use-named-label-in-guest_sync.patch 2022-06-05 21:39:43.000000000 +0100 @@ -0,0 +1,54 @@ +From beb8ae4e767f8fe1d982127ba9049c5f3b2bd5b6 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:32 +0100 +Subject: [PATCH] xen/arm64: entry: Use named label in guest_sync + +This will improve readability for future changes. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/arch/arm/arm64/entry.S | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S +index ffa9a1c492..e2344e565f 100644 +--- a/xen/arch/arm/arm64/entry.S ++++ b/xen/arch/arm/arm64/entry.S +@@ -266,11 +266,11 @@ guest_sync: + mrs x1, esr_el2 + lsr x1, x1, #HSR_EC_SHIFT /* x1 = ESR_EL2.EC */ + cmp x1, #HSR_EC_HVC64 +- b.ne 1f /* Not a HVC skip fastpath. */ ++ b.ne guest_sync_slowpath /* Not a HVC skip fastpath. */ + + mrs x1, esr_el2 + and x1, x1, #0xffff /* Check the immediate [0:16] */ +- cbnz x1, 1f /* should be 0 for HVC #0 */ ++ cbnz x1, guest_sync_slowpath /* should be 0 for HVC #0 */ + + /* + * Fastest path possible for ARM_SMCCC_ARCH_WORKAROUND_1. +@@ -281,7 +281,7 @@ guest_sync: + * be encoded as an immediate for cmp. + */ + eor w0, w0, #ARM_SMCCC_ARCH_WORKAROUND_1_FID +- cbnz w0, 1f ++ cbnz w0, guest_sync_slowpath + + /* + * Clobber both x0 and x1 to prevent leakage. Note that thanks +@@ -291,7 +291,7 @@ guest_sync: + eret + sb + +-1: ++guest_sync_slowpath: + /* + * x0/x1 may have been scratch by the fast path above, so avoid + * to save them. 
+-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm64-Implement-a-fast-path-for-handling-SMCCC_ARCH_WORKAROUND_2.patch 2022-06-05 21:45:55.000000000 +0100 @@ -0,0 +1,153 @@ +From 6dec2c87c4d7b2f03806266c5ceff82b69792a17 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:40 +0100 +Subject: [PATCH] xen/arm64: Implement a fast path for handling + SMCCC_ARCH_WORKAROUND_2 + +The function ARM_SMCCC_ARCH_WORKAROUND_2 will be called by the guest for +enabling/disabling the ssbd mitigation. So we want the handling to +be as fast as possible. + +The new sequence will forward guest's ARCH_WORKAROUND_2 call to EL3 and +also track the state of the workaround per-vCPU. + +Note that since we need to execute branches, this always executes after +the spectre-v2 mitigation. + +This code is based on KVM counterpart "arm64: KVM: Handle guest's +ARCH_WORKAROUND_2 requests" written by Marc Zyngier. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/arch/arm/arm64/asm-offsets.c | 2 ++ + xen/arch/arm/arm64/entry.S | 42 +++++++++++++++++++++++++++++++- + xen/arch/arm/cpuerrata.c | 18 ++++++++++++++ + 3 files changed, 61 insertions(+), 1 deletion(-) + +diff --git a/xen/arch/arm/arm64/asm-offsets.c b/xen/arch/arm/arm64/asm-offsets.c +index ce24e44473..f5c696d092 100644 +--- a/xen/arch/arm/arm64/asm-offsets.c ++++ b/xen/arch/arm/arm64/asm-offsets.c +@@ -22,6 +22,7 @@ + void __dummy__(void) + { + OFFSET(UREGS_X0, struct cpu_user_regs, x0); ++ OFFSET(UREGS_X1, struct cpu_user_regs, x1); + OFFSET(UREGS_LR, struct cpu_user_regs, lr); + + OFFSET(UREGS_SP, struct cpu_user_regs, sp); +@@ -45,6 +46,7 @@ void __dummy__(void) + BLANK(); + + DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info)); ++ OFFSET(CPUINFO_flags, struct cpu_info, flags); + + OFFSET(VCPU_arch_saved_context, struct vcpu, arch.saved_context); + +diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S +index e2344e565f..97b05f53ea 100644 +--- a/xen/arch/arm/arm64/entry.S ++++ b/xen/arch/arm/arm64/entry.S +@@ -1,4 +1,6 @@ + #include ++#include ++#include + #include + #include + #include +@@ -281,7 +283,7 @@ guest_sync: + * be encoded as an immediate for cmp. + */ + eor w0, w0, #ARM_SMCCC_ARCH_WORKAROUND_1_FID +- cbnz w0, guest_sync_slowpath ++ cbnz w0, check_wa2 + + /* + * Clobber both x0 and x1 to prevent leakage. Note that thanks +@@ -291,6 +293,44 @@ guest_sync: + eret + sb + ++check_wa2: ++ /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ ++ eor w0, w0, #(ARM_SMCCC_ARCH_WORKAROUND_1_FID ^ ARM_SMCCC_ARCH_WORKAROUND_2_FID) ++ cbnz w0, guest_sync_slowpath ++#ifdef CONFIG_ARM_SSBD ++alternative_cb arm_enable_wa2_handling ++ b wa2_end ++alternative_cb_end ++ /* Sanitize the argument */ ++ mov x0, #-(UREGS_kernel_sizeof - UREGS_X1) /* x0 := offset of guest's x1 on the stack */ ++ ldr x1, [sp, x0] /* Load guest's x1 */ ++ cmp w1, wzr ++ cset x1, ne ++ ++ /* ++ * Update the guest flag. At this stage sp point after the field ++ * guest_cpu_user_regs in cpu_info. 
++ */ ++ adr_cpu_info x2 ++ ldr x0, [x2, #CPUINFO_flags] ++ bfi x0, x1, #CPUINFO_WORKAROUND_2_FLAG_SHIFT, #1 ++ str x0, [x2, #CPUINFO_flags] ++ ++ /* Check that we actually need to perform the call */ ++ ldr_this_cpu x0, ssbd_callback_required, x2 ++ cbz x0, wa2_end ++ ++ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2_FID ++ smc #0 ++ ++wa2_end: ++ /* Don't leak data from the SMC call */ ++ mov x1, xzr ++ mov x2, xzr ++ mov x3, xzr ++#endif /* !CONFIG_ARM_SSBD */ ++ mov x0, xzr ++ eret + guest_sync_slowpath: + /* + * x0/x1 may have been scratch by the fast path above, so avoid +diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index 1e642c416a..97a118293b 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -9,6 +9,7 @@ + #include + #include + #include ++#include + #include + + /* Override macros from asm/page.h to make them work with mfn_t */ +@@ -274,6 +275,23 @@ static int __init parse_spec_ctrl(const char *s) + } + custom_param("spec-ctrl", parse_spec_ctrl); + ++/* Arm64 only for now as for Arm32 the workaround is currently handled in C. */ ++#ifdef CONFIG_ARM_64 ++void __init arm_enable_wa2_handling(const struct alt_instr *alt, ++ const uint32_t *origptr, ++ uint32_t *updptr, int nr_inst) ++{ ++ BUG_ON(nr_inst != 1); ++ ++ /* ++ * Only allow mitigation on guest ARCH_WORKAROUND_2 if the SSBD ++ * state allow it to be flipped. ++ */ ++ if ( get_ssbd_state() == ARM_SSBD_RUNTIME ) ++ *updptr = aarch64_insn_gen_nop(); ++} ++#endif ++ + /* + * Assembly code may use the variable directly, so we need to make sure + * it fits in a register. +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-probing.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-probing.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-probing.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-probing.patch 2022-06-05 21:24:39.000000000 +0100 @@ -0,0 +1,192 @@ +From 280997891e8ca583f1b7a43297e197c0e4be8f0c Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:34 +0100 +Subject: [PATCH 1/1] xen/arm: Add ARCH_WORKAROUND_2 probing + +As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery +mechanism for detecting the SSBD mitigation. + +A new capability is also allocated for that purpose, and a config +option. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/arch/arm/Kconfig | 10 ++++++ + xen/arch/arm/cpuerrata.c | 58 ++++++++++++++++++++++++++++++++ + xen/include/asm-arm/cpuerrata.h | 21 ++++++++++++ + xen/include/asm-arm/cpufeature.h | 3 +- + xen/include/asm-arm/smccc.h | 7 ++++ + 5 files changed, 98 insertions(+), 1 deletion(-) + +diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig +index 4dc7ef5351..2cbe9dd43b 100644 +--- a/xen/arch/arm/Kconfig ++++ b/xen/arch/arm/Kconfig +@@ -73,6 +73,16 @@ config SBSA_VUART_CONSOLE + Allows a guest to use SBSA Generic UART as a console. The + SBSA Generic UART implements a subset of ARM PL011 UART. + ++config ARM_SSBD ++ bool "Speculative Store Bypass Disable" if EXPERT = "y" ++ depends on HAS_ALTERNATIVE ++ default y ++ help ++ This enables mitigation of bypassing of previous stores by speculative ++ loads. ++ ++ If unsure, say Y. 
++ + endmenu + + menu "ARM errata workaround via the alternative framework" +diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index b829d226ef..03f78fec96 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -331,6 +331,58 @@ static int enable_ic_inv_hardening(void *data) + + #endif + ++#ifdef CONFIG_ARM_SSBD ++ ++/* ++ * Assembly code may use the variable directly, so we need to make sure ++ * it fits in a register. ++ */ ++DEFINE_PER_CPU_READ_MOSTLY(register_t, ssbd_callback_required); ++ ++static bool has_ssbd_mitigation(const struct arm_cpu_capabilities *entry) ++{ ++ struct arm_smccc_res res; ++ bool required; ++ ++ if ( smccc_ver < SMCCC_VERSION(1, 1) ) ++ return false; ++ ++ /* ++ * The probe function return value is either negative (unsupported ++ * or mitigated), positive (unaffected), or zero (requires ++ * mitigation). We only need to do anything in the last case. ++ */ ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FID, ++ ARM_SMCCC_ARCH_WORKAROUND_2_FID, &res); ++ ++ switch ( (int)res.a0 ) ++ { ++ case ARM_SMCCC_NOT_SUPPORTED: ++ return false; ++ ++ case ARM_SMCCC_NOT_REQUIRED: ++ return false; ++ ++ case ARM_SMCCC_SUCCESS: ++ required = true; ++ break; ++ ++ case 1: /* Mitigation not required on this CPU. */ ++ required = false; ++ break; ++ ++ default: ++ ASSERT_UNREACHABLE(); ++ return false; ++ } ++ ++ if ( required ) ++ this_cpu(ssbd_callback_required) = 1; ++ ++ return required; ++} ++#endif ++ + #define MIDR_RANGE(model, min, max) \ + .matches = is_affected_midr_range, \ + .midr_model = model, \ +@@ -489,6 +541,12 @@ static const struct arm_cpu_capabilities arm_errata[] = { + MIDR_ALL_VERSIONS(MIDR_CORTEX_A15), + .enable = enable_ic_inv_hardening, + }, ++#endif ++#ifdef CONFIG_ARM_SSBD ++ { ++ .capability = ARM_SSBD, ++ .matches = has_ssbd_mitigation, ++ }, + #endif + {}, + }; +diff --git a/xen/include/asm-arm/cpuerrata.h b/xen/include/asm-arm/cpuerrata.h +index 4e45b237c8..e628d3ff56 100644 +--- a/xen/include/asm-arm/cpuerrata.h ++++ b/xen/include/asm-arm/cpuerrata.h +@@ -27,9 +27,30 @@ static inline bool check_workaround_##erratum(void) \ + + CHECK_WORKAROUND_HELPER(766422, ARM32_WORKAROUND_766422, CONFIG_ARM_32) + CHECK_WORKAROUND_HELPER(834220, ARM64_WORKAROUND_834220, CONFIG_ARM_64) ++CHECK_WORKAROUND_HELPER(ssbd, ARM_SSBD, CONFIG_ARM_SSBD) + + #undef CHECK_WORKAROUND_HELPER + ++#ifdef CONFIG_ARM_SSBD ++ ++#include ++ ++DECLARE_PER_CPU(register_t, ssbd_callback_required); ++ ++static inline bool cpu_require_ssbd_mitigation(void) ++{ ++ return this_cpu(ssbd_callback_required); ++} ++ ++#else ++ ++static inline bool cpu_require_ssbd_mitigation(void) ++{ ++ return false; ++} ++ ++#endif ++ + #endif /* __ARM_CPUERRATA_H__ */ + /* + * Local variables: +diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h +index c5d046218b..3de6b54301 100644 +--- a/xen/include/asm-arm/cpufeature.h ++++ b/xen/include/asm-arm/cpufeature.h +@@ -43,8 +43,9 @@ + #define SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT 5 + #define SKIP_CTXT_SWITCH_SERROR_SYNC 6 + #define ARM_HARDEN_BRANCH_PREDICTOR 7 ++#define ARM_SSBD 8 + +-#define ARM_NCAPS 8 ++#define ARM_NCAPS 9 + + #ifndef __ASSEMBLY__ + +diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h +index 8342cc33fe..a6804cec99 100644 +--- a/xen/include/asm-arm/smccc.h ++++ b/xen/include/asm-arm/smccc.h +@@ -258,7 +258,14 @@ struct arm_smccc_res { + ARM_SMCCC_OWNER_ARCH, \ + 0x8000) + ++#define ARM_SMCCC_ARCH_WORKAROUND_2_FID \ ++ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ ++ 
ARM_SMCCC_CONV_32, \ ++ ARM_SMCCC_OWNER_ARCH, \ ++ 0x7FFF) ++ + /* SMCCC error codes */ ++#define ARM_SMCCC_NOT_REQUIRED (-2) + #define ARM_SMCCC_ERR_UNKNOWN_FUNCTION (-1) + #define ARM_SMCCC_NOT_SUPPORTED (-1) + #define ARM_SMCCC_SUCCESS (0) +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-ARCH_WORKAROUND_2-support-for-guests.patch 2022-06-05 21:20:56.000000000 +0100 @@ -0,0 +1,187 @@ +From a7898e4c593f83cc5db419d99bdecc0b220bf4e3 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:36 +0100 +Subject: [PATCH] xen/arm: Add ARCH_WORKAROUND_2 support for guests + +In order to offer ARCH_WORKAROUND_2 support to guests, we need to track the +state of the workaround per-vCPU. The field 'pad' in cpu_info is now +repurposed to store flags easily accessible in assembly. + +As the hypervisor will always run with the workaround enabled, we may +need to enable (on guest exit) or disable (on guest entry) the +workaround. + +A follow-up patch will add fastpath for the workaround for arm64 guests. + +Note that check_workaround_ssbd() is used instead of ssbd_get_state() +because the former is implemented using an alternative. Thefore the code +will be shortcut on affected platform. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/arch/arm/domain.c | 8 ++++++++ + xen/arch/arm/traps.c | 20 +++++++++++++++++++ + xen/arch/arm/vsmc.c | 37 +++++++++++++++++++++++++++++++++++ + xen/include/asm-arm/current.h | 6 +++++- + 4 files changed, 70 insertions(+), 1 deletion(-) + +diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c +index 5a2a9a6b83..4baecc2447 100644 +--- a/xen/arch/arm/domain.c ++++ b/xen/arch/arm/domain.c +@@ -21,6 +21,7 @@ + #include + + #include ++#include + #include + #include + #include +@@ -572,6 +573,13 @@ int vcpu_initialise(struct vcpu *v) + if ( (rc = vcpu_vtimer_init(v)) != 0 ) + goto fail; + ++ /* ++ * The workaround 2 (i.e SSBD mitigation) is enabled by default if ++ * supported. ++ */ ++ if ( get_ssbd_state() == ARM_SSBD_RUNTIME ) ++ v->arch.cpu_info->flags |= CPUINFO_WORKAROUND_2_FLAG; ++ + return rc; + + fail: +diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c +index d71adfa745..e47ec8aad5 100644 +--- a/xen/arch/arm/traps.c ++++ b/xen/arch/arm/traps.c +@@ -2021,10 +2021,23 @@ inject_abt: + inject_iabt_exception(regs, gva, hsr.len); + } + ++static inline bool needs_ssbd_flip(struct vcpu *v) ++{ ++ if ( !check_workaround_ssbd() ) ++ return false; ++ ++ return !(v->arch.cpu_info->flags & CPUINFO_WORKAROUND_2_FLAG) && ++ cpu_require_ssbd_mitigation(); ++} ++ + static void enter_hypervisor_head(struct cpu_user_regs *regs) + { + if ( guest_mode(regs) ) + { ++ /* If the guest has disabled the workaround, bring it back on. */ ++ if ( needs_ssbd_flip(current) ) ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL); ++ + /* + * If we pended a virtual abort, preserve it until it gets cleared. + * See ARM ARM DDI 0487A.j D1.14.3 (Virtual Interrupts) for details, +@@ -2270,6 +2283,13 @@ void leave_hypervisor_tail(void) + */ + SYNCHRONIZE_SERROR(SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT); + ++ /* ++ * The hypervisor runs with the workaround always present. 
++ * If the guest wants it disabled, so be it... ++ */ ++ if ( needs_ssbd_flip(current) ) ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 0, NULL); ++ + return; + } + local_irq_enable(); +diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c +index 40a80d5760..c4ccae6030 100644 +--- a/xen/arch/arm/vsmc.c ++++ b/xen/arch/arm/vsmc.c +@@ -18,6 +18,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -104,6 +105,23 @@ static bool handle_arch(struct cpu_user_regs *regs) + if ( cpus_have_cap(ARM_HARDEN_BRANCH_PREDICTOR) ) + ret = 0; + break; ++ case ARM_SMCCC_ARCH_WORKAROUND_2_FID: ++ switch ( get_ssbd_state() ) ++ { ++ case ARM_SSBD_UNKNOWN: ++ case ARM_SSBD_FORCE_DISABLE: ++ break; ++ ++ case ARM_SSBD_RUNTIME: ++ ret = ARM_SMCCC_SUCCESS; ++ break; ++ ++ case ARM_SSBD_FORCE_ENABLE: ++ case ARM_SSBD_MITIGATED: ++ ret = ARM_SMCCC_NOT_REQUIRED; ++ break; ++ } ++ break; + } + + set_user_reg(regs, 0, ret); +@@ -114,6 +132,25 @@ static bool handle_arch(struct cpu_user_regs *regs) + case ARM_SMCCC_ARCH_WORKAROUND_1_FID: + /* No return value */ + return true; ++ ++ case ARM_SMCCC_ARCH_WORKAROUND_2_FID: ++ { ++ bool enable = (uint32_t)get_user_reg(regs, 1); ++ ++ /* ++ * ARM_WORKAROUND_2_FID should only be called when mitigation ++ * state can be changed at runtime. ++ */ ++ if ( unlikely(get_ssbd_state() != ARM_SSBD_RUNTIME) ) ++ return true; ++ ++ if ( enable ) ++ get_cpu_info()->flags |= CPUINFO_WORKAROUND_2_FLAG; ++ else ++ get_cpu_info()->flags &= ~CPUINFO_WORKAROUND_2_FLAG; ++ ++ return true; ++ } + } + + return false; +diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h +index 7a0971fdea..f9819b34fc 100644 +--- a/xen/include/asm-arm/current.h ++++ b/xen/include/asm-arm/current.h +@@ -7,6 +7,10 @@ + #include + #include + ++/* Tell whether the guest vCPU enabled Workaround 2 (i.e variant 4) */ ++#define CPUINFO_WORKAROUND_2_FLAG_SHIFT 0 ++#define CPUINFO_WORKAROUND_2_FLAG (_AC(1, U) << CPUINFO_WORKAROUND_2_FLAG_SHIFT) ++ + #ifndef __ASSEMBLY__ + + struct vcpu; +@@ -21,7 +25,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu); + struct cpu_info { + struct cpu_user_regs guest_cpu_user_regs; + unsigned long elr; +- unsigned int pad; ++ uint32_t flags; + }; + + static inline struct cpu_info *get_cpu_info(void) +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Add-command-line-option-to-control-SSBD-mitigation.patch 2022-06-05 21:56:15.000000000 +0100 @@ -0,0 +1,246 @@ +From 07182e7d490aa6318a9d33706d8b40cbdb62e51d Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:35 +0100 +Subject: [PATCH] xen/arm: Add command line option to control SSBD mitigation + +On a system where the firmware implements ARCH_WORKAROUND_2, it may be +useful to either permanently enable or disable the workaround for cases +where the user decides that they'd rather not get a trap overhead, and +keep the mitigation permanently on or off instead of switching it on +exception entry/exit. In any case, default to mitigation being enabled. 
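As a usage illustration, with values taken from the option documentation added later in this patch: an administrator who would rather avoid the entry/exit trap overhead can keep the mitigation permanently on by booting Xen with

    spec-ctrl=ssbd=force-enable

while ssbd=force-disable keeps it permanently off and the default, ssbd=runtime, lets each guest flip the mitigation for itself through ARCH_WORKAROUND_2.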
+ +The new command line option is implemented as list of one option to +follow x86 option and also allow to extend it more easily in the future. + +Note that for convenience, the full implemention of the workaround is +done in the .matches callback. + +Lastly, a accessor is provided to know the state of the mitigation. + +After this patch, there are 3 methods complementing each other to find the +state of the mitigation: + - The capability ARM_SSBD indicates the platform is affected by the + vulnerability. This will also return false if the user decide to force + disabled the mitigation (spec-ctrl="ssbd=force-disable"). The + capability is useful for putting shortcut in place using alternative. + - ssbd_state indicates the global state of the mitigation (e.g + unknown, force enable...). The global state is required to report + the state to a guest. + - The per-cpu ssbd_callback_required indicates whether a pCPU + requires to call the SMC. This allows to shortcut SMC call + and save an entry/exit to EL3. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + docs/misc/xen-command-line.markdown | 18 ++++++ + xen/arch/arm/cpuerrata.c | 88 ++++++++++++++++++++++++++--- + xen/include/asm-arm/cpuerrata.h | 21 +++++++ + 3 files changed, 120 insertions(+), 7 deletions(-) + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 8712a833a2..962028b6ed 100644 +--- a/docs/misc/xen-command-line.markdown ++++ b/docs/misc/xen-command-line.markdown +@@ -1756,6 +1756,24 @@ enforces the maximum theoretically necessary timeout of 670ms. Any number + is being interpreted as a custom timeout in milliseconds. Zero or boolean + false disable the quirk workaround, which is also the default. + ++### spec-ctrl (Arm) ++> `= List of [ ssbd=force-disable|runtime|force-enable ]` ++ ++Controls for speculative execution sidechannel mitigations. ++ ++The option `ssbd=` is used to control the state of Speculative Store ++Bypass Disable (SSBD) mitigation. ++ ++* `ssbd=force-disable` will keep the mitigation permanently off. The guest ++will not be able to control the state of the mitigation. ++* `ssbd=runtime` will always turn on the mitigation when running in the ++hypervisor context. The guest will be to turn on/off the mitigation for ++itself by using the firmware interface ARCH\_WORKAROUND\_2. ++* `ssbd=force-enable` will keep the mitigation permanently on. The guest will ++not be able to control the state of the mitigation. ++ ++By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`). 
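For completeness, a hedged sketch of the guest side of this interface. The wrapper name is invented and a real guest would go through its own SMCCC conduit (HVC under Xen); it is shown with the same helper used elsewhere in this series because the calling convention is identical, with the requested state passed in the first argument and read back by the vsmc.c handler from the previous patch:

    static void guest_set_ssbd_sketch(bool enable)
    {
        /* No result registers are needed; the call only updates the per-vCPU
         * workaround flag kept by the hypervisor. */
        arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, enable ? 1 : 0, NULL);
    }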
++ + ### spec-ctrl (x86) + > `= List of [ , xen=, {pv,hvm,msr-sc,rsb,md-clear}=, + > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, +diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index 03f78fec96..1e642c416a 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -237,6 +237,41 @@ static int enable_ic_inv_hardening(void *data) + + #ifdef CONFIG_ARM_SSBD + ++enum ssbd_state ssbd_state = ARM_SSBD_RUNTIME; ++ ++static int __init parse_spec_ctrl(const char *s) ++{ ++ const char *ss; ++ int rc = 0; ++ ++ do { ++ ss = strchr(s, ','); ++ if ( !ss ) ++ ss = strchr(s, '\0'); ++ ++ if ( !strncmp(s, "ssbd=", 5) ) ++ { ++ s += 5; ++ ++ if ( !strncmp(s, "force-disable", ss - s) ) ++ ssbd_state = ARM_SSBD_FORCE_DISABLE; ++ else if ( !strncmp(s, "runtime", ss - s) ) ++ ssbd_state = ARM_SSBD_RUNTIME; ++ else if ( !strncmp(s, "force-enable", ss - s) ) ++ ssbd_state = ARM_SSBD_FORCE_ENABLE; ++ else ++ rc = -EINVAL; ++ } ++ else ++ rc = -EINVAL; ++ ++ s = ss + 1; ++ } while ( *ss ); ++ ++ return rc; ++} ++custom_param("spec-ctrl", parse_spec_ctrl); ++ + /* + * Assembly code may use the variable directly, so we need to make sure + * it fits in a register. +@@ -251,20 +286,17 @@ static bool has_ssbd_mitigation(const struct arm_cpu_capabilities *entry) + if ( smccc_ver < SMCCC_VERSION(1, 1) ) + return false; + +- /* +- * The probe function return value is either negative (unsupported +- * or mitigated), positive (unaffected), or zero (requires +- * mitigation). We only need to do anything in the last case. +- */ + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FID, + ARM_SMCCC_ARCH_WORKAROUND_2_FID, &res); + + switch ( (int)res.a0 ) + { + case ARM_SMCCC_NOT_SUPPORTED: ++ ssbd_state = ARM_SSBD_UNKNOWN; + return false; + + case ARM_SMCCC_NOT_REQUIRED: ++ ssbd_state = ARM_SSBD_MITIGATED; + return false; + + case ARM_SMCCC_SUCCESS: +@@ -280,8 +312,49 @@ static bool has_ssbd_mitigation(const struct arm_cpu_capabilities *entry) + return false; + } + +- if ( required ) +- this_cpu(ssbd_callback_required) = 1; ++ switch ( ssbd_state ) ++ { ++ case ARM_SSBD_FORCE_DISABLE: ++ { ++ static bool once = true; ++ ++ if ( once ) ++ printk("%s disabled from command-line\n", entry->desc); ++ once = false; ++ ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 0, NULL); ++ required = false; ++ ++ break; ++ } ++ ++ case ARM_SSBD_RUNTIME: ++ if ( required ) ++ { ++ this_cpu(ssbd_callback_required) = 1; ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL); ++ } ++ ++ break; ++ ++ case ARM_SSBD_FORCE_ENABLE: ++ { ++ static bool once = true; ++ ++ if ( once ) ++ printk("%s forced from command-line\n", entry->desc); ++ once = false; ++ ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL); ++ required = true; ++ ++ break; ++ } ++ ++ default: ++ ASSERT_UNREACHABLE(); ++ return false; ++ } + + return required; + } +@@ -390,6 +463,7 @@ static const struct arm_cpu_capabilities arm_errata[] = { + #endif + #ifdef CONFIG_ARM_SSBD + { ++ .desc = "Speculative Store Bypass Disabled", + .capability = ARM_SSBD, + .matches = has_ssbd_mitigation, + }, +diff --git a/xen/include/asm-arm/cpuerrata.h b/xen/include/asm-arm/cpuerrata.h +index e628d3ff56..55ddfda272 100644 +--- a/xen/include/asm-arm/cpuerrata.h ++++ b/xen/include/asm-arm/cpuerrata.h +@@ -31,10 +31,26 @@ CHECK_WORKAROUND_HELPER(ssbd, ARM_SSBD, CONFIG_ARM_SSBD) + + #undef CHECK_WORKAROUND_HELPER + ++enum ssbd_state ++{ ++ ARM_SSBD_UNKNOWN, ++ ARM_SSBD_FORCE_DISABLE, ++ ARM_SSBD_RUNTIME, ++ ARM_SSBD_FORCE_ENABLE, ++ 
ARM_SSBD_MITIGATED, ++}; ++ + #ifdef CONFIG_ARM_SSBD + + #include + ++extern enum ssbd_state ssbd_state; ++ ++static inline enum ssbd_state get_ssbd_state(void) ++{ ++ return ssbd_state; ++} ++ + DECLARE_PER_CPU(register_t, ssbd_callback_required); + + static inline bool cpu_require_ssbd_mitigation(void) +@@ -49,6 +65,11 @@ static inline bool cpu_require_ssbd_mitigation(void) + return false; + } + ++static inline enum ssbd_state get_ssbd_state(void) ++{ ++ return ARM_SSBD_UNKNOWN; ++} ++ + #endif + + #endif /* __ARM_CPUERRATA_H__ */ +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-alternatives-Add-dynamic-patching-feature.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-alternatives-Add-dynamic-patching-feature.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-alternatives-Add-dynamic-patching-feature.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-alternatives-Add-dynamic-patching-feature.patch 2022-06-19 23:04:12.000000000 +0100 @@ -0,0 +1,246 @@ +From 3e9db39ea06e726c66c40cb8f2f0e5fa62de9c7c Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:38 +0100 +Subject: [PATCH] xen/arm: alternatives: Add dynamic patching feature + +This is based on the Linux commit dea5e2a4c5bc "arm64: alternatives: Add +dynamic patching feature" written by Marc Zyngier: + + We've so far relied on a patching infrastructure that only gave us + a single alternative, without any way to provide a range of potential + replacement instructions. For a single feature, this is an all or + nothing thing. + + It would be interesting to have a more flexible grained way of patching the + kernel though, where we could dynamically tune the code that gets injected. + + In order to achive this, let's introduce a new form of dynamic patching, + assiciating a callback to a patching site. This callback gets source and + target locations of the patching request, as well as the number of + instructions to be patched. + + Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb + directives: + asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback) + : "r" (v)); + or + + alternative_cb callback + mov x0, #0 + alternative_cb_end + + where callback is the C function computing the alternative. + + Reviewed-by: Christoffer Dall + Reviewed-by: Catalin Marinas + Signed-off-by: Marc Zyngier + +This is part of XSA-263. 
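To illustrate the shape of such a callback, a hedged sketch follows. The function and condition names are invented, but the signature matches the alternative_cb_t type introduced below and the pattern matches arm_enable_wa2_handling() seen earlier in this series:

    /* Called at patch time with the original site, the writable alias to
     * update, and the number of instructions covered by the alternative. */
    static void my_patch_cb_sketch(const struct alt_instr *alt,
                                   const uint32_t *origptr, uint32_t *updptr,
                                   int nr_inst)
    {
        BUG_ON(nr_inst != 1);

        /* Decide dynamically whether to keep the original branch or NOP it out. */
        if ( !my_runtime_condition() )
            *updptr = aarch64_insn_gen_nop();
    }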
+ +Signed-off-by: Julien Grall +Acked-by: Stefano Stabellini +--- + xen/arch/arm/alternative.c | 48 +++++++++++++++++++++++-------- + xen/include/asm-arm/alternative.h | 44 ++++++++++++++++++++++++---- + 2 files changed, 75 insertions(+), 17 deletions(-) + +diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c +index 936cf04956..52ed7edf69 100644 +--- a/xen/arch/arm/alternative.c ++++ b/xen/arch/arm/alternative.c +@@ -30,6 +30,8 @@ + #include + #include + #include ++/* XXX: Move ARCH_PATCH_INSN_SIZE out of livepatch.h */ ++#include + #include + + /* Override macros from asm/page.h to make them work with mfn_t */ +@@ -94,6 +96,23 @@ static u32 get_alt_insn(const struct alt_instr *alt, + return insn; + } + ++static void patch_alternative(const struct alt_instr *alt, ++ const uint32_t *origptr, ++ uint32_t *updptr, int nr_inst) ++{ ++ const uint32_t *replptr; ++ unsigned int i; ++ ++ replptr = ALT_REPL_PTR(alt); ++ for ( i = 0; i < nr_inst; i++ ) ++ { ++ uint32_t insn; ++ ++ insn = get_alt_insn(alt, origptr + i, replptr + i); ++ updptr[i] = cpu_to_le32(insn); ++ } ++} ++ + /* + * The region patched should be read-write to allow __apply_alternatives + * to replacing the instructions when necessary. +@@ -105,33 +124,38 @@ static int __apply_alternatives(const struct alt_region *region, + paddr_t update_offset) + { + const struct alt_instr *alt; +- const u32 *replptr, *origptr; ++ const u32 *origptr; + u32 *updptr; ++ alternative_cb_t alt_cb; + + printk(XENLOG_INFO "alternatives: Patching with alt table %p -> %p\n", + region->begin, region->end); + + for ( alt = region->begin; alt < region->end; alt++ ) + { +- u32 insn; +- int i, nr_inst; ++ int nr_inst; + +- if ( !cpus_have_cap(alt->cpufeature) ) ++ /* Use ARM_CB_PATCH as an unconditional patch */ ++ if ( alt->cpufeature < ARM_CB_PATCH && ++ !cpus_have_cap(alt->cpufeature) ) + continue; + +- BUG_ON(alt->alt_len != alt->orig_len); ++ if ( alt->cpufeature == ARM_CB_PATCH ) ++ BUG_ON(alt->alt_len != 0); ++ else ++ BUG_ON(alt->alt_len != alt->orig_len); + + origptr = ALT_ORIG_PTR(alt); + updptr = (void *)origptr + update_offset; +- replptr = ALT_REPL_PTR(alt); + +- nr_inst = alt->alt_len / sizeof(insn); ++ nr_inst = alt->orig_len / ARCH_PATCH_INSN_SIZE; + +- for ( i = 0; i < nr_inst; i++ ) +- { +- insn = get_alt_insn(alt, origptr + i, replptr + i); +- *(updptr + i) = cpu_to_le32(insn); +- } ++ if ( alt->cpufeature < ARM_CB_PATCH ) ++ alt_cb = patch_alternative; ++ else ++ alt_cb = ALT_REPL_PTR(alt); ++ ++ alt_cb(alt, origptr, updptr, nr_inst); + + /* Ensure the new instructions reached the memory and nuke */ + clean_and_invalidate_dcache_va_range(origptr, +diff --git a/xen/include/asm-arm/alternative.h b/xen/include/asm-arm/alternative.h +index 4e33d1cdf7..9b4b02811b 100644 +--- a/xen/include/asm-arm/alternative.h ++++ b/xen/include/asm-arm/alternative.h +@@ -3,6 +3,8 @@ + + #include + ++#define ARM_CB_PATCH ARM_NCAPS ++ + #ifndef __ASSEMBLY__ + + #include +@@ -18,16 +20,24 @@ struct alt_instr { + }; + + /* Xen: helpers used by common code. 
*/ +-#define __ALT_PTR(a,f) ((u32 *)((void *)&(a)->f + (a)->f)) ++#define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f) + #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) + #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) + ++typedef void (*alternative_cb_t)(const struct alt_instr *alt, ++ const uint32_t *origptr, uint32_t *updptr, ++ int nr_inst); ++ + void __init apply_alternatives_all(void); + int apply_alternatives(const struct alt_instr *start, const struct alt_instr *end); + +-#define ALTINSTR_ENTRY(feature) \ ++#define ALTINSTR_ENTRY(feature, cb) \ + " .word 661b - .\n" /* label */ \ ++ " .if " __stringify(cb) " == 0\n" \ + " .word 663f - .\n" /* new instruction */ \ ++ " .else\n" \ ++ " .word " __stringify(cb) "- .\n" /* callback */ \ ++ " .endif\n" \ + " .hword " __stringify(feature) "\n" /* feature bit */ \ + " .byte 662b-661b\n" /* source len */ \ + " .byte 664f-663f\n" /* replacement len */ +@@ -45,15 +55,18 @@ int apply_alternatives(const struct alt_instr *start, const struct alt_instr *en + * but most assemblers die if insn1 or insn2 have a .inst. This should + * be fixed in a binutils release posterior to 2.25.51.0.2 (anything + * containing commit 4e4d08cf7399b606 or c1baaddf8861). ++ * ++ * Alternatives with callbacks do not generate replacement instructions. + */ +-#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \ ++#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled, cb) \ + ".if "__stringify(cfg_enabled)" == 1\n" \ + "661:\n\t" \ + oldinstr "\n" \ + "662:\n" \ + ".pushsection .altinstructions,\"a\"\n" \ +- ALTINSTR_ENTRY(feature) \ ++ ALTINSTR_ENTRY(feature,cb) \ + ".popsection\n" \ ++ " .if " __stringify(cb) " == 0\n" \ + ".pushsection .altinstr_replacement, \"a\"\n" \ + "663:\n\t" \ + newinstr "\n" \ +@@ -61,11 +74,17 @@ int apply_alternatives(const struct alt_instr *start, const struct alt_instr *en + ".popsection\n\t" \ + ".org . - (664b-663b) + (662b-661b)\n\t" \ + ".org . - (662b-661b) + (664b-663b)\n" \ ++ ".else\n\t" \ ++ "663:\n\t" \ ++ "664:\n\t" \ ++ ".endif\n" \ + ".endif\n" + + #define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...) \ +- __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg)) ++ __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0) + ++#define ALTERNATIVE_CB(oldinstr, cb) \ ++ __ALTERNATIVE_CFG(oldinstr, "NOT_AN_INSTRUCTION", ARM_CB_PATCH, 1, cb) + #else + + #include +@@ -126,6 +145,14 @@ int apply_alternatives(const struct alt_instr *start, const struct alt_instr *en + 663: + .endm + ++.macro alternative_cb cb ++ .set .Lasm_alt_mode, 0 ++ .pushsection .altinstructions, "a" ++ altinstruction_entry 661f, \cb, ARM_CB_PATCH, 662f-661f, 0 ++ .popsection ++661: ++.endm ++ + /* + * Complete an alternative code sequence. + */ +@@ -135,6 +162,13 @@ int apply_alternatives(const struct alt_instr *start, const struct alt_instr *en + .org . - (662b-661b) + (664b-663b) + .endm + ++/* ++ * Callback-based alternative epilogue ++ */ ++.macro alternative_cb_end ++662: ++.endm ++ + #define _ALTERNATIVE_CFG(insn1, insn2, cap, cfg, ...) 
\ + alternative_insn insn1, insn2, cap, IS_ENABLED(cfg) + +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Simplify-alternative-patching-of-non-writable-region.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Simplify-alternative-patching-of-non-writable-region.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Simplify-alternative-patching-of-non-writable-region.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-arm-Simplify-alternative-patching-of-non-writable-region.patch 2022-06-19 23:17:54.000000000 +0100 @@ -0,0 +1,128 @@ +From 7c98c24e9ba76df3b9f531353d99e5c1bfa8b9a9 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 12 Jun 2018 12:36:37 +0100 +Subject: [PATCH] xen/arm: Simplify alternative patching of non-writable region + +During the MMU setup process, Xen will set SCTLR_EL2.WNX +(Write-Non-eXecutable) bit. Because of that, the alternative code need +to re-mapped the region in a difference place in order to modify the +text section. + +At the moment, the function patching the code is only aware of the +re-mapped region. This requires the caller to mess with Xen internal in +order to have function such as is_active_kernel_text() working. + +All the interactions with Xen internal can be removed by specifying the +offset between the region patch and the writable region for updating the +instruction + +This simplification will also make it easier to integrate dynamic patching +in a follow-up patch. Indeed, the callback address should be in +an original region and not re-mapped only which is writeable non-executable. + +This is part of XSA-263. + +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini +--- + xen/arch/arm/alternative.c | 42 ++++++++++++-------------------------- + 1 file changed, 13 insertions(+), 29 deletions(-) + +diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c +index 9ffdc475d6..936cf04956 100644 +--- a/xen/arch/arm/alternative.c ++++ b/xen/arch/arm/alternative.c +@@ -97,12 +97,16 @@ static u32 get_alt_insn(const struct alt_instr *alt, + /* + * The region patched should be read-write to allow __apply_alternatives + * to replacing the instructions when necessary. ++ * ++ * @update_offset: Offset between the region patched and the writable ++ * region for the update. 0 if the patched region is writable. 
+ */ +-static int __apply_alternatives(const struct alt_region *region) ++static int __apply_alternatives(const struct alt_region *region, ++ paddr_t update_offset) + { + const struct alt_instr *alt; +- const u32 *replptr; +- u32 *origptr; ++ const u32 *replptr, *origptr; ++ u32 *updptr; + + printk(XENLOG_INFO "alternatives: Patching with alt table %p -> %p\n", + region->begin, region->end); +@@ -118,6 +122,7 @@ static int __apply_alternatives(const struct alt_region *region) + BUG_ON(alt->alt_len != alt->orig_len); + + origptr = ALT_ORIG_PTR(alt); ++ updptr = (void *)origptr + update_offset; + replptr = ALT_REPL_PTR(alt); + + nr_inst = alt->alt_len / sizeof(insn); +@@ -125,7 +130,7 @@ static int __apply_alternatives(const struct alt_region *region) + for ( i = 0; i < nr_inst; i++ ) + { + insn = get_alt_insn(alt, origptr + i, replptr + i); +- *(origptr + i) = cpu_to_le32(insn); ++ *(updptr + i) = cpu_to_le32(insn); + } + + /* Ensure the new instructions reached the memory and nuke */ +@@ -162,9 +167,6 @@ static int __apply_alternatives_multi_stop(void *unused) + paddr_t xen_size = _end - _start; + unsigned int xen_order = get_order_from_bytes(xen_size); + void *xenmap; +- struct virtual_region patch_region = { +- .list = LIST_HEAD_INIT(patch_region.list), +- }; + + BUG_ON(patched); + +@@ -177,31 +179,13 @@ static int __apply_alternatives_multi_stop(void *unused) + /* Re-mapping Xen is not expected to fail during boot. */ + BUG_ON(!xenmap); + +- /* +- * If we generate a new branch instruction, the target will be +- * calculated in this re-mapped Xen region. So we have to register +- * this re-mapped Xen region as a virtual region temporarily. +- */ +- patch_region.start = xenmap; +- patch_region.end = xenmap + xen_size; +- register_virtual_region(&patch_region); ++ region.begin = __alt_instructions; ++ region.end = __alt_instructions_end; + +- /* +- * Find the virtual address of the alternative region in the new +- * mapping. +- * alt_instr contains relative offset, so the function +- * __apply_alternatives will patch in the re-mapped version of +- * Xen. +- */ +- region.begin = (void *)__alt_instructions - (void *)_start + xenmap; +- region.end = (void *)__alt_instructions_end - (void *)_start + xenmap; +- +- ret = __apply_alternatives(®ion); ++ ret = __apply_alternatives(®ion, xenmap - (void *)_start); + /* The patching is not expected to fail during boot. 
*/ + BUG_ON(ret != 0); + +- unregister_virtual_region(&patch_region); +- + vunmap(xenmap); + + /* Barriers provided by the cache flushing */ +@@ -235,7 +219,7 @@ int apply_alternatives(const struct alt_instr *start, const struct alt_instr *en + .end = end, + }; + +- return __apply_alternatives(®ion); ++ return __apply_alternatives(®ion, 0); + } + + /* +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-events-access-last_priority-and-last_vcpu_id-together.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-events-access-last_priority-and-last_vcpu_id-together.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-events-access-last_priority-and-last_vcpu_id-together.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-events-access-last_priority-and-last_vcpu_id-together.patch 2022-06-01 10:53:11.000000000 +0100 @@ -0,0 +1,98 @@ +From 8ab4af91fab6618994e5857b65aec5896915d9c5 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Tue, 1 Dec 2020 17:06:15 +0100 +Subject: [PATCH 1/1] xen/events: access last_priority and last_vcpu_id + together + +The queue for a fifo event is depending on the vcpu_id and the +priority of the event. When sending an event it might happen the +event needs to change queues and the old queue needs to be kept for +keeping the links between queue elements intact. For this purpose +the event channel contains last_priority and last_vcpu_id values +elements for being able to identify the old queue. + +In order to avoid races always access last_priority and last_vcpu_id +with a single atomic operation avoiding any inconsistencies. + +Signed-off-by: Juergen Gross +Reviewed-by: Julien Grall +master commit: 1277cb9dc5e966f1faf665bcded02b7533e38078 +master date: 2020-11-24 11:23:42 +0100 +--- + xen/common/event_fifo.c | 25 +++++++++++++++++++------ + xen/include/xen/sched.h | 3 +-- + 2 files changed, 20 insertions(+), 8 deletions(-) + +diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c +index 98742ba9cb..b1951a29ad 100644 +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -21,6 +21,14 @@ + + #include + ++union evtchn_fifo_lastq { ++ uint32_t raw; ++ struct { ++ uint8_t last_priority; ++ uint16_t last_vcpu_id; ++ }; ++}; ++ + static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d, + unsigned int port) + { +@@ -64,16 +72,18 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, + struct vcpu *v; + struct evtchn_fifo_queue *q, *old_q; + unsigned int try; ++ union evtchn_fifo_lastq lastq; + + for ( try = 0; try < 3; try++ ) + { +- v = d->vcpu[evtchn->last_vcpu_id]; +- old_q = &v->evtchn_fifo->queue[evtchn->last_priority]; ++ lastq.raw = read_atomic(&evtchn->fifo_lastq); ++ v = d->vcpu[lastq.last_vcpu_id]; ++ old_q = &v->evtchn_fifo->queue[lastq.last_priority]; + + spin_lock_irqsave(&old_q->lock, *flags); + +- v = d->vcpu[evtchn->last_vcpu_id]; +- q = &v->evtchn_fifo->queue[evtchn->last_priority]; ++ v = d->vcpu[lastq.last_vcpu_id]; ++ q = &v->evtchn_fifo->queue[lastq.last_priority]; + + if ( old_q == q ) + return old_q; +@@ -224,8 +234,11 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) + /* Moved to a different queue? 
*/ + if ( old_q != q ) + { +- evtchn->last_vcpu_id = v->vcpu_id; +- evtchn->last_priority = q->priority; ++ union evtchn_fifo_lastq lastq = { }; ++ ++ lastq.last_vcpu_id = v->vcpu_id; ++ lastq.last_priority = q->priority; ++ write_atomic(&evtchn->fifo_lastq, lastq.raw); + + spin_unlock_irqrestore(&old_q->lock, flags); + spin_lock_irqsave(&q->lock, flags); +diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h +index 3c1284c3da..81af1209dd 100644 +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -114,8 +114,7 @@ struct evtchn + #ifndef NDEBUG + u8 old_state; /* State when taking lock in write mode. */ + #endif +- u8 last_priority; +- u16 last_vcpu_id; ++ u32 fifo_lastq; /* Data for fifo events identifying last queue. */ + #ifdef CONFIG_XSM + union { + #ifdef XSM_NEED_GENERIC_EVTCHN_SSID +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-evtchn-rework-per-event-channel-lock.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-evtchn-rework-per-event-channel-lock.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-evtchn-rework-per-event-channel-lock.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-evtchn-rework-per-event-channel-lock.patch 2022-06-01 12:11:00.000000000 +0100 @@ -0,0 +1,592 @@ +From 4438fc14a6c60b265a47c780d6a61d4f3015a297 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Tue, 1 Dec 2020 17:04:43 +0100 +Subject: [PATCH] xen/evtchn: rework per event channel lock + +Currently the lock for a single event channel needs to be taken with +interrupts off, which causes deadlocks in some cases. + +Rework the per event channel lock to be non-blocking for the case of +sending an event and removing the need for disabling interrupts for +taking the lock. + +The lock is needed for avoiding races between event channel state +changes (creation, closing, binding) against normal operations (set +pending, [un]masking, priority changes). + +Use a rwlock, but with some restrictions: + +- Changing the state of an event channel (creation, closing, binding) + needs to use write_lock(), with ASSERT()ing that the lock is taken as + writer only when the state of the event channel is either before or + after the locked region appropriate (either free or unbound). + +- Sending an event needs to use read_trylock() mostly, in case of not + obtaining the lock the operation is omitted. This is needed as + sending an event can happen with interrupts off (at least in some + cases). + +- Dumping the event channel state for debug purposes is using + read_trylock(), too, in order to avoid blocking in case the lock is + taken as writer for a long time. + +- All other cases can use read_lock(). + +Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()") +Signed-off-by: Juergen Gross +Reviewed-by: Jan Beulich +Acked-by: Julien Grall + +xen/events: fix build + +Commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") +introduced a build failure for NDEBUG builds. 
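The send-side rule described above reduces to the following pattern, repeated at each notification site in this patch (a hedged, self-contained sketch with an invented helper name):

    static void notify_channel_sketch(struct domain *d, struct evtchn *chn)
    {
        /* Never block and never disable interrupts: if the lock cannot be
         * taken as a reader, the send is simply omitted, since the channel
         * is then treated as free or unbound. */
        if ( !evtchn_read_trylock(chn) )
            return;

        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
        evtchn_read_unlock(chn);
    }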
+ +Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") +Signed-off-by: Juergen Gross +Signed-off-by: Jan Beulich +master commit: 5f2df45ead7c1195142f68b7923047a1e9479d54 +master date: 2020-11-10 14:36:15 +0100 +master commit: 53bacb86f496fdb11560d9e3b361bca7de60d268 +master date: 2020-11-11 08:56:21 +0100 +--- + xen/arch/x86/irq.c | 6 +- + xen/arch/x86/pv/shim.c | 9 +-- + xen/common/event_channel.c | 141 ++++++++++++++++++++++--------------- + xen/include/xen/event.h | 27 +++++-- + xen/include/xen/sched.h | 5 +- + 5 files changed, 116 insertions(+), 72 deletions(-) + +diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c +index 9878486073..a293270cf2 100644 +--- a/xen/arch/x86/irq.c ++++ b/xen/arch/x86/irq.c +@@ -2331,14 +2331,12 @@ static void dump_irqs(unsigned char key) + pirq = domain_irq_to_pirq(d, irq); + info = pirq_info(d, pirq); + evtchn = evtchn_from_port(d, info->evtchn); +- local_irq_disable(); +- if ( spin_trylock(&evtchn->lock) ) ++ if ( evtchn_read_trylock(evtchn) ) + { + pending = evtchn_is_pending(d, evtchn); + masked = evtchn_is_masked(d, evtchn); +- spin_unlock(&evtchn->lock); ++ evtchn_read_unlock(evtchn); + } +- local_irq_enable(); + printk("%u:%3d(%c%c%c)", + d->domain_id, pirq, "-P?"[pending], + "-M?"[masked], info->masked ? 'M' : '-'); +diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c +index 4b7d498c00..f8883bd102 100644 +--- a/xen/arch/x86/pv/shim.c ++++ b/xen/arch/x86/pv/shim.c +@@ -616,11 +616,12 @@ void pv_shim_inject_evtchn(unsigned int port) + if ( port_is_valid(guest, port) ) + { + struct evtchn *chn = evtchn_from_port(guest, port); +- unsigned long flags; + +- spin_lock_irqsave(&chn->lock, flags); +- evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn); +- spin_unlock_irqrestore(&chn->lock, flags); ++ if ( evtchn_read_trylock(chn) ) ++ { ++ evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn); ++ evtchn_read_unlock(chn); ++ } + } + } + +diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c +index 0066c8a87f..7f2ad9d826 100644 +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -50,6 +50,40 @@ + + #define consumer_is_xen(e) (!!(e)->xen_consumer) + ++/* ++ * Lock an event channel exclusively. This is allowed only when the channel is ++ * free or unbound either when taking or when releasing the lock, as any ++ * concurrent operation on the event channel using evtchn_read_trylock() will ++ * just assume the event channel is free or unbound at the moment when the ++ * evtchn_read_trylock() returns false. ++ */ ++static inline void evtchn_write_lock(struct evtchn *evtchn) ++{ ++ write_lock(&evtchn->lock); ++ ++#ifndef NDEBUG ++ evtchn->old_state = evtchn->state; ++#endif ++} ++ ++static inline unsigned int old_state(const struct evtchn *evtchn) ++{ ++#ifndef NDEBUG ++ return evtchn->old_state; ++#else ++ return ECS_RESERVED; /* Just to allow things to build. */ ++#endif ++} ++ ++static inline void evtchn_write_unlock(struct evtchn *evtchn) ++{ ++ /* Enforce lock discipline. */ ++ ASSERT(old_state(evtchn) == ECS_FREE || old_state(evtchn) == ECS_UNBOUND || ++ evtchn->state == ECS_FREE || evtchn->state == ECS_UNBOUND); ++ ++ write_unlock(&evtchn->lock); ++} ++ + /* + * The function alloc_unbound_xen_event_channel() allows an arbitrary + * notifier function to be specified. 
However, very few unique functions +@@ -131,7 +165,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port) + return NULL; + } + chn[i].port = port + i; +- spin_lock_init(&chn[i].lock); ++ rwlock_init(&chn[i].lock); + } + return chn; + } +@@ -249,7 +283,6 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) + int port; + domid_t dom = alloc->dom; + long rc; +- unsigned long flags; + + d = rcu_lock_domain_by_any_id(dom); + if ( d == NULL ) +@@ -265,14 +298,14 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) + if ( rc ) + goto out; + +- spin_lock_irqsave(&chn->lock, flags); ++ evtchn_write_lock(chn); + + chn->state = ECS_UNBOUND; + if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF ) + chn->u.unbound.remote_domid = current->domain->domain_id; + evtchn_port_init(d, chn); + +- spin_unlock_irqrestore(&chn->lock, flags); ++ evtchn_write_unlock(chn); + + alloc->port = port; + +@@ -285,32 +318,26 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc) + } + + +-static unsigned long double_evtchn_lock(struct evtchn *lchn, +- struct evtchn *rchn) ++static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn) + { +- unsigned long flags; +- + if ( lchn <= rchn ) + { +- spin_lock_irqsave(&lchn->lock, flags); ++ evtchn_write_lock(lchn); + if ( lchn != rchn ) +- spin_lock(&rchn->lock); ++ evtchn_write_lock(rchn); + } + else + { +- spin_lock_irqsave(&rchn->lock, flags); +- spin_lock(&lchn->lock); ++ evtchn_write_lock(rchn); ++ evtchn_write_lock(lchn); + } +- +- return flags; + } + +-static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn, +- unsigned long flags) ++static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn) + { + if ( lchn != rchn ) +- spin_unlock(&lchn->lock); +- spin_unlock_irqrestore(&rchn->lock, flags); ++ evtchn_write_unlock(lchn); ++ evtchn_write_unlock(rchn); + } + + static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) +@@ -320,7 +347,6 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) + int lport, rport = bind->remote_port; + domid_t rdom = bind->remote_dom; + long rc; +- unsigned long flags; + + if ( rdom == DOMID_SELF ) + rdom = current->domain->domain_id; +@@ -356,7 +382,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) + if ( rc ) + goto out; + +- flags = double_evtchn_lock(lchn, rchn); ++ double_evtchn_lock(lchn, rchn); + + lchn->u.interdomain.remote_dom = rd; + lchn->u.interdomain.remote_port = rport; +@@ -373,7 +399,7 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) + */ + evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn); + +- double_evtchn_unlock(lchn, rchn, flags); ++ double_evtchn_unlock(lchn, rchn); + + bind->local_port = lport; + +@@ -396,7 +422,6 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port) + struct domain *d = current->domain; + int virq = bind->virq, vcpu = bind->vcpu; + int rc = 0; +- unsigned long flags; + + if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) + return -EINVAL; +@@ -429,14 +454,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port) + + chn = evtchn_from_port(d, port); + +- spin_lock_irqsave(&chn->lock, flags); ++ evtchn_write_lock(chn); + + chn->state = ECS_VIRQ; + chn->notify_vcpu_id = vcpu; + chn->u.virq = virq; + evtchn_port_init(d, chn); + +- spin_unlock_irqrestore(&chn->lock, flags); ++ evtchn_write_unlock(chn); + + v->virq_to_evtchn[virq] = bind->port = port; + +@@ -453,7 
+478,6 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) + struct domain *d = current->domain; + int port, vcpu = bind->vcpu; + long rc = 0; +- unsigned long flags; + + if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || + (d->vcpu[vcpu] == NULL) ) +@@ -466,13 +490,13 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) + + chn = evtchn_from_port(d, port); + +- spin_lock_irqsave(&chn->lock, flags); ++ evtchn_write_lock(chn); + + chn->state = ECS_IPI; + chn->notify_vcpu_id = vcpu; + evtchn_port_init(d, chn); + +- spin_unlock_irqrestore(&chn->lock, flags); ++ evtchn_write_unlock(chn); + + bind->port = port; + +@@ -516,7 +540,6 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind) + struct pirq *info; + int port = 0, pirq = bind->pirq; + long rc; +- unsigned long flags; + + if ( (pirq < 0) || (pirq >= d->nr_pirqs) ) + return -EINVAL; +@@ -549,14 +572,14 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind) + goto out; + } + +- spin_lock_irqsave(&chn->lock, flags); ++ evtchn_write_lock(chn); + + chn->state = ECS_PIRQ; + chn->u.pirq.irq = pirq; + link_pirq_port(port, chn, v); + evtchn_port_init(d, chn); + +- spin_unlock_irqrestore(&chn->lock, flags); ++ evtchn_write_unlock(chn); + + bind->port = port; + +@@ -577,7 +600,6 @@ int evtchn_close(struct domain *d1, int port1, bool guest) + struct evtchn *chn1, *chn2; + int port2; + long rc = 0; +- unsigned long flags; + + again: + spin_lock(&d1->event_lock); +@@ -677,14 +699,14 @@ int evtchn_close(struct domain *d1, int port1, bool guest) + BUG_ON(chn2->state != ECS_INTERDOMAIN); + BUG_ON(chn2->u.interdomain.remote_dom != d1); + +- flags = double_evtchn_lock(chn1, chn2); ++ double_evtchn_lock(chn1, chn2); + + evtchn_free(d1, chn1); + + chn2->state = ECS_UNBOUND; + chn2->u.unbound.remote_domid = d1->domain_id; + +- double_evtchn_unlock(chn1, chn2, flags); ++ double_evtchn_unlock(chn1, chn2); + + goto out; + +@@ -692,9 +714,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest) + BUG(); + } + +- spin_lock_irqsave(&chn1->lock, flags); ++ evtchn_write_lock(chn1); + evtchn_free(d1, chn1); +- spin_unlock_irqrestore(&chn1->lock, flags); ++ evtchn_write_unlock(chn1); + + out: + if ( d2 != NULL ) +@@ -714,7 +736,6 @@ int evtchn_send(struct domain *ld, unsigned int lport) + struct evtchn *lchn, *rchn; + struct domain *rd; + int rport, ret = 0; +- unsigned long flags; + + if ( !port_is_valid(ld, lport) ) + return -EINVAL; +@@ -727,7 +748,7 @@ int evtchn_send(struct domain *ld, unsigned int lport) + + lchn = evtchn_from_port(ld, lport); + +- spin_lock_irqsave(&lchn->lock, flags); ++ evtchn_read_lock(lchn); + + /* Guest cannot send via a Xen-attached event channel. 
*/ + if ( unlikely(consumer_is_xen(lchn)) ) +@@ -762,7 +783,7 @@ int evtchn_send(struct domain *ld, unsigned int lport) + } + + out: +- spin_unlock_irqrestore(&lchn->lock, flags); ++ evtchn_read_unlock(lchn); + + return ret; + } +@@ -789,9 +810,11 @@ void send_guest_vcpu_virq(struct vcpu *v, uint32_t virq) + + d = v->domain; + chn = evtchn_from_port(d, port); +- spin_lock(&chn->lock); +- evtchn_port_set_pending(d, v->vcpu_id, chn); +- spin_unlock(&chn->lock); ++ if ( evtchn_read_trylock(chn) ) ++ { ++ evtchn_port_set_pending(d, v->vcpu_id, chn); ++ evtchn_read_unlock(chn); ++ } + + out: + spin_unlock_irqrestore(&v->virq_lock, flags); +@@ -820,9 +843,11 @@ static void send_guest_global_virq(struct domain *d, uint32_t virq) + goto out; + + chn = evtchn_from_port(d, port); +- spin_lock(&chn->lock); +- evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); +- spin_unlock(&chn->lock); ++ if ( evtchn_read_trylock(chn) ) ++ { ++ evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); ++ evtchn_read_unlock(chn); ++ } + + out: + spin_unlock_irqrestore(&v->virq_lock, flags); +@@ -832,7 +857,6 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq) + { + int port; + struct evtchn *chn; +- unsigned long flags; + + /* + * PV guests: It should not be possible to race with __evtchn_close(). The +@@ -847,9 +871,11 @@ void send_guest_pirq(struct domain *d, const struct pirq *pirq) + } + + chn = evtchn_from_port(d, port); +- spin_lock_irqsave(&chn->lock, flags); +- evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); +- spin_unlock_irqrestore(&chn->lock, flags); ++ if ( evtchn_read_trylock(chn) ) ++ { ++ evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); ++ evtchn_read_unlock(chn); ++ } + } + + static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly; +@@ -1044,15 +1070,17 @@ int evtchn_unmask(unsigned int port) + { + struct domain *d = current->domain; + struct evtchn *evtchn; +- unsigned long flags; + + if ( unlikely(!port_is_valid(d, port)) ) + return -EINVAL; + + evtchn = evtchn_from_port(d, port); +- spin_lock_irqsave(&evtchn->lock, flags); ++ ++ evtchn_read_lock(evtchn); ++ + evtchn_port_unmask(d, evtchn); +- spin_unlock_irqrestore(&evtchn->lock, flags); ++ ++ evtchn_read_unlock(evtchn); + + return 0; + } +@@ -1298,7 +1326,6 @@ int alloc_unbound_xen_event_channel( + { + struct evtchn *chn; + int port, rc; +- unsigned long flags; + + spin_lock(&ld->event_lock); + +@@ -1311,14 +1338,14 @@ int alloc_unbound_xen_event_channel( + if ( rc ) + goto out; + +- spin_lock_irqsave(&chn->lock, flags); ++ evtchn_write_lock(chn); + + chn->state = ECS_UNBOUND; + chn->xen_consumer = get_xen_consumer(notification_fn); + chn->notify_vcpu_id = lvcpu; + chn->u.unbound.remote_domid = remote_domid; + +- spin_unlock_irqrestore(&chn->lock, flags); ++ evtchn_write_unlock(chn); + + write_atomic(&ld->xen_evtchns, ld->xen_evtchns + 1); + +@@ -1350,7 +1377,6 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) + { + struct evtchn *lchn, *rchn; + struct domain *rd; +- unsigned long flags; + + if ( !port_is_valid(ld, lport) ) + { +@@ -1365,7 +1391,8 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) + + lchn = evtchn_from_port(ld, lport); + +- spin_lock_irqsave(&lchn->lock, flags); ++ if ( !evtchn_read_trylock(lchn) ) ++ return; + + if ( likely(lchn->state == ECS_INTERDOMAIN) ) + { +@@ -1375,7 +1402,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport) + evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn); + } + +- spin_unlock_irqrestore(&lchn->lock, flags); ++ 
evtchn_read_unlock(lchn); + } + + void evtchn_check_pollers(struct domain *d, unsigned int port) +diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h +index 87a4aade86..b0c39d402c 100644 +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -103,6 +103,21 @@ static inline unsigned int max_evtchns(const struct domain *d) + : BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); + } + ++static inline void evtchn_read_lock(struct evtchn *evtchn) ++{ ++ read_lock(&evtchn->lock); ++} ++ ++static inline bool evtchn_read_trylock(struct evtchn *evtchn) ++{ ++ return read_trylock(&evtchn->lock); ++} ++ ++static inline void evtchn_read_unlock(struct evtchn *evtchn) ++{ ++ read_unlock(&evtchn->lock); ++} ++ + static inline bool_t port_is_valid(struct domain *d, unsigned int p) + { + if ( p >= read_atomic(&d->valid_evtchns) ) +@@ -236,11 +251,10 @@ static inline bool evtchn_port_is_pending(struct domain *d, evtchn_port_t port) + { + struct evtchn *evtchn = evtchn_from_port(d, port); + bool rc; +- unsigned long flags; + +- spin_lock_irqsave(&evtchn->lock, flags); ++ evtchn_read_lock(evtchn); + rc = evtchn_is_pending(d, evtchn); +- spin_unlock_irqrestore(&evtchn->lock, flags); ++ evtchn_read_unlock(evtchn); + + return rc; + } +@@ -255,11 +269,12 @@ static inline bool evtchn_port_is_masked(struct domain *d, evtchn_port_t port) + { + struct evtchn *evtchn = evtchn_from_port(d, port); + bool rc; +- unsigned long flags; + +- spin_lock_irqsave(&evtchn->lock, flags); ++ evtchn_read_lock(evtchn); ++ + rc = evtchn_is_masked(d, evtchn); +- spin_unlock_irqrestore(&evtchn->lock, flags); ++ ++ evtchn_read_unlock(evtchn); + + return rc; + } +diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h +index 7e4ad5d51b..3c1284c3da 100644 +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -82,7 +82,7 @@ extern domid_t hardware_domid; + + struct evtchn + { +- spinlock_t lock; ++ rwlock_t lock; + #define ECS_FREE 0 /* Channel is available for use. */ + #define ECS_RESERVED 1 /* Channel is reserved. */ + #define ECS_UNBOUND 2 /* Channel is waiting to bind to a remote domain. */ +@@ -111,6 +111,9 @@ struct evtchn + u16 virq; /* state == ECS_VIRQ */ + } u; + u8 priority; ++#ifndef NDEBUG ++ u8 old_state; /* State when taking lock in write mode. */ ++#endif + u8 last_priority; + u16 last_vcpu_id; + #ifdef CONFIG_XSM +-- +2.30.2 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-split-parameter-related-definitions-in-own-header-file.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-split-parameter-related-definitions-in-own-header-file.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-split-parameter-related-definitions-in-own-header-file.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xen-split-parameter-related-definitions-in-own-header-file.patch 2022-06-16 22:48:10.000000000 +0100 @@ -0,0 +1,1409 @@ +From ffdeb6dea596c077aebbdf7d864cdd67d6a6b2f8 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Mon, 3 Feb 2020 13:04:30 +0100 +Subject: [PATCH] xen: split parameter related definitions in own header file + +Move the parameter related definitions from init.h into a new header +file param.h. This will avoid include hell when new dependencies are +added to parameter definitions. 
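For readers following this backport: the hunks below simply add an explicit include of the new header wherever command-line parameters are defined or parsed. As a minimal sketch of what a definition site looks like after the split — the option name and variable here are invented placeholders, only the boolean_param()/__read_mostly pattern mirrors the macros this patch moves into param.h (the tmem_xen.c hunk further down shows a real in-tree instance):

    /*
     * Illustrative sketch only: "opt_example" and the "example" option name
     * are placeholders, not part of this patch. After the split, a file that
     * defines a boot command-line parameter includes <xen/param.h> itself
     * instead of inheriting the macros from <xen/init.h>.
     */
    #include <xen/types.h>   /* bool */
    #include <xen/init.h>
    #include <xen/param.h>   /* boolean_param(), custom_param(), ... now live here */

    static bool __read_mostly opt_example = true;
    boolean_param("example", opt_example);  /* registers a struct kernel_param for boot-time parsing */
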
+ +Signed-off-by: Juergen Gross +Acked-by: Julien Grall +Acked-by: Dario Faggioli +Acked-by: Paul Durrant +Reviewed-by: Kevin Tian +Acked-by: Jan Beulich +--- + xen/arch/arm/acpi/boot.c | 1 + + xen/arch/arm/cpuerrata.c | 1 + + xen/arch/arm/domain_build.c | 1 + + xen/arch/arm/gic-v3-lpi.c | 1 + + xen/arch/arm/setup.c | 1 + + xen/arch/arm/smpboot.c | 1 + + xen/arch/arm/traps.c | 1 + + xen/arch/x86/acpi/boot.c | 1 + + xen/arch/x86/acpi/cpu_idle.c | 1 + + xen/arch/x86/acpi/cpufreq/cpufreq.c | 1 + + xen/arch/x86/acpi/power.c | 1 + + xen/arch/x86/apic.c | 1 + + xen/arch/x86/cpu/amd.c | 1 + + xen/arch/x86/cpu/common.c | 1 + + xen/arch/x86/cpu/mcheck/mce.c | 1 + + xen/arch/x86/cpu/mcheck/mce_intel.c | 1 + + xen/arch/x86/cpu/mtrr/generic.c | 1 + + xen/arch/x86/cpu/mwait-idle.c | 1 + + xen/arch/x86/cpu/vpmu.c | 1 + + xen/arch/x86/cpuid.c | 1 + + xen/arch/x86/dom0_build.c | 1 + + xen/arch/x86/e820.c | 1 + + xen/arch/x86/genapic/probe.c | 1 + + xen/arch/x86/genapic/x2apic.c | 1 + + xen/arch/x86/hpet.c | 1 + + xen/arch/x86/hvm/asid.c | 1 + + xen/arch/x86/hvm/hvm.c | 1 + + xen/arch/x86/hvm/quirks.c | 1 + + xen/arch/x86/hvm/viridian.c | 1 + + xen/arch/x86/hvm/vmx/vmcs.c | 1 + + xen/arch/x86/hvm/vmx/vmx.c | 1 + + xen/arch/x86/io_apic.c | 1 + + xen/arch/x86/irq.c | 1 + + xen/arch/x86/microcode.c | 1 + + xen/arch/x86/mm.c | 1 + + xen/arch/x86/mm/p2m.c | 1 + + xen/arch/x86/msi.c | 1 + + xen/arch/x86/nmi.c | 1 + + xen/arch/x86/numa.c | 1 + + xen/arch/x86/oprofile/nmi_int.c | 1 + + xen/arch/x86/psr.c | 1 + + xen/arch/x86/pv/domain.c | 1 + + xen/arch/x86/pv/shim.c | 1 + + xen/arch/x86/setup.c | 1 + + xen/arch/x86/shutdown.c | 1 + + xen/arch/x86/spec_ctrl.c | 1 + + xen/arch/x86/tboot.c | 1 + + xen/arch/x86/time.c | 1 + + xen/arch/x86/traps.c | 1 + + xen/arch/x86/tsx.c | 1 + + xen/arch/x86/x86_64/mmconfig-shared.c | 1 + + xen/arch/x86/xstate.c | 1 + + xen/common/core_parking.c | 1 + + xen/common/domain.c | 1 + + xen/common/efi/boot.c | 1 + + xen/common/gdbstub.c | 1 + + xen/common/grant_table.c | 1 + + xen/common/kernel.c | 1 + + xen/common/kexec.c | 1 + + xen/common/memory.c | 1 + + xen/common/page_alloc.c | 1 + + xen/common/rcupdate.c | 1 + + xen/common/sched_credit.c | 1 + + xen/common/sched_credit2.c | 1 + + xen/common/schedule.c | 1 + + xen/common/shutdown.c | 1 + + xen/common/timer.c | 1 + + xen/common/tmem_xen.c | 1 + + xen/common/trace.c | 1 + + xen/drivers/acpi/apei/hest.c | 1 + + xen/drivers/acpi/tables.c | 1 + + xen/drivers/char/arm-uart.c | 1 + + xen/drivers/char/console.c | 1 + + xen/drivers/char/ehci-dbgp.c | 1 + + xen/drivers/char/ns16550.c | 1 + + xen/drivers/char/serial.c | 1 + + xen/drivers/cpufreq/cpufreq.c | 1 + + xen/drivers/passthrough/amd/iommu_acpi.c | 1 + + xen/drivers/passthrough/iommu.c | 1 + + xen/drivers/passthrough/pci.c | 1 + + xen/drivers/passthrough/vtd/dmar.c | 1 + + xen/drivers/passthrough/vtd/quirks.c | 1 + + xen/drivers/passthrough/vtd/x86/vtd.c | 1 + + xen/drivers/passthrough/x86/ats.c | 1 + + xen/drivers/video/vesa.c | 1 + + xen/drivers/video/vga.c | 1 + + xen/include/xen/init.h | 120 --------------------- + xen/include/xen/param.h | 126 +++++++++++++++++++++++ + xen/xsm/flask/flask_op.c | 1 + + xen/xsm/xsm_core.c | 1 + + 92 files changed, 216 insertions(+), 120 deletions(-) + create mode 100644 xen/include/xen/param.h + +diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c +index bf9c78b02c..30e4bd1bc5 100644 +--- a/xen/arch/arm/acpi/boot.c ++++ b/xen/arch/arm/acpi/boot.c +@@ -30,6 +30,7 @@ + #include + #include + #include ++#include + #include + + #include 
+diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index da72b02442..0248893de0 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -1,6 +1,7 @@ + #include + #include ++#include + #include + #include + #include + #include +diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c +index dd9c3b73ba..d2d11eda26 100644 +--- a/xen/arch/arm/domain_build.c ++++ b/xen/arch/arm/domain_build.c +@@ -2,6 +2,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c +index 78b9521b21..869bc97fa1 100644 +--- a/xen/arch/arm/gic-v3-lpi.c ++++ b/xen/arch/arm/gic-v3-lpi.c +@@ -20,6 +20,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c +index 494f70546b..3c8ae11b73 100644 +--- a/xen/arch/arm/setup.c ++++ b/xen/arch/arm/setup.c +@@ -29,6 +29,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c +index 00b64c3322..cae2179126 100644 +--- a/xen/arch/arm/smpboot.c ++++ b/xen/arch/arm/smpboot.c +@@ -23,6 +23,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c +index a20474f87c..6f9bec22d3 100644 +--- a/xen/arch/arm/traps.c ++++ b/xen/arch/arm/traps.c +@@ -26,6 +26,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c +index afc6ed9d99..bcba52e232 100644 +--- a/xen/arch/x86/acpi/boot.c ++++ b/xen/arch/x86/acpi/boot.c +@@ -27,6 +27,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c +index 2676f0d7da..5cd70d7a40 100644 +--- a/xen/arch/x86/acpi/cpu_idle.c ++++ b/xen/arch/x86/acpi/cpu_idle.c +@@ -37,6 +37,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c +index f05275578d..281be131a3 100644 +--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c ++++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c +@@ -31,6 +31,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c +index feb0f6ce20..b5df00b22c 100644 +--- a/xen/arch/x86/acpi/power.c ++++ b/xen/arch/x86/acpi/power.c +@@ -14,6 +14,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c +index 508b1586f2..a361781456 100644 +--- a/xen/arch/x86/apic.c ++++ b/xen/arch/x86/apic.c +@@ -20,6 +20,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c +index 8b5f0f2e4c..e351dd227f 100644 +--- a/xen/arch/x86/cpu/amd.c ++++ b/xen/arch/x86/cpu/amd.c +@@ -1,6 +1,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c +index e5ad17d8d9..1b33f1ed71 100644 +--- a/xen/arch/x86/cpu/common.c ++++ b/xen/arch/x86/cpu/common.c +@@ -1,6 +1,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c +index 198595ff97..d61e582af3 100644 +--- 
a/xen/arch/x86/cpu/mcheck/mce.c ++++ b/xen/arch/x86/cpu/mcheck/mce.c +@@ -6,6 +6,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c +index 70738852b9..6f23ea5329 100644 +--- a/xen/arch/x86/cpu/mcheck/mce_intel.c ++++ b/xen/arch/x86/cpu/mcheck/mce_intel.c +@@ -4,6 +4,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/mtrr/generic.c b/xen/arch/x86/cpu/mtrr/generic.c +index cc0bf4c310..89634f918f 100644 +--- a/xen/arch/x86/cpu/mtrr/generic.c ++++ b/xen/arch/x86/cpu/mtrr/generic.c +@@ -3,6 +3,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c +index f49b04c45b..b81937966e 100644 +--- a/xen/arch/x86/cpu/mwait-idle.c ++++ b/xen/arch/x86/cpu/mwait-idle.c +@@ -52,6 +52,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c +index b62095eef2..3c778450ac 100644 +--- a/xen/arch/x86/cpu/vpmu.c ++++ b/xen/arch/x86/cpu/vpmu.c +@@ -22,6 +22,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c +index b1ed33d524..aee221dc44 100644 +--- a/xen/arch/x86/cpuid.c ++++ b/xen/arch/x86/cpuid.c +@@ -1,5 +1,6 @@ + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c +index 56c2dee0fc..6bf5365582 100644 +--- a/xen/arch/x86/dom0_build.c ++++ b/xen/arch/x86/dom0_build.c +@@ -7,6 +7,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c +index 3892c9cfb7..b9f589cac3 100644 +--- a/xen/arch/x86/e820.c ++++ b/xen/arch/x86/e820.c +@@ -1,6 +1,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/genapic/probe.c b/xen/arch/x86/genapic/probe.c +index 1fcc1734f5..d4d7a554a0 100644 +--- a/xen/arch/x86/genapic/probe.c ++++ b/xen/arch/x86/genapic/probe.c +@@ -8,6 +8,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c +index 1cb16bc10d..f9b5e49761 100644 +--- a/xen/arch/x86/genapic/x2apic.c ++++ b/xen/arch/x86/genapic/x2apic.c +@@ -19,6 +19,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c +index 57f68fa81b..ae99993d90 100644 +--- a/xen/arch/x86/hpet.c ++++ b/xen/arch/x86/hpet.c +@@ -11,6 +11,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hvm/asid.c b/xen/arch/x86/hvm/asid.c +index 9d3c671a5f..8e00a28443 100644 +--- a/xen/arch/x86/hvm/asid.c ++++ b/xen/arch/x86/hvm/asid.c +@@ -18,6 +18,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c +index ea99417f08..2fee569a5f 100644 +--- a/xen/arch/x86/hvm/hvm.c ++++ b/xen/arch/x86/hvm/hvm.c +@@ -35,6 +35,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hvm/quirks.c b/xen/arch/x86/hvm/quirks.c +index 881c6b99d2..54cc66c382 100644 +--- a/xen/arch/x86/hvm/quirks.c ++++ b/xen/arch/x86/hvm/quirks.c +@@ -19,6 +19,7 @@ + #include + 
#include + #include ++#include + #include + + s8 __read_mostly hvm_port80_allowed = -1; +diff -u a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c +--- a/xen/arch/x86/hvm/viridian.c ++++ b/xen/arch/x86/hvm/viridian.c +@@ -9,6 +9,7 @@ + * for more information. + */ + ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c +index 634d1946d3..65445afeb0 100644 +--- a/xen/arch/x86/hvm/vmx/vmcs.c ++++ b/xen/arch/x86/hvm/vmx/vmcs.c +@@ -18,6 +18,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c +index b262d38a7c..35c8402ea0 100644 +--- a/xen/arch/x86/hvm/vmx/vmx.c ++++ b/xen/arch/x86/hvm/vmx/vmx.c +@@ -17,6 +17,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c +index 4125ea0c0c..e98e08e9c8 100644 +--- a/xen/arch/x86/io_apic.c ++++ b/xen/arch/x86/io_apic.c +@@ -24,6 +24,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c +index 310ac00a60..cc2eb8e925 100644 +--- a/xen/arch/x86/irq.c ++++ b/xen/arch/x86/irq.c +@@ -10,6 +10,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c +index 71e881b243..c0fb690f79 100644 +--- a/xen/arch/x86/microcode.c ++++ b/xen/arch/x86/microcode.c +@@ -26,6 +26,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index f50c065af3..a05a713276 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -103,6 +103,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c +index 49cc138362..def13f657b 100644 +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -27,6 +27,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c +index df97ce0c72..c85cf9f85a 100644 +--- a/xen/arch/x86/msi.c ++++ b/xen/arch/x86/msi.c +@@ -14,6 +14,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c +index e26121a737..a5c6bdd0ce 100644 +--- a/xen/arch/x86/nmi.c ++++ b/xen/arch/x86/nmi.c +@@ -16,6 +16,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c +index 7e1f563012..6ef15b34d5 100644 +--- a/xen/arch/x86/numa.c ++++ b/xen/arch/x86/numa.c +@@ -11,6 +11,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/oprofile/nmi_int.c b/xen/arch/x86/oprofile/nmi_int.c +index 3dfb8fef93..8f97f7522c 100644 +--- a/xen/arch/x86/oprofile/nmi_int.c ++++ b/xen/arch/x86/oprofile/nmi_int.c +@@ -15,6 +15,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c +index 8bf1c23751..d7f8864651 100644 +--- a/xen/arch/x86/psr.c ++++ b/xen/arch/x86/psr.c +@@ -16,6 +16,7 @@ + #include + #include + #include ++#include + #include + #include + +diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c +index 4da0b2afff..c3473b9a47 100644 +--- a/xen/arch/x86/pv/domain.c ++++ b/xen/arch/x86/pv/domain.c +@@ -7,6 +7,7 @@ + #include + #include + 
#include ++#include + #include + + #include +diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c +index 7a898fdbe5..76fb380100 100644 +--- a/xen/arch/x86/pv/shim.c ++++ b/xen/arch/x86/pv/shim.c +@@ -23,6 +23,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c +index 0223967b24..e50e1f86b3 100644 +--- a/xen/arch/x86/setup.c ++++ b/xen/arch/x86/setup.c +@@ -1,6 +1,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c +index 005c0bf4fa..acef033143 100644 +--- a/xen/arch/x86/shutdown.c ++++ b/xen/arch/x86/shutdown.c +@@ -6,6 +6,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index aa632bdcee..20f562902b 100644 +--- a/xen/arch/x86/spec_ctrl.c ++++ b/xen/arch/x86/spec_ctrl.c +@@ -19,6 +19,7 @@ + #include + #include + #include ++#include + #include + + #include +diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c +index 5020c4ad49..8c232270b4 100644 +--- a/xen/arch/x86/tboot.c ++++ b/xen/arch/x86/tboot.c +@@ -1,6 +1,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c +index f6b26f8883..cf3e51fb5e 100644 +--- a/xen/arch/x86/time.c ++++ b/xen/arch/x86/time.c +@@ -14,6 +14,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c +index 97499a0c79..56067f85d1 100644 +--- a/xen/arch/x86/traps.c ++++ b/xen/arch/x86/traps.c +@@ -30,6 +30,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c +index 2d202a0d4e..39e483640a 100644 +--- a/xen/arch/x86/tsx.c ++++ b/xen/arch/x86/tsx.c +@@ -1,4 +1,5 @@ + #include ++#include + #include + + /* +diff --git a/xen/arch/x86/x86_64/mmconfig-shared.c b/xen/arch/x86/x86_64/mmconfig-shared.c +index cc08b52a35..0c55c7206e 100644 +--- a/xen/arch/x86/x86_64/mmconfig-shared.c ++++ b/xen/arch/x86/x86_64/mmconfig-shared.c +@@ -14,6 +14,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c +index 243495ed07..078419a171 100644 +--- a/xen/arch/x86/xstate.c ++++ b/xen/arch/x86/xstate.c +@@ -5,6 +5,7 @@ + * + */ + ++#include + #include + #include + #include +diff --git a/xen/common/core_parking.c b/xen/common/core_parking.c +index a6669e1766..411106c675 100644 +--- a/xen/common/core_parking.c ++++ b/xen/common/core_parking.c +@@ -19,6 +19,7 @@ + #include + #include + #include ++#include + #include + #include + +diff --git a/xen/common/domain.c b/xen/common/domain.c +index dfea575b49..0ae04d5bb9 100644 +--- a/xen/common/domain.c ++++ b/xen/common/domain.c +@@ -9,6 +9,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c +index bf7bb95999..b9f461505c 100644 +--- a/xen/common/efi/boot.c ++++ b/xen/common/efi/boot.c +@@ -11,6 +11,7 @@ + #include + #include + #include ++#include + #include + #include + #if EFI_PAGE_SIZE != PAGE_SIZE +diff --git a/xen/common/gdbstub.c b/xen/common/gdbstub.c +index 6234834a20..848c1f4327 100644 +--- a/xen/common/gdbstub.c ++++ b/xen/common/gdbstub.c +@@ -40,6 +40,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git 
a/xen/common/grant_table.c b/xen/common/grant_table.c +index 5536d282b9..2ecf38dfbe 100644 +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -28,6 +28,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/kernel.c b/xen/common/kernel.c +index 760917dab5..22941cec94 100644 +--- a/xen/common/kernel.c ++++ b/xen/common/kernel.c +@@ -7,6 +7,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/kexec.c b/xen/common/kexec.c +index a262cc5a18..9af7de4df3 100644 +--- a/xen/common/kexec.c ++++ b/xen/common/kexec.c +@@ -12,6 +12,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/memory.c b/xen/common/memory.c +index c7d2bac452..ecc7e64334 100644 +--- a/xen/common/memory.c ++++ b/xen/common/memory.c +@@ -10,6 +10,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c +index 919a270587..97902d42c1 100644 +--- a/xen/common/page_alloc.c ++++ b/xen/common/page_alloc.c +@@ -126,6 +126,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c +index cb712c8690..91d4ad0fd8 100644 +--- a/xen/common/rcupdate.c ++++ b/xen/common/rcupdate.c +@@ -34,6 +34,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff -u a/xen/common/sched_credit.c b/xen/common/sched_credit.c +--- a/xen/common/sched_credit.c ++++ b/xen/common/sched_credit.c +@@ -9,6 +9,7 @@ + */ + + #include ++#include + #include + #include + #include +diff -u a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c +--- a/xen/common/sched_credit2.c ++++ b/xen/common/sched_credit2.c +@@ -11,6 +11,7 @@ + */ + + #include ++#include + #include + #include + #include +diff -u a/xen/common/schedule.c b/xen/common/schedule.c +--- a/xen/common/schedule.c ++++ b/xen/common/schedule.c +@@ -15,6 +15,7 @@ + + #ifndef COMPAT + #include ++#include + #include + #include + #include +diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c +index 2ed4d62214..912593915b 100644 +--- a/xen/common/shutdown.c ++++ b/xen/common/shutdown.c +@@ -1,5 +1,6 @@ + #include + #include ++#include + #include + #include + #include +diff --git a/xen/common/timer.c b/xen/common/timer.c +index 645206a989..1bb265ceea 100644 +--- a/xen/common/timer.c ++++ b/xen/common/timer.c +@@ -10,6 +10,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff -u a/xen/common/tmem_xen.c b/xen/common/tmem_xen.c +--- a/xen/common/tmem_xen.c ++++ b/xen/common/tmem_xen.c +@@ -13,6 +13,7 @@ + #include + #include + #include ++#include + + bool __read_mostly opt_tmem; + boolean_param("tmem", opt_tmem); +diff --git a/xen/common/trace.c b/xen/common/trace.c +index ebfc735b31..a2a389a1c7 100644 +--- a/xen/common/trace.c ++++ b/xen/common/trace.c +@@ -19,6 +19,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/acpi/apei/hest.c b/xen/drivers/acpi/apei/hest.c +index 70734ab0e2..c5f3aaab7c 100644 +--- a/xen/drivers/acpi/apei/hest.c ++++ b/xen/drivers/acpi/apei/hest.c +@@ -30,6 +30,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/acpi/tables.c b/xen/drivers/acpi/tables.c +index b890b73901..8c2a279e18 100644 +--- a/xen/drivers/acpi/tables.c ++++ b/xen/drivers/acpi/tables.c +@@ -24,6 +24,7 @@ + + 
#include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/char/arm-uart.c b/xen/drivers/char/arm-uart.c +index 627746ba89..eeb9ceefc0 100644 +--- a/xen/drivers/char/arm-uart.c ++++ b/xen/drivers/char/arm-uart.c +@@ -21,6 +21,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c +index 4bcbbfa7d6..913ae1b66a 100644 +--- a/xen/drivers/char/console.c ++++ b/xen/drivers/char/console.c +@@ -15,6 +15,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c +index b6e155d17b..c893d246de 100644 +--- a/xen/drivers/char/ehci-dbgp.c ++++ b/xen/drivers/char/ehci-dbgp.c +@@ -8,6 +8,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c +index aa87c57fc9..bd048f307a 100644 +--- a/xen/drivers/char/ns16550.c ++++ b/xen/drivers/char/ns16550.c +@@ -11,6 +11,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c +index 88cd876790..5ecba0af33 100644 +--- a/xen/drivers/char/serial.c ++++ b/xen/drivers/char/serial.c +@@ -9,6 +9,7 @@ + #include + #include + #include ++#include + #include + #include + +diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c +index 2d716abf72..e630a47419 100644 +--- a/xen/drivers/cpufreq/cpufreq.c ++++ b/xen/drivers/cpufreq/cpufreq.c +@@ -31,6 +31,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c +index 9fbc343c58..6c5f8e46ec 100644 +--- a/xen/drivers/passthrough/amd/iommu_acpi.c ++++ b/xen/drivers/passthrough/amd/iommu_acpi.c +@@ -19,6 +19,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c +index 4e19cf56cc..9d421e06de 100644 +--- a/xen/drivers/passthrough/iommu.c ++++ b/xen/drivers/passthrough/iommu.c +@@ -17,6 +17,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c +index 65d1d457ff..5660f7e1c2 100644 +--- a/xen/drivers/passthrough/pci.c ++++ b/xen/drivers/passthrough/pci.c +@@ -21,6 +21,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c +index f36b99ae37..1784f91b34 100644 +--- a/xen/drivers/passthrough/vtd/dmar.c ++++ b/xen/drivers/passthrough/vtd/dmar.c +@@ -24,6 +24,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough/vtd/quirks.c +index 4dadd9523f..5594270678 100644 +--- a/xen/drivers/passthrough/vtd/quirks.c ++++ b/xen/drivers/passthrough/vtd/quirks.c +@@ -17,6 +17,7 @@ + */ + + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c +index ff456e1e70..f379afac03 100644 +--- a/xen/drivers/passthrough/vtd/x86/vtd.c ++++ b/xen/drivers/passthrough/vtd/x86/vtd.c +@@ -17,6 +17,7 @@ + * Copyright (C) Weidong Han + */ + ++#include + #include + #include + #include +diff --git a/xen/drivers/passthrough/x86/ats.c 
b/xen/drivers/passthrough/x86/ats.c +index 3eea7f89fc..8ae0eae4a2 100644 +--- a/xen/drivers/passthrough/x86/ats.c ++++ b/xen/drivers/passthrough/x86/ats.c +@@ -12,6 +12,7 @@ + * this program; If not, see . + */ + ++#include + #include + #include + #include +diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c +index fd2cb1312d..2c1bbd9278 100644 +--- a/xen/drivers/video/vesa.c ++++ b/xen/drivers/video/vesa.c +@@ -6,6 +6,7 @@ + + #include + #include ++#include + #include + #include + #include +diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c +index 666f2e2509..b7f04d0d97 100644 +--- a/xen/drivers/video/vga.c ++++ b/xen/drivers/video/vga.c +@@ -7,6 +7,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/xen/include/xen/init.h b/xen/include/xen/init.h +index d0f3a007d0..bfe789e93f 100644 +--- a/xen/include/xen/init.h ++++ b/xen/include/xen/init.h +@@ -71,120 +71,6 @@ typedef void (*exitcall_t)(void); + void do_presmp_initcalls(void); + void do_initcalls(void); + +-/* +- * Used for kernel command line parameter setup +- */ +-struct kernel_param { +- const char *name; +- enum { +- OPT_STR, +- OPT_UINT, +- OPT_BOOL, +- OPT_SIZE, +- OPT_CUSTOM +- } type; +- unsigned int len; +- union { +- void *var; +- int (*func)(const char *); +- } par; +-}; +- +-extern const struct kernel_param __setup_start[], __setup_end[]; +-extern const struct kernel_param __param_start[], __param_end[]; +- +-#define __dataparam __used_section(".data.param") +- +-#define __param(att) static const att \ +- __attribute__((__aligned__(sizeof(void *)))) struct kernel_param +- +-#define __setup_str static const __initconst \ +- __attribute__((__aligned__(1))) char +-#define __kparam __param(__initsetup) +- +-#define custom_param(_name, _var) \ +- __setup_str __setup_str_##_var[] = _name; \ +- __kparam __setup_##_var = \ +- { .name = __setup_str_##_var, \ +- .type = OPT_CUSTOM, \ +- .par.func = _var } +-#define boolean_param(_name, _var) \ +- __setup_str __setup_str_##_var[] = _name; \ +- __kparam __setup_##_var = \ +- { .name = __setup_str_##_var, \ +- .type = OPT_BOOL, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +-#define integer_param(_name, _var) \ +- __setup_str __setup_str_##_var[] = _name; \ +- __kparam __setup_##_var = \ +- { .name = __setup_str_##_var, \ +- .type = OPT_UINT, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +-#define size_param(_name, _var) \ +- __setup_str __setup_str_##_var[] = _name; \ +- __kparam __setup_##_var = \ +- { .name = __setup_str_##_var, \ +- .type = OPT_SIZE, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +-#define string_param(_name, _var) \ +- __setup_str __setup_str_##_var[] = _name; \ +- __kparam __setup_##_var = \ +- { .name = __setup_str_##_var, \ +- .type = OPT_STR, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +- +-#define __rtparam __param(__dataparam) +- +-#define custom_runtime_only_param(_name, _var) \ +- __rtparam __rtpar_##_var = \ +- { .name = _name, \ +- .type = OPT_CUSTOM, \ +- .par.func = _var } +-#define boolean_runtime_only_param(_name, _var) \ +- __rtparam __rtpar_##_var = \ +- { .name = _name, \ +- .type = OPT_BOOL, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +-#define integer_runtime_only_param(_name, _var) \ +- __rtparam __rtpar_##_var = \ +- { .name = _name, \ +- .type = OPT_UINT, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +-#define size_runtime_only_param(_name, _var) \ +- __rtparam __rtpar_##_var = \ +- { .name = _name, \ +- .type = OPT_SIZE, \ +- .len = sizeof(_var), 
\ +- .par.var = &_var } +-#define string_runtime_only_param(_name, _var) \ +- __rtparam __rtpar_##_var = \ +- { .name = _name, \ +- .type = OPT_STR, \ +- .len = sizeof(_var), \ +- .par.var = &_var } +- +-#define custom_runtime_param(_name, _var) \ +- custom_param(_name, _var); \ +- custom_runtime_only_param(_name, _var) +-#define boolean_runtime_param(_name, _var) \ +- boolean_param(_name, _var); \ +- boolean_runtime_only_param(_name, _var) +-#define integer_runtime_param(_name, _var) \ +- integer_param(_name, _var); \ +- integer_runtime_only_param(_name, _var) +-#define size_runtime_param(_name, _var) \ +- size_param(_name, _var); \ +- size_runtime_only_param(_name, _var) +-#define string_runtime_param(_name, _var) \ +- string_param(_name, _var); \ +- string_runtime_only_param(_name, _var) +- + #endif /* __ASSEMBLY__ */ + + #ifdef CONFIG_LATE_HWDOM +diff --git a/xen/include/xen/param.h b/xen/include/xen/param.h +new file mode 100644 +index 0000000000..75471eb4ad +--- /dev/null ++++ b/xen/include/xen/param.h +@@ -0,0 +1,119 @@ ++#ifndef _XEN_PARAM_H ++#define _XEN_PARAM_H ++ ++#include ++ ++/* ++ * Used for kernel command line parameter setup ++ */ ++struct kernel_param { ++ const char *name; ++ enum { ++ OPT_STR, ++ OPT_UINT, ++ OPT_BOOL, ++ OPT_SIZE, ++ OPT_CUSTOM ++ } type; ++ unsigned int len; ++ union { ++ void *var; ++ int (*func)(const char *); ++ } par; ++}; ++ ++extern const struct kernel_param __setup_start[], __setup_end[]; ++extern const struct kernel_param __param_start[], __param_end[]; ++ ++#define __dataparam __used_section(".data.param") ++ ++#define __param(att) static const att \ ++ __attribute__((__aligned__(sizeof(void *)))) struct kernel_param ++ ++#define __setup_str static const __initconst \ ++ __attribute__((__aligned__(1))) char ++#define __kparam __param(__initsetup) ++ ++#define custom_param(_name, _var) \ ++ __setup_str __setup_str_##_var[] = _name; \ ++ __kparam __setup_##_var = \ ++ { .name = __setup_str_##_var, \ ++ .type = OPT_CUSTOM, \ ++ .par.func = _var } ++#define boolean_param(_name, _var) \ ++ __setup_str __setup_str_##_var[] = _name; \ ++ __kparam __setup_##_var = \ ++ { .name = __setup_str_##_var, \ ++ .type = OPT_BOOL, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define integer_param(_name, _var) \ ++ __setup_str __setup_str_##_var[] = _name; \ ++ __kparam __setup_##_var = \ ++ { .name = __setup_str_##_var, \ ++ .type = OPT_UINT, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define size_param(_name, _var) \ ++ __setup_str __setup_str_##_var[] = _name; \ ++ __kparam __setup_##_var = \ ++ { .name = __setup_str_##_var, \ ++ .type = OPT_SIZE, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define string_param(_name, _var) \ ++ __setup_str __setup_str_##_var[] = _name; \ ++ __kparam __setup_##_var = \ ++ { .name = __setup_str_##_var, \ ++ .type = OPT_STR, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define __rtparam __param(__dataparam) ++ ++#define custom_runtime_only_param(_name, _var) \ ++ __rtparam __rtpar_##_var = \ ++ { .name = _name, \ ++ .type = OPT_CUSTOM, \ ++ .par.func = _var } ++#define boolean_runtime_only_param(_name, _var) \ ++ __rtparam __rtpar_##_var = \ ++ { .name = _name, \ ++ .type = OPT_BOOL, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define integer_runtime_only_param(_name, _var) \ ++ __rtparam __rtpar_##_var = \ ++ { .name = _name, \ ++ .type = OPT_UINT, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define size_runtime_only_param(_name, _var) \ ++ __rtparam __rtpar_##_var = \ ++ { 
.name = _name, \ ++ .type = OPT_SIZE, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++#define string_runtime_only_param(_name, _var) \ ++ __rtparam __rtpar_##_var = \ ++ { .name = _name, \ ++ .type = OPT_STR, \ ++ .len = sizeof(_var), \ ++ .par.var = &_var } ++ ++#define custom_runtime_param(_name, _var) \ ++ custom_param(_name, _var); \ ++ custom_runtime_only_param(_name, _var) ++#define boolean_runtime_param(_name, _var) \ ++ boolean_param(_name, _var); \ ++ boolean_runtime_only_param(_name, _var) ++#define integer_runtime_param(_name, _var) \ ++ integer_param(_name, _var); \ ++ integer_runtime_only_param(_name, _var) ++#define size_runtime_param(_name, _var) \ ++ size_param(_name, _var); \ ++ size_runtime_only_param(_name, _var) ++#define string_runtime_param(_name, _var) \ ++ string_param(_name, _var); \ ++ string_runtime_only_param(_name, _var) ++ ++#endif /* _XEN_PARAM_H */ +diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c +index 1c4decc6cd..a5f2b104e2 100644 +--- a/xen/xsm/flask/flask_op.c ++++ b/xen/xsm/flask/flask_op.c +@@ -13,6 +13,7 @@ + #include + #include + #include ++#include + + #include + +diff --git a/xen/xsm/xsm_core.c b/xen/xsm/xsm_core.c +index a319df253d..5eab21e1b1 100644 +--- a/xen/xsm/xsm_core.c ++++ b/xen/xsm/xsm_core.c +@@ -13,6 +13,7 @@ + #include + #include + #include ++#include + + #include + #include +-- +2.25.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa312-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa312-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa312-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa312-4.11.patch 2022-05-21 13:43:08.000000000 +0100 @@ -0,0 +1,98 @@ +From 35cb81a9967a061df7d0eb8c387395f1c1984454 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Thu, 19 Dec 2019 08:12:21 +0000 +Subject: [PATCH] xen/arm: Place a speculation barrier sequence following an + eret instruction + +Some CPUs can speculate past an ERET instruction and potentially perform +speculative accesses to memory before processing the exception return. +Since the register state is often controlled by lower privilege level +at the point of an ERET, this could potentially be used as part of a +side-channel attack. + +Newer CPUs may implement a new SB barrier instruction which acts +as an architected speculation barrier. For current CPUs, the sequence +DSB; ISB is known to prevent speculation. + +The latter sequence is heavier than SB but it would never be executed +(this is speculation after all!). + +Introduce a new macro 'sb' that could be used when a speculation barrier +is required. For now it is using dsb; isb but this could easily be +updated to cater SB in the future. + +This is XSA-312. 
+ +Signed-off-by: Julien Grall +--- + xen/arch/arm/arm32/entry.S | 2 ++ + xen/arch/arm/arm64/entry.S | 3 +++ + xen/include/asm-arm/macros.h | 9 +++++++++ + 3 files changed, 14 insertions(+) + +diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S +index 16d9f93653..464c8b8645 100644 +--- a/xen/arch/arm/arm32/entry.S ++++ b/xen/arch/arm/arm32/entry.S +@@ -1,4 +1,5 @@ + #include ++#include + #include + #include + #include +@@ -379,6 +380,7 @@ return_to_hypervisor: + add sp, #(UREGS_SP_usr - UREGS_sp); /* SP, LR, SPSR, PC */ + clrex + eret ++ sb + + /* + * struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next) +diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S +index 12df95e901..a42c51e489 100644 +--- a/xen/arch/arm/arm64/entry.S ++++ b/xen/arch/arm/arm64/entry.S +@@ -2,6 +2,7 @@ + #include + #include + #include ++#include + #include + + /* +@@ -288,6 +289,7 @@ guest_sync: + */ + mov x1, xzr + eret ++ sb + + 1: + /* +@@ -413,6 +415,7 @@ return_from_trap: + ldr lr, [sp], #(UREGS_SPSR_el1 - UREGS_LR) /* CPSR, PC, SP, LR */ + + eret ++ sb + + /* + * This function is used to check pending virtual SError in the gap of +diff --git a/xen/include/asm-arm/macros.h b/xen/include/asm-arm/macros.h +index 5d837cb38b..539f613ee5 100644 +--- a/xen/include/asm-arm/macros.h ++++ b/xen/include/asm-arm/macros.h +@@ -13,4 +13,13 @@ + # error "unknown ARM variant" + #endif + ++ /* ++ * Speculative barrier ++ * XXX: Add support for the 'sb' instruction ++ */ ++ .macro sb ++ dsb nsh ++ isb ++ .endm ++ + #endif /* __ASM_ARM_MACROS_H */ +-- +2.17.1 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-1.patch 2022-05-21 14:18:38.000000000 +0100 @@ -0,0 +1,26 @@ +From: Jan Beulich +Subject: xenoprof: clear buffer intended to be shared with guests + +alloc_xenheap_pages() making use of MEMF_no_scrub is fine for Xen +internally used allocations, but buffers allocated to be shared with +(unpriviliged) guests need to be zapped of their prior content. + +This is part of XSA-313. + +Reported-by: Ilja Van Sprundel +Signed-off-by: Jan Beulich +Reviewed-by: Andrew Cooper +Reviewed-by: Wei Liu + +--- a/xen/common/xenoprof.c ++++ b/xen/common/xenoprof.c +@@ -253,6 +253,9 @@ static int alloc_xenoprof_struct( + return -ENOMEM; + } + ++ for ( i = 0; i < npages; ++i ) ++ clear_page(d->xenoprof->rawbuf + i * PAGE_SIZE); ++ + d->xenoprof->npages = npages; + d->xenoprof->nbuf = nvcpu; + d->xenoprof->bufsize = bufsize; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa313-2.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,132 @@ +From: Jan Beulich +Subject: xenoprof: limit consumption of shared buffer data + +Since a shared buffer can be written to by the guest, we may only read +the head and tail pointers from there (all other fields should only ever +be written to). Furthermore, for any particular operation the two values +must be read exactly once, with both checks and consumption happening +with the thus read values. 
(The backtrace related xenoprof_buf_space() +use in xenoprof_log_event() is an exception: The values used there get +re-checked by every subsequent xenoprof_add_sample().) + +Since that code needed touching, also fix the double increment of the +lost samples count in case the backtrace related xenoprof_add_sample() +invocation in xenoprof_log_event() fails. + +Where code is being touched anyway, add const as appropriate, but take +the opportunity to entirely drop the now unused domain parameter of +xenoprof_buf_space(). + +This is part of XSA-313. + +Reported-by: Ilja Van Sprundel +Signed-off-by: Jan Beulich +Reviewed-by: George Dunlap +Reviewed-by: Wei Liu + +--- a/xen/common/xenoprof.c ++++ b/xen/common/xenoprof.c +@@ -479,25 +479,22 @@ static int add_passive_list(XEN_GUEST_HA + + + /* Get space in the buffer */ +-static int xenoprof_buf_space(struct domain *d, xenoprof_buf_t * buf, int size) ++static int xenoprof_buf_space(int head, int tail, int size) + { +- int head, tail; +- +- head = xenoprof_buf(d, buf, event_head); +- tail = xenoprof_buf(d, buf, event_tail); +- + return ((tail > head) ? 0 : size) + tail - head - 1; + } + + /* Check for space and add a sample. Return 1 if successful, 0 otherwise. */ +-static int xenoprof_add_sample(struct domain *d, xenoprof_buf_t *buf, ++static int xenoprof_add_sample(const struct domain *d, ++ const struct xenoprof_vcpu *v, + uint64_t eip, int mode, int event) + { ++ xenoprof_buf_t *buf = v->buffer; + int head, tail, size; + + head = xenoprof_buf(d, buf, event_head); + tail = xenoprof_buf(d, buf, event_tail); +- size = xenoprof_buf(d, buf, event_size); ++ size = v->event_size; + + /* make sure indexes in shared buffer are sane */ + if ( (head < 0) || (head >= size) || (tail < 0) || (tail >= size) ) +@@ -506,7 +503,7 @@ static int xenoprof_add_sample(struct do + return 0; + } + +- if ( xenoprof_buf_space(d, buf, size) > 0 ) ++ if ( xenoprof_buf_space(head, tail, size) > 0 ) + { + xenoprof_buf(d, buf, event_log[head].eip) = eip; + xenoprof_buf(d, buf, event_log[head].mode) = mode; +@@ -530,7 +527,6 @@ static int xenoprof_add_sample(struct do + int xenoprof_add_trace(struct vcpu *vcpu, uint64_t pc, int mode) + { + struct domain *d = vcpu->domain; +- xenoprof_buf_t *buf = d->xenoprof->vcpu[vcpu->vcpu_id].buffer; + + /* Do not accidentally write an escape code due to a broken frame. */ + if ( pc == XENOPROF_ESCAPE_CODE ) +@@ -539,7 +535,8 @@ int xenoprof_add_trace(struct vcpu *vcpu + return 0; + } + +- return xenoprof_add_sample(d, buf, pc, mode, 0); ++ return xenoprof_add_sample(d, &d->xenoprof->vcpu[vcpu->vcpu_id], ++ pc, mode, 0); + } + + void xenoprof_log_event(struct vcpu *vcpu, const struct cpu_user_regs *regs, +@@ -570,17 +567,22 @@ void xenoprof_log_event(struct vcpu *vcp + /* Provide backtrace if requested. 
*/ + if ( backtrace_depth > 0 ) + { +- if ( (xenoprof_buf_space(d, buf, v->event_size) < 2) || +- !xenoprof_add_sample(d, buf, XENOPROF_ESCAPE_CODE, mode, +- XENOPROF_TRACE_BEGIN) ) ++ if ( xenoprof_buf_space(xenoprof_buf(d, buf, event_head), ++ xenoprof_buf(d, buf, event_tail), ++ v->event_size) < 2 ) + { + xenoprof_buf(d, buf, lost_samples)++; + lost_samples++; + return; + } ++ ++ /* xenoprof_add_sample() will increment lost_samples on failure */ ++ if ( !xenoprof_add_sample(d, v, XENOPROF_ESCAPE_CODE, mode, ++ XENOPROF_TRACE_BEGIN) ) ++ return; + } + +- if ( xenoprof_add_sample(d, buf, pc, mode, event) ) ++ if ( xenoprof_add_sample(d, v, pc, mode, event) ) + { + if ( is_active(vcpu->domain) ) + active_samples++; +--- a/xen/include/xen/xenoprof.h ++++ b/xen/include/xen/xenoprof.h +@@ -61,12 +61,12 @@ struct xenoprof { + + #ifndef CONFIG_COMPAT + #define XENOPROF_COMPAT(x) 0 +-#define xenoprof_buf(d, b, field) ((b)->field) ++#define xenoprof_buf(d, b, field) ACCESS_ONCE((b)->field) + #else + #define XENOPROF_COMPAT(x) ((x)->is_compat) +-#define xenoprof_buf(d, b, field) (*(!(d)->xenoprof->is_compat ? \ +- &(b)->native.field : \ +- &(b)->compat.field)) ++#define xenoprof_buf(d, b, field) ACCESS_ONCE(*(!(d)->xenoprof->is_compat \ ++ ? &(b)->native.field \ ++ : &(b)->compat.field)) + #endif + + struct domain; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa314-4.13.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa314-4.13.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa314-4.13.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa314-4.13.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,121 @@ +From ab49f005f7d01d4004d76f2e295d31aca7d4f93a Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Thu, 20 Feb 2020 20:54:40 +0000 +Subject: [PATCH] xen/rwlock: Add missing memory barrier in the unlock path of + rwlock + +The rwlock unlock paths are using atomic_sub() to release the lock. +However the implementation of atomic_sub() rightfully doesn't contain a +memory barrier. On Arm, this means a processor is allowed to re-order +the memory access with the preceeding access. + +In other words, the unlock may be seen by another processor before all +the memory accesses within the "critical" section. + +The rwlock paths already contains barrier indirectly, but they are not +very useful without the counterpart in the unlock paths. + +The memory barriers are not necessary on x86 because loads/stores are +not re-ordered with lock instructions. + +So add arch_lock_release_barrier() in the unlock paths that will only +add memory barrier on Arm. + +Take the opportunity to document each lock paths explaining why a +barrier is not necessary. + +This is XSA-314. + +Signed-off-by: Julien Grall +Reviewed-by: Jan Beulich +Reviewed-by: Stefano Stabellini + +--- + xen/include/xen/rwlock.h | 29 ++++++++++++++++++++++++++++- + 1 file changed, 28 insertions(+), 1 deletion(-) + +diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h +index 3dfea1ac2a..516486306f 100644 +--- a/xen/include/xen/rwlock.h ++++ b/xen/include/xen/rwlock.h +@@ -48,6 +48,10 @@ static inline int _read_trylock(rwlock_t *lock) + if ( likely(!(cnts & _QW_WMASK)) ) + { + cnts = (u32)atomic_add_return(_QR_BIAS, &lock->cnts); ++ /* ++ * atomic_add_return() is a full barrier so no need for an ++ * arch_lock_acquire_barrier(). 
++ */ + if ( likely(!(cnts & _QW_WMASK)) ) + return 1; + atomic_sub(_QR_BIAS, &lock->cnts); +@@ -64,11 +68,19 @@ static inline void _read_lock(rwlock_t *lock) + u32 cnts; + + cnts = atomic_add_return(_QR_BIAS, &lock->cnts); ++ /* ++ * atomic_add_return() is a full barrier so no need for an ++ * arch_lock_acquire_barrier(). ++ */ + if ( likely(!(cnts & _QW_WMASK)) ) + return; + + /* The slowpath will decrement the reader count, if necessary. */ + queue_read_lock_slowpath(lock); ++ /* ++ * queue_read_lock_slowpath() is using spinlock and therefore is a ++ * full barrier. So no need for an arch_lock_acquire_barrier(). ++ */ + } + + static inline void _read_lock_irq(rwlock_t *lock) +@@ -92,6 +104,7 @@ static inline unsigned long _read_lock_irqsave(rwlock_t *lock) + */ + static inline void _read_unlock(rwlock_t *lock) + { ++ arch_lock_release_barrier(); + /* + * Atomically decrement the reader count + */ +@@ -121,11 +134,20 @@ static inline int _rw_is_locked(rwlock_t *lock) + */ + static inline void _write_lock(rwlock_t *lock) + { +- /* Optimize for the unfair lock case where the fair flag is 0. */ ++ /* ++ * Optimize for the unfair lock case where the fair flag is 0. ++ * ++ * atomic_cmpxchg() is a full barrier so no need for an ++ * arch_lock_acquire_barrier(). ++ */ + if ( atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0 ) + return; + + queue_write_lock_slowpath(lock); ++ /* ++ * queue_write_lock_slowpath() is using spinlock and therefore is a ++ * full barrier. So no need for an arch_lock_acquire_barrier(). ++ */ + } + + static inline void _write_lock_irq(rwlock_t *lock) +@@ -157,11 +179,16 @@ static inline int _write_trylock(rwlock_t *lock) + if ( unlikely(cnts) ) + return 0; + ++ /* ++ * atomic_cmpxchg() is a full barrier so no need for an ++ * arch_lock_acquire_barrier(). ++ */ + return likely(atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0); + } + + static inline void _write_unlock(rwlock_t *lock) + { ++ arch_lock_release_barrier(); + /* + * If the writer field is atomic, it can be cleared directly. + * Otherwise, an atomic subtraction will be used to clear it. +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa316-xen.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa316-xen.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa316-xen.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa316-xen.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,30 @@ +From: Ross Lagerwall +Subject: xen/gnttab: Fix error path in map_grant_ref() + +Part of XSA-295 (c/s 863e74eb2cffb) inadvertently re-positioned the brackets, +changing the logic. If the _set_status() call fails, the grant_map hypercall +would fail with a status of 1 (rc != GNTST_okay) instead of the expected +negative GNTST_* error. + +This error path can be taken due to bad guest state, and causes net/blk-back +in Linux to crash. + +This is XSA-316. 
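The one-character regression fixed here is easy to miss in the hunk above, so a minimal self-contained illustration of the two bracketings follows; the helper is made up, and only the GNTST_* values correspond to Xen's (GNTST_okay is 0, GNTST_general_error is -1):

    /* Standalone sketch, not Xen code: compile and run to see the difference. */
    #include <stdio.h>

    #define GNTST_okay            0
    #define GNTST_general_error (-1)

    static int set_status_stub(void) { return GNTST_general_error; /* simulate failure */ }

    int main(void)
    {
        int rc;

        /* Pre-fix bracketing: rc is assigned the *comparison* result, i.e. 1. */
        if ( (rc = set_status_stub() != GNTST_okay) )
            printf("broken: rc = %d\n", rc);    /* prints 1  */

        /* Fixed bracketing: rc is assigned the callee's status, i.e. -1. */
        if ( (rc = set_status_stub()) != GNTST_okay )
            printf("fixed:  rc = %d\n", rc);    /* prints -1 */

        return 0;
    }
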
+ +Signed-off-by: Ross Lagerwall +Reviewed-by: Andrew Cooper +Reviewed-by: Julien Grall + +diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c +index 9fd6e60416..4b5344dc21 100644 +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -1031,7 +1031,7 @@ map_grant_ref( + { + if ( (rc = _set_status(shah, status, rd, rgt->gt_version, act, + op->flags & GNTMAP_readonly, 1, +- ld->domain_id) != GNTST_okay) ) ++ ld->domain_id)) != GNTST_okay ) + goto act_release_out; + + if ( !act->pin ) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa317.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa317.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa317.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa317.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,50 @@ +From aeb46e92f915f19a61d5a8a1f4b696793f64e6fb Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Thu, 19 Mar 2020 13:17:31 +0000 +Subject: [PATCH] xen/common: event_channel: Don't ignore error in + get_free_port() + +Currently, get_free_port() is assuming that the port has been allocated +when evtchn_allocate_port() is not return -EBUSY. + +However, the function may return an error when: + - We exhausted all the event channels. This can happen if the limit + configured by the administrator for the guest ('max_event_channels' + in xl cfg) is higher than the ABI used by the guest. For instance, + if the guest is using 2L, the limit should not be higher than 4095. + - We cannot allocate memory (e.g Xen has not more memory). + +Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will validly +assuming the port was valid and will next call evtchn_from_port(). This +will result to a crash as the memory backing the event channel structure +is not present. + +Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU") +Signed-off-by: Julien Grall +Reviewed-by: Jan Beulich +--- + xen/common/event_channel.c | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c +index e86e2bfab0..a8d182b584 100644 +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -195,10 +195,10 @@ static int get_free_port(struct domain *d) + { + int rc = evtchn_allocate_port(d, port); + +- if ( rc == -EBUSY ) +- continue; +- +- return port; ++ if ( rc == 0 ) ++ return port; ++ else if ( rc != -EBUSY ) ++ return rc; + } + + return -ENOSPC; +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa318.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa318.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa318.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa318.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,39 @@ +From: Jan Beulich +Subject: gnttab: fix GNTTABOP_copy continuation handling + +The XSA-226 fix was flawed - the backwards transformation on rc was done +too early, causing a continuation to not get invoked when the need for +preemption was determined at the very first iteration of the request. +This in particular means that all of the status fields of the individual +operations would be left untouched, i.e. set to whatever the caller may +or may not have initialized them to. + +This is part of XSA-318. 
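To make the ordering issue described above concrete, here is a small standalone sketch (invented names, none of the real hypercall plumbing): do_batch() reports how many items are still left, and converting that into an "items done" count before the continuation check makes preemption at the very first item indistinguishable from full completion:

    /* Standalone sketch, not Xen code. */
    #include <stdio.h>

    /* Pretend the batch was preempted before completing a single item. */
    static int do_batch(int count) { return count; /* items still left */ }

    static void dispatch(int count, int adjust_early)
    {
        int rc = do_batch(count);

        if ( adjust_early && rc > 0 )
            rc = count - rc;   /* pre-fix ordering: rc becomes "items done", here 0 */

        if ( rc > 0 )          /* the continuation check at the "out:" label */
            printf("continuation created, %d item(s) still to do\n", rc);
        else
            printf("treated as complete (rc = %d); per-op status fields never written\n", rc);
    }

    int main(void)
    {
        dispatch(8, 1);   /* broken: converted too early, falsely reports completion */
        dispatch(8, 0);   /* fixed: conversion deferred, continuation is requested  */
        return 0;
    }
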
+ +Reported-by: Pawel Wieczorkiewicz +Tested-by: Pawel Wieczorkiewicz +Signed-off-by: Jan Beulich +Reviewed-by: Juergen Gross + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -3576,8 +3576,7 @@ do_grant_table_op( + rc = gnttab_copy(copy, count); + if ( rc > 0 ) + { +- rc = count - rc; +- guest_handle_add_offset(copy, rc); ++ guest_handle_add_offset(copy, count - rc); + uop = guest_handle_cast(copy, void); + } + break; +@@ -3644,6 +3643,9 @@ do_grant_table_op( + out: + if ( rc > 0 || opaque_out != 0 ) + { ++ /* Adjust rc, see gnttab_copy() for why this is needed. */ ++ if ( cmd == GNTTABOP_copy ) ++ rc = count - rc; + ASSERT(rc < count); + ASSERT((opaque_out & GNTTABOP_CMD_MASK) == 0); + rc = hypercall_create_continuation(__HYPERVISOR_grant_table_op, "ihi", diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa319.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa319.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa319.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa319.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,27 @@ +From: Jan Beulich +Subject: x86/shadow: correct an inverted conditional in dirty VRAM tracking + +This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this +again, taking the opportunity to also drop the unnecessary nearby +braces. + +This is XSA-319. + +Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN") +Signed-off-by: Jan Beulich +Reviewed-by: Andrew Cooper + +--- a/xen/arch/x86/mm/shadow/common.c ++++ b/xen/arch/x86/mm/shadow/common.c +@@ -3252,10 +3252,8 @@ int shadow_track_dirty_vram(struct domai + int dirty = 0; + paddr_t sl1ma = dirty_vram->sl1ma[i]; + +- if ( !mfn_eq(mfn, INVALID_MFN) ) +- { ++ if ( mfn_eq(mfn, INVALID_MFN) ) + dirty = 1; +- } + else + { + page = mfn_to_page(mfn); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-1.patch 2022-06-16 10:08:34.000000000 +0100 @@ -0,0 +1,133 @@ +From: Andrew Cooper +Subject: x86/spec-ctrl: CPUID/MSR definitions for Special Register Buffer Data Sampling + +This is part of XSA-320 / CVE-2020-0543 + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +Acked-by: Wei Liu + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 194615bfc5..9be18ac99f 100644 +--- a/docs/misc/xen-command-line.markdown ++++ b/docs/misc/xen-command-line.markdown +@@ -489,10 +489,10 @@ accounting for hardware capabilities as enumerated via CPUID. + + Currently accepted: + +-The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`, +-`l1d-flush` and `ssbd` are used by default if available and applicable. They can +-be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and +-won't offer them to guests. ++The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`, ++`stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and ++applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't ++use them itself, and won't offer them to guests. 
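[Editorial note] To see the enumeration this patch wires up, a small userspace probe can read CPUID.(EAX=7,ECX=0):EDX, where bit 9 is SRBDS_CTRL and bit 10 is MD_CLEAR. This assumes GCC/clang's <cpuid.h>; it is not part of the patch.

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
    {
        puts("CPUID leaf 7 not available");
        return 1;
    }

    printf("SRBDS_CTRL (MCU_OPT_CTRL MSR enumerated): %s\n",
           (edx & (1u << 9)) ? "yes" : "no");
    printf("MD_CLEAR: %s\n", (edx & (1u << 10)) ? "yes" : "no");
    return 0;
}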
+ + ### cpuid\_mask\_cpu (AMD only) + > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` +diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c +index 5a1702d703..1235c8b91e 100644 +--- a/tools/libxl/libxl_cpuid.c ++++ b/tools/libxl/libxl_cpuid.c +@@ -202,6 +202,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str) + + {"avx512-4vnniw",0x00000007, 0, CPUID_REG_EDX, 2, 1}, + {"avx512-4fmaps",0x00000007, 0, CPUID_REG_EDX, 3, 1}, ++ {"srbds-ctrl", 0x00000007, 0, CPUID_REG_EDX, 9, 1}, + {"md-clear", 0x00000007, 0, CPUID_REG_EDX, 10, 1}, + {"ibrsb", 0x00000007, 0, CPUID_REG_EDX, 26, 1}, + {"stibp", 0x00000007, 0, CPUID_REG_EDX, 27, 1}, +diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c +index 4c9af6b7f0..8fb54c3001 100644 +--- a/tools/misc/xen-cpuid.c ++++ b/tools/misc/xen-cpuid.c +@@ -143,6 +143,7 @@ static const char *str_7d0[32] = + { + [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps", + ++ /* 8 */ [ 9] = "srbds-ctrl", + [10] = "md-clear", + /* 12 */ [13] = "tsx-force-abort", + +diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c +index 04aefa555d..b8e5b6fe67 100644 +--- a/xen/arch/x86/cpuid.c ++++ b/xen/arch/x86/cpuid.c +@@ -58,6 +58,11 @@ static int __init parse_xen_cpuid(const char *s) + if ( !val ) + setup_clear_cpu_cap(X86_FEATURE_SSBD); + } ++ else if ( (val = parse_boolean("srbds-ctrl", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_SRBDS_CTRL); ++ } + else + rc = -EINVAL; + +diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c +index ccb316c547..256e58d82b 100644 +--- a/xen/arch/x86/msr.c ++++ b/xen/arch/x86/msr.c +@@ -154,6 +154,7 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) + /* Write-only */ + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: ++ case MSR_MCU_OPT_CTRL: + /* Not offered to guests. */ + goto gp_fault; + +@@ -243,6 +244,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) + /* Read-only */ + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: ++ case MSR_MCU_OPT_CTRL: + /* Not offered to guests. */ + goto gp_fault; + +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index ab196b156d..94ab8dd786 100644 +--- a/xen/arch/x86/spec_ctrl.c ++++ b/xen/arch/x86/spec_ctrl.c +@@ -365,12 +365,13 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + printk("Speculative mitigation facilities:\n"); + + /* Hardware features which pertain to speculative mitigations. */ +- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", ++ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", + (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_SSBD)) ? " SSBD" : "", + (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "", ++ (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL" : "", + (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ? " IBPB" : "", + (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "", + (caps & ARCH_CAPS_RDCL_NO) ? 
" RDCL_NO" : "", +diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h +index 1761a01f1f..480d1d8102 100644 +--- a/xen/include/asm-x86/msr-index.h ++++ b/xen/include/asm-x86/msr-index.h +@@ -177,6 +177,9 @@ + #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x490 + #define MSR_IA32_VMX_VMFUNC 0x491 + ++#define MSR_MCU_OPT_CTRL 0x00000123 ++#define MCU_OPT_CTRL_RNGDS_MITG_DIS (_AC(1, ULL) << 0) ++ + /* K7/K8 MSRs. Not complete. See the architecture manual for a more + complete list. */ + #define MSR_K7_EVNTSEL0 0xc0010000 +diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h +index a14d8a7013..9d210e74a0 100644 +--- a/xen/include/public/arch-x86/cpufeatureset.h ++++ b/xen/include/public/arch-x86/cpufeatureset.h +@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by + /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */ + XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */ + XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single Precision */ ++XEN_CPUFEATURE(SRBDS_CTRL, 9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */ + XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*A VERW clears microarchitectural buffers */ + XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */ + XEN_CPUFEATURE(IBRSB, 9*32+26) /*A IBRS and IBPB support (used by Intel) */ diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-2.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,179 @@ +From: Andrew Cooper +Subject: x86/spec-ctrl: Mitigate the Special Register Buffer Data Sampling sidechannel + +See patch documentation and comments. + +This is part of XSA-320 / CVE-2020-0543 + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 9be18ac99f..3356e59fee 100644 +--- a/docs/misc/xen-command-line.markdown ++++ b/docs/misc/xen-command-line.markdown +@@ -1858,7 +1858,7 @@ false disable the quirk workaround, which is also the default. + ### spec-ctrl (x86) + > `= List of [ , xen=, {pv,hvm,msr-sc,rsb,md-clear}=, + > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, +-> l1d-flush}= ]` ++> l1d-flush,srb-lock}= ]` + + Controls for speculative execution sidechannel mitigations. By default, Xen + will pick the most appropriate mitigations based on compiled in support, +@@ -1930,6 +1930,12 @@ Irrespective of Xen's setting, the feature is virtualised for HVM guests to + use. By default, Xen will enable this mitigation on hardware believed to be + vulnerable to L1TF. + ++On hardware supporting SRBDS_CTRL, the `srb-lock=` option can be used to force ++or prevent Xen from protect the Special Register Buffer from leaking stale ++data. By default, Xen will enable this mitigation, except on parts where MDS ++is fixed and TAA is fixed/mitigated (in which case, there is believed to be no ++way for an attacker to obtain the stale data). 
++ + ### sync\_console + > `= ` + +diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c +index 4c12794809..30e1bd5cd3 100644 +--- a/xen/arch/x86/acpi/power.c ++++ b/xen/arch/x86/acpi/power.c +@@ -266,6 +266,9 @@ static int enter_state(u32 state) + ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr); + spec_ctrl_exit_idle(ci); + ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); ++ + done: + spin_debug_enable(); + local_irq_restore(flags); +diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c +index 0887806e85..d24d215946 100644 +--- a/xen/arch/x86/smpboot.c ++++ b/xen/arch/x86/smpboot.c +@@ -369,12 +369,14 @@ void start_secondary(void *unused) + microcode_resume_cpu(cpu); + + /* +- * If MSR_SPEC_CTRL is available, apply Xen's default setting and discard +- * any firmware settings. Note: MSR_SPEC_CTRL may only become available +- * after loading microcode. ++ * If any speculative control MSRs are available, apply Xen's default ++ * settings. Note: These MSRs may only become available after loading ++ * microcode. + */ + if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl); ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); + + tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */ + +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index 94ab8dd786..a306d10c34 100644 +--- a/xen/arch/x86/spec_ctrl.c ++++ b/xen/arch/x86/spec_ctrl.c +@@ -63,6 +63,9 @@ static unsigned int __initdata l1d_maxphysaddr; + static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */ + static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */ + ++static int8_t __initdata opt_srb_lock = -1; ++uint64_t __read_mostly default_xen_mcu_opt_ctrl; ++ + static int __init parse_bti(const char *s) + { + const char *ss; +@@ -166,6 +169,7 @@ static int __init parse_spec_ctrl(const char *s) + opt_ibpb = false; + opt_ssbd = false; + opt_l1d_flush = 0; ++ opt_srb_lock = 0; + } + else if ( val > 0 ) + rc = -EINVAL; +@@ -231,6 +235,8 @@ static int __init parse_spec_ctrl(const char *s) + opt_eager_fpu = val; + else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 ) + opt_l1d_flush = val; ++ else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 ) ++ opt_srb_lock = val; + else + rc = -EINVAL; + +@@ -394,7 +400,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + "\n"); + + /* Settings for Xen's protection, irrespective of guests. */ +- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s\n", ++ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s%s\n", + thunk == THUNK_NONE ? "N/A" : + thunk == THUNK_RETPOLINE ? "RETPOLINE" : + thunk == THUNK_LFENCE ? "LFENCE" : +@@ -405,6 +411,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) + (default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-", + !(caps & ARCH_CAPS_TSX_CTRL) ? "" : + (opt_tsx & 1) ? " TSX+" : " TSX-", ++ !boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ? "" : ++ opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-", + opt_ibpb ? " IBPB" : "", + opt_l1d_flush ? " L1D_FLUSH" : "", + opt_md_clear_pv || opt_md_clear_hvm ? 
" VERW" : ""); +@@ -1196,6 +1204,34 @@ void __init init_speculation_mitigations(void) + tsx_init(); + } + ++ /* Calculate suitable defaults for MSR_MCU_OPT_CTRL */ ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ { ++ uint64_t val; ++ ++ rdmsrl(MSR_MCU_OPT_CTRL, val); ++ ++ /* ++ * On some SRBDS-affected hardware, it may be safe to relax srb-lock ++ * by default. ++ * ++ * On parts which enumerate MDS_NO and not TAA_NO, TSX is the only way ++ * to access the Fill Buffer. If TSX isn't available (inc. SKU ++ * reasons on some models), or TSX is explicitly disabled, then there ++ * is no need for the extra overhead to protect RDRAND/RDSEED. ++ */ ++ if ( opt_srb_lock == -1 && ++ (caps & (ARCH_CAPS_MDS_NO|ARCH_CAPS_TAA_NO)) == ARCH_CAPS_MDS_NO && ++ (!cpu_has_hle || ((caps & ARCH_CAPS_TSX_CTRL) && opt_tsx == 0)) ) ++ opt_srb_lock = 0; ++ ++ val &= ~MCU_OPT_CTRL_RNGDS_MITG_DIS; ++ if ( !opt_srb_lock ) ++ val |= MCU_OPT_CTRL_RNGDS_MITG_DIS; ++ ++ default_xen_mcu_opt_ctrl = val; ++ } ++ + print_details(thunk, caps); + + /* +@@ -1227,6 +1263,9 @@ void __init init_speculation_mitigations(void) + + wrmsrl(MSR_SPEC_CTRL, bsp_delay_spec_ctrl ? 0 : default_xen_spec_ctrl); + } ++ ++ if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) ) ++ wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl); + } + + static void __init __maybe_unused build_assertions(void) +diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h +index 333d180b7e..bf10d2ce5c 100644 +--- a/xen/include/asm-x86/spec_ctrl.h ++++ b/xen/include/asm-x86/spec_ctrl.h +@@ -46,6 +46,8 @@ extern int8_t opt_pv_l1tf_hwdom, opt_pv_l1tf_domu; + */ + extern paddr_t l1tf_addr_mask, l1tf_safe_maddr; + ++extern uint64_t default_xen_mcu_opt_ctrl; ++ + static inline void init_shadow_spec_ctrl_state(void) + { + struct cpu_info *info = get_cpu_info(); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa320-4.11-3.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,57 @@ +From: Andrew Cooper +Subject: x86/spec-ctrl: Allow the RDRAND/RDSEED features to be hidden + +RDRAND/RDSEED can be hidden using cpuid= to mitigate SRBDS if microcode +isn't available. + +This is part of XSA-320 / CVE-2020-0543. + +Signed-off-by: Andrew Cooper +Acked-by: Julien Grall + +diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown +index 3356e59fee..ac397e7de0 100644 +--- a/docs/misc/xen-command-line.markdown ++++ b/docs/misc/xen-command-line.markdown +@@ -487,12 +487,18 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels. + This option allows for fine tuning of the facilities Xen will use, after + accounting for hardware capabilities as enumerated via CPUID. + ++Unless otherwise noted, options only have any effect in their negative form, ++to hide the named feature(s). Ignoring a feature using this mechanism will ++cause Xen not to use the feature, nor offer them as usable to guests. ++ + Currently accepted: + + The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`, + `stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and +-applicable. They can be ignored, e.g. `no-ibrsb`, at which point Xen won't +-use them itself, and won't offer them to guests. ++applicable. They can all be ignored. 
++ ++`rdrand` and `rdseed` can be ignored, as a mitigation to XSA-320 / ++CVE-2020-0543. + + ### cpuid\_mask\_cpu (AMD only) + > `= fam_0f_rev_c | fam_0f_rev_d | fam_0f_rev_e | fam_0f_rev_f | fam_0f_rev_g | fam_10_rev_b | fam_10_rev_c | fam_11_rev_b` +diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c +index b8e5b6fe67..78d08dbb32 100644 +--- a/xen/arch/x86/cpuid.c ++++ b/xen/arch/x86/cpuid.c +@@ -63,6 +63,16 @@ static int __init parse_xen_cpuid(const char *s) + if ( !val ) + setup_clear_cpu_cap(X86_FEATURE_SRBDS_CTRL); + } ++ else if ( (val = parse_boolean("rdrand", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_RDRAND); ++ } ++ else if ( (val = parse_boolean("rdseed", s, ss)) >= 0 ) ++ { ++ if ( !val ) ++ setup_clear_cpu_cap(X86_FEATURE_RDSEED); ++ } + else + rc = -EINVAL; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-1.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,31 @@ +From: Jan Beulich +Subject: vtd: improve IOMMU TLB flush + +Do not limit PSI flushes to order 0 pages, in order to avoid doing a +full TLB flush if the passed in page has an order greater than 0 and +is aligned. Should increase the performance of IOMMU TLB flushes when +dealing with page orders greater than 0. + +This is part of XSA-321. + +Signed-off-by: Jan Beulich + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -612,13 +612,14 @@ static int __must_check iommu_flush_iotl + if ( iommu_domid == -1 ) + continue; + +- if ( page_count != 1 || gfn == gfn_x(INVALID_GFN) ) ++ if ( !page_count || (page_count & (page_count - 1)) || ++ gfn == gfn_x(INVALID_GFN) || !IS_ALIGNED(gfn, page_count) ) + rc = iommu_flush_iotlb_dsi(iommu, iommu_domid, + 0, flush_dev_iotlb); + else + rc = iommu_flush_iotlb_psi(iommu, iommu_domid, + (paddr_t)gfn << PAGE_SHIFT_4K, +- PAGE_ORDER_4K, ++ get_order_from_pages(page_count), + !dma_old_pte_present, + flush_dev_iotlb); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-2.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,175 @@ +From: +Subject: vtd: prune (and rename) cache flush functions + +Rename __iommu_flush_cache to iommu_sync_cache and remove +iommu_flush_cache_page. Also remove the iommu_flush_cache_entry +wrapper and just use iommu_sync_cache instead. Note the _entry suffix +was meaningless as the wrapper was already taking a size parameter in +bytes. While there also constify the addr parameter. + +No functional change intended. + +This is part of XSA-321. 
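[Editorial note] The condition added by xsa321-4.11-1 above is compact; a standalone restatement (not Xen code) of when a page-selective flush is usable:

#include <stdbool.h>
#include <stdio.h>

static bool can_use_psi(unsigned long gfn, unsigned long page_count)
{
    if ( page_count == 0 || (page_count & (page_count - 1)) )
        return false;                     /* zero or not a power of two */
    return (gfn & (page_count - 1)) == 0; /* gfn aligned to the region */
}

int main(void)
{
    printf("%d\n", can_use_psi(0x1000, 16));  /* 1: aligned power of two */
    printf("%d\n", can_use_psi(0x1001, 16));  /* 0: misaligned gfn */
    printf("%d\n", can_use_psi(0x1000, 24));  /* 0: not a power of two */
    return 0;
}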
+ +Reviewed-by: Jan Beulich + +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -37,8 +37,7 @@ void disable_qinval(struct iommu *iommu) + int enable_intremap(struct iommu *iommu, int eim); + void disable_intremap(struct iommu *iommu); + +-void iommu_flush_cache_entry(void *addr, unsigned int size); +-void iommu_flush_cache_page(void *addr, unsigned long npages); ++void iommu_sync_cache(const void *addr, unsigned int size); + int iommu_alloc(struct acpi_drhd_unit *drhd); + void iommu_free(struct acpi_drhd_unit *drhd); + +--- a/xen/drivers/passthrough/vtd/intremap.c ++++ b/xen/drivers/passthrough/vtd/intremap.c +@@ -231,7 +231,7 @@ static void free_remap_entry(struct iomm + iremap_entries, iremap_entry); + + update_irte(iommu, iremap_entry, &new_ire, false); +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +@@ -403,7 +403,7 @@ static int ioapic_rte_to_remap_entry(str + } + + update_irte(iommu, iremap_entry, &new_ire, !init); +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +@@ -694,7 +694,7 @@ static int msi_msg_to_remap_entry( + update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized); + msi_desc->irte_initialized = true; + +- iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); ++ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry)); + iommu_flush_iec_index(iommu, 0, index); + + unmap_vtd_domain_page(iremap_entries); +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -158,7 +158,8 @@ static void __init free_intel_iommu(stru + } + + static int iommus_incoherent; +-static void __iommu_flush_cache(void *addr, unsigned int size) ++ ++void iommu_sync_cache(const void *addr, unsigned int size) + { + int i; + static unsigned int clflush_size = 0; +@@ -173,16 +174,6 @@ static void __iommu_flush_cache(void *ad + cacheline_flush((char *)addr + i); + } + +-void iommu_flush_cache_entry(void *addr, unsigned int size) +-{ +- __iommu_flush_cache(addr, size); +-} +- +-void iommu_flush_cache_page(void *addr, unsigned long npages) +-{ +- __iommu_flush_cache(addr, PAGE_SIZE * npages); +-} +- + /* Allocate page table, return its machine address */ + u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages) + { +@@ -207,7 +198,7 @@ u64 alloc_pgtable_maddr(struct acpi_drhd + vaddr = __map_domain_page(cur_pg); + memset(vaddr, 0, PAGE_SIZE); + +- iommu_flush_cache_page(vaddr, 1); ++ iommu_sync_cache(vaddr, PAGE_SIZE); + unmap_domain_page(vaddr); + cur_pg++; + } +@@ -242,7 +233,7 @@ static u64 bus_to_context_maddr(struct i + } + set_root_value(*root, maddr); + set_root_present(*root); +- iommu_flush_cache_entry(root, sizeof(struct root_entry)); ++ iommu_sync_cache(root, sizeof(struct root_entry)); + } + maddr = (u64) get_context_addr(*root); + unmap_vtd_domain_page(root_entries); +@@ -300,7 +291,7 @@ static u64 addr_to_dma_page_maddr(struct + */ + dma_set_pte_readable(*pte); + dma_set_pte_writable(*pte); +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + } + + if ( level == 2 ) +@@ -674,7 +665,7 @@ static int __must_check dma_pte_clear_on + + dma_clear_pte(*pte); + spin_unlock(&hd->arch.mapping_lock); +- 
iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + + if ( !this_cpu(iommu_dont_flush_iotlb) ) + rc = iommu_flush_iotlb_pages(domain, addr >> PAGE_SHIFT_4K, 1); +@@ -716,7 +707,7 @@ static void iommu_free_page_table(struct + iommu_free_pagetable(dma_pte_addr(*pte), next_level); + + dma_clear_pte(*pte); +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + } + + unmap_vtd_domain_page(pt_vaddr); +@@ -1449,7 +1440,7 @@ int domain_context_mapping_one( + context_set_address_width(*context, agaw); + context_set_fault_enable(*context); + context_set_present(*context); +- iommu_flush_cache_entry(context, sizeof(struct context_entry)); ++ iommu_sync_cache(context, sizeof(struct context_entry)); + spin_unlock(&iommu->lock); + + /* Context entry was previously non-present (with domid 0). */ +@@ -1602,7 +1593,7 @@ int domain_context_unmap_one( + + context_clear_present(*context); + context_clear_entry(*context); +- iommu_flush_cache_entry(context, sizeof(struct context_entry)); ++ iommu_sync_cache(context, sizeof(struct context_entry)); + + iommu_domid= domain_iommu_domid(domain, iommu); + if ( iommu_domid == -1 ) +@@ -1828,7 +1819,7 @@ static int __must_check intel_iommu_map_ + + *pte = new; + +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + spin_unlock(&hd->arch.mapping_lock); + unmap_vtd_domain_page(page); + +@@ -1862,7 +1853,7 @@ int iommu_pte_flush(struct domain *d, u6 + int iommu_domid; + int rc = 0; + +- iommu_flush_cache_entry(pte, sizeof(struct dma_pte)); ++ iommu_sync_cache(pte, sizeof(struct dma_pte)); + + for_each_drhd_unit ( drhd ) + { +@@ -2725,7 +2716,7 @@ static int __init intel_iommu_quarantine + dma_set_pte_addr(*pte, maddr); + dma_set_pte_readable(*pte); + } +- iommu_flush_cache_page(parent, 1); ++ iommu_sync_cache(parent, PAGE_SIZE); + + unmap_vtd_domain_page(parent); + parent = map_vtd_domain_page(maddr); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-3.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,82 @@ +From: +Subject: x86/iommu: introduce a cache sync hook + +The hook is only implemented for VT-d and it uses the already existing +iommu_sync_cache function present in VT-d code. The new hook is +added so that the cache can be flushed by code outside of VT-d when +using shared page tables. + +Note that alloc_pgtable_maddr must use the now locally defined +sync_cache function, because IOMMU ops are not yet setup the first +time the function gets called during IOMMU initialization. + +No functional change intended. + +This is part of XSA-321. 
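[Editorial note] The hook added below follows the common "optional ops callback" shape. A self-contained sketch with hypothetical types (struct cache_ops and the demo backend are not the Xen structures): the wrapper degrades to a no-op when the implementation is coherent and leaves the hook NULL.

#include <stdio.h>

struct cache_ops {
    void (*sync_cache)(const void *addr, unsigned int size);  /* may be NULL */
};

static void sync_cache_if_needed(const struct cache_ops *ops,
                                 const void *addr, unsigned int size)
{
    if ( ops->sync_cache )
        ops->sync_cache(addr, size);
}

static void demo_backend_sync(const void *addr, unsigned int size)
{
    (void)addr;
    printf("write back %u bytes\n", size);
}

int main(void)
{
    struct cache_ops coherent = { .sync_cache = NULL };
    struct cache_ops noncoherent = { .sync_cache = demo_backend_sync };
    int x = 42;

    sync_cache_if_needed(&coherent, &x, sizeof(x));     /* no-op */
    sync_cache_if_needed(&noncoherent, &x, sizeof(x));  /* flushes */
    return 0;
}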
+ +Reviewed-by: Jan Beulich + +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -37,7 +37,6 @@ void disable_qinval(struct iommu *iommu) + int enable_intremap(struct iommu *iommu, int eim); + void disable_intremap(struct iommu *iommu); + +-void iommu_sync_cache(const void *addr, unsigned int size); + int iommu_alloc(struct acpi_drhd_unit *drhd); + void iommu_free(struct acpi_drhd_unit *drhd); + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -159,7 +159,7 @@ static void __init free_intel_iommu(stru + + static int iommus_incoherent; + +-void iommu_sync_cache(const void *addr, unsigned int size) ++static void sync_cache(const void *addr, unsigned int size) + { + int i; + static unsigned int clflush_size = 0; +@@ -198,7 +198,7 @@ u64 alloc_pgtable_maddr(struct acpi_drhd + vaddr = __map_domain_page(cur_pg); + memset(vaddr, 0, PAGE_SIZE); + +- iommu_sync_cache(vaddr, PAGE_SIZE); ++ sync_cache(vaddr, PAGE_SIZE); + unmap_domain_page(vaddr); + cur_pg++; + } +@@ -2760,6 +2760,7 @@ const struct iommu_ops intel_iommu_ops = + .iotlb_flush_all = iommu_flush_iotlb_all, + .get_reserved_device_memory = intel_iommu_get_reserved_device_memory, + .dump_p2m_table = vtd_dump_p2m_table, ++ .sync_cache = sync_cache, + }; + + /* +--- a/xen/include/asm-x86/iommu.h ++++ b/xen/include/asm-x86/iommu.h +@@ -98,6 +98,13 @@ extern bool untrusted_msi; + int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq, + const uint8_t gvec); + ++#define iommu_sync_cache(addr, size) ({ \ ++ const struct iommu_ops *ops = iommu_get_ops(); \ ++ \ ++ if ( ops->sync_cache ) \ ++ ops->sync_cache(addr, size); \ ++}) ++ + #endif /* !__ARCH_X86_IOMMU_H__ */ + /* + * Local variables: +--- a/xen/include/xen/iommu.h ++++ b/xen/include/xen/iommu.h +@@ -161,6 +161,7 @@ struct iommu_ops { + void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value); + unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg); + int (*setup_hpet_msi)(struct msi_desc *); ++ void (*sync_cache)(const void *addr, unsigned int size); + #endif /* CONFIG_X86 */ + int __must_check (*suspend)(void); + void (*resume)(void); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-4.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-4.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-4.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-4.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,36 @@ +From: +Subject: vtd: don't assume addresses are aligned in sync_cache + +Current code in sync_cache assume that the address passed in is +aligned to a cache line size. Fix the code to support passing in +arbitrary addresses not necessarily aligned to a cache line size. + +This is part of XSA-321. 
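[Editorial note] A standalone restatement (not Xen code) of the alignment handling the hunk below introduces: round the start address down to a cache-line boundary so a buffer that straddles lines is covered end to end.

#include <stdint.h>
#include <stdio.h>

static unsigned int sync_range(const void *addr, unsigned int size,
                               uintptr_t line)
{
    const char *p = (const char *)((uintptr_t)addr & ~(line - 1));
    const char *end = (const char *)addr + size;
    unsigned int flushed = 0;

    for ( ; p < end; p += line, flushed++ )
        ;   /* cacheline_flush(p) would go here */
    return flushed;
}

int main(void)
{
    _Alignas(64) char buf[256];

    /* An 8-byte entry starting 4 bytes before a line boundary spans two
     * lines; flushing only at the start address (the pre-fix behaviour)
     * would miss the second one. */
    printf("lines touched: %u\n", sync_range(buf + 60, 8, 64));
    return 0;
}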
+ +Reviewed-by: Jan Beulich + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -161,8 +161,8 @@ static int iommus_incoherent; + + static void sync_cache(const void *addr, unsigned int size) + { +- int i; +- static unsigned int clflush_size = 0; ++ static unsigned long clflush_size = 0; ++ const void *end = addr + size; + + if ( !iommus_incoherent ) + return; +@@ -170,8 +170,9 @@ static void sync_cache(const void *addr, + if ( clflush_size == 0 ) + clflush_size = get_cache_line_size(); + +- for ( i = 0; i < size; i += clflush_size ) +- cacheline_flush((char *)addr + i); ++ addr -= (unsigned long)addr & (clflush_size - 1); ++ for ( ; addr < end; addr += clflush_size ) ++ cacheline_flush((char *)addr); + } + + /* Allocate page table, return its machine address */ diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-5.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-5.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-5.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-5.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,24 @@ +From: +Subject: x86/alternative: introduce alternative_2 + +It's based on alternative_io_2 without inputs or outputs but with an +added memory clobber. + +This is part of XSA-321. + +Acked-by: Jan Beulich + +--- a/xen/include/asm-x86/alternative.h ++++ b/xen/include/asm-x86/alternative.h +@@ -113,6 +113,11 @@ extern void alternative_instructions(voi + #define alternative(oldinstr, newinstr, feature) \ + asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory") + ++#define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \ ++ asm volatile (ALTERNATIVE_2(oldinstr, newinstr1, feature1, \ ++ newinstr2, feature2) \ ++ : : : "memory") ++ + /* + * Alternative inline assembly with input. + * diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-6.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-6.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-6.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-6.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,91 @@ +From: +Subject: vtd: optimize CPU cache sync + +Some VT-d IOMMUs are non-coherent, which requires a cache write back +in order for the changes made by the CPU to be visible to the IOMMU. +This cache write back was unconditionally done using clflush, but there are +other more efficient instructions to do so, hence implement support +for them using the alternative framework. + +This is part of XSA-321. + +Reviewed-by: Jan Beulich + +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -63,7 +63,6 @@ int __must_check qinval_device_iotlb_syn + u16 did, u16 size, u64 addr); + + unsigned int get_cache_line_size(void); +-void cacheline_flush(char *); + void flush_all_cache(void); + + u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages); +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -31,6 +31,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -172,7 +173,42 @@ static void sync_cache(const void *addr, + + addr -= (unsigned long)addr & (clflush_size - 1); + for ( ; addr < end; addr += clflush_size ) +- cacheline_flush((char *)addr); ++/* ++ * The arguments to a macro must not include preprocessor directives. 
Doing so ++ * results in undefined behavior, so we have to create some defines here in ++ * order to avoid it. ++ */ ++#if defined(HAVE_AS_CLWB) ++# define CLWB_ENCODING "clwb %[p]" ++#elif defined(HAVE_AS_XSAVEOPT) ++# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */ ++#else ++# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */ ++#endif ++ ++#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr)) ++#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT) ++# define INPUT BASE_INPUT ++#else ++# define INPUT(addr) "a" (addr), BASE_INPUT(addr) ++#endif ++ /* ++ * Note regarding the use of NOP_DS_PREFIX: it's faster to do a clflush ++ * + prefix than a clflush + nop, and hence the prefix is added instead ++ * of letting the alternative framework fill the gap by appending nops. ++ */ ++ alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %[p]", ++ "data16 clflush %[p]", /* clflushopt */ ++ X86_FEATURE_CLFLUSHOPT, ++ CLWB_ENCODING, ++ X86_FEATURE_CLWB, /* no outputs */, ++ INPUT(addr)); ++#undef INPUT ++#undef BASE_INPUT ++#undef CLWB_ENCODING ++ ++ alternative_2("", "sfence", X86_FEATURE_CLFLUSHOPT, ++ "sfence", X86_FEATURE_CLWB); + } + + /* Allocate page table, return its machine address */ +--- a/xen/drivers/passthrough/vtd/x86/vtd.c ++++ b/xen/drivers/passthrough/vtd/x86/vtd.c +@@ -53,11 +53,6 @@ unsigned int get_cache_line_size(void) + return ((cpuid_ebx(1) >> 8) & 0xff) * 8; + } + +-void cacheline_flush(char * addr) +-{ +- clflush(addr); +-} +- + void flush_all_cache() + { + wbinvd(); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-7.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-7.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-7.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa321-4.11-7.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,164 @@ +From: +Subject: x86/ept: flush cache when modifying PTEs and sharing page tables + +Modifications made to the page tables by EPT code need to be written +to memory when the page tables are shared with the IOMMU, as Intel +IOMMUs can be non-coherent and thus require changes to be written to +memory in order to be visible to the IOMMU. + +In order to achieve this make sure data is written back to memory +after writing an EPT entry when the recalc bit is not set in +atomic_write_ept_entry. If such bit is set, the entry will be +adjusted and atomic_write_ept_entry will be called a second time +without the recalc bit set. Note that when splitting a super page the +new tables resulting of the split should also be written back. + +Failure to do so can allow devices behind the IOMMU access to the +stale super page, or cause coherency issues as changes made by the +processor to the page tables are not visible to the IOMMU. + +This allows to remove the VT-d specific iommu_pte_flush helper, since +the cache write back is now performed by atomic_write_ept_entry, and +hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly +used method (iommu_iotlb_flush) can result in less flushes, since it +might sometimes be called rightly with 0 flags, in which case it +becomes a no-op. + +This is part of XSA-321. 
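[Editorial note] The rule the patch below enforces can be shown in isolation: when the CPU page tables are shared with a non-coherent IOMMU, every PTE write must be followed by a cache write-back before the device can safely walk the table. This sketch uses the real _mm_clflush intrinsic (SSE2, baseline on x86-64) as the lowest common denominator; the Xen patch instead routes the write-back through iommu_sync_cache(), which selects clflush/clflushopt/clwb via the alternatives framework.

#include <emmintrin.h>   /* _mm_clflush */
#include <stdbool.h>
#include <stdint.h>

static void write_shared_pte(uint64_t *pte, uint64_t val, bool pt_shared)
{
    *pte = val;                  /* visible to the CPU (and coherent IOMMUs) */
    if ( pt_shared )
        _mm_clflush(pte);        /* push it out for a non-coherent IOMMU
                                  * (assumes a naturally aligned entry) */
}

int main(void)
{
    uint64_t pte = 0;

    write_shared_pte(&pte, 0x1234, true);
    return pte == 0x1234 ? 0 : 1;
}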
+ +Reviewed-by: Jan Beulich + +--- a/xen/arch/x86/mm/p2m-ept.c ++++ b/xen/arch/x86/mm/p2m-ept.c +@@ -90,6 +90,19 @@ static int atomic_write_ept_entry(ept_en + + write_atomic(&entryptr->epte, new.epte); + ++ /* ++ * The recalc field on the EPT is used to signal either that a ++ * recalculation of the EMT field is required (which doesn't effect the ++ * IOMMU), or a type change. Type changes can only be between ram_rw, ++ * logdirty and ioreq_server: changes to/from logdirty won't work well with ++ * an IOMMU anyway, as IOMMU #PFs are not synchronous and will lead to ++ * aborts, and changes to/from ioreq_server are already fully flushed ++ * before returning to guest context (see ++ * XEN_DMOP_map_mem_type_to_ioreq_server). ++ */ ++ if ( !new.recalc && iommu_hap_pt_share ) ++ iommu_sync_cache(entryptr, sizeof(*entryptr)); ++ + if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) ) + put_page(mfn_to_page(_mfn(oldmfn))); + +@@ -319,6 +332,9 @@ static bool_t ept_split_super_page(struc + break; + } + ++ if ( iommu_hap_pt_share ) ++ iommu_sync_cache(table, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t)); ++ + unmap_domain_page(table); + + /* Even failed we should install the newly allocated ept page. */ +@@ -378,6 +394,9 @@ static int ept_next_level(struct p2m_dom + if ( !next ) + return GUEST_TABLE_MAP_FAILED; + ++ if ( iommu_hap_pt_share ) ++ iommu_sync_cache(next, EPT_PAGETABLE_ENTRIES * sizeof(ept_entry_t)); ++ + rc = atomic_write_ept_entry(ept_entry, e, next_level); + ASSERT(rc == 0); + } +@@ -875,7 +894,7 @@ out: + need_modify_vtd_table ) + { + if ( iommu_hap_pt_share ) +- rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present); ++ rc = iommu_flush_iotlb(d, gfn, vtd_pte_present, 1u << order); + else + { + if ( iommu_flags ) +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -612,10 +612,8 @@ static int __must_check iommu_flush_all( + return rc; + } + +-static int __must_check iommu_flush_iotlb(struct domain *d, +- unsigned long gfn, +- bool_t dma_old_pte_present, +- unsigned int page_count) ++int iommu_flush_iotlb(struct domain *d, unsigned long gfn, ++ bool dma_old_pte_present, unsigned int page_count) + { + struct domain_iommu *hd = dom_iommu(d); + struct acpi_drhd_unit *drhd; +@@ -1880,53 +1878,6 @@ static int __must_check intel_iommu_unma + return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K); + } + +-int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, +- int order, int present) +-{ +- struct acpi_drhd_unit *drhd; +- struct iommu *iommu = NULL; +- struct domain_iommu *hd = dom_iommu(d); +- bool_t flush_dev_iotlb; +- int iommu_domid; +- int rc = 0; +- +- iommu_sync_cache(pte, sizeof(struct dma_pte)); +- +- for_each_drhd_unit ( drhd ) +- { +- iommu = drhd->iommu; +- if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) ) +- continue; +- +- flush_dev_iotlb = !!find_ats_dev_drhd(iommu); +- iommu_domid= domain_iommu_domid(d, iommu); +- if ( iommu_domid == -1 ) +- continue; +- +- rc = iommu_flush_iotlb_psi(iommu, iommu_domid, +- (paddr_t)gfn << PAGE_SHIFT_4K, +- order, !present, flush_dev_iotlb); +- if ( rc > 0 ) +- { +- iommu_flush_write_buffer(iommu); +- rc = 0; +- } +- } +- +- if ( unlikely(rc) ) +- { +- if ( !d->is_shutting_down && printk_ratelimit() ) +- printk(XENLOG_ERR VTDPREFIX +- " d%d: IOMMU pages flush failed: %d\n", +- d->domain_id, rc); +- +- if ( !is_hardware_domain(d) ) +- domain_crash(d); +- } +- +- return rc; +-} +- + static int __init vtd_ept_page_compatible(struct iommu *iommu) + { + u64 ept_cap, vtd_cap = iommu->cap; 
+--- a/xen/include/asm-x86/iommu.h ++++ b/xen/include/asm-x86/iommu.h +@@ -87,8 +87,9 @@ int iommu_setup_hpet_msi(struct msi_desc + + /* While VT-d specific, this must get declared in a generic header. */ + int adjust_vtd_irq_affinities(void); +-int __must_check iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, +- int order, int present); ++int __must_check iommu_flush_iotlb(struct domain *d, unsigned long gfn, ++ bool dma_old_pte_present, ++ unsigned int page_count); + bool_t iommu_supports_eim(void); + int iommu_enable_x2apic_IR(void); + void iommu_disable_x2apic_IR(void); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.11-o.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.11-o.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.11-o.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.11-o.patch 2022-04-05 13:04:21.000000000 +0100 @@ -0,0 +1,110 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: clean up permissions for dead domains +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +domain ids are prone to wrapping (15-bits), and with sufficient number +of VMs in a reboot loop it is possible to trigger it. Xenstore entries +may linger after a domain dies, until a toolstack cleans it up. During +this time there is a window where a wrapped domid could access these +xenstore keys (that belonged to another VM). + +To prevent this do a cleanup when a domain dies: + * walk the entire xenstore tree and update permissions for all nodes + * if the dead domain had an ACL entry: remove it + * if the dead domain was the owner: change the owner to Dom0 + +This is done without quota checks or a transaction. Quota checks would +be a no-op (either the domain is dead, or it is Dom0 where they are not +enforced). Transactions are not needed, because this is all done +atomically by oxenstored's single thread. + +The xenstore entries owned by the dead domain are not deleted, because +that could confuse a toolstack / backends that are still bound to it +(or generate unexpected watch events). It is the responsibility of a +toolstack to remove the xenstore entries themselves. + +This is part of XSA-322. + +Signed-off-by: Edwin Török +Acked-by: Christian Lindig + +diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml +index ee7fee6bda..e8a16221f8 100644 +--- a/tools/ocaml/xenstored/perms.ml ++++ b/tools/ocaml/xenstored/perms.ml +@@ -58,6 +58,15 @@ let get_other perms = perms.other + let get_acl perms = perms.acl + let get_owner perm = perm.owner + ++(** [remote_domid ~domid perm] removes all ACLs for [domid] from perm. ++* If [domid] was the owner then it is changed to Dom0. ++* This is used for cleaning up after dead domains. 
++* *) ++let remove_domid ~domid perm = ++ let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in ++ let owner = if perm.owner = domid then 0 else perm.owner in ++ { perm with acl; owner } ++ + let default0 = create 0 NONE [] + + let perm_of_string s = +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index 3cd0097db9..6a998f8764 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -437,6 +437,7 @@ let do_release con t domains cons data = + let fire_spec_watches = Domains.exist domains domid in + Domains.del domains domid; + Connections.del_domain cons domid; ++ Store.reset_permissions (Transaction.get_store t) domid; + if fire_spec_watches + then Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.release_domain + else raise Invalid_Cmd_Args +diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml +index 0ce6f68e8d..101c094715 100644 +--- a/tools/ocaml/xenstored/store.ml ++++ b/tools/ocaml/xenstored/store.ml +@@ -89,6 +89,13 @@ let check_owner node connection = + + let rec recurse fct node = fct node; List.iter (recurse fct) node.children + ++(** [recurse_map f tree] applies [f] on each node in the tree recursively *) ++let recurse_map f = ++ let rec walk node = ++ f { node with children = List.rev_map walk node.children |> List.rev } ++ in ++ walk ++ + let unpack node = (Symbol.to_string node.name, node.perms, node.value) + + end +@@ -405,6 +412,15 @@ let setperms store perm path nperms = + Quota.del_entry store.quota old_owner; + Quota.add_entry store.quota new_owner + ++let reset_permissions store domid = ++ Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid; ++ store.root <- Node.recurse_map (fun node -> ++ let perms = Perms.Node.remove_domid ~domid node.perms in ++ if perms <> node.perms then ++ Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node); ++ { node with perms } ++ ) store.root ++ + type ops = { + store: t; + write: Path.t -> string -> unit; +diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml +index 30fc874327..183dd2754b 100644 +--- a/tools/ocaml/xenstored/xenstored.ml ++++ b/tools/ocaml/xenstored/xenstored.ml +@@ -340,6 +340,7 @@ let _ = + finally (fun () -> + if Some port = eventchn.Event.virq_port then ( + let (notify, deaddom) = Domains.cleanup domains in ++ List.iter (Store.reset_permissions store) deaddom; + List.iter (Connections.del_domain cons) deaddom; + if deaddom <> [] || notify then + Connections.fire_spec_watches diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.12-c.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.12-c.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.12-c.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa322-4.12-c.patch 2022-05-28 19:29:36.000000000 +0100 @@ -0,0 +1,534 @@ +From: Juergen Gross +Subject: tools/xenstore: revoke access rights for removed domains + +Access rights of Xenstore nodes are per domid. Unfortunately existing +granted access rights are not removed when a domain is being destroyed. +This means that a new domain created with the same domid will inherit +the access rights to Xenstore nodes from the previous domain(s) with +the same domid. + +This can be avoided by adding a generation counter to each domain. +The generation counter of the domain is set to the global generation +counter when a domain structure is being allocated. 
When reading or +writing a node all permissions of domains which are younger than the +node itself are dropped. This is done by flagging the related entry +as invalid in order to avoid modifying permissions in a way the user +could detect. + +A special case has to be considered: for a new domain the first +Xenstore entries are already written before the domain is officially +introduced in Xenstore. In order not to drop the permissions for the +new domain a domain struct is allocated even before introduction if +the hypervisor is aware of the domain. This requires adding another +bool "introduced" to struct domain in xenstored. In order to avoid +additional padding holes convert the shutdown flag to bool, too. + +As verifying permissions has its price regarding runtime add a new +quota for limiting the number of permissions an unprivileged domain +can set for a node. The default for that new quota is 5. + +This is part of XSA-322. + +Signed-off-by: Juergen Gross +Reviewed-by: Paul Durrant +Acked-by: Julien Grall + +diff --git a/tools/xenstore/include/xenstore_lib.h b/tools/xenstore/include/xenstore_lib.h +index 0ffbae9eb574..4c9b6d16858d 100644 +--- a/tools/xenstore/include/xenstore_lib.h ++++ b/tools/xenstore/include/xenstore_lib.h +@@ -34,6 +34,7 @@ enum xs_perm_type { + /* Internal use. */ + XS_PERM_ENOENT_OK = 4, + XS_PERM_OWNER = 8, ++ XS_PERM_IGNORE = 16, + }; + + struct xs_permissions +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 2a86c4aa5bce..4fbe5c759c1b 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -101,6 +101,7 @@ int quota_nb_entry_per_domain = 1000; + int quota_nb_watch_per_domain = 128; + int quota_max_entry_size = 2048; /* 2K */ + int quota_max_transaction = 10; ++int quota_nb_perms_per_node = 5; + + void trace(const char *fmt, ...) + { +@@ -407,8 +408,13 @@ struct node *read_node(struct connection *conn, const void *ctx, + + /* Permissions are struct xs_permissions. */ + node->perms.p = hdr->perms; ++ if (domain_adjust_node_perms(node)) { ++ talloc_free(node); ++ return NULL; ++ } ++ + /* Data is binary blob (usually ascii, no nul). */ +- node->data = node->perms.p + node->perms.num; ++ node->data = node->perms.p + hdr->num_perms; + /* Children is strings, nul separated. 
*/ + node->children = node->data + node->datalen; + +@@ -424,6 +430,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node, + void *p; + struct xs_tdb_record_hdr *hdr; + ++ if (domain_adjust_node_perms(node)) ++ return errno; ++ + data.dsize = sizeof(*hdr) + + node->perms.num * sizeof(node->perms.p[0]) + + node->datalen + node->childlen; +@@ -483,8 +492,9 @@ enum xs_perm_type perm_for_conn(struct connection *conn, + return (XS_PERM_READ|XS_PERM_WRITE|XS_PERM_OWNER) & mask; + + for (i = 1; i < perms->num; i++) +- if (perms->p[i].id == conn->id +- || (conn->target && perms->p[i].id == conn->target->id)) ++ if (!(perms->p[i].perms & XS_PERM_IGNORE) && ++ (perms->p[i].id == conn->id || ++ (conn->target && perms->p[i].id == conn->target->id))) + return perms->p[i].perms & mask; + + return perms->p[0].perms & mask; +@@ -1246,8 +1256,12 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in) + if (perms.num < 2) + return EINVAL; + +- permstr = in->buffer + strlen(in->buffer) + 1; + perms.num--; ++ if (domain_is_unprivileged(conn) && ++ perms.num > quota_nb_perms_per_node) ++ return ENOSPC; ++ ++ permstr = in->buffer + strlen(in->buffer) + 1; + + perms.p = talloc_array(in, struct xs_permissions, perms.num); + if (!perms.p) +@@ -1919,6 +1933,7 @@ static void usage(void) + " -S, --entry-size limit the size of entry per domain, and\n" + " -W, --watch-nb limit the number of watches per domain,\n" + " -t, --transaction limit the number of transaction allowed per domain,\n" ++" -A, --perm-nb limit the number of permissions per node,\n" + " -R, --no-recovery to request that no recovery should be attempted when\n" + " the store is corrupted (debug only),\n" + " -I, --internal-db store database in memory, not on disk\n" +@@ -1939,6 +1954,7 @@ static struct option options[] = { + { "entry-size", 1, NULL, 'S' }, + { "trace-file", 1, NULL, 'T' }, + { "transaction", 1, NULL, 't' }, ++ { "perm-nb", 1, NULL, 'A' }, + { "no-recovery", 0, NULL, 'R' }, + { "internal-db", 0, NULL, 'I' }, + { "verbose", 0, NULL, 'V' }, +@@ -1961,7 +1977,7 @@ int main(int argc, char *argv[]) + int timeout; + + +- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:T:RVW:", options, ++ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:T:RVW:", options, + NULL)) != -1) { + switch (opt) { + case 'D': +@@ -2003,6 +2019,9 @@ int main(int argc, char *argv[]) + case 'W': + quota_nb_watch_per_domain = strtol(optarg, NULL, 10); + break; ++ case 'A': ++ quota_nb_perms_per_node = strtol(optarg, NULL, 10); ++ break; + case 'e': + dom0_event = strtol(optarg, NULL, 10); + break; +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index 0b2f49ac7d4c..f5e7af46e8aa 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -71,8 +71,14 @@ struct domain + /* The connection associated with this. */ + struct connection *conn; + ++ /* Generation count at domain introduction time. */ ++ uint64_t generation; ++ + /* Have we noticed that this domain is shutdown? */ +- int shutdown; ++ bool shutdown; ++ ++ /* Has domain been officially introduced? 
*/ ++ bool introduced; + + /* number of entry from this domain in the store */ + int nbentry; +@@ -200,6 +206,9 @@ static int destroy_domain(void *_domain) + + list_del(&domain->list); + ++ if (!domain->introduced) ++ return 0; ++ + if (domain->port) { + if (xenevtchn_unbind(xce_handle, domain->port) == -1) + eprintf("> Unbinding port %i failed!\n", domain->port); +@@ -221,20 +230,33 @@ static int destroy_domain(void *_domain) + return 0; + } + ++static bool get_domain_info(unsigned int domid, xc_dominfo_t *dominfo) ++{ ++ return xc_domain_getinfo(*xc_handle, domid, 1, dominfo) == 1 && ++ dominfo->domid == domid; ++} ++ + static void domain_cleanup(void) + { + xc_dominfo_t dominfo; + struct domain *domain; + int notify = 0; ++ bool dom_valid; + + again: + list_for_each_entry(domain, &domains, list) { +- if (xc_domain_getinfo(*xc_handle, domain->domid, 1, +- &dominfo) == 1 && +- dominfo.domid == domain->domid) { ++ dom_valid = get_domain_info(domain->domid, &dominfo); ++ if (!domain->introduced) { ++ if (!dom_valid) { ++ talloc_free(domain); ++ goto again; ++ } ++ continue; ++ } ++ if (dom_valid) { + if ((dominfo.crashed || dominfo.shutdown) + && !domain->shutdown) { +- domain->shutdown = 1; ++ domain->shutdown = true; + notify = 1; + } + if (!dominfo.dying) +@@ -301,58 +323,84 @@ static char *talloc_domain_path(void *context, unsigned int domid) + return talloc_asprintf(context, "/local/domain/%u", domid); + } + +-static struct domain *new_domain(void *context, unsigned int domid, +- int port) ++static struct domain *find_domain_struct(unsigned int domid) ++{ ++ struct domain *i; ++ ++ list_for_each_entry(i, &domains, list) { ++ if (i->domid == domid) ++ return i; ++ } ++ return NULL; ++} ++ ++static struct domain *alloc_domain(void *context, unsigned int domid) + { + struct domain *domain; +- int rc; + + domain = talloc(context, struct domain); +- if (!domain) ++ if (!domain) { ++ errno = ENOMEM; + return NULL; ++ } + +- domain->port = 0; +- domain->shutdown = 0; + domain->domid = domid; +- domain->path = talloc_domain_path(domain, domid); +- if (!domain->path) +- return NULL; ++ domain->generation = generation; ++ domain->introduced = false; + +- wrl_domain_new(domain); ++ talloc_set_destructor(domain, destroy_domain); + + list_add(&domain->list, &domains); +- talloc_set_destructor(domain, destroy_domain); ++ ++ return domain; ++} ++ ++static int new_domain(struct domain *domain, int port) ++{ ++ int rc; ++ ++ domain->port = 0; ++ domain->shutdown = false; ++ domain->path = talloc_domain_path(domain, domain->domid); ++ if (!domain->path) { ++ errno = ENOMEM; ++ return errno; ++ } ++ ++ wrl_domain_new(domain); + + /* Tell kernel we're interested in this event. 
*/ +- rc = xenevtchn_bind_interdomain(xce_handle, domid, port); ++ rc = xenevtchn_bind_interdomain(xce_handle, domain->domid, port); + if (rc == -1) +- return NULL; ++ return errno; + domain->port = rc; + ++ domain->introduced = true; ++ + domain->conn = new_connection(writechn, readchn); +- if (!domain->conn) +- return NULL; ++ if (!domain->conn) { ++ errno = ENOMEM; ++ return errno; ++ } + + domain->conn->domain = domain; +- domain->conn->id = domid; ++ domain->conn->id = domain->domid; + + domain->remote_port = port; + domain->nbentry = 0; + domain->nbwatch = 0; + +- return domain; ++ return 0; + } + + + static struct domain *find_domain_by_domid(unsigned int domid) + { +- struct domain *i; ++ struct domain *d; + +- list_for_each_entry(i, &domains, list) { +- if (i->domid == domid) +- return i; +- } +- return NULL; ++ d = find_domain_struct(domid); ++ ++ return (d && d->introduced) ? d : NULL; + } + + static void domain_conn_reset(struct domain *domain) +@@ -399,15 +447,21 @@ int do_introduce(struct connection *conn, struct buffered_data *in) + if (port <= 0) + return EINVAL; + +- domain = find_domain_by_domid(domid); ++ domain = find_domain_struct(domid); + + if (domain == NULL) { ++ /* Hang domain off "in" until we're finished. */ ++ domain = alloc_domain(in, domid); ++ if (domain == NULL) ++ return ENOMEM; ++ } ++ ++ if (!domain->introduced) { + interface = map_interface(domid, mfn); + if (!interface) + return errno; + /* Hang domain off "in" until we're finished. */ +- domain = new_domain(in, domid, port); +- if (!domain) { ++ if (new_domain(domain, port)) { + rc = errno; + unmap_interface(interface); + return rc; +@@ -518,8 +572,8 @@ int do_resume(struct connection *conn, struct buffered_data *in) + if (IS_ERR(domain)) + return -PTR_ERR(domain); + +- domain->shutdown = 0; +- ++ domain->shutdown = false; ++ + send_ack(conn, XS_RESUME); + + return 0; +@@ -662,8 +716,10 @@ static int dom0_init(void) + if (port == -1) + return -1; + +- dom0 = new_domain(NULL, xenbus_master_domid(), port); +- if (dom0 == NULL) ++ dom0 = alloc_domain(NULL, xenbus_master_domid()); ++ if (!dom0) ++ return -1; ++ if (new_domain(dom0, port)) + return -1; + + dom0->interface = xenbus_map(); +@@ -744,6 +800,66 @@ void domain_entry_inc(struct connection *conn, struct node *node) + } + } + ++/* ++ * Check whether a domain was created before or after a specific generation ++ * count (used for testing whether a node permission is older than a domain). ++ * ++ * Return values: ++ * -1: error ++ * 0: domain has higher generation count (it is younger than a node with the ++ * given count), or domain isn't existing any longer ++ * 1: domain is older than the node ++ */ ++static int chk_domain_generation(unsigned int domid, uint64_t gen) ++{ ++ struct domain *d; ++ xc_dominfo_t dominfo; ++ ++ if (!xc_handle && domid == 0) ++ return 1; ++ ++ d = find_domain_struct(domid); ++ if (d) ++ return (d->generation <= gen) ? 1 : 0; ++ ++ if (!get_domain_info(domid, &dominfo)) ++ return 0; ++ ++ d = alloc_domain(NULL, domid); ++ return d ? 1 : -1; ++} ++ ++/* ++ * Remove permissions for no longer existing domains in order to avoid a new ++ * domain with the same domid inheriting the permissions. ++ */ ++int domain_adjust_node_perms(struct node *node) ++{ ++ unsigned int i; ++ int ret; ++ ++ ret = chk_domain_generation(node->perms.p[0].id, node->generation); ++ if (ret < 0) ++ return errno; ++ ++ /* If the owner doesn't exist any longer give it to priv domain. 
*/ ++ if (!ret) ++ node->perms.p[0].id = priv_domid; ++ ++ for (i = 1; i < node->perms.num; i++) { ++ if (node->perms.p[i].perms & XS_PERM_IGNORE) ++ continue; ++ ret = chk_domain_generation(node->perms.p[i].id, ++ node->generation); ++ if (ret < 0) ++ return errno; ++ if (!ret) ++ node->perms.p[i].perms |= XS_PERM_IGNORE; ++ } ++ ++ return 0; ++} ++ + void domain_entry_dec(struct connection *conn, struct node *node) + { + struct domain *d; +diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h +index 259183962a9c..5e00087206c7 100644 +--- a/tools/xenstore/xenstored_domain.h ++++ b/tools/xenstore/xenstored_domain.h +@@ -56,6 +56,9 @@ bool domain_can_write(struct connection *conn); + + bool domain_is_unprivileged(struct connection *conn); + ++/* Remove node permissions for no longer existing domains. */ ++int domain_adjust_node_perms(struct node *node); ++ + /* Quota manipulation */ + void domain_entry_inc(struct connection *conn, struct node *); + void domain_entry_dec(struct connection *conn, struct node *); +diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c +index 36793b9b1af3..9fcb4c9ba986 100644 +--- a/tools/xenstore/xenstored_transaction.c ++++ b/tools/xenstore/xenstored_transaction.c +@@ -47,7 +47,12 @@ + * transaction. + * Each time the global generation count is copied to either a node or a + * transaction it is incremented. This ensures all nodes and/or transactions +- * are having a unique generation count. ++ * are having a unique generation count. The increment is done _before_ the ++ * copy as that is needed for checking whether a domain was created before ++ * or after a node has been written (the domain's generation is set with the ++ * actual generation count without incrementing it, in order to support ++ * writing a node for a domain before the domain has been officially ++ * introduced). + * + * Transaction conflicts are detected by checking the generation count of all + * nodes read in the transaction to match with the generation count in the +@@ -161,7 +166,7 @@ struct transaction + }; + + extern int quota_max_transaction; +-static uint64_t generation; ++uint64_t generation; + + static void set_tdb_key(const char *name, TDB_DATA *key) + { +@@ -237,7 +242,7 @@ int access_node(struct connection *conn, struct node *node, + bool introduce = false; + + if (type != NODE_ACCESS_READ) { +- node->generation = generation++; ++ node->generation = ++generation; + if (conn && !conn->transaction) + wrl_apply_debit_direct(conn); + } +@@ -374,7 +379,7 @@ static int finalize_transaction(struct connection *conn, + if (!data.dptr) + goto err; + hdr = (void *)data.dptr; +- hdr->generation = generation++; ++ hdr->generation = ++generation; + ret = tdb_store(tdb_ctx, key, data, + TDB_REPLACE); + talloc_free(data.dptr); +@@ -462,7 +467,7 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in) + INIT_LIST_HEAD(&trans->accessed); + INIT_LIST_HEAD(&trans->changed_domains); + trans->fail = false; +- trans->generation = generation++; ++ trans->generation = ++generation; + + /* Pick an unused transaction identifier. 
*/ + do { +diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h +index 3386bac56508..43a162bea3f3 100644 +--- a/tools/xenstore/xenstored_transaction.h ++++ b/tools/xenstore/xenstored_transaction.h +@@ -27,6 +27,8 @@ enum node_access_type { + + struct transaction; + ++extern uint64_t generation; ++ + int do_transaction_start(struct connection *conn, struct buffered_data *node); + int do_transaction_end(struct connection *conn, struct buffered_data *in); + +diff --git a/tools/xenstore/xs_lib.c b/tools/xenstore/xs_lib.c +index 3e43f8809d42..d407d5713aff 100644 +--- a/tools/xenstore/xs_lib.c ++++ b/tools/xenstore/xs_lib.c +@@ -152,7 +152,7 @@ bool xs_strings_to_perms(struct xs_permissions *perms, unsigned int num, + bool xs_perm_to_string(const struct xs_permissions *perm, + char *buffer, size_t buf_len) + { +- switch ((int)perm->perms) { ++ switch ((int)perm->perms & ~XS_PERM_IGNORE) { + case XS_PERM_WRITE: + *buffer = 'w'; + break; +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa323.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa323.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa323.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa323.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,140 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: Fix path length validation +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Currently, oxenstored checks the length of paths against 1024, then +prepends "/local/domain/$DOMID/" to relative paths. This allows a domU +to create paths which can't subsequently be read by anyone, even dom0. +This also interferes with listing directories, etc. + +Define a new oxenstored.conf entry: quota-path-max, defaulting to 1024 +as before. For paths that begin with "/local/domain/$DOMID/" check the +relative path length against this quota. For all other paths check the +entire path length. + +This ensures that if the domid changes (and thus the length of a prefix +changes) a path that used to be valid stays valid (e.g. after a +live-migration). It also ensures that regardless how the client tries +to access a path (domid-relative or absolute) it will get consistent +results, since the limit is always applied on the final canonicalized +path. + +Delete the unused Domain.get_path to avoid it being confused with +Connection.get_path (which differs by a trailing slash only). + +Rewrite Util.path_validate to apply the appropriate length restriction +based on whether the path is relative or not. Remove the check for +connection_path being absolute, because it is not guest controlled data. + +This is part of XSA-323. 
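The length rule described above can be checked outside of oxenstored as well; the actual fix is the OCaml change in the hunks below. A minimal C sketch, assuming made-up names (PATH_QUOTA stands in for quota-path-max, path_within_quota for the validation step):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define PATH_QUOTA 1024   /* stands in for the quota-path-max setting */

    /* Apply the quota to the domain-relative part when the canonical path is
     * domain-relative, and to the whole path otherwise. */
    static bool path_within_quota(const char *abs_path)
    {
        size_t len = strlen(abs_path);
        unsigned int domid = 0;
        int consumed = 0;

        if (sscanf(abs_path, "/local/domain/%u/%n", &domid, &consumed) == 1 &&
            consumed > 0)
            len -= consumed;   /* only the part after /local/domain/<domid>/ */

        return len > 0 && len <= PATH_QUOTA;
    }

    int main(void)
    {
        printf("%d\n", path_within_quota("/local/domain/123/data/example"));
        return 0;
    }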
+ +Signed-off-by: Andrew Cooper +Signed-off-by: Edwin Török +Acked-by: Christian Lindig + +diff --git a/tools/ocaml/libs/xb/partial.ml b/tools/ocaml/libs/xb/partial.ml +index d4d1c7bdec..b6e2a716e2 100644 +--- a/tools/ocaml/libs/xb/partial.ml ++++ b/tools/ocaml/libs/xb/partial.ml +@@ -28,6 +28,7 @@ external header_of_string_internal: string -> int * int * int * int + = "stub_header_of_string" + + let xenstore_payload_max = 4096 (* xen/include/public/io/xs_wire.h *) ++let xenstore_rel_path_max = 2048 (* xen/include/public/io/xs_wire.h *) + + let of_string s = + let tid, rid, opint, dlen = header_of_string_internal s in +diff --git a/tools/ocaml/libs/xb/partial.mli b/tools/ocaml/libs/xb/partial.mli +index 359a75e88d..b9216018f5 100644 +--- a/tools/ocaml/libs/xb/partial.mli ++++ b/tools/ocaml/libs/xb/partial.mli +@@ -9,6 +9,7 @@ external header_size : unit -> int = "stub_header_size" + external header_of_string_internal : string -> int * int * int * int + = "stub_header_of_string" + val xenstore_payload_max : int ++val xenstore_rel_path_max : int + val of_string : string -> pkt + val append : pkt -> string -> int -> unit + val to_complete : pkt -> int +diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml +index ea9e1b7620..ebe18b8e31 100644 +--- a/tools/ocaml/xenstored/define.ml ++++ b/tools/ocaml/xenstored/define.ml +@@ -31,6 +31,8 @@ let conflict_rate_limit_is_aggregate = ref true + + let domid_self = 0x7FF0 + ++let path_max = ref Xenbus.Partial.xenstore_rel_path_max ++ + exception Not_a_directory of string + exception Not_a_value of string + exception Already_exist +diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml +index aeb185ff7e..81cb59b8f1 100644 +--- a/tools/ocaml/xenstored/domain.ml ++++ b/tools/ocaml/xenstored/domain.ml +@@ -38,7 +38,6 @@ type t = + } + + let is_dom0 d = d.id = 0 +-let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id) + let get_id domain = domain.id + let get_interface d = d.interface + let get_mfn d = d.mfn +diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in +index f843482981..4ae48e42d4 100644 +--- a/tools/ocaml/xenstored/oxenstored.conf.in ++++ b/tools/ocaml/xenstored/oxenstored.conf.in +@@ -61,6 +61,7 @@ quota-maxsize = 2048 + quota-maxwatch = 100 + quota-transaction = 10 + quota-maxrequests = 1024 ++quota-path-max = 1024 + + # Activate filed base backend + persistent = false +diff --git a/tools/ocaml/xenstored/utils.ml b/tools/ocaml/xenstored/utils.ml +index e8c9fe4e94..eb79bf0146 100644 +--- a/tools/ocaml/xenstored/utils.ml ++++ b/tools/ocaml/xenstored/utils.ml +@@ -93,7 +93,7 @@ let read_file_single_integer filename = + let path_validate path connection_path = + let len = String.length path in + +- if len = 0 || len > 1024 then raise Define.Invalid_path; ++ if len = 0 then raise Define.Invalid_path; + + let abs_path = + match String.get path 0 with +@@ -101,4 +101,17 @@ let path_validate path connection_path = + | _ -> connection_path ^ path + in + ++ (* Regardless whether client specified absolute or relative path, ++ canonicalize it (above) and, for domain-relative paths, check the ++ length of the relative part. ++ ++ This prevents paths becoming invalid across migrate when the length ++ of the domid changes in @param connection_path. 
++ *) ++ let len = String.length abs_path in ++ let on_absolute _ _ = len in ++ let on_relative _ offset = len - offset in ++ let len = Scanf.ksscanf abs_path on_absolute "/local/domain/%d/%n" on_relative in ++ if len > !Define.path_max then raise Define.Invalid_path; ++ + abs_path +diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml +index ff9fbbbac2..39d6d767e4 100644 +--- a/tools/ocaml/xenstored/xenstored.ml ++++ b/tools/ocaml/xenstored/xenstored.ml +@@ -102,6 +102,7 @@ let parse_config filename = + ("quota-maxentity", Config.Set_int Quota.maxent); + ("quota-maxsize", Config.Set_int Quota.maxsize); + ("quota-maxrequests", Config.Set_int Define.maxrequests); ++ ("quota-path-max", Config.Set_int Define.path_max); + ("test-eagain", Config.Set_bool Transaction.test_eagain); + ("persistent", Config.Set_bool Disk.enable); + ("xenstored-log-file", Config.String Logging.set_xenstored_log_destination); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa324.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa324.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa324.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa324.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,48 @@ +From: Juergen Gross +Subject: tools/xenstore: drop watch event messages exceeding maximum size + +By setting a watch with a very large tag it is possible to trick +xenstored to send watch event messages exceeding the maximum allowed +payload size. This might in turn lead to a crash of xenstored as the +resulting error can cause dereferencing a NULL pointer in case there +is no active request being handled by the guest the watch event is +being sent to. + +Fix that by just dropping such watch events. Additionally modify the +error handling to test the pointer to be not NULL before dereferencing +it. + +This is XSA-324. + +Signed-off-by: Juergen Gross +Acked-by: Julien Grall + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index 33f95dcf3c..3d74dbbb40 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -674,6 +674,9 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type, + /* Replies reuse the request buffer, events need a new one. */ + if (type != XS_WATCH_EVENT) { + bdata = conn->in; ++ /* Drop asynchronous responses, e.g. errors for watch events. */ ++ if (!bdata) ++ return; + bdata->inhdr = true; + bdata->used = 0; + conn->in = NULL; +diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c +index 71c108ea99..9ff20690c0 100644 +--- a/tools/xenstore/xenstored_watch.c ++++ b/tools/xenstore/xenstored_watch.c +@@ -92,6 +92,10 @@ static void add_event(struct connection *conn, + } + + len = strlen(name) + 1 + strlen(watch->token) + 1; ++ /* Don't try to send over-long events. 
*/ ++ if (len > XENSTORE_PAYLOAD_MAX) ++ return; ++ + data = talloc_array(ctx, char, len); + if (!data) + return; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa325-4.14.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa325-4.14.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa325-4.14.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa325-4.14.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,192 @@ +From: Harsha Shamsundara Havanur +Subject: tools/xenstore: Preserve bad client until they are destroyed + +XenStored will kill any connection that it thinks has misbehaved, +this is currently happening in two places: + * In `handle_input()` if the sanity check on the ring and the message + fails. + * In `handle_output()` when failing to write the response in the ring. + +As the domain structure is a child of the connection, XenStored will +destroy its view of the domain when killing the connection. This will +result in sending @releaseDomain event to all the watchers. + +As the watch event doesn't carry which domain has been released, +the watcher (such as XenStored) will generally go through the list of +domains registers and check if one of them is shutting down/dying. +In the case of a client misbehaving, the domain will likely to be +running, so no action will be performed. + +When the domain is effectively destroyed, XenStored will not be aware of +the domain anymore. So the watch event is not going to be sent. +By consequence, the watchers of the event will not release mappings +they may have on the domain. This will result in a zombie domain. + +In order to send @releaseDomain event at the correct time, we want +to keep the domain structure until the domain is effectively +shutting-down/dying. + +We also want to keep the connection around so we could possibly revive +the connection in the future. + +A new flag 'is_ignored' is added to mark whether a connection should be +ignored when checking if there are work to do. Additionally any +transactions, watches, buffers associated to the connection will be +freed as you can't do much with them (restarting the connection will +likely need a reset). + +As a side note, when the device model were running in a stubdomain, a +guest would have been able to introduce a use-after-free because there +is two parents for a guest connection. + +This is XSA-325. + +Reported-by: Pawel Wieczorkiewicz +Signed-off-by: Harsha Shamsundara Havanur +Signed-off-by: Julien Grall +Reviewed-by: Juergen Gross +Reviewed-by: Paul Durrant + +diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c +index af3d17004b3f..27d8f15b6b76 100644 +--- a/tools/xenstore/xenstored_core.c ++++ b/tools/xenstore/xenstored_core.c +@@ -1355,6 +1355,32 @@ static struct { + [XS_DIRECTORY_PART] = { "DIRECTORY_PART", send_directory_part }, + }; + ++/* ++ * Keep the connection alive but stop processing any new request or sending ++ * reponse. This is to allow sending @releaseDomain watch event at the correct ++ * moment and/or to allow the connection to restart (not yet implemented). ++ * ++ * All watches, transactions, buffers will be freed. 
++ */ ++static void ignore_connection(struct connection *conn) ++{ ++ struct buffered_data *out, *tmp; ++ ++ trace("CONN %p ignored\n", conn); ++ ++ conn->is_ignored = true; ++ conn_delete_all_watches(conn); ++ conn_delete_all_transactions(conn); ++ ++ list_for_each_entry_safe(out, tmp, &conn->out_list, list) { ++ list_del(&out->list); ++ talloc_free(out); ++ } ++ ++ talloc_free(conn->in); ++ conn->in = NULL; ++} ++ + static const char *sockmsg_string(enum xsd_sockmsg_type type) + { + if ((unsigned int)type < ARRAY_SIZE(wire_funcs) && wire_funcs[type].str) +@@ -1413,8 +1439,10 @@ static void consider_message(struct connection *conn) + assert(conn->in == NULL); + } + +-/* Errors in reading or allocating here mean we get out of sync, so we +- * drop the whole client connection. */ ++/* ++ * Errors in reading or allocating here means we get out of sync, so we mark ++ * the connection as ignored. ++ */ + static void handle_input(struct connection *conn) + { + int bytes; +@@ -1471,14 +1499,14 @@ static void handle_input(struct connection *conn) + return; + + bad_client: +- /* Kill it. */ +- talloc_free(conn); ++ ignore_connection(conn); + } + + static void handle_output(struct connection *conn) + { ++ /* Ignore the connection if an error occured */ + if (!write_messages(conn)) +- talloc_free(conn); ++ ignore_connection(conn); + } + + struct connection *new_connection(connwritefn_t *write, connreadfn_t *read) +@@ -1494,6 +1522,7 @@ struct connection *new_connection(connwritefn_t *write, connreadfn_t *read) + new->write = write; + new->read = read; + new->can_write = true; ++ new->is_ignored = false; + new->transaction_started = 0; + INIT_LIST_HEAD(&new->out_list); + INIT_LIST_HEAD(&new->watches); +@@ -2186,8 +2215,9 @@ int main(int argc, char *argv[]) + if (fds[conn->pollfd_idx].revents + & ~(POLLIN|POLLOUT)) + talloc_free(conn); +- else if (fds[conn->pollfd_idx].revents +- & POLLIN) ++ else if ((fds[conn->pollfd_idx].revents ++ & POLLIN) && ++ !conn->is_ignored) + handle_input(conn); + } + if (talloc_free(conn) == 0) +@@ -2199,8 +2229,9 @@ int main(int argc, char *argv[]) + if (fds[conn->pollfd_idx].revents + & ~(POLLIN|POLLOUT)) + talloc_free(conn); +- else if (fds[conn->pollfd_idx].revents +- & POLLOUT) ++ else if ((fds[conn->pollfd_idx].revents ++ & POLLOUT) && ++ !conn->is_ignored) + handle_output(conn); + } + if (talloc_free(conn) == 0) +diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h +index eb19b71f5f46..196a6fd2b0be 100644 +--- a/tools/xenstore/xenstored_core.h ++++ b/tools/xenstore/xenstored_core.h +@@ -80,6 +80,9 @@ struct connection + /* Is this a read-only connection? */ + bool can_write; + ++ /* Is this connection ignored? */ ++ bool is_ignored; ++ + /* Buffered incoming data. 
*/ + struct buffered_data *in; + +diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c +index dc635e9be30c..d5e1e3e9d42d 100644 +--- a/tools/xenstore/xenstored_domain.c ++++ b/tools/xenstore/xenstored_domain.c +@@ -286,6 +286,10 @@ bool domain_can_read(struct connection *conn) + + if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0) + return false; ++ ++ if (conn->is_ignored) ++ return false; ++ + return (intf->req_cons != intf->req_prod); + } + +@@ -303,6 +307,10 @@ bool domain_is_unprivileged(struct connection *conn) + bool domain_can_write(struct connection *conn) + { + struct xenstore_domain_interface *intf = conn->domain->interface; ++ ++ if (conn->is_ignored) ++ return false; ++ + return ((intf->rsp_prod - intf->rsp_cons) != XENSTORE_RING_SIZE); + } + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa327.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa327.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa327.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa327.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,63 @@ +From 030300ebbb86c40c12db038714479d746167c767 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Tue, 26 May 2020 18:31:33 +0100 +Subject: [PATCH] xen: Check the alignment of the offset pased via + VCPUOP_register_vcpu_info + +Currently a guest is able to register any guest physical address to use +for the vcpu_info structure as long as the structure can fits in the +rest of the frame. + +This means a guest can provide an address that is not aligned to the +natural alignment of the structure. + +On Arm 32-bit, unaligned access are completely forbidden by the +hypervisor. This will result to a data abort which is fatal. + +On Arm 64-bit, unaligned access are only forbidden when used for atomic +access. As the structure contains fields (such as evtchn_pending_self) +that are updated using atomic operations, any unaligned access will be +fatal as well. + +While the misalignment is only fatal on Arm, a generic check is added +as an x86 guest shouldn't sensibly pass an unaligned address (this +would result to a split lock). + +This is XSA-327. 
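The test added below is the standard power-of-two alignment check on the guest-supplied offset. A stand-alone sketch, assuming a placeholder struct info rather than Xen's vcpu_info_t and its compat layout:

    #include <stdalign.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Placeholder for a structure that is updated with atomic operations and
     * therefore must sit at its natural alignment. */
    struct info {
        unsigned long pending;
        unsigned long mask;
    };

    static bool offset_is_aligned(unsigned long offset)
    {
        return (offset & (alignof(struct info) - 1)) == 0;
    }

    int main(void)
    {
        printf("%d %d\n", offset_is_aligned(0), offset_is_aligned(4));
        return 0;
    }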
+ +Reported-by: Julien Grall +Signed-off-by: Julien Grall +Reviewed-by: Andrew Cooper +Reviewed-by: Stefano Stabellini +--- + xen/common/domain.c | 10 ++++++++++ + 1 file changed, 10 insertions(+) + +diff --git a/xen/common/domain.c b/xen/common/domain.c +index 7cc9526139a6..e9be05f1d05f 100644 +--- a/xen/common/domain.c ++++ b/xen/common/domain.c +@@ -1227,10 +1227,20 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset) + void *mapping; + vcpu_info_t *new_info; + struct page_info *page; ++ unsigned int align; + + if ( offset > (PAGE_SIZE - sizeof(vcpu_info_t)) ) + return -EINVAL; + ++#ifdef CONFIG_COMPAT ++ if ( has_32bit_shinfo(d) ) ++ align = alignof(new_info->compat); ++ else ++#endif ++ align = alignof(*new_info); ++ if ( offset & (align - 1) ) ++ return -EINVAL; ++ + if ( !mfn_eq(v->vcpu_info_mfn, INVALID_MFN) ) + return -EINVAL; + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-1.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,118 @@ +From: Jan Beulich +Subject: x86/EPT: ept_set_middle_entry() related adjustments + +ept_split_super_page() wants to further modify the newly allocated +table, so have ept_set_middle_entry() return the mapped pointer rather +than tearing it down and then getting re-established right again. + +Similarly ept_next_level() wants to hand back a mapped pointer of +the next level page, so re-use the one established by +ept_set_middle_entry() in case that path was taken. + +Pull the setting of suppress_ve ahead of insertion into the higher level +table, and don't have ept_split_super_page() set the field a 2nd time. + +This is part of XSA-328. + +Signed-off-by: Jan Beulich + +--- a/xen/arch/x86/mm/p2m-ept.c ++++ b/xen/arch/x86/mm/p2m-ept.c +@@ -228,8 +228,9 @@ static void ept_p2m_type_to_flags(struct + #define GUEST_TABLE_SUPER_PAGE 2 + #define GUEST_TABLE_POD_PAGE 3 + +-/* Fill in middle levels of ept table */ +-static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry) ++/* Fill in middle level of ept table; return pointer to mapped new table. 
*/ ++static ept_entry_t *ept_set_middle_entry(struct p2m_domain *p2m, ++ ept_entry_t *ept_entry) + { + mfn_t mfn; + ept_entry_t *table; +@@ -237,7 +238,12 @@ static int ept_set_middle_entry(struct p + + mfn = p2m_alloc_ptp(p2m, 0); + if ( mfn_eq(mfn, INVALID_MFN) ) +- return 0; ++ return NULL; ++ ++ table = map_domain_page(mfn); ++ ++ for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) ++ table[i].suppress_ve = 1; + + ept_entry->epte = 0; + ept_entry->mfn = mfn_x(mfn); +@@ -249,14 +255,7 @@ static int ept_set_middle_entry(struct p + + ept_entry->suppress_ve = 1; + +- table = map_domain_page(mfn); +- +- for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) +- table[i].suppress_ve = 1; +- +- unmap_domain_page(table); +- +- return 1; ++ return table; + } + + /* free ept sub tree behind an entry */ +@@ -294,10 +293,10 @@ static bool_t ept_split_super_page(struc + + ASSERT(is_epte_superpage(ept_entry)); + +- if ( !ept_set_middle_entry(p2m, &new_ept) ) ++ table = ept_set_middle_entry(p2m, &new_ept); ++ if ( !table ) + return 0; + +- table = map_domain_page(_mfn(new_ept.mfn)); + trunk = 1UL << ((level - 1) * EPT_TABLE_ORDER); + + for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) +@@ -308,7 +307,6 @@ static bool_t ept_split_super_page(struc + epte->sp = (level > 1); + epte->mfn += i * trunk; + epte->snp = (iommu_enabled && iommu_snoop); +- epte->suppress_ve = 1; + + ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access); + +@@ -347,8 +345,7 @@ static int ept_next_level(struct p2m_dom + ept_entry_t **table, unsigned long *gfn_remainder, + int next_level) + { +- unsigned long mfn; +- ept_entry_t *ept_entry, e; ++ ept_entry_t *ept_entry, *next = NULL, e; + u32 shift, index; + + shift = next_level * EPT_TABLE_ORDER; +@@ -373,19 +370,17 @@ static int ept_next_level(struct p2m_dom + if ( read_only ) + return GUEST_TABLE_MAP_FAILED; + +- if ( !ept_set_middle_entry(p2m, ept_entry) ) ++ next = ept_set_middle_entry(p2m, ept_entry); ++ if ( !next ) + return GUEST_TABLE_MAP_FAILED; +- else +- e = atomic_read_ept_entry(ept_entry); /* Refresh */ ++ /* e is now stale and hence may not be used anymore below. */ + } +- + /* The only time sp would be set here is if we had hit a superpage */ +- if ( is_epte_superpage(&e) ) ++ else if ( is_epte_superpage(&e) ) + return GUEST_TABLE_SUPER_PAGE; + +- mfn = e.mfn; + unmap_domain_page(*table); +- *table = map_domain_page(_mfn(mfn)); ++ *table = next ?: map_domain_page(_mfn(e.mfn)); + *gfn_remainder &= (1UL << shift) - 1; + return GUEST_TABLE_NORMAL_PAGE; + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa328-4.11-2.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,48 @@ +From: +Subject: x86/ept: atomically modify entries in ept_next_level + +ept_next_level was passing a live PTE pointer to ept_set_middle_entry, +which was then modified without taking into account that the PTE could +be part of a live EPT table. This wasn't a security issue because the +pages returned by p2m_alloc_ptp are zeroed, so adding such an entry +before actually initializing it didn't allow a guest to access +physical memory addresses it wasn't supposed to access. + +This is part of XSA-328. 
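The rule being enforced below is to compose the entry in a local variable and publish it to the live table with one atomic write, never by editing the live entry piecemeal. A rough C11 sketch of that pattern, with pte_t, live_table and set_entry as illustrative names rather than Xen's types:

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    /* A page-table-style entry packed into one 64-bit word; another CPU (or a
     * hardware walker) may read live_table[] at any time. */
    typedef uint64_t pte_t;

    static _Atomic pte_t live_table[512];

    static void set_entry(unsigned int idx, uint64_t mfn, uint64_t flags)
    {
        pte_t e = (mfn << 12) | (flags & 0xfff);   /* build the full value first */

        atomic_store_explicit(&live_table[idx], e, memory_order_release);
    }

    int main(void)
    {
        set_entry(0, 0x1234, 0x7);
        printf("%#llx\n", (unsigned long long)atomic_load(&live_table[0]));
        return 0;
    }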
+ +Reviewed-by: Jan Beulich + +--- a/xen/arch/x86/mm/p2m-ept.c ++++ b/xen/arch/x86/mm/p2m-ept.c +@@ -348,6 +348,8 @@ static int ept_next_level(struct p2m_dom + ept_entry_t *ept_entry, *next = NULL, e; + u32 shift, index; + ++ ASSERT(next_level); ++ + shift = next_level * EPT_TABLE_ORDER; + + index = *gfn_remainder >> shift; +@@ -364,16 +366,20 @@ static int ept_next_level(struct p2m_dom + + if ( !is_epte_present(&e) ) + { ++ int rc; ++ + if ( e.sa_p2mt == p2m_populate_on_demand ) + return GUEST_TABLE_POD_PAGE; + + if ( read_only ) + return GUEST_TABLE_MAP_FAILED; + +- next = ept_set_middle_entry(p2m, ept_entry); ++ next = ept_set_middle_entry(p2m, &e); + if ( !next ) + return GUEST_TABLE_MAP_FAILED; +- /* e is now stale and hence may not be used anymore below. */ ++ ++ rc = atomic_write_ept_entry(ept_entry, e, next_level); ++ ASSERT(rc == 0); + } + /* The only time sp would be set here is if we had hit a superpage */ + else if ( is_epte_superpage(&e) ) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa330.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa330.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa330.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa330.patch 2022-05-30 08:15:43.000000000 +0100 @@ -0,0 +1,68 @@ +Variable names modified for version in Ubuntu 20.04. + +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: delete watch from trie too when resetting + watches +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +c/s f8c72b526129 "oxenstored: implement XS_RESET_WATCHES" from Xen 4.6 +introduced reset watches support in oxenstored by mirroring the change +in cxenstored. + +However the OCaml version has some additional data structures to +optimize watch firing, and just resetting the watches in one of the data +structures creates a security bug where a malicious guest kernel can +exceed its watch quota, driving oxenstored into OOM: + * create watches + * reset watches (this still keeps the watches lingering in another data + structure, using memory) + * create some more watches + * loop until oxenstored dies + +The guest kernel doesn't necessarily have to be malicious to trigger +this: + * if control/platform-feature-xs_reset_watches is set + * the guest kexecs (e.g. because it crashes) + * on boot more watches are set up + * this will slowly "leak" memory for watches in oxenstored, driving it + towards OOM. + +This is XSA-330. 
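The invariant being restored is that a watch has to be unlinked from every structure that references it, not only from the per-connection list. The real fix is the OCaml change below; as a language-neutral sketch in C, with per_conn and by_path standing in for the connection's watch list and the path-keyed trie:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_WATCHES 64

    struct watch { int id; };

    static struct watch *per_conn[MAX_WATCHES];   /* connection's own list */
    static unsigned int nr_conn;
    static struct watch *by_path[MAX_WATCHES];    /* global path index */
    static unsigned int nr_path;

    static void add_watch(int id)
    {
        struct watch *w = calloc(1, sizeof(*w));

        w->id = id;
        per_conn[nr_conn++] = w;
        by_path[nr_path++] = w;
    }

    /* The leak fixed below was equivalent to doing only "nr_conn = 0" here:
     * the allocations and the by_path[] references survived every reset. */
    static void reset_watches(void)
    {
        for (unsigned int i = 0; i < nr_conn; i++) {
            for (unsigned int j = 0; j < nr_path; j++)
                if (by_path[j] == per_conn[i]) {
                    by_path[j] = by_path[--nr_path];
                    break;
                }
            free(per_conn[i]);
        }
        nr_conn = 0;
    }

    int main(void)
    {
        add_watch(1);
        add_watch(2);
        reset_watches();
        printf("%u %u\n", nr_conn, nr_path);   /* 0 0 */
        return 0;
    }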
+ +Fixes: f8c72b526129 ("oxenstored: implement XS_RESET_WATCHES") +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml +index 9f9f7ee2f0..6ee3552ec2 100644 +--- a/tools/ocaml/xenstored/connections.ml ++++ b/tools/ocaml/xenstored/connections.ml +@@ -134,6 +134,10 @@ let del_watch cons con path token = + cons.watches <- Trie.set cons.watches key watches; + watch + ++let del_watches cons con = ++ Connection.del_watches con; ++ cons.watches <- Trie.map (del_watches_of_con con) cons.watches ++ + (* path is absolute *) + let fire_watches ?oldroot root cons path recurse = + let key = key_of_path path in +diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml +index 73e04cc18b..437d2dcf9e 100644 +--- a/tools/ocaml/xenstored/process.ml ++++ b/tools/ocaml/xenstored/process.ml +@@ -179,8 +179,8 @@ let do_isintroduced con t domains cons data = + if domid = Define.domid_self || Domains.exist domains domid then "T\000" else "F\000" + + (* only in xen >= 4.2 *) +-let do_reset_watches con t domains cons data = +- Connection.del_watches con; ++let do_reset_watches con t domains cons data = ++ Connections.del_watches cons con; + Connection.del_transactions con + + (* only in >= xen3.3 *) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa333.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa333.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa333.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa333.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,39 @@ +From: Andrew Cooper +Subject: x86/pv: Handle the Intel-specific MSR_MISC_ENABLE correctly + +This MSR doesn't exist on AMD hardware, and switching away from the safe +functions in the common MSR path was an erroneous change. + +Partially revert the change. + +This is XSA-333. + +Fixes: 4fdc932b3cc ("x86/Intel: drop another 32-bit leftover") +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +Reviewed-by: Wei Liu + +diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c +index efeb2a727e..6332c74b80 100644 +--- a/xen/arch/x86/pv/emul-priv-op.c ++++ b/xen/arch/x86/pv/emul-priv-op.c +@@ -924,7 +924,8 @@ static int read_msr(unsigned int reg, uint64_t *val, + return X86EMUL_OKAY; + + case MSR_IA32_MISC_ENABLE: +- rdmsrl(reg, *val); ++ if ( rdmsr_safe(reg, *val) ) ++ break; + *val = guest_misc_enable(*val); + return X86EMUL_OKAY; + +@@ -1059,7 +1060,8 @@ static int write_msr(unsigned int reg, uint64_t val, + break; + + case MSR_IA32_MISC_ENABLE: +- rdmsrl(reg, temp); ++ if ( rdmsr_safe(reg, temp) ) ++ break; + if ( val != guest_misc_enable(temp) ) + goto invalid; + return X86EMUL_OKAY; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa336-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa336-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa336-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa336-4.11.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,256 @@ +From: Roger Pau Monné +Subject: x86/vpt: fix race when migrating timers between vCPUs + +The current vPT code will migrate the emulated timers between vCPUs +(change the pt->vcpu field) while just holding the destination lock, +either from create_periodic_time or pt_adjust_global_vcpu_target if +the global target is adjusted. 
Changing the periodic_timer vCPU field +in this way creates a race where a third party could grab the lock in +the unlocked region of pt_adjust_global_vcpu_target (or before +create_periodic_time performs the vcpu change) and then release the +lock from a different vCPU, creating a locking imbalance. + +Introduce a per-domain rwlock in order to protect periodic_time +migration between vCPU lists. Taking the lock in read mode prevents +any timer from being migrated to a different vCPU, while taking it in +write mode allows performing migration of timers across vCPUs. The +per-vcpu locks are still used to protect all the other fields from the +periodic_timer struct. + +Note that such migration shouldn't happen frequently, and hence +there's no performance drop as a result of such locking. + +This is XSA-336. + +Reported-by: Igor Druzhinin +Tested-by: Igor Druzhinin +Signed-off-by: Roger Pau Monné +Reviewed-by: Jan Beulich + +--- a/xen/arch/x86/hvm/hvm.c ++++ b/xen/arch/x86/hvm/hvm.c +@@ -627,6 +627,8 @@ int hvm_domain_initialise(struct domain + /* need link to containing domain */ + d->arch.hvm_domain.pl_time->domain = d; + ++ rwlock_init(&d->arch.hvm_domain.pl_time->pt_migrate); ++ + /* Set the default IO Bitmap. */ + if ( is_hardware_domain(d) ) + { +--- a/xen/arch/x86/hvm/vpt.c ++++ b/xen/arch/x86/hvm/vpt.c +@@ -152,23 +152,32 @@ static int pt_irq_masked(struct periodic + return 1; + } + +-static void pt_lock(struct periodic_time *pt) ++static void pt_vcpu_lock(struct vcpu *v) + { +- struct vcpu *v; ++ read_lock(&v->domain->arch.hvm_domain.pl_time->pt_migrate); ++ spin_lock(&v->arch.hvm_vcpu.tm_lock); ++} + +- for ( ; ; ) +- { +- v = pt->vcpu; +- spin_lock(&v->arch.hvm_vcpu.tm_lock); +- if ( likely(pt->vcpu == v) ) +- break; +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); +- } ++static void pt_vcpu_unlock(struct vcpu *v) ++{ ++ spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ read_unlock(&v->domain->arch.hvm_domain.pl_time->pt_migrate); ++} ++ ++static void pt_lock(struct periodic_time *pt) ++{ ++ /* ++ * We cannot use pt_vcpu_lock here, because we need to acquire the ++ * per-domain lock first and then (re-)fetch the value of pt->vcpu, or ++ * else we might be using a stale value of pt->vcpu. 
++ */ ++ read_lock(&pt->vcpu->domain->arch.hvm_domain.pl_time->pt_migrate); ++ spin_lock(&pt->vcpu->arch.hvm_vcpu.tm_lock); + } + + static void pt_unlock(struct periodic_time *pt) + { +- spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(pt->vcpu); + } + + static void pt_process_missed_ticks(struct periodic_time *pt) +@@ -218,7 +227,7 @@ void pt_save_timer(struct vcpu *v) + if ( v->pause_flags & VPF_blocked ) + return; + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_lock(v); + + list_for_each_entry ( pt, head, list ) + if ( !pt->do_not_freeze ) +@@ -226,7 +235,7 @@ void pt_save_timer(struct vcpu *v) + + pt_freeze_time(v); + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + } + + void pt_restore_timer(struct vcpu *v) +@@ -234,7 +243,7 @@ void pt_restore_timer(struct vcpu *v) + struct list_head *head = &v->arch.hvm_vcpu.tm_list; + struct periodic_time *pt; + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_lock(v); + + list_for_each_entry ( pt, head, list ) + { +@@ -247,7 +256,7 @@ void pt_restore_timer(struct vcpu *v) + + pt_thaw_time(v); + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + } + + static void pt_timer_fn(void *data) +@@ -272,7 +281,7 @@ int pt_update_irq(struct vcpu *v) + uint64_t max_lag; + int irq, pt_vector = -1; + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_lock(v); + + earliest_pt = NULL; + max_lag = -1ULL; +@@ -300,14 +309,14 @@ int pt_update_irq(struct vcpu *v) + + if ( earliest_pt == NULL ) + { +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + return -1; + } + + earliest_pt->irq_issued = 1; + irq = earliest_pt->irq; + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + + switch ( earliest_pt->source ) + { +@@ -377,12 +386,12 @@ void pt_intr_post(struct vcpu *v, struct + if ( intack.source == hvm_intsrc_vector ) + return; + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_lock(v); + + pt = is_pt_irq(v, intack); + if ( pt == NULL ) + { +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + return; + } + +@@ -421,7 +430,7 @@ void pt_intr_post(struct vcpu *v, struct + cb = pt->cb; + cb_priv = pt->priv; + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + + if ( cb != NULL ) + cb(v, cb_priv); +@@ -432,12 +441,12 @@ void pt_migrate(struct vcpu *v) + struct list_head *head = &v->arch.hvm_vcpu.tm_list; + struct periodic_time *pt; + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_lock(v); + + list_for_each_entry ( pt, head, list ) + migrate_timer(&pt->timer, v->processor); + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ pt_vcpu_unlock(v); + } + + void create_periodic_time( +@@ -455,7 +464,7 @@ void create_periodic_time( + + destroy_periodic_time(pt); + +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ write_lock(&v->domain->arch.hvm_domain.pl_time->pt_migrate); + + pt->pending_intr_nr = 0; + pt->do_not_freeze = 0; +@@ -504,7 +513,7 @@ void create_periodic_time( + init_timer(&pt->timer, pt_timer_fn, pt, v->processor); + set_timer(&pt->timer, pt->scheduled); + +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ write_unlock(&v->domain->arch.hvm_domain.pl_time->pt_migrate); + } + + void destroy_periodic_time(struct periodic_time *pt) +@@ -529,30 +538,20 @@ void destroy_periodic_time(struct period + + static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v) + { +- int on_list; +- + ASSERT(pt->source == PTSRC_isa || pt->source == PTSRC_ioapic); + + if ( pt->vcpu == NULL ) + return; + +- pt_lock(pt); +- on_list = pt->on_list; +- if ( pt->on_list ) +- 
list_del(&pt->list); +- pt->on_list = 0; +- pt_unlock(pt); +- +- spin_lock(&v->arch.hvm_vcpu.tm_lock); ++ write_lock(&pt->vcpu->domain->arch.hvm_domain.pl_time->pt_migrate); + pt->vcpu = v; +- if ( on_list ) ++ if ( pt->on_list ) + { +- pt->on_list = 1; ++ list_del(&pt->list); + list_add(&pt->list, &v->arch.hvm_vcpu.tm_list); +- + migrate_timer(&pt->timer, v->processor); + } +- spin_unlock(&v->arch.hvm_vcpu.tm_lock); ++ write_unlock(&pt->vcpu->domain->arch.hvm_domain.pl_time->pt_migrate); + } + + void pt_adjust_global_vcpu_target(struct vcpu *v) +--- a/xen/include/asm-x86/hvm/vpt.h ++++ b/xen/include/asm-x86/hvm/vpt.h +@@ -133,6 +133,13 @@ struct pl_time { /* platform time */ + struct RTCState vrtc; + struct HPETState vhpet; + struct PMTState vpmt; ++ /* ++ * rwlock to prevent periodic_time vCPU migration. Take the lock in read ++ * mode in order to prevent the vcpu field of periodic_time from changing. ++ * Lock must be taken in write mode when changes to the vcpu field are ++ * performed, as it allows exclusive access to all the timers of a domain. ++ */ ++ rwlock_t pt_migrate; + /* guest_time = Xen sys time + stime_offset */ + int64_t stime_offset; + /* Ensures monotonicity in appropriate timer modes. */ diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-1.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,92 @@ +From: Roger Pau Monné +Subject: x86/msi: get rid of read_msi_msg + +It's safer and faster to just use the cached last written +(untranslated) MSI message stored in msi_desc for the single user that +calls read_msi_msg. + +This also prevents relying on the data read from the device MSI +registers in order to figure out the index into the IOMMU interrupt +remapping table, which is not safe. + +This is part of XSA-337. 
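The replacement relies on a write-through shadow: remember the last value the hypervisor itself wrote and use that, instead of trusting a read-back from a device the guest may control. A small sketch under that assumption (msi_shadow and the helper names are invented, not Xen's):

    #include <stdint.h>
    #include <stdio.h>

    struct msi_shadow {
        uint32_t data;                /* last value written to the data register */
    };

    static void msi_write_data(struct msi_shadow *s, volatile uint32_t *reg,
                               uint32_t val)
    {
        *reg = val;                   /* program the device */
        s->data = val;                /* and remember what was written */
    }

    static uint32_t msi_data(const struct msi_shadow *s)
    {
        return s->data;               /* never re-read the (untrusted) device */
    }

    int main(void)
    {
        uint32_t fake_reg = 0;
        struct msi_shadow s = { 0 };

        msi_write_data(&s, &fake_reg, 0x41);
        fake_reg = 0xdead;            /* device state changes behind our back */
        printf("%#x\n", (unsigned)msi_data(&s));   /* still 0x41 */
        return 0;
    }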
+ +Reported-by: Andrew Cooper +Requested-by: Andrew Cooper +Signed-off-by: Roger Pau Monné +Reviewed-by: Jan Beulich + +--- a/xen/arch/x86/msi.c ++++ b/xen/arch/x86/msi.c +@@ -192,59 +192,6 @@ void msi_compose_msg(unsigned vector, co + MSI_DATA_VECTOR(vector); + } + +-static bool read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) +-{ +- switch ( entry->msi_attrib.type ) +- { +- case PCI_CAP_ID_MSI: +- { +- struct pci_dev *dev = entry->dev; +- int pos = entry->msi_attrib.pos; +- u16 data, seg = dev->seg; +- u8 bus = dev->bus; +- u8 slot = PCI_SLOT(dev->devfn); +- u8 func = PCI_FUNC(dev->devfn); +- +- msg->address_lo = pci_conf_read32(seg, bus, slot, func, +- msi_lower_address_reg(pos)); +- if ( entry->msi_attrib.is_64 ) +- { +- msg->address_hi = pci_conf_read32(seg, bus, slot, func, +- msi_upper_address_reg(pos)); +- data = pci_conf_read16(seg, bus, slot, func, +- msi_data_reg(pos, 1)); +- } +- else +- { +- msg->address_hi = 0; +- data = pci_conf_read16(seg, bus, slot, func, +- msi_data_reg(pos, 0)); +- } +- msg->data = data; +- break; +- } +- case PCI_CAP_ID_MSIX: +- { +- void __iomem *base = entry->mask_base; +- +- if ( unlikely(!msix_memory_decoded(entry->dev, +- entry->msi_attrib.pos)) ) +- return false; +- msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); +- msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); +- msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET); +- break; +- } +- default: +- BUG(); +- } +- +- if ( iommu_intremap ) +- iommu_read_msi_from_ire(entry, msg); +- +- return true; +-} +- + static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) + { + entry->msg = *msg; +@@ -322,10 +269,7 @@ void set_msi_affinity(struct irq_desc *d + + ASSERT(spin_is_locked(&desc->lock)); + +- memset(&msg, 0, sizeof(msg)); +- if ( !read_msi_msg(msi_desc, &msg) ) +- return; +- ++ msg = msi_desc->msg; + msg.data &= ~MSI_DATA_VECTOR_MASK; + msg.data |= MSI_DATA_VECTOR(desc->arch.vector); + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa337-4.12-2.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,182 @@ +From: Jan Beulich +Subject: x86/MSI-X: restrict reading of table/PBA bases from BARs + +When assigned to less trusted or un-trusted guests, devices may change +state behind our backs (they may e.g. get reset by means we may not know +about). Therefore we should avoid reading BARs from hardware once a +device is no longer owned by Dom0. Furthermore when we can't read a BAR, +or when we read zero, we shouldn't instead use the caller provided +address unless that caller can be trusted. + +Re-arrange the logic in msix_capability_init() such that only Dom0 (and +only if the device isn't DomU-owned yet) or calls through +PHYSDEVOP_prepare_msix will actually result in the reading of the +respective BAR register(s). Additionally do so only as long as in-use +table entries are known (note that invocation of PHYSDEVOP_prepare_msix +counts as a "pseudo" entry). In all other uses the value already +recorded will get used instead. + +Clear the recorded values in _pci_cleanup_msix() as well as on the one +affected error path. (Adjust this error path to also avoid blindly +disabling MSI-X when it was enabled on entry to the function.) 
+ +While moving around variable declarations (in many cases to reduce their +scopes), also adjust some of their types. + +This is part of XSA-337. + +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/arch/x86/msi.c ++++ b/xen/arch/x86/msi.c +@@ -790,16 +790,14 @@ static int msix_capability_init(struct p + { + struct arch_msix *msix = dev->msix; + struct msi_desc *entry = NULL; +- int vf; + u16 control; + u64 table_paddr; + u32 table_offset; +- u8 bir, pbus, pslot, pfunc; + u16 seg = dev->seg; + u8 bus = dev->bus; + u8 slot = PCI_SLOT(dev->devfn); + u8 func = PCI_FUNC(dev->devfn); +- bool maskall = msix->host_maskall; ++ bool maskall = msix->host_maskall, zap_on_error = false; + + ASSERT(pcidevs_locked()); + +@@ -837,43 +835,45 @@ static int msix_capability_init(struct p + /* Locate MSI-X table region */ + table_offset = pci_conf_read32(seg, bus, slot, func, + msix_table_offset_reg(pos)); +- bir = (u8)(table_offset & PCI_MSIX_BIRMASK); +- table_offset &= ~PCI_MSIX_BIRMASK; ++ if ( !msix->used_entries && ++ (!msi || ++ (is_hardware_domain(current->domain) && ++ (dev->domain == current->domain || dev->domain == dom_io))) ) ++ { ++ unsigned int bir = table_offset & PCI_MSIX_BIRMASK, pbus, pslot, pfunc; ++ int vf; ++ paddr_t pba_paddr; ++ unsigned int pba_offset; + +- if ( !dev->info.is_virtfn ) +- { +- pbus = bus; +- pslot = slot; +- pfunc = func; +- vf = -1; +- } +- else +- { +- pbus = dev->info.physfn.bus; +- pslot = PCI_SLOT(dev->info.physfn.devfn); +- pfunc = PCI_FUNC(dev->info.physfn.devfn); +- vf = PCI_BDF2(dev->bus, dev->devfn); +- } +- +- table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf); +- WARN_ON(msi && msi->table_base != table_paddr); +- if ( !table_paddr ) +- { +- if ( !msi || !msi->table_base ) ++ if ( !dev->info.is_virtfn ) + { +- pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), +- control & ~PCI_MSIX_FLAGS_ENABLE); +- xfree(entry); +- return -ENXIO; ++ pbus = bus; ++ pslot = slot; ++ pfunc = func; ++ vf = -1; ++ } ++ else ++ { ++ pbus = dev->info.physfn.bus; ++ pslot = PCI_SLOT(dev->info.physfn.devfn); ++ pfunc = PCI_FUNC(dev->info.physfn.devfn); ++ vf = PCI_BDF2(dev->bus, dev->devfn); + } +- table_paddr = msi->table_base; +- } +- table_paddr += table_offset; + +- if ( !msix->used_entries ) +- { +- u64 pba_paddr; +- u32 pba_offset; ++ table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf); ++ WARN_ON(msi && msi->table_base != table_paddr); ++ if ( !table_paddr ) ++ { ++ if ( !msi || !msi->table_base ) ++ { ++ pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), ++ control & ~PCI_MSIX_FLAGS_ENABLE); ++ xfree(entry); ++ return -ENXIO; ++ } ++ table_paddr = msi->table_base; ++ } ++ table_paddr += table_offset & ~PCI_MSIX_BIRMASK; + + msix->nr_entries = nr_entries; + msix->table.first = PFN_DOWN(table_paddr); +@@ -894,7 +894,19 @@ static int msix_capability_init(struct p + BITS_TO_LONGS(nr_entries) - 1); + WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, msix->pba.first, + msix->pba.last)); ++ ++ zap_on_error = true; + } ++ else if ( !msix->table.first ) ++ { ++ pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), ++ control); ++ xfree(entry); ++ return -ENODATA; ++ } ++ else ++ table_paddr = (msix->table.first << PAGE_SHIFT) + ++ (table_offset & ~PCI_MSIX_BIRMASK & ~PAGE_MASK); + + if ( entry ) + { +@@ -905,8 +917,16 @@ static int msix_capability_init(struct p + + if ( idx < 0 ) + { ++ if ( zap_on_error ) ++ { ++ msix->table.first = 0; ++ msix->pba.first = 0; ++ ++ control &= ~PCI_MSIX_FLAGS_ENABLE; ++ } ++ 
+ pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), +- control & ~PCI_MSIX_FLAGS_ENABLE); ++ control); + xfree(entry); + return idx; + } +@@ -1102,9 +1122,14 @@ static void _pci_cleanup_msix(struct arc + if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first, + msix->table.last) ) + WARN(); ++ msix->table.first = 0; ++ msix->table.last = 0; ++ + if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first, + msix->pba.last) ) + WARN(); ++ msix->pba.first = 0; ++ msix->pba.last = 0; + } + } + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa338.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa338.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa338.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa338.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,42 @@ +From: Jan Beulich +Subject: evtchn: relax port_is_valid() + +To avoid ports potentially becoming invalid behind the back of certain +other functions (due to ->max_evtchn shrinking) because of +- a guest invoking evtchn_reset() and from a 2nd vCPU opening new + channels in parallel (see also XSA-343), +- alloc_unbound_xen_event_channel() produced channels living above the + 2-level range (see also XSA-342), +drop the max_evtchns check from port_is_valid(). For a port for which +the function once returned "true", the returned value may not turn into +"false" later on. The function's result may only depend on bounds which +can only ever grow (which is the case for d->valid_evtchns). + +This also eliminates a false sense of safety, utilized by some of the +users (see again XSA-343): Without a suitable lock held, d->max_evtchns +may change at any time, and hence deducing that certain other operations +are safe when port_is_valid() returned true is not legitimate. The +opportunities to abuse this may get widened by the change here +(depending on guest and host configuration), but will be taken care of +by the other XSA. + +This is XSA-338. + +Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels") +Signed-off-by: Jan Beulich +Reviewed-by: Stefano Stabellini +Reviewed-by: Julien Grall +--- +v5: New, split from larger patch. + +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -107,8 +107,6 @@ void notify_via_xen_event_channel(struct + + static inline bool_t port_is_valid(struct domain *d, unsigned int p) + { +- if ( p >= d->max_evtchns ) +- return 0; + return p < read_atomic(&d->valid_evtchns); + } + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa339.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa339.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa339.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa339.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,76 @@ +From: Andrew Cooper +Subject: x86/pv: Avoid double exception injection + +There is at least one path (SYSENTER with NT set, Xen converts to #GP) which +ends up injecting the #GP fault twice, first in compat_sysenter(), and then a +second time in compat_test_all_events(), due to the stale TBF_EXCEPTION left +in TRAPBOUNCE_flags. + +The guest kernel sees the second fault first, which is a kernel level #GP +pointing at the head of the #GP handler, and is therefore a userspace +trigger-able DoS. + +This particular bug has bitten us several times before, so rearrange +{compat_,}create_bounce_frame() to clobber TRAPBOUNCE on success, rather than +leaving this task to one area of code which isn't used uniformly. 
+ +Other scenarios which might result in a double injection (e.g. two calls +directly to compat_create_bounce_frame) will now crash the guest, which is far +more obvious than letting the kernel run with corrupt state. + +This is XSA-339 + +Fixes: fdac9515607b ("x86: clear EFLAGS.NT in SYSENTER entry path") +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S +index c3e62f8734..73619f57ca 100644 +--- a/xen/arch/x86/x86_64/compat/entry.S ++++ b/xen/arch/x86/x86_64/compat/entry.S +@@ -78,7 +78,6 @@ compat_process_softirqs: + sti + .Lcompat_bounce_exception: + call compat_create_bounce_frame +- movb $0, TRAPBOUNCE_flags(%rdx) + jmp compat_test_all_events + + ALIGN +@@ -352,7 +351,13 @@ __UNLIKELY_END(compat_bounce_null_selector) + movl %eax,UREGS_cs+8(%rsp) + movl TRAPBOUNCE_eip(%rdx),%eax + movl %eax,UREGS_rip+8(%rsp) ++ ++ /* Trapbounce complete. Clobber state to avoid an erroneous second injection. */ ++ xor %eax, %eax ++ mov %ax, TRAPBOUNCE_cs(%rdx) ++ mov %al, TRAPBOUNCE_flags(%rdx) + ret ++ + .section .fixup,"ax" + .Lfx13: + xorl %edi,%edi +diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S +index 1e880eb9f6..71a00e846b 100644 +--- a/xen/arch/x86/x86_64/entry.S ++++ b/xen/arch/x86/x86_64/entry.S +@@ -90,7 +90,6 @@ process_softirqs: + sti + .Lbounce_exception: + call create_bounce_frame +- movb $0, TRAPBOUNCE_flags(%rdx) + jmp test_all_events + + ALIGN +@@ -512,6 +511,11 @@ UNLIKELY_START(z, create_bounce_frame_bad_bounce_ip) + jmp asm_domain_crash_synchronous /* Does not return */ + __UNLIKELY_END(create_bounce_frame_bad_bounce_ip) + movq %rax,UREGS_rip+8(%rsp) ++ ++ /* Trapbounce complete. Clobber state to avoid an erroneous second injection. */ ++ xor %eax, %eax ++ mov %rax, TRAPBOUNCE_eip(%rdx) ++ mov %al, TRAPBOUNCE_flags(%rdx) + ret + + .pushsection .fixup, "ax", @progbits diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa340.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa340.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa340.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa340.patch 2022-04-05 13:04:22.000000000 +0100 @@ -0,0 +1,65 @@ +From: Julien Grall +Subject: xen/evtchn: Add missing barriers when accessing/allocating an event channel + +While the allocation of a bucket is always performed with the per-domain +lock, the bucket may be accessed without the lock taken (for instance, see +evtchn_send()). + +Instead such sites relies on port_is_valid() to return a non-zero value +when the port has a struct evtchn associated to it. The function will +mostly check whether the port is less than d->valid_evtchns as all the +buckets/event channels should be allocated up to that point. + +Unfortunately a compiler is free to re-order the assignment in +evtchn_allocate_port() so it would be possible to have d->valid_evtchns +updated before the new bucket has finish to allocate. + +Additionally on Arm, even if this was compiled "correctly", the +processor can still re-order the memory access. + +Add a write memory barrier in the allocation side and a read memory +barrier when the port is valid to prevent any re-ordering issue. + +This is XSA-340. 
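The ordering requirement is the usual publish/consume pairing: make the bucket fully visible before the counter that advertises it, and order the counter read before any access through it. A stand-alone C11 sketch using release/acquire in place of Xen's smp_wmb()/smp_rmb(); bucket, nr_valid, publish_bucket and lookup are illustrative names:

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NR_BUCKETS 16

    struct entry { int state; };

    static struct entry *bucket[NR_BUCKETS];
    static atomic_uint nr_valid;      /* number of published buckets */

    /* Writer: fully initialise the bucket, then publish the new count. The
     * release store plays the role of smp_wmb() in the hunk below. */
    static void publish_bucket(unsigned int idx)
    {
        bucket[idx] = calloc(1, sizeof(struct entry));
        atomic_store_explicit(&nr_valid, idx + 1, memory_order_release);
    }

    /* Reader: check the count first; the acquire load (smp_rmb() below) makes
     * sure the bucket contents are visible before being dereferenced. */
    static struct entry *lookup(unsigned int idx)
    {
        if (idx >= atomic_load_explicit(&nr_valid, memory_order_acquire))
            return NULL;
        return bucket[idx];
    }

    int main(void)
    {
        publish_bucket(0);
        printf("%d\n", lookup(0) != NULL);   /* 1 */
        return 0;
    }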
+ +Reported-by: Julien Grall +Signed-off-by: Julien Grall +Reviewed-by: Stefano Stabellini + +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -178,6 +178,13 @@ int evtchn_allocate_port(struct domain * + return -ENOMEM; + bucket_from_port(d, port) = chn; + ++ /* ++ * d->valid_evtchns is used to check whether the bucket can be ++ * accessed without the per-domain lock. Therefore, ++ * d->valid_evtchns should be seen *after* the new bucket has ++ * been setup. ++ */ ++ smp_wmb(); + write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET); + } + +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -107,7 +107,17 @@ void notify_via_xen_event_channel(struct + + static inline bool_t port_is_valid(struct domain *d, unsigned int p) + { +- return p < read_atomic(&d->valid_evtchns); ++ if ( p >= read_atomic(&d->valid_evtchns) ) ++ return false; ++ ++ /* ++ * The caller will usually access the event channel afterwards and ++ * may be done without taking the per-domain lock. The barrier is ++ * going in pair the smp_wmb() barrier in evtchn_allocate_port(). ++ */ ++ smp_rmb(); ++ ++ return true; + } + + static inline struct evtchn *evtchn_from_port(struct domain *d, unsigned int p) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa342-4.13.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa342-4.13.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa342-4.13.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa342-4.13.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,145 @@ +From: Jan Beulich +Subject: evtchn/x86: enforce correct upper limit for 32-bit guests + +The recording of d->max_evtchns in evtchn_2l_init(), in particular with +the limited set of callers of the function, is insufficient. Neither for +PV nor for HVM guests the bitness is known at domain_create() time, yet +the upper bound in 2-level mode depends upon guest bitness. Recording +too high a limit "allows" x86 32-bit domains to open not properly usable +event channels, management of which (inside Xen) would then result in +corruption of the shared info and vCPU info structures. + +Keep the upper limit dynamic for the 2-level case, introducing a helper +function to retrieve the effective limit. This helper is now supposed to +be private to the event channel code. The used in do_poll() and +domain_dump_evtchn_info() weren't consistent with port uses elsewhere +and hence get switched to port_is_valid(). + +Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to +the prior ABI limit, rather than all the way up to the new one. + +Finally a word on the change to do_poll(): Accessing ->max_evtchns +without holding a suitable lock was never safe, as it as well as +->evtchn_port_ops may change behind do_poll()'s back. Using +port_is_valid() instead widens some the window for potential abuse, +until we've dealt with the race altogether (see XSA-343). + +This is XSA-342. 
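For the 2-level ABI the effective bound is the square of the guest's event-word width: 1024 ports for a 32-bit guest and 4096 for a 64-bit one, so a limit recorded before the guest's bitness is known can be too high. A toy calculation (not Xen code) of the numbers involved:

    #include <stdbool.h>
    #include <stdio.h>

    /* 2-level ABI: one selector word whose bits each cover a word of pending
     * bits, so the port space is bits-per-word squared. */
    static unsigned int two_level_max_ports(bool guest_is_64bit)
    {
        unsigned int bits_per_word = guest_is_64bit ? 64 : 32;

        return bits_per_word * bits_per_word;
    }

    int main(void)
    {
        printf("32-bit guest: %u ports\n", two_level_max_ports(false)); /* 1024 */
        printf("64-bit guest: %u ports\n", two_level_max_ports(true));  /* 4096 */
        return 0;
    }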
+ +Reported-by: Julien Grall +Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels") +Signed-off-by: Jan Beulich +Reviewed-by: Stefano Stabellini +Reviewed-by: Julien Grall + +--- a/xen/common/event_2l.c ++++ b/xen/common/event_2l.c +@@ -103,7 +103,6 @@ static const struct evtchn_port_ops evtc + void evtchn_2l_init(struct domain *d) + { + d->evtchn_port_ops = &evtchn_port_ops_2l; +- d->max_evtchns = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); + } + + /* +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -151,7 +151,7 @@ static void free_evtchn_bucket(struct do + + int evtchn_allocate_port(struct domain *d, evtchn_port_t port) + { +- if ( port > d->max_evtchn_port || port >= d->max_evtchns ) ++ if ( port > d->max_evtchn_port || port >= max_evtchns(d) ) + return -ENOSPC; + + if ( port_is_valid(d, port) ) +@@ -1396,13 +1396,11 @@ static void domain_dump_evtchn_info(stru + + spin_lock(&d->event_lock); + +- for ( port = 1; port < d->max_evtchns; ++port ) ++ for ( port = 1; port_is_valid(d, port); ++port ) + { + const struct evtchn *chn; + char *ssid; + +- if ( !port_is_valid(d, port) ) +- continue; + chn = evtchn_from_port(d, port); + if ( chn->state == ECS_FREE ) + continue; +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -478,7 +478,7 @@ static void cleanup_event_array(struct d + d->evtchn_fifo = NULL; + } + +-static void setup_ports(struct domain *d) ++static void setup_ports(struct domain *d, unsigned int prev_evtchns) + { + unsigned int port; + +@@ -488,7 +488,7 @@ static void setup_ports(struct domain *d + * - save its pending state. + * - set default priority. + */ +- for ( port = 1; port < d->max_evtchns; port++ ) ++ for ( port = 1; port < prev_evtchns; port++ ) + { + struct evtchn *evtchn; + +@@ -546,6 +546,8 @@ int evtchn_fifo_init_control(struct evtc + if ( !d->evtchn_fifo ) + { + struct vcpu *vcb; ++ /* Latch the value before it changes during setup_event_array(). */ ++ unsigned int prev_evtchns = max_evtchns(d); + + for_each_vcpu ( d, vcb ) { + rc = setup_control_block(vcb); +@@ -562,8 +564,7 @@ int evtchn_fifo_init_control(struct evtc + goto error; + + d->evtchn_port_ops = &evtchn_port_ops_fifo; +- d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS; +- setup_ports(d); ++ setup_ports(d, prev_evtchns); + } + else + rc = map_control_block(v, gfn, offset); +--- a/xen/common/schedule.c ++++ b/xen/common/schedule.c +@@ -1434,7 +1434,7 @@ static long do_poll(struct sched_poll *s + goto out; + + rc = -EINVAL; +- if ( port >= d->max_evtchns ) ++ if ( !port_is_valid(d, port) ) + goto out; + + rc = 0; +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -105,6 +105,12 @@ void notify_via_xen_event_channel(struct + #define bucket_from_port(d, p) \ + ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET]) + ++static inline unsigned int max_evtchns(const struct domain *d) ++{ ++ return d->evtchn_fifo ? EVTCHN_FIFO_NR_CHANNELS ++ : BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); ++} ++ + static inline bool_t port_is_valid(struct domain *d, unsigned int p) + { + if ( p >= read_atomic(&d->valid_evtchns) ) +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -382,7 +382,6 @@ struct domain + /* Event channel information. 
*/ + struct evtchn *evtchn; /* first bucket only */ + struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ +- unsigned int max_evtchns; /* number supported by ABI */ + unsigned int max_evtchn_port; /* max permitted port number */ + unsigned int valid_evtchns; /* number of allocated event channels */ + spinlock_t event_lock; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-1.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,190 @@ +From: Jan Beulich +Subject: evtchn: evtchn_reset() shouldn't succeed with still-open ports + +While the function closes all ports, it does so without holding any +lock, and hence racing requests may be issued causing new ports to get +opened. This would have been problematic in particular if such a newly +opened port had a port number above the new implementation limit (i.e. +when switching from FIFO to 2-level) after the reset, as prior to +"evtchn: relax port_is_valid()" this could have led to e.g. +evtchn_close()'s "BUG_ON(!port_is_valid(d2, port2))" to trigger. + +Introduce a counter of active ports and check that it's (still) no +larger then the number of Xen internally used ones after obtaining the +necessary lock in evtchn_reset(). + +As to the access model of the new {active,xen}_evtchns fields - while +all writes get done using write_atomic(), reads ought to use +read_atomic() only when outside of a suitably locked region. + +Note that as of now evtchn_bind_virq() and evtchn_bind_ipi() don't have +a need to call check_free_port(). + +This is part of XSA-343. + +Signed-off-by: Jan Beulich +Reviewed-by: Stefano Stabellini +Reviewed-by: Julien Grall + +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -188,6 +188,8 @@ int evtchn_allocate_port(struct domain * + write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET); + } + ++ write_atomic(&d->active_evtchns, d->active_evtchns + 1); ++ + return 0; + } + +@@ -211,11 +213,26 @@ static int get_free_port(struct domain * + return -ENOSPC; + } + ++/* ++ * Check whether a port is still marked free, and if so update the domain ++ * counter accordingly. To be used on function exit paths. ++ */ ++static void check_free_port(struct domain *d, evtchn_port_t port) ++{ ++ if ( port_is_valid(d, port) && ++ evtchn_from_port(d, port)->state == ECS_FREE ) ++ write_atomic(&d->active_evtchns, d->active_evtchns - 1); ++} ++ + void evtchn_free(struct domain *d, struct evtchn *chn) + { + /* Clear pending event to avoid unexpected behavior on re-bind. */ + evtchn_port_clear_pending(d, chn); + ++ if ( consumer_is_xen(chn) ) ++ write_atomic(&d->xen_evtchns, d->xen_evtchns - 1); ++ write_atomic(&d->active_evtchns, d->active_evtchns - 1); ++ + /* Reset binding to vcpu0 when the channel is freed. 
*/ + chn->state = ECS_FREE; + chn->notify_vcpu_id = 0; +@@ -258,6 +275,7 @@ static long evtchn_alloc_unbound(evtchn_ + alloc->port = port; + + out: ++ check_free_port(d, port); + spin_unlock(&d->event_lock); + rcu_unlock_domain(d); + +@@ -351,6 +369,7 @@ static long evtchn_bind_interdomain(evtc + bind->local_port = lport; + + out: ++ check_free_port(ld, lport); + spin_unlock(&ld->event_lock); + if ( ld != rd ) + spin_unlock(&rd->event_lock); +@@ -484,7 +503,7 @@ static long evtchn_bind_pirq(evtchn_bind + struct domain *d = current->domain; + struct vcpu *v = d->vcpu[0]; + struct pirq *info; +- int port, pirq = bind->pirq; ++ int port = 0, pirq = bind->pirq; + long rc; + + if ( (pirq < 0) || (pirq >= d->nr_pirqs) ) +@@ -532,6 +551,7 @@ static long evtchn_bind_pirq(evtchn_bind + arch_evtchn_bind_pirq(d, pirq); + + out: ++ check_free_port(d, port); + spin_unlock(&d->event_lock); + + return rc; +@@ -1005,10 +1025,10 @@ int evtchn_unmask(unsigned int port) + return 0; + } + +- + int evtchn_reset(struct domain *d) + { + unsigned int i; ++ int rc = 0; + + if ( d != current->domain && !d->controller_pause_count ) + return -EINVAL; +@@ -1018,7 +1038,9 @@ int evtchn_reset(struct domain *d) + + spin_lock(&d->event_lock); + +- if ( d->evtchn_fifo ) ++ if ( d->active_evtchns > d->xen_evtchns ) ++ rc = -EAGAIN; ++ else if ( d->evtchn_fifo ) + { + /* Switching back to 2-level ABI. */ + evtchn_fifo_destroy(d); +@@ -1027,7 +1049,7 @@ int evtchn_reset(struct domain *d) + + spin_unlock(&d->event_lock); + +- return 0; ++ return rc; + } + + static long evtchn_set_priority(const struct evtchn_set_priority *set_priority) +@@ -1213,10 +1235,9 @@ int alloc_unbound_xen_event_channel( + + spin_lock(&ld->event_lock); + +- rc = get_free_port(ld); ++ port = rc = get_free_port(ld); + if ( rc < 0 ) + goto out; +- port = rc; + chn = evtchn_from_port(ld, port); + + rc = xsm_evtchn_unbound(XSM_TARGET, ld, chn, remote_domid); +@@ -1232,7 +1253,10 @@ int alloc_unbound_xen_event_channel( + + spin_unlock(&chn->lock); + ++ write_atomic(&ld->xen_evtchns, ld->xen_evtchns + 1); ++ + out: ++ check_free_port(ld, port); + spin_unlock(&ld->event_lock); + + return rc < 0 ? rc : port; +@@ -1308,6 +1332,7 @@ int evtchn_init(struct domain *d) + return -EINVAL; + } + evtchn_from_port(d, 0)->state = ECS_RESERVED; ++ write_atomic(&d->active_evtchns, 0); + + #if MAX_VIRT_CPUS > BITS_PER_LONG + d->poll_mask = xzalloc_array(unsigned long, +@@ -1335,6 +1360,8 @@ void evtchn_destroy(struct domain *d) + for ( i = 0; port_is_valid(d, i); i++ ) + evtchn_close(d, i, 0); + ++ ASSERT(!d->active_evtchns); ++ + clear_global_virq_handlers(d); + + evtchn_fifo_destroy(d); +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -345,6 +345,16 @@ struct domain + struct evtchn **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */ + unsigned int max_evtchn_port; /* max permitted port number */ + unsigned int valid_evtchns; /* number of allocated event channels */ ++ /* ++ * Number of in-use event channels. Writers should use write_atomic(). ++ * Readers need to use read_atomic() only when not holding event_lock. ++ */ ++ unsigned int active_evtchns; ++ /* ++ * Number of event channels used internally by Xen (not subject to ++ * EVTCHNOP_reset). Read/write access like for active_evtchns. 
++ */ ++ unsigned int xen_evtchns; + spinlock_t event_lock; + const struct evtchn_port_ops *evtchn_port_ops; + struct evtchn_fifo_domain *evtchn_fifo; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-2.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,290 @@ +From: Jan Beulich +Subject: evtchn: convert per-channel lock to be IRQ-safe + +... in order for send_guest_{global,vcpu}_virq() to be able to make use +of it. + +This is part of XSA-343. + +Signed-off-by: Jan Beulich +Acked-by: Julien Grall + +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -248,6 +248,7 @@ static long evtchn_alloc_unbound(evtchn_ + int port; + domid_t dom = alloc->dom; + long rc; ++ unsigned long flags; + + d = rcu_lock_domain_by_any_id(dom); + if ( d == NULL ) +@@ -263,14 +264,14 @@ static long evtchn_alloc_unbound(evtchn_ + if ( rc ) + goto out; + +- spin_lock(&chn->lock); ++ spin_lock_irqsave(&chn->lock, flags); + + chn->state = ECS_UNBOUND; + if ( (chn->u.unbound.remote_domid = alloc->remote_dom) == DOMID_SELF ) + chn->u.unbound.remote_domid = current->domain->domain_id; + evtchn_port_init(d, chn); + +- spin_unlock(&chn->lock); ++ spin_unlock_irqrestore(&chn->lock, flags); + + alloc->port = port; + +@@ -283,26 +284,32 @@ static long evtchn_alloc_unbound(evtchn_ + } + + +-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn) ++static unsigned long double_evtchn_lock(struct evtchn *lchn, ++ struct evtchn *rchn) + { +- if ( lchn < rchn ) ++ unsigned long flags; ++ ++ if ( lchn <= rchn ) + { +- spin_lock(&lchn->lock); +- spin_lock(&rchn->lock); ++ spin_lock_irqsave(&lchn->lock, flags); ++ if ( lchn != rchn ) ++ spin_lock(&rchn->lock); + } + else + { +- if ( lchn != rchn ) +- spin_lock(&rchn->lock); ++ spin_lock_irqsave(&rchn->lock, flags); + spin_lock(&lchn->lock); + } ++ ++ return flags; + } + +-static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn) ++static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn, ++ unsigned long flags) + { +- spin_unlock(&lchn->lock); + if ( lchn != rchn ) +- spin_unlock(&rchn->lock); ++ spin_unlock(&lchn->lock); ++ spin_unlock_irqrestore(&rchn->lock, flags); + } + + static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind) +@@ -312,6 +319,7 @@ static long evtchn_bind_interdomain(evtc + int lport, rport = bind->remote_port; + domid_t rdom = bind->remote_dom; + long rc; ++ unsigned long flags; + + if ( rdom == DOMID_SELF ) + rdom = current->domain->domain_id; +@@ -347,7 +355,7 @@ static long evtchn_bind_interdomain(evtc + if ( rc ) + goto out; + +- double_evtchn_lock(lchn, rchn); ++ flags = double_evtchn_lock(lchn, rchn); + + lchn->u.interdomain.remote_dom = rd; + lchn->u.interdomain.remote_port = rport; +@@ -364,7 +372,7 @@ static long evtchn_bind_interdomain(evtc + */ + evtchn_port_set_pending(ld, lchn->notify_vcpu_id, lchn); + +- double_evtchn_unlock(lchn, rchn); ++ double_evtchn_unlock(lchn, rchn, flags); + + bind->local_port = lport; + +@@ -387,6 +395,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t + struct domain *d = current->domain; + int virq = bind->virq, vcpu = bind->vcpu; + int rc = 0; ++ unsigned long flags; + + if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) + return -EINVAL; +@@ -419,14 +428,14 @@ int 
evtchn_bind_virq(evtchn_bind_virq_t + + chn = evtchn_from_port(d, port); + +- spin_lock(&chn->lock); ++ spin_lock_irqsave(&chn->lock, flags); + + chn->state = ECS_VIRQ; + chn->notify_vcpu_id = vcpu; + chn->u.virq = virq; + evtchn_port_init(d, chn); + +- spin_unlock(&chn->lock); ++ spin_unlock_irqrestore(&chn->lock, flags); + + v->virq_to_evtchn[virq] = bind->port = port; + +@@ -443,6 +452,7 @@ static long evtchn_bind_ipi(evtchn_bind_ + struct domain *d = current->domain; + int port, vcpu = bind->vcpu; + long rc = 0; ++ unsigned long flags; + + if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || + (d->vcpu[vcpu] == NULL) ) +@@ -455,13 +465,13 @@ static long evtchn_bind_ipi(evtchn_bind_ + + chn = evtchn_from_port(d, port); + +- spin_lock(&chn->lock); ++ spin_lock_irqsave(&chn->lock, flags); + + chn->state = ECS_IPI; + chn->notify_vcpu_id = vcpu; + evtchn_port_init(d, chn); + +- spin_unlock(&chn->lock); ++ spin_unlock_irqrestore(&chn->lock, flags); + + bind->port = port; + +@@ -505,6 +515,7 @@ static long evtchn_bind_pirq(evtchn_bind + struct pirq *info; + int port = 0, pirq = bind->pirq; + long rc; ++ unsigned long flags; + + if ( (pirq < 0) || (pirq >= d->nr_pirqs) ) + return -EINVAL; +@@ -537,14 +548,14 @@ static long evtchn_bind_pirq(evtchn_bind + goto out; + } + +- spin_lock(&chn->lock); ++ spin_lock_irqsave(&chn->lock, flags); + + chn->state = ECS_PIRQ; + chn->u.pirq.irq = pirq; + link_pirq_port(port, chn, v); + evtchn_port_init(d, chn); + +- spin_unlock(&chn->lock); ++ spin_unlock_irqrestore(&chn->lock, flags); + + bind->port = port; + +@@ -565,6 +576,7 @@ int evtchn_close(struct domain *d1, int + struct evtchn *chn1, *chn2; + int port2; + long rc = 0; ++ unsigned long flags; + + again: + spin_lock(&d1->event_lock); +@@ -664,14 +676,14 @@ int evtchn_close(struct domain *d1, int + BUG_ON(chn2->state != ECS_INTERDOMAIN); + BUG_ON(chn2->u.interdomain.remote_dom != d1); + +- double_evtchn_lock(chn1, chn2); ++ flags = double_evtchn_lock(chn1, chn2); + + evtchn_free(d1, chn1); + + chn2->state = ECS_UNBOUND; + chn2->u.unbound.remote_domid = d1->domain_id; + +- double_evtchn_unlock(chn1, chn2); ++ double_evtchn_unlock(chn1, chn2, flags); + + goto out; + +@@ -679,9 +691,9 @@ int evtchn_close(struct domain *d1, int + BUG(); + } + +- spin_lock(&chn1->lock); ++ spin_lock_irqsave(&chn1->lock, flags); + evtchn_free(d1, chn1); +- spin_unlock(&chn1->lock); ++ spin_unlock_irqrestore(&chn1->lock, flags); + + out: + if ( d2 != NULL ) +@@ -701,13 +713,14 @@ int evtchn_send(struct domain *ld, unsig + struct evtchn *lchn, *rchn; + struct domain *rd; + int rport, ret = 0; ++ unsigned long flags; + + if ( !port_is_valid(ld, lport) ) + return -EINVAL; + + lchn = evtchn_from_port(ld, lport); + +- spin_lock(&lchn->lock); ++ spin_lock_irqsave(&lchn->lock, flags); + + /* Guest cannot send via a Xen-attached event channel. 
*/ + if ( unlikely(consumer_is_xen(lchn)) ) +@@ -742,7 +755,7 @@ int evtchn_send(struct domain *ld, unsig + } + + out: +- spin_unlock(&lchn->lock); ++ spin_unlock_irqrestore(&lchn->lock, flags); + + return ret; + } +@@ -1232,6 +1245,7 @@ int alloc_unbound_xen_event_channel( + { + struct evtchn *chn; + int port, rc; ++ unsigned long flags; + + spin_lock(&ld->event_lock); + +@@ -1244,14 +1258,14 @@ int alloc_unbound_xen_event_channel( + if ( rc ) + goto out; + +- spin_lock(&chn->lock); ++ spin_lock_irqsave(&chn->lock, flags); + + chn->state = ECS_UNBOUND; + chn->xen_consumer = get_xen_consumer(notification_fn); + chn->notify_vcpu_id = lvcpu; + chn->u.unbound.remote_domid = remote_domid; + +- spin_unlock(&chn->lock); ++ spin_unlock_irqrestore(&chn->lock, flags); + + write_atomic(&ld->xen_evtchns, ld->xen_evtchns + 1); + +@@ -1274,11 +1288,12 @@ void notify_via_xen_event_channel(struct + { + struct evtchn *lchn, *rchn; + struct domain *rd; ++ unsigned long flags; + + ASSERT(port_is_valid(ld, lport)); + lchn = evtchn_from_port(ld, lport); + +- spin_lock(&lchn->lock); ++ spin_lock_irqsave(&lchn->lock, flags); + + if ( likely(lchn->state == ECS_INTERDOMAIN) ) + { +@@ -1288,7 +1303,7 @@ void notify_via_xen_event_channel(struct + evtchn_port_set_pending(rd, rchn->notify_vcpu_id, rchn); + } + +- spin_unlock(&lchn->lock); ++ spin_unlock_irqrestore(&lchn->lock, flags); + } + + void evtchn_check_pollers(struct domain *d, unsigned int port) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa343-4.11-3.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,381 @@ +From: Jan Beulich +Subject: evtchn: address races with evtchn_reset() + +Neither d->evtchn_port_ops nor max_evtchns(d) may be used in an entirely +lock-less manner, as both may change by a racing evtchn_reset(). In the +common case, at least one of the domain's event lock or the per-channel +lock needs to be held. In the specific case of the inter-domain sending +by evtchn_send() and notify_via_xen_event_channel() holding the other +side's per-channel lock is sufficient, as the channel can't change state +without both per-channel locks held. Without such a channel changing +state, evtchn_reset() can't complete successfully. + +Lock-free accesses continue to be permitted for the shim (calling some +otherwise internal event channel functions), as this happens while the +domain is in effectively single-threaded mode. Special care also needs +taking for the shim's marking of in-use ports as ECS_RESERVED (allowing +use of such ports in the shim case is okay because switching into and +hence also out of FIFO mode is impossible there). + +As a side effect, certain operations on Xen bound event channels which +were mistakenly permitted so far (e.g. unmask or poll) will be refused +now. + +This is part of XSA-343. 
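The locking scheme described above leans on double_evtchn_lock() (earlier in this series) taking the two per-channel locks in a stable address order, so that two CPUs locking the same pair from opposite directions cannot deadlock. A simplified userspace analogy of that idiom using POSIX mutexes; double_lock()/double_unlock() are invented names, and the real code additionally saves and restores the interrupt flags:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Acquire two locks in a fixed (address) order so concurrent lockers
     * of the same pair agree on the order and cannot deadlock; a single
     * lock passed twice is only taken once. */
    static void double_lock(pthread_mutex_t *a, pthread_mutex_t *b)
    {
        if ( a == b )
        {
            pthread_mutex_lock(a);
            return;
        }
        if ( (uintptr_t)a > (uintptr_t)b )
        {
            pthread_mutex_t *t = a;
            a = b;
            b = t;
        }
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    }

    static void double_unlock(pthread_mutex_t *a, pthread_mutex_t *b)
    {
        pthread_mutex_unlock(a);
        if ( a != b )
            pthread_mutex_unlock(b);
    }

    int main(void)
    {
        static pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t y = PTHREAD_MUTEX_INITIALIZER;

        double_lock(&x, &y);
        puts("both locks held, acquired in address order");
        double_unlock(&x, &y);
        return 0;
    }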
+ +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Acked-by: Julien Grall + +--- a/xen/arch/x86/irq.c ++++ b/xen/arch/x86/irq.c +@@ -2367,14 +2367,24 @@ static void dump_irqs(unsigned char key) + + for ( i = 0; i < action->nr_guests; i++ ) + { ++ struct evtchn *evtchn; ++ unsigned int pending = 2, masked = 2; ++ + d = action->guest[i]; + pirq = domain_irq_to_pirq(d, irq); + info = pirq_info(d, pirq); ++ evtchn = evtchn_from_port(d, info->evtchn); ++ local_irq_disable(); ++ if ( spin_trylock(&evtchn->lock) ) ++ { ++ pending = evtchn_is_pending(d, evtchn); ++ masked = evtchn_is_masked(d, evtchn); ++ spin_unlock(&evtchn->lock); ++ } ++ local_irq_enable(); + printk("%u:%3d(%c%c%c)", +- d->domain_id, pirq, +- evtchn_port_is_pending(d, info->evtchn) ? 'P' : '-', +- evtchn_port_is_masked(d, info->evtchn) ? 'M' : '-', +- (info->masked ? 'M' : '-')); ++ d->domain_id, pirq, "-P?"[pending], ++ "-M?"[masked], info->masked ? 'M' : '-'); + if ( i != action->nr_guests ) + printk(","); + } +--- a/xen/arch/x86/pv/shim.c ++++ b/xen/arch/x86/pv/shim.c +@@ -616,8 +616,11 @@ void pv_shim_inject_evtchn(unsigned int + if ( port_is_valid(guest, port) ) + { + struct evtchn *chn = evtchn_from_port(guest, port); ++ unsigned long flags; + ++ spin_lock_irqsave(&chn->lock, flags); + evtchn_port_set_pending(guest, chn->notify_vcpu_id, chn); ++ spin_unlock_irqrestore(&chn->lock, flags); + } + } + +--- a/xen/common/event_2l.c ++++ b/xen/common/event_2l.c +@@ -63,8 +63,10 @@ static void evtchn_2l_unmask(struct doma + } + } + +-static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port) ++static bool evtchn_2l_is_pending(const struct domain *d, ++ const struct evtchn *evtchn) + { ++ evtchn_port_t port = evtchn->port; + unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); + + ASSERT(port < max_ports); +@@ -72,8 +74,10 @@ static bool evtchn_2l_is_pending(const s + guest_test_bit(d, port, &shared_info(d, evtchn_pending))); + } + +-static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port) ++static bool evtchn_2l_is_masked(const struct domain *d, ++ const struct evtchn *evtchn) + { ++ evtchn_port_t port = evtchn->port; + unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d); + + ASSERT(port < max_ports); +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -156,8 +156,9 @@ int evtchn_allocate_port(struct domain * + + if ( port_is_valid(d, port) ) + { +- if ( evtchn_from_port(d, port)->state != ECS_FREE || +- evtchn_port_is_busy(d, port) ) ++ const struct evtchn *chn = evtchn_from_port(d, port); ++ ++ if ( chn->state != ECS_FREE || evtchn_is_busy(d, chn) ) + return -EBUSY; + } + else +@@ -770,6 +771,7 @@ void send_guest_vcpu_virq(struct vcpu *v + unsigned long flags; + int port; + struct domain *d; ++ struct evtchn *chn; + + ASSERT(!virq_is_global(virq)); + +@@ -780,7 +782,10 @@ void send_guest_vcpu_virq(struct vcpu *v + goto out; + + d = v->domain; +- evtchn_port_set_pending(d, v->vcpu_id, evtchn_from_port(d, port)); ++ chn = evtchn_from_port(d, port); ++ spin_lock(&chn->lock); ++ evtchn_port_set_pending(d, v->vcpu_id, chn); ++ spin_unlock(&chn->lock); + + out: + spin_unlock_irqrestore(&v->virq_lock, flags); +@@ -809,7 +814,9 @@ static void send_guest_global_virq(struc + goto out; + + chn = evtchn_from_port(d, port); ++ spin_lock(&chn->lock); + evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); ++ spin_unlock(&chn->lock); + + out: + spin_unlock_irqrestore(&v->virq_lock, flags); +@@ -819,6 +826,7 @@ void 
send_guest_pirq(struct domain *d, c + { + int port; + struct evtchn *chn; ++ unsigned long flags; + + /* + * PV guests: It should not be possible to race with __evtchn_close(). The +@@ -833,7 +841,9 @@ void send_guest_pirq(struct domain *d, c + } + + chn = evtchn_from_port(d, port); ++ spin_lock_irqsave(&chn->lock, flags); + evtchn_port_set_pending(d, chn->notify_vcpu_id, chn); ++ spin_unlock_irqrestore(&chn->lock, flags); + } + + static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly; +@@ -1028,12 +1038,15 @@ int evtchn_unmask(unsigned int port) + { + struct domain *d = current->domain; + struct evtchn *evtchn; ++ unsigned long flags; + + if ( unlikely(!port_is_valid(d, port)) ) + return -EINVAL; + + evtchn = evtchn_from_port(d, port); ++ spin_lock_irqsave(&evtchn->lock, flags); + evtchn_port_unmask(d, evtchn); ++ spin_unlock_irqrestore(&evtchn->lock, flags); + + return 0; + } +@@ -1446,8 +1459,8 @@ static void domain_dump_evtchn_info(stru + + printk(" %4u [%d/%d/", + port, +- evtchn_port_is_pending(d, port), +- evtchn_port_is_masked(d, port)); ++ evtchn_is_pending(d, chn), ++ evtchn_is_masked(d, chn)); + evtchn_port_print_state(d, chn); + printk("]: s=%d n=%d x=%d", + chn->state, chn->notify_vcpu_id, chn->xen_consumer); +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -295,23 +295,26 @@ static void evtchn_fifo_unmask(struct do + evtchn_fifo_set_pending(v, evtchn); + } + +-static bool evtchn_fifo_is_pending(const struct domain *d, evtchn_port_t port) ++static bool evtchn_fifo_is_pending(const struct domain *d, ++ const struct evtchn *evtchn) + { +- const event_word_t *word = evtchn_fifo_word_from_port(d, port); ++ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port); + + return word && guest_test_bit(d, EVTCHN_FIFO_PENDING, word); + } + +-static bool_t evtchn_fifo_is_masked(const struct domain *d, evtchn_port_t port) ++static bool_t evtchn_fifo_is_masked(const struct domain *d, ++ const struct evtchn *evtchn) + { +- const event_word_t *word = evtchn_fifo_word_from_port(d, port); ++ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port); + + return !word || guest_test_bit(d, EVTCHN_FIFO_MASKED, word); + } + +-static bool_t evtchn_fifo_is_busy(const struct domain *d, evtchn_port_t port) ++static bool_t evtchn_fifo_is_busy(const struct domain *d, ++ const struct evtchn *evtchn) + { +- const event_word_t *word = evtchn_fifo_word_from_port(d, port); ++ const event_word_t *word = evtchn_fifo_word_from_port(d, evtchn->port); + + return word && guest_test_bit(d, EVTCHN_FIFO_LINKED, word); + } +--- a/xen/include/asm-x86/event.h ++++ b/xen/include/asm-x86/event.h +@@ -47,4 +47,10 @@ static inline bool arch_virq_is_global(u + return true; + } + ++#ifdef CONFIG_PV_SHIM ++# include ++# define arch_evtchn_is_special(chn) \ ++ (pv_shim && (chn)->port && (chn)->state == ECS_RESERVED) ++#endif ++ + #endif +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -125,6 +125,24 @@ static inline struct evtchn *evtchn_from + return bucket_from_port(d, p) + (p % EVTCHNS_PER_BUCKET); + } + ++/* ++ * "usable" as in "by a guest", i.e. Xen consumed channels are assumed to be ++ * taken care of separately where used for Xen's internal purposes. 
++ */ ++static bool evtchn_usable(const struct evtchn *evtchn) ++{ ++ if ( evtchn->xen_consumer ) ++ return false; ++ ++#ifdef arch_evtchn_is_special ++ if ( arch_evtchn_is_special(evtchn) ) ++ return true; ++#endif ++ ++ BUILD_BUG_ON(ECS_FREE > ECS_RESERVED); ++ return evtchn->state > ECS_RESERVED; ++} ++ + /* Wait on a Xen-attached event channel. */ + #define wait_on_xen_event_channel(port, condition) \ + do { \ +@@ -157,19 +175,24 @@ int evtchn_reset(struct domain *d); + + /* + * Low-level event channel port ops. ++ * ++ * All hooks have to be called with a lock held which prevents the channel ++ * from changing state. This may be the domain event lock, the per-channel ++ * lock, or in the case of sending interdomain events also the other side's ++ * per-channel lock. Exceptions apply in certain cases for the PV shim. + */ + struct evtchn_port_ops { + void (*init)(struct domain *d, struct evtchn *evtchn); + void (*set_pending)(struct vcpu *v, struct evtchn *evtchn); + void (*clear_pending)(struct domain *d, struct evtchn *evtchn); + void (*unmask)(struct domain *d, struct evtchn *evtchn); +- bool (*is_pending)(const struct domain *d, evtchn_port_t port); +- bool (*is_masked)(const struct domain *d, evtchn_port_t port); ++ bool (*is_pending)(const struct domain *d, const struct evtchn *evtchn); ++ bool (*is_masked)(const struct domain *d, const struct evtchn *evtchn); + /* + * Is the port unavailable because it's still being cleaned up + * after being closed? + */ +- bool (*is_busy)(const struct domain *d, evtchn_port_t port); ++ bool (*is_busy)(const struct domain *d, const struct evtchn *evtchn); + int (*set_priority)(struct domain *d, struct evtchn *evtchn, + unsigned int priority); + void (*print_state)(struct domain *d, const struct evtchn *evtchn); +@@ -185,38 +208,67 @@ static inline void evtchn_port_set_pendi + unsigned int vcpu_id, + struct evtchn *evtchn) + { +- d->evtchn_port_ops->set_pending(d->vcpu[vcpu_id], evtchn); ++ if ( evtchn_usable(evtchn) ) ++ d->evtchn_port_ops->set_pending(d->vcpu[vcpu_id], evtchn); + } + + static inline void evtchn_port_clear_pending(struct domain *d, + struct evtchn *evtchn) + { +- d->evtchn_port_ops->clear_pending(d, evtchn); ++ if ( evtchn_usable(evtchn) ) ++ d->evtchn_port_ops->clear_pending(d, evtchn); + } + + static inline void evtchn_port_unmask(struct domain *d, + struct evtchn *evtchn) + { +- d->evtchn_port_ops->unmask(d, evtchn); ++ if ( evtchn_usable(evtchn) ) ++ d->evtchn_port_ops->unmask(d, evtchn); + } + +-static inline bool evtchn_port_is_pending(const struct domain *d, +- evtchn_port_t port) ++static inline bool evtchn_is_pending(const struct domain *d, ++ const struct evtchn *evtchn) + { +- return d->evtchn_port_ops->is_pending(d, port); ++ return evtchn_usable(evtchn) && d->evtchn_port_ops->is_pending(d, evtchn); + } + +-static inline bool evtchn_port_is_masked(const struct domain *d, +- evtchn_port_t port) ++static inline bool evtchn_port_is_pending(struct domain *d, evtchn_port_t port) + { +- return d->evtchn_port_ops->is_masked(d, port); ++ struct evtchn *evtchn = evtchn_from_port(d, port); ++ bool rc; ++ unsigned long flags; ++ ++ spin_lock_irqsave(&evtchn->lock, flags); ++ rc = evtchn_is_pending(d, evtchn); ++ spin_unlock_irqrestore(&evtchn->lock, flags); ++ ++ return rc; ++} ++ ++static inline bool evtchn_is_masked(const struct domain *d, ++ const struct evtchn *evtchn) ++{ ++ return !evtchn_usable(evtchn) || d->evtchn_port_ops->is_masked(d, evtchn); ++} ++ ++static inline bool evtchn_port_is_masked(struct domain *d, 
evtchn_port_t port) ++{ ++ struct evtchn *evtchn = evtchn_from_port(d, port); ++ bool rc; ++ unsigned long flags; ++ ++ spin_lock_irqsave(&evtchn->lock, flags); ++ rc = evtchn_is_masked(d, evtchn); ++ spin_unlock_irqrestore(&evtchn->lock, flags); ++ ++ return rc; + } + +-static inline bool evtchn_port_is_busy(const struct domain *d, +- evtchn_port_t port) ++static inline bool evtchn_is_busy(const struct domain *d, ++ const struct evtchn *evtchn) + { + return d->evtchn_port_ops->is_busy && +- d->evtchn_port_ops->is_busy(d, port); ++ d->evtchn_port_ops->is_busy(d, evtchn); + } + + static inline int evtchn_port_set_priority(struct domain *d, +@@ -225,6 +277,8 @@ static inline int evtchn_port_set_priori + { + if ( !d->evtchn_port_ops->set_priority ) + return -ENOSYS; ++ if ( !evtchn_usable(evtchn) ) ++ return -EACCES; + return d->evtchn_port_ops->set_priority(d, evtchn, priority); + } + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-1.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,132 @@ +From: Jan Beulich +Subject: evtchn: arrange for preemption in evtchn_destroy() + +Especially closing of fully established interdomain channels can take +quite some time, due to the locking involved. Therefore we shouldn't +assume we can clean up still active ports all in one go. Besides adding +the necessary preemption check, also avoid pointlessly starting from +(or now really ending at) 0; 1 is the lowest numbered port which may +need closing. + +Since we're now reducing ->valid_evtchns, free_xen_event_channel(), +and (at least to be on the safe side) notify_via_xen_event_channel() +need to cope with attempts to close / unbind from / send through already +closed (and no longer valid, as per port_is_valid()) ports. + +This is part of XSA-344. + +Signed-off-by: Jan Beulich +Acked-by: Julien Grall +Reviewed-by: Stefano Stabellini + +--- a/xen/common/domain.c ++++ b/xen/common/domain.c +@@ -646,7 +646,6 @@ int domain_kill(struct domain *d) + if ( d->is_dying != DOMDYING_alive ) + return domain_kill(d); + d->is_dying = DOMDYING_dying; +- evtchn_destroy(d); + gnttab_release_mappings(d); + tmem_destroy(d->tmem_client); + vnuma_destroy(d->vnuma); +@@ -654,6 +653,9 @@ int domain_kill(struct domain *d) + d->tmem_client = NULL; + /* fallthrough */ + case DOMDYING_dying: ++ rc = evtchn_destroy(d); ++ if ( rc ) ++ break; + rc = domain_relinquish_resources(d); + if ( rc != 0 ) + break; +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -1291,7 +1291,16 @@ int alloc_unbound_xen_event_channel( + + void free_xen_event_channel(struct domain *d, int port) + { +- BUG_ON(!port_is_valid(d, port)); ++ if ( !port_is_valid(d, port) ) ++ { ++ /* ++ * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing ++ * with the spin_barrier() and BUG_ON() in evtchn_destroy(). ++ */ ++ smp_rmb(); ++ BUG_ON(!d->is_dying); ++ return; ++ } + + evtchn_close(d, port, 0); + } +@@ -1303,7 +1312,17 @@ void notify_via_xen_event_channel(struct + struct domain *rd; + unsigned long flags; + +- ASSERT(port_is_valid(ld, lport)); ++ if ( !port_is_valid(ld, lport) ) ++ { ++ /* ++ * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing ++ * with the spin_barrier() and BUG_ON() in evtchn_destroy(). 
++ */ ++ smp_rmb(); ++ ASSERT(ld->is_dying); ++ return; ++ } ++ + lchn = evtchn_from_port(ld, lport); + + spin_lock_irqsave(&lchn->lock, flags); +@@ -1375,8 +1394,7 @@ int evtchn_init(struct domain *d) + return 0; + } + +- +-void evtchn_destroy(struct domain *d) ++int evtchn_destroy(struct domain *d) + { + unsigned int i; + +@@ -1385,14 +1403,29 @@ void evtchn_destroy(struct domain *d) + spin_barrier(&d->event_lock); + + /* Close all existing event channels. */ +- for ( i = 0; port_is_valid(d, i); i++ ) ++ for ( i = d->valid_evtchns; --i; ) ++ { + evtchn_close(d, i, 0); + ++ /* ++ * Avoid preempting when called from domain_create()'s error path, ++ * and don't check too often (choice of frequency is arbitrary). ++ */ ++ if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead && ++ hypercall_preempt_check() ) ++ { ++ write_atomic(&d->valid_evtchns, i); ++ return -ERESTART; ++ } ++ } ++ + ASSERT(!d->active_evtchns); + + clear_global_virq_handlers(d); + + evtchn_fifo_destroy(d); ++ ++ return 0; + } + + +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -135,7 +135,7 @@ struct evtchn + } __attribute__((aligned(64))); + + int evtchn_init(struct domain *d); /* from domain_create */ +-void evtchn_destroy(struct domain *d); /* from domain_kill */ ++int evtchn_destroy(struct domain *d); /* from domain_kill */ + void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ + + struct waitqueue_vcpu; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa344-4.11-2.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,203 @@ +From: Jan Beulich +Subject: evtchn: arrange for preemption in evtchn_reset() + +Like for evtchn_destroy() looping over all possible event channels to +close them can take a significant amount of time. Unlike done there, we +can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a +lightweight form, the paging domctl continuation concept, redirecting +the continuations to different sub-ops. Just like there this is to be +able to allow for predictable overall results of the involved sub-ops: +Racing requests should either complete or be refused. + +Note that a domain can't interfere with an already started (by a remote +domain) reset, due to being paused. It can prevent a remote reset from +happening by leaving a reset unfinished, but that's only going to affect +itself. + +This is part of XSA-344. 
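The continuation scheme described above reduces to a simple shape: do a bounded chunk of work, periodically check whether preemption is wanted, record the resume point, and return -ERESTART so the operation is re-issued. A generic C sketch of that shape follows; close_all(), close_one() and preempt_pending() are hypothetical stand-ins for the Xen routines, and ERESTART is defined locally just for the example:

    #include <stdbool.h>
    #include <stdio.h>

    #define ERESTART 85   /* local stand-in for Xen's internal -ERESTART */

    /* Hypothetical stand-ins for hypercall_preempt_check() and a
     * per-port close operation; not Xen functions. */
    static bool preempt_pending(void) { return false; }
    static void close_one(unsigned int port) { (void)port; }

    /*
     * Close ports 1..nr_ports-1, checking for preemption every 64
     * iterations.  On preemption, record where to resume and return
     * -ERESTART so the caller re-invokes this as a continuation.
     */
    static int close_all(unsigned int nr_ports, unsigned int *resume)
    {
        unsigned int i = *resume ? *resume : 1;

        for ( ; i < nr_ports; i++ )
        {
            close_one(i);

            /* Choice of frequency is arbitrary, as in the patches above. */
            if ( !(i & 0x3f) && preempt_pending() )
            {
                *resume = i + 1;
                return -ERESTART;
            }
        }

        *resume = 0;
        return 0;
    }

    int main(void)
    {
        unsigned int resume = 0;

        while ( close_all(4096, &resume) == -ERESTART )
            continue;   /* a real caller sets up a hypercall continuation */
        puts("all ports closed");
        return 0;
    }

A real caller turns the -ERESTART into a hypercall continuation, as do_domctl() and do_event_channel_op() do in the patch above.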
+ +Signed-off-by: Jan Beulich +Acked-by: Julien Grall +Reviewed-by: Stefano Stabellini + +--- a/xen/common/domain.c ++++ b/xen/common/domain.c +@@ -1105,7 +1105,7 @@ void domain_unpause_except_self(struct d + domain_unpause(d); + } + +-int domain_soft_reset(struct domain *d) ++int domain_soft_reset(struct domain *d, bool resuming) + { + struct vcpu *v; + int rc; +@@ -1119,7 +1119,7 @@ int domain_soft_reset(struct domain *d) + } + spin_unlock(&d->shutdown_lock); + +- rc = evtchn_reset(d); ++ rc = evtchn_reset(d, resuming); + if ( rc ) + return rc; + +--- a/xen/common/domctl.c ++++ b/xen/common/domctl.c +@@ -648,12 +648,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe + } + + case XEN_DOMCTL_soft_reset: ++ case XEN_DOMCTL_soft_reset_cont: + if ( d == current->domain ) /* no domain_pause() */ + { + ret = -EINVAL; + break; + } +- ret = domain_soft_reset(d); ++ ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont); ++ if ( ret == -ERESTART ) ++ { ++ op->cmd = XEN_DOMCTL_soft_reset_cont; ++ if ( !__copy_field_to_guest(u_domctl, op, cmd) ) ++ ret = hypercall_create_continuation(__HYPERVISOR_domctl, ++ "h", u_domctl); ++ else ++ ret = -EFAULT; ++ } + break; + + case XEN_DOMCTL_destroydomain: +--- a/xen/common/event_channel.c ++++ b/xen/common/event_channel.c +@@ -1051,7 +1051,7 @@ int evtchn_unmask(unsigned int port) + return 0; + } + +-int evtchn_reset(struct domain *d) ++int evtchn_reset(struct domain *d, bool resuming) + { + unsigned int i; + int rc = 0; +@@ -1059,11 +1059,40 @@ int evtchn_reset(struct domain *d) + if ( d != current->domain && !d->controller_pause_count ) + return -EINVAL; + +- for ( i = 0; port_is_valid(d, i); i++ ) ++ spin_lock(&d->event_lock); ++ ++ /* ++ * If we are resuming, then start where we stopped. Otherwise, check ++ * that a reset operation is not already in progress, and if none is, ++ * record that this is now the case. ++ */ ++ i = resuming ? d->next_evtchn : !d->next_evtchn; ++ if ( i > d->next_evtchn ) ++ d->next_evtchn = i; ++ ++ spin_unlock(&d->event_lock); ++ ++ if ( !i ) ++ return -EBUSY; ++ ++ for ( ; port_is_valid(d, i); i++ ) ++ { + evtchn_close(d, i, 1); + ++ /* NB: Choice of frequency is arbitrary. 
*/ ++ if ( !(i & 0x3f) && hypercall_preempt_check() ) ++ { ++ spin_lock(&d->event_lock); ++ d->next_evtchn = i; ++ spin_unlock(&d->event_lock); ++ return -ERESTART; ++ } ++ } ++ + spin_lock(&d->event_lock); + ++ d->next_evtchn = 0; ++ + if ( d->active_evtchns > d->xen_evtchns ) + rc = -EAGAIN; + else if ( d->evtchn_fifo ) +@@ -1198,7 +1227,8 @@ long do_event_channel_op(int cmd, XEN_GU + break; + } + +- case EVTCHNOP_reset: { ++ case EVTCHNOP_reset: ++ case EVTCHNOP_reset_cont: { + struct evtchn_reset reset; + struct domain *d; + +@@ -1211,9 +1241,13 @@ long do_event_channel_op(int cmd, XEN_GU + + rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d); + if ( !rc ) +- rc = evtchn_reset(d); ++ rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont); + + rcu_unlock_domain(d); ++ ++ if ( rc == -ERESTART ) ++ rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op, ++ "ih", EVTCHNOP_reset_cont, arg); + break; + } + +--- a/xen/include/public/domctl.h ++++ b/xen/include/public/domctl.h +@@ -1121,7 +1121,10 @@ struct xen_domctl { + #define XEN_DOMCTL_iomem_permission 20 + #define XEN_DOMCTL_ioport_permission 21 + #define XEN_DOMCTL_hypercall_init 22 +-#define XEN_DOMCTL_arch_setup 23 /* Obsolete IA64 only */ ++#ifdef __XEN__ ++/* #define XEN_DOMCTL_arch_setup 23 Obsolete IA64 only */ ++#define XEN_DOMCTL_soft_reset_cont 23 ++#endif + #define XEN_DOMCTL_settimeoffset 24 + #define XEN_DOMCTL_getvcpuaffinity 25 + #define XEN_DOMCTL_real_mode_area 26 /* Obsolete PPC only */ +--- a/xen/include/public/event_channel.h ++++ b/xen/include/public/event_channel.h +@@ -74,6 +74,9 @@ + #define EVTCHNOP_init_control 11 + #define EVTCHNOP_expand_array 12 + #define EVTCHNOP_set_priority 13 ++#ifdef __XEN__ ++#define EVTCHNOP_reset_cont 14 ++#endif + /* ` } */ + + typedef uint32_t evtchn_port_t; +--- a/xen/include/xen/event.h ++++ b/xen/include/xen/event.h +@@ -163,7 +163,7 @@ void evtchn_check_pollers(struct domain + void evtchn_2l_init(struct domain *d); + + /* Close all event channels and reset to 2-level ABI. */ +-int evtchn_reset(struct domain *d); ++int evtchn_reset(struct domain *d, bool resuming); + + /* + * Low-level event channel port ops. +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -355,6 +355,8 @@ struct domain + * EVTCHNOP_reset). Read/write access like for active_evtchns. + */ + unsigned int xen_evtchns; ++ /* Port to resume from in evtchn_reset(), when in a continuation. */ ++ unsigned int next_evtchn; + spinlock_t event_lock; + const struct evtchn_port_ops *evtchn_port_ops; + struct evtchn_fifo_domain *evtchn_fifo; +@@ -608,7 +610,7 @@ int domain_shutdown(struct domain *d, u8 + void domain_resume(struct domain *d); + void domain_pause_for_debugger(void); + +-int domain_soft_reset(struct domain *d); ++int domain_soft_reset(struct domain *d, bool resuming); + + int vcpu_start_shutdown_deferral(struct vcpu *v); + void vcpu_end_shutdown_deferral(struct vcpu *v); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-1.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,57 @@ +From: Jan Beulich +Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page + +Deferring flushes to a single, wide range one - as is done when +handling XENMAPSPACE_gmfn_range - is okay only as long as +pages don't get freed ahead of the eventual flush. 
While the only +function setting the flag (xenmem_add_to_physmap()) suggests by its name +that it's only mapping new entries, in reality the way +xenmem_add_to_physmap_one() works means an unmap would happen not only +for the page being moved (but not freed) but, if the destination GFN is +populated, also for the page being displaced from that GFN. Collapsing +the two flushes for this GFN into just one (end even more so deferring +it to a batched invocation) is not correct. + +This is part of XSA-346. + +Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ") +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Acked-by: Julien Grall + +--- a/xen/common/memory.c ++++ b/xen/common/memory.c +@@ -298,7 +298,10 @@ int guest_remove_page(struct domain *d, + p2m_type_t p2mt; + #endif + mfn_t mfn; ++#ifdef CONFIG_HAS_PASSTHROUGH ++ bool *dont_flush_p, dont_flush; + int rc; ++#endif + + #ifdef CONFIG_X86 + mfn = get_gfn_query(d, gmfn, &p2mt); +@@ -376,8 +379,22 @@ int guest_remove_page(struct domain *d, + return -ENXIO; + } + ++#ifdef CONFIG_HAS_PASSTHROUGH ++ /* ++ * Since we're likely to free the page below, we need to suspend ++ * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes. ++ */ ++ dont_flush_p = &this_cpu(iommu_dont_flush_iotlb); ++ dont_flush = *dont_flush_p; ++ *dont_flush_p = false; ++#endif ++ + rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0); + ++#ifdef CONFIG_HAS_PASSTHROUGH ++ *dont_flush_p = dont_flush; ++#endif ++ + /* + * With the lack of an IOMMU on some platforms, domains with DMA-capable + * device must retrieve the same pfn when the hypercall populate_physmap diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa346-4.11-2.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,202 @@ +From: Jan Beulich +Subject: IOMMU: hold page ref until after deferred TLB flush + +When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB +flush for the "from" GFN range requires that the page remains allocated +to the guest until the TLB flush has actually occurred. Otherwise a +parallel hypercall to remove the page would only flush the TLB for the +GFN it has been moved to, but not the one is was mapped at originally. + +This is part of XSA-346. + +Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ") +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Acked-by: Julien Grall + +--- a/xen/arch/arm/mm.c ++++ b/xen/arch/arm/mm.c +@@ -1222,7 +1222,7 @@ void share_xen_page_with_guest(struct pa + int xenmem_add_to_physmap_one( + struct domain *d, + unsigned int space, +- union xen_add_to_physmap_batch_extra extra, ++ union add_to_physmap_extra extra, + unsigned long idx, + gfn_t gfn) + { +@@ -1294,10 +1294,6 @@ int xenmem_add_to_physmap_one( + break; + } + case XENMAPSPACE_dev_mmio: +- /* extra should be 0. Reserved for future use. 
*/ +- if ( extra.res0 ) +- return -EOPNOTSUPP; +- + rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx)); + return rc; + +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -4634,7 +4634,7 @@ static int handle_iomem_range(unsigned l + int xenmem_add_to_physmap_one( + struct domain *d, + unsigned int space, +- union xen_add_to_physmap_batch_extra extra, ++ union add_to_physmap_extra extra, + unsigned long idx, + gfn_t gpfn) + { +@@ -4721,9 +4721,20 @@ int xenmem_add_to_physmap_one( + rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K); + + put_both: +- /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */ ++ /* ++ * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. ++ * We also may need to transfer ownership of the page reference to our ++ * caller. ++ */ + if ( space == XENMAPSPACE_gmfn ) ++ { + put_gfn(d, gfn); ++ if ( !rc && extra.ppage ) ++ { ++ *extra.ppage = page; ++ page = NULL; ++ } ++ } + + if ( page ) + put_page(page); +--- a/xen/common/memory.c ++++ b/xen/common/memory.c +@@ -811,11 +811,10 @@ int xenmem_add_to_physmap(struct domain + { + unsigned int done = 0; + long rc = 0; +- union xen_add_to_physmap_batch_extra extra; ++ union add_to_physmap_extra extra = {}; ++ struct page_info *pages[16]; + +- if ( xatp->space != XENMAPSPACE_gmfn_foreign ) +- extra.res0 = 0; +- else ++ if ( xatp->space == XENMAPSPACE_gmfn_foreign ) + extra.foreign_domid = DOMID_INVALID; + + if ( xatp->space != XENMAPSPACE_gmfn_range ) +@@ -831,7 +830,10 @@ int xenmem_add_to_physmap(struct domain + + #ifdef CONFIG_HAS_PASSTHROUGH + if ( need_iommu(d) ) ++ { + this_cpu(iommu_dont_flush_iotlb) = 1; ++ extra.ppage = &pages[0]; ++ } + #endif + + while ( xatp->size > done ) +@@ -844,8 +846,12 @@ int xenmem_add_to_physmap(struct domain + xatp->idx++; + xatp->gpfn++; + ++ if ( extra.ppage ) ++ ++extra.ppage; ++ + /* Check for continuation if it's not the last iteration. */ +- if ( xatp->size > ++done && hypercall_preempt_check() ) ++ if ( (++done > ARRAY_SIZE(pages) && extra.ppage) || ++ (xatp->size > done && hypercall_preempt_check()) ) + { + rc = start + done; + break; +@@ -856,6 +862,7 @@ int xenmem_add_to_physmap(struct domain + if ( need_iommu(d) ) + { + int ret; ++ unsigned int i; + + this_cpu(iommu_dont_flush_iotlb) = 0; + +@@ -863,6 +870,15 @@ int xenmem_add_to_physmap(struct domain + if ( unlikely(ret) && rc >= 0 ) + rc = ret; + ++ /* ++ * Now that the IOMMU TLB flush was done for the original GFN, drop ++ * the page references. The 2nd flush below is fine to make later, as ++ * whoever removes the page again from its new GFN will have to do ++ * another flush anyway. ++ */ ++ for ( i = 0; i < done; ++i ) ++ put_page(pages[i]); ++ + ret = iommu_iotlb_flush(d, xatp->gpfn - done, done); + if ( unlikely(ret) && rc >= 0 ) + rc = ret; +@@ -876,6 +892,8 @@ static int xenmem_add_to_physmap_batch(s + struct xen_add_to_physmap_batch *xatpb, + unsigned int extent) + { ++ union add_to_physmap_extra extra = {}; ++ + if ( xatpb->size < extent ) + return -EILSEQ; + +@@ -884,6 +902,19 @@ static int xenmem_add_to_physmap_batch(s + !guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) ) + return -EFAULT; + ++ switch ( xatpb->space ) ++ { ++ case XENMAPSPACE_dev_mmio: ++ /* res0 is reserved for future use. 
*/ ++ if ( xatpb->u.res0 ) ++ return -EOPNOTSUPP; ++ break; ++ ++ case XENMAPSPACE_gmfn_foreign: ++ extra.foreign_domid = xatpb->u.foreign_domid; ++ break; ++ } ++ + while ( xatpb->size > extent ) + { + xen_ulong_t idx; +@@ -896,8 +927,7 @@ static int xenmem_add_to_physmap_batch(s + extent, 1)) ) + return -EFAULT; + +- rc = xenmem_add_to_physmap_one(d, xatpb->space, +- xatpb->u, ++ rc = xenmem_add_to_physmap_one(d, xatpb->space, extra, + idx, _gfn(gpfn)); + + if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) ) +--- a/xen/include/xen/mm.h ++++ b/xen/include/xen/mm.h +@@ -577,8 +577,22 @@ void scrub_one_page(struct page_info *); + &(d)->xenpage_list : &(d)->page_list) + #endif + ++union add_to_physmap_extra { ++ /* ++ * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs ++ * to be kept until after the flush, so the page can't get removed from ++ * the domain (and re-used for another purpose) beforehand. By passing ++ * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants ++ * to have ownership of such a reference transferred in the success case. ++ */ ++ struct page_info **ppage; ++ ++ /* XENMAPSPACE_gmfn_foreign */ ++ domid_t foreign_domid; ++}; ++ + int xenmem_add_to_physmap_one(struct domain *d, unsigned int space, +- union xen_add_to_physmap_batch_extra extra, ++ union add_to_physmap_extra extra, + unsigned long idx, gfn_t gfn); + + int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp, diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-1.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,52 @@ +From: Jan Beulich +Subject: AMD/IOMMU: update live PTEs atomically + +Updating a live PTE word by word allows the IOMMU to see a partially +updated entry. Construct the new entry fully in a local variable and +then write the new entry by a single insn. + +This is part of XSA-347. 
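The requirement described above is that a live, device-visible 64-bit entry must switch from its old to its new value in one store, never exposing a half-updated mix of its two 32-bit halves. A generic C11 sketch of that discipline follows; set_entry() is an invented name and the layout is hypothetical, not the AMD IOMMU PTE format:

    #include <stdatomic.h>
    #include <stdint.h>

    /*
     * Build the whole 64-bit entry in a local variable and publish it
     * with a single store, so a concurrent reader (for a real PTE: the
     * IOMMU) only ever observes the complete old or complete new value.
     */
    static void set_entry(_Atomic uint64_t *entry, uint32_t lo, uint32_t hi)
    {
        uint64_t full = ((uint64_t)hi << 32) | lo;

        atomic_store_explicit(entry, full, memory_order_release);
    }

How such a store is ordered against writes to other entries is a separate concern, which the second XSA-347 patch below deals with.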
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -41,7 +41,7 @@ static void clear_iommu_pte_present(unsi + + table = map_domain_page(_mfn(l1_mfn)); + pte = table + pfn_to_pde_idx(gfn, IOMMU_PAGING_MODE_LEVEL_1); +- *pte = 0; ++ write_atomic(pte, 0); + unmap_domain_page(table); + } + +@@ -49,7 +49,7 @@ static bool_t set_iommu_pde_present(u32 + unsigned int next_level, + bool_t iw, bool_t ir) + { +- uint64_t addr_lo, addr_hi, maddr_next; ++ uint64_t addr_lo, addr_hi, maddr_next, full; + u32 entry; + bool need_flush = false, old_present; + +@@ -106,7 +106,7 @@ static bool_t set_iommu_pde_present(u32 + if ( next_level == IOMMU_PAGING_MODE_LEVEL_0 ) + set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry, + IOMMU_PTE_FC_MASK, IOMMU_PTE_FC_SHIFT, &entry); +- pde[1] = entry; ++ full = (uint64_t)entry << 32; + + /* mark next level as 'present' */ + set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, 0, +@@ -118,7 +118,9 @@ static bool_t set_iommu_pde_present(u32 + set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry, + IOMMU_PDE_PRESENT_MASK, + IOMMU_PDE_PRESENT_SHIFT, &entry); +- pde[0] = entry; ++ full |= entry; ++ ++ write_atomic((uint64_t *)pde, full); + + return need_flush; + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa347-4.11-2.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,80 @@ +From: Jan Beulich +Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications + +DMA and interrupt translation should be enabled only after other +applicable DTE fields have been written. Similarly when disabling +translation or when moving a device between domains, translation should +first be disabled, before other entry fields get modified. Note however +that the "moving" aspect doesn't apply to the interrupt remapping side, +as domain specifics are maintained in the IRTEs here, not the DTE. We +also never disable interrupt remapping once it got enabled for a device +(the respective argument passed is always the immutable iommu_intremap). + +This is part of XSA-347. 
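The ordering rule described above is the usual publish pattern: fill in every field of a descriptor first, issue a write barrier, and only then set the bit that makes the consumer look at it; teardown mirrors this by clearing that bit first. A generic C11 sketch with a hypothetical layout (struct desc, CTRL_VALID and publish_desc() are invented for the example), using a release fence in the role of smp_wmb():

    #include <stdatomic.h>
    #include <stdint.h>

    /* Hypothetical descriptor layout, for illustration only. */
    struct desc {
        uint64_t table;         /* e.g. root table pointer */
        uint64_t cfg;           /* other configuration fields */
        _Atomic uint32_t ctrl;  /* holds the valid/enable bit */
    };

    #define CTRL_VALID 0x1u

    static void publish_desc(struct desc *d, uint64_t table, uint64_t cfg)
    {
        d->table = table;
        d->cfg = cfg;

        /* All fields must be visible before the entry is marked valid;
         * this fence plays the role of smp_wmb() in the patch. */
        atomic_thread_fence(memory_order_release);

        atomic_store_explicit(&d->ctrl, CTRL_VALID, memory_order_relaxed);
    }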
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -147,7 +147,22 @@ void amd_iommu_set_root_page_table( + u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, u8 valid) + { + u64 addr_hi, addr_lo; +- u32 entry; ++ u32 entry, dte0 = dte[0]; ++ ++ if ( valid || ++ get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_VALID_MASK, ++ IOMMU_DEV_TABLE_VALID_SHIFT) ) ++ { ++ set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, dte0, ++ IOMMU_DEV_TABLE_TRANSLATION_VALID_MASK, ++ IOMMU_DEV_TABLE_TRANSLATION_VALID_SHIFT, &dte0); ++ set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, dte0, ++ IOMMU_DEV_TABLE_VALID_MASK, ++ IOMMU_DEV_TABLE_VALID_SHIFT, &dte0); ++ dte[0] = dte0; ++ smp_wmb(); ++ } ++ + set_field_in_reg_u32(domain_id, 0, + IOMMU_DEV_TABLE_DOMAIN_ID_MASK, + IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT, &entry); +@@ -166,8 +181,9 @@ void amd_iommu_set_root_page_table( + IOMMU_DEV_TABLE_IO_READ_PERMISSION_MASK, + IOMMU_DEV_TABLE_IO_READ_PERMISSION_SHIFT, &entry); + dte[1] = entry; ++ smp_wmb(); + +- set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, 0, ++ set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, dte0, + IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK, + IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT, &entry); + set_field_in_reg_u32(paging_mode, entry, +@@ -180,7 +196,7 @@ void amd_iommu_set_root_page_table( + IOMMU_CONTROL_DISABLED, entry, + IOMMU_DEV_TABLE_VALID_MASK, + IOMMU_DEV_TABLE_VALID_SHIFT, &entry); +- dte[0] = entry; ++ write_atomic(&dte[0], entry); + } + + void iommu_dte_set_iotlb(u32 *dte, u8 i) +@@ -212,6 +228,7 @@ void __init amd_iommu_set_intremap_table + IOMMU_DEV_TABLE_INT_CONTROL_MASK, + IOMMU_DEV_TABLE_INT_CONTROL_SHIFT, &entry); + dte[5] = entry; ++ smp_wmb(); + + set_field_in_reg_u32((u32)addr_lo >> 6, 0, + IOMMU_DEV_TABLE_INT_TABLE_PTR_LOW_MASK, +@@ -229,7 +246,7 @@ void __init amd_iommu_set_intremap_table + IOMMU_CONTROL_DISABLED, entry, + IOMMU_DEV_TABLE_INT_VALID_MASK, + IOMMU_DEV_TABLE_INT_VALID_SHIFT, &entry); +- dte[4] = entry; ++ write_atomic(&dte[4], entry); + } + + void __init iommu_dte_add_device_entry(u32 *dte, struct ivrs_mappings *ivrs_dev) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa348-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa348-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa348-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa348-4.11.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,164 @@ +From: Jan Beulich +Subject: x86: avoid calling {svm,vmx}_do_resume() + +These functions follow the following path: hvm_do_resume() -> +handle_hvm_io_completion() -> hvm_wait_for_io() -> +wait_on_xen_event_channel() -> do_softirq() -> schedule() -> +sched_context_switch() -> continue_running() and hence may +recursively invoke themselves. If this ends up happening a couple of +times, a stack overflow would result. + +Prevent this by also resetting the stack at the +->arch.ctxt_switch->tail() invocations (in both places for consistency) +and thus jumping to the functions instead of calling them. + +This is XSA-348 / CVE-2020-29566. 
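The stack growth described above comes from re-entering the resume/schedule path through ordinary calls; the fix is to discard the current stack and jump to the (noreturn) tail hook instead. Portable C cannot reset the stack pointer, but the effect on stack depth can be illustrated with a trampoline: each step returns the next step and a top-level loop invokes it, so depth stays constant however many resume/schedule hops occur. All names below (do_resume(), do_schedule(), struct step) are invented for the illustration:

    #include <stdio.h>

    struct step;
    typedef struct step (*step_fn)(unsigned long *budget);
    struct step { step_fn fn; };

    static struct step do_schedule(unsigned long *budget);

    /* One "resume" step: either finish, or hand over to the scheduler. */
    static struct step do_resume(unsigned long *budget)
    {
        if ( --*budget == 0 )
            return (struct step){ NULL };
        return (struct step){ do_schedule };
    }

    /* One "schedule" step: pick the next thing to run (here: resume again). */
    static struct step do_schedule(unsigned long *budget)
    {
        (void)budget;
        return (struct step){ do_resume };
    }

    int main(void)
    {
        unsigned long budget = 1000000;   /* deep enough to exhaust a stack
                                             if these called each other */
        struct step next = { do_resume };

        while ( next.fn )
            next = next.fn(&budget);      /* constant stack depth */

        puts("finished with constant stack usage");
        return 0;
    }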
+ +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Reviewed-by: Juergen Gross + +--- sle12sp4.orig/xen/arch/x86/domain.c 2020-10-15 17:35:17.000000000 +0200 ++++ sle12sp4/xen/arch/x86/domain.c 2020-11-10 17:56:59.000000000 +0100 +@@ -121,7 +121,7 @@ static void play_dead(void) + (*dead_idle)(); + } + +-static void idle_loop(void) ++static void noreturn idle_loop(void) + { + unsigned int cpu = smp_processor_id(); + +@@ -161,11 +161,6 @@ void startup_cpu_idle_loop(void) + reset_stack_and_jump(idle_loop); + } + +-static void noreturn continue_idle_domain(struct vcpu *v) +-{ +- reset_stack_and_jump(idle_loop); +-} +- + void dump_pageframe_info(struct domain *d) + { + struct page_info *page; +@@ -456,7 +451,7 @@ int arch_domain_create(struct domain *d, + static const struct arch_csw idle_csw = { + .from = paravirt_ctxt_switch_from, + .to = paravirt_ctxt_switch_to, +- .tail = continue_idle_domain, ++ .tail = idle_loop, + }; + + d->arch.ctxt_switch = &idle_csw; +@@ -1770,20 +1765,12 @@ void context_switch(struct vcpu *prev, s + /* Ensure that the vcpu has an up-to-date time base. */ + update_vcpu_system_time(next); + +- /* +- * Schedule tail *should* be a terminal function pointer, but leave a +- * bug frame around just in case it returns, to save going back into the +- * context switching code and leaving a far more subtle crash to diagnose. +- */ +- nextd->arch.ctxt_switch->tail(next); +- BUG(); ++ reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail); + } + + void continue_running(struct vcpu *same) + { +- /* See the comment above. */ +- same->domain->arch.ctxt_switch->tail(same); +- BUG(); ++ reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail); + } + + int __sync_local_execstate(void) +--- sle12sp4.orig/xen/arch/x86/hvm/svm/svm.c 2020-06-18 15:13:13.001760095 +0200 ++++ sle12sp4/xen/arch/x86/hvm/svm/svm.c 2020-11-10 17:56:59.000000000 +0100 +@@ -1111,8 +1111,9 @@ static void svm_ctxt_switch_to(struct vc + wrmsr_tsc_aux(hvm_msr_tsc_aux(v)); + } + +-static void noreturn svm_do_resume(struct vcpu *v) ++static void noreturn svm_do_resume(void) + { ++ struct vcpu *v = current; + struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb; + bool debug_state = (v->domain->debugger_attached || + v->domain->arch.monitor.software_breakpoint_enabled || +--- sle12sp4.orig/xen/arch/x86/hvm/vmx/vmcs.c 2019-12-03 17:46:26.000000000 +0100 ++++ sle12sp4/xen/arch/x86/hvm/vmx/vmcs.c 2020-11-10 17:56:59.000000000 +0100 +@@ -1782,8 +1782,9 @@ void vmx_vmentry_failure(void) + domain_crash_synchronous(); + } + +-void vmx_do_resume(struct vcpu *v) ++void vmx_do_resume(void) + { ++ struct vcpu *v = current; + bool_t debug_state; + unsigned long host_cr4; + +--- sle12sp4.orig/xen/arch/x86/pv/domain.c 2019-06-25 23:47:11.000000000 +0200 ++++ sle12sp4/xen/arch/x86/pv/domain.c 2020-11-10 17:56:59.000000000 +0100 +@@ -58,7 +58,7 @@ static int parse_pcid(const char *s) + } + custom_runtime_param("pcid", parse_pcid); + +-static void noreturn continue_nonidle_domain(struct vcpu *v) ++static void noreturn continue_nonidle_domain(void) + { + check_wakeup_from_wait(); + reset_stack_and_jump(ret_from_intr); +--- sle12sp4.orig/xen/include/asm-x86/current.h 2019-06-25 23:47:11.000000000 +0200 ++++ sle12sp4/xen/include/asm-x86/current.h 2020-11-10 17:56:59.000000000 +0100 +@@ -124,16 +124,23 @@ unsigned long get_stack_dump_bottom (uns + # define CHECK_FOR_LIVEPATCH_WORK "" + #endif + +-#define reset_stack_and_jump(__fn) \ ++#define switch_stack_and_jump(fn, instr, constr) \ + ({ \ + __asm__ __volatile__ ( \ + "mov 
%0,%%"__OP"sp;" \ +- CHECK_FOR_LIVEPATCH_WORK \ +- "jmp %c1" \ +- : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" ); \ ++ CHECK_FOR_LIVEPATCH_WORK \ ++ instr "1" \ ++ : : "r" (guest_cpu_user_regs()), constr (fn) : "memory" ); \ + unreachable(); \ + }) + ++#define reset_stack_and_jump(fn) \ ++ switch_stack_and_jump(fn, "jmp %c", "i") ++ ++/* The constraint may only specify non-call-clobbered registers. */ ++#define reset_stack_and_jump_ind(fn) \ ++ switch_stack_and_jump(fn, "INDIRECT_JMP %", "b") ++ + /* + * Which VCPU's state is currently running on each CPU? + * This is not necesasrily the same as 'current' as a CPU may be +--- sle12sp4.orig/xen/include/asm-x86/domain.h 2019-12-03 17:46:26.000000000 +0100 ++++ sle12sp4/xen/include/asm-x86/domain.h 2020-11-10 17:56:59.000000000 +0100 +@@ -328,7 +328,7 @@ struct arch_domain + const struct arch_csw { + void (*from)(struct vcpu *); + void (*to)(struct vcpu *); +- void (*tail)(struct vcpu *); ++ void noreturn (*tail)(void); + } *ctxt_switch; + + /* nestedhvm: translate l2 guest physical to host physical */ +--- sle12sp4.orig/xen/include/asm-x86/hvm/vmx/vmx.h 2019-12-03 17:46:26.000000000 +0100 ++++ sle12sp4/xen/include/asm-x86/hvm/vmx/vmx.h 2020-11-10 17:56:59.000000000 +0100 +@@ -95,7 +95,7 @@ typedef enum { + void vmx_asm_vmexit_handler(struct cpu_user_regs); + void vmx_asm_do_vmentry(void); + void vmx_intr_assist(void); +-void noreturn vmx_do_resume(struct vcpu *); ++void noreturn vmx_do_resume(void); + void vmx_vlapic_msr_changed(struct vcpu *v); + void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt); + void vmx_realmode(struct cpu_user_regs *regs); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-arm.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-arm.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-arm.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-arm.patch 2022-05-30 08:33:19.000000000 +0100 @@ -0,0 +1,58 @@ +From: Julien Grall +Subject: xen/arm: Always trap AMU system registers + +The Activity Monitors Unit (AMU) has been introduced by ARMv8.4. It is +considered to be unsafe to be expose to guests as they might expose +information about code executed by other guests or the host. + +Arm provided a way to trap all the AMU system registers by setting +CPTR_EL2.TAM to 1. + +Unfortunately, on older revision of the specification, the bit 30 (now +CPTR_EL1.TAM) was RES0. Because of that, Xen is setting it to 0 and +therefore the system registers would be exposed to the guest when it is +run on processors with AMU. + +As the bit is mark as UNKNOWN at boot in Armv8.4, the only safe solution +for us is to always set CPTR_EL1.TAM to 1. + +Guest trying to access the AMU system registers will now receive an +undefined instruction. Unfortunately, this means that even well-behaved +guest may fail to boot because we don't sanitize the ID registers. + +This is a known issues with other Armv8.0+ features (e.g. SVE, Pointer +Auth). This will taken care separately. + +This is part of XSA-351 (or XSA-93 re-born). + +Signed-off-by: Julien Grall +Reviewed-by: Andre Przywara +Reviewed-by: Stefano Stabellini +Reviewed-by: Bertrand Marquis + +diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c +index a36f145e67..22bd1bd4c6 100644 +--- a/xen/arch/arm/traps.c ++++ b/xen/arch/arm/traps.c +@@ -151,7 +151,8 @@ void init_traps(void) + * On ARM64 the TCPx bits which we set here (0..9,12,13) are all + * RES1, i.e. they would trap whether we did this write or not. 
+ */ +- WRITE_SYSREG((HCPTR_CP_MASK & ~(HCPTR_CP(10) | HCPTR_CP(11))) | HCPTR_TTA, ++ WRITE_SYSREG((HCPTR_CP_MASK & ~(HCPTR_CP(10) | HCPTR_CP(11))) | ++ HCPTR_TTA | HCPTR_TAM, + CPTR_EL2); + + /* Setup hypervisor traps */ +diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h +index 3ca67f8157..d3d12a9d19 100644 +--- a/xen/include/asm-arm/processor.h ++++ b/xen/include/asm-arm/processor.h +@@ -351,6 +351,7 @@ + #define VTCR_RES1 (_AC(1,UL)<<31) + + /* HCPTR Hyp. Coprocessor Trap Register */ ++#define HCPTR_TAM ((_AC(1,U)<<30)) + #define HCPTR_TTA ((_AC(1,U)<<20)) /* Trap trace registers */ + #define HCPTR_CP(x) ((_AC(1,U)<<(x))) /* Trap Coprocessor x */ + #define HCPTR_CP_MASK ((_AC(1,U)<<14)-1) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-1.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,163 @@ +From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= +Subject: x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL} +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Currently a PV hardware domain can also be given control over the CPU +frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL. +However since commit 322ec7c89f6 the default behavior has been changed +to reject accesses to not explicitly handled MSRs, preventing PV +guests that manage CPU frequency from reading +MSR_IA32_PERF_{STATUS/CTL}. + +Additionally some HVM guests (Windows at least) will attempt to read +MSR_IA32_PERF_CTL and will panic if given back a #GP fault: + + vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented + d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0 + +Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR +handling shared between HVM and PV guests, and add an explicit case +for reads to MSR_IA32_PERF_{STATUS/CTL}. + +Restore previous behavior and allow PV guests with the required +permissions to read the contents of the mentioned MSRs. Non privileged +guests will get 0 when trying to read those registers, as writes to +MSR_IA32_PERF_CTL by such guest will already be silently dropped. + +Fixes: 322ec7c89f6 ('x86/pv: disallow access to unknown MSRs') +Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs') +Signed-off-by: Roger Pau Monné +Signed-off-by: Andrew Cooper +Reviewed-by: Roger Pau Monné +Reviewed-by: Jan Beulich +(cherry picked from commit 3059178798a23ba870ff86ff54d442a07e6651fc) + +diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c +index 256e58d82b..3495ac9f4a 100644 +--- a/xen/arch/x86/msr.c ++++ b/xen/arch/x86/msr.c +@@ -141,6 +141,7 @@ int init_vcpu_msr_policy(struct vcpu *v) + + int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) + { ++ const struct domain *d = v->domain; + const struct cpuid_policy *cp = v->domain->arch.cpuid; + const struct msr_domain_policy *dp = v->domain->arch.msr; + const struct msr_vcpu_policy *vp = v->arch.msr; +@@ -212,6 +213,25 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) + break; + + /* ++ * These MSRs are not enumerated in CPUID. They have been around ++ * since the Pentium 4, and implemented by other vendors. ++ * ++ * Some versions of Windows try reading these before setting up a #GP ++ * handler, and Linux has several unguarded reads as well. 
Provide ++ * RAZ semantics, in general, but permit a cpufreq controller dom0 to ++ * have full access. ++ */ ++ case MSR_IA32_PERF_STATUS: ++ case MSR_IA32_PERF_CTL: ++ if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) ) ++ goto gp_fault; ++ ++ *val = 0; ++ if ( likely(!is_cpufreq_controller(d)) || rdmsr_safe(msr, *val) == 0 ) ++ break; ++ goto gp_fault; ++ ++ /* + * TODO: Implement when we have better topology representation. + case MSR_INTEL_CORE_THREAD_COUNT: + */ +@@ -241,6 +261,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) + case MSR_INTEL_CORE_THREAD_COUNT: + case MSR_INTEL_PLATFORM_INFO: + case MSR_ARCH_CAPABILITIES: ++ case MSR_IA32_PERF_STATUS: + /* Read-only */ + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: +@@ -345,6 +366,21 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) + break; + } + ++ /* ++ * This MSR is not enumerated in CPUID. It has been around since the ++ * Pentium 4, and implemented by other vendors. ++ * ++ * To match the RAZ semantics, implement as write-discard, except for ++ * a cpufreq controller dom0 which has full access. ++ */ ++ case MSR_IA32_PERF_CTL: ++ if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) ) ++ goto gp_fault; ++ ++ if ( likely(!is_cpufreq_controller(d)) || wrmsr_safe(msr, val) == 0 ) ++ break; ++ goto gp_fault; ++ + default: + return X86EMUL_UNHANDLEABLE; + } +diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c +index 8120ded330..755f00db33 100644 +--- a/xen/arch/x86/pv/emul-priv-op.c ++++ b/xen/arch/x86/pv/emul-priv-op.c +@@ -816,12 +816,6 @@ static inline uint64_t guest_misc_enable(uint64_t val) + return val; + } + +-static inline bool is_cpufreq_controller(const struct domain *d) +-{ +- return ((cpufreq_controller == FREQCTL_dom0_kernel) && +- is_hardware_domain(d)); +-} +- + static int read_msr(unsigned int reg, uint64_t *val, + struct x86_emulate_ctxt *ctxt) + { +@@ -1096,14 +1090,6 @@ static int write_msr(unsigned int reg, uint64_t val, + return X86EMUL_OKAY; + break; + +- case MSR_IA32_PERF_CTL: +- if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) +- break; +- if ( likely(!is_cpufreq_controller(currd)) || +- wrmsr_safe(reg, val) == 0 ) +- return X86EMUL_OKAY; +- break; +- + case MSR_IA32_THERM_CONTROL: + case MSR_IA32_ENERGY_PERF_BIAS: + if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) +diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h +index c0cc5d9336..7e4ad5d51b 100644 +--- a/xen/include/xen/sched.h ++++ b/xen/include/xen/sched.h +@@ -920,6 +920,22 @@ extern enum cpufreq_controller { + FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen + } cpufreq_controller; + ++static always_inline bool is_cpufreq_controller(const struct domain *d) ++{ ++ /* ++ * A PV dom0 can be nominated as the cpufreq controller, instead of using ++ * Xen's cpufreq driver, at which point dom0 gets direct access to certain ++ * MSRs. ++ * ++ * This interface only works when dom0 is identity pinned and has the same ++ * number of vCPUs as pCPUs on the system. ++ * ++ * It would be far better to paravirtualise the interface. 
++ */ ++ return (is_pv_domain(d) && is_hardware_domain(d) && ++ cpufreq_controller == FREQCTL_dom0_kernel); ++} ++ + #define CPUPOOLID_NONE -1 + + struct cpupool *cpupool_get_by_id(int poolid); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa351-x86-4.11-2.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,118 @@ +From: Andrew Cooper +Subject: x86/msr: Disallow guest access to the RAPL MSRs + +Researchers have demonstrated using the RAPL interface to perform a +differential power analysis attack to recover AES keys used by other cores in +the system. + +Furthermore, even privileged guests cannot use this interface correctly, due +to MSR scope and vcpu scheduling issues. The interface would want to be +paravirtualised to be used sensibly. + +Disallow access to the RAPL MSRs completely, as well as other MSRs which +potentially access fine grain power information. + +This is part of XSA-351. + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c +index 3495ac9f4a..99c848ff41 100644 +--- a/xen/arch/x86/msr.c ++++ b/xen/arch/x86/msr.c +@@ -156,6 +156,15 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: + case MSR_MCU_OPT_CTRL: ++ case MSR_RAPL_POWER_UNIT: ++ case MSR_PKG_POWER_LIMIT ... MSR_PKG_POWER_INFO: ++ case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO: ++ case MSR_PP0_POWER_LIMIT ... MSR_PP0_POLICY: ++ case MSR_PP1_POWER_LIMIT ... MSR_PP1_POLICY: ++ case MSR_PLATFORM_ENERGY_COUNTER: ++ case MSR_PLATFORM_POWER_LIMIT: ++ case MSR_F15H_CU_POWER ... MSR_F15H_CU_MAX_POWER: ++ case MSR_AMD_RAPL_POWER_UNIT ... MSR_AMD_PKG_ENERGY_STATUS: + /* Not offered to guests. */ + goto gp_fault; + +@@ -266,6 +275,15 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: + case MSR_MCU_OPT_CTRL: ++ case MSR_RAPL_POWER_UNIT: ++ case MSR_PKG_POWER_LIMIT ... MSR_PKG_POWER_INFO: ++ case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO: ++ case MSR_PP0_POWER_LIMIT ... MSR_PP0_POLICY: ++ case MSR_PP1_POWER_LIMIT ... MSR_PP1_POLICY: ++ case MSR_PLATFORM_ENERGY_COUNTER: ++ case MSR_PLATFORM_POWER_LIMIT: ++ case MSR_F15H_CU_POWER ... MSR_F15H_CU_MAX_POWER: ++ case MSR_AMD_RAPL_POWER_UNIT ... MSR_AMD_PKG_ENERGY_STATUS: + /* Not offered to guests. */ + goto gp_fault; + +diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h +index 480d1d8102..a685dcdcca 100644 +--- a/xen/include/asm-x86/msr-index.h ++++ b/xen/include/asm-x86/msr-index.h +@@ -96,6 +96,38 @@ + /* Lower 6 bits define the format of the address in the LBR stack */ + #define MSR_IA32_PERF_CAP_LBR_FORMAT 0x3f + ++/* ++ * Intel Runtime Average Power Limiting (RAPL) interface. Power plane base ++ * addresses (MSR_*_POWER_LIMIT) are model specific, but have so-far been ++ * consistent since their introduction in SandyBridge. ++ * ++ * Offsets of functionality from the power plane base is architectural, but ++ * not all power planes support all functionality. 
++ */ ++#define MSR_RAPL_POWER_UNIT 0x00000606 ++ ++#define MSR_PKG_POWER_LIMIT 0x00000610 ++#define MSR_PKG_ENERGY_STATUS 0x00000611 ++#define MSR_PKG_PERF_STATUS 0x00000613 ++#define MSR_PKG_POWER_INFO 0x00000614 ++ ++#define MSR_DRAM_POWER_LIMIT 0x00000618 ++#define MSR_DRAM_ENERGY_STATUS 0x00000619 ++#define MSR_DRAM_PERF_STATUS 0x0000061b ++#define MSR_DRAM_POWER_INFO 0x0000061c ++ ++#define MSR_PP0_POWER_LIMIT 0x00000638 ++#define MSR_PP0_ENERGY_STATUS 0x00000639 ++#define MSR_PP0_POLICY 0x0000063a ++ ++#define MSR_PP1_POWER_LIMIT 0x00000640 ++#define MSR_PP1_ENERGY_STATUS 0x00000641 ++#define MSR_PP1_POLICY 0x00000642 ++ ++/* Intel Platform-wide power interface. */ ++#define MSR_PLATFORM_ENERGY_COUNTER 0x0000064d ++#define MSR_PLATFORM_POWER_LIMIT 0x0000065c ++ + #define MSR_IA32_BNDCFGS 0x00000d90 + #define IA32_BNDCFGS_ENABLE 0x00000001 + #define IA32_BNDCFGS_PRESERVE 0x00000002 +@@ -218,6 +250,8 @@ + #define MSR_K8_VM_CR 0xc0010114 + #define MSR_K8_VM_HSAVE_PA 0xc0010117 + ++#define MSR_F15H_CU_POWER 0xc001007a ++#define MSR_F15H_CU_MAX_POWER 0xc001007b + #define MSR_AMD_FAM15H_EVNTSEL0 0xc0010200 + #define MSR_AMD_FAM15H_PERFCTR0 0xc0010201 + #define MSR_AMD_FAM15H_EVNTSEL1 0xc0010202 +@@ -231,6 +265,10 @@ + #define MSR_AMD_FAM15H_EVNTSEL5 0xc001020a + #define MSR_AMD_FAM15H_PERFCTR5 0xc001020b + ++#define MSR_AMD_RAPL_POWER_UNIT 0xc0010299 ++#define MSR_AMD_CORE_ENERGY_STATUS 0xc001029a ++#define MSR_AMD_PKG_ENERGY_STATUS 0xc001029b ++ + #define MSR_AMD_L7S0_FEATURE_MASK 0xc0011002 + #define MSR_AMD_THRM_FEATURE_MASK 0xc0011003 + #define MSR_K8_FEATURE_MASK 0xc0011004 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa352.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa352.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa352.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa352.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,42 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: only Dom0 can change node owner +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Otherwise we can give quota away to another domain, either causing it to run +out of quota, or in case of Dom0 use unbounded amounts of memory and bypass +the quota system entirely. + +This was fixed in the C version of xenstored in 2006 (c/s db34d2aaa5f5, +predating the XSA process by 5 years). + +It was also fixed in the mirage version of xenstore in 2012, with a unit test +demonstrating the vulnerability: + + https://github.com/mirage/ocaml-xenstore/commit/6b91f3ac46b885d0530a51d57a9b3a57d64923a7 + https://github.com/mirage/ocaml-xenstore/commit/22ee5417c90b8fda905c38de0d534506152eace6 + +but possibly without realising that the vulnerability still affected the +in-tree oxenstored (added c/s f44af660412 in 2010). + +This is XSA-352. 
+ +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml +index 3b05128f1b..5f915f2bbe 100644 +--- a/tools/ocaml/xenstored/store.ml ++++ b/tools/ocaml/xenstored/store.ml +@@ -407,7 +407,8 @@ let setperms store perm path nperms = + | Some node -> + let old_owner = Node.get_owner node in + let new_owner = Perms.Node.get_owner nperms in +- if not ((old_owner = new_owner) || (Perms.Connection.is_dom0 perm)) then Quota.check store.quota new_owner 0; ++ if not ((old_owner = new_owner) || (Perms.Connection.is_dom0 perm)) then ++ raise Define.Permission_denied; + store.root <- path_setperms store perm path nperms; + Quota.del_entry store.quota old_owner; + Quota.add_entry store.quota new_owner diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa353.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa353.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa353.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa353.patch 2022-04-05 13:04:23.000000000 +0100 @@ -0,0 +1,89 @@ +From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= +Subject: tools/ocaml/xenstored: do permission checks on xenstore root +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +This was lacking in a disappointing number of places. + +The xenstore root node is treated differently from all other nodes, because it +doesn't have a parent, and mutation requires changing the parent. + +Unfortunately this lead to open-coding the special case for root into every +single xenstore operation, and out of all the xenstore operations only read +did a permission check when handling the root node. + +This means that an unprivileged guest can: + + * xenstore-chmod / to its liking and subsequently write new arbitrary nodes + there (subject to quota) + * xenstore-rm -r / deletes almost the entire xenstore tree (xenopsd quickly + refills some, but you are left with a broken system) + * DIRECTORY on / lists all children when called through python + bindings (xenstore-ls stops at /local because it tries to list recursively) + * get-perms on / works too, but that is just a minor information leak + +Add the missing permission checks, but this should really be refactored to do +the root handling and permission checks on the node only once from a single +function, instead of getting it wrong nearly everywhere. + +This is XSA-353. 
+ +Signed-off-by: Edwin Török +Acked-by: Christian Lindig +Reviewed-by: Andrew Cooper + +diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml +index f299ec6461..92b6289b5e 100644 +--- a/tools/ocaml/xenstored/store.ml ++++ b/tools/ocaml/xenstored/store.ml +@@ -273,15 +273,17 @@ let path_rm store perm path = + Node.del_childname node name + with Not_found -> + raise Define.Doesnt_exist in +- if path = [] then ++ if path = [] then ( ++ Node.check_perm store.root perm Perms.WRITE; + Node.del_all_children store.root +- else ++ ) else + Path.apply_modify store.root path do_rm + + let path_setperms store perm path perms = +- if path = [] then ++ if path = [] then ( ++ Node.check_perm store.root perm Perms.WRITE; + Node.set_perms store.root perms +- else ++ ) else + let do_setperms node name = + let c = Node.find node name in + Node.check_owner c perm; +@@ -313,9 +315,10 @@ let read store perm path = + + let ls store perm path = + let children = +- if path = [] then +- (Node.get_children store.root) +- else ++ if path = [] then ( ++ Node.check_perm store.root perm Perms.READ; ++ Node.get_children store.root ++ ) else + let do_ls node name = + let cnode = Node.find node name in + Node.check_perm cnode perm Perms.READ; +@@ -324,9 +327,10 @@ let ls store perm path = + List.rev (List.map (fun n -> Symbol.to_string n.Node.name) children) + + let getperms store perm path = +- if path = [] then +- (Node.get_perms store.root) +- else ++ if path = [] then ( ++ Node.check_perm store.root perm Perms.READ; ++ Node.get_perms store.root ++ ) else + let fct n name = + let c = Node.find n name in + Node.check_perm c perm Perms.READ; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa355.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa355.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa355.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa355.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,23 @@ +From: Jan Beulich +Subject: memory: fix off-by-one in XSA-346 change + +The comparison against ARRAY_SIZE() needs to be >= in order to avoid +overrunning the pages[] array. + +This is XSA-355. + +Fixes: 5777a3742d88 ("IOMMU: hold page ref until after deferred TLB flush") +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/common/memory.c ++++ b/xen/common/memory.c +@@ -854,7 +854,7 @@ int xenmem_add_to_physmap(struct domain + ++extra.ppage; + + /* Check for continuation if it's not the last iteration. */ +- if ( (++done > ARRAY_SIZE(pages) && extra.ppage) || ++ if ( (++done >= ARRAY_SIZE(pages) && extra.ppage) || + (xatp->size > done && hypercall_preempt_check()) ) + { + rc = start + done; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa358-4.14.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa358-4.14.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa358-4.14.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa358-4.14.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,54 @@ +From: Jan Beulich +Subject: evtchn/FIFO: re-order and synchronize (with) map_control_block() + +For evtchn_fifo_set_pending()'s check of the control block having been +set to be effective, ordering of respective reads and writes needs to be +ensured: The control block pointer needs to be recorded strictly after +the setting of all the queue heads, and it needs checking strictly +before any uses of them (this latter aspect was already guaranteed). + +This is XSA-358 / CVE-2020-29570. 
+ +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Acked-by: Julien Grall + +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -249,6 +249,10 @@ static void evtchn_fifo_set_pending(stru + goto unlock; + } + ++ /* ++ * This also acts as the read counterpart of the smp_wmb() in ++ * map_control_block(). ++ */ + if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) ) + goto unlock; + +@@ -474,6 +478,7 @@ static int setup_control_block(struct vc + static int map_control_block(struct vcpu *v, uint64_t gfn, uint32_t offset) + { + void *virt; ++ struct evtchn_fifo_control_block *control_block; + unsigned int i; + int rc; + +@@ -484,10 +489,15 @@ static int map_control_block(struct vcpu + if ( rc < 0 ) + return rc; + +- v->evtchn_fifo->control_block = virt + offset; ++ control_block = virt + offset; + + for ( i = 0; i <= EVTCHN_FIFO_PRIORITY_MIN; i++ ) +- v->evtchn_fifo->queue[i].head = &v->evtchn_fifo->control_block->head[i]; ++ v->evtchn_fifo->queue[i].head = &control_block->head[i]; ++ ++ /* All queue heads must have been set before setting the control block. */ ++ smp_wmb(); ++ ++ v->evtchn_fifo->control_block = control_block; + + return 0; + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa359.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa359.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa359.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa359.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,40 @@ +From: Jan Beulich +Subject: evtchn/FIFO: add 2nd smp_rmb() to evtchn_fifo_word_from_port() + +Besides with add_page_to_event_array() the function also needs to +synchronize with evtchn_fifo_init_control() setting both d->evtchn_fifo +and (subsequently) d->evtchn_port_ops. + +This is XSA-359 / CVE-2020-29571. + +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/common/event_fifo.c ++++ b/xen/common/event_fifo.c +@@ -55,6 +55,13 @@ static inline event_word_t *evtchn_fifo_ + { + unsigned int p, w; + ++ /* ++ * Callers aren't required to hold d->event_lock, so we need to synchronize ++ * with evtchn_fifo_init_control() setting d->evtchn_port_ops /after/ ++ * d->evtchn_fifo. ++ */ ++ smp_rmb(); ++ + if ( unlikely(port >= d->evtchn_fifo->num_evtchns) ) + return NULL; + +@@ -606,6 +613,10 @@ int evtchn_fifo_init_control(struct evtc + if ( rc < 0 ) + goto error; + ++ /* ++ * This call, as a side effect, synchronizes with ++ * evtchn_fifo_word_from_port(). ++ */ + rc = map_control_block(v, gfn, offset); + if ( rc < 0 ) + goto error; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa364.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa364.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa364.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa364.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,69 @@ +From dadb5b4b21c904ce59024c686eb1c55be8f46c52 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Thu, 21 Jan 2021 10:16:08 +0000 +Subject: [PATCH] xen/page_alloc: Only flush the page to RAM once we know they + are scrubbed + +At the moment, each page are flushed to RAM just after the allocator +found some free pages. However, this is happening before check if the +page was scrubbed. + +As a consequence, on Arm, a guest may be able to access the old content +of the scrubbed pages if it has cache disabled (default at boot) and +the content didn't reach the Point of Coherency. 
+ +The flush is now moved after we know the content of the page will not +change. This also has the benefit to reduce the amount of work happening +with the heap_lock held. + +This is XSA-364. + +Fixes: 307c3be3ccb2 ("mm: Don't scrub pages while holding heap lock in alloc_heap_pages()") +Signed-off-by: Julien Grall +Reviewed-by: Jan Beulich +--- + xen/common/page_alloc.c | 14 +++++++++----- + 1 file changed, 9 insertions(+), 5 deletions(-) + +diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c +index 02ac1fa613e7..1744e6faa5c4 100644 +--- a/xen/common/page_alloc.c ++++ b/xen/common/page_alloc.c +@@ -924,6 +924,7 @@ static struct page_info *alloc_heap_pages( + bool need_tlbflush = false; + uint32_t tlbflush_timestamp = 0; + unsigned int dirty_cnt = 0; ++ mfn_t mfn; + + /* Make sure there are enough bits in memflags for nodeID. */ + BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t))); +@@ -1022,11 +1023,6 @@ static struct page_info *alloc_heap_pages( + pg[i].u.inuse.type_info = 0; + page_set_owner(&pg[i], NULL); + +- /* Ensure cache and RAM are consistent for platforms where the +- * guest can control its own visibility of/through the cache. +- */ +- flush_page_to_ram(mfn_x(page_to_mfn(&pg[i])), +- !(memflags & MEMF_no_icache_flush)); + } + + spin_unlock(&heap_lock); +@@ -1062,6 +1058,14 @@ static struct page_info *alloc_heap_pages( + if ( need_tlbflush ) + filtered_flush_tlb_mask(tlbflush_timestamp); + ++ /* ++ * Ensure cache and RAM are consistent for platforms where the guest ++ * can control its own visibility of/through the cache. ++ */ ++ mfn = page_to_mfn(pg); ++ for ( i = 0; i < (1U << order); i++ ) ++ flush_page_to_ram(mfn_x(mfn) + i, !(memflags & MEMF_no_icache_flush)); ++ + return pg; + } + +-- +2.17.1 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa366-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa366-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa366-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa366-4.11.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,39 @@ +From: Roger Pau Monne +Subject: x86/ept: fix missing IOMMU flush in atomic_write_ept_entry + +Backport of XSA-321 missed a flush in atomic_write_ept_entry when +level was different than 0. Such omission will undermine the fix for +XSA-321, because page table entries cached in the IOMMU can get out +of sync and contain stale entries. + +Fix this by slightly re-arranging the code to prevent the early return +when level is different that 0. Note that the early return is just an +optimization because foreign entries cannot have level > 0. + +This is XSA-366. + +Reported-by: M. 
Vefa Bicakci +Signed-off-by: Roger Pau Monné +Reviewed-by: Jan Beulich +--- + xen/arch/x86/mm/p2m-ept.c | 7 +------ + 1 file changed, 1 insertion(+), 6 deletions(-) + +diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c +index 036771f43c..fde2f5f7e3 100644 +--- a/xen/arch/x86/mm/p2m-ept.c ++++ b/xen/arch/x86/mm/p2m-ept.c +@@ -53,12 +53,7 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new, + bool_t check_foreign = (new.mfn != entryptr->mfn || + new.sa_p2mt != entryptr->sa_p2mt); + +- if ( level ) +- { +- ASSERT(!is_epte_superpage(&new) || !p2m_is_foreign(new.sa_p2mt)); +- write_atomic(&entryptr->epte, new.epte); +- return 0; +- } ++ ASSERT(!level || !is_epte_superpage(&new) || !p2m_is_foreign(new.sa_p2mt)); + + if ( unlikely(p2m_is_foreign(new.sa_p2mt)) ) + { diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-1.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,118 @@ +From: Jan Beulich +Subject: VT-d: size qinval queue dynamically + +With the present synchronous model, we need two slots for every +operation (the operation itself and a wait descriptor). There can be +one such pair of requests pending per CPU. To ensure that under all +normal circumstances a slot is always available when one is requested, +size the queue ring according to the number of present CPUs. + +This is part of XSA-373 / CVE-2021-28692. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/vtd/iommu.h ++++ b/xen/drivers/passthrough/vtd/iommu.h +@@ -447,17 +447,9 @@ struct qinval_entry { + }q; + }; + +-/* Order of queue invalidation pages(max is 8) */ +-#define QINVAL_PAGE_ORDER 2 +- +-#define QINVAL_ARCH_PAGE_ORDER (QINVAL_PAGE_ORDER + PAGE_SHIFT_4K - PAGE_SHIFT) +-#define QINVAL_ARCH_PAGE_NR ( QINVAL_ARCH_PAGE_ORDER < 0 ? 
\ +- 1 : \ +- 1 << QINVAL_ARCH_PAGE_ORDER ) +- + /* Each entry is 16 bytes, so 2^8 entries per page */ + #define QINVAL_ENTRY_ORDER ( PAGE_SHIFT - 4 ) +-#define QINVAL_ENTRY_NR (1 << (QINVAL_PAGE_ORDER + 8)) ++#define QINVAL_MAX_ENTRY_NR (1u << (7 + QINVAL_ENTRY_ORDER)) + + /* Status data flag */ + #define QINVAL_STAT_INIT 0 +--- a/xen/drivers/passthrough/vtd/qinval.c ++++ b/xen/drivers/passthrough/vtd/qinval.c +@@ -31,6 +31,9 @@ + + #define VTD_QI_TIMEOUT 1 + ++static unsigned int __read_mostly qi_pg_order; ++static unsigned int __read_mostly qi_entry_nr; ++ + static int __must_check invalidate_sync(struct iommu *iommu); + + static void print_qi_regs(struct iommu *iommu) +@@ -55,7 +58,7 @@ static unsigned int qinval_next_index(st + tail >>= QINVAL_INDEX_SHIFT; + + /* (tail+1 == head) indicates a full queue, wait for HW */ +- while ( ( tail + 1 ) % QINVAL_ENTRY_NR == ++ while ( ((tail + 1) & (qi_entry_nr - 1)) == + ( dmar_readq(iommu->reg, DMAR_IQH_REG) >> QINVAL_INDEX_SHIFT ) ) + cpu_relax(); + +@@ -68,7 +71,7 @@ static void qinval_update_qtail(struct i + + /* Need hold register lock when update tail */ + ASSERT( spin_is_locked(&iommu->register_lock) ); +- val = (index + 1) % QINVAL_ENTRY_NR; ++ val = (index + 1) & (qi_entry_nr - 1); + dmar_writeq(iommu->reg, DMAR_IQT_REG, (val << QINVAL_INDEX_SHIFT)); + } + +@@ -417,7 +420,27 @@ int enable_qinval(struct iommu *iommu) + if ( qi_ctrl->qinval_maddr == 0 ) + { + drhd = iommu_to_drhd(iommu); +- qi_ctrl->qinval_maddr = alloc_pgtable_maddr(drhd, QINVAL_ARCH_PAGE_NR); ++ if ( !qi_entry_nr ) ++ { ++ /* ++ * With the present synchronous model, we need two slots for every ++ * operation (the operation itself and a wait descriptor). There ++ * can be one such pair of requests pending per CPU. One extra ++ * entry is needed as the ring is considered full when there's ++ * only one entry left. ++ */ ++ BUILD_BUG_ON(CONFIG_NR_CPUS * 2 >= QINVAL_MAX_ENTRY_NR); ++ qi_pg_order = get_order_from_bytes((num_present_cpus() * 2 + 1) << ++ (PAGE_SHIFT - ++ QINVAL_ENTRY_ORDER)); ++ qi_entry_nr = 1u << (qi_pg_order + QINVAL_ENTRY_ORDER); ++ ++ dprintk(XENLOG_INFO VTDPREFIX, ++ "QI: using %u-entry ring(s)\n", qi_entry_nr); ++ } ++ ++ qi_ctrl->qinval_maddr = ++ alloc_pgtable_maddr(drhd, qi_entry_nr >> QINVAL_ENTRY_ORDER); + if ( qi_ctrl->qinval_maddr == 0 ) + { + dprintk(XENLOG_WARNING VTDPREFIX, +@@ -431,15 +454,16 @@ int enable_qinval(struct iommu *iommu) + + spin_lock_irqsave(&iommu->register_lock, flags); + +- /* Setup Invalidation Queue Address(IQA) register with the +- * address of the page we just allocated. QS field at +- * bits[2:0] to indicate size of queue is one 4KB page. +- * That's 256 entries. Queued Head (IQH) and Queue Tail (IQT) +- * registers are automatically reset to 0 with write +- * to IQA register. ++ /* ++ * Setup Invalidation Queue Address (IQA) register with the address of the ++ * pages we just allocated. The QS field at bits[2:0] indicates the size ++ * (page order) of the queue. ++ * ++ * Queued Head (IQH) and Queue Tail (IQT) registers are automatically ++ * reset to 0 with write to IQA register. 
+ */ + dmar_writeq(iommu->reg, DMAR_IQA_REG, +- qi_ctrl->qinval_maddr | QINVAL_PAGE_ORDER); ++ qi_ctrl->qinval_maddr | qi_pg_order); + + dmar_writeq(iommu->reg, DMAR_IQT_REG, 0); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-2.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,111 @@ +From: Jan Beulich +Subject: AMD/IOMMU: size command buffer dynamically + +With the present synchronous model, we need two slots for every +operation (the operation itself and a wait command). There can be one +such pair of commands pending per CPU. To ensure that under all normal +circumstances a slot is always available when one is requested, size the +command ring according to the number of present CPUs. + +This is part of XSA-373 / CVE-2021-28692. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_cmd.c ++++ b/xen/drivers/passthrough/amd/iommu_cmd.c +@@ -24,8 +24,7 @@ + + static int queue_iommu_command(struct amd_iommu *iommu, u32 cmd[]) + { +- u32 tail, head, *cmd_buffer; +- int i; ++ uint32_t tail, head; + + tail = iommu->cmd_buffer.tail; + if ( ++tail == iommu->cmd_buffer.entries ) +@@ -35,12 +34,9 @@ static int queue_iommu_command(struct am + IOMMU_CMD_BUFFER_HEAD_OFFSET)); + if ( head != tail ) + { +- cmd_buffer = (u32 *)(iommu->cmd_buffer.buffer + +- (iommu->cmd_buffer.tail * +- IOMMU_CMD_BUFFER_ENTRY_SIZE)); +- +- for ( i = 0; i < IOMMU_CMD_BUFFER_U32_PER_ENTRY; i++ ) +- cmd_buffer[i] = cmd[i]; ++ memcpy(iommu->cmd_buffer.buffer + ++ (iommu->cmd_buffer.tail * sizeof(cmd_entry_t)), ++ cmd, sizeof(cmd_entry_t)); + + iommu->cmd_buffer.tail = tail; + return 1; +--- a/xen/drivers/passthrough/amd/iommu_init.c ++++ b/xen/drivers/passthrough/amd/iommu_init.c +@@ -136,7 +136,7 @@ static void register_iommu_cmd_buffer_in + writel(entry, iommu->mmio_base + IOMMU_CMD_BUFFER_BASE_LOW_OFFSET); + + power_of2_entries = get_order_from_bytes(iommu->cmd_buffer.alloc_size) + +- IOMMU_CMD_BUFFER_POWER_OF2_ENTRIES_PER_PAGE; ++ PAGE_SHIFT - IOMMU_CMD_BUFFER_ENTRY_ORDER; + + entry = 0; + iommu_set_addr_hi_to_reg(&entry, addr_hi); +@@ -1000,9 +1000,31 @@ static void * __init allocate_ring_buffe + static void * __init allocate_cmd_buffer(struct amd_iommu *iommu) + { + /* allocate 'command buffer' in power of 2 increments of 4K */ ++ static unsigned int __read_mostly nr_ents; ++ ++ if ( !nr_ents ) ++ { ++ unsigned int order; ++ ++ /* ++ * With the present synchronous model, we need two slots for every ++ * operation (the operation itself and a wait command). There can be ++ * one such pair of requests pending per CPU. One extra entry is ++ * needed as the ring is considered full when there's only one entry ++ * left. 
++ */ ++ BUILD_BUG_ON(CONFIG_NR_CPUS * 2 >= IOMMU_CMD_BUFFER_MAX_ENTRIES); ++ order = get_order_from_bytes((num_present_cpus() * 2 + 1) << ++ IOMMU_CMD_BUFFER_ENTRY_ORDER); ++ nr_ents = 1u << (order + PAGE_SHIFT - IOMMU_CMD_BUFFER_ENTRY_ORDER); ++ ++ AMD_IOMMU_DEBUG("using %u-entry cmd ring(s)\n", nr_ents); ++ } ++ ++ BUILD_BUG_ON(sizeof(cmd_entry_t) != (1u << IOMMU_CMD_BUFFER_ENTRY_ORDER)); ++ + return allocate_ring_buffer(&iommu->cmd_buffer, sizeof(cmd_entry_t), +- IOMMU_CMD_BUFFER_DEFAULT_ENTRIES, +- "Command Buffer"); ++ nr_ents, "Command Buffer"); + } + + static void * __init allocate_event_log(struct amd_iommu *iommu) +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h +@@ -20,9 +20,6 @@ + #ifndef _ASM_X86_64_AMD_IOMMU_DEFS_H + #define _ASM_X86_64_AMD_IOMMU_DEFS_H + +-/* IOMMU Command Buffer entries: in power of 2 increments, minimum of 256 */ +-#define IOMMU_CMD_BUFFER_DEFAULT_ENTRIES 512 +- + /* IOMMU Event Log entries: in power of 2 increments, minimum of 256 */ + #define IOMMU_EVENT_LOG_DEFAULT_ENTRIES 512 + +@@ -185,9 +182,8 @@ + #define IOMMU_CMD_BUFFER_LENGTH_MASK 0x0F000000 + #define IOMMU_CMD_BUFFER_LENGTH_SHIFT 24 + +-#define IOMMU_CMD_BUFFER_ENTRY_SIZE 16 +-#define IOMMU_CMD_BUFFER_POWER_OF2_ENTRIES_PER_PAGE 8 +-#define IOMMU_CMD_BUFFER_U32_PER_ENTRY (IOMMU_CMD_BUFFER_ENTRY_SIZE / 4) ++#define IOMMU_CMD_BUFFER_ENTRY_ORDER 4 ++#define IOMMU_CMD_BUFFER_MAX_ENTRIES (1u << 15) + + #define IOMMU_CMD_OPCODE_MASK 0xF0000000 + #define IOMMU_CMD_OPCODE_SHIFT 28 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-3.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,163 @@ +From: Jan Beulich +Subject: VT-d: eliminate flush related timeouts + +Leaving an in-progress operation pending when it appears to take too +long is problematic: If e.g. a QI command completed later, the write to +the "poll slot" may instead be understood to signal a subsequently +started command's completion. Also our accounting of the timeout period +was actually wrong: We included the time it took for the command to +actually make it to the front of the queue, which could be heavily +affected by guests other than the one for which the flush is being +performed. + +Do away with all timeout detection on all flush related code paths. +Log excessively long processing times (with a progressive threshold) to +have some indication of problems in this area. + +Additionally log (once) if qinval_next_index() didn't immediately find +an available slot. Together with the earlier change sizing the queue(s) +dynamically, we should now have a guarantee that with our fully +synchronous model any demand for slots can actually be satisfied. + +This is part of XSA-373 / CVE-2021-28692. 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/vtd/dmar.h ++++ b/xen/drivers/passthrough/vtd/dmar.h +@@ -127,6 +127,34 @@ do { + } \ + } while (0) + ++#define IOMMU_FLUSH_WAIT(what, iommu, offset, op, cond, sts) \ ++do { \ ++ static unsigned int __read_mostly threshold = 1; \ ++ s_time_t start = NOW(); \ ++ s_time_t timeout = start + DMAR_OPERATION_TIMEOUT * threshold; \ ++ \ ++ for ( ; ; ) \ ++ { \ ++ sts = op(iommu->reg, offset); \ ++ if ( cond ) \ ++ break; \ ++ if ( timeout && NOW() > timeout ) \ ++ { \ ++ threshold |= threshold << 1; \ ++ printk(XENLOG_WARNING VTDPREFIX \ ++ " IOMMU#%u: %s flush taking too long\n", \ ++ iommu->index, what); \ ++ timeout = 0; \ ++ } \ ++ cpu_relax(); \ ++ } \ ++ \ ++ if ( !timeout ) \ ++ printk(XENLOG_WARNING VTDPREFIX \ ++ " IOMMU#%u: %s flush took %lums\n", \ ++ iommu->index, what, (NOW() - start) / 10000000); \ ++} while ( false ) ++ + int vtd_hw_check(void); + void disable_pmr(struct iommu *iommu); + int is_igd_drhd(struct acpi_drhd_unit *drhd); +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -357,8 +357,8 @@ static void iommu_flush_write_buffer(str + dmar_writel(iommu->reg, DMAR_GCMD_REG, val | DMA_GCMD_WBF); + + /* Make sure hardware complete it */ +- IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, dmar_readl, +- !(val & DMA_GSTS_WBFS), val); ++ IOMMU_FLUSH_WAIT("write buffer", iommu, DMAR_GSTS_REG, dmar_readl, ++ !(val & DMA_GSTS_WBFS), val); + + spin_unlock_irqrestore(&iommu->register_lock, flags); + } +@@ -408,8 +408,8 @@ static int __must_check flush_context_re + dmar_writeq(iommu->reg, DMAR_CCMD_REG, val); + + /* Make sure hardware complete it */ +- IOMMU_WAIT_OP(iommu, DMAR_CCMD_REG, dmar_readq, +- !(val & DMA_CCMD_ICC), val); ++ IOMMU_FLUSH_WAIT("context", iommu, DMAR_CCMD_REG, dmar_readq, ++ !(val & DMA_CCMD_ICC), val); + + spin_unlock_irqrestore(&iommu->register_lock, flags); + /* flush context entry will implicitly flush write buffer */ +@@ -491,8 +491,8 @@ static int __must_check flush_iotlb_reg( + dmar_writeq(iommu->reg, tlb_offset + 8, val); + + /* Make sure hardware complete it */ +- IOMMU_WAIT_OP(iommu, (tlb_offset + 8), dmar_readq, +- !(val & DMA_TLB_IVT), val); ++ IOMMU_FLUSH_WAIT("iotlb", iommu, (tlb_offset + 8), dmar_readq, ++ !(val & DMA_TLB_IVT), val); + spin_unlock_irqrestore(&iommu->register_lock, flags); + + /* check IOTLB invalidation granularity */ +--- a/xen/drivers/passthrough/vtd/qinval.c ++++ b/xen/drivers/passthrough/vtd/qinval.c +@@ -29,8 +29,6 @@ + #include "extern.h" + #include "../ats.h" + +-#define VTD_QI_TIMEOUT 1 +- + static unsigned int __read_mostly qi_pg_order; + static unsigned int __read_mostly qi_entry_nr; + +@@ -60,7 +58,11 @@ static unsigned int qinval_next_index(st + /* (tail+1 == head) indicates a full queue, wait for HW */ + while ( ((tail + 1) & (qi_entry_nr - 1)) == + ( dmar_readq(iommu->reg, DMAR_IQH_REG) >> QINVAL_INDEX_SHIFT ) ) ++ { ++ printk_once(XENLOG_ERR VTDPREFIX " IOMMU#%u: no QI slot available\n", ++ iommu->index); + cpu_relax(); ++ } + + return tail; + } +@@ -180,23 +182,32 @@ static int __must_check queue_invalidate + /* Now we don't support interrupt method */ + if ( sw ) + { +- s_time_t timeout; +- +- /* In case all wait descriptor writes to same addr with same data */ +- timeout = NOW() + MILLISECS(flush_dev_iotlb ? +- iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT); ++ static unsigned int __read_mostly threshold = 1; ++ s_time_t start = NOW(); ++ s_time_t timeout = start + (flush_dev_iotlb ++ ? 
iommu_dev_iotlb_timeout ++ : 100) * MILLISECS(threshold); + + while ( ACCESS_ONCE(*this_poll_slot) != QINVAL_STAT_DONE ) + { +- if ( NOW() > timeout ) ++ if ( timeout && NOW() > timeout ) + { +- print_qi_regs(iommu); ++ threshold |= threshold << 1; + printk(XENLOG_WARNING VTDPREFIX +- " Queue invalidate wait descriptor timed out\n"); +- return -ETIMEDOUT; ++ " IOMMU#%u: QI%s wait descriptor taking too long\n", ++ iommu->index, flush_dev_iotlb ? " dev" : ""); ++ print_qi_regs(iommu); ++ timeout = 0; + } + cpu_relax(); + } ++ ++ if ( !timeout ) ++ printk(XENLOG_WARNING VTDPREFIX ++ " IOMMU#%u: QI%s wait descriptor took %lums\n", ++ iommu->index, flush_dev_iotlb ? " dev" : "", ++ (NOW() - start) / 10000000); ++ + return 0; + } + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-4.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-4.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-4.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-4.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,86 @@ +From: Jan Beulich +Subject: AMD/IOMMU: wait for command slot to be available + +No caller cared about send_iommu_command() indicating unavailability of +a slot. Hence if a sufficient number prior commands timed out, we did +blindly assume that the requested command was submitted to the IOMMU +when really it wasn't. This could mean both a hanging system (waiting +for a command to complete that was never seen by the IOMMU) or blindly +propagating success back to callers, making them believe they're fine +to e.g. free previously unmapped pages. + +Fold the three involved functions into one, add spin waiting for an +available slot along the lines of VT-d's qinval_next_index(), and as a +consequence drop all error indicator return types/values. + +This is part of XSA-373 / CVE-2021-28692. 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_cmd.c ++++ b/xen/drivers/passthrough/amd/iommu_cmd.c +@@ -22,48 +22,36 @@ + #include + #include "../ats.h" + +-static int queue_iommu_command(struct amd_iommu *iommu, u32 cmd[]) ++static void send_iommu_command(struct amd_iommu *iommu, ++ const uint32_t cmd[4]) + { +- uint32_t tail, head; ++ uint32_t tail; + + tail = iommu->cmd_buffer.tail; + if ( ++tail == iommu->cmd_buffer.entries ) + tail = 0; + +- head = iommu_get_rb_pointer(readl(iommu->mmio_base + +- IOMMU_CMD_BUFFER_HEAD_OFFSET)); +- if ( head != tail ) ++ while ( tail == iommu_get_rb_pointer(readl(iommu->mmio_base + ++ IOMMU_CMD_BUFFER_HEAD_OFFSET)) ) + { +- memcpy(iommu->cmd_buffer.buffer + +- (iommu->cmd_buffer.tail * sizeof(cmd_entry_t)), +- cmd, sizeof(cmd_entry_t)); +- +- iommu->cmd_buffer.tail = tail; +- return 1; ++ printk_once(XENLOG_ERR ++ "AMD IOMMU %04x:%02x:%02x.%u: no cmd slot available\n", ++ iommu->seg, PCI_BUS(iommu->bdf), ++ PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf)); ++ cpu_relax(); + } + +- return 0; +-} ++ memcpy(iommu->cmd_buffer.buffer + ++ (iommu->cmd_buffer.tail * sizeof(cmd_entry_t)), ++ cmd, sizeof(cmd_entry_t)); + +-static void commit_iommu_command_buffer(struct amd_iommu *iommu) +-{ +- u32 tail = 0; ++ iommu->cmd_buffer.tail = tail; + ++ tail = 0; + iommu_set_rb_pointer(&tail, iommu->cmd_buffer.tail); + writel(tail, iommu->mmio_base+IOMMU_CMD_BUFFER_TAIL_OFFSET); + } + +-int send_iommu_command(struct amd_iommu *iommu, u32 cmd[]) +-{ +- if ( queue_iommu_command(iommu, cmd) ) +- { +- commit_iommu_command_buffer(iommu); +- return 1; +- } +- +- return 0; +-} +- + static void flush_command_buffer(struct amd_iommu *iommu) + { + u32 cmd[4], status; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-5.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-5.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-5.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa373-4.11-5.patch 2022-04-05 13:04:24.000000000 +0100 @@ -0,0 +1,145 @@ +From: Jan Beulich +Subject: AMD/IOMMU: drop command completion timeout + +First and foremost - such timeouts were not signaled to callers, making +them believe they're fine to e.g. free previously unmapped pages. + +Mirror VT-d's behavior: A fixed number of loop iterations is not a +suitable way to detect timeouts in an environment (CPU and bus speeds) +independent manner anyway. Furthermore, leaving an in-progress operation +pending when it appears to take too long is problematic: If a command +completed later, the signaling of its completion may instead be +understood to signal a subsequently started command's completion. + +Log excessively long processing times (with a progressive threshold) to +have some indication of problems in this area. Allow callers to specify +a non-default timeout bias for this logging, using the same values as +VT-d does, which in particular means a (by default) much larger value +for device IO TLB invalidation. + +This is part of XSA-373 / CVE-2021-28692. 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_cmd.c ++++ b/xen/drivers/passthrough/amd/iommu_cmd.c +@@ -52,10 +52,12 @@ static void send_iommu_command(struct am + writel(tail, iommu->mmio_base+IOMMU_CMD_BUFFER_TAIL_OFFSET); + } + +-static void flush_command_buffer(struct amd_iommu *iommu) ++static void flush_command_buffer(struct amd_iommu *iommu, ++ unsigned int timeout_base) + { +- u32 cmd[4], status; +- int loop_count, comp_wait; ++ uint32_t cmd[4]; ++ s_time_t start, timeout; ++ static unsigned int __read_mostly threshold = 1; + + /* RW1C 'ComWaitInt' in status register */ + writel(IOMMU_STATUS_COMP_WAIT_INT_MASK, +@@ -71,24 +73,31 @@ static void flush_command_buffer(struct + IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]); + send_iommu_command(iommu, cmd); + +- /* Make loop_count long enough for polling completion wait bit */ +- loop_count = 1000; +- do { +- status = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET); +- comp_wait = get_field_from_reg_u32(status, +- IOMMU_STATUS_COMP_WAIT_INT_MASK, +- IOMMU_STATUS_COMP_WAIT_INT_SHIFT); +- --loop_count; +- } while ( !comp_wait && loop_count ); +- +- if ( comp_wait ) ++ start = NOW(); ++ timeout = start + (timeout_base ?: 100) * MILLISECS(threshold); ++ while ( !(readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET) & ++ IOMMU_STATUS_COMP_WAIT_INT_MASK) ) + { +- /* RW1C 'ComWaitInt' in status register */ +- writel(IOMMU_STATUS_COMP_WAIT_INT_MASK, +- iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET); +- return; ++ if ( timeout && NOW() > timeout ) ++ { ++ threshold |= threshold << 1; ++ printk(XENLOG_WARNING ++ "AMD IOMMU %04x:%02x:%02x.%u: %scompletion wait taking too long\n", ++ iommu->seg, PCI_BUS(iommu->bdf), ++ PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf), ++ timeout_base ? "iotlb " : ""); ++ timeout = 0; ++ } ++ cpu_relax(); + } +- AMD_IOMMU_DEBUG("Warning: ComWaitInt bit did not assert!\n"); ++ ++ if ( !timeout ) ++ printk(XENLOG_WARNING ++ "AMD IOMMU %04x:%02x:%02x.%u: %scompletion wait took %lums\n", ++ iommu->seg, PCI_BUS(iommu->bdf), ++ PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf), ++ timeout_base ? 
"iotlb " : "", ++ (NOW() - start) / 10000000); + } + + /* Build low level iommu command messages */ +@@ -300,7 +309,7 @@ void amd_iommu_flush_iotlb(u8 devfn, con + /* send INVALIDATE_IOTLB_PAGES command */ + spin_lock_irqsave(&iommu->lock, flags); + invalidate_iotlb_pages(iommu, maxpend, 0, queueid, gaddr, req_id, order); +- flush_command_buffer(iommu); ++ flush_command_buffer(iommu, iommu_dev_iotlb_timeout); + spin_unlock_irqrestore(&iommu->lock, flags); + } + +@@ -337,7 +346,7 @@ static void _amd_iommu_flush_pages(struc + { + spin_lock_irqsave(&iommu->lock, flags); + invalidate_iommu_pages(iommu, gaddr, dom_id, order); +- flush_command_buffer(iommu); ++ flush_command_buffer(iommu, 0); + spin_unlock_irqrestore(&iommu->lock, flags); + } + +@@ -361,7 +370,7 @@ void amd_iommu_flush_device(struct amd_i + ASSERT( spin_is_locked(&iommu->lock) ); + + invalidate_dev_table_entry(iommu, bdf); +- flush_command_buffer(iommu); ++ flush_command_buffer(iommu, 0); + } + + void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf) +@@ -369,7 +378,7 @@ void amd_iommu_flush_intremap(struct amd + ASSERT( spin_is_locked(&iommu->lock) ); + + invalidate_interrupt_table(iommu, bdf); +- flush_command_buffer(iommu); ++ flush_command_buffer(iommu, 0); + } + + void amd_iommu_flush_all_caches(struct amd_iommu *iommu) +@@ -377,7 +386,7 @@ void amd_iommu_flush_all_caches(struct a + ASSERT( spin_is_locked(&iommu->lock) ); + + invalidate_iommu_all(iommu); +- flush_command_buffer(iommu); ++ flush_command_buffer(iommu, 0); + } + + void amd_iommu_send_guest_cmd(struct amd_iommu *iommu, u32 cmd[]) +@@ -387,7 +396,8 @@ void amd_iommu_send_guest_cmd(struct amd + spin_lock_irqsave(&iommu->lock, flags); + + send_iommu_command(iommu, cmd); +- flush_command_buffer(iommu); ++ /* TBD: Timeout selection may require peeking into cmd[]. */ ++ flush_command_buffer(iommu, 0); + + spin_unlock_irqrestore(&iommu->lock, flags); + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa375-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa375-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa375-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa375-4.12.patch 2022-04-05 13:04:25.000000000 +0100 @@ -0,0 +1,50 @@ +From: Andrew Cooper +Subject: x86/spec-ctrl: Protect against Speculative Code Store Bypass + +Modern x86 processors have far-better-than-architecturally-guaranteed self +modifying code detection. Typically, when a write hits an instruction in +flight, a Machine Clear occurs to flush stale content in the frontend and +backend. + +For self modifying code, before a write which hits an instruction in flight +retires, the frontend can speculatively decode and execute the old instruction +stream. Speculation of this form can suffer from type confusion in registers, +and potentially leak data. + +Furthermore, updates are typically byte-wise, rather than atomic. Depending +on timing, speculation can race ahead multiple times between individual +writes, and execute the transiently-malformed instruction stream. + +Xen has stubs which are used in certain cases for emulation purposes. Inhibit +speculation between updating the stub and executing it. + +This is XSA-375 / CVE-2021-0089. 
+ +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c +index 6dc4f92a84..59c15ca0e7 100644 +--- a/xen/arch/x86/pv/emul-priv-op.c ++++ b/xen/arch/x86/pv/emul-priv-op.c +@@ -97,6 +97,8 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode, + BUILD_BUG_ON(STUB_BUF_SIZE / 2 < MAX(9, /* Default emul stub */ + 5 + IOEMUL_QUIRK_STUB_BYTES)); + ++ asm volatile ( "lfence" ::: "memory" ); /* SCSB */ ++ + /* Handy function-typed pointer to the stub. */ + return (void *)stub_va; + } +diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c +index bba6dd0187..cd123492a6 100644 +--- a/xen/arch/x86/x86_emulate/x86_emulate.c ++++ b/xen/arch/x86/x86_emulate/x86_emulate.c +@@ -1093,6 +1093,7 @@ static inline int mkec(uint8_t e, int32_t ec, ...) + # define invoke_stub(pre, post, constraints...) do { \ + stub_exn.info = (union stub_exception_token) { .raw = ~0 }; \ + stub_exn.line = __LINE__; /* Utility outweighs livepatching cost */ \ ++ asm volatile ( "lfence" ::: "memory" ); /* SCSB */ \ + asm volatile ( pre "\n\tINDIRECT_CALL %[stub]\n\t" post "\n" \ + ".Lret%=:\n\t" \ + ".pushsection .fixup,\"ax\"\n" \ diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa377-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa377-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa377-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa377-4.11.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,27 @@ +From: Andrew Cooper +Subject: x86/spec-ctrl: Mitigate TAA after S3 resume + +The user chosen setting for MSR_TSX_CTRL needs restoring after S3. + +All APs get the correct setting via start_secondary(), but the BSP was missed +out. + +This is XSA-377 / CVE-2021-28690. + +Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel") +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c +index 30e1bd5cd3..451cba622c 100644 +--- a/xen/arch/x86/acpi/power.c ++++ b/xen/arch/x86/acpi/power.c +@@ -259,6 +259,8 @@ static int enter_state(u32 state) + + microcode_resume_cpu(0); + ++ tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */ ++ + if ( !recheck_cpu_features(0) ) + panic("Missing previously available feature(s)."); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0a.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0a.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0a.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0a.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,75 @@ +From: Jan Beulich +Subject: x86/p2m: fix PoD accounting in guest_physmap_add_entry() + +The initial observation was that the mfn_valid() check comes too late: +Neither mfn_add() nor mfn_to_page() (let alone de-referencing the +result of the latter) are valid for MFNs failing this check. Move it up +and - noticing that there's no caller doing so - also add an assertion +that this should never produce "false" here. + +In turn this would have meant that the "else" to that if() could now go +away, which didn't seem right at all. And indeed, considering callers +like memory_exchange() or various grant table functions, the PoD +accounting should have been outside of that if() from the very +beginning. 
+ +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -794,6 +794,12 @@ guest_physmap_add_entry(struct domain *d + if ( p2m_is_foreign(t) ) + return -EINVAL; + ++ if ( !mfn_valid(mfn) ) ++ { ++ ASSERT_UNREACHABLE(); ++ return -EINVAL; ++ } ++ + p2m_lock(p2m); + + P2M_DEBUG("adding gfn=%#lx mfn=%#lx\n", gfn_x(gfn), mfn_x(mfn)); +@@ -894,12 +900,13 @@ guest_physmap_add_entry(struct domain *d + } + + /* Now, actually do the two-way mapping */ +- if ( mfn_valid(mfn) ) ++ rc = p2m_set_entry(p2m, gfn, mfn, page_order, t, p2m->default_access); ++ if ( rc == 0 ) + { +- rc = p2m_set_entry(p2m, gfn, mfn, page_order, t, +- p2m->default_access); +- if ( rc ) +- goto out; /* Failed to update p2m, bail without updating m2p. */ ++ pod_lock(p2m); ++ p2m->pod.entry_count -= pod_count; ++ BUG_ON(p2m->pod.entry_count < 0); ++ pod_unlock(p2m); + + if ( !p2m_is_grant(t) ) + { +@@ -908,22 +915,7 @@ guest_physmap_add_entry(struct domain *d + gfn_x(gfn_add(gfn, i))); + } + } +- else +- { +- gdprintk(XENLOG_WARNING, "Adding bad mfn to p2m map (%#lx -> %#lx)\n", +- gfn_x(gfn), mfn_x(mfn)); +- rc = p2m_set_entry(p2m, gfn, INVALID_MFN, page_order, +- p2m_invalid, p2m->default_access); +- if ( rc == 0 ) +- { +- pod_lock(p2m); +- p2m->pod.entry_count -= pod_count; +- BUG_ON(p2m->pod.entry_count < 0); +- pod_unlock(p2m); +- } +- } + +-out: + p2m_unlock(p2m); + + return rc; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0b.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0b.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0b.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0b.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,60 @@ +From: Jan Beulich +Subject: x86/p2m: don't ignore p2m_remove_page()'s return value + +It's not very nice to return from guest_physmap_add_entry() after +perhaps already having made some changes to the P2M, but this is pre- +existing practice in the function, and imo better than ignoring errors. + +Take the liberty and replace an mfn_add() instance with a local variable +already holding the result (as proven by the check immediately ahead). + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Acked-by: Andrew Cooper + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -702,8 +702,7 @@ void p2m_final_teardown(struct domain *d + p2m_teardown_hostp2m(d); + } + +- +-static int ++static int __must_check + p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn_l, unsigned long mfn, + unsigned int page_order) + { +@@ -892,9 +891,9 @@ guest_physmap_add_entry(struct domain *d + ASSERT(mfn_valid(omfn)); + P2M_DEBUG("old gfn=%#lx -> mfn %#lx\n", + gfn_x(ogfn) , mfn_x(omfn)); +- if ( mfn_eq(omfn, mfn_add(mfn, i)) ) +- p2m_remove_page(p2m, gfn_x(ogfn), mfn_x(mfn_add(mfn, i)), +- 0); ++ if ( mfn_eq(omfn, mfn_add(mfn, i)) && ++ (rc = p2m_remove_page(p2m, gfn_x(ogfn), mfn_x(omfn), 0)) ) ++ goto out; + } + } + } +@@ -916,6 +915,7 @@ guest_physmap_add_entry(struct domain *d + } + } + ++ out: + p2m_unlock(p2m); + + return rc; +@@ -2385,9 +2385,9 @@ int p2m_change_altp2m_gfn(struct domain + + if ( gfn_eq(new_gfn, INVALID_GFN) ) + { +- if ( mfn_valid(mfn) ) +- p2m_remove_page(ap2m, gfn_x(old_gfn), mfn_x(mfn), PAGE_ORDER_4K); +- rc = 0; ++ rc = mfn_valid(mfn) ++ ? 
p2m_remove_page(ap2m, gfn_x(old_gfn), mfn_x(mfn), PAGE_ORDER_4K) ++ : 0; + goto out; + } + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0c.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0c.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0c.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-0c.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,57 @@ +From: Jan Beulich +Subject: x86/p2m: don't assert that the passed in MFN matches for a remove + +guest_physmap_remove_page() gets handed an MFN from the outside, yet +takes the necessary lock to prevent further changes to the GFN <-> MFN +mapping itself. While some callers, in particular guest_remove_page() +(by way of having called get_gfn_query()), hold the GFN lock already, +various others (most notably perhaps the 2nd instance in +xenmem_add_to_physmap_one()) don't. While it also is an option to fix +all the callers, deal with the issue in p2m_remove_page() instead: +Replace the ASSERT() by a conditional and split the loop into two, such +that all checking gets done before any modification would occur. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Acked-by: Andrew Cooper + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -708,7 +708,6 @@ p2m_remove_page(struct p2m_domain *p2m, + { + unsigned long i; + gfn_t gfn = _gfn(gfn_l); +- mfn_t mfn_return; + p2m_type_t t; + p2m_access_t a; + +@@ -719,15 +718,26 @@ p2m_remove_page(struct p2m_domain *p2m, + ASSERT(gfn_locked_by_me(p2m, gfn)); + P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn_l, mfn); + ++ for ( i = 0; i < (1UL << page_order); ) ++ { ++ unsigned int cur_order; ++ mfn_t mfn_return = p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, ++ &cur_order, NULL); ++ ++ if ( p2m_is_valid(t) && ++ (!mfn_valid(_mfn(mfn)) || mfn + i != mfn_x(mfn_return)) ) ++ return -EILSEQ; ++ ++ i += (1UL << cur_order) - ((gfn_l + i) & ((1UL << cur_order) - 1)); ++ } ++ + if ( mfn_valid(_mfn(mfn)) ) + { + for ( i = 0; i < (1UL << page_order); i++ ) + { +- mfn_return = p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, +- NULL, NULL); ++ p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, NULL, NULL); + if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) ) + set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY); +- ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) ); + } + } + return p2m_set_entry(p2m, gfn, INVALID_MFN, page_order, p2m_invalid, diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-1.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,142 @@ +From: Jan Beulich +Subject: AMD/IOMMU: correct global exclusion range extending + +Besides unity mapping regions, the AMD IOMMU spec also provides for +exclusion ranges (areas of memory not to be subject to DMA translation) +to be specified by firmware in the ACPI tables. The spec does not put +any constraints on the number of such regions. + +Blindly assuming all addresses between any two such ranges should also +be excluded can't be right. 
Since hardware has room for just a single +such range (comprised of the Exclusion Base Register and the Exclusion +Range Limit Register), combine only adjacent or overlapping regions (for +now; this may require further adjustment in case table entries aren't +sorted by address) with matching exclusion_allow_all settings. This +requires bubbling up error indicators, such that IOMMU init can be +failed when concatenation wasn't possible. + +Furthermore, since the exclusion range specified in IOMMU registers +implies R/W access, reject requests asking for less permissions (this +will be brought closer to the spec by a subsequent change). + +This is part of XSA-378 / CVE-2021-28695. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_acpi.c ++++ b/xen/drivers/passthrough/amd/iommu_acpi.c +@@ -98,12 +98,21 @@ static struct amd_iommu * __init find_io + return NULL; + } + +-static void __init reserve_iommu_exclusion_range( +- struct amd_iommu *iommu, uint64_t base, uint64_t limit) ++static int __init reserve_iommu_exclusion_range( ++ struct amd_iommu *iommu, uint64_t base, uint64_t limit, ++ bool all, bool iw, bool ir) + { ++ if ( !ir || !iw ) ++ return -EPERM; ++ + /* need to extend exclusion range? */ + if ( iommu->exclusion_enable ) + { ++ if ( iommu->exclusion_limit + PAGE_SIZE < base || ++ limit + PAGE_SIZE < iommu->exclusion_base || ++ iommu->exclusion_allow_all != all ) ++ return -EBUSY; ++ + if ( iommu->exclusion_base < base ) + base = iommu->exclusion_base; + if ( iommu->exclusion_limit > limit ) +@@ -111,16 +120,11 @@ static void __init reserve_iommu_exclusi + } + + iommu->exclusion_enable = IOMMU_CONTROL_ENABLED; ++ iommu->exclusion_allow_all = all; + iommu->exclusion_base = base; + iommu->exclusion_limit = limit; +-} + +-static void __init reserve_iommu_exclusion_range_all( +- struct amd_iommu *iommu, +- unsigned long base, unsigned long limit) +-{ +- reserve_iommu_exclusion_range(iommu, base, limit); +- iommu->exclusion_allow_all = IOMMU_CONTROL_ENABLED; ++ return 0; + } + + static void __init reserve_unity_map_for_device( +@@ -158,6 +162,7 @@ static int __init register_exclusion_ran + unsigned long range_top, iommu_top, length; + struct amd_iommu *iommu; + unsigned int bdf; ++ int rc = 0; + + /* is part of exclusion range inside of IOMMU virtual address space? 
*/ + /* note: 'limit' parameter is assumed to be page-aligned */ +@@ -179,10 +184,15 @@ static int __init register_exclusion_ran + if ( limit >= iommu_top ) + { + for_each_amd_iommu( iommu ) +- reserve_iommu_exclusion_range_all(iommu, base, limit); ++ { ++ rc = reserve_iommu_exclusion_range(iommu, base, limit, ++ true /* all */, iw, ir); ++ if ( rc ) ++ break; ++ } + } + +- return 0; ++ return rc; + } + + static int __init register_exclusion_range_for_device( +@@ -193,6 +203,7 @@ static int __init register_exclusion_ran + unsigned long range_top, iommu_top, length; + struct amd_iommu *iommu; + u16 req; ++ int rc = 0; + + iommu = find_iommu_for_device(seg, bdf); + if ( !iommu ) +@@ -222,12 +233,13 @@ static int __init register_exclusion_ran + /* register IOMMU exclusion range settings for device */ + if ( limit >= iommu_top ) + { +- reserve_iommu_exclusion_range(iommu, base, limit); ++ rc = reserve_iommu_exclusion_range(iommu, base, limit, ++ false /* all */, iw, ir); + ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_ENABLED; + ivrs_mappings[req].dte_allow_exclusion = IOMMU_CONTROL_ENABLED; + } + +- return 0; ++ return rc; + } + + static int __init register_exclusion_range_for_iommu_devices( +@@ -237,6 +249,7 @@ static int __init register_exclusion_ran + unsigned long range_top, iommu_top, length; + unsigned int bdf; + u16 req; ++ int rc = 0; + + /* is part of exclusion range inside of IOMMU virtual address space? */ + /* note: 'limit' parameter is assumed to be page-aligned */ +@@ -267,8 +280,10 @@ static int __init register_exclusion_ran + + /* register IOMMU exclusion range settings */ + if ( limit >= iommu_top ) +- reserve_iommu_exclusion_range_all(iommu, base, limit); +- return 0; ++ rc = reserve_iommu_exclusion_range(iommu, base, limit, ++ true /* all */, iw, ir); ++ ++ return rc; + } + + static int __init parse_ivmd_device_select( diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-2.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,223 @@ +From: Jan Beulich +Subject: AMD/IOMMU: correct device unity map handling + +Blindly assuming all addresses between any two such ranges, specified by +firmware in the ACPI tables, should also be unity-mapped can't be right. +Nor can it be correct to merge ranges with differing permissions. Track +ranges individually; don't merge at all, but check for overlaps instead. +This requires bubbling up error indicators, such that IOMMU init can be +failed when allocation of a new tracking struct wasn't possible, or an +overlap was detected. + +At this occasion also stop ignoring +amd_iommu_reserve_domain_unity_map()'s return value. + +This is part of XSA-378 / CVE-2021-28695. 
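+
+As a minimal illustrative sketch (not taken from the patch itself; the
+names below are invented for illustration), the interval logic relied on
+here treats each tracked entry as a half-open byte range
+[base, base + length): two entries conflict when they share at least one
+byte, unless they are exact duplicates with identical permissions:
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    struct range { uint64_t base, length; bool read, write; };
+
+    /* Overlap test for half-open ranges [base, base + length). */
+    static bool ranges_overlap(const struct range *a, const struct range *b)
+    {
+        return a->base + a->length > b->base &&
+               b->base + b->length > a->base;
+    }
+
+    /* Exact duplicate: same span and same read/write permissions. */
+    static bool ranges_identical(const struct range *a, const struct range *b)
+    {
+        return a->base == b->base && a->length == b->length &&
+               a->read == b->read && a->write == b->write;
+    }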
+ +Signed-off-by: Jan Beulich +Reviewed-by: George Dunlap +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_acpi.c ++++ b/xen/drivers/passthrough/amd/iommu_acpi.c +@@ -127,32 +127,48 @@ static int __init reserve_iommu_exclusio + return 0; + } + +-static void __init reserve_unity_map_for_device( +- u16 seg, u16 bdf, unsigned long base, +- unsigned long length, u8 iw, u8 ir) ++static int __init reserve_unity_map_for_device( ++ uint16_t seg, uint16_t bdf, unsigned long base, ++ unsigned long length, bool iw, bool ir) + { + struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg); +- unsigned long old_top, new_top; ++ struct ivrs_unity_map *unity_map = ivrs_mappings[bdf].unity_map; + +- /* need to extend unity-mapped range? */ +- if ( ivrs_mappings[bdf].unity_map_enable ) ++ /* Check for overlaps. */ ++ for ( ; unity_map; unity_map = unity_map->next ) + { +- old_top = ivrs_mappings[bdf].addr_range_start + +- ivrs_mappings[bdf].addr_range_length; +- new_top = base + length; +- if ( old_top > new_top ) +- new_top = old_top; +- if ( ivrs_mappings[bdf].addr_range_start < base ) +- base = ivrs_mappings[bdf].addr_range_start; +- length = new_top - base; +- } +- +- /* extend r/w permissioms and keep aggregate */ +- ivrs_mappings[bdf].write_permission = iw; +- ivrs_mappings[bdf].read_permission = ir; +- ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_ENABLED; +- ivrs_mappings[bdf].addr_range_start = base; +- ivrs_mappings[bdf].addr_range_length = length; ++ /* ++ * Exact matches are okay. This can in particular happen when ++ * register_exclusion_range_for_device() calls here twice for the ++ * same (s,b,d,f). ++ */ ++ if ( base == unity_map->addr && length == unity_map->length && ++ ir == unity_map->read && iw == unity_map->write ) ++ return 0; ++ ++ if ( unity_map->addr + unity_map->length > base && ++ base + length > unity_map->addr ) ++ { ++ AMD_IOMMU_DEBUG("IVMD Error: overlap [%lx,%lx) vs [%lx,%lx)\n", ++ base, base + length, unity_map->addr, ++ unity_map->addr + unity_map->length); ++ return -EPERM; ++ } ++ } ++ ++ /* Populate and insert a new unity map. 
*/ ++ unity_map = xmalloc(struct ivrs_unity_map); ++ if ( !unity_map ) ++ return -ENOMEM; ++ ++ unity_map->read = ir; ++ unity_map->write = iw; ++ unity_map->addr = base; ++ unity_map->length = length; ++ unity_map->next = ivrs_mappings[bdf].unity_map; ++ ivrs_mappings[bdf].unity_map = unity_map; ++ ++ return 0; + } + + static int __init register_exclusion_range_for_all_devices( +@@ -175,13 +191,13 @@ static int __init register_exclusion_ran + length = range_top - base; + /* reserve r/w unity-mapped page entries for devices */ + /* note: these entries are part of the exclusion range */ +- for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ ) +- reserve_unity_map_for_device(seg, bdf, base, length, iw, ir); ++ for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) ++ rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir); + /* push 'base' just outside of virtual address space */ + base = iommu_top; + } + /* register IOMMU exclusion range settings */ +- if ( limit >= iommu_top ) ++ if ( !rc && limit >= iommu_top ) + { + for_each_amd_iommu( iommu ) + { +@@ -223,15 +239,15 @@ static int __init register_exclusion_ran + length = range_top - base; + /* reserve unity-mapped page entries for device */ + /* note: these entries are part of the exclusion range */ +- reserve_unity_map_for_device(seg, bdf, base, length, iw, ir); +- reserve_unity_map_for_device(seg, req, base, length, iw, ir); ++ rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir) ?: ++ reserve_unity_map_for_device(seg, req, base, length, iw, ir); + + /* push 'base' just outside of virtual address space */ + base = iommu_top; + } + + /* register IOMMU exclusion range settings for device */ +- if ( limit >= iommu_top ) ++ if ( !rc && limit >= iommu_top ) + { + rc = reserve_iommu_exclusion_range(iommu, base, limit, + false /* all */, iw, ir); +@@ -262,15 +278,15 @@ static int __init register_exclusion_ran + length = range_top - base; + /* reserve r/w unity-mapped page entries for devices */ + /* note: these entries are part of the exclusion range */ +- for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ ) ++ for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) + { + if ( iommu == find_iommu_for_device(iommu->seg, bdf) ) + { +- reserve_unity_map_for_device(iommu->seg, bdf, base, length, +- iw, ir); + req = get_ivrs_mappings(iommu->seg)[bdf].dte_requestor_id; +- reserve_unity_map_for_device(iommu->seg, req, base, length, +- iw, ir); ++ rc = reserve_unity_map_for_device(iommu->seg, bdf, base, length, ++ iw, ir) ?: ++ reserve_unity_map_for_device(iommu->seg, req, base, length, ++ iw, ir); + } + } + +@@ -279,7 +295,7 @@ static int __init register_exclusion_ran + } + + /* register IOMMU exclusion range settings */ +- if ( limit >= iommu_top ) ++ if ( !rc && limit >= iommu_top ) + rc = reserve_iommu_exclusion_range(iommu, base, limit, + true /* all */, iw, ir); + +--- a/xen/drivers/passthrough/amd/iommu_init.c ++++ b/xen/drivers/passthrough/amd/iommu_init.c +@@ -1187,7 +1187,6 @@ static int __init alloc_ivrs_mappings(u1 + { + ivrs_mappings[bdf].dte_requestor_id = bdf; + ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_DISABLED; +- ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_DISABLED; + ivrs_mappings[bdf].iommu = NULL; + + ivrs_mappings[bdf].intremap_table = NULL; +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -372,15 +372,17 @@ static int amd_iommu_assign_device(struc + struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg); + int bdf = 
PCI_BDF2(pdev->bus, devfn); + int req_id = get_dma_requestor_id(pdev->seg, bdf); ++ const struct ivrs_unity_map *unity_map; + +- if ( ivrs_mappings[req_id].unity_map_enable ) ++ for ( unity_map = ivrs_mappings[req_id].unity_map; unity_map; ++ unity_map = unity_map->next ) + { +- amd_iommu_reserve_domain_unity_map( +- d, +- ivrs_mappings[req_id].addr_range_start, +- ivrs_mappings[req_id].addr_range_length, +- ivrs_mappings[req_id].write_permission, +- ivrs_mappings[req_id].read_permission); ++ int rc = amd_iommu_reserve_domain_unity_map( ++ d, unity_map->addr, unity_map->length, ++ unity_map->write, unity_map->read); ++ ++ if ( rc ) ++ return rc; + } + + return reassign_device(pdev->domain, d, devfn, pdev); +--- a/xen/include/asm-x86/amd-iommu.h ++++ b/xen/include/asm-x86/amd-iommu.h +@@ -108,15 +108,19 @@ struct amd_iommu { + struct list_head ats_devices; + }; + ++struct ivrs_unity_map { ++ bool read:1; ++ bool write:1; ++ paddr_t addr; ++ unsigned long length; ++ struct ivrs_unity_map *next; ++}; ++ + struct ivrs_mappings { + u16 dte_requestor_id; + u8 dte_allow_exclusion; +- u8 unity_map_enable; +- u8 write_permission; +- u8 read_permission; +- unsigned long addr_range_start; +- unsigned long addr_range_length; + struct amd_iommu *iommu; ++ struct ivrs_unity_map *unity_map; + + /* per device interrupt remapping table */ + void *intremap_table; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-3.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,102 @@ +From: Jan Beulich +Subject: IOMMU: also pass p2m_access_t to p2m_get_iommu_flags() + +A subsequent change will want to customize the IOMMU permissions based +on this. + +This is part of XSA-378. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/arch/x86/mm/p2m-ept.c ++++ b/xen/arch/x86/mm/p2m-ept.c +@@ -711,7 +711,7 @@ ept_set_entry(struct p2m_domain *p2m, gf + uint8_t ipat = 0; + bool_t need_modify_vtd_table = 1; + bool_t vtd_pte_present = 0; +- unsigned int iommu_flags = p2m_get_iommu_flags(p2mt, mfn); ++ unsigned int iommu_flags = p2m_get_iommu_flags(p2mt, p2ma, mfn); + bool_t needs_sync = 1; + ept_entry_t old_entry = { .epte = 0 }; + ept_entry_t new_entry = { .epte = 0 }; +@@ -837,8 +837,8 @@ ept_set_entry(struct p2m_domain *p2m, gf + + /* Safe to read-then-write because we hold the p2m lock */ + if ( ept_entry->mfn == new_entry.mfn && +- p2m_get_iommu_flags(ept_entry->sa_p2mt, _mfn(ept_entry->mfn)) == +- iommu_flags ) ++ p2m_get_iommu_flags(ept_entry->sa_p2mt, ept_entry->access, ++ _mfn(ept_entry->mfn)) == iommu_flags ) + need_modify_vtd_table = 0; + + ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma); +--- a/xen/arch/x86/mm/p2m-pt.c ++++ b/xen/arch/x86/mm/p2m-pt.c +@@ -471,6 +471,16 @@ int p2m_pt_handle_deferred_changes(uint6 + return rc; + } + ++/* Reconstruct a fake p2m_access_t from stored PTE flags. */ ++static p2m_access_t p2m_flags_to_access(unsigned int flags) ++{ ++ if ( flags & _PAGE_PRESENT ) ++ return p2m_access_n; ++ ++ /* No need to look at _PAGE_NX for now. */ ++ return flags & _PAGE_RW ? 
p2m_access_rw : p2m_access_r; ++} ++ + /* Returns: 0 for success, -errno for failure */ + static int + p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn, +@@ -487,7 +497,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, + l2_pgentry_t l2e_content; + l3_pgentry_t l3e_content; + int rc; +- unsigned int iommu_pte_flags = p2m_get_iommu_flags(p2mt, mfn); ++ unsigned int iommu_pte_flags = p2m_get_iommu_flags(p2mt, p2ma, mfn); + /* + * old_mfn and iommu_old_flags control possible flush/update needs on the + * IOMMU: We need to flush when MFN or flags (i.e. permissions) change. +@@ -556,6 +566,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, + old_mfn = l1e_get_pfn(*p2m_entry); + iommu_old_flags = + p2m_get_iommu_flags(p2m_flags_to_type(flags), ++ p2m_flags_to_access(flags), + _mfn(old_mfn)); + } + else +@@ -602,9 +613,10 @@ p2m_pt_set_entry(struct p2m_domain *p2m, + 0, L1_PAGETABLE_ENTRIES); + ASSERT(p2m_entry); + old_mfn = l1e_get_pfn(*p2m_entry); ++ flags = l1e_get_flags(*p2m_entry); + iommu_old_flags = +- p2m_get_iommu_flags(p2m_flags_to_type(l1e_get_flags(*p2m_entry)), +- _mfn(old_mfn)); ++ p2m_get_iommu_flags(p2m_flags_to_type(flags), ++ p2m_flags_to_access(flags), _mfn(old_mfn)); + + if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ) + entry_content = p2m_l1e_from_pfn(mfn_x(mfn), +@@ -648,6 +660,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, + old_mfn = l1e_get_pfn(*p2m_entry); + iommu_old_flags = + p2m_get_iommu_flags(p2m_flags_to_type(flags), ++ p2m_flags_to_access(flags), + _mfn(old_mfn)); + } + else +--- a/xen/include/asm-x86/p2m.h ++++ b/xen/include/asm-x86/p2m.h +@@ -839,7 +839,8 @@ int p2m_altp2m_propagate_change(struct d + /* + * p2m type to IOMMU flags + */ +-static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn) ++static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, ++ p2m_access_t p2ma, mfn_t mfn) + { + unsigned int flags; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-4.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-4.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-4.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-4.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,400 @@ +From: Jan Beulich +Subject: IOMMU: generalize VT-d's tracking of mapped RMRR regions + +In order to re-use it elsewhere, move the logic to vendor independent +code and strip it of RMRR specifics. + +Note that the prior "map" parameter gets folded into the new "p2ma" one +(which AMD IOMMU code will want to make use of), assigning alternative +meaning ("unmap") to p2m_access_x. Prepare set_identity_p2m_entry() and +p2m_get_iommu_flags() for getting passed access types other than +p2m_access_rw (in the latter case just for p2m_mmio_direct requests). + +Note also that, to be on the safe side, an overlap check gets added to +the main loop of iommu_identity_mapping(). + +This is part of XSA-378. 
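+
+As a rough sketch of the resulting calling convention (the wrapper name
+below is hypothetical; only iommu_identity_mapping() and its parameters
+come from this change), the former boolean "map" argument is now carried
+by the access type, with p2m_access_x standing in for "unmap":
+
+    static int identity_map_or_unmap(struct domain *d, bool map,
+                                     paddr_t base, paddr_t end,
+                                     unsigned int flag)
+    {
+        /* p2m_access_x requests teardown of a previously registered map. */
+        return iommu_identity_mapping(d,
+                                      map ? p2m_access_rw : p2m_access_x,
+                                      base, end, flag);
+    }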
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -1157,7 +1157,8 @@ int set_identity_p2m_entry(struct domain + { + if ( !need_iommu(d) ) + return 0; +- return iommu_map_page(d, gfn_l, gfn_l, IOMMUF_readable|IOMMUF_writable); ++ return iommu_map_page(d, gfn_l, gfn_l, ++ p2m_access_to_iommu_flags(p2ma)); + } + + gfn_lock(p2m, gfn, 0); +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -42,12 +42,6 @@ + #include "vtd.h" + #include "../ats.h" + +-struct mapped_rmrr { +- struct list_head list; +- u64 base, end; +- unsigned int count; +-}; +- + /* Possible unfiltered LAPIC/MSI messages from untrusted sources? */ + bool __read_mostly untrusted_msi; + +@@ -1785,16 +1779,11 @@ out: + static void iommu_domain_teardown(struct domain *d) + { + struct domain_iommu *hd = dom_iommu(d); +- struct mapped_rmrr *mrmrr, *tmp; + + if ( list_empty(&acpi_drhd_units) ) + return; + +- list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.mapped_rmrrs, list ) +- { +- list_del(&mrmrr->list); +- xfree(mrmrr); +- } ++ iommu_identity_map_teardown(d); + + if ( iommu_use_hap_pt(d) ) + return; +@@ -1903,74 +1892,6 @@ static void iommu_set_pgd(struct domain + pagetable_get_paddr(pagetable_from_mfn(pgd_mfn)); + } + +-static int rmrr_identity_mapping(struct domain *d, bool_t map, +- const struct acpi_rmrr_unit *rmrr, +- u32 flag) +-{ +- unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K; +- unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K; +- struct mapped_rmrr *mrmrr; +- struct domain_iommu *hd = dom_iommu(d); +- +- ASSERT(pcidevs_locked()); +- ASSERT(rmrr->base_address < rmrr->end_address); +- +- /* +- * No need to acquire hd->arch.mapping_lock: Both insertion and removal +- * get done while holding pcidevs_lock. +- */ +- list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list ) +- { +- if ( mrmrr->base == rmrr->base_address && +- mrmrr->end == rmrr->end_address ) +- { +- int ret = 0; +- +- if ( map ) +- { +- ++mrmrr->count; +- return 0; +- } +- +- if ( --mrmrr->count ) +- return 0; +- +- while ( base_pfn < end_pfn ) +- { +- if ( clear_identity_p2m_entry(d, base_pfn) ) +- ret = -ENXIO; +- base_pfn++; +- } +- +- list_del(&mrmrr->list); +- xfree(mrmrr); +- return ret; +- } +- } +- +- if ( !map ) +- return -ENOENT; +- +- while ( base_pfn < end_pfn ) +- { +- int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); +- +- if ( err ) +- return err; +- base_pfn++; +- } +- +- mrmrr = xmalloc(struct mapped_rmrr); +- if ( !mrmrr ) +- return -ENOMEM; +- mrmrr->base = rmrr->base_address; +- mrmrr->end = rmrr->end_address; +- mrmrr->count = 1; +- list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs); +- +- return 0; +-} +- + static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) + { + struct acpi_rmrr_unit *rmrr; +@@ -2002,7 +1923,9 @@ static int intel_iommu_add_device(u8 dev + * Since RMRRs are always reserved in the e820 map for the hardware + * domain, there shouldn't be a conflict. + */ +- ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, 0); ++ ret = iommu_identity_mapping(pdev->domain, p2m_access_rw, ++ rmrr->base_address, rmrr->end_address, ++ 0); + if ( ret ) + dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n", + pdev->domain->domain_id); +@@ -2047,7 +1970,8 @@ static int intel_iommu_remove_device(u8 + * Any flag is nothing to clear these mappings but here + * its always safe and strict to set 0. 
+ */ +- rmrr_identity_mapping(pdev->domain, 0, rmrr, 0); ++ iommu_identity_mapping(pdev->domain, p2m_access_x, rmrr->base_address, ++ rmrr->end_address, 0); + } + + return domain_context_unmap(pdev->domain, devfn, pdev); +@@ -2214,7 +2138,8 @@ static void __hwdom_init setup_hwdom_rmr + * domain, there shouldn't be a conflict. So its always safe and + * strict to set 0. + */ +- ret = rmrr_identity_mapping(d, 1, rmrr, 0); ++ ret = iommu_identity_mapping(d, p2m_access_rw, rmrr->base_address, ++ rmrr->end_address, 0); + if ( ret ) + dprintk(XENLOG_ERR VTDPREFIX, + "IOMMU: mapping reserved region failed\n"); +@@ -2371,7 +2296,9 @@ static int reassign_device_ownership( + * Any RMRR flag is always ignored when remove a device, + * but its always safe and strict to set 0. + */ +- ret = rmrr_identity_mapping(source, 0, rmrr, 0); ++ ret = iommu_identity_mapping(source, p2m_access_x, ++ rmrr->base_address, ++ rmrr->end_address, 0); + if ( ret != -ENOENT ) + return ret; + } +@@ -2468,7 +2395,8 @@ static int intel_iommu_assign_device( + PCI_BUS(bdf) == bus && + PCI_DEVFN2(bdf) == devfn ) + { +- ret = rmrr_identity_mapping(d, 1, rmrr, flag); ++ ret = iommu_identity_mapping(d, p2m_access_rw, rmrr->base_address, ++ rmrr->end_address, flag); + if ( ret ) + { + int rc; +--- a/xen/drivers/passthrough/x86/iommu.c ++++ b/xen/drivers/passthrough/x86/iommu.c +@@ -144,7 +144,7 @@ int arch_iommu_domain_init(struct domain + struct domain_iommu *hd = dom_iommu(d); + + spin_lock_init(&hd->arch.mapping_lock); +- INIT_LIST_HEAD(&hd->arch.mapped_rmrrs); ++ INIT_LIST_HEAD(&hd->arch.identity_maps); + + return 0; + } +@@ -153,6 +153,99 @@ void arch_iommu_domain_destroy(struct do + { + } + ++struct identity_map { ++ struct list_head list; ++ paddr_t base, end; ++ p2m_access_t access; ++ unsigned int count; ++}; ++ ++int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma, ++ paddr_t base, paddr_t end, ++ unsigned int flag) ++{ ++ unsigned long base_pfn = base >> PAGE_SHIFT_4K; ++ unsigned long end_pfn = PAGE_ALIGN_4K(end) >> PAGE_SHIFT_4K; ++ struct identity_map *map; ++ struct domain_iommu *hd = dom_iommu(d); ++ ++ ASSERT(pcidevs_locked()); ++ ASSERT(base < end); ++ ++ /* ++ * No need to acquire hd->arch.mapping_lock: Both insertion and removal ++ * get done while holding pcidevs_lock. 
++ */ ++ list_for_each_entry( map, &hd->arch.identity_maps, list ) ++ { ++ if ( map->base == base && map->end == end ) ++ { ++ int ret = 0; ++ ++ if ( p2ma != p2m_access_x ) ++ { ++ if ( map->access != p2ma ) ++ return -EADDRINUSE; ++ ++map->count; ++ return 0; ++ } ++ ++ if ( --map->count ) ++ return 0; ++ ++ while ( base_pfn < end_pfn ) ++ { ++ if ( clear_identity_p2m_entry(d, base_pfn) ) ++ ret = -ENXIO; ++ base_pfn++; ++ } ++ ++ list_del(&map->list); ++ xfree(map); ++ ++ return ret; ++ } ++ ++ if ( end >= map->base && map->end >= base ) ++ return -EADDRINUSE; ++ } ++ ++ if ( p2ma == p2m_access_x ) ++ return -ENOENT; ++ ++ while ( base_pfn < end_pfn ) ++ { ++ int err = set_identity_p2m_entry(d, base_pfn, p2ma, flag); ++ ++ if ( err ) ++ return err; ++ base_pfn++; ++ } ++ ++ map = xmalloc(struct identity_map); ++ if ( !map ) ++ return -ENOMEM; ++ map->base = base; ++ map->end = end; ++ map->access = p2ma; ++ map->count = 1; ++ list_add_tail(&map->list, &hd->arch.identity_maps); ++ ++ return 0; ++} ++ ++void iommu_identity_map_teardown(struct domain *d) ++{ ++ struct domain_iommu *hd = dom_iommu(d); ++ struct identity_map *map, *tmp; ++ ++ list_for_each_entry_safe ( map, tmp, &hd->arch.identity_maps, list ) ++ { ++ list_del(&map->list); ++ xfree(map); ++ } ++} ++ + /* + * Local variables: + * mode: C +--- a/xen/include/asm-x86/iommu.h ++++ b/xen/include/asm-x86/iommu.h +@@ -16,6 +16,7 @@ + + #include + #include ++#include + #include + #include + #include +@@ -36,7 +37,7 @@ struct arch_iommu + spinlock_t mapping_lock; /* io page table lock */ + int agaw; /* adjusted guest address width, 0 is level 2 30-bit */ + u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */ +- struct list_head mapped_rmrrs; ++ struct list_head identity_maps; + + /* amd iommu support */ + int paging_mode; +@@ -94,6 +95,11 @@ bool_t iommu_supports_eim(void); + int iommu_enable_x2apic_IR(void); + void iommu_disable_x2apic_IR(void); + ++int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma, ++ paddr_t base, paddr_t end, ++ unsigned int flag); ++void iommu_identity_map_teardown(struct domain *d); ++ + extern bool untrusted_msi; + + int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq, +--- a/xen/include/asm-x86/mem_access.h ++++ b/xen/include/asm-x86/mem_access.h +@@ -44,10 +44,8 @@ bool p2m_mem_access_emulate_check(struct + const vm_event_response_t *rsp); + + /* Sanity check for mem_access hardware support */ +-static inline bool p2m_mem_access_sanity_check(struct domain *d) +-{ +- return is_hvm_domain(d) && cpu_has_vmx && hap_enabled(d); +-} ++#define p2m_mem_access_sanity_check(d) \ ++ (is_hvm_domain(d) && cpu_has_vmx && hap_enabled(d)) + + #endif /*__ASM_X86_MEM_ACCESS_H__ */ + +--- a/xen/include/asm-x86/p2m.h ++++ b/xen/include/asm-x86/p2m.h +@@ -836,6 +836,34 @@ int p2m_altp2m_propagate_change(struct d + mfn_t mfn, unsigned int page_order, + p2m_type_t p2mt, p2m_access_t p2ma); + ++/* p2m access to IOMMU flags */ ++static inline unsigned int p2m_access_to_iommu_flags(p2m_access_t p2ma) ++{ ++ switch ( p2ma ) ++ { ++ case p2m_access_rw: ++ case p2m_access_rwx: ++ return IOMMUF_readable | IOMMUF_writable; ++ ++ case p2m_access_r: ++ case p2m_access_rx: ++ case p2m_access_rx2rw: ++ return IOMMUF_readable; ++ ++ case p2m_access_w: ++ case p2m_access_wx: ++ return IOMMUF_writable; ++ ++ case p2m_access_n: ++ case p2m_access_x: ++ case p2m_access_n2rwx: ++ return 0; ++ } ++ ++ ASSERT_UNREACHABLE(); ++ return 0; ++} ++ + /* + * p2m type to IOMMU flags + */ +@@ -857,9 +885,10 @@ 
static inline unsigned int p2m_get_iommu + flags = IOMMUF_readable; + break; + case p2m_mmio_direct: +- flags = IOMMUF_readable; +- if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) +- flags |= IOMMUF_writable; ++ flags = p2m_access_to_iommu_flags(p2ma); ++ if ( (flags & IOMMUF_writable) && ++ rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) ++ flags &= ~IOMMUF_writable; + break; + default: + flags = 0; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-5.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-5.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-5.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-5.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,200 @@ +From: Jan Beulich +Subject: AMD/IOMMU: re-arrange/complete re-assignment handling + +Prior to the assignment step having completed successfully, devices +should not get associated with their new owner. Hand the device to DomIO +(perhaps temporarily), until after the de-assignment step has completed. + +De-assignment of a device (from other than Dom0) as well as failure of +reassign_device() during assignment should result in unity mappings +getting torn down. This in turn requires switching to a refcounted +mapping approach, as was already used by VT-d for its RMRRs, to prevent +unmapping a region used by multiple devices. + +This is CVE-2021-28696 / part of XSA-378. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -716,27 +716,49 @@ int amd_iommu_unmap_page(struct domain * + return 0; + } + +-int amd_iommu_reserve_domain_unity_map(struct domain *domain, +- u64 phys_addr, +- unsigned long size, int iw, int ir) ++int amd_iommu_reserve_domain_unity_map(struct domain *d, ++ const struct ivrs_unity_map *map, ++ unsigned int flag) + { +- unsigned long npages, i; +- unsigned long gfn; +- unsigned int flags = !!ir; +- int rt = 0; +- +- if ( iw ) +- flags |= IOMMUF_writable; +- +- npages = region_to_pages(phys_addr, size); +- gfn = phys_addr >> PAGE_SHIFT; +- for ( i = 0; i < npages; i++ ) ++ int rc; ++ ++ if ( d == dom_io ) ++ return 0; ++ ++ for ( rc = 0; !rc && map; map = map->next ) + { +- rt = amd_iommu_map_page(domain, gfn +i, gfn +i, flags); +- if ( rt != 0 ) +- return rt; ++ p2m_access_t p2ma = p2m_access_n; ++ ++ if ( map->read ) ++ p2ma |= p2m_access_r; ++ if ( map->write ) ++ p2ma |= p2m_access_w; ++ ++ rc = iommu_identity_mapping(d, p2ma, map->addr, ++ map->addr + map->length - 1, flag); + } +- return 0; ++ ++ return rc; ++} ++ ++int amd_iommu_reserve_domain_unity_unmap(struct domain *d, ++ const struct ivrs_unity_map *map) ++{ ++ int rc; ++ ++ if ( d == dom_io ) ++ return 0; ++ ++ for ( rc = 0; map; map = map->next ) ++ { ++ int ret = iommu_identity_mapping(d, p2m_access_x, map->addr, ++ map->addr + map->length - 1, 0); ++ ++ if ( ret && ret != -ENOENT && !rc ) ++ rc = ret; ++ } ++ ++ return rc; + } + + /* Share p2m table with iommu. 
*/ +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -333,6 +333,7 @@ static int reassign_device(struct domain + struct amd_iommu *iommu; + int bdf, rc; + struct domain_iommu *t = dom_iommu(target); ++ const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg); + + bdf = PCI_BDF2(pdev->bus, pdev->devfn); + iommu = find_iommu_for_device(pdev->seg, bdf); +@@ -347,10 +348,24 @@ static int reassign_device(struct domain + + amd_iommu_disable_domain_device(source, iommu, devfn, pdev); + +- if ( devfn == pdev->devfn ) ++ /* ++ * If the device belongs to the hardware domain, and it has a unity mapping, ++ * don't remove it from the hardware domain, because BIOS may reference that ++ * mapping. ++ */ ++ if ( !is_hardware_domain(source) ) ++ { ++ rc = amd_iommu_reserve_domain_unity_unmap( ++ source, ++ ivrs_mappings[get_dma_requestor_id(pdev->seg, bdf)].unity_map); ++ if ( rc ) ++ return rc; ++ } ++ ++ if ( devfn == pdev->devfn && pdev->domain != dom_io ) + { +- list_move(&pdev->domain_list, &target->arch.pdev_list); +- pdev->domain = target; ++ list_move(&pdev->domain_list, &dom_io->arch.pdev_list); ++ pdev->domain = dom_io; + } + + rc = allocate_domain_resources(t); +@@ -362,6 +377,12 @@ static int reassign_device(struct domain + pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), + source->domain_id, target->domain_id); + ++ if ( devfn == pdev->devfn && pdev->domain != target ) ++ { ++ list_move(&pdev->domain_list, &target->arch.pdev_list); ++ pdev->domain = target; ++ } ++ + return 0; + } + +@@ -372,20 +393,28 @@ static int amd_iommu_assign_device(struc + struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg); + int bdf = PCI_BDF2(pdev->bus, devfn); + int req_id = get_dma_requestor_id(pdev->seg, bdf); +- const struct ivrs_unity_map *unity_map; ++ int rc = amd_iommu_reserve_domain_unity_map( ++ d, ivrs_mappings[req_id].unity_map, flag); ++ ++ if ( !rc ) ++ rc = reassign_device(pdev->domain, d, devfn, pdev); + +- for ( unity_map = ivrs_mappings[req_id].unity_map; unity_map; +- unity_map = unity_map->next ) ++ if ( rc && !is_hardware_domain(d) ) + { +- int rc = amd_iommu_reserve_domain_unity_map( +- d, unity_map->addr, unity_map->length, +- unity_map->write, unity_map->read); ++ int ret = amd_iommu_reserve_domain_unity_unmap( ++ d, ivrs_mappings[req_id].unity_map); + +- if ( rc ) +- return rc; ++ if ( ret ) ++ { ++ printk(XENLOG_ERR "AMD-Vi: " ++ "unity-unmap for d%d/%04x:%02x:%02x.%u failed (%d)\n", ++ d->domain_id, pdev->seg, pdev->bus, ++ PCI_SLOT(devfn), PCI_FUNC(devfn), ret); ++ domain_crash(d); ++ } + } + +- return reassign_device(pdev->domain, d, devfn, pdev); ++ return rc; + } + + static void deallocate_next_page_table(struct page_info *pg, int level) +@@ -451,6 +480,7 @@ static void deallocate_iommu_page_tables + + static void amd_iommu_domain_destroy(struct domain *d) + { ++ iommu_identity_map_teardown(d); + deallocate_iommu_page_tables(d); + amd_iommu_flush_all_pages(d); + } +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -60,8 +60,10 @@ int __must_check amd_iommu_unmap_page(st + u64 amd_iommu_get_next_table_from_pte(u32 *entry); + int __must_check amd_iommu_alloc_root(struct domain_iommu *hd); + int amd_iommu_reserve_domain_unity_map(struct domain *domain, +- u64 phys_addr, unsigned long size, +- int iw, int ir); ++ const struct ivrs_unity_map *map, ++ unsigned int flag); ++int amd_iommu_reserve_domain_unity_unmap(struct domain *d, ++ const 
struct ivrs_unity_map *map); + + /* Share p2m table with iommu */ + void amd_iommu_share_p2m(struct domain *d); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-6.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-6.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-6.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-6.patch 2022-06-01 21:13:47.000000000 +0100 @@ -0,0 +1,411 @@ +From: Jan Beulich +Subject: AMD/IOMMU: re-arrange exclusion range and unity map recording + +The spec makes no provisions for OS behavior here to depend on the +amount of RAM found on the system. While the spec may not sufficiently +clearly distinguish both kinds of regions, they are surely meant to be +separate things: Only regions with ACPI_IVMD_EXCLUSION_RANGE set should +be candidates for putting in the exclusion range registers. (As there's +only a single such pair of registers per IOMMU, secondary non-adjacent +regions with the flag set already get converted to unity mapped +regions.) + +First of all, drop the dependency on max_page. With commit b4f042236ae0 +("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables") the +use of it here was stale anyway; it was bogus already before, as it +didn't account for max_page getting increased later on. Simply try an +exclusion range registration first, and if it fails (for being +unsuitable or non-mergeable), register a unity mapping range. + +With this various local variables become unnecessary and hence get +dropped at the same time. + +With the max_page boundary dropped for using unity maps, the minimum +page table tree height now needs both recording and enforcing in +amd_iommu_domain_init(). Since we can't predict which devices may get +assigned to a domain, our only option is to uniformly force at least +that height for all domains, now that the height isn't dynamic anymore. + +Further don't make use of the exclusion range unless ACPI data says so. + +Note that exclusion range registration in +register_range_for_all_devices() is on a best effort basis. Hence unity +map entries also registered are redundant when the former succeeded, but +they also do no harm. Improvements in this area can be done later imo. + +Also adjust types where suitable without touching extra lines. + +This is part of XSA-378. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/drivers/passthrough/amd/iommu_acpi.c ++++ b/xen/drivers/passthrough/amd/iommu_acpi.c +@@ -99,12 +99,8 @@ static struct amd_iommu * __init find_io + } + + static int __init reserve_iommu_exclusion_range( +- struct amd_iommu *iommu, uint64_t base, uint64_t limit, +- bool all, bool iw, bool ir) ++ struct amd_iommu *iommu, paddr_t base, paddr_t limit, bool all) + { +- if ( !ir || !iw ) +- return -EPERM; +- + /* need to extend exclusion range? */ + if ( iommu->exclusion_enable ) + { +@@ -133,14 +129,18 @@ static int __init reserve_unity_map_for_ + { + struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg); + struct ivrs_unity_map *unity_map = ivrs_mappings[bdf].unity_map; ++ int paging_mode = amd_iommu_get_paging_mode(PFN_UP(base + length)); ++ ++ if ( paging_mode < 0 ) ++ return paging_mode; + + /* Check for overlaps. */ + for ( ; unity_map; unity_map = unity_map->next ) + { + /* + * Exact matches are okay. This can in particular happen when +- * register_exclusion_range_for_device() calls here twice for the +- * same (s,b,d,f). 
++ * register_range_for_device() calls here twice for the same ++ * (s,b,d,f). + */ + if ( base == unity_map->addr && length == unity_map->length && + ir == unity_map->read && iw == unity_map->write ) +@@ -168,55 +168,52 @@ static int __init reserve_unity_map_for_ + unity_map->next = ivrs_mappings[bdf].unity_map; + ivrs_mappings[bdf].unity_map = unity_map; + ++ if ( paging_mode > amd_iommu_min_paging_mode ) ++ amd_iommu_min_paging_mode = paging_mode; ++ + return 0; + } + +-static int __init register_exclusion_range_for_all_devices( +- unsigned long base, unsigned long limit, u8 iw, u8 ir) ++static int __init register_range_for_all_devices( ++ paddr_t base, paddr_t limit, bool iw, bool ir, bool exclusion) + { + int seg = 0; /* XXX */ +- unsigned long range_top, iommu_top, length; + struct amd_iommu *iommu; +- unsigned int bdf; + int rc = 0; + + /* is part of exclusion range inside of IOMMU virtual address space? */ + /* note: 'limit' parameter is assumed to be page-aligned */ +- range_top = limit + PAGE_SIZE; +- iommu_top = max_page * PAGE_SIZE; +- if ( base < iommu_top ) +- { +- if ( range_top > iommu_top ) +- range_top = iommu_top; +- length = range_top - base; +- /* reserve r/w unity-mapped page entries for devices */ +- /* note: these entries are part of the exclusion range */ +- for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) +- rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir); +- /* push 'base' just outside of virtual address space */ +- base = iommu_top; +- } +- /* register IOMMU exclusion range settings */ +- if ( !rc && limit >= iommu_top ) ++ if ( exclusion ) + { + for_each_amd_iommu( iommu ) + { +- rc = reserve_iommu_exclusion_range(iommu, base, limit, +- true /* all */, iw, ir); +- if ( rc ) +- break; ++ int ret = reserve_iommu_exclusion_range(iommu, base, limit, ++ true /* all */); ++ ++ if ( ret && !rc ) ++ rc = ret; + } + } + ++ if ( !exclusion || rc ) ++ { ++ paddr_t length = limit + PAGE_SIZE - base; ++ unsigned int bdf; ++ ++ /* reserve r/w unity-mapped page entries for devices */ ++ for ( bdf = rc = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) ++ rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir); ++ } ++ + return rc; + } + +-static int __init register_exclusion_range_for_device( +- u16 bdf, unsigned long base, unsigned long limit, u8 iw, u8 ir) ++static int __init register_range_for_device( ++ unsigned int bdf, paddr_t base, paddr_t limit, ++ bool iw, bool ir, bool exclusion) + { + int seg = 0; /* XXX */ + struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg); +- unsigned long range_top, iommu_top, length; + struct amd_iommu *iommu; + u16 req; + int rc = 0; +@@ -230,27 +227,19 @@ static int __init register_exclusion_ran + req = ivrs_mappings[bdf].dte_requestor_id; + + /* note: 'limit' parameter is assumed to be page-aligned */ +- range_top = limit + PAGE_SIZE; +- iommu_top = max_page * PAGE_SIZE; +- if ( base < iommu_top ) +- { +- if ( range_top > iommu_top ) +- range_top = iommu_top; +- length = range_top - base; ++ if ( exclusion ) ++ rc = reserve_iommu_exclusion_range(iommu, base, limit, ++ false /* all */); ++ if ( !exclusion || rc ) ++ { ++ paddr_t length = limit + PAGE_SIZE - base; ++ + /* reserve unity-mapped page entries for device */ +- /* note: these entries are part of the exclusion range */ + rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir) ?: + reserve_unity_map_for_device(seg, req, base, length, iw, ir); +- +- /* push 'base' just outside of virtual address space */ +- base = iommu_top; + } +- +- /* 
register IOMMU exclusion range settings for device */ +- if ( !rc && limit >= iommu_top ) ++ else + { +- rc = reserve_iommu_exclusion_range(iommu, base, limit, +- false /* all */, iw, ir); + ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_ENABLED; + ivrs_mappings[req].dte_allow_exclusion = IOMMU_CONTROL_ENABLED; + } +@@ -258,53 +247,42 @@ static int __init register_exclusion_ran + return rc; + } + +-static int __init register_exclusion_range_for_iommu_devices( +- struct amd_iommu *iommu, +- unsigned long base, unsigned long limit, u8 iw, u8 ir) ++static int __init register_range_for_iommu_devices( ++ struct amd_iommu *iommu, paddr_t base, paddr_t limit, ++ bool iw, bool ir, bool exclusion) + { +- unsigned long range_top, iommu_top, length; ++ /* note: 'limit' parameter is assumed to be page-aligned */ ++ paddr_t length = limit + PAGE_SIZE - base; + unsigned int bdf; + u16 req; +- int rc = 0; ++ int rc; + +- /* is part of exclusion range inside of IOMMU virtual address space? */ +- /* note: 'limit' parameter is assumed to be page-aligned */ +- range_top = limit + PAGE_SIZE; +- iommu_top = max_page * PAGE_SIZE; +- if ( base < iommu_top ) +- { +- if ( range_top > iommu_top ) +- range_top = iommu_top; +- length = range_top - base; +- /* reserve r/w unity-mapped page entries for devices */ +- /* note: these entries are part of the exclusion range */ +- for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) +- { +- if ( iommu == find_iommu_for_device(iommu->seg, bdf) ) +- { +- req = get_ivrs_mappings(iommu->seg)[bdf].dte_requestor_id; +- rc = reserve_unity_map_for_device(iommu->seg, bdf, base, length, +- iw, ir) ?: +- reserve_unity_map_for_device(iommu->seg, req, base, length, +- iw, ir); +- } +- } +- +- /* push 'base' just outside of virtual address space */ +- base = iommu_top; ++ if ( exclusion ) ++ { ++ rc = reserve_iommu_exclusion_range(iommu, base, limit, true /* all */); ++ if ( !rc ) ++ return 0; + } + +- /* register IOMMU exclusion range settings */ +- if ( !rc && limit >= iommu_top ) +- rc = reserve_iommu_exclusion_range(iommu, base, limit, +- true /* all */, iw, ir); ++ /* reserve unity-mapped page entries for devices */ ++ for ( bdf = rc = 0; !rc && bdf < ivrs_bdf_entries; bdf++ ) ++ { ++ if ( iommu != find_iommu_for_device(iommu->seg, bdf) ) ++ continue; ++ ++ req = get_ivrs_mappings(iommu->seg)[bdf].dte_requestor_id; ++ rc = reserve_unity_map_for_device(iommu->seg, bdf, base, length, ++ iw, ir) ?: ++ reserve_unity_map_for_device(iommu->seg, req, base, length, ++ iw, ir); ++ } + + return rc; + } + + static int __init parse_ivmd_device_select( + const struct acpi_ivrs_memory *ivmd_block, +- unsigned long base, unsigned long limit, u8 iw, u8 ir) ++ paddr_t base, paddr_t limit, bool iw, bool ir, bool exclusion) + { + u16 bdf; + +@@ -315,12 +293,12 @@ static int __init parse_ivmd_device_sele + return -ENODEV; + } + +- return register_exclusion_range_for_device(bdf, base, limit, iw, ir); ++ return register_range_for_device(bdf, base, limit, iw, ir, exclusion); + } + + static int __init parse_ivmd_device_range( + const struct acpi_ivrs_memory *ivmd_block, +- unsigned long base, unsigned long limit, u8 iw, u8 ir) ++ paddr_t base, paddr_t limit, bool iw, bool ir, bool exclusion) + { + unsigned int first_bdf, last_bdf, bdf; + int error; +@@ -342,15 +320,15 @@ static int __init parse_ivmd_device_rang + } + + for ( bdf = first_bdf, error = 0; (bdf <= last_bdf) && !error; bdf++ ) +- error = register_exclusion_range_for_device( +- bdf, base, limit, iw, ir); ++ error = 
register_range_for_device( ++ bdf, base, limit, iw, ir, exclusion); + + return error; + } + + static int __init parse_ivmd_device_iommu( + const struct acpi_ivrs_memory *ivmd_block, +- unsigned long base, unsigned long limit, u8 iw, u8 ir) ++ paddr_t base, paddr_t limit, bool iw, bool ir, bool exclusion) + { + int seg = 0; /* XXX */ + struct amd_iommu *iommu; +@@ -365,14 +343,14 @@ static int __init parse_ivmd_device_iomm + return -ENODEV; + } + +- return register_exclusion_range_for_iommu_devices( +- iommu, base, limit, iw, ir); ++ return register_range_for_iommu_devices( ++ iommu, base, limit, iw, ir, exclusion); + } + + static int __init parse_ivmd_block(const struct acpi_ivrs_memory *ivmd_block) + { + unsigned long start_addr, mem_length, base, limit; +- u8 iw, ir; ++ bool iw = true, ir = true, exclusion = false; + + if ( ivmd_block->header.length < sizeof(*ivmd_block) ) + { +@@ -389,13 +367,11 @@ static int __init parse_ivmd_block(const + ivmd_block->header.type, start_addr, mem_length); + + if ( ivmd_block->header.flags & ACPI_IVMD_EXCLUSION_RANGE ) +- iw = ir = IOMMU_CONTROL_ENABLED; ++ exclusion = true; + else if ( ivmd_block->header.flags & ACPI_IVMD_UNITY ) + { +- iw = ivmd_block->header.flags & ACPI_IVMD_READ ? +- IOMMU_CONTROL_ENABLED : IOMMU_CONTROL_DISABLED; +- ir = ivmd_block->header.flags & ACPI_IVMD_WRITE ? +- IOMMU_CONTROL_ENABLED : IOMMU_CONTROL_DISABLED; ++ iw = ivmd_block->header.flags & ACPI_IVMD_READ; ++ ir = ivmd_block->header.flags & ACPI_IVMD_WRITE; + } + else + { +@@ -406,20 +382,20 @@ static int __init parse_ivmd_block(const + switch( ivmd_block->header.type ) + { + case ACPI_IVRS_TYPE_MEMORY_ALL: +- return register_exclusion_range_for_all_devices( +- base, limit, iw, ir); ++ return register_range_for_all_devices( ++ base, limit, iw, ir, exclusion); + + case ACPI_IVRS_TYPE_MEMORY_ONE: +- return parse_ivmd_device_select(ivmd_block, +- base, limit, iw, ir); ++ return parse_ivmd_device_select(ivmd_block, base, limit, ++ iw, ir, exclusion); + + case ACPI_IVRS_TYPE_MEMORY_RANGE: +- return parse_ivmd_device_range(ivmd_block, +- base, limit, iw, ir); ++ return parse_ivmd_device_range(ivmd_block, base, limit, ++ iw, ir, exclusion); + + case ACPI_IVRS_TYPE_MEMORY_IOMMU: +- return parse_ivmd_device_iommu(ivmd_block, +- base, limit, iw, ir); ++ return parse_ivmd_device_iommu(ivmd_block, base, limit, ++ iw, ir, exclusion); + + default: + AMD_IOMMU_DEBUG("IVMD Error: Invalid Block Type!\n"); +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -218,6 +218,8 @@ static int __must_check allocate_domain_ + return rc; + } + ++int __read_mostly amd_iommu_min_paging_mode = 1; ++ + static int amd_iommu_domain_init(struct domain *d) + { + struct domain_iommu *hd = dom_iommu(d); +@@ -229,11 +231,13 @@ static int amd_iommu_domain_init(struct + * - HVM could in principle use 3 or 4 depending on how much guest + * physical address space we give it, but this isn't known yet so use 4 + * unilaterally. ++ * - Unity maps may require an even higher number. + */ +- hd->arch.paging_mode = amd_iommu_get_paging_mode( +- is_hvm_domain(d) +- ? 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT) +- : get_upper_mfn_bound() + 1); ++ hd->arch.paging_mode = max(amd_iommu_get_paging_mode( ++ is_hvm_domain(d) ++ ? 
1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT) ++ : get_upper_mfn_bound() + 1), ++ amd_iommu_min_paging_mode); + + return 0; + } +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -126,6 +126,8 @@ extern struct hpet_sbdf { + } init; + } hpet_sbdf; + ++extern int amd_iommu_min_paging_mode; ++ + extern void *shared_intremap_table; + extern unsigned long *shared_intremap_inuse; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-7.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-7.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-7.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-7.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,88 @@ +From: Jan Beulich +Subject: x86/p2m: introduce p2m_is_special() + +Seeing the similarity of grant, foreign, and (subsequently) direct-MMIO +handling, introduce a new P2M type group named "special" (as in "needing +special accessors to create/destroy"). + +Also use -EPERM instead of other error codes on the two domain_crash() +paths touched. + +This is part of XSA-378. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -736,7 +736,7 @@ p2m_remove_page(struct p2m_domain *p2m, + for ( i = 0; i < (1UL << page_order); i++ ) + { + p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, NULL, NULL); +- if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) ) ++ if ( !p2m_is_special(t) && !p2m_is_shared(t) ) + set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY); + } + } +@@ -848,13 +848,13 @@ guest_physmap_add_entry(struct domain *d + &ot, &a, 0, NULL, NULL); + ASSERT(!p2m_is_shared(ot)); + } +- if ( p2m_is_grant(ot) || p2m_is_foreign(ot) ) ++ if ( p2m_is_special(ot) ) + { +- /* Really shouldn't be unmapping grant/foreign maps this way */ ++ /* Don't permit unmapping grant/foreign this way. */ + domain_crash(d); + p2m_unlock(p2m); + +- return -EINVAL; ++ return -EPERM; + } + else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) ) + { +@@ -947,8 +947,7 @@ int p2m_change_type_one(struct domain *d + struct p2m_domain *p2m = p2m_get_hostp2m(d); + int rc; + +- BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt)); +- BUG_ON(p2m_is_foreign(ot) || p2m_is_foreign(nt)); ++ BUG_ON(p2m_is_special(ot) || p2m_is_special(nt)); + + gfn_lock(p2m, gfn, 0); + +@@ -1091,11 +1090,11 @@ static int set_typed_p2m_entry(struct do + gfn_unlock(p2m, gfn, order); + return cur_order + 1; + } +- if ( p2m_is_grant(ot) || p2m_is_foreign(ot) ) ++ if ( p2m_is_special(ot) ) + { + gfn_unlock(p2m, gfn, order); + domain_crash(d); +- return -ENOENT; ++ return -EPERM; + } + else if ( p2m_is_ram(ot) ) + { +--- a/xen/include/asm-x86/p2m.h ++++ b/xen/include/asm-x86/p2m.h +@@ -142,6 +142,10 @@ typedef unsigned int p2m_query_t; + | p2m_to_mask(p2m_ram_logdirty) ) + #define P2M_SHARED_TYPES (p2m_to_mask(p2m_ram_shared)) + ++/* Types established/cleaned up via special accessors. */ ++#define P2M_SPECIAL_TYPES (P2M_GRANT_TYPES | \ ++ p2m_to_mask(p2m_map_foreign)) ++ + /* Valid types not necessarily associated with a (valid) MFN. 
*/ + #define P2M_INVALID_MFN_TYPES (P2M_POD_TYPES \ + | p2m_to_mask(p2m_mmio_direct) \ +@@ -170,6 +174,7 @@ typedef unsigned int p2m_query_t; + #define p2m_is_paged(_t) (p2m_to_mask(_t) & P2M_PAGED_TYPES) + #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES) + #define p2m_is_shared(_t) (p2m_to_mask(_t) & P2M_SHARED_TYPES) ++#define p2m_is_special(_t) (p2m_to_mask(_t) & P2M_SPECIAL_TYPES) + #define p2m_is_broken(_t) (p2m_to_mask(_t) & P2M_BROKEN_TYPES) + #define p2m_is_foreign(_t) (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign)) + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-8.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-8.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-8.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa378-4.11-8.patch 2022-05-26 17:34:24.000000000 +0100 @@ -0,0 +1,157 @@ +From: Jan Beulich +Subject: x86/p2m: guard (in particular) identity mapping entries + +Such entries, created by set_identity_p2m_entry(), should only be +destroyed by clear_identity_p2m_entry(). However, similarly, entries +created by set_mmio_p2m_entry() should only be torn down by +clear_mmio_p2m_entry(), so the logic gets based upon p2m_mmio_direct as +the entry type (separation between "ordinary" and 1:1 mappings would +require a further indicator to tell apart the two). + +As to the guest_remove_page() change, commit 48dfb297a20a ("x86/PVH: +allow guest_remove_page to remove p2m_mmio_direct pages"), which +introduced the call to clear_mmio_p2m_entry(), claimed this was done for +hwdom only without this actually having been the case. However, this +code shouldn't be there in the first place, as MMIO entries shouldn't be +dropped this way. Avoid triggering the warning again that 48dfb297a20a +silenced by an adjustment to xenmem_add_to_physmap_one() instead. + +Note that guest_physmap_mark_populate_on_demand() gets tightened beyond +the immediate purpose of this change. + +Note also that I didn't inspect code which isn't security supported, +e.g. sharing, paging, or altp2m. + +This is CVE-2021-28694 / part of XSA-378. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -4783,7 +4783,9 @@ int xenmem_add_to_physmap_one( + + /* Remove previously mapped page if it was present. */ + prev_mfn = mfn_x(get_gfn(d, gfn_x(gpfn), &p2mt)); +- if ( mfn_valid(_mfn(prev_mfn)) ) ++ if ( p2mt == p2m_mmio_direct ) ++ rc = -EPERM; ++ else if ( mfn_valid(_mfn(prev_mfn)) ) + { + if ( is_xen_heap_mfn(prev_mfn) ) + /* Xen heap frames are simply unhooked from this phys slot. */ +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -725,7 +725,8 @@ p2m_remove_page(struct p2m_domain *p2m, + &cur_order, NULL); + + if ( p2m_is_valid(t) && +- (!mfn_valid(_mfn(mfn)) || mfn + i != mfn_x(mfn_return)) ) ++ (!mfn_valid(_mfn(mfn)) || t == p2m_mmio_direct || ++ mfn + i != mfn_x(mfn_return)) ) + return -EILSEQ; + + i += (1UL << cur_order) - ((gfn_l + i) & ((1UL << cur_order) - 1)); +@@ -803,7 +804,7 @@ guest_physmap_add_entry(struct domain *d + if ( p2m_is_foreign(t) ) + return -EINVAL; + +- if ( !mfn_valid(mfn) ) ++ if ( !mfn_valid(mfn) || t == p2m_mmio_direct ) + { + ASSERT_UNREACHABLE(); + return -EINVAL; +@@ -850,7 +851,7 @@ guest_physmap_add_entry(struct domain *d + } + if ( p2m_is_special(ot) ) + { +- /* Don't permit unmapping grant/foreign this way. */ ++ /* Don't permit unmapping grant/foreign/direct-MMIO this way. 
*/ + domain_crash(d); + p2m_unlock(p2m); + +@@ -1192,8 +1193,8 @@ int set_identity_p2m_entry(struct domain + * order+1 for caller to retry with order (guaranteed smaller than + * the order value passed in) + */ +-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn_l, mfn_t mfn, +- unsigned int order) ++static int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn_l, ++ mfn_t mfn, unsigned int order) + { + int rc = -EINVAL; + gfn_t gfn = _gfn(gfn_l); +--- a/xen/arch/x86/mm/p2m-pod.c ++++ b/xen/arch/x86/mm/p2m-pod.c +@@ -1302,17 +1302,17 @@ guest_physmap_mark_populate_on_demand(st + + p2m->get_entry(p2m, gfn_add(gfn, i), &ot, &a, 0, &cur_order, NULL); + n = 1UL << min(order, cur_order); +- if ( p2m_is_ram(ot) ) ++ if ( ot == p2m_populate_on_demand ) ++ { ++ /* Count how many PoD entries we'll be replacing if successful */ ++ pod_count += n; ++ } ++ else if ( ot != p2m_invalid && ot != p2m_mmio_dm ) + { + P2M_DEBUG("gfn_to_mfn returned type %d!\n", ot); + rc = -EBUSY; + goto out; + } +- else if ( ot == p2m_populate_on_demand ) +- { +- /* Count how man PoD entries we'll be replacing if successful */ +- pod_count += n; +- } + } + + /* Now, actually do the two-way mapping */ +--- a/xen/common/memory.c ++++ b/xen/common/memory.c +@@ -335,7 +335,7 @@ int guest_remove_page(struct domain *d, + } + if ( p2mt == p2m_mmio_direct ) + { +- rc = clear_mmio_p2m_entry(d, gmfn, mfn, PAGE_ORDER_4K); ++ rc = -EPERM; + goto out_put_gfn; + } + #else +@@ -1651,6 +1651,15 @@ int prepare_ring_for_helper( + return -ENOENT; + } + #endif ++#ifdef CONFIG_X86 ++ if ( p2mt == p2m_mmio_direct ) ++ { ++ if ( page ) ++ put_page(page); ++ ++ return -EPERM; ++ } ++#endif + + if ( !page ) + return -EINVAL; +--- a/xen/include/asm-x86/p2m.h ++++ b/xen/include/asm-x86/p2m.h +@@ -144,7 +144,8 @@ typedef unsigned int p2m_query_t; + + /* Types established/cleaned up via special accessors. */ + #define P2M_SPECIAL_TYPES (P2M_GRANT_TYPES | \ +- p2m_to_mask(p2m_map_foreign)) ++ p2m_to_mask(p2m_map_foreign) | \ ++ p2m_to_mask(p2m_mmio_direct)) + + /* Valid types not necessarily associated with a (valid) MFN. */ + #define P2M_INVALID_MFN_TYPES (P2M_POD_TYPES \ +@@ -629,8 +630,6 @@ int set_foreign_p2m_entry(struct domain + /* Set mmio addresses in the p2m table (for pass-through) */ + int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, + unsigned int order, p2m_access_t access); +-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, +- unsigned int order); + + /* Set identity addresses in the p2m table (for pass-through) */ + int set_identity_p2m_entry(struct domain *d, unsigned long gfn, diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa379-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa379-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa379-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa379-4.12.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,77 @@ +From: Jan Beulich +Subject: x86/mm: widen locked region in xenmem_add_to_physmap_one() + +For pages which can be made part of the P2M by the guest, but which can +also later be de-allocated (grant table v2 status pages being the +present example), it is imperative that they be mapped at no more than a +single GFN. We therefore need to make sure that of two parallel +XENMAPSPACE_grant_table requests for the same status page one completes +before the second checks at which other GFN the underlying MFN is +presently mapped. + +Push down the respective put_gfn(). 
This leverages that gfn_lock() +really aliases p2m_lock(), but the function makes this assumption +already anyway: In the XENMAPSPACE_gmfn case lock nesting constraints +for both involved GFNs would otherwise need to be enforced to avoid ABBA +deadlocks. + +This is CVE-2021-28697 / XSA-379. + +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -4807,8 +4807,20 @@ int xenmem_add_to_physmap_one( + goto put_both; + } + +- /* Remove previously mapped page if it was present. */ ++ /* ++ * Note that we're (ab)using GFN locking (to really be locking of the ++ * entire P2M) here in (at least) two ways: Finer grained locking would ++ * expose lock order violations in the XENMAPSPACE_gmfn case (due to the ++ * earlier get_gfn_unshare() above). Plus at the very least for the grant ++ * table v2 status page case we need to guarantee that the same page can ++ * only appear at a single GFN. While this is a property we want in ++ * general, for pages which can subsequently be freed this imperative: ++ * Upon freeing we wouldn't be able to find other mappings in the P2M ++ * (unless we did a brute force search). ++ */ + prev_mfn = mfn_x(get_gfn(d, gfn_x(gpfn), &p2mt)); ++ ++ /* Remove previously mapped page if it was present. */ + if ( p2mt == p2m_mmio_direct ) + rc = -EPERM; + else if ( mfn_valid(_mfn(prev_mfn)) ) +@@ -4820,27 +4832,21 @@ int xenmem_add_to_physmap_one( + /* Normal domain memory is freed, to avoid leaking memory. */ + rc = guest_remove_page(d, gfn_x(gpfn)); + } +- /* In the XENMAPSPACE_gmfn case we still hold a ref on the old page. */ +- put_gfn(d, gfn_x(gpfn)); +- +- if ( rc ) +- goto put_both; + + /* Unmap from old location, if any. */ + old_gpfn = get_gpfn_from_mfn(mfn_x(mfn)); + ASSERT(!SHARED_M2P(old_gpfn)); + if ( space == XENMAPSPACE_gmfn && old_gpfn != gfn ) +- { + rc = -EXDEV; +- goto put_both; +- } +- if ( old_gpfn != INVALID_M2P_ENTRY ) ++ else if ( !rc && old_gpfn != INVALID_M2P_ENTRY ) + rc = guest_physmap_remove_page(d, _gfn(old_gpfn), mfn, PAGE_ORDER_4K); + + /* Map at new location. */ + if ( !rc ) + rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K); + ++ put_gfn(d, gfn_x(gpfn)); ++ + put_both: + /* + * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-3.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-3.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-3.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-3.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,74 @@ +From: Jan Beulich +Subject: gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr() + +Relevant quotes from the C11 standard: + +"Except where explicitly stated otherwise, for the purposes of this + subclause unnamed members of objects of structure and union type do not + participate in initialization. Unnamed members of structure objects + have indeterminate value even after initialization." + +"If there are fewer initializers in a brace-enclosed list than there are + elements or members of an aggregate, [...], the remainder of the + aggregate shall be initialized implicitly the same as objects that have + static storage duration." + +"If an object that has static or thread storage duration is not + initialized explicitly, then: + [...] 
+ — if it is an aggregate, every member is initialized (recursively) + according to these rules, and any padding is initialized to zero + bits; + [...]" + +"A bit-field declaration with no declarator, but only a colon and a + width, indicates an unnamed bit-field." Footnote: "An unnamed bit-field + structure member is useful for padding to conform to externally imposed + layouts." + +"There may be unnamed padding within a structure object, but not at its + beginning." + +Which makes me conclude: +- Whether an unnamed bit-field member is an unnamed member or padding is + unclear, and hence also whether the last quote above would render the + big endian case of the structure declaration invalid. +- Whether the number of members of an aggregate includes unnamed ones is + also not really clear. +- The initializer in map_grant_ref() initializes all fields of the "cnt" + sub-structure of the union, so assuming the second quote above applies + here (indirectly), the compiler isn't required to implicitly + initialize the rest (i.e. in particular any padding) like would happen + for static storage duration objects. + +Gcc 7.4.1 can be observed (apparently in debug builds only) to translate +aforementioned initializer to a read-modify-write operation of a stack +variable, leaving unchanged the top two bits of whatever was previously +in that stack slot. Clearly if either of the two bits were set, +radix_tree_ulong_to_ptr()'s assertion would trigger. + +Therefore, to be on the safe side, add an explicit padding field for the +non-big-endian-bitfields case and give a dummy name to both padding +fields. + +Fixes: 9781b51efde2 ("gnttab: replace mapkind()") +Signed-off-by: Jan Beulich +Acked-by: Andrew Cooper + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -952,10 +952,13 @@ union maptrack_node { + struct { + /* Radix tree slot pointers use two of the bits. */ + #ifdef __BIG_ENDIAN_BITFIELD +- unsigned long : 2; ++ unsigned long _0 : 2; + #endif + unsigned long rd : BITS_PER_LONG / 2 - 1; + unsigned long wr : BITS_PER_LONG / 2 - 1; ++#ifndef __BIG_ENDIAN_BITFIELD ++ unsigned long _0 : 2; ++#endif + } cnt; + unsigned long raw; + }; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-1.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,139 @@ +From: Jan Beulich +Subject: gnttab: add preemption check to gnttab_release_mappings() + +A guest may die with many grant mappings still in place, or simply with +a large maptrack table. Iterating through this may take more time than +is reasonable without intermediate preemption (to run softirqs and +perhaps the scheduler). + +Move the invocation of the function to the section where other +restartable functions get invoked, and have the function itself check +for preemption every once in a while. Have it iterate the table +backwards, such that decreasing the maptrack limit is all it takes to +convey restart information. + +In domain_teardown() introduce PROG_none such that inserting at the +front will be easier going forward. + +This is part of CVE-2021-28698 / XSA-380. 
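
The restartable-loop shape described above can be illustrated with a self-contained sketch. The demo_* names, DEMO_PER_PAGE and the DEMO_RESTART value are invented stand-ins (the real code works on maptrack pages, hypercall_preempt_check() and -ERESTART); only the control flow mirrors the patch.

#include <stdbool.h>

#define DEMO_PER_PAGE 256u           /* entries per backing page (made up) */
#define DEMO_RESTART  (-1)           /* stands in for -ERESTART */

struct demo_table {
    unsigned int limit;              /* only ever shrinks during cleanup */
};

static bool demo_preempt_pending(void) { return false; }   /* stub */
static void demo_release_entry(struct demo_table *t, unsigned int h)
{
    (void)t; (void)h;                /* release one mapping (stub) */
}

static int demo_release_mappings(struct demo_table *t)
{
    for ( unsigned int handle = t->limit; handle; )
    {
        /*
         * Consider preemption only on page boundaries; progress is
         * recorded simply by lowering the limit before bailing out, so
         * the next invocation resumes where this one stopped.
         */
        if ( handle < t->limit && !(handle % DEMO_PER_PAGE) )
        {
            t->limit = handle;       /* Xen also frees the trailing page */
            if ( demo_preempt_pending() )
                return DEMO_RESTART;
        }

        --handle;
        demo_release_entry(t, handle);
    }

    t->limit = 0;
    return 0;
}
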
+ +Reported-by: Andrew Cooper +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/common/domain.c ++++ b/xen/common/domain.c +@@ -646,13 +646,15 @@ int domain_kill(struct domain *d) + if ( d->is_dying != DOMDYING_alive ) + return domain_kill(d); + d->is_dying = DOMDYING_dying; +- gnttab_release_mappings(d); + tmem_destroy(d->tmem_client); + vnuma_destroy(d->vnuma); + domain_set_outstanding_pages(d, 0); + d->tmem_client = NULL; + /* fallthrough */ + case DOMDYING_dying: ++ rc = gnttab_release_mappings(d); ++ if ( rc ) ++ break; + rc = evtchn_destroy(d); + if ( rc ) + break; +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -62,7 +62,13 @@ struct grant_table { + unsigned int nr_grant_frames; + /* Number of grant status frames shared with guest (for version 2) */ + unsigned int nr_status_frames; +- /* Number of available maptrack entries. */ ++ /* ++ * Number of available maptrack entries. For cleanup purposes it is ++ * important to realize that this field and @maptrack further down will ++ * only ever be accessed by the local domain. Thus it is okay to clean ++ * up early, and to shrink the limit for the purpose of tracking cleanup ++ * progress. ++ */ + unsigned int maptrack_limit; + /* Shared grant table (see include/public/grant_table.h). */ + union { +@@ -3618,9 +3624,7 @@ grant_table_create( + return ret; + } + +-void +-gnttab_release_mappings( +- struct domain *d) ++int gnttab_release_mappings(struct domain *d) + { + struct grant_table *gt = d->grant_table, *rgt; + struct grant_mapping *map; +@@ -3634,8 +3638,32 @@ gnttab_release_mappings( + + BUG_ON(!d->is_dying); + +- for ( handle = 0; handle < gt->maptrack_limit; handle++ ) ++ if ( !gt || !gt->maptrack ) ++ return 0; ++ ++ for ( handle = gt->maptrack_limit; handle; ) + { ++ /* ++ * Deal with full pages such that their freeing (in the body of the ++ * if()) remains simple. ++ */ ++ if ( handle < gt->maptrack_limit && !(handle % MAPTRACK_PER_PAGE) ) ++ { ++ /* ++ * Changing maptrack_limit alters nr_maptrack_frames()'es return ++ * value. Free the then excess trailing page right here, rather ++ * than leaving it to grant_table_destroy() (and in turn requiring ++ * to leave gt->maptrack_limit unaltered). ++ */ ++ gt->maptrack_limit = handle; ++ FREE_XENHEAP_PAGE(gt->maptrack[nr_maptrack_frames(gt)]); ++ ++ if ( hypercall_preempt_check() ) ++ return -ERESTART; ++ } ++ ++ --handle; ++ + map = &maptrack_entry(gt, handle); + if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ) + continue; +@@ -3723,6 +3751,11 @@ gnttab_release_mappings( + + map->flags = 0; + } ++ ++ gt->maptrack_limit = 0; ++ FREE_XENHEAP_PAGE(gt->maptrack[0]); ++ ++ return 0; + } + + void grant_table_warn_active_grants(struct domain *d) +@@ -3785,8 +3818,7 @@ grant_table_destroy( + free_xenheap_page(t->shared_raw[i]); + xfree(t->shared_raw); + +- for ( i = 0; i < nr_maptrack_frames(t); i++ ) +- free_xenheap_page(t->maptrack[i]); ++ ASSERT(!t->maptrack_limit); + vfree(t->maptrack); + + for ( i = 0; i < nr_active_grant_frames(t); i++ ) +--- a/xen/include/xen/grant_table.h ++++ b/xen/include/xen/grant_table.h +@@ -46,9 +46,7 @@ int grant_table_set_limits(struct domain + void grant_table_warn_active_grants(struct domain *d); + + /* Domain death release of granted mappings of other domains' memory. 
*/ +-void +-gnttab_release_mappings( +- struct domain *d); ++int gnttab_release_mappings(struct domain *d); + + int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref, + gfn_t *gfn, uint16_t *status); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa380-4.11-2.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,384 @@ +From: Jan Beulich +Subject: gnttab: replace mapkind() + +mapkind() doesn't scale very well with larger maptrack entry counts, +using a brute force linear search through all entries, with the only +option of an early loop exit if a matching writable entry was found. +Introduce a radix tree alongside the main maptrack table, thus +allowing much faster MFN-based lookup. To avoid the need to actually +allocate space for the individual nodes, encode the two counters in the +node pointers themselves, thus limiting the number of permitted +simultaneous r/o and r/w mappings of the same MFN to 2³¹-1 (64-bit) / +2¹⁵-1 (32-bit) each. + +To avoid enforcing an unnecessarily low bound on the number of +simultaneous mappings of a single MFN, introduce +radix_tree_{ulong_to_ptr,ptr_to_ulong} paralleling +radix_tree_{int_to_ptr,ptr_to_int}. + +As a consequence locking changes are also applicable: With there no +longer being any inspection of the remote domain's active entries, +there's also no need anymore to hold the remote domain's grant table +lock. And since we're no longer iterating over the local domain's map +track table, the lock in map_grant_ref() can also be dropped before the +new maptrack entry actually gets populated. + +As a nice side effect this also reduces the number of IOMMU operations +in unmap_common(): Previously we would have "established" a readable +mapping whenever we didn't find a writable entry anymore (yet, of +course, at least one readable one). But we only need to do this if we +actually dropped the last writable entry, not if there were none already +before. + +This is part of CVE-2021-28698 / XSA-380. + +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -36,6 +36,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -80,8 +81,13 @@ struct grant_table { + grant_status_t **status; + /* Active grant table. */ + struct active_grant_entry **active; +- /* Mapping tracking table per vcpu. */ ++ /* Handle-indexed tracking table of mappings. */ + struct grant_mapping **maptrack; ++ /* ++ * MFN-indexed tracking tree of mappings, if needed. Note that this is ++ * protected by @lock, not @maptrack_lock. ++ */ ++ struct radix_tree_root maptrack_tree; + + /* Domain to which this struct grant_table belongs. */ + const struct domain *domain; +@@ -421,34 +427,6 @@ static int get_paged_frame(unsigned long + return rc; + } + +-static inline void +-double_gt_lock(struct grant_table *lgt, struct grant_table *rgt) +-{ +- /* +- * See mapkind() for why the write lock is also required for the +- * remote domain. 
+- */ +- if ( lgt < rgt ) +- { +- grant_write_lock(lgt); +- grant_write_lock(rgt); +- } +- else +- { +- if ( lgt != rgt ) +- grant_write_lock(rgt); +- grant_write_lock(lgt); +- } +-} +- +-static inline void +-double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt) +-{ +- grant_write_unlock(lgt); +- if ( lgt != rgt ) +- grant_write_unlock(rgt); +-} +- + #define INVALID_MAPTRACK_HANDLE UINT_MAX + + static inline grant_handle_t +@@ -871,41 +849,17 @@ static struct active_grant_entry *grant_ + return ERR_PTR(-EINVAL); + } + +-#define MAPKIND_READ 1 +-#define MAPKIND_WRITE 2 +-static unsigned int mapkind( +- struct grant_table *lgt, const struct domain *rd, mfn_t mfn) +-{ +- struct grant_mapping *map; +- grant_handle_t handle, limit = lgt->maptrack_limit; +- unsigned int kind = 0; +- +- /* +- * Must have the local domain's grant table write lock when +- * iterating over its maptrack entries. +- */ +- ASSERT(percpu_rw_is_write_locked(&lgt->lock)); +- /* +- * Must have the remote domain's grant table write lock while +- * counting its active entries. +- */ +- ASSERT(percpu_rw_is_write_locked(&rd->grant_table->lock)); +- +- smp_rmb(); +- +- for ( handle = 0; !(kind & MAPKIND_WRITE) && handle < limit; handle++ ) +- { +- map = &maptrack_entry(lgt, handle); +- if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) || +- map->domid != rd->domain_id ) +- continue; +- if ( mfn_eq(_active_entry(rd->grant_table, map->ref).frame, mfn) ) +- kind |= map->flags & GNTMAP_readonly ? +- MAPKIND_READ : MAPKIND_WRITE; +- } +- +- return kind; +-} ++union maptrack_node { ++ struct { ++ /* Radix tree slot pointers use two of the bits. */ ++#ifdef __BIG_ENDIAN_BITFIELD ++ unsigned long : 2; ++#endif ++ unsigned long rd : BITS_PER_LONG / 2 - 1; ++ unsigned long wr : BITS_PER_LONG / 2 - 1; ++ } cnt; ++ unsigned long raw; ++}; + + /* + * Returns 0 if TLB flush / invalidate required by caller. +@@ -931,7 +885,6 @@ map_grant_ref( + struct grant_mapping *mt; + grant_entry_header_t *shah; + uint16_t *status; +- bool_t need_iommu; + + led = current; + ld = led->domain; +@@ -1139,31 +1092,75 @@ map_grant_ref( + goto undo_out; + } + +- need_iommu = gnttab_need_iommu_mapping(ld); +- if ( need_iommu ) ++ if ( gnttab_need_iommu_mapping(ld) ) + { ++ union maptrack_node node = { ++ .cnt.rd = !!(op->flags & GNTMAP_readonly), ++ .cnt.wr = !(op->flags & GNTMAP_readonly), ++ }; ++ int err; ++ void **slot = NULL; + unsigned int kind; + +- double_gt_lock(lgt, rgt); ++ grant_write_lock(lgt); ++ ++ err = radix_tree_insert(&lgt->maptrack_tree, mfn_x(frame), ++ radix_tree_ulong_to_ptr(node.raw)); ++ if ( err == -EEXIST ) ++ { ++ slot = radix_tree_lookup_slot(&lgt->maptrack_tree, mfn_x(frame)); ++ if ( likely(slot) ) ++ { ++ node.raw = radix_tree_ptr_to_ulong(*slot); ++ err = -EBUSY; ++ ++ /* Update node only when refcount doesn't overflow. */ ++ if ( op->flags & GNTMAP_readonly ? ++node.cnt.rd ++ : ++node.cnt.wr ) ++ { ++ radix_tree_replace_slot(slot, ++ radix_tree_ulong_to_ptr(node.raw)); ++ err = 0; ++ } ++ } ++ else ++ ASSERT_UNREACHABLE(); ++ } + + /* + * We're not translated, so we know that dfns and mfns are + * the same things, so the IOMMU entry is always 1-to-1. 
+ */ +- kind = mapkind(lgt, rd, frame); +- if ( !(op->flags & GNTMAP_readonly) && +- !(kind & MAPKIND_WRITE) ) ++ if ( !(op->flags & GNTMAP_readonly) && node.cnt.wr == 1 ) + kind = IOMMUF_readable | IOMMUF_writable; +- else if ( !kind ) ++ else if ( (op->flags & GNTMAP_readonly) && ++ node.cnt.rd == 1 && !node.cnt.wr ) + kind = IOMMUF_readable; + else + kind = 0; +- if ( kind && iommu_map_page(ld, mfn_x(frame), mfn_x(frame), kind) ) ++ if ( err || ++ (kind && iommu_map_page(ld, mfn_x(frame), mfn_x(frame), kind)) ) + { +- double_gt_unlock(lgt, rgt); ++ if ( !err ) ++ { ++ if ( slot ) ++ { ++ op->flags & GNTMAP_readonly ? node.cnt.rd-- ++ : node.cnt.wr--; ++ radix_tree_replace_slot(slot, ++ radix_tree_ulong_to_ptr(node.raw)); ++ } ++ else ++ radix_tree_delete(&lgt->maptrack_tree, mfn_x(frame)); ++ } ++ + rc = GNTST_general_error; +- goto undo_out; + } ++ ++ grant_write_unlock(lgt); ++ ++ if ( rc != GNTST_okay ) ++ goto undo_out; + } + + TRACE_1D(TRC_MEM_PAGE_GRANT_MAP, op->dom); +@@ -1171,10 +1168,6 @@ map_grant_ref( + /* + * All maptrack entry users check mt->flags first before using the + * other fields so just ensure the flags field is stored last. +- * +- * However, if gnttab_need_iommu_mapping() then this would race +- * with a concurrent mapkind() call (on an unmap, for example) +- * and a lock is required. + */ + mt = &maptrack_entry(lgt, handle); + mt->domid = op->dom; +@@ -1182,9 +1175,6 @@ map_grant_ref( + smp_wmb(); + write_atomic(&mt->flags, op->flags); + +- if ( need_iommu ) +- double_gt_unlock(lgt, rgt); +- + op->dev_bus_addr = mfn_to_maddr(frame); + op->handle = handle; + op->status = GNTST_okay; +@@ -1411,19 +1401,34 @@ unmap_common( + + if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) ) + { +- unsigned int kind; ++ void **slot; ++ union maptrack_node node; + int err = 0; + +- double_gt_lock(lgt, rgt); ++ grant_write_lock(lgt); ++ slot = radix_tree_lookup_slot(&lgt->maptrack_tree, mfn_x(op->frame)); ++ node.raw = likely(slot) ? radix_tree_ptr_to_ulong(*slot) : 0; ++ ++ /* Refcount must not underflow. */ ++ if ( !(flags & GNTMAP_readonly ? node.cnt.rd-- ++ : node.cnt.wr--) ) ++ BUG(); + +- kind = mapkind(lgt, rd, op->frame); +- if ( !kind ) ++ if ( !node.raw ) + err = iommu_unmap_page(ld, mfn_x(op->frame)); +- else if ( !(kind & MAPKIND_WRITE) ) ++ else if ( !(flags & GNTMAP_readonly) && !node.cnt.wr ) + err = iommu_map_page(ld, mfn_x(op->frame), + mfn_x(op->frame), IOMMUF_readable); + +- double_gt_unlock(lgt, rgt); ++ if ( err ) ++ ; ++ else if ( !node.raw ) ++ radix_tree_delete(&lgt->maptrack_tree, mfn_x(op->frame)); ++ else ++ radix_tree_replace_slot(slot, ++ radix_tree_ulong_to_ptr(node.raw)); ++ ++ grant_write_unlock(lgt); + + if ( err ) + rc = GNTST_general_error; +@@ -1854,6 +1859,8 @@ grant_table_init(struct domain *d, struc + gt->maptrack = vzalloc(gt->max_maptrack_frames * sizeof(*gt->maptrack)); + if ( gt->maptrack == NULL ) + goto out; ++ ++ radix_tree_init(>->maptrack_tree); + } + + /* Shared grant table. */ +@@ -3643,6 +3650,8 @@ int gnttab_release_mappings(struct domai + + for ( handle = gt->maptrack_limit; handle; ) + { ++ mfn_t mfn; ++ + /* + * Deal with full pages such that their freeing (in the body of the + * if()) remains simple. 
+@@ -3744,17 +3753,31 @@ int gnttab_release_mappings(struct domai + if ( act->pin == 0 ) + gnttab_clear_flag(rd, _GTF_reading, status); + ++ mfn = act->frame; ++ + active_entry_release(act); + grant_read_unlock(rgt); + + rcu_unlock_domain(rd); + + map->flags = 0; ++ ++ /* ++ * This is excessive in that a single such call would suffice per ++ * mapped MFN (or none at all, if no entry was ever inserted). But it ++ * should be the common case for an MFN to be mapped just once, and ++ * this way we don't need to further maintain the counters. We also ++ * don't want to leave cleaning up of the tree as a whole to the end ++ * of the function, as this could take quite some time. ++ */ ++ radix_tree_delete(>->maptrack_tree, mfn_x(mfn)); + } + + gt->maptrack_limit = 0; + FREE_XENHEAP_PAGE(gt->maptrack[0]); + ++ radix_tree_destroy(>->maptrack_tree, NULL); ++ + return 0; + } + +--- a/xen/include/xen/radix-tree.h ++++ b/xen/include/xen/radix-tree.h +@@ -190,6 +190,25 @@ static inline int radix_tree_ptr_to_int( + return (int)((long)ptr >> 2); + } + ++/** ++ * radix_tree_{ulong_to_ptr,ptr_to_ulong}: ++ * ++ * Same for unsigned long values. Beware though that only BITS_PER_LONG-2 ++ * bits are actually usable for the value. ++ */ ++static inline void *radix_tree_ulong_to_ptr(unsigned long val) ++{ ++ unsigned long ptr = (val << 2) | 0x2; ++ ASSERT((ptr >> 2) == val); ++ return (void *)ptr; ++} ++ ++static inline unsigned long radix_tree_ptr_to_ulong(void *ptr) ++{ ++ ASSERT(((unsigned long)ptr & 0x3) == 0x2); ++ return (unsigned long)ptr >> 2; ++} ++ + int radix_tree_insert(struct radix_tree_root *, unsigned long, void *); + void *radix_tree_lookup(struct radix_tree_root *, unsigned long); + void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa382.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa382.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa382.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa382.patch 2022-06-17 09:29:25.000000000 +0100 @@ -0,0 +1,34 @@ +From: Jan Beulich +Subject: gnttab: fix array capacity check in gnttab_get_status_frames() + +The number of grant frames is of no interest here; converting the passed +in op.nr_frames this way means we allow for 8 times as many GFNs to be +written as actually fit in the array. We would corrupt xlat areas of +higher vCPU-s (after having faulted many times while trying to write to +the guard pages between any two areas) for 32-bit PV guests. For HVM +guests we'd simply crash as soon as we hit the first guard page, as +accesses to the xlat area are simply memcpy() there. + +This is CVE-2021-28699 / XSA-382. 
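
The arithmetic behind the overrun is easy to reproduce standalone. The 8:1 ratio and the demo_* names below are illustrative, taken only from the "8 times as many GFNs" statement above rather than from the Xen sources.

#include <stdio.h>

#define DEMO_RATIO 8u      /* status entries pack ~8x denser (per the text) */

static unsigned int demo_grant_to_status_frames(unsigned int nr)
{
    return (nr + DEMO_RATIO - 1) / DEMO_RATIO;
}

int main(void)
{
    unsigned int limit_max = 4;    /* capacity of the caller-visible array */
    unsigned int nr_frames = 32;   /* guest-chosen request */

    /* Old check: 32 requested frames shrink to 4 before the comparison. */
    int old_ok = demo_grant_to_status_frames(nr_frames) <= limit_max;
    /* New check: compare the number of GFNs actually written out. */
    int new_ok = nr_frames <= limit_max;

    printf("old check lets the request through: %d\n", old_ok);  /* 1 */
    printf("new check lets the request through: %d\n", new_ok);  /* 0 */
    return 0;
}
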
+ +Fixes: 18b1be5e324b ("gnttab: make resource limits per domain") +Signed-off-by: Jan Beulich + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -3177,12 +3177,11 @@ gnttab_get_status_frames(XEN_GUEST_HANDL + goto unlock; + } + +- if ( unlikely(limit_max < grant_to_status_frames(op.nr_frames)) ) ++ if ( unlikely(limit_max < op.nr_frames) ) + { + gdprintk(XENLOG_WARNING, +- "grant_to_status_frames(%u) for d%d is too large (%u,%u)\n", +- op.nr_frames, d->domain_id, +- grant_to_status_frames(op.nr_frames), limit_max); ++ "nr_status_frames for %pd is too large (%u,%u)\n", ++ d, op.nr_frames, limit_max); + op.status = GNTST_general_error; + goto unlock; + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa384-4.11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa384-4.11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa384-4.11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa384-4.11.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,79 @@ +From: Jan Beulich +Subject: gnttab: deal with status frame mapping race + +Once gnttab_map_frame() drops the grant table lock, the MFN it reports +back to its caller is free to other manipulation. In particular +gnttab_unpopulate_status_frames() might free it, by a racing request on +another CPU, thus resulting in a reference to a deallocated page getting +added to a domain's P2M. + +Obtain a page reference in gnttab_map_frame() to prevent freeing of the +page until xenmem_add_to_physmap_one() has actually completed its acting +on the page. Do so uniformly, even if only strictly required for v2 +status pages, to avoid extra conditionals (which then would all need to +be kept in sync going forward). + +This is CVE-2021-28701 / XSA-384. + +Reported-by: Julien Grall +Signed-off-by: Jan Beulich +Reviewed-by: Julien Grall + +--- a/xen/arch/arm/mm.c ++++ b/xen/arch/arm/mm.c +@@ -1238,6 +1238,8 @@ int xenmem_add_to_physmap_one( + if ( rc ) + return rc; + ++ /* Need to take care of the reference obtained in gnttab_map_frame(). */ ++ page = mfn_to_page(mfn); + t = p2m_ram_rw; + + break; +@@ -1304,9 +1306,12 @@ int xenmem_add_to_physmap_one( + /* Map at new location. */ + rc = guest_physmap_add_entry(d, gfn, mfn, 0, t); + +- /* If we fail to add the mapping, we need to drop the reference we +- * took earlier on foreign pages */ +- if ( rc && space == XENMAPSPACE_gmfn_foreign ) ++ /* ++ * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need ++ * to drop the reference we took earlier. In all other cases we need to ++ * drop any reference we took earlier (perhaps indirectly). ++ */ ++ if ( space == XENMAPSPACE_gmfn_foreign ? rc : page != NULL ) + { + ASSERT(page != NULL); + put_page(page); +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -4751,6 +4751,8 @@ int xenmem_add_to_physmap_one( + rc = gnttab_map_frame(d, idx, gpfn, &mfn); + if ( rc ) + return rc; ++ /* Need to take care of the ref obtained in gnttab_map_frame(). */ ++ page = mfn_to_page(mfn); + break; + case XENMAPSPACE_gmfn: + { +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -3964,7 +3964,16 @@ int gnttab_map_frame(struct domain *d, u + *mfn, 0); + + if ( !rc ) +- gnttab_set_frame_gfn(gt, status, idx, gfn); ++ { ++ /* ++ * Make sure gnttab_unpopulate_status_frames() won't (successfully) ++ * free the page until our caller has completed its operation. 
++ */ ++ if ( get_page(mfn_to_page(*mfn), d) ) ++ gnttab_set_frame_gfn(gt, status, idx, gfn); ++ else ++ rc = -EBUSY; ++ } + + grant_write_unlock(gt); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa385-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa385-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa385-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa385-4.12.patch 2022-06-01 21:38:43.000000000 +0100 @@ -0,0 +1,73 @@ +From: Julien Grall +Subject: xen/page_alloc: Harden assign_pages() + +domain_tot_pages() and d->max_pages are 32-bit values. While the order +should always be quite small, it would still be possible to overflow +if domain_tot_pages() is near to (2^32 - 1). + +As this code may be called by a guest via XENMEM_increase_reservation +and XENMEM_populate_physmap, we want to make sure the guest is not going +to be able to allocate more than it is allowed. + +Rework the allocation check to avoid any possible overflow. While the +check domain_tot_pages() < d->max_pages should technically not be +necessary, it is probably best to have it to catch any possible +inconsistencies in the future. + +This is CVE-2021-28706 / XSA-385. + +Signed-off-by: Julien Grall +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -2239,7 +2239,8 @@ gnttab_transfer( + * pages when it is dying. + */ + if ( unlikely(e->is_dying) || +- unlikely(e->tot_pages >= e->max_pages) ) ++ unlikely(e->tot_pages >= e->max_pages) || ++ unlikely(!(e->tot_pages + 1)) ) + { + spin_unlock(&e->page_alloc_lock); + +@@ -2248,8 +2249,8 @@ gnttab_transfer( + e->domain_id); + else + gdprintk(XENLOG_INFO, +- "Transferee d%d has no headroom (tot %u, max %u)\n", +- e->domain_id, e->tot_pages, e->max_pages); ++ "Transferee %pd has no headroom (tot %u, max %u)\n", ++ e, e->tot_pages, e->max_pages); + + gop.status = GNTST_general_error; + goto unlock_and_copyback; +--- a/xen/common/page_alloc.c ++++ b/xen/common/page_alloc.c +@@ -2278,12 +2278,21 @@ int assign_pages( + + if ( !(memflags & MEMF_no_refcount) ) + { +- if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) ) ++ unsigned int nr = 1u << order; ++ ++ if ( unlikely(d->tot_pages > d->max_pages) ) ++ { ++ gprintk(XENLOG_INFO, "Inconsistent allocation for %pd: %u > %u\n", ++ d, d->tot_pages, d->max_pages); ++ rc = -EPERM; ++ goto out; ++ } ++ ++ if ( unlikely(nr > d->max_pages - d->tot_pages) ) + { + if ( !tmem_enabled() || order != 0 || d->tot_pages != d->max_pages ) +- gprintk(XENLOG_INFO, "Over-allocation for domain %u: " +- "%u > %u\n", d->domain_id, +- d->tot_pages + (1 << order), d->max_pages); ++ gprintk(XENLOG_INFO, "Over-allocation for %pd: %Lu > %u\n", ++ d, d->tot_pages + 0ull + nr, d->max_pages); + rc = -E2BIG; + goto out; + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-1.patch 2022-06-02 07:31:01.000000000 +0100 @@ -0,0 +1,172 @@ +From: Jan Beulich +Subject: x86/PoD: deal with misaligned GFNs + +Users of XENMEM_decrease_reservation and XENMEM_populate_physmap aren't +required to pass in order-aligned GFN values. 
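
Such requests get split into naturally aligned chunks by the wrappers introduced further down; the idea can be tried in isolation with the sketch below. The demo_* names are invented and find_first_set_bit() is replaced by a GCC builtin; this is not the hypervisor code itself.

#include <stdio.h>

static unsigned int demo_lowest_set_bit(unsigned long v)
{
    return (unsigned int)__builtin_ctzl(v);   /* v is never 0 here */
}

static void demo_split(unsigned long gfn, unsigned int order)
{
    unsigned long left = 1UL << order;
    /*
     * Chunk order: the largest power of two dividing both the start GFN
     * and the total count.  As in the patch it is computed once, not
     * re-derived for the (better aligned) later chunks.
     */
    unsigned int chunk_order = demo_lowest_set_bit(gfn | left);

    do {
        printf("  chunk at gfn %#lx, order %u\n", gfn, chunk_order);
        left -= 1UL << chunk_order;
        gfn  += 1UL << chunk_order;
    } while ( left );
}

int main(void)
{
    demo_split(0x1234, 4);   /* misaligned start: four order-2 chunks */
    demo_split(0x2000, 9);   /* aligned start: one order-9 chunk */
    return 0;
}
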
(While I consider this +bogus, I don't think we can fix this there, as that might break existing +code, e.g Linux'es swiotlb, which - while affecting PV only - until +recently had been enforcing only page alignment on the original +allocation.) Only non-PoD code paths (guest_physmap_{add,remove}_page(), +p2m_set_entry()) look to be dealing with this properly (in part by being +implemented inefficiently, handling every 4k page separately). + +Introduce wrappers taking care of splitting the incoming request into +aligned chunks, without putting much effort in trying to determine the +largest possible chunk at every iteration. + +Also "handle" p2m_set_entry() failure for non-order-0 requests by +crashing the domain in one more place. Alongside putting a log message +there, also add one to the other similar path. + +Note regarding locking: This is left in the actual worker functions on +the assumption that callers aren't guaranteed atomicity wrt acting on +multiple pages at a time. For mis-aligned GFNs gfn_lock() wouldn't have +locked the correct GFN range anyway, if it didn't simply resolve to +p2m_lock(), and for well-behaved callers there continues to be only a +single iteration, i.e. behavior is unchanged for them. (FTAOD pulling +out just pod_lock() into p2m_pod_decrease_reservation() would result in +a lock order violation.) + +This is CVE-2021-28704 and CVE-2021-28707 / part of XSA-388. + +Fixes: 3c352011c0d3 ("x86/PoD: shorten certain operations on higher order ranges") +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/arch/x86/mm/p2m-pod.c ++++ b/xen/arch/x86/mm/p2m-pod.c +@@ -495,7 +495,7 @@ p2m_pod_zero_check_superpage(struct p2m_ + + + /* +- * This function is needed for two reasons: ++ * This pair of functions is needed for two reasons: + * + To properly handle clearing of PoD entries + * + To "steal back" memory being freed for the PoD cache, rather than + * releasing it. +@@ -503,8 +503,8 @@ p2m_pod_zero_check_superpage(struct p2m_ + * Once both of these functions have been completed, we can return and + * allow decrease_reservation() to handle everything else. + */ +-unsigned long +-p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order) ++static unsigned long ++decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order) + { + unsigned long ret = 0, i, n; + struct p2m_domain *p2m = p2m_get_hostp2m(d); +@@ -551,8 +551,10 @@ p2m_pod_decrease_reservation(struct doma + * All PoD: Mark the whole region invalid and tell caller + * we're done. + */ +- if ( p2m_set_entry(p2m, gfn, INVALID_MFN, order, p2m_invalid, +- p2m->default_access) ) ++ int rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order, p2m_invalid, ++ p2m->default_access); ++ ++ if ( rc ) + { + /* + * If this fails, we can't tell how much of the range was changed. +@@ -560,7 +562,12 @@ p2m_pod_decrease_reservation(struct doma + * impossible. 
+ */ + if ( order != 0 ) ++ { ++ printk(XENLOG_G_ERR ++ "%pd: marking GFN %#lx (order %u) as non-PoD failed: %d\n", ++ d, gfn_x(gfn), order, rc); + domain_crash(d); ++ } + goto out_unlock; + } + ret = 1UL << order; +@@ -667,6 +674,22 @@ out_unlock: + return ret; + } + ++unsigned long ++p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order) ++{ ++ unsigned long left = 1UL << order, ret = 0; ++ unsigned int chunk_order = find_first_set_bit(gfn_x(gfn) | left); ++ ++ do { ++ ret += decrease_reservation(d, gfn, chunk_order); ++ ++ left -= 1UL << chunk_order; ++ gfn = gfn_add(gfn, 1UL << chunk_order); ++ } while ( left ); ++ ++ return ret; ++} ++ + void p2m_pod_dump_data(struct domain *d) + { + struct p2m_domain *p2m = p2m_get_hostp2m(d); +@@ -1266,19 +1289,15 @@ remap_and_retry: + return true; + } + +- +-int +-guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn_l, +- unsigned int order) ++static int ++mark_populate_on_demand(struct domain *d, unsigned long gfn_l, ++ unsigned int order) + { + struct p2m_domain *p2m = p2m_get_hostp2m(d); + gfn_t gfn = _gfn(gfn_l); + unsigned long i, n, pod_count = 0; + int rc = 0; + +- if ( !paging_mode_translate(d) ) +- return -EINVAL; +- + gfn_lock(p2m, gfn, order); + + P2M_DEBUG("mark pod gfn=%#lx\n", gfn_l); +@@ -1316,10 +1335,42 @@ guest_physmap_mark_populate_on_demand(st + BUG_ON(p2m->pod.entry_count < 0); + pod_unlock(p2m); + } ++ else if ( order ) ++ { ++ /* ++ * If this failed, we can't tell how much of the range was changed. ++ * Best to crash the domain. ++ */ ++ printk(XENLOG_G_ERR ++ "%pd: marking GFN %#lx (order %u) as PoD failed: %d\n", ++ d, gfn_l, order, rc); ++ domain_crash(d); ++ } + + out: + gfn_unlock(p2m, gfn, order); + + return rc; + } ++ ++int ++guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, ++ unsigned int order) ++{ ++ unsigned long left = 1UL << order; ++ unsigned int chunk_order = find_first_set_bit(gfn | left); ++ int rc; ++ ++ if ( !paging_mode_translate(d) ) ++ return -EINVAL; ++ ++ do { ++ rc = mark_populate_on_demand(d, gfn, chunk_order); ++ ++ left -= 1UL << chunk_order; ++ gfn += 1UL << chunk_order; ++ } while ( !rc && left ); ++ ++ return rc; ++} + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa388-4.14-2.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,36 @@ +From: Jan Beulich +Subject: x86/PoD: handle intermediate page orders in p2m_pod_cache_add() + +p2m_pod_decrease_reservation() may pass pages to the function which +aren't 4k, 2M, or 1G. Handle all intermediate orders as well, to avoid +hitting the BUG() at the switch() statement's "default" case. + +This is CVE-2021-28708 / part of XSA-388. + +Fixes: 3c352011c0d3 ("x86/PoD: shorten certain operations on higher order ranges") +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/arch/x86/mm/p2m-pod.c ++++ b/xen/arch/x86/mm/p2m-pod.c +@@ -111,15 +111,13 @@ p2m_pod_cache_add(struct p2m_domain *p2m + /* Then add to the appropriate populate-on-demand list. */ + switch ( order ) + { +- case PAGE_ORDER_1G: +- for ( i = 0; i < (1UL << PAGE_ORDER_1G); i += 1UL << PAGE_ORDER_2M ) ++ case PAGE_ORDER_2M ... 
PAGE_ORDER_1G: ++ for ( i = 0; i < (1UL << order); i += 1UL << PAGE_ORDER_2M ) + page_list_add_tail(page + i, &p2m->pod.super); + break; +- case PAGE_ORDER_2M: +- page_list_add_tail(page, &p2m->pod.super); +- break; +- case PAGE_ORDER_4K: +- page_list_add_tail(page, &p2m->pod.single); ++ case PAGE_ORDER_4K ... PAGE_ORDER_2M - 1: ++ for ( i = 0; i < (1UL << order); i += 1UL << PAGE_ORDER_4K ) ++ page_list_add_tail(page + i, &p2m->pod.single); + break; + default: + BUG(); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa389-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa389-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa389-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa389-4.12.patch 2022-06-02 13:03:02.000000000 +0100 @@ -0,0 +1,178 @@ +From: Jan Beulich +Subject: x86/P2M: deal with partial success of p2m_set_entry() + +M2P and PoD stats need to remain in sync with P2M; if an update succeeds +only partially, respective adjustments need to be made. If updates get +made before the call, they may also need undoing upon complete failure +(i.e. including the single-page case). + +Log-dirty state would better also be kept in sync. + +Note that the change to set_typed_p2m_entry() may not be strictly +necessary (due to the order restriction enforced near the top of the +function), but is being kept here to be on the safe side. + +This is CVE-2021-28705 and CVE-2021-28709 / XSA-389. + +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -780,6 +780,7 @@ p2m_remove_page(struct p2m_domain *p2m, + gfn_t gfn = _gfn(gfn_l); + p2m_type_t t; + p2m_access_t a; ++ int rc; + + /* IOMMU for PV guests is handled in get_page_type() and put_page(). */ + if ( !paging_mode_translate(p2m->domain) ) +@@ -811,8 +812,27 @@ p2m_remove_page(struct p2m_domain *p2m, + set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY); + } + } +- return p2m_set_entry(p2m, gfn, INVALID_MFN, page_order, p2m_invalid, +- p2m->default_access); ++ rc = p2m_set_entry(p2m, gfn, INVALID_MFN, page_order, p2m_invalid, ++ p2m->default_access); ++ if ( likely(!rc) || !mfn_valid(_mfn(mfn)) ) ++ return rc; ++ ++ /* ++ * The operation may have partially succeeded. For the failed part we need ++ * to undo the M2P update and, out of precaution, mark the pages dirty ++ * again. ++ */ ++ for ( i = 0; i < (1UL << page_order); ++i ) ++ { ++ p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, NULL, NULL); ++ if ( !p2m_is_hole(t) && !p2m_is_special(t) && !p2m_is_shared(t) ) ++ { ++ set_gpfn_from_mfn(mfn + i, gfn_l + i); ++ paging_mark_pfn_dirty(p2m->domain, _pfn(gfn_l + i)); ++ } ++ } ++ ++ return rc; + } + + int +@@ -980,13 +1000,8 @@ guest_physmap_add_entry(struct domain *d + + /* Now, actually do the two-way mapping */ + rc = p2m_set_entry(p2m, gfn, mfn, page_order, t, p2m->default_access); +- if ( rc == 0 ) ++ if ( likely(!rc) ) + { +- pod_lock(p2m); +- p2m->pod.entry_count -= pod_count; +- BUG_ON(p2m->pod.entry_count < 0); +- pod_unlock(p2m); +- + if ( !p2m_is_grant(t) ) + { + for ( i = 0; i < (1UL << page_order); i++ ) +@@ -996,6 +1009,42 @@ guest_physmap_add_entry(struct domain *d + gfn_x(gfn_add(gfn, i))); + } + } ++ else ++ { ++ /* ++ * The operation may have partially succeeded. For the successful part ++ * we need to update M2P and dirty state, while for the failed part we ++ * may need to adjust PoD stats as well as undo the earlier M2P update. 
++ */ ++ for ( i = 0; i < (1UL << page_order); ++i ) ++ { ++ omfn = p2m->get_entry(p2m, gfn_add(gfn, i), &ot, &a, 0, NULL, NULL); ++ if ( p2m_is_pod(ot) ) ++ { ++ BUG_ON(!pod_count); ++ --pod_count; ++ } ++ else if ( mfn_eq(omfn, mfn_add(mfn, i)) && ot == t && ++ a == p2m->default_access && !p2m_is_grant(t) ) ++ { ++ set_gpfn_from_mfn(mfn_x(omfn), gfn_x(gfn) + i); ++ paging_mark_pfn_dirty(d, _pfn(gfn_x(gfn) + i)); ++ } ++ else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) ) ++ { ++ ASSERT(mfn_valid(omfn)); ++ set_gpfn_from_mfn(mfn_x(omfn), gfn_x(gfn) + i); ++ } ++ } ++ } ++ ++ if ( pod_count ) ++ { ++ pod_lock(p2m); ++ p2m->pod.entry_count -= pod_count; ++ BUG_ON(p2m->pod.entry_count < 0); ++ pod_unlock(p2m); ++ } + + out: + p2m_unlock(p2m); +@@ -1278,6 +1329,47 @@ static int set_typed_p2m_entry(struct do + domain_crash(d); + return -EPERM; + } ++ ++ P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn_l, mfn_x(mfn)); ++ rc = p2m_set_entry(p2m, gfn, mfn, order, gfn_p2mt, access); ++ if ( unlikely(rc) ) ++ { ++ gdprintk(XENLOG_ERR, "p2m_set_entry: %#lx:%u -> %d (0x%"PRI_mfn")\n", ++ gfn_l, order, rc, mfn_x(mfn)); ++ ++ /* ++ * The operation may have partially succeeded. For the successful part ++ * we need to update PoD stats, M2P, and dirty state. ++ */ ++ if ( order != PAGE_ORDER_4K ) ++ { ++ unsigned long i; ++ ++ for ( i = 0; i < (1UL << order); ++i ) ++ { ++ p2m_type_t t; ++ mfn_t cmfn = p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, ++ NULL, NULL); ++ ++ if ( !mfn_eq(cmfn, mfn_add(mfn, i)) || t != gfn_p2mt || ++ a != access ) ++ continue; ++ ++ if ( p2m_is_ram(ot) ) ++ { ++ ASSERT(mfn_valid(mfn_add(omfn, i))); ++ set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY); ++ } ++ else if ( p2m_is_pod(ot) ) ++ { ++ pod_lock(p2m); ++ BUG_ON(!p2m->pod.entry_count); ++ --p2m->pod.entry_count; ++ pod_unlock(p2m); ++ } ++ } ++ } ++ } + else if ( p2m_is_ram(ot) ) + { + unsigned long i; +@@ -1288,12 +1381,6 @@ static int set_typed_p2m_entry(struct do + set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY); + } + } +- +- P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn_l, mfn_x(mfn)); +- rc = p2m_set_entry(p2m, gfn, mfn, order, gfn_p2mt, access); +- if ( rc ) +- gdprintk(XENLOG_ERR, "p2m_set_entry: %#lx:%u -> %d (0x%"PRI_mfn")\n", +- gfn_l, order, rc, mfn_x(mfn)); + else if ( p2m_is_pod(ot) ) + { + pod_lock(p2m); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa394-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa394-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa394-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa394-4.12.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,59 @@ +From 604fb691eee5bbeba770126451d880b932565e65 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Wed, 5 Jan 2022 17:55:48 +0000 +Subject: [PATCH] xen/grant-table: Only decrement the refcounter when grant is + fully unmapped + +The grant unmapping hypercall (GNTTABOP_unmap_grant_ref) is not a +simple revert of the changes done by the grant mapping hypercall +(GNTTABOP_map_grant_ref). + +Instead, it is possible to partially (or even not) clear some flags. +This will leave the grant is mapped until a future call where all +the flags would be cleared. + +XSA-380 introduced a refcounting that is meant to only be dropped +when the grant is fully unmapped. Unfortunately, unmap_common() will +decrement the refcount for every successful call. + +A consequence is a domain would be able to underflow the refcount +and trigger a BUG(). 
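
The invariant being restored can be modelled in a few lines of plain C. The demo_* names are invented and the flag handling is heavily simplified; the point is only that a counter bumped once per maptrack handle must be dropped once per handle release, not once per unmap call.

#include <assert.h>
#include <stdbool.h>

struct demo_mapping {
    unsigned int flags;        /* union of the remaining map flags */
    unsigned int *mfn_count;   /* per-MFN counter, bumped once at map time */
};

static void demo_unmap(struct demo_mapping *m, unsigned int clear)
{
    m->flags &= ~clear;

    /*
     * The handle is only released once every flag is gone; decrement the
     * counter there, and not on every partially-clearing unmap, or it
     * underflows exactly as described above.
     */
    bool put_handle = (m->flags == 0);
    if ( put_handle )
    {
        assert(*m->mfn_count > 0);
        --*m->mfn_count;
    }
}
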
+ +Looking at the code, it is not clear to me why a domain would +want to partially clear some flags in the grant-table. But as +this is part of the ABI, it is better to not change the behavior +for now. + +Fix it by checking if the maptrack handle has been released before +decrementing the refcounting. + +This is CVE-2022-23034 / XSA-394. + +Fixes: 9781b51efde2 ("gnttab: replace mapkind()") +Signed-off-by: Julien Grall +Reviewed-by: Jan Beulich +--- + xen/common/grant_table.c | 7 ++++++- + 1 file changed, 6 insertions(+), 1 deletion(-) + +diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c +index ee5748e74eb9..61d29df7bdf6 100644 +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -1402,7 +1402,12 @@ unmap_common( + if ( put_handle ) + put_maptrack_handle(lgt, op->handle); + +- if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) ) ++ /* ++ * map_grant_ref() will only increment the refcount (and update the ++ * IOMMU) once per mapping. So we only want to decrement it once the ++ * maptrack handle has been put, alongside the further IOMMU update. ++ */ ++ if ( put_handle && gnttab_need_iommu_mapping(ld) ) + { + void **slot; + union maptrack_node node; +-- +2.32.0 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa395-4.14.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa395-4.14.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa395-4.14.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa395-4.14.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,42 @@ +From 743348f5d545c7fff9cdea746840b795f5c26d43 Mon Sep 17 00:00:00 2001 +From: Julien Grall +Date: Wed, 5 Jan 2022 18:09:39 +0000 +Subject: [PATCH] passthrough/x86: stop pirq iteration immediately in case of + error + +pt_pirq_iterate() will iterate in batch over all the PIRQs. The outer +loop will bail out if 'rc' is non-zero but the inner loop will continue. + +This means 'rc' will get clobbered and we may miss any errors (such as +-ERESTART in the case of the callback pci_clean_dpci_irq()). + +This is CVE-2022-23035 / XSA-395. + +Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree") +Fixes: f6dd295381f4 ("dpci: replace tasklet with softirq") +Signed-off-by: Julien Grall +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +--- + xen/drivers/passthrough/io.c | 4 ++++ + 1 file changed, 4 insertions(+) + +diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c +index 71eaf2c17e27..b6e88ebc8646 100644 +--- a/xen/drivers/passthrough/io.c ++++ b/xen/drivers/passthrough/io.c +@@ -810,7 +810,11 @@ int pt_pirq_iterate(struct domain *d, + + pirq = pirqs[i]->pirq; + if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) ) ++ { + rc = cb(d, pirq_dpci, arg); ++ if ( rc ) ++ break; ++ } + } + } while ( !rc && ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) ); + +-- +2.32.0 + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa397-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa397-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa397-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa397-4.12.patch 2022-05-26 17:34:25.000000000 +0100 @@ -0,0 +1,98 @@ +From: Roger Pau Monne +Subject: x86/hap: do not switch on log dirty for VRAM tracking + +XEN_DMOP_track_dirty_vram possibly calls into paging_log_dirty_enable +when using HAP mode, and it can interact badly with other ongoing +paging domctls, as XEN_DMOP_track_dirty_vram is not holding the domctl +lock. 
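
As a rough standalone model of that interaction (all demo_* names invented; the real paths are paging_log_dirty_enable() and the XEN_DOMCTL_SHADOW_OP_OFF teardown): one path mutates shared log-dirty state without taking the lock that serialises the other, so the teardown's expectation that nothing remains allocated can be violated underneath it, as the assertion quoted below shows.

#include <pthread.h>

struct demo_logdirty {
    pthread_mutex_t domctl_lock;   /* taken by the domctl path only */
    unsigned int allocs;           /* pages backing the dirty bitmap */
};

static void demo_domctl_off(struct demo_logdirty *ld)
{
    pthread_mutex_lock(&ld->domctl_lock);
    ld->allocs = 0;                /* retire the old structures ... */
    /*
     * ... and expect them to stay retired.  With the unlocked path below
     * running concurrently, the real code's ASSERT on log_dirty.allocs
     * can fire instead.
     */
    pthread_mutex_unlock(&ld->domctl_lock);
}

/*
 * Pre-patch shape: reached from the dirty-VRAM DMOP without domctl_lock,
 * so it can repopulate freshly cleared slots in the middle of
 * demo_domctl_off() on another CPU.  The fix stops taking this path for
 * VRAM tracking altogether.
 */
static void demo_vram_track_buggy(struct demo_logdirty *ld)
{
    ld->allocs += 1;
}
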
+ +This was detected as a result of the following assert triggering when +doing repeated migrations of a HAP HVM domain with a stubdom: + +Assertion 'd->arch.paging.log_dirty.allocs == 0' failed at paging.c:198 +----[ Xen-4.17-unstable x86_64 debug=y Not tainted ]---- +CPU: 34 +RIP: e008:[] arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x6 +RFLAGS: 0000000000010206 CONTEXT: hypervisor (d0v23) +[...] +Xen call trace: + [] R arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x63a + [] S xsm/flask/hooks.c#domain_has_perm+0x5a/0x67 + [] F paging_domctl+0x251/0xd41 + [] F paging_domctl_continuation+0x19d/0x202 + [] F pv_hypercall+0x150/0x2a7 + [] F lstar_enter+0x12d/0x140 + +Such assert triggered because the stubdom used +XEN_DMOP_track_dirty_vram while dom0 was in the middle of executing +XEN_DOMCTL_SHADOW_OP_OFF, and so log dirty become enabled while +retiring the old structures, thus leading to new entries being +populated in already clear slots. + +Fix this by not enabling log dirty for VRAM tracking, similar to what +is done when using shadow instead of HAP. Call +p2m_enable_hardware_log_dirty when enabling VRAM tracking in order to +get some hardware assistance if available. As a side effect the memory +pressure on the p2m pool should go down if only VRAM tracking is +enabled, as the dirty bitmap is no longer allocated. + +Note that paging_log_dirty_range (used to get the dirty bitmap for +VRAM tracking) doesn't use the log dirty bitmap, and instead relies on +checking whether each gfn on the range has been switched from +p2m_ram_logdirty to p2m_ram_rw in order to account for dirty pages. + +This is CVE-2022-26356 / XSA-397. + +Signed-off-by: Roger Pau Monné +Reviewed-by: Jan Beulich + +--- a/xen/include/asm-x86/paging.h ++++ b/xen/include/asm-x86/paging.h +@@ -144,9 +144,6 @@ void paging_log_dirty_range(struct domai + unsigned long nr, + uint8_t *dirty_bitmap); + +-/* enable log dirty */ +-int paging_log_dirty_enable(struct domain *d, bool_t log_global); +- + /* log dirty initialization */ + void paging_log_dirty_init(struct domain *d, const struct log_dirty_ops *ops); + +--- a/xen/arch/x86/mm/hap/hap.c ++++ b/xen/arch/x86/mm/hap/hap.c +@@ -69,13 +69,6 @@ int hap_track_dirty_vram(struct domain * + { + int size = (nr + BITS_PER_BYTE - 1) / BITS_PER_BYTE; + +- if ( !paging_mode_log_dirty(d) ) +- { +- rc = paging_log_dirty_enable(d, 0); +- if ( rc ) +- goto out; +- } +- + rc = -ENOMEM; + dirty_bitmap = vzalloc(size); + if ( !dirty_bitmap ) +@@ -107,6 +100,10 @@ int hap_track_dirty_vram(struct domain * + + paging_unlock(d); + ++ domain_pause(d); ++ p2m_enable_hardware_log_dirty(d); ++ domain_unpause(d); ++ + if ( oend > ostart ) + p2m_change_type_range(d, ostart, oend, + p2m_ram_logdirty, p2m_ram_rw); +--- a/xen/arch/x86/mm/paging.c ++++ b/xen/arch/x86/mm/paging.c +@@ -209,7 +209,7 @@ static int paging_free_log_dirty_bitmap( + return rc; + } + +-int paging_log_dirty_enable(struct domain *d, bool_t log_global) ++static int paging_log_dirty_enable(struct domain *d, bool_t log_global) + { + int ret; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-1-xen-arm-Introduce-new-Arm-processors.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-1-xen-arm-Introduce-new-Arm-processors.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-1-xen-arm-Introduce-new-Arm-processors.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-1-xen-arm-Introduce-new-Arm-processors.patch 2022-06-02 13:15:13.000000000 +0100 @@ 
-0,0 +1,63 @@ +From f1346b2cfdbeb468b50be7b6f7aa38ce3c1acf2a Mon Sep 17 00:00:00 2001 +From: Bertrand Marquis +Date: Tue, 15 Feb 2022 10:37:51 +0000 +Subject: xen/arm: Introduce new Arm processors + +Add some new processor identifiers in processor.h and sync Xen +definitions with status of Linux 5.17 (declared in +arch/arm64/include/asm/cputype.h). + +This is part of XSA-398 / CVE-2022-23960. + +Signed-off-by: Bertrand Marquis +Acked-by: Julien Grall +(cherry picked from commit 35d1b85a6b43483f6bd007d48757434e54743e98) + +diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h +index 0f35ec59d15e..cd45fba9786f 100644 +--- a/xen/include/asm-arm/processor.h ++++ b/xen/include/asm-arm/processor.h +@@ -48,19 +48,43 @@ + #define ARM_CPU_PART_CORTEX_A17 0xC0E + #define ARM_CPU_PART_CORTEX_A15 0xC0F + #define ARM_CPU_PART_CORTEX_A53 0xD03 ++#define ARM_CPU_PART_CORTEX_A35 0xD04 ++#define ARM_CPU_PART_CORTEX_A55 0xD05 + #define ARM_CPU_PART_CORTEX_A57 0xD07 + #define ARM_CPU_PART_CORTEX_A72 0xD08 + #define ARM_CPU_PART_CORTEX_A73 0xD09 + #define ARM_CPU_PART_CORTEX_A75 0xD0A ++#define ARM_CPU_PART_CORTEX_A76 0xD0B ++#define ARM_CPU_PART_NEOVERSE_N1 0xD0C ++#define ARM_CPU_PART_CORTEX_A77 0xD0D ++#define ARM_CPU_PART_NEOVERSE_V1 0xD40 ++#define ARM_CPU_PART_CORTEX_A78 0xD41 ++#define ARM_CPU_PART_CORTEX_X1 0xD44 ++#define ARM_CPU_PART_CORTEX_A710 0xD47 ++#define ARM_CPU_PART_CORTEX_X2 0xD48 ++#define ARM_CPU_PART_NEOVERSE_N2 0xD49 ++#define ARM_CPU_PART_CORTEX_A78C 0xD4B + + #define MIDR_CORTEX_A12 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A12) + #define MIDR_CORTEX_A17 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A17) + #define MIDR_CORTEX_A15 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A15) + #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) ++#define MIDR_CORTEX_A35 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A35) ++#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55) + #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) + #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) + #define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73) + #define MIDR_CORTEX_A75 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A75) ++#define MIDR_CORTEX_A76 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76) ++#define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1) ++#define MIDR_CORTEX_A77 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77) ++#define MIDR_NEOVERSE_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_V1) ++#define MIDR_CORTEX_A78 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78) ++#define MIDR_CORTEX_X1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X1) ++#define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710) ++#define MIDR_CORTEX_X2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X2) ++#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2) ++#define MIDR_CORTEX_A78C MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78C) + + /* MPIDR Multiprocessor Affinity Register */ + #define _MPIDR_UP (30) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-2-xen-arm-move-errata-CSV2-check-earlier.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-2-xen-arm-move-errata-CSV2-check-earlier.patch --- 
xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-2-xen-arm-move-errata-CSV2-check-earlier.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-2-xen-arm-move-errata-CSV2-check-earlier.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,53 @@ +From 35164a1704fe13e1f83dbd4b5b79838f07d564c6 Mon Sep 17 00:00:00 2001 +From: Bertrand Marquis +Date: Tue, 15 Feb 2022 10:39:47 +0000 +Subject: xen/arm: move errata CSV2 check earlier + +CSV2 availability check is done after printing to the user that +workaround 1 will be used. Move the check before to prevent saying to the +user that workaround 1 is used when it is not because it is not needed. +This will also allow to reuse install_bp_hardening_vec function for +other use cases. + +Code previously returning "true", now returns "0" to conform to +enable_smccc_arch_workaround_1 returning an int and surrounding code +doing a "return 0" if workaround is not needed. + +This is part of XSA-398 / CVE-2022-23960. + +Signed-off-by: Bertrand Marquis +Reviewed-by: Julien Grall +(cherry picked from commit 599616d70eb886b9ad0ef9d6b51693ce790504ba) + +diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index b254b9865783..9e1ecd071470 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -102,13 +102,6 @@ install_bp_hardening_vec(const struct arm_cpu_capabilities *entry, + printk(XENLOG_INFO "CPU%u will %s on exception entry\n", + smp_processor_id(), desc); + +- /* +- * No need to install hardened vector when the processor has +- * ID_AA64PRF0_EL1.CSV2 set. +- */ +- if ( cpu_data[smp_processor_id()].pfr64.csv2 ) +- return true; +- + spin_lock(&bp_lock); + + /* +@@ -167,6 +160,13 @@ static int enable_smccc_arch_workaround_1(void *data) + if ( !entry->matches(entry) ) + return 0; + ++ /* ++ * No need to install hardened vector when the processor has ++ * ID_AA64PRF0_EL1.CSV2 set. ++ */ ++ if ( cpu_data[smp_processor_id()].pfr64.csv2 ) ++ return 0; ++ + if ( smccc_ver < SMCCC_VERSION(1, 1) ) + goto warn; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-3-xen-arm-Add-ECBHB-and-CLEARBHB-ID-fields.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-3-xen-arm-Add-ECBHB-and-CLEARBHB-ID-fields.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-3-xen-arm-Add-ECBHB-and-CLEARBHB-ID-fields.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-3-xen-arm-Add-ECBHB-and-CLEARBHB-ID-fields.patch 2022-06-05 19:16:55.000000000 +0100 @@ -0,0 +1,76 @@ +From 2e519fd8c1e3e7ae5370a6638615d2a52169db28 Mon Sep 17 00:00:00 2001 +From: Bertrand Marquis +Date: Wed, 23 Feb 2022 09:42:18 +0000 +Subject: xen/arm: Add ECBHB and CLEARBHB ID fields + +Introduce ID coprocessor register ID_AA64ISAR2_EL1. +Add definitions in cpufeature and sysregs of ECBHB field in mmfr1 and +CLEARBHB in isar2 ID coprocessor registers. + +This is part of XSA-398 / CVE-2022-23960. 
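
The point of the reordering is simply that the "is the hardware already safe" test has to precede both the message and the install. A minimal sketch with invented demo_* helpers (not the real cpuerrata.c functions):

#include <stdbool.h>
#include <stdio.h>

static bool demo_cpu_has_csv2(void) { return true; }   /* stub */
static void demo_install_hardened_vectors(void) { }    /* stub */

static int demo_enable_workaround_1(void)
{
    /*
     * Decide first whether the workaround is needed at all; otherwise
     * the "will use workaround 1" message gets printed for CPUs that
     * then install nothing.
     */
    if ( demo_cpu_has_csv2() )
        return 0;

    printf("CPU will use workaround 1 on exception entry\n");
    demo_install_hardened_vectors();
    return 0;
}
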
+ +Signed-off-by: Bertrand Marquis +Acked-by: Julien Grall +(cherry picked from commit 4b68d12d98b8790d8002fcc2c25a9d713374a4d7) + +diff --git a/xen/arch/arm/cpu.c b/xen/arch/arm/cpu.c +index 44126dbf0723..13dac7ccaf94 100644 +--- a/xen/arch/arm/cpu.c ++++ b/xen/arch/arm/cpu.c +@@ -36,6 +36,7 @@ void identify_cpu(struct cpuinfo_arm *c) + + c->isa64.bits[0] = READ_SYSREG64(ID_AA64ISAR0_EL1); + c->isa64.bits[1] = READ_SYSREG64(ID_AA64ISAR1_EL1); ++ c->isa64.bits[2] = READ_SYSREG64(ID_AA64ISAR2_EL1); + #endif + + c->pfr32.bits[0] = READ_SYSREG32(ID_PFR0_EL1); +diff --git a/xen/include/asm-arm/arm64/sysregs.h b/xen/include/asm-arm/arm64/sysregs.h +index 08585a969ebd..5f1e9b998f33 100644 +--- a/xen/include/asm-arm/arm64/sysregs.h ++++ b/xen/include/asm-arm/arm64/sysregs.h +@@ -166,6 +166,10 @@ + #define ICH_AP1R2_EL2 __AP1Rx_EL2(2) + #define ICH_AP1R3_EL2 __AP1Rx_EL2(3) + ++#ifndef ID_AA64ISAR2_EL1 ++#define ID_AA64ISAR2_EL1 S3_0_C0_C6_2 ++#endif ++ + #endif /* _ASM_ARM_ARM64_SYSREGS_H */ + + /* +diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h +index 60e677d84200..c748fc17fe66 100644 +--- a/xen/include/asm-arm/processor.h ++++ b/xen/include/asm-arm/processor.h +@@ -425,12 +425,26 @@ struct cpuinfo_arm { + unsigned long lo:4; + unsigned long pan:4; + unsigned long __res1:8; +- unsigned long __res2:32; ++ unsigned long __res2:28; ++ unsigned long ecbhb:4; + }; + } mm64; + +- struct { +- uint64_t bits[2]; ++ union { ++ uint64_t bits[3]; ++ struct { ++ /* ISAR0 */ ++ unsigned long __res0:64; ++ ++ /* ISAR1 */ ++ unsigned long __res1:64; ++ ++ /* ISAR2 */ ++ unsigned long __res3:28; ++ unsigned long clearbhb:4; ++ ++ unsigned long __res4:32; ++ }; + } isa64; + + #endif diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-4-xen-arm-Add-Spectre-BHB-handling.patch 2022-06-19 23:22:25.000000000 +0100 @@ -0,0 +1,358 @@ +From d340fad8be324e1760ea29d7c25658a8aec83306 Mon Sep 17 00:00:00 2001 +From: Rahul Singh +Date: Mon, 14 Feb 2022 18:47:32 +0000 +Subject: xen/arm: Add Spectre BHB handling + +This commit is adding Spectre BHB handling to Xen on Arm. +The commit is introducing new alternative code to be executed during +exception entry: +- SMCC workaround 3 call +- loop workaround (with 8, 24 or 32 iterations) +- use of new clearbhb instruction + +Cpuerrata is modified by this patch to apply the required workaround for +CPU affected by Spectre BHB when CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR is +enabled. + +To do this the system previously used to apply smcc workaround 1 is +reused and new alternative code to be copied in the exception handler is +introduced. + +To define the type of workaround required by a processor, 4 new cpu +capabilities are introduced (for each number of loop and for smcc +workaround 3). + +When a processor is affected, enable_spectre_bhb_workaround is called +and if the processor does not have CSV2 set to 3 or ECBHB feature (which +would mean that the processor is doing what is required in hardware), +the proper code is enabled at exception entry. + +In the case where workaround 3 is not supported by the firmware, we +enable workaround 1 when possible as it will also mitigate Spectre BHB +on systems without CSV2. 
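
The selection logic described above reduces to a small decision: do nothing when the hardware already copes, otherwise install whichever variant the per-core classification asked for, falling back from workaround 3 to workaround 1 when the firmware lacks the former. The sketch below uses invented demo_* types and stubs; the real classification and vector patching live in the patched cpuerrata.c and bpi.S.

#include <stdbool.h>

enum demo_bhb_kind {
    DEMO_BHB_LOOP_8, DEMO_BHB_LOOP_24, DEMO_BHB_LOOP_32,
    DEMO_BHB_CLEARBHB,
    DEMO_BHB_SMCC_WA3,
};

struct demo_cpu {
    bool csv2_full;      /* CSV2 set to 3, per the text */
    bool ecbhb;          /* branch history not exploitable */
    bool fw_has_wa3;     /* firmware implements workaround 3 */
    bool fw_has_wa1;     /* firmware implements workaround 1 */
};

static void demo_install(enum demo_bhb_kind kind) { (void)kind; }  /* stub */
static void demo_install_wa1(void) { }                             /* stub */

static void demo_enable_bhb(const struct demo_cpu *c, enum demo_bhb_kind kind)
{
    /* Nothing to patch into the vectors if the hardware copes already. */
    if ( c->csv2_full || c->ecbhb )
        return;

    if ( kind == DEMO_BHB_SMCC_WA3 && !c->fw_has_wa3 )
    {
        /* Per the text above, workaround 1 still helps on these systems. */
        if ( c->fw_has_wa1 )
            demo_install_wa1();
        return;
    }

    demo_install(kind);   /* loop (8/24/32), clearbhb, or SMC call variant */
}
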
+ +This is part of XSA-398 / CVE-2022-23960. + +Signed-off-by: Bertrand Marquis +Signed-off-by: Rahul Singh +Acked-by: Julien Grall +(cherry picked from commit 62c91eb66a2904eefb1d1d9642e3697a1e3c3a3c) + +diff --git a/xen/arch/arm/arm64/bpi.S b/xen/arch/arm/arm64/bpi.S +index d8743d955c4a..4e6382522048 100644 +--- a/xen/arch/arm/arm64/bpi.S ++++ b/xen/arch/arm/arm64/bpi.S +@@ -16,6 +16,7 @@ + * along with this program. If not, see . + */ + ++#include + #include + + .macro ventry target +@@ -58,16 +58,42 @@ ENTRY(__bp_harden_hyp_vecs_start) + .endr + ENTRY(__bp_harden_hyp_vecs_end) + +-ENTRY(__smccc_workaround_1_smc_start) ++.macro mitigate_spectre_bhb_loop count ++ENTRY(__mitigate_spectre_bhb_loop_start_\count) ++ stp x0, x1, [sp, #-16]! ++ mov x0, \count ++.Lspectre_bhb_loop\@: ++ b . + 4 ++ subs x0, x0, #1 ++ b.ne .Lspectre_bhb_loop\@ ++ sb ++ ldp x0, x1, [sp], #16 ++ENTRY(__mitigate_spectre_bhb_loop_end_\count) ++.endm ++ ++.macro smccc_workaround num smcc_id ++ENTRY(__smccc_workaround_smc_start_\num) + sub sp, sp, #(8 * 4) + stp x0, x1, [sp, #(8 * 2)] + stp x2, x3, [sp, #(8 * 0)] +- mov w0, #ARM_SMCCC_ARCH_WORKAROUND_1_FID ++ mov w0, \smcc_id + smc #0 + ldp x2, x3, [sp, #(8 * 0)] + ldp x0, x1, [sp, #(8 * 2)] + add sp, sp, #(8 * 4) +-ENTRY(__smccc_workaround_1_smc_end) ++ENTRY(__smccc_workaround_smc_end_\num) ++.endm ++ ++ENTRY(__mitigate_spectre_bhb_clear_insn_start) ++ clearbhb ++ isb ++ENTRY(__mitigate_spectre_bhb_clear_insn_end) ++ ++mitigate_spectre_bhb_loop 8 ++mitigate_spectre_bhb_loop 24 ++mitigate_spectre_bhb_loop 32 ++smccc_workaround 1, #ARM_SMCCC_ARCH_WORKAROUND_1_FID ++smccc_workaround 3, #ARM_SMCCC_ARCH_WORKAROUND_3_FID + + /* + * Local variables: +diff --git a/xen/include/asm-arm/arm64/macros.h b/xen/include/asm-arm/arm64/macros.h +index 9c5e676b37..a13ad8e2b1 100644 +--- a/xen/include/asm-arm/arm64/macros.h ++++ b/xen/include/asm-arm/arm64/macros.h +@@ -21,5 +21,10 @@ + ldr \dst, [\dst, \tmp] + .endm + ++ /* clearbhb instruction clearing the branch history */ ++ .macro clearbhb ++ hint #22 ++ .endm ++ + #endif /* __ASM_ARM_ARM64_MACROS_H */ + +diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c +index 9e1ecd071470..d70d1e16e946 100644 +--- a/xen/arch/arm/cpuerrata.c ++++ b/xen/arch/arm/cpuerrata.c +@@ -142,7 +142,16 @@ install_bp_hardening_vec(const struct arm_cpu_capabilities *entry, + return ret; + } + +-extern char __smccc_workaround_1_smc_start[], __smccc_workaround_1_smc_end[]; ++extern char __smccc_workaround_smc_start_1[], __smccc_workaround_smc_end_1[]; ++extern char __smccc_workaround_smc_start_3[], __smccc_workaround_smc_end_3[]; ++extern char __mitigate_spectre_bhb_clear_insn_start[], ++ __mitigate_spectre_bhb_clear_insn_end[]; ++extern char __mitigate_spectre_bhb_loop_start_8[], ++ __mitigate_spectre_bhb_loop_end_8[]; ++extern char __mitigate_spectre_bhb_loop_start_24[], ++ __mitigate_spectre_bhb_loop_end_24[]; ++extern char __mitigate_spectre_bhb_loop_start_32[], ++ __mitigate_spectre_bhb_loop_end_32[]; + + static int enable_smccc_arch_workaround_1(void *data) + { +@@ -174,8 +183,8 @@ static int enable_smccc_arch_workaround_1(void *data) + if ( (int)res.a0 < 0 ) + goto warn; + +- return !install_bp_hardening_vec(entry,__smccc_workaround_1_smc_start, +- __smccc_workaround_1_smc_end, ++ return !install_bp_hardening_vec(entry,__smccc_workaround_smc_start_1, ++ __smccc_workaround_smc_end_1, + "call ARM_SMCCC_ARCH_WORKAROUND_1"); + + warn: +@@ -190,6 +199,93 @@ static int enable_smccc_arch_workaround_1(void *data) + return 0; + } + ++/* ++ * Spectre 
BHB Mitigation ++ * ++ * CPU is either: ++ * - Having CVS2.3 so it is not affected. ++ * - Having ECBHB and is clearing the branch history buffer when an exception ++ * to a different exception level is happening so no mitigation is needed. ++ * - Mitigating using a loop on exception entry (number of loop depending on ++ * the CPU). ++ * - Mitigating using the firmware. ++ */ ++static int enable_spectre_bhb_workaround(void *data) ++{ ++ const struct arm_cpu_capabilities *entry = data; ++ ++ /* ++ * Enable callbacks are called on every CPU based on the capabilities, so ++ * double-check whether the CPU matches the entry. ++ */ ++ if ( !entry->matches(entry) ) ++ return 0; ++ ++ if ( cpu_data[smp_processor_id()].pfr64.csv2 == 3 ) ++ return 0; ++ ++ if ( cpu_data[smp_processor_id()].mm64.ecbhb ) ++ return 0; ++ ++ if ( cpu_data[smp_processor_id()].isa64.clearbhb ) ++ return !install_bp_hardening_vec(entry, ++ __mitigate_spectre_bhb_clear_insn_start, ++ __mitigate_spectre_bhb_clear_insn_end, ++ "use clearBHB instruction"); ++ ++ /* Apply solution depending on hwcaps set on arm_errata */ ++ if ( cpus_have_cap(ARM_WORKAROUND_BHB_LOOP_8) ) ++ return !install_bp_hardening_vec(entry, ++ __mitigate_spectre_bhb_loop_start_8, ++ __mitigate_spectre_bhb_loop_end_8, ++ "use 8 loops workaround"); ++ ++ if ( cpus_have_cap(ARM_WORKAROUND_BHB_LOOP_24) ) ++ return !install_bp_hardening_vec(entry, ++ __mitigate_spectre_bhb_loop_start_24, ++ __mitigate_spectre_bhb_loop_end_24, ++ "use 24 loops workaround"); ++ ++ if ( cpus_have_cap(ARM_WORKAROUND_BHB_LOOP_32) ) ++ return !install_bp_hardening_vec(entry, ++ __mitigate_spectre_bhb_loop_start_32, ++ __mitigate_spectre_bhb_loop_end_32, ++ "use 32 loops workaround"); ++ ++ if ( cpus_have_cap(ARM_WORKAROUND_BHB_SMCC_3) ) ++ { ++ struct arm_smccc_res res; ++ ++ if ( smccc_ver < SMCCC_VERSION(1, 1) ) ++ goto warn; ++ ++ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FID, ++ ARM_SMCCC_ARCH_WORKAROUND_3_FID, &res); ++ /* The return value is in the lower 32-bits. */ ++ if ( (int)res.a0 < 0 ) ++ { ++ /* ++ * On processor affected with CSV2=0, workaround 1 will mitigate ++ * both Spectre v2 and BHB so use it when available ++ */ ++ if ( enable_smccc_arch_workaround_1(data) ) ++ return 1; ++ ++ goto warn; ++ } ++ ++ return !install_bp_hardening_vec(entry,__smccc_workaround_smc_start_3, ++ __smccc_workaround_smc_end_3, ++ "call ARM_SMCCC_ARCH_WORKAROUND_3"); ++ } ++ ++warn: ++ printk_once("**** No support for any spectre BHB workaround. ****\n" ++ "**** Please update your firmware. 
****\n"); ++ ++ return 0; ++} ++ + #endif /* CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR */ + + /* Hardening Branch predictor code for Arm32 */ +@@ -449,19 +545,77 @@ static const struct arm_cpu_capabilities arm_errata[] = { + }, + { + .capability = ARM_HARDEN_BRANCH_PREDICTOR, +- MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), ++ MIDR_RANGE(MIDR_CORTEX_A72, 0, 1 << MIDR_VARIANT_SHIFT), + .enable = enable_smccc_arch_workaround_1, + }, + { +- .capability = ARM_HARDEN_BRANCH_PREDICTOR, ++ .capability = ARM_WORKAROUND_BHB_SMCC_3, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A73), +- .enable = enable_smccc_arch_workaround_1, ++ .enable = enable_spectre_bhb_workaround, + }, + { +- .capability = ARM_HARDEN_BRANCH_PREDICTOR, ++ .capability = ARM_WORKAROUND_BHB_SMCC_3, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A75), +- .enable = enable_smccc_arch_workaround_1, ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ /* spectre BHB */ ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_8, ++ MIDR_RANGE(MIDR_CORTEX_A72, 1 << MIDR_VARIANT_SHIFT, ++ (MIDR_VARIANT_MASK | MIDR_REVISION_MASK)), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_24, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_A76), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_24, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_A77), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_X1), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_X2), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710), ++ .enable = enable_spectre_bhb_workaround, + }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_24, ++ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ { ++ .capability = ARM_WORKAROUND_BHB_LOOP_32, ++ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1), ++ .enable = enable_spectre_bhb_workaround, ++ }, ++ + #endif + #ifdef CONFIG_ARM32_HARDEN_BRANCH_PREDICTOR + { +diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h +index c748fc17fe66..87989eac6fc2 100644 +--- a/xen/include/asm-arm/cpufeature.h ++++ b/xen/include/asm-arm/cpufeature.h +@@ -44,8 +44,12 @@ + #define SKIP_CTXT_SWITCH_SERROR_SYNC 6 + #define ARM_HARDEN_BRANCH_PREDICTOR 7 + #define ARM_SSBD 8 ++#define ARM_WORKAROUND_BHB_LOOP_8 9 ++#define ARM_WORKAROUND_BHB_LOOP_24 10 ++#define ARM_WORKAROUND_BHB_LOOP_32 11 ++#define ARM_WORKAROUND_BHB_SMCC_3 12 + +-#define ARM_NCAPS 9 ++#define ARM_NCAPS 13 + + #ifndef __ASSEMBLY__ + +diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h +index 126399dd7088..2abbffc3bd8a 100644 +--- a/xen/include/asm-arm/smccc.h ++++ b/xen/include/asm-arm/smccc.h +@@ -274,6 +274,12 @@ void __arm_smccc_1_0_smc(register_t a0, register_t a1, register_t a2, + ARM_SMCCC_OWNER_ARCH, \ + 0x7FFF) + ++#define ARM_SMCCC_ARCH_WORKAROUND_3_FID \ ++ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ ++ ARM_SMCCC_CONV_32, \ ++ 
ARM_SMCCC_OWNER_ARCH, \ ++ 0x3FFF) ++ + /* SMCCC error codes */ + #define ARM_SMCCC_NOT_REQUIRED (-2) + #define ARM_SMCCC_ERR_UNKNOWN_FUNCTION (-1) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-5-xen-arm-Allow-to-discover-and-use-SMCCC_ARCH_WORKARO.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,91 @@ +From 21f5a7b22687aa1e384782c8a1c04148f288ad9f Mon Sep 17 00:00:00 2001 +From: Bertrand Marquis +Date: Thu, 17 Feb 2022 14:52:54 +0000 +Subject: xen/arm: Allow to discover and use SMCCC_ARCH_WORKAROUND_3 + +Allow guest to discover whether or not SMCCC_ARCH_WORKAROUND_3 is +supported and create a fastpath in the code to handle guests request to +do the workaround. + +The function SMCCC_ARCH_WORKAROUND_3 will be called by the guest for +flushing the branch history. So we want the handling to be as fast as +possible. + +As the mitigation is applied on every guest exit, we can check for the +call before saving all context and return very early. + +This is part of XSA-398 / CVE-2022-23960. + +Signed-off-by: Bertrand Marquis +Reviewed-by: Julien Grall +(cherry picked from commit c0a56ea0fd92ecb471936b7355ddbecbaea3707c) + +diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S +index 97bd06217bcd..788d0a1912f0 100644 +--- a/xen/arch/arm/arm64/entry.S ++++ b/xen/arch/arm/arm64/entry.S +@@ -343,16 +343,26 @@ guest_sync: + cbnz x1, guest_sync_slowpath /* should be 0 for HVC #0 */ + + /* +- * Fastest path possible for ARM_SMCCC_ARCH_WORKAROUND_1. +- * The workaround has already been applied on the exception ++ * Fastest path possible for ARM_SMCCC_ARCH_WORKAROUND_1 and ++ * ARM_SMCCC_ARCH_WORKAROUND_3. ++ * The workaround needed has already been applied on the exception + * entry from the guest, so let's quickly get back to the guest. + * + * Note that eor is used because the function identifier cannot + * be encoded as an immediate for cmp. + */ + eor w0, w0, #ARM_SMCCC_ARCH_WORKAROUND_1_FID +- cbnz w0, check_wa2 ++ cbz w0, fastpath_out_workaround + ++ /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ ++ eor w0, w0, #(ARM_SMCCC_ARCH_WORKAROUND_1_FID ^ ARM_SMCCC_ARCH_WORKAROUND_2_FID) ++ cbz w0, wa2_ssbd ++ ++ /* Fastpath out for ARM_SMCCC_ARCH_WORKAROUND_3 */ ++ eor w0, w0, #(ARM_SMCCC_ARCH_WORKAROUND_2_FID ^ ARM_SMCCC_ARCH_WORKAROUND_3_FID) ++ cbnz w0, guest_sync_slowpath ++ ++fastpath_out_workaround: + /* + * Clobber both x0 and x1 to prevent leakage. Note that thanks + * the eor, x0 = 0. 
+@@ -361,10 +371,7 @@ guest_sync: + eret + sb + +-check_wa2: +- /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ +- eor w0, w0, #(ARM_SMCCC_ARCH_WORKAROUND_1_FID ^ ARM_SMCCC_ARCH_WORKAROUND_2_FID) +- cbnz w0, guest_sync_slowpath ++wa2_ssbd: + #ifdef CONFIG_ARM_SSBD + alternative_cb arm_enable_wa2_handling + b wa2_end +diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c +index ecf4faa13da3..643976db6537 100644 +--- a/xen/arch/arm/vsmc.c ++++ b/xen/arch/arm/vsmc.c +@@ -123,6 +123,10 @@ static bool handle_arch(struct cpu_user_regs *regs) + break; + } + break; ++ case ARM_SMCCC_ARCH_WORKAROUND_3_FID: ++ if ( cpus_have_cap(ARM_WORKAROUND_BHB_SMCC_3) ) ++ ret = 0; ++ break; + } + + set_user_reg(regs, 0, ret); +@@ -131,6 +135,7 @@ static bool handle_arch(struct cpu_user_regs *regs) + } + + case ARM_SMCCC_ARCH_WORKAROUND_1_FID: ++ case ARM_SMCCC_ARCH_WORKAROUND_3_FID: + /* No return value */ + return true; + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-6-x86-spec-ctrl-Cease-using-thunk-lfence-on-AMD.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-6-x86-spec-ctrl-Cease-using-thunk-lfence-on-AMD.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-6-x86-spec-ctrl-Cease-using-thunk-lfence-on-AMD.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa398-4.12-6-x86-spec-ctrl-Cease-using-thunk-lfence-on-AMD.patch 2022-06-05 22:17:52.000000000 +0100 @@ -0,0 +1,39 @@ +From 944afa38d9339a67f0164d07fb7ac8a54e9a4c60 Mon Sep 17 00:00:00 2001 +From: Andrew Cooper +Date: Mon, 7 Mar 2022 16:35:52 +0000 +Subject: x86/spec-ctrl: Cease using thunk=lfence on AMD + +AMD have updated their Spectre v2 guidance, and lfence/jmp is no longer +considered safe. AMD are recommending using retpoline everywhere. + +Update the default heuristics to never select THUNK_LFENCE. + +This is part of XSA-398 / CVE-2021-26401. + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +(cherry picked from commit 8d03080d2a339840d3a59e0932a94f804e45110d) + +diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c +index e2fcefc86a60..866b864918fd 100644 +--- a/xen/arch/x86/spec_ctrl.c ++++ b/xen/arch/x86/spec_ctrl.c +@@ -953,16 +953,10 @@ void __init init_speculation_mitigations(void) + if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) ) + { + /* +- * AMD's recommended mitigation is to set lfence as being dispatch +- * serialising, and to use IND_THUNK_LFENCE. +- */ +- if ( cpu_has_lfence_dispatch ) +- thunk = THUNK_LFENCE; +- /* +- * On Intel hardware, we'd like to use retpoline in preference to ++ * On all hardware, we'd like to use retpoline in preference to + * IBRS, but only if it is safe on this hardware. + */ +- else if ( retpoline_safe(caps) ) ++ if ( retpoline_safe(caps) ) + thunk = THUNK_RETPOLINE; + else if ( boot_cpu_has(X86_FEATURE_IBRSB) ) + ibrs = true; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa399-4.12.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa399-4.12.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa399-4.12.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa399-4.12.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,45 @@ +From: Jan Beulich +Subject: VT-d: correct ordering of operations in cleanup_domid_map() + +The function may be called without any locks held (leaving aside the +domctl one, which we surely don't want to depend on here), so needs to +play safe wrt other accesses to domid_map[] and domid_bitmap[]. 
This is +to avoid context_set_domain_id()'s writing of domid_map[] to be reset to +zero right away in the case of it racing the freeing of a DID. + +For the interaction with context_set_domain_id() and ->domid_map[] reads +see the code comment. + +{check_,}cleanup_domid_map() are called with pcidevs_lock held or during +domain cleanup only (and pcidevs_lock is also held around +context_set_domain_id()), i.e. racing calls with the same (dom, iommu) +tuple cannot occur. + +domain_iommu_domid(), besides its use by cleanup_domid_map(), has its +result used only to control flushing, and hence a stale result would +only lead to a stray extra flush. + +This is CVE-2022-26357 / XSA-399. + +Fixes: b9c20c78789f ("VT-d: per-iommu domain-id") +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -1770,8 +1770,14 @@ static int domain_context_unmap(struct d + goto out; + } + ++ /* ++ * Update domid_map[] /before/ domid_bitmap[] to avoid a race with ++ * context_set_domain_id(), setting the slot to DOMID_INVALID for ++ * ->domid_map[] reads to produce a suitable value while the bit is ++ * still set. ++ */ ++ iommu->domid_map[iommu_domid] = DOMID_INVALID; + clear_bit(iommu_domid, iommu->domid_bitmap); +- iommu->domid_map[iommu_domid] = 0; + } + + out: diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-00.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-00.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-00.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-00.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,138 @@ +From: Jan Beulich +Subject: VT-d: split domid map cleanup check into a function + +This logic will want invoking from elsewhere. + +No functional change intended. + +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -152,6 +152,68 @@ static void __init free_intel_iommu(stru + xfree(intel); + } + ++static void cleanup_domid_map(struct domain *domain, struct iommu *iommu) ++{ ++ int iommu_domid = domain_iommu_domid(domain, iommu); ++ ++ if ( iommu_domid >= 0 ) ++ { ++ /* ++ * Update domid_map[] /before/ domid_bitmap[] to avoid a race with ++ * context_set_domain_id(), setting the slot to DOMID_INVALID for ++ * ->domid_map[] reads to produce a suitable value while the bit is ++ * still set. ++ */ ++ iommu->domid_map[iommu_domid] = DOMID_INVALID; ++ clear_bit(iommu_domid, iommu->domid_bitmap); ++ } ++} ++ ++static bool any_pdev_behind_iommu(const struct domain *d, ++ const struct pci_dev *exclude, ++ const struct iommu *iommu) ++{ ++ const struct pci_dev *pdev; ++ ++ for_each_pdev ( d, pdev ) ++ { ++ const struct acpi_drhd_unit *drhd; ++ ++ if ( pdev == exclude ) ++ continue; ++ ++ drhd = acpi_find_matched_drhd_unit(pdev); ++ if ( drhd && drhd->iommu == iommu ) ++ return true; ++ } ++ ++ return false; ++} ++ ++/* ++ * If no other devices under the same iommu owned by this domain, ++ * clear iommu in iommu_bitmap and clear domain_id in domid_bitmap. ++ */ ++static void check_cleanup_domid_map(struct domain *d, ++ const struct pci_dev *exclude, ++ struct iommu *iommu) ++{ ++ bool found = any_pdev_behind_iommu(d, exclude, iommu); ++ ++ /* ++ * Hidden devices are associated with DomXEN but usable by the hardware ++ * domain. Hence they need considering here as well. 
++ */ ++ if ( !found && is_hardware_domain(d) ) ++ found = any_pdev_behind_iommu(dom_xen, exclude, iommu); ++ ++ if ( !found ) ++ { ++ clear_bit(iommu->index, &dom_iommu(d)->arch.iommu_bitmap); ++ cleanup_domid_map(d, iommu); ++ } ++} ++ + static int iommus_incoherent; + + static void sync_cache(const void *addr, unsigned int size) +@@ -1671,7 +1733,6 @@ static int domain_context_unmap(struct d + struct iommu *iommu; + int ret = 0; + u8 seg = pdev->seg, bus = pdev->bus, tmp_bus, tmp_devfn, secbus; +- int found = 0; + + drhd = acpi_find_matched_drhd_unit(pdev); + if ( !drhd ) +@@ -1740,45 +1801,8 @@ static int domain_context_unmap(struct d + goto out; + } + +- /* +- * if no other devices under the same iommu owned by this domain, +- * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp +- */ +- for_each_pdev ( domain, pdev ) +- { +- if ( pdev->seg == seg && pdev->bus == bus && pdev->devfn == devfn ) +- continue; +- +- drhd = acpi_find_matched_drhd_unit(pdev); +- if ( drhd && drhd->iommu == iommu ) +- { +- found = 1; +- break; +- } +- } +- +- if ( found == 0 ) +- { +- int iommu_domid; +- +- clear_bit(iommu->index, &dom_iommu(domain)->arch.iommu_bitmap); +- +- iommu_domid = domain_iommu_domid(domain, iommu); +- if ( iommu_domid == -1 ) +- { +- ret = -EINVAL; +- goto out; +- } +- +- /* +- * Update domid_map[] /before/ domid_bitmap[] to avoid a race with +- * context_set_domain_id(), setting the slot to DOMID_INVALID for +- * ->domid_map[] reads to produce a suitable value while the bit is +- * still set. +- */ +- iommu->domid_map[iommu_domid] = DOMID_INVALID; +- clear_bit(iommu_domid, iommu->domid_bitmap); +- } ++ if ( !ret ) ++ check_cleanup_domid_map(domain, pdev, iommu); + + out: + return ret; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-01.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-01.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-01.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-01.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,105 @@ +From: Jan Beulich +Subject: VT-d: fix (de)assign ordering when RMRRs are in use + +In the event that the RMRR mappings are essential for device operation, +they should be established before updating the device's context entry, +while they should be torn down only after the device's context entry was +successfully updated. + +Also adjust a related log message. + +This is CVE-2022-26358 / part of XSA-400. + +Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling") +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +Reviewed-by: Paul Durrant +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -2352,6 +2352,10 @@ static int reassign_device_ownership( + { + int ret; + ++ ret = domain_context_unmap(source, devfn, pdev); ++ if ( ret ) ++ return ret; ++ + /* + * Devices assigned to untrusted domains (here assumed to be any domU) + * can attempt to send arbitrary LAPIC/MSI messages. 
We are unprotected +@@ -2388,10 +2392,6 @@ static int reassign_device_ownership( + } + } + +- ret = domain_context_unmap(source, devfn, pdev); +- if ( ret ) +- return ret; +- + if ( devfn == pdev->devfn && pdev->domain != dom_io ) + { + list_move(&pdev->domain_list, &dom_io->arch.pdev_list); +@@ -2468,9 +2468,8 @@ static int intel_iommu_assign_device( + } + } + +- ret = reassign_device_ownership(s, d, devfn, pdev); +- if ( ret || d == dom_io ) +- return ret; ++ if ( d == dom_io ) ++ return reassign_device_ownership(s, d, devfn, pdev); + + /* Setup rmrr identity mapping */ + for_each_rmrr_device( rmrr, bdf, i ) +@@ -2483,20 +2482,37 @@ static int intel_iommu_assign_device( + rmrr->end_address, flag); + if ( ret ) + { +- int rc; +- +- rc = reassign_device_ownership(d, s, devfn, pdev); + printk(XENLOG_G_ERR VTDPREFIX +- " cannot map reserved region (%"PRIx64",%"PRIx64"] for Dom%d (%d)\n", +- rmrr->base_address, rmrr->end_address, +- d->domain_id, ret); +- if ( rc ) +- { +- printk(XENLOG_ERR VTDPREFIX +- " failed to reclaim %04x:%02x:%02x.%u from %pd (%d)\n", +- seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), d, rc); +- domain_crash(d); +- } ++ "%pd: cannot map reserved region [%"PRIx64",%"PRIx64"]: %d\n", ++ d, rmrr->base_address, rmrr->end_address, ret); ++ break; ++ } ++ } ++ } ++ ++ if ( !ret ) ++ ret = reassign_device_ownership(s, d, devfn, pdev); ++ ++ /* See reassign_device_ownership() for the hwdom aspect. */ ++ if ( !ret || is_hardware_domain(d) ) ++ return ret; ++ ++ for_each_rmrr_device( rmrr, bdf, i ) ++ { ++ if ( rmrr->segment == seg && ++ PCI_BUS(bdf) == bus && ++ PCI_DEVFN2(bdf) == devfn ) ++ { ++ int rc = iommu_identity_mapping(d, p2m_access_x, ++ rmrr->base_address, ++ rmrr->end_address, 0); ++ ++ if ( rc && rc != -ENOENT ) ++ { ++ printk(XENLOG_ERR VTDPREFIX ++ "%pd: cannot unmap reserved region [%"PRIx64",%"PRIx64"]: %d\n", ++ d, rmrr->base_address, rmrr->end_address, rc); ++ domain_crash(d); + break; + } + } diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-02.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-02.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-02.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-02.patch 2022-05-26 17:34:26.000000000 +0100 @@ -0,0 +1,80 @@ +From: Jan Beulich +Subject: VT-d: fix add/remove ordering when RMRRs are in use + +In the event that the RMRR mappings are essential for device operation, +they should be established before updating the device's context entry, +while they should be torn down only after the device's context entry was +successfully cleared. + +Also switch to %pd in related log messages. 
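[Editorial aside, not part of the upstream patch.] In outline the ordering enforced here is "map before attach, detach before unmap"; a toy, compilable sketch with all helper names hypothetical:

    #include <stdio.h>

    static int  map_rmrrs(void)     { puts("map RMRR regions");    return 0; }
    static int  write_context(void) { puts("write context entry"); return 0; }
    static int  clear_context(void) { puts("clear context entry"); return 0; }
    static void unmap_rmrrs(void)   { puts("unmap RMRR regions"); }

    static int add_device(void)
    {
        if ( map_rmrrs() )          /* mappings must exist first ... */
            return -1;
        return write_context();     /* ... then the device may use them */
    }

    static int remove_device(void)
    {
        if ( clear_context() )      /* stop translation through the entry ... */
            return -1;
        unmap_rmrrs();              /* ... only then drop the RMRR mappings */
        return 0;
    }

    int main(void)
    {
        return add_device() || remove_device();
    }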
+ +Fixes: fa88cfadf918 ("vt-d: Map RMRR in intel_iommu_add_device() if the device has RMRR") +Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling") +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -1985,14 +1985,6 @@ static int intel_iommu_add_device(u8 dev + if ( !pdev->domain ) + return -EINVAL; + +- ret = domain_context_mapping(pdev->domain, devfn, pdev); +- if ( ret ) +- { +- dprintk(XENLOG_ERR VTDPREFIX, "d%d: context mapping failed\n", +- pdev->domain->domain_id); +- return ret; +- } +- + for_each_rmrr_device ( rmrr, bdf, i ) + { + if ( rmrr->segment == pdev->seg && +@@ -2009,12 +2001,17 @@ static int intel_iommu_add_device(u8 dev + rmrr->base_address, rmrr->end_address, + 0); + if ( ret ) +- dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n", +- pdev->domain->domain_id); ++ dprintk(XENLOG_ERR VTDPREFIX, "%pd: RMRR mapping failed\n", ++ pdev->domain); + } + } + +- return 0; ++ ret = domain_context_mapping(pdev->domain, devfn, pdev); ++ if ( ret ) ++ dprintk(XENLOG_ERR VTDPREFIX, "%pd: context mapping failed\n", ++ pdev->domain); ++ ++ return ret; + } + + static int intel_iommu_enable_device(struct pci_dev *pdev) +@@ -2036,11 +2033,15 @@ static int intel_iommu_remove_device(u8 + { + struct acpi_rmrr_unit *rmrr; + u16 bdf; +- int i; ++ int ret, i; + + if ( !pdev->domain ) + return -EINVAL; + ++ ret = domain_context_unmap(pdev->domain, devfn, pdev); ++ if ( ret ) ++ return ret; ++ + for_each_rmrr_device ( rmrr, bdf, i ) + { + if ( rmrr->segment != pdev->seg || +@@ -2056,7 +2057,7 @@ static int intel_iommu_remove_device(u8 + rmrr->end_address, 0); + } + +- return domain_context_unmap(pdev->domain, devfn, pdev); ++ return 0; + } + + static int __hwdom_init setup_hwdom_device(u8 devfn, struct pci_dev *pdev) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-03.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-03.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-03.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-03.patch 2022-06-05 22:22:30.000000000 +0100 @@ -0,0 +1,99 @@ +From: Jan Beulich +Subject: VT-d: drop ownership checking from domain_context_mapping_one() + +Despite putting in quite a bit of effort it was not possible to +establish why exactly this code exists (beyond possibly sanity +checking). Instead of a subsequent change further complicating this +logic, simply get rid of it. + +Take the opportunity and move the respective unmap_vtd_domain_page() out +of the locked region. 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Roger Pau Monné +Reviewed-by: Paul Durrant +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -112,28 +112,6 @@ static int context_set_domain_id(struct + return 0; + } + +-static int context_get_domain_id(struct context_entry *context, +- struct iommu *iommu) +-{ +- unsigned long dom_index, nr_dom; +- int domid = -1; +- +- if (iommu && context) +- { +- nr_dom = cap_ndoms(iommu->cap); +- +- dom_index = context_domain_id(*context); +- +- if ( dom_index < nr_dom && iommu->domid_map ) +- domid = iommu->domid_map[dom_index]; +- else +- dprintk(XENLOG_DEBUG VTDPREFIX, +- "dom_index %lu exceeds nr_dom %lu or iommu has no domid_map\n", +- dom_index, nr_dom); +- } +- return domid; +-} +- + static struct intel_iommu *__init alloc_intel_iommu(void) + { + struct intel_iommu *intel; +@@ -1433,49 +1411,9 @@ int domain_context_mapping_one( + + if ( context_present(*context) ) + { +- int res = 0; +- +- /* Try to get domain ownership from device structure. If that's +- * not available, try to read it from the context itself. */ +- if ( pdev ) +- { +- if ( pdev->domain != domain ) +- { +- printk(XENLOG_G_INFO VTDPREFIX +- "d%d: %04x:%02x:%02x.%u owned by d%d!", +- domain->domain_id, +- seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), +- pdev->domain ? pdev->domain->domain_id : -1); +- res = -EINVAL; +- } +- } +- else +- { +- int cdomain; +- cdomain = context_get_domain_id(context, iommu); +- +- if ( cdomain < 0 ) +- { +- printk(XENLOG_G_WARNING VTDPREFIX +- "d%d: %04x:%02x:%02x.%u mapped, but can't find owner!\n", +- domain->domain_id, +- seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); +- res = -EINVAL; +- } +- else if ( cdomain != domain->domain_id ) +- { +- printk(XENLOG_G_INFO VTDPREFIX +- "d%d: %04x:%02x:%02x.%u already mapped to d%d!", +- domain->domain_id, +- seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), +- cdomain); +- res = -EINVAL; +- } +- } +- +- unmap_vtd_domain_page(context_entries); + spin_unlock(&iommu->lock); +- return res; ++ unmap_vtd_domain_page(context_entries); ++ return 0; + } + + if ( iommu_passthrough && is_hardware_domain(domain) ) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-04.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-04.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-04.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-04.patch 2022-06-06 12:57:27.000000000 +0100 @@ -0,0 +1,561 @@ +From: Jan Beulich +Subject: VT-d: re-assign devices directly + +Devices with RMRRs, due to it being unspecified how/when the specified +memory regions may get accessed, may not be left disconnected from their +respective mappings (as long as it's not certain that the device has +been fully quiesced). Hence rather than unmapping the old context and +then mapping the new one, re-assignment needs to be done in a single +step. + +This is CVE-2022-26359 / part of XSA-400. + +Reported-by: Roger Pau Monné + +Similarly quarantining scratch-page mode relies on page tables to be +continuously wired up. + +To avoid complicating things more than necessary, treat all devices +mostly equally, i.e. regardless of their association with any RMRRs. The +main difference is when it comes to updating context entries, which need +to be atomic when there are RMRRs. Yet atomicity can only be achieved +with CMPXCHG16B, availability of which we can't take for given. 
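[Editorial aside, not part of the upstream patch.] For reference, a 16-byte compare-and-swap of a two-word descriptor such as a VT-d context entry can be expressed as below. This is only a host-side sketch using the GCC builtin (Xen has its own cmpxchg16b() helper) and needs a CPU with CMPXCHG16B; build with e.g. gcc -mcx16 (possibly -latomic):

    #include <stdint.h>
    #include <stdio.h>

    union ctx {
        struct { uint64_t lo, hi; };
        unsigned __int128 full;     /* 16-byte aligned, as CMPXCHG16B requires */
    };

    int main(void)
    {
        union ctx entry = { .lo = 0x1, .hi = 0x2 };          /* entry as read */
        union ctx repl  = { .lo = 0x3, .hi = 0x4 };          /* entry to install */
        unsigned __int128 expected = entry.full;

        /* Succeeds only if the entry is still exactly what we read earlier. */
        int ok = __atomic_compare_exchange_n(&entry.full, &expected, repl.full,
                                             0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

        printf("swap %s: lo=%#llx hi=%#llx\n", ok ? "succeeded" : "failed",
               (unsigned long long)entry.lo, (unsigned long long)entry.hi);
        return 0;
    }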
+ +The seemingly complicated choice of non-negative return values for +domain_context_mapping_one() is to limit code churn: This way callers +passing NULL for pdev don't need fiddling with. + +Signed-off-by: Jan Beulich +Reviewed-by: Kevin Tian +Reviewed-by: Roger Pau Monné + +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -70,7 +70,8 @@ void free_pgtable_maddr(u64 maddr); + void *map_vtd_domain_page(u64 maddr); + void unmap_vtd_domain_page(void *va); + int domain_context_mapping_one(struct domain *domain, struct iommu *iommu, +- u8 bus, u8 devfn, const struct pci_dev *); ++ uint8_t bus, uint8_t devfn, ++ const struct pci_dev *pdev, unsigned int mode); + int domain_context_unmap_one(struct domain *domain, struct iommu *iommu, + u8 bus, u8 devfn); + int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt); +@@ -90,8 +91,8 @@ int is_igd_vt_enabled_quirk(void); + void platform_quirks_init(void); + void vtd_ops_preamble_quirk(struct iommu* iommu); + void vtd_ops_postamble_quirk(struct iommu* iommu); +-int __must_check me_wifi_quirk(struct domain *domain, +- u8 bus, u8 devfn, int map); ++int __must_check me_wifi_quirk(struct domain *domain, uint8_t bus, ++ uint8_t devfn, unsigned int mode); + void pci_vtd_quirk(const struct pci_dev *); + void quirk_iommu_caps(struct iommu *iommu); + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -108,6 +108,7 @@ static int context_set_domain_id(struct + } + + set_bit(i, iommu->domid_bitmap); ++ context->hi &= ~(((1 << DID_FIELD_WIDTH) - 1) << DID_HIGH_OFFSET); + context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET; + return 0; + } +@@ -1389,15 +1390,27 @@ static void __hwdom_init intel_iommu_hwd + } + } + ++/* ++ * This function returns ++ * - a negative errno value upon error, ++ * - zero upon success when previously the entry was non-present, or this isn't ++ * the "main" request for a device (pdev == NULL), or for no-op quarantining ++ * assignments, ++ * - positive (one) upon success when previously the entry was present and this ++ * is the "main" request for a device (pdev != NULL). 
++ */ + int domain_context_mapping_one( + struct domain *domain, + struct iommu *iommu, +- u8 bus, u8 devfn, const struct pci_dev *pdev) ++ uint8_t bus, uint8_t devfn, const struct pci_dev *pdev, ++ unsigned int mode) + { + struct domain_iommu *hd = dom_iommu(domain); +- struct context_entry *context, *context_entries; ++ struct context_entry *context, *context_entries, lctxt; ++ __uint128_t old; + u64 maddr, pgd_maddr; +- u16 seg = iommu->intel->drhd->segment; ++ uint16_t seg = iommu->intel->drhd->segment, prev_did = 0; ++ struct domain *prev_dom = NULL; + int agaw, rc, ret; + bool_t flush_dev_iotlb; + +@@ -1406,17 +1419,32 @@ int domain_context_mapping_one( + maddr = bus_to_context_maddr(iommu, bus); + context_entries = (struct context_entry *)map_vtd_domain_page(maddr); + context = &context_entries[devfn]; ++ old = (lctxt = *context).full; + +- if ( context_present(*context) ) ++ if ( context_present(lctxt) ) + { +- spin_unlock(&iommu->lock); +- unmap_vtd_domain_page(context_entries); +- return 0; ++ domid_t domid; ++ ++ prev_did = context_domain_id(lctxt); ++ domid = iommu->domid_map[prev_did]; ++ if ( domid < DOMID_FIRST_RESERVED ) ++ prev_dom = rcu_lock_domain_by_id(domid); ++ else if ( domid == DOMID_IO ) ++ prev_dom = rcu_lock_domain(dom_io); ++ if ( !prev_dom ) ++ { ++ spin_unlock(&iommu->lock); ++ unmap_vtd_domain_page(context_entries); ++ dprintk(XENLOG_DEBUG VTDPREFIX, ++ "no domain for did %u (nr_dom %u)\n", ++ prev_did, cap_ndoms(iommu->cap)); ++ return -ESRCH; ++ } + } + + if ( iommu_passthrough && is_hardware_domain(domain) ) + { +- context_set_translation_type(*context, CONTEXT_TT_PASS_THRU); ++ context_set_translation_type(lctxt, CONTEXT_TT_PASS_THRU); + agaw = level_to_agaw(iommu->nr_pt_levels); + } + else +@@ -1433,6 +1461,8 @@ int domain_context_mapping_one( + spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&iommu->lock); + unmap_vtd_domain_page(context_entries); ++ if ( prev_dom ) ++ rcu_unlock_domain(prev_dom); + return -ENOMEM; + } + } +@@ -1450,33 +1480,102 @@ int domain_context_mapping_one( + goto nomem; + } + +- context_set_address_root(*context, pgd_maddr); ++ context_set_address_root(lctxt, pgd_maddr); + if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) ) +- context_set_translation_type(*context, CONTEXT_TT_DEV_IOTLB); ++ context_set_translation_type(lctxt, CONTEXT_TT_DEV_IOTLB); + else +- context_set_translation_type(*context, CONTEXT_TT_MULTI_LEVEL); ++ context_set_translation_type(lctxt, CONTEXT_TT_MULTI_LEVEL); + + spin_unlock(&hd->arch.mapping_lock); + } + +- if ( context_set_domain_id(context, domain, iommu) ) ++ rc = context_set_domain_id(&lctxt, domain, iommu); ++ if ( rc ) + { ++ unlock: + spin_unlock(&iommu->lock); + unmap_vtd_domain_page(context_entries); +- return -EFAULT; ++ if ( prev_dom ) ++ rcu_unlock_domain(prev_dom); ++ return rc; ++ } ++ ++ if ( !prev_dom ) ++ { ++ context_set_address_width(lctxt, agaw); ++ context_set_fault_enable(lctxt); ++ context_set_present(lctxt); ++ } ++ else if ( prev_dom == domain ) ++ { ++ ASSERT(lctxt.full == context->full); ++ rc = !!pdev; ++ goto unlock; ++ } ++ else ++ { ++ ASSERT(context_address_width(lctxt) == agaw); ++ ASSERT(!context_fault_disable(lctxt)); ++ } ++ ++ if ( cpu_has_cx16 ) ++ { ++ __uint128_t res = cmpxchg16b(context, &old, &lctxt.full); ++ ++ /* ++ * Hardware does not update the context entry behind our backs, ++ * so the return value should match "old". 
++ */ ++ if ( res != old ) ++ { ++ if ( pdev ) ++ check_cleanup_domid_map(domain, pdev, iommu); ++ printk(XENLOG_ERR ++ "%04x:%02x:%02x.%u: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n", ++ pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ++ (uint64_t)(res >> 64), (uint64_t)res, ++ (uint64_t)(old >> 64), (uint64_t)old); ++ rc = -EILSEQ; ++ goto unlock; ++ } ++ } ++ else if ( !prev_dom || !(mode & MAP_WITH_RMRR) ) ++ { ++ context_clear_present(*context); ++ iommu_sync_cache(context, sizeof(*context)); ++ ++ write_atomic(&context->hi, lctxt.hi); ++ /* No barrier should be needed between these two. */ ++ write_atomic(&context->lo, lctxt.lo); ++ } ++ else /* Best effort, updating DID last. */ ++ { ++ /* ++ * By non-atomically updating the context entry's DID field last, ++ * during a short window in time TLB entries with the old domain ID ++ * but the new page tables may be inserted. This could affect I/O ++ * of other devices using this same (old) domain ID. Such updating ++ * therefore is not a problem if this was the only device associated ++ * with the old domain ID. Diverting I/O of any of a dying domain's ++ * devices to the quarantine page tables is intended anyway. ++ */ ++ if ( !(mode & (MAP_OWNER_DYING | MAP_SINGLE_DEVICE)) ) ++ printk(XENLOG_WARNING VTDPREFIX ++ " %04x:%02x:%02x.%u: reassignment may cause %pd data corruption\n", ++ seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), prev_dom); ++ ++ write_atomic(&context->lo, lctxt.lo); ++ /* No barrier should be needed between these two. */ ++ write_atomic(&context->hi, lctxt.hi); + } + +- context_set_address_width(*context, agaw); +- context_set_fault_enable(*context); +- context_set_present(*context); + iommu_sync_cache(context, sizeof(struct context_entry)); + spin_unlock(&iommu->lock); + +- /* Context entry was previously non-present (with domid 0). */ +- rc = iommu_flush_context_device(iommu, 0, PCI_BDF2(bus, devfn), +- DMA_CCMD_MASK_NOBIT, 1); ++ rc = iommu_flush_context_device(iommu, prev_did, PCI_BDF2(bus, devfn), ++ DMA_CCMD_MASK_NOBIT, !prev_dom); + flush_dev_iotlb = !!find_ats_dev_drhd(iommu); +- ret = iommu_flush_iotlb_dsi(iommu, 0, 1, flush_dev_iotlb); ++ ret = iommu_flush_iotlb_dsi(iommu, prev_did, !prev_dom, flush_dev_iotlb); + + /* + * The current logic for returns: +@@ -1497,17 +1596,35 @@ int domain_context_mapping_one( + unmap_vtd_domain_page(context_entries); + + if ( !seg && !rc ) +- rc = me_wifi_quirk(domain, bus, devfn, MAP_ME_PHANTOM_FUNC); ++ rc = me_wifi_quirk(domain, bus, devfn, mode); + +- return rc; ++ if ( rc ) ++ { ++ if ( !prev_dom ) ++ domain_context_unmap_one(domain, iommu, bus, devfn); ++ else if ( prev_dom != domain ) /* Avoid infinite recursion. 
*/ ++ domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev, ++ mode & MAP_WITH_RMRR); ++ } ++ ++ if ( prev_dom ) ++ rcu_unlock_domain(prev_dom); ++ ++ return rc ?: pdev && prev_dom; + } + ++static int domain_context_unmap(struct domain *d, uint8_t devfn, ++ struct pci_dev *pdev); ++ + static int domain_context_mapping(struct domain *domain, u8 devfn, + struct pci_dev *pdev) + { + struct acpi_drhd_unit *drhd; ++ const struct acpi_rmrr_unit *rmrr; + int ret = 0; +- u8 seg = pdev->seg, bus = pdev->bus, secbus; ++ unsigned int i, mode = 0; ++ uint16_t seg = pdev->seg, bdf; ++ uint8_t bus = pdev->bus, secbus; + + drhd = acpi_find_matched_drhd_unit(pdev); + if ( !drhd ) +@@ -1515,8 +1632,30 @@ static int domain_context_mapping(struct + + ASSERT(pcidevs_locked()); + ++ for_each_rmrr_device( rmrr, bdf, i ) ++ { ++ if ( rmrr->segment != pdev->seg || ++ bdf != PCI_BDF2(pdev->bus, pdev->devfn) ) ++ continue; ++ ++ mode |= MAP_WITH_RMRR; ++ break; ++ } ++ ++ if ( domain != pdev->domain ) ++ { ++ if ( pdev->domain->is_dying ) ++ mode |= MAP_OWNER_DYING; ++ else if ( drhd && ++ !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) && ++ !pdev->phantom_stride ) ++ mode |= MAP_SINGLE_DEVICE; ++ } ++ + switch ( pdev->type ) + { ++ bool prev_present; ++ + case DEV_TYPE_PCI_HOST_BRIDGE: + if ( iommu_debug ) + printk(VTDPREFIX "d%d:Hostbridge: skip %04x:%02x:%02x.%u map\n", +@@ -1537,7 +1676,9 @@ static int domain_context_mapping(struct + domain->domain_id, seg, bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev); ++ pdev, mode); ++ if ( ret > 0 ) ++ ret = 0; + if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 ) + enable_ats_device(pdev, &drhd->iommu->ats_devices); + +@@ -1550,20 +1691,33 @@ static int domain_context_mapping(struct + PCI_SLOT(devfn), PCI_FUNC(devfn)); + + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev); +- if ( ret ) ++ pdev, mode); ++ if ( ret < 0 ) + break; ++ prev_present = ret; ++ ret = 0; + + if ( find_upstream_bridge(seg, &bus, &devfn, &secbus) < 1 ) + break; + + /* ++ * Strictly speaking if the device is the only one behind this bridge ++ * and the only one with this (secbus,0,0) tuple, it could be allowed ++ * to be re-assigned regardless of RMRR presence. But let's deal with ++ * that case only if it is actually found in the wild. ++ */ ++ if ( prev_present && (mode & MAP_WITH_RMRR) && ++ domain != pdev->domain ) ++ ret = -EOPNOTSUPP; ++ ++ /* + * Mapping a bridge should, if anything, pass the struct pci_dev of + * that bridge. Since bridges don't normally get assigned to guests, + * their owner would be the wrong one. Pass NULL instead. + */ +- ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- NULL); ++ if ( ret >= 0 ) ++ ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, ++ NULL, mode); + + /* + * Devices behind PCIe-to-PCI/PCIx bridge may generate different +@@ -1578,7 +1732,15 @@ static int domain_context_mapping(struct + if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && + (secbus != pdev->bus || pdev->devfn != 0) ) + ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0, +- NULL); ++ NULL, mode); ++ ++ if ( ret ) ++ { ++ if ( !prev_present ) ++ domain_context_unmap(domain, devfn, pdev); ++ else if ( pdev->domain != domain ) /* Avoid infinite recursion. 
*/ ++ domain_context_mapping(pdev->domain, devfn, pdev); ++ } + + break; + +@@ -2237,9 +2399,8 @@ static int reassign_device_ownership( + { + int ret; + +- ret = domain_context_unmap(source, devfn, pdev); +- if ( ret ) +- return ret; ++ if ( !has_arch_pdevs(target) ) ++ vmx_pi_hooks_assign(target); + + /* + * Devices assigned to untrusted domains (here assumed to be any domU) +@@ -2249,6 +2410,31 @@ static int reassign_device_ownership( + if ( (target != hardware_domain) && !iommu_intremap ) + untrusted_msi = true; + ++ ret = domain_context_mapping(target, devfn, pdev); ++ if ( ret ) ++ { ++ if ( !has_arch_pdevs(target) ) ++ vmx_pi_hooks_deassign(target); ++ return ret; ++ } ++ ++ if ( pdev->devfn == devfn ) ++ { ++ const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev); ++ ++ if ( drhd ) ++ check_cleanup_domid_map(source, pdev, drhd->iommu); ++ } ++ ++ if ( devfn == pdev->devfn && pdev->domain != target ) ++ { ++ list_move(&pdev->domain_list, &target->arch.pdev_list); ++ pdev->domain = target; ++ } ++ ++ if ( !has_arch_pdevs(source) ) ++ vmx_pi_hooks_deassign(source); ++ + /* + * If the device belongs to the hardware domain, and it has RMRR, don't + * remove it from the hardware domain, because BIOS may use RMRR at +@@ -2277,34 +2463,7 @@ static int reassign_device_ownership( + } + } + +- if ( devfn == pdev->devfn && pdev->domain != dom_io ) +- { +- list_move(&pdev->domain_list, &dom_io->arch.pdev_list); +- pdev->domain = dom_io; +- } +- +- if ( !has_arch_pdevs(source) ) +- vmx_pi_hooks_deassign(source); +- +- if ( !has_arch_pdevs(target) ) +- vmx_pi_hooks_assign(target); +- +- ret = domain_context_mapping(target, devfn, pdev); +- if ( ret ) +- { +- if ( !has_arch_pdevs(target) ) +- vmx_pi_hooks_deassign(target); +- +- return ret; +- } +- +- if ( devfn == pdev->devfn && pdev->domain != target ) +- { +- list_move(&pdev->domain_list, &target->arch.pdev_list); +- pdev->domain = target; +- } +- +- return ret; ++ return 0; + } + + static int intel_iommu_assign_device( +--- a/xen/drivers/passthrough/vtd/iommu.h ++++ b/xen/drivers/passthrough/vtd/iommu.h +@@ -201,8 +201,12 @@ struct root_entry { + do {(root).val |= ((value) & PAGE_MASK_4K);} while(0) + + struct context_entry { +- u64 lo; +- u64 hi; ++ union { ++ struct { ++ uint64_t lo, hi; ++ }; ++ __uint128_t full; ++ }; + }; + #define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry)) + #define context_present(c) ((c).lo & 1) +--- a/xen/drivers/passthrough/vtd/quirks.c ++++ b/xen/drivers/passthrough/vtd/quirks.c +@@ -330,7 +330,8 @@ void __init platform_quirks_init(void) + */ + + static int __must_check map_me_phantom_function(struct domain *domain, +- u32 dev, int map) ++ unsigned int dev, ++ unsigned int mode) + { + struct acpi_drhd_unit *drhd; + struct pci_dev *pdev; +@@ -341,9 +342,9 @@ static int __must_check map_me_phantom_f + drhd = acpi_find_matched_drhd_unit(pdev); + + /* map or unmap ME phantom function */ +- if ( map ) ++ if ( !(mode & UNMAP_ME_PHANTOM_FUNC) ) + rc = domain_context_mapping_one(domain, drhd->iommu, 0, +- PCI_DEVFN(dev, 7), NULL); ++ PCI_DEVFN(dev, 7), NULL, mode); + else + rc = domain_context_unmap_one(domain, drhd->iommu, 0, + PCI_DEVFN(dev, 7)); +@@ -351,7 +352,8 @@ static int __must_check map_me_phantom_f + return rc; + } + +-int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map) ++int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn, ++ unsigned int mode) + { + u32 id; + int rc = 0; +@@ -375,7 +377,7 @@ int me_wifi_quirk(struct domain *domain, + case 0x423b8086: + 
case 0x423c8086: + case 0x423d8086: +- rc = map_me_phantom_function(domain, 3, map); ++ rc = map_me_phantom_function(domain, 3, mode); + break; + default: + break; +@@ -401,7 +403,7 @@ int me_wifi_quirk(struct domain *domain, + case 0x42388086: /* Puma Peak */ + case 0x422b8086: + case 0x422c8086: +- rc = map_me_phantom_function(domain, 22, map); ++ rc = map_me_phantom_function(domain, 22, mode); + break; + default: + break; +--- a/xen/drivers/passthrough/vtd/vtd.h ++++ b/xen/drivers/passthrough/vtd/vtd.h +@@ -22,8 +22,14 @@ + + #include + +-#define MAP_ME_PHANTOM_FUNC 1 +-#define UNMAP_ME_PHANTOM_FUNC 0 ++/* ++ * Values for domain_context_mapping_one()'s and me_wifi_quirk()'s "mode" ++ * parameters. ++ */ ++#define MAP_WITH_RMRR (1u << 0) ++#define MAP_OWNER_DYING (1u << 1) ++#define MAP_SINGLE_DEVICE (1u << 2) ++#define UNMAP_ME_PHANTOM_FUNC (1u << 3) + + /* Allow for both IOAPIC and IOSAPIC. */ + #define IO_xAPIC_route_entry IO_APIC_route_entry diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-05.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-05.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-05.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-05.patch 2022-06-13 22:16:03.000000000 +0100 @@ -0,0 +1,443 @@ +From: Jan Beulich +Subject: AMD/IOMMU: re-assign devices directly + +Devices with unity map ranges, due to it being unspecified how/when +these memory ranges may get accessed, may not be left disconnected from +their unity mappings (as long as it's not certain that the device has +been fully quiesced). Hence rather than tearing down the old root page +table pointer and then establishing the new one, re-assignment needs to +be done in a single step. + +This is CVE-2022-26360 / part of XSA-400. + +Reported-by: Roger Pau Monné + +Similarly quarantining scratch-page mode relies on page tables to be +continuously wired up. + +To avoid complicating things more than necessary, treat all devices +mostly equally, i.e. regardless of their association with any unity map +ranges. The main difference is when it comes to updating DTEs, which need +to be atomic when there are unity mappings. Yet atomicity can only be +achieved with CMPXCHG16B, availability of which we can't take for given. 
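[Editorial aside, not part of the upstream patch.] When the 16-byte swap is not available, the fallback ordering matters: the half of the descriptor carrying the domain ID is written last, so any transiently mixed state pairs the old ID with the new tables rather than the other way round. A minimal sketch of that ordering (hypothetical layout, not the real DTE encoding):

    #include <stdint.h>
    #include <stdio.h>

    struct dte { uint64_t lo, hi; };   /* pretend the domain ID sits in .hi */

    static void publish_dte(struct dte *slot, struct dte val)
    {
        __atomic_store_n(&slot->lo, val.lo, __ATOMIC_RELAXED); /* tables etc. first */
        __atomic_store_n(&slot->hi, val.hi, __ATOMIC_RELAXED); /* domain ID last */
    }

    int main(void)
    {
        struct dte slot = { 0, 0 };
        publish_dte(&slot, (struct dte){ .lo = 0x3, .hi = 0x42 });
        printf("lo=%#llx hi=%#llx\n",
               (unsigned long long)slot.lo, (unsigned long long)slot.hi);
        return 0;
    }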
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Roger Pau Monné + +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -72,8 +72,11 @@ void amd_iommu_share_p2m(struct domain * + int get_dma_requestor_id(u16 seg, u16 bdf); + void amd_iommu_set_intremap_table( + u32 *dte, u64 intremap_ptr, u8 int_valid); +-void amd_iommu_set_root_page_table( +- u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, u8 valid); ++#define SET_ROOT_VALID (1u << 0) ++#define SET_ROOT_WITH_UNITY_MAP (1u << 1) ++int __must_check amd_iommu_set_root_page_table( ++ u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, unsigned int flags); ++paddr_t amd_iommu_get_root_page_table(const u32 *dte); + void iommu_dte_set_iotlb(u32 *dte, u8 i); + void iommu_dte_add_device_entry(u32 *dte, struct ivrs_mappings *ivrs_dev); + void iommu_dte_set_guest_cr3(u32 *dte, u16 dom_id, u64 gcr3, +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -143,12 +143,105 @@ static unsigned int set_iommu_pte_presen + return need_flush; + } + +-void amd_iommu_set_root_page_table( +- u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, u8 valid) ++/* ++ * This function returns ++ * - -errno for errors, ++ * - 0 for a successful update, atomic when necessary ++ * - 1 for a successful but non-atomic update, which may need to be warned ++ * about by the caller. ++ */ ++int amd_iommu_set_root_page_table(u32 *dte, u64 root_ptr, u16 domain_id, ++ u8 paging_mode, unsigned int flags) + { ++ bool valid = flags & SET_ROOT_VALID; + u64 addr_hi, addr_lo; + u32 entry, dte0 = dte[0]; + ++ addr_lo = root_ptr & DMA_32BIT_MASK; ++ addr_hi = root_ptr >> 32; ++ ++ if ( get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_VALID_MASK, ++ IOMMU_DEV_TABLE_VALID_SHIFT) && ++ get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_TRANSLATION_VALID_MASK, ++ IOMMU_DEV_TABLE_TRANSLATION_VALID_SHIFT) && ++ (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) ) ++ { ++ union { ++ u32 dte[4]; ++ u64 raw64[2]; ++ __uint128_t raw128; ++ } ldte; ++ __uint128_t old; ++ int ret = 0; ++ ++ memcpy(ldte.dte, dte, sizeof(ldte)); ++ old = ldte.raw128; ++ ++ set_field_in_reg_u32(domain_id, ldte.dte[2], ++ IOMMU_DEV_TABLE_DOMAIN_ID_MASK, ++ IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT, &ldte.dte[2]); ++ ++ set_field_in_reg_u32(addr_hi, ldte.dte[1], ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_MASK, ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_SHIFT, ++ &ldte.dte[1]); ++ set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, ldte.dte[1], ++ IOMMU_DEV_TABLE_IO_WRITE_PERMISSION_MASK, ++ IOMMU_DEV_TABLE_IO_WRITE_PERMISSION_SHIFT, ++ &ldte.dte[1]); ++ set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, ldte.dte[1], ++ IOMMU_DEV_TABLE_IO_READ_PERMISSION_MASK, ++ IOMMU_DEV_TABLE_IO_READ_PERMISSION_SHIFT, ++ &ldte.dte[1]); ++ ++ set_field_in_reg_u32(addr_lo >> PAGE_SHIFT, ldte.dte[0], ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK, ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT, ++ &ldte.dte[0]); ++ set_field_in_reg_u32(paging_mode, ldte.dte[0], ++ IOMMU_DEV_TABLE_PAGING_MODE_MASK, ++ IOMMU_DEV_TABLE_PAGING_MODE_SHIFT, &ldte.dte[0]); ++ set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, ldte.dte[0], ++ IOMMU_DEV_TABLE_TRANSLATION_VALID_MASK, ++ IOMMU_DEV_TABLE_TRANSLATION_VALID_SHIFT, ++ &ldte.dte[0]); ++ set_field_in_reg_u32(valid ? 
IOMMU_CONTROL_ENABLED ++ : IOMMU_CONTROL_DISABLED, ++ ldte.dte[0], IOMMU_DEV_TABLE_VALID_MASK, ++ IOMMU_DEV_TABLE_VALID_SHIFT, &ldte.dte[0]); ++ ++ if ( cpu_has_cx16 ) ++ { ++ __uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128); ++ ++ /* ++ * Hardware does not update the DTE behind our backs, so the ++ * return value should match "old". ++ */ ++ if ( res != old ) ++ { ++ printk(XENLOG_ERR ++ "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n", ++ domain_id, ++ (u64)(res >> 64), (u64)res, ++ (u64)(old >> 64), (u64)old); ++ ret = -EILSEQ; ++ } ++ } ++ else /* Best effort, updating domain_id last. */ ++ { ++ u64 *ptr = (void *)dte; ++ ++ write_atomic(ptr + 0, ldte.raw64[0]); ++ /* No barrier should be needed between these two. */ ++ write_atomic(ptr + 1, ldte.raw64[1]); ++ ++ ret = 1; ++ } ++ ++ return ret; ++ } ++ + if ( valid || + get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_VALID_MASK, + IOMMU_DEV_TABLE_VALID_SHIFT) ) +@@ -183,9 +276,6 @@ void amd_iommu_set_root_page_table(uint3 + IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT, &entry); + dte[2] = entry; + +- addr_lo = root_ptr & DMA_32BIT_MASK; +- addr_hi = root_ptr >> 32; +- + set_field_in_reg_u32((u32)addr_hi, 0, + IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_MASK, + IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_SHIFT, &entry); +@@ -197,6 +287,20 @@ void amd_iommu_set_root_page_table(uint3 + IOMMU_DEV_TABLE_VALID_MASK, + IOMMU_DEV_TABLE_VALID_SHIFT, &entry); + write_atomic(&dte[0], entry); ++ ++ return 0; ++} ++ ++paddr_t amd_iommu_get_root_page_table(const u32 *dte) ++{ ++ u32 lo = get_field_from_reg_u32( ++ dte[0], IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK, ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT); ++ u32 hi = get_field_from_reg_u32( ++ dte[1], IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_MASK, ++ IOMMU_DEV_TABLE_PAGE_TABLE_PTR_HIGH_SHIFT); ++ ++ return ((paddr_t)hi << 32) | (lo << PAGE_SHIFT); + } + + void iommu_dte_set_iotlb(u32 *dte, u8 i) +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -107,22 +107,60 @@ static void disable_translation(u32 *dte + dte[0] = entry; + } + +-static void amd_iommu_setup_domain_device( ++static int __must_check allocate_domain_resources(struct domain_iommu *hd) ++{ ++ int rc; ++ ++ spin_lock(&hd->arch.mapping_lock); ++ rc = amd_iommu_alloc_root(hd); ++ spin_unlock(&hd->arch.mapping_lock); ++ ++ return rc; ++} ++ ++static bool any_pdev_behind_iommu(const struct domain *d, ++ const struct pci_dev *exclude, ++ const struct amd_iommu *iommu) ++{ ++ const struct pci_dev *pdev; ++ ++ for_each_pdev ( d, pdev ) ++ { ++ if ( pdev == exclude ) ++ continue; ++ ++ if ( find_iommu_for_device(pdev->seg, ++ PCI_BDF2(pdev->bus, pdev->devfn)) == iommu ) ++ return true; ++ } ++ ++ return false; ++} ++ ++static int __must_check amd_iommu_setup_domain_device( + struct domain *domain, struct amd_iommu *iommu, + u8 devfn, struct pci_dev *pdev) + { +- void *dte; ++ u32 *dte; + unsigned long flags; +- int req_id, valid = 1; +- int dte_i = 0; ++ unsigned int req_id, sr_flags; ++ int dte_i = 0, rc; + u8 bus = pdev->bus; +- const struct domain_iommu *hd = dom_iommu(domain); ++ struct domain_iommu *hd = dom_iommu(domain); ++ const struct ivrs_mappings *ivrs_dev; ++ ++ BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer); + +- BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode || +- !iommu->dev_table.buffer ); ++ rc = allocate_domain_resources(hd); ++ if ( rc ) ++ return rc; + +- if ( iommu_passthrough && is_hardware_domain(domain) ) +- valid = 0; ++ req_id = get_dma_requestor_id(iommu->seg, 
++ PCI_BDF2(pdev->bus, pdev->devfn)); ++ ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id]; ++ sr_flags = (iommu_passthrough && is_hardware_domain(domain) ++ ? 0 : SET_ROOT_VALID) ++ | (ivrs_dev->unity_map ? SET_ROOT_WITH_UNITY_MAP : 0); + + if ( ats_enabled ) + dte_i = 1; +@@ -130,32 +168,87 @@ static void amd_iommu_setup_domain_devic + /* get device-table entry */ + req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn)); + dte = iommu->dev_table.buffer + (req_id * IOMMU_DEV_TABLE_ENTRY_SIZE); ++ ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id]; + + spin_lock_irqsave(&iommu->lock, flags); + + if ( !is_translation_valid((u32 *)dte) ) + { + /* bind DTE to domain page-tables */ +- amd_iommu_set_root_page_table( +- (u32 *)dte, page_to_maddr(hd->arch.root_table), domain->domain_id, +- hd->arch.paging_mode, valid); ++ rc = amd_iommu_set_root_page_table( ++ dte, page_to_maddr(hd->arch.root_table), ++ domain->domain_id, hd->arch.paging_mode, sr_flags); ++ if ( rc ) ++ { ++ ASSERT(rc < 0); ++ spin_unlock_irqrestore(&iommu->lock, flags); ++ return rc; ++ } + + if ( pci_ats_device(iommu->seg, bus, pdev->devfn) && + iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) ) + iommu_dte_set_iotlb((u32 *)dte, dte_i); + + amd_iommu_flush_device(iommu, req_id); ++ } ++ else if ( amd_iommu_get_root_page_table(dte) != ++ page_to_maddr(hd->arch.root_table) ) ++ { ++ /* ++ * Strictly speaking if the device is the only one with this requestor ++ * ID, it could be allowed to be re-assigned regardless of unity map ++ * presence. But let's deal with that case only if it is actually ++ * found in the wild. ++ */ ++ if ( req_id != PCI_BDF2(bus, devfn) && ++ (sr_flags & SET_ROOT_WITH_UNITY_MAP) ) ++ rc = -EOPNOTSUPP; ++ else ++ rc = amd_iommu_set_root_page_table( ++ dte, page_to_maddr(hd->arch.root_table), ++ domain->domain_id, hd->arch.paging_mode, sr_flags); ++ if ( rc < 0 ) ++ { ++ spin_unlock_irqrestore(&iommu->lock, flags); ++ return rc; ++ } ++ if ( rc && ++ domain != pdev->domain && ++ /* ++ * By non-atomically updating the DTE's domain ID field last, ++ * during a short window in time TLB entries with the old domain ++ * ID but the new page tables may have been inserted. This could ++ * affect I/O of other devices using this same (old) domain ID. ++ * Such updating therefore is not a problem if this was the only ++ * device associated with the old domain ID. Diverting I/O of any ++ * of a dying domain's devices to the quarantine page tables is ++ * intended anyway. 
++ */ ++ !pdev->domain->is_dying && ++ (any_pdev_behind_iommu(pdev->domain, pdev, iommu) || ++ pdev->phantom_stride) ) ++ printk(" %04x:%02x:%02x.%u: reassignment may cause %pd data corruption\n", ++ pdev->seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ++ pdev->domain); + +- AMD_IOMMU_DEBUG("Setup I/O page table: device id = %#x, type = %#x, " +- "root table = %#"PRIx64", " +- "domain = %d, paging mode = %d\n", +- req_id, pdev->type, +- page_to_maddr(hd->arch.root_table), +- domain->domain_id, hd->arch.paging_mode); ++ if ( pci_ats_device(iommu->seg, bus, pdev->devfn) && ++ iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) ) ++ ASSERT(get_field_from_reg_u32( ++ dte[3], IOMMU_DEV_TABLE_IOTLB_SUPPORT_MASK, ++ IOMMU_DEV_TABLE_IOTLB_SUPPORT_SHIFT) == dte_i); ++ ++ amd_iommu_flush_device(iommu, req_id); + } + + spin_unlock_irqrestore(&iommu->lock, flags); + ++ AMD_IOMMU_DEBUG("Setup I/O page table: device id = %#x, type = %#x, " ++ "root table = %#"PRIx64", " ++ "domain = %d, paging mode = %d\n", ++ req_id, pdev->type, ++ page_to_maddr(hd->arch.root_table), ++ domain->domain_id, hd->arch.paging_mode); ++ + ASSERT(pcidevs_locked()); + + if ( pci_ats_device(iommu->seg, bus, pdev->devfn) && +@@ -166,6 +259,8 @@ static void amd_iommu_setup_domain_devic + + amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0); + } ++ ++ return 0; + } + + int __init amd_iov_detect(void) +@@ -207,17 +302,6 @@ int amd_iommu_alloc_root(struct domain_i + return 0; + } + +-static int __must_check allocate_domain_resources(struct domain_iommu *hd) +-{ +- int rc; +- +- spin_lock(&hd->arch.mapping_lock); +- rc = amd_iommu_alloc_root(hd); +- spin_unlock(&hd->arch.mapping_lock); +- +- return rc; +-} +- + int __read_mostly amd_iommu_min_paging_mode = 1; + + static int amd_iommu_domain_init(struct domain *d) +@@ -336,7 +420,6 @@ static int reassign_device(struct domain + { + struct amd_iommu *iommu; + int bdf, rc; +- struct domain_iommu *t = dom_iommu(target); + const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg); + + bdf = PCI_BDF2(pdev->bus, pdev->devfn); +@@ -350,7 +433,15 @@ static int reassign_device(struct domain + return -ENODEV; + } + +- amd_iommu_disable_domain_device(source, iommu, devfn, pdev); ++ rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev); ++ if ( rc ) ++ return rc; ++ ++ if ( devfn == pdev->devfn && pdev->domain != target ) ++ { ++ list_move(&pdev->domain_list, &target->arch.pdev_list); ++ pdev->domain = target; ++ } + + /* + * If the device belongs to the hardware domain, and it has a unity mapping, +@@ -366,27 +457,10 @@ static int reassign_device(struct domain + return rc; + } + +- if ( devfn == pdev->devfn && pdev->domain != dom_io ) +- { +- list_move(&pdev->domain_list, &dom_io->arch.pdev_list); +- pdev->domain = dom_io; +- } +- +- rc = allocate_domain_resources(t); +- if ( rc ) +- return rc; +- +- amd_iommu_setup_domain_device(target, iommu, devfn, pdev); + AMD_IOMMU_DEBUG("Re-assign %04x:%02x:%02x.%u from dom%d to dom%d\n", + pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), + source->domain_id, target->domain_id); + +- if ( devfn == pdev->devfn && pdev->domain != target ) +- { +- list_move(&pdev->domain_list, &target->arch.pdev_list); +- pdev->domain = target; +- } +- + return 0; + } + +@@ -517,8 +591,7 @@ static int amd_iommu_add_device(u8 devfn + return -ENODEV; + } + +- amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev); +- return 0; ++ return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev); + } + + static int 
amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-06.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-06.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-06.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-06.patch 2022-06-06 07:52:05.000000000 +0100 @@ -0,0 +1,281 @@ +From: Jan Beulich +Subject: VT-d: prepare for per-device quarantine page tables (part I) + +Arrange for domain ID and page table root to be passed around, the latter in +particular to domain_pgd_maddr() such that taking it from the per-domain +fields can be overridden. + +No functional change intended. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Roger Pau Monné +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -72,9 +72,10 @@ void *map_vtd_domain_page(u64 maddr); + void unmap_vtd_domain_page(void *va); + int domain_context_mapping_one(struct domain *domain, struct iommu *iommu, + uint8_t bus, uint8_t devfn, +- const struct pci_dev *pdev, unsigned int mode); ++ const struct pci_dev *pdev, domid_t domid, ++ paddr_t pgd_maddr, unsigned int mode); + int domain_context_unmap_one(struct domain *domain, struct iommu *iommu, +- u8 bus, u8 devfn); ++ uint8_t bus, uint8_t devfn, domid_t domid); + int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt); + + unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg); +@@ -93,7 +94,8 @@ void platform_quirks_init(void); + void vtd_ops_preamble_quirk(struct iommu* iommu); + void vtd_ops_postamble_quirk(struct iommu* iommu); + int __must_check me_wifi_quirk(struct domain *domain, uint8_t bus, +- uint8_t devfn, unsigned int mode); ++ uint8_t devfn, domid_t domid, paddr_t pgd_maddr, ++ unsigned int mode); + void pci_vtd_quirk(const struct pci_dev *); + void quirk_iommu_caps(struct iommu *iommu); + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -1405,12 +1405,12 @@ int domain_context_mapping_one( + struct domain *domain, + struct iommu *iommu, + uint8_t bus, uint8_t devfn, const struct pci_dev *pdev, +- unsigned int mode) ++ domid_t domid, paddr_t pgd_maddr, unsigned int mode) + { + struct domain_iommu *hd = dom_iommu(domain); + struct context_entry *context, *context_entries, lctxt; + __uint128_t old; +- u64 maddr, pgd_maddr; ++ uint64_t maddr; + uint16_t seg = iommu->intel->drhd->segment, prev_did = 0; + struct domain *prev_dom = NULL; + int agaw, rc, ret; +@@ -1451,10 +1451,12 @@ int domain_context_mapping_one( + } + else + { ++ paddr_t root = pgd_maddr; ++ + spin_lock(&hd->arch.mapping_lock); + + /* Ensure we have pagetables allocated down to leaf PTE. */ +- if ( hd->arch.pgd_maddr == 0 ) ++ if ( !root ) + { + addr_to_dma_page_maddr(domain, 0, 1); + if ( hd->arch.pgd_maddr == 0 ) +@@ -1467,22 +1469,24 @@ int domain_context_mapping_one( + rcu_unlock_domain(prev_dom); + return -ENOMEM; + } ++ ++ root = hd->arch.pgd_maddr; + } + + /* Skip top levels of page tables for 2- and 3-level DRHDs. 
*/ +- pgd_maddr = hd->arch.pgd_maddr; + for ( agaw = level_to_agaw(4); + agaw != level_to_agaw(iommu->nr_pt_levels); + agaw-- ) + { +- struct dma_pte *p = map_vtd_domain_page(pgd_maddr); +- pgd_maddr = dma_pte_addr(*p); ++ struct dma_pte *p = map_vtd_domain_page(root); ++ ++ root = dma_pte_addr(*p); + unmap_vtd_domain_page(p); +- if ( pgd_maddr == 0 ) ++ if ( !root ) + goto nomem; + } + +- context_set_address_root(lctxt, pgd_maddr); ++ context_set_address_root(lctxt, root); + if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) ) + context_set_translation_type(lctxt, CONTEXT_TT_DEV_IOTLB); + else +@@ -1598,15 +1602,21 @@ int domain_context_mapping_one( + unmap_vtd_domain_page(context_entries); + + if ( !seg && !rc ) +- rc = me_wifi_quirk(domain, bus, devfn, mode); ++ rc = me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode); + + if ( rc ) + { + if ( !prev_dom ) +- domain_context_unmap_one(domain, iommu, bus, devfn); ++ domain_context_unmap_one(domain, iommu, bus, devfn, ++ domain->domain_id); + else if ( prev_dom != domain ) /* Avoid infinite recursion. */ ++ { ++ hd = dom_iommu(prev_dom); + domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev, ++ domain->domain_id, ++ hd->arch.pgd_maddr, + mode & MAP_WITH_RMRR); ++ } + } + + if ( prev_dom ) +@@ -1623,6 +1633,7 @@ static int domain_context_mapping(struct + { + struct acpi_drhd_unit *drhd; + const struct acpi_rmrr_unit *rmrr; ++ paddr_t pgd_maddr = dom_iommu(domain)->arch.pgd_maddr; + int ret = 0; + unsigned int i, mode = 0; + uint16_t seg = pdev->seg, bdf; +@@ -1678,7 +1689,8 @@ static int domain_context_mapping(struct + domain->domain_id, seg, bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev, mode); ++ pdev, domain->domain_id, pgd_maddr, ++ mode); + if ( ret > 0 ) + ret = 0; + if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 ) +@@ -1693,7 +1705,8 @@ static int domain_context_mapping(struct + PCI_SLOT(devfn), PCI_FUNC(devfn)); + + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev, mode); ++ pdev, domain->domain_id, pgd_maddr, ++ mode); + if ( ret < 0 ) + break; + prev_present = ret; +@@ -1719,7 +1732,8 @@ static int domain_context_mapping(struct + */ + if ( ret >= 0 ) + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- NULL, mode); ++ NULL, domain->domain_id, pgd_maddr, ++ mode); + + /* + * Devices behind PCIe-to-PCI/PCIx bridge may generate different +@@ -1734,7 +1748,8 @@ static int domain_context_mapping(struct + if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && + (secbus != pdev->bus || pdev->devfn != 0) ) + ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0, +- NULL, mode); ++ NULL, domain->domain_id, pgd_maddr, ++ mode); + + if ( ret ) + { +@@ -1763,7 +1778,7 @@ static int domain_context_mapping(struct + int domain_context_unmap_one( + struct domain *domain, + struct iommu *iommu, +- u8 bus, u8 devfn) ++ uint8_t bus, uint8_t devfn, domid_t domid) + { + struct context_entry *context, *context_entries; + u64 maddr; +@@ -1821,7 +1836,7 @@ int domain_context_unmap_one( + unmap_vtd_domain_page(context_entries); + + if ( !iommu->intel->drhd->segment && !rc ) +- rc = me_wifi_quirk(domain, bus, devfn, UNMAP_ME_PHANTOM_FUNC); ++ rc = me_wifi_quirk(domain, bus, devfn, domid, 0, UNMAP_ME_PHANTOM_FUNC); + + return rc; + } +@@ -1860,7 +1875,8 @@ static int domain_context_unmap(struct d + printk(VTDPREFIX "d%d:PCIe: unmap %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, + 
PCI_SLOT(devfn), PCI_FUNC(devfn)); +- ret = domain_context_unmap_one(domain, iommu, bus, devfn); ++ ret = domain_context_unmap_one(domain, iommu, bus, devfn, ++ domain->domain_id); + if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 ) + disable_ats_device(pdev); + +@@ -1870,7 +1886,8 @@ static int domain_context_unmap(struct d + if ( iommu_debug ) + printk(VTDPREFIX "d%d:PCI: unmap %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); +- ret = domain_context_unmap_one(domain, iommu, bus, devfn); ++ ret = domain_context_unmap_one(domain, iommu, bus, devfn, ++ domain->domain_id); + if ( ret ) + break; + +@@ -1882,14 +1899,17 @@ static int domain_context_unmap(struct d + /* PCIe to PCI/PCIx bridge */ + if ( pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) + { +- ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn); ++ ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, ++ domain->domain_id); + if ( ret ) + return ret; + +- ret = domain_context_unmap_one(domain, iommu, secbus, 0); ++ ret = domain_context_unmap_one(domain, iommu, secbus, 0, ++ domain->domain_id); + } + else /* Legacy PCI bridge */ +- ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn); ++ ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, ++ domain->domain_id); + + break; + +--- a/xen/drivers/passthrough/vtd/quirks.c ++++ b/xen/drivers/passthrough/vtd/quirks.c +@@ -331,6 +331,8 @@ void __init platform_quirks_init(void) + + static int __must_check map_me_phantom_function(struct domain *domain, + unsigned int dev, ++ domid_t domid, ++ paddr_t pgd_maddr, + unsigned int mode) + { + struct acpi_drhd_unit *drhd; +@@ -344,16 +346,17 @@ static int __must_check map_me_phantom_f + /* map or unmap ME phantom function */ + if ( !(mode & UNMAP_ME_PHANTOM_FUNC) ) + rc = domain_context_mapping_one(domain, drhd->iommu, 0, +- PCI_DEVFN(dev, 7), NULL, mode); ++ PCI_DEVFN(dev, 7), NULL, ++ domid, pgd_maddr, mode); + else + rc = domain_context_unmap_one(domain, drhd->iommu, 0, +- PCI_DEVFN(dev, 7)); ++ PCI_DEVFN(dev, 7), domid); + + return rc; + } + + int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn, +- unsigned int mode) ++ domid_t domid, paddr_t pgd_maddr, unsigned int mode) + { + u32 id; + int rc = 0; +@@ -377,7 +380,7 @@ int me_wifi_quirk(struct domain *domain, + case 0x423b8086: + case 0x423c8086: + case 0x423d8086: +- rc = map_me_phantom_function(domain, 3, mode); ++ rc = map_me_phantom_function(domain, 3, domid, pgd_maddr, mode); + break; + default: + break; +@@ -403,7 +406,7 @@ int me_wifi_quirk(struct domain *domain, + case 0x42388086: /* Puma Peak */ + case 0x422b8086: + case 0x422c8086: +- rc = map_me_phantom_function(domain, 22, mode); ++ rc = map_me_phantom_function(domain, 22, domid, pgd_maddr, mode); + break; + default: + break; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-07.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-07.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-07.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-07.patch 2022-06-06 07:52:05.000000000 +0100 @@ -0,0 +1,126 @@ +From: Jan Beulich +Subject: VT-d: prepare for per-device quarantine page tables (part II) + +Replace the passing of struct domain * by domid_t in preparation of +per-device quarantine page tables also requiring per-device pseudo +domain IDs, which aren't going to be associated with any struct domain +instances. 
+ +No functional change intended (except for slightly adjusted log message +text). + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Kevin Tian +Reviewed-by: Roger Pau Monné + +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -52,8 +52,8 @@ static struct tasklet vtd_fault_tasklet; + static int setup_hwdom_device(u8 devfn, struct pci_dev *); + static void setup_hwdom_rmrr(struct domain *d); + +-static int domain_iommu_domid(struct domain *d, +- struct iommu *iommu) ++static int get_iommu_did(domid_t domid, const struct iommu *iommu, ++ bool warn) + { + unsigned long nr_dom, i; + +@@ -61,23 +61,24 @@ static int domain_iommu_domid(struct dom + i = find_first_bit(iommu->domid_bitmap, nr_dom); + while ( i < nr_dom ) + { +- if ( iommu->domid_map[i] == d->domain_id ) ++ if ( iommu->domid_map[i] == domid ) + return i; + + i = find_next_bit(iommu->domid_bitmap, nr_dom, i+1); + } + +- dprintk(XENLOG_ERR VTDPREFIX, +- "Cannot get valid iommu domid: domid=%d iommu->index=%d\n", +- d->domain_id, iommu->index); ++ if ( warn ) ++ dprintk(XENLOG_ERR VTDPREFIX, ++ "No valid iommu %u domid for Dom%d\n", ++ iommu->index, domid); ++ + return -1; + } + + #define DID_FIELD_WIDTH 16 + #define DID_HIGH_OFFSET 8 + static int context_set_domain_id(struct context_entry *context, +- struct domain *d, +- struct iommu *iommu) ++ domid_t domid, struct iommu *iommu) + { + unsigned long nr_dom, i; + int found = 0; +@@ -88,7 +89,7 @@ static int context_set_domain_id(struct + i = find_first_bit(iommu->domid_bitmap, nr_dom); + while ( i < nr_dom ) + { +- if ( iommu->domid_map[i] == d->domain_id ) ++ if ( iommu->domid_map[i] == domid ) + { + found = 1; + break; +@@ -104,7 +105,7 @@ static int context_set_domain_id(struct + dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no free domain ids\n"); + return -EFAULT; + } +- iommu->domid_map[i] = d->domain_id; ++ iommu->domid_map[i] = domid; + } + + set_bit(i, iommu->domid_bitmap); +@@ -131,9 +132,9 @@ static void __init free_intel_iommu(stru + xfree(intel); + } + +-static void cleanup_domid_map(struct domain *domain, struct iommu *iommu) ++static void cleanup_domid_map(domid_t domid, struct iommu *iommu) + { +- int iommu_domid = domain_iommu_domid(domain, iommu); ++ int iommu_domid = get_iommu_did(domid, iommu, false); + + if ( iommu_domid >= 0 ) + { +@@ -189,7 +190,7 @@ static void check_cleanup_domid_map(stru + if ( !found ) + { + clear_bit(iommu->index, &dom_iommu(d)->arch.iommu_bitmap); +- cleanup_domid_map(d, iommu); ++ cleanup_domid_map(d->domain_id, iommu); + } + } + +@@ -670,7 +671,7 @@ static int __must_check iommu_flush_iotl + continue; + + flush_dev_iotlb = !!find_ats_dev_drhd(iommu); +- iommu_domid= domain_iommu_domid(d, iommu); ++ iommu_domid = get_iommu_did(d->domain_id, iommu, !d->is_dying); + if ( iommu_domid == -1 ) + continue; + +@@ -1495,7 +1496,7 @@ int domain_context_mapping_one( + spin_unlock(&hd->arch.mapping_lock); + } + +- rc = context_set_domain_id(&lctxt, domain, iommu); ++ rc = context_set_domain_id(&lctxt, domid, iommu); + if ( rc ) + { + unlock: +@@ -1803,7 +1804,7 @@ int domain_context_unmap_one( + context_clear_entry(*context); + iommu_sync_cache(context, sizeof(struct context_entry)); + +- iommu_domid= domain_iommu_domid(domain, iommu); ++ iommu_domid = get_iommu_did(domid, iommu, !domain->is_dying); + if ( iommu_domid == -1 ) + { + spin_unlock(&iommu->lock); diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-08.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-08.patch --- 
xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-08.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-08.patch 2022-06-13 22:37:51.000000000 +0100 @@ -0,0 +1,440 @@ +From: Jan Beulich +Subject: IOMMU/x86: maintain a per-device pseudo domain ID + +In order to subsequently enable per-device quarantine page tables, we'll +need domain-ID-like identifiers to be inserted in the respective device +(AMD) or context (Intel) table entries alongside the per-device page +table root addresses. + +Make use of "real" domain IDs occupying only half of the value range +coverable by domid_t. + +Note that in VT-d's iommu_alloc() I didn't want to introduce new memory +leaks in case of error, but existing ones don't get plugged - that'll be +the subject of a later change. + +The VT-d changes are slightly asymmetric, but this way we can avoid +assigning pseudo domain IDs to devices which would never be mapped while +still avoiding to add a new parameter to domain_context_unmap(). + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Kevin Tian +Reviewed-by: Roger Pau Monné + +--- a/xen/include/asm-x86/iommu.h ++++ b/xen/include/asm-x86/iommu.h +@@ -112,6 +112,10 @@ int pi_update_irte(const struct pi_desc + ops->sync_cache(addr, size); \ + }) + ++unsigned long *iommu_init_domid(void); ++domid_t iommu_alloc_domid(unsigned long *map); ++void iommu_free_domid(domid_t domid, unsigned long *map); ++ + #endif /* !__ARCH_X86_IOMMU_H__ */ + /* + * Local variables: +--- a/xen/include/asm-x86/pci.h ++++ b/xen/include/asm-x86/pci.h +@@ -15,6 +15,12 @@ + + struct arch_pci_dev { + vmask_t used_vectors; ++ /* ++ * These fields are (de)initialized under pcidevs-lock. Other uses of ++ * them don't race (de)initialization and hence don't strictly need any ++ * locking. 
++ */ ++ domid_t pseudo_domid; + }; + + int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, +--- a/xen/include/asm-x86/amd-iommu.h ++++ b/xen/include/asm-x86/amd-iommu.h +@@ -97,6 +97,7 @@ struct amd_iommu { + struct ring_buffer cmd_buffer; + struct ring_buffer event_log; + struct ring_buffer ppr_log; ++ unsigned long *domid_map; + + int exclusion_enable; + int exclusion_allow_all; +--- a/xen/drivers/passthrough/amd/iommu_detect.c ++++ b/xen/drivers/passthrough/amd/iommu_detect.c +@@ -150,6 +150,11 @@ int __init amd_iommu_detect_one_acpi( + if ( rt ) + goto out; + ++ iommu->domid_map = iommu_init_domid(); ++ rt = -ENOMEM; ++ if ( !iommu->domid_map ) ++ goto out; ++ + rt = pci_ro_device(iommu->seg, bus, PCI_DEVFN(dev, func)); + if ( rt ) + printk(XENLOG_ERR +@@ -161,7 +166,10 @@ int __init amd_iommu_detect_one_acpi( + + out: + if ( rt ) ++ { ++ xfree(iommu->domid_map); + xfree(iommu); ++ } + + return rt; + } +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -567,6 +567,8 @@ static int amd_iommu_add_device(u8 devfn + { + struct amd_iommu *iommu; + u16 bdf; ++ bool fresh_domid = false; ++ int ret; + + if ( !pdev->domain ) + return -EINVAL; +@@ -591,7 +593,22 @@ static int amd_iommu_add_device(u8 devfn + return -ENODEV; + } + +- return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev); ++ if ( iommu_quarantine && pdev->arch.pseudo_domid == DOMID_INVALID ) ++ { ++ pdev->arch.pseudo_domid = iommu_alloc_domid(iommu->domid_map); ++ if ( pdev->arch.pseudo_domid == DOMID_INVALID ) ++ return -ENOSPC; ++ fresh_domid = true; ++ } ++ ++ ret = amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev); ++ if ( ret && fresh_domid ) ++ { ++ iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); ++ pdev->arch.pseudo_domid = DOMID_INVALID; ++ } ++ ++ return ret; + } + + static int amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) +@@ -613,6 +630,10 @@ static int amd_iommu_remove_device(u8 de + } + + amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev); ++ ++ iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); ++ pdev->arch.pseudo_domid = DOMID_INVALID; ++ + return 0; + } + +--- a/xen/drivers/passthrough/pci.c ++++ b/xen/drivers/passthrough/pci.c +@@ -314,6 +314,7 @@ static struct pci_dev *alloc_pdev(struct + *((u8*) &pdev->bus) = bus; + *((u8*) &pdev->devfn) = devfn; + pdev->domain = NULL; ++ pdev->arch.pseudo_domid = DOMID_INVALID; + INIT_LIST_HEAD(&pdev->msi_list); + + if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), +@@ -1268,10 +1269,13 @@ static int _dump_pci_devices(struct pci_ + + list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) + { +- printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ", +- pseg->nr, pdev->bus, +- PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), +- pdev->domain ? pdev->domain->domain_id : -1, ++ printk("%04x:%02x:%02x.%u - ", pseg->nr, pdev->bus, ++ PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn)); ++ if ( pdev->domain == dom_io ) ++ printk("DomIO:%x", pdev->arch.pseudo_domid); ++ else if ( pdev->domain ) ++ printk("Dom%d", pdev->domain->domain_id); ++ printk(" - node %-3d - MSIs < ", + (pdev->node != NUMA_NO_NODE) ? 
pdev->node : -1); + list_for_each_entry ( msi, &pdev->msi_list, list ) + printk("%d ", msi->irq); +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -22,6 +22,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -1228,7 +1229,7 @@ int __init iommu_alloc(struct acpi_drhd_ + { + struct iommu *iommu; + unsigned long sagaw, nr_dom; +- int agaw; ++ int agaw, rc; + + if ( nr_iommus > MAX_IOMMUS ) + { +@@ -1318,10 +1319,19 @@ int __init iommu_alloc(struct acpi_drhd_ + if ( !iommu->domid_map ) + return -ENOMEM ; + ++ iommu->pseudo_domid_map = iommu_init_domid(); ++ rc = -ENOMEM; ++ if ( !iommu->pseudo_domid_map ) ++ goto free; ++ + spin_lock_init(&iommu->lock); + spin_lock_init(&iommu->register_lock); + + return 0; ++ ++ free: ++ iommu_free(drhd); ++ return rc; + } + + void __init iommu_free(struct acpi_drhd_unit *drhd) +@@ -1344,6 +1354,7 @@ void __init iommu_free(struct acpi_drhd_ + + xfree(iommu->domid_bitmap); + xfree(iommu->domid_map); ++ xfree(iommu->pseudo_domid_map); + + free_intel_iommu(iommu->intel); + if ( iommu->msi.irq >= 0 ) +@@ -1624,8 +1635,8 @@ int domain_context_mapping_one( + return rc ?: pdev && prev_dom; + } + +-static int domain_context_unmap(struct domain *d, uint8_t devfn, +- struct pci_dev *pdev); ++static const struct acpi_drhd_unit *domain_context_unmap( ++ struct domain *d, uint8_t devfn, struct pci_dev *pdev); + + static int domain_context_mapping(struct domain *domain, u8 devfn, + struct pci_dev *pdev) +@@ -1633,6 +1644,7 @@ static int domain_context_mapping(struct + struct acpi_drhd_unit *drhd; + const struct acpi_rmrr_unit *rmrr; + paddr_t pgd_maddr = dom_iommu(domain)->arch.pgd_maddr; ++ domid_t orig_domid = pdev->arch.pseudo_domid; + int ret = 0; + unsigned int i, mode = 0; + uint16_t seg = pdev->seg, bdf; +@@ -1683,6 +1695,14 @@ static int domain_context_mapping(struct + break; + + case DEV_TYPE_PCIe_ENDPOINT: ++ if ( iommu_quarantine && orig_domid == DOMID_INVALID ) ++ { ++ pdev->arch.pseudo_domid = ++ iommu_alloc_domid(drhd->iommu->pseudo_domid_map); ++ if ( pdev->arch.pseudo_domid == DOMID_INVALID ) ++ return -ENOSPC; ++ } ++ + if ( iommu_debug ) + printk(VTDPREFIX "d%d:PCIe: map %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, +@@ -1698,6 +1718,14 @@ static int domain_context_mapping(struct + break; + + case DEV_TYPE_PCI: ++ if ( iommu_quarantine && orig_domid == DOMID_INVALID ) ++ { ++ pdev->arch.pseudo_domid = ++ iommu_alloc_domid(drhd->iommu->pseudo_domid_map); ++ if ( pdev->arch.pseudo_domid == DOMID_INVALID ) ++ return -ENOSPC; ++ } ++ + if ( iommu_debug ) + printk(VTDPREFIX "d%d:PCI: map %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, +@@ -1771,6 +1799,13 @@ static int domain_context_mapping(struct + if ( !ret && devfn == pdev->devfn ) + pci_vtd_quirk(pdev); + ++ if ( ret && drhd && orig_domid == DOMID_INVALID ) ++ { ++ iommu_free_domid(pdev->arch.pseudo_domid, ++ drhd->iommu->pseudo_domid_map); ++ pdev->arch.pseudo_domid = DOMID_INVALID; ++ } ++ + return ret; + } + +@@ -1840,8 +1875,10 @@ int domain_context_unmap_one( + return rc; + } + +-static int domain_context_unmap(struct domain *domain, u8 devfn, +- struct pci_dev *pdev) ++static const struct acpi_drhd_unit *domain_context_unmap( ++ struct domain *domain, ++ uint8_t devfn, ++ struct pci_dev *pdev) + { + struct acpi_drhd_unit *drhd; + struct iommu *iommu; +@@ -1850,7 +1887,7 @@ static int domain_context_unmap(struct d + + drhd = acpi_find_matched_drhd_unit(pdev); + if ( !drhd ) +- return -ENODEV; ++ return 
ERR_PTR(-ENODEV); + iommu = drhd->iommu; + + switch ( pdev->type ) +@@ -1861,7 +1898,7 @@ static int domain_context_unmap(struct d + domain->domain_id, seg, bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + if ( !is_hardware_domain(domain) ) +- return -EPERM; ++ return ERR_PTR(-EPERM); + goto out; + + case DEV_TYPE_PCIe_BRIDGE: +@@ -1900,11 +1937,9 @@ static int domain_context_unmap(struct d + { + ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, + domain->domain_id); +- if ( ret ) +- return ret; +- +- ret = domain_context_unmap_one(domain, iommu, secbus, 0, +- domain->domain_id); ++ if ( !ret ) ++ ret = domain_context_unmap_one(domain, iommu, secbus, 0, ++ domain->domain_id); + } + else /* Legacy PCI bridge */ + ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, +@@ -1924,7 +1959,7 @@ static int domain_context_unmap(struct d + check_cleanup_domid_map(domain, pdev, iommu); + + out: +- return ret; ++ return ret ? ERR_PTR(ret) : drhd; + } + + static void iommu_domain_teardown(struct domain *d) +@@ -2100,16 +2135,17 @@ static int intel_iommu_enable_device(str + + static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) + { ++ const struct acpi_drhd_unit *drhd; + struct acpi_rmrr_unit *rmrr; + u16 bdf; +- int ret, i; ++ unsigned int i; + + if ( !pdev->domain ) + return -EINVAL; + +- ret = domain_context_unmap(pdev->domain, devfn, pdev); +- if ( ret ) +- return ret; ++ drhd = domain_context_unmap(pdev->domain, devfn, pdev); ++ if ( IS_ERR(drhd) ) ++ return PTR_ERR(drhd); + + for_each_rmrr_device ( rmrr, bdf, i ) + { +@@ -2126,6 +2162,13 @@ static int intel_iommu_remove_device(u8 + rmrr->end_address, 0); + } + ++ if ( drhd ) ++ { ++ iommu_free_domid(pdev->arch.pseudo_domid, ++ drhd->iommu->pseudo_domid_map); ++ pdev->arch.pseudo_domid = DOMID_INVALID; ++ } ++ + return 0; + } + +--- a/xen/drivers/passthrough/vtd/iommu.h ++++ b/xen/drivers/passthrough/vtd/iommu.h +@@ -538,6 +538,7 @@ struct iommu { + struct msi_desc msi; + struct intel_iommu *intel; + struct list_head ats_devices; ++ unsigned long *pseudo_domid_map; /* "pseudo" domain id bitmap */ + unsigned long *domid_bitmap; /* domain id bitmap */ + u16 *domid_map; /* domain id mapping array */ + }; +--- a/xen/drivers/passthrough/x86/iommu.c ++++ b/xen/drivers/passthrough/x86/iommu.c +@@ -246,6 +246,53 @@ void arch_iommu_domain_destroy(struct domain *d) + } + } + ++unsigned long *__init iommu_init_domid(void) ++{ ++ if ( !iommu_quarantine ) ++ return ZERO_BLOCK_PTR; ++ ++ BUILD_BUG_ON(DOMID_MASK * 2U >= UINT16_MAX); ++ ++ return xzalloc_array(unsigned long, ++ BITS_TO_LONGS(UINT16_MAX - DOMID_MASK)); ++} ++ ++domid_t iommu_alloc_domid(unsigned long *map) ++{ ++ /* ++ * This is used uniformly across all IOMMUs, such that on typical ++ * systems we wouldn't re-use the same ID very quickly (perhaps never). 
++ */ ++ static unsigned int start; ++ unsigned int idx = find_next_zero_bit(map, UINT16_MAX - DOMID_MASK, start); ++ ++ ASSERT(pcidevs_locked()); ++ ++ if ( idx >= UINT16_MAX - DOMID_MASK ) ++ idx = find_first_zero_bit(map, UINT16_MAX - DOMID_MASK); ++ if ( idx >= UINT16_MAX - DOMID_MASK ) ++ return DOMID_INVALID; ++ ++ __set_bit(idx, map); ++ ++ start = idx + 1; ++ ++ return idx | (DOMID_MASK + 1); ++} ++ ++void iommu_free_domid(domid_t domid, unsigned long *map) ++{ ++ ASSERT(pcidevs_locked()); ++ ++ if ( domid == DOMID_INVALID ) ++ return; ++ ++ ASSERT(domid > DOMID_MASK); ++ ++ if ( !__test_and_clear_bit(domid & DOMID_MASK, map) ) ++ BUG(); ++} ++ + /* + * Local variables: + * mode: C +--- a/xen/include/public/xen.h ++++ b/xen/include/public/xen.h +@@ -584,6 +584,9 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t); + /* Idle domain. */ + #define DOMID_IDLE xen_mk_uint(0x7FFF) + ++/* Mask for valid domain id values */ ++#define DOMID_MASK xen_mk_uint(0x7FFF) ++ + #ifndef __ASSEMBLY__ + + typedef uint16_t domid_t; diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-09.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-09.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-09.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-09.patch 2022-06-14 12:21:14.000000000 +0100 @@ -0,0 +1,48 @@ +From: Jan Beulich +Subject: IOMMU/x86: drop TLB flushes from quarantine_init() hooks + +The page tables just created aren't hooked up yet anywhere, so there's +nothing that could be present in any TLB, and hence nothing to flush. +Dropping this flush is, at least on the VT-d side, a prereq to per- +device domain ID use when quarantining devices, as dom_io isn't going +to be assigned a DID anymore: The warning in get_iommu_did() would +trigger. + +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Roger Pau Monné +Reviewed-by: Kevin Tian + +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -943,8 +943,6 @@ int __init amd_iommu_quarantine_init(str + out: + spin_unlock(&hd->arch.mapping_lock); + +- amd_iommu_flush_all_pages(d); +- + /* Pages leaked in failure case */ + return level ? -ENOMEM : 0; + } +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -2804,7 +2804,6 @@ static int __init intel_iommu_quarantine + struct dma_pte *parent; + unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); + unsigned int level = agaw_to_level(agaw); +- int rc; + + if ( hd->arch.pgd_maddr ) + { +@@ -2905,10 +2904,8 @@ static int __init intel_iommu_quarantine + out: + spin_unlock(&hd->arch.mapping_lock); + +- rc = iommu_flush_iotlb_all(d); +- + /* Pages leaked in failure case */ +- return level ? -ENOMEM : rc; ++ return level ? -ENOMEM : 0; + } + + const struct iommu_ops intel_iommu_ops = { diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-10.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-10.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-10.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-10.patch 2022-06-14 12:31:12.000000000 +0100 @@ -0,0 +1,41 @@ +From: Jan Beulich +Subject: AMD/IOMMU: abstract maximum number of page table levels + +We will want to use the constant elsewhere. 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant + +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -183,7 +183,7 @@ static inline int amd_iommu_get_paging_m + while ( max_frames > PTE_PER_TABLE_SIZE ) + { + max_frames = PTE_PER_TABLE_ALIGN(max_frames) >> PTE_PER_TABLE_SHIFT; +- if ( ++level > 6 ) ++ if ( ++level > IOMMU_MAX_PT_LEVELS ) + return -ENOMEM; + } + +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h +@@ -115,6 +115,8 @@ + #define IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK 0xFFFFF000 + #define IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT 12 + ++#define IOMMU_MAX_PT_LEVELS 6 ++ + /* DeviceTable Entry[63:32] */ + #define IOMMU_DEV_TABLE_GV_SHIFT 23 + #define IOMMU_DEV_TABLE_GV_MASK 0x800000 +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -477,7 +477,7 @@ static int iommu_pde_from_dfn(struct dom + table = hd->arch.root_table; + level = hd->arch.paging_mode; + +- BUG_ON( table == NULL || level < 1 || level > 6 ); ++ BUG_ON( table == NULL || level < 1 || level > IOMMU_MAX_PT_LEVELS ); + + /* + * A frame number past what the current page tables can represent can't diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-11.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-11.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-11.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa400-4.12-11.patch 2022-06-16 00:58:02.000000000 +0100 @@ -0,0 +1,891 @@ +From: Jan Beulich +Subject: IOMMU/x86: use per-device page tables for quarantining + +Devices with RMRRs / unity mapped regions, due to it being unspecified +how/when these memory regions may be accessed, may not be left +disconnected from the mappings of these regions (as long as it's not +certain that the device has been fully quiesced). Hence even the page +tables used when quarantining such devices need to have mappings of +those regions. This implies installing page tables in the first place +even when not in scratch-page quarantining mode. + +This is CVE-2022-26361 / part of XSA-400. + +While for the purpose here it would be sufficient to have devices with +RMRRs / unity mapped regions use per-device page tables, extend this to +all devices (in scratch-page quarantining mode). This allows the leaf +pages to be mapped r/w, thus covering also memory writes (rather than +just reads) issued by non-quiescent devices. + +Set up quarantine page tables as late as possible, yet early enough to +not encounter failure during de-assign. This means setup generally +happens in assign_device(), while (for now) the one in deassign_device() +is there mainly to be on the safe side. + +In VT-d's DID allocation function don't require the IOMMU lock to be +held anymore: All involved code paths hold pcidevs_lock, so this way we +avoid the need to acquire the IOMMU lock around the new call to +context_set_domain_id(). 
+ +Signed-off-by: Jan Beulich +Reviewed-by: Paul Durrant +Reviewed-by: Kevin Tian +Reviewed-by: Roger Pau Monné + +--- a/xen/arch/x86/mm/p2m.c ++++ b/xen/arch/x86/mm/p2m.c +@@ -1239,7 +1239,7 @@ int set_identity_p2m_entry(struct domain + struct p2m_domain *p2m = p2m_get_hostp2m(d); + int ret; + +- if ( !paging_mode_translate(p2m->domain) ) ++ if ( !paging_mode_translate(d) ) + { + if ( !need_iommu(d) ) + return 0; +--- a/xen/include/asm-x86/pci.h ++++ b/xen/include/asm-x86/pci.h +@@ -1,6 +1,8 @@ + #ifndef __X86_PCI_H__ + #define __X86_PCI_H__ + ++#include ++ + #define CF8_BDF(cf8) ( ((cf8) & 0x00ffff00) >> 8) + #define CF8_ADDR_LO(cf8) ( (cf8) & 0x000000fc) + #define CF8_ADDR_HI(cf8) ( ((cf8) & 0x0f000000) >> 16) +@@ -20,7 +22,18 @@ struct arch_pci_dev { + * them don't race (de)initialization and hence don't strictly need any + * locking. + */ ++ union { ++ /* Subset of struct arch_iommu's fields, to be used in dom_io. */ ++ struct { ++ uint64_t pgd_maddr; ++ } vtd; ++ struct { ++ struct page_info *root_table; ++ } amd; ++ }; + domid_t pseudo_domid; ++ mfn_t leaf_mfn; ++ struct page_list_head pgtables_list; + }; + + int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, +--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h ++++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +@@ -52,7 +52,8 @@ + int amd_iommu_update_ivrs_mapping_acpi(void); + +-int amd_iommu_quarantine_init(struct domain *d); ++int amd_iommu_quarantine_init(struct pci_dev *pdev); ++void amd_iommu_quarantine_teardown(struct pci_dev *pdev); + + /* mapping functions */ + int __must_check amd_iommu_map_page(struct domain *d, unsigned long gfn, + unsigned long mfn, unsigned int flags); +--- a/xen/drivers/passthrough/amd/iommu_map.c ++++ b/xen/drivers/passthrough/amd/iommu_map.c +@@ -883,62 +883,148 @@ void amd_iommu_share_p2m(struct domain * + } + } + +-int __init amd_iommu_quarantine_init(struct domain *d) ++static int fill_qpt(uint64_t *this, unsigned int level, ++ struct page_info *pgs[IOMMU_MAX_PT_LEVELS], ++ struct pci_dev *pdev) + { +- struct domain_iommu *hd = dom_iommu(d); ++ unsigned int i; ++ int rc = 0; ++ ++ for ( i = 0; !rc && i < PTE_PER_TABLE_SIZE; ++i ) ++ { ++ uint32_t *pte = (uint32_t *)&this[i]; ++ uint64_t *next; ++ ++ if ( !get_field_from_reg_u32(pte[0], IOMMU_PTE_PRESENT_MASK, ++ IOMMU_PTE_PRESENT_SHIFT) ) ++ { ++ if ( !pgs[level] ) ++ { ++ /* ++ * The pgtable allocator is fine for the leaf page, as well as ++ * page table pages, and the resulting allocations are always ++ * zeroed. ++ */ ++ pgs[level] = alloc_amd_iommu_pgtable(); ++ if ( !pgs[level] ) ++ { ++ rc = -ENOMEM; ++ break; ++ } ++ ++ page_list_add(pgs[level], &pdev->arch.pgtables_list); ++ ++ if ( level ) ++ { ++ next = __map_domain_page(pgs[level]); ++ rc = fill_qpt(next, level - 1, pgs, pdev); ++ unmap_domain_page(next); ++ } ++ } ++ ++ /* ++ * PDEs are essentially a subset of PTEs, so this function ++ * is fine to use even at the leaf. 
++ */ ++ set_iommu_pde_present(pte, mfn_x(page_to_mfn(pgs[level])), level, ++ true, true); ++ } ++ else if ( level && ++ get_field_from_reg_u32(pte[0], ++ IOMMU_PDE_NEXT_LEVEL_MASK, ++ IOMMU_PDE_NEXT_LEVEL_SHIFT) ) ++ { ++ paddr_t addr_hi = get_field_from_reg_u32(pte[1], ++ IOMMU_PTE_ADDR_HIGH_MASK, ++ IOMMU_PTE_ADDR_HIGH_SHIFT); ++ paddr_t addr_lo = get_field_from_reg_u32(pte[0], ++ IOMMU_PTE_ADDR_LOW_MASK, ++ IOMMU_PTE_ADDR_LOW_SHIFT); ++ unsigned long mfn = (addr_hi << (32 - PAGE_SHIFT)) | addr_lo; ++ ++ page_list_add(mfn_to_page(_mfn(mfn)), &pdev->arch.pgtables_list); ++ next = map_domain_page(_mfn(mfn)); ++ rc = fill_qpt(next, level - 1, pgs, pdev); ++ unmap_domain_page(next); ++ } ++ } ++ ++ return rc; ++} ++ ++int amd_iommu_quarantine_init(struct pci_dev *pdev) ++{ ++ struct domain_iommu *hd = dom_iommu(dom_io); + unsigned long end_gfn = + 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT); + unsigned int level = amd_iommu_get_paging_mode(end_gfn); +- uint64_t *table; ++ unsigned int req_id = get_dma_requestor_id(pdev->seg, ++ PCI_BDF2(pdev->bus, pdev->devfn)); ++ const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg); ++ int rc; + +- if ( hd->arch.root_table ) ++ ASSERT(pcidevs_locked()); ++ ASSERT(!hd->arch.root_table); ++ ++ ASSERT(pdev->arch.pseudo_domid != DOMID_INVALID); ++ ++ if ( pdev->arch.amd.root_table ) + { +- ASSERT_UNREACHABLE(); ++ clear_domain_page(pdev->arch.leaf_mfn); + return 0; + } + +- spin_lock(&hd->arch.mapping_lock); +- +- hd->arch.root_table = alloc_amd_iommu_pgtable(); +- if ( !hd->arch.root_table ) +- goto out; +- +- table = __map_domain_page(hd->arch.root_table); +- while ( level ) ++ pdev->arch.amd.root_table = alloc_amd_iommu_pgtable(); ++ if ( !pdev->arch.amd.root_table ) ++ return -ENOMEM; ++ ++ /* Transiently install the root into DomIO, for iommu_identity_mapping(). */ ++ hd->arch.root_table = pdev->arch.amd.root_table; ++ ++ rc = amd_iommu_reserve_domain_unity_map(dom_io, ++ ivrs_mappings[req_id].unity_map, ++ 0); ++ ++ iommu_identity_map_teardown(dom_io); ++ hd->arch.root_table = NULL; ++ ++ if ( rc ) ++ printk("%04x:%02x:%02x.%u: quarantine unity mapping failed\n", ++ pdev->seg, pdev->bus, ++ PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn)); ++ else + { +- struct page_info *pg; +- unsigned int i; ++ uint64_t *root; ++ struct page_info *pgs[IOMMU_MAX_PT_LEVELS] = {}; + +- /* +- * The pgtable allocator is fine for the leaf page, as well as +- * page table pages, and the resulting allocations are always +- * zeroed. +- */ +- pg = alloc_amd_iommu_pgtable(); +- if ( !pg ) +- break; ++ spin_lock(&hd->arch.mapping_lock); + +- for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ ) +- { +- uint32_t *pde = (uint32_t *)&table[i]; ++ root = __map_domain_page(pdev->arch.amd.root_table); ++ rc = fill_qpt(root, level - 1, pgs, pdev); ++ unmap_domain_page(root); + +- /* +- * PDEs are essentially a subset of PTEs, so this function +- * is fine to use even at the leaf. 
+- */ +- set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1, +- false, true); +- } ++ pdev->arch.leaf_mfn = page_to_mfn(pgs[0]); + +- unmap_domain_page(table); +- table = __map_domain_page(pg); +- level--; ++ spin_unlock(&hd->arch.mapping_lock); + } +- unmap_domain_page(table); + +- out: +- spin_unlock(&hd->arch.mapping_lock); ++ if ( rc ) ++ amd_iommu_quarantine_teardown(pdev); ++ ++ return rc; ++} ++ ++void amd_iommu_quarantine_teardown(struct pci_dev *pdev) ++{ ++ struct page_info *pg; ++ ++ ASSERT(pcidevs_locked()); ++ ++ if ( !pdev->arch.amd.root_table ) ++ return; ++ ++ while ( (pg = page_list_remove_head(&pdev->arch.pgtables_list)) ) ++ free_amd_iommu_pgtable(pg); + +- /* Pages leaked in failure case */ +- return level ? -ENOMEM : 0; ++ pdev->arch.amd.root_table = NULL; + } +--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c ++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c +@@ -148,6 +148,8 @@ static int __must_check amd_iommu_setup_ + u8 bus = pdev->bus; + struct domain_iommu *hd = dom_iommu(domain); + const struct ivrs_mappings *ivrs_dev; ++ const struct page_info *root_pg; ++ domid_t domid; + + BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer); + +@@ -170,14 +172,25 @@ static int __must_check amd_iommu_setup_ + dte = iommu->dev_table.buffer + (req_id * IOMMU_DEV_TABLE_ENTRY_SIZE); + ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id]; + ++ if ( domain != dom_io ) ++ { ++ root_pg = hd->arch.root_table; ++ domid = domain->domain_id; ++ } ++ else ++ { ++ root_pg = pdev->arch.amd.root_table; ++ domid = pdev->arch.pseudo_domid; ++ } ++ + spin_lock_irqsave(&iommu->lock, flags); + + if ( !is_translation_valid((u32 *)dte) ) + { + /* bind DTE to domain page-tables */ + rc = amd_iommu_set_root_page_table( +- dte, page_to_maddr(hd->arch.root_table), +- domain->domain_id, hd->arch.paging_mode, sr_flags); ++ dte, page_to_maddr(root_pg), domid, ++ hd->arch.paging_mode, sr_flags); + if ( rc ) + { + ASSERT(rc < 0); +@@ -191,8 +204,7 @@ static int __must_check amd_iommu_setup_ + + amd_iommu_flush_device(iommu, req_id); + } +- else if ( amd_iommu_get_root_page_table(dte) != +- page_to_maddr(hd->arch.root_table) ) ++ else if ( amd_iommu_get_root_page_table(dte) != page_to_maddr(root_pg) ) + { + /* + * Strictly speaking if the device is the only one with this requestor +@@ -205,8 +217,8 @@ static int __must_check amd_iommu_setup_ + rc = -EOPNOTSUPP; + else + rc = amd_iommu_set_root_page_table( +- dte, page_to_maddr(hd->arch.root_table), +- domain->domain_id, hd->arch.paging_mode, sr_flags); ++ dte, page_to_maddr(root_pg), domid, ++ hd->arch.paging_mode, sr_flags); + if ( rc < 0 ) + { + spin_unlock_irqrestore(&iommu->lock, flags); +@@ -225,6 +237,7 @@ static int __must_check amd_iommu_setup_ + * intended anyway. 
+ */ + !pdev->domain->is_dying && ++ pdev->domain != dom_io && + (any_pdev_behind_iommu(pdev->domain, pdev, iommu) || + pdev->phantom_stride) ) + printk(" %04x:%02x:%02x.%u: reassignment may cause %pd data corruption\n", +@@ -245,9 +258,8 @@ static int __must_check amd_iommu_setup_ + AMD_IOMMU_DEBUG("Setup I/O page table: device id = %#x, type = %#x, " + "root table = %#"PRIx64", " + "domain = %d, paging mode = %d\n", +- req_id, pdev->type, +- page_to_maddr(hd->arch.root_table), +- domain->domain_id, hd->arch.paging_mode); ++ req_id, pdev->type, page_to_maddr(root_pg), ++ domid, hd->arch.paging_mode); + + ASSERT(pcidevs_locked()); + +@@ -292,7 +304,7 @@ int __init amd_iov_detect(void) + + int amd_iommu_alloc_root(struct domain_iommu *hd) + { +- if ( unlikely(!hd->arch.root_table) ) ++ if ( unlikely(!hd->arch.root_table) && hd != dom_iommu(dom_io) ) + { + hd->arch.root_table = alloc_amd_iommu_pgtable(); + if ( !hd->arch.root_table ) +@@ -402,7 +414,10 @@ void amd_iommu_disable_domain_device(str + + AMD_IOMMU_DEBUG("Disable: device id = %#x, " + "domain = %d, paging mode = %d\n", +- req_id, domain->domain_id, ++ req_id, ++ get_field_from_reg_u32(((uint32_t *)dte)[2], ++ IOMMU_DEV_TABLE_DOMAIN_ID_MASK, ++ IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT), + dom_iommu(domain)->arch.paging_mode); + } + spin_unlock_irqrestore(&iommu->lock, flags); +@@ -631,6 +646,8 @@ static int amd_iommu_remove_device(u8 de + + amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev); + ++ amd_iommu_quarantine_teardown(pdev); ++ + iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); + pdev->arch.pseudo_domid = DOMID_INVALID; + +--- a/xen/drivers/passthrough/iommu.c ++++ b/xen/drivers/passthrough/iommu.c +@@ -380,19 +380,19 @@ int iommu_iotlb_flush_all(struct domain + return rc; + } + +-static int __init iommu_quarantine_init(void) ++int iommu_quarantine_dev_init(device_t *dev) + { + const struct domain_iommu *hd = dom_iommu(dom_io); +- int rc; +- +- rc = iommu_domain_init(dom_io); +- if ( rc ) +- return rc; + +- if ( !hd->platform_ops->quarantine_init ) ++ if ( !iommu_quarantine || !hd->platform_ops->quarantine_init ) + return 0; + +- return hd->platform_ops->quarantine_init(dom_io); ++ return hd->platform_ops->quarantine_init(dev); ++} ++ ++static int __init iommu_quarantine_init(void) ++{ ++ return iommu_domain_init(dom_io); + } + + int __init iommu_setup(void) +--- a/xen/drivers/passthrough/pci.c ++++ b/xen/drivers/passthrough/pci.c +@@ -1469,6 +1469,13 @@ static int assign_device(struct domain * + msixtbl_init(d); + } + ++ if ( pdev->domain != dom_io ) ++ { ++ rc = iommu_quarantine_dev_init(pci_to_dev(pdev)); ++ if ( rc ) ++ goto done; ++ } ++ + pdev->fault.count = 0; + + if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) ) +@@ -1515,9 +1522,16 @@ int deassign_device(struct domain *d, u1 + return -ENODEV; + + /* De-assignment from dom_io should de-quarantine the device */ +- target = ((pdev->quarantine || iommu_quarantine) && +- pdev->domain != dom_io) ? +- dom_io : hardware_domain; ++ if ( (pdev->quarantine || iommu_quarantine) && pdev->domain != dom_io ) ++ { ++ ret = iommu_quarantine_dev_init(pci_to_dev(pdev)); ++ if ( ret ) ++ return ret; ++ ++ target = dom_io; ++ } ++ else ++ target = hardware_domain; + + while ( pdev->phantom_stride ) + { +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ b/xen/drivers/passthrough/vtd/iommu.c +@@ -43,6 +43,12 @@ + #include "vtd.h" + #include "../ats.h" + ++#define DEVICE_DOMID(d, pdev) ((d) != dom_io ? 
(d)->domain_id \ ++ : (pdev)->arch.pseudo_domid) ++#define DEVICE_PGTABLE(d, pdev) ((d) != dom_io \ ++ ? dom_iommu(d)->arch.pgd_maddr \ ++ : (pdev)->arch.vtd.pgd_maddr) ++ + /* Possible unfiltered LAPIC/MSI messages from untrusted sources? */ + bool __read_mostly untrusted_msi; + +@@ -78,13 +84,18 @@ static int get_iommu_did(domid_t domid, + + #define DID_FIELD_WIDTH 16 + #define DID_HIGH_OFFSET 8 ++ ++/* ++ * This function may have "context" passed as NULL, to merely obtain a DID ++ * for "domid". ++ */ + static int context_set_domain_id(struct context_entry *context, + domid_t domid, struct iommu *iommu) + { + unsigned long nr_dom, i; + int found = 0; + +- ASSERT(spin_is_locked(&iommu->lock)); ++ ASSERT(pcidevs_locked()); + + nr_dom = cap_ndoms(iommu->cap); + i = find_first_bit(iommu->domid_bitmap, nr_dom); +@@ -110,8 +121,13 @@ static int context_set_domain_id(struct + } + + set_bit(i, iommu->domid_bitmap); +- context->hi &= ~(((1 << DID_FIELD_WIDTH) - 1) << DID_HIGH_OFFSET); +- context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET; ++ ++ if ( context ) ++ { ++ context->hi &= ~(((1 << DID_FIELD_WIDTH) - 1) << DID_HIGH_OFFSET); ++ context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET; ++ } ++ + return 0; + } + +@@ -179,8 +195,12 @@ static void check_cleanup_domid_map(stru + const struct pci_dev *exclude, + struct iommu *iommu) + { +- bool found = any_pdev_behind_iommu(d, exclude, iommu); ++ bool found; + ++ if ( d == dom_io ) ++ return; ++ ++ found = any_pdev_behind_iommu(d, exclude, iommu); + /* + * Hidden devices are associated with DomXEN but usable by the hardware + * domain. Hence they need considering here as well. +@@ -1441,7 +1461,7 @@ int domain_context_mapping_one( + domid = iommu->domid_map[prev_did]; + if ( domid < DOMID_FIRST_RESERVED ) + prev_dom = rcu_lock_domain_by_id(domid); +- else if ( domid == DOMID_IO ) ++ else if ( pdev ? domid == pdev->arch.pseudo_domid : domid > DOMID_MASK ) + prev_dom = rcu_lock_domain(dom_io); + if ( !prev_dom ) + { +@@ -1618,15 +1638,12 @@ int domain_context_mapping_one( + { + if ( !prev_dom ) + domain_context_unmap_one(domain, iommu, bus, devfn, +- domain->domain_id); ++ DEVICE_DOMID(domain, pdev)); + else if ( prev_dom != domain ) /* Avoid infinite recursion. 
*/ +- { +- hd = dom_iommu(prev_dom); + domain_context_mapping_one(prev_dom, iommu, bus, devfn, pdev, +- domain->domain_id, +- hd->arch.pgd_maddr, ++ DEVICE_DOMID(prev_dom, pdev), ++ DEVICE_PGTABLE(prev_dom, pdev), + mode & MAP_WITH_RMRR); +- } + } + + if ( prev_dom ) +@@ -1643,7 +1660,7 @@ static int domain_context_mapping(struct + { + struct acpi_drhd_unit *drhd; + const struct acpi_rmrr_unit *rmrr; +- paddr_t pgd_maddr = dom_iommu(domain)->arch.pgd_maddr; ++ paddr_t pgd_maddr = DEVICE_PGTABLE(domain, pdev); + domid_t orig_domid = pdev->arch.pseudo_domid; + int ret = 0; + unsigned int i, mode = 0; +@@ -1666,7 +1683,7 @@ static int domain_context_mapping(struct + break; + } + +- if ( domain != pdev->domain ) ++ if ( domain != pdev->domain && pdev->domain != dom_io ) + { + if ( pdev->domain->is_dying ) + mode |= MAP_OWNER_DYING; +@@ -1707,8 +1724,8 @@ static int domain_context_mapping(struct + printk(VTDPREFIX "d%d:PCIe: map %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); +- ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev, domain->domain_id, pgd_maddr, ++ ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, pdev, ++ DEVICE_DOMID(domain, pdev), pgd_maddr, + mode); + if ( ret > 0 ) + ret = 0; +@@ -1732,8 +1749,8 @@ static int domain_context_mapping(struct + PCI_SLOT(devfn), PCI_FUNC(devfn)); + + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- pdev, domain->domain_id, pgd_maddr, +- mode); ++ pdev, DEVICE_DOMID(domain, pdev), ++ pgd_maddr, mode); + if ( ret < 0 ) + break; + prev_present = ret; +@@ -1759,8 +1776,8 @@ static int domain_context_mapping(struct + */ + if ( ret >= 0 ) + ret = domain_context_mapping_one(domain, drhd->iommu, bus, devfn, +- NULL, domain->domain_id, pgd_maddr, +- mode); ++ NULL, DEVICE_DOMID(domain, pdev), ++ pgd_maddr, mode); + + /* + * Devices behind PCIe-to-PCI/PCIx bridge may generate different +@@ -1775,8 +1792,8 @@ static int domain_context_mapping(struct + if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE && + (secbus != pdev->bus || pdev->devfn != 0) ) + ret = domain_context_mapping_one(domain, drhd->iommu, secbus, 0, +- NULL, domain->domain_id, pgd_maddr, +- mode); ++ NULL, DEVICE_DOMID(domain, pdev), ++ pgd_maddr, mode); + + if ( ret ) + { +@@ -1912,7 +1929,7 @@ static const struct acpi_drhd_unit *doma + domain->domain_id, seg, bus, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = domain_context_unmap_one(domain, iommu, bus, devfn, +- domain->domain_id); ++ DEVICE_DOMID(domain, pdev)); + if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 ) + disable_ats_device(pdev); + +@@ -1923,7 +1940,7 @@ static const struct acpi_drhd_unit *doma + printk(VTDPREFIX "d%d:PCI: unmap %04x:%02x:%02x.%u\n", + domain->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); + ret = domain_context_unmap_one(domain, iommu, bus, devfn, +- domain->domain_id); ++ DEVICE_DOMID(domain, pdev)); + if ( ret ) + break; + +@@ -1932,18 +1949,12 @@ static const struct acpi_drhd_unit *doma + if ( find_upstream_bridge(seg, &tmp_bus, &tmp_devfn, &secbus) < 1 ) + break; + ++ ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, ++ DEVICE_DOMID(domain, pdev)); + /* PCIe to PCI/PCIx bridge */ +- if ( pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) +- { +- ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, +- domain->domain_id); +- if ( !ret ) +- ret = domain_context_unmap_one(domain, iommu, secbus, 0, +- domain->domain_id); +- } +- 
else /* Legacy PCI bridge */ +- ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn, +- domain->domain_id); ++ if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE ) ++ ret = domain_context_unmap_one(domain, iommu, secbus, 0, ++ DEVICE_DOMID(domain, pdev)); + + break; + +@@ -1980,6 +1991,25 @@ static void iommu_domain_teardown(struct + spin_unlock(&hd->arch.mapping_lock); + } + ++static void quarantine_teardown(struct pci_dev *pdev, ++ const struct acpi_drhd_unit *drhd) ++{ ++ struct page_info *pg; ++ ++ ASSERT(pcidevs_locked()); ++ ++ if ( !pdev->arch.vtd.pgd_maddr ) ++ return; ++ ++ while ( (pg = page_list_remove_head(&pdev->arch.pgtables_list)) ) ++ free_domheap_page(pg); ++ ++ pdev->arch.vtd.pgd_maddr = 0; ++ ++ if ( drhd ) ++ cleanup_domid_map(pdev->arch.pseudo_domid, drhd->iommu); ++} ++ + static int __must_check intel_iommu_map_page(struct domain *d, + unsigned long gfn, + unsigned long mfn, +@@ -2162,6 +2192,8 @@ static int intel_iommu_remove_device(u8 + rmrr->end_address, 0); + } + ++ quarantine_teardown(pdev, drhd); ++ + if ( drhd ) + { + iommu_free_domid(pdev->arch.pseudo_domid, +@@ -2798,60 +2830,139 @@ static void vtd_dump_p2m_table(struct do + vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0); + } + +-static int __init intel_iommu_quarantine_init(struct domain *d) ++static int fill_qpt(struct dma_pte *this, unsigned int level, ++ paddr_t maddrs[6], struct pci_dev *pdev) + { +- struct domain_iommu *hd = dom_iommu(d); +- struct dma_pte *parent; ++ unsigned int i; ++ int rc = 0; ++ ++ for ( i = 0; !rc && i < PTE_NUM; ++i ) ++ { ++ struct dma_pte *pte = &this[i], *next; ++ ++ if ( !dma_pte_present(*pte) ) ++ { ++ if ( !maddrs[level] ) ++ { ++ /* ++ * The pgtable allocator is fine for the leaf page, as well as ++ * page table pages, and the resulting allocations are always ++ * zeroed. 
++ */ ++ maddrs[level] = alloc_pgtable_maddr(NULL, 1); ++ if ( !maddrs[level] ) ++ { ++ rc = -ENOMEM; ++ break; ++ } ++ ++ page_list_add(maddr_to_page(maddrs[level]), ++ &pdev->arch.pgtables_list); ++ ++ if ( level ) ++ { ++ next = map_vtd_domain_page(maddrs[level]); ++ rc = fill_qpt(next, level - 1, maddrs, pdev); ++ unmap_vtd_domain_page(next); ++ } ++ } ++ ++ dma_set_pte_addr(*pte, maddrs[level]); ++ dma_set_pte_readable(*pte); ++ dma_set_pte_writable(*pte); ++ } ++ else if ( level && !dma_pte_superpage(*pte) ) ++ { ++ page_list_add(maddr_to_page(dma_pte_addr(*pte)), ++ &pdev->arch.pgtables_list); ++ next = map_vtd_domain_page(dma_pte_addr(*pte)); ++ rc = fill_qpt(next, level - 1, maddrs, pdev); ++ unmap_vtd_domain_page(next); ++ } ++ } ++ ++ return rc; ++} ++ ++static int intel_iommu_quarantine_init(struct pci_dev *pdev) ++{ ++ struct domain_iommu *hd = dom_iommu(dom_io); ++ paddr_t maddr; + unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); + unsigned int level = agaw_to_level(agaw); ++ const struct acpi_drhd_unit *drhd; ++ const struct acpi_rmrr_unit *rmrr; ++ unsigned int i, bdf; ++ bool rmrr_found = false; ++ int rc; + +- if ( hd->arch.pgd_maddr ) ++ ASSERT(pcidevs_locked()); ++ ASSERT(!hd->arch.pgd_maddr); ++ ++ if ( pdev->arch.vtd.pgd_maddr ) + { +- ASSERT_UNREACHABLE(); ++ clear_domain_page(pdev->arch.leaf_mfn); + return 0; + } + +- spin_lock(&hd->arch.mapping_lock); ++ drhd = acpi_find_matched_drhd_unit(pdev); ++ if ( !drhd ) ++ return -ENODEV; + +- hd->arch.pgd_maddr = alloc_pgtable_maddr(NULL, 1); +- if ( !hd->arch.pgd_maddr ) +- goto out; ++ maddr = alloc_pgtable_maddr(NULL, 1); ++ if ( !maddr ) ++ return -ENOMEM; + +- parent = map_vtd_domain_page(hd->arch.pgd_maddr); +- while ( level ) +- { +- uint64_t maddr; +- unsigned int offset; ++ rc = context_set_domain_id(NULL, pdev->arch.pseudo_domid, drhd->iommu); + +- /* +- * The pgtable allocator is fine for the leaf page, as well as +- * page table pages, and the resulting allocations are always +- * zeroed. +- */ +- maddr = alloc_pgtable_maddr(NULL, 1); +- if ( !maddr ) ++ /* Transiently install the root into DomIO, for iommu_identity_mapping(). 
*/ ++ hd->arch.pgd_maddr = maddr; ++ ++ for_each_rmrr_device ( rmrr, bdf, i ) ++ { ++ if ( rc ) + break; + +- for ( offset = 0; offset < PTE_NUM; offset++ ) ++ if ( rmrr->segment == pdev->seg && ++ bdf == PCI_BDF2(pdev->bus, pdev->devfn) ) + { +- struct dma_pte *pte = &parent[offset]; ++ rmrr_found = true; + +- dma_set_pte_addr(*pte, maddr); +- dma_set_pte_readable(*pte); ++ rc = iommu_identity_mapping(dom_io, p2m_access_rw, ++ rmrr->base_address, rmrr->end_address, ++ 0); ++ if ( rc ) ++ printk(XENLOG_ERR VTDPREFIX ++ "%04x:%02x:%02x.%u: RMRR quarantine mapping failed\n", ++ pdev->seg, pdev->bus, ++ PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn)); + } +- iommu_sync_cache(parent, PAGE_SIZE); ++ } + +- unmap_vtd_domain_page(parent); +- parent = map_vtd_domain_page(maddr); +- level--; ++ iommu_identity_map_teardown(dom_io); ++ hd->arch.pgd_maddr = 0; ++ pdev->arch.vtd.pgd_maddr = maddr; ++ ++ if ( !rc ) ++ { ++ struct dma_pte *root; ++ paddr_t maddrs[6] = {}; ++ ++ spin_lock(&hd->arch.mapping_lock); ++ ++ root = map_vtd_domain_page(maddr); ++ rc = fill_qpt(root, level - 1, maddrs, pdev); ++ unmap_vtd_domain_page(root); ++ ++ pdev->arch.leaf_mfn = maddr_to_mfn(maddrs[0]); ++ ++ spin_unlock(&hd->arch.mapping_lock); + } +- unmap_vtd_domain_page(parent); + +- out: +- spin_unlock(&hd->arch.mapping_lock); ++ if ( rc ) ++ quarantine_teardown(pdev, drhd); + +- /* Pages leaked in failure case */ +- return level ? -ENOMEM : 0; ++ return rc; + } + + const struct iommu_ops intel_iommu_ops = { +--- a/xen/drivers/passthrough/vtd/iommu.h ++++ b/xen/drivers/passthrough/vtd/iommu.h +@@ -532,7 +532,7 @@ struct iommu { + u32 nr_pt_levels; + u64 cap; + u64 ecap; +- spinlock_t lock; /* protect context, domain ids */ ++ spinlock_t lock; /* protect context */ + spinlock_t register_lock; /* protect iommu register handling */ + u64 root_maddr; /* root entry machine address */ + struct msi_desc msi; +--- a/xen/include/xen/iommu.h ++++ b/xen/include/xen/iommu.h +@@ -139,7 +139,7 @@ typedef int iommu_grdm_t(xen_pfn_t start + struct iommu_ops { + int (*init)(struct domain *d); + void (*hwdom_init)(struct domain *d); +- int (*quarantine_init)(struct domain *d); ++ int (*quarantine_init)(device_t *dev); + int (*add_device)(u8 devfn, device_t *dev); + int (*enable_device)(device_t *dev); + int (*remove_device)(u8 devfn, device_t *dev); +@@ -178,6 +178,7 @@ int __must_check iommu_suspend(void); + void iommu_resume(void); + void iommu_crash_shutdown(void); + int iommu_get_reserved_device_memory(iommu_grdm_t *, void *); ++int iommu_quarantine_dev_init(device_t *dev); + + void iommu_share_p2m_table(struct domain *d); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-1.patch 2022-06-16 01:11:07.000000000 +0100 @@ -0,0 +1,170 @@ +From: Andrew Cooper +Subject: x86/pv: Clean up _get_page_type() + +Various fixes for clarity, ahead of making complicated changes. + + * Split the overflow check out of the if/else chain for type handling, as + it's somewhat unrelated. + * Comment the main if/else chain to explain what is going on. Adjust one + ASSERT() and state the bit layout for validate-locked and partial states. + * Correct the comment about TLB flushing, as it's backwards. 
The problem + case is when writeable mappings are retained to a page becoming read-only, + as it allows the guest to bypass Xen's safety checks for updates. + * Reduce the scope of 'y'. It is an artefact of the cmpxchg loop and not + valid for use by subsequent logic. Switch to using ACCESS_ONCE() to treat + all reads as explicitly volatile. The only thing preventing the validated + wait-loop being infinite is the compiler barrier hidden in cpu_relax(). + * Replace one page_get_owner(page) with the already-calculated 'd' already in + scope. + +No functional change. + +This is part of XSA-401 / CVE-2022-26362. + +Signed-off-by: Andrew Cooper +Signed-off-by: George Dunlap +Reviewed-by: Jan Beulich +Reviewed-by: George Dunlap + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index ad89bfb45fff..96738b027827 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -2954,16 +2954,17 @@ static int _put_page_type(struct page_info *page, unsigned int flags, + static int _get_page_type(struct page_info *page, unsigned long type, + bool preemptible) + { +- unsigned long nx, x, y = page->u.inuse.type_info; ++ unsigned long nx, x; + int rc = 0, iommu_ret = 0; + + ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); + ASSERT(!in_irq()); + +- for ( ; ; ) ++ for ( unsigned long y = ACCESS_ONCE(page->u.inuse.type_info); ; ) + { + x = y; + nx = x + 1; ++ + if ( unlikely((nx & PGT_count_mask) == 0) ) + { + gdprintk(XENLOG_WARNING, +@@ -2971,8 +2972,15 @@ static int _get_page_type(struct page_info *page, unsigned long type, + mfn_x(page_to_mfn(page))); + return -EINVAL; + } +- else if ( unlikely((x & PGT_count_mask) == 0) ) ++ ++ if ( unlikely((x & PGT_count_mask) == 0) ) + { ++ /* ++ * Typeref 0 -> 1. ++ * ++ * Type changes are permitted when the typeref is 0. If the type ++ * actually changes, the page needs re-validating. ++ */ + struct domain *d = page_get_owner(page); + + if ( d && shadow_mode_enabled(d) ) +@@ -2983,8 +2991,8 @@ static int _get_page_type(struct page_info *page, unsigned long type, + { + /* + * On type change we check to flush stale TLB entries. It is +- * vital that no other CPUs are left with mappings of a frame +- * which is about to become writeable to the guest. ++ * vital that no other CPUs are left with writeable mappings ++ * to a frame which is intending to become pgtable/segdesc. + */ + cpumask_t *mask = this_cpu(scratch_cpumask); + +@@ -2996,7 +3004,7 @@ static int _get_page_type(struct page_info *page, unsigned long type, + + if ( unlikely(!cpumask_empty(mask)) && + /* Shadow mode: track only writable pages. */ +- (!shadow_mode_enabled(page_get_owner(page)) || ++ (!shadow_mode_enabled(d) || + ((nx & PGT_type_mask) == PGT_writable_page)) ) + { + perfc_incr(need_flush_tlb_flush); +@@ -3017,7 +3025,14 @@ static int _get_page_type(struct page_info *page, unsigned long type, + } + else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) ) + { +- /* Don't log failure if it could be a recursive-mapping attempt. */ ++ /* ++ * else, we're trying to take a new reference, of the wrong type. ++ * ++ * This (being able to prohibit use of the wrong type) is what the ++ * typeref system exists for, but skip printing the failure if it ++ * looks like a recursive mapping, as subsequent logic might ++ * ultimately permit the attempt. 
++ */ + if ( ((x & PGT_type_mask) == PGT_l2_page_table) && + (type == PGT_l1_page_table) ) + return -EINVAL; +@@ -3036,18 +3051,46 @@ static int _get_page_type(struct page_info *page, unsigned long type, + } + else if ( unlikely(!(x & PGT_validated)) ) + { ++ /* ++ * else, the count is non-zero, and we're grabbing the right type; ++ * but the page hasn't been validated yet. ++ * ++ * The page is in one of two states (depending on PGT_partial), ++ * and should have exactly one reference. ++ */ ++ ASSERT((x & (PGT_type_mask | PGT_count_mask)) == (type | 1)); ++ + if ( !(x & PGT_partial) ) + { +- /* Someone else is updating validation of this page. Wait... */ ++ /* ++ * The page has been left in the "validate locked" state ++ * (i.e. PGT_[type] | 1) which means that a concurrent caller ++ * of _get_page_type() is in the middle of validation. ++ * ++ * Spin waiting for the concurrent user to complete (partial ++ * or fully validated), then restart our attempt to acquire a ++ * type reference. ++ */ + do { + if ( preemptible && hypercall_preempt_check() ) + return -EINTR; + cpu_relax(); +- } while ( (y = page->u.inuse.type_info) == x ); ++ } while ( (y = ACCESS_ONCE(page->u.inuse.type_info)) == x ); + continue; + } +- /* Type ref count was left at 1 when PGT_partial got set. */ +- ASSERT((x & PGT_count_mask) == 1); ++ ++ /* ++ * The page has been left in the "partial" state ++ * (i.e., PGT_[type] | PGT_partial | 1). ++ * ++ * Rather than bumping the type count, we need to try to grab the ++ * validation lock; if we succeed, we need to validate the page, ++ * then drop the general ref associated with the PGT_partial bit. ++ * ++ * We grab the validation lock by setting nx to (PGT_[type] | 1) ++ * (i.e., non-zero type count, neither PGT_validated nor ++ * PGT_partial set). ++ */ + nx = x & ~PGT_partial; + } + +@@ -3094,6 +3137,13 @@ static int _get_page_type(struct page_info *page, unsigned long type, + } + + out: ++ /* ++ * Did we drop the PGT_partial bit when acquiring the typeref? If so, ++ * drop the general reference that went along with it. ++ * ++ * N.B. validate_page() may have have re-set PGT_partial, not reflected in ++ * nx, but will have taken an extra ref when doing so. ++ */ + if ( (x & PGT_partial) && !(nx & PGT_partial) ) + put_page(page); + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa401-4.13-2.patch 2022-06-09 13:09:25.000000000 +0100 @@ -0,0 +1,171 @@ +From: Andrew Cooper +Subject: x86/pv: Fix ABAC cmpxchg() race in _get_page_type() + +_get_page_type() suffers from a race condition where it incorrectly assumes +that because 'x' was read and a subsequent a cmpxchg() succeeds, the type +cannot have changed in-between. Consider: + +CPU A: + 1. Creates an L2e referencing pg + `-> _get_page_type(pg, PGT_l1_page_table), sees count 0, type PGT_writable_page + 2. Issues flush_tlb_mask() +CPU B: + 3. Creates a writeable mapping of pg + `-> _get_page_type(pg, PGT_writable_page), count increases to 1 + 4. Writes into new mapping, creating a TLB entry for pg + 5. Removes the writeable mapping of pg + `-> _put_page_type(pg), count goes back down to 0 +CPU A: + 7. Issues cmpxchg(), setting count 1, type PGT_l1_page_table + +CPU B now has a writeable mapping to pg, which Xen believes is a pagetable and +suitably protected (i.e. read-only). 
The TLB flush in step 2 must be deferred +until after the guest is prohibited from creating new writeable mappings, +which is after step 7. + +Defer all safety actions until after the cmpxchg() has successfully taken the +intended typeref, because that is what prevents concurrent users from using +the old type. + +Also remove the early validation for writeable and shared pages. This removes +race conditions where one half of a parallel mapping attempt can return +successfully before: + * The IOMMU pagetables are in sync with the new page type + * Writeable mappings to shared pages have been torn down + +This is part of XSA-401 / CVE-2022-26362. + +Reported-by: Jann Horn +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich +Reviewed-by: George Dunlap + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 96738b027827..ee91c7fe5f69 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -3005,46 +3005,12 @@ static int _get_page_type(struct page_info *page, unsigned long type, + * Type changes are permitted when the typeref is 0. If the type + * actually changes, the page needs re-validating. + */ +- struct domain *d = page_get_owner(page); +- +- if ( d && shadow_mode_enabled(d) ) +- shadow_prepare_page_type_change(d, page, type); + + ASSERT(!(x & PGT_pae_xen_l2)); + if ( (x & PGT_type_mask) != type ) + { +- /* +- * On type change we check to flush stale TLB entries. It is +- * vital that no other CPUs are left with writeable mappings +- * to a frame which is intending to become pgtable/segdesc. +- */ +- cpumask_t *mask = this_cpu(scratch_cpumask); +- +- BUG_ON(in_irq()); +- cpumask_copy(mask, d->dirty_cpumask); +- +- /* Don't flush if the timestamp is old enough */ +- tlbflush_filter(mask, page->tlbflush_timestamp); +- +- if ( unlikely(!cpumask_empty(mask)) && +- /* Shadow mode: track only writable pages. */ +- (!shadow_mode_enabled(d) || +- ((nx & PGT_type_mask) == PGT_writable_page)) ) +- { +- perfc_incr(need_flush_tlb_flush); +- flush_tlb_mask(mask); +- } +- +- /* We lose existing type and validity. */ + nx &= ~(PGT_type_mask | PGT_validated); + nx |= type; +- +- /* +- * No special validation needed for writable pages. +- * Page tables and GDT/LDT need to be scanned for validity. +- */ +- if ( type == PGT_writable_page || type == PGT_shared_page ) +- nx |= PGT_validated; + } + } + else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) ) +@@ -3125,6 +3091,46 @@ static int _get_page_type(struct page_info *page, unsigned long type, + return -EINTR; + } + ++ /* ++ * One typeref has been taken and is now globally visible. ++ * ++ * The page is either in the "validate locked" state (PGT_[type] | 1) or ++ * fully validated (PGT_[type] | PGT_validated | >0). ++ */ ++ ++ if ( unlikely((x & PGT_count_mask) == 0) ) ++ { ++ struct domain *d = page_get_owner(page); ++ ++ if ( d && shadow_mode_enabled(d) ) ++ shadow_prepare_page_type_change(d, page, type); ++ ++ if ( (x & PGT_type_mask) != type ) ++ { ++ /* ++ * On type change we check to flush stale TLB entries. It is ++ * vital that no other CPUs are left with writeable mappings ++ * to a frame which is intending to become pgtable/segdesc. ++ */ ++ cpumask_t *mask = this_cpu(scratch_cpumask); ++ ++ BUG_ON(in_irq()); ++ cpumask_copy(mask, d->dirty_cpumask); ++ ++ /* Don't flush if the timestamp is old enough */ ++ tlbflush_filter(mask, page->tlbflush_timestamp); ++ ++ if ( unlikely(!cpumask_empty(mask)) && ++ /* Shadow mode: track only writable pages. 
*/ ++ (!shadow_mode_enabled(d) || ++ ((nx & PGT_type_mask) == PGT_writable_page)) ) ++ { ++ perfc_incr(need_flush_tlb_flush); ++ flush_tlb_mask(mask); ++ } ++ } ++ } ++ + if ( unlikely((x & PGT_type_mask) != type) ) + { + /* Special pages should not be accessible from devices. */ +@@ -3149,13 +3155,25 @@ static int _get_page_type(struct page_info *page, unsigned long type, + + if ( unlikely(!(nx & PGT_validated)) ) + { +- if ( !(x & PGT_partial) ) ++ /* ++ * No special validation needed for writable or shared pages. Page ++ * tables and GDT/LDT need to have their contents audited. ++ * ++ * per validate_page(), non-atomic updates are fine here. ++ */ ++ if ( type == PGT_writable_page || type == PGT_shared_page ) ++ page->u.inuse.type_info |= PGT_validated; ++ else + { +- page->nr_validated_ptes = 0; +- page->partial_flags = 0; +- page->linear_pt_count = 0; ++ if ( !(x & PGT_partial) ) ++ { ++ page->nr_validated_ptes = 0; ++ page->partial_flags = 0; ++ page->linear_pt_count = 0; ++ } ++ ++ rc = alloc_page_type(page, type, preemptible); + } +- rc = alloc_page_type(page, type, preemptible); + } + + out: diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-1.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-1.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-1.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-1.patch 2022-06-09 13:09:25.000000000 +0100 @@ -0,0 +1,43 @@ +From: Andrew Cooper +Subject: x86/page: Introduce _PAGE_* constants for memory types + +... rather than opencoding the PAT/PCD/PWT attributes in __PAGE_HYPERVISOR_* +constants. These are going to be needed by forthcoming logic. + +No functional change. + +This is part of XSA-402. + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h +index c1e92937c073..7269ae89b880 100644 +--- a/xen/include/asm-x86/page.h ++++ b/xen/include/asm-x86/page.h +@@ -320,6 +320,14 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t); + + #define PAGE_CACHE_ATTRS (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT) + ++/* Memory types, encoded under Xen's choice of MSR_PAT. */ ++#define _PAGE_WB ( 0) ++#define _PAGE_WT ( _PAGE_PWT) ++#define _PAGE_UCM ( _PAGE_PCD ) ++#define _PAGE_UC ( _PAGE_PCD | _PAGE_PWT) ++#define _PAGE_WC (_PAGE_PAT ) ++#define _PAGE_WP (_PAGE_PAT | _PAGE_PWT) ++ + /* + * Debug option: Ensure that granted mappings are not implicitly unmapped. 
+ * WARNING: This will need to be disabled to run OSes that use the spare PTE +@@ -338,8 +346,8 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t); + #define __PAGE_HYPERVISOR_RX (_PAGE_PRESENT | _PAGE_ACCESSED) + #define __PAGE_HYPERVISOR (__PAGE_HYPERVISOR_RX | \ + _PAGE_DIRTY | _PAGE_RW) +-#define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_PCD) +-#define __PAGE_HYPERVISOR_UC (__PAGE_HYPERVISOR | _PAGE_PCD | _PAGE_PWT) ++#define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_UCM) ++#define __PAGE_HYPERVISOR_UC (__PAGE_HYPERVISOR | _PAGE_UC) + + #define MAP_SMALL_PAGES _PAGE_AVAIL0 /* don't use superpages mappings */ + diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-2.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-2.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-2.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-2.patch 2022-06-09 13:09:25.000000000 +0100 @@ -0,0 +1,204 @@ +From: Andrew Cooper +Subject: x86: Don't change the cacheability of the directmap + +Changeset 55f97f49b7ce ("x86: Change cache attributes of Xen 1:1 page mappings +in response to guest mapping requests") attempted to keep the cacheability +consistent between different mappings of the same page. + +The reason wasn't described in the changelog, but it is understood to be in +regards to a concern over machine check exceptions, owing to errata when using +mixed cacheabilities. It did this primarily by updating Xen's mapping of the +page in the direct map when the guest mapped a page with reduced cacheability. + +Unfortunately, the logic didn't actually prevent mixed cacheability from +occurring: + * A guest could map a page normally, and then map the same page with + different cacheability; nothing prevented this. + * The cacheability of the directmap was always latest-takes-precedence in + terms of guest requests. + * Grant-mapped frames with lesser cacheability didn't adjust the page's + cacheattr settings. + * The map_domain_page() function still unconditionally created WB mappings, + irrespective of the page's cacheattr settings. + +Additionally, update_xen_mappings() had a bug where the alias calculation was +wrong for mfn's which were .init content, which should have been treated as +fully guest pages, not Xen pages. + +Worse yet, the logic introduced a vulnerability whereby necessary +pagetable/segdesc adjustments made by Xen in the validation logic could become +non-coherent between the cache and main memory. The CPU could subsequently +operate on the stale value in the cache, rather than the safe value in main +memory. + +The directmap contains primarily mappings of RAM. PAT/MTRR conflict +resolution is asymmetric, and generally for MTRR=WB ranges, PAT of lesser +cacheability resolves to being coherent. The special case is WC mappings, +which are non-coherent against MTRR=WB regions (except for fully-coherent +CPUs). + +Xen must not have any WC cacheability in the directmap, to prevent Xen's +actions from creating non-coherency. (Guest actions creating non-coherency is +dealt with in subsequent patches.) As all memory types for MTRR=WB ranges +inter-operate coherently, so leave Xen's directmap mappings as WB. + +Only PV guests with access to devices can use reduced-cacheability mappings to +begin with, and they're trusted not to mount DoSs against the system anyway. + +Drop PGC_cacheattr_{base,mask} entirely, and the logic to manipulate them. 
+Shift the later PGC_* constants up, to gain 3 extra bits in the main reference +count. Retain the check in get_page_from_l1e() for special_pages() because a +guest has no business using reduced cacheability on these. + +This reverts changeset 55f97f49b7ce6c3520c555d19caac6cf3f9a5df0 + +This is CVE-2022-26363, part of XSA-402. + +Signed-off-by: Andrew Cooper +Reviewed-by: George Dunlap + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index ee91c7fe5f69..859646b670a8 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -786,24 +786,6 @@ bool is_iomem_page(mfn_t mfn) + return (page_get_owner(page) == dom_io); + } + +-static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr) +-{ +- int err = 0; +- bool alias = mfn >= PFN_DOWN(xen_phys_start) && +- mfn < PFN_UP(xen_phys_start + xen_virt_end - XEN_VIRT_START); +- unsigned long xen_va = +- XEN_VIRT_START + ((mfn - PFN_DOWN(xen_phys_start)) << PAGE_SHIFT); +- +- if ( unlikely(alias) && cacheattr ) +- err = map_pages_to_xen(xen_va, _mfn(mfn), 1, 0); +- if ( !err ) +- err = map_pages_to_xen((unsigned long)mfn_to_virt(mfn), _mfn(mfn), 1, +- PAGE_HYPERVISOR | cacheattr_to_pte_flags(cacheattr)); +- if ( unlikely(alias) && !cacheattr && !err ) +- err = map_pages_to_xen(xen_va, _mfn(mfn), 1, PAGE_HYPERVISOR); +- return err; +-} +- + #ifndef NDEBUG + struct mmio_emul_range_ctxt { + const struct domain *d; +@@ -1008,47 +990,14 @@ get_page_from_l1e( + goto could_not_pin; + } + +- if ( pte_flags_to_cacheattr(l1f) != +- ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) ) ++ if ( (l1f & PAGE_CACHE_ATTRS) != _PAGE_WB && is_xen_heap_page(page) ) + { +- unsigned long x, nx, y = page->count_info; +- unsigned long cacheattr = pte_flags_to_cacheattr(l1f); +- int err; +- +- if ( is_xen_heap_page(page) ) +- { +- if ( write ) +- put_page_type(page); +- put_page(page); +- gdprintk(XENLOG_WARNING, +- "Attempt to change cache attributes of Xen heap page\n"); +- return -EACCES; +- } +- +- do { +- x = y; +- nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base); +- } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); +- +- err = update_xen_mappings(mfn, cacheattr); +- if ( unlikely(err) ) +- { +- cacheattr = y & PGC_cacheattr_mask; +- do { +- x = y; +- nx = (x & ~PGC_cacheattr_mask) | cacheattr; +- } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); +- +- if ( write ) +- put_page_type(page); +- put_page(page); +- +- gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn +- " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n", +- mfn, get_gpfn_from_mfn(mfn), +- l1e_get_intpte(l1e), l1e_owner->domain_id); +- return err; +- } ++ if ( write ) ++ put_page_type(page); ++ put_page(page); ++ gdprintk(XENLOG_WARNING, ++ "Attempt to change cache attributes of Xen heap page\n"); ++ return -EACCES; + } + + return 0; +@@ -2541,25 +2490,10 @@ static int mod_l4_entry(l4_pgentry_t *pl4e, + */ + static int cleanup_page_mappings(struct page_info *page) + { +- unsigned int cacheattr = +- (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base; + int rc = 0; + unsigned long mfn = mfn_x(page_to_mfn(page)); + + /* +- * If we've modified xen mappings as a result of guest cache +- * attributes, restore them to the "normal" state. +- */ +- if ( unlikely(cacheattr) ) +- { +- page->count_info &= ~PGC_cacheattr_mask; +- +- BUG_ON(is_xen_heap_page(page)); +- +- rc = update_xen_mappings(mfn, 0); +- } +- +- /* + * If this may be in a PV domain's IOMMU, remove it. 
+ * + * NB that writable xenheap pages have their type set and cleared by +diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h +index 320c6cd19669..db09849f73f8 100644 +--- a/xen/include/asm-x86/mm.h ++++ b/xen/include/asm-x86/mm.h +@@ -64,22 +64,19 @@ + /* Set when is using a page as a page table */ + #define _PGC_page_table PG_shift(3) + #define PGC_page_table PG_mask(1, 3) +- /* 3-bit PAT/PCD/PWT cache-attribute hint. */ +-#define PGC_cacheattr_base PG_shift(6) +-#define PGC_cacheattr_mask PG_mask(7, 6) + /* Page is broken? */ +-#define _PGC_broken PG_shift(7) +-#define PGC_broken PG_mask(1, 7) ++#define _PGC_broken PG_shift(4) ++#define PGC_broken PG_mask(1, 4) + /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */ +-#define PGC_state PG_mask(3, 9) +-#define PGC_state_inuse PG_mask(0, 9) +-#define PGC_state_offlining PG_mask(1, 9) +-#define PGC_state_offlined PG_mask(2, 9) +-#define PGC_state_free PG_mask(3, 9) ++#define PGC_state PG_mask(3, 6) ++#define PGC_state_inuse PG_mask(0, 6) ++#define PGC_state_offlining PG_mask(1, 6) ++#define PGC_state_offlined PG_mask(2, 6) ++#define PGC_state_free PG_mask(3, 6) + #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st) + + /* Count of references to this frame. */ +-#define PGC_count_width PG_shift(9) ++#define PGC_count_width PG_shift(6) + #define PGC_count_mask ((1UL< +Subject: x86: Split cache_flush() out of cache_writeback() + +Subsequent changes will want a fully flushing version. + +Use the new helper rather than opencoding it in flush_area_local(). This +resolves an outstanding issue where the conditional sfence is on the wrong +side of the clflushopt loop. clflushopt is ordered with respect to older +stores, not to younger stores. + +Rename gnttab_cache_flush()'s helper to avoid colliding in name. +grant_table.c can see the prototype from cache.h so the build fails +otherwise. + +This is part of XSA-402. + +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +Xen 4.16 and earlier: + * Also backport half of c/s 3330013e67396 "VT-d / x86: re-arrange cache + syncing" to split cache_writeback() out of the IOMMU logic, but without the + associated hooks changes. + +diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c +index 03f92c23dcaf..8568491c7ea9 100644 +--- a/xen/arch/x86/flushtlb.c ++++ b/xen/arch/x86/flushtlb.c +@@ -230,7 +230,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags) + if ( flags & FLUSH_CACHE ) + { + const struct cpuinfo_x86 *c = ¤t_cpu_data; +- unsigned long i, sz = 0; ++ unsigned long sz = 0; + + if ( order < (BITS_PER_LONG - PAGE_SHIFT) ) + sz = 1UL << (order + PAGE_SHIFT); +@@ -240,13 +240,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags) + c->x86_clflush_size && c->x86_cache_size && sz && + ((sz >> 10) < c->x86_cache_size) ) + { +- alternative(ASM_NOP3, "sfence", X86_FEATURE_CLFLUSHOPT); +- for ( i = 0; i < sz; i += c->x86_clflush_size ) +- alternative_input(".byte " __stringify(NOP_DS_PREFIX) ";" +- " clflush %0", +- "data16 clflush %0", /* clflushopt */ +- X86_FEATURE_CLFLUSHOPT, +- "m" (((const char *)va)[i])); ++ cache_flush(va, sz); + flags &= ~FLUSH_CACHE; + } + else +@@ -262,3 +256,77 @@ unsigned int flush_area_local(const void *va, unsigned int flags) + + return flags; + } ++ ++void cache_flush(const void *addr, unsigned int size) ++{ ++ /* ++ * This function may be called before current_cpu_data is established. ++ * Hence a fallback is needed to prevent the loop below becoming infinite. 
++ */ ++ unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16; ++ const void *end = addr + size; ++ ++ addr -= (unsigned long)addr & (clflush_size - 1); ++ for ( ; addr < end; addr += clflush_size ) ++ { ++ /* ++ * Note regarding the "ds" prefix use: it's faster to do a clflush ++ * + prefix than a clflush + nop, and hence the prefix is added instead ++ * of letting the alternative framework fill the gap by appending nops. ++ */ ++ alternative_io("ds; clflush %[p]", ++ "data16 clflush %[p]", /* clflushopt */ ++ X86_FEATURE_CLFLUSHOPT, ++ /* no outputs */, ++ [p] "m" (*(const char *)(addr))); ++ } ++ ++ alternative("", "sfence", X86_FEATURE_CLFLUSHOPT); ++} ++ ++void cache_writeback(const void *addr, unsigned int size) ++{ ++ unsigned int clflush_size; ++ const void *end = addr + size; ++ ++ /* Fall back to CLFLUSH{,OPT} when CLWB isn't available. */ ++ if ( !boot_cpu_has(X86_FEATURE_CLWB) ) ++ return cache_flush(addr, size); ++ ++ /* ++ * This function may be called before current_cpu_data is established. ++ * Hence a fallback is needed to prevent the loop below becoming infinite. ++ */ ++ clflush_size = current_cpu_data.x86_clflush_size ?: 16; ++ addr -= (unsigned long)addr & (clflush_size - 1); ++ for ( ; addr < end; addr += clflush_size ) ++ { ++/* ++ * The arguments to a macro must not include preprocessor directives. Doing so ++ * results in undefined behavior, so we have to create some defines here in ++ * order to avoid it. ++ */ ++#if defined(HAVE_AS_CLWB) ++# define CLWB_ENCODING "clwb %[p]" ++#elif defined(HAVE_AS_XSAVEOPT) ++# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */ ++#else ++# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */ ++#endif ++ ++#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr)) ++#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT) ++# define INPUT BASE_INPUT ++#else ++# define INPUT(addr) "a" (addr), BASE_INPUT(addr) ++#endif ++ ++ asm volatile (CLWB_ENCODING :: INPUT(addr)); ++ ++#undef INPUT ++#undef BASE_INPUT ++#undef CLWB_ENCODING ++ } ++ ++ asm volatile ("sfence" ::: "memory"); ++} +diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c +index cbb2ce17c001..709509e0fc9e 100644 +--- a/xen/common/grant_table.c ++++ b/xen/common/grant_table.c +@@ -3320,7 +3320,7 @@ gnttab_swap_grant_ref(XEN_GUEST_HANDLE_PARAM(gnttab_swap_grant_ref_t) uop, + return 0; + } + +-static int cache_flush(const gnttab_cache_flush_t *cflush, grant_ref_t *cur_ref) ++static int _cache_flush(const gnttab_cache_flush_t *cflush, grant_ref_t *cur_ref) + { + struct domain *d, *owner; + struct page_info *page; +@@ -3414,7 +414,7 @@ gnttab_cache_flush(XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) uop, + return -EFAULT; + for ( ; ; ) + { +- int ret = cache_flush(&op, cur_ref); ++ int ret = _cache_flush(&op, cur_ref); + + if ( ret < 0 ) + return ret; +diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h +index fbe951b2fad0..3defe9677f06 100644 +--- a/xen/drivers/passthrough/vtd/extern.h ++++ b/xen/drivers/passthrough/vtd/extern.h +@@ -77,7 +77,6 @@ int __must_check qinval_device_iotlb_sync(struct iommu *iommu, + struct pci_dev *pdev, + u16 did, u16 size, u64 addr); + +-unsigned int get_cache_line_size(void); + void flush_all_cache(void); + + u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages); +diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c +index f051a55764b9..2bf5f02c08de 100644 +--- a/xen/drivers/passthrough/vtd/iommu.c ++++ 
b/xen/drivers/passthrough/vtd/iommu.c +@@ -31,6 +31,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -219,53 +220,10 @@ static int iommus_incoherent; + + static void sync_cache(const void *addr, unsigned int size) + { +- static unsigned long clflush_size = 0; +- const void *end = addr + size; +- + if ( !iommus_incoherent ) + return; + +- if ( clflush_size == 0 ) +- clflush_size = get_cache_line_size(); +- +- addr -= (unsigned long)addr & (clflush_size - 1); +- for ( ; addr < end; addr += clflush_size ) +-/* +- * The arguments to a macro must not include preprocessor directives. Doing so +- * results in undefined behavior, so we have to create some defines here in +- * order to avoid it. +- */ +-#if defined(HAVE_AS_CLWB) +-# define CLWB_ENCODING "clwb %[p]" +-#elif defined(HAVE_AS_XSAVEOPT) +-# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */ +-#else +-# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */ +-#endif +- +-#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr)) +-#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT) +-# define INPUT BASE_INPUT +-#else +-# define INPUT(addr) "a" (addr), BASE_INPUT(addr) +-#endif +- /* +- * Note regarding the use of NOP_DS_PREFIX: it's faster to do a clflush +- * + prefix than a clflush + nop, and hence the prefix is added instead +- * of letting the alternative framework fill the gap by appending nops. +- */ +- alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %[p]", +- "data16 clflush %[p]", /* clflushopt */ +- X86_FEATURE_CLFLUSHOPT, +- CLWB_ENCODING, +- X86_FEATURE_CLWB, /* no outputs */, +- INPUT(addr)); +-#undef INPUT +-#undef BASE_INPUT +-#undef CLWB_ENCODING +- +- alternative_2("", "sfence", X86_FEATURE_CLFLUSHOPT, +- "sfence", X86_FEATURE_CLWB); ++ cache_writeback(addr, size); + } + + /* Allocate page table, return its machine address */ +diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c +index 229938f3a812..2a18b76e800d 100644 +--- a/xen/drivers/passthrough/vtd/x86/vtd.c ++++ b/xen/drivers/passthrough/vtd/x86/vtd.c +@@ -48,11 +48,6 @@ void unmap_vtd_domain_page(void *va) + unmap_domain_page(va); + } + +-unsigned int get_cache_line_size(void) +-{ +- return ((cpuid_ebx(1) >> 8) & 0xff) * 8; +-} +- + void flush_all_cache() + { + wbinvd(); +diff --git a/xen/include/asm-x86/cache.h b/xen/include/asm-x86/cache.h +index 1f7173d8c72c..e4770efb22b9 100644 +--- a/xen/include/asm-x86/cache.h ++++ b/xen/include/asm-x86/cache.h +@@ -11,4 +11,11 @@ + + #define __read_mostly __section(".data.read_mostly") + ++#ifndef __ASSEMBLY__ ++ ++void cache_flush(const void *addr, unsigned int size); ++void cache_writeback(const void *addr, unsigned int size); ++ ++#endif ++ + #endif diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-4.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-4.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-4.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-4.patch 2022-06-16 01:38:06.000000000 +0100 @@ -0,0 +1,83 @@ +From: Andrew Cooper +Subject: x86/amd: Work around CLFLUSH ordering on older parts + +On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakely ordered with everything, +including reads and writes to the address, and LFENCE/SFENCE instructions. + +This creates a multitude of problematic corner cases, laid out in the manual. +Arrange to use MFENCE on both sides of the CLFLUSH to force proper ordering. + +This is part of XSA-402. 
+ +Signed-off-by: Andrew Cooper +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c +index b77fa1929733..aa1b9d0dda6b 100644 +--- a/xen/arch/x86/cpu/amd.c ++++ b/xen/arch/x86/cpu/amd.c +@@ -624,6 +624,14 @@ static void init_amd(struct cpuinfo_x86 *c) + if (!cpu_has_lfence_dispatch) + __set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability); + ++ /* ++ * On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakly ordered with ++ * everything, including reads and writes to address, and ++ * LFENCE/SFENCE instructions. ++ */ ++ if (!cpu_has_clflushopt) ++ setup_force_cpu_cap(X86_BUG_CLFLUSH_MFENCE); ++ + switch(c->x86) + { + case 0xf ... 0x11: +diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c +index 8568491c7ea9..6f3f5ab1a3c4 100644 +--- a/xen/arch/x86/flushtlb.c ++++ b/xen/arch/x86/flushtlb.c +@@ -257,6 +257,13 @@ unsigned int flush_area_local(const void *va, unsigned int flags) + return flags; + } + ++/* ++ * On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakly ordered with everything, ++ * including reads and writes to address, and LFENCE/SFENCE instructions. ++ * ++ * This function only works safely after alternatives have run. Luckily, at ++ * the time of writing, we don't flush the caches that early. ++ */ + void cache_flush(const void *addr, unsigned int size) + { + /* +@@ -266,6 +273,8 @@ void cache_flush(const void *addr, unsigned int size) + unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16; + const void *end = addr + size; + ++ alternative("", "mfence", X86_BUG_CLFLUSH_MFENCE); ++ + addr -= (unsigned long)addr & (clflush_size - 1); + for ( ; addr < end; addr += clflush_size ) + { +@@ -281,7 +290,9 @@ void cache_flush(const void *addr, unsigned int size) + [p] "m" (*(const char *)(addr))); + } + +- alternative("", "sfence", X86_FEATURE_CLFLUSHOPT); ++ alternative_2("", ++ "sfence", X86_FEATURE_CLFLUSHOPT, ++ "mfence", X86_BUG_CLFLUSH_MFENCE); + } + + void cache_writeback(const void *addr, unsigned int size) +diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h +index b9d3cac97538..a8222e978cd9 100644 +--- a/xen/include/asm-x86/cpufeatures.h ++++ b/xen/include/asm-x86/cpufeatures.h +@@ -44,6 +44,7 @@ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */ + #define X86_BUG(x) ((FSCAPINTS + X86_NR_SYNTH) * 32 + (x)) + + #define X86_BUG_FPU_PTRS X86_BUG( 0) /* (F)X{SAVE,RSTOR} doesn't save/restore FOP/FIP/FDP. */ ++#define X86_BUG_CLFLUSH_MFENCE X86_BUG( 2) /* MFENCE needed to serialise CLFLUSH */ + + /* Total number of capability words, inc synth and bug words. */ + #define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */ diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-5.patch xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-5.patch --- xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-5.patch 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/patches/xsa402-4.13-5.patch 2022-06-16 22:26:27.000000000 +0100 @@ -0,0 +1,160 @@ +From: Andrew Cooper +Subject: x86/pv: Track and flush non-coherent mappings of RAM + +There are legitimate uses of WC mappings of RAM, e.g. for DMA buffers with +devices that make non-coherent writes. The Linux sound subsystem makes +extensive use of this technique. + +For such usecases, the guest's DMA buffer is mapped and consistently used as +WC, and Xen doesn't interact with the buffer. 
+ +However, a mischevious guest can use WC mappings to deliberately create +non-coherency between the cache and RAM, and use this to trick Xen into +validating a pagetable which isn't actually safe. + +Allocate a new PGT_non_coherent to track the non-coherency of mappings. Set +it whenever a non-coherent writeable mapping is created. If the page is used +as anything other than PGT_writable_page, force a cache flush before +validation. Also force a cache flush before the page is returned to the heap. + +This is CVE-2022-26364, part of XSA-402. + +Reported-by: Jann Horn +Signed-off-by: Andrew Cooper +Reviewed-by: George Dunlap +Reviewed-by: Jan Beulich + +diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c +index 859646b670a8..f5eeddce5867 100644 +--- a/xen/arch/x86/mm.c ++++ b/xen/arch/x86/mm.c +@@ -1085,6 +1085,15 @@ get_page_from_l1e( + return -EACCES; + } + ++ /* ++ * Track writeable non-coherent mappings to RAM pages, to trigger a cache ++ * flush later if the target is used as anything but a PGT_writeable page. ++ * We care about all writeable mappings, including foreign mappings. ++ */ ++ if ( !boot_cpu_has(X86_FEATURE_XEN_SELFSNOOP) && ++ (l1f & (PAGE_CACHE_ATTRS | _PAGE_RW)) == (_PAGE_WC | _PAGE_RW) ) ++ set_bit(_PGT_non_coherent, &page->u.inuse.type_info); ++ + return 0; + + could_not_pin: +@@ -2516,6 +2525,19 @@ static int cleanup_page_mappings(struct page_info *page) + } + } + ++ /* ++ * Flush the cache if there were previously non-coherent writeable ++ * mappings of this page. This forces the page to be coherent before it ++ * is freed back to the heap. ++ */ ++ if ( __test_and_clear_bit(_PGT_non_coherent, &page->u.inuse.type_info) ) ++ { ++ void *addr = __map_domain_page(page); ++ ++ cache_flush(addr, PAGE_SIZE); ++ unmap_domain_page(addr); ++ } ++ + return rc; + } + +@@ -3068,6 +3090,22 @@ static int _get_page_type(struct page_info *page, unsigned long type, + if ( unlikely(!(nx & PGT_validated)) ) + { + /* ++ * Flush the cache if there were previously non-coherent mappings of ++ * this page, and we're trying to use it as anything other than a ++ * writeable page. This forces the page to be coherent before we ++ * validate its contents for safety. ++ */ ++ if ( (nx & PGT_non_coherent) && type != PGT_writable_page ) ++ { ++ void *addr = __map_domain_page(page); ++ ++ cache_flush(addr, PAGE_SIZE); ++ unmap_domain_page(addr); ++ ++ page->u.inuse.type_info &= ~PGT_non_coherent; ++ } ++ ++ /* + * No special validation needed for writable or shared pages. Page + * tables and GDT/LDT need to have their contents audited. + * +diff --git a/xen/arch/x86/pv/grant_table.c b/xen/arch/x86/pv/grant_table.c +index 0325618c9883..81c72e61ed55 100644 +--- a/xen/arch/x86/pv/grant_table.c ++++ b/xen/arch/x86/pv/grant_table.c +@@ -109,7 +109,17 @@ int create_grant_pv_mapping(uint64_t addr, mfn_t frame, + + ol1e = *pl1e; + if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) ) ++ { ++ /* ++ * We always create mappings in this path. However, our caller, ++ * map_grant_ref(), only passes potentially non-zero cache_flags for ++ * MMIO frames, so this path doesn't create non-coherent mappings of ++ * RAM frames and there's no need to calculate PGT_non_coherent. 
++ */ ++ ASSERT(!cache_flags || is_iomem_page(frame)); ++ + rc = GNTST_okay; ++ } + + out_unlock: + page_unlock(page); +@@ -294,7 +304,18 @@ int replace_grant_pv_mapping(uint64_t addr, mfn_t frame, + l1e_get_flags(ol1e), addr, grant_pte_flags); + + if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) ) ++ { ++ /* ++ * Generally, replace_grant_pv_mapping() is used to destroy mappings ++ * (n1le = l1e_empty()), but it can be a present mapping on the ++ * GNTABOP_unmap_and_replace path. ++ * ++ * In such cases, the PTE is fully transplanted from its old location ++ * via steal_linear_addr(), so we need not perform PGT_non_coherent ++ * checking here. ++ */ + rc = GNTST_okay; ++ } + + out_unlock: + page_unlock(page); +diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h +index db09849f73f8..82d0fd6104a2 100644 +--- a/xen/include/asm-x86/mm.h ++++ b/xen/include/asm-x86/mm.h +@@ -48,8 +48,12 @@ + #define _PGT_partial PG_shift(8) + #define PGT_partial PG_mask(1, 8) + ++/* Has this page been mapped writeable with a non-coherent memory type? */ ++#define _PGT_non_coherent PG_shift(9) ++#define PGT_non_coherent PG_mask(1, 9) ++ + /* Count of uses of this frame as its current type. */ +-#define PGT_count_width PG_shift(8) ++#define PGT_count_width PG_shift(9) + #define PGT_count_mask ((1UL< /dev/null && [ -d /boot/grub ]; then + update-grub || : + fi + ;; + + abort-upgrade|abort-remove|abort-deconfigure) + ;; + + *) + echo "postinst called with unknown argument \`$1'" >&2 + exit 1 + ;; +esac + +#DEBHELPER# + +exit 0 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.bug-control xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.bug-control --- xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.bug-control 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.bug-control 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,2 @@ +# autogenerated, do not edit +Submit-As: src:xen diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postinst xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postinst --- xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postinst 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postinst 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,24 @@ +#!/bin/bash +# autogenerated, do not edit + +set -e + +case "$1" in + configure) + if command -v update-grub > /dev/null && [ -d /boot/grub ]; then + update-grub || : + fi + ;; + + abort-upgrade|abort-remove|abort-deconfigure) + ;; + + *) + echo "postinst called with unknown argument \`$1'" >&2 + exit 1 + ;; +esac + +#DEBHELPER# + +exit 0 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postrm xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postrm --- xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postrm 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-4.11.postrm 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,24 @@ +#!/bin/bash +# autogenerated, do not edit + +set -e + +case "$1" in + remove) + if command -v update-grub > /dev/null && [ -d /boot/grub ]; then + update-grub || : + fi + ;; + + purge|upgrade|failed-upgrade|abort-install|abort-upgrade|disappear) + ;; + + *) + echo "postrm called with unknown argument \`$1'" >&2 + exit 1 + ;; +esac + +#DEBHELPER# + +exit 0 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-V-F.install.vsn-in xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-V-F.install.vsn-in --- 
xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-V-F.install.vsn-in 2020-02-28 13:14:00.000000000 +0000 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-hypervisor-V-F.install.vsn-in 2022-06-19 07:08:13.000000000 +0100 @@ -1,4 +1,3 @@ - usr/lib/debug/xen* usr/lib/debug/ # ^ The xen* wildcard excludes the shim symbols. The shim is treated # as part of the toolstack - see xen-utils-V.install.vsn-in. diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.bug-control xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.bug-control --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.bug-control 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.bug-control 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,2 @@ +# autogenerated, do not edit +Submit-As: src:xen diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.install xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.install --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.install 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.install 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,8 @@ +# autogenerated, do not edit +usr/lib/xen-4.11/bin +usr/lib/xen-4.11/lib/python + +usr/lib/xen-4.11/boot +usr/lib/debug/usr/lib/xen-*/boot/* usr/lib/debug/xen-syms-4.11-shim +# ^ Yes, the upstream build system really does install the shim symbols +# file in debian/tmp/usr/lib/debug/usr/lib/xen-4.11/boot/xen-shim-syms diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.lintian-overrides xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.lintian-overrides --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.lintian-overrides 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.lintian-overrides 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,14 @@ +# autogenerated, do not edit +statically-linked-binary usr/lib/xen-4.11/boot/hvmloader +statically-linked-binary usr/lib/xen-4.11/boot/xen-shim + +binary-has-unneeded-section usr/lib/xen-4.11/boot/xen-shim .note +# ^ that section is certainly needed for the tools etc. to be able +# to load it! + +binary-from-other-architecture usr/lib/debug/xen-syms-4.11-shim/xen-shim-syms +# ^ this is a symbols file for the shim + +binary-or-shlib-defines-rpath usr/lib/xen-4.11/lib/python/fsimage.so /usr/lib/xen-4.11/lib/x86_64-linux-gnu +# ^ this module needs to load the libfsimage .so from within +# the xen-utils private directory. less +/fsimage debian/rules diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.postinst xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.postinst --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.postinst 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.postinst 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,25 @@ +#!/bin/sh +# autogenerated, do not edit + +set -e + +case "$1" in + configure) + update-alternatives --remove xen-default /usr/lib/xen-4.11 + if [ -x "/etc/init.d/xen" ]; then + invoke-rc.d xen start || exit $? 
+ fi + ;; + + abort-upgrade|abort-remove|abort-deconfigure) + ;; + + *) + echo "postinst called with unknown argument \`$1'" >&2 + exit 1 + ;; +esac + +#DEBHELPER# + +exit 0 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.prerm xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.prerm --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.prerm 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.prerm 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,25 @@ +#!/bin/bash +# autogenerated, do not edit + +set -e + +case "$1" in + remove|upgrade) + update-alternatives --remove xen-default /usr/lib/xen-4.11 + if [ -x "/etc/init.d/xen" ]; then + invoke-rc.d xen stop || exit $? + fi + ;; + + deconfigure|failed-upgrade) + ;; + + *) + echo "prerm called with unknown argument \`$1'" >&2 + exit 1 + ;; +esac + +#DEBHELPER# + +exit 0 diff -Nru xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.README.Debian xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.README.Debian --- xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.README.Debian 1970-01-01 01:00:00.000000000 +0100 +++ xen-4.11.3+24-g14b62ab3e5/debian/xen-utils-4.11.README.Debian 2022-06-17 09:11:17.000000000 +0100 @@ -0,0 +1,49 @@ +# autogenerated, do not edit +Xen for Debian +============== + +Config behaviour +---------------- + +The Debian packages change the behaviour of some config options. + +The options "kernel", "initrd" and "loader" search in the Xen private boot +directory (/usr/lib/xen-$version/boot) first. "bootloader" and "device_model" +also search the Xen private bin directory (/usr/lib/xen-$version/bin). This +means that the following entries will properly find anything: + loader = 'hvmloader' + bootloader = 'pygrub' + +Network setup +------------- + +The Debian package of Xen doesn't change the network setup in any way. This +differs from the upstream version, which overwrites the main network card +(eth0) with a bridge setup and may break the network at this point. + +To set up a bridge, please follow the instructions in the manpage for +bridge-utils-interfaces(5). + +Loop devices +------------ + +If you plan to host virtual domains with file-backed block devices (i.e. the +ones xen-tools creates by default), be careful about two issues: + +1. Maximum number of loop devices + By default the loop driver supports a maximum of 8 loop devices. Of + course since every Xen domain uses at least two (one for the data and one + for the swap) this number is absolutely insufficient. You should increase + it by adding a file named local-loop in /etc/modprobe.d containing the + string "options loop max_loop=128", if the loop driver is compiled as a + module, or by appending the string max_loop=128 to your kernel parameters + if the driver is in-kernel. Of course you can increase or decrease the + number 128 as you see fit. + +2. Driver loading (only if loop is compiled as a module) + Normally the loop driver gets loaded when the first loop device is + accessed. When using udev, though, the loop devices get created only + after the driver gets loaded. This means that Xen will fail if the loop + driver is not already loaded when it tries to start a file-backed virtual + domain. To fix this just add "loop" in your /etc/modules file, thus + forcing it to be loaded at boot time.
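
Editor's illustration: xsa402-4.13-3.patch above introduces a common cache_flush() helper (align the start address down to the CLFLUSH line size, flush every line in the range, then fence so the flushes are ordered against later stores), and xsa402-4.13-4.patch wraps it in MFENCEs on pre-CLFLUSHOPT AMD parts. The following standalone C sketch shows the same flush pattern in userspace; it is an analogue under stated assumptions, not Xen code. The 64-byte line size, the unconditional SFENCE, and the demo_cache_flush()/main() names are assumptions made for the example — Xen instead reads x86_clflush_size from CPUID and patches CLFLUSHOPT/CLWB and the fences in via alternatives.

/*
 * Minimal userspace sketch of the cache_flush() pattern from
 * xsa402-4.13-3.patch (assumptions: fixed 64-byte line size, plain
 * CLFLUSH, unconditional SFENCE).
 *
 * Build on x86-64: gcc -O2 -o cache-flush-demo cache-flush-demo.c
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>          /* _mm_clflush(), _mm_sfence() */

#define DEMO_CLFLUSH_SIZE 64    /* assumed cache-line size */

static void demo_cache_flush(const void *addr, size_t size)
{
    /* Align down so the first (possibly partial) line is covered too. */
    const char *p = (const char *)((uintptr_t)addr &
                                   ~(uintptr_t)(DEMO_CLFLUSH_SIZE - 1));
    const char *end = (const char *)addr + size;

    for ( ; p < end; p += DEMO_CLFLUSH_SIZE )
        _mm_clflush(p);         /* write back and invalidate one line */

    /*
     * Unconditionally order the flushes before later stores; Xen only
     * emits SFENCE (or MFENCE on affected AMD parts) via alternatives.
     */
    _mm_sfence();
}

int main(void)
{
    /* 64-byte aligned buffer so the aligned-down start stays in bounds. */
    char *buf = aligned_alloc(DEMO_CLFLUSH_SIZE, 4096);
    size_t i;

    if ( !buf )
        return 1;

    for ( i = 0; i < 4096; i++ )
        buf[i] = (char)i;

    /* Deliberately unaligned start, as callers of cache_flush() may pass. */
    demo_cache_flush(buf + 5, 1000);

    printf("flushed ~%u bytes starting at %p\n", 1000u, (void *)(buf + 5));
    free(buf);
    return 0;
}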