[SRU] pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
resource-agents (Ubuntu) | Fix Released | Critical | Bryce Harrington |
Focal | Fix Released | Critical | Bryce Harrington |
Groovy | Fix Released | Critical | Bryce Harrington |
Bug Description
[Impact]
The pgsql resource agent uses crm_mon to determine node state. However, crm_mon's output format differs between bionic and focal, which results in invalid status reporting on focal hosts. This has caused, for example, failures when migrating a bionic pgsql node to focal.
[Test Case]
Set up a 4-node Focal Pacemaker/Corosync cluster with the following CIB:
https:/
Check the XML output with the cluster status; the 'pgsql-status' and 'pgsql-data-status' attributes are not listed among the node attributes:
ubuntu@ekans:~$ sudo crm_mon --as-xml | grep -A11 "<node_attributes>"
<node_attributes>
<node name="budew">
<attribute name="master-pgsql" value="1000"/>
<attribute name="pgsql-
</node>
<node name="ekans">
<attribute name="master-pgsql" value="1000"/>
<attribute name="pgsql-
</node>
<node name="tyrogue">
<attribute name="master-pgsql" value="1000"/>
<attribute name="pgsql-
[Regression Potential]
Since this changes the node status reporting for resource-agents, watch for regressions in anything that depends on that status information to manage nodes, such as software upgrades, migrations to new Ubuntu releases, or web dashboards.
[Fix]
Upstream appears to have encountered and fixed the issue by adjusting the regex to cover the new line format. This corresponds to the following upstream commit:
https:/
[Discussion]
In groovy's 4.6.1, the issue is fixed a bit differently, by switching to use of crm_mon1200 XML format
[Original Report]
There is a bug in the resource agent's node_exist function. It looks at crm_mon output, which has changed between bionic and focal.
The result is that the 'pgsql-status' and 'pgsql-data-status' attributes are missing from crm status --as-xml output on focal.
Here is the focal output:
http://
Here is the bionic output:
http://
This is the node_exist function:
node_exist() {
print_crm_mon | tr '[A-Z]' '[a-z]' | grep -q "^node $1"
}
It's looking for a line starting with "Node <nodename>".
That works in bionic, but in focal, it's " * Node <nodename>".
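As a minimal sketch of a pattern that tolerates both layouts, the check below accepts optional leading whitespace and an optional "* " bullet before "node <nodename>". The print_crm_mon stub and its two sample lines are assumptions made for illustration only; they are not the agent's real helper or verbatim crm_mon output, and this is not the patch that was uploaded.

```shell
# Hypothetical stand-in for the agent's print_crm_mon helper. It prints
# one bionic-style line and one focal-style line (illustrative samples).
print_crm_mon() {
    printf '%s\n' 'Node budew: online' '  * Node ekans: online'
}

# Sketch: match "node <name>" with or without the leading " * " bullet.
# The trailing group requires whitespace, a colon, or end of line after
# the name, so "bude" does not match "budew".
node_exist() {
    print_crm_mon | tr '[:upper:]' '[:lower:]' |
        grep -Eq "^[[:space:]]*(\*[[:space:]]+)?node $1([[:space:]:]|\$)"
}
```

With the sample lines above, `node_exist budew` and `node_exist ekans` both succeed, while a name that is absent from the output returns a non-zero status. Node names containing regex metacharacters would need escaping, which this sketch does not attempt.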
is_node_online has the same problem:
is_node_online() {
print_crm_mon | tr '[A-Z]' '[a-z]' | grep -e "^node $1 " -e "^node $1:" | grep -q -v "offline"
}
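The same idea can be applied to the online check. The sketch below is an assumption-laden illustration, not the uploaded fix: print_crm_mon is a hypothetical stub with made-up sample lines, and the pattern simply allows the optional " * " prefix before keeping the original "filter out offline" step.

```shell
# Hypothetical stand-in for the agent's print_crm_mon helper, mixing
# bionic-style and focal-style lines (illustrative samples only).
print_crm_mon() {
    printf '%s\n' \
        'Node budew: online' \
        '  * Node ekans: OFFLINE' \
        '  * Node tyrogue: online'
}

# Sketch: select the line for the node in either layout, then succeed
# only if a selected line does not mention "offline".
is_node_online() {
    print_crm_mon | tr '[:upper:]' '[:lower:]' |
        grep -E "^[[:space:]]*(\*[[:space:]]+)?node $1[[:space:]:]" |
        grep -q -v "offline"
}
```

If the first grep selects nothing (unknown node), the second grep receives no input and returns non-zero, so an unknown node is reported as not online, matching the behaviour of the original function.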
It looks like this is the upstream:
https:/
It's fixed there; they look at crm_mon xml output instead.
I tested changing the regex to "node $1:" and it works fine. That could be tightened up a bit to match just "node <nodename>" or " * node <nodename>", but I'm not sure whether we should just pull in the fix from upstream instead, so I haven't spent time refining it.
This is on focal with resource-agents 1:4.5.0-2ubuntu2.
Related branches
- Lucas Kanashiro (community): Approve
- Canonical Server: Pending requested
- Canonical Server Core Reviewers: Pending requested
- Canonical Server packageset reviewers: Pending requested
Diff: 79 lines (+57/-0), 3 files modified
debian/changelog (+11/-0)
debian/patches/crm-mon-format.patch (+45/-0)
debian/patches/series (+1/-0)
Changed in resource-agents (Ubuntu): | |
status: | Triaged → In Progress |
Changed in resource-agents (Ubuntu Focal): | |
importance: | Undecided → Critical |
status: | New → In Progress |
assignee: | nobody → Bryce Harrington (bryce) |
status: | In Progress → Triaged |
description: | updated |
description: | updated |
tags: | removed: block-proposed-focal |
sub'd to field high; this breaks our ability to validate postgres HA on focal.