CTDB port is not aware of Ubuntu-specific NFS Settings

Bug #722201 reported by Kirill Peskov on 2011-02-20
This bug affects 5 people
Affects / Status / Importance / Assigned to / Milestone:
 - samba (Debian): Fix Released, importance Unknown
 - samba (Ubuntu): status tracked in Eoan
   - Bionic: Medium, Unassigned
   - Cosmic: Medium, Unassigned
   - Disco: Medium, Rafael David Tinoco
   - Eoan: Medium, Unassigned

Bug Description

[Impact]

 * The SAMBA CTDB cluster suite does not work for highly available NFS setups
 * LP: #1821775 - ctdb cannot create PID file
 * LP: #1828799 - Package ctdb does not create directories in /var/lib/ctdb

[Test Case]

 * Installing CTDB and trying to start the service (check /var/log/ctdb/ctdb.log):
   - no /etc/ctdb/nodes file, can't start
   - /var/lib/ctdb/volatile does not exist
   - and some other errors addressed in this bug

[Regression Potential]

 * very small chance of causing issues in other parts of samba
 * the ctdb components live in their own directories and are not working nowadays, so any regression would affect an already broken feature

[Other Info]

 * Documentation on how to enable this SRU: https://discourse.ubuntu.com/t/ctdb-create-a-3-node-nfs-ha-backed-by-a-clustered-filesystem/11608

ORIGINAL DESCRIPTION:

 * n/a

Binary package hint: ctdb

CTDB is supposed to detect distro-specific Samba/NFS/Apache settings, or at least provide a way to manually tweak/configure ctdb. In its current state it looks like ctdb has been ported from an RH-based distribution: it is partially aware of some Debian-specific settings/files/script locations, but completely unaware of Ubuntu. For example: ctdb is able to control NFS daemons, but it never looks at /etc/init.d/nfs-kernel-server for startup or /etc/default/nfs-kernel-server for settings.

1) Found on Ubuntu 10.04
2) CTDB version: 1.0.108-3ubuntu3
3) Expected: ctdb should be able to control nfs-kernel-server
4) Happened: without significant changes to the ctdb scripts (adding sections aware of the Ubuntu-specific nfs- and samba- config and startup script locations), ctdb is not able to function properly

Related branches

Mathieu Parent (math-parent) wrote :

Can you propose a patch?

Eric G (erickg) wrote :

This seems to be impacting me on 14.04 as well. I am only trying to use CTDB with Samba:

/var/log/ctdb/log.ctdb:

2014/12/19 10:42:23.770789 [ 1077]: startup event failed
2014/12/19 10:42:28.771410 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:28.891893 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:28.892134 [ 1077]: startup event failed
2014/12/19 10:42:33.893168 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:34.205463 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:34.205666 [ 1077]: startup event failed
2014/12/19 10:42:39.206372 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:39.573180 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:39.573395 [ 1077]: startup event failed

root@san1:/etc/ctdb# ctdb status
Number of nodes:2
pnn:0 10.10.1.21 UNHEALTHY (THIS NODE)
pnn:1 10.10.1.22 UNHEALTHY
Generation:1291295798
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ctdb (Ubuntu):
status: New → Confirmed
Eric G (erickg) wrote :

I was able to make some tweaks and get it running for Samba. Looks like these could also apply to upstream Debian.

The service command is in a different path:

root@san1:/etc/ctdb# diff -u functions functions.orig
--- functions 2014-12-19 11:24:12.660339600 -0500
+++ functions.orig 2014-12-19 10:53:29.247030923 -0500
@@ -161,8 +161,6 @@

   if [ -x /sbin/service ]; then
       $_nice /sbin/service "$_service_name" "$_op"
- elif [ -x /usr/sbin/service ]; then
- $_nice /usr/sbin/service "$_service_name" "$_op"
   elif [ -x $CTDB_ETCDIR/init.d/$_service_name ]; then
       $_nice $CTDB_ETCDIR/init.d/$_service_name "$_op"
   elif [ -x $CTDB_ETCDIR/rc.d/init.d/$_service_name ]; then

This might actually be a problem with the samba init script, but smbd and nmbd seem to be controlled by upstart, and the scripts in /etc/init.d don't actually do anything. Maybe /etc/init.d/samba is being phased out and the services are managed directly, which seems to work here:

root@san1:/etc/ctdb/events.d# diff -u 50.samba ~/50.samba.orig
--- 50.samba 2014-12-19 11:22:05.522193976 -0500
+++ /root/50.samba.orig 2014-12-19 11:21:46.602468765 -0500
@@ -14,8 +14,8 @@
   CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmb}
   ;;
  debian)
- CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smbd}
- CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmbd}
+ CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-samba}
+ CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
   ;;
  *)
   # Use redhat style as default:

root@san1:/etc/ctdb/events.d# ctdb status
Number of nodes:2
pnn:0 10.10.1.21 OK (THIS NODE)
pnn:1 10.10.1.22 UNHEALTHY
Generation:330683100
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

Eric G (erickg) wrote :

There is a related Samba bug #1321369 open to address the second patch I attached. It seems that my patch is probably the preferred fix for now as /etc/init.d/samba is still buggy.

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1321369

Robert Sander (gurubert) wrote :

This patch makes nfs-kernel-server work with ctdb.

It rewrites the code to use /etc/default/nfs-kernel-server and the correct systemd service.

As there is a unit nfs-mountd.service that gets started as a requirement of nfs-kernel-server.service, and this unit only uses $RPCMOUNTDARGS, the setting in /etc/default/nfs-kernel-server should be:

 # MOUNTD_PORT=597
 RPCMOUNTDOPTS="-p 597"

The attachment "patch for Ubuntu 17.10" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Robert Sander (gurubert) wrote :

Additionally it is necessary to create a file /etc/modprobe.d/lockd.conf with this content:

 # Set the TCP port that the NFS lock manager should use.
 # port must be a valid TCP port value (1-65535).
 options lockd nlm_tcpport=599

 # Set the UDP port that the NFS lock manager should use.
 # port must be a valid UDP port value (1-65535).
 options lockd nlm_udpport=599

Lockd as a kernel process has to listen to the same port on every machine in the CTDB cluster.
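As a sanity check, the two port values can be parsed back out of the file so an admin script can assert they match on every node. A minimal sketch (the config content is inlined here for demonstration; on a real node you would read /etc/modprobe.d/lockd.conf):

```shell
# Parse the NLM ports out of lockd.conf-style content; both values
# must be identical on every CTDB node for lock failover to work.
conf='options lockd nlm_tcpport=599
options lockd nlm_udpport=599'
tcp=$(printf '%s\n' "$conf" | sed -n 's/.*nlm_tcpport=\([0-9]*\).*/\1/p')
udp=$(printf '%s\n' "$conf" | sed -n 's/.*nlm_udpport=\([0-9]*\).*/\1/p')
echo "nlm_tcpport=$tcp nlm_udpport=$udp"
```

On a live node the effective values can typically also be read from /sys/module/lockd/parameters/ once the module is loaded.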

tags: added: server-next
Robert Sander (gurubert) wrote :

Ubuntu 19.04 comes with ctdb 4.10 which changes the layout of the files.

Attached is a new patch against ctdb 4.10

Robert Sander (gurubert) wrote :

There are some additional variables to be set in /etc/default/nfs-kernel-server

NFS_HOSTNAME="servername"

RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"
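A quick way to verify a node has all of these set is to grep for each variable. A minimal sketch (the file content is inlined for demonstration, mirroring the list above; on a real node you would read /etc/default/nfs-kernel-server):

```shell
# Check that an /etc/default/nfs-kernel-server style file defines every
# variable listed above; report any that are missing.
conf='NFS_HOSTNAME="servername"
RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"'
missing=0
for v in NFS_HOSTNAME RQUOTAD_PORT LOCKD_UDPPORT LOCKD_TCPPORT \
         STATD_PORT STATD_OUTGOING_PORT STATD_HOSTNAME; do
    printf '%s\n' "$conf" | grep -q "^${v}=" || { echo "missing: $v"; missing=1; }
done
echo "missing=$missing"
```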

Changed in ctdb (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

I have reviewed this bug and read all CTDB documentation and my initial thoughts for a proper CTDB Ubuntu Enablement are:

1)

(not addressed here: should we have a default package dependency? There is none nowadays)

There is a mandatory dependency on a clustered file-system, since all file atomicity is guaranteed through the filesystem layer, not by CTDB. Existing options are:

a) GFS2 - Ubuntu has gfs2-utils and it is part of the Debian HA imports. It depends on having CLVM (clustered LVM2) running, changing the LVM2 locking type to 3. clvm in turn depends on the distributed lock manager running, and we had issues with that in the past (because of the redhat -> debianization). Check: https://bugs.launchpad.net/ubuntu/+source/dlm/+bug/1248054 . TODO: would have to make sure clvmd + dlm + lvm2 locking + gfs2 are good for Eoan (initially), Disco, Cosmic and Bionic (LTS) at least. No specific change is needed in the SMB or NFS config files.

b) Gluster - It is a straightforward installation/configuration with existing packages. It supports other interconnects (like InfiniBand). No specific change is needed in the SMB or NFS config files.

c) GPFS - It is proprietary (IBM) and could/should be enabled by IBM (possibly together with us, if ever intended).

d) OCFS2 - An open source and mature project, supported by the Ubuntu kernel and the ocfs2-tools package. Samba config file changes are needed:

vfs objects = fileid
fileid:algorithm = fsid

NFS does not need any apparent config file change.

- It is better if the clustered filesystem supports uniform device numbers across the nodes! (https://wiki.samba.org/index.php/Setting_up_a_cluster_filesystem#Checking_uniformity_of_device_and_inode_numbering)

e) LustreFS - Open source, but almost entirely dependent on Mellanox OFED packages (for InfiniBand support) and/or CentOS dkms (packages usually done by DDN). I know DDN/Whamcloud is working on upstreaming a kernel tree supporting LustreFS, and that could lead Debian to have LustreFS support, but that is not the case here.

2)

(partially addressed by comment #10)

NFS RPC service ports need to be bound statically to the same ports on all nodes (not the default). There is no proper decision yet on which ports those should be (replace some old service ports, like smalltalk?). Would this be changed during package installation? That might lead to problems on already existing NFS servers, for example.

- NFSv4 is not recommended and should be disabled whenever CTDB is enabled (would the install script recommend or force this?). The official documentation says:

"Unfortunately, RPCNFSDOPTS isn't used by Debian Sys-V init, so there is no way to disable NFSv4 via the configuration file"

We would have to re-verify the nfs-kernel-server systemd dependencies and (possibly?) set it to NFSv3 only (something like the work done in https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1590799). CTDB + NFS also depends on environment variables specific to CTDB (like the NFS hostname).

3)

https://ctdb.samba.org/manpages/ctdb.7.html:

PRIVATE and PUBLIC addresses + LVS + NATGW + POLICY ROUTING

CTDB configures network interfaces automatically based on decisions made by its internal recovery algorit...


Robert Sander (gurubert) wrote :

Please do not create a hard dependency on any of the mentioned cluster filesystems (or Ceph), as only one of them is needed to run CTDB.

The latest CTDB is also able to store the recovery lock as a RADOS object in Ceph. In addition, with the Ceph VFS module for Samba, CephFS is not even needed as a mount, for example.

Yes Robert,

I tried to document all the things needed to have it fully 'Ubuntu aware'. I do agree with you about the hard dependencies on clustered filesystems; they don't make much sense.

I also see the need for a warning about disabling NFSv4, a warning about (or a guarantee of) NFS RPC services being bound to the same static ports on all nodes, a warning about (or a guarantee of) the NFS service being stopped and disabled when CTDB is enabled, and the need to make sure CTDB scripts can start/stop networking and services (NFS in this particular case) with sysv and/or systemd scripts: which is what this bug is all about.

So in the end it might be a question of warning the user about those needs, or something similar (since disabling services and/or guaranteeing fixed RPC ports is likely not something to be done in postinstall scripts).

I'll review your patches today and try to propose something for Ubuntu Eoan.

Thanks for the report, and sorry it took so long for a full review.

Robert,

I'll verify your patch from comment #6, since the nfs-kernel-server dependency tree:

(k)inaddy@ctdbserver01:/lib/systemd$ systemctl list-dependencies nfs-kernel-server.service
nfs-kernel-server.service
● ├─auth-rpcgss-module.service
● ├─nfs-config.service
● ├─nfs-idmapd.service
● ├─nfs-mountd.service
● ├─proc-fs-nfsd.mount
● ├─rpc-svcgssd.service
● ├─rpcbind.socket
● ├─system.slice
● └─network.target

might be enough to guarantee that rpc.statd starts/stops together with nfsd when CTDB starts/stops the services on the nodes.

I also have to verify missing (or deactivated) environment variables (NEED_GSSD or NEED_SVCGSSD, for example) brought in by /etc/default/nfs-*. Nowadays, with systemd, these are read by the script /usr/lib/systemd/scripts/nfs-utils_env.sh, which generates an environment file in /run and is executed by nfs-config.service as a "oneshot" service whenever NFS is restarted (I will verify whether any of those would be changed by CTDB, for example). This covers your initial comment:

" ... ctdb is able to control NFS daemons, but it never looks for /etc/init.d/nfs-kernel-server for startup or /etc/default/nfs-kernel-server for settings ... ".

I'll have to think of something for the RPC ports :\. You said in comment #10 that this:

NFS_HOSTNAME="servername"

RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"

need to be included in /etc/default/nfs-kernel-server, possibly because CTDB needs those environment variables (apart from the "same ports on all nodes" need). If CTDB manipulates systemctl to start/stop the services, then I would have to make sure those are parsed by the nfs-config.service logic (nfs-config -> nfs-utils_env.sh -> /run/sysconfig/nfs-utils environment file).

That would work for the needed NFS_HOSTNAME environment variable. For the ports, I could create systemd overrides to set all RPC services to the same ports, or ship those settings commented out and tell the user to enable them (since touching that during package installation does not seem appropriate to me).
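The systemd override idea could look like the following drop-in. This is a hypothetical sketch: the unit name and option values would need to be checked against the actual nfs-utils units, which normally read their arguments from the nfs-config.service environment file.

```ini
# Hypothetical /etc/systemd/system/rpc-statd.service.d/static-ports.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/rpc.statd --no-notify -p 595 -o 596
```

After creating such a drop-in, a `systemctl daemon-reload` and a service restart would be required on every node.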

We have 2 other things:

/etc/modprobe.d/lockd.conf -> that is most likely a no-go, like the LOCKD_UDPPORT=599 approach. I mean, at least not automatically (possibly shipped commented out, telling the user to keep it consistent across all CTDB nodes).

And:

Disabling NFSv4 (and rpc.idmapd) could follow the same approach: provide information to the final user about not having it enabled (especially because installing CTDB would require all nfs-* services to be disabled anyway, so the CTDB postinst definitely can't activate systemd services by default, for example).

That is all I can think of to check/implement in Ubuntu Eoan. Please, let me know if there is anything else you can think of.

I'm testing all this in the following scenario:

2 gluster servers providing 1 volume to 3 ctdb servers in 1 network, and those 3 ctdb servers providing a NFS with LVS network IP to 2 NFS clients.

It is very likely that CTDB will need similar work to make sure samba, for example, is supported out-of-the-box by the Ubuntu CTDB package, like what we are doing here.

I'll get back to you soon.

Changed in samba (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
milestone: none → eoan-updates
no longer affects: ctdb (Ubuntu)

One of the issues is being addressed upstream:

https://lists.samba.org/archive/samba-technical/2019-June/133694.html

So we can avoid merge diffs about this in the future (and I can refer upstream patch for this particular fix).

Changed in ctdb (Debian):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

Future possible issue while ctdb is not merged to *at least* samba-4.9.7 (in Debian):

commit 022b9a6ca7d8cb6f541b1b24b27da4e1a3bea04b
Author: Martin Schwenke <email address hidden>
Date: Tue Mar 26 00:49:49 2019

    ctdb-scripts: Add test variable CTDB_NFS_DISTRO_STYLE

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>
    (cherry picked from commit e72c3c800a50fe746164e319e21180c44d041619)

/etc/ctdb/nfs-linux-kernel-callout :

TODO:
 - place non-upstream (?) patch (DEP3) inside debian/patches (quilt)

 # Red Hat
 # nfs_service="nfs"
 # nfslock_service="nfslock"
 # nfs_config="/etc/sysconfig/nfs"

 # SUSE
 # nfs_service="nfsserver"
 # nfslock_service=""
 # nfs_config="/etc/sysconfig/nfs"

 # Debian
 nfs_service="nfs-kernel-server"
 nfslock_service=""
 nfs_config="/etc/default/nfs-kernel-server"

Future possible issue while ctdb is not merged to *at least* samba-4.9.8 (in Debian):

commit 49fa08814e2a1032e88353eec42b952316d6ec18
Author: Martin Schwenke <email address hidden>
Date: Wed Mar 20 07:22:43 2019

    ctdb-scripts: Update statd-callout to try several configuration files

    The alternative seems to be to try something via CTDB_NFS_CALLOUT.
    That would be complicated and seems like overkill for something this
    simple.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>
    (cherry picked from commit a2bd4085896804ee2da811e17f18c78a5bf4e658)

BUT, still, that isn't appropriate, since the Debian systemd NFS server unit is called either nfs-kernel-server.service (or its alias nfs-server.service) OR nfs-ganesha.service (if using the userland NFS server).

/etc/ctdb/statd-callout:

 - place non-upstream (?) patch (DEP3) inside debian/patches (quilt)

 load_system_config "nfs-kernel-server"

        ...

 ############################################################

 # ctdb_setup_state_dir "service" "nfs"
 ctdb_setup_state_dir "service" "nfs-kernel-server"

TODO:
 - ctdb has to depend on nfs-common (/usr/lib/systemd/scripts/nfs-utils_env.sh)
 - ctdb has to update /usr/lib/systemd/scripts/nfs-utils_env.sh -> NFS_HOSTNAME=nodename
 - /etc/default/nfs-common has to be updated -> NFS_HOSTNAME=nodename

TODO:
 - ctdb.postinst and ctdbpostrm to create /var/lib/ctdb{volatile,persistent,state}
 - tmpfiles.d to create /run/ctdb
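For the tmpfiles.d item above, a minimal fragment could look like this (path, mode and ownership are assumptions, not the final packaging):

```
# hypothetical /usr/lib/tmpfiles.d/ctdb.conf: create /run/ctdb at boot
d /run/ctdb 0755 root root -
```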

/etc/ctdb/events/legacy/60.nfs.script: 1: eval: rpc.rquotad: not found

legacy/60.nfs.script needs rpc.rquotad.

ctdb needs to depend on:

(hit enter too fast)

- nfs-kernel-server
- nfs-common
- quota

So the NFS legacy events won't run into problems when they are enabled. Note that those are not the only events supported by CTDB:

(k)inaddy@ctdbserver01:/etc/ctdb$ ctdb event script list legacy
* 00.ctdb
* 01.reclock
* 05.system
* 06.nfs
* 10.interface
  11.natgw
  11.routing
  13.per_ip_routing
  20.multipathd
  31.clamd
  40.vsftpd
  41.httpd
  49.winbind
  50.samba
* 60.nfs
  70.iscsi
  91.lvs

Instead of depending on multipath-tools, clamd, vsftpd, apache, smb, iscsi and nfs-kernel-server... we could turn those dependencies into Recommends. This way they are installed together, BUT if --no-install-recommends is passed they are not. I would start by adding only the 3 NFS-related services, since I'm focused on enabling NFS here.
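Sketched as a debian/control excerpt (illustrative only, not the actual packaging change):

```
Package: ctdb
Recommends: nfs-kernel-server, nfs-common, quota
```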

Following error:

2019/05/31 18:21:38.648661 ctdbd[1000]: Starting traverse on DB ctdb.tdb (id 806)
2019/05/31 18:21:38.651601 ctdbd[1000]: Ending traverse on DB ctdb.tdb (id 806), records 0
2019/05/31 18:21:40.185388 ctdb-eventd[1002]: 60.nfs: ss: bison bellows (while parsing filter): "syntax error!" Sorry.
2019/05/31 18:21:40.185421 ctdb-eventd[1002]: 60.nfs: Usage: ss [ OPTIONS ]
2019/05/31 18:21:40.185433 ctdb-eventd[1002]: 60.nfs: ss [ OPTIONS ] [ FILTER ]
2019/05/31 18:21:40.185441 ctdb-eventd[1002]: 60.nfs: -h, --help this message
2019/05/31 18:21:40.185449 ctdb-eventd[1002]: 60.nfs: -V, --version output version information

PROBLEM:

########################################################
# tickle handling
########################################################

update_tickles ()
{
 ...

    # the parentheses can't be empty! -> BROKEN:

    ss -tn state established \
       "${_ip_filter:+( ${_ip_filter} )}" \
       "${_port_filter:+( ${_port_filter} )}" |
    awk 'NR > 1 {print $4, $3}' |
    sort >"$_my_connections"

HERE:

60.nfs: + _port=2049
60.nfs: + tickledir=/var/lib/ctdb/scripts/tickles
60.nfs: + mkdir -p /var/lib/ctdb/scripts/tickles
60.nfs: + ctdb_get_pnn
60.nfs: + _pnn_file=/var/lib/ctdb/scripts/my-pnn
60.nfs: + [ ! -f /var/lib/ctdb/scripts/my-pnn ]
60.nfs: + cat /var/lib/ctdb/scripts/my-pnn
60.nfs: + _pnn=0
60.nfs: + /usr/bin/ctdb -X ip
60.nfs: + awk -F| -v pnn=0 $3 == pnn {print $2}
60.nfs: + _ips=172.16.17.3
60.nfs: + _ip_filter= ok
60.nfs: + _ip_filter=src [172.16.17.3]
60.nfs: + _port_filter=sport == :2049
60.nfs: + _my_connections=/var/lib/ctdb/scripts/tickles/2049.connections.12623
60.nfs: + ss -tn state established ( src [172.16.17.3] ) ( sport == :2049
)
2019/05/31 18:44:35.631800 ctdb-eventd[12050]: 60.nfs: + awk NR > 1 {print $4, $3}

          ss -tn state established "src 172.16.17.3 sport == :2049"

Looks like using "( ... )" syntax for ss was no good.

Changing update_tickles () function to have:

    ss -tn state established "${_ip_filter} ${_port_filter}" | awk 'NR > 1 {print $4, $3}' | sort >"$_my_connections"

instead, fixes the issue.
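A minimal sketch of the difference (hypothetical filter values): the fix passes both filters as one plain string instead of wrapping them in the "( ... )" grouping that the older iproute2 in Ubuntu cannot parse.

```shell
# Filter construction as in the fixed update_tickles(): plain
# concatenation, no "( ... )" grouping around each part.
_ip_filter="src 172.16.17.3"
_port_filter="sport == :2049"
filter="${_ip_filter} ${_port_filter}"
# The real script then pipes this into ss:
#   ss -tn state established "$filter" | awk 'NR > 1 {print $4, $3}' | sort
echo "$filter"
```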

Example of appendix to /etc/services to enable CTDB NFS HA:

/etc/services: (append)

rpc.nfsd 2049/tcp # RPC nfsd
rpc.nfsd 2049/udp # RPC nfsd
rpc.nfs-cb 32764/tcp # RPC nfs callback
rpc.nfs-cb 32764/udp # RPC nfs callback
rpc.statd-bc 32765/tcp # RPC statd broadcast
rpc.statd-bc 32765/udp # RPC statd broadcast
rpc.statd 32766/tcp # RPC statd listen
rpc.statd 32766/udp # RPC statd listen
rpc.mountd 32767/tcp # RPC mountd
rpc.mountd 32767/udp # RPC mountd
rpc.lockd 32768/tcp # RPC lockd/nlockmgr
rpc.lockd 32768/udp # RPC lockd/nlockmgr
rpc.quotad 32769/tcp # RPC quotad
rpc.quotad 32769/udp # RPC quotad

TODO:
 - provide /etc/ctdb/services.example
 - provide instructions to append /etc/ctdb/services.example to /etc/services
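The append step could be guarded so it is idempotent. A hedged sketch, demonstrated on temporary files so it is safe to run anywhere (the real target would be /etc/services and the proposed /etc/ctdb/services.example):

```shell
# Append the example services only if they are not already present;
# running the snippet twice must not duplicate the entries.
services=$(mktemp)
example=$(mktemp)
printf 'ssh 22/tcp\n' > "$services"
printf 'rpc.lockd 32768/tcp # RPC lockd/nlockmgr\n' > "$example"
grep -q '^rpc\.lockd' "$services" || cat "$example" >> "$services"
grep -q '^rpc\.lockd' "$services" || cat "$example" >> "$services"  # no-op second time
count=$(grep -c '^rpc\.lockd' "$services")
echo "rpc.lockd entries: $count"
rm -f "$services" "$example"
```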

Example of /etc/default/nfs-common file to enable CTDB NFS HA:

## /etc/default/nfs-common

NFS_HOSTNAME="ctdbserver02.public" <- $HOSTNAME

# rpc.statd - daemon listening for reboot notifications (locks related)
NEED_STATD="yes"
STATDOPTS="-n ${NFS_HOSTNAME} -p 32765 -o 32766 -H /etc/ctdb/statd-callout -T 32768 -U 32768"
STATD_HOSTNAME="$NFS_HOSTNAME"

# rpc.gssd - security context for rpc connections
NEED_GSSD="no"

# rpc.idmapd - NFSv4 <-> name mapping daemon (fallback nowadays)
# recent kernels use nfsidmap(8) instead
NEED_IDMAPD="no"

# rpc.quota - usage quota
RPCRQUOTADOPTS="-p 32769"

## end of file

TODO:
 - provide a /etc/ctdb/nfs-common.example
 - provide instructions to replace /etc/default/nfs-common with the example

Example of /etc/default/nfs-kernel file to enable CTDB NFS HA:

## /etc/default/nfs-kernel-server

NFS_HOSTNAME="ctdbserver02.public" <- $HOSTNAME

# rpc.nfsd - user level part of nfs service (kernel: nfsd module)
RPCNFSDPRIORITY=0
RPCNFSDCOUNT=8
RPCNFSDOPTS="-N 4"

# rpc.mountd - server side of nfs mount protocol
RPCMOUNTDOPTS="-p 32767 --manage-gids --no-nfs-version 4"

# rpc.svcgssd - userspace daemon to handle security context for kernel rpcsec_gss
NEED_SVCGSSD="no"
RPCSVCGSSDOPTS=""

## end of file

TODO:
 - provide a /etc/ctdb/nfs-kernel-server.example
 - provide instructions to replace /etc/default/nfs-kernel-server with the example

(k)inaddy@ctdbserver02:/etc/sysctl.d$ cat 98-nfs-static-ports.conf
fs.nfs.nfs_callback_tcpport = 32764
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768

TODO:
 - provide a /etc/ctdb/98-nfs-static-ports-sysctl.conf.example
 - provide instructions to place it as /etc/sysctl.d/98-nfs-static-ports.conf
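Applying the fragment for real would be `sysctl -p /etc/sysctl.d/98-nfs-static-ports.conf` (or a reboot). As a dry run, the fragment can be expanded into the equivalent sysctl -w commands:

```shell
# Expand a sysctl fragment into individual "sysctl -w" commands
# (dry run only; nothing is applied).
frag='fs.nfs.nfs_callback_tcpport = 32764
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768'
cmds=$(printf '%s\n' "$frag" | awk -F' *= *' '{print "sysctl -w " $1 "=" $2}')
printf '%s\n' "$cmds"
```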

With all those changes, which I'll provide either by suggesting upstream patches and backporting them to Debian Unstable -> Ubuntu Devel, or by carrying them as regular Debian package patches, I was able to make CTDB NFS HA work flawlessly:

(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver01.public:/mnt/glusterfs/data /mnt/ctdbserver01
(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver02.public:/mnt/glusterfs/data /mnt/ctdbserver02
(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver03.public:/mnt/glusterfs/data /mnt/ctdbserver03

(k)inaddy@ctdbclient01:~$ while true; do sleep 2; dd if=/dev/random of=/mnt/ctdbserver01/file bs=1k count=2 ; dd if=/dev/random of=/mnt/ctdbserver02/file bs=1k count=2; dd if=/dev/random of=/mnt/ctdbserver03/file bs=1k count=2; done

0+2 records in
0+2 records out
160 bytes copied, 0.0127338 s, 12.6 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.00846823 s, 18.7 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0096586 s, 16.4 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.01485 s, 10.6 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0134006 s, 11.8 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.00700728 s, 22.5 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0109971 s, 14.4 kB/s
0+2 records in
0+2 records out

During a failure in one of the ctdbservers (ctdbserver03):

(k)inaddy@ctdbserver02:~$ ctdb status
Number of nodes:3
pnn:0 172.16.9.1 OK
pnn:1 172.16.9.2 OK (THIS NODE)
pnn:2 172.16.9.3 DISCONNECTED|UNHEALTHY|STOPPED|INACTIVE
Generation:92168728
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

And the public addresses were correctly set on interface "eth1", as they were supposed to be:

(k)inaddy@ctdbserver01:~$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1c:31:c3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.17.1/24 brd 172.16.17.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.17.3/24 brd 172.16.17.255 scope global secondary eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1c:31c3/64 scope link
       valid_lft forever preferred_lft forever

(k)inaddy@ctdbserver02:~$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:af:f8:53 brd ff:ff:ff:ff:ff:ff
    inet 172.16.17.2/24 brd 172.16.17.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feaf:f853/64 scope link
       valid_lft forever preferred_lft forever

according to the variable NFS_HOSTNAME. Note that ctdbserver01 has its own public IP address plus the IP address of ctdbserver03, which I failed on purpose. The NFS client kept its access, as it should.

For the bad ss syntax in /ctdb/config/functions: update_tickles(), I have opened the following upstream bug:

https://bugzilla.samba.org/show_bug.cgi?id=13985

And provided the following tested patch:

https://lists.samba.org/archive/samba-technical/2019-June/133701.html

Waiting for upstream acceptance (like the other one) so I can ask Debian to accept my delta (considering they won't merge to the latest upstream version).


For this last "ss" syntax issue, I have opened a bug in samba upstream project and the most important comment, that explains what is happening, is this one:

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13985

COMMENT: https://bugzilla.samba.org/show_bug.cgi?id=13985#c4

"""
Hello Martin,

Errrr, that puzzled me now, my workstation is a debian sid and, for obvious reasons, all my development environment is Ubuntu... I re-checked "ss" execution in Debian and it worked :\, went back to Ubuntu Eoan and it didn't.

inaddy@workstation:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 172.16.0.3:22 172.16.0.6:53580
0 0 172.16.0.3:22 172.16.0.6:62337
0 0 172.16.0.3:22 172.16.0.6:53587
...

(k)inaddy@ctdbserver01:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help this message
   -V, --version output version information
   -n, --numeric don't resolve service names
   -r, --resolve resolve host names
   -a, --all display all sockets
...

Checking different versions:

inaddy@workstation:~$ rmadison iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.16.0-2|oldstable
iproute2|4.9.0-1+deb9u1|stable
iproute2|4.9.0-1+deb9u1|stable-debug
iproute2|4.14.1-1~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports-debug
iproute2|4.20.0-2|testing
iproute2|4.20.0-2|unstable
iproute2|4.20.0-2|unstable-debug
iproute2|5.1.0-1|experimental
iproute2|5.1.0-1|experimental-debug

inaddy@workstation:~$ rmad iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.12.0-2|trusty
iproute2|3.12.0-2ubuntu1.2|trusty-updates
iproute2|4.3.0-1ubuntu3|xenial
iproute2|4.3.0-1ubuntu3.16.04.5|xenial-updates
iproute2|4.15.0-2ubuntu1|bionic
iproute2|4.18.0-1ubuntu2~ubuntu18.04.1|bionic-backports
iproute2|4.18.0-1ubuntu2|cosmic
iproute2|4.18.0-1ubuntu2|disco
iproute2|4.18.0-1ubuntu2|eoan

Since Ubuntu has been using an older version for quite a while, I think the latest change to this line broke compatibility with older versions. That, per se, could justify this patch.

commit 04fe9e20749985c71fef1bce7f6e4c439fe11c81
Author: Martin Schwenke <email address hidden>
Date: Thu Aug 27 00:22:49 2015

    ctdb-scripts: Use ss instead of netstat for finding TCP connections

    ss with a filter is much faster than post-processing output from
    netstat. CTDB already has a hard dependency on iproute2 for IP
    address handling, so depending on ss is no big deal.

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>

This is the change that introduced this "ss" code, and I believe it has been broken in Ubuntu since then. Actually, I'm not aware that CTDB ever worked in Ubuntu the way it should; that's why I'm working on the following bugs:

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/722201
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1821775
https://bugs.launchpad.net/u...


For the iproute2 issue, Martin (samba upstream) and I found out that iproute2 seems to be broken in Cosmic, Disco and Eoan. I have opened the following bug:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775

And will bisect iproute2 upstream code to propose a SRU for it.

With that, the following upstream samba bug:

https://bugzilla.samba.org/show_bug.cgi?id=13985

Would be solved by the Ubuntu issue resolution on iproute2.

CTDB will definitely depend on SRUs of:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775/comments/3

For Cosmic and Disco. Eoan will be merged to iproute2 latest.

 I just realized bionic-backports has an affected (broken) version of iproute2 (backported from cosmic). That should also be fixed.

I have created the following PPA with all fixes:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1831381

And the following git repo:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/samba/+git/samba

It contains all the commits I'm going to propose tomorrow.

Because of git-ubuntu, I had to rebase all the code I had tested before and knew was working well, so I want to give it one last test before asking for a merge request to Eoan, suggesting it to Debian, and cherry-picking to Disco, Cosmic and Bionic.

@Robert,

Feel free to test this PPA if you like (it's Eoan-only for now). If not, I'll test all the others anyway. Note that I have created a small script to "enable" the NFS HA nodes and make our lives easier. I'll create proper documentation on how to use it as well, but it's very straightforward if you execute:

$ cd /etc/ctdb/examples
$ ./enable-nfs.sh

Changed in samba (Ubuntu Disco):
status: New → Confirmed
Changed in samba (Ubuntu Cosmic):
status: New → Confirmed
Changed in samba (Ubuntu Bionic):
status: New → Confirmed
Changed in samba (Ubuntu Disco):
importance: Undecided → Medium
Changed in samba (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in samba (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in samba (Ubuntu Cosmic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in samba (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in ctdb (Debian):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
importance: Undecided → Unknown
status: New → Unknown
Changed in ctdb (Debian):
status: Unknown → Confirmed

The fix (merge request) is being discussed and will likely be merged for Eoan soon. I'll work on the Bionic and Disco backports (SRU) right after.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package samba - 2:4.10.0+dfsg-0ubuntu5

---------------
samba (2:4.10.0+dfsg-0ubuntu5) eoan; urgency=medium

  * debian/rules: Make DEB_HOST_ARCH_CPU initialized through
    dpkg-architecture (Closes: #931138)
  * d/p/ctdb-scripts-fix-tcp_tw_recycle-existence-check.patch:
    fix tcp_tw_recycle existence check. (LP: #722201)
  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change nfs service name from nfs to nfs-kernel-server
    (LP: #722201)
  * d/ctdb.install, d/rules: create ctdb run directory into tmpfiles.d
    to allow pid file to exist (LP: #1821775)
  * Allow proper ctdb initialization (LP: #1828799):
    - d/ctdb.dirs: added /var/lib/ctdb/* directories
    - d/ctdb.postrm: remove leftovers from:
      /var/lib/ctdb/{state,persistent,volatile,scripts}
  * d/rules: installing provided config examples and helper scripts
  * Examples of NFS HA CTDB config files + helper script:
    - d/ctdb.example.enable.nfs.sh
    - d/ctdb.example.nfs-common
    - d/ctdb.example.nfs-kernel-server
    - d/ctdb.example.services
    - d/ctdb.example.sysctl-nfs-static-ports.conf
  * d/p/ctdb-config-depend-on-etc-default-nodes-file.patch:
    do not try to start daemon if /etc/ctdb/nodes does not exist
  * d/p/ctdb-config-enable-syslog-by-default.patch:
    enable syslog and systemd journal by default

 -- Rafael David Tinoco <email address hidden> Fri, 28 Jun 2019 00:14:27 +0000
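The tcp_tw_recycle fix in the changelog above addresses a common pattern: that sysctl was removed in Linux 4.12, so writing to it unconditionally breaks event scripts on newer kernels. The sketch below illustrates the existence-check pattern (the exact patch may differ; the demo writes to a temp directory instead of /proc for safety). The companion fix for LP: #1821775 similarly ships a tmpfiles.d entry so the PID directory exists at boot; a typical entry of that shape is `d /run/ctdb 0755 root root -`.

```shell
# Guarded sysctl write: only touch the knob if this kernel exposes it.
set -u
set_sysctl_if_present() {
    # $1 = /proc/sys path, $2 = value to write
    if [ -f "$1" ]; then
        echo "$2" > "$1"
        echo "set $1=$2"
    else
        echo "skip $1 (not present on this kernel)"
    fi
}

# Demonstrate with a stand-in path; on a real system the target would be
# /proc/sys/net/ipv4/tcp_tw_recycle (absent since Linux 4.12).
tmp=$(mktemp -d)
touch "$tmp/tcp_tw_recycle"
set_sysctl_if_present "$tmp/tcp_tw_recycle" 0
set_sysctl_if_present "$tmp/does_not_exist" 0
rm -rf "$tmp"
```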

Changed in samba (Ubuntu Eoan):
status: In Progress → Fix Released
Changed in samba (Ubuntu Cosmic):
status: Confirmed → Won't Fix

Okay, for the Eoan merge, I forgot to fix script 06.nfs (just as 60.nfs was fixed) in the following patch:

  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change nfs service name from nfs to nfs-kernel-server

I'm creating a new MR to include this fix, and the SRUs will contain both.

Changed in samba (Ubuntu Eoan):
status: Fix Released → In Progress

Following up on my last comment: this was fixed in the following version:

samba (2:4.10.0+dfsg-0ubuntu6) eoan; urgency=medium

  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change service name from nfs to nfs-kernel-server in
    legacy script 06.nfs.script also (LP: #722201)

 -- Rafael David Tinoco <email address hidden> Thu, 11 Jul 2019 21:44:49 +0000

Changed in samba (Ubuntu Eoan):
status: In Progress → Fix Released

I'm marking Bionic as Won't Fix since it ships a fairly old CTDB implementation, and in my opinion the effort of porting all the changes I've made in upstream/Sid/Eoan/Disco is not worth it for Bionic. I'm open to discussion if users rely on Bionic's CTDB for NFS HA and can't make it work with the old documentation/code.

Changed in samba (Ubuntu Bionic):
status: Confirmed → Won't Fix
description: updated

I have done the SRU for Disco, but right now I'm facing:

Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: Usage: ss [ OPTIONS ]
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: ss [ OPTIONS ] [ FILTER ]
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: -h, --help this message
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: -V, --version output version information

These errors occurred during tests from this PPA:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp722201

They are caused by the iproute2 ss issue:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775

As soon as that SRU lands in Disco, CTDB will be ready to be SRUed on Disco as well.
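For context, CTDB's NFS event script queries established TCP connections with an ss filter expression, and the broken iproute2 builds fail to parse such filters (hence the "bison bellows ... syntax error" in the log above). A minimal probe of that behavior, assuming a filter shape like the one CTDB uses (the exact CTDB filter may differ):

```shell
# Probe whether ss can parse a state+port filter on this system.
check_nfs_conns() {
    if ! command -v ss >/dev/null 2>&1; then
        echo "ss not installed"
        return 0
    fi
    # List established TCP connections on the NFS port (2049).
    # Broken iproute2 builds reject the filter with a parse error.
    if ! ss -tn state established "( sport = :2049 )" 2>/dev/null; then
        echo "ss filter parse failed (see LP: #1831775)"
    fi
}
check_nfs_conns
```

On a fixed iproute2, this prints the socket table header (and any matching connections); on an affected build it reports the parse failure instead.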

Changed in ctdb (Debian):
status: Confirmed → Fix Released
Changed in samba (Ubuntu Disco):
status: Confirmed → In Progress
Changed in samba (Ubuntu Eoan):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in samba (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in samba (Ubuntu Cosmic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
affects: ctdb (Debian) → samba (Debian)
description: updated

I just re-pushed the same merge request with the changes flagged in the MR review. Thanks a lot, Christian and Andreas, for the review. I think this is ready for sponsorship and SRU.

The Disco merge request was approved by the Canonical Server Team. Waiting on sponsorship for Disco. Thanks a lot for the reviews!

Hello Kirill, or anyone else affected,

Accepted samba into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/samba/2:4.10.0+dfsg-0ubuntu2.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in samba (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
