CTDB port is not aware of Ubuntu-specific NFS Settings

Bug #722201 reported by Kirill Peskov on 2011-02-20
This bug affects 5 people
Affects / Status / Importance / Assigned to / Milestone:
 - samba (Debian): Fix Released, importance Unknown
 - samba (Ubuntu): status tracked in Eoan
   - Bionic: Medium, Unassigned
   - Cosmic: Medium, Unassigned
   - Disco: Medium, Rafael David Tinoco
   - Eoan: Medium, Unassigned

Bug Description

[Impact]

 * The SAMBA CTDB cluster suite does not work for highly available NFS setups
 * LP: #1821775 - ctdb cannot create PID file
 * LP: #1828799 - Package ctdb does not create directories in /var/lib/ctdb

[Test Case]

 * Installing CTDB and trying to start the service (check /var/log/ctdb/ctdb.log):
   - no /etc/ctdb/nodes file, can't start
   - /var/lib/ctdb/volatile does not exist
   - and some other errors addressed in this bug

[Regression Potential]

 * very small chance of causing issues in other parts of samba
 * the ctdb components live in their own directories and are not working nowadays, so any regression would affect an already broken feature

[Other Info]

 * Documentation on how to enable this SRU: https://discourse.ubuntu.com/t/ctdb-create-a-3-node-nfs-ha-backed-by-a-clustered-filesystem/11608

ORIGINAL DESCRIPTION:

 * n/a

Binary package hint: ctdb

CTDB is supposed to detect distro-specific Samba/NFS/Apache settings, or at least provide a way to manually tweak/configure ctdb. In its current state it looks like ctdb has been ported from an RH-based distribution: it is partially aware of some Debian-specific settings/files/script locations, but completely unaware of Ubuntu. For example: ctdb is able to control NFS daemons, but it never looks at /etc/init.d/nfs-kernel-server for startup or /etc/default/nfs-kernel-server for settings.

1) Found on Ubuntu 10.04
2) CTDB version: 1.0.108-3ubuntu3
3) Expected: ctdb should be able to control nfs-kernel-server
4) Happened: without significant changes to the ctdb scripts (adding sections aware of the Ubuntu-specific nfs- and samba- config and startup script locations), ctdb is not able to function properly

Related branches

Mathieu Parent (math-parent) wrote :

Can you propose a patch?

Eric G (erickg) wrote :

This seems to be impacting me on 14.04 as well. I am only trying to use CTDB with Samba:

/var/log/ctdb/log.ctdb:

2014/12/19 10:42:23.770789 [ 1077]: startup event failed
2014/12/19 10:42:28.771410 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:28.891893 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:28.892134 [ 1077]: startup event failed
2014/12/19 10:42:33.893168 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:34.205463 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:34.205666 [ 1077]: startup event failed
2014/12/19 10:42:39.206372 [ 1077]: Recoveries finished. Running the "startup" event.
2014/12/19 10:42:39.573180 [ 1077]: 50.samba: Failed to start samba
2014/12/19 10:42:39.573395 [ 1077]: startup event failed

root@san1:/etc/ctdb# ctdb status
Number of nodes:2
pnn:0 10.10.1.21 UNHEALTHY (THIS NODE)
pnn:1 10.10.1.22 UNHEALTHY
Generation:1291295798
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ctdb (Ubuntu):
status: New → Confirmed
Eric G (erickg) wrote :

I was able to make some tweaks and get it running for Samba. Looks like these could also apply to upstream Debian.

The service command is in a different path:

root@san1:/etc/ctdb# diff -u functions functions.orig
--- functions 2014-12-19 11:24:12.660339600 -0500
+++ functions.orig 2014-12-19 10:53:29.247030923 -0500
@@ -161,8 +161,6 @@

   if [ -x /sbin/service ]; then
       $_nice /sbin/service "$_service_name" "$_op"
- elif [ -x /usr/sbin/service ]; then
- $_nice /usr/sbin/service "$_service_name" "$_op"
   elif [ -x $CTDB_ETCDIR/init.d/$_service_name ]; then
       $_nice $CTDB_ETCDIR/init.d/$_service_name "$_op"
   elif [ -x $CTDB_ETCDIR/rc.d/init.d/$_service_name ]; then

This might actually be a problem with the samba init script, but smbd and nmbd seem to be controlled by upstart, and the scripts in /etc/init.d don't actually do anything. Maybe /etc/init.d/samba is being phased out and the services are managed directly, which seems to work here:

root@san1:/etc/ctdb/events.d# diff -u 50.samba ~/50.samba.orig
--- 50.samba 2014-12-19 11:22:05.522193976 -0500
+++ /root/50.samba.orig 2014-12-19 11:21:46.602468765 -0500
@@ -14,8 +14,8 @@
   CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmb}
   ;;
  debian)
- CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-smbd}
- CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-nmbd}
+ CTDB_SERVICE_SMB=${CTDB_SERVICE_SMB:-samba}
+ CTDB_SERVICE_NMB=${CTDB_SERVICE_NMB:-""}
   ;;
  *)
   # Use redhat style as default:

root@san1:/etc/ctdb/events.d# ctdb status
Number of nodes:2
pnn:0 10.10.1.21 OK (THIS NODE)
pnn:1 10.10.1.22 UNHEALTHY
Generation:330683100
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1

Eric G (erickg) wrote :

There is a related Samba bug #1321369 open to address the second patch I attached. It seems that my patch is probably the preferred fix for now as /etc/init.d/samba is still buggy.

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1321369

Robert Sander (gurubert) wrote :

This patch makes nfs-kernel-server work with ctdb.

It rewrites the code to use /etc/default/nfs-kernel-server and the correct systemd service.

As there is a unit nfs-mountd.service that gets started as a requirement of nfs-kernel-server.service, and this unit only uses $RPCMOUNTDARGS, the setting in /etc/default/nfs-kernel-server should be:

 # MOUNTD_PORT=597
 RPCMOUNTDOPTS="-p 597"

The attachment "patch for Ubuntu 17.10" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Robert Sander (gurubert) wrote :

Additionally it is necessary to create a file /etc/modprobe.d/lockd.conf with this content:

 # Set the TCP port that the NFS lock manager should use.
 # port must be a valid TCP port value (1-65535).
 options lockd nlm_tcpport=599

 # Set the UDP port that the NFS lock manager should use.
 # port must be a valid UDP port value (1-65535).
 options lockd nlm_udpport=599

Lockd as a kernel process has to listen to the same port on every machine in the CTDB cluster.
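As a sanity check, the two port values can be parsed back out of the file so an admin script can assert they match on every node. A minimal sketch (the config content is inlined here for demonstration; on a real node you would read /etc/modprobe.d/lockd.conf):

```shell
# Parse the NLM ports out of lockd.conf-style content; both values
# must be identical on every CTDB node for lock failover to work.
conf='options lockd nlm_tcpport=599
options lockd nlm_udpport=599'
tcp=$(printf '%s\n' "$conf" | sed -n 's/.*nlm_tcpport=\([0-9]*\).*/\1/p')
udp=$(printf '%s\n' "$conf" | sed -n 's/.*nlm_udpport=\([0-9]*\).*/\1/p')
echo "nlm_tcpport=$tcp nlm_udpport=$udp"
```

On a live node the effective values can typically also be read from /sys/module/lockd/parameters/ once the module is loaded.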

tags: added: server-next
Robert Sander (gurubert) wrote :

Ubuntu 19.04 comes with ctdb 4.10 which changes the layout of the files.

Attached is a new patch against ctdb 4.10

Robert Sander (gurubert) wrote :

There are some additional variables to be set in /etc/default/nfs-kernel-server

NFS_HOSTNAME="servername"

RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"
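A quick way to verify a node has all of these set is to grep for each variable. A minimal sketch (the file content is inlined for demonstration, mirroring the list above; on a real node you would read /etc/default/nfs-kernel-server):

```shell
# Check that an /etc/default/nfs-kernel-server style file defines every
# variable listed above; report any that are missing.
conf='NFS_HOSTNAME="servername"
RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"'
missing=0
for v in NFS_HOSTNAME RQUOTAD_PORT LOCKD_UDPPORT LOCKD_TCPPORT \
         STATD_PORT STATD_OUTGOING_PORT STATD_HOSTNAME; do
    printf '%s\n' "$conf" | grep -q "^${v}=" || { echo "missing: $v"; missing=1; }
done
echo "missing=$missing"
```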

Changed in ctdb (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

I have reviewed this bug and read all CTDB documentation and my initial thoughts for a proper CTDB Ubuntu Enablement are:

1)

(not addressed here: should we have a default package dependency? There is none nowadays)

There is a mandatory dependency on a clustered file-system, since all file atomicity is guaranteed through the filesystem layer, not by CTDB. Existing options are:

a) GFS2 - Ubuntu has gfs2-utils and it is part of the Debian HA imports. It depends on having CLVM (clustered LVM2) running, changing the LVM2 locking type to 3. clvm in turn depends on the distributed lock manager running, and we had issues with that in the past (because of the redhat -> debianization). Check: https://bugs.launchpad.net/ubuntu/+source/dlm/+bug/1248054 . TODO: would have to make sure clvmd + dlm + lvm2 locking + gfs2 are good for Eoan (initially), Disco, Cosmic and Bionic (LTS) at least. No specific change is needed in the SMB or NFS config files.

b) Gluster - It is a straightforward installation/configuration with existing packages. It supports other interconnects (like InfiniBand). No specific change is needed in the SMB or NFS config files.

c) GPFS - It is proprietary (IBM) and could/should be enabled by IBM (possibly together with us, if ever intended).

d) OCFS2 - An open source and mature project, supported by the Ubuntu kernel and the ocfs2-tools package. Samba config file changes are needed:

vfs objects = fileid
fileid:algorithm = fsid

NFS does not need any apparent config file change.

- It is better if the clustered filesystem supports uniform device numbers across the nodes! (https://wiki.samba.org/index.php/Setting_up_a_cluster_filesystem#Checking_uniformity_of_device_and_inode_numbering)

e) LustreFS - Open source, but almost entirely dependent on Mellanox OFED packages (for InfiniBand support) and/or CentOS dkms (packages usually done by DDN). I know DDN/Whamcloud is working on upstreaming a kernel tree supporting LustreFS, and that could lead Debian to have LustreFS support, but that is not the case here.

2)

(partially addressed by comment #10)

NFS RPC service ports need to be bound statically to the same ports on all nodes (not the default). There is no proper decision yet on which ports those should be (replace some old service ports, like smalltalk?). Would this be changed during package installation? That might lead to problems on already existing NFS servers, for example.

- NFSv4 is not recommended and should be disabled whenever CTDB is enabled (would the install script recommend or force this?). The official documentation says:

"Unfortunately, RPCNFSDOPTS isn't used by Debian Sys-V init, so there is no way to disable NFSv4 via the configuration file"

We would have to re-verify the nfs-kernel-server systemd dependencies and (possibly?) set it to NFSv3 only (something like the work done in https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1590799). CTDB + NFS also depends on environment variables specific to CTDB (like the NFS hostname).

3)

https://ctdb.samba.org/manpages/ctdb.7.html:

PRIVATE and PUBLIC addresses + LVS + NATGW + POLICY ROUTING

CTDB configures network interfaces automatically based on decisions made by its internal recovery algorit...


Robert Sander (gurubert) wrote :

Please do not create a hard dependency on any of the mentioned cluster filesystems (or Ceph), as only one of them is needed to run CTDB.

The latest CTDB is also able to store the recovery lock as a RADOS object in Ceph. In addition, with the Ceph VFS module for Samba, CephFS is not even needed as a mount, for example.

Yes Robert,

I tried to document all the things needed to have it fully 'Ubuntu aware'. I do agree with you about the hard dependencies on clustered filesystems; they don't make much sense.

I also see the need for a warning about disabling NFSv4, a warning about (or a guarantee of) NFS RPC services being bound to the same static ports on all nodes, a warning about (or a guarantee of) the NFS service being stopped and disabled when CTDB is enabled, and the need to make sure CTDB scripts can start/stop networking and services (NFS in this particular case) with sysv and/or systemd scripts: which is what this bug is all about.

So in the end it might be a question of warning the user about those needs, or something similar (since disabling services and/or guaranteeing fixed RPC ports is likely not something to be done in postinstall scripts).

I'll review your patches today and try to propose something for Ubuntu Eoan.

Thanks for the report, and sorry it took so long for a full review.

Robert,

I'll verify your patch from comment #6, since the nfs-kernel-server dependency tree:

(k)inaddy@ctdbserver01:/lib/systemd$ systemctl list-dependencies nfs-kernel-server.service
nfs-kernel-server.service
● ├─auth-rpcgss-module.service
● ├─nfs-config.service
● ├─nfs-idmapd.service
● ├─nfs-mountd.service
● ├─proc-fs-nfsd.mount
● ├─rpc-svcgssd.service
● ├─rpcbind.socket
● ├─system.slice
● └─network.target

might be enough to guarantee that rpc.statd starts/stops together with nfsd when CTDB starts/stops the services on the nodes.

I also have to verify missing (or deactivated) environment variables (NEED_GSSD or NEED_SVCGSSD, for example) brought in by /etc/default/nfs-*. Nowadays, with systemd, these are read by the script /usr/lib/systemd/scripts/nfs-utils_env.sh, which generates an environment file in /run and is executed by nfs-config.service as a "oneshot" service whenever NFS is restarted (I will verify whether any of those would be changed by CTDB, for example). This covers your initial comment:

" ... ctdb is able to control NFS daemons, but it never looks for /etc/init.d/nfs-kernel-server for startup or /etc/default/nfs-kernel-server for settings ... ".

I'll have to think of something for the RPC ports :\. You said in comment #10 that this:

NFS_HOSTNAME="servername"

RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_PORT=595
STATD_OUTGOING_PORT=596
STATD_HOSTNAME="$NFS_HOSTNAME"

need to be included in /etc/default/nfs-kernel-server, possibly because CTDB needs those environment variables (apart from the "same ports on all nodes" need). If CTDB manipulates systemctl to start/stop the services, then I would have to make sure those are parsed by the nfs-config.service logic (nfs-config -> nfs-utils_env.sh -> /run/sysconfig/nfs-utils environment file).

That would work for the needed NFS_HOSTNAME environment variable. For the ports, I could create systemd overrides to set all RPC services to the same ports, or ship those settings commented out and tell the user to enable them (since touching that during package installation does not seem appropriate to me).
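The systemd override idea could look like the following drop-in. This is a hypothetical sketch: the unit name and option values would need to be checked against the actual nfs-utils units, which normally read their arguments from the nfs-config.service environment file.

```ini
# Hypothetical /etc/systemd/system/rpc-statd.service.d/static-ports.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/rpc.statd --no-notify -p 595 -o 596
```

After creating such a drop-in, a `systemctl daemon-reload` and a service restart would be required on every node.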

We have 2 other things:

/etc/modprobe.d/lockd.conf -> that is most likely a no-go, like the LOCKD_UDPPORT=599 approach. I mean, at least not automatically (possibly shipped commented out, telling the user to keep it consistent across all CTDB nodes).

And:

Disabling NFSv4 (and rpc.idmapd) could follow the same approach: provide information to the final user about not having it enabled (especially because installing CTDB would require all nfs-* services to be disabled anyway, so the CTDB postinst definitely can't activate systemd services by default, for example).

That is all I can think of to check/implement in Ubuntu Eoan. Please, let me know if there is anything else you can think of.

I'm testing all this in the following scenario:

2 gluster servers providing 1 volume to 3 ctdb servers in 1 network, and those 3 ctdb servers providing a NFS with LVS network IP to 2 NFS clients.

It is very likely that CTDB will need similar work to make sure samba, for example, is supported out-of-the-box by the Ubuntu CTDB package, like what we are doing here.

I'll get back to you soon.

Changed in samba (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
milestone: none → eoan-updates
no longer affects: ctdb (Ubuntu)

One of the issues is being addressed upstream:

https://lists.samba.org/archive/samba-technical/2019-June/133694.html

So we can avoid merge diffs about this in the future (and I can refer upstream patch for this particular fix).

Changed in ctdb (Debian):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

Future possible issue while ctdb is not merged to *at least* samba-4.9.7 (in Debian):

commit 022b9a6ca7d8cb6f541b1b24b27da4e1a3bea04b
Author: Martin Schwenke <email address hidden>
Date: Tue Mar 26 00:49:49 2019

    ctdb-scripts: Add test variable CTDB_NFS_DISTRO_STYLE

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>
    (cherry picked from commit e72c3c800a50fe746164e319e21180c44d041619)

/etc/ctdb/nfs-linux-kernel-callout :

TODO:
 - place non-upstream (?) patch (DEP3) inside debian/patches (quilt)

 # Red Hat
 # nfs_service="nfs"
 # nfslock_service="nfslock"
 # nfs_config="/etc/sysconfig/nfs"

 # SUSE
 # nfs_service="nfsserver"
 # nfslock_service=""
 # nfs_config="/etc/sysconfig/nfs"

 # Debian
 nfs_service="nfs-kernel-server"
 nfslock_service=""
 nfs_config="/etc/default/nfs-kernel-server"

Future possible issue while ctdb is not merged to *at least* samba-4.9.8 (in Debian):

commit 49fa08814e2a1032e88353eec42b952316d6ec18
Author: Martin Schwenke <email address hidden>
Date: Wed Mar 20 07:22:43 2019

    ctdb-scripts: Update statd-callout to try several configuration files

    The alternative seems to be to try something via CTDB_NFS_CALLOUT.
    That would be complicated and seems like overkill for something this
    simple.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>
    (cherry picked from commit a2bd4085896804ee2da811e17f18c78a5bf4e658)

BUT, still, that isn't appropriate, since the Debian systemd NFS server unit is called either nfs-kernel-server.service (or its alias nfs-server.service) OR nfs-ganesha.service (if using the userland NFS server).

/etc/ctdb/statd-callout:

 - place non-upstream (?) patch (DEP3) inside debian/patches (quilt)

 load_system_config "nfs-kernel-server"

        ...

 ############################################################

 # ctdb_setup_state_dir "service" "nfs"
 ctdb_setup_state_dir "service" "nfs-kernel-server"

TODO:
 - ctdb has to depend on nfs-common (/usr/lib/systemd/scripts/nfs-utils_env.sh)
 - ctdb has to update /usr/lib/systemd/scripts/nfs-utils_env.sh -> NFS_HOSTNAME=nodename
 - /etc/default/nfs-common has to be updated -> NFS_HOSTNAME=nodename

TODO:
 - ctdb.postinst and ctdbpostrm to create /var/lib/ctdb{volatile,persistent,state}
 - tmpfiles.d to create /run/ctdb
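For the tmpfiles.d item above, a minimal fragment could look like this (path, mode and ownership are assumptions, not the final packaging):

```
# hypothetical /usr/lib/tmpfiles.d/ctdb.conf: create /run/ctdb at boot
d /run/ctdb 0755 root root -
```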

/etc/ctdb/events/legacy/60.nfs.script: 1: eval: rpc.rquotad: not found

legacy/60.nfs.script needs rpc.rquotad.

ctdb needs to depend on:

(hit enter too fast)

- nfs-kernel-server
- nfs-common
- quota

So the NFS legacy events won't run into problems when they are enabled. Note that those are not the only events supported by CTDB:

(k)inaddy@ctdbserver01:/etc/ctdb$ ctdb event script list legacy
* 00.ctdb
* 01.reclock
* 05.system
* 06.nfs
* 10.interface
  11.natgw
  11.routing
  13.per_ip_routing
  20.multipathd
  31.clamd
  40.vsftpd
  41.httpd
  49.winbind
  50.samba
* 60.nfs
  70.iscsi
  91.lvs

Instead of depending on multipath-tools, clamd, vsftpd, apache, smb, iscsi and nfs-kernel-server... we could turn those dependencies into Recommends. This way they are installed together, BUT if --no-install-recommends is passed they are not. I would start by adding only the 3 NFS-related services, since I'm focused on enabling NFS here.
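Sketched as a debian/control excerpt (illustrative only, not the actual packaging change):

```
Package: ctdb
Recommends: nfs-kernel-server, nfs-common, quota
```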

Following error:

2019/05/31 18:21:38.648661 ctdbd[1000]: Starting traverse on DB ctdb.tdb (id 806)
2019/05/31 18:21:38.651601 ctdbd[1000]: Ending traverse on DB ctdb.tdb (id 806), records 0
2019/05/31 18:21:40.185388 ctdb-eventd[1002]: 60.nfs: ss: bison bellows (while parsing filter): "syntax error!" Sorry.
2019/05/31 18:21:40.185421 ctdb-eventd[1002]: 60.nfs: Usage: ss [ OPTIONS ]
2019/05/31 18:21:40.185433 ctdb-eventd[1002]: 60.nfs: ss [ OPTIONS ] [ FILTER ]
2019/05/31 18:21:40.185441 ctdb-eventd[1002]: 60.nfs: -h, --help this message
2019/05/31 18:21:40.185449 ctdb-eventd[1002]: 60.nfs: -V, --version output version information

PROBLEM:

########################################################
# tickle handling
########################################################

update_tickles ()
{
 ...

    # the parentheses can't be empty! -> BROKEN:

    ss -tn state established \
       "${_ip_filter:+( ${_ip_filter} )}" \
       "${_port_filter:+( ${_port_filter} )}" |
    awk 'NR > 1 {print $4, $3}' |
    sort >"$_my_connections"

HERE:

60.nfs: + _port=2049
60.nfs: + tickledir=/var/lib/ctdb/scripts/tickles
60.nfs: + mkdir -p /var/lib/ctdb/scripts/tickles
60.nfs: + ctdb_get_pnn
60.nfs: + _pnn_file=/var/lib/ctdb/scripts/my-pnn
60.nfs: + [ ! -f /var/lib/ctdb/scripts/my-pnn ]
60.nfs: + cat /var/lib/ctdb/scripts/my-pnn
60.nfs: + _pnn=0
60.nfs: + /usr/bin/ctdb -X ip
60.nfs: + awk -F| -v pnn=0 $3 == pnn {print $2}
60.nfs: + _ips=172.16.17.3
60.nfs: + _ip_filter= ok
60.nfs: + _ip_filter=src [172.16.17.3]
60.nfs: + _port_filter=sport == :2049
60.nfs: + _my_connections=/var/lib/ctdb/scripts/tickles/2049.connections.12623
60.nfs: + ss -tn state established ( src [172.16.17.3] ) ( sport == :2049
)
2019/05/31 18:44:35.631800 ctdb-eventd[12050]: 60.nfs: + awk NR > 1 {print $4, $3}

          ss -tn state established "src 172.16.17.3 sport == :2049"

Looks like using "( ... )" syntax for ss was no good.

Changing update_tickles () function to have:

    ss -tn state established "${_ip_filter} ${_port_filter}" | awk 'NR > 1 {print $4, $3}' | sort >"$_my_connections"

instead, fixes the issue.
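A minimal sketch of the difference (hypothetical filter values): the fix passes both filters as one plain string instead of wrapping them in the "( ... )" grouping that the older iproute2 in Ubuntu cannot parse.

```shell
# Filter construction as in the fixed update_tickles(): plain
# concatenation, no "( ... )" grouping around each part.
_ip_filter="src 172.16.17.3"
_port_filter="sport == :2049"
filter="${_ip_filter} ${_port_filter}"
# The real script then pipes this into ss:
#   ss -tn state established "$filter" | awk 'NR > 1 {print $4, $3}' | sort
echo "$filter"
```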

Example of appendix to /etc/services to enable CTDB NFS HA:

/etc/services: (append)

rpc.nfsd 2049/tcp # RPC nfsd
rpc.nfsd 2049/udp # RPC nfsd
rpc.nfs-cb 32764/tcp # RPC nfs callback
rpc.nfs-cb 32764/udp # RPC nfs callback
rpc.statd-bc 32765/tcp # RPC statd broadcast
rpc.statd-bc 32765/udp # RPC statd broadcast
rpc.statd 32766/tcp # RPC statd listen
rpc.statd 32766/udp # RPC statd listen
rpc.mountd 32767/tcp # RPC mountd
rpc.mountd 32767/udp # RPC mountd
rpc.lockd 32768/tcp # RPC lockd/nlockmgr
rpc.lockd 32768/udp # RPC lockd/nlockmgr
rpc.quotad 32769/tcp # RPC quotad
rpc.quotad 32769/udp # RPC quotad

TODO:
 - provide /etc/ctdb/services.example
 - provide instructions to append /etc/ctdb/services.example to /etc/services
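The append step could be guarded so it is idempotent. A hedged sketch, demonstrated on temporary files so it is safe to run anywhere (the real target would be /etc/services and the proposed /etc/ctdb/services.example):

```shell
# Append the example services only if they are not already present;
# running the snippet twice must not duplicate the entries.
services=$(mktemp)
example=$(mktemp)
printf 'ssh 22/tcp\n' > "$services"
printf 'rpc.lockd 32768/tcp # RPC lockd/nlockmgr\n' > "$example"
grep -q '^rpc\.lockd' "$services" || cat "$example" >> "$services"
grep -q '^rpc\.lockd' "$services" || cat "$example" >> "$services"  # no-op second time
count=$(grep -c '^rpc\.lockd' "$services")
echo "rpc.lockd entries: $count"
rm -f "$services" "$example"
```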

Example of /etc/default/nfs-common file to enable CTDB NFS HA:

## /etc/default/nfs-common

NFS_HOSTNAME="ctdbserver02.public" <- $HOSTNAME

# rpc.statd - daemon listening for reboot notifications (locks related)
NEED_STATD="yes"
STATDOPTS="-n ${NFS_HOSTNAME} -p 32765 -o 32766 -H /etc/ctdb/statd-callout -T 32768 -U 32768"
STATD_HOSTNAME="$NFS_HOSTNAME"

# rpc.gssd - security context for rpc connections
NEED_GSSD="no"

# rpc.idmapd - NFSv4 <-> name mapping daemon (fallback nowadays)
# recent kernels use nfsidmap(8) instead
NEED_IDMAPD="no"

# rpc.quota - usage quota
RPCRQUOTADOPTS="-p 32769"

## end of file

TODO:
 - provide a /etc/ctdb/nfs-common.example
 - provide instructions to replace /etc/default/nfs-common with the example

Example of /etc/default/nfs-kernel file to enable CTDB NFS HA:

## /etc/default/nfs-kernel-server

NFS_HOSTNAME="ctdbserver02.public" <- $HOSTNAME

# rpc.nfsd - user level part of nfs service (kernel: nfsd module)
RPCNFSDPRIORITY=0
RPCNFSDCOUNT=8
RPCNFSDOPTS="-N 4"

# rpc.mountd - server side of nfs mount protocol
RPCMOUNTDOPTS="-p 32767 --manage-gids --no-nfs-version 4"

# rpc.svcgssd - userspace daemon to handle security context for kernel rpcsec_gss
NEED_SVCGSSD="no"
RPCSVCGSSDOPTS=""

## end of file

TODO:
 - provide a /etc/ctdb/nfs-kernel-server.example
 - provide instructions to replace /etc/default/nfs-kernel-server with the example

(k)inaddy@ctdbserver02:/etc/sysctl.d$ cat 98-nfs-static-ports.conf
fs.nfs.nfs_callback_tcpport = 32764
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768

TODO:
 - provide a /etc/ctdb/98-nfs-static-ports-sysctl.conf.example
 - provide instructions to place it as /etc/sysctl.d/98-nfs-static-ports.conf
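Applying the fragment for real would be `sysctl -p /etc/sysctl.d/98-nfs-static-ports.conf` (or a reboot). As a dry run, the fragment can be expanded into the equivalent sysctl -w commands:

```shell
# Expand a sysctl fragment into individual "sysctl -w" commands
# (dry run only; nothing is applied).
frag='fs.nfs.nfs_callback_tcpport = 32764
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768'
cmds=$(printf '%s\n' "$frag" | awk -F' *= *' '{print "sysctl -w " $1 "=" $2}')
printf '%s\n' "$cmds"
```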

With all those changes, which I'll provide either by suggesting upstream patches and backporting them to Debian Unstable -> Ubuntu Devel, or by carrying them as regular Debian package patches, I was able to make CTDB NFS HA work flawlessly:

(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver01.public:/mnt/glusterfs/data /mnt/ctdbserver01
(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver02.public:/mnt/glusterfs/data /mnt/ctdbserver02
(k)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdbserver03.public:/mnt/glusterfs/data /mnt/ctdbserver03

(k)inaddy@ctdbclient01:~$ while true; do sleep 2; dd if=/dev/random of=/mnt/ctdbserver01/file bs=1k count=2 ; dd if=/dev/random of=/mnt/ctdbserver02/file bs=1k count=2; dd if=/dev/random of=/mnt/ctdbserver03/file bs=1k count=2; done

0+2 records in
0+2 records out
160 bytes copied, 0.0127338 s, 12.6 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.00846823 s, 18.7 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0096586 s, 16.4 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.01485 s, 10.6 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0134006 s, 11.8 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.00700728 s, 22.5 kB/s
0+2 records in
0+2 records out
158 bytes copied, 0.0109971 s, 14.4 kB/s
0+2 records in
0+2 records out

During a failure in one of the ctdbservers (ctdbserver03):

(k)inaddy@ctdbserver02:~$ ctdb status
Number of nodes:3
pnn:0 172.16.9.1 OK
pnn:1 172.16.9.2 OK (THIS NODE)
pnn:2 172.16.9.3 DISCONNECTED|UNHEALTHY|STOPPED|INACTIVE
Generation:92168728
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

And the public addresses were correctly set on interface "eth1", as they were supposed to be:

(k)inaddy@ctdbserver01:~$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1c:31:c3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.17.1/24 brd 172.16.17.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.17.3/24 brd 172.16.17.255 scope global secondary eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1c:31c3/64 scope link
       valid_lft forever preferred_lft forever

(k)inaddy@ctdbserver02:~$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:af:f8:53 brd ff:ff:ff:ff:ff:ff
    inet 172.16.17.2/24 brd 172.16.17.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feaf:f853/64 scope link
       valid_lft forever preferred_lft forever

according to the variable NFS_HOSTNAME. Note that ctdbserver01 has its own public IP address plus the IP address of ctdbserver03, which I failed on purpose. The NFS client kept its access, as it should.

For the bad ss syntax in /ctdb/config/functions: update_tickles(), I have opened the following upstream bug:

https://bugzilla.samba.org/show_bug.cgi?id=13985

And provided the following tested patch:

https://lists.samba.org/archive/samba-technical/2019-June/133701.html

Waiting for upstream acceptance (like the other one) so I can ask Debian to accept my delta (considering they won't merge to the latest upstream version).


For this last "ss" syntax issue, I have opened a bug in samba upstream project and the most important comment, that explains what is happening, is this one:

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13985

COMMENT: https://bugzilla.samba.org/show_bug.cgi?id=13985#c4

"""
Hello Martin,

Errrr, that puzzled me now, my workstation is a debian sid and, for obvious reasons, all my development environment is Ubuntu... I re-checked "ss" execution in Debian and it worked :\, went back to Ubuntu Eoan and it didn't.

inaddy@workstation:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 172.16.0.3:22 172.16.0.6:53580
0 0 172.16.0.3:22 172.16.0.6:62337
0 0 172.16.0.3:22 172.16.0.6:53587
...

(k)inaddy@ctdbserver01:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help this message
   -V, --version output version information
   -n, --numeric don't resolve service names
   -r, --resolve resolve host names
   -a, --all display all sockets
...

Checking different versions:

inaddy@workstation:~$ rmadison iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.16.0-2|oldstable
iproute2|4.9.0-1+deb9u1|stable
iproute2|4.9.0-1+deb9u1|stable-debug
iproute2|4.14.1-1~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports-debug
iproute2|4.20.0-2|testing
iproute2|4.20.0-2|unstable
iproute2|4.20.0-2|unstable-debug
iproute2|5.1.0-1|experimental
iproute2|5.1.0-1|experimental-debug

inaddy@workstation:~$ rmad iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.12.0-2|trusty
iproute2|3.12.0-2ubuntu1.2|trusty-updates
iproute2|4.3.0-1ubuntu3|xenial
iproute2|4.3.0-1ubuntu3.16.04.5|xenial-updates
iproute2|4.15.0-2ubuntu1|bionic
iproute2|4.18.0-1ubuntu2~ubuntu18.04.1|bionic-backports
iproute2|4.18.0-1ubuntu2|cosmic
iproute2|4.18.0-1ubuntu2|disco
iproute2|4.18.0-1ubuntu2|eoan

Since Ubuntu has been using an older version for quite a while, I think the latest change to this line broke compatibility with older versions. That, per se, could justify this patch.

commit 04fe9e20749985c71fef1bce7f6e4c439fe11c81
Author: Martin Schwenke <email address hidden>
Date: Thu Aug 27 00:22:49 2015

    ctdb-scripts: Use ss instead of netstat for finding TCP connections

    ss with a filter is much faster than post-processing output from
    netstat. CTDB already has a hard dependency on iproute2 for IP
    address handling, so depending on ss is no big deal.

    Signed-off-by: Martin Schwenke <email address hidden>
    Reviewed-by: Amitay Isaacs <email address hidden>

This is the change that introduced this "ss" code, and I believe it has been broken in Ubuntu since then. Actually, I'm not aware that CTDB ever worked in Ubuntu the way it should; that's why I'm working on the following bugs:

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/722201
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1821775
https://bugs.launchpad.net/u...


For the iproute2 issue, Martin (samba upstream) and I found out that iproute2 seems to be broken in Cosmic, Disco and Eoan. I have opened the following bug:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775

And will bisect iproute2 upstream code to propose a SRU for it.

With that, the following upstream samba bug:

https://bugzilla.samba.org/show_bug.cgi?id=13985

Would be solved by the Ubuntu issue resolution on iproute2.

CTDB will definitely depend on SRUs of:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775/comments/3

For Cosmic and Disco. Eoan will be merged to iproute2 latest.

 I just realized bionic-backports has an affected (broken) version of iproute2 (backported from cosmic). That should also be fixed.

I have created the following PPA with all fixes:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1831381

And the following git repo:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/samba/+git/samba

It contains all the commits I'm going to propose tomorrow.

Because of git-ubuntu, I had to rebase all the code I had tested before and knew was working well, so I want to give it one last test before asking for a merge request to Eoan, suggesting it to Debian, and cherry-picking to Disco, Cosmic and Bionic.

@Robert,

Feel free to test this PPA if you like (it's Eoan-only for now). If not, I'll test all the others anyway. Note that I have created a small script to "enable" the NFS HA nodes and make our lives easier. I'll create proper documentation on how to use it as well, but it's very straightforward if you execute:

$ cd /etc/ctdb/examples
$ ./enable-nfs.sh

Changed in samba (Ubuntu Disco):
status: New → Confirmed
Changed in samba (Ubuntu Cosmic):
status: New → Confirmed
Changed in samba (Ubuntu Bionic):
status: New → Confirmed
Changed in samba (Ubuntu Disco):
importance: Undecided → Medium
Changed in samba (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in samba (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in samba (Ubuntu Cosmic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in samba (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in ctdb (Debian):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
importance: Undecided → Unknown
status: New → Unknown
Changed in ctdb (Debian):
status: Unknown → Confirmed

The fix (merge request) is being discussed and will likely be merged for Eoan soon. I'll work on the Bionic and Disco backports (SRU) right after.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package samba - 2:4.10.0+dfsg-0ubuntu5

---------------
samba (2:4.10.0+dfsg-0ubuntu5) eoan; urgency=medium

  * debian/rules: Make DEB_HOST_ARCH_CPU initialized through
    dpkg-architecture (Closes: #931138)
  * d/p/ctdb-scripts-fix-tcp_tw_recycle-existence-check.patch:
    fix tcp_tw_recycle existence check. (LP: #722201)
  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change nfs service name from nfs to nfs-kernel-server
    (LP: #722201)
  * d/ctdb.install, d/rules: create ctdb run directory into tmpfiles.d
    to allow pid file to exist (LP: #1821775)
  * Allow proper ctdb initialization (LP: #1828799):
    - d/ctdb.dirs: added /var/lib/ctdb/* directories
    - d/ctdb.postrm: remove leftovers from:
      /var/lib/ctdb/{state,persistent,volatile,scripts}
  * d/rules: installing provided config examples and helper scripts
  * Examples of NFS HA CTDB config files + helper script:
    - d/ctdb.example.enable.nfs.sh
    - d/ctdb.example.nfs-common
    - d/ctdb.example.nfs-kernel-server
    - d/ctdb.example.services
    - d/ctdb.example.sysctl-nfs-static-ports.conf
  * d/p/ctdb-config-depend-on-etc-default-nodes-file.patch:
    do not try to start daemon if /etc/ctdb/nodes does not exist
  * d/p/ctdb-config-enable-syslog-by-default.patch:
    enable syslog and systemd journal by default

 -- Rafael David Tinoco <email address hidden> Fri, 28 Jun 2019 00:14:27 +0000
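The tcp_tw_recycle fix in the changelog above addresses a common pattern: that sysctl was removed in Linux 4.12, so writing to it unconditionally breaks event scripts on newer kernels. The sketch below illustrates the existence-check pattern (the exact patch may differ; the demo writes to a temp directory instead of /proc for safety). The companion fix for LP: #1821775 similarly ships a tmpfiles.d entry so the PID directory exists at boot; a typical entry of that shape is `d /run/ctdb 0755 root root -`.

```shell
# Guarded sysctl write: only touch the knob if this kernel exposes it.
set -u
set_sysctl_if_present() {
    # $1 = /proc/sys path, $2 = value to write
    if [ -f "$1" ]; then
        echo "$2" > "$1"
        echo "set $1=$2"
    else
        echo "skip $1 (not present on this kernel)"
    fi
}

# Demonstrate with a stand-in path; on a real system the target would be
# /proc/sys/net/ipv4/tcp_tw_recycle (absent since Linux 4.12).
tmp=$(mktemp -d)
touch "$tmp/tcp_tw_recycle"
set_sysctl_if_present "$tmp/tcp_tw_recycle" 0
set_sysctl_if_present "$tmp/does_not_exist" 0
rm -rf "$tmp"
```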

Changed in samba (Ubuntu Eoan):
status: In Progress → Fix Released
Changed in samba (Ubuntu Cosmic):
status: Confirmed → Won't Fix

Okay, for the Eoan merge, I forgot to fix script 06.nfs (just as 60.nfs was fixed) in the following patch:

  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change nfs service name from nfs to nfs-kernel-server

I'm creating a new MR to include this fix, and the SRUs will contain both.

Changed in samba (Ubuntu Eoan):
status: Fix Released → In Progress

Following up on my last comment: this was fixed in the following version:

samba (2:4.10.0+dfsg-0ubuntu6) eoan; urgency=medium

  * d/p/fix-nfs-service-name-to-nfs-kernel-server.patch:
    change service name from nfs to nfs-kernel-server in
    legacy script 06.nfs.script also (LP: #722201)

 -- Rafael David Tinoco <email address hidden> Thu, 11 Jul 2019 21:44:49 +0000

Changed in samba (Ubuntu Eoan):
status: In Progress → Fix Released

I'm marking Bionic as Won't Fix since it ships a fairly old CTDB implementation, and in my opinion the effort of porting all the changes I've made in upstream/Sid/Eoan/Disco is not worth it for Bionic. I'm open to discussion if users rely on Bionic's CTDB for NFS HA and can't make it work with the old documentation/code.

Changed in samba (Ubuntu Bionic):
status: Confirmed → Won't Fix
description: updated

I have done the SRU for Disco, but right now I'm facing:

Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: Usage: ss [ OPTIONS ]
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: ss [ OPTIONS ] [ FILTER ]
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: -h, --help this message
Jul 12 03:03:54 ctdbdisco ctdb-eventd[30929]: 60.nfs: -V, --version output version information

These errors occurred during tests from this PPA:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp722201

They are caused by the iproute2 ss issue:

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775

As soon as that SRU lands in Disco, CTDB will be ready to be SRUed on Disco as well.
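For context, CTDB's NFS event script queries established TCP connections with an ss filter expression, and the broken iproute2 builds fail to parse such filters (hence the "bison bellows ... syntax error" in the log above). A minimal probe of that behavior, assuming a filter shape like the one CTDB uses (the exact CTDB filter may differ):

```shell
# Probe whether ss can parse a state+port filter on this system.
check_nfs_conns() {
    if ! command -v ss >/dev/null 2>&1; then
        echo "ss not installed"
        return 0
    fi
    # List established TCP connections on the NFS port (2049).
    # Broken iproute2 builds reject the filter with a parse error.
    if ! ss -tn state established "( sport = :2049 )" 2>/dev/null; then
        echo "ss filter parse failed (see LP: #1831775)"
    fi
}
check_nfs_conns
```

On a fixed iproute2, this prints the socket table header (and any matching connections); on an affected build it reports the parse failure instead.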

Changed in ctdb (Debian):
status: Confirmed → Fix Released
Changed in samba (Ubuntu Disco):
status: Confirmed → In Progress
Changed in samba (Ubuntu Eoan):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in samba (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in samba (Ubuntu Cosmic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
affects: ctdb (Debian) → samba (Debian)
description: updated

I just re-pushed the same merge request with the changes flagged in the MR review. Thanks a lot, Christian and Andreas, for the review. I think this is ready for sponsorship and SRU.

The Disco merge request was approved by the Canonical Server Team. Waiting on sponsorship for Disco. Thanks a lot for the reviews!

Hello Kirill, or anyone else affected,

Accepted samba into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/samba/2:4.10.0+dfsg-0ubuntu2.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in samba (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
