2021-04-26 09:36:38 |
Martijn van Duren |
bug |
|
|
added bug |
2021-04-26 09:36:38 |
Martijn van Duren |
attachment added |
|
dhclient strace and pcap https://bugs.launchpad.net/bugs/1926139/+attachment/5492409/+files/dhcp.tar.bz2 |
|
2021-04-26 11:37:24 |
Martijn van Duren |
description |
Platform: Qemu/libvirt on AMD64
Ubuntu version: 20.04
isc-dhcp-client version: 4.4.1-2.1ubuntu5
Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered.
Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related.
Attached diff shows a strace of all threads and a pcap showing the tcpdump output. |
Platform: Qemu/libvirt on AMD64
Ubuntu version: 20.04
isc-dhcp-client version: 4.4.1-2.1ubuntu5
Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered.
Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related.
Attached diff shows a strace of all threads and a pcap showing the tcpdump output.
Edit:
- Sometimes the dhclient command does receive the OFFER packet and connection is restored.
- In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. |
|
2022-05-26 12:03:19 |
Chris Patterson |
attachment added |
|
parallel test https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh |
|
2022-05-26 12:03:46 |
Chris Patterson |
attachment added |
|
sequential test https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593046/+files/test-sequential.sh |
|
2022-12-14 23:13:55 |
Matthew Ruffell |
bug |
|
|
added subscriber Matthew Ruffell |
2022-12-15 02:18:34 |
Jay Vosburgh |
bug |
|
|
added subscriber Jay Vosburgh |
2023-01-16 03:43:31 |
Matthew Ruffell |
attachment added |
|
Debdiff for bind9-libs for Focal https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641515/+files/lp1926139_focal.debdiff |
|
2023-01-16 03:44:17 |
Matthew Ruffell |
attachment added |
|
Debdiff for bind9-libs for Jammy https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641516/+files/lp1926139_jammy.debdiff |
|
2023-01-16 03:44:42 |
Matthew Ruffell |
tags |
|
focal jammy sts |
|
2023-01-16 03:45:32 |
Matthew Ruffell |
bug task added |
|
bind9-libs (Ubuntu) |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Jammy |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
bug task added |
|
isc-dhcp (Ubuntu Jammy) |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
bug task added |
|
bind9-libs (Ubuntu Jammy) |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Focal |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
bug task added |
|
isc-dhcp (Ubuntu Focal) |
|
2023-01-16 03:45:50 |
Matthew Ruffell |
bug task added |
|
bind9-libs (Ubuntu Focal) |
|
2023-01-16 03:46:04 |
Matthew Ruffell |
bug task deleted |
isc-dhcp (Ubuntu Focal) |
|
|
2023-01-16 03:46:08 |
Matthew Ruffell |
bug task deleted |
isc-dhcp (Ubuntu Jammy) |
|
|
2023-01-16 03:46:13 |
Matthew Ruffell |
isc-dhcp (Ubuntu): status |
New |
Invalid |
|
2023-01-16 03:46:19 |
Matthew Ruffell |
bind9-libs (Ubuntu Focal): status |
New |
In Progress |
|
2023-01-16 03:46:21 |
Matthew Ruffell |
bind9-libs (Ubuntu Jammy): status |
New |
In Progress |
|
2023-01-16 03:46:24 |
Matthew Ruffell |
bind9-libs (Ubuntu): status |
New |
Fix Released |
|
2023-01-16 03:46:33 |
Matthew Ruffell |
bind9-libs (Ubuntu Focal): importance |
Undecided |
High |
|
2023-01-16 03:46:35 |
Matthew Ruffell |
bind9-libs (Ubuntu Jammy): importance |
Undecided |
High |
|
2023-01-16 03:46:37 |
Matthew Ruffell |
bind9-libs (Ubuntu Focal): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-01-16 03:46:39 |
Matthew Ruffell |
bind9-libs (Ubuntu Jammy): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-01-16 04:00:34 |
Matthew Ruffell |
summary |
dhclient doesn't receive dhcp offer from kernel |
dhclient: thread concurrency race leads to DHCPOFFER packets not being received |
|
2023-01-17 03:11:10 |
Matthew Ruffell |
description |
Platform: Qemu/libvirt on AMD64
Ubuntu version: 20.04
isc-dhcp-client version: 4.4.1-2.1ubuntu5
Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered.
Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related.
Attached diff shows a strace of all threads and a pcap showing the tcpdump output.
Edit:
- Sometimes the dhclient command does receive the OFFER packet and connection is restored.
- In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump:
Screenshot of Wireshark:
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded, back to single threaded as intended by dhclient maintainers.
In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 library it uses, which is already configured properly to --disable-threads.
Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again.
[Testcase]
Start a fresh Focal or Jammy instance.
Download and set executable test-parallel.sh, and edit some lines:
1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
2) chmod +x test-parallel.sh
3) vim test-parallel.sh
Change iface="enp5s0" to your interface, likely iface="enp1s0".
Comment out the line "# cp bionic-dhclient $workdir/dhclient".
4) sudo ./test-parallel.sh
After five minutes, if you issue reproduces, you will see "TEST FAILED".
You can watch the output with:
5) cat /tmp/dhclient-* | less
Next, for instrumented runs, you need to build dhclient from source.
1) sudo apt install build-essential devscripts
2) apt source isc-dhcp
3) sudo apt build-dep isc-dhcp
4) cd isc-dhcp
Apply the below patch:
https://paste.ubuntu.com/p/hGsssrVyG4/
5) patch -p1 < ~/patch.patch
6) debuild -b -uc -us
7) cd ..
8) sudo dpkg -i isc-dhcp-client-*
9) sudo ./test-parallel.sh
10) cat /tmp/dhclient-* | less
Look for the race, as described in "Other Info", namely:
mruffell: registering with socket manager
mruffell: callback called
mruffell: omapi object is NULL
mruffell: omapi object is NULL
mruffell: Adding obj to linked list
mruffell: Obj added to list
The issue has reproduced.
If you install the test package from the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test
Instructions to install (on a Focal or Jammy system):
1) sudo add-apt-repository ppa:mruffell/sf337873-test
2) sudo apt update
3) sudo apt install libdns-export1109 libisc-export1105
4) sudo apt-cache policy libisc-export1105 | grep Installed
Installed: 1:9.11.16+dfsg-3~ubuntu1+sf337873v20230116b2
You will notice the issue no longer reproduces.
You can also check to see if dhclient runs as multithreaded by:
$ ps aux | grep dhclient
<look for PID>
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
When single threaded, the output should be:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
[Where problems could occur]
We are changing the configuration of bind9-libs for Focal and Jammy from a net --enable-threads, to --disable-threads. This will remove multithreading for bind9-libs and all applications that link to it.
bind9-libs only services isc-dhcp, as per apt rdepends in the "Other Info" section, and the package was only introduced as a way to get isc-dhcp to build with older bind9 libs, and avoid having bind9 in-tree.
In Kinetic onward, the bind9 libs are shipped in-tree for isc-dhcp, and bind9-libs was removed from the archive.
Since we are only affecting isc-dhcp, we won't affect users of other packages, unless they for some reason decided to link against bind9-libs instead of the actual bind9 in the archive.
dhclient was never written for thread concurrency, shown by the lack of any synchronisation primitives whatsoever, and won't be negatively affected by going back to being single threaded, as it is in all other Ubuntu releases.
We are editing the symbol files to remove exported library calls. I have manually grep'd the isc-dhcp source, and those functions are not used, but for any users that got confused and link against bind9-libs, they might be affected by the change. For them, they need to link against the actual bind9 in the archive.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
2) was the plan, but when I think about it, if dhclient has such a basic thread concurrency issue, and has had it for 13 years without anyone noticing, it is a fair chance that this will not be the only thread concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
I think if we fix the problem, another issue will crop up in six months time, and it will be another concurrency issue.
Now, there is a small issue that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now.
In which case, 1) to disabling threading seems better.
But it seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
Since isc-dhcp is dynamically linked to libdns-export1109 and libisc-export1105, we would probably not even need to rebuild isc-dhcp, and instead, only a fix to bind9-libs would be needed. |
|
2023-01-17 03:15:26 |
Matthew Ruffell |
attachment added |
|
Packet capture during reproduction https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap |
|
2023-01-17 03:16:23 |
Matthew Ruffell |
attachment added |
|
Screenshot of wireshark https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png |
|
2023-01-17 03:17:04 |
Matthew Ruffell |
description |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump:
Screenshot of Wireshark:
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded, back to single threaded as intended by dhclient maintainers.
In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 library it uses, which is already configured properly to --disable-threads.
Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again.
[Testcase]
Start a fresh Focal or Jammy instance.
Download and set executable test-parallel.sh, and edit some lines:
1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
2) chmod +x test-parallel.sh
3) vim test-parallel.sh
Change iface="enp5s0" to your interface, likely iface="enp1s0".
Comment out the line "# cp bionic-dhclient $workdir/dhclient".
4) sudo ./test-parallel.sh
After five minutes, if you issue reproduces, you will see "TEST FAILED".
You can watch the output with:
5) cat /tmp/dhclient-* | less
Next, for instrumented runs, you need to build dhclient from source.
1) sudo apt install build-essential devscripts
2) apt source isc-dhcp
3) sudo apt build-dep isc-dhcp
4) cd isc-dhcp
Apply the below patch:
https://paste.ubuntu.com/p/hGsssrVyG4/
5) patch -p1 < ~/patch.patch
6) debuild -b -uc -us
7) cd ..
8) sudo dpkg -i isc-dhcp-client-*
9) sudo ./test-parallel.sh
10) cat /tmp/dhclient-* | less
Look for the race, as described in "Other Info", namely:
mruffell: registering with socket manager
mruffell: callback called
mruffell: omapi object is NULL
mruffell: omapi object is NULL
mruffell: Adding obj to linked list
mruffell: Obj added to list
The issue has reproduced.
If you install the test package from the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test
Instructions to install (on a Focal or Jammy system):
1) sudo add-apt-repository ppa:mruffell/sf337873-test
2) sudo apt update
3) sudo apt install libdns-export1109 libisc-export1105
4) sudo apt-cache policy libisc-export1105 | grep Installed
Installed: 1:9.11.16+dfsg-3~ubuntu1+sf337873v20230116b2
You will notice the issue no longer reproduces.
You can also check to see if dhclient runs as multithreaded by:
$ ps aux | grep dhclient
<look for PID>
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
When single threaded, the output should be:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
[Where problems could occur]
We are changing the configuration of bind9-libs for Focal and Jammy from a net --enable-threads, to --disable-threads. This will remove multithreading for bind9-libs and all applications that link to it.
bind9-libs only services isc-dhcp, as per apt rdepends in the "Other Info" section, and the package was only introduced as a way to get isc-dhcp to build with older bind9 libs, and avoid having bind9 in-tree.
In Kinetic onward, the bind9 libs are shipped in-tree for isc-dhcp, and bind9-libs was removed from the archive.
Since we are only affecting isc-dhcp, we won't affect users of other packages, unless they for some reason decided to link against bind9-libs instead of the actual bind9 in the archive.
dhclient was never written for thread concurrency, shown by the lack of any synchronisation primitives whatsoever, and won't be negatively affected by going back to being single threaded, as it is in all other Ubuntu releases.
We are editing the symbol files to remove exported library calls. I have manually grep'd the isc-dhcp source, and those functions are not used, but for any users that got confused and link against bind9-libs, they might be affected by the change. For them, they need to link against the actual bind9 in the archive.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
2) was the plan, but when I think about it, if dhclient has such a basic thread concurrency issue, and has had it for 13 years without anyone noticing, it is a fair chance that this will not be the only thread concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
I think if we fix the problem, another issue will crop up in six months time, and it will be another concurrency issue.
Now, there is a small issue that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now.
In which case, 1) to disabling threading seems better.
But it seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
Since isc-dhcp is dynamically linked to libdns-export1109 and libisc-export1105, we would probably not even need to rebuild isc-dhcp, and instead, only a fix to bind9-libs would be needed. |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap
Screenshot of Wireshark: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded, back to single threaded as intended by dhclient maintainers.
In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 library it uses, which is already configured properly to --disable-threads.
Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again.
[Testcase]
Start a fresh Focal or Jammy instance.
Download and set executable test-parallel.sh, and edit some lines:
1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
2) chmod +x test-parallel.sh
3) vim test-parallel.sh
Change iface="enp5s0" to your interface, likely iface="enp1s0".
Comment out the line "# cp bionic-dhclient $workdir/dhclient".
4) sudo ./test-parallel.sh
After five minutes, if you issue reproduces, you will see "TEST FAILED".
You can watch the output with:
5) cat /tmp/dhclient-* | less
Next, for instrumented runs, you need to build dhclient from source.
1) sudo apt install build-essential devscripts
2) apt source isc-dhcp
3) sudo apt build-dep isc-dhcp
4) cd isc-dhcp
Apply the below patch:
https://paste.ubuntu.com/p/hGsssrVyG4/
5) patch -p1 < ~/patch.patch
6) debuild -b -uc -us
7) cd ..
8) sudo dpkg -i isc-dhcp-client-*
9) sudo ./test-parallel.sh
10) cat /tmp/dhclient-* | less
Look for the race, as described in "Other Info", namely:
mruffell: registering with socket manager
mruffell: callback called
mruffell: omapi object is NULL
mruffell: omapi object is NULL
mruffell: Adding obj to linked list
mruffell: Obj added to list
The issue has reproduced.
If you install the test package from the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test
Instructions to install (on a Focal or Jammy system):
1) sudo add-apt-repository ppa:mruffell/sf337873-test
2) sudo apt update
3) sudo apt install libdns-export1109 libisc-export1105
4) sudo apt-cache policy libisc-export1105 | grep Installed
Installed: 1:9.11.16+dfsg-3~ubuntu1+sf337873v20230116b2
You will notice the issue no longer reproduces.
You can also check to see if dhclient runs as multithreaded by:
$ ps aux | grep dhclient
<look for PID>
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
When single threaded, the output should be:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
[Where problems could occur]
We are changing the configuration of bind9-libs for Focal and Jammy from a net --enable-threads, to --disable-threads. This will remove multithreading for bind9-libs and all applications that link to it.
bind9-libs only services isc-dhcp, as per apt rdepends in the "Other Info" section, and the package was only introduced as a way to get isc-dhcp to build with older bind9 libs, and avoid having bind9 in-tree.
In Kinetic onward, the bind9 libs are shipped in-tree for isc-dhcp, and bind9-libs was removed from the archive.
Since we are only affecting isc-dhcp, we won't affect users of other packages, unless they for some reason decided to link against bind9-libs instead of the actual bind9 in the archive.
dhclient was never written for thread concurrency, shown by the lack of any synchronisation primitives whatsoever, and won't be negatively affected by going back to being single threaded, as it is in all other Ubuntu releases.
We are editing the symbol files to remove exported library calls. I have manually grep'd the isc-dhcp source, and those functions are not used, but for any users that got confused and link against bind9-libs, they might be affected by the change. For them, they need to link against the actual bind9 in the archive.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
2) was the plan, but when I think about it, if dhclient has such a basic thread concurrency issue, and has had it for 13 years without anyone noticing, it is a fair chance that this will not be the only thread concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
I think if we fix the problem, another issue will crop up in six months time, and it will be another concurrency issue.
Now, there is a small issue that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now.
In which case, 1) to disabling threading seems better.
But it seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
Since isc-dhcp is dynamically linked to libdns-export1109 and libisc-export1105, we would probably not even need to rebuild isc-dhcp, and instead, only a fix to bind9-libs would be needed. |
|
2023-01-17 04:04:32 |
Matthew Ruffell |
tags |
focal jammy sts |
focal jammy sts sts-sponsor |
|
2023-01-17 12:19:17 |
Mauricio Faria de Oliveira |
bug |
|
|
added subscriber SE SRU ("STS") Sponsors |
2023-01-17 12:19:37 |
Mauricio Faria de Oliveira |
tags |
focal jammy sts sts-sponsor |
focal jammy sts |
|
2023-01-17 21:13:25 |
Dexuan Cui |
bug |
|
|
added subscriber Dexuan Cui |
2023-01-19 03:53:59 |
Mauricio Faria de Oliveira |
attachment added |
|
lp1926139_focal_isc-dhcp.debdiff https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5642442/+files/lp1926139_focal_isc-dhcp.debdiff |
|
2023-01-25 01:57:49 |
Matthew Ruffell |
description |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap
Screenshot of Wireshark: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded, back to single threaded as intended by dhclient maintainers.
In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 library it uses, which is already configured properly to --disable-threads.
Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again.
[Testcase]
Start a fresh Focal or Jammy instance.
Download and set executable test-parallel.sh, and edit some lines:
1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
2) chmod +x test-parallel.sh
3) vim test-parallel.sh
Change iface="enp5s0" to your interface, likely iface="enp1s0".
Comment out the line "# cp bionic-dhclient $workdir/dhclient".
4) sudo ./test-parallel.sh
After five minutes, if you issue reproduces, you will see "TEST FAILED".
You can watch the output with:
5) cat /tmp/dhclient-* | less
Next, for instrumented runs, you need to build dhclient from source.
1) sudo apt install build-essential devscripts
2) apt source isc-dhcp
3) sudo apt build-dep isc-dhcp
4) cd isc-dhcp
Apply the below patch:
https://paste.ubuntu.com/p/hGsssrVyG4/
5) patch -p1 < ~/patch.patch
6) debuild -b -uc -us
7) cd ..
8) sudo dpkg -i isc-dhcp-client-*
9) sudo ./test-parallel.sh
10) cat /tmp/dhclient-* | less
Look for the race, as described in "Other Info", namely:
mruffell: registering with socket manager
mruffell: callback called
mruffell: omapi object is NULL
mruffell: omapi object is NULL
mruffell: Adding obj to linked list
mruffell: Obj added to list
The issue has reproduced.
If you install the test package from the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test
Instructions to install (on a Focal or Jammy system):
1) sudo add-apt-repository ppa:mruffell/sf337873-test
2) sudo apt update
3) sudo apt install libdns-export1109 libisc-export1105
4) sudo apt-cache policy libisc-export1105 | grep Installed
Installed: 1:9.11.16+dfsg-3~ubuntu1+sf337873v20230116b2
You will notice the issue no longer reproduces.
You can also check to see if dhclient runs as multithreaded by:
$ ps aux | grep dhclient
<look for PID>
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
When single threaded, the output should be:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
[Where problems could occur]
We are changing the configuration of bind9-libs for Focal and Jammy from a net --enable-threads, to --disable-threads. This will remove multithreading for bind9-libs and all applications that link to it.
bind9-libs only services isc-dhcp, as per apt rdepends in the "Other Info" section, and the package was only introduced as a way to get isc-dhcp to build with older bind9 libs, and avoid having bind9 in-tree.
In Kinetic onward, the bind9 libs are shipped in-tree for isc-dhcp, and bind9-libs was removed from the archive.
Since we are only affecting isc-dhcp, we won't affect users of other packages, unless they for some reason decided to link against bind9-libs instead of the actual bind9 in the archive.
dhclient was never written for thread concurrency, shown by the lack of any synchronisation primitives whatsoever, and won't be negatively affected by going back to being single threaded, as it is in all other Ubuntu releases.
We are editing the symbol files to remove exported library calls. I have manually grep'd the isc-dhcp source, and those functions are not used, but for any users that got confused and link against bind9-libs, they might be affected by the change. For them, they need to link against the actual bind9 in the archive.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
2) was the plan, but when I think about it, if dhclient has such a basic thread concurrency issue, and has had it for 13 years without anyone noticing, it is a fair chance that this will not be the only thread concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
I think if we fix the problem, another issue will crop up in six months time, and it will be another concurrency issue.
Now, there is a small issue that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now.
In which case, 1) to disabling threading seems better.
But it seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
Since isc-dhcp is dynamically linked to libdns-export1109 and libisc-export1105, we would probably not even need to rebuild isc-dhcp, and instead, only a fix to bind9-libs would be needed. |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump:
Screenshot of Wireshark:
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix is to add a mutex that restricts access to the global linked list of open sockets, and ensures that a newly created socket is added to this list, before the socketmanager callback has an opportunity to walk this list when there is data immediately able to be read.
Mauricio has provided such a patch, and includes options to disable this behaviour during runtime to minimise regression risk.
[Testcase]
Reproducer based on GDB and DHCP noise injection.
It uses 3 veth pairs (DHCP server/client/injector,
the latter two under namespaces) on a linux bridge.
LXD VM:
$ lxc launch ubuntu:focal lp1926139-focal --vm
$ lxc shell lp1926139-focal
Network Setup:
# ip link add br0 type bridge
# ip link set br0 up
# ip link add veth0 type veth peer name veth0br
# ip link set veth0 up
# ip link set veth0br up master br0
# ip netns add ns1
# ip link add veth1 netns ns1 type veth peer name veth1br
# ip -n ns1 link set veth1 up
# ip link set veth1br up master br0
# ip netns add ns2
# ip link add veth2 netns ns2 type veth peer name veth2br
# ip -n ns2 link set veth2 up
# ip link set veth2br up master br0
Network Check:
# ip link show type veth | grep veth
5: veth0br@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
6: veth0@veth0br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
7: veth1br@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
8: veth2br@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
# ip -n ns1 link show type veth | grep veth
2: veth1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
# ip -n ns2 link show type veth | grep veth
2: veth2@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
DHCP Server Setup:
# apt install -y isc-dhcp-server
# ip addr add 192.168.42.1/24 dev veth0
# echo 'INTERFACESv4="veth0"' >>/etc/default/isc-dhcp-server
# cat <<EOF >>/etc/dhcp/dhcpd.conf
subnet 192.168.42.0 netmask 255.255.255.0 {
range 192.168.42.100 192.168.42.200;
}
EOF
# systemctl restart isc-dhcp-server.service
# systemctl status isc-dhcp-server.service | grep Active:
Active: active (running) since Thu 2023-01-19 02:06:18 UTC; 19s ago
# ss -nlp | grep 0.0.0.0:67
udp UNCONN 0 0 0.0.0.0:67 0.0.0.0:* users:(("dhcpd",pid=3279,fd=9))
DHCP Server Check:
# ip netns exec ns1 \
dhclient -v veth1
...
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0xd147ab17)
DHCPOFFER of 192.168.42.100 from 192.168.42.1
DHCPREQUEST for 192.168.42.100 on veth1 to 255.255.255.255 port 67 (xid=0x17ab47d1)
DHCPACK of 192.168.42.100 from 192.168.42.1 (xid=0xd147ab17)
bound to 192.168.42.100 -- renewal in 245 seconds.
# ip netns exec ns1 \
dhclient -v veth1 -r
...
DHCPRELEASE of 192.168.42.100 on veth1 to 192.168.42.1 port 67 (xid=0x1cd4aacf)
DHCP Noise Setup:
# ip -n ns2 addr add 192.168.42.2/24 dev veth2
# ip netns exec ns2 \
/bin/sh -c 'while sleep 0.1; do echo; done | nc -u -v -b -s 192.168.42.2 -p 67 255.255.255.255 68' &
Connection to 255.255.255.255 68 port [udp/bootpc] succeeded!
i.e., every 0.1 seconds, broadcast a message as DHCP (port 67) to DHCP client receive (port 68).
DHCP Noise Check:
# tcpdump -i veth0 -n 'udp and host 255.255.255.255' -c 10
...
02:13:26.993233 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.098317 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.205879 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.314234 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.424486 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.532431 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.639614 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.747633 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.864037 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.977402 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
...
GDB Reproducer (original package):
==============
# apt install -y gdb
Capture DHCP Server's UDP packets for reference:
# tcpdump -i veth0 -n 'udp and host 192.168.42.1' -w veth0-udp-192-168-42-1.pcap & pid=$!
Debug symbols:
# wget https://launchpad.net/ubuntu/+archive/primary/+files/isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4_amd64.ddeb
# apt install -y ./isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4_amd64.ddeb
Source code line numbers (for breakpoint):
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
...
260 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
...
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
Reproduce the issue with a delay introduced via breakpoint on line 279:
# ip netns exec ns1 \
gdb -ex 'set target-async on' -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' -q dhclient
(gdb) break omapip/dispatch.c:279
(gdb) commands
shell sleep 0.2
continue
end
(gdb) run -v -d veth1
...
Listening on LPF/veth1/ea:7a:1d:d1:53:59
Sending on LPF/veth1/ea:7a:1d:d1:53:59
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
Sending on Socket/fallback
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
279 in dispatch.c
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 8 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 13 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 18 (xid=0xe3b19607)
^C
...
(gdb) quit
The tcpdump confirms the DHCP Server _sent_ DHCP Offer packets,
not handled by the DHCP Client.
# kill $pid
4 packets captured
4 packets received by filter
0 packets dropped by kernel
[2]+ Done tcpdump -i veth0 -n 'udp and host 192.168.42.1' -w veth0-udp-192-168-42-1.pcap
# tcpdump -i veth0 -n 'udp and host 192.168.42.1' -r veth0-udp-192-168-42-1.pcap -v
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 4, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 12, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 25, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
GDB Reproducer (patched package):
==============
Client & Debug symbols:
# wget \
https://launchpad.net/~mfo/+archive/ubuntu/lp1926139/+files/isc-dhcp-client_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.deb \
https://launchpad.net/~mfo/+archive/ubuntu/lp1926139/+files/isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.ddeb
# sudo apt install \
./isc-dhcp-client_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.deb \
./isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.ddeb
Source code line numbers (for breakpoint):
253 isc_result_t omapi_register_io_object (omapi_object_t *h,
...
324 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
...
343 /* Find the last I/O state, if there are any. */
344 for (p = omapi_io_states.next;
Attempt to reproduce the issue again, the same way,
with a delay introduced via breakpoint on line 344:
# ip netns exec ns1 \
gdb -ex 'set target-async on' -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' -q dhclient
(gdb) break omapip/dispatch.c:344
(gdb) commands
shell sleep 0.2
continue
end
(gdb) run -v -d veth1
...
Listening on LPF/veth1/ea:7a:1d:d1:53:59
Sending on LPF/veth1/ea:7a:1d:d1:53:59
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
Waiting for object registration to finish...
(This can be disabled with: <VAR>/<cmdline>)
Sending on Socket/fallback
Object registration finished.
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
344 in dispatch.c
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0x13d35e3b)
DHCPOFFER of 192.168.42.100 from 192.168.42.1
DHCPREQUEST for 192.168.42.100 on veth1 to 255.255.255.255 port 67 (xid=0x3b5ed313)
DHCPACK of 192.168.42.100 from 192.168.42.1 (xid=0x13d35e3b)
[Detaching after fork from child process 15283]
bound to 192.168.42.100 -- renewal in 252 seconds.
^C
...
(gdb) quit
The issue did not happen!
The DHCP client successfully acquired a DHCP address (above).
It can even be released later, outside of GDB (below).
# ip netns exec ns1 \
dhclient -v veth1 -r
...
DHCPRELEASE of 192.168.42.100 on veth1 to 192.168.42.1 port 67 (xid=0x70f6c778)
[Where problems could occur]
isc-dhcp is a core package, and any change comes with the risk that users would not be able to receive dhcp leases with dhclient, leaving their systems with no IP address and unreachable, and could potentially cripple images that depend on it, e.g. Microsoft Azure uses dhclient called from cloud-init, instead of systemd-networkd, so a regression could potentially affect all Ubuntu users on Azure.
Additionally, the code is called whenever sockets are constructed, and isc-dhcp-server could also be affected.
We have mitigated the risks of regression as best as possible by adding as much detail as possible to this launchpad bug, so it is clear how the race operates and how the patch fixes the issue.
Mauricio has additionally added a environment variable and a kernel command line parameter, that when present, disables the mutex from operating. If a regression were to occur, users can add these parameters to their deployments to work around any issues.
Mauricio and Matthew have decided that the mutex route is best in terms of lessening regression risk, as the alternate solution would be to disable threading on bind9-libs.
Disabling threading on bind9-libs, while complete as a solution, and removes the risk of a future regression caused by thread concurrency issues that are currently undetected, comes with the fact that it removes publicly exported symbols from bind9-libs, and adds others, and changes the entire library from multithreaded to single threaded. If any users happen to use bind9-libs outside of isc-dhcp, they would see their applications either fail to work due to missing symbols, or performance would change.
Disabling threading on bind9-libs is shelved, and can be looked at in the future if necessary.
Back to the mutex solution, Chris Patterson, has been testing this solution at scale on Azure, and in 13k instances, has not had a failure. With the gdb reproducer, we are confident that adding the mutex will not prevent other parts of the software from functioning correctly.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
Initially Matthew proposed 1) as the solution.
It seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
However, this is a unacceptably high regression risk, due to the fact that building bind9-libs with --disable-threads means we remove publicly exported symbols from libraries:
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641515/+files/lp1926139_focal.debdiff
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641516/+files/lp1926139_jammy.debdiff
As an alternative, Mauricio developed 2), which is to add a mutex that restricts access to the global socket list, ensuring that sockets are added to the list before a callback is allowed to walk the list and read its entries.
https://launchpadlibrarian.net/646801520/lp1926139_focal_isc-dhcp.debdiff
Mauricio's solution is elegant as it reuses an unused variable, obj->closed, as a mutex, and uses a compiler built-in memory barrier, __sync_synchronize(), meaning that there is no need to link libpthread or add any extra thread synchronisation primitives.
This has been tested with 13k VM deployments on Microsoft Azure, and has found to work as expected with no failures, meaning risk of additional race conditions we are not aware of is low.
The reason why this patch was not forwarded upstream, is that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now. |
|
2023-01-25 01:58:05 |
Matthew Ruffell |
bug task deleted |
bind9-libs (Ubuntu Focal) |
|
|
2023-01-25 01:58:12 |
Matthew Ruffell |
bug task deleted |
bind9-libs (Ubuntu Jammy) |
|
|
2023-01-25 01:58:22 |
Matthew Ruffell |
bind9-libs (Ubuntu): status |
Fix Released |
Won't Fix |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Focal |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
bug task added |
|
isc-dhcp (Ubuntu Focal) |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
bug task added |
|
bind9-libs (Ubuntu Focal) |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Jammy |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
bug task added |
|
isc-dhcp (Ubuntu Jammy) |
|
2023-01-25 01:59:01 |
Matthew Ruffell |
bug task added |
|
bind9-libs (Ubuntu Jammy) |
|
2023-01-25 01:59:09 |
Matthew Ruffell |
bug task deleted |
bind9-libs (Ubuntu Focal) |
|
|
2023-01-25 01:59:14 |
Matthew Ruffell |
bug task deleted |
bind9-libs (Ubuntu Jammy) |
|
|
2023-01-25 01:59:38 |
Matthew Ruffell |
isc-dhcp (Ubuntu Focal): status |
New |
In Progress |
|
2023-01-25 01:59:41 |
Matthew Ruffell |
isc-dhcp (Ubuntu Jammy): status |
New |
In Progress |
|
2023-01-25 01:59:52 |
Matthew Ruffell |
isc-dhcp (Ubuntu Focal): importance |
Undecided |
High |
|
2023-01-25 01:59:54 |
Matthew Ruffell |
isc-dhcp (Ubuntu Jammy): importance |
Undecided |
High |
|
2023-01-25 02:00:03 |
Matthew Ruffell |
isc-dhcp (Ubuntu Focal): assignee |
|
Matthew Ruffell (mruffell) |
|
2023-01-25 02:00:13 |
Matthew Ruffell |
isc-dhcp (Ubuntu Jammy): assignee |
|
Mauricio Faria de Oliveira (mfo) |
|
2023-01-26 19:30:09 |
Mauricio Faria de Oliveira |
attachment added |
|
lp1926139_focal_isc-dhcp_v2.debdiff https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5643626/+files/lp1926139_focal_isc-dhcp_v2.debdiff |
|
2023-02-01 00:06:31 |
Mauricio Faria de Oliveira |
bug watch added |
|
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264 |
|
2023-02-01 00:06:31 |
Mauricio Faria de Oliveira |
bug watch added |
|
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356 |
|
2023-02-01 00:09:52 |
Mauricio Faria de Oliveira |
description |
[Impact]
Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable.
This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments.
The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up.
tcpdump:
Screenshot of Wireshark:
This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies.
The full explanation is in the "Other Info" section, but the fix is to add a mutex that restricts access to the global linked list of open sockets, and ensures that a newly created socket is added to this list, before the socketmanager callback has an opportunity to walk this list when there is data immediately able to be read.
Mauricio has provided such a patch, and includes options to disable this behaviour during runtime to minimise regression risk.
[Testcase]
Reproducer based on GDB and DHCP noise injection.
It uses 3 veth pairs (DHCP server/client/injector,
the latter two under namespaces) on a linux bridge.
LXD VM:
$ lxc launch ubuntu:focal lp1926139-focal --vm
$ lxc shell lp1926139-focal
Network Setup:
# ip link add br0 type bridge
# ip link set br0 up
# ip link add veth0 type veth peer name veth0br
# ip link set veth0 up
# ip link set veth0br up master br0
# ip netns add ns1
# ip link add veth1 netns ns1 type veth peer name veth1br
# ip -n ns1 link set veth1 up
# ip link set veth1br up master br0
# ip netns add ns2
# ip link add veth2 netns ns2 type veth peer name veth2br
# ip -n ns2 link set veth2 up
# ip link set veth2br up master br0
Network Check:
# ip link show type veth | grep veth
5: veth0br@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
6: veth0@veth0br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
7: veth1br@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
8: veth2br@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
# ip -n ns1 link show type veth | grep veth
2: veth1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
# ip -n ns2 link show type veth | grep veth
2: veth2@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
DHCP Server Setup:
# apt install -y isc-dhcp-server
# ip addr add 192.168.42.1/24 dev veth0
# echo 'INTERFACESv4="veth0"' >>/etc/default/isc-dhcp-server
# cat <<EOF >>/etc/dhcp/dhcpd.conf
subnet 192.168.42.0 netmask 255.255.255.0 {
range 192.168.42.100 192.168.42.200;
}
EOF
# systemctl restart isc-dhcp-server.service
# systemctl status isc-dhcp-server.service | grep Active:
Active: active (running) since Thu 2023-01-19 02:06:18 UTC; 19s ago
# ss -nlp | grep 0.0.0.0:67
udp UNCONN 0 0 0.0.0.0:67 0.0.0.0:* users:(("dhcpd",pid=3279,fd=9))
DHCP Server Check:
# ip netns exec ns1 \
dhclient -v veth1
...
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0xd147ab17)
DHCPOFFER of 192.168.42.100 from 192.168.42.1
DHCPREQUEST for 192.168.42.100 on veth1 to 255.255.255.255 port 67 (xid=0x17ab47d1)
DHCPACK of 192.168.42.100 from 192.168.42.1 (xid=0xd147ab17)
bound to 192.168.42.100 -- renewal in 245 seconds.
# ip netns exec ns1 \
dhclient -v veth1 -r
...
DHCPRELEASE of 192.168.42.100 on veth1 to 192.168.42.1 port 67 (xid=0x1cd4aacf)
DHCP Noise Setup:
# ip -n ns2 addr add 192.168.42.2/24 dev veth2
# ip netns exec ns2 \
/bin/sh -c 'while sleep 0.1; do echo; done | nc -u -v -b -s 192.168.42.2 -p 67 255.255.255.255 68' &
Connection to 255.255.255.255 68 port [udp/bootpc] succeeded!
i.e., every 0.1 seconds, broadcast a message as DHCP (port 67) to DHCP client receive (port 68).
DHCP Noise Check:
# tcpdump -i veth0 -n 'udp and host 255.255.255.255' -c 10
...
02:13:26.993233 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.098317 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.205879 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.314234 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.424486 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.532431 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.639614 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.747633 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.864037 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
02:13:27.977402 IP 192.168.42.2.67 > 255.255.255.255.68: BOOTP/DHCP, unknown (0x0a) [|bootp]
...
GDB Reproducer (original package):
==============
# apt install -y gdb
Capture DHCP Server's UDP packets for reference:
# tcpdump -i veth0 -n 'udp and host 192.168.42.1' -w veth0-udp-192-168-42-1.pcap & pid=$!
Debug symbols:
# wget https://launchpad.net/ubuntu/+archive/primary/+files/isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4_amd64.ddeb
# apt install -y ./isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4_amd64.ddeb
Source code line numbers (for breakpoint):
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
...
260 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
...
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
Reproduce the issue with a delay introduced via breakpoint on line 279:
# ip netns exec ns1 \
gdb -ex 'set target-async on' -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' -q dhclient
(gdb) break omapip/dispatch.c:279
(gdb) commands
shell sleep 0.2
continue
end
(gdb) run -v -d veth1
...
Listening on LPF/veth1/ea:7a:1d:d1:53:59
Sending on LPF/veth1/ea:7a:1d:d1:53:59
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
Sending on Socket/fallback
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
279 in dispatch.c
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 8 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 13 (xid=0xe3b19607)
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 18 (xid=0xe3b19607)
^C
...
(gdb) quit
The tcpdump confirms the DHCP Server _sent_ DHCP Offer packets,
not handled by the DHCP Client.
# kill $pid
4 packets captured
4 packets received by filter
0 packets dropped by kernel
[2]+ Done tcpdump -i veth0 -n 'udp and host 192.168.42.1' -w veth0-udp-192-168-42-1.pcap
# tcpdump -i veth0 -n 'udp and host 192.168.42.1' -r veth0-udp-192-168-42-1.pcap -v
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 4, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 12, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
192.168.42.1.67 > 192.168.42.100.68: BOOTP/DHCP, Reply, length 300, xid 0xe3b19607, secs 25, Flags [none]
Your-IP 192.168.42.100
...
DHCP-Message Option 53, length 1: Offer
...
GDB Reproducer (patched package):
==============
Client & Debug symbols:
# wget \
https://launchpad.net/~mfo/+archive/ubuntu/lp1926139/+files/isc-dhcp-client_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.deb \
https://launchpad.net/~mfo/+archive/ubuntu/lp1926139/+files/isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.ddeb
# sudo apt install \
./isc-dhcp-client_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.deb \
./isc-dhcp-client-dbgsym_4.4.1-2.1ubuntu5.20.04.4+lp1926139.1_amd64.ddeb
Source code line numbers (for breakpoint):
253 isc_result_t omapi_register_io_object (omapi_object_t *h,
...
324 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
...
343 /* Find the last I/O state, if there are any. */
344 for (p = omapi_io_states.next;
Attempt to reproduce the issue again, the same way,
with a delay introduced via breakpoint on line 344:
# ip netns exec ns1 \
gdb -ex 'set target-async on' -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' -q dhclient
(gdb) break omapip/dispatch.c:344
(gdb) commands
shell sleep 0.2
continue
end
(gdb) run -v -d veth1
...
Listening on LPF/veth1/ea:7a:1d:d1:53:59
Sending on LPF/veth1/ea:7a:1d:d1:53:59
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
Waiting for object registration to finish...
(This can be disabled with: <VAR>/<cmdline>)
Sending on Socket/fallback
Object registration finished.
Thread 1 "dhclient" hit Breakpoint 1, omapi_register_io_object ...
344 in dispatch.c
DHCPDISCOVER on veth1 to 255.255.255.255 port 67 interval 3 (xid=0x13d35e3b)
DHCPOFFER of 192.168.42.100 from 192.168.42.1
DHCPREQUEST for 192.168.42.100 on veth1 to 255.255.255.255 port 67 (xid=0x3b5ed313)
DHCPACK of 192.168.42.100 from 192.168.42.1 (xid=0x13d35e3b)
[Detaching after fork from child process 15283]
bound to 192.168.42.100 -- renewal in 252 seconds.
^C
...
(gdb) quit
The issue did not happen!
The DHCP client successfully acquired a DHCP address (above).
It can even be released later, outside of GDB (below).
# ip netns exec ns1 \
dhclient -v veth1 -r
...
DHCPRELEASE of 192.168.42.100 on veth1 to 192.168.42.1 port 67 (xid=0x70f6c778)
[Where problems could occur]
isc-dhcp is a core package, and any change comes with the risk that users would not be able to receive dhcp leases with dhclient, leaving their systems with no IP address and unreachable, and could potentially cripple images that depend on it, e.g. Microsoft Azure uses dhclient called from cloud-init, instead of systemd-networkd, so a regression could potentially affect all Ubuntu users on Azure.
Additionally, the code is called whenever sockets are constructed, and isc-dhcp-server could also be affected.
We have mitigated the risks of regression as best as possible by adding as much detail as possible to this launchpad bug, so it is clear how the race operates and how the patch fixes the issue.
Mauricio has additionally added a environment variable and a kernel command line parameter, that when present, disables the mutex from operating. If a regression were to occur, users can add these parameters to their deployments to work around any issues.
Mauricio and Matthew have decided that the mutex route is best in terms of lessening regression risk, as the alternate solution would be to disable threading on bind9-libs.
Disabling threading on bind9-libs, while complete as a solution, and removes the risk of a future regression caused by thread concurrency issues that are currently undetected, comes with the fact that it removes publicly exported symbols from bind9-libs, and adds others, and changes the entire library from multithreaded to single threaded. If any users happen to use bind9-libs outside of isc-dhcp, they would see their applications either fail to work due to missing symbols, or performance would change.
Disabling threading on bind9-libs is shelved, and can be looked at in the future if necessary.
Back to the mutex solution, Chris Patterson, has been testing this solution at scale on Azure, and in 13k instances, has not had a failure. With the gdb reproducer, we are confident that adding the mutex will not prevent other parts of the software from functioning correctly.
[Other info]
I was reading around the upstream issue trackers, and found the following two bug reports:
https://gitlab.isc.org/isc-projects/dhcp/-/issues/264
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996356
The ISC upstream report was actually quite detailed, and it has the same symptoms of what we are experiencing.
Let's have a look at the root cause. The code I am using is isc-dhcp
4.4.1-2.1ubuntu5.20.04.4 from Focal.
common/discover.c
567 void
568 discover_interfaces(int state) {
...
1002 case AF_INET:
1003 default:
1004 status = omapi_register_io_object((omapi_object_t *)tmp,
1005 if_readsocket,
1006 0, got_one, 0, 0);
1007 break;
1008 }
...
In discover.c, we call discover_interfaces() to iterate over the interfaces, and attempt to register a raw socket against it. We do this by calling omapi_register_io_object() which is used for reading data, and calls the elusive got_one() callback that you instrumented your code to see if it was being called or not.
omapip/dispatch.c
196 /* Register an I/O handle so that we can do asynchronous I/O on it. */
197
198 isc_result_t omapi_register_io_object (omapi_object_t *h,
199 int (*readfd) (omapi_object_t *),
200 int (*writefd) (omapi_object_t *),
201 isc_result_t (*reader)
202 (omapi_object_t *),
203 isc_result_t (*writer)
204 (omapi_object_t *),
205 isc_result_t (*reaper)
206 (omapi_object_t *))
207 {
...
241 /*
242 * Attach the I/O object to the isc socket library via the
243 * fdwatch function. This allows the socket library to watch
244 * over a socket that we built. If there are both a read and
245 * a write socket we asssume they are the same socket.
246 */
247
248 if (readfd) {
249 fd_flags |= ISC_SOCKFDWATCH_READ;
250 fd = readfd(h);
251 }
...
257
258 if (fd_flags != 0) {
259 status = isc_socket_fdwatchcreate(dhcp_gbl_ctx.socketmgr,
260 fd, fd_flags,
261 omapi_iscsock_cb,
262 obj,
263 dhcp_gbl_ctx.task,
264 &obj->fd);
...
275 }
276
277
278 /* Find the last I/O state, if there are any. */
279 for (p = omapi_io_states.next;
280 p && p -> next; p = p -> next)
281 ;
282 if (p)
283 omapi_io_reference (&p -> next, obj, MDL);
284 else
285 omapi_io_reference (&omapi_io_states.next, obj, MDL);
...
omapi_register_io_object() is called for each socket created, in this case, the if_readsocket from discover_interfaces(). The file descriptor is assigned ISC_SOCKFDWATCH_READ, and we enter the if statement.
The if statement calls isc_socket_fdwatchcreate(), which registers the socket with the ISC socket manager, and sets up the callback omapi_iscsock_cb(), to be called.
Once that has been done, we iterate over the omapi_io_states linked list, which is a global list of registered sockets. We get to the end of the list (or the beginning, if the list is empty), and add the socket to the list.
Now, the bug happens between calling isc_socket_fdwatchcreate() to register the socket with the socket manager, and adding it to the global linked list.
Sometimes, the callback omapi_iscsock_cb() is called inbetween.
omapip/dispatch.c
101 /*
102 * Callback routine to connect the omapi I/O object and socket with
103 * the isc socket code. The isc socket code will call this routine
104 * which will then call the correct local routine to process the bytes.
105 *
106 * Currently we are always willing to read more data, this should be modified
107 * so that on connections we don't read more if we already have enough.
108 *
109 * If we have more bytes to write we ask the library to call us when
110 * we can write more. If we indicate we don't have more to write we need
111 * to poke the library via isc_socket_fdwatchpoke.
112 */
...
118 #define SOCKDELETE 1
119 int
120 omapi_iscsock_cb(isc_task_t *task,
121 isc_socket_t *socket,
122 void *cbarg,
123 int flags)
124 {
...
132 #if SOCKDELETE
133 /*
134 * walk through the io states list, if our object is on there
135 * service it. if not ignore it.
136 */
137 for (obj = omapi_io_states.next; obj != NULL; obj = obj->next) {
138 if (obj == cbarg)
139 break;
140 }
141
142 if (obj == NULL) {
143 return(0);
144 }
145 #else
...
164 if ((flags == ISC_SOCKFDWATCH_READ) &&
165 (obj->reader != NULL) &&
166 (obj->inner != NULL)) {
167 status = obj->reader(obj->inner);
168 /*
169 * If we are shutting down (basically tried to
170 * read and got no bytes) we don't need to try
171 * again.
172 */
173 if (status == ISC_R_SHUTTINGDOWN)
174 return (0);
175 /* Otherwise We always ask for more when reading */
176 return (1);
...
188 /*
189 * We get here if we either had an error (inconsistent
190 * structures etc) or no more to write, tell the socket
191 * lib we don't have more to do right now.
192 */
193 return (0);
194 }
When omapi_iscsock_cb() is called, we iterate over that same omapi_io_states global linked list, and since the raw socket we are about to add is the very first item going to be placed on the list, omapi_io_states.next will be NULL, making obj set to NULL. Well what happens when obj is NULL? We return 0.
142 if (obj == NULL) {
143 return(0);
144 }
If you look down the code a little further, we see that if we are finished with the socket, like shutting down, or an error happens, we return 0 to indicate that we are done with this socket, and won't be needing it anymore, and won't be using it again.
If we return 1, then we ask for more data later on, and omapi_iscsock_cb() will be called again in the future.
So, by returning 0, we will no longer read any packets from the socket, and thus, we simply ignore and will never read any DHCPOFFER packets.
We obviously have two sockets open, one for reading and one for writing, since one is bound to port 67 and the other to port 68, so we keep sending out DHCPDISCOVER packets, not knowing we are ignoring all DHCPOFFERS due to the read socket being closed and not calling omapi_iscsock_cb() ever again.
So, as explained on the upstream bug, what is needed for this race condition to occur?
1) There needs to be data that can be read immediately by the socket. This is how omapi_iscsock_cb() is called, when there is data waiting. If there is no data to read, then the race can't happen.
2) Thread scheduling must occur in a particular order.
There are three threads:
a. The main thread, which is registering the socket
b. The socket manager coordinator thread
c. A socket manager worker thread
For the race to occur, we need to switch from a) to b) to schedule the callback, b) to c) to call the callback and return 0, all before we return from c) to a) to add the socket to the omapi_io_states linked list.
To verify this, I added some print statements to omapi_iscsock_cb() and omapi_register_io_object() to see what is happening:
https://paste.ubuntu.com/p/hGsssrVyG4/
I instrumented entry to the callback, as well as if obj is NULL, and before and after the socket is registered to the socket manager, and before and after the socket is added to the global linked list.
I built the new dhclient, and ran test-parallel.sh, since it by far is the best at reproducing the issue. I concatenated all 20 /tmp/dhclient-* logs to read:
https://paste.ubuntu.com/p/VddddSmdMV/
If we look at one that works and gets the DHCPOFFER, we see:
https://paste.ubuntu.com/p/dfdvNM3RDd/
We have two sockets, one that reads and one that writes. It registers with the socket manager, adds the socket to the linked list, for both sockets, one after the other, and then the DHCPDISOVER is set, and then callbacks are called.
Now, when the issue reproduces:
https://paste.ubuntu.com/p/77BNNhDwdh/
We see that we register the read socket with the socket manager, and immediately get a callback to omapi_iscsock_cb(), so there is likely a packet to read already. We return NULL, since the socket is not on the linked list yet, and strangely this happens twice when we have only entered omapi_iscsock_cb() once. We then add the socket to the linked list, showing we have hit the race condition. The writer socket is added to the socket manager, but does not race, as it is added to the linked list before the callback happens. The callback only seems to be called for the writer socket, and we ignore all DHCPOFFER packets, only sending DHCPDISCOVER packets.
I ran the test-parallel.sh script a few times, and each and every time the issue reproduced, we had a NULL linked list, and returned 0 from omapi_iscsock_cb(). So I am confident we have come across the root cause.
Now, I had an idea of adding a sleep between registering to the socket manager and adding the socket to the linked list, to try and get every single agent to fail 100% of the time, hoping that the callback would be called first.
The change is something like the likes of below:
https://paste.ubuntu.com/p/8zJ2FQK99X/
But after I built it and ran it a few times, I couldn't reproduce the issue at all. A typical run is below:
https://paste.ubuntu.com/p/k6wRhx4RCM/
We can see that everything happens in the correct order, and the issue is not reproduced. I think because each dhclient has to wait 2 seconds before sending a DHCPDISCOVER, the network is quiet, and there is no packet to read immediately, since neighboring agents are also quiet, so the callback is not called, and the socket is added to the linked list before the network gets busy.
The code itself to add the socket to the linked list was added in the below commit, 22 years ago:
commit 61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
From: Ted Lemon <source@isc.org>
Date: Thu, 2 Sep 1999 00:32:56 +0000
Subject: Omapi library, initial checkin
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/61b844bfd7641a62d681a1f70d3b6dbc485ce4b6
Later on, the registration to the socket manager and the callback was added 13 years ago:
commit 98bf16077d22f28e288a18e184a9d1f97cb5f4f7
From: Shawn Routhier <sar@isc.org>
Date: Wed, 28 Oct 2009 04:12:30 +0000
Subject: Support for asynchronous ddns per ticket 19216 - convert to
using isclib and dnslib.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/98bf16077d22f28e288a18e184a9d1f97cb5f4f7
Finally, the null linked list check in the callback that returns 0 was also added 13 years ago:
commit 8fa0112dd11ee301046b42fe463074b067e61a35
From: Shawn Routhier <sar@isc.org>
Date: Thu, 19 Nov 2009 23:49:57 +0000
Subject: Ticket 20540 - clean ups for DHCP-DDNS work, add an change
dropped in the merge, arrange to handle callbacks from the socket code after
we've deleted the socket and deal with the quantum issue in the tasks. This
last will be removed shortly as a better fix is now available.
Link: https://gitlab.isc.org/isc-projects/dhcp/-/commit/8fa0112dd11ee301046b42fe463074b067e61a35
This code and thus the bug is present in all versions of Ubuntu from 10.04 LTS onward.
However, I could not reproduce on Bionic, Kinetic or Lunar, so I looked into why this only seems to affect Focal and Jammy, when all versions share the same buggy code.
It appears that Focal and Jammy dhclient is multithreaded:
$ sudo ps -T -p 19512
PID SPID TTY TIME CMD
19512 19512 ? 00:00:00 dhclient
19512 19519 ? 00:00:00 isc-worker0000
19512 19521 ? 00:00:00 isc-socket
19512 19522 ? 00:00:00 isc-timer
While on Bionic, Kinetic and Lunar, it presents only a single thread:
$ sudo ps -T -p 23894
PID SPID TTY TIME CMD
23894 23894 ? 00:00:00 dhclient
You can't have thread concurrency issues if you run as a single thread. The race conditions simply did not happen, since everything had to execute in order, which is why this bug has existed for 13 years in its current form.
Chris Patterson in comment #2 mentions when he built dhclient from the latest Debian source, available here:
https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian
Chris mentions that it solves the problem, and that it uses in-tree bind libraries. Looking into this, we see the tree has bind 9.11.36:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/version.tmp
and its makefile explicitly shows threads being disabled:
https://salsa.debian.org/debian/isc-dhcp/-/blob/master/bind/Makefile.in
bindconfig =
...
--disable-threads
...
bind 9.11.36 still supports disabling threads. Hence this is how the upstream debian build did not reproduce the issue on Focal, since threading was disabled and dhclient ran as a single thread.
On Kinetic and Lunar, the isc-dhcp package seems to track the debian upstream version, with the very same in-tree bind libraries, set to --disable-threads, which is why it is fixed in those versions.
So, we have two options for a fix for Focal and Jammy:
1) We disable threading for dhclient.
2) We add in a mutex to resolve this particular concurrency issue.
Looking at the source code, there is no mutexes or semaphores. pthread isn't even a dependency, and it is not linked in. There is no thread synchronisation primitives at all.
Initially Matthew proposed 1) as the solution.
It seems isc-dhcp on Focal and Jammy are both incompatible with their native bind9 packages, and instead, use the libraries from bind9-libs, a separate package that packages 9.11.16 for Focal and 9.11.19 for Jammy, to work around the incompatibility issue.
https://launchpad.net/ubuntu/+source/bind9-libs
Looking at the dependencies for isc-dhcp in Focal:
$ apt depends isc-dhcp-client
isc-dhcp-client
Depends: libc6 (>= 2.15)
Depends: libdns-export1109
Depends: libisc-export1105
Depends: debianutils (>= 2.8.2)
Depends: iproute2
Recommends: isc-dhcp-common
Suggests: resolvconf
openresolv
Suggests: avahi-autoipd
Suggests: isc-dhcp-client-ddns
We see we depend on libdns-export1109 and libisc-export1105 from bind9-libs.
Looking at those reverse dependencies:
$ apt rdepends libisc-export1105
libisc-export1105
Reverse Depends:
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~ubuntu1)
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: libbind-export-dev (= 1:9.11.16+dfsg-3~build1)
Depends: isc-dhcp-server
Depends: isc-dhcp-client
Depends: isc-dhcp-server-ldap
Depends: isc-dhcp-relay
Depends: isc-dhcp-client-ddns
Depends: libisccfg-export163
Depends: libisccc-export161
Depends: libirs-export161
Depends: libdns-export1109
Depends: isc-dhcp-client
Depends: isc-dhcp-server
It seems it is only isc-dhcp is a reverse dependency.
So it appears perfectly safe to change bind9-libs from --enable-threads to --disable-threads.
Looking at their debian/rules files:
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/focal-proposed
https://git.launchpad.net/ubuntu/+source/bind9-libs/tree/debian/rules?h=applied/ubuntu/jammy-devel
We see:
CONFIGURE_COMMON := \
...
--disable-threads \
...
--enable-threads \
...
It appears there has been a mistake when setting up the configure options, and both --disable-threads and --enable-threads is set. Because of the way configure is set up --enable-threads will always win.
The fix would be a one line change, to remove --enable-threads.
However, this is a unacceptably high regression risk, due to the fact that building bind9-libs with --disable-threads means we remove publicly exported symbols from libraries:
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641515/+files/lp1926139_focal.debdiff
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641516/+files/lp1926139_jammy.debdiff
As an alternative, Mauricio developed 2), which is to add a mutex that restricts access to the global socket list, ensuring that sockets are added to the list before a callback is allowed to walk the list and read its entries.
https://launchpadlibrarian.net/646801520/lp1926139_focal_isc-dhcp.debdiff
Mauricio's solution is elegant as it reuses an unused variable, obj->closed, as a mutex, and uses a compiler built-in memory barrier, __sync_synchronize(), meaning that there is no need to link libpthread or add any extra thread synchronisation primitives.
This has been tested with 13k VM deployments on Microsoft Azure, and has found to work as expected with no failures, meaning risk of additional race conditions we are not aware of is low.
The reason why this patch was not forwarded upstream, is that isc-dhcp is now officially End Of Life, and has effectively been abandoned by upstream. You can read about it in these notices:
https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
https://www.isc.org/blogs/isc-dhcp-eol/
Upstream won't fix any more bugs, make any new releases, or even accept any new commits. They are putting their efforts into isc-kea now. |
[Impact]
* Occasionally, during instance boot or machine start-up,
dhclient will attempt to acquire a dhcp lease and fail,
leaving the instance with no IP address and making it
unreachable.
* This happens about once every 100 reboots on bare metal,
or affecting between ~0.3% to 2% of deployments on Azure
(comment #2).
* Azure uses dhclient called from cloud-init instead of
systemd-networkd, and this is causing issues with larger
deployments.
* The logs of an affected dhclient produce the following:
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 ...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 ...
...
(omitting 20 similar lines)
...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 ...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 ...
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 ...
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
* This only impacts Focal and Jammy, where bind9-libs
are multi-threaded (Bionic/earlier and Kinetic/later
are single-threaded).
* The actual problem is dhclient containing a thread
concurrency race condition, and when the race occurs,
the read socket is incorrectly/prematurely unwatched
because required structures are not yet consistent,
thus dhclient does not read any DHCPOFFER replies.
* Detailed analysis of the issue is in comment #17.
[Fix]
* Prevent the race condition by starting to watch the
read socket after required structures are consistent.
* The fix has been tested in Azure w/ 13500 instances,
and no errors have been observed (previously: 0.4%).
* Anyway, in case regressions are observed, the patch
introduces 2 switches to revert to previous behavior,
which can be applied per-process or system-wide:
- DHCP_FD_FLAGS_POKE=0 environment variable
- dhcp.fd_flags_poke=0 kernel cmdline option
* (Previous approaches/discussions included reverting
bind9-libs to single-threaded, but we concluded it
would have more regression risk than the expected
[some bits in comment #8, and some internal chat],
and remove exported symbols (apparently unused, but).
We also considered a mutex/spinlock approach, but
later found a simpler way w/ isc lib; comment #13.)
[Test Plan]
* Synthetic reproducer with GDB to force the race
condition, and DHCP server/client/noise injection
is described in comment #9.
* Test with the original package (problem occurs).
* Test with the modified package (problem fixed).
- Set DHCP_FD_FLAGS_POKE=0 (problem occurs).
- Set dhcp.fd_flags_poke=0 (problem occurs).
[Regression Potential]
* 1) dhclient failing to acquire DHCP leases.
* 2) dhcpd is also affected by code changes,
thus failures to handle DHCP lease requests
also have potential for regressions.
* 3) the functional change added by the fix,
if a regression were to occur, would likely
be an issue only under some (unknown) race
condition as well, thus expected to be rare.
* Note: this potentially affects Focal/Jammy
on Azure as a whole, per usage of dhclient
in cloud-init instead of systemd-networkd.
Azure provided extensive testing for all 3
approaches (mostly internal communications,
and some bug comments), with ~13k instances.
No issues were observed (previously: 0.4%).
* Such testing scale seems to indicate that
there are no regressions for dhclient to
acquire DHCP leases (1), nor another race
condition that hit the fix/new behavior (3).
With that, apparently (2) should be OK too.
* Also, so to mitigate the regression risk
as much as possible, there's very detailed
analysis provided here (comments #17, #18)
and more information about the fix in its
patch file's comment. |
|
2023-02-02 13:02:25 |
Mauricio Faria de Oliveira |
tags |
focal jammy sts |
focal jammy se-sru-sponsor-mfo sts |
|
2023-02-09 13:16:58 |
Łukasz Zemczak |
isc-dhcp (Ubuntu Jammy): status |
In Progress |
Fix Committed |
|
2023-02-09 13:17:00 |
Łukasz Zemczak |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2023-02-09 13:17:02 |
Łukasz Zemczak |
bug |
|
|
added subscriber SRU Verification |
2023-02-09 13:17:06 |
Łukasz Zemczak |
tags |
focal jammy se-sru-sponsor-mfo sts |
focal jammy se-sru-sponsor-mfo sts verification-needed verification-needed-jammy |
|
2023-02-09 13:49:29 |
Łukasz Zemczak |
isc-dhcp (Ubuntu Focal): status |
In Progress |
Fix Committed |
|
2023-02-09 13:49:34 |
Łukasz Zemczak |
tags |
focal jammy se-sru-sponsor-mfo sts verification-needed verification-needed-jammy |
focal jammy se-sru-sponsor-mfo sts verification-needed verification-needed-focal verification-needed-jammy |
|
2023-02-13 10:10:59 |
FinnRM |
isc-dhcp (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2023-02-13 10:11:02 |
FinnRM |
isc-dhcp (Ubuntu Focal): status |
Fix Released |
Fix Committed |
|
2023-02-15 14:31:35 |
Mauricio Faria de Oliveira |
tags |
focal jammy se-sru-sponsor-mfo sts verification-needed verification-needed-focal verification-needed-jammy |
focal jammy se-sru-sponsor-mfo sts verification-done verification-done-focal verification-done-jammy |
|
2023-02-27 17:11:30 |
Launchpad Janitor |
isc-dhcp (Ubuntu Jammy): status |
Fix Committed |
Fix Released |
|
2023-02-27 17:11:36 |
Brian Murray |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2023-02-27 17:14:38 |
Launchpad Janitor |
isc-dhcp (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|