netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-azure (Ubuntu) |
Invalid
|
Undecided
|
Unassigned |
Bug Description
Hi,
I'm debugging an iptables-restore failure, which happens about 5% of the
time when I keep stopping and starting the Linux VM. The VM has only 1
CPU, and kernel version is 4.15.0-1098-azure, but I suspect the issue may
also exist in the mainline Linux kernel.
When the failure happens, it's always caused by line 27 of the rule file:
1 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
2 *raw
3 :PREROUTING ACCEPT [0:0]
4 :OUTPUT ACCEPT [0:0]
5 -A PREROUTING ! -s 168.63.129.16/32 -p tcp -j NOTRACK
6 -A OUTPUT ! -d 168.63.129.16/32 -p tcp -j NOTRACK
7 COMMIT
8 # Completed on Fri Apr 23 09:22:59 2021
9 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
10 *filter
11 :INPUT ACCEPT [2407:79190058]
12 :FORWARD ACCEPT [0:0]
13 :OUTPUT ACCEPT [1648:2190051]
14 -A OUTPUT -d 169.254.169.254/32 -m owner --uid-owner 33 -j DROP
15 COMMIT
16 # Completed on Fri Apr 23 09:22:59 2021
17 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
18 *security
19 :INPUT ACCEPT [2345:79155398]
20 :FORWARD ACCEPT [0:0]
21 :OUTPUT ACCEPT [1504:2129015]
22 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
23 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
24 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
25 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
26 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
27 COMMIT
The related part of the strace log is:
1 socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3
2 getsockopt(3, SOL_IP, IPT_SO_GET_INFO, "security\
3 getsockopt(3, SOL_IP, IPT_SO_GET_ENTRIES, "security\
4 setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security\
5 close(3) = 0
6 write(2, "iptables-restore: line 27 failed"..., 33) = 33
The -EAGAIN error comes from line 1240 of xt_replace_table():
do_ipt_set_ctl
do_replace
__do_replace
1216 xt_replace_
1217 unsigned int num_counters,
1218 struct xt_table_info *newinfo,
1219 int *error)
1220 {
1221 struct xt_table_info *private;
1222 unsigned int cpu;
1223 int ret;
1224
1225 ret = xt_jumpstack_
1226 if (ret < 0) {
1227 *error = ret;
1228 return NULL;
1229 }
1230
1231 /* Do the substitution. */
1232 local_bh_disable();
1233 private = table->private;
1234
1235 /* Check inside lock: is the old number correct? */
1236 if (num_counters != private->number) {
1237 pr_debug(
1238 num_counters, private->number);
1239 local_bh_enable();
1240 *error = -EAGAIN;
1241 return NULL;
1242 }
When the function returns -EAGAIN, the 'num_counters' is 5 while
'private->number' is 6.
If I re-run the iptables-restore program upon the failure, the program
will succeed.
I checked the function xt_replace_table() in the recent mainline kernel and it
looks like the function is the same.
It looks like there is a race condition between iptables-restore calls
getsockopt() to get the number of table entries and iptables call
setsockopt() to replace the entries? Looks like some other program is
concurrently calling getsockopt(
not the case according to the messages I print via trace_printk() around
do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is
no other program calling do_replace(); the table entry number was changed
to 5 by another program 'iptables' about 1.3 milliseconds ago, and then
this program 'iptables-restore' calls setsockopt() and the kernel sees
'num_counters' being 5 and the 'private->number' being 6 (how can this
happen??); the next setsockopt() call for the same 'security' table
happens in about 1 minute with both the numbers being 6.
Can you please shed some light on the issue? Thanks!
BTW, iptables does have a retry mechanism for getsockopt():
2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
(https:/
But it looks like this is enough? e.g. here getsockopt() returns 0, but
setsockopt() returns -EAGAIN.
Thanks,
Dexuan
I reported the issue to the mailing list: /lwn.net/ ml/linux- kernel/ MW2PR2101MB0892 FC0F67BD25661CD CE149BF529% 40MW2PR2101MB08 92.namprd21. prod.outlook. com/
https:/