wide-dhcpv6 client apparently sets incorrect PD specific vltime

Bug #1559741 reported by BatteryKing
This bug affects 1 person
Affects: wide-dhcpv6 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is on Ubuntu 14.04 LTS 64-bit with the last update done on 03/09/2016.

I set up the wide-dhcpv6 client on my home network to obtain IPv6 addresses from my ISP. The /128 address for my firewall is fine, and after 24 hours I see in the logs that the address for this external interface is renewed. However, the /60 address block I request for a few internal subnets on my home network does not renew, and after 24 hours the ISP stops routing to it. I did a Wireshark capture and all looked well; in particular the IA_PD vltime value looks correct, specifically 0x00015180. However, when I run the wide-dhcpv6 client with the -D debug option, I see an incorrect value, specifically 0x7FFF00015180, which is 48 bits long, while the hex-dumped field in Wireshark is only 32 bits long. These extra bytes don't belong and are not part of the reply from my ISP's server. Ergo, this must be a fault on the wide-dhcpv6 client side.
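As a quick check of the numbers above, here is a minimal sketch. The helper names are illustrative only and do not come from the wide-dhcpv6 source. It shows that 0x00015180 is 86400 seconds (24 hours, matching the observed renewal interval) and that the low 32 bits of the value printed by the debug output are exactly the value seen on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits, the width of the lifetime field on the wire. */
static inline uint32_t low32(uint64_t value) {
    return (uint32_t)(value & 0xFFFFFFFFu);
}

/* Checks the figures quoted in the report; aborts if any does not hold. */
static inline void check_reported_values(void) {
    assert(0x00015180u == 86400u);                   /* wire value in seconds */
    assert(86400u / 3600u == 24u);                   /* 24 hours */
    assert(low32(0x7FFF00015180ULL) == 0x00015180u); /* debug value, masked */
}
```

Calling check_reported_values() from a test program completes without aborting, which suggests the extra 0x7FFF sits entirely above the 32-bit field.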

David Beveridge (dage) wrote :

see http://manpages.ubuntu.com/manpages/lucid/man5/dhcp6c.conf.5.html

Have you tried to specify the vltime in your config file?

BatteryKing (jmcsnyder) wrote :

No. As I understand the linked documentation, this can be specified if you are setting up a wide-dhcpv6 server. However, since I am not an ISP but a customer, I am relegated to being a client, and specifying these parameters does not seem to be an option for the client. Especially with IPv6, my understanding is that you need the ISP to assign you Internet-valid subnets (and parameters of operation) to use for your internal network in order for things to work, so there are two layers to the client: one is kind of serverish, as it delegates subnets to your internal networks, but is a client to the ISP's IPv6 DHCP server. The second is a pure client for internal nodes to configure themselves.

BatteryKing (jmcsnyder) wrote :

If it helps, I have been documenting my setup in a templated format here: http://ubuntuforums.org/showthread.php?t=2279612

BatteryKing (jmcsnyder) wrote :

I have found evidence that the wide-dhcpv6 version in Ubuntu 14.04, at least, was written on a 32-bit architecture and never properly redesigned for 64-bit compilation. The evidence found in the source code is as follows:
1. Consistent use of the printf formatter %lu for 32-bit int fields. On a 32-bit architecture this is fine, but in a 64-bit compile with gcc, unsigned long (lu) is 64 bits. This explains why the debug print statement grabs extra bytes. It also hints at another problem: while the printf code is consistently wrong for a 64-bit compile, only the PD specific vltime picks up extra non-zero bytes beyond the 32-bit boundary. As best I can tell, this is due to compiler-specific circumstances.

2. The above is a display issue with debugging messages, but if you consider that there are problems consistent with 32-bit-only testing, it seems logical to assume that somewhere those extra bytes beyond the 32-bit vltime get assigned to the timer in an unsafe manner. I think I may have found this: in prefixconf.c there is an assignment of vltime to tv_sec, which in a 64-bit compile would be assigning a 32-bit unsigned int to a 64-bit integer without a cast. As I normally do a cast for such conversions, I suspect that without a cast the random bytes beyond the 32-bit boundary are getting sucked in on this particular assignment. For the NA specific vltime value the bytes just happen to be all zero, so there is no noticeable issue, but as the incorrectly implemented printf statement shows, the bytes beyond the 32-bit boundary for the PD specific vltime are not zero, and so the tv_sec value for the refresh timer gets set to something ridiculously high. The logs show that the NA specific address gets refreshed at the prescribed time while the PD specific address range does not, so everything matches up for this hypothesis.
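To make the two points above concrete, here is a minimal sketch. The function names are hypothetical and are not taken from the wide-dhcpv6 source. If a 32-bit value is passed to printf with a 64-bit specifier, the behaviour is undefined and stray upper bytes can appear in the output. Either a matching specifier (PRIu32) or an explicit cast to unsigned long avoids that. A plain C assignment from a 32-bit unsigned value to a wider integer such as time_t, on the other hand, preserves the value (the upper bits become zero), so the assignment in point 2 would not by itself pull in stray bytes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Format a 32-bit lifetime with a specifier that matches its type. */
static inline int format_lifetime(char *buf, size_t len, uint32_t vltime) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "vltime=%" PRIu32, vltime);
}

/* Alternative: keep %lu, but cast so the argument really is unsigned long. */
static inline int format_lifetime_lu(char *buf, size_t len, uint32_t vltime) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "vltime=%lu", (unsigned long)vltime);
}

/* Widening assignment: the value is preserved, upper bits are zero. */
static inline time_t lifetime_to_seconds(uint32_t vltime) {
    time_t sec = vltime;
    return sec;
}
```

With vltime set to 0x00015180, both formatting helpers write "vltime=86400" and lifetime_to_seconds() returns 86400 on 32-bit and 64-bit builds alike.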

Now I just need to make some changes and test it all out...

BatteryKing (jmcsnyder) wrote :

Got distracted for a bit, but finally got some testing done. It looks like I was only partially right: the display formatter was wrong, and with that fixed I now see the correct time displayed. However, something is still going wrong with the dhcpv6 subnet update. I will have to spend more time to properly understand this problem.

BatteryKing (jmcsnyder) wrote :

I noticed while going through the avahi docs that this is apparently used by Apple. Also, Cox says there is a compatibility issue with the Apple Airport Extreme where IPv6 fails after a day, with the resolution being to restart the device. My issue is that the IPv6 update for the subnet fails after 24 hours or so, which sounds like about the same symptoms. Could these two be related?

BatteryKing (jmcsnyder) wrote :

I made some headway in figuring this out. One thing I had not understood was the T1 and T2 times, so I had been looking at the logs at the vltime mark. The proper place to look for the first error was at the T1 mark, when the renew operation takes place. At this point the following appears in the logs:

A Date XX:XX:XX machine_name dhcp6c[31722]: update_address: update an address XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX pltime=86400, vltime=86400
A Date XX:XX:XX machine_name dhcp6c[31722]: ifaddrconf: failed to add an address on WAN_interface: File exists
A Date XX:XX:XX machine_name dhcp6c[31722]: update_ia: failed to update an address XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX
A Date XX:XX:XX machine_name dhcp6c[31722]: dhcp6_remove_event: removing an event on WAN_interface, state=RENEW

I went through the source code to trace this. What I found is that the system call to change the IP address fails, and because the code that updates the refresh timer runs only after that call succeeds, the timer update is skipped. Once vltime for the WAN interface is reached, the WAN interface expires and requests a new address, but the internal interfaces don't request such an update. I believe it is at this point, when the WAN interface expires and re-initializes, that Cox's server drops the route for the internal subnets. Specifically:
1. In dhcp6c_ia.c line 166 - Calls update_address()
2. In addrconf.c line 177 - Call to na_ifaddrconf() is made. This fails and returns before executing timer code below.
3. In addrconf.c line 396 - Call to ifaddrconf() is made.
4. In common.c line 3398 - System call is made to add /128 IP address issued by DHCPv6 server and fails because the address has not expired yet. At least if I understand correctly.
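The control flow described in the steps above can be modelled roughly as follows. This is a simplified, hypothetical model with made-up names, not the actual wide-dhcpv6 code. It only illustrates how an early return on a failed address-configuration step leaves the timer refresh that follows it unexecuted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lease state; not a wide-dhcpv6 type. */
struct lease_model {
    bool timer_refreshed;
};

/* Stand-in for the address-configuration step: fails with -1 when the
 * kernel reports that the address is already present. */
static inline int model_configure_address(bool address_already_exists) {
    return address_already_exists ? -1 : 0;
}

/* Stand-in for the update step: the timer refresh sits after the
 * configuration call, so a failure there skips it. */
static inline int model_update_address(struct lease_model *lease,
                                       bool address_already_exists) {
    assert(lease != NULL);
    if (model_configure_address(address_already_exists) < 0)
        return -1;                 /* early return: timer code never runs */
    lease->timer_refreshed = true; /* reached only on success */
    return 0;
}
```

In this model, a renew for an address that is still configured returns -1 and leaves timer_refreshed false, which is the behaviour the log excerpt suggests.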

BatteryKing (jmcsnyder) wrote :

[solved] I seem to have found and fixed the problem and so have some confirmation I was on the right track in my previous post. Specifically in file common.c line 3328 I changed:

    if (ioctl(s, ioctl_cmd, &req)) {
        debug_printf(LOG_NOTICE, FNAME, "failed to %s an address on %s: %s",
            cmdstr, ifname, strerror(errno));
        close(s);
        return (-1);
    }

to:
    if (ioctl(s, ioctl_cmd, &req)) {
        if ((strcmp("add",cmdstr) !=0) || (strcmp("File exists",strerror(errno)) != 0)) { //Ignore if trying to add existing address
            debug_printf(LOG_NOTICE, FNAME, "failed to %s an address on %s: %s",
                cmdstr, ifname, strerror(errno));
            close(s);
            return (-1);
        }
    }

I have gone over 36 hours now and have 3 renew operations in the logs, and they all look identical, whereas before it was obviously not going well once I realized what I needed to look for. Also, my internal subnets are still routing. I figure 3 seemingly perfect renew cycles is enough to declare success in solving this problem.
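A possible refinement of the change above, offered as a sketch rather than a tested patch: comparing the text returned by strerror() depends on the C library's message wording and locale, whereas comparing errno against EEXIST (the error that glibc describes as "File exists") does not. The helper name below is hypothetical. It assumes the failed ioctl really does set errno to EEXIST in this situation, and that errno is read before any other call can overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* True when a failed "add" only means the address is already configured.
 * cmdstr is the command name used in the log message ("add" or otherwise);
 * err is the errno value saved right after the failed ioctl(). */
static inline bool is_benign_add_failure(const char *cmdstr, int err) {
    assert(cmdstr != NULL);
    return strcmp(cmdstr, "add") == 0 && err == EEXIST;
}
```

In the patched block, the inner condition would then read if (!is_benign_add_failure(cmdstr, errno)), keeping the log message, close(s) and return (-1) for every other failure.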
