Mtr

Timeout option to avoid false positives for packet loss in report mode on high latency links

Bug #776211 reported by Marty
This bug affects 7 people
Affects   Status         Importance   Assigned to   Milestone
Mtr       Fix Released   Undecided    rew

Bug Description

I often use MTR to generate reports to email to other network admins.

On high-latency connections there are often false positives for packet loss on the last few hops because the replies don't arrive before the timeout.

It would be nice to have an option to specify a higher timeout when testing high latency connections to obtain an accurate report.

Revision history for this message
Jeremy Chadwick (koitsu) wrote :

I second this request. This makes troubleshooting high-latency paths (such as from the US west coast to Sweden) very difficult, and ends up indicating there's loss when in fact there isn't. Again: this ONLY happens with --report.

Here's a real-life example (same geographical locations as above, actually); this is with ICMP (not UDP mode), by the way, with a report count of 40.

=== Mon Jun 27 03:35:00 PDT 2011 (1309170900)
HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst
  1.|-- 72.20.98.65 0.0% 40 40 0.2 0.3 0.2 0.5
  2.|-- 69.163.64.44 0.0% 40 40 0.3 0.3 0.2 0.4
  3.|-- 38.112.2.73 0.0% 40 40 1.5 2.6 1.5 4.3
  4.|-- 38.20.55.213 0.0% 40 40 90.3 57.4 1.1 194.1
  5.|-- 154.54.1.25 0.0% 40 40 2.5 2.4 2.2 2.8
  6.|-- 154.54.45.77 0.0% 40 40 48.3 48.0 47.9 48.5
  7.|-- 154.54.30.169 0.0% 40 40 53.5 53.5 53.4 53.8
  8.|-- 154.54.0.129 0.0% 40 40 76.9 77.0 76.8 77.3
  9.|-- 154.54.1.134 0.0% 40 40 77.0 76.8 76.6 77.3
 10.|-- 154.54.11.10 0.0% 40 40 71.3 72.1 71.1 95.6
 11.|-- 80.91.248.161 0.0% 40 40 71.3 78.1 71.2 155.4
 12.|-- 80.91.247.118 0.0% 40 40 167.8 176.3 167.8 259.9
 13.|-- 213.155.130.50 0.0% 40 40 177.4 183.5 177.2 256.1
 14.|-- 213.248.66.14 0.0% 40 40 180.0 182.1 179.9 225.1
 15.|-- 213.248.77.186 0.0% 40 40 177.5 183.4 177.4 240.2
 16.|-- 81.228.94.14 0.0% 40 40 178.8 178.9 178.3 179.8
 17.|-- 81.228.79.226 0.0% 40 40 184.4 184.5 183.8 185.1
 18.|-- 81.228.73.227 0.0% 40 40 190.6 193.8 190.6 312.7
 19.|-- 81.228.72.97 2.5% 40 39 194.5 200.1 194.2 283.8
 20.|-- 81.228.75.117 2.5% 40 39 195.0 197.4 194.8 271.6
 21.|-- 81.226.229.103 2.5% 40 39 198.0 199.4 197.5 200.9
=== END

Hops 19-21 indicate 2.5% packet loss (and sometimes this varies, occasionally all the way up to 7.5%), but only when using --report. Standard ICMP ping and traceroute -P icmp (on FreeBSD) never indicate any degree of packet loss. This is purely a misreporting issue with mtr.

I'd provide a patch myself -- honest/really! -- but the code doesn't read well. I imagine the issue is in net.c, possibly the calculation used in function net_loss(), but I'm not entirely sure.

Revision history for this message
rew (r-e-wolff) wrote :

koitsu, No, as the original reporter said: MTR stops looking for replies as soon as all the packets have been sent.

At about 20 hops, each packet has to be sent 50 ms after the previous one. So with 200 ms round-trip times, about 4 packets are still in transit when the packet for host "21" is being sent. Apparently you were lucky that the reply for host 18, the fourth from the bottom, had already been received when mtr stopped processing (just after sending the packet towards the 21st host). On the other hand, that makes sense, as its RTT is just BELOW 200 ms.
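The arithmetic above can be written out as a small sketch. It assumes one full cycle of probes per second, which is what the 50 ms spacing at 20 hops implies; the function name and parameters are made up for illustration and are not part of mtr.

```c
/* Hypothetical helper, not mtr code: estimate how many probes are still
 * unanswered at the moment the last probe of a cycle is sent. */
static int probes_in_flight(int cycle_ms, int hops, int rtt_ms)
{
    int spacing_ms;
    int pending;

    if (hops <= 0)
        return 0;

    /* Probes are spread evenly over the cycle: 1000 ms / 20 hops = 50 ms. */
    spacing_ms = cycle_ms / hops;
    if (spacing_ms <= 0)
        return hops;            /* everything is sent at once */

    /* A reply takes rtt_ms to come back, so probes sent within the last
     * rtt_ms are still outstanding: 200 ms / 50 ms = 4. */
    pending = rtt_ms / spacing_ms;
    return pending > hops ? hops : pending;
}
```

With the figures from this report, probes_in_flight(1000, 20, 200) gives 4, which roughly matches the three or four trailing hops that show 2.5% loss.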

There are two ways to solve this. The first, and best, is to change the main mtr loop to wait some time after "maxiterations" before stopping. This is the

          if(NumPing >= MaxPing && (!Interactive || ForceMaxPing))
            return;

in select.c that has to be modified.
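As a rough illustration of the first approach, the stop condition could also require that a grace period has passed since the last probe went out. This is only a sketch under assumptions: the function, its parameters and the millisecond clock are hypothetical, and it is not necessarily what ended up in mtr.

```c
#include <stdbool.h>

/* Hypothetical stop check, not mtr code: once the requested number of
 * pings has been sent, keep collecting replies until grace_ms has
 * elapsed since the last probe was sent. */
static bool report_may_stop(int num_ping, int max_ping,
                            long now_ms, long last_send_ms, long grace_ms)
{
    if (num_ping < max_ping)
        return false;           /* still have probes to send */

    /* All probes are out; wait for the late replies before stopping. */
    return (now_ms - last_send_ms) >= grace_ms;
}
```

A grace period somewhat larger than the worst round-trip time on the path (a few hundred milliseconds in the report above) would be enough for the replies from the last hops to arrive.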

Second would be to patch the reporting side to "guess" how many packets are still "in transit".
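The second approach could look roughly like the sketch below: probes presumed to be still in transit are left out of the loss calculation. The function name and the in_transit estimate are assumptions made for illustration; this is not mtr's net_loss().

```c
/* Hypothetical loss calculation, not mtr code: probes that are presumed
 * to be still in transit are not counted as sent yet. */
static double loss_percent(int sent, int received, int in_transit)
{
    int settled = sent - in_transit;   /* probes that had time to be answered */
    int lost;

    if (settled <= 0)
        return 0.0;                    /* nothing to judge yet */

    lost = settled - received;
    if (lost < 0)
        lost = 0;                      /* the in-transit guess was too high */

    return 100.0 * lost / settled;
}
```

For hop 19 in the report above, loss_percent(40, 39, 0) is 2.5, while loss_percent(40, 39, 1) is 0.0. The drawback is that the result depends on a guess, which is why waiting for the replies is the better fix.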

While looking for how to help you find the code to fix it, I thought of a good way to fix it, and implemented it. ... Now let's see if I can get mtr compiled....

Revision history for this message
rew (r-e-wolff) wrote :

Implemented, tested.

Revision history for this message
Brandon Thetford (bthetford) wrote :

What was the fix you came up with?
I didn't see this bug mentioned in the 0.80 diff.

rew (r-e-wolff)
Changed in mtr:
status: New → Fix Released
assignee: nobody → rew (r-e-wolff)
Revision history for this message
Jeremy Chadwick (koitsu) wrote :

Confirmed that mtr 0.81 fixes this problem.

Before:

=== Fri Oct 21 06:18:00 PDT 2011 (1319203080)
HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst
  1.|-- 72.20.98.65 0.0% 40 40 0.3 0.4 0.3 0.6
  2.|-- 69.163.64.44 0.0% 40 40 0.4 0.3 0.2 0.4
  3.|-- 38.112.2.73 2.5% 40 39 2.7 2.5 1.4 3.4
  4.|-- 38.20.55.213 0.0% 40 40 1.5 48.1 1.1 205.1
  5.|-- 154.54.1.25 0.0% 40 40 2.5 2.5 2.3 2.8
  6.|-- 154.54.45.77 0.0% 40 40 48.2 48.3 48.1 48.5
  7.|-- 154.54.30.169 0.0% 40 40 53.8 53.6 53.5 54.0
  8.|-- 154.54.0.129 0.0% 40 40 77.0 77.1 76.9 77.3
  9.|-- 154.54.80.182 0.0% 40 40 76.7 76.9 76.7 77.2
 10.|-- 154.54.11.10 0.0% 40 40 71.5 72.7 71.2 106.1
 11.|-- 213.155.131.136 0.0% 40 40 71.5 79.5 71.3 129.8
 12.|-- 80.91.249.20 0.0% 40 40 161.5 166.6 161.4 237.1
 13.|-- 80.91.246.106 0.0% 40 40 172.5 180.5 172.5 327.0
 14.|-- 80.91.246.149 0.0% 40 40 172.8 178.8 172.7 220.6
 15.|-- 80.91.247.97 0.0% 40 40 170.7 173.4 170.1 211.0
 16.|-- 81.228.75.166 0.0% 40 40 176.0 176.0 173.8 178.1
 17.|-- 81.228.79.224 0.0% 40 40 180.3 180.3 179.3 191.1
 18.|-- 81.228.73.229 2.5% 40 39 183.0 182.9 182.8 183.3
 19.|-- 81.228.72.181 2.5% 40 39 191.7 188.0 187.2 196.4
 20.|-- 81.228.73.233 2.5% 40 39 186.3 186.6 186.3 189.5
 21.|-- 81.226.229.103 2.5% 40 39 227.8 233.3 191.7 385.6
=== END

After:

=== Fri Oct 21 06:23:00 PDT 2011 (1319203380)
HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst
  1.|-- 72.20.98.65 0.0% 41 41 0.4 0.6 0.3 7.7
  2.|-- 69.163.64.44 0.0% 40 40 0.5 0.4 0.3 0.6
  3.|-- 38.112.2.73 0.0% 40 40 2.3 2.9 1.3 6.6
  4.|-- 38.20.55.213 0.0% 40 40 1.2 37.3 1.2 199.0
  5.|-- 154.54.1.25 0.0% 40 40 2.6 2.5 2.3 2.7
  6.|-- 154.54.45.77 0.0% 40 40 48.5 48.3 48.1 48.6
  7.|-- 154.54.30.169 0.0% 40 40 53.6 53.7 53.4 54.0
  8.|-- 154.54.0.129 0.0% 40 40 77.0 77.1 76.9 77.5
  9.|-- 154.54.80.182 0.0% 40 40 76.7 76.9 76.7 77.3
 10.|-- 154.54.11.10 0.0% 40 40 72.8 72.2 71.3 99.2
 11.|-- 213.155.131.136 0.0% 40 40 71.4 107.1 71.2 240.3
 12.|-- 80.91.249.20 0.0% 40 40 161.4 167.5 161.4 273.4
 13.|-- 80.91.246.106 0.0% 40 40 225.9 181.3 172.5 284.6
 14.|-- 80.91.246.149 0.0% 40 40 172.7 173.6 172.6 202.7
 15.|-- 80.91.247.97 0.0% 40 40 170.2 172.7 170.1 223.3
 16.|-- 81.228.75.166 0.0% 40 40 175.8 176.5 173.6 189.8
 17.|-- 81.228.79.224 0.0% 40 40 180.1 180.0 179.3 1...


Revision history for this message
rew (r-e-wolff) wrote : Re: [Bug 776211] Re: Timeout option to avoid false positives for packet loss in report mode on high latency links

On Fri, Oct 21, 2011 at 01:29:01PM -0000, koitsu wrote:
> Also, something about mtr that has bothered me for a very long time
> (years): can you PLEASE remove ChangeLog? All of your changes are
> documented in the NEWS file in distributions, yet you still have a
> ChangeLog file that contains changes circa 2001-2002. This messes me up
> every time I go to see what changed in mtr. Maybe merge the information
> from ChangeLog into the bottom of the NEWS file? My point is just to
> have *one place* that contains the changes.

OK. Done.

 Roger.

--
** <email address hidden> ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.

Revision history for this message
Miklos (spasser) wrote :

This bug has been reintroduced in 0.82.

I compiled both 0.81 and 0.82; it works properly in 0.81 but not in 0.82.

Revision history for this message
rew (r-e-wolff) wrote :

Miklos, do you happen to know a target that I can test it on?

Revision history for this message
Stefan (sewi) wrote :

I confirm this bug reoccurs in 0.82.

I compiled 0.7x and the bug occurred, then downloaded 0.82 in hopes of getting the fix and then some, but the bug is still there. The bug is fixed only in 0.81.

About targets, you could likely mtr your own box, right? You're sitting on an Internet capable computer, after all. Otherwise - the website of the author responds to pings, surely you can arrange something with them.

I'd change the bug's status, but I can't. This definitely isn't fixed in the newest version though.

Revision history for this message
rew (r-e-wolff) wrote :

OK. I'll look into it again.

I probably did something wrong with my private git repository. As it's git, I should be able to dig up the patch again....

Changed in mtr:
status: Fix Released → In Progress
Revision history for this message
Sven Schmidt (svenne-svenne) wrote :

It seems the bug is still present even in the newest releases. Can this be looked at again, please?

rew (r-e-wolff)
Changed in mtr:
status: In Progress → Fix Released