New World Symphony

Network troubleshooting example

This page is a sample email showing what information is helpful and necessary for a general user to send to the appropriate staff (IT, network operations) when they believe there is a network problem to be resolved.

Run these tests first to confirm that the problem is not with your own machine: PC problems and network problems can look identical. Do the upfront work, and you will reach a resolution sooner.

NWS gratefully acknowledges Charles Chambers, network planning manager at the University of Houston. NWS first saw this format used by Charles during preparations for an event, and we thought it was a good idea.


From: Andrew Hollis
To: Daryl Hall
Cc: John Oates
Subject: RE: I2-RENATER - NWS-IRCAM : bandwidth troubleshooting

Hi Daryl and John, this information from our test last Friday may be of use to you and RENATER.

 

Synopsis: there appears to be packet loss occurring from IRCAM to AMPATH (NWS).

Describe the problem as concisely as possible.

 

Video app: DVTS available here. We are using OSX Tx and XP Rx.

Note the application you are using.

 

Current testing to date:

Note the direction of each test with a brief description. Test every possible direction to isolate the problem:

 

  1. NWS to IRCAM - 30 Mbps video stream, successful
    Intended application test, near site to far site – looks good.
  2. IRCAM to NWS - 30 Mbps video stream, fail, intermittent but frequent packet loss (1-3 of 2670/second)
    Intended application test, far site to near site – problems.
  3. NWS to IRCAM - 30 Mbps iperf data stream, successful
    Verify #1 with an iperf test – iperf passes as expected
  4. IRCAM to NWS - 30 Mbps iperf data stream, fail, 0.009% packet loss
    Verify #2 with an iperf test – iperf fails with loss on the order of the loss seen in #2 with the application. This direction is now suspicious.
  5. IRCAM to AMPATH - 30 Mbps iperf data stream, fail, 0.009% packet loss
    Test far site sending to a different location (in this case AMPATH, which is one hop before NWS). This test reasonably absolves NWS as the source of the issue.
  6. Abilene (Michigan) to NWS - 30 Mbps video stream, successful (always-on multicast from 233.45.17.51:8000)
    Test near site receiving from a different location. Together with test #5, this test absolves NWS and AMPATH. It is reasonable to believe that the problem is occurring before the far site's traffic reaches Abilene, and is most likely an issue with the far site.
  7. NWS to NWS internal endpoints – successful
    An internal test should be performed by both sites before any other testing begins, because most problems originate internally, at the endpoint itself. This internal test absolves NWS’s local endpoint as a problem, which test #6 also established. Measure twice, cut once. You can save a lot of people a lot of time by testing your own machine first.
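The elimination reasoning behind the tests above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each path is collapsed to an ordered list of network names (the names `PathTest`, `directed_links`, and `suspect_links` are hypothetical, and treating Abilene as a single entry is a simplification). A directed link stays suspect if every failing test crosses it and no passing test does.

```python
from typing import NamedTuple, Sequence


class PathTest(NamedTuple):
    label: str
    path: Sequence[str]  # networks in the order the traffic crosses them
    passed: bool


def directed_links(path):
    """Return the set of (from, to) pairs for consecutive networks on a path."""
    return set(zip(path, path[1:]))


def suspect_links(tests):
    """Links shared by every failing test and absent from every passing test."""
    failing = [directed_links(t.path) for t in tests if not t.passed]
    passing = [directed_links(t.path) for t in tests if t.passed]
    if not failing:
        return set()
    shared = set.intersection(*failing)
    cleared = set().union(*passing)
    return shared - cleared


# Network-level paths are an assumption read off the test descriptions.
tests = [
    PathTest("#3 NWS to IRCAM",
             ["NWS", "AMPATH", "Abilene", "GEANT2", "RENATER", "IRCAM"], True),
    PathTest("#4 IRCAM to NWS",
             ["IRCAM", "RENATER", "GEANT2", "Abilene", "AMPATH", "NWS"], False),
    PathTest("#5 IRCAM to AMPATH",
             ["IRCAM", "RENATER", "GEANT2", "Abilene", "AMPATH"], False),
    PathTest("#6 Abilene to NWS", ["Abilene", "AMPATH", "NWS"], True),
]

for src, dst in sorted(suspect_links(tests)):
    print(f"suspect: {src} -> {dst}")
```

Links are kept directional because the loss here appears in one direction only. A passing test from NWS to IRCAM therefore does not clear the same networks for traffic flowing from IRCAM to NWS.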

 

Path traceroutes:
The path for every completed test should be traced so that network ops can identify which portion might be at fault.

 

Per the tests above, we know that the problem is occurring somewhere in the highlighted portions of the paths below. It is not an issue with the near site and will therefore require investigation from the networks that the path traverses: IRCAM, RENATER, GEANT2, and possibly the Washington node of Abilene (because that Abilene node was not traversed in #6):

 

NWS to IRCAM (test #1)

 

  1    <1 ms    <1 ms    <1 ms  nws-m5.nws.edu [67.17.206.129]
  2    <1 ms    <1 ms    <1 ms  nws-I2gige.ampath.net [198.32.252.201]
  3    13 ms    27 ms    13 ms  abilene-flr-10g.ampath.net [198.32.252.238]
  4    29 ms    29 ms    29 ms  washng-atlang.abilene.ucaid.edu [198.32.8.66]
  5   135 ms   135 ms   135 ms  abilene-wash.rt1.fra.de.geant2.net [62.40.125.17]
  6   143 ms   143 ms   143 ms  so-6-2-0.rt1.gen.ch.geant2.net [62.40.112.21]
  7   224 ms   152 ms   152 ms  so-6-0-0.rt1.par.fr.geant2.net [62.40.112.30]
  8   152 ms   152 ms   152 ms  renater-gw.rt1.par.fr.geant2.net [62.40.124.70]
  9   153 ms   153 ms   153 ms  nri-c-g3-0-0-101.cssi.renater.fr [193.51.187.21]
 10   153 ms   153 ms   153 ms  jussieu-pos1-0.cssi.renater.fr [193.51.180.157]
 11   153 ms   153 ms   153 ms  ircam-g0-1.cssi.renater.fr [193.51.180.201]
 12   154 ms   153 ms   153 ms  acces-ircam.cssi.renater.fr [193.51.182.9]
 13   154 ms   153 ms   153 ms  gw.net.ircam.fr [129.102.254.246]
 14   154 ms   154 ms   154 ms  dvts1.ircam.fr [129.102.1.50]

IRCAM to NWS (test #2)

 1  129.102.65.254      1.142 ms  0.435 ms  0.448 ms
 2  129.102.254.1      0.337 ms  0.252 ms  0.324 ms
 3  129.102.254.245      0.432 ms  1.315 ms  6.613 ms
 4  193.51.182.10      1.361 ms  3.393 ms  13.589 ms
 5  193.51.180.202      2.094 ms  1.555 ms  1.877 ms
 6  193.51.180.158      2.624 ms  2.736 ms  2.379 ms
 7  193.51.187.18      2.550 ms  7.261 ms  9.105 ms
 8  62.40.124.69      4.577 ms  1.700 ms  4.712 ms
 9  62.40.112.29      11.645 ms  17.698 ms  10.994 ms
10  62.40.112.22      22.002 ms  19.096 ms  19.457 ms
11  62.40.125.18      125.468 ms  130.282 ms  125.061 ms
12  198.32.8.65      141.133 ms  142.946 ms  151.726 ms

13  198.32.252.237      154.658 ms  154.665 ms  154.420 ms
14  198.32.252.202      154.694 ms  154.381 ms  154.443 ms
15  67.17.206.157      154.917 ms  154.694 ms  154.469 ms

NWS to AMPATH (~test #5; AMPATH is the NWS provider/edge)

  1    <1 ms    <1 ms    <1 ms  nws-m5.nws.edu [67.17.206.129]
  2    <1 ms    <1 ms    <1 ms  nws-I2gige.ampath.net [198.32.252.201]
  3    <1 ms    <1 ms    <1 ms  shelflife.ampath.net [198.32.252.58]

NWS to Abilene (Michigan) (~test #6; multicast address: 233.45.17.51:8000)

  1    <1 ms    <1 ms    <1 ms  nws-m5.nws.edu [67.17.206.129]
  2    <1 ms    <1 ms    <1 ms  nws-I2gige.ampath.net [198.32.252.201]
  3    30 ms    13 ms    13 ms  abilene-flr-10g.ampath.net [198.32.252.238]
  4    24 ms    24 ms    24 ms  iplsng-atlang.abilene.ucaid.edu [198.32.8.79]
  5    29 ms    29 ms    29 ms  so-0-2-0x1.aa1.mich.net [192.122.183.9]
  6    41 ms    40 ms    41 ms  198.108.23.121
  7    36 ms    36 ms    36 ms  ge-0-2-0x43.aa1.mich.net [198.108.22.241]
  8    37 ms    36 ms    36 ms  v-um-al-inet.r-bin-arb.umnet.umich.edu [192.122.183.46]
  9    36 ms    36 ms    36 ms  pc-bin-arb-seb.r-bin-seb.umnet.umich.edu [192.122.183.194]
 10     *        *        *     141.211.43.83
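To turn a traceroute like the ones above into the list of networks that need to be contacted, the hops can be grouped by hostname suffix. The sketch below is one possible way to do it, assuming Windows-style `tracert` lines of the form `hostname [ip]`. The suffix table and the names `SUFFIX_TO_NETWORK`, `network_for`, and `networks_on_path` are hypothetical and only reflect the hostnames seen in these traces.

```python
import re

# Hypothetical mapping from hostname suffix to network name, read off the
# hostnames in the traceroutes above; adjust it for your own paths.
SUFFIX_TO_NETWORK = {
    "nws.edu": "NWS",
    "ampath.net": "AMPATH",
    "abilene.ucaid.edu": "Abilene",
    "geant2.net": "GEANT2",
    "renater.fr": "RENATER",
    "ircam.fr": "IRCAM",
}

# Matches a hop line that ends in "hostname [a.b.c.d]".
HOP_RE = re.compile(r"^\s*(\d+)\s.*\s(\S+)\s+\[(\d+(?:\.\d+){3})\]\s*$")


def network_for(hostname):
    """Map a router hostname to a network name by its domain suffix."""
    for suffix, network in SUFFIX_TO_NETWORK.items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return network
    return "unknown"


def networks_on_path(trace_text):
    """Return the ordered list of networks a traceroute crosses, without repeats."""
    networks = []
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip blank lines, timeouts, and unresolved hops
        network = network_for(match.group(2))
        if not networks or networks[-1] != network:
            networks.append(network)
    return networks


# Excerpt of the NWS to IRCAM trace (test #1).
TRACE = """
  1    <1 ms    <1 ms    <1 ms  nws-m5.nws.edu [67.17.206.129]
  2    <1 ms    <1 ms    <1 ms  nws-I2gige.ampath.net [198.32.252.201]
  4    29 ms    29 ms    29 ms  washng-atlang.abilene.ucaid.edu [198.32.8.66]
  5   135 ms   135 ms   135 ms  abilene-wash.rt1.fra.de.geant2.net [62.40.125.17]
  8   152 ms   152 ms   152 ms  renater-gw.rt1.par.fr.geant2.net [62.40.124.70]
  9   153 ms   153 ms   153 ms  nri-c-g3-0-0-101.cssi.renater.fr [193.51.187.21]
 13   154 ms   153 ms   153 ms  gw.net.ircam.fr [129.102.254.246]
 14   154 ms   154 ms   154 ms  dvts1.ircam.fr [129.102.1.50]
"""

print(" -> ".join(networks_on_path(TRACE)))
```

A hostname only hints at who operates a router, so the result is a starting point for deciding whom to copy on the email, not a definitive ownership map.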


test #3, NWS to IRCAM iperf, successful:
Iperf is handy because it produces hard statistics that can be copied into an email. We recommend sending them as evidence of how often packets are lost and how many. This is better than relying on the application alone (in this case streaming video) to identify loss, because most applications do not save this data. Before testing across the network, check that iperf works internally, so that losses are not the result of a faulty iperf endpoint or incorrect command strings.

[ ID] Interval       Transfer     Bandwidth
[1872]  0.0-60.0 sec   222 MBytes  31.0 Mbits/sec
[1872] Server Report:
[1872]  0.0-60.0 sec   222 MBytes  31.0 Mbits/sec  0.544 ms    0/169373 (0%)
[1872] Sent 169373 datagrams


test #4, IRCAM to NWS (67.17.206.157) iperf, packet loss:

[1916] local 67.17.206.157 port 8000 connected with 129.102.1.51 port 49411
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[1916]  0.0- 1.0 sec  3.70 MBytes  31.0 Mbits/sec  0.535 ms 1465662019/ 2828 (5.2e+007%)
[1916]  1.0- 2.0 sec  3.69 MBytes  31.0 Mbits/sec  0.480 ms    0/ 2821 (0%)
[1916]  1.0- 2.0 sec  1 datagrams received out-of-order
[1916]  2.0- 3.0 sec  3.70 MBytes  31.1 Mbits/sec  0.473 ms    0/ 2830 (0%)
[1916]  3.0- 4.0 sec  3.70 MBytes  31.0 Mbits/sec  0.474 ms    1/ 2828 (0.035%)
[1916]  4.0- 5.0 sec  3.70 MBytes  31.1 Mbits/sec  0.473 ms    1/ 2830 (0.035%)
[1916]  5.0- 6.0 sec  3.70 MBytes  31.1 Mbits/sec  0.481 ms    0/ 2829 (0%)
[1916]  6.0- 7.0 sec  3.66 MBytes  30.7 Mbits/sec  0.482 ms    0/ 2800 (0%)
[1916]  7.0- 8.0 sec  3.70 MBytes  31.0 Mbits/sec  0.488 ms    0/ 2828 (0%)
[1916]  8.0- 9.0 sec  3.70 MBytes  31.1 Mbits/sec  0.463 ms    0/ 2831 (0%)
[1916]  9.0-10.0 sec  3.70 MBytes  31.0 Mbits/sec  0.478 ms    0/ 2828 (0%)
[1916]  9.0-10.0 sec  1 datagrams received out-of-order
[1916] 10.0-11.0 sec  3.70 MBytes  31.1 Mbits/sec  0.467 ms    0/ 2830 (0%)
[1916] 10.0-11.0 sec  1 datagrams received out-of-order
[1916] 11.0-12.0 sec  3.70 MBytes  31.0 Mbits/sec  0.461 ms    0/ 2828 (0%)
[1916] 12.0-13.0 sec  3.70 MBytes  31.1 Mbits/sec  0.465 ms    0/ 2830 (0%)
[1916] 13.0-14.0 sec  3.66 MBytes  30.7 Mbits/sec  0.484 ms    0/ 2799 (0%)
[1916] 14.0-15.0 sec  3.70 MBytes  31.1 Mbits/sec  0.470 ms    0/ 2829 (0%)
[1916] 14.0-15.0 sec  1 datagrams received out-of-order
[1916] 15.0-16.0 sec  3.70 MBytes  31.1 Mbits/sec  0.466 ms    0/ 2830 (0%)
[1916] 16.0-17.0 sec  3.70 MBytes  31.1 Mbits/sec  0.472 ms    0/ 2829 (0%)
[1916] 17.0-18.0 sec  3.70 MBytes  31.1 Mbits/sec  0.465 ms    1/ 2830 (0.035%)
[1916] 18.0-19.0 sec  3.70 MBytes  31.0 Mbits/sec  0.460 ms    0/ 2828 (0%)
[1916] 18.0-19.0 sec  1 datagrams received out-of-order
[1916] 19.0-20.0 sec  3.70 MBytes  31.0 Mbits/sec  0.468 ms    1/ 2829 (0.035%)
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[1916] 20.0-21.0 sec  3.66 MBytes  30.7 Mbits/sec  0.484 ms    0/ 2799 (0%)
[1916] 21.0-22.0 sec  3.70 MBytes  31.1 Mbits/sec  0.465 ms    0/ 2830 (0%)
[1916] 22.0-23.0 sec  3.70 MBytes  31.1 Mbits/sec  0.479 ms    0/ 2829 (0%)
[1916] 23.0-24.0 sec  3.70 MBytes  31.1 Mbits/sec  0.472 ms    1/ 2830 (0.035%)
[1916] 24.0-25.0 sec  3.70 MBytes  31.1 Mbits/sec  0.466 ms    0/ 2829 (0%)
[1916] 25.0-26.0 sec  3.70 MBytes  31.0 Mbits/sec  0.482 ms    1/ 2828 (0.035%)
[1916] 26.0-27.0 sec  3.70 MBytes  31.1 Mbits/sec  0.467 ms    1/ 2830 (0.035%)
[1916] 27.0-28.0 sec  3.66 MBytes  30.7 Mbits/sec  0.484 ms    0/ 2798 (0%)
[1916] 28.0-29.0 sec  3.71 MBytes  31.1 Mbits/sec  0.464 ms    0/ 2832 (0%)
[1916]  0.0-30.0 sec   111 MBytes  31.0 Mbits/sec  0.640 ms    7/84739 (0.0083%)
[1916]  0.0-30.0 sec  6 datagrams received out-of-order


test #5, IRCAM to AMPATH (198.32.252.58) iperf, packet loss nearly identical to test #4:

[  3] local 198.32.252.58 port 8000 connected with 129.102.1.51 port 49412
[  3]  0.0- 1.0 sec  3.59 MBytes  30.1 Mbits/sec  0.289 ms    0/ 2740 (0%)
[  3]  1.0- 2.0 sec  3.58 MBytes  30.1 Mbits/sec  0.322 ms    2/ 2741 (0.073%)
[  3]  2.0- 3.0 sec  3.59 MBytes  30.1 Mbits/sec  0.316 ms    0/ 2740 (0%)
[  3]  3.0- 4.0 sec  3.58 MBytes  30.1 Mbits/sec  0.295 ms    1/ 2739 (0.037%)
[  3]  3.0- 4.0 sec  1 datagrams received out-of-order
[  3]  4.0- 5.0 sec  3.58 MBytes  30.1 Mbits/sec  0.269 ms    1/ 2739 (0.037%)
[  3]  5.0- 6.0 sec  3.59 MBytes  30.1 Mbits/sec  0.273 ms    2/ 2741 (0.073%)
[  3]  5.0- 6.0 sec  2 datagrams received out-of-order
[  3]  6.0- 7.0 sec  3.58 MBytes  30.1 Mbits/sec  0.326 ms    0/ 2739 (0%)
[  3]  7.0- 8.0 sec  3.59 MBytes  30.1 Mbits/sec  0.312 ms    0/ 2740 (0%)
[  3]  8.0- 9.0 sec  3.59 MBytes  30.1 Mbits/sec  0.325 ms    2/ 2740 (0.073%)
[  3]  8.0- 9.0 sec  2 datagrams received out-of-order
[  3]  9.0-10.0 sec  3.59 MBytes  30.1 Mbits/sec  0.323 ms    0/ 2740 (0%)
[  3] 10.0-11.0 sec  3.58 MBytes  30.1 Mbits/sec  0.328 ms    0/ 2739 (0%)
[  3] 11.0-12.0 sec  3.59 MBytes  30.1 Mbits/sec  0.343 ms    0/ 2740 (0%)
[  3] 12.0-13.0 sec  3.59 MBytes  30.1 Mbits/sec  0.369 ms    0/ 2740 (0%)
[  3] 13.0-14.0 sec  3.58 MBytes  30.1 Mbits/sec  0.329 ms    0/ 2739 (0%)
[  3] 14.0-15.0 sec  3.59 MBytes  30.1 Mbits/sec  0.315 ms    0/ 2740 (0%)
[  3] 15.0-16.0 sec  3.59 MBytes  30.1 Mbits/sec  0.310 ms    0/ 2740 (0%)
[  3] 16.0-17.0 sec  3.59 MBytes  30.1 Mbits/sec  0.303 ms    1/ 2740 (0.036%)
[  3] 16.0-17.0 sec  1 datagrams received out-of-order
[  3] 17.0-18.0 sec  3.58 MBytes  30.1 Mbits/sec  0.301 ms    0/ 2739 (0%)
[  3] 18.0-19.0 sec  3.58 MBytes  30.1 Mbits/sec  0.297 ms    1/ 2740 (0.036%)
[  3] 19.0-20.0 sec  3.59 MBytes  30.1 Mbits/sec  0.328 ms    1/ 2740 (0.036%)
[  3] 19.0-20.0 sec  1 datagrams received out-of-order
[  3] 20.0-21.0 sec  3.59 MBytes  30.1 Mbits/sec  0.345 ms    0/ 2740 (0%)
[  3] 21.0-22.0 sec  3.58 MBytes  30.1 Mbits/sec  0.326 ms    0/ 2739 (0%)
[  3] 22.0-23.0 sec  3.58 MBytes  30.1 Mbits/sec  0.296 ms    1/ 2740 (0.036%)
[  3] 23.0-24.0 sec  3.59 MBytes  30.1 Mbits/sec  0.273 ms    1/ 2740 (0.036%)
[  3] 23.0-24.0 sec  1 datagrams received out-of-order
[  3] 24.0-25.0 sec  3.58 MBytes  30.1 Mbits/sec  0.292 ms    0/ 2739 (0%)
[  3] 25.0-26.0 sec  3.58 MBytes  30.1 Mbits/sec  0.320 ms    0/ 2739 (0%)
[  3] 26.0-27.0 sec  3.58 MBytes  30.1 Mbits/sec  0.337 ms    1/ 2740 (0.036%)
[  3] 27.0-28.0 sec  3.59 MBytes  30.1 Mbits/sec  0.318 ms    0/ 2740 (0%)
[  3] 28.0-29.0 sec  3.58 MBytes  30.1 Mbits/sec  0.265 ms    1/ 2739 (0.037%)
[  3] 28.0-29.0 sec  1 datagrams received out-of-order
[  3] 29.0-30.0 sec  3.58 MBytes  30.1 Mbits/sec  0.302 ms    1/ 2740 (0.036%)
[  3]  0.0-30.0 sec    108 MBytes  30.1 Mbits/sec  0.303 ms    7/82193 (0.0085%)
[  3]  0.0-30.0 sec  9 datagrams received out-of-order
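Server reports like the two above can be tallied automatically before they go into an email. The following is a minimal sketch, assuming the per-interval UDP line format shown in these reports; `INTERVAL_RE` and `summarize_loss` are hypothetical names. Intervals whose lost count exceeds the datagram total (such as the first interval of test #4) are set aside instead of being counted as loss.

```python
import re

# Matches one per-interval UDP line of the server report, for example:
# [  3]  1.0- 2.0 sec  3.58 MBytes  30.1 Mbits/sec  0.322 ms    2/ 2741 (0.073%)
INTERVAL_RE = re.compile(
    r"^\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+) sec\s+"
    r"[\d.]+ MBytes\s+[\d.]+ Mbits/sec\s+[\d.]+ ms\s+"
    r"(\d+)/\s*(\d+)\s+\("
)


def summarize_loss(report):
    """Sum lost and total datagrams over the one-second intervals of a report."""
    lost = total = 0
    skipped = []
    for line in report.splitlines():
        match = INTERVAL_RE.match(line.strip())
        if not match:
            continue  # headers, out-of-order notices, connection lines
        start, end = float(match.group(1)), float(match.group(2))
        line_lost, line_total = int(match.group(3)), int(match.group(4))
        if end - start > 1.5:
            continue  # the final summary line covers the whole run
        if line_lost > line_total:
            skipped.append(line.strip())  # implausible counter, set aside
            continue
        lost += line_lost
        total += line_total
    percent = 100.0 * lost / total if total else 0.0
    return lost, total, percent, skipped


# First lines and summary line of the test #4 report.
REPORT = """
[1916] local 67.17.206.157 port 8000 connected with 129.102.1.51 port 49411
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[1916]  0.0- 1.0 sec  3.70 MBytes  31.0 Mbits/sec  0.535 ms 1465662019/ 2828 (5.2e+007%)
[1916]  1.0- 2.0 sec  3.69 MBytes  31.0 Mbits/sec  0.480 ms    0/ 2821 (0%)
[1916]  1.0- 2.0 sec  1 datagrams received out-of-order
[1916]  2.0- 3.0 sec  3.70 MBytes  31.1 Mbits/sec  0.473 ms    0/ 2830 (0%)
[1916]  3.0- 4.0 sec  3.70 MBytes  31.0 Mbits/sec  0.474 ms    1/ 2828 (0.035%)
[1916]  4.0- 5.0 sec  3.70 MBytes  31.1 Mbits/sec  0.473 ms    1/ 2830 (0.035%)
[1916]  0.0-30.0 sec   111 MBytes  31.0 Mbits/sec  0.640 ms    7/84739 (0.0083%)
"""

lost, total, percent, skipped = summarize_loss(REPORT)
print(f"{lost} lost of {total} datagrams ({percent:.4f}%), {len(skipped)} interval(s) set aside")
```

The per-interval counts need not add up to the final summary line. In the test #5 report the intervals list 16 lost datagrams while the summary reports 7, alongside 9 received out of order. Treat the tally as a cross-check and still include the raw report.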

 

Recommended iperf strings to mimic a 30 Mbps UDP DVTS stream:
Tell everyone which iperf commands you used so that they can duplicate the exact test. This matters all the more because iperf’s default settings are not suited to high UDP bandwidths.

 

Server:
iperf -s -u -l 1372 -w 64k -i 1 -p 8000

 

Client:
iperf -c 198.32.252.58 -u -l 1372 -w 64k -i 1 -t 30 -p 8000 -b 30M -r
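A quick arithmetic check helps confirm that the `-l` and `-b` values produce the stream you intended. The sketch below assumes that 30 Mbps means 30,000,000 bits of UDP payload per second and that the bandwidth column counts payload bits only. How a given iperf build interprets the `M` suffix may differ, and the function names are hypothetical.

```python
PAYLOAD_BYTES = 1372  # the -l value in the commands above


def expected_datagram_rate(target_bps, payload_bytes=PAYLOAD_BYTES):
    """Datagrams per second needed to carry target_bps of UDP payload."""
    return target_bps / (payload_bytes * 8)


def payload_mbits_per_sec(datagrams_per_sec, payload_bytes=PAYLOAD_BYTES):
    """Payload bandwidth, in units of 10**6 bits/s, for a given datagram rate."""
    return datagrams_per_sec * payload_bytes * 8 / 1e6


# Assumed target: 30,000,000 bits/s of payload.
print(f"{expected_datagram_rate(30_000_000):.0f} datagrams/sec expected")

# Datagram counts per one-second interval taken from the reports above.
for observed in (2740, 2828):
    print(f"{observed} datagrams/sec -> {payload_mbits_per_sec(observed):.1f} Mbits/sec")
```

The 2740 and 2828 datagrams per second seen in the test #5 and test #4 reports work out to 30.1 and 31.0 Mbits/sec, which matches the bandwidth column in those reports. Both are slightly above the roughly 2733 datagrams per second that an exact 30,000,000 bits/s would need.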

 

 

Next step:
Define a next action so that something gets done.

 

Triangulate tests between IRCAM, RENATER, and NWS (Abilene) to determine where the loss is occurring. NWS is standing by to hear from Daryl and John regarding their availability.

 

 


Specify the individuals who need to be informed of progress, along with their contact information.

 

Contact at NWS:
Andrew Hollis / email / phone

Contact at IRCAM:
Daryl Hall / email / phone

Contact at RENATER:
John Oates / email / phone