Question about TCP retransmission timeout - TCP-IP


  1. Question about TCP retransmission timeout

    A TCP retransmission timeout (RTO) occurs when an ACK does not arrive
    on time, or does not arrive at all!
    Usually the RTO is not a fixed value; it adapts to give better network
    performance.
    Anyway, there is a maximum value for the RTO, which RFC 1122 gives as
    240 seconds (4 minutes).
    In the Linux kernel, this maximum is fixed at 120 seconds by the
    macro TCP_RTO_MAX.

    I live in China, where the network within the country is fast
    enough, but connections across the border are really bad.
    Packet delivery is not consistently reliable, and reliability is
    usually @#!? low.
    I have tried changing the TOS field of outgoing packets, but with no
    noticeable benefit.

    I have often noticed that when the connection gets particularly bad,
    the RTO approaches its maximum value. Tcpdump shows no network
    activity for a long time, then a new transmission burst.
    In this situation, with a 2-minute timeout, the network becomes
    unusable.
    Browsers, mail clients, ssh clients and so on close the connection
    because of their own internal timeouts.

    In Linux, TCP_RTO_MAX cannot be changed through /proc/sys/net/...
    I recompiled the kernel with TCP_RTO_MAX = 10 seconds, and I have been
    running that kernel for 50 days now with no more problems. No
    timeouts, no applications closing connections, and so on.

    I have noticed no ill effects from my modification, and I have found no
    advice about it on the web.

    Can anyone raise an objection to my modification?
    What problems should I expect with such a "small" value for TCP_RTO_MAX?

    Thank you!
    Best Regards,
    Antonio Borneo

  2. Re: Question about TCP retransmission timeout

    borneo.antonio@gmail.com writes:

    >In Linux, TCP_RTO_MAX cannot be changed through /proc/sys/net/...
    >I recompiled the kernel with TCP_RTO_MAX = 10 seconds, and I have been
    >running that kernel for 50 days now with no more problems. No
    >timeouts, no applications closing connections, and so on.

    >I have noticed no ill effects from my modification, and I have found no
    >advice about it on the web.

    >Can anyone raise an objection to my modification?
    >What problems should I expect with such a "small" value for TCP_RTO_MAX?

    Sure. The classic problem is that if your delays are due to congestion,
    you will, in certain cases, find your connections failing due to maximum
    number of retransmissions. (The numbers in RFC 1122 were not picked at
    random -- they reflected our experience over very congested Internet links
    of the times).
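
    To put rough numbers on the "maximum number of retransmissions"
    problem, here is a minimal sketch. It assumes plain exponential
    back-off clamped at the cap, an initial timer of 0.2 seconds and a
    limit of 15 retransmissions; those values and the function name are
    assumptions for illustration, not something stated in this thread.

```python
def give_up_time(rto_initial, rto_max, max_retries):
    """Seconds from the first transmission until the sender gives up.

    Assumes the timer doubles after every timeout and is clamped to
    rto_max, and that the sender gives up when the timer expires after
    the last allowed retransmission.
    """
    rto = rto_initial
    elapsed = 0.0
    for _ in range(max_retries + 1):  # original send plus each retransmission
        elapsed += rto
        rto = min(rto * 2, rto_max)
    return elapsed


if __name__ == "__main__":
    for cap in (120, 10):
        total = give_up_time(rto_initial=0.2, rto_max=cap, max_retries=15)
        print(f"cap {cap:>3} s: gives up after about {total:.1f} s")
```

    Under these assumptions, the same retry budget covers roughly 15
    minutes of trouble with a 120 second cap but under 2 minutes with a
    10 second cap, so an outage that the stock setting would ride out
    can kill the connection.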

    Another problem is that if the delay gets really long, your round-trip
    time estimator will fail because it cannot get accurate round-trip time
    samples.
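
    One way to picture the estimator problem is the sketch below. It
    assumes an estimator in the style of RFC 6298 that, following
    Karn's rule, ignores round-trip samples from retransmitted segments;
    the class name is made up, and the clock-granularity term and the
    lower bound on the RTO are left out for brevity.

```python
class RttEstimator:
    """Simplified RTO estimator in the style of RFC 6298."""

    def __init__(self, rto_max, rto_initial=1.0):
        self.srtt = None      # smoothed round-trip time
        self.rttvar = None    # round-trip time variation
        self.rto_max = rto_max
        self.rto = min(rto_initial, rto_max)

    def on_ack(self, rtt, was_retransmitted):
        if was_retransmitted:
            return  # Karn's rule: the sample is ambiguous, so discard it
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = min(self.srtt + 4 * self.rttvar, self.rto_max)

    def on_timeout(self):
        # Exponential back-off, clipped at the configured maximum.
        self.rto = min(self.rto * 2, self.rto_max)


if __name__ == "__main__":
    est = RttEstimator(rto_max=10)
    est.on_ack(0.3, was_retransmitted=False)      # one clean sample
    for _ in range(5):                            # path delay jumps to ~15 s
        est.on_timeout()
    est.on_ack(15.0, was_retransmitted=True)      # ACK arrives, but too late
    print(est.srtt, est.rto)
```

    If the real delay stays above the cap, every segment is
    retransmitted before its ACK can arrive, every sample is discarded,
    and the smoothed estimate never catches up with the path.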

    As to whether you'll see either case in your situation, a lot depends on
    where you connect to.

    Also, focusing on the RTO is usually the wrong answer. The goal should
    be to avoid RTO inspired retransmissions whenever possible. Did you
    turn on SACK before tinkering with the RTO? SACK solves the problem
    much better as it retransmits into gaps/losses as the gaps/losses are
    detected and if you're seeing high loss rates, SACK will give a much
    more complete picture of the losses than simple round-trip timeouts.
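
    On Linux, whether SACK is on can be read from the tcp_sack sysctl.
    A small sketch, assuming the usual /proc/sys/net/ipv4/tcp_sack
    location (the helper name is made up for illustration):

```python
from pathlib import Path


def sack_enabled(path="/proc/sys/net/ipv4/tcp_sack"):
    """Return True if the sysctl file at `path` holds a non-zero value."""
    return int(Path(path).read_text().strip()) != 0


if __name__ == "__main__":
    try:
        print("SACK enabled:", sack_enabled())
    except OSError as exc:
        # Not Linux, or /proc is not readable from here.
        print("could not read the tcp_sack sysctl:", exc)
```

    Keep in mind that SACK only takes effect on a connection when both
    ends offer it, so the far side matters too.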

    Craig

  3. Re: Question about TCP retransmission timeout

    borneo.antonio@gmail.com wrote:
    > A TCP retransmission timeout (RTO) occurs when an ACK does not arrive
    > on time, or does not arrive at all! Usually the RTO is not a fixed
    > value; it adapts to give better network performance. Anyway, there
    > is a maximum value for the RTO, which RFC 1122 gives as 240 seconds
    > (4 minutes). In the Linux kernel, this maximum is fixed at 120
    > seconds by the macro TCP_RTO_MAX.

    > I live in China, where the network within the country is fast
    > enough, but connections across the border are really bad.
    > Packet delivery is not consistently reliable, and reliability is
    > usually @#!? low.
    > I have tried changing the TOS field of outgoing packets, but with no
    > noticeable benefit.

    > I have often noticed that when the connection gets particularly bad,
    > the RTO approaches its maximum value. Tcpdump shows no network
    > activity for a long time, then a new transmission burst. In this
    > situation, with a 2-minute timeout, the network becomes unusable.
    > Browsers, mail clients, ssh clients and so on close the connection
    > because of their own internal timeouts.

    How antisocially impatient of them!-)

    Do you have any data showing if the issue is congestion vs corruption
    of the traffic? Are you able to look at statistics on the far sides
    of the connections you are establishing?

    > In Linux, TCP_RTO_MAX cannot be changed through /proc/sys/net/... I
    > recompiled the kernel with TCP_RTO_MAX = 10 seconds, and I have been
    > running that kernel for 50 days now with no more problems. No
    > timeouts, no applications closing connections, and so on.

    > I have noticed no ill effects from my modification, and I have found no
    > advice about it on the web.

    > Can anyone raise an objection to my modification?
    > What problems should I expect with such a "small" value for TCP_RTO_MAX?

    Were you tracking your retransmission rates before and after you made
    the source change? If you commonly have "drop outs" in the network of
    more than 10 seconds' duration, you may be shoving useless
    retransmissions into the network.

    Basically you are clipping TCP's ability to back-off in the face of
    congestion. In theory, depending on the depth of queues in routers
    and what not, if all your peers were to do the same thing it could
    lead to "congestive collapse" of the network - where the network is
    completely saturated with nothing but retransmissions of data. That
    leads to the toss-up question of "how long in seconds are most router
    queues these days?" Which is probably about as easy to answer as "How
    long is a piece of string?"
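
    As a rough illustration of what clipping the back-off means in
    packets, the sketch below counts how many times one stalled segment
    would be resent during an outage. It assumes simple doubling from a
    0.2 second initial timer and ignores any limit on the number of
    retries; the function name and the numbers are illustrative only.

```python
def retransmissions_during(outage, rto_initial, rto_max):
    """Count timer expirations (retransmissions) within `outage` seconds."""
    rto = rto_initial
    next_expiry = rto
    count = 0
    while next_expiry <= outage:
        count += 1
        rto = min(rto * 2, rto_max)  # back off, but never beyond the cap
        next_expiry += rto
    return count


if __name__ == "__main__":
    for cap in (120, 10):
        n = retransmissions_during(outage=300, rto_initial=0.2, rto_max=cap)
        print(f"cap {cap:>3} s: {n} retransmissions in a 300 s outage")
```

    For a 300 second outage this gives 10 retransmissions per stalled
    connection with the 120 second cap and 34 with the 10 second cap,
    all of them sent into a path that cannot deliver them.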

    That's not to say that 240 or even 120 isn't larger than it needs to
    be, but I think you would be well served to see if you can find
    satisfaction at a value of TCP_RTO_MAX larger than 10 seconds.
    Perhaps 60 seconds? IIRC there are a number of reasonably popular TCP
    stacks out there which cap their RTO at 60 seconds.

    rick jones

    One way, I suppose, to measure the duration of outages would be to have
    a ping going in the background and look at the length of the gaps in
    the sequence numbers. Of course, one could think of such a ping as
    being akin to a very small MSS TCP connection with a TCP_RTO_MAX of 1
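
    A minimal sketch of that measurement, assuming ping output in the
    common "icmp_seq=N" form and one probe per second (the helper name
    and the log file name are made up, and sequence-number wrap-around
    is not handled):

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_gaps(ping_lines, interval=1.0):
    """Return (first_missing_seq, approx_seconds) for each run of lost probes."""
    gaps = []
    previous = None
    for line in ping_lines:
        match = SEQ_RE.search(line)
        if not match:
            continue  # header, summary, or error line
        seq = int(match.group(1))
        if previous is not None and seq > previous + 1:
            missing = seq - previous - 1
            gaps.append((previous + 1, missing * interval))
        previous = seq
    return gaps


if __name__ == "__main__":
    # Hypothetical log captured with something like: ping host > ping.log
    with open("ping.log") as log:
        for first_missing, seconds in find_gaps(log):
            print(f"gap starting at seq {first_missing}: ~{seconds:.0f} s")
```

    The gap lengths are only as precise as the probe interval, but they
    give an idea of how often outages exceed a candidate TCP_RTO_MAX.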


    --
    No need to believe in either side, or any side. There is no cause.
    There's only yourself. The belief is in your own precision. - Jobert
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  4. Re: Question about TCP retransmission timeout

    On Nov 7, 10:26 am, borneo.anto...@gmail.com wrote:

    > In Linux, TCP_RTO_MAX cannot be changed through /proc/sys/net/...
    > I recompiled the kernel with TCP_RTO_MAX = 10 seconds, and I have been
    > running that kernel for 50 days now with no more problems. No
    > timeouts, no applications closing connections, and so on.
    >
    > I have noticed no ill effects from my modification, and I have found no
    > advice about it on the web.
    >
    > Can anyone raise an objection to my modification?
    > What problems should I expect with such a "small" value for TCP_RTO_MAX?

    What you're doing is conceptually the same as sending the next packet
    twice after every packet loss. You reduce the impact of packet loss on
    your own traffic and in exchange increase the impact of packet loss on
    everyone else. You are essentially stealing more than "your fair
    share".

    That said, there really is no reason every TCP connection should be
    entitled to an equal rate. If two otherwise equal web servers have
    different numbers of connections, there's no particular reason the one
    with twice as many connections should get twice as much bandwidth.

    You may also run out of retransmissions too soon. If a network link
    breaks for a small amount of time, you may wind up retransmitting too
    many times in that window. If bad luck causes the next retransmission
    to drop due to congestion after the link is up, your TCP connection
    may break needlessly.

    Note that making TCP more aggressive is considered anti-social.

    DS
