Jonathan Lynam | 15 May 2003 05:14

Re: AD comments on draft-ietf-l2tpext-l2tp-base-07.txt

Ignacio Goyret wrote:
> At 17:54 5/14/2003 +0200, W. Mark Townsley wrote:
> 
>>>MUST use exponential backoff on retransmission (SHOULD is not
>>>sufficient). Same for congestion control/slow start.
>>
>>OK
> 
> 
> I disagree with MUST. If you are going to put a MUST in this context,
> it shouldn't be with 'exponential'. Our experience with L2TPv2 has shown
> us that the algorithm shown in L2TPv2 is overly passive and adds too much
> unnecessary delay. Remember that you have to establish the control connection
> within the barely 30 seconds that LCP may wait.
> 
> I can agree to something like this: "MUST use a backoff retransmission
> algorithm that MAY/SHOULD be of exponential nature".

Experience leads me to agree strongly with Ignacio - exponential backoff 
should be a MAY.

In my opinion, TCP's exponential backoff requirement was misapplied to 
L2TPv2. The demands placed on the tunnel's control channel for session 
setups make it necessary to be able to aggressively retransmit control 
messages to keep the tunnel viable even as data flowing through the 
tunnel is congesting the path. I also believe that the case for using 
TCP's exponential backoff (esp. without TCP's other mechanisms to 
prevent timeouts) has not been made very well.

The L2TP control channel has undeniable differences from TCP in 
practice. TCP's retransmits consume large amounts of bandwidth because 
typically they retransmit entire MTU-sized segments of data. This 
happens fairly often because the most bandwidth-significant protocols 
running over TCP simply move bulk data as fast as possible. Usually the 
TCP flows themselves cause the congestion they experience, and their 
self-imposed flow control helps keep the network available.

L2TP's messages are small, often time-sensitive, and critical for the 
operation of the tunnel as a whole (lots of sessions). The total 
bandwidth used by the control channel is normally a tiny fraction of the 
total data transmitted through the tunnel. Retransmitted L2TP control 
packets don't "add fuel to the fire" to any extent resembling TCP 
retransmits: there are simply not enough of them, they are small, and 
L2TP's passive retransmission policies ensure that there is at least 1 
second before a message is retransmitted.

In the real world, a backoff of N seconds is essentially a service 
outage of N seconds if there is only one tunnel. In today's deployments, 
SPs are starting to load more than 30K sessions per chassis, and many 
have tested up to 100K. A complete stall in the session bringup rate can 
have a ripple effect even when the system is carefully designed for 
scaling.

A single packet drop between LAC and LNS will hamper session setup rates 
for that second. This has a much more adverse impact on the service as 
a whole than delaying data for a single TCP session.

The SP's network must be readily capable of sustaining the maximum 
amount of *control traffic* that is dictated by the session setup rate 
they want. If there is congestion, let data traffic flow-control itself; 
don't penalize the control channel. It doesn't help, and it just makes 
the session setup rate slow to a crawl. It's like taxing the poor as 
much as the rich.

Of course backoff is overall a reasonable and good thing, but L2TP's 
scheme needs to be beefed up a bit, especially considering that the 
tunnel itself is carrying flows that don't flow-control themselves at 
all (or don't do it well). Retransmits based on a computed RTT would be 
good but the implementation expense is high. I would like retransmits 
to be allowed after, say, 400ms and to encourage the use of TCP 
mechanisms like fast retransmit (and others).
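
The arithmetic behind these timing concerns can be sketched in a few 
lines of Python. This is only an illustration, not text from the draft 
or code from any implementation: the function names are hypothetical, 
the 1-second start and 8-second cap come from the draft text quoted 
below, the 30-second figure is the approximate LCP limit Ignacio cites, 
and the 400 ms start is simply the value floated above.

```python
# Illustrative sketch only; names and defaults are assumptions.

LCP_BUDGET_SECONDS = 30.0  # approximate LCP wait cited in this thread


def backoff_intervals(initial, cap, count, factor=2.0):
    """Return `count` successive wait intervals in seconds.

    Each interval is `factor` times the previous one, limited to `cap`.
    """
    intervals = []
    interval = initial
    for _ in range(count):
        intervals.append(min(interval, cap))
        interval *= factor
    return intervals


def total_wait(initial, cap, count, factor=2.0):
    """Total time spent waiting across `count` intervals."""
    return sum(backoff_intervals(initial, cap, count, factor))


if __name__ == "__main__":
    # Six waits: one after the original send, one after each of
    # five retransmissions.
    for label, initial in (("1 s start", 1.0), ("400 ms start", 0.4)):
        waits = backoff_intervals(initial, cap=8.0, count=6)
        total = sum(waits)
        verdict = "exceeds" if total > LCP_BUDGET_SECONDS else "fits within"
        print(f"{label}: {waits} -> {total:.1f} s ({verdict} LCP budget)")
```

With the 1-second start, the six waits add up to the 31 seconds Ignacio 
mentions, just past the LCP limit. Starting at 400 ms under the same cap 
keeps the same number of attempts well inside that budget.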

I'll spare the group my lengthier (but more specific) diatribes on the 
subject unless someone disagrees and needs to read them.

Jonathan Lynam
Redback Networks

> 
> 
> 
>>>Killing a connection after only 5 retransmission seems aggressive. Is
>>>this the right default value to recommend??
>>
>>It's what we had for v2, and at least some implementations have increased this 
>>value. So, I agree that it should be something less aggressive. If others do, I 
>>don't see why we can't change the recommendation here.
> 
> 
> Extending beyond 5 retransmissions is a problem if you use the algorithm
> shown in 2661 and this draft: 1 sec + 2 secs + 4 secs + 8 secs + 8 secs + 8 secs
> equal 31 seconds, 1 second beyond the default LCP limit.
> After this, it doesn't make sense retransmitting again because chances
> are that the call would drop anyway.
> 
> 
> 
>>>   (c) LNS-LNS Reference Model: This model has two LNSs as the LCCEs.  A
>>>   user-level, traffic-generated, or signaled event typically drives
>>>   session establishment from one side of the tunnel.
>>>
>>>This model could be better explained/motivated. When does one use it?
>>>how is it different than LAC/LAC? Presumably you mention it in the
>>>document because there are features in the l2tp protocol to support this?
>>
>>LNS - LNS is somewhat common in L2TPv2, even though it wasn't outlined 
>>specifically as such in RFC2661. It is sometimes referred to as a "voluntary 
>>tunnel" from a CPE device, or a tunnel initiated from a "LAC Client" on a host. 
>>In L2TPv3, we tried to make this at least sound a bit more peer-to-peer.
>>
>>When would it be used? Imagine an LNS that accepts connections from a variety of 
>>sources. Some of those sources are from CE devices, some from PE devices. The 
>>PEs typically have an attachment circuit to a CE, sold as a leased-line service. 
>>The CEs have no need for this, so they virtualize the attachment circuit 
>>interface. The CE looks like an LNS here since it has no L2 attachment circuit 
>>for the L2TP session to cross-connect to. Instead, it is routing packets at L3.
> 
> 
> 
>>>>  Each subsequent retransmission of a message MUST employ an
>>>>  exponential backoff interval.  Thus, if the first retransmission
>>>>  occurred after 1 second, the next retransmission should occur after 2
>>>>  seconds has elapsed, then 4 seconds, etc.  An implementation MAY
>>>>  place a cap upon the maximum interval between retransmissions.  This
>>>>  cap MUST be no less than 8 seconds per retransmission.  If no peer
>>>>  response is detected after several retransmissions (a recommended
>>>>  default is 5, but SHOULD be configurable), the control connection and
>>>>  all associated sessions MUST be cleared.
>>>
>>>
>>>Is the default of 5 reasonable in practice? Seems pretty aggressive to
>>>me.
>>
>>Agreed.
> 
> 
> It has to be aggressive, if you have any hope of actually completing
> the tunnel before the PPP call drops due to LCP timeouts (~30 seconds).
> See above.
> 
> 
> 
>>>>  In addition, a peer MUST NOT withhold acknowledgment of messages in
>>>>  order to maintain state in the L2TP state machine.  Conversely, the
>>>>  L2TP state machine MUST be capable of maintaining state if a ZLB ACK
>>>>  is received in response to a control message.  However, determining
>>>>  when a state should no longer be maintained (e.g. how long to wait in
>>>>  wait-reply state for an ICRP from the peer) before destroying a
>>>>  session or control connection is an issue that is left to each
>>>>  implementation.
>>>
>>>
>>>Leaving this implementation dependent is inconsistent with previous
>>>text about MUST keep state around for the full retransmission
>>>interval.
>>
>>No, it's supposed to be two different things (though this is obviously not 
>>entirely clear). Maintaining control connection state after a StopCCN is the 
>>1+2+4+8+8... period of time mentioned above. Maintaining state after an ICRQ is 
>>sent, acknowledged by a ZLB ack or other control message, and waiting for an 
>>ICRP to be received, was out of scope here.
>>
>>I agree that this needs tightening up.
> 
> 
> Mark, maybe it is high time to separate the transmission layer from the
> control layer. Then, the timeouts and other interactions become clearer
> and easier to understand.
> 
> 
> 
>>>>5.3  Hiding of AVP Attribute Values
>>>
>>>
>>>verify with security ADs that this is OK.
>>
>>OK, but know that there is no change here at all from RFC2661.
> 
> 
> Note that the intention of "hiding" is not the same as "encrypting".
> The only intention here is to make the AVPs non-clear-text so network
> sniffers can't read it directly. If you want true encryption, use IPSEC.
> 
> 
> 
>>>>Control Connection Tie Breaker (SCCRQ)
>>>
>>>
>>>I don't quite see the need for this. If running over IP, why not just
>>>use the IP addresses to break ties?
>>
>>I suppose you could. However, this AVP does two things. (1) indicate that there 
>>is a desire to limit the number of tunnels between two peers by tie breaking, 
>>and (2) provide the value to break the tie with. So, you would need the AVP 
>>anyway. I suppose you could just use the IP address to say who wins. Both should 
>>work. Do you have a strong objection here? What about for IPv6?
> 
> 
> Detail: you should use (IP addresses + UDP ports) for matching, if using UDP/IP,
> not just the IP address.
> 
> 
> 
> 
>>>>Host Name (SCCRQ, SCCRP)
>>>>
>>>>  The Host Name AVP, Attribute Type 7, indicates the name of the
>>>>  issuing LAC or LNS.
>>>>
>>>>  The Attribute Value field for this AVP has the following format:
>>>>
>>>>   0                   1                   2                   3
>>>>   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>>>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>  | Host Name ... (arbitrary number of octets)
>>>>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>
>>>>  The Host Name is of arbitrary length, but MUST be at least 1 octet.
>>>>
>>>>  This name should be as broadly unique as possible; for hosts
>>>>  participating in DNS [RFC1034], a host name with fully qualified
>>>>  domain would be appropriate.  The Host Name AVP and/or Router ID AVP
>>>>  MUST be used to identify an LCCE as described in Section 3.3.
>>>
>>>
>>>need to be more clear about the encoding of the DNS name. Just using
>>>strings can lead to interoperability issues.
>>
>>Suggestions? Have a pointer to proper DNS formatting?
> 
> 
> Actually, the DNS reference is just a suggestion. Any string can be used
> here and things will work fine.
> 
