Paul Howard | 12 Nov 2003 23:19

Re: Questions about draft-ietf-l2tpext-failover-02.txt

Vipin,

Thanx for your responses.  Rather than inlining further comments, I've tried to summarize (from my perspective at least) where things stand.  I've tried to bring out all of the proposals that we've discussed, but may have missed some in the morass our e-mail has become :-)

It seems that the crux of the discussion boils down to whether to use a separate recovery tunnel for each extant tunnel (as described in the current draft) or to do recovery within each tunnel.   Any solution needs to allow for the possibility of a completely hitless data plane (i.e. 0 data packet loss).   Discussions so far have focused upon a basic 3 way handshake to perform the recovery of the control connection's sequence numbers regardless of whether a separate recovery tunnel is used.

Using a separate recovery tunnel seems to present the following advantages and disadvantages:

Plus:

- Leverages extant reliable packet delivery of L2TP control connection via separate recovery tunnel

Minus:

- To allow parallelism (necessary to recover in a timely fashion and avoid disconnects due to tunnel hello failures), the endpoints must be capable of supporting some number of tunnels over and above their normal operating limits.   Given my implementation currently runs up to 8,000 tunnels, I'd be looking at having to allow transient peaks of 16,000 tunnels for fully parallel recovery.   Using less than a 100% resource reserve and a retry mechanism as you have proposed (thus doing only a subset of the tunnels in parallel) would reduce the resource requirements, but increase the likelihood of tunnels failing due to increased latency in tunnel recovery.
- Unless the semantics are changed to allow two tunnels with the same TIDs, the TIDs of the recovery tunnel must differ from the old tunnel.   This implies re-programming of the data plane for all active sessions and, most likely, at least a brief interruption of data flow.    Your proposal to reset the old tunnel's sequence numbers instead of replacing the old tunnel with the recovery tunnel would address this issue.   It does, however, carry with it additional packet overhead (due to the required shutdown of the recovery tunnel as opposed to the silent abandonment of the old tunnel).
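
A rough sketch of the resource trade-off above, in Python. The function name and the model are illustrative assumptions, not anything taken from the draft: it assumes every tunnel needs recovery, that a spare tunnel slot is freed as soon as its recovery completes, and that partial recovery proceeds in sequential rounds.

```python
import math


def recovery_plan(normal_tunnels, reserve_fraction):
    """Return (peak_tunnels, rounds) for a given spare-tunnel reserve.

    Hypothetical model: each round recovers as many tunnels in parallel
    as there are spare slots, and slots are reused between rounds.
    """
    spare = math.floor(normal_tunnels * reserve_fraction)
    if spare < 1:
        raise ValueError("at least one spare tunnel slot is required")
    peak = normal_tunnels + spare
    rounds = math.ceil(normal_tunnels / spare)
    return peak, rounds


if __name__ == "__main__":
    for fraction in (1.0, 0.25):
        print(fraction, recovery_plan(8000, fraction))
```

With 8,000 tunnels, a 100% reserve gives a peak of 16,000 tunnels and a single round; a 25% reserve gives a peak of 10,000 tunnels but four sequential rounds, each of which eats into the time available before hello failures.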

Using the existing tunnel to recover from the failover:

Plus:

- No additional resources required to allow fully parallel recovery
- No reprogramming of the data plane required (in the case of a control plane only failover) - thus no interruption of the data flow.

Minus:

- Reliable packet delivery mechanisms of the control connection are unavailable to do the recovery handshake (since one side has lost its knowledge of Ns and Nr).   This implies that the handshake must be done in an unnumbered mode.   Procedures for reliable delivery and acknowledgement would need to be provided; however, the nature of the 3 way handshake deals with most of these issues.

You mentioned specific concerns about unnumbered mode in your last response.

- Acknowledgements - A possible approach is that every packet of the 3 way handshake except the last is acknowledged by the next packet in the handshake; the last packet in the handshake delivers the last of the sequence number reset data and thus can be acknowledged by requiring a normal ZLB Ack upon receipt.
- Reliable transmit - It seems that retransmits of outstanding handshake frames pending acknowledgement would handle this issue.
- Transmit queue - The normal transmit queue of the control connection is out of commission pending re-sync of the sequence numbers.   Any packets on this queue would be held pending conclusion of the resync.
- Receive window constraints - The receive window is also out of commission pending re-sync of the sequence numbers thus it's not clear how it could even be applied.   The 3 way handshake does in itself effectively enforce a flow control with an RWS of 1.  This shouldn't be an issue since nothing else can happen on the tunnel's control connection pending resync.
- Bombarded with unnumbered traffic - This could happen with any frame (numbered or unnumbered) and a non-well behaved peer (or hacker).   The control connection would have to do at least some packet examination to discard (but it does for bogus numbered mode frames as well).   If a more efficient discard of such frames was an issue, then an exchange of re-sync cookies as part of the initial control connection setup would allow more efficient discard of bogus frames and allow preliminary validation of a resync request (all unnumbered mode frames would be required to carry the appropriate resync cookie).
- DOS attack - I'm assuming a hacker with no ability to snoop (if the hacker can snoop, then they can just send the appropriate StopCCN and any protections on the resync mechanism or issues with numbered vs unnumbered mode are moot).   Up to the point of the hacker guessing source IP, dest IP, and TID there doesn't seem to be any difference in the susceptibility of numbered vs unnumbered mode.   After this point, numbered mode has the advantage of requiring the proper guess for Ns.   The use of a resync cookie (or more generically an unnumbered mode cookie) in all unnumbered frames would provide an equivalent level of protection for unnumbered mode.
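
The resync cookie check suggested in the last two items could look roughly like the Python sketch below. Everything here is hypothetical: the draft defines no such cookie, and the names (`Tunnel`, `new_cookie`, `accept_unnumbered`), the 8-byte cookie length, and the idea that the cookie is stored per tunnel at setup time are assumptions made only for illustration.

```python
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class Tunnel:
    tid: int
    # Assumed to have been exchanged during initial control connection setup.
    resync_cookie: bytes


def new_cookie(length=8):
    """Generate a random cookie; the length is an arbitrary assumption."""
    return secrets.token_bytes(length)


def accept_unnumbered(tunnel, frame_tid, frame_cookie):
    """Cheap pre-check: drop unnumbered frames with a wrong TID or cookie."""
    if frame_tid != tunnel.tid:
        return False
    # Constant-time comparison so timing does not leak the cookie.
    return hmac.compare_digest(frame_cookie, tunnel.resync_cookie)


if __name__ == "__main__":
    tunnel = Tunnel(tid=9, resync_cookie=new_cookie())
    print(accept_unnumbered(tunnel, 9, tunnel.resync_cookie))
    print(accept_unnumbered(tunnel, 9, new_cookie()))
```

A blind attacker then has to guess the cookie in addition to source IP, dest IP, and TID, which plays the same role that guessing Ns plays in numbered mode.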

Other outstanding issues that I see in the e-mail thread:

- What to do about frames on the transmit queue of the control connection?    I had suggested renumbering them once the new sequence numbers had been established.   You suggested discarding all of the frames.   I think either approach will work.   I'm concerned about some of the issues arising from a decision to discard - primarily the impact on the protocol layer which can no longer submit frames and expect them to be unconditionally delivered barring a tunnel failure.   Consider an established session where the non-failed endpoint has just queued a CDN for transmit when a failover occurs.   Both endpoints think the session is established, but with a discard of the pending transmits, the CDN never gets sent.  Now the FSQ/FSR mechanism would detect this; however, the discard of the CDN may have tossed information of interest to the peer (e.g. disconnect cause from RFC 3145).   I guess I don't see the harm in re-numbering upon resync.   Any frames in the transmit queue are by definition outstanding (some may not even have been sent yet due to flow control).   The frames may or may not have been received at the peer before the failover (and if received may or may not have been remembered).   The worst case is the frame was received and remembered at which point the protocol layer at the peer will get a second copy of the packet (the control connection duplicate elimination won't catch it) and the session will most likely get torn down as a result (since the 2nd packet would arrive while the state machine is not expecting it).   Frames not remembered would continue whatever action was being attempted before the failover without harm.    I suppose that if failover is slow enough, any frames in the transmit queue would be so stale as to be not worth sending.   I question whether this is justification to toss the frames in the transmit queue given that failover may occur fast enough to avoid imposing any staleness on these frames.
A decision to toss outstanding frames means the protocol layer must be adjusted to deal with transmit failure in the absence of tunnel failure - thus e.g. an established session may have to re-submit a CDN.  In one of your comments, you brought up re-transmitting without re-numbering.   I believe this choice will cause the tunnel to fail, since such packets will not be acknowledged by the peer, causing re-transmission of the packets and eventual tunnel failure.   Having said all of the above, it's probably reasonable that this be an implementation decision (the implementation must either discard the contents of the transmit queue or re-number the frames; the implementation must not transmit without re-numbering).   It may be worthwhile to have a section describing the alternatives and the potential issues with each.
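
The two permitted choices for the transmit queue (re-number or discard) can be sketched in Python as follows. This is only an illustration under assumptions: the `Frame` type and the function names are hypothetical, and the only protocol fact relied on is that L2TP Ns/Nr are 16-bit counters that wrap.

```python
from collections import deque
from dataclasses import dataclass

SEQ_MOD = 1 << 16  # Ns and Nr are 16-bit values and wrap around


@dataclass
class Frame:
    msg_type: str
    ns: int


def renumber(queue, new_ns):
    """Give queued frames consecutive Ns values starting at new_ns.

    Queue order is preserved. Returns the Ns to use for the next new frame.
    """
    for frame in queue:
        frame.ns = new_ns
        new_ns = (new_ns + 1) % SEQ_MOD
    return new_ns


def discard(queue):
    """Empty the queue and return the dropped frames so that the
    protocol layer can be told about the transmit failure."""
    dropped = list(queue)
    queue.clear()
    return dropped


if __name__ == "__main__":
    pending = deque([Frame("CDN", 100), Frame("Hello", 101)])
    next_ns = renumber(pending, 4435)
    print([(f.msg_type, f.ns) for f in pending], next_ns)
```

Returning the dropped frames from `discard` reflects the point above: if frames are tossed, the protocol layer has to learn about it so that it can, for example, re-submit a CDN.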

Thanx,

Paul

Vipin Jain wrote:
hi Paul, my response inline..
Reserved Bits: I am in for that. Based on the discussions earlier Mark and we agreed that if we can get it done by using a control plane message exchange there is no need to make a L2TP-header change for this.
[pwh4] I'm not sure how to interpret this comment. I agree we don't want to make changes where we can avoid them, but this needs to be balanced against providing a resource efficient solution. By creating the concept of unnumbered frames to do reliable layer signaling then we can have the reliable layer re-sync the control plane without requiring a separate tunnel to do the resync and without requiring a tid change. IMHO, this would seem to justify taking a reserved bit to indicate UI.
I don't think introducing the concept of having unnumbered messages and changing header bits is a good idea. It would bring in the following problems:
- How do you ack an unnumbered message? More importantly, how do you reliably transmit them? Do they take the same transmit queue and comply with rx window constraints?
- What if a node is bombarded with such unnumbered messages? Are we supposed to interpret each one of them?
Using Ns=0, Nr=0 Scheme: This is flawed because it might naturally coincide with the sequence numbers of a tunnel thereby confusing this as an acknowledgement for a packet sent earlier.
[pwh4] I don't think so. With unnumbered frames, there is no Ns/Nr thus they are ignored. I mentioned 0/0 only because they should probably be set to something. The UI bit would be looked at by the reliable layer before it looks at Ns/Nr. UI frames would be handled entirely by the reliable layer and not the protocol layer.
Using Ns=0, Nr=0 when UI bit is set invites interpreting any message with Nr=0, Ns=0 and UI bit set - Does it invite a DoS attack?
One negative to this approach is that it would require the systems to have the resources available to at least temporarily manage the additional tunnels - this might imply being able to temporarily peak at double the normally supported number of tunnels. If the non-failed endpoint was at its maximum number of tunnels, how would it know to make an exception for the failover tunnel setup?
And therefore we would RECOMMEND keeping space for at least one tunnel for recovery purposes. If it can't establish the tunnel, then it could retry 'x' times before concluding the tunnel recovery mechanism failed. This is much better than current proposal where we'd establish one new tunnel for every old tunnel.
[pwh4] If we want recovery to proceed expeditiously (and we really only have a maximum of 1 hello timeout plus max retransmit timeout), then we really need to be able to recover tunnels in parallel. This could be very resource intensive. It doesn't seem likely that a large number of tunnels will necessarily be coming from the same peer. I typically see not more than a small handful (say 3) tunnels coming from the same peer to allow for different service policies. With this scenario, I'd need an additional 33% resources to recover the tunnels in parallel - that's a rather steep price to pay.
Parallel recovery was definitely one of the design goals (Appendix A.2), so in line with your thinking, if we wish to recover we'd need to reserve the resources. Having a three-way dialogue (to reset one another's control plane) is a MUST. Now, doing that on the existing tunnel is what we are evaluating. My proposal is to use the existing mechanism in the draft but reset the old tunnel's control plane, thereby keeping the data plane hitless (because the old tunnel-id is intact). That is workable and fits in the existing constructs of tunnel establishment, including individually authenticating peers upon restart.
How does the transition to the new sequence number occur? I presume we hand off the new sequence numbers to the reliable layer and it then purges its re-ordering rx queue, renumbers any outstanding transmits and immediately re-transmits them? Any stale receives get automatically discarded as out of window.
- Renumbering outstanding transmits might create more unpredictability for no benefit. Because control plane on the failed node would have lost the context related to previous messages, it is better to flush everything off and start with new sequence numbers.
[pwh4] I disagree. Say the outstanding transmit is an ICCN. The failed node may or may not have remembered sending the ICRP. By re-sending the ICCN, we allow the failed node at least the option of continuing the setup.
The target of the draft was to recover only the sessions that were in established state. This means if an endpoint does not keep track of session's intermediate state (i.e. ICRP sent, or awaiting ICCN) then it doesn't matter if the other sends an ICCN or not, it would be discarded upon control plane restart for the tunnel.
The state machines at the protocol layer should already be capable of handling an unexpected ICCN (or for that matter any unexpected packet). If we were to throw away these packets, then we're potentially requiring additional complication at the protocol layer.
Agreed; the state machine should be able to handle any packet in the life of a session or a tunnel. Regarding additional complication:
- I think section 2.3.4 addresses the inconsistency among session states.
- Renumbering the existing messages in the transmit queue is not going to eliminate the conditions that could result in the situations described in 2.3.4, so that needs to be there anyway. Then why bother resending these messages?
- If we are recovering only the sessions that were in established state then there is no need to retransmit messages for situations that could be handled otherwise by defined mechanisms.
I don't know how your protocol layer is implemented, but mine treats the reliable layer as a pipe. What I put into it is guaranteed to be delivered or I get a tunnel failed indication. It doesn't seem that I'd really want a tunnel failed indication here, but that would be my only choice if the reliable layer threw away a packet that had been submitted for transmit.
From what I have seen, the reliable layer typically is like a pipe which will either reliably deliver a packet or provide a tunnel failure indication. Therefore it doesn't make a difference from a delivery perspective whether we renumber them or not. Once queued they'll be delivered. But my point was - why even re-mark their sequence numbers?
The old tunnel becomes active only upon getting confirmation from its peer that it has reset control plane sequence numbers. So this would have to be a three way handshake as described below. For example, for an old tunnel:
- Failed node sends: "Reset Nr to 5665 for Old Local Tid = 9, Old Tid = 6".
- Non Failed node responds: "Nr for Old Tid = 9 reset to 5665, Reset Nr to 4435 for the same tunnel (i.e. Old Local Tid = 6, Old Tid = 9 from this node's perspective)". Failed endpoint upon getting this message must first enqueue the response of this message and then start sending control messages on the Old tunnel.
- Failed node sends: "Nr for Old Tid = 9 reset to 4435". Non failed node upon getting this can start sending control messages.
[pwh4] I presume in the case where control messages with the new sequence number space start arriving at the non-failed node before the final resync message, the messages get discarded as OOW. The failed node will then re-transmit and they'll be accepted once the non-failed node gets the final resync message.
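
The three-message exchange in the example above can be traced with a small Python sketch using the same numbers (5665 and 4435). The `Endpoint` type and `three_way_resync` function are hypothetical, and the sketch assumes loss-free, in-order delivery; it does not model retransmission or the out-of-window case just discussed.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    ns: int = 0             # next sequence number this side will send
    nr: int = 0             # next sequence number this side expects
    may_send: bool = False  # allowed to send numbered control messages


def three_way_resync(failed, peer, failed_ns, peer_ns):
    """Apply the three resync messages in order and return them as a log."""
    log = []

    # 1. Failed node asks the peer to expect failed_ns next.
    failed.ns = failed_ns
    log.append((failed.name, f"Reset Nr to {failed_ns}"))

    # 2. Peer confirms and asks for the reverse direction to be reset.
    peer.nr = failed_ns
    peer.ns = peer_ns
    log.append((peer.name, f"Nr reset to {failed_ns}, Reset Nr to {peer_ns}"))

    # 3. Failed node queues its confirmation, then may start sending.
    failed.nr = peer_ns
    log.append((failed.name, f"Nr reset to {peer_ns}"))
    failed.may_send = True

    # The peer may send only after the final confirmation has arrived.
    peer.may_send = True
    return log


if __name__ == "__main__":
    failed, peer = Endpoint("failed"), Endpoint("non-failed")
    for sender, text in three_way_resync(failed, peer, 5665, 4435):
        print(f"{sender}: {text}")
```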
What we discuss above could work. However, to make it simpler I think simply resetting the control plane of the old tunnel would be good enough.

thanks,
-- vipin

