Minutes IETF116: rtgwg: Fri 00:30
minutes-116-rtgwg-202303310030-00
Meeting Minutes | Routing Area Working Group (rtgwg) WG | |
---|---|---|
Date and time | 2023-03-31 00:30 | |
Title | Minutes IETF116: rtgwg: Fri 00:30 | |
State | Active | |
Last updated | 2023-04-12 |
IETF 116 RTGWG Minutes
Chairs: Jeff Tantsura (jefftant.ietf@gmail.com)
Yingzhen Qu (yingzhen.ietf@gmail.com)
WG Page: https://datatracker.ietf.org/group/rtgwg/about/
Materials: https://datatracker.ietf.org/meeting/116/session/rtgwg
17:30-18:30 - Monday Session IV, March 27
0. Meeting Administrivia and WG Update
Chairs (10 mins)
=============================================
WG document Update
=============================================
1. YANG Models for Quality of Service (QoS)
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-qos-model/
Aseem Choudhary (10 mins)
No questions at the end of the presentation.
=============================================
Individual draft
=============================================
2. Considerations for Protection of SR Networks
https://datatracker.ietf.org/doc/draft-liu-rtgwg-sr-protection-considerations/
Yisong Liu / Changwang Lin (10 mins)
- Jeff Tantsura: What's your plan for this document? You listed all
protection techniques.
- Yingzhen: The draft is informational. It doesn't seem like you need
any extension.
- Weiqiang: The draft is about deployment of protection. All the
components have been discussed in the relevant groups. This draft is
for information only.
- Greg Mirsky: Slide #6: it is not clear what the purpose of using a
BFD session is. BFD doesn't provide information about the locator SID,
it only verifies the forwarding path. I see it as overcomplicating
the OAM.
- Jeff T: You can add some deployment considerations, when to use
which protection technology.
- Weiqiang: We give the full picture of all the possible solutions.
Not necessarily all of them are needed. Some operators might only
need a subset. We will optimize the OAM. We think the document is
mature enough. We would like WG adoption.
3. Scenarios and Challenges of Overlay Routing for SD-WAN
https://datatracker.ietf.org/doc/draft-sheng-rtgwg-overlay-routing-requirement/
Hang Shi / Cheng Sheng (10 mins)
- Jeff T: You can use BGP for multicast, which has been there for many
years. Deployment of BIER is non-trivial, and it's not supported by
silicon. What you described has been vanilla SD-WAN deployment for
years; looking forward to your protocol enhancement. It is not clear
what you want to do. What is your next step?
- Shi Hang: This is a requirement document. We are looking for
feedback and to see if there is interest in collaborating. If so, we
will propose a solution based on these requirements.
- Tony P: The BIER BGP extension draft is under WGLC in IDR. If you
need any changes, please go to the relevant WGs to propose the
extension.
4. Signaling In-Network Computing operations (SINC)
https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc/
Signaling In-Network Computing operations (SINC) deployment
considerations
https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc-deployment-considerations/
Zhe Lou (20 mins)
- Adrian Farrel: Wearing my CATS chair hat. I am trying to see the
difference between this and CATS. I see this as overlay as well, but
it's on path computation between two end points. It is similar to
SFC. CATS is about A -> B, and B does the calculation. CATS is about
which B to use and how to get to it.
- Zhe: Correct. We try to define what is within the network.
- Adrian: How does a transit node know there is a SINC header under the
encapsulation?
- Zhe: Transit nodes don't know. They just pass the traffic. The
SINC-capable node will find the header.
- Adrian: So the SINC-capable nodes are doing deep parsing.
- Jeff Tantsura: You are defining the collective capability. The
operation is stateful; you're not really building routes, you're
building trees. How do you signal when an operation starts and ends?
Resiliency? Collective operations can take a long time, as with large
language models. I'm expecting more from the document. We're looking
at collective tree operation rather than just encapsulation.
- Zhe: We start from a fixed domain, like a DC, so in a controlled
environment.
- David Lamparter: You should focus on the characteristics of
computation (primarily: processing single packets vs. aggregation),
ignore what the computation itself actually is. Separate problems. Just
describe what different cases you are talking about.
- Zhe: Routers should announce their capability. It should be put
somewhere else.
Chat History
- Jim Uttaro
00:26:44
Along with scalability it would be helpful to understand the
operational complexity
=======================================================
09:30--11:30 - Friday Session I, March 31
0. Welcome and Introduction
Chairs (5 mins)
Agenda bashing
- David Lamparter: I do not believe the BGP Blockchain draft has
sufficient merit to be worth our time here, would just like this to
be recorded for future sessions.
=============================================
Individual drafts
=============================================
1. Routing on Service Addresses
https://datatracker.ietf.org/doc/draft-trossen-rtgwg-rosa/
Dirk Trossen (15 mins)
- David Lamparter: Extension Header needs clarification, maybe for
next presentation.
- Dirk: You can find some technical details in section 7.
- Aijun Wang: SAR needs to have a full table, no? How does it get the
information?
- Dirk Trossen: Problem is understood but the routing table is limited
to the services a ROSA domain serves, thus not ALL services of the
Internet.
- Aijun: ICNRG has some work going on but not depending on IP network.
- Dirk: This one is to run over IP network.
2. BGP Blockchain
https://datatracker.ietf.org/doc/draft-mcbride-rtgwg-bgp-blockchain/
Dirk Trossen (10 mins)
- David Lamparter: All of the use cases have existing authorities that
control the topic at hand. Applying distributed consensus into that
is entirely useless.
- Dirk Trossen: Trying to do permissioned DCS, not permissionless.
- Q Misell: The cryptographic parts would require buy-in from e.g.
RIRs, have you been in contact with them?
- Dirk Trossen: Still trying to figure out, need to facilitate
discussion somehow.
- Andrew Alston: (personal) Answer to facilitating discussion: take
this to the IRTF. Too early, not even vaguely ready for
standardization, it's a research topic, better suited to IRTF.
- Rüdiger Volk: Not seeing a direction for what this is trying to
tackle, kitchen sink of problems that are raised once in a while by
people using BGP. Better cut down to specific problems. Possible cyclic
dependency with the network operating itself.
- Dirk Trossen: ACK on Andrew's comment, will be taking this there.
- Jeff T: Speaking for myself. I support taking this to the IRTF rather
than the IETF.
3. Protocol Assisted Protocol (PASP)
https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/
Zhen Tan (10 mins)
- Greg Mirsky: Characterization of problems only, no
notification/propagation of the failure information?
- Zhen Tan (ZT): Protocol for gathering information on a device. It
helps to locate problems on the internet.
- Greg Mirsky: Will devices keep some history log of events? (ZT: Yes)
How long?
- Zhen Tan: Vendor dependent.
- Greg Mirsky: So, possibly data will already be gone when the operator
looks at it? (ZT: Yes) Notification would allow data to be stored
elsewhere.
- Zhen Tan: We have notifications, like in use case #2. There will be
pre-configurations about sending notifications.
- Aijun Wang: Protocol already knows reason for failure, what is the
benefit of having a separate protocol?
- Zhen Tan: This does not need to keep another connection. PASP uses
UDP, the connection is on demand.
- Aijun Wang: OK, need to understand reasons protocol itself can't do
this. Is PASP an on-demand protocol?
- Zhen Tan: Yes.
- Adrian Farrel: You said RSVP, did you mean RSVP-TE? (ZT: Yes) This
might not be as applicable, RSVP-TE already has mechanisms for
collecting and propagating fault information, e.g. RFC 4873.
- Zhen Tan: Goal is to have one way to get the information for all
protocols, otherwise hard to gather information.
6. Routing in Dragonfly topologies - problem space and solutions
Dmitry Afanasiev (20 mins)
- Tony Przygienda: Add path will cause massive path hunting, you need
this (tunneling?) You need to shout everything off, it's a broadcast
domain. Dynamic routing will be too slow, had that discussion
before. Shifting traffic based on congestion is a viable thing,
outside the realm of building a routing protocol that can keep up.
Doable? Yes, look at DAR (dynamic adaptable routing), was looking at
previous traffic, statistic on success. Works stunningly well. For
dragonfly, when the network grows big, you will have to tunnel to
keep 3-hops, then you will have to broadcast it and recompute.
Flooding & broadcasting approach versus reactive shifting flows
around. Reactive might be better to keep up.
- Dmitry Afanasiev: Using VRFs rather than tunnels, but yes. Adaptive
routing works in milliseconds.
- Tony Przygienda: Amorphous broadcast domain?
- Tianji Jiang: Reminiscent of previous work 15 years ago by Brocade,
L2 was done using TRILL, has that been looked at?
- Jeff Tantsura: A lot of development happening on this topic
recently, some of it needs to happen at IETF.
4. Tactical Traffic Engineering (TTE)
https://datatracker.ietf.org/doc/html/draft-li-rtgwg-tte-00
Colby Barth (15 mins)
- Greg Mirsky: This is to be monitored on a link level, not path
level? (CB: yes) Detection of congestion happens on egress to link,
action is supposed to be taken by ingress (upstream node)?
- Colby Barth: Action is to be taken at the point of (local) repair,
that would otherwise act. Congestion is detected on a node's
outgoing interface, which also serves as the repairing node.
- Greg Mirsky: So, monitoring outgoing interface, and taking action on
that. Not monitoring incoming queue, rather outgoing. (CB: yes) Not
taking notification from sources of traffic? (CB: Yes.) Action is
local, so effect overall on other flows cannot be considered, right?
- Colby Barth: Yes, only flows transiting the affected node can be
considered. Not attempting to come up with a global fix.
- Himanshu Shah: This is doing TI-LFA on a congested outgoing link,
that has been done before. Isn't this just a local implementation
that doesn't need an IETF specification?
- Colby Barth: Absolutely correct, this is an informational draft,
it's a local node decision.
- Himanshu Shah: What happens to other ongoing traffic, won't this
make things worse elsewhere?
- Colby Barth: The example uses TI-LFA & tunnels, but other mechanisms
can be used. We call it TTE tunnels in the draft.
- Himanshu: e2e tunnels are precalculated, better to switch to those
than to use TI-LFA.
- Zhenbin Li: For TE tunnels, if you change in the middle, will this
cause packets out of order?
- Colby Barth: Are you asking about this causing out of order packets?
(Yes)
- Colby Barth: Typically the hashing algorithms should be flow based
which should alleviate problems.
- Tony Li: Delay change is more significant and will cause
congestion control impact. There is performance impact.
- Jeff Tantsura: Other similar approaches suffer a lot from their
locality of action and may cause downstream congestion. Adding some
non-local decision might help.
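Editor's illustration (not part of the discussion as recorded): the flow-based hashing Colby Barth refers to can be sketched as below. This is a minimal sketch under stated assumptions, not the mechanism of the TTE draft; the names FiveTuple and pick_next_hop, the choice of SHA-256, and the example addresses are all hypothetical. The point is only that every packet of one flow maps to the same next hop, so packets within a flow are not spread across paths.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """Hypothetical flow key; real devices may hash other header fields."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def pick_next_hop(flow: FiveTuple, next_hops: Sequence[str]) -> str:
    """Map a flow to one next hop, deterministically per flow."""
    if not next_hops:
        raise ValueError("no next hops available")
    # Serialize the flow key and hash it; SHA-256 is an arbitrary choice here.
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 8 bytes of the digest to an index into the next-hop set.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


flow = FiveTuple("192.0.2.1", "198.51.100.7", 6, 49152, 443)
paths = ["primary-link", "tte-tunnel-1", "tte-tunnel-2"]
# Repeated lookups for the same flow always return the same path.
chosen = pick_next_hop(flow, paths)
```

Note that the mapping is stable only while the next-hop set is unchanged; when the set changes, some flows move, which is the reordering risk raised in the discussion.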
5. Requirement of Fast Fault Detection for IP-based Network
https://datatracker.ietf.org/doc/draft-guo-ffd-requirement
Framework of Fast Fault Detection for IP-based Networks
https://datatracker.ietf.org/doc/draft-wang-ffd-framework
Haibo Wang (20 mins)
- Greg Mirsky: Terminology - mechanism used isn't failure detection,
more about failure notification? Using other mechanisms, e.g. BFD,
to detect defect? What you described is about propagating the
information in management plane.
- Haibo Wang: Yes, other mechanisms in use in parallel, but also don't
want to run heavyweight things on endpoints.
- Greg Mirsky: Motivation is the large delay in discovery (>10s)?
- Haibo Wang: It's based on keep-alives. In some scenarios it's much
longer, 5-15 seconds, or 15 mins.
- Greg Mirsky: Hints to me that there's some OAM mechanism missing. It
seems to me not a good design.
- Zhenbin Li: Overlap with CATS working group, coordination?
- David Black: For an unconverged failure, how do you detect that the
failure is unconverged or converged? Network examples on slides are
simple and obvious - how would determination that a failure is
unconverged be made for a more complex network such as the dragonfly
networks described in item 6 earlier in the meeting?
(Communication issues at this point. To continue on list.)
===========================================================
Side Meeting Update if time allows
============================================================
7. APN Update
https://datatracker.ietf.org/doc/draft-li-apn-problem-statement-usecases/
https://datatracker.ietf.org/doc/draft-li-apn-framework/
Zhenbin Li/Shuping Peng (10 mins)
- Joel Halpern: Presentations of unchartered side meetings do not seem
appropriate for this working group; next one on agenda seems to have
the same problem. Please try to get to a problem statement we can
progress on. Frustrated with the structure.
8. Summary of GIP6 Side Meeting
Hongyi Huang/Qiangzhou Gao (5 mins)
presentation skipped, not enough time
Chat History
Louis Chan
00:18:57
For ROSA, is there development requirement for client application?
David Lamparter
00:24:45
There seems to be no note-taker, I've hopped in but I'm a bit
multitasking-limited as I have comments to make too :)
David Lamparter
00:24:59
(or is someone taking notes outside the pad?)
Jeff Tantsura
00:25:32
David - hope you could do it
David Lamparter
00:26:31
I'll try my best. Would still appreciate if you could ask the room if
someone else wants to share the load.
David Lamparter
00:27:24
https://notes.ietf.org/notes-ietf-116-rtgwg?edit
Andrew Alston
00:36:27
I cannot see how this is vaguely ready to look at in terms of
standardization - I can see how someone may wanna try and do some
research on this in the irtf - maybe
Yingzhen Qu
00:39:57
https://notes.ietf.org/notes-ietf-116-rtgwg?both
Yingzhen Qu
00:40:19
Please contribute to notes
David Lamparter
00:40:37
(Uh. That comment was very disingenuous. "Just asking questions. Can't
take questions to the IETF?" … you asked your question, you just didn't
like the answer. I'll point this out to Dirk after the session.)
Anthony Somerset
00:47:42
MD5 is not considered secure anymore surely?
Jeff Tantsura
00:48:37
for quite some time
Joel Halpern
00:49:33
This PASP thing seems to be addressing an already multiply-solved
problem.
David Lamparter
00:54:59
The agenda copied into notetaking pad doesn't match the room… I assume
the agenda in the notetaking pad wasn't updated for some rescheduling
Yingzhen Qu
00:59:18
@David. you're right, I got the presentation sequence wrong, this is
supposed to be #6. Sorry about that
David Lamparter
01:00:18
OK, no problem, I was just confused in the notes for a moment :)
Hesham ElBakoury
01:07:24
When sinc will be presented?
Yingzhen Qu
01:08:25
@Hesham, SINC was presented on Monday
Greg Mirsky
01:20:50
voice is breaking. Perhaps not using video feed might help
John Scudder
01:21:20
Audio seems better now. I assume it was affecting everyone and not just
those of us onsite?
David Black
01:21:35
Yes, affected me - remote.
Jeff Tantsura
01:23:44
me too
Tony Li
01:30:48
There's no signaling at all. Nothing to interoperate.
David Black
01:33:33
Still have an opportunity to misorder when an in-progress flow is
switched to another path.
Tony Li
01:34:58
Misordering is more likely when deactivating a prefix. You're moving a
flow from a presumably suboptimal path back to an optimal one.
Tony Li
01:35:22
In any case, ordering and latency are possible issues ANY time we change
the routing table.
Shaofu Peng
01:36:53
Hi Tony, In the absence of a central orchestration of controller, when a
node in the network implement local path switch, they cannot perceive
the impact on how much traffic will be affected, which may lead to
congestion on a link in the new path. Of course, it is exactly difficult
to learn how much traffic will be affected, but if we have that
knowledge, that will be more perfect.
Jeff Tantsura
01:38:31
@Tony - you might consider using a similar strategy as with adaptive
routing/DLB and move the flow only if the interpacket gap is large
enough not to cause reordering, it is somewhat less of an issue in the
WAN (perceptionally) than in DC, but still something to think about
Tony Li
01:38:54
No argument. One of the more intensive ways of using this technique is
also to monitor per-prefix traffic levels and decide to select prefixes
to balance bandwidth utilization.
Tony Li
01:39:35
@Jeff we're not too worried about this, given that the alternative is
packet loss.
Tony Li
01:39:51
But we don't want to thrash, either.
Jeff Tantsura
01:40:09
absolutely, rebalancing usually yields better results than binary on/off
Shaofu Peng
01:40:42
IMO misorder is out the scope of this proposal...
Tony Li
01:44:44
It's not really out of scope. It's more that it's the lesser of two
evils. :-)
Shaofu Peng
01:48:32
Agree, I just think that it is another local behavior, similar to LFA,
TI-LFA, and previously, we have not raised any concerns about the
disorder of these local behaviors. This issue is addressed by other
technology.
David Lamparter
01:54:58
I'm incredibly confused [by the discussion, not the problem David Black
describes], not sure what to put in the notes here.
John Scudder
01:56:13
I think David's point is very well-taken. If this problem (insofar as I
understand what the speaker is trying to do!) were easy, it would
already have been fixed. If there are low-hanging fruit special cases,
then identify them and make the case that they're worth addressing, but
I don't think that's been done.
Jeff Tantsura
01:56:52
+1 John
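Editor's illustration (not part of the chat log): Jeff Tantsura's suggestion in the chat, to move a flow only when the inter-packet gap is large enough not to cause reordering, can be sketched as below. This is a minimal sketch under stated assumptions; the FlowletTable class, its gap_threshold parameter, and the path names are hypothetical and do not come from any draft discussed in the session.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple


@dataclass
class FlowletTable:
    """Per-flow state: time of the last packet and the path it used."""
    gap_threshold: float  # seconds; hypothetical tuning knob
    _state: Dict[Hashable, Tuple[float, str]] = field(default_factory=dict)

    def select_path(self, flow_id: Hashable, now: float, preferred_path: str) -> str:
        """Return the path for a packet of flow_id arriving at time now.

        The flow moves to preferred_path only if it is new or has been idle
        for at least gap_threshold; otherwise it stays on its current path.
        """
        entry = self._state.get(flow_id)
        if entry is None:
            path = preferred_path
        else:
            last_seen, current_path = entry
            idle = now - last_seen
            path = preferred_path if idle >= self.gap_threshold else current_path
        self._state[flow_id] = (now, path)
        return path


table = FlowletTable(gap_threshold=0.5)
table.select_path("flow-1", now=0.0, preferred_path="primary")    # new flow: primary
table.select_path("flow-1", now=0.1, preferred_path="alternate")  # gap too small: stays on primary
table.select_path("flow-1", now=1.0, preferred_path="alternate")  # idle 0.9 s: moves to alternate
```

The threshold would need to exceed the delay difference between the old and new paths for the move to avoid reordering; choosing that value is outside the scope of this sketch.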