Minutes IETF116: rtgwg: Fri 00:30
minutes-116-rtgwg-202303310030-00

Meeting Minutes Routing Area Working Group (rtgwg) WG
Date and time 2023-03-31 00:30
Title Minutes IETF116: rtgwg: Fri 00:30
State Active
Last updated 2023-04-12

IETF 116 RTGWG Minutes

Chairs: Jeff Tantsura (jefftant.ietf@gmail.com)
Yingzhen Qu (yingzhen.ietf@gmail.com)

WG Page: https://datatracker.ietf.org/group/rtgwg/about/
Materials: https://datatracker.ietf.org/meeting/116/session/rtgwg


17:30-18:30 - Monday Session IV, March 27


0. Meeting Administrivia and WG Update

Chairs (10 mins)

=============================================

WG document Update

=============================================

1. YANG Models for Quality of Service (QoS)

https://datatracker.ietf.org/doc/draft-ietf-rtgwg-qos-model/
Aseem Choudhary (10 mins)

No questions at the end of the presentation.

=============================================

Individual draft

=============================================

2. Considerations for Protection of SR Networks

https://datatracker.ietf.org/doc/draft-liu-rtgwg-sr-protection-considerations/  

Yisong Liu / Changwang Lin (10 mins)

  • Jeff Tantsura: What's your plan for this document? You listed all
    protection techniques.

  • Yingzhen Qu: The draft is informational. It doesn't seem like you
    need any extensions.

  • Weiqiang: The draft is about deployment of protection. All the
    components have been discussed in the relevant groups. This draft is
    for information only.

  • Greg Mirsky: Slide #6: it is not clear what the purpose of using a
    BFD session is. BFD doesn't provide information about the locator
    SID, it only verifies the forwarding path. I see this as
    overcomplicating the OAM.

  • Jeff T: You can add some deployment considerations on when to use
    which protection technology.

  • Weiqiang: We give the full picture of all the possible solutions.
    Not all of them are necessarily needed; some operators might only
    need a subset. We will optimize the OAM. We think the document is
    mature enough. We would like WG adoption.

3. Scenarios and Challenges of Overlay Routing for SD-WAN

https://datatracker.ietf.org/doc/draft-sheng-rtgwg-overlay-routing-requirement/

Hang Shi / Cheng Sheng (10 mins)

  • Jeff T: You can use BGP for multicast, which has been there for many
    years. Deployment of BIER is non-trivial, and it's not supported by
    silicon. What you described has been vanilla SD-WAN deployment for
    years; looking forward to your protocol enhancement. It is not clear
    what you want to do. What is your next step?

  • Shi Hang: This is a requirements document. We are looking for
    feedback and to see whether there is interest in collaborating. If
    so, we will propose a solution based on these requirements.

  • Tony P: The BIER BGP extension draft is under WGLC in IDR. If you
    need any changes, please go to the relevant WGs to propose the
    extension.

4. Signaling In-Network Computing operations (SINC)

https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc/
Signaling In-Network Computing operations (SINC) deployment considerations
https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc-deployment-considerations/

Zhe Lou (20 mins)

  • Adrian Farrel: Wearing my CATS chair hat, I am trying to see the
    difference between this and CATS. I see this as an overlay as well,
    but it's on-path computation between two endpoints. It is similar to
    SFC. CATS is about A -> B, where B does the calculation. CATS is
    about which B to use and how to get to it.

  • Zhe: Correct. We try to define what is within the network.

  • Adrian: How does a transit node know there is a SINC header under
    the encapsulation?
  • Zhe: Transit nodes don't know. They just pass the traffic. The
    SINC-capable node will find the header.
  • Adrian: So the SINC-capable nodes are doing deep parsing.
  • Jeff Tantsura: You are defining the collective capability. The
    operation is stateful; you're not really building routes, you're
    building trees. How do you signal when an operation starts and ends?
    What about resiliency? Collective operations can take a long time,
    as with large language models. I'm expecting more from the document.
    We're looking at collective tree operation rather than just
    encapsulation.
  • Zhe: We start from a fixed domain, like a DC, so in a controlled
    environment.
  • David Lamparter: You should focus on the characteristics of
    computation (primarily: processing single packets vs. aggregation),
    ignore what computation itself actually is. Separate problems. Just
    describe what different cases you are talking about.
  • Zhe: Routers should announce their capabilities. It should be put
    somewhere else.

Chat History

  • Jim Uttaro
    00:26:44
    Along with scalability it would be helpful to understand the
    operational complexity

=======================================================

09:30-11:30 - Friday Session I, March 31


0. Welcome and Introduction

Chairs (5 mins)

Agenda bashing

  • David Lamparter: I do not believe the BGP Blockchain draft has
    sufficient merit to be worth our time here, would just like this to
    be recorded for future sessions.

=============================================

Individual drafts

=============================================

1. Routing on Service Addresses

https://datatracker.ietf.org/doc/draft-trossen-rtgwg-rosa/
Dirk Trossen (15 mins)

  • David Lamparter: Extension Header needs clarification, maybe for
    next presentation.

  • Dirk: You can find some technical details in Section 7.

  • Aijun Wang: SAR needs to have a full table, no? How does it get the
    information?

  • Dirk Trossen: Problem is understood but the routing table is limited
    to the services a ROSA domain serves, thus not ALL services of the
    Internet.

  • Aijun: ICNRG has some work going on, but it does not depend on an IP
    network.

  • Dirk: This one is meant to run over an IP network.

2. BGP Blockchain

https://datatracker.ietf.org/doc/draft-mcbride-rtgwg-bgp-blockchain/
Dirk Trossen (10 mins)

  • David Lamparter: All of the use cases have existing authorities that
    control the topic at hand. Applying distributed consensus into that
    is entirely useless.

  • Dirk Trossen: Trying to do permissioned DCS, not permissionless.

  • Q Misell: The cryptographic parts would require buy-in from e.g.
    RIRs, have you been in contact with them?

  • Dirk Trossen: Still trying to figure out, need to facilitate
    discussion somehow.

  • Andrew Alston: (personal) Answer to facilitating discussion: take
    this to the IRTF. Too early, not even vaguely ready for
    standardization, it's a research topic, better suited to IRTF.

  • Rüdiger Volk: Not seeing a clear direction on what this is trying to
    tackle; it is a kitchen sink of problems that are raised once in a
    while by people using BGP. Better to cut down to specific problems.
    Possible cyclic dependency with the network operating itself.

  • Dirk Trossen: ACK on Andrew's comment, will be taking this there.

  • Jeff T: Speaking for myself, I support taking this to the IRTF
    rather than the IETF.

3. Protocol Assisted Protocol (PASP)

https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/

Zhen Tan (10 mins)

  • Greg Mirsky: Characterization of problems only, no
    notification/propagation of the failure information?

  • Zhen Tan (ZT): Protocol for gathering information on a device. It
    helps to locate problems on the internet.

  • Greg Mirsky: Will devices keep some history log of events? (ZT: Yes)
    How long?

  • Zhen Tan: Vendor dependent

  • Greg Mirsky: So, possible data will already be gone when operator
    looks at it? (ZT: Yes) Notification would allow data to be stored
    elsewhere.

  • Zhen Tan: We have notifications, like in use case #2. There will be
    pre-configurations about sending notifications.

  • Aijun Wang: Protocol already knows reason for failure, what is the
    benefit of having a separate protocol?

  • Zhen Tan: This does not need to keep another connection. PASP uses
    UDP, the connection is on demand.

  • Aijun Wang: OK, need to understand the reasons the protocol itself
    can't do this. Is PASP an on-demand protocol?

  • Zhen Tan: Yes.

  • Adrian Farrel: You said RSVP, did you mean RSVP-TE? (ZT: Yes) This
    might not be as applicable, RSVP-TE already has mechanisms for
    collecting and propagating fault information, e.g. RFC 4873.

  • Zhen Tan: Goal is to have one way to get the information for all
    protocols, otherwise hard to gather information.

6. Routing in Dragonfly topologies - problem space and solutions

Dmitry Afanasiev (20 mins)

  • Tony Przygienda: Add path will cause massive path hunting, you need
    this (tunneling?) You need to shout everything off, it's a broadcast
    domain. Dynamic routing will be too slow, had that discussion
    before. Shifting traffic based on congestion is a viable thing,
    outside the realm of building a routing protocol that can keep up.
    Doable? Yes, look at DAR (dynamic adaptable routing), was looking at
    previous traffic, statistics on success. Works stunningly well. For
    dragonfly, when the network grows big, you will have to tunnel to
    keep 3-hops, then you will have to broadcast it and recompute.
    Flooding & broadcasting approach versus reactive shifting flows
    around. Reactive might be better to keep up.

  • Dmitry Afanasiev: Using VRFs rather than tunnels, but yes. Adaptive
    routing works in milliseconds.

  • Tony Przygienda: Amorphous Broadcast domain?

  • Tianji Jiang: Reminiscent of previous work 15 years ago by Brocade;
    L2 was done using TRILL. Has that been looked at?

  • Jeff Tantsura: A lot of development happening on this topic
    recently, some of it needs to happen at IETF.

4. Tactical Traffic Engineering (TTE)

https://datatracker.ietf.org/doc/html/draft-li-rtgwg-tte-00
Colby Barth (15 mins)

  • Greg Mirsky: This is to be monitored on a link level, not path
    level? (CB: yes) Detection of congestion happens on egress to link,
    action is supposed to be taken by ingress (upstream node)?

  • Colby Barth: Action is to be taken at the point of (local) repair,
    that would otherwise act. Congestion is detected on a node's
    outgoing interface, which also serves as the repairing node.

  • Greg Mirsky: So, monitoring outgoing interface, and taking action on
    that. Not monitoring incoming queue, rather outgoing. (CB: yes) Not
    taking notification from sources of traffic? (CB: Yes.) Action is
    local, so effect overall on other flows cannot be considered, right?

  • Colby Barth: Yes, only flows transiting the affected node can be
    considered. Not attempting to come up with a global fix.

  • Himanshu Shah: This is doing TI-LFA on a congested outgoing link,
    that has been done before. Isn't this just a local implementation
    that doesn't need an IETF specification?

  • Colby Barth: Absolutely correct, this is an informational draft,
    it's a local node decision.

  • Himanshu Shah: What happens to other ongoing traffic, won't this
    make things worse elsewhere?

  • Colby Barth: The example uses TI-LFA & tunnels, but other mechanisms
    can be used. We call it TTE tunnels in the draft.

  • Himanshu Shah: e2e tunnels are precalculated; it would be better to
    switch to those than to use TI-LFA.

  • Zhenbin Li: For TE tunnels, if you change the path in the middle,
    will this cause out-of-order packets?

  • Colby Barth: Are you asking about this causing out of order packets?
    (Yes)

  • Colby Barth: Typically the hashing algorithms should be flow based
    which should alleviate problems.

  • Tony Li: The delay change is more significant and will affect
    congestion control. There is a performance impact.

  • Jeff Tantsura: Other similar approaches suffer a lot from their
    locality of action and may cause downstream congestion. Adding some
    non-local decision might help.
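
The flow-based hashing that Colby Barth refers to can be illustrated with a minimal sketch. The 5-tuple fields, the `pick_path` helper, and the path names are hypothetical and not taken from the draft; the only point shown is that every packet of one flow maps to the same path for as long as the set of candidate paths is unchanged.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_path(flow: FlowKey, paths: Sequence[str]) -> str:
    """Deterministically map a flow to one of the candidate paths.

    The same flow and the same path list always give the same path,
    so packets of a single flow are not sprayed across paths.
    """
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]


flow = FlowKey("192.0.2.1", "198.51.100.7", 6, 49152, 443)
candidates = ["primary-link", "tte-tunnel-1", "tte-tunnel-2"]
chosen = pick_path(flow, candidates)
```

Note that the mapping is only stable while `candidates` is stable: adding or removing a path can move an in-progress flow, which is where reordering could still occur.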

5. Requirement of Fast Fault Detection for IP-based Network

https://datatracker.ietf.org/doc/draft-guo-ffd-requirement
Framework of Fast Fault Detection for IP-based Networks
https://datatracker.ietf.org/doc/draft-wang-ffd-framework
Haibo Wang (20 mins)

  • Greg Mirsky: Terminology - mechanism used isn't failure detection,
    more about failure notification? Using other mechanisms, e.g. BFD,
    to detect defects? What you described is about propagating the
    information in the management plane.

  • Haibo Wang: Yes, other mechanisms in use in parallel, but also don't
    want to run heavyweight things on endpoints.

  • Greg Mirsky: Motivation is the large delay in discovery (>10s)?

  • Haibo Wang: It's based on keep-alives. In some scenarios it's much
    longer, 5-15 seconds, or 15 mins.

  • Greg Mirsky: Hints to me that there's some OAM mechanism missing. It
    seems to me not a good design.

  • Zhenbin Li: Overlap with CATS working group, coordination?

  • David Black: For an unconverged failure, how do you detect that the
    failure is unconverged or converged? Network examples on slides are
    simple and obvious - how would determination that a failure is
    unconverged be made for a more complex network such as the dragonfly
    networks described in item 6 earlier in the meeting?

(Communication issues at this point. To continue on list.)

===========================================================

Side Meeting Update if time allows

============================================================

7. APN Update

https://datatracker.ietf.org/doc/draft-li-apn-problem-statement-usecases/

https://datatracker.ietf.org/doc/draft-li-apn-framework/
Zhenbin Li/Shuping Peng (10 mins)

  • Joel Halpern: Presentations of unchartered side meetings do not seem
    appropriate for this working group; next one on agenda seems to have
    the same problem. Please try to get to a problem statement we can
    progress on. Frustrated with the structure.

8. Summary of GIP6 Side Meeting

Hongyi Huang/Qiangzhou Gao (5 mins)

Presentation skipped; not enough time.

Chat History

Louis Chan
00:18:57

For ROSA, is there development requirement for client application?

David Lamparter
00:24:45

There seems to be no note-taker, I've hopped in but I'm a bit
multitasking-limited as I have comments to make too :)

David Lamparter
00:24:59

(or is someone taking notes outside the pad?)

Jeff Tantsura
00:25:32

David - hope you could do it

David Lamparter
00:26:31

I'll try my best. Would still appreciate if you could ask the room if
someone else wants to share the load.

David Lamparter
00:27:24

https://notes.ietf.org/notes-ietf-116-rtgwg?edit

Andrew Alston
00:36:27

I cannot see how this is vaguely ready to look at in terms of
standardization - I can see how someone may wanna try and do some
research on this in the irtf - maybe

Yingzhen Qu
00:39:57

https://notes.ietf.org/notes-ietf-116-rtgwg?both

Yingzhen Qu
00:40:19

Please contribute to notes

David Lamparter
00:40:37

(Uh. That comment was very disingenuous. "Just asking questions. Can't
take questions to the IETF?" … you asked your question, you just didn't
like the answer. I'll point this out to Dirk after the session.)

Anthony Somerset
00:47:42

MD5 is not considered secure anymore surely?

Jeff Tantsura
00:48:37

for quite some time

Joel Halpern
00:49:33

This PASP thing seems to be addressing an already multiply-solved
problem.

David Lamparter
00:54:59

The agenda copied into notetaking pad doesn't match the room… I assume
the agenda in the notetaking pad wasn't updated for some rescheduling

Yingzhen Qu
00:59:18

@David. you're right, I got the presentation sequence wrong, this is
supposed to be #6. Sorry about that

David Lamparter
01:00:18

OK, no problem, I was just confused in the notes for a moment :)

Hesham ElBakoury
01:07:24

When will SINC be presented?

Yingzhen Qu
01:08:25

@Hesham, SINC was presented on Monday

Greg Mirsky
01:20:50

voice is breaking. Perhaps not using video feed might help

John Scudder
01:21:20

Audio seems better now. I assume it was affecting everyone and not just
those of us onsite?

David Black
01:21:35

Yes, affected me - remote.

Jeff Tantsura
01:23:44

me too

Tony Li
01:30:48

There's no signaling at all. Nothing to interoperate.

David Black
01:33:33

Still have an opportunity to misorder when an in-progress flow is
switched to another path.

Tony Li
01:34:58

Misordering is more likely when deactivating a prefix. You're moving a
flow from a presumably suboptimal path back to an optimal one.

Tony Li
01:35:22

In any case, ordering and latency are possible issues ANY time we change
the routing table.

Shaofu Peng
01:36:53

Hi Tony, in the absence of a central orchestrator or controller, when a
node in the network implements a local path switch, it cannot perceive
how much traffic will be affected, which may lead to congestion on a
link in the new path. Of course, it is difficult to learn exactly how
much traffic will be affected, but having that knowledge would be
better.

Jeff Tantsura
01:38:31

@Tony - you might consider using a similar strategy as with adaptive
routing/DLB and move the flow only if the interpacket gap is large
enough not to cause reordering, it is somewhat less of an issue in the
WAN (perceptionally) than in DC, but still something to think about

Tony Li
01:38:54

No argument. One of the more intensive ways of using this technique is
also to monitor per-prefix traffic levels and decide to select prefixes
to balance bandwidth utilization.

Tony Li
01:39:35

@Jeff we're not too worried about this, given that the alternative is
packet loss.

Tony Li
01:39:51

But we don't want to thrash, either.

Jeff Tantsura
01:40:09

absolutely, rebalancing usually yields better results than binary on/off

Shaofu Peng
01:40:42

IMO misordering is out of the scope of this proposal...

Tony Li
01:44:44

It's not really out of scope. It's more that it's the lesser of two
evils. :-)

Shaofu Peng
01:48:32

Agree, I just think that it is another local behavior, similar to LFA,
TI-LFA, and previously, we have not raised any concerns about the
disorder of these local behaviors. This issue is addressed by other
technology.

David Lamparter
01:54:58

I'm incredibly confused [by the discussion, not the problem David Black
describes], not sure what to put in the notes here.

John Scudder
01:56:13

I think David's point is very well-taken. If this problem (insofar as I
understand what the speaker is trying to do!) were easy, it would
already have been fixed. If there are low-hanging fruit special cases,
then identify them and make the case that they're worth addressing, but
I don't think that's been done.

Jeff Tantsura
01:56:52

+1 John
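
The inter-packet-gap strategy that Jeff Tantsura mentions in the chat (01:38:31) can be sketched roughly as follows. This is an illustration only, not something presented in the session: the `GapAwareSwitch` class, the `min_gap` threshold, and the flow and path names are hypothetical. The assumption is that a flow may be moved to a new path only after it has been idle for at least `min_gap` seconds, so that packets already in flight on the old path are unlikely to be overtaken.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowState:
    path: str         # path the flow is currently pinned to
    last_seen: float  # arrival time of the flow's previous packet


@dataclass
class GapAwareSwitch:
    min_gap: float  # hypothetical idle threshold, in seconds
    flows: Dict[str, FlowState] = field(default_factory=dict)

    def forward(self, flow_id: str, now: float, preferred_path: str) -> str:
        """Return the path for this packet.

        A flow moves to preferred_path only if the gap since its
        previous packet is at least min_gap; otherwise it stays put.
        """
        state = self.flows.get(flow_id)
        if state is None:
            # First packet of the flow: nothing in flight to reorder.
            self.flows[flow_id] = FlowState(preferred_path, now)
            return preferred_path
        if state.path != preferred_path and now - state.last_seen >= self.min_gap:
            state.path = preferred_path
        state.last_seen = now
        return state.path


switch = GapAwareSwitch(min_gap=0.5)
first = switch.forward("flow-1", now=0.0, preferred_path="shortest")
busy = switch.forward("flow-1", now=0.1, preferred_path="detour")   # gap too small
moved = switch.forward("flow-1", now=1.0, preferred_path="detour")  # gap large enough
```

In this sketch `busy` stays on "shortest" and `moved` ends up on "detour". A real threshold would presumably need to exceed the latency difference between the two paths.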