Negotiating Human Language in Real-Time Communications
RFC 8373

Note: This ballot was opened for revision 19 and is now closed.

(Ben Campbell) Yes

Comment (2018-01-09 for -22)
I'm balloting "yes" because I think this is important work, but I have some comments:

Substantive Comments:

- General: It seems to me that this is as much about human behavior as it is about capability negotiation. Example case: I make a video call and express that I would like to receive Klingon. (Is there a tag for that? :-) The callee can speak Klingon and Esperanto, so we agree on Klingon. What keeps the callee from speaking Esperanto instead?

I realize we can't force people to stick to the negotiated languages, but should we expect that users at least be given some sort of UI indication of the negotiated language(s)? It seems like a paragraph or two on that subject is warranted, even if it is just to say it's out of scope.

- 1, paragraph 6: (related to Ekr's comments) Does the selection of a single tag in an answer imply an assumption that only one language will be used? There are communities where people tend to mix two or more languages freely and fluidly. Is that sort of thing out of scope?

- 5.1, paragraph 2: Can you elaborate on the motivation for having separate hlang-send and hlang-recv parameters rather than a single language parameter combined with setting the stream to send-only or receive-only, especially in light of the recommendation to set both directions the same for bi-directional language selection? I don't mean to dispute that approach; I just think a bit more explanation of the design choice would be helpful to the reader. I can imagine some use cases; for example, a speech-impaired person who does not plan to speak on a video call may still wish to send video to show facial expressions, etc. (I just re-read the discussion resulting from Ekr's comments, and recognize that this overlaps heavily with that.)

- 5.1, paragraph 3: "... which in most cases is one of the
   languages in the offer's..."
Are there cases where it might not be?

- 5.1, last paragraph: "This is not a problem."
Can you elaborate? That sort of statement usually takes the form "This is not a problem, because..."

- 5.2, last paragraph: Is there a reason to give such weak guidance on how to indicate the call is rejected? (Along those lines, are non-SIP uses of SDP in scope?)

Editorial Comments and Nits:

- 5.1, paragraph 4: The first MUST seems like a statement of fact.

(Alexey Melnikov) Yes

(Adam Roach) Yes

Comment (2018-01-09 for -22)
I'm glad to see this document being published; thanks to everyone who worked on it.

One tiny nit: Section 5.1 contains the following text:

>   This document defines two media-level attributes starting with
>   'hlang' (short for "human interactive language")...

I think this is a holdover from when the string was "humintlang" rather than "hlang"; it probably makes more sense to say:

>   This document defines two media-level attributes starting with
>   'hlang' (short for "human language")...

(Alia Atlas) No Objection

(Deborah Brungard) No Objection

(Alissa Cooper) No Objection

Comment (2018-01-09 for -22)
== Section 7 ==

   addition, if the 'hlang-send' or 'hlang-recv' values are altered or
   deleted en route, the session could fail or languages
   incomprehensible to the caller could be selected; however, this is
   also a risk if any SDP parameters are modified en route.

Given that one of the primary use cases for the attributes defined here is for emergency calling, it seems worthwhile to call out the new specific threat that these attributes enable in that case, namely the targeted manipulation/forgery of the language attributes for the purposes of denying emergency services to a caller. This general class of attacks is contemplated in Section 5.2.2 of RFC 5069, although there may be a better reference to cite here for what to do if you don't want your emergency calls subject to that kind of attack (I can't recall another document off the top of my head).

== Section 8 ==

This section seems weak: it does not say what to do to mitigate the risks of exposing this information.

(Spencer Dawkins) No Objection

(Suresh Krishnan) No Objection

Warren Kumari No Objection

(Mirja Kühlewind) No Objection

Comment (2018-01-08 for -19)
One question: I can't really imagine cases where the send and recv would be used to indicate different things. Can you provide an example (and better explain in the document why this 'complexity' was added)?

One purely editorial note: I think section 5.1 could simply be removed before final publication as part of the reasoning is given in the intro already.

(Kathleen Moriarty) No Objection

(Eric Rescorla) No Objection

Comment (2018-01-06 for -19)
Document: draft-ietf-slim-negotiating-human-language-17.txt

1. I'm not marking this first point DISCUSS, but I do think it's
important it be addressed and I trust the AD will ensure that it is.
This document is ambiguous about the contents of the answer
attribute. Specifically, it says:

   In an answer, 'hlang-send' is the language the answerer will send if
   using the media for language (which in most cases is one of the
   languages in the offer's 'hlang-recv'), and 'hlang-recv' is the
   language the answerer expects to receive if using the media for
   language (which in most cases is one of the languages in the offer's

However, the next paragraph permits >1 tag, as does the ABNF in S 6.1.

   Each value MUST be a list of one or more language tags per BCP 47
   [RFC5646], separated by white space.  BCP 47 describes mechanisms for
   matching language tags.  Note that [RFC5646] Section 4.1 advises to
   "tag content wisely" and not include unnecessary subtags.

So, how am I supposed to interpret an answer with >1 tag? Is this
forbidden? I can imagine a number of semantics, but it's important
it be clear in the document.
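
To make the ambiguity concrete, here is a minimal Python sketch, not taken from the draft, that parses an 'hlang-send' or 'hlang-recv' line using only the rule quoted above (one or more tags separated by white space). The helper name parse_hlang and the sample SDP lines are hypothetical, and the sketch does not validate BCP 47 syntax. Under that rule, an answer line carrying two tags parses exactly like an offer line carrying two tags, so the syntax alone does not tell the offerer how to read it.

```python
import re

# Hypothetical helper, for illustration only. It checks the
# "list of one or more tags separated by white space" rule quoted
# above and nothing else (no BCP 47 validation).
_HLANG = re.compile(r"^a=hlang-(send|recv):(\S+(?:[ \t]+\S+)*)$")


def parse_hlang(line):
    """Split an SDP line such as 'a=hlang-send:es en' into
    a (direction, [tags]) pair."""
    match = _HLANG.match(line.strip())
    if match is None:
        raise ValueError(f"not an hlang attribute: {line!r}")
    direction, value = match.groups()
    return direction, value.split()


# Made-up sample lines: an offer and an answer, each with two tags.
offer = parse_hlang("a=hlang-recv:es en")
answer = parse_hlang("a=hlang-send:es en")
print(offer)
print(answer)
```

Both calls succeed, which is the point of the question: if more than one tag in an answer is meant to be forbidden, or to carry a particular meaning, the text has to say so.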

2. The negotiation structure here does not match that which is
conventionally used with SDP, where each side indicates the formats it
is prepared to receive and the other side can send any of them. Why
did you use this structure? One reason you might is that you expect
the answer to resolve which language is in use, however because SIP
supports early media (i.e., media which is delivered prior to the

Alvaro Retana No Objection

Comment (2018-01-06 for -19)
Thanks for writing an interesting document!

Given that this document doesn’t mandate the behavior in the case of not having languages in common, why does it matter if the combination is “difficult to match together” or not? I’m wondering about this piece of text (from 5.2):

   two SHOULD NOT be set to languages which are difficult to match
   together (e.g., specifying a desire to send audio in Hungarian and
   receive audio in Portuguese will make it difficult to successfully
   complete the call).

I don’t understand how “difficult to match” can be enforced from a normative point of view. Difficulty seems to be a subjective criterion: the example shows a pair that I would consider difficult too (I don't speak Hungarian!), but other pairings could still be difficult for me but easy for others. Using “SHOULD NOT” (instead of “MUST NOT”) implies that there are cases in which it is ok to do it (again, probably subjectively). It seems to me that the “SHOULD NOT” could be a simple “should not”.
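
As a rough illustration of why the difficulty is relative, here is a small Python sketch. It is my own, not from the draft: the function name pair_is_matchable is hypothetical, and tags are compared by case-insensitive equality, which is a simplification of the BCP 47 matching schemes. The same offer (send Hungarian, receive Portuguese) matches one answerer and not another.

```python
def pair_is_matchable(offer_send, offer_recv,
                      answerer_understands, answerer_produces):
    """Return True if the answerer can receive what the offerer sends
    and can send what the offerer wants to receive.

    Assumption: tags match only when equal ignoring case.
    """
    understands = {tag.lower() for tag in answerer_understands}
    produces = {tag.lower() for tag in answerer_produces}
    can_receive = any(tag.lower() in understands for tag in offer_send)
    can_send = any(tag.lower() in produces for tag in offer_recv)
    return can_receive and can_send


# The pairing from the quoted text: send Hungarian, receive Portuguese.
offer_send, offer_recv = ["hu"], ["pt"]

# An answerer fluent in both languages versus one who only has English.
print(pair_is_matchable(offer_send, offer_recv, ["hu", "pt"], ["hu", "pt"]))  # True
print(pair_is_matchable(offer_send, offer_recv, ["en"], ["en"]))  # False
```

Whether the pair is "difficult" depends entirely on who answers, which is why a normative requirement seems hard to apply here.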

BTW, that reminds me: please use the template text from RFC 8174 (instead of RFC 2119).

Nit: It would be nice to expand SDP in the abstract and put a reference to RFC 4566 in the Introduction.