Handling Long Lines in Inclusions in Internet-Drafts and RFCs

Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.

Benjamin Kaduk Discuss

Discuss (2019-09-04 for -09)
I think the procedures described herein are incomplete without a footer
to terminate the un-folding process.  Otherwise, it seems that the
described algorithms would leave the two-line header for the second and
subsequent instances of folded text in a single document.  (If we tried
to just blindly remove all instances of the header without seeking
boundaries, then we would misreconstruct content when different folding
algorithms are used in the same document with the single-backslash
algorithm occurring first.)

I don't think it's proper to refer to a script that requires bash
specifically as a "POSIX shell script".  I did not attempt to check
whether any bash-specific features are used or whether this requirement
stems solely from the shebang line, though.

I think the shell script does need to use double-quotes around some
variable expansions, especially "$infile" and "$outfile", to work
properly for filenames containing spaces.  We do quote "$infile" when
we're checking that it exists, just not (most of the time) when we
actually use it!
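
A minimal sketch of the difference, not taken from the draft's script:
the helper names and the filename are hypothetical, and the helpers only
count how many words the shell produces from one value.

```shell
# Hypothetical helpers (not from the draft's script): each prints how many
# words result when the value in $1 is expanded without or with quotes.
count_words_unquoted() {
  # Unquoted expansion: the value is split on whitespace (and globbed).
  set -- $1
  echo "$#"
}

count_words_quoted() {
  # Quoted expansion: the value stays a single word.
  set -- "$1"
  echo "$#"
}

infile="my draft.txt"            # assumed example filename with a space
count_words_unquoted "$infile"   # prints 2
count_words_quoted "$infile"     # prints 1
```

With the unquoted form, a command such as grep or a redirection sees two
filenames ("my" and "draft.txt") instead of one.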

In addition to the above, I also share Alissa's (and Mirja's) concerns,
but feel that Discuss is more appropriate than Abstain, so we can discuss
what the best way to get this content published is.  For it's fine
content, and we should see it published; it's just not immediately clear
to me what the right way to do so is.

Comment (2019-09-04 for -09)
Section 4.1

   Automated folding of long lines is needed in order to support draft
   compilations that entail a) validation of source input files (e.g.,
   XML, JSON, ABNF, ASN.1) and/or b) dynamic generation of output, using
   a tool that doesn't observe line lengths, that is stitched into the
   final document to be submitted.

I don't think the intended meaning of "source input files" will be clear
to all readers just from this text.  Some discussion of how RFCs can
consider source code, data structures, generated output, etc., that have
standalone representations and natural formats, and the need to display
their contents in the RFC format that has different requirements might
be helpful context for this paragraph and the next.

Section 7.1.2

For some reason my mental model of "RFC style" does not use the word
"really" in this way, and prefers alternatives like "very" or
"exceptionally".  (Also in Section 8.1.2.)

Section 7.2.1

   1.  Determine where the fold will occur.  This location MUST be
       before or at the desired maximum column, and MUST NOT be chosen
       such that the character immediately after the fold is a space ('
       ') character.  For forced foldings, the location is between the

This is a rather awkward natural line break.  I suggest an RFC Editor
note to make sure that the punctuation around the space character all
appears on the same line.

   3.  On the following line, insert any number of space (' ')

I'm not sure I'd characterize the procedure as "complete" when it leaves
the value of the output subject to implementation choice such as this.
(Note that the next paragraph talks about the resulting "arbitrary
number of space" characters, and would presumably also need to be
adjusted if this text was adjusted.)
We also don't seem to bound this number of spaces to be fewer than the
target line length, which only matters in some weirdly pedantic sense.

Section 7.2.2

   Scan the beginning of the text content for the header described in
   Section 7.1.1.  If the header is not present, starting on the first
   line of the text content, exit (this text contents does not need to
   be unfolded).

I'm not sure I understand what "starting on the first line of the text
content" is intended to mean.  (Also in 8.2.2.)

Section 8.2.1

   If this text content needs to and can be folded, insert the header
   described in Section 8.1.1, ensuring that any additional printable
   characters surrounding the header do not result in a line exceeding
   the desired maximum.

We discussed above some cases when text could not be folded using the
algorithm from Section 7.2.1; in what case could text not be folded with
this algorithm?  Just the case when the implementation doesn't support
forced folding?

Section 10

We should warn against implementations scanning past the end of a buffer
(containing the entire contents of a file) when checking what's in the
beginning of the next line -- if a file ends with a backslash and "end
of line" but no further content, we could perform an out of bounds
access if the code assumes it is safe to check for the next line's
initial content.
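
A minimal sketch of such a guard in shell, where the analogue of running
off the end of the buffer is `read` failing at end of input.  The function
name `unfold_lines` is hypothetical.  As simplifications relative to the
draft, the sketch skips the header check and treats every trailing
backslash as a fold.

```shell
# Hypothetical line-by-line unfolder (a sketch, not the draft's script).
# Simplifications: no header detection, and any trailing backslash is
# treated as a fold marker.
unfold_lines() {
  buf=""        # text accumulated from folded lines so far
  joining=0     # 1 while the previous line ended with a backslash
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$joining" -eq 1 ]; then
      # Continuation line: drop its leading spaces.
      line=${line#"${line%%[! ]*}"}
    fi
    case $line in
      *\\) buf=$buf${line%\\}; joining=1 ;;
      *)   printf '%s\n' "$buf$line"; buf=""; joining=0 ;;
    esac
  done
  # Input ended while a continuation was still expected.  There is no
  # next line to inspect, so emit what we have with the backslash restored
  # instead of assuming more content exists.
  if [ "$joining" -eq 1 ]; then
    printf '%s\\\n' "$buf"
  fi
}

printf 'abc\\\n  def\n' | unfold_lines   # prints abcdef
printf 'abc\\\n' | unfold_lines          # prints abc\ (nothing to join)
```

An implementation that holds the whole file in memory needs the
equivalent check before it looks at the character after the final
end-of-line.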

Section 12.2

I think that RFC 7991 could be normative, since we say "per RFC 7991" to
describe some requirements on behavior.  Likewise for RFC 7994, whose
character encoding requirements we incorporate by reference.

Appendix A

I could perhaps argue that we should include a reference to POSIX for
"POSIX shell script" but find it somewhat hard to believe that this
would be a problem in practice.  It's also moot since we require bash
specifically, so we'd need to reference bash instead of POSIX.

   copy/paste the script for local use.  As should be evident by the
   lack of the mandatory header described in Section 7.1.1, these
   backslashes do not designate a folded line, such as described in
   Section 7.

It perhaps should be, but I think currently is not -- we only talk about
using the two-line header to detect instances of folding, without
mention of a requirement to be contained within <CODE BEGINS>/<CODE
ENDS> or similar.

It seems that my perception of "common shell style" diverges from that
presented in this document, which is not necessarily problematic.
(Things like what diagnostics go to stdout vs. stderr, use of ">
/dev/null" vs ">> /dev/null", etc.)

     printf "Usage: rfcfold [-s <strategy>] [-c <col>] [-r] -i <infile>"
     printf " -o <outfile>\n"

This summary usage line doesn't mention -d, -q, or -h.  (Maybe it
doesn't have to, of course.)

     # ensure input file doesn't contain a TAB
     grep $'\t' $infile >> /dev/null 2>&1

(`grep -q` is a thing, here and elsewhere.)
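
A minimal sketch of the same check using `grep -q`, which reports the
result through the exit status and prints nothing.  The helper name
`contains_tab` is hypothetical, and the TAB is produced with printf
rather than the bash-only $'\t' form.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file named by $1
# contains at least one TAB character.
contains_tab() {
  tab=$(printf '\t')      # portable way to get a literal TAB
  grep -q "$tab" "$1"     # -q: no output, result is the exit status
}

tmpfile=$(mktemp)
printf 'a\tb\n' > "$tmpfile"
if contains_tab "$tmpfile"; then
  echo "TAB found"
fi
rm -f "$tmpfile"
```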

     # unfold wip file
     "$SED" '{H;$!d};x;s/^\n//;s/\\\n *//g' $temp_dir/wip > $outfile

[I don't remember why the s/^\n// is needed; similarly for the
unfold_it_2() case.]
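
For what it's worth, sed's `H` command appends a newline followed by the
pattern space to the hold space, so a buffer collected with `H;$!d;x`
always starts with one extra newline.  A minimal sketch showing the
effect, with hypothetical function names and using the same substitution
as the quoted line:

```shell
# Both functions slurp stdin into one buffer with the H/x idiom and then
# join folded lines.  Only the second strips the newline that the first
# H command puts at the start of the buffer.
unfold_keep_leading_newline() {
  sed 'H;$!d;x;s/\\\n *//g'
}

unfold_strip_leading_newline() {
  sed 'H;$!d;x;s/^\n//;s/\\\n *//g'
}

printf 'abc\\\n  def\n' | unfold_keep_leading_newline    # empty line, then abcdef
printf 'abc\\\n  def\n' | unfold_strip_leading_newline   # abcdef
```

Without the `s/^\n//`, every unfolded output would gain a blank first
line.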

     if [[ $strategy -eq 2 ]]; then
       min_supported=`expr ${#hdr_txt_2} + 8`
       min_supported=`expr ${#hdr_txt_1} + 8`

On the face of it this seems like it will produce "folded" output that
exceeds the line length, when we give min_supported of 54, use
autodetection of strategy, and have input that is incompatible with

     process_input $@

Need double-quotes around "$@" to properly handle arguments with
embedded spaces.
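
A minimal sketch of the difference.  The `process_input` here is a
hypothetical stand-in that only prints its argument count; it is not the
function from the draft's script.

```shell
# Hypothetical stand-in for the script's process_input(): prints how many
# arguments it received.
process_input() {
  echo "$#"
}

forward_unquoted() {
  process_input $@      # re-splits every argument on whitespace
}

forward_quoted() {
  process_input "$@"    # passes each argument through unchanged
}

forward_unquoted -i "my draft.txt" -o out.txt   # prints 5
forward_quoted   -i "my draft.txt" -o out.txt   # prints 4
```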

Ignas Bagdonas Yes

Deborah Brungard No Objection

Roman Danyliw No Objection

Comment (2019-09-04 for -09)
(1) Section 1.  To make this document more enduring, I’d recommend qualifying the capabilities of xml2rfc (i.e., no line wrapping is done) to a version number.

(2) Section 2. Is it worth saying that in addition to the primary target being <sourcecode> and <artwork> (xml2rfc tags), it is also anything authors currently put between “<CODE BEGINS> … <CODE ENDS>” (sometimes even when not using xml tooling for rendering the draft)?  

(3) Editorial nits:
-- Section 2.  Editorial. s/This work may be also be used/This work may also be used/.

-- Section 4.2. Editorial.  s/already YANG [RFC7950] modules are extracted/YANG [RFC7950] modules are already extracted/.

Suresh Krishnan (was Discuss, No Objection) No Objection

Comment (2019-11-02 for -10)
I am happy with this progressing as an Informational document.

Barry Leiba No Objection

Comment (2019-08-21 for -08)
— Section 4.1 —

I find the BCP 14 “SHOULD” in this section to be odd, and would lower-case them.

   When needed, this effort again
   SHOULD be automated to reduce effort and errors resulting from manual

This sentence is really awkward: “when needed”, the use of “effort” twice, and the uncertainty of whether the clause “resulting from manual processing” applies to both effort and errors, or only to the latter.  I would say it this way:

This work should also be automated to reduce the effort and to reduce errors resulting from manual processing.

— Section 6 —

         assumes that the continuation begins at the character that is
         not a space character (' ') on the following line.

Should be “at the first character”.

— Section 7.1.1 —

   The second line is a blank line.

The code in the appendix generates an *empty* line (no text).  Is that what you mean by “blank line”?  Will a line that contains only space characters (*looks* the same) work also?  The code in the appendix appears to discard the second line without checking its content at all.  I think you should be clearer about what qualifies as a “blank line”.  (This also applies to Section 8.1.1.)
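
A minimal sketch of the distinction an unfolding tool would have to make if the text were tightened.  The helper name `classify_line` is hypothetical; the appendix script performs no such check.

```shell
# Hypothetical classifier for the header's second line: distinguishes a
# truly empty line from one that only looks blank.
classify_line() {
  if [ -z "$1" ]; then
    echo "empty"
  elif [ -z "$(printf '%s' "$1" | tr -d ' ')" ]; then
    echo "spaces-only"
  else
    echo "text"
  fi
}

classify_line ""      # prints empty
classify_line "   "   # prints spaces-only
classify_line " x"    # prints text
```

Whether "spaces-only" should be accepted as a valid second header line is the question the draft would need to answer.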

— Section 7.2.1 —

   If this text content needs to and can be folded, insert the header
   described in Section 7.1.1, ensuring that any additional printable
   characters surrounding the header does not result in a line exceeding
   the desired maximum.

Should be “do not result” (to match the plural “printable characters”).

Alexey Melnikov (was Discuss) No Objection

Comment (2019-08-29 for -08)
Thank you for explaining how escaping of trailing \ is possible.

Martin Vigoureux No Objection

Éric Vyncke No Objection

Comment (2019-08-26 for -08)
Sometimes a small problem (like line folding) can be annoying... so thank you for authoring this document.

Just a minor comment:
- should 'pyang' and 'yanglint' be added to the references ?


Magnus Westerlund No Objection

Alissa Cooper Abstain

Comment (2019-09-04 for -09)
RFC 7994 is not a product of IETF consensus, so it seems inappropriate to publish a consensus BCP predicated on requirements defined in RFC 7994 which themselves do not have IETF consensus. This would be the only document related to the RFC format in the last 10 years that I'm aware of that would be published on the IETF stream.

There has been discussion about how embedding YANG models in RFCs seems like a poor fit for a number of reasons. By standardizing line-folding mechanisms and claiming them as a best practice, this document reinforces the root of that problem rather than trying to fix it.

Mirja Kühlewind Abstain

Comment (2019-08-29 for -08)
I don't think this draft is in scope of the charter of the netmod working group. I've seen in the shepherd write-up that input from the RSE was received, therefore I don't necessarily assume that a different publication path would have produced a different outcome and I decided not to block publication. However, given the publication path taken, it is not visible to me whether sufficient feedback from the right people who are impacted or targeted by this document has been received.

In general I don't think it is okay to publish a document that is out of scope for a working group, especially when the scope of the document impacts other work in the IETF so broadly (while I do understand that this was written with a main focus on YANG). Given the current situation I would eventually rather go for Informational than BCP.

Alvaro Retana Abstain

Comment (2019-09-04 for -09)
I agree with Alissa on her concern of work related to the RFC format being published in the IETF Stream.  I am then also ABSTAINing.

The text mentions that the RFC Editor has confirmed that "there is currently no convention in place for how to handle long lines", but there is no mention, and I couldn't find a related conversation in the archive, about the RFC Editor's opinion of the proposed solution.  Because "this work primarily targets" elements in xml2rfc, I encourage the Shepherd/AD to explicitly discuss the solution with the RFC Editor.  I also strongly believe that a conversation should take place with the RFC Editor/IAB about the appropriate publication stream.  Both conversations should happen before this document is approved for publication.

Adam Roach (was No Objection) Abstain

Comment (2019-09-05 for -09)
I've updated my position to an Abstain based on the telechat
discussion. I find the arguments regarding BCP versus Informational
to be compelling, and am sympathetic to the concerns about
both stream and charter. All of that said, I do want to see this
document published, and I hope we can rearrange things in a
way that allows that to happen.


Thanks for taking on this work to fill a hole in the tools that
we have for production of RFCs. I have one fairly major comment
and several editorial suggestions.



>  This document defines two strategies for handling long lines in
>  width-bounded text content.  One strategy is based on the historic
>  use of a single backslash ('\') character to indicate where line-

Nit: "historical"



>  According to the RFC Editor,
>  there is currently no convention in place for how to handle long
>  lines in such inclusions, other than advising authors to clearly
>  indicate what manipulation has occurred.

This won't age well. Perhaps "Historically, there has been no
RFC-Editor-recommended convention in place for how to handle..."

>  This document defines two strategies for handling long lines in
>  width-bounded text content.  One strategy is based on the historic
>  use of a single backslash ('\') character to indicate where line-

Nit: "historical"



>   NOTE: '\' line wrapping per BCP XXX (RFC XXXX)

Using this string as the start of the specially-wrapped section
seems somewhat problematic, as it forecloses on the possibility
of also *citing* this BCP at that point in the document. For example,
if I were to use this format, I would definitely want to use a string
more of the format:

    NOTE: '\' line wrapping per BCP XXX ([RFC XXXX])

(taking note of the added brackets).

If this has already been debated in the working group and the current text
is the result of carefully considering this issue and deciding that the
use of the specified string has benefits that outweigh the drawback of
not being able to cite the document per ordinary convention, then don't afford
my suggestion any undue weight. I'm not trying to change a consensus decision.

But if this is a simple oversight, I think it does need to be given
significant thought. For example, I personally am rather likely to elect to do
things "the old way" in my own documents rather than using this format because
of the awkwardness of properly citing a normative reference.

This same comment applies to §8.1.1, of course.


> Appendix A.  POSIX Shell Script: rfcfold

Please add [POSIX.1-2017] as a reference.

Warren Kumari No Record