Early Review of draft-ietf-cdni-logging-11
review-ietf-cdni-logging-11-opsdir-early-harrington-2014-04-08-00

Request Review of draft-ietf-cdni-logging
Requested rev. no specific revision (document currently at 27)
Type Early Review
Team Ops Directorate (opsdir)
Deadline 2015-03-03
Requested 2014-03-31
Draft last updated 2014-04-08
Completed reviews Genart Last Call review of -15 by Martin Thomson (diff)
Opsdir Early review of -11 by David Harrington (diff)
Opsdir Telechat review of -18 by Benoît Claise (diff)
Secdir Last Call review of -15 by Klaas Wierenga (diff)
Assignment Reviewer David Harrington
State Completed
Review review-ietf-cdni-logging-11-opsdir-early-harrington-2014-04-08
Reviewed rev. 11 (document currently at 27)
Review result Has Issues
Review completed: 2014-04-08

Review
review-ietf-cdni-logging-11-opsdir-early-harrington-2014-04-08

Hi,

It appears I didn't copy ops-dir with this review. Doh!

David Harrington
ietfdbh at comcast.net
+1-603-828-1401
> -----Original Message-----
> From: ietfdbh [mailto:ietfdbh at comcast.net]
> Sent: Saturday, April 05, 2014 5:15 PM
> To: draft-ietf-cdni-logging.all at tools.ietf.org
> Cc: Benoit Claise; spencer at wonderhamster.org
> Subject: OPSDIR review of draft-ietf-cdni-logging-10
> 
> Hi,
> 
> I have been asked to provide an early OPSDIR review of
> draft-ietf-cdni-logging-10.
> 
>    This memo specifies the Logging interface between a downstream CDN
>    (dCDN) and an upstream CDN (uCDN) that are interconnected as per the
>    CDN Interconnection (CDNI) framework.  First, it describes a
>    reference model for CDNI logging.  Then, it specifies the CDNI
>    Logging File format and the actual protocol for exchange of CDNI
>    Logging Files.
> 
> An OPS-DIR review usually has a principal goal of helping the OPS ADs in
> their evaluation and balloting of documents at IESG reviews.
> An early review is focused more on providing feedback to authors, the
> working group, and the relevant area directors about issues that they
> might want to consider and address as they move forward with the draft.
> 
> Overall, the document is well-written, as are the related documents, and
> the set of documents does a good job of explaining the important points
> of CDN interconnection.
> 
> --- RFC5706 review ---
> 
> RFC5706 provides guidelines that protocol designers should consider about
> the operations and management of their protocols. RFC5706 has an Appendix
> that OPSDIR reviewers use to help guide their reviews. The following
> points from the RFC5706 Review Checklist apply.
> 
> 1.  Has deployment been discussed?
> A number of related documents discuss CDN Interconnection, describe the
> problems and use cases, propose a framework for considering various
> interfaces relevant to CDN Interconnections, and detail some requirements
> for those interfaces. A significant amount of discussion is provided about
> expected deployment models, and how the various conceptual interfaces
> apply to these deployment models.
> 
>        *  Does the proposed approach have any
>           scaling issues that could affect usability for large-scale
>           operation?
> 
> Yes, there are scalability issues. As mentioned in section 2.1, and
> reflected in the terminology "CDN Reporting" and "CDN Monitoring", there
> are requirements for the logging interface to support both near-real-time
> monitoring (for fault and performance mitigation) and deferred analysis
> (for billing, post-mortem delivery analytics, etc.). The deferred analysis
> is a much easier problem to solve, given the potentially huge amounts of
> logging records, and the timeliness requirements involved in operating a
> large CDN.
> 
> 
> The proposed solution generally looks good for the deferred case, but
> seems inadequate for the near-real-time monitoring case. The document
> declares the near-real-time support to be out of scope.
> 
> Given the costs in time and resources of converting between logging
> formats, it may be really desirable to have one format that can serve both
> sets of needs. To a large degree, the proposed format could serve both
> purposes; however, there are various points in the document that specify
> MUST requirements that would prevent using the solution for both use
> cases. Mostly, these requirements imply that a "file" must be fully
> collected before sharing, and given the timeliness required by the
> monitoring use case, waiting until a file is fully collected makes the
> file approach unsuitable for monitoring. It can be feasible to use a
> logging file approach for monitoring, if the log can be "tail"ed, thereby
> allowing the contents to be passed as the collection is performed, without
> waiting for the file to be complete.
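
To make the "tail" idea concrete, here is a minimal Python sketch of
following a log file while it is still being written. The function name
follow, the is_open callback, and the polling interval are illustrative
assumptions; nothing here is taken from the draft.

```python
import time
from typing import Callable, Iterator, TextIO


def follow(log: TextIO, is_open: Callable[[], bool],
           poll: float = 0.5) -> Iterator[str]:
    """Yield complete lines from `log` as they are appended.

    `is_open` is a hypothetical callback that returns False once the
    writer has finished the file. An unterminated final line is not
    yielded, since it may be a record that was never completed.
    """
    partial = ""
    while True:
        chunk = log.readline()
        if chunk:
            partial += chunk
            if partial.endswith("\n"):
                # A full record is available; pass it on immediately.
                yield partial.rstrip("\n")
                partial = ""
            continue
        if not is_open():
            return  # writer is done and nothing more can arrive
        time.sleep(poll)  # wait for the writer to append more data
```

A consumer built this way sees each record shortly after it is written,
which is the property the monitoring use case needs and the "complete file"
wording would rule out.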
> 
> In most of the network management protocols designed by the IETF, we
> recognize three parts - a data modeling language, data models, and a
> protocol that can transport portions of the data model. This document
> doesn't do a great job of keeping those separate. If the document were
> written with such separation in mind, then the record formats would be
> independent of how the file was going to be transported, and a logging
> record might be able to be used with two different approaches to
> transport - one suitable for deferred usage, where it can be expected that
> the file is completed before transport, and another transport approach
> suitable for near-real-time transport of individual records, possibly
> using a tail approach for a file-based collection of records. The
> constraint in section 4.1.2 is written in a manner that would seem to
> preclude a tail usage.
> 
> This concern might be able to be mitigated by changing some wordings in
> the document. I think the document would be better if, rather than
> declaring one set out of scope, it recognized the two sets of needs, and
> discussed how the design of the data modeling language and data models
> could address both sets of needs, while recognizing that different
> transport solutions might be needed to meet the two sets of needs, and
> there might be constraints on the data model designs to be suitable for
> both needs.
> 
> 2.  Has installation and initial setup been discussed?
> 
> It has been discussed to a degree, but this document declares a tremendous
> amount of "stuff" to be out-of-scope, and that typically includes how to
> configure the interaction between peers. I think this document really
> needs to address some of the issues that it declares out of scope,
> especially the configuration options that must be standardized or
> negotiated, in order to make the solution interoperable across different
> implementations. Yes, some of these negotiations may be business-oriented
> negotiations, but those negotiated parameters could then be specified in a
> technical manner so the logging application can be automated in an
> interoperable manner.
> 
> Some examples where I think this document could have either standardized
> the answer or allowed for standardized negotiation:
> a. what exact set of logging information is to be provided by the dCDN to
> the uCDN. The proposed solution includes record-types and field lists
> (templates), so why can't the two use that information to negotiate what
> should be included in the logging?
> b. In section 2.1, it says the uCDN can configure customization, but that
> the dCDN is free to ignore that, and apparently the uCDN doesn't even get
> told that the dCDN is going to ignore it. If I were the uCDN and the dCDN
> was going to ignore my request, I'd like to be told that so I can choose a
> different dCDN that will pay attention to my customization preferences. I
> would find this especially important if my preferences had to do with
> end-user and/or content provider privacy issues, for which there might be
> legal requirements/ramifications.
> c.
> 
> 3. Has the migration path been discussed?
> The migration path is not a direct path. There is no existing standard for
> CDNI, so there really is no migration, per se.
> 
> CDNs are typically proprietary, and how they operate can contribute to
> competitive differentiation. CDNs deliver content for content providers,
> and it is common that the content providers want reports of completion and
> performance of the deliveries. Connecting CDNs is like connecting black
> boxes together, and the sharing of information is on a need-to-know basis.
> This document works at describing the subset of monitoring/logging
> information to be shared.
> 
> As observed in the last paragraph of section 2.1, it is desirable to keep
> the intra-CDN and inter-CDN logging compatible. This document explicitly
> tries to reuse (migrate from) the conceptual logging done within CDNs, and
> explain how to use it between CDNs. Often within a CDN, there are multiple
> geo-located facilities (points of presence) each of which more or less
> acts as a CDN in its own right. So it is not uncommon to conceptually have
> upstream and downstream CDNs within a single CDN. For the most part, this
> attempt at reuse appears to work well.
> 
> I think there are some design problems in the proposed solution because
> the migration is from a single aggregate CDN to potentially multiple
> cascading levels of CDN. Within a single CDN, I expect that the hierarchy
> is rather flat, and probably most CDNs don't go beyond two levels - one
> uCDN and one dCDN for a particular delivery. But CDNI was started to
> address the growing need for interconnection between multiple CDNs, often
> in cascading uCDN/dCDN relationships. The migration from non-cascaded CDNs
> to cascaded CDNs has a few design flaws and inconsistencies within the
> proposed design.
> 
> Most notably, the logging from a dCDN to a uCDN typically contains only
> one dCDN identifier, such as Verified-origin, and doesn't really permit
> specifying a subordinate dCDN. For example, in a cascade such as CDN-A ->
> CDN-B -> CDN-C, CDN-C will share logging info with CDN-B, but not with
> CDN-A; CDN-B shares logging info with CDN-A. This can be desirable for
> hiding the topology and delegation used by the dCDN. However, this
> document does not discuss how the logging provided by CDN-C gets converted
> into the logs for CDN-B that will be sent to CDN-A. Without some logging
> information from CDN-C, it would seem difficult for CDN-A to utilize the
> information to meet requirements of billing, analytics, and fault and
> performance analysis. Maybe the logging gets aggregated and reported in
> CDN-B's logging, but that "transitive or aggregate logging" doesn't appear
> to be discussed in this document.
> 
> I think Verified-Origin is particularly problematic, because the text
> states that this can only be added by the uCDN, never the dCDN. So what if
> CDN-B verifies the origin of the logs from CDN-C and then passes the
> information (now as a dCDN) to CDN-A? The text mentions that this might be
> established using authentication mechanisms; so do we lose this logged
> authentication/verification when cascading?
> 
> (Maybe this is resolved by having the uCDN (CDN-B) record a "transaction"
> that starts when it delegates a task to a dCDN (e.g. CDN-C) and ends when
> the dCDN completes the task, and the uCDN just records the whole
> transaction without the details that are reported by CDN-C. Then CDN-B
> logs only the whole transaction. I would like to see an explicit example
> of such logging, using the cdni_http_request_v1 record-type, so this is
> clear.)
> 
> [I had a bunch of notes, listed below, from my review. It is simpler for
> me to list those notes, as I have done below. Converting them into
> comments organized according to RFC5706 checklist is time-consuming. To
> help keep my review shorter than the document I am reviewing, and since
> this is an early review, I am discontinuing the RFC5706 format review and
> simply providing my list of comments. I assume I will be asked to continue
> reviewing this document, so later reviews will go back to the RFC5706
> format.]
> 
> --- Technical advice ---
> I am not an assigned technical advisor for this working group; I have a
> certain degree of expertise in operations and management, especially IETF
> protocols for network management and logging, plus a lesser amount of
> expertise in CDN management. The following comments come from that
> background, and deserve no more attention than comments from other
> contributors.
> 
> 1) Section 2.1 starts saying what is involved in the reference model, but
> then presents a bullet list that seems more interested in detailing what
> is out of scope for the document, than what is included in the model. I
> think this should be rewritten to be cleaner. As part of that, I recommend
> moving away from bullets to full sentences.
> 
> 2) An editing/reviewing nit - this document uses lots of bullets rather
> than sub-section numbering; for review purposes, this is irritating
> because a reviewer has to count through the bullets to reference text
> contained in the sub-sections.
> 
> 3) in section 2.2.3, some rfc2119 terms are used in lowercase. It isn't
> specified that these are not used to represent RFC2119 requirements, and
> I am not sure if they are meant to be used that way. As a result, the
> intent of this text is ambiguous.
> 
> 4) in section 2.2.4, there is an issue about correctly reporting data
> about sessions that cross logging periods; it only says "it is important
> to correctly report this". I think this document needs to better specify
> how to correctly report in such an environment. It would help to require
> that data models, such as the data model defined in this document,
> identify those elements that might suffer such discontinuities, and
> explain how to resolve the discontinuity. (The MIB Doctor directorate
> often watches for issues of persistence and discontinuities, so you might
> be able to get some advice from them on this issue.)
> 
> 5) I am rather disappointed that this document only defines one data
> model - the cdni_http_request_v1 model. While I recognize that this
> probably represents the bulk of CDN traffic, it would have been nice to
> provide a proof-of-concept for the registration of record-types and field
> names by having more than one (say, a cdni_ftp_v1 model), to show how to
> register each, and especially to show how to reuse field-names across
> record types. Such a proof-of-concept would also be helpful in uncovering
> any problems with the approach before it becomes a standard.
> 
> 6) I wonder if support for multi-line entries will become important over
> time; this format doesn't permit expansion to such multi-line records.
> 
> 7) Syslog/TLS ran into a problem when multiple messages were transported
> in a stream, as compared to the original UDP-based design where each
> message was in its own packet. A great deal of discussion happened about
> delimiters that could be used to delimit messages in a stream, because no
> delimiters had been reserved for such purpose, and recovering from a
> delimiter lost in transit could be problematic. (Syslog/TLS finally used a
> counted-length approach, which requires the completed message size to be
> known before sending.) I'm not sure that would be relevant here.
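
For readers unfamiliar with counted-length framing, the following Python
sketch shows the general technique: each record is prefixed with its byte
length and a space, so no delimiter has to be reserved inside the record.
The names frame and deframe are hypothetical, and this is an illustration
of the idea rather than a proposal for the CDNI format.

```python
from typing import List


def frame(record: str) -> bytes:
    """Prefix a record with its UTF-8 byte length and a single space."""
    payload = record.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def deframe(stream: bytes) -> List[str]:
    """Split a byte stream produced by repeated frame() calls."""
    records = []
    pos = 0
    while pos < len(stream):
        space = stream.index(b" ", pos)       # end of the length prefix
        length = int(stream[pos:space])
        start = space + 1
        end = start + length
        if end > len(stream):
            raise ValueError("truncated record")
        records.append(stream[start:end].decode("utf-8"))
        pos = end
    return records
```

Because the length must be known before the first byte is sent, this
framing works per record but not for a whole file that is still growing.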
> 
> 8) Under Record-Type, directive value, it says "cdni_http_request_v1" MUST
> be indicated ... ; is this meant to be a REQUIREMENT ala RFC2119, or is
> this an example? I would hope it is not a hard-coded value, and is just an
> example, so a new record-Type could be defined in the future that is NOT
> http-specific, and a new http-specific record-type might be defined if
> needed.  The wording needs to make this clear.
> 
> 9) There is an interaction between record-type and the Fields directive.
> The text mentions "the first instance", and I'm not sure this isn't
> required for every instance. Can the file have the following structure?
> Record-type; fields; <data>; fields; <data>? i.e., can a second fields
> directive change the format of the (following) information within a single
> record-type declaration? Is this needed? There is no explanation as to why
> this feature is included. An explanation, and examples would be nice.
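
The structure being asked about can be pictured with a small Python parser
in which a later Fields directive changes how the following records are
read. The directive syntax used here ("#Name: value" lines, tab-separated
record values) and the rule that a new Record-Type resets the field list
are assumptions made for illustration; the draft's actual grammar may
differ.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_records(
    lines: Iterable[str],
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (record_type, {field: value}) for each record line."""
    record_type: Optional[str] = None
    fields: Optional[List[str]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            name, _, value = line[1:].partition(":")
            name = name.strip().lower()
            if name == "record-type":
                record_type = value.strip()
                fields = None  # assumed: a new type needs its own Fields
            elif name == "fields":
                fields = value.split()  # applies to the records that follow
            continue
        if record_type is None or fields is None:
            raise ValueError("record before Record-Type and Fields")
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError("record does not match current Fields")
        yield record_type, dict(zip(fields, values))
```

With this reading, "Record-type; fields; <data>; fields; <data>" is
parseable, but the specification would still need to say whether it is
permitted.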
> 
> 10) The Integrity-hash directive value text seems slightly contradictory -
> a) the behavior of the entity that received a corrupted logging file is
> outside the scope of this specification, and b) depending on the
> validation of the hash, the receiving entity MUST consider the logging
> record corrupted or non-corrupted, c) if the entity receives a
> non-corrupted file, and adds a verified-origin directive, then it must
> recompute.
> 
> 11) I am concerned that the Integrity hash is optional; would the
> implementation be able to tell whether the integrity was being provided
> via some external means? It would be nice to standardize whether the
> integrity hash must be present.
> 
> 12) would it make sense to make the integrity hash part of the transport
> protocol (i.e., set in a format that wraps the original logging file,
> rather than being part of the original file)? I am also concerned about
> having the uCDN recompute the hash after verifying the host. There would
> now be two copies archived, wouldn't there - the one being held by the
> dCDN per agreement, and the one with a recomputed hash. Which one takes
> precedence? If you defined a uCDN wrapper for the file presented by the
> dCDN, and had the verified-origin and integrity hash as part of the
> wrapper, then the uCDN-wrapped file would match the dCDN-retained file.
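
A rough Python sketch of the wrapper idea follows. The WrappedLog type, its
field names, and the use of MD5 over the untouched payload are all
assumptions for illustration; the point is only that the dCDN's bytes are
carried unchanged, so the archived copies stay identical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WrappedLog:
    """Hypothetical uCDN-side envelope around an unmodified dCDN file."""
    verified_origin: str   # added by the uCDN, outside the payload
    integrity_hash: str    # hash of the payload exactly as received
    payload: bytes         # the dCDN logging file, byte for byte


def wrap(dcdn_file: bytes, verified_origin: str) -> WrappedLog:
    """Wrap a received file without rewriting any of its contents."""
    digest = hashlib.md5(dcdn_file).hexdigest()
    return WrappedLog(verified_origin, digest, dcdn_file)


def payload_intact(wrapped: WrappedLog) -> bool:
    """Check that the payload still matches the recorded hash."""
    return hashlib.md5(wrapped.payload).hexdigest() == wrapped.integrity_hash
```

Since the origin and hash live in the envelope, there is no second,
recomputed variant of the file and the precedence question does not arise.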
> 
> 13) under Integrity-hash, it states "Note that this is not a guarantee
> ..." I think this deserves some expansion, either here or in the security
> considerations section.
> 
> 14) If the file were tailed, and transmitted as it was generated, would it
> be compliant to this specification to compute the MD5 as the file was
> being sent, and then log the hash directive after the file (without the
> hash directive) finished transmitting?
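
The scenario in this question could look like the Python sketch below,
which updates an MD5 digest as each line is sent and emits the hash
directive last. The "#Integrity-Hash:" spelling and its placement as a
trailer are assumptions; whether the specification allows that placement
is exactly what is being asked.

```python
import hashlib
from typing import Iterable, Iterator


def stream_with_trailing_hash(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line as it becomes available, then a hash directive.

    The digest covers every line sent before the directive, so it can be
    computed incrementally without buffering the whole file.
    """
    digest = hashlib.md5()
    for line in lines:
        data = line if line.endswith("\n") else line + "\n"
        digest.update(data.encode("utf-8"))
        yield data
    # Hypothetical trailer form of the directive.
    yield "#Integrity-Hash: " + digest.hexdigest() + "\n"
```

A receiver can verify the trailer the same way, hashing lines as they
arrive and comparing once the directive is seen.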
> 
> 15) What if a mode of corruption is found in the future that this hash
> computation wouldn't detect? MUST the receiver still consider the file to
> be non-corrupted?
> 
> 
> 16) the last paragraph of 3.4 has a couple spelling/grammar errors.
> 
> 17) for c-ip, is the "client" address unambiguous? I'm not an expert in
> http redirection, but IIRC, the client might be specified in more than one
> way. Is it unambiguous which field of the http request this must come
> from? If there is more than one, such as if the dCDN could differentiate
> the actual client from the DNS resolver address, would it be helpful to
> include more than one client address?
> 
> 18) I am concerned about not standardizing a negotiation for the u-uri.
> How is the transformation "agreed upon" in a manner that allows an
> application to know what to do? I think it might be better if expressing
> the transformation expected could be standardized. Otherwise, how do we
> get interoperability across multiple implementations?
> 
> 19) As mentioned above, I'd like to see more than one protocol dealt with,
> as proof of concept.
> 
> 20) sc-total-bytes apparently only applies to HTTP bytes. So maybe this
> should be labeled sc-http-total-bytes, to differentiate it from
> sc-ftp-total-bytes, and from sc-total-bytes (a protocol-independent
> value). Ditto for many of the other fields defined here.
> 
> 21) sc-status: is there a valid range associated with this field? I know
> squid supports up to status=600, even though it doesn't understand many of
> those values. Is it legal/compliant for some implementations to decide to
> end status=900? Is it legal/compliant for some implementations to consider
> any value >500 to be invalid?
> 
> 22) s-sid mentions http-specific session.
> 
> 23) s-sid: who establishes the session ID? Is it the CDN performing the
> delivery (the dCDN) or the uCDN? Can a uCDN establish a session ID but
> then have different dCDNs deliver different portions of the content?
> Having a consistent sid would allow the uCDN to correlate the multiple
> "sessions" into one.  In a cascaded environment, are the sids always
> distinct, and able to be correlated by the uCDN?
> 
> 24) s-cached: I found some of the wording to be ambiguous. I recommend
> some rewriting to make sure this is not ambiguous. "exclusively"? "some,
> but not all, content"?
> 
> 25) why is s-cached important to CDNI? I understand why it is important to
> a CDN, but why does a uCDN need to know if a dCDN surrogate had it cached?
> Should this be about whether the dCDN had it cached, not whether a
> surrogate of the dCDN had it cached?
> 
> 26) After the s-cached definition, there is text about the Fields
> directive. Doesn't this text belong near the definition of the fields
> directive?
> 
> 27) What is the benefit of the feature that says the fields can be in any
> order? How does this benefit compare to some level of standardization?
> 
> 28) Why make three fields optional to support? If implementations don't
> support them, then users can't use them. It would seem better to make
> these mandatory-to-implement, optional-to-use.
> 
> 29) if a uCDN needs to have any of these optional fields, especially the
> anonymizing field, but there is no negotiation phase specified by this
> document, then how can a uCDN decide not to deal with a dCDN that doesn't
> support the field?
> 
> 30) Updates to log files and the feed are outside the scope ... Really,
> can't we standardize a way to specify the frequency, the period of time,
> and timeliness of publishing?
> 
> 31) 4.1.2 says implementation "SHOULD use HTTP cache headers ..."; why
> only SHOULD? Is there a specific example of when MUST would be
> inappropriate?
> 
> 32) Is there a reason we don't record the retention lifetime in the file?
> This might make it easier for the archiver to purge files that have passed
> their retention dates.
> 
> 33) Can redundant feeds have different timing characteristics? If so, is
> "SHOULD use the UUID ... to avoid ... pulling and storing ..." still valid?
> Maybe there should be some discussion of why this is a SHOULD rather than
> a MUST.
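
As an illustration of what UUID-based de-duplication across redundant feeds
might involve, here is a small Python sketch. The feed representation (an
iterable of (uuid, url) pairs), the function name new_files, and the
example URLs are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

FeedEntry = Tuple[str, str]  # (logging file UUID, URL to pull it from)


def new_files(feeds: Iterable[Iterable[FeedEntry]],
              seen: Set[str]) -> Iterator[FeedEntry]:
    """Yield only entries whose UUID has not been pulled before.

    `seen` is updated in place so it can persist across polling rounds,
    which matters if redundant feeds publish the same file at different
    times.
    """
    for feed in feeds:
        for uuid, url in feed:
            if uuid in seen:
                continue  # already pulled via another feed or round
            seen.add(uuid)
            yield uuid, url
```

If an implementation skips this check, feeds with different timing would
cause the same file to be pulled and stored more than once, which may be
the case the SHOULD is meant to tolerate.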
> 
> 34) 4.2 says "MUST use HTTP v1.1". I understand trying to have a baseline
> for interoperability. Shouldn't this be MUST implement support for v1.1,
> and MAY support additional HTTP versions? Clients MUST implement support
> for v1.1, but MAY negotiate which version they use.
> 
> 35) SHOULD support gzip; why not MUST-implement gzip, MAY use?
> 
> 36) 5.2 - is this meant to be an extensible registry? Is it allowed to
> have enterprises register their own proprietary formats (which would seem
> acceptable if they provide a specification)? If so, are there preferred
> naming conventions, so we can tell the ietf-defined record formats from
> proprietary record formats? It would also be nice if there was a short
> description field, so it is easy to see what a record contains without
> having to go find and read the document to determine that. (I don't know
> whether <cdni_> is meant to be a reserved prefix for use by the cdni WG or
> not. I cannot tell from the name whether this is v1 of the record format
> for http_requests, or whether this is the record exclusively for http_v1
> requests, or ...; a comment column would be helpful, as would naming
> conventions.)
> 
> 37) The intention to share the registry across record-types is discussed
> after the registry (and on the next page in my printout). It might be nice
> to specify the intention before the registry.
> 
> 38) I LOVE reuse. I am thrilled that you intend to make these field names
> reusable.  I am often amused when developers reinvent the wheel, and then
> advertise how standards-supportive they are by announcing their intention
> that others should reuse their work.
> 
> I don't know the [ELF] format; does that format define any fieldnames that
> could have been reused here, rather than defining them again? I am aware
> that the SIPCLF WG worked on developing a log format for SIP, and they
> explicitly liked the well-known formats used by servers like Apache and
> Squid. Did they define any fieldnames that could have been reused here?
> Syslog and ipfix define fieldnames in their own formats (SDE and IE
> respectively). Could any of these fieldname semantics be defined so as to
> reuse the semantics already defined elsewhere? While they aren't defined
> as fieldnames, if the semantics are the same, then applications would be
> able to correlate the information more easily if it is standardized that
> cdni:date follows the same semantics as ipfix:date, and cdni:s_hostname
> matches syslog:hostname semantics, and so on.
> 
> Many of these fieldname values seem to come from URLs; it might be helpful
> to have a reference clause for each item that shows where the extracted
> value is defined (e.g., in the HTTP protocol spec, or subsequent specs).
> 
> 39) in 6.1, there are some MUST-implement statements; section 4.2 seems
> more lenient. These should be updated for consistency.
> 
> 40) "Alternate methods may be used ..." I think it is important to be
> unambiguous here about implementation versus usage. We want to be sure
> that a small set of required methods be MUST-implement, and they may be
> supplemented by additional implementation features. Users MAY use whatever
> method they choose of those that are implemented.
> 
> 41) "Both parties SHOULD support mutual authentication." Hmmm. How do you
> do MUTUAL authentication if only one side supports it? Should this be MUST
> implement? Are there circumstances where MUST is not applicable,
> justifying the SHOULD?
> 
> 42) Should the uCDN be able to specify that the dCDN MUST NOT retain
> sensitive information about its clients? (I think this might be geoPriv
> territory).
> 
> 
> Hope this helps,
> David Harrington
> ietfdbh at comcast.net
> +1-603-828-1401