Internet Engineering Task Force (IETF)              Phillip Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended Status: Standards Track                           Rob Stradling
Expires: April 10, 2015                                   Comodo CA Ltd.
                                                         October 7, 2014


                          Compressed CRL Sets
                 draft-hallambaker-compressedcrlset-00

Abstract

   An efficient encoding for Certificate Revocation Lists is described
   based on a Disjoint Set encoding technique. When applied to the
   population of currently issued certificates enroled in the
   Certificate Transparency Notary network, the data set was reduced by
   97% without introduction of false positives or false negatives.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.









Hallam-Baker                 April 10, 2015                     [Page 1]


Internet-Draft             Compressed CRLSets               October 2014

Table of Contents

   1.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  3
      1.1.  Requirements Language . . . . . . . . . . . . . . . . . .  3
   2.  The Revocation Problem . . . . . . . . . . . . . . . . . . . .  3
      2.1.  PKIX Certificate Status Mechanisms  . . . . . . . . . . .  3
      2.2.  Short Lived Certificates  . . . . . . . . . . . . . . . .  4
      2.3.  CRL Sets  . . . . . . . . . . . . . . . . . . . . . . . .  4
      2.4.  Compressed CRL Sets . . . . . . . . . . . . . . . . . . .  5
   3.  Efficient Encoding of Disjoint Sets  . . . . . . . . . . . . .  5
      3.1.  Problem Statement . . . . . . . . . . . . . . . . . . . .  5
      3.2.  Standard Hash Function. . . . . . . . . . . . . . . . . .  6
      3.3.  Parameterized Hash Function.  . . . . . . . . . . . . . .  6
      3.4.  Shortest Discrimination Prefix  . . . . . . . . . . . . .  6
         3.4.1.  Sorted List of Hashes  . . . . . . . . . . . . . . .  7
         3.4.2.  Difference Encoding  . . . . . . . . . . . . . . . .  7
         3.4.3.  Further optimizations. . . . . . . . . . . . . . . .  7
   4.  Compressed CRL Set Encoding  . . . . . . . . . . . . . . . . .  8
   5.  Experimental Results . . . . . . . . . . . . . . . . . . . . .  8
   6.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  8
      6.1.  Normative References  . . . . . . . . . . . . . . . . . .  8
      6.2.  Informative References  . . . . . . . . . . . . . . . . .  8
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .  8































Hallam-Baker                 April 10, 2015                     [Page 2]


Internet-Draft             Compressed CRLSets               October 2014

1. Definitions

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. The Revocation Problem

   Certificate revocation is a critical element of the Web PKI design. A
   certificate issuer may revoke a certificate for many reasons
   including:

      *  The certificate was issued in error or due to a CA breach.

      *  The certificate holder has breached some condition set by the
         issuer.

      *  The certificate is being used for or in connection with illegal
         purposes.

      *  The certificate holder has requested the certificate be
         revoked.

      *  The private key corresponding to the public key in the
         certificate has been compromised.

   While certificates are occasionally issued in error or due to an
   issuer breach, these cases are very rare. Impersonating another
   company is much more difficult than stealing a private key from a
   legitimate company. The ability to revoke certificates is therefore a
   critical element in the WebPKI risk mitigation strategy.

   At present, there are approximately TBS certificates unexpired
   certificates issued in the WebPKI of which TBS (approximately 10%)
   have been revoked.

2.1. PKIX Certificate Status Mechanisms

   PKIX [RFC5280] describes two mechanisms for communicating certificate
   status.

      *  Certificate Revocation Lists (CRLs) list the serial numbers of
         certificates that a CA has issued that are not currently valid.

      *  Online Certificate Status Protocol (OCSP) describes an online
         service that a relying party can query to request the current
         status of a certificate.





Hallam-Baker                 April 10, 2015                     [Page 3]


Internet-Draft             Compressed CRLSets               October 2014

   The scaling problems inherent in the use of CRLs quickly became
   apparent as the size of the WebPKI grew. The length of the CRLs
   issued by the large CAs quickly grew to several Mb. While techniques
   such as delta encoding and partitioning had the protential to provide
   temporary relief, the scaling issue was only mitigated, not solved.

   OCSP [RFC6960] provides a scalable solution but in its original
   conception for TLS introduces a new party to every communication.
   Each time a client attempts to establish a TLS connection it looks
   for an unexpired OCSP token to establish the validity of the
   certificate presented. If no unexpired cached token is available, a
   fresh one is obtained from the OCSP distribution. This introduces a
   risk of traffic analysis by either the OCSP service provider itself
   or any party that can view the traffic to the OCSP service.

   OCSP Stapling [RFC6066] was introduced to the TLS protocol as a
   mechanism to avoid the need to obtain OCSP tokens from a third party
   by allowing them to be passed in band in the TLS conversation.

2.2. Short Lived Certificates

   While OCSP stapling removes the privacy, reliability and performance
   concerns of the original OCSP design, the certificate issuer is still
   required to issue the OCSP token and the server using the certificate
   is still required to fetch it. Since issuing an OCSP token on a daily
   basis is tantamount to issuing a new certificate, why not reissue the
   certificate on a daily basis and do away with the OCSP token
   entirely?

   In principle, use of short lived certificates is simpler for all the
   parties involved: CAs, subjects and relying parties. But before short
   lived certificates can be used in practice, new infrastructure is
   required to support their use. In particular client software has to
   be rewritten so that it recognizes a certificate with a short
   validity interval (e.g. 36 hours) as carrying an implicit assertion
   of validity and a mechanism by which the server obtains fresh
   certificates must be specified and implemented. For work on the
   latter problem, see [I-D.hallambaker-omnipublish].

2.3. CRL Sets

   CRL Sets are curated CRLs, typically issued by an application
   provider that list the revoked certificates regarded to pose the
   greatest concern of abuse.

   Use of CRL Sets has the advantage that they may allow an application
   provider to react to a security incident more quickly than the
   issuing CA. But they only provide revocation information for a
   limited number of certificates.





Hallam-Baker                 April 10, 2015                     [Page 4]


Internet-Draft             Compressed CRLSets               October 2014

   As with traditional CRLs, various techniques such as delta encoding
   may be used to limit the size of individual CRLs but these do not
   reduce the total volume of data that must be exchanged.

2.4. Compressed CRL Sets

   Compressed CRL Sets are an alternative encoding for CRLs that permit
   very much higher encoding density than traditional CRLs. Rather than
   encoding the set of revoked certificates, Compressed CRLs encode the
   difference between the set of valid unexpired certificates and the
   set of invalid unexpired certificates. This allows the average number
   of bits required to encode the set to be reduced from 96 bits per
   certificate (using the certificate serial number) to approximately 6.

   While Compressed CRL Sets currently offer a practical answer to the
   problem of revocation in the WebPKI, short lived certificates
   actually scale better.

   Although Compressed CRL Sets provide much better scaling properties
   than traditional CRLs, the size of the CRL Set still increases
   linearly with the product of the number of revoked certificates and
   the log of the revocation ratio. Thus while use of Compressed CRL
   Sets will remain practical as long as the growth in the number of
   WebPKI certificates issued grows at a slower rate than additional
   communications bandwidth becomes available, this cannot be
   guaranteed.

   Short lived certificates in contrast do not require any additional
   revocation data and may be regarded as offering perfect scaling. The
   chief drawback to relying on short lived certificates is that the
   minimum practical validity interval is 48 hours which is also the
   time period in which the vast majority of the harm caused in a
   criminal attack occurs. Also deployment of short lived certificates
   relies on new certificate lifecycle management infrastructure that is
   not yet developed.

   Use of Compressed CRLs in combination with Short Lived certificates
   offers advantages over both. In the short term Compressed CRLs
   provide a mechanism for reporting status on the population of long
   expiry certificates until the infrastructure is deployed to enable
   use of the short lived approach. In the long term, Compressed CRLs
   provide an efficient mechanism to narrow the residual vulnerability
   window from days to minutes.

3. Efficient Encoding of Disjoint Sets

   CRL Sets are compressed using a selected variant of the approach
   described in [[HBS2014 - To be specified]].






Hallam-Baker                 April 10, 2015                     [Page 5]


Internet-Draft             Compressed CRLSets               October 2014

3.1. Problem Statement

   Given two disjoint sets of data entries A and B in which |A| >= |B|,
   provide a boolean function f (x) that is efficient in both memory and
   time such that:

      *  f(x) is true if x is a member of A

      *  f(x) is false if x is a member of B

      *  f(x) can return true or false otherwise

   For ease of comparison we will consider an example in which set A has
   10 million members and set B has 1 million. This corresponds to a
   revocation use case in which there are 11 million unexpired
   certificates and 10% have been revoked.

   Note that while it is in theory possible for the revocation ratio to
   exceed 50% and thus for |A| < |B|, this case is easily handled with a
   binary flag to indicate that Set A represents the revoked
   certificates and B those that are valid.

3.2. Standard Hash Function.

   One approach to this problem is to first reduce the data entries
   using a hash function and compile a list of the hash values
   corresponding to the entries of set B. This has the advantage of
   simplicity but the corresponding data set is very large. For the
   example case, the implementation of f(x) requires 32MB of data if a
   256 bit hash is used.

3.3. Parameterized Hash Function.

   While using a 256 bit hash leaves an infintesimal probability of
   collisions, the same is true of 128 bit or even shorter hashes. If a
   parameterized hash function (e.g. MAC-SHA1) is used we can repeat the
   hash construction process multiple times with different parameters
   and check to see the shortest truncation that allows the sets to be
   distinguished. This allows the example case to be addressed using
   only 6MB of data.

3.4. Shortest Discrimination Prefix

   The parameterized hash function encodes enough information to
   distinguish any member of Set A from any member of set B, but this is
   more information than is necessary. It is sufficient to distinguish a
   member of set B from the two members of set A that are adjacent to
   it.






Hallam-Baker                 April 10, 2015                     [Page 6]


Internet-Draft             Compressed CRLSets               October 2014

   We begin by calculating a hash value H(x) of each member of A and B
   using a hash function that returns a unique value for each entry to
   create the corresponding hash sets AH, BH.

   The hash function used for this purpose does not need to meet any
   special cryptographic properties such as first or second pre-image
   resistance.

   Let P(x,y) be a function of two bit strings, x, y such that P(x,y) is
   true if and only if y is a prefix of x. That if y is n bits long, the
   first n bits of x and y are identical.

   Let M(b, AH) be a function that returns the shortest prefix of b that
   is not a prefix of any member of AH.

   We apply the function M to each member of BH and remove duplicates to
   create the set BP.

   The function f(A,B) can now be implemented as follows: f(x) = false
   if there is a member p of BP such that p is a prefix of the hash
   value of x (i/e/ P(H(x),p) is true)f(x) = true otherwise

   Note that the value of f(x) returned if x is not a member of A or B
   is undefined.

   A space and time efficient implementation of the function f(x) is
   arrived at by an efficient encoding of the set BP.

3.4.1. Sorted List of Hashes

   The set BP may be encoded as a simple list.

   Sorting the list permits efficient access using standard techniques
   applicable to searching a hash table.

   For the example case, the average shortest prefix is 24?? bits giving
   a total data set size of 3Mb.

3.4.2. Difference Encoding

   In a sorted list of hashes, adjacent hashes will typically only vary
   in the last few bits. Since it is only the difference between the
   hashes that contains information, we can encode this information
   without loss of information overall.

   For the example case, the average number of bits required for a
   difference encoding is 6 bits and the total data volume required is
   750KB.






Hallam-Baker                 April 10, 2015                     [Page 7]


Internet-Draft             Compressed CRLSets               October 2014

3.4.3. Further optimizations.

   Further optimizations have been developed permitting an additional
   reduction in the data volume by half over the difference technique.

4. Compressed CRL Set Encoding

   TBS specify an encoding of CRL data based on the above.

   Following the general trend for JSON and the need for compactness,
   the use of JSON-B is ideal.

5. Experimental Results

   TBS here extract the lab data from the prototype implementations
   using the encoding described above.

6. References

6.1. Normative References

   [I-D.hallambaker-omnipublish]  Hallam-Baker, P, "OmniBroker
              Publication Protocol", Internet-Draft draft-hallambaker-
              omnipublish-00, 22 May 2014.

   [RFC6960]  Santesson, S.,Myers, M.,Ankney, R.,Malpani, A.,Galperin,
              S.,Adams, C., "X.509 Internet Public Key Infrastructure
              Online Certificate Status Protocol - OCSP", RFC 6960, June
              2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

   [RFC6066]  Eastlake, D., "Transport Layer Security (TLS) Extensions:
              Extension Definitions", RFC 6066, January 2011.

   [RFC5280]  Cooper, D.,Santesson, S.,Farrell, S.,Boeyen, S.,Housley,
              R.,Polk, W., "Internet X.509 Public Key Infrastructure
              Certificate and Certificate Revocation List (CRL)
              Profile", RFC 5280, May 2008.

Authors' Addresses

   Phillip Hallam-Baker
   Comodo Group Inc.

   philliph@comodo.com





Hallam-Baker                 April 10, 2015                     [Page 8]


Internet-Draft             Compressed CRLSets               October 2014

   Rob Stradling
   Comodo CA Ltd.

   rob.stradling@comodo.com


















































Hallam-Baker                 April 10, 2015                     [Page 9]