Last Call Review of draft-ietf-precis-framework-15
review-ietf-precis-framework-15-secdir-lc-kaufman-2014-04-24-00

Request Review of draft-ietf-precis-framework
Requested rev. no specific revision (document currently at 23)
Type Last Call Review
Team Security Area Directorate (secdir)
Deadline 2014-04-22
Requested 2014-04-10
Draft last updated 2014-04-24
Completed reviews Genart Last Call review of -15 by Tom Taylor (diff)
Genart Last Call review of -22 by Tom Taylor (diff)
Secdir Last Call review of -15 by Charlie Kaufman (diff)
Opsdir Last Call review of -15 by Tim Wicinski (diff)
Assignment Reviewer Charlie Kaufman
State Completed
Review review-ietf-precis-framework-15-secdir-lc-kaufman-2014-04-24
Reviewed rev. 15 (document currently at 23)
Review result Has Nits
Review completed: 2014-04-24

Review

I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

This document concerns international character sets. You might intuitively think that international character sets would have few if any security considerations, but you would be wrong. Many security mechanisms depend on the ability to recognize that two identifiers refer to the same entity; inconsistent handling of international character sets can cause two different pieces of code to disagree about whether two identifiers match, and that disagreement has led to a number of serious security problems.

This document defines 18 categories of characters within the Unicode character set, with the intention that systems that want to accept subsets of Unicode characters in their identifiers specify profiles referencing this document. It also defines two initial classes (IdentifierClass and FreeformClass) that could be used directly by many protocol specifications.

While I see no problems with this document, it does seem like a missed opportunity to specify some things that are very important in the secure use of international character sets. The most important of these is a rule for determining whether two strings should be considered equivalent. It is very common, in both IETF protocols and operating system object naming, to adopt a preserve-case / ignore-case model: if an identifier is entered in mixed case, the mixed case is preserved as the identifier, but a lookup using an identifier that is identical except for the case of its characters will still find the object. Further, where uniqueness of identifiers is enforced (e.g., user names or file names), a request to create a second identifier that differs from an existing one only in case will fail.
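
A minimal sketch of that model in Python, assuming str.casefold() as a stand-in for whatever equivalence rule a profile would actually mandate (the registry class and its names are hypothetical, purely for illustration):

    # Hypothetical identifier registry: preserves the case the user entered,
    # but performs lookups and uniqueness checks under a case-folded key.
    class IdentifierRegistry:
        def __init__(self):
            self._by_key = {}              # case-folded key -> original identifier

        def _key(self, identifier):
            return identifier.casefold()   # assumption: casefold() is the equivalence rule

        def create(self, identifier):
            key = self._key(identifier)
            if key in self._by_key:
                raise ValueError("collides with %r" % self._by_key[key])
            self._by_key[key] = identifier # mixed case preserved as entered

        def find(self, identifier):
            return self._by_key.get(self._key(identifier))

    reg = IdentifierRegistry()
    reg.create("Alice")
    print(reg.find("ALICE"))               # 'Alice': lookup is case-blind, case is preserved
    reg.create("alice")                    # ValueError: differs only in case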

These scenarios require that it be well defined whether two characters differ only in case. That is an easy check to make in ASCII, with its 26 letters that have upper- and lowercase versions, but the story is much more complex for some international character sets. Worse, case mapping of even ASCII characters can change based on the “culture”. The most famous example is the Turkish dotless lowercase ‘ı’ and dotted uppercase ‘İ’, which caused security bugs because mapping “FILE” to lowercase in the Turkish locale did not result in the string “file”. There are also cases where two different lowercase characters map to the same uppercase character. It is a scary world out there.
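
A few of these oddities are visible even in Python's locale-independent default case mappings (the Turkish “FILE” bug itself arises in locale-aware APIs such as Java's String.toLowerCase(Locale); Python's str.lower() ignores locale entirely):

    # Unicode case mapping is not a simple 26-letter table.

    # Turkish dotless/dotted i: the default mappings do not round-trip cleanly.
    print('\u0131'.upper())        # dotless 'ı' -> 'I'
    print('\u0130'.lower())        # dotted 'İ'  -> 'i' + U+0307 combining dot above
    print(len('\u0130'.lower()))   # 2: one character became two code points

    # Two different lowercase letters map to the same uppercase letter:
    print('σ'.upper(), 'ς'.upper())    # Greek sigma and final sigma both -> 'Σ'

    # And one lowercase letter uppercases to two letters:
    print('ß'.upper())                 # German sharp s -> 'SS'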

To be used safely from a security standpoint, there must be a standardized way to compare two strings for equivalence that all programs agree on. Programs will still have bugs, but when two programs interpret equivalence differently, it is important that it be possible to determine objectively which one is wrong. The ideal way to do this is to define a canonical form for any string, such that two strings are equivalent if and only if their canonical forms are identical.
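
One illustrative shape for such a canonical form (not something this framework mandates) is Unicode normalization followed by case folding; a Python sketch:

    import unicodedata

    def canonical(s):
        # Illustrative only: NFC normalization plus full case folding.
        # A real profile would pin down its own normalization, case, and
        # width-mapping rules.
        return unicodedata.normalize('NFC', s).casefold()

    def equivalent(a, b):
        return canonical(a) == canonical(b)

    print(equivalent('Résumé', 'résumé'))          # True: differs only in case
    print(equivalent('caf\u00e9', 'cafe\u0301'))   # True: precomposed vs. combining accent
    print(equivalent('STRASSE', 'straße'))         # True: 'ß' case-folds to 'ss'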

Section “10.4 Local Character Set Issues” acknowledges this problem, but offers no solution.

In section “10.6 Security of Passwords”, this document recommends that password comparisons not ignore case (and I agree). But for passwords in particular, it is vital that they be translated to a canonical form, because they are frequently hashed and the hashes must test as identical. One rarely has the luxury of comparing passwords character by character and deciding whether the characters are “close enough”.
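
The hashing point is easy to demonstrate: two passwords the user cannot tell apart visually hash to different values unless both sides apply the same canonical form first. The sketch below uses NFC normalization (illustrative, not a requirement of this framework) and deliberately leaves case alone, consistent with the recommendation above:

    import hashlib
    import unicodedata

    pw_a = 'p\u00e2ssword'       # 'â' as one precomposed code point
    pw_b = 'pa\u0302ssword'      # 'a' + combining circumflex: looks identical on screen

    def digest(s):
        return hashlib.sha256(s.encode('utf-8')).hexdigest()

    print(digest(pw_a) == digest(pw_b))   # False: same-looking passwords, different hashes

    def digest_canonical(s):
        return digest(unicodedata.normalize('NFC', s))   # normalize, but keep case

    print(digest_canonical(pw_a) == digest_canonical(pw_b))   # True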

Section “10.5 Visually Similar Characters” discusses another hard problem: characters that are entirely distinct but visually similar enough to mislead users. This problem occurs even without leaving ASCII, in the form of the digit ‘0’ vs. the uppercase letter ‘O’, and the triple of the digit ‘1’, the lowercase letter ‘l’, and the uppercase letter ‘I’; in some fonts, several of these are indistinguishable. International character sets introduce even more such collisions. To the extent that we expect users to look at URLs like https://www.fideIity.com and recognize that something is out of place, we have a problem. It is probably best addressed by having tables of “looks similar” characters and disallowing the issuance of identifiers that look visually similar to existing ones, in DNS registries and other places where this problem arises. Having a document that lists the doppelganger character equivalents would be a useful first step towards deploying such restrictions.
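
A registry-side check could be as simple as mapping each candidate label through such a table before testing for collisions. The toy table below is an illustrative stand-in, not the real Unicode confusables data (UTS #39):

    # Toy "skeleton" check in the spirit of confusable detection.
    # CONFUSABLE is a tiny illustrative subset, not real data.
    CONFUSABLE = {
        'I': 'l', '1': 'l',    # capital I and digit 1 can pass for lowercase l
        'O': 'o', '0': 'o',    # capital O and digit 0
    }

    def skeleton(label):
        # Map confusable characters to a common representative, then case-fold,
        # so 'fideIity.com' and 'fidelity.com' collapse to the same skeleton.
        return ''.join(CONFUSABLE.get(ch, ch) for ch in label).casefold()

    existing = {'fidelity.com'}

    def looks_like_existing(candidate):
        return skeleton(candidate) in {skeleton(name) for name in existing}

    print(looks_like_existing('fideIity.com'))   # True: capital 'I' posing as 'l'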

I suppose it is too much to expect this document to address either of these issues, but I couldn’t resist suggesting it.

                --Charlie