Advice for Safe Handling of Malformed Messages
draft-ietf-appsawg-malformed-mail-09

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 7103.
Authors Murray Kucherawy, Gregory N. Shapiro, Ned Freed
Last updated 2013-10-29 (Latest revision 2013-10-05)
Replaces draft-kucherawy-mta-malformed
RFC stream Internet Engineering Task Force (IETF)
Stream WG state Submitted to IESG for Publication
Document shepherd S Moonesamy
Shepherd write-up Last changed 2013-09-28
IESG IESG state Became RFC 7103 (Informational)
Consensus boilerplate Yes
Telechat date (None)
Responsible AD Barry Leiba
Send notices to appsawg-chairs@tools.ietf.org, draft-ietf-appsawg-malformed-mail@tools.ietf.org, sm+ietf@elandsys.com
IANA IANA review state IANA OK - No Actions Needed
"Joe <joe@example.com>"@example.net

   where "example.net" is the domain name or host name of the handling
   agent making the interpretation.  Another possible interpretation is
   simply:

       To: "Joe" <joe@example.com>

7.1.7.  Naked Local-Parts

   [MAIL] defines a local-part as the user portion of an email address,
   and the display-name as the "user-friendly" label that accompanies
   the address specification.

   Some broken submission agents might introduce messages with only a
   local-part or only a display-name and no properly formed address.
   For example:

       To: Joe

   A submission agent ought to reject this or, at a minimum, append "@"
   followed by its own host name or some other valid name likely to
   enable a reply to be delivered to the correct mailbox.  Where this is
   not done, an agent receiving such a message will probably be
   successful by synthesizing a valid header field for evaluation using
   the techniques described in Section 7.5.2.
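
   As a minimal sketch of the "append '@' and a host name" fallback, the
   following Python function qualifies a value that has no "@" at all.
   The function name, the decision to quote values that are not plain
   dot-atoms, and the host name used in the example are assumptions made
   for illustration; they are not prescribed by this document.

```python
import re

# Text allowed in an unquoted local-part (dot-atom form in [MAIL]).
_DOT_ATOM = re.compile(
    r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
)


def qualify_naked_local_part(value, local_host):
    """Return value with "@local_host" appended if it has no "@".

    Hypothetical helper: values that are not plain dot-atoms are
    wrapped in a quoted-string so the result is still parseable.
    """
    token = value.strip()
    if not token or "@" in token:
        return token  # empty, or already looks like an address
    if not _DOT_ATOM.fullmatch(token):
        escaped = token.replace("\\", "\\\\").replace('"', '\\"')
        token = '"' + escaped + '"'
    return token + "@" + local_host
```

   For the example above, qualify_naked_local_part("Joe",
   "mta.example.net") yields "Joe@mta.example.net".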

7.2.  Non-Header Lines

   Some messages contain a line of text in the header that is not a
   valid message header field of any kind.  For example:

       From: user@example.com {1}
       To: userpal@example.net {2}
       Subject: This is your reminder {3}
       about the football game tonight {4}
       Date: Wed, 20 Oct 2010 20:53:35 -0400 {5}

       Don't forget to meet us for the tailgate party! {7}

   The cause of this is typically a bug in a message generator of some
   kind.  Line {4} was intended to be a continuation of line {3}; it
   should have been indented by whitespace as set out in Section 2.2.3
   of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

Kucherawy, et al.         Expires April 8, 2014                [Page 10]
Internet-Draft             Safe Mail Handling               October 2013

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (that is, a CRLF immediately
       followed by another CRLF);

   2.  some agents assume this anomaly should be interpreted to mean the
       body starts at line {4}, as the end of the header is assumed by
       encountering something that is not a valid header field or folded
       portion thereof;

   3.  some agents assume this should be interpreted as an intended
       header folding as described above and thus simply append a single
       space character (ASCII 0x20) and the content of line {4} to that
       of line {3};

   4.  some agents reject this outright as line {4} is neither a valid
       header field nor a folded continuation of a header field prior to
       an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  Consider, for example, a message filter that searches
   message headers for properties indicative of abusive or malicious
   content that is attached to a Mail Transfer Agent (MTA) implementing
   option 2 above.  An attacker could craft a message that includes this
   malformation at a position above the property of interest, knowing
   the MTA will not consider that content part of the header, and thus
   the MTA will not feed it to the filter, thus avoiding detection.
   Meanwhile, the Mail User Agent (MUA) that presents the content to an
   end user implements option 1 or 3, which has some undesirable
   effect.

   It should be noted that a few implementations choose option 4 above
   since any reputable message generation program will get header
   folding right, and thus anything so blatant as this malformation is
   likely an error caused by a malefactor.

   The preferred implementation if option 4 above is not employed is to
   apply the following heuristic when this malformation is detected:

   1.  Search forward for an empty line.  If one is found, then apply
       option 3 above to the anomalous line, and continue.

   2.  Search forward for another line that appears to be a new header
       field (a name followed by a colon).  If one is found, then apply
       option 3 above to the anomalous line, and continue.
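
   The heuristic above can be sketched in Python roughly as follows.
   This is an illustration only: the function name, the regular
   expression used to recognize "a name followed by a colon", and the
   choice to treat the anomalous line as the start of the body when
   neither search succeeds are all assumptions, since this document does
   not specify that last case.

```python
import re

# A field name is printable ASCII other than space and colon, then ":".
_FIELD_START = re.compile(r"^[\x21-\x39\x3b-\x7e]+[ \t]*:")


def split_with_repair(lines):
    """Split a message into (header_fields, body_lines).

    lines holds the message lines without their CRLF terminators.
    Folded fields are unfolded into a single string per field.
    """
    header = []
    for i, line in enumerate(lines):
        if line == "":
            return header, lines[i + 1:]      # normal separator
        if _FIELD_START.match(line):
            header.append(line)               # new header field
        elif line[0] in " \t" and header:
            header[-1] += line                # ordinary folding
        else:
            # Anomalous line: look ahead for an empty line (step 1)
            # or something that looks like a header field (step 2).
            rest = lines[i + 1:]
            found = any(l == "" or _FIELD_START.match(l) for l in rest)
            if header and found:
                header[-1] += " " + line      # option 3
            else:
                return header, lines[i:]      # assumed fallback
    return header, []
```

   Applied to the example in this section, line {4} is appended to the
   Subject field and the body begins after the empty line.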

7.3.  Unusual Spacing

   The following message is valid per [MAIL]:

       From: user@example.com {1}
       To: userpal@example.net {2}
       Subject: This is your reminder {3}
        {4}
        about the football game tonight {5}
       Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}

       Don't forget to meet us for the tailgate party! {8}

   Line {4} contains a single whitespace character.  The intended
   result is that lines {3}, {4}, and {5} comprise a single continued
   header field.
   However, some agents are aggressive at stripping trailing whitespace,
   which will cause line {4} to be treated as an empty line, and thus
   the separator line between header and body.  This can affect header-
   specific processing algorithms as described in the previous section.

   This example was legal in earlier versions of the Internet Mail
   format standard.

   The best handling of this example is for a message parsing engine to
   behave as if line {4} was not present in the message and for a
   message creation engine to emit the message with line {4} removed.
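
   A parsing engine could implement this advice with a small filter such
   as the Python sketch below.  The function name is hypothetical, and
   the sketch assumes the message has already been split into lines with
   the terminators removed and without any trailing-whitespace
   stripping.

```python
def drop_whitespace_only_header_lines(lines):
    """Remove header lines that consist solely of spaces or tabs.

    Only the header is touched; everything after the first truly
    empty line (the header/body separator) is passed through as-is.
    """
    out = []
    in_header = True
    for line in lines:
        if in_header and line == "":
            in_header = False          # genuine separator line
        elif in_header and line.strip(" \t") == "":
            continue                   # whitespace-only: act as if absent
        out.append(line)
    return out
```

   The check for a truly empty line has to come before any whitespace
   stripping; otherwise the filter reproduces the very problem described
   above.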

7.4.  Header Malformations

   Among the many possible malformations, a common one is insertion of
   whitespace at unusual locations, such as:

       From: user@example.com {1}
       To: userpal@example.net {2}
       Subject: This is your reminder {3}
       MIME-Version : 1.0 {4}
       Content-Type: text/plain {5}
       Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}

       Don't forget to meet us for the tailgate party! {8}

   Note the addition of whitespace in line {4} after the header field
   name but before the colon that separates the name from the value.

   The acceptance grammar of [MAIL] permits that extra whitespace, so it
   cannot be considered invalid.  However, a consensus of
   implementations prefers to remove that whitespace.  There is no
   perceived change to the semantics of the header field being altered
   as the whitespace is itself semantically meaningless.  Therefore, it
   is best to remove all whitespace after the field name but before the
   colon and to emit the field in this modified form.
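
   As an illustration, the removal can be done with a single regular
   expression substitution.  The Python sketch below uses a hypothetical
   function name and assumes it is given the first line of a header
   field (not a folded continuation line).

```python
import re

# Field name, then one or more spaces or tabs, then the colon.
_NAME_WSP_COLON = re.compile(r"^([\x21-\x39\x3b-\x7e]+)[ \t]+:")


def normalize_field_name(line):
    """Remove whitespace between the field name and the colon."""
    return _NAME_WSP_COLON.sub(r"\1:", line, count=1)
```

   Because the pattern is anchored at the start of the line and the
   field name cannot contain a colon, a colon that appears later in the
   field value is never touched.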

7.5.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these, in the sense that
   a message whose header contents exceed one or more of the limits set
   there is generally allowed to pass; agents typically add any required
   fields that are missing, however.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.  Some popular open source
   filtering programs and some popular Mailing List Management (MLM)
   packages select either the first or last instance of a particular
   field name, such as From, to decide who sent a message.  Absent
   strict enforcement of [MAIL], an attacker can craft a message with
   multiple fields if that attacker knows the filter will make a
   decision based on one but the user will be shown the other.

   This situation is exacerbated when message validity is assessed, such
   as through enhanced authentication methods.  Such methods might cover
   one instance of a constrained field but not another, taking the wrong
   one as "good" or "safe".  An MUA, for example, could show the first
   of two From fields to an end user as "good" or "safe" while an
   authentication method actually only verified the second.

   In attempting to counter this exposure, one of the following can be
   enacted:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  where a field has a limited instance count, combine additional
       instances into a single compound instance;

   4.  where a field can contain multiple distinct values (such as From)
       or is free-form text (such as Subject), combine them into a
       semantically identical single header field of the same name (see
       Section 7.5.1);

   5.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its output so
       that later agents can produce a consistent result.  Any
       alteration likely to cause the field to be ignored by downstream
       agents is acceptable.  A common approach is to prefix the field
       names with a string such as "BAD-".

   Selecting a mitigation action from the above list, or some other
   action, must consider the needs of the operator making the decision,
   and the nature of its user base.
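
   Mitigation 5 can be sketched as follows in Python.  The set of
   at-most-once field names is taken from the table in Section 3.6 of
   [MAIL]; the function name and the choice to leave the first instance
   untouched and rename the later ones are assumptions made for this
   example, and an operator could equally keep the last instance.

```python
# Fields that Section 3.6 of [MAIL] allows at most once per message.
SINGLETON_FIELDS = {
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
}


def rename_excess_fields(fields, prefix="BAD-"):
    """Prefix the name of every surplus instance of a singleton field.

    fields is a list of (name, value) pairs in message order.
    """
    seen = set()
    out = []
    for name, value in fields:
        key = name.lower()
        if key in SINGLETON_FIELDS:
            if key in seen:
                name = prefix + name   # later agents will ignore it
            seen.add(key)
        out.append((name, value))
    return out
```

   Fields that may legitimately repeat, such as Received, pass through
   unchanged.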

7.5.1.  Repeated Header Fields

   There are some occasions where repeated fields are encountered where
   only one is expected.  Two examples are presented.  First:

       From: reminders@example.com {1}
       To: jqpublic@example.com {2}
       Subject: Automatic Meeting Reminder {3}
       Subject: 4pm Today -- Staff Meeting {4}
       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

       Reminder of the staff meeting today in the small {6}
       auditorium.  Come early! {7}

   The message above has two Subject fields, which is in violation of
   Section 3.6 of [MAIL].  A safe interpretation of this would be to
   treat it as though the two Subject field values were concatenated, so
   long as they are not identical, such as:

       From: reminders@example.com {1}
       To: jqpublic@example.com {2}
       Subject: Automatic Meeting Reminder {3}
         4pm Today -- Staff Meeting {4}
       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

       Reminder of the staff meeting today in the small {6}
       auditorium.  Come early! {7}

   Second:

       From: president@example.com {1}
       From: vice-president@example.com {2}
       To: jqpublic@example.com {3}
       Subject: A note from the E-Team {4}
       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

       This memo is to remind you of the corporate dress {6}
       code.  Attached you will find an updated copy of {7}
       the policy. {8}
       ...

   As with the first example, there is a violation in terms of the
   number of instances of the From field.  A likely safe interpretation
   would be to combine these into a comma-separated address list in a
   single From field:

       From: president@example.com, {1}
             vice-president@example.com {2}
       To: jqpublic@example.com {3}
       Subject: A note from the E-Team {4}
       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

       This memo is to remind you of the corporate dress {6}
       code.  Attached you will find an updated copy of {7}
       the policy. {8}
       ...
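
   Both interpretations can be sketched with one small Python routine.
   The names below are hypothetical.  The sketch joins repeated Subject
   values with a space and repeated From values with a comma, and it
   collapses identical repeats into one; applying that last rule to From
   as well as Subject is an assumption of this example.

```python
# How to join the values of a repeated field, keyed by lowercase name.
JOINERS = {"subject": " ", "from": ", "}


def combine_repeated_fields(fields):
    """Merge repeated Subject and From fields into single fields.

    fields is a list of (name, value) pairs in message order; the
    merged field stays at the position of its first instance.
    """
    out = []        # list of [name, [values]]
    index = {}      # lowercase name -> position in out
    for name, value in fields:
        key = name.lower()
        if key in JOINERS and key in index:
            values = out[index[key]][1]
            if value not in values:    # identical repeats collapse
                values.append(value)
            continue
        if key in JOINERS:
            index[key] = len(out)
        out.append([name, [value]])
    return [(n, JOINERS.get(n.lower(), "").join(v)) for n, v in out]
```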

7.5.2.  Missing Header Fields

   Similar to the previous section, there are messages seen in the wild
   that lack certain required header fields.  In particular, [MAIL]
   requires that a From and Date field be present in all messages.

   When presented with a message lacking these fields, the MTA might
   perform one of the following:

   1.  Make no changes

   2.  Add an instance of the missing field(s) using synthesized content
       based on data provided in other parts of the protocol

   Option 2 is recommended for handling this case.  Handling agents
   should add these for internal handling if they are missing, but
   should not add them to the external representation.  The reason for
   this advice is that there are some filter modules that would consider
   the absence of such fields to be a condition warranting special
   treatment (for example, rejection), and thus the effectiveness of
   such modules would be stymied by an upstream filter adding them in a
   way visible to other components.

   The synthesized fields should contain a best guess as to what should
   have been there; for From, the SMTP MAIL command's address can be
   used (if not null) or a placeholder address followed by an address
   literal (for example, unknown@[192.0.2.1]); for Date, a date
   extracted from a Received field is a reasonable choice.
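
   A minimal Python sketch of this synthesis follows.  The function
   names are hypothetical, the placeholder local-part "unknown" is taken
   from the example above, and the sketch assumes the date-time in a
   Received field follows its last semicolon, as in the syntax defined
   by [SMTP].

```python
from email.utils import parsedate_to_datetime


def synthesize_from(mail_from, client_ip):
    """Pick a From value for internal handling only."""
    if mail_from and mail_from != "<>":
        return mail_from                    # SMTP MAIL address, not null
    return "unknown@[%s]" % client_ip       # placeholder + address literal


def synthesize_date(received_value):
    """Return the date-time of a Received field value, or None."""
    _, sep, stamp = received_value.rpartition(";")
    if not sep:
        return None                         # no date-time part at all
    stamp = " ".join(stamp.split())         # collapse folding whitespace
    try:
        parsedate_to_datetime(stamp)        # validate only
    except (TypeError, ValueError):
        return None
    return stamp
```

   In keeping with the advice above, the results are intended for the
   agent's internal representation and are not written back into the
   message.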

   One other important case to consider is a missing Message-Id field.
   An MTA that encounters a message missing this field should synthesize
   a valid one using techniques described above and add it to the
   external representation, since many deployed tools use the content of
   that field as a common unique message reference, so its absence
   inhibits correlation of message processing.  Section 3.6.4 of [MAIL]
   describes advisable practice for synthesizing the content of this
   field when it is absent, and establishes a requirement that it be
   globally unique.
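
   For illustration, Python's standard library offers
   email.utils.make_msgid(), which builds a value from the current time,
   the process ID, and a random number.  The wrapper below is a
   hypothetical sketch; the field-list representation and the domain
   argument are assumptions of this example.

```python
from email.utils import make_msgid


def ensure_message_id(fields, domain):
    """Append a synthesized Message-ID if the message has none.

    fields is a list of (name, value) pairs; domain should be a
    name under the control of the agent adding the field.
    """
    if any(name.lower() == "message-id" for name, _ in fields):
        return list(fields)
    return list(fields) + [("Message-ID", make_msgid(domain=domain))]
```

   Unlike the synthesized From and Date fields, this one is meant to be
   added to the external representation of the message.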

7.5.3.  Return-Path

   A valid message will have exactly one Return-Path header field, as
   per Section 4.4 of [SMTP].  Should a message be encountered bearing
   more than one, all but the topmost are to be disregarded, as the
   topmost one is most likely to have been added nearest to the mailbox
   that received that message.

7.6.  Missing or Incorrect Charset Information

   MIME provides the means to include textual material employing
   character sets ("charsets") other than US-ASCII.  Such material is
   required to have an identified charset.  Charset identification is
   done using a "charset" parameter in the Content-Type header field, a
   charset label within the MIME entity itself, or the charset can be
   implicitly specified by the Content-Type (see [CHARSET]).

   It is unfortunately fairly common for required character set
   information to be missing or incorrect in textual MIME entities.  As
   such, processing agents should perform basic sanity checks, such as:

   o  US-ASCII contains bytes between 1 and 127 inclusive only
      (colloquially, "7-bit" data), so material including bytes outside
      of that range ("8-bit" data) is necessarily not US-ASCII.  (See
      Section 2.3.1 of [MAIL].)

   o  [UTF-8] has a very specific syntactic structure that other 8-bit
      charsets are unlikely to follow.

   o  Null bytes (ASCII 0x00) are not allowed in either 7-bit or 8-bit
      data.

   o  Not all 7-bit material is US-ASCII.  The presence of the various
      escape sequences used for character switching can be used as an
      indication of the various charsets based on ISO/IEC 2022, such as
      those defined in [ISO-2022-CN], [ISO-2022-JP], and [ISO-2022-KR].
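
   The checks listed above can be combined into a rough classifier, as
   in the Python sketch below.  The function name and the returned
   labels are hypothetical, and treating any ESC byte (0x1B) in 7-bit
   data as a sign of an ISO/IEC 2022 charset is a simplifying
   assumption.

```python
def classify_text_bytes(data):
    """Return a coarse label for the bytes of a textual MIME entity."""
    if b"\x00" in data:
        return "contains-nul"          # not valid 7-bit or 8-bit data
    if all(byte < 0x80 for byte in data):
        # 7-bit material: escape sequences hint at ISO/IEC 2022.
        return "iso-2022" if b"\x1b" in data else "us-ascii"
    try:
        data.decode("utf-8")           # strict structural check
    except UnicodeDecodeError:
        return "unknown-8bit"
    return "utf-8"
```

   A processing agent could compare the label with the declared charset
   and treat a mismatch, such as "us-ascii" declared for data labeled
   "unknown-8bit", as a character set error.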

   When a character set error is detected, processing agents should:

   a.  apply heuristics to determine the most likely character set and,
       if successful, proceed using that information; or

   b.  refuse to process the malformed MIME entity.

   A null byte inside a textual MIME entity can cause typical string
   processing functions to mis-identify the end of a string, which can
   be exploited to hide malicious content from analysis processes.
   Accordingly, null bytes require additional special handling.

   A few null bytes in isolation are likely to be the result of poor
   message construction practices.  Such nulls should be silently
   dropped.

   Large numbers of null bytes are usually the result of binary material
   that is improperly encoded, improperly labeled, or both.  Such
   material is likely to be damaged beyond the hope of recovery, so the
   best course of action is to refuse to process it.

   Finally, the presence of null bytes may be used as an indication of
   possible malicious intent.
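
   The two cases above might be handled as in the following Python
   sketch.  This document does not say how many null bytes count as "a
   few"; the default threshold of 8, the function name, and the
   exception class are assumptions chosen purely for illustration.

```python
class UnprocessableEntity(ValueError):
    """Raised when a MIME entity looks damaged beyond recovery."""


def strip_or_refuse_nulls(data, max_isolated=8):
    """Drop a handful of null bytes, or refuse the entity outright."""
    count = data.count(b"\x00")
    if count == 0:
        return data
    if count > max_isolated:
        # Probably mislabeled or badly encoded binary material.
        raise UnprocessableEntity("%d null bytes in text" % count)
    return data.replace(b"\x00", b"")  # silently drop the few nulls
```

   An agent that also wants to treat nulls as a sign of possible
   malicious intent could record the count before calling this function.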

7.7.  Eight-Bit Data

   Standards-compliant email messages do not contain any non-ASCII data
   without indicating that such content is present by means of published
   SMTP extensions.  Absent that, MIME encodings are typically used to
   convert non-ASCII data to ASCII in a way that can be reversed by
   other handling agents or end users.

   The best way to handle non-compliant 8bit material depends on its
   location.

   Non-compliant 8bit in MIME entity content should simply be processed
   as if the necessary SMTP extensions had been used to transfer the
   message.  Note that improperly labeled 8bit material in textual MIME
   entities may require treatment as described in Section 7.6.

   Non-compliant 8bit in message or MIME entity header fields can be
   handled as follows:

   o  Occurrences in unstructured text fields, comments, and phrases
      can be converted into encoded-words (see [MIME3]) if a likely
      character set can be determined.  Alternatively, 8bit characters
      can be removed or replaced with some other character.

   o  Occurrences in header fields whose syntax is unknown may be
      handled by dropping the field entirely or by removing/replacing
      the 8bit character as described above.

   o  Occurrences in addresses are especially problematic.  Agents
      supporting [EAI] may, if the 8bit conforms to 8bit syntax, elect
      to treat the message as an EAI message and process it accordingly.
      Otherwise, it is in most cases best to exclude the address from
      any sort of processing -- which may mean dropping it entirely --
      since any attempt to fix it definitively is unlikely to be
      successful.
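
   For the unstructured-text case, a repair routine might look like the
   Python sketch below, which relies on the standard email.header
   module.  The function name is hypothetical, the likely character set
   is assumed to come from some external heuristic, and "?" is an
   arbitrary choice of replacement character.

```python
from email.header import Header


def repair_unstructured_value(raw, likely_charset=None):
    """Return an ASCII-only rendering of an unstructured field value.

    raw holds the field value as received (bytes).
    """
    if all(byte < 0x80 for byte in raw):
        return raw.decode("ascii")              # nothing to repair
    if likely_charset:
        try:
            text = raw.decode(likely_charset)
        except (UnicodeDecodeError, LookupError):
            text = None                         # guess did not hold up
        if text is not None:
            return Header(text, "utf-8").encode()   # encoded-word(s)
    # No usable guess: replace each 8bit byte with "?".
    return bytes(b if b < 0x80 else 0x3F for b in raw).decode("ascii")
```

   The sketch re-encodes the text as UTF-8 inside the encoded-word
   rather than preserving the guessed charset label; either choice
   yields a header field that is pure ASCII.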

8.  MIME Anomalies

   The five-part set of MIME specifications includes a mechanism of
   message extensions for providing text in character sets other than
   ASCII, non-text attachments to messages, multi-part message bodies,
   and similar facilities.

   Some anomalies with MIME-compliant generation are also common.  This
   section discusses some of those and presents preferred mitigations.

8.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a MIME-
   Version header field.  Without it, the Content-Type and associated
   fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.  It is prudent to disregard this absence and
   conduct analysis of the message as if it were present, especially by
   agents attempting to identify malicious material.

   Further, the absence of MIME-Version might be an indication of
   malicious intent, and extra scrutiny of the message may be warranted.
   Such omissions are not expected from compliant message generators.

8.2.  Faulty Encodings

   There have been a few different specifications of base64 in the past.
   The implementation defined in [MIME] instructs decoders to discard
   characters that are not part of the base64 alphabet.  Other
   implementations consider an encoded body containing such characters
   to be completely invalid.  Very early specifications of base64 (see
   [PEM], for example) allowed email-style comments within base64-
   encoded data.

   The attack vector here involves constructing a base64 body whose
   meaning varies given different possible decodings.  If a security
   analysis module wishes to be thorough, it should consider scanning
   the possible outputs of the known decoding dialects in an attempt to
   anticipate how the MUA will interpret the data.
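
   As a rough illustration of scanning more than one decoding, the
   Python sketch below tries a strict decoder (any character outside the
   base64 alphabet is an error) and a lenient one (such characters are
   discarded, as [MIME] instructs) and returns each distinct result.
   The function names are hypothetical, and only these two dialects are
   covered; the comment-tolerant style of [PEM] is not.

```python
import base64
import binascii
import re

_NON_ALPHABET = re.compile(rb"[^A-Za-z0-9+/=]")


def decode_strict(body):
    """Decode, allowing only line breaks besides the alphabet."""
    compact = body.replace(b"\r", b"").replace(b"\n", b"")
    return base64.b64decode(compact, validate=True)


def decode_lenient(body):
    """Decode after discarding every non-alphabet character."""
    return base64.b64decode(_NON_ALPHABET.sub(b"", body))


def candidate_decodings(body):
    """Return the distinct outputs of the dialects that succeed."""
    results = []
    for decoder in (decode_strict, decode_lenient):
        try:
            output = decoder(body)
        except binascii.Error:
            continue                    # this dialect rejects the body
        if output not in results:
            results.append(output)
    return results
```

   A security analysis module could then scan every element of the
   returned list rather than a single decoding.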

9.  Body Anomalies

9.1.  Oversized Lines

   A message containing a line of content that exceeds 998 characters
   plus the line terminator (1000 total) violates Section 2.1.1 of
   [MAIL].  Some handling agents may not look at content in a single
   line past the first 998 bytes, providing bad actors an opportunity to
   hide malicious content.

   There is no specified way to handle such messages, other than to
   observe that they are non-compliant and reject them, or rewrite the
   oversized line such that the message is compliant.

   To ensure long lines do not prevent analysis of potentially malicious
   data, handling agents are strongly encouraged to take one of the
   following actions:

   1.  Break such lines into multiple lines at a position that does not
       change the semantics of the text being thus altered.  For
       example, breaking an oversized line such that a [URI] then spans
       two lines could inhibit the proper identification of that URI.

   2.  Rewrite the MIME part (or the entire message if not MIME) that
       contains the excessively long line using a content encoding that
       breaks the line in the transmission but would still result in the
       line being intact on decoding for presentation to the user.  Both
       of the encodings declared in [MIME] can accomplish this.
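
   Action 2 can be sketched in Python with the standard quopri module,
   which produces quoted-printable output with soft line breaks.  The
   function names are hypothetical, and the caller is assumed to update
   the Content-Transfer-Encoding header field of the affected part to
   the value returned.

```python
import quopri

MAX_LINE = 998   # characters per line, excluding the line terminator


def has_oversized_line(body):
    """Return True if any line of body exceeds the [MAIL] limit."""
    return any(len(line) > MAX_LINE for line in body.splitlines())


def rewrite_if_oversized(body):
    """Return (body, encoding); encoding is None if nothing changed."""
    if not has_oversized_line(body):
        return body, None
    # Soft line breaks keep each encoded line short, and decoding
    # restores the original long line for presentation.
    return quopri.encodestring(body), "quoted-printable"
```

   Base64 would serve equally well here; quoted-printable is used only
   because it leaves mostly-ASCII content readable in transit.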

10.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practices enumerated in
   this document are generally perceived as attempts to resolve security
   considerations that already exist rather than introducing new ones.
   However, some of the attacks described here may not have appeared in
   previous email specifications.

11.  IANA Considerations

   This document contains no actions for IANA.

   [RFC Editor: Please remove this section prior to publication.]

12.  References

12.1.  Normative References

   [EMAIL-ARCH]   Crocker, D., "Internet Mail Architecture", RFC 5598,
                  July 2009.

   [MAIL]         Resnick, P., "Internet Message Format", RFC 5322,
                  October 2008.

   [MIME]         Freed, N. and N. Borenstein, "Multipurpose Internet
                  Mail Extensions (MIME) Part One: Format of Internet
                  Message Bodies", RFC 2045, November 1996.

12.2.  Informative References

   [BINARYSMTP]   Vaudreuil, G., "SMTP Service Extensions for
                  Transmission of Large and Binary MIME Messages",
                  RFC 3030, December 2000.

   [CHARSET]      Melnikov, A. and J. Reschke, "Update to MIME regarding
                  "charset" Parameter Handling in Textual Media Types",
                  RFC 6657, July 2012.

   [DKIM]         Crocker, D., Ed., Hansen, T., Ed., and M. Kucherawy,
                  Ed., "DomainKeys Identified Mail (DKIM) Signatures",
                  RFC 6376, September 2011.

   [DSN]          Moore, K. and G. Vaudreuil, "An Extensible Message
                  Format for Delivery Status Notifications", RFC 3464,
                  January 2003.

   [EAI]          Yang, A., Steele, S., and N. Freed, "Internationalized
                  Email Headers", RFC 6532, February 2012.

   [ISO-2022-CN]  Zhu, HF., Hu, DY., Wang, ZG., Kao, TC., Chang, WCH.,
                  and M. Crispin, "Chinese Character Encoding for
                  Internet Messages", RFC 1922, March 1996.

   [ISO-2022-JP]  Murai, J., Crispin, M., and E. van der Poel, "Japanese
                  Character Encoding for Internet Messages", RFC 1468,
                  June 1993.

   [ISO-2022-KR]  Choi, U., Chon, K., and H. Park, "Korean Character
                  Encoding for Internet Messages", RFC 1557,
                  December 1993.

   [MIME3]        Moore, K., "MIME (Multipurpose Internet Mail
                  Extensions) Part Three: Message Header Extensions for
                  Non-ASCII Text", RFC 2047, November 1996.

   [PEM]          Linn, J., "Privacy Enhancement for Internet Electronic
                  Mail: Part I -- Message Encipherment and
                  Authentication Procedures", RFC 1113, August 1989.

   [RFC733]       Crocker, D., Vittal, J., Pogran, K., and D. Henderson,
                  Jr., "Standard for the Format of Internet Text
                  Messages", RFC 733, November 1977.

   [SMTP]         Klensin, J., "Simple Mail Transfer Protocol",
                  RFC 5321, October 2008.

   [URI]          Berners-Lee, T., Fielding, R., and L. Masinter,
                  "Uniform Resource Identifier (URI): Generic Syntax",
                  RFC 3986, January 2005.

   [UTF-8]        Yergeau, F., "UTF-8, a transformation format of ISO
                  10646", STD 63, RFC 3629, November 2003.

Appendix A.  RFC Editor Notes

   [RFC Editor Note: This section can be removed before publication.]

   I can't seem to figure out how to do this with xml2rfc, but the ISO-
   2022 reference above should contain the following URI:
   http://www.iso.org/iso/catalogue_detail.htm?csnumber=22747

Appendix B.  Acknowledgements

   The authors wish to acknowledge the following for their review and
   constructive criticism of this proposal: Dave Cridland, Dave Crocker,
   Jim Galvin, Tony Hansen, John Levine, Franck Martin, Alexey Melnikov,
   and Timo Sirainen.

Authors' Addresses

   Murray S. Kucherawy

   EMail: superuser@gmail.com

   Gregory N. Shapiro

   EMail: gshapiro@proofpoint.com

   N. Freed

   EMail: ned.freed@mrochek.com
