Network Working Group                             D. Crocker
Internet Draft                                   Brandenburg
                                                 28 Apr 2003

Expires: <10-04>





                  Technical Considerations
                 for Spam Control Mechanisms
            draft-crocker-spam-techconsider-00.txt


     This document is an Internet-Draft and is in full
     conformance with all provisions of Section 10 of
     RFC2026. Internet-Drafts are working documents of the
     Internet Engineering Task Force (IETF), its areas, and
     its working groups.  Note that other groups may also
     distribute working documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum
     of six months and may be updated, replaced, or
     obsoleted by other documents at any time.  It is
     inappropriate to use Internet-Drafts as reference
     material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at

          http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be
     accessed at

          http://www.ietf.org/shadow.html.

     Copyright (C) The Internet Society (2003).  All Rights
     Reserved.



SUMMARY

     Internet mail has operated as an open and unfettered
     channel between originator and recipient. This invites
     some abuses, called spam, such as burdening recipients
     with unwanted commercial email. Spam has become an
     extremely serious problem, is getting much worse, and
     is proving difficult (or impossible) to eliminate. The
     most practical goal is to bring spam under reasonable
     control; it will require an on-going, adaptive effort,
     with stochastic rather than complete results. This note
     discusses available points of control in the Internet
     mail architecture, considerations in using any of those
     points, and opportunities for creating Internet
     standards to aid in spam control efforts.  It offers
     guidance about likely trade-offs (benefits and
     limitations).



CONTENTS

     1.   Spam And Consent
     2.   Email Architecture Control Points
     3.   Administrative And Legal Mechanisms
     4.   Filtering
          4.1. Policies
          4.2. Explicit Lists
          4.3. Content Analysis
          4.4. Negotiation
          4.5. Traffic Analysis
     5.   Infrastructure Enhancement
     6.   Evaluating Technical Approaches
          6.1. Adoption
          6.2. Burden
          6.3. Scaling
          6.4. Scenarios
     7.   Security Considerations
     8.   Acknowledgements
     9.   Authors' Addresses



1.   SPAM AND CONSENT

     Internet mail has operated as an open and unfettered
     channel between originator and recipient.  It has
     always suffered from some degree of abuse, in which
     originators impose on recipients inappropriately.  In
     recent years, a version of this abuse has grown
     substantially.  Called spam, its definition varies from
     "unsolicited commercial email" to "any email the
     recipient does not want".  Often there are no technical
     differences between spam and "acceptable" email. Their
     format, content and even aggregate traffic patterns may
     be identical. Hence spam is a problem for fundamentally
     non-technical reasons, yet the Internet technical
     community must pursue technical responses to it.  The
     lack of strong community consensus on a single, precise
     definition makes this particularly challenging.

     For most working discussions, the term "Unsolicited
     Bulk Email" is sufficient.  The salient point is that
     it is mass-mailings that are of the broadest concern.
     More detailed discussion must, of course, be precise in
     the definition of "unsolicited" and usually must
     distinguish between different types of mail, such as
     commercial, religious, political or personal.

     The simplistic -- but entirely adequate -- summary of
     the impact of spam on Internet mail is that it is an
     extremely serious problem, it is getting much worse,
     and it is proving difficult (or impossible) to
     eliminate.  Spam is generated by a wide range of clever
     originators and it always will be.

     Instead of thinking of Spam as a disease that might be
     eliminated, it is more useful to think about crime, war
     and cockroaches. It is not realistic to expect to
     eliminate any of these, no matter how much anyone might
     wish otherwise. Therefore the best we can hope to
     accomplish is to bring spam under reasonable control
     and that control will require an on-going, adaptive
     effort, with stochastic rather than complete results.
     We need multiple, adaptive techniques. As spam changes,
     so must our mechanisms. Different sets of mechanisms
     will be appropriate for different circumstances.

     In other words spam has become a permanent part of the
     Internet mail experience and efforts to control it may
     only reduce it to a tolerable level, rather than
     eliminate it. It is somewhat comforting to remember
     that an individual spam is not damaging.  Rather it is
     the quantity of spam that poses a threat.  Therefore it
     is acceptable for spam control mechanisms to be
     imperfect.

     This note discusses available points of control in the
     Internet mail architecture, considerations in their
     use, definitions of terminology and opportunities for
     creating Internet standards.  It also offers guidance
     about likely trade-offs (benefits and limitations).

     The note does not offer an analysis of the types of
     spam or the types of attacks used in sending spam, nor
     is it intended to specify solutions. Similarly, the
     note does not discuss fine-grained details, such as the
     arguments associated with single opt-in mechanisms,
     versus double opt-in.  These points are essential to
     the engineering of particular solutions, but only as
     refinements after the larger architectural and system
     control choices are made.

     COMMENT:       This document is intended to evolve,
                    based on feedback.  Comments are eagerly
                    sought, preferably in the form of
                    suggested text changes, and preferably
                    on the ASRG mailing list, at
                    <mailto:asrg@permissiontechnology.com>



2.   EMAIL ARCHITECTURE CONTROL POINTS

     Email transmission sequences can touch many systems,
     between the originator and the recipient.  However for
     most discussions about control, only five major
     components are important:

         Originator      Intermediary      Recipient
          Service          Service          Service
     +---------------+                 +---------------+
     | UA.o -> MTA.o |  ->  ISP.i  ->  | MTA.r -> UA.r |
     +---------------+                 +---------------+


     UA.o:     The originator's user agent, typically
               operated by the user and under their direct
               control

     MTA.o:    The mail transfer agent service associated
               with the originator's environment, possibly
               operated by the sender and possibly operated
               under separate control, such as by their
               employer.

     ISP.i:    The IP and/or mail transfer agent service(s)
               operated by independent third part(ies).

     MTA.r:    The mail transfer agent service associated
               with the recipient's environment

     UA.r:     The recipient's user agent

     In many organizations, the MTA service is multi-stage,
     such as including a department MTA and an Internet
     "firewall" MTA. This distinction is of fundamental
     importance for making software and operations
     decisions, but it does not have a significant impact on
     a discussion about points of control.  By contrast, the
     distinction between originator's service, recipient's
     service and any independent third parties is essential
     to this larger examination.  These are separate,
     independent administrative environments and are subject
     to different policies.  In particular, note that a
     discussion about using control points hinges on the
     scope of the control to be exercised.

     Besides constituting a major burden to recipients, the
     volume of spam traffic has become a serious problem for
     transit services.  Hence a precept in controlling spam
     is to seek control as close to the source as possible.
     The fewer downstream resources consumed by spam, the
     better.  Of course the ideal would be a mechanism in
     UA.o that would prevent spam from being sent in the
     first place.  Indeed, legal remedies seek to affect a
     sender's motivations, so that they will not send the
     spam at all.

     Unfortunately software control of spam in UA.o cannot
     be assumed, because that software is usually under the
     control of the originator.  If they wish to bypass any
     control mechanisms in UA.o, they will find a way. The
     same may be true of MTA.o. Hence Internet-wide designs
     of spam control must assume that UA.o and MTA.o may
     cooperate to generate and transmit spam.  Control of
     either component may be sought as an adjunct, where it
     is operated by an independent service, but such control
     must not be relied on.

     Wherever the detection mechanism is placed, the
     critical challenge is to identify spam in real time, if
     its relaying and delivery are to be stopped.  The other
     avenue is post-hoc removal of the right to make further
     use of the MTA service.  This may have strong utility
     for controlling spammers needing to operate within
     acceptable social bounds.  It will have no effect upon
     spammers who avoid accountability.



3.   ADMINISTRATIVE AND LEGAL MECHANISMS

     Both government law and service provider contracts can
     be used for defining unacceptable behavior and the
     remedies available when there are violations. There are
     two major problems with this administrative control of
     spam.  One is that a spammer often cannot be
     identified.  There are many opportunities for anonymous
     posting of email, such as through Internet cafes,
     transient access services and free email services. The
     second problem is that the sender of spam may not be in
     the jurisdiction seeking to exercise control, or a
     jurisdiction responsive to the recipient's
     jurisdiction.  The Internet is global.  Unlike postal
     bulk mail, the cost of sending spam over the Internet
     does not change as the mail crosses jurisdictional
     boundaries.

     Hence it seems likely that use of administrative
     procedures can be effective for controlling
     "responsible" spam.  That is, spam sent by
     organizations operating as accountable social
     participants, perhaps indulging in overly aggressive
     policies, but still desiring to remain socially
     tolerable.  The large number of "rogue" spammers is not
     similarly burdened.



4.   FILTERING

     The technical mechanism for real-time detection and
     handling of spam is a filter, placed at ISP.i, MTA.r
     and/or UA.r.  A filter has two functions: qualification
     and action. Action is usually either adding a special
     label to the message or disposing of it.

     Qualification tests whether a message is spam.  Test
     results are:

          Positive:      Message matches the test
                         criteria.

          Negative:      Message fails to match the
                         test criteria.

     When the tests are heuristic or statistical, some
     portion of the results will be incorrect.  These are
     classed as:

           False         Message matches the test criteria
           Positive      but is not spam; the criteria are
           (FP):         too aggressive.

           False         Message fails to match the test
           Negative      criteria but is spam; the criteria
           (FN):         are not sufficiently strong.

     Filters are used for two complementary policies:

          Acceptance:    Approves mail for delivery.

          Rejection:     Withholds or refuses permission for
                         relaying or delivery.

     Note that rules for acceptance are equally subject to
     error.  However Acceptance rules usually employ simple,
     explicit criteria rather than heuristics, so that FP
     and FN results are not usually a concern.  Hence FP and
     FN discussion is usually about Rejection rules.
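
     The qualification and action steps, and the four test
     outcomes, can be sketched in Python.  This is a minimal
     illustration, not a specified design: the names Message,
     apply_filter and tally are hypothetical, and the
     "actually spam" value is assumed to come from some
     after-the-fact judgment, such as a recipient report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Message:
    sender: str
    body: str


# Qualification: returns True (Positive) when the message matches
# the test criteria, False (Negative) otherwise.
Qualifier = Callable[[Message], bool]


def apply_filter(msg: Message, qualifies: Qualifier,
                 action: str = "label") -> str:
    """Run qualification, then the action.

    Returns "deliver" for a Negative result; for a Positive result
    returns the configured action, "label" or "discard".
    """
    if action not in ("label", "discard"):
        raise ValueError("action must be 'label' or 'discard'")
    return action if qualifies(msg) else "deliver"


def tally(results: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Count outcomes from (matched_test, actually_spam) pairs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for matched, actually_spam in results:
        if matched:
            counts["TP" if actually_spam else "FP"] += 1
        else:
            counts["FN" if actually_spam else "TN"] += 1
    return counts
```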


4.1. Policies

     The simplest model for an assessment list is to have
     entries containing a single, simple attribute, such as
     sender email address or source system IP address or
     domain name.

           Standards      1.   Control protocol between
           Opportunity:   recipient and filtering service
                          server, to permit specifying
                          policies and specific rules.

                          2.   Modify SMTP delivery status
                          notifications to avoid flooding
                          innocent mailboxes because of
                          forged senders. [Needs
                          clarification. /ed]

                          3.   Codify best current practices
                          of filters to minimize sending
                          DSN. [Cited by VS; needs
                          clarification. /ed]

                          4.   Codify DSN and SMTP status
                          message wording, such as saying
                          that rejections resulting from
                          filtering should include a URL for
                          an extended explanation. [Needs
                          clarification. /ed]

                          5.   Replace SMTP.

     The idea of replacing SMTP is appealing because it
     permits thinking in terms of creating an infrastructure
     that has accountability and restrictions built in.
     Unfortunately an installed base the size of the
     Internet is not likely to make such a change anytime
     soon.  It seems far more likely that successful spam
     control mechanisms will be introduced as increments to
     the existing Internet mail service.


4.2. Explicit Lists

     The simplest method of testing is to have explicit
     lists of simple identifier criteria, such as a From
     address or IP address.

     Pre-assessed senders are entered into a:

          Whitelist:     For automatic Acceptance

          Blacklist:     For automatic Rejection.

     One approach to maintaining Whitelists and Blacklists
     is to make explicit entries into them, manually.  This
     is often what a spam control service will propagate to
     its subscribers. Most such services are for
     Blacklisting "known" spammers.
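
     A Whitelist and Blacklist lookup might be sketched as
     follows.  The function name, the Verdict values and the
     choice to consult the Whitelist first are assumptions
     made for illustration; the identifier could equally be
     an IP address or a domain name.  Lowercasing the whole
     identifier is a simplification, and list entries are
     assumed to be stored in lowercase.

```python
from enum import Enum
from typing import Set


class Verdict(Enum):
    ACCEPT = "accept"    # found on the Whitelist
    REJECT = "reject"    # found on the Blacklist
    UNKNOWN = "unknown"  # on neither list; defer to other tests


def check_lists(identifier: str, whitelist: Set[str],
                blacklist: Set[str]) -> Verdict:
    """Classify a sender identifier against explicit lists."""
    key = identifier.strip().lower()
    # Assumption: an explicit Whitelist entry overrides the Blacklist.
    if key in whitelist:
        return Verdict.ACCEPT
    if key in blacklist:
        return Verdict.REJECT
    return Verdict.UNKNOWN
```

     A result of UNKNOWN is the common case for a new
     sender, which is why explicit lists are usually combined
     with other tests.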

     A difficulty with listing services is the set of
     criteria used for adding and removing senders or sites.
     These policies usually need to be explicit, objective
     and documented, as well as consistently applied. Even
     then they are attractive targets for lawsuits claiming
     inappropriate listing.

     For assessments based on the identity of the sender,
     rather than the content of the message, another concern
     is validation of the key attribute used for
     identification.  What if the value for that attribute
     is set falsely?  For example, what if email was not
     sent by the address listed in the From field?

          Standards      6.   List format and exchange, to
          Opportunity:   permit sharing Whitelist and
                         Blacklist entries

                         7.   Format and access to filter
                         logs, such as among MX secondaries.
                         [Suggested by VS; needs
                         clarification. /ed]


4.3. Content Analysis

     Filters look for message attributes, such as strings of
     text in the headers or content of the message being
     inspected.  Other attributes include the address or
     domain name of the originating system, or the
     occurrence of the same message content in multiple
     messages near the same time. Simple filters look for
     any occurrence of specific strings. A more powerful
     approach to content analysis looks for multiple sets of
     these strings and assigns a score to each occurrence;
     it then labels a message as spam according to the
     aggregate score.
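
     The aggregate-score approach can be sketched as below.
     The rule strings, their weights and the threshold are
     invented for illustration only; a real rule set would be
     far larger and would be maintained continuously.

```python
from typing import Dict

# Hypothetical rules: substring -> score added per occurrence.
RULES: Dict[str, float] = {
    "free money": 3.0,
    "act now": 2.0,
    "unsubscribe": 0.5,
}
THRESHOLD = 4.0  # assumed cut-off; tuning it trades FPs against FNs


def aggregate_score(text: str, rules: Dict[str, float] = RULES) -> float:
    """Sum the score of every non-overlapping rule occurrence."""
    lowered = text.lower()
    return sum(weight * lowered.count(pattern)
               for pattern, weight in rules.items())


def is_spam(text: str, rules: Dict[str, float] = RULES,
            threshold: float = THRESHOLD) -> bool:
    """Label the message as spam when the aggregate reaches the threshold."""
    return aggregate_score(text, rules) >= threshold
```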

     Rule creation is done manually, or by a service, or by
     analysis of a known corpus of messages.  A service
     observes email traffic at many Internet locations and
     receives reports as recipients see new occurrences of
     spam. The service then propagates new rules to its
     subscribers. The analytic approach performs empirical
     rule creation, using statistical (Bayesian) techniques
     that discern string occurrences in known spam, versus
     mail that is known not to be spam.
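
     Empirical rule creation from a known corpus might look
     like the following sketch.  It is one simple variant of
     the statistical idea (per-token counts with add-one
     smoothing and equal prior weight for both classes), not
     the method of any particular product; the whitespace
     tokenizer and all function names are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def tokenize(text: str) -> List[str]:
    """Very crude tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def train(spam_corpus: Iterable[str], ham_corpus: Iterable[str]) -> Model:
    """Count token occurrences in known spam and known non-spam."""
    spam_counts = Counter(t for m in spam_corpus for t in tokenize(m))
    ham_counts = Counter(t for m in ham_corpus for t in tokenize(m))
    vocab = set(spam_counts) | set(ham_counts)
    return spam_counts, ham_counts, vocab


def spam_log_odds(text: str, model: Model) -> float:
    """Positive means the tokens look more like spam than non-spam."""
    spam_counts, ham_counts, vocab = model
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    size = len(vocab)
    score = 0.0
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # tokens never seen in training carry no evidence
        p_spam = (spam_counts[tok] + 1) / (spam_total + size)
        p_ham = (ham_counts[tok] + 1) / (ham_total + size)
        score += math.log(p_spam / p_ham)
    return score
```

     Because the counts come from the corpus, re-running
     train on recent mail is how such a filter follows
     changes in spam content.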

     As rules become common, spammers adapt their messages
     to bypass filters, so that existing rules quickly
     become far less effective.  Hence long-term filter use
     must have a base of rules that is continually modified.
     Empirical rule generation must be repeated, or must
     operate continuously, analyzing all incoming mail.

     Manual rule maintenance is simply not viable for
     typical users; the effort is far too great. A concern
     about services is that they are inherently post-hoc.
     They are always updating the rule-set after an "attack"
     commences, so that some spam is certain to reach some
     recipients; however the view that a small amount of
     spam is not dangerous mitigates this concern.  Lastly,
     methods using automated analysis rely on heuristics, or
     guesses.  They are certain to have some FNs that permit
     real spam to reach the recipients, and some FPs that
     incorrectly label legitimate mail as spam.

     Any effective, long term filtering mechanism must have
     automatic or semi-automatic rule creation and must
     upgrade the set of rules continuously or periodically.

           Standards     8.   Rule format and exchange, to
           Opportunity:  permit sharing effective rules.

                         9.   Sample message labeling and
                         exchange, to permit submission of
                         candidate content to remote service

                         10.  Hash-based identifier of
                         content
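
     As an illustration of item 10, a content identifier
     could be derived by hashing a normalized form of the
     message body, so that the same content seen in multiple
     messages yields the same identifier.  The normalization
     rules here (collapse whitespace, lowercase) and the
     choice of SHA-256 are assumptions, not a proposed
     format.

```python
import hashlib
import re


def content_id(body: str) -> str:
    """Return a hex identifier for the normalized message body."""
    # Collapse runs of whitespace and ignore case so that trivial
    # re-wrapping of the same text maps to the same identifier.
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

     An exact hash changes completely when any remaining
     character differs, so a sender who varies each copy
     slightly would defeat this simple form.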


4.4. Negotiation

     In addition to real-time analysis, a recipient may
     engage in an explicit negotiation with the sender, to
     validate them. When this is performed at the time of
     message receipt, it is called a Challenge-Response (CR)
     mechanism.

     CR introduces delay in message receipt and creates at
     least one additional email round-trip exchange for
     every new sender/recipient pair.  This is a substantial
     burden both on participants and on the transit service.
     Senders often refuse to respond to the challenge, so
     that the mechanism dissuades senders from all but the
     most urgent communications.  Also the delay imposed by
     CR can render time-sensitive messages useless.

     As with other forms of Internet-based attack, effort is
     often divided into two phases.  The first assesses
     details about the target and the second uses them.  For
     spam, the assessment phase of the process seeks to
     discover valid email addresses.  CR mechanisms suffer
     from providing that validation.
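
     The recipient side of a CR exchange can be sketched as
     a small state holder.  The class and method names are
     hypothetical, message transport is omitted, and the
     token format is an arbitrary choice.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class ChallengeResponse:
    """Holds mail from unvalidated senders until they answer."""

    def __init__(self) -> None:
        self.validated: Set[str] = set()
        self.pending: Dict[str, Tuple[str, str]] = {}  # token -> (sender, message)

    def receive(self, sender: str, message: str) -> Tuple[str, Optional[str]]:
        """Deliver at once for a validated sender, else issue a challenge."""
        if sender in self.validated:
            return ("deliver", None)
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        # The caller would now mail the token back to the sender.
        return ("challenge", token)

    def respond(self, token: str) -> Optional[str]:
        """Release the held message when a valid token comes back."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None  # unknown or already-used token
        sender, message = entry
        self.validated.add(sender)
        return message
```

     Note that issuing the challenge is itself a reply from
     the recipient's address, which is the validation
     concern described above.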

           Standards     11.  CR protocol, to permit
           Opportunity:  automated interaction between the
                         recipient's system and the
                         sender's system.


4.5. Traffic Analysis

     Spam is often referred to as "unsolicited bulk mail" to
     highlight that senders typically post very large
     amounts quickly. Opt-in (subscription) email also
     demonstrates this traffic pattern.  Still there is
     benefit in measuring aggregate email behavior.
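
     One way to measure aggregate behavior is a per-source
     sliding-window counter, sketched below.  The class name,
     window length and limit are assumptions.  As noted
     above, a high rate alone does not separate spam from
     opt-in mail, so the result is only one input to a
     decision.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RateMonitor:
    """Counts messages per source within a sliding time window."""

    def __init__(self, window_seconds: float = 60.0,
                 max_messages: int = 100) -> None:
        self.window = window_seconds
        self.limit = max_messages
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, source: str, timestamp: float) -> bool:
        """Record one message; True if the source now exceeds the limit."""
        times = self.events[source]
        times.append(timestamp)
        # Drop events that have aged out of the window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) > self.limit
```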

           Standards      12.  Traffic reporting protocol,
           Opportunity:   to permit collaboration among
                          independent administrations.



5.   INFRASTRUCTURE ENHANCEMENT

     Enhancement of underlying Internet services might
     reduce the effectiveness of some spam transmission
     mechanisms.  For example many spammers prefer to send
     to domain name service MX secondaries because
     secondaries are often not as well filtered as MX
     primaries.  Because of the lack of MX secondary
     coordination protocols, the best advice for all but
     large sites is to stop using MX secondaries.

           Standards     13.  MX secondary coordination
           Opportunity:  protocol. [Suggested by VS; might
                         need clarification. /ed]

                         14.  Best Current Practices (BCP)
                         documentation of preferred MTA
                         operation for spam control

                         15.  BCPs for other services
                         operating to control spam

     Postal mail imposes a fee on the sender for each
     message that is sent. Such a fee makes the cost of
     sending significant, and proportional to the amount
     sent.  In contrast, current Internet mail is very
     nearly free to the sender.  Hence there is interest in
     exploring "sender pays" email. One form of sender-pays
     is identical to postal stamping.  Another entails
     "retribution" to the sender, taking the fee for their
     posting only if the recipient indicates they were
     unhappy to receive it.

     For both models, it is not clear that it is possible to
     fit the necessary mechanisms to existing Internet mail.
     Fees are absent from the current service, and anonymous
     and free email services exist; together these may have
     too much operational inertia.  It is also not clear who
     should accrue the revenues or how they should be
     disbursed.
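
     The "retribution" form of sender-pays can be sketched
     as a ledger that records a posting and charges the fee
     only on a recipient complaint.  The class name, the use
     of integer cents and the in-memory ledger are all
     assumptions; the sketch deliberately does not say who
     receives the fee, since that question is open.

```python
from typing import Dict


class PostageLedger:
    """Tracks postings and charges a fee only on complaint."""

    def __init__(self, fee_cents: int) -> None:
        self.fee_cents = fee_cents
        self.held: Dict[str, str] = {}     # message_id -> sender
        self.charged: Dict[str, int] = {}  # sender -> total cents charged

    def post(self, message_id: str, sender: str) -> None:
        """Record that a sender has posted a message."""
        self.held[message_id] = sender

    def settle(self, message_id: str, recipient_unhappy: bool) -> int:
        """Close out a message; return the fee taken (0 if none)."""
        sender = self.held.pop(message_id)
        if not recipient_unhappy:
            return 0
        self.charged[sender] = self.charged.get(sender, 0) + self.fee_cents
        return self.fee_cents
```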

           Standards     16.  Billing and accounting
           Opportunity:  protocols to obtain sender
                         fees and track them.



6.   EVALUATING TECHNICAL APPROACHES

     The complexity of Internet mail service and the nature
     of spam make it difficult to evaluate proposals for
     control mechanisms.  In this section, the key technical
     factors affecting viability are examined.


6.1. Adoption

     A critical barrier to the success of a new mechanism is
     the effort it takes to begin using it. It is essential
     to look carefully at the adoption process.

     What will it take for someone to start using the
     proposed mechanism? What will it take for that person
     to get some benefit from the mechanism? For example,
     how many people and/or systems must adopt it before it
     provides any benefit?

     A key construct for this issue is "core-vs-edge".  For
     Internet-scale operations, adoption at the edge of a
     system is typically easier and quicker than adoption in
     the core. If a mechanism affects the core
     (infrastructure) then it usually must be adopted by
     most or all of the infrastructure before it provides
     meaningful utility. In something the scale of the
     Internet, it can take decades to reach that level of
     adoption, if it ever does. For localized operations,
     adoption in the core might be quicker, involving a
     single administrative entity, rather than an array of
     independent users.

     Remember that the Internet comprises a massive number
     of independent administrations, each with their own
     politics and funding. What is important and feasible to
     one might be neither to another. If the latter
     administration is in the handling path for a spam, then
     it will not have implemented the necessary control
     mechanism. Worse, it well might not be possible to
     change this.  For example a proposal that requires a
     brand new mail service is not likely to gain much
     traction.

     By contrast, some "edge" mechanisms provide utility to
     the first one, two or three adopters who interact with
     each other. No one else is needed for the adopters to
     gain some benefit. Each additional adopter makes the
     total system incrementally more useful. For example a
     filter can be useful to the first recipient to adopt
     it. A consent mechanism can be useful to the first two
     or three adopters, depending upon the design of the
     mechanism.

     Obviously another concern is the effort it takes to
     continue using the mechanism. That is, once a user has
     chosen to make the change to adopt a mechanism, how
     much effort does it take to use it regularly?

     Equally, the impact on others is important. For
     example, a challenge-response system is irritating for
     the person being challenged, and it imposes extra delay
     on the desired communication. If the originator and the
     recipient both access the Internet only occasionally
     (such as through dial-up when mobile) a challenge-
     response model can impose days of delay. For some
     communications, this can be disastrous.


6.2. Burden

     The purpose of spam control is to cause some email to
     fail to reach its intended destination.  This is, of
     course, directly at odds with the constructive goal of
     email. Hence spam control alters the basic model of
     email service.

     Effective mechanisms will place some kind of burden on
     senders and receivers.  Hence a challenge for spam
     control mechanisms is to require enough of a burden to
     be effective, but not so much that it makes email
     unacceptably painful to use.  When evaluating
     proposals, the nature and distribution of these burdens
     must be considered carefully.


6.3. Scaling

     How does the proposal scale? What happens if everyone
     on the Internet engages in a particular behavior? What
     if the Internet grows by a factor of 1000?

     Remember that "everyone" is approximately 100 million
     users today, and should be expected to grow to 10
     billion, if we expect the Internet to be useful for
     some decades. And it is likely there will be more email
     users/accounts than there are people on the planet,
     given that individuals and organizations occupy
     multiple roles.

     So, what will it be like for 100 million or 10 billion
     users to employ the proposed mechanism?
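
     A rough calculation shows how such a question can be
     made concrete.  Taking Challenge-Response as the
     example, each new sender/recipient pair adds at least
     one round trip, counted here as two messages.  The
     figure of 10 new pairs per user is purely an assumed
     input for illustration, not a measurement.

```python
def extra_cr_messages(users: int, new_pairs_per_user: int,
                      messages_per_challenge: int = 2) -> int:
    """Added messages if every user challenges every new correspondent."""
    return users * new_pairs_per_user * messages_per_challenge


# Assumed input: 10 new correspondents per user.
today = extra_cr_messages(100_000_000, 10)        # 100 million users
future = extra_cr_messages(10_000_000_000, 10)    # 10 billion users
```

     Under these assumptions the added traffic grows from
     2 billion to 200 billion messages, in direct proportion
     to the number of users.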

     The other side of the scaling question is to ask how
     much of the Internet will be affected by a proposal
     and, therefore, how much spam will be controlled by it?
     If a proposal requires substantial effort to adopt and
     use, but will affect only a small percentage of spam,
     the efficacy of that proposed mechanism is very much in
     question. An obvious example of this concern is legal
     scope, given that spam is global and there is no global
     law enforcement.


6.4. Scenarios

     Almost any proposal will make sense for a particular
     scenario that is sufficiently constrained. The real
     test is how the proposal works for other, likely
     scenarios.

     Make sure the proposal considers these likely cases
     carefully. For example, citing the scenario of mailing
     list participation is an excellent test. There are many
     others.



7.   SECURITY CONSIDERATIONS


     This note discusses types of mechanisms for evaluating
     and filtering email.  As such it covers topics with
     extremely sensitive security concerns.  However it does
     not propose any standards and therefore does not have
     any direct security effects.



8.   ACKNOWLEDGEMENTS

     This note is motivated by discussions on the Anti-Spam
     Research Group (ASRG) mailing list and draws a number
     of points from discussion there. A number of Standards
     Opportunity suggestions were taken from an ASRG posting
     by Vernon Schryver. The sub-section "Burden" is taken
     from a posting by Dave Hendricks.



9.   AUTHORS' ADDRESSES

     Dave Crocker
     Brandenburg InternetWorking
     675 Spruce Drive
     Sunnyvale, CA  94086  USA

     Tel: +1.408.246.8253
     dcrocker@brandenburg.com




10.  FULL COPYRIGHT STATEMENT

     Copyright (C) The Internet Society (2003).  All Rights
     Reserved.

     This document and translations of it may be copied and
     furnished to others, and derivative works that comment
     on or otherwise explain it or assist in its
     implementation may be prepared, copied, published and
     distributed, in whole or in part, without restriction
     of any kind, provided that the above copyright notice
     and this paragraph are included on all such copies and
     derivative works.  However, this document itself may
     not be modified in any way, such as by removing the
     copyright notice or references to the Internet Society
     or other Internet organizations, except as needed for
     the purpose of developing Internet standards in which
     case the procedures for copyrights defined in the
     Internet Standards process must be followed, or as
     required to translate it into languages other than
     English.

     The limited permissions granted above are perpetual and
     will not be revoked by the Internet Society or its
     successors or assigns.

     This document and the information contained herein is
     provided on an "AS IS" basis and THE INTERNET SOCIETY
     AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL
     WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
     LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
     HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
     PARTICULAR PURPOSE.