MADMAN Working Group Gordon B. Jones [gbjones@mitre.org]
INTERNET-DRAFT MITRE
draft-ietf-madman-alarmmib-01.txt Niraj Jain [njain@us.oracle.com]
Oracle Corporation
Glenn Mansfield [glenn@aic.co.jp]
AIC Systems Laboratory
August 1996
Mail and Directory Alarms
Status of this Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
Abstract
This document defines alarms for Mail and Directory usage. It is to be
used in conjunction with the Mail and Directory Management (MADMAN)
RFCs.
Expires: January 31, 1997 [Page 1]
Internet Draft August 1996
1. The SNMPv2 Network Management Framework.
The major components of the SNMPv2 Network Management framework are
described in the documents listed below.
o RFC 1902 [1] defines the Structure of Management Information
(SMI), the mechanisms used for describing and naming objects
for the purpose of management.
o STD 17, RFC 1213 [2] defines MIB-II, the core set of managed
objects (MO) for the Internet suite of protocols.
o RFC 1905 [3] defines the protocol used for network access to
managed objects.
The framework can be extended by defining new MIBs to suit the
requirements of specific applications, protocols, or situations.
Managed objects are accessed via a virtual information store, the MIB.
Objects in the MIB are defined using the subset of Abstract Syntax
Notation One (ASN.1) defined in the SMI. In particular, each object type
is named by an OBJECT IDENTIFIER, which is an administratively assigned
name. The object type together with an object instance serves to
uniquely identify a specific instantiation of the object. For human
convenience, often a textual string, termed the descriptor, is used to
refer to the object type.
2. The Need for Alarms in Messaging
Alarms are notifications of abnormalities associated with an MTA or a
message processed by an MTA. Alarms are generated by a Management
Console. Two facilities aid the Management Console in the generation of
alarms. The first facility is the trap, an unsolicited notification
initiated by the Management Agent and directed to the Management
Console. Traps generated by an agent may optionally convey the values
of MIB variables inside them. The Management Console interprets the
traps and generates alarms as it determines appropriate.
The second facility consists of variables that can be polled by the
Management Console. These variables include the existing MIB variables
defined in the other MADMAN RFCs (Network Services Monitoring MIB,
Directory Services Monitoring MIB, Mail Monitoring MIB), plus more
variables defined herein specifically to augment support for alarm
generation. If the Management Console detects a variable value which
indicates that a threshold has been reached, or some other worrisome
trend or event has occurred, it generates an alarm as it determines
appropriate. It is expected that when an abnormality occurs, a trap will
be generated indicating the specific cause of the problem. If the trap
is lost or discarded by the network, the console may still detect the
abnormality on its next regular polling cycle through inspection of the
MIB variables. This combination of mechanisms provides a flexible alarm
functionality that is either event-driven, polling-driven, or both.
It is understood that traps are an unreliable mechanism. However, traps
can strengthen polling-based alarms, because they can reveal a problem
sooner than polling alone, which may be important in some operational
environments. For example, when component availability is required to
exceed 99%, a polling cycle of fifteen-minute intervals used to detect
whether a component is operational may fail to meet this requirement,
while a polling cycle more frequent than fifteen minutes might saturate
the network with SNMP traffic. When a fifteen-minute polling cycle with
99% reliability is combined with an event-driven mechanism that is
itself 99% reliable, the probability that a given component failure
escapes both mechanisms falls to one one-hundredth of one percent
(0.01 x 0.01 = 0.0001), assuming the two mechanisms fail independently.
The same reasoning applies to message throughput requirements, where
the detection of queue saturation may be both event-driven and
polling-driven.
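The arithmetic behind the 0.01% figure can be checked with a short sketch. It assumes that a missed poll and a lost trap are independent events, which the estimate requires; the function name p_undetected is illustrative only.

```python
# Minimal sketch: probability that a failure escapes both detection
# mechanisms. Assumes polling and traps fail independently.


def p_undetected(poll_reliability: float, trap_reliability: float) -> float:
    """Return the probability that neither mechanism reports a failure."""
    for r in (poll_reliability, trap_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability must lie in [0, 1]")
    # Each mechanism misses with probability (1 - reliability); with
    # independence, both miss with the product of those probabilities.
    return (1.0 - poll_reliability) * (1.0 - trap_reliability)


if __name__ == "__main__":
    p = p_undetected(0.99, 0.99)
    print(f"undetected: {p:.4%}")  # about one one-hundredth of one percent
```

With either mechanism alone the miss rate stays at 1%; it is the product of the two miss rates that brings it down to 0.01%.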
Alarms denote conditions that require operator intervention.
Implementations that bombard the console with superfluous traps should
be avoided; some fault conditions lend themselves to this. Traps should
not be issued repeatedly to signify one basic fault condition.
The setting of threshold conditions and the evaluation of other
composite information is the responsibility of the console, or is a
local implementation matter within the agent. The destinations of SNMP
traps, as selected by the SNMP agents or applications, are also a local
matter.
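One way an agent might honor the rule against repetitive traps is to remember which fault conditions are already active and to emit a trap only on the transition into a fault. The sketch below assumes a hypothetical send callback and fault keys chosen by the implementation (for example, a trap name plus the remote MTA); TrapSuppressor is not part of the MIB.

```python
# Sketch: agent-side suppression of repeated traps for one fault.
# The fault keys and the send callback are hypothetical.
from typing import Callable, Hashable, Set


class TrapSuppressor:
    def __init__(self, send: Callable[[Hashable], None]) -> None:
        self._send = send                     # delivers one trap
        self._active: Set[Hashable] = set()   # faults already reported

    def raise_fault(self, fault: Hashable) -> bool:
        """Send a trap only on the first report of an active fault."""
        if fault in self._active:
            return False
        self._active.add(fault)
        self._send(fault)
        return True

    def clear_fault(self, fault: Hashable) -> None:
        """Forget a resolved fault so a recurrence is reported again."""
        self._active.discard(fault)


if __name__ == "__main__":
    sent = []
    s = TrapSuppressor(sent.append)
    for _ in range(3):
        s.raise_fault(("mADAlarm", "mta.example"))
    print(sent)  # one trap despite three reports
```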
3. MIB Data to Support Alarms
The following material defines the traps and MIB variables intended
specifically to support alarm functionality. The MADMAN variables used
to support alarms are defined in RFCs 19??, 19??, and 19??. Section 5
describes how these traps and MIB variables are used to fulfill specific
requirements.
3.1 Traps to Support Alarms
Two forms of specific traps are defined to support alarms. The first,
called mADAlarm, denotes an MTA- or DSA-related failure, and the second,
messageAlarm, denotes a message-related failure in an MTA.
mADAlarm
   This trap is generated by the agent in an unsolicited fashion to
   signify that a failure has occurred within the MTA or DSA. Examples
   of such failures include one MTA's inability to contact another MTA,
   or the detection of message queue saturation. The mADAlarm trap may
   convey a number of values, including the name of the MTA or DSA
   reporting the problem, the name of the remote MTA or DSA purportedly
   causing the problem, and variables describing the problem itself.
messageAlarm
   This trap is generated by the agent in an unsolicited fashion to
   signify that a non-recoverable failure has occurred in processing a
   message because of a structural flaw in the message itself or in its
   addressing. Examples include a message that cannot be delivered,
   non-delivered, or redirected, and the detection of a messaging loop.
   The messageAlarm trap may convey a number of values, including the
   name of the MTA that processed the message and variables describing
   the problem itself.
3.2 MIB Variables to Support Alarms
A new table is defined in the MIB to supply supplementary fault-related
information to support alarm generation. When a failure occurs, the
identities of the applications responsible are retained in the MIB,
along with the ID of the message most recently involved in a failure.
Through polling, a change in the value of any of these variables can
signify a recent failure. Each variable in the table is described below.
lastMessageIdFailure
   The identifier of the most recent message that was the cause of a
   message-related failure. A message-related failure is defined to be
   a non-recoverable error in the processing of a message. When
   multiple messages have failed, this value gives the administrator or
   application a starting point for inspecting the message queues to
   determine which messages are defective.
numMessagesFailed
   The total number of messages that have failed processing since the
   messaging application was last initialized. This variable may be
   used in conjunction with lastMessageIdFailure to detect multiple
   message failures within a single unit of time, such as one polling
   interval.
lastFailureMtaGroupName
   When an error involving a neighboring MTA occurs, this variable
   holds the mtaGroupName (from the MADMAN mtaGroupTable) of the MTA
   most recently involved in a failure.
lastFailureMtaApplName
   The applName (from the MADMAN applTable) of the MTA that most
   recently reported a failure.
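A console that polls these variables might detect new message failures as follows. This is a sketch under stated assumptions: PollSample and check_failures are hypothetical names, the values are assumed to have been retrieved already (for example by SNMP GET), and numMessagesFailed is treated as a Counter32 that wraps at 2^32. A restart of the messaging application, which resets the counter, is not detected here.

```python
# Sketch: console-side detection of new message failures from two
# successive polls. Names are hypothetical; values are assumed to be
# fetched already (e.g. by SNMP GET).
from dataclasses import dataclass
from typing import Optional, Tuple

COUNTER32_MODULUS = 2 ** 32  # Counter32 wraps to 0 after 2^32 - 1


@dataclass(frozen=True)
class PollSample:
    num_messages_failed: int      # numMessagesFailed
    last_message_id_failure: str  # lastMessageIdFailure


def check_failures(prev: PollSample, curr: PollSample) -> Tuple[int, Optional[str]]:
    """Return (new failures since prev, most recent failing message ID)."""
    # Modular difference tolerates one counter wrap between polls.
    delta = (curr.num_messages_failed - prev.num_messages_failed) % COUNTER32_MODULUS
    if delta == 0:
        return 0, None
    # If delta > 1, only the latest failing message is named; the
    # others must be found by inspecting the message queues.
    return delta, curr.last_message_id_failure


if __name__ == "__main__":
    before = PollSample(41, "<a1@example.com>")
    after = PollSample(44, "<c3@example.com>")
    print(check_failures(before, after))  # (3, '<c3@example.com>')
```

A delta greater than one is exactly the "multiple message failures" case: the console learns how many messages failed but only the identity of the last one.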
4. SNMP Format for Alarms
Alarms are supported under SNMP using traps and additional MIB
variables. An additional table, mADAlarmTable, is defined here.
Elements of the existing MADMAN tables and proposed extensions are also
used for alarm purposes. Traps are expected to be implemented under
SNMPv1, although the grammatical constructs used to define them are
taken from SNMPv2. Page 31 of RFC 1157 shows how trap Protocol Data
Units (PDUs) are formed in SNMPv1. Two enterprise-specific traps
(generic-trap 6) are added, whose specific-trap values are set to
either mADAlarm (specific-trap 0) or messageAlarm (specific-trap 1).
The enterprise field of the trap contains the OID "experimental ??"
designating the MADMAN alarm MIB (mADAlarmMIB). The values of variables
and their corresponding OBJECT IDENTIFIERs are conveyed within the
VarBindList. These variables are obtained from either the mADAlarmTable
or tables found in the other MADMAN RFCs.
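The trap fields described above can be modeled as a plain data structure. This sketch does not encode a PDU and omits the agent-addr and time-stamp fields of the SNMPv1 Trap-PDU. The enterprise OID assumes the { experimental 73 } registration that the MIB module in this memo assigns, since the prose leaves that arc as "??"; V1Trap and make_alarm_trap are hypothetical names.

```python
# Sketch: the SNMPv1 trap fields described above, unencoded.
# Omits agent-addr and time-stamp. Names are hypothetical; the
# enterprise OID assumes the module's { experimental 73 } registration.
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Oid = Tuple[int, ...]

EXPERIMENTAL: Oid = (1, 3, 6, 1, 3)
MAD_ALARM_MIB: Oid = EXPERIMENTAL + (73,)
ENTERPRISE_SPECIFIC = 6                              # generic-trap value
SPECIFIC_TRAP = {"mADAlarm": 0, "messageAlarm": 1}   # values from the text


@dataclass
class V1Trap:
    enterprise: Oid
    generic_trap: int
    specific_trap: int
    varbinds: List[Tuple[Oid, Any]] = field(default_factory=list)


def make_alarm_trap(name: str, varbinds: List[Tuple[Oid, Any]]) -> V1Trap:
    """Build the trap for mADAlarm or messageAlarm with its VarBindList."""
    if name not in SPECIFIC_TRAP:
        raise ValueError(f"unknown alarm trap: {name}")
    return V1Trap(MAD_ALARM_MIB, ENTERPRISE_SPECIFIC,
                  SPECIFIC_TRAP[name], list(varbinds))


if __name__ == "__main__":
    # lastMessageIdFailure column (1.1.1 under the module); row index 1
    # is assumed for illustration.
    oid = MAD_ALARM_MIB + (1, 1, 1, 1)
    print(make_alarm_trap("messageAlarm", [(oid, "<m1@example.com>")]))
```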
MADMAN-ALARM-MIB DEFINITIONS ::= BEGIN
IMPORTS
    MODULE-IDENTITY, OBJECT-TYPE, NOTIFICATION-TYPE,
    experimental, Counter32
        FROM SNMPv2-SMI
    DisplayString
        FROM SNMPv2-TC
    MODULE-COMPLIANCE, OBJECT-GROUP, NOTIFICATION-GROUP
        FROM SNMPv2-CONF
    applIndex, applOperStatus, applName
        FROM APPLICATION-MIB
    mtaGroupName, mtaGroupInboundRejectionReason,
    mtaGroupStoredVolume, mtaLoopsDetected, mtaGroupLoopsDetected,
    mtaGroupOutboundConnectFailureReason
        FROM MTA-MIB;
mADAlarmMIB MODULE-IDENTITY
    LAST-UPDATED "9608230000Z"
    ORGANIZATION "IETF Mail and Directory Management Working
                  Group"
    CONTACT-INFO
        "        Glenn Mansfield
         Postal: AIC Systems Laboratory
                 6-6-3, Minami Yoshinari
                 Aoba-ku, Sendai, Japan 989-32.
         Tel:    +81-22-279-3310
         Fax:    +81-22-279-3640
         E-mail: glenn@aic.co.jp"
    DESCRIPTION
        "The MIB module describing alarms for MADMAN."
    ::= { experimental 73 }
mADAlarmTable OBJECT-TYPE
    SYNTAX      SEQUENCE OF MADAlarmEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
        "The table holding alarm information, one entry for each
        MTA or DSA."
    ::= { mADAlarmMIB 1 }
mADAlarmEntry OBJECT-TYPE
    SYNTAX      MADAlarmEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
        "The alarm entry associated with each MTA or DSA."
    -- INDEX assumed: the draft gives none; applIndex is used here
    -- because the other MADMAN tables are keyed by it.
    INDEX       { applIndex }
    ::= { mADAlarmTable 1 }
MADAlarmEntry ::= SEQUENCE {
    lastMessageIdFailure     DisplayString,
    numMessagesFailed        Counter32,
    lastFailureMtaGroupName  DisplayString,
    lastFailureMtaApplName   DisplayString
}
lastMessageIdFailure OBJECT-TYPE
    SYNTAX      DisplayString
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The message ID of the last message either to loop or to have
        an unrecoverable error while processing."
    ::= { mADAlarmEntry 1 }
numMessagesFailed OBJECT-TYPE
    SYNTAX      Counter32
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The number of messages that have had an unrecoverable error
        while processing since MTA initialization."
    ::= { mADAlarmEntry 2 }
lastFailureMtaGroupName OBJECT-TYPE
    SYNTAX      DisplayString
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The group name of the last MTA group to have a connectivity
        failure."
    ::= { mADAlarmEntry 3 }
lastFailureMtaApplName OBJECT-TYPE
    SYNTAX      DisplayString
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The application name of the last MTA to have a connectivity
        failure."
    ::= { mADAlarmEntry 4 }
mADAlarmNotifications OBJECT IDENTIFIER ::= { mADAlarmMIB 2 }
mADAlarm NOTIFICATION-TYPE
    OBJECTS     { applOperStatus, applName, mtaGroupName,
                  mtaGroupOutboundConnectFailureReason,
                  mtaGroupStoredVolume }
    -- these OBJECTS are the values that an mADAlarm may convey
    STATUS      current
    DESCRIPTION
        "Generated by the agent to signify that a failure has occurred
        within an MTA or DSA."
    ::= { mADAlarmNotifications 1 }
messageAlarm NOTIFICATION-TYPE
    OBJECTS     { lastMessageIdFailure, numMessagesFailed }
    STATUS      current
    DESCRIPTION
        "Generated by the agent to signify that a non-recoverable
        failure has occurred in processing a message."
    ::= { mADAlarmNotifications 2 }
mADAlarmConformance OBJECT IDENTIFIER ::= { mADAlarmMIB 3 }
mADAlarmGroup       OBJECT IDENTIFIER ::= { mADAlarmConformance 1 }
mADAlarmCompliances OBJECT IDENTIFIER ::= { mADAlarmConformance 2 }
mADAlarmTrapCompliance MODULE-COMPLIANCE
    STATUS      current
    DESCRIPTION
        "The most basic level of compliance for MAD SNMPv2 entities
        that implement MAD alarms."
    MODULE  -- this module
        MANDATORY-GROUPS { mADAlarmTrapGroup }
    ::= { mADAlarmCompliances 1 }
mADAlarmVariableCompliance MODULE-COMPLIANCE
    STATUS      current
    DESCRIPTION
        "The compliance statement for MAD SNMPv2 entities that
        implement MIB variables to support alarms for MTAs."
    MODULE  -- this module
        MANDATORY-GROUPS { mADAlarmVariableGroup }
    ::= { mADAlarmCompliances 2 }
mADAlarmTrapGroup NOTIFICATION-GROUP
    NOTIFICATIONS { mADAlarm, messageAlarm }
    STATUS      current
    DESCRIPTION
        "Two notifications providing the basic level of support for
        alarms for MTAs."
    ::= { mADAlarmGroup 1 }
mADAlarmVariableGroup OBJECT-GROUP
    OBJECTS     { lastMessageIdFailure, numMessagesFailed,
                  lastFailureMtaGroupName, lastFailureMtaApplName }
    STATUS      current
    DESCRIPTION
        "A collection of alarm-specific objects providing support for
        alarms for MTAs."
    ::= { mADAlarmGroup 2 }
END
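To show how an object type and an object instance combine into a full name, the following sketch resolves descriptors from the module above to numeric OIDs. It relies on experimental being 1.3.6.1.3 and, because the draft does not state an INDEX, assumes rows are keyed by a single integer such as applIndex. The registration table and the function names are hypothetical.

```python
# Sketch: resolving this module's descriptors to numeric OIDs and
# forming an instance OID. Registration data is copied from the module;
# a single integer row index (e.g. applIndex) is an assumption.
from typing import Dict, Tuple

Oid = Tuple[int, ...]

EXPERIMENTAL: Oid = (1, 3, 6, 1, 3)  # iso.org.dod.internet.experimental

# descriptor -> (parent descriptor, sub-identifier), from the module
REGISTRATIONS: Dict[str, Tuple[str, int]] = {
    "mADAlarmMIB": ("experimental", 73),
    "mADAlarmTable": ("mADAlarmMIB", 1),
    "mADAlarmEntry": ("mADAlarmTable", 1),
    "lastMessageIdFailure": ("mADAlarmEntry", 1),
    "numMessagesFailed": ("mADAlarmEntry", 2),
    "lastFailureMtaGroupName": ("mADAlarmEntry", 3),
    "lastFailureMtaApplName": ("mADAlarmEntry", 4),
    "mADAlarmNotifications": ("mADAlarmMIB", 2),
    "mADAlarm": ("mADAlarmNotifications", 1),
    "messageAlarm": ("mADAlarmNotifications", 2),
}


def resolve(descriptor: str) -> Oid:
    """Follow parent links up to 'experimental' to build the numeric OID."""
    if descriptor == "experimental":
        return EXPERIMENTAL
    parent, arc = REGISTRATIONS[descriptor]
    return resolve(parent) + (arc,)


def instance_oid(column: str, row_index: int) -> Oid:
    """Object type plus instance: append the row index to the column OID."""
    return resolve(column) + (row_index,)


def dotted(oid: Oid) -> str:
    return ".".join(str(arc) for arc in oid)


if __name__ == "__main__":
    print(dotted(resolve("messageAlarm")))               # 1.3.6.1.3.73.2.2
    print(dotted(instance_oid("numMessagesFailed", 5)))  # 1.3.6.1.3.73.1.1.2.5
```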
5. Scenarios
The following scenarios provide examples of how the mADAlarm and messageAlarm
traps are used in various fault conditions.
5.1 Connectivity Failure
When an MTA or DSA detects that another MTA or DSA cannot be contacted, an
mADAlarm is sent. The mADAlarm contains the applName of the MTA reporting the
problem, the mtaGroupName of the MTA that cannot be contacted, and the
mtaGroupOutboundConnectFailureReason. In the case of a more general
connectivity failure, such as the general unavailability of the network
element, the mADAlarm contains only the variable
mtaGroupOutboundConnectFailureReason. Care should be taken to report these
conditions only in the case of permanent failure, since intermittent failures
are more frequent and might result in too many traps being generated. For
example, when an MTA cannot connect to another MTA in order to deliver a
message, the MTA delivering the message usually retries the delivery attempt
for a specified duration or for a specified number of tries. If the retry
limit is exceeded, which should not normally occur, the message is returned.
In this case, a trap is sent when the retry limit is exceeded, but not for
each individual retry.
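The retry policy above can be sketched as follows: failed attempts are retried silently, and exactly one alarm is raised once the limit is exhausted. try_connect, raise_alarm, and max_retries are hypothetical stand-ins, and the waiting between retries that a real MTA performs is omitted.

```python
# Sketch: retry silently, raise one alarm only when retries run out.
# try_connect, raise_alarm and max_retries are hypothetical; the delay
# between attempts is omitted.
from typing import Callable


def deliver_with_retries(try_connect: Callable[[], bool],
                         raise_alarm: Callable[[], None],
                         max_retries: int) -> bool:
    """Return True on success; on exhaustion raise one alarm, return False."""
    for _ in range(max_retries):
        if try_connect():
            return True
        # Transient failure: no trap for an individual retry.
    raise_alarm()  # retry limit exceeded: report once (MTA then returns the message)
    return False
```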
5.2 MTA or DSA Down
This condition signifies that the MTA or DSA is not operational (but should be)
or has not recently registered with the management system. This condition is
reported with an mADAlarm containing the values of applOperStatus and applName
from the MADMAN Application Monitoring MIB. Support for this feature is
optional, since an MTA or DSA that has crashed cannot report that fact to an
agent, and since off-the-shelf agents cannot be expected to monitor the
aliveness of applications by themselves.
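Where this optional feature is supported, a console might derive the condition from polled applOperStatus values. The enumeration below follows our reading of the Network Services Monitoring MIB (RFC 1565); treating down and halted as alarm states is an assumed local policy, and needs_down_alarm is a hypothetical helper.

```python
# Sketch: deciding on the "MTA or DSA Down" condition from a polled
# applOperStatus value. Alarm states are an assumed local policy.
from enum import IntEnum


class ApplOperStatus(IntEnum):
    # Enumeration as given in the Network Services Monitoring MIB.
    UP = 1
    DOWN = 2
    HALTED = 3
    CONGESTED = 4
    RESTARTING = 5


ALARM_STATES = {ApplOperStatus.DOWN, ApplOperStatus.HALTED}


def needs_down_alarm(status: int, expected_up: bool = True) -> bool:
    """True if the application should be running but is in an alarm state."""
    return expected_up and ApplOperStatus(status) in ALARM_STATES
```

An unknown status value raises ValueError from the enum conversion, which keeps a malformed poll result from being silently treated as healthy.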
5.3 Messaging Loop Detection
This condition signifies that a particular message has been received and sent
multiple times, perhaps exceeding a locally established threshold value. The
condition is reported with a messageAlarm trap, which contains the applName of
the MTA reporting the problem and, optionally, the values of
lastMessageIdFailure, mtaLoopsDetected, and mtaGroupLoopsDetected.
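A locally established threshold might be applied as in the following sketch, which counts transits per message ID and signals a messageAlarm only the first time the threshold is exceeded, so one looping message yields one trap. LoopDetector is hypothetical, and a real MTA would also need to age out old entries.

```python
# Sketch: threshold-based loop detection per message ID. The class and
# threshold are hypothetical; entries are never aged out here.
from collections import Counter


class LoopDetector:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold   # locally configured transit limit
        self._transits = Counter()   # message ID -> times seen
        self.loops_detected = 0      # loops reported so far

    def observe(self, message_id: str) -> bool:
        """Record one transit; True when a messageAlarm should be sent."""
        self._transits[message_id] += 1
        # Fire only on the first transit beyond the threshold, so a
        # single looping message produces a single alarm.
        if self._transits[message_id] == self.threshold + 1:
            self.loops_detected += 1
            return True
        return False
```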
5.4 Message Processing Failure
When an MTA encounters certain non-recoverable errors while processing a
message (e.g., a "dead" message that cannot be delivered, non-delivered, or
redirected), a messageAlarm is generated. The messageAlarm contains the
applName of the MTA reporting the failure and, optionally,
lastMessageIdFailure, which identifies the most recent message that failed,
and numMessagesFailed, which aids in detecting multiple message failures. If
other messages failed after the most recent polling cycle but before the
condition being reported, lastMessageIdFailure identifies only the latest of
them; the identities of the others must be determined manually, for example
by inspecting the message queues.
5.5 Queue Error
When an MTA or agent detects that a queue is full or is approaching saturation,
an mADAlarm is sent. The applName of the MTA reporting the problem is conveyed
within the variable bindings list of the mADAlarm. The mADAlarm also contains
the values of the MIB variables mtaGroupName and mtaGroupStoredVolume (both
from the mtaGroupTable).
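Because a queue near its limit can cross a single threshold many times, an implementation might add hysteresis so that one saturation episode produces one mADAlarm. The high and low watermarks below are assumed local settings compared against mtaGroupStoredVolume, and QueueMonitor is a hypothetical name.

```python
# Sketch: queue-saturation alarming with hysteresis. Watermarks are
# assumed local settings; QueueMonitor is hypothetical.


class QueueMonitor:
    def __init__(self, high_watermark: int, low_watermark: int) -> None:
        if low_watermark >= high_watermark:
            raise ValueError("low watermark must be below high watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.saturated = False

    def update(self, stored_volume: int) -> bool:
        """Return True exactly when a new mADAlarm should be sent."""
        if not self.saturated and stored_volume >= self.high:
            self.saturated = True
            return True
        if self.saturated and stored_volume <= self.low:
            self.saturated = False  # re-arm once the queue has drained
        return False
```

The gap between the two watermarks is what prevents a queue hovering at the limit from generating the bombardment of traps that Section 2 warns against.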
5.6 Security Error
When an MTA or agent detects a security error such as an authentication failure
(e.g., when an MTA or DSA fails to authenticate itself to another), an mADAlarm
is sent. The applName of the MTA reporting the problem is conveyed within the
variable bindings list of the mADAlarm. The mADAlarm also contains the values
of the MIB variables mtaGroupInboundRejectionReason (stating an authentication
failure) and the mtaGroupName.
When an MTA or agent detects a security error such as a data integrity
violation (e.g. while processing a message), a messageAlarm is sent.
The applName of the MTA reporting the problem is conveyed within the variable
bindings list of the messageAlarm. The messageAlarm also contains the values
of the MIB variables mtaGroupInboundRejectionReason (stating an integrity
violation) and the mtaGroupName.
6. Acknowledgements
This draft is the product of discussions and deliberations carried out
in the following groups:
ietf-madman-wg ietf-madman@innosoft.com
This draft also incorporates the intellectual contributions of
Bruce Greenblatt
Sue Lebeck
Roger Mizumori
Edward Owens
7. References
[1] Case, J., McCloghrie, K., Rose, M., and S. Waldbusser, "Structure
of Management Information for version 2 of the Simple Network
Management Protocol (SNMPv2)", RFC 1902, SNMP Research, Inc.,
Hughes LAN Systems, Dover Beach Consulting, Inc., Carnegie Mellon
University, February 1996.
[2] McCloghrie, K., and M. Rose, Editors, "Management Information
Base for Network Management of TCP/IP-based internets: MIB-II",
STD 17, RFC 1213, Hughes LAN Systems, Performance Systems
International, March 1991.
[3] Case, J., McCloghrie, K., Rose, M., and S. Waldbusser, "Protocol
Operations for version 2 of the Simple Network Management
Protocol (SNMPv2)", RFC 1905, SNMP Research, Inc., Hughes LAN
Systems, Dover Beach Consulting, Inc., Carnegie Mellon
University, February 1996.
[4] Freed, N., and S. Kille, "Network Services Monitoring MIB",
RFC 1565, Innosoft, ISODE Consortium, January 1994.
[5] Freed, N., and S. Kille, "Mail Monitoring MIB", RFC 1566,
Innosoft, ISODE Consortium, January 1994.
[6] Mansfield, G., and S. Kille, "X.500 Directory Monitoring MIB",
RFC 1567, AIC Systems Lab, ISODE Consortium, November 1994.
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
Glenn Mansfield
AIC Systems Laboratories
6-6-3 Minami Yoshinari
Aoba-ku, Sendai 989-32
Japan
Phone: +81-22-279-3310
E-Mail: glenn@aic.co.jp
Gordon B. Jones
MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102-3481
Phone: (703) 883-76701
E-Mail: gbjones@mitre.org
Niraj Jain
Oracle Corporation
500 Oracle Parkway
Redwood Shores
California 94065
Phone: (415) 506-2581
E-Mail: njain@us.oracle.com