TIP/Tenex reliability improvements
RFC 636

Document Type RFC - Unknown (June 1974; No errata)
Last updated 2013-03-02
NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
TIP/TENEX Reliability Improvements

RFC 636                                    J. Burchfiel  - BBN-TENEX
                                           B. Cosell     - BBN-NET
NIC 30490                                  R. Tomlinson  - BBN-TENEX
                                           D. Walden     - BBN-NET
                                                            10 June 1974
                                                                       
                   TIP/TENEX Reliability Improvements

                                                                       

During the past months we have felt strong pressure to improve the
reliability of TIP/TENEX network connections.  Because TENEXs and TIPs
are numerous and highly visible, more reliable user connections between
them would have a major impact on the apparent reliability of the
network as a whole.  Despite the emphasis on TIP/TENEX interaction, all
work done applies equally well to interactions between Hosts of any type.

The remainder of this RFC sketches our plan for improving the
reliability of connections between TIPs and TENEXs.  Major portions of
this plan have already been implemented (TIP version 322; TENEX version
1.32) and are now undergoing final test prior to release throughout the
network.  Completion of the implementation of the plan is expected in
the next quarter.                                                      

Our plan for improving the reliability of TIP/TENEX connections is
concerned with obtaining and maintaining TIP/TENEX connections,
gracefully recovering from lost connections, and providing clear
messages to the user whenever the state of his connection changes.     

When a TIP user attempts to open a connection to any Host, the Host may
be down.  In this case it would be helpful to provide the user with
information about the extent of the Host's unavailability. To facilitate
this, we modified the IMP program to accept and utilize information from
a Host about when the Host will be back up and for what reason it is
down.  TENEX is to be modified to supply such information either before
it goes down or, by manual means, after it has gone down.  When the TIP
user then attempts to connect to the down TENEX, the IMP local to the
TENEX returns the information about why and for how long TENEX will be
down.  The TIP is to be modified to report this sort of information to
the user; e.g., "Host unavailable because of hardware maintenance --
expected available Tuesday at 16:30 GMT".                              

The TIP's logger is presently not reentrant.  Thus, no single TIP user
can be allowed to tie up the logger for too long at a time, and the TIP
therefore enforces a timeout of arbitrary length (about 60 seconds) on
logger use.  However, a heavily loaded Host cannot be guaranteed always
to respond within 60 seconds to a TIP login request, and at present TIP
users sometimes cannot get connected to a heavily loaded TENEX.  To
correct this problem, the TIP logger will be made reentrant and the
timeout on logger use will be eliminated.                              
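A minimal Python sketch of the reentrant logger idea, assuming that
reentrancy is achieved by keeping login state per terminal rather than
in one shared slot.  The class, state, and terminal numbers are
hypothetical.  Because a slow Host delays only the terminal waiting on
it, no global timeout is needed to protect other users.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict


class LoginState(Enum):
    IDLE = auto()
    AWAITING_HOST = auto()   # login request sent; Host has not answered
    CONNECTED = auto()


@dataclass
class Logger:
    """Hypothetical reentrant logger: each terminal keeps its own state,
    so one user waiting on a slow Host never blocks the others."""
    sessions: Dict[int, LoginState] = field(default_factory=dict)

    def start_login(self, terminal: int) -> None:
        self.sessions[terminal] = LoginState.AWAITING_HOST

    def host_replied(self, terminal: int) -> None:
        # The reply may arrive at any time; there is no timeout here.
        if self.sessions.get(terminal) is LoginState.AWAITING_HOST:
            self.sessions[terminal] = LoginState.CONNECTED

    def state(self, terminal: int) -> LoginState:
        return self.sessions.get(terminal, LoginState.IDLE)
```

With two users logging in at once, a reply for terminal 7 completes its
login while terminal 3 keeps waiting on its (slower) Host, unaffected.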

One notorious soft spot in the Host/Host protocol that degrades the
reliability of connections is its incremental allocate mechanism.
Low-frequency software bugs, intermittent hardware bugs, etc., can cause
the incremental allocates associated with a connection to get out of
synchronization.  When this happens, it usually appears to the user as
if the connection has simply "hung up".  A slight addition to the
Host/Host protocol that allows connection allocates to be resynchronized
has been designed and implemented for both the TIP and TENEX.
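This section does not spell out the new Host/Host commands, so the
Python sketch below uses a hypothetical reset step: the receiver
reports the message and bit allocation it believes is still
outstanding, and the sender adopts those values.  The class and method
names are illustrative only, and the sketch ignores messages still in
transit, which a real resynchronization would have to account for.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    messages: int = 0
    bits: int = 0


class Sender:
    def __init__(self) -> None:
        self.alloc = Allocation()

    def on_all(self, messages: int, bits: int) -> None:
        # An incremental allocate adds to the sender's allowance.
        self.alloc.messages += messages
        self.alloc.bits += bits

    def can_send(self, nbits: int) -> bool:
        return self.alloc.messages >= 1 and self.alloc.bits >= nbits

    def send(self, nbits: int) -> bool:
        if not self.can_send(nbits):
            return False
        self.alloc.messages -= 1
        self.alloc.bits -= nbits
        return True

    def reset(self, messages: int, bits: int) -> None:
        # Hypothetical resynchronization: adopt the receiver's view.
        self.alloc = Allocation(messages, bits)


class Receiver:
    def __init__(self) -> None:
        self.outstanding = Allocation()   # granted but not yet used

    def grant(self, messages: int, bits: int) -> tuple:
        self.outstanding.messages += messages
        self.outstanding.bits += bits
        return messages, bits             # contents of the allocate

    def on_data(self, nbits: int) -> None:
        self.outstanding.messages -= 1
        self.outstanding.bits -= nbits

    def resync_request(self) -> tuple:
        return self.outstanding.messages, self.outstanding.bits


def lost_allocate_scenario():
    s, r = Sender(), Receiver()
    s.on_all(*r.grant(1, 800))    # first allocate delivered normally
    if s.send(8):
        r.on_data(8)
    r.grant(2, 1000)              # second allocate lost in the network
    hung = not s.can_send(8)      # sender sees no allocation: "hung"
    s.reset(*r.resync_request())  # hypothetical resynchronization
    return hung, s.can_send(8), s.alloc
```

In the scenario, the lost allocate leaves the sender with no message
allocation while the receiver believes two messages are permitted;
after the reset both sides agree again and data can flow.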

TENEX has a number of internal consistency checks (called "bughalts")
which occasionally cause TENEX to halt.  Frequently, after diagnosis by
system personnel, TENEX can be made to proceed without loss from the
viewpoint of local users.  A mechanism is being provided which allows
TENEX to proceed in this case from the point of view of TIP users of
TENEX.                                                                 

The appropriate mechanism entails the following:  TENEX will not drop
its ready line during a bughalt (from which TENEX can usually proceed
successfully), nor will it clear its NCP tables and abort all
connections.  Instead, after a bughalt TENEX will:  discard the message
it is currently receiving, since the IMP has returned an Incomplete
Transmission to the source for this message; reinitialize the interface
to the IMP; and resynchronize, on all connections where possible, the
Host/Host protocol allocate inconsistencies caused by lost messages,
RFNMs, etc.  The
latter is done with the same mechanism described above.  This procedure