
Background for Telephone Switching
2nd Edition (Revised and Expanded)

Chapter 8
Operation, Administration
and Maintenance

OUTLINE

OBJECTIVE: Once a system is in the field, operation, administration and maintenance are the principal factors the customer must consider over system life. This chapter discusses some of the design requirements a system must meet to make the OA&M job easier for the customer.

PREVIEW QUESTIONS

  • How can telephone systems report troubles?

  • How can access for test personnel be provided?

  • How do computers aid in the above?


OPERATION, ADMINISTRATION,
AND MAINTENANCE

Traditionally, telephone switches last a long time, growing and changing like living creatures. This means that their design must give fundamental consideration to the needs of the customer, whether an operating telephone company or a business with a PBX, over the working life of the system. To obtain the best possible service from such a major investment, the needs of day-to-day operations, including craft access, must be built in; both hardware and software design must be planned to handle all administrative requirements including record keeping; and finally, the ability to locate problems when they arise, continue operating in the presence of such problems, and facilitate repair is absolutely basic.

The designer must always assume the implacability of Murphy's Law: if something can go wrong, it will, and if it can find several ways to go wrong, it will choose the least convenient. It is easy enough to design a system to "work," at least under some circumstances. What is much more difficult, and for which the newly graduated designer is often totally unprepared, is to anticipate less obvious failure modes and then to design the system to deal with them in a suitable way.  The importance of field trials cannot be overemphasized in this context. There is nothing like actual operating experience in a live environment to test the difference between theory and practice. Indeed, the more brilliant the designers (or, for that matter, the more driving the management), the more necessary the field trial.

Nobody can outsmart the universe. Even the best designers cannot possibly guess all the ways their complex systems can respond to the usage they will receive from both casual callers and highly trained technicians. Sophisticated customers, having learned the hard way that reality and time in the field can force design improvements over the objections of ivory tower designers, often prefer to stick with a proven system rather than rush to something new.  Those who wish to sell the latest state of the art must be able to demonstrate that they have not ignored the lessons of the past.

INTERNAL AND EXTERNAL TESTING

SXS systems, with their distributed control located on each switch, tended to be quite reliable for a number of reasons. First, the failure of any given switch did not knock out the entire machine, or even very much of it. Second, the path through the matrix, handed off from switch to switch, was checked for continuity at every stage; there was no simple way an open path could be turned over to a customer for a charged call. And third, once the connection was up, every device used to establish it was still in the circuit. If a bad connection could be trapped and held, going "hand over hand" down the line ultimately located the trouble.

The designers of crossbar systems recognized all these factors. They knew that, once the originating register, marker, sender and incoming register were used and dismissed, there would be no way to identify them to find what went wrong.  So they designed elaborate routines to test all common equipment between calls, and extensive marker-controlled tests on each connection during call set-up. The false cross and ground (FCG) test made sure a matrix path was not shorted, grounded or connected to another path, and a continuity test not only checked the matrix but also the cross-connect jumpers at the main distributing frame.

From the beginning of electronic switching, when much of the system's structure was hidden in abstract software rather than "real" hardware, it was obvious that even more extensive self-testing would be required. In all working systems, many forms of internal testing are carried out, both on a per call basis and through special test routines carried out at regular intervals. Once troubles are found, further testing is used to isolate the problem to the minimum number of PCBs.

As stored program control developed ever more sophisticated software, more and more features and functions were taken over by software alone, or software with vestigial hardware; thus the testing of software, both during development and in operation, took on increased importance. When digital techniques dictated the merging of transmission and switching, even more hardware could be replaced with zeros and ones, and the importance of software administration and maintenance increased even more.

What was not as generally appreciated over the years was the role of the switch in testing connecting circuits and systems. A local switching system with a metallic matrix was a large access switch permitting measurements to be made on outside plant conductors using dc meters, Wheatstone bridges, and various other instruments, often from one local test-desk serving a number of central offices. Need for this access was one of the most important reasons why electronic switching matrices were slow to be accepted for local CO switching. In PBXs, where station wiring was mostly protected inside buildings, such testing could be ignored with reasonable safety, and electronic switching flourished.

With a digital switching matrix, there is no way to connect directly to customer lines to measure leakage resistance to ground or to apply voltages to break down insulation on lines with intermittent faults. Thus separate test access must be provided if the advantages of digital switching are to be enjoyed.

Note that conventional 2500 type telephones have little effect on line testing because everything except the ringer is disconnected by the switch-hook when the line is idle, and the ringer itself, in series with a capacitor, does not draw dc or audio frequency currents and is essentially invisible to most testing procedures.

One access technique, following the approach of Northern Telecom, is shown in Fig. 1. Here a test relay is provided on each line card. As long as the relay is unoperated, tip and ring pass straight through to the line circuitry (battery feed, supervision, ringing, codec, etc.). However, when the relay is operated, the through path is broken, T and R are connected to the "outside" test bus, and T1 and R1 are connected to the "inside" test bus. By associating appropriate equipment with these buses, tests can be conducted on the path to the customer and/or the customer's line circuitry. Further, the outside T and R of one line card can be switched, via the test buses, to the inside T1 and R1 on another line card to provide emergency service (for a very limited number of lines at any one time) until repairs can be effected.

With the coming of ISDN telephones, where an electronic line circuit at the CO connects via tip and ring to a customer owned and maintained NT1 or NT2 device, testing will be further complicated by the need to disconnect electronic circuitry at both ends of the pair. In addition, the telephone company will no longer be able to upgrade the customer's NT1 or NT2 device to reflect new testing procedures or changes in the transmission medium. Digital signals on customer lines will require even better maintenance capabilities than those used for analog lines, but test access will be more difficult.

A further complication can be expected when optical fiber is inserted into the local loop in various ways. The main purpose of "fiber in the loop" appears to be CATV delivery, which will require CATV switching and voice/TV multiplexing to take place between the CO switch and the outside world.  Optical fiber, however, will require tests quite different from those needed by copper pairs.

On the trunk side, switching systems have, for many years, interfaced carrier systems, first analog and now almost all digital, rather than copper pairs. As a result, transmission hardware stands between switching systems and the outside world, and often provides its own test access, procedures and personnel. On the other hand, today there is almost no trunk hardware that is unique to a particular circuit, and as a result, a switch can monitor one circuit or even a test channel in a multiplexed group and be fairly sure that the information obtained is pertinent to all channels in the group. Thus the switch still has a major responsibility for monitoring its connecting circuits as well as its internal components.

The net result of these various requirements is that the switch, with its built-in intelligence, must monitor test points in the outside world as well as within its own equipment to locate troubles, and must have built into its own programs enormous capabilities for locating and displaying problems in both its own and connecting equipment when they arise. Indeed, it is not unusual for the software that deals with maintenance and administration to exceed that which is required to deal with telephone calls.

ALARMS

Because system failures of various sorts do occur, a variety of means have been developed over the years to identify them and expedite their repair. Sometimes the old reliable approaches are still valid even in the digital age.

Fuse alarms

Fuses are installed in power feeders to various sub-assemblies to protect the central power supply and with it the other sub-assemblies depending on it for power; a fuse is not necessarily expected to protect the equipment to which it delivers power. Fuses in the telephone industry are generally rated at the highest current they will carry indefinitely without blowing; some other industries rate their fuses in terms of the lowest value required to cause them to blow. In either instance, a fuse melts under excess current to disconnect a short circuit or other overload. This protection is one of the two things a fuse must do. The other fuse function consists of letting somebody know that a problem exists. Alarm fuses are quite generally used in telephony.

Alarm fuses are constructed in such a way that the fuse element itself holds a spring compressed. When the fuse element melts, the spring is released so that it can push a contact connecting an alarm lead to the incoming power bus. In actual fact, an alarm fuse is a transfer contact that disconnects the overload and, at the same time, connects the alarm. Further, the spring also provides some sort of visible signal at the fuse so that, when the maintenance force arrives, the particular circuit in trouble can be identified immediately.

Ideally, fuses should be located at eye level to make them easy to inspect and replace. This is almost never the case; fuse panels are usually at the top or bottom of the frame or cabinet, depending on where the power enters. With seven foot frames, location at the top isn't bad; with eleven foot frames, spotting and replacing a blown fuse is more difficult.  However, fuses at the top of a frame are not subject to damage from inadvertent kicks, cleaning machinery, etc.

Fuse alarms are arranged in a hierarchy, depending on how many users will be inconvenienced by the loss of a given circuit. In electromechanical systems, the loss of a single trunk had far less impact than loss of an originating register, and register failure was less critical than loss of a marker.  Different colored lights as well as gongs and buzzers were often used to locate the floor and aisle on which the alarm occurred, and to differentiate between major and minor troubles.

Electronic switching systems are much smaller than comparable electromechanical systems, and are organized somewhat differently. Typically, power is distributed to individual cabinets at -50 volts dc, and power supplies in each shelf convert it to the several smaller dc voltages needed for electronic components. Often, each PCB (which may contain a number of lines, trunks or other circuits) is fused at each voltage, each shelf power supply is fused, and the -50 volts entering the cabinet has its own fuse.

The system control, via the scanner, can monitor alarm fuses for inputs to its automatic trouble reporting and analysis system. These inputs are particularly important when the office is unmanned and a telemetering system, operated by the common control, reports regularly to a remote maintenance center. Key fuses may have dedicated scan points for their individual inputs to the system; less important fuses may be monitored in groups by a common scan point. The use of fuses as a source of alarm signals is as important as their protection function.
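The difference between dedicated and shared scan points can be illustrated with a minimal sketch. The scan-point numbers, fuse names and report wording below are invented for illustration and do not describe any particular system; the point is only that a shared scan point can narrow the trouble to a group of fuses, not to one.

```python
# Hypothetical scan-point map: point number -> fuses wired to that point.
# Key fuses get a dedicated point; less important fuses share one.
SCAN_POINTS = {
    0: ["cabinet -50 V feed"],
    1: ["shelf 1 power supply"],
    2: ["PCB 1", "PCB 2", "PCB 3"],
}


def fuse_report(active_points):
    """Turn the set of operated alarm scan points into report lines."""
    lines = []
    for point in sorted(active_points):
        fuses = SCAN_POINTS[point]
        if len(fuses) == 1:
            lines.append(f"scan point {point}: fuse blown: {fuses[0]}")
        else:
            # A shared point cannot say which fuse operated; list candidates.
            lines.append(f"scan point {point}: inspect fuses: {', '.join(fuses)}")
    return lines
```

With a shared point, the visible indicator on the alarm fuse itself still completes the identification once the maintenance force arrives.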

It is not, of course, necessary to scan all such scan points during peak busy hours when real time may be scarce.  However, such scans impose no time burden when carried out during off-peak hours, or by a separate processor designed to handle testing and traffic measurements. Output displays and reports should be carefully formatted to give maximum information with minimum possibility of error in interpretation.

Because common control failures are possible, fuse alarms in electronic switches should also operate in the traditional way so that a system failure will not cut off necessary information that can easily be obtained directly. Lights, gongs and other signals, independent of the common control, still have their place.

Permanent signal alarms

Partial dials and permanent signals are fairly common.  Crosses to 60 Hz power lines can cause partial dials in common control systems (few SXS selectors will follow 60 Hz, but mercury relays or electronic circuitry in dial pulse detectors can), and leakage currents, produced by low insulation resistance, lead to permanent signals. But the most usual source of such troubles is the user. A phone knocked off-hook, someone wishing privacy, or someone going back to look up the rest of the telephone number are the common causes.

A permanent signal ties up a certain amount of common equipment, and degrades the traffic handling capacity of the system for the rest of the users. SXS had timers for line-finders and first selectors; if the selector did not go off normal after a certain amount of time, an audible signal was sounded to alert maintenance. Crossbar systems often had timers built into registers; the timers were reset after each digit. If the next digit did not arrive before time-out, AND if someone else needed the register, the non-dialer or slow-dialer lost the register to the new origination. The timing interval could, as a refinement, be altered as a function of traffic with 20 seconds being typical for a side hour and 3.5 seconds for the busy hour.
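The register time-out rule just described (timer reset on each digit, release only on time-out AND competing demand, interval varied with traffic) might be sketched as follows. The two intervals come from the text; the class and function names are hypothetical, and a real register would of course be driven by its own scan program rather than by explicit time arguments.

```python
from dataclasses import dataclass

# Intervals quoted in the text; the constant names are illustrative only.
SIDE_HOUR_TIMEOUT_S = 20.0
BUSY_HOUR_TIMEOUT_S = 3.5


@dataclass
class Register:
    last_digit_time: float  # time of seizure or of the most recent digit

    def digit_received(self, now: float) -> None:
        # The timer is reset after each digit.
        self.last_digit_time = now


def should_release(reg: Register, now: float, busy_hour: bool,
                   demand_waiting: bool) -> bool:
    """True if the register should be taken from a non-dialer or slow dialer."""
    timeout = BUSY_HOUR_TIMEOUT_S if busy_hour else SIDE_HOUR_TIMEOUT_S
    timed_out = (now - reg.last_digit_time) >= timeout
    # Both conditions must hold: time-out AND someone else needs the register.
    return timed_out and demand_waiting
```

Making release conditional on demand means a slow dialer is never penalized when registers are plentiful.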

Because electromechanical common control systems were particularly subject to traffic overloads, the clearing of permanent signals was of great importance. Some systems were arranged to connect a "howler" tone for a given period of time, and then route the line to a test position where it was stacked in a queue. If the user found the phone off hook and hung up, the whole connection was cleared automatically. But, until cleared, it tied up a path through the switching matrix.

Line lock-out. Rather than tie up a path through the matrix to a howler tone, some early systems used line lock-out. Here, a line generating a permanent signal was released from a matrix-connected resource and returned to the line circuit for monitoring. As long as the off-hook persisted, nothing further would happen because the signal lead to cause the system to take action was inhibited. However, if the user hung up and came off hook again, the inhibition was removed and dial tone was returned.

In electromechanical systems, the line circuit for a system with line lock-out was more complex than a traditional line and cut-off relay. Usually an additional relay was required to prevent the system control from recognizing the continuing off-hook. Sometimes this relay was also used to return howler tone without using a path through the matrix.

In most electronic switching systems, the line circuit provides battery feed and supervision to analog telephones during both idle and busy intervals; it also applies ringing and coin control voltages. Thus, the line circuit is already quite complex, and the incremental cost to add line lock-out and/or howler, perhaps connected via the ringing access circuitry, is minimal. With stored program control, it is a simple matter to have the program recognize a permanent signal without any additional hardware; the system can then ignore it until an on-hook is detected. Howler tone can be returned via the matrix for a timed interval and then dismissed to free the matrix path and tone port, but in non-blocking systems, typically PBXs, there is little to be saved by releasing the matrix path. In digital systems, where tones are distributed on time slots to which ports needing tone can be connected, there is no separate service circuit group supplying tone which can be conserved. As a result, the line lock-out and howler functions in digital systems need not follow blindly the approaches used with metallic matrices. 
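As a rough sketch of how a stored program might handle line lock-out without added hardware, the per-line logic reduces to a three-state machine. The state names and the scan interface are assumptions made for the example; howler tone and receiver handling are omitted.

```python
from enum import Enum, auto


class LineState(Enum):
    IDLE = auto()        # on-hook
    DIALING = auto()     # off-hook, digit receiver attached
    LOCKED_OUT = auto()  # permanent signal: off-hook is being ignored


class Line:
    def __init__(self):
        self.state = LineState.IDLE

    def scan(self, off_hook):
        """Process one supervisory scan of the line."""
        if not off_hook:
            # Hanging up always clears the condition.
            self.state = LineState.IDLE
        elif self.state is LineState.IDLE:
            # New origination: dial tone would be returned here.
            self.state = LineState.DIALING
        # DIALING or LOCKED_OUT with off-hook persisting: nothing to do.

    def permanent_signal_timeout(self):
        """No digits arrived in time: free the receiver and lock the line out."""
        if self.state is LineState.DIALING:
            self.state = LineState.LOCKED_OUT
```

A locked-out line costs the system nothing but a status entry; only an on-hook followed by a fresh off-hook brings it back to dial tone.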

Trunks can cause permanent signals as well as lines; however, there is no advantage in returning howler to a sender.  It is urgent to clear the trouble on a trunk and return it to service; as a result, trunks require their own alarms and displays. Where switching and transmission are separate systems, this is often the function of carrier system channel banks. In digital systems, where the channel bank function becomes part of the CO switch or PBX, the switch must alert others to the problem and make use of the information itself.

Make busy. In general, any circuit that is in trouble should be "made busy" to prevent its seizure: an open line must not be seized to complete a call, and, even more important, a defective trunk or service circuit, intended for shared use by many lines, must be removed from its hunt group. A system must be able to busy out circuits its automatic test procedures find faulty; further, maintenance people must have similar procedures for taking circuits out of service manually.

Ideally, a "made-busy" signal, clearly different from a regular busy signal, should be available for return to calling users or repair staff, and made-busy circuits should be printed out or otherwise displayed to provide an independent record of circuits out of service. "Trap" programs should be constructed to check the status of made-busy circuits to be sure information provided to the maintenance force is up to date, and that records of intermittent failures which may have cleared themselves are also available for analysis.

As has been mentioned, it is not unusual to find two, four or more circuits on one plug-in module. When one circuit fails, the module must be replaced to restore service.  However, the module cannot be removed until all its circuits have become idle, and all are made busy. Thus a make-busy procedure must include a request to the system to make the particular circuit busy with intent to remove; the system must then busy out that circuit and all other idle circuits on the same module. Circuits in use must be monitored for hangup, and then they, too, must be made busy. Only when all circuits are made busy will the signal for removal be given. This signal can easily be printed out on a maintenance display, but a lamp or other visible signal on the module itself is insurance against inadvertent termination of active calls.
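The "make busy with intent to remove" procedure can be sketched as below. This is a minimal illustration under assumed names (Module, request_removal and so on are not taken from any real system); it shows the faulty circuit and all idle circuits being busied out at once, circuits in use being busied out as they hang up, and the removal signal being withheld until every circuit is made busy.

```python
from enum import Enum, auto


class CircuitState(Enum):
    IDLE = auto()
    IN_USE = auto()
    MADE_BUSY = auto()


class Module:
    """A plug-in module carrying several circuits (hypothetical model)."""

    def __init__(self, n_circuits):
        self.circuits = [CircuitState.IDLE] * n_circuits
        self.removal_pending = False

    def seize(self, i):
        """Try to take circuit i for a call; made-busy circuits refuse."""
        if self.circuits[i] is CircuitState.IDLE:
            self.circuits[i] = CircuitState.IN_USE
            return True
        return False

    def request_removal(self, faulty):
        """Busy out the faulty circuit and every idle circuit on the module."""
        self.removal_pending = True
        self.circuits[faulty] = CircuitState.MADE_BUSY
        for i, state in enumerate(self.circuits):
            if state is CircuitState.IDLE:
                self.circuits[i] = CircuitState.MADE_BUSY

    def hang_up(self, i):
        """On hang-up, a circuit on a module awaiting removal is made busy."""
        if self.circuits[i] is CircuitState.IN_USE:
            self.circuits[i] = (CircuitState.MADE_BUSY if self.removal_pending
                                else CircuitState.IDLE)

    def safe_to_remove(self):
        """True when the removal lamp (or printout) may be given."""
        return self.removal_pending and all(
            s is CircuitState.MADE_BUSY for s in self.circuits)
```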

Because a trunk, unlike a line or a service circuit, has a switch at both ends, it is harder to busy out. Coordination between both ends is required, but common channel signaling makes the job much easier today than in the recent past. PBX trunks, whether digital or analog, pose special problems. When the PBX customer makes a trunk busy, the matching make-busy function at the CO is a service which requires compensation.  If only one trunk is found to be bad on subsequent testing, the local exchange carrier may not charge to take it out of service during repair. However, taking good trunks on the same PBX trunk card out of service may add to the cost of maintenance.

Carrier group alarms

Although a back-hoe cutting a cable serving customer telephones may not cause an immediate and massive problem, other kinds of cable failure can produce large numbers of permanent signals. It takes a tip-ring short (or, at least, a ring-ground short) to cause an origination, and although it is hard to short the several hundred pairs in a cable, it is possible. On trunks using SF signaling, it was a different story. SF required the presence of tone to show the trunk idle, while the absence of this signal made it busy. If a cable containing such trunks was cut, or if a microwave tower fell over, the idle signal, along with everything else, went away, producing massive seizures at the terminating end of all the trunks involved.

To deal with such problems, the carrier group alarm (CGA), which originally alerted transmission maintenance people to major system failures, was fed into switching systems using stored program control. CGA could then be interpreted so that massive seizures could be ignored, existing calls on the same trunk group terminated and charging stopped, and all trunks on the group made busy to further seizure.

CCIS solves this SF problem along with others, but because CCIS usually uses a facility different from the one carrying a trunk group, and thus may function properly even when the trunks are knocked out, it is even more important to coordinate CGA with specific circuits. Today, where digital carrier systems are interfaced directly, switching systems must be able to extract information from test signals on the transmission framing bits to obtain the equivalent of a CGA signal. Even with this capability, visual and audible alarms, independent of the switching system, should be maintained for added reliability.

Many CGA systems must have their carrier equipment in the same building with the switching system if alarm information is to be passed along; this is not always the case. When short-haul carrier systems are used to allow long-haul carrier to terminate on a number of switches near each end, perhaps with DACS "grooming" of which channels go where, the trunk that terminates at a switch may very well not be in the carrier system which failed. Thus failure of a large carrier system may affect dozens of switching systems, all located many miles from the nearest end of the particular link in trouble.  Passing the CGA signal forward to all of these remote offices is something of a challenge.

Because a CGA signal is not always available, as with PBXs in particular, some kind of trouble detection scheme should be built into switching system programs. When large numbers of nearly simultaneous seizures are detected, these seizures should be checked to see if they are associated with a common trunk group. If they are, only one or two are given incoming registers and the system monitors for time-out, temporarily ignoring requests for service on the rest. This leaves the remaining registers free to handle service requests from other trunk groups, and provides immediate attention to a potential problem. Similarly, on outgoing trunks, many sender time-outs in a short interval should cause the system to check for facility failure common to the trunks involved. By programming the processor to hunt for permanent signals and time-outs on trunks in this way, making such trunks busy and flagging maintenance, system reliability can be increased and traffic overloads generated by carrier failures can be cleared, with or without carrier group alarms.
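One way such a detection scheme might look is sketched below. The window length, the burst threshold and the number of "probe" registers are illustrative assumptions, as are all the names; the sketch only decides whether a seizure should be given an incoming register, and leaves out the time-out monitoring, make-busy and maintenance flagging that would follow.

```python
from collections import defaultdict, deque


class SeizureMonitor:
    """Detect bursts of near-simultaneous seizures on one trunk group."""

    def __init__(self, window_s=1.0, threshold=10, max_probes=2):
        self.window_s = window_s      # how close together counts as a burst
        self.threshold = threshold    # seizures in the window that raise suspicion
        self.max_probes = max_probes  # registers granted to a suspect group
        self.recent = defaultdict(deque)      # group -> recent seizure times
        self.probes_given = defaultdict(int)  # group -> registers granted in burst

    def on_seizure(self, group, now):
        """Return True if this seizure should be given an incoming register."""
        times = self.recent[group]
        times.append(now)
        # Drop seizures that have aged out of the window.
        while now - times[0] > self.window_s:
            times.popleft()
        if len(times) < self.threshold:
            return True  # normal traffic
        # Burst on a common trunk group: serve only one or two as probes.
        if self.probes_given[group] < self.max_probes:
            self.probes_given[group] += 1
            return True
        return False  # temporarily ignore the rest
```

Seizures that arrive before the threshold is reached are served normally, so a few registers are committed before the burst is recognized; the probes then confirm, by timing out, whether the group is really in trouble.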

Finding bad trunks with traffic measurements. When trunks had individual plug-in units in channel banks, external SF signaling units, and individual trunk circuits on switching systems, all tied together with cable pairs terminated on cross-connect frames, it was not unusual for an individual trunk to fail. CGA does not work for individual trunks within a group, and thus will do little to identify "killer trunks" and stuck trunks.

A killer trunk is one which looks all right to the switch, and can be seized; however, the caller finds it unsatisfactory due to noise, or some other problem. The caller hangs up and tries again, usually getting a different trunk and completing the call satisfactorily. A stuck trunk is one which appears busy for hours on end, and is not released to be seized for another call. Both of these faults can easily be located by means of traffic measurements. A killer trunk will have much shorter holding times than others in the same group, while a stuck trunk will have much longer holding times. Stored program controlled switches can easily detect and display such information. Although this sort of approach would have been almost impossible with earlier generations of switches, it is not uncommon to find modern switching systems continuing to ignore an opportunity to take advantage of one of their most obvious capabilities.
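The holding-time comparison can be sketched in a few lines. The ratios used as thresholds here are arbitrary assumptions chosen for illustration, and the median is used as the group's "typical" holding time so that the faulty trunks themselves do not distort the reference; an actual system would tune both to its own traffic.

```python
from statistics import median


def classify_trunks(mean_holding_s, low_ratio=0.2, high_ratio=5.0):
    """Flag suspected killer and stuck trunks in one trunk group.

    mean_holding_s maps trunk name -> mean holding time in seconds over
    the measurement period. Returns (killers, stuck) as sorted lists.
    """
    typical = median(mean_holding_s.values())
    killers = sorted(t for t, h in mean_holding_s.items()
                     if h < low_ratio * typical)
    stuck = sorted(t for t, h in mean_holding_s.items()
                   if h > high_ratio * typical)
    return killers, stuck
```

For a group whose trunks average about 180 seconds per call, a trunk averaging 12 seconds would be flagged as a suspected killer and one averaging two hours as stuck.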

Of course, digital trunks between digital switches have almost no hardware to fail on a per-circuit basis, so killer and stuck trunks should slowly fade away. However, hunt algorithms and other software features are sometimes arranged inadvertently to avoid selection of certain trunks in a group; this kind of problem can easily be detected with other variations in recording and displaying traffic information.

Trunk make busy. Failed trunks, either individuals or groups, must be made busy; that is, they must be removed from the hunt procedure so that they will not be selected for calls. At the same time, the distant end must not see a seizure that will result in a permanent signal. In electromechanical systems, an outgoing trunk was made busy by putting a ground (or battery) on the sleeve lead so that hunting would pass over it to the next trunk in the group; with no tip-ring closure or M-lead operation, no seizure was sent forward to the terminating end.  With stored program systems, where hunting is done on the map of available facilities in system memory, it is even easier to mark the trunk busy. With several levels of busy possible, and trap programs to inventory and display them for maintenance personnel, stored program systems can greatly improve trunk reliability.
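Hunting on a memory map with more than one level of busy might be sketched as follows. The particular levels (busy with traffic, made busy by automatic test, made busy manually) and all names are assumptions for the example; the text says only that several levels are possible.

```python
from enum import Enum


class TrunkStatus(Enum):
    IDLE = "idle"
    TRAFFIC_BUSY = "traffic busy"            # carrying a call
    AUTO_MADE_BUSY = "made busy by test"     # removed by automatic testing
    MANUAL_MADE_BUSY = "made busy by craft"  # removed by maintenance staff


def hunt(group):
    """Return the first idle trunk in hunt order, or None if none is idle."""
    for trunk, status in group.items():
        if status is TrunkStatus.IDLE:
            return trunk
    return None


def made_busy_inventory(group):
    """Trap-style listing of trunks out of service, for maintenance display."""
    out_of_service = (TrunkStatus.AUTO_MADE_BUSY, TrunkStatus.MANUAL_MADE_BUSY)
    return {t: s.value for t, s in group.items() if s in out_of_service}
```

Because a made-busy trunk is simply never returned by the hunt, no seizure is ever sent forward on it.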

Making a bad trunk busy at one end implies a conjugate action at the distant end; for a two-way trunk, seizure from the far end must also be inhibited, and either a one-way or two-way trunk, at the incoming end, must be warned not to accept a seizure which may be caused by the fault.

Today, common channel signaling, often making use of centralized data bases for more advanced features, changes the nature of trunk make-busy. With signaling and supervision removed from the trunk itself, and the packet network used for signaling able to update such data bases in real time, taking a trunk out of service or restoring it should be much more effective. It is likely, however, that many PBXs will continue to use conventional circuits with in-band signaling for both PBX-CO and tie-trunks, and older make-busy procedures will not fade away for some time.

LINE-LOAD CONTROL

There are times when massive seizures are not the result of cable failures or other abnormal equipment conditions. When it snows, when some kind of disaster takes place, when bets have to be placed with bookies, etc., there can be peak loads on switching systems far in excess of those predicted by averages where calls are originated "individually and collectively at random." During such intervals, it is very important for doctors, police, national guard and other emergency personnel to get through. To this end, line-load control is provided.

Line-load control simply allows the system to ignore originations from non-priority users when it is in effect, while allowing emergency personnel to make calls as usual.  Actually, line-load control will give everybody a chance to originate calls if they wait long enough for dial tone; the procedure is to allow non-priority users on one line group at a time to have an opportunity to place calls. Once the system has accepted a call, line-load control does not interfere with the call's completion.

Line-load control obviously requires the use of class marks. When manual line-load control is activated, there might be several different classes of priority users, depending on the type of emergency condition in effect. If heavy system traffic causes line-load control to be implemented automatically, perhaps in response to a heavy snow-fall, priorities might well be different from those imposed by a riot or invasion. Line-load control is another function whose implementation is greatly facilitated by stored program control.
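A minimal sketch of the decision made on each origination under line-load control is given below. The class marks, the condition names and the table relating them are hypothetical; the sketch assumes that non-priority line groups are enabled one at a time in simple rotation.

```python
# Hypothetical class marks treated as priority under each condition.
PRIORITY_CLASSES = {
    "snow": {"police", "medical"},
    "riot": {"police", "medical", "national_guard"},
}


def accept_origination(class_mark, line_group, condition, enabled_group):
    """Decide whether an off-hook should be served.

    condition is None when line-load control is not in effect.
    enabled_group is the one line group currently allowed to originate.
    """
    if condition is None:
        return True  # normal operation
    if class_mark in PRIORITY_CLASSES[condition]:
        return True  # priority users are served as usual
    # Non-priority users are served only while their group has its turn.
    return line_group == enabled_group


def next_enabled_group(enabled_group, n_groups):
    """Rotate the turn to the next line group."""
    return (enabled_group + 1) % n_groups
```

Calls already accepted are untouched; the check applies only at origination.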

It should be noted that having a non-blocking matrix will not eliminate the need for line-load control. With massive seizures, there may not be enough DTMF or dial pulse receivers, and the call processing capability of the switch's control itself may be overloaded. In addition, many of the emergency numbers which a caller might wish to reach will probably be busy already with higher priority calls.

GENERALIZED CALL PROCESSING TIMEOUTS

One of the basic rules in the design of large complex systems in general, and telephone switches in particular, is always to close the feedback loop. That is, whenever the control issues an order, make sure that carrying it out gives an active indication which is returned to the control for positive checking. The chain of events is usually fairly long:

  • The processor writes an order into a buffer.

  • The buffer applies the order to drivers.

  • The drivers activate the media in the bus system which carries information to and from the peripheral cabinets.

  • Buffer equipment in a peripheral cabinet plucks the order off the bus and applies it to port or matrix circuitry.

  • The addressed circuitry then changes state.

It is not impossible for an order to occasionally go astray under such circumstances.

An additional problem develops when electronic controls are used to activate or monitor electromechanical equipment.  Because of the great difference in speeds, there is no way for the response of an electromechanical device to return to the control during the same, or even the next several hundred, clock intervals. Obviously, the control cannot stand around waiting, so one approach is to assume everything goes all right and to watch for responses as they come in. Each response is matched with an indication stored when the order was transmitted; only when a match fails to come in after a reasonable time-out interval does the system go looking for trouble. Even with all-electronic systems, the transit times to and from distant cabinets, to say nothing of remote switching modules, particularly when measured in terms of the high speed clocks presently in use, suggest the use of such a procedure.
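The "assume all is well and watch for responses" procedure amounts to keeping a table of outstanding orders. The sketch below is one simple way to do it; the order identifiers, the tick-based clock and the class name are assumptions, and what the system does once an order is found overdue is left out.

```python
class PendingOrders:
    """Match responses against orders and report those that time out."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.pending = {}  # order id -> clock tick at which it was issued

    def issue(self, order_id, now):
        # Store an indication when the order is transmitted; do not wait.
        self.pending[order_id] = now

    def response(self, order_id):
        """Record a response. Returns False if it matches no pending order."""
        return self.pending.pop(order_id, None) is not None

    def overdue(self, now):
        """Orders whose response has not arrived within the time-out."""
        return sorted(oid for oid, issued in self.pending.items()
                      if now - issued >= self.timeout_ticks)
```

The control calls overdue() at its leisure, so the common case of a prompt response costs one table entry and one lookup; an unmatched response is itself worth reporting as a sign of trouble.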

Time-outs associated with permanent signals, stuck senders and the like have already been discussed. Another class of feedback for time-out checking includes actions by the system control which cause a natural response to occur. For instance, when a path is set up from a line through a metallic matrix to a dial-pulse receiver or trunk, proper operation of the system as a whole will cause the line sensor to be disconnected, changing its output from active to passive, and the loop monitor in the dial pulse detector or trunk to be operated, going from passive to active.

This "transfer of supervision" acts as natural feedback to check continuity of the matrix path and the operation of the line circuit and the trunk or signaling detector.  Unfortunately, even in metallic matrices, there are instances where transfer of supervision does not take place, and it does not occur at all with paths through most electronic matrices. 

Where natural responses are not available, suitable responses can sometimes be constructed. For instance, any kind of sender can be arranged to have a scan-point to monitor up-checks and down-checks to show that each pulse or digit has been sent. In E&M trunk circuits used for dial pulsing, one could set the E-lead scan point to monitor the M-lead during pulsing, at least when stop-go signals are not expected. The outgoing digits can then be monitored with the incoming digit program and the transmitted digit compared with the "received" digit from the sender's output. In almost every type of operation, either the sent signal can be monitored for checking, or the response to the transmitted signal, when it returns, can be used for checking as well as for its intended purpose.

Common channel signaling, like most data systems, has its own built-in checks. These may not, however, be sufficient. Because call set-up information travels via a path different from the one to be used by the customer, there is no assurance that the talk path is actually available. As a result, special tone senders and detectors are sometimes provided to momentarily check the path through individual trunks. It would be far more useful to set up the entire connection and have such a test provided end to end prior to turning the multi-trunk path over to a caller.

System time-outs should be included in most programs to make sure that the goals required for successful call processing are achieved on schedule--each digit received, all digits received, wink start detected, answer obtained, hang-up detected, etc. If any expected signal is not obtained before time-out, checking should be instituted. Such an approach may also catch program loops and other software difficulties.

DUPLICATED AND MULTIPLE CIRCUITS

In any kind of complex system, component failures will take place sooner or later. Because this cannot be prevented, overall design must take it into account when system reliability is being planned. In telephone systems, extra circuits and sub-systems are provided so that automatic switchover can replace faulty circuits serving a number of customers. In general, such duplication does not extend to circuits that serve one or a small number of lines.

Several methods of providing extra equipment are common. As has been mentioned, shelf power supplies may be provided in pairs, sharing the load. If one fails, an alarm is sounded and the other carries the whole load until repairs are made.  Another approach, often used with echo suppressors or signaling sets in transmission systems, has one spare circuit for a group of five to ten working circuits. The spare is kept in "hot standby" and can be substituted for any of the other units in the group that should happen to fail. Patch cords at large jack panels have been used to make the substitution manually, but automatic throwover has its advantages.

In switching rather than transmission systems, it is often convenient to simply provide a few extra circuits in the group, using the matrix for access. Senders, digit receivers, tone circuits and other single-ended service circuits work well this way; if any one fails, it is simply made busy and the system continues unaffected except for a slight reduction in traffic handling capability. This is called "graceful degradation."
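
A minimal sketch of such a pool follows, assuming a simple first-idle selection rule; the class and circuit names are hypothetical.

```python
class ServiceCircuitPool:
    """A group of interchangeable service circuits reached via the matrix."""

    def __init__(self, names):
        self.idle = list(names)
        self.in_use = set()
        self.made_busy = set()   # failed circuits withheld from service

    def seize(self):
        # Hand out the first idle circuit, or None if all are occupied.
        if not self.idle:
            return None
        circuit = self.idle.pop(0)
        self.in_use.add(circuit)
        return circuit

    def release(self, circuit, failed=False):
        # A failed circuit is made busy instead of returning to the pool.
        self.in_use.discard(circuit)
        if failed:
            self.made_busy.add(circuit)
        else:
            self.idle.append(circuit)

    def capacity(self):
        # Circuits still able to carry traffic.
        return len(self.idle) + len(self.in_use)


pool = ServiceCircuitPool(["DTMF-1", "DTMF-2", "DTMF-3"])
first = pool.seize()
pool.release(first, failed=True)   # the circuit failed during the call
remaining = pool.capacity()        # two receivers still serve traffic
```

Nothing else in the system needs to know about the failure; the only effect is that the group is one circuit smaller.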

In crossbar systems, where several markers operating in parallel established connections through the matrix, graceful degradation took place when a marker failed; the remaining markers simply picked up the extra load. This approach has, occasionally, been used with electronic common control systems, but the speed and power of modern electronic processors make the complexities of parallel operation unnecessary. With only one processor, the problem of keeping markers from fighting with each other for control of specific resources is eliminated. The popularity of "parallel processing" in the computer world, however, suggests that the last word has not yet been said.

As was mentioned in Chapter 1, a single common control makes a system vulnerable because its failure will immediately take the whole switch out of operation. Thus common controls are usually duplicated for reliability. Hot stand-by is common; the two common controls run in parallel, each checking the other but with only one providing outputs to the system. When the active processor develops a problem, the other takes over and runs, unduplicated, until repairs can be effected. The probability of the second common control failing before the first is restored is, hopefully, minute. The processors usually take turns being "main" and "standby" so that there is no question that both are in working condition. PBXs have sometimes used "cold" standby, with the second processor doing other tasks (such as processing CDR information) until it is needed. When computers were very expensive, this made more economic sense than it does today. 
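
To give a feel for how small that probability can be, the following sketch assumes that failures of the surviving control follow an exponential distribution. The 10,000-hour mean time between failures and the 4-hour repair time are invented figures, not data from any real system.

```python
import math


def second_failure_probability(mtbf_hours, repair_hours):
    """Chance that the surviving control fails before repairs are finished.

    Assumes exponentially distributed failures, so that
    P = 1 - exp(-repair_hours / mtbf_hours).
    """
    return -math.expm1(-repair_hours / mtbf_hours)


# Assumed figures for illustration only.
p = second_failure_probability(mtbf_hours=10_000, repair_hours=4)
```

With these figures the result is roughly 4 in 10,000 per repair incident. The example also shows why prompt repair matters: for short repair times the exposure grows almost in proportion to the time the system runs unduplicated.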

In some systems, there are two complete common controls consisting of processors with their program, data base and current-information memories. When trouble occurs, the entire active complex is replaced by the standby. This makes it necessary for the standby common control to have its current-information memory track the one in the working system so that, at change-over, it will be ready to go. In other systems, processors and memories are duplicated separately so that only the faulty unit is replaced; the switch-over circuitry is more complex in such an arrangement, but more flexibility is available to configure a working control system.

Yet another approach to reliability calls for each port group on a large switch to be autonomous with its own redundant processors, and capable of completing internal calls without outside help. In the event that a module fails, the other port modules can continue operation unaffected. This approach is particularly useful when remote switching units are supported; neither a central processor failure nor the failure of the umbilical from the RSU to the central location will cause customers to lose service.

However, when there are N port groups, the probability that a call originating in one will be completed within that unit is roughly 1/N. Thus a means is required for the originating module to find and obtain a connection, either direct or via a group selector, to a terminating module. This can easily be done with a central data base, perhaps associated with the group selector, although other approaches have been used.

The switching matrix has always posed special reliability problems. Because it was the biggest single item in electromechanical space division systems, it could not be duplicated; the throw-over circuitry alone would have been too large. However, there were many paths through the matrix between any two terminals, and because most trunk and service circuit groups were well scattered over different switch frames, many kinds of matrix failure affected only a small proportion of possible calls.

This has changed in digital switches. The use of inexpensive RAM memory to build large TSIs, and the use of inexpensive logic gates to make different space-division connections in each time slot, encourage redundancy even for switching matrices. Because most of today's digital switches trade off time slots for crosspoints, the switching matrix has shrunk almost out of sight. As a result, even very large CO switches, or toll switches for 100,000 trunks, today duplicate their switching matrices just as they duplicate control systems. Usually the group selector is duplicated as a whole, while the line groups contain duplicated concentrators.

Although line and individual trunk circuits are usually not duplicated for reliability, multiplexed trunks raise the ante. When one circuit board terminates a T-span, all 24 trunks in that digroup depend on the single board; it is no longer possible to "scatter" the individual trunks over a number of line groups for higher reliability as was common when trunks entered the switch as individuals. However, both transmission and switching are more reliable today than in the past, and trunk groups are often large enough to require several T-spans in parallel. Thus T-spans can be scattered, at least until switches make a practice of interfacing digital transmission systems at higher levels of multiplexing.

TEST FACILITIES

Internal testers and testing

A large portion of system software is devoted to checking the operation of the control and the distribution of its signals to line, trunk, and matrix circuits. As mentioned above, one approach uses both main and stand-by processors, running in parallel; by matching the operation of one control with the other, continuous checking is always in progress.

It is also desirable to run routine tests on off-line sub-systems, including the processors, memories, bus systems, etc., to locate faults that may not come up in the general course of operation. Such tests are usually internal and are clearly indigenous to the particular system. For externally driven tests, specialized load boxes, test call generators, etc., are often used during installation, and can be brought back at periodic intervals for additional independent testing.  Modern common controls are particularly clever at discovering their own weaknesses, but sometimes an external measure of their health is desirable.

Trouble and maintenance routines can be quite lengthy and complex. In larger systems, they are usually part of the overall system program; in smaller systems, they are sometimes kept off-line on tape or disk where memory is cheap, and are only loaded into RAM when required. Such a procedure simplifies the working memory and reduces costs; it also permits many more test routines to be made available.

As the cost of microprocessors has dropped, some systems have built in autonomous test processors to continually generate a variety of test calls, monitoring the response of the overall system to be sure it is working properly. These processors remove the load of routine testing from the system control, and can easily continue testing when heavy traffic would cause cancellation of such functions when run by the main processor, precisely the time when trouble analysis is most important.

For digit detectors serving customer lines, special test transmitters can be connected periodically to exercise them under extreme conditions such as maximum and minimum percent break and pulsing rate, tone frequencies both just inside and just outside the edge of each signaling frequency band (accepting the former and rejecting the latter), etc. When a form of signaling such as MF has both digit transmitters and receivers available, one can be tested against the other, preferably through a worst-case artificial line, as long as different transmitter-receiver pairs are used on each run.  Upon detection of a failure, further tests, using the suspicious transmitter and receiver with other units, will be required to find which was in trouble.
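
The follow-up tests after a failed pairing can be sketched as below. The function and the `passes` hook are hypothetical, and the sketch assumes the two partner units brought in for the cross-check are themselves in working order.

```python
def isolate_fault(tx, rx, other_tx, other_rx, passes):
    """Decide which unit of a failed transmitter-receiver pair is at fault.

    passes(t, r) is a hypothetical hook that runs one test call and
    returns True when receiver r correctly registers what t sent.
    other_tx and other_rx are assumed to be in working order.
    """
    tx_ok = passes(tx, other_rx)   # suspect transmitter, other receiver
    rx_ok = passes(other_tx, rx)   # other transmitter, suspect receiver
    if tx_ok and not rx_ok:
        return "receiver"
    if rx_ok and not tx_ok:
        return "transmitter"
    if not tx_ok and not rx_ok:
        return "both"
    return "intermittent"          # each unit passes with a new partner


# Simulated office in which receiver RX2 is the faulty unit.
faulty_units = {"RX2"}


def simulated_test(t, r):
    return t not in faulty_units and r not in faulty_units


verdict = isolate_fault("TX1", "RX2", "TX3", "RX4", simulated_test)
```

The "intermittent" outcome is worth keeping separate: a pair that fails together but passes when re-paired may be marginal only in combination, which is the reason for testing through a worst-case artificial line.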

Where analog tones are generated separately and distributed to tone ports on the switching matrix, the tone distribution system can be protected with alarm fuses; loss of dial tone, for instance, can make users think the whole system is down. However, tone detectors, connected periodically via the matrix, can also detect open circuits that will not blow fuses. Special receivers can check both amplitude and frequency components and, with a little care, their band-width can be adjusted to pick up drifting senders, including DTMF telephone sets, before the drift carries the signals out of the range of regular receivers. Similarly, special senders can check receivers for bandwidths that are too wide or too narrow, and amplitude thresholds that are too sensitive or insensitive.  It is just as important to check signaling equipment for its ability to reject invalid signals as to receive valid ones.

These test circuits, dealing with audio signals, can be given ports on the matrix, connected as needed, and cycled through their test procedures by the system program for both automatic tests and tests selected by maintenance personnel.  The generation of digital signaling and call progress tones, where amplitude is determined by PCM samples stored in ROM and frequency is controlled by the system clock which is synchronized via T Carrier throughout the country, should greatly simplify testing in modern systems, as can digital signal detectors which respond to signals in their digital rather than analog form.

Common channel signaling has already gone a long way toward eliminating MF and dial pulse signaling on trunks, along with traditional methods of supervision. Similarly, digital signaling to electronic telephones, both ISDN and PBX proprietary, is eroding the need for testing dial pulses, DTMF, and power ringing, although new technologies have testing requirements of their own.

Testing connections through metallic switching matrices, for both continuity and crosses to existing connections, was fairly simple. The false cross and ground ("FCG") test simply connected battery and ground to tip and ring respectively, through a detector, before cut-through to the trunk or service circuit took place. Because tip normally went to ground and ring to battery, a cross to another path (tip to tip or ring to ring) or a short or ground operated the detector. 

Metallic continuity was checked by transfer of supervision, or by attaching loop closures and supervisory detectors momentarily. Flow of ringing current was monitored on terminating calls to check continuity through the matrix, main distributing frame, and pair to the set, even when the phone at the far end was on-hook. The no-test connection was also used to gain test access from the line side of the matrix.

With most electronic and all digital matrices, the problem becomes much more difficult. The FCG test illustrates the point. One of the main faults picked up by FCG was a stuck crosspoint, contacts welded together or otherwise shorted, making a permanent path between a horizontal and a vertical in a crosspoint array. A new call, arriving on the same vertical but departing on a different horizontal, would actually be connected to two horizontals at the same time; if the undesired horizontal was in use on another call, high level cross-talk would result. With space division electronic matrices, a shorted crosspoint is perfectly possible, but connecting a detector with reversed battery and ground would alter the bias voltages needed to keep the desired crosspoints conducting. With a digital matrix using a Time-Space-Time architecture, corresponding problems can exist and must be dealt with if large numbers of calls are not to be affected. 

Because all digital switches are 4-wire, and any number of "listen" paths can be connected to the same time slot, a false cross to another conversation must be distinguished from a desired connection to a common source of call progress tones or recorded announcements.

Even continuity testing is more difficult; with the exception of the analog line groups in AT&T's 5ESS, line circuits are AC coupled to the matrix path, and DC supervision is not passed from the line circuit to another circuit on the other side of the matrix. However, when the line circuit remains in the talk path at all times, handling many functions on a per-line basis, adding some means of applying a voice frequency tone or a digital test pattern after the codec and detecting it at the far side is only a small incremental addition to cost and complexity.

In addition to per-call tests, many additional tests are required toward the customer loop and toward the switch itself. A small relay per line, as was shown in Fig. 8-1, is a convenient way to give a wide variety of test circuitry access at any port, looking outward or inward as desired.

Test lines

Not so long ago, when all toll calls were established by operators, the operators could check each connection as they set it up. If the connection was bad, they could immediately hold it for testing and connect a distinct tone to help maintenance forces locate the bad circuit. "Tone and hold" was also common in the tie-trunk networks of large companies when they used their switchboard attendants to set up inter-location calls.

With the coming of DDD, the operator vanished from the connection and with her the per-call monitoring for suitable transmission. This accelerated the need for automatic routineing of lines and trunks, and encouraged the provision of "test lines," special service circuits to which lines and trunks could be connected and which could apply tests and return test results to the system.

Test lines are also needed by telephone company installers to test the signaling and ringing capability of newly installed or reconnected lines. Such circuitry is quite similar to that needed for reverting calls, although audible signals are helpful to indicate the nature of failure. Test lines can also be used by the installer (or user) at the customer location in cooperation with personnel at a test center; the latter can walk the on-site person through various tests, and report back results as displayed.

Several test lines have been established over the years.  Some of the simpler ones simply return a tone--a 1000 Hz tone at 1 milliwatt, TLP, for instance. Dialing up such a test tone allows the caller to measure the returned level and, as a result, find the one-way loss in the facility. It is also possible to dial up special quiet connections for making noise measurements, and on trunks, to loop around to test both directions of transmission or echo suppressor operation. Some test lines test signaling and supervision while others are capable of testing data transmission.

Test lines usually return answer supervision so that 2600 Hz SF signaling, where still used, is removed during the test.  This poses some problems because SF signaling, in the presence of another signal, tends to stay in the state in which it is found. For this reason, test lines that return tones usually have periodic intervals of silence long enough to allow SF signaling to change from on-hook to off-hook or vice versa.  Without this refinement, release of the test line might be very difficult. Such problems will vanish with the passing of SF.

Because customers in the United States must own their own equipment, including PBXs and key systems in addition to telephone sets, there appears to be a need for test lines which can be used by customers to perform their own tests. Indeed, this might produce a modest source of revenue for telephone companies, and PBXs, in particular, could be programmed to make test connections on each CO trunk at night to test transmission and signaling.

ALIT

Automatic line insulation testing goes a step beyond line test circuits such as those used by installers to check dial speed or the telephone's ringer. ALIT is a totally automatic system that runs through all the lines in the office checking for line leakage, either tip to ring or tip or ring to ground.  It is a very powerful tool for maintaining outside plant.

ALIT is often put into operation during or right after rain storms so that cables affected by moisture can be detected. It only takes the system a few minutes to run through all the idle lines in an office; normally it skips over busy lines, and gives an indication of the lines so missed.

When ALIT is first installed, its threshold is set quite low; it just looks for lines with leakage resistances less than 10,000 ohms, for instance. After these worst offenders are found and improvements made, the threshold is set at 15,000 ohms. When these lines are cleared up, the threshold can again be increased.

Obviously, ALIT is the kind of function that can easily be built into a local switching system. All that is needed is a service circuit containing a suitable tester, and a program to cause it to be connected to each idle line in turn. The busy/idle status of each line is already known by the system control, and the maintenance display and print-out system is readily available.
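
A sketch of such a program is shown below. It assumes the tester reports a single leakage figure per line (the lowest of the tip-to-ring and tip- or ring-to-ground readings); the line numbers and resistance values are invented for the example.

```python
def alit_scan(lines, threshold_ohms):
    """Run through every line once, testing the idle ones.

    lines maps a line number to (busy, leakage_ohms).
    Returns the lines found below the threshold and the busy lines skipped.
    """
    leaky, skipped = [], []
    for number, (busy, leakage_ohms) in sorted(lines.items()):
        if busy:
            skipped.append(number)        # report the line as missed
        elif leakage_ohms < threshold_ohms:
            leaky.append(number)
    return leaky, skipped


# Invented sample data: line number -> (busy, leakage resistance in ohms).
office = {
    "555-0101": (False, 8_000),
    "555-0102": (False, 12_000),
    "555-0103": (True, 5_000),
    "555-0104": (False, 2_000_000),
}

first_pass = alit_scan(office, threshold_ohms=10_000)
later_pass = alit_scan(office, threshold_ohms=15_000)
```

Raising the threshold from 10,000 to 15,000 ohms brings in the next tier of marginal lines, while the busy line is reported as skipped on every pass until it can be tested.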

Although straightforward through metallic matrices, ALIT in digital CO switches requires the kind of access provided by the circuitry in Fig. 8-1. ALIT equipment can use the "outside test bus" to connect to each line in turn under system control. Note that simply bridging the customer line is not sufficient; the line circuit's shunt impedances and battery and ground must be disconnected to protect them from ALIT test voltages, and to prevent their interference with measurements on the outside plant. Circuitry on the customer's premises, no longer owned by the telephone company, must be designed to permit ALIT measurements to be made, or an interface per line provided to disconnect CPE during the test.

Other test boxes

As digital switching has been extended to local central offices, ALIT is only one of the sets of tests required at each line circuit; various tests must also look toward the switch.  Thus equipment associated with the "inside test bus" of Fig. 8-1 can check line supervision, application and trip of ringing, etc. on lines to analog phones, and the bit-stream and D-channel signals used with digital phones. In addition to automatic tests, such equipment must also be accessible to test personnel for manual testing from a centralized test desk.

The development of such test equipment, although still in its infancy, is one of the more interesting applications of what is generally referred to as "artificial intelligence."  Careful studies have been made of exactly what skilled craftspeople do to clear various kinds of faults, including such subtle observations as how fast a meter moves when connected to a circuit. Such information has been converted into sophisticated computer programs designed to run a matching test box which can be connected to the line to be tested and then directed to do what the craftsperson of yesterday would have done manually. Results are displayed for immediate use and storage in archives. Test boxes, being relatively inexpensive, can be located in each line group and with remote switching units.

As optical fiber in the local plant increases, such boxes with a whole new array of tests suitable for that medium will be needed. Test access will be more complex, because CATV may well share the transmission medium, with its multiplexer between the telephone switch and the outside world. Such requirements are more of a challenge than a problem, and will provide work for both programmers and hardware designers that may turn out to be more useful than video games, interactive X-rated entertainment, and other marvels of modern technology.

POSITIONS FOR OA&M ACCESS

In the days of SXS and Crossbar, Operations, Administration and Maintenance were carried out by a number of different groups. Some would answer trouble reports from customers while others might take customer orders for service changes. A different group would write up work orders specifying jumper changes at the MDF or wiring changes for class of service, while still other groups might make tests on the switches, the outside lines, and trunks to other central offices. In addition, there were other groups that kept records related to all this activity.

Stored Program Control, combined with inexpensive memory in the form of magnetic tape and hard disks, has made drastic changes. Although there are still craftspeople pulling jumpers on the MDF, splicing cables, and changing circuit boards, most of the OA&M effort can now be carried out by individuals manipulating information that interacts directly with system software. Most of the old interfaces such as jack fields, panels of blinking lights, and rows of keys and switches are now gone, replaced by glowing terminals or PCs sitting on desks in conventional offices, operated by computer-trained people who cause the system to carry out orders and keep appropriate records automatically.

Traditionally, it has been the practice in central offices to provide separate access positions for handling switching systems and transmission facilities. There should be enough access positions to accommodate several workers at the same time to facilitate system debugging and occasional bursts of trouble; because lines and trunks, although simpler, exist in vast quantities, line and trunk test positions will usually outnumber those intended for switching system access.

Even small switches have standard data ports for plugging in a terminal to act as an interface. In PBXs, the attendant console can sometimes take over the access role for maintenance and administration, as can electronic telephone sets with enough buttons and displays. However, the use of a VDT or PC is preferred for interactive operations or to facilitate displays and print-outs of related information. As the power of PCs has increased, a whole industry has sprung up to provide software to convert the austere information flows to and from the switch into a more useful and friendly user interface for traffic, CDR and management.

Master control

The master control for CO switches (see Fig. 8-2) interfaces the switching system for internal maintenance and testing. Elaborate status displays are usually provided so that active and standby controls, memories, etc., can be identified, along with made-busy circuits, areas with excessive trouble counts, and the like. Maintenance personnel must have available to them the results of routine tests run by the system, both on test calls and regular calls. Naturally, it must be possible for additional tests to be run under human direction.

Test result displays should be designed for easy interpretation; systems of 1ESS vintage compared cryptic test outputs with a "dictionary" listing the relationship between test output and one or more system faults for any given test.  This sort of comparison can be done far better by a computer than a human, with the computer providing a meaningful display.
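
As a rough illustration of letting the computer do the look-up, the sketch below keeps the "dictionary" as a table keyed by test and output code. The test name, codes, and suspect lists are all invented; they are not taken from 1ESS or any other system.

```python
# Hypothetical fault dictionary: (test name, output code) -> suspect units.
FAULT_DICTIONARY = {
    ("bus loop-back", "0x2F"): ["bus driver, cabinet 2"],
    ("bus loop-back", "0x31"): ["bus receiver, cabinet 2", "bus cable"],
}


def interpret(test_name, output_code):
    """Turn a cryptic test output into a line of text for the display."""
    suspects = FAULT_DICTIONARY.get((test_name, output_code))
    if suspects is None:
        return f"{test_name}: output {output_code} not in dictionary"
    return f"{test_name}: suspect " + " or ".join(suspects)


message = interpret("bus loop-back", "0x31")
```

An output with no dictionary entry is reported as such instead of being silently dropped, since an unlisted result is itself useful information for the maintenance force.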

For simpler interfaces, teletypewriters are still sometimes used, largely because they are well understood by the telephone industry. They have a keyboard for input and a printer for output and a visual record of test sessions. They can also run from pre-punched paper tape and can make machine readable paper tape for later input to a computer or another TTY. However, they tend to be slow, are relatively noisy, have high maintenance costs, and can hardly compare to the convenience of a PC supported by an inexpensive printer. Any terminal with a keyboard has some disadvantages, particularly for those who do not touch-type, but touch-screens and pointers such as the mouse or trackball, coupled with menus for easy selection of desired functions, leave older interfaces at the starting gate. Add the intricate displays made possible by computer graphics, and the direction for future user interfaces is obvious.

Line and trunk test panel

While the master control test center may be optimized to handle the relatively few subsystems common to large numbers of calls (digit receivers and senders, common controls, memories, buses, switchover facilities, etc.), a line and trunk test panel may be desirable to access the more numerous but less complex lines and trunks. Again, access to and display of system tests should be available, and any built-in routiners for lines, trunks, and perhaps certain related service circuits should also be available for human control. In particular, instruments for checking telephone sets, coin phones, PBX trunks and the like should have both automatic and manual control procedures.

In small telephone companies, a separate local test desk may not be provided; thus the line and trunk test panel may have to do the entire local test job. In larger companies, a centralized test desk installation for many switches may be used, as has been mentioned. Thus the line and trunk test panel must be able to work under a variety of circumstances, and share responsibilities with other test centers in a number of ways.

Remote test centers

One of the more important advantages of electronic switching is the appreciably lower failure rates encountered.  There is a standard story that an electronic office has a two-entity maintenance force, a human and a dog. The purpose of the dog is to keep the human from fooling with the equipment. There is a problem associated with too much reliability, however: maintenance people do not get enough opportunity to practice their skills and, as a result, tend to lose them.

Because it would be silly to increase the number of troubles just to exercise the maintenance force, a better alternative is being generally adopted for both CO and PBX systems. The maintenance force is centralized so that it can serve a number of switches, and these switches as a group generate enough troubles to keep work force skills at peak efficiency. With suitable terminals, this centralized work force can access the test and administration functions associated with any switch as though it were in the same building.

Major disasters show the risks which remote test centers may incur. The central office fire in Hinsdale, IL, in May, 1988, destroyed the CO, access to the remote test center, and, most important, communications back into the Hinsdale area, including those from the test center itself to fire, police and telephone maintenance forces at the scene. However, it is far more important to deal with such risks effectively than to permit hysterical politicians to take over system design.

Most PBXs are provided with an RS-232C port for test access; used with a modem and a private line to the CO, they can call and be called by a remote test center. A number of manufacturers maintain such centers (which AT&T calls RMATS, for Remote Maintenance, Administration and Test System), on-line 24 hours a day, to support the PBXs installed by their local distributors. Occasionally, the modem is given an extension number on the PBX; this saves the cost of a separate private line and, by forcing calls to that extension to go through the attendant, offers a degree of security. However, if the system is completely down, the extension cannot be reached unless power failure transfer is used from a particular trunk to the test port. When a maintenance group has to go to the site to fix a problem, it is desirable for them to have a separate access channel in addition to one used by the central location.

Perhaps the greatest disadvantage of remote test access, particularly in PBXs and voice mail systems, is vulnerability to unauthorized use. Computer hackers, phone phreaques, disgruntled employees and ex-employees and others can use the maintenance port to play pranks with the system, steal phone calls, and do other damage. Complex and frequently changed passwords are often recommended as a solution to this problem, but limiting dial-up access to specific times, personally verifying the caller, and calling back to provide access only to authorized phone numbers may be more effective.
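
Two of these measures, limited hours and call-back, combine naturally. The sketch below is one possible arrangement, not a description of any vendor's product; the user name, telephone number, and access window are invented.

```python
from datetime import time


def callback_number(user, now, authorized_numbers, opens, closes):
    """Return the number the system should call back, or None to refuse.

    The caller only identifies himself; the system then hangs up and
    dials the number on file, so access is never granted on the
    incoming call itself.
    """
    if not (opens <= now < closes):
        return None                       # outside the permitted window
    return authorized_numbers.get(user)   # None for an unknown user


authorized = {"field-tech-7": "555-0142"}   # hypothetical entry

granted = callback_number("field-tech-7", time(9, 30), authorized,
                          opens=time(8, 0), closes=time(17, 0))
refused = callback_number("field-tech-7", time(23, 0), authorized,
                          opens=time(8, 0), closes=time(17, 0))
```

Because the connection is always made outward to a number on file, a stolen password alone is not enough; the intruder would also have to be at the authorized telephone during the permitted hours.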

RECORD KEEPING

Chapter 2 discussed some of the advantages that stored program control offers telephone systems in terms of record keeping for both customer billing and system administration. Clearly, the switch itself must accumulate billing information in order to set up a call, and can easily acquire and store traffic information as it goes about its job. But this is just the beginning. The telephone directory, whether on-line or off-line, not only implies the nature of many system translations, but it also has direct implications for billing in that it contains the customer's name and address.

The deployment of 911 systems requires customer records of this sort to be made available at emergency switchboards as well as at system control centers; addresses are particularly important here so that a specific apartment in a large apartment complex can be identified immediately when necessary.  This problem is particularly difficult when customer service is provided by the apartment management using a PBX with DID; in such instances, it is not satisfactory to list the building manager's name and location as the customer.
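The difference between a billing record and a station-level location record can be shown with a small Python sketch. All names, numbers, and addresses below are invented for illustration, and the lookup function is a hypothetical helper, not part of any real 911 database interface.

```python
# Hypothetical records: names, numbers, and addresses are invented.
BILLING_RECORD = {
    "customer": "Building Manager",
    "location": "100 Main St, Office",
}

# One entry per DID number, giving the location of the station itself.
STATION_LOCATIONS = {
    "5550142": "100 Main St, Building C, Apt 4B",
    "5550143": "100 Main St, Building C, Apt 4C",
}


def emergency_location(calling_number):
    """Return the most specific location known for a calling number."""
    station = STATION_LOCATIONS.get(calling_number)
    if station is not None:
        return station
    # Fall back to the billing address, flagged so the operator knows
    # it may not be where the caller actually is.
    return BILLING_RECORD["location"] + " (billing address only)"
```

The fallback case is the unsatisfactory one described above: the emergency switchboard sees only the building manager's location.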

Internal to the switching system, inventory records for circuit boards must be kept. Many systems give each circuit board a serial number written in ROM and accessible to the system control. Thus the control knows which PCBs are in which slots, and can warn craftspeople when an attempt is made to associate an analog telephone with an ISDN line card, etc.  Such serial numbers can also indicate to the system which release of PCB is in place, important when software must conform to different hardware releases.
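A minimal Python sketch of such a slot inventory follows. The board types, the compatibility table, and the class and method names are assumptions made for illustration; an actual system control would read the serial number and release from the board's ROM rather than from a constructor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    serial: str      # serial number read from the board's ROM
    board_type: str  # e.g. "analog_line" or "isdn_line"
    release: int     # hardware release stored with the serial number


# Hypothetical table of which station types each board type can serve.
COMPATIBLE = {
    "analog_line": {"analog_phone"},
    "isdn_line": {"isdn_terminal"},
}


class Inventory:
    def __init__(self):
        self.slots = {}  # slot number -> Board

    def install(self, slot, board):
        self.slots[slot] = board

    def check_assignment(self, slot, station_type):
        """Return a warning string, or None if the assignment is acceptable."""
        board = self.slots.get(slot)
        if board is None:
            return f"slot {slot}: no board installed"
        if station_type not in COMPATIBLE.get(board.board_type, set()):
            return (f"slot {slot}: {station_type} cannot be assigned to "
                    f"{board.board_type} board {board.serial}")
        return None

    def boards_below_release(self, minimum):
        """List slots whose board release is older than the software needs."""
        return sorted(s for s, b in self.slots.items() if b.release < minimum)
```

The same table that catches a mismatched assignment also answers the release question before a software upgrade is loaded.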

Automatic and manual test results must be stored for future reference, retrievable in ways that facilitate the location of intermittent faults. The importance of historical records for detecting seasonal variations in system traffic has already been mentioned in Chapter 2.
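One way to make stored results useful for intermittent faults is to keep the pass/fail history for each circuit and look for circuits that fail repeatedly yet sometimes pass. The Python sketch below assumes a simple in-memory log; the class name, the circuit labels, and the threshold are hypothetical.

```python
from collections import defaultdict


class ResultLog:
    """Keeps pass/fail history per circuit, in the order tests were run."""

    def __init__(self):
        self.history = defaultdict(list)  # circuit id -> [(timestamp, passed)]

    def record(self, circuit, timestamp, passed):
        self.history[circuit].append((timestamp, passed))

    def intermittent(self, min_failures=2):
        """Circuits that failed repeatedly but also passed at least once.

        A circuit that fails every test is a hard fault; one that mixes
        passes and failures is the intermittent case.
        """
        suspects = []
        for circuit, results in self.history.items():
            outcomes = [passed for _, passed in results]
            failures = outcomes.count(False)
            if failures >= min_failures and True in outcomes:
                suspects.append((circuit, failures, len(outcomes)))
        # Worst offenders first.
        return sorted(suspects, key=lambda s: s[1], reverse=True)
```

A single test run would show such a circuit as either good or bad; only the accumulated history reveals the pattern.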

It will take some time before computer control, multiple processors and inexpensive memory even approach what they are capable of achieving in terms of system operation and management.

HUMAN INTERFACE

Human factors requirements have been mentioned from time to time throughout this book. Operator and attendant positions, maintenance positions and other monitors and display presentations all must be designed, checked by human factors experts, and field tested if a system is to perform satisfactorily. Short-cuts here, particularly when dictated by management or engineering considerations, may save money, but often at the expense of future reputation.

Other variables that need the attention of the human factors expert are frame heights, fuse panel locations, insertion and removal procedures for plug-in modules, color coding for various reasons, labeling apparatus for type of unit and specific identity, provision of proper illumination, etc. In older systems, standard procedures developed over the years have tended to make many requirements implicit. In new designs, if they are not considered explicitly, decisions made without regard for human factors may produce unsuspected problems later on.

Perhaps one of the most important opportunities for system design to benefit from human factors considerations lies in the training programs and instruction manuals developed by manufacturers for users, installation and maintenance people, sales personnel, and others who work with the system. Here is where the payoff shows up directly.

If the system is designed to serve the needs of the customers, and if reliability, maintenance and repair are properly planned, the proper design of training aids will put the frosting on the cake. The full advantage of good design will be brought home to everyone involved, and labor costs and down-time will be reduced and user satisfaction increased. Even the most automatic equipment is designed to serve people, and must, sometime or other, be served by people. Failure to consider this at all stages in the design procedure can lead to disaster, even though the system is a marvel of technology.

TERMS TO REMEMBER

  • Permanent signal

  • Line lock-out

  • CGA

  • Test line

  • ALIT

REVIEW QUESTIONS

1. Are built-in test equipment and software worthwhile?

2. How are transmission and switching related in terms of operations and maintenance?

3. How is line testing different from trunk testing?

4. What does a fuse protect?

5. How are alarm fuses useful?

6. How are permanent signals and partial dials detected?

7. What is "line lock-out"?

8. If a trunk or service circuit is found to be defective, what should be done?

9. Is CGA as important as it used to be?

10. What would you think if one trunk in a group has a very short holding time compared to the rest? A very long holding time?

11. What is Line Load Control?

12. What is needed to make sure a system control's order is actually carried out?

13. Because components will sooner or later fail, how can a system be made reliable?

14. Give three ways circuitry can be made redundant.

15. How can the path through a digital switching matrix be tested for continuity and crosses to other connections?

16. If you have the system connect a trunk to a test line in a distant CO and you get back a tone at the right level, is the trunk OK?

17. Will ALIT be important on copper lines used for ISDN BRI and PRI?

18. How does a switch's master control differ from the line and trunk test panel?

19. Which is more desirable: to design a system with such complex displays and printouts that craftspeople can have the satisfaction of passing a long and difficult training program before they go to work, or to design a system with such clear and logical outputs that almost no training is needed to take responsible action?


Copyright 2006 Lee Goeller. All Rights Reserved.