Sonoran Systems

Article KB0005 - Dated: 28-May-2004

Problem:

Occasionally the ArTrac Fault Management system misses an alarm or has a hanging alarm (an alarm that does not clear after the network element is no longer in alarm).

Solution:

There are three common causes of this problem.  They are:

Network Failures - ArTrac relies on the network to which it is connected to faithfully deliver event messages from the monitored network elements.  Frame slips, data collisions, data switch and router failures, data corruption, and a host of other transient network problems can compromise the delivery of messages to ArTrac.  Depending on the design and maintenance of the network, these types of problems can be very infrequent or occur quite often.

To identify possible network problems, you should perform the following:

  1. From the Chronological, Dynamic, or Map display screen, search the historical database of the ArTrac system for any failure messages that have occurred in the past month.  Search for any messages with an origin of "ERR" or "EXC".  Identify any messages reported by the Connection Manager that could indicate a network connectivity problem.  Look particularly for any messages which indicate that a connector has disconnected.  Address any problems that are noted.
  2. Open the Connection Manager administration screen and set the "Routing Type" to Analysis and File for any connectors which establish connectivity for the network element that you are having a problem with.  Allow the ArTrac system to record all network communications for the network element to file for one or more days.  After the recording period, open the recording file(s) in an editor and search for any unusual data which might suggest corruption in data communications.  In particular, look for unusual characters in the network element's messages, missing data or partial messages, out-of-sequence messages, repeating messages, or other structural problems in the data.  Any such corruption in the data can indicate a network problem.
  3. Place a bridged network analyzer on the ArTrac network connection and monitor the communications between ArTrac and the network element.  It would be best to allow the analyzer to watch the connection for several days.  Correct any network problems indicated by the analyzer.
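
For large recording files, the manual search described in step 2 can be supplemented with a small script.  The following Python sketch is illustrative only and is not part of ArTrac.  It assumes the recording file is plain text with one message per line, which may not match your network element.  The function name find_suspect_lines is hypothetical.  It flags lines containing non-printable characters and lines that exactly repeat the previous line.

```python
import string

# Assumption: messages are ASCII text, so anything outside
# string.printable is treated as a possible sign of corruption.
PRINTABLE = set(string.printable)


def find_suspect_lines(lines):
    """Return (line_number, reason) pairs for lines that look corrupted."""
    findings = []
    previous = None
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Unusual characters inside a message.
        if any(ch not in PRINTABLE for ch in text):
            findings.append((number, "non-printable character"))
        # A non-empty line identical to the one before it.
        if text and text == previous:
            findings.append((number, "repeats previous line"))
        previous = text
    return findings


# Small in-memory example; a real run would pass an open file object.
sample = [
    "ALM 001 LINK DOWN\n",
    "ALM 002 LI\x00K UP\n",
    "ALM 002 LI\x00K UP\n",
    "ALM 003 OK\n",
]
for number, reason in find_suspect_lines(sample):
    print(f"line {number}: {reason}")
```

To scan a real recording, pass an open file object in place of the sample list.  A flagged line is only a starting point for inspection; some network elements legitimately send repeated or binary data.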

Transmit Buffer Congestion in the Network Element - Depending on the design of the network element, the transmit buffer may be inadequate for the volume of messages that the network element must pass to the network.  In such cases, the network element may discard messages in overflow conditions.  If an important message is discarded, it will never reach the ArTrac system, which may result in an alarm being missed or in an alarm hanging (the clearing message was not received).  This is not the fault of the ArTrac system or the network but rather of the network element itself.

To identify possible buffer congestion problems in the network element, check the following:

Many switches, including those from Lucent, Nortel, Motorola, and Ericsson, can develop buffer congestion problems.  Open the Connection Manager administration screen and set the "Routing Type" to Analysis and File for any connectors which establish connectivity for the network element that you are having a problem with.  Allow the ArTrac system to record all network communications for the network element to file for one or more days.  After the recording period, open the recording file(s) in an editor and search for any of the following:

  1. Most network elements place a sequence number somewhere in the transmitted message.  Thus, each successive message received from the network element should have a sequence number that is one number higher than the previous message.  Check the contents of the recording file for any missing messages - locations in the file where the sequence number skips one or more numbers.  If any such occurrences are found, the network element is likely experiencing transmit buffer congestion.
  2. Perform a search in the recording file for any messages like "X messages discarded." or any other messages that the network element may transmit which indicate transmit buffer congestion.  If any such messages are found, the network element is experiencing transmit buffer congestion.

If any of the above conditions are found in the recording file, contact the manufacturer of the network element for information on how to resolve the problem.
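
The two checks above can also be scripted.  The following Python sketch is illustrative only and is not part of ArTrac.  It assumes one message per line, a sequence number written as "SEQ=" followed by digits, and a discard notice in which "X" is a number.  Both patterns and the function name find_congestion_signs are hypothetical and must be adjusted to the message format of your network element.

```python
import re

# Hypothetical patterns; real formats vary by network element.
SEQ_PATTERN = re.compile(r"SEQ[= ](\d+)")
DISCARD_PATTERN = re.compile(r"\d+\s+messages\s+discarded", re.IGNORECASE)


def find_congestion_signs(lines):
    """Return (gaps, discards) found in the recorded messages.

    gaps     - list of (line_number, previous_seq, current_seq)
    discards - list of line numbers carrying a discard notice
    """
    gaps = []
    discards = []
    last_seq = None
    for number, line in enumerate(lines, start=1):
        if DISCARD_PATTERN.search(line):
            discards.append(number)
        match = SEQ_PATTERN.search(line)
        if not match:
            continue
        seq = int(match.group(1))
        # A jump of more than one means messages are missing.
        # A lower number is treated as a counter wrap or restart
        # and is not reported as a gap.
        if last_seq is not None and seq > last_seq + 1:
            gaps.append((number, last_seq, seq))
        last_seq = seq
    return gaps, discards


# Small in-memory example; a real run would pass an open file object.
sample = [
    "SEQ=101 ALARM A",
    "SEQ=102 ALARM B",
    "SEQ=105 ALARM C",
    "3 messages discarded.",
    "SEQ=106 CLEAR A",
    "SEQ=001 RESTART",
]
gaps, discards = find_congestion_signs(sample)
print("sequence gaps:", gaps)
print("discard notices on lines:", discards)
```

If several network elements share one recording file, their sequence numbers will interleave, so filter the file by network element before running a check of this kind.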

Conflict in Rule Sets - ArTrac has a very solid alarm analysis engine that uses a system of checks to guarantee 100% reliability in the parsing of alarm messages.  The ArTrac database is, in its simplest description, a relational database, meaning that complex relationships can be established between rule sets and look-up tables.  There are occasions where the system administrator might create a rule set that is either so ambiguous that it interferes with other rule sets or directly conflicts with another rule set.  This can result in analysis failure of certain event messages received by the system.  By nature, this type of failure is difficult to locate.  Fortunately, ArTrac has a built-in Database Analysis feature which checks the integrity of its databases and performs a comprehensive evaluation of how rule sets interact structurally and relationally.  The Database Analysis feature will send a detailed report to the Chronological Display screen (and to the historical database) which will assist you in identifying any problems in your database.

To resolve rule set conflicts, perform the following:

  1. Open the Event Analyzer Administration screen from the ArTrac desktop.  Select the "Event Rules" tab from any rule set.  Click on the "Run DBase Analysis" button located under the list of look-up tables.
  2. From the Chronological, Dynamic, or Map display screen, search the historical database of the ArTrac system for messages with an origin of "DBA".  Correct any problems that are noted.

 

If you continue to experience this problem, contact your product support representative for further assistance.