HOW TO: Interpret statistics in the UMP Web Monitor
Solution
 

Here is an example of the UMP Web Monitor page for a disk store, which is very similar to the output of a memory store:

 

2000: Source [10.29.3.42.16391 368207535.561976604]

Topic: "mytopic"

Last Activity: 14:37:31.092124

Repository: disk

  •    Receiver Paced Persistence: 0
  •     Message Map: 72846
  •     Window: [0, 0, 11c8d]
  •     Memory: 28491005 / 50331648 / 1073811024
  •     Age Threshold: 0
  •     Sync: [11c8d, 11c8d, 11c8d]
  •     In Progress: 0 / 0
  •     Offsets: 0 / 26743277 / 10737418240
  •     Active ULBs: 0 high 0
  •     Loss: 24 ULBs 0
  •     Drops: 0 / 0

LBM Stats: [LBTRM:10.29.3.42:14390:72f2421e:226.16.6.0:14400], received 72907/26455049, dups 876, loss 870, naks 7511/3521, ncfs 876-1-2-3, unrec 0/24 nak stm 3/867/9803, nak tx 0/1/43

 

Receivers:

 

 

The LBM Stats output reflects the transport statistics of the underlying receiver that the store creates to receive data from the source. In this case the transport is LBT-RM, so the details below are described in terms of an LBT-RM transport.
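To work with these numbers outside the web page, the LBM Stats line can be split into named fields. The following is a minimal sketch, assuming the line always has the layout of the single sample above; the regular expression and the field labels (for example `unrec_txw`, `nak_stm_max`) are illustrative choices and not an official UM parser.

```python
import re

# Sample line copied from the example output above.
LINE = ('LBM Stats: [LBTRM:10.29.3.42:14390:72f2421e:226.16.6.0:14400], '
        'received 72907/26455049, dups 876, loss 870, naks 7511/3521, '
        'ncfs 876-1-2-3, unrec 0/24 nak stm 3/867/9803, nak tx 0/1/43')

# Hypothetical pattern: the layout is inferred from this one sample line only.
PATTERN = re.compile(
    r'\[(?P<transport>\w+):(?P<src_ip>[\d.]+):(?P<src_port>\d+):'
    r'(?P<session_id>[0-9a-fA-F]+):(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\],?\s*'
    r'received (?P<datagrams>\d+)/(?P<bytes>\d+),\s*'
    r'dups (?P<dups>\d+),\s*'
    r'loss (?P<loss>\d+),\s*'
    r'naks (?P<naks_sent>\d+)/(?P<nak_packets>\d+),\s*'
    r'ncfs (?P<ncfs_ignored>\d+)-(?P<ncfs_shed>\d+)-'
    r'(?P<ncfs_rx_delay>\d+)-(?P<ncfs_unknown>\d+),\s*'
    r'unrec (?P<unrec_txw>\d+)/(?P<unrec_tmo>\d+),?\s*'
    r'nak stm (?P<nak_stm_min>\d+)/(?P<nak_stm_mean>\d+)/(?P<nak_stm_max>\d+),\s*'
    r'nak tx (?P<nak_tx_min>\d+)/(?P<nak_tx_mean>\d+)/(?P<nak_tx_max>\d+)'
)

# Fields that stay as text; every other field is converted to an integer.
TEXT_FIELDS = {'transport', 'src_ip', 'session_id', 'dst_ip'}


def parse_lbm_stats(line):
    """Return the fields of one LBM Stats line as a dictionary."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError('unrecognized LBM Stats line')
    return {name: (value if name in TEXT_FIELDS else int(value))
            for name, value in match.groupdict().items()}


stats = parse_lbm_stats(LINE)
print(stats['transport'], stats['datagrams'], stats['loss'], stats['unrec_tmo'])
```

Lines from other transports or other UM versions may be laid out differently, in which case the function raises `ValueError` rather than guessing.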

 

Note in particular the loss statistics, which can be confusing. The "Loss" value in the repository statistics (24 in this case) reflects unrecoverable loss for the "mytopic" topic. This unrecoverable loss is also reported in the store logs, in addition to any unrecoverable loss burst (ULB) statistics. The "loss" value in the LBM Stats line incorporates both recoverable and unrecoverable loss for the transport, while the "unrec" value in the LBM Stats line reflects only unrecoverable loss for the transport.
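As a rough illustration of how the transport-level figures relate, the sketch below subtracts the two "unrec" values (window advance and timeout, both described in the list that follows) from the "loss" value to estimate how many lost datagrams were eventually recovered. It assumes that both counters count LBT-RM datagrams over the same period; the function name is hypothetical and not part of any UM API.

```python
def split_transport_loss(loss, unrec_txw, unrec_tmo):
    """Split the transport "loss" counter into recovered and unrecovered parts.

    Assumption: "loss" and both "unrec" counters use the same unit
    (LBT-RM datagrams) and cover the same interval.
    """
    unrecovered = unrec_txw + unrec_tmo
    recovered = loss - unrecovered
    return recovered, unrecovered


# Values from the sample line: loss 870, unrec 0/24.
recovered, unrecovered = split_transport_loss(loss=870, unrec_txw=0, unrec_tmo=24)
print(f"recovered={recovered} unrecovered={unrecovered}")
```

Under that assumption, most of the detected loss in the sample was recovered by retransmission, and only a small part surfaced as unrecoverable loss.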

  • Source string [LBTRM:10.29.3.42:14390:72f2421e:226.16.6.0:14400]
    • Format: [TRANSPORT:Source IP:Source port:Session ID:Destination IP address:Destination port]
  • received 72907/26455049
    • Number of LBT-RM datagrams received: Depending on batching settings, a single LBT-RM datagram may contain one or more messages, or it may be a single fragment of a larger message. With LBT-RM, larger messages are split into datagrams based on the configuration option transport_lbtrm_datagram_max_size (default 8 KB). Note that this value (72907) roughly corresponds to the value 0x11c8d (72845 in decimal) in the Sync and Window fields of the webmon.
    • Bytes received: Number of LBT-RM datagram bytes received, i.e., the total length of all LBT-RM packets, including UM header information. Note that this corresponds to the "Memory" field under the Repository heading of the webmon.
  • dups 876
    • Number of duplicate LBT-RM datagrams received. Filtering out duplicates requires extra effort, so their cause should be investigated.
  • loss 870
    • The amount of loss that was detected. This statistic does not distinguish between recovered and unrecovered loss.
  • naks 7511/3521
    • The total number of NAKs that were sent, and the number of NAK packets into which they were batched.
      Note that the NAK count is much larger than the number of lost datagrams. During periods of heavy loss, multiple NAKs may be issued per lost datagram (controlled by configuration options transport_lbtrm_nak_generation_interval and transport_lbtrm_nak_backoff_interval) until either the retransmission is received or the datagram is declared unrecoverable.
  • ncfs 876-1-2-3
      Fields, in order: NCFs Ignored (ncfs_ignored) - NCFs Shed (ncfs_shed) - NCFs Retransmit Delay (ncfs_rx_delay) - NCFs Unknown (ncfs_unknown)
    • NCFs Ignored (ncfs_ignored): Number of NCFs received from a source transport with reason code "ignored". If a source transport receives a NAK for a datagram that it has recently retransmitted, it sends an "NCF ignored" and does not retransmit. How "recently" is determined by the source configuration option transport_lbtrm_ignore_interval (default 500 ms). If this count is high, a receiver transport may be having trouble receiving retransmissions, or the ignore interval may be set too long.
    • NCFs Shed (ncfs_shed): Number of NCFs received with reason code "shed". When a source transport's retransmit queue and rate limiter are both at maximum, it responds to a NAK by sending an "NCF shed", and does not retransmit. The receiver transport should wait, then send another NAK. If this count is high, one or more crybaby receiver transports may be clogging the source transport's retransmit queue.
    •  NCFs Retransmit Delay (ncfs_rx_delay): Number of NCFs received with reason code "rx_delay". When a source transport's retransmit rate limiter prevents it from immediately retransmitting any more lost datagrams, it responds to a NAK by sending an "NCF rx_delay", then queues the retransmission for a later send. The receiver transport should wait for the retransmission and not immediately send another NAK. If this count is high, one or more crybaby receiver transports may be clogging the source transport's retransmit queue.
    • NCFs Unknown (ncfs_unknown): Number of NCFs received with reason code "unknown". These are NCFs with a reason code that this receiver transport does not recognize. After a delay (set by configuration option transport_lbtrm_nak_suppress_interval; default 1000 ms), the receiver transport resends the NAK. This counter should never be greater than 0 unless applications linked with different versions of Ultra Messaging software coexist on the same network.
  • unrec 0/24
    • Unrecoverable loss due to transmission window advance (txw): Number of LBT-RM datagrams unrecovered (LBM_MSG_UNRECOVERABLE_LOSS delivered to the receiver application) because the transmission window advanced. This means that the message was no longer in the source-side transmission window and therefore could not be retransmitted. The window size is set by the source configuration option transport_lbtrm_transmission_window_size (default 24 MB).
    • Unrecoverable loss due to timeout (tmo): Number of LBT-RM datagrams unrecovered because a retransmission was not received within the NAK generation interval (set by configuration option transport_lbtrm_nak_generation_interval; default 10,000 ms). Note: Receivers for these messages' topics also report the related messages as unrecoverable, with LBM_MSG_UNRECOVERABLE_LOSS for an individual message and LBM_MSG_UNRECOVERABLE_LOSS_BURST for a burst loss event. However, these application-level declarations can occur even without increments to this counter, because the transport is unaware of the topic content of messages and may still be trying to recover the related lost packets. In this example, the value (24) matches the accumulated "Loss" statistic in the Repository section of the webmon.
  • nak stm 3/867/9803 
      NAK service time, i.e., the time in milliseconds for a lost message to be recovered (Min/Mean/Max)
    • Min: Minimum time (in milliseconds), i.e., the shortest time recorded so far for a lost message to be recovered. If this time is greater than configuration option transport_lbtrm_nak_backoff_interval, it may be taking multiple NAKs to initiate retransmissions, indicating a lossy network.
    • Mean: Mean time (in milliseconds) in which loss recovery was accomplished. This is an exponentially weighted moving average (weighted to more recent) for accumulated measured recovery times. Ideally this field should be as close to your minimum recovery time (nak_stm_min, above) as possible. High mean recovery times indicate a lossy network.
    • Max: Maximum time (in milliseconds), i.e., the longest time recorded so far for a lost message to be recovered. If this time is near or equal to the configuration option transport_lbtrm_nak_generation_interval setting, you have likely experienced some level of unrecoverable loss.
      In this case, the maximum recovery time (9803 ms) is close to 10 seconds, the default interval after which the receiver transport gives up and declares unrecoverable loss.
  • nak tx 0/1/43
      Number of times per lost message that the receiver transport transmitted a NAK (Min/Mean/Max)
    • Min: Minimum number of times per lost message that a receiver transport transmitted a NAK, i.e., the lowest value collected so far. A value greater than 1 indicates a chronically lossy network.
    • Mean: Mean number of times per lost message that a receiver transport transmitted a NAK. Ideally this should be at or near 1. A higher value indicates a lossy network. This is an exponentially weighted moving average (weighted to more recent) for accumulated NAKs per lost message.
    • Max: Maximum number of times per lost message that a receiver transport transmitted a NAK, i.e., the highest value collected so far. A value higher than 1 suggests that there may have been some unrecoverable loss on the network during the sample period. A significantly high value (compared to the mean number) implies an isolated incident. 
      In this case, the data shows that usually only one NAK was needed per lost message. In the worst case, however, 43 NAKs were issued for a single lost message.
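Several of the cross-checks described in the list above are simple arithmetic on the sample values. The following is a minimal sketch; the function name and the 0.9 "near the interval" ratio are hypothetical choices for illustration, and the 10,000 ms figure is the default for transport_lbtrm_nak_generation_interval as stated in this article.

```python
# Values copied from the example output above.
naks_sent, lost = 7511, 870
nak_stm_max_ms = 9803
nak_tx_mean, nak_tx_max = 1, 43
sync_hex = "11c8d"


def recovery_warnings(nak_stm_max_ms, nak_tx_mean, nak_tx_max,
                      generation_interval_ms=10_000, near_ratio=0.9):
    """Return plain-text warnings based on the rules of thumb in the list above.

    near_ratio is a hypothetical threshold for "near the NAK generation interval".
    """
    warnings = []
    if nak_stm_max_ms >= near_ratio * generation_interval_ms:
        warnings.append("max NAK service time is near the NAK generation interval")
    if nak_tx_mean > 1:
        warnings.append("mean NAKs per lost message is above 1")
    if nak_tx_max > 1:
        warnings.append("at least one lost message needed repeated NAKs")
    return warnings


# Average number of NAKs sent per datagram detected as lost.
naks_per_lost = naks_sent / lost
# Decimal form of the hexadecimal value shown in the Sync and Window fields.
sync_decimal = int(sync_hex, 16)
warnings = recovery_warnings(nak_stm_max_ms, nak_tx_mean, nak_tx_max)

print(f"NAKs per lost datagram: {naks_per_lost:.1f}")
print(f"Sync value in decimal: {sync_decimal}")
for text in warnings:
    print("warning:", text)
```

With the sample values, two of the three checks fire: the maximum service time is close to the give-up interval, and at least one lost message needed repeated NAKs, while the mean of 1 NAK per lost message raises no warning.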
More Information

Details of these statistics and those of other transports can be obtained from the receiver statistics structure reference in the API documentation; search for lbm_rcv_transport_stats_t_stct.

Reference

For more information on the different store page parameters, see Ultra Messaging® Guide for Persistence and Queuing > "Ultra Messaging® Web Monitor".

Applies To
Product: Ultra Messaging
Last Modified Date: 8/20/2014 10:11 PM
ID: 142353