CABLE360     CABLEFAX MAGAZINE    CABLEFAX DAILY

September 1, 2008

Service Assurance Metrics

Meeting Higher Standards of Service

In the DOCSIS project, we started something critical that the cable TV industry still needs to finish. We designed methods to tell remotely, and in an automated fashion, whether any one subscriber or group of subscribers was experiencing (or about to experience) degraded service. Over the years, much of the original DOCSIS team reassembled and, with a great degree of success, developed further methods that allow cable operators to proactively address issues that, if left unaddressed, have the catastrophic effect of extended periods of degraded service to subscribers. Most importantly, we built ways to proactively evaluate every management team's ability to grow the network and provide the best possible service. The key to success is disciplined use of service assurance software.

Start at the top

To achieve optimal results in raising service levels and driving costs out of the business, cable TV executives embrace and encourage consistent use of assurance software throughout their respective organizations. The vice presidents and subordinate staff responsible for inside plant operations, outside plant operations, customer installations, service calls, upgrades, disconnects, and telephone customer service each use the same assurance software and are held accountable for delivered service levels.

A lossless aggregation of service levels enables management to identify those staff members or teams that need additional training or some other type of assistance in order to bring service levels for their areas of responsibility on par with their peers. Depending on corporate culture, management may choose to recognize: (A) those staff members who consistently provide customers with the highest service levels, or alternatively, (B) those with the lowest service levels - those who are "buying the beer on Friday."

In support of "no troubled home or business left behind," three principles are espoused by executive management to operational staff throughout the organization.

1. Every possible operational parameter is trended over time for every management area, topology segment, customer account, network and customer premises equipment (CPE) device. Operational parameters include: real-time service levels for voice, video and data; the number and type of inbound telephone calls; the number of truck rolls (including repeat truck rolls); correctable and uncorrectable packet error rates; and RF signal, noise and power levels in the HFC network.

2. Only data from the network itself is used to "certify" proper service levels before, during and after every installation or repair of CPE, inbound subscriber call to the call center, and HFC plant maintenance activity.

3. Every service degradation and outage is automatically pinpointed and prioritized by number of customers affected and length of time customers have been affected. Examples of service degradations include abnormal reset behavior of CPE, pixelated (macro-blocking) or frozen video, garbled or roboticized voice, slow surfing, etc.
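
The prioritization in the third principle can be sketched as a simple sort. This is a minimal illustration, not the tool's documented method: the record fields, the segment labels and the tie-breaking order (customers affected first, then duration) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Degradation:
    segment: str              # hypothetical topology label
    customers_affected: int   # number of customers currently affected
    hours_affected: float     # how long they have been affected

def prioritize(issues):
    """Most customers first; ties broken by the longest duration."""
    return sorted(issues, key=lambda d: (-d.customers_affected, -d.hours_affected))

# Made-up sample data for illustration only.
issues = [
    Degradation("Hub 41 / US 3", 12, 5.0),
    Degradation("Hub 42 / US 1", 85, 0.5),
    Degradation("Hub 41 / US 7", 12, 9.0),
]
ranked = prioritize(issues)
for d in ranked:
    print(d.segment, d.customers_affected, d.hours_affected)
```

An operator could just as reasonably weight duration more heavily than customer count; the point is only that both factors feed a single ordered work list.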

To ensure that "no troubled home or business is left behind," these three principles are embraced and become "mantras" throughout the organization. The result is an enlightened understanding and awareness of service levels that stems from a widely shared real-time view of all issues affecting individual subscribers and topology segments. Applying these principles has resulted in the creation and improvement of business processes at network and field operations centers, the creation of installation records and birth certificates, and the implementation of real-time whole-house and network checks for customer and technical service representatives.

Assisting operations teams

Tools available today deliver the right information to the right people to improve service levels and network availability. By aggregating a vast array of operational support metrics from network elements such as cable modem termination systems (CMTSs), call management servers (CMSs), cable modems, multimedia terminal adapters (MTAs), and video set-top boxes, these tools vigilantly check that the operational thresholds of these devices are not exceeded. And when thresholds are exceeded, the tools calculate and distribute in real time a hierarchy of availability metrics that quantify the level of service delivered by every plant segment to every piece of CPE. This sharply reduces the time to identify, prioritize and repair degraded service.

To assist technicians and management in day-to-day network operations, the availability metrics are aggregated over three dimensions: topology, time, and type of problem (e.g., packet loss or congestion). In the topology dimension, staff observe relative readiness and service levels for various segments of the HFC plant. Equipped with customized screens and profiles, staff who are members of distinct groups are quickly able to see just the segments of the HFC plant for which they are responsible. This makes a difference in service levels! For example, all corporate management staff have ready access to service level availability by topology (i.e., management team or geographic region) for every division as shown in the top row of Figure 1.

FIGURE 1: Topologies of interest to different levels of management

Likewise, all management in 'Division 6' has ready access to topologies for the entire division, or Eastern and Western Regions, or individual regions as shown in the second row of Figure 1. At the regional level, for example in 'Region 3,' the management and supervisors have ready access to topologies of interest as shown in the third row. By far the largest group of plant maintenance users is the plant technicians themselves, who often work in the field on wireless laptop computers that provide ready access to topologies of interest as shown in the bottom row of Figure 1.

Following the yellow regions and scanning back through Figure 1, plant maintenance technicians get a sense of how availability of even a few cable modems or MTAs on Hub 41, CMTS 1, Downstream 1, Upstream 1 will be aggregated (rolled up) into availability at higher levels - all the way up to the national level. It is important to note that availability has a unique quality of being a lossless form of compression - meaning that every fraction of an hour of degraded service is reported for any and every piece of DOCSIS CPE. In this way, no troubled home or business is ever left behind.
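
One way to picture this roll-up is to carry raw degraded hours and observed hours up the topology and compute availability only at the end, so that no fraction of an hour is averaged away. The sketch below assumes a particular definition (availability = 1 - degraded CPE-hours / observed CPE-hours), a simplified topology path and made-up numbers; none of these come from the tool itself.

```python
from collections import defaultdict

# (topology path, degraded CPE-hours, observed CPE-hours) per upstream.
# Paths and values are hypothetical.
leaves = [
    (("Division 6", "Region 3", "Hub 41", "CMTS 1", "Upstream 1"), 1.5, 200.0),
    (("Division 6", "Region 3", "Hub 41", "CMTS 1", "Upstream 2"), 0.0, 300.0),
    (("Division 6", "Region 3", "Hub 42", "CMTS 1", "Upstream 1"), 0.5, 500.0),
]

def roll_up(leaves):
    """Add each leaf's hours into every ancestor node on its path."""
    totals = defaultdict(lambda: [0.0, 0.0])  # node -> [degraded, observed]
    for path, degraded, observed in leaves:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            totals[node][0] += degraded
            totals[node][1] += observed
    return totals

def availability(totals, node):
    degraded, observed = totals[node]
    return 1.0 - degraded / observed

totals = roll_up(leaves)
print(availability(totals, ("Division 6", "Region 3", "Hub 41")))
print(availability(totals, ("Division 6",)))
```

Because sums rather than percentages are propagated, the 1.5 degraded hours on a single upstream remain visible in the division-level total.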

Recall that the second dimension over which service level availability is aggregated is time. Customized screens allow management, supervisors and plant maintenance technicians to easily see availability scores over time. This helps staff determine whether there are any recent or current service-affecting issues in their area. The focus is on providing information to those individuals who can improve the customer experience. For a cable system in a metropolitan area, where there may be more than a hundred plant maintenance technicians, customized screens show each technician only the issues in the areas for which he or she has responsibility.

An enormous efficiency is achieved by rendering real-time graphs of the "Top 5" worst issues in any area so that technicians do not have to look through all of the plant to identify the worst issues. Considering that Hub 41 has 127 upstreams across three CMTSs, this saves a technician a lot of time. Because it is easy to see (and hence work on) just the worst upstream(s), service quality in this serving area is more easily improved. Another item to consider is that if even one of the top 5 worst is currently within the thresholds defined for satisfactory service levels, then every upstream that ranks better than it, including all of the remaining 122, is also within those thresholds. There is great comfort and efficiency in knowing this without having to look at the remaining 122 upstreams, one by one.
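
The "Top 5" idea, and the shortcut it gives for the other 122 upstreams, can be sketched as follows. The metric (an integer error score where higher is worse), the synthetic values and the threshold are all hypothetical; only the counts of 127 upstreams and 5 worst come from the example above.

```python
import heapq

def top_worst(score_by_upstream, n=5):
    """Return the n upstreams with the highest (worst) score, worst first."""
    return heapq.nlargest(n, score_by_upstream.items(), key=lambda kv: kv[1])

def rest_within_threshold(worst, threshold):
    """If the mildest entry on the worst list is within the threshold,
    every upstream that did not make the list must be within it too."""
    return min(score for _, score in worst) <= threshold

# Synthetic scores for 127 upstreams (a deterministic shuffle of 0..126).
scores = {f"Upstream {i}": (i * 37) % 127 for i in range(1, 128)}

worst = top_worst(scores)
print(worst)
print(rest_within_threshold(worst, threshold=122))
```

The check only needs the five listed values; it never touches the remaining 122 entries.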

Considering the larger geographic area of responsibility of the supervisor of all County 1 maintenance technicians, a customized screen reports the top 5 worst across a larger topology of 746 upstreams in Hubs 41-48. Now, considering the even larger geographic area of responsibility of the manager of field operations for all of Region 3, a customized screen reports the top 5 worst across an even larger topology of 2,158 upstreams in Hubs 41-48, 61-63, 71-73, and 81-87 as shown in Figure 2.

FIGURE 2: Example of customized screen for the manager of field operations for all of Region 3

Similarly, one can see the top 5 worst upstreams for all of Division 6 or a whole region of the country. And the views aren't limited to codeword error rate (CER) and carrier-to-noise ratio (CNR); it is just as easy to see the top 5 most congested upstreams or downstreams, or the top 5 customers by data throughput, in the country - or in any neighborhood, city, county or region therein.

Don't search

There is powerful event-driven functionality to further assist technicians, supervisors and managers. To eliminate the steps required to log in and look for troubled areas, staff subscribe to events that notify a user when the tool has detected a particular connectivity or traffic condition that potentially affects customers. Notification may be in the form of an email or emailed text message. In this way, technicians and supervisors remain aware of any trouble in their geography (i.e., topology) around the clock if appropriate.

Different event notifications are useful to technicians, supervisors, regional managers, capacity planners, field operations centers and the network operations center (NOC). Events are made powerful by multiple simultaneous triggers that leverage frequency of occurrence, magnitude of occurrence and duration of observation to provide a sense of good-to-bad network transitions as well as bad-to-good transitions. These triggers are chosen so that staff are not swamped by too many alarms. For example, to make the good-to-bad transition, CNR must fall twice (frequency of occurrence) below 17 dB (magnitude of occurrence) in a two-hour period (duration of observation). And the bad-to-good transition will happen only after CNR has remained above the 17 dB threshold for two hours. For this example, the 17 dB threshold is illustrative; the actual dB threshold is calculated on-the-fly by the tool and is based on upstream settings such as 16-QAM (quadrature amplitude modulation) vs. QPSK (quadrature phase shift keying), amount of forward error correction (FEC), etc.
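
The CNR example above amounts to a small state machine with hysteresis. The sketch below is one possible reading of it: the class name, the 15-minute sample spacing and the fixed 17 dB threshold are assumptions (the article notes that the real threshold is computed from upstream settings), and a sample exactly at the threshold is treated as not below it.

```python
from collections import deque

class CnrEvent:
    """Good/bad state for one upstream, driven by CNR samples."""

    def __init__(self, threshold_db=17.0, min_dips=2, window_min=120):
        self.threshold_db = threshold_db  # magnitude of occurrence
        self.min_dips = min_dips          # frequency of occurrence
        self.window_min = window_min      # duration of observation
        self.dips = deque()               # times (minutes) of recent low samples
        self.bad = False

    def update(self, t_min, cnr_db):
        """Feed one sample; return 'good->bad', 'bad->good' or None."""
        if cnr_db < self.threshold_db:
            self.dips.append(t_min)
        # Forget dips that have aged out of the observation window.
        while self.dips and t_min - self.dips[0] >= self.window_min:
            self.dips.popleft()
        if not self.bad and len(self.dips) >= self.min_dips:
            self.bad = True
            return "good->bad"
        if self.bad and not self.dips:
            self.bad = False
            return "bad->good"
        return None

# Made-up samples: two dips below 17 dB, then steady 20 dB readings.
samples = [(0, 20.0), (15, 16.0), (30, 19.0), (45, 16.5)]
samples += [(t, 20.0) for t in range(60, 181, 15)]

event = CnrEvent()
transitions = []
for t, cnr in samples:
    change = event.update(t, cnr)
    if change:
        transitions.append((t, change))
print(transitions)
```

A single dip never raises an alarm, and the alarm clears only after two full hours without a dip, which is what keeps staff from being swamped by flapping notifications.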

Aggregation

Recall that the third dimension over which availability is aggregated is type of problem. All potential problems are arranged in a hierarchy called the "combiner" as shown in Figure 3.

FIGURE 3: The combiner shows an aggregate hierarchy of potential service-affecting issues.

The combiner shows potential service degradation from each type of problem and allows users to focus on the network issues that are most impacting availability.

For each of the dark and light blue boxes, there are upper and/or lower thresholds that define the limits of acceptable behavior of the CMTS, HFC network and CPE. When a threshold is exceeded, an accumulation function increments and accounts for how long any degradation occurs. For each box, there are three accumulators: severely degraded, degraded and non-degraded. Severely degraded is shown at left with a red underline, and degraded is shown at right with a yellow underline. The non-degraded accumulator is recorded but not shown.
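
The three accumulators for a single box can be sketched as below. The thresholds, the 15-minute polling interval and the sample values are hypothetical, and the sketch assumes a metric where higher is worse and only upper thresholds apply.

```python
def accumulate(samples, degraded_above, severe_above, hours_per_sample=0.25):
    """Add each sample's time slice to exactly one of three accumulators."""
    acc = {"severely_degraded": 0.0, "degraded": 0.0, "non_degraded": 0.0}
    for value in samples:
        if value > severe_above:
            acc["severely_degraded"] += hours_per_sample
        elif value > degraded_above:
            acc["degraded"] += hours_per_sample
        else:
            acc["non_degraded"] += hours_per_sample
    return acc

# Made-up CER readings (percent) over two hours of 15-minute polls.
cer_percent = [0.1, 0.2, 1.5, 6.0, 0.3, 2.0, 0.1, 0.0]
acc = accumulate(cer_percent, degraded_above=1.0, severe_above=5.0)
print(acc)
```

Because every sample lands in one accumulator, the three totals always sum to the observed time, which is what lets the non-degraded figure be recorded without being displayed.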

The dark blue boxes aggregate (add up) to form the traffic and connectivity measures of service degradation, which in turn aggregate to a single summary measure of degraded and severely degraded service. Note in Figure 3 that the summary metric indicates that there are 360 severely degraded pieces of CPE during this hour, of which 250 are because of uncorrectable upstream packet errors, and 110 are because of uncorrectable downstream packet errors. Depending on the time period of interest, the combiner can represent the last hour, day, week, month, quarter or year. The combiner may represent one or more pieces of CPE, segments of the HFC network, CMTSs, hubs, regions, management teams, etc. The combiner employs lossless compression of all service degradations so that no troubled home or business is unaccounted for or otherwise left behind.
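
The Figure 3 arithmetic (250 + 110 = 360) can be reproduced with a small tree sum. The box names and the tree shape below are hypothetical stand-ins for the combiner's real hierarchy, the congestion boxes are assumed to be zero because the article attributes all 360 to packet errors, and each piece of CPE is assumed to be counted under one box only.

```python
# Parent box -> child boxes (hypothetical names).
combiner = {
    "summary": ["connectivity", "traffic"],
    "connectivity": ["us_uncorrectable_errors", "ds_uncorrectable_errors"],
    "traffic": ["us_congestion", "ds_congestion"],
}

# Severely degraded CPE counts for the hour, per leaf box.
severely_degraded = {
    "us_uncorrectable_errors": 250,
    "ds_uncorrectable_errors": 110,
    "us_congestion": 0,   # assumed
    "ds_congestion": 0,   # assumed
}

def total(node):
    """Sum leaf counts beneath a box; a leaf returns its own count."""
    children = combiner.get(node)
    if children is None:
        return severely_degraded[node]
    return sum(total(child) for child in children)

print(total("connectivity"), total("traffic"), total("summary"))
```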

Recent successes

How does one know if these tools are helping operations teams improve the customer experience? Certainly there are traditional metrics that are tracked closely on a monthly basis such as the overall number and percentage of subscribers that are calling with service complaints, receiving service calls (truck rolls) and receiving repeat service calls. There is a high correlation between the traditional metrics and the availability metrics in the combiner. And all indications suggest that traditional metrics are improved in areas where cable TV operations teams have been using the tools described herein.

Robert F. Cruickshank III is VP Operations and Business Support Strategies for ARRIS. Drawn from a paper presented at SCTE Cable-Tec Expo 2008. Reach him at robert.cruickshank@arrisi.com.

Sidebar 1: Events of Interest to Operations Teams

These events monitor HFC connectivity issues such as RF levels, abnormal device resetting behavior and packet loss. CER events indicate packet loss and as such will alert staff to customer-impacting issues such as slow surfing, garbled voice, garbled streaming of Web radio and pixelated video. These events are of use to HFC plant operations personnel:

Upstream connectivity events

• US CER Extremely High: Any Interface - Fires when any upstream interface in your scope has extremely high CER for at least a minimum number of samples
• US CER High: Any Interface - Fires when any upstream interface in your scope has high CER for at least a minimum number of samples
• US CNR Extremely Low: Any Interface - Fires when any upstream interface in your scope has extremely low CNR for at least a minimum number of samples
• US CNR Low: Any Interface - Fires when any upstream interface in your scope has low CNR for at least a minimum number of samples
• CM Resets Extremely High: Any US Interface - Fires when an extremely high number of modem deregistrations is detected within a maximum number of minutes on an upstream interface in your scope
• CM Resets High: Any US Interface - Fires when a high number of modem deregistrations is detected within a maximum number of minutes on an upstream interface in your scope
• CMs Offline High: Any US Interface - Fires when any upstream interface in your scope has a high percentage of modems in an offline state

CMTS connectivity events

• CMTS Not Responding: Any - Fires when a CMTS in your scope has failed to respond to SNMP a high number of times within a maximum number of minutes
• CMTS Resets: Any - Fires when a CMTS reset is detected in your scope

Cable modem connectivity events

• DS CER Extremely High: Any CM - Fires when any modem in your scope has extremely high downstream CER for at least a minimum number of samples
• DS CER High: Any CM - Fires when any modem in your scope has high downstream CER for at least a minimum number of samples
• DS SNR Extremely Low: Any CM - Fires when any modem in your scope has extremely low downstream SNR for at least a minimum number of samples
• DS SNR Low: Any CM - Fires when any modem in your scope has low downstream SNR for at least a minimum number of samples
• US TX Power High: Any CM - Fires when any modem in your scope has high transmit power for at least a minimum number of samples
• US Path Loss High: Any CM - Fires when any modem in your scope has high upstream path loss for at least a minimum number of samples
• CM Flapping: Any CM - Fires when a single modem in your scope has reset a high number of times within a maximum period of time
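
As an illustration of how a count-within-a-period event such as CM Flapping might be evaluated, here is a minimal sliding-window sketch. The function name and the default limits (more than 3 resets inside 60 minutes) are assumptions; the sidebar does not state the actual values.

```python
from collections import deque

def is_flapping(reset_times_min, max_resets=3, window_min=60):
    """True if more than max_resets resets fall within any window_min span."""
    window = deque()
    for t in sorted(reset_times_min):
        window.append(t)
        # Drop resets that are too old to share a window with this one.
        while t - window[0] > window_min:
            window.popleft()
        if len(window) > max_resets:
            return True
    return False

print(is_flapping([0, 10, 20, 30]))     # four resets in 30 minutes
print(is_flapping([0, 50, 100, 150]))   # four resets spread over 150 minutes
```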

Sidebar 2: Events of Interest to Capacity Planning Teams

This set of events monitors HFC congestion. As such, these events will also alert staff to slow surfing and degraded best-effort services such as garbled voice, dropouts in Web radio and pixelated IP video. These events are of use to traffic engineering and capacity planning personnel.

Upstream traffic events

• US Utilization Extremely High: Any Interface - Fires when utilization of any upstream interface in your scope is extremely high for at least a minimum number of samples
• US Utilization High: Any Interface - Fires when utilization of any upstream interface in your scope is high for at least a minimum number of samples
• US Utilization Extremely High: % Interfaces - Fires when a configurable percentage of upstream interfaces in your scope has extremely high utilization for at least a minimum number of samples
• US Utilization High: % Interfaces - Fires when a configurable percentage of upstream interfaces in your scope has high utilization for at least a minimum number of samples
• US Utilization Extremely High: % CMs - Fires when a configurable percentage of modems in your scope resides on upstream interfaces that have extremely high utilization for at least a minimum number of samples
• US Utilization High: % CMs - Fires when a configurable percentage of modems in your scope resides on upstream interfaces that have high utilization for at least a minimum number of samples
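
The "% Interfaces" and "% CMs" variants can give quite different answers for the same plant, which a short sketch makes concrete. The field names, the 80 percent utilization threshold and the sample data are hypothetical, and the sketch leaves out the "minimum number of samples" persistence check for brevity.

```python
def pct_interfaces_high(upstreams, util_threshold):
    """Percentage of upstream interfaces at or above the utilization threshold."""
    high = [u for u in upstreams if u["utilization"] >= util_threshold]
    return 100.0 * len(high) / len(upstreams)

def pct_cms_on_high(upstreams, util_threshold):
    """Percentage of modems that sit on highly utilized upstream interfaces."""
    total = sum(u["modems"] for u in upstreams)
    on_high = sum(u["modems"] for u in upstreams
                  if u["utilization"] >= util_threshold)
    return 100.0 * on_high / total

# Made-up scope: one busy upstream carrying most of the modems.
upstreams = [
    {"utilization": 0.9, "modems": 300},
    {"utilization": 0.4, "modems": 100},
    {"utilization": 0.5, "modems": 50},
    {"utilization": 0.3, "modems": 50},
]

print(pct_interfaces_high(upstreams, 0.8))
print(pct_cms_on_high(upstreams, 0.8))
```

Here only a quarter of the interfaces are congested, yet well over half of the modems are affected, which is why a capacity planner may want to subscribe to both forms of the event.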

Downstream traffic events

• DS Utilization Extremely High: Any Interface - Fires when utilization of any downstream interface in your scope is extremely high for at least a minimum number of samples
• DS Utilization High: Any Interface - Fires when utilization of any downstream interface in your scope is high for at least a minimum number of samples
• DS Utilization Extremely High: % Interfaces - Fires when a configurable percentage of downstream interfaces in your scope has extremely high utilization for at least a minimum number of samples
• DS Utilization High: % Interfaces - Fires when a configurable percentage of downstream interfaces in your scope has high utilization for at least a minimum number of samples
• DS Utilization Extremely High: % CMs - Fires when a configurable percentage of modems in your scope resides on downstream interfaces that have extremely high utilization for at least a minimum number of samples
• DS Utilization High: % CMs - Fires when a configurable percentage of modems in your scope resides on downstream interfaces that have high utilization for at least a minimum number of samples

CMTS traffic events

• CMTS Processor Utilization High: Any - Fires when CMTS CPU utilization for any CMTS in your scope is high for at least a minimum number of samples





CABLE360 © 2008 Access Intelligence LLC. All Rights Reserved.