1.1 Introduction
When sending a parcel, the sender can generally select from a range of contractual commitments from the postal courier service provider; that the parcel will arrive within two working days of being sent, for example. The commitments may include other parameters or metrics such as the number of attempts at redelivery if the first attempt is unsuccessful, and any compensation that will be owed by the courier if the parcel is late or even lost. The more competitive the market for the particular service, the more comprehensive and the tighter the commitments or service level agreements (SLAs) that are offered.
In the same way, within the networking industry the increased competition between Internet Protocol (IP) [RFC791] service providers (SPs) together with the heightened importance of IP applications to business operations has led to an increased demand and consequent supply of IP services with better defined and tighter SLAs for IP performance. These SLAs represent a contract for the delivery of the service; in this case, it is an IP transport service. The SLA requirements of a service need to be derived from the SLA requirements of the applications they are intended to support; customers utilizing the service rely on this contract to ensure that they can deliver the applications critical to their business. Hence, SLA definitions are key and it is essential they are representative of the characteristics of the IP transport service they define.
For an IP service, the service that IP traffic receives is measured using quality metrics; the most important metrics for defining IP service performance are:
- delay
- delay variation or delay-jitter
- packet loss
- throughput
- service availability
- per flow sequence preservation.
âQuality of serviceâ or QOS (either pronounced âQ-O-Sâ or âkwosâ) implies providing a contractual commitment (SLA) for these quality metrics. This contract may be explicitly defined; it is common for an IP transport service to have such an explicit SLA, for example. Alternatively, SLAs may be implied; for example, if you upgrade from a 2 Mbps DSL Internet connection to an 8 Mbps connection then you might expect that the service that you receive improves; however, this need not necessarily be the case. In this example â2 Mbpsâ and â8 Mbpsâ define the maximum rates for the service and as DSL services are commonly delivered using contended access networks, the actual usable throughput that users experience may be less than this maximum rate. Hence, the way in which the network has been engineered to deliver the service will determine the throughput that users receive, and it is possible that even though the userâs maximum rates are different, their attained usable throughput may be the same. Clearly, there is no incentive for end-users to upgrade from a 2 Mbps service to an 8 Mbps service if they do not perceive a difference between them. Hence, in reality there is an implied SLA difference between the two services even if it is not explicitly specified â that the 8 Mbps service will offer a higher attainable throughput than the 2 Mbps service.
Application and service SLA requirements are the inputs and also the qualification criteria for measuring success in a network QOS design; a network which provides a 500 ms one-way delay would clearly not be able to support a voice over IP (VoIP) service requiring a worst-case oneway delay of 200 ms. Similarly a network that provides a one-way delay of 50 ms may be over-engineered to support this service, and over-engineering may incur unnecessary cost. In price-sensitive markets, whether customers will be prepared to pay for the facility that QOS provides may depend in part on whether they can detect the effects of QOS; SLAs can provide a means to qualify the difference between services.
Although it is common for SPs, who provide virtual private network (VPN) services to enterprise organizations, to offer an explicit SLA to their enterprise customers, it is less common within enterprise organizations to define explicitly the SLAs that they engineer their networks to support. Nonetheless, enterprise networks support business-critical applications that have bounded SLA requirements; without an understanding of these requirements, it is not possible to engineer a network to ensure that they can be adequately supported without the risk of over- or under-engineering. An understanding of application SLA requirements is therefore as important in enterprise networks as in network SP environments.
In considering SLAs and SLA metrics, because they define a service contract, as with any contract the detail of the contract definition matters. In terms of SLAs for IP service performance, it is important to understand how the SLAs are numerically defined; SLAs may be defined in absolute terms, e.g. a worst-case one-way delay of 100 ms, or may be defined statistically, e.g. a loss rate of 0.01%. In the case of the statistical definition, defining a network loss rate of 0.01% is not sufficient information on its own to be able to determine if an application or service could be supported on that network. How the loss rate is measured and calculated needs to be defined in order to understand what impact the 0.01% loss rate will have on the end applications; 1 lost packet in every one hundred packets may not have a significant impact on a VoIP call, but 10 consecutive packets dropped out of 1000 will cause a glitch in the call that is audible to the end-user.
In order to remove some of the potential ambiguity around SLA definitions, the IP Performance Metrics [IPPM] Working Group (WG) within the Internet Engineering Task Force (IETF) was tasked with defining a set of standard metrics and procedures for accurately measuring and documenting them, which can be applied to the quality, performance, and reliability of IP services. The intent of the IPPM WG was to design metrics such that they can be measured by network operators, end-users, or independent testing groups. Their aim was to define metrics that do not represent a subjective value judgment (i.e. do not define âgoodâ or âbadâ), but rather provide unbiased quantitative measures of performance. [RFC2330] defines the âFramework for IP Performance Metricsâ within the IETF. It is noted, however, that the SLAs provided by network service providers to customers do not generally use the IPPM definitions; see Section 1.4 for a discussion on âmarketingâ versus âengineeringâ SLAs.
In the proceeding sections in this chapter, we consider the SLAâs metrics that are important for IP service performance in more detail, review the current industry status with respect to the standardization, and support of these metrics and then describe application SLA requirements and the impacts that these metrics can have on application performance.
1.2 SLA Metrics
1.2.1 Network Delay
SLAs for network delay are generally defined in terms of one-way delay for non-adaptive (inelastic) time-critical applications such as VoIP and video, and in terms of round-trip delay or round-trip time (RTT) for adaptive (elastic) applications, such as those which use the Transmission Control Protocol (TCP) [RFC793].
One-way delay characterizes the time difference between the reception of an IP packet at a defined network ingress point and its transmission at a defined network egress point. A metric for measuring one-way delay has been defined by [RFC2679] in the IETF.
RTT characterizes the time difference between the transmission of an IP packet at a point, toward a destination, and the subsequent receipt of the corresponding reply packet from that destination, excluding end-system processing delays. A metric for measuring RTT has been defined by [RFC2681] in the IETF.
Whether considering one-way delay or round-trip delay, the delays induced in a network are made up of the four following components.
1.2.1.1 Propagation Delay
Propagation delay is the time taken for a single bit to travel from the output port on a router across a link to another router. This is constrained by the speed of light in the transmission medium and hence depends both upon the distance of the link and upon the physical media used. The total propagation delay on a path consisting of a number of links is the sum of the propagation delays of the constituent links. Propagation delay is around 4 ms per 1000 km through coaxial cable and around 5 ms per 1000 km for optical fiber (allowing for repeaters).
In practice, network links never follow the geographical shortest path between the points they connect, hence the link distance, and associated propagation delay, can be estimated as follows:
- Determine the âas the crow fliesâ geographical distance D between the two end points.
- Obviously, the link distance must be longer than the distance as the crow flies. The route length R can be estimated from D, for example, using the calculation from International Telecommunications Union (ITU) recommendation [G.826], which is summarized in the following table.
The only way of controlling the propagation delay of a link is to control the physical link routing, which could be controlled at layer 2 or layer 3 of the Open Systems Interconnection (OSI) 7 layer Reference Model. If propagation delays for a link are too large, it may be that the link routing in an underlying layer 2 network is longer than it needs to be, and may be reduced by rerouting the link. Alternatively, a change to the network topology, by the addition of a more direct link for example, may reduce the propagation delay on a path.
1.2.1.2 Switching Delay
The switching or processing delay incurred at a router is the time difference between receiving a packet on an incoming router interface and the enqueuing of the packet in the scheduler of its outbound interface. Switching delays on high-performance routers can generally be considered negligible: for backbone routers, where switching is typically implemented in hardware, switching delays are typically in the order of 10â20 Îźs per packet; even for software-based router implementations, typical switching delays should only be 2â3 ms.
Little can be done to control switching delays without changing router software or hardware; however, as switching delays are generally a minor proportion of the end-to-end delay, this will not normally be justified.
1.2.1.3 Scheduling Delay
Scheduling (or queuing) delay is defined as the time difference between the enqueuing of a packet on the outbound interface scheduler, and the start of clocking the packet onto the outbound link. This is a function of the scheduling algorithm used and of the scheduler queue utilization, which is in turn a function of the queue capacity and the offered traffic load and profile.
Scheduling delays are controlled by managing the traffic load and by applying appropriate queuing and scheduling mechanisms (see Chapter 2, Secti...