Session Initiation Protocol Event Package for Voice Quality Reporting
Telchemy Incorporated, aspen@telchemy.com
Telchemy Incorporated, alan.d.clark@telchemy.com
Avaya, St. Louis, MO 63124, alan.b.johnston@gmail.com
Adobe Systems, Inc., henrys@adobe.com
SIPPING WG
This document defines a Session Initiation Protocol (SIP) event package that enables the collection and reporting of metrics that measure the quality of Voice over Internet Protocol (VoIP) sessions. Voice call quality information derived from RTP Control Protocol Extended Reports (RTCP-XR) and call information from SIP is conveyed from a User Agent (UA) in a session, known as a reporter, to a third party, known as a collector. A registration for the application/vq-rtcpxr MIME type is also included.
Real-time communications over IP networks use SIP for signaling, with RTP and RTCP for media transport and reporting, respectively. These protocols are very flexible and can support an extremely wide spectrum of usage scenarios. For this reason, extensions to these protocols must be specified in the context of a specific usage scenario. In this memo, extensions to SIP are proposed to support the reporting of RTP Control Protocol Extended Reports (RTCP-XR) [4] metrics.
RTP is utilized in many different architectures and topologies. RFC 5117 [13] lists and describes the following topologies: point to point, point to multipoint using multicast, point to multipoint using the RFC 3550 translator, point to multipoint using the RFC 3550 mixer model, point to multipoint using video switching MCUs, point to multipoint using RTCP-terminating MCU, and non-symmetric mixer/translators. As the abstract to this document points out, this specification is for reporting quality of Voice over Internet Protocol (VoIP) sessions. As such, only the first topology, point to point, is currently supported by this specification. This reflects both current VoIP deployments, which are predominantly point to point using unicast, and the state of research in the area of quality.
How to accurately report the quality of a multiparty conference or a session involving multiple hops through translators and mixers is currently an area of research in the industry. This mechanism could be extended to cover additional RTP topologies in the future once these topics progress out of the realm of research and into actual Internet deployments.
The usage scenarios addressed in this memo are situations where a SIP user agent can easily report the voice quality since it communicates with a small number of other endpoints:
1. Point-to-point VoIP conversations. These can include small telephony type multiparty scenarios, such as when using call transfer.
2. Conference calls using a central conferencing server when each SIP endpoint can report on the quality of their leg to the central conferencing server.
3. Multicast VoIP calls using source specific multicast (SSM). This is somewhat similar to the central conferencing scenario.
Distributed conferences with audio mixing in the endpoints may require reporting on too many call legs and may therefore be less practical if there are more than 3-4 participants.
The usage scenarios 1, 2, and 3 provide voice quality reports that are most closely related to the user experience, since the reporting application resides in the endpoints, such as SIP User Agents (UAs). Many SIP UAs, however, may have limitations as to the footprint of the software, and as a result frugal reporting capabilities are preferable.
RTCP reports are usually sent to other participating endpoints in a session, which can make collection of performance information by administration or management systems too complex. In the usage scenarios addressed in this memo, the data contained in RTCP XR VoIP metrics reports (RFC3611 [4]) are forwarded to a central collection server using SIP.
Applications residing in the server or elsewhere can aid in network management to alleviate bandwidth constraints and also to support customer service by identifying and acknowledging calls of poor quality. Specifying such applications is, however, beyond the scope of this paper.
Keeping it Simple
There is a large portfolio of quality parameters that can be associated with VoIP, but only a minimal necessary number of parameters are included in the RTCP-XR reports:
1. The codec type, as resulting from the SDP offer-answer negotiation in SIP,
2. The burst gap loss density and max gap duration, since voice cut-outs are the most annoying quality impairment in VoIP,
3. Round trip delay because it is critical to conversational quality,
4. Conversational quality as a catch-all for other voice quality impairments, such as randomly distributed packet loss, jitter, annoying silence suppression effects, etc.
In specific usage scenarios where other parameters are required, designers can include other parameters beyond the scope of this paper.
RTCP reports are best effort only, and though very useful have a number of limitations as discussed in [3]. This must be considered when using RTCP reports in managed networks.
This document defines a new SIP event package, vq-rtcpxr, and a new MIME type, application/vq-rtcpxr, that enable the collection and reporting of metrics that measure quality for RTP [3] sessions. The definitions of the metrics used in the event package are based on RTCP Extended Reports [4] and RTCP [3]; a mapping between the SIP event parameters and the parameters within the aforementioned RFCs is defined within this document in section 4.6.2.
Monitoring of voice quality is believed to be the highest priority for usage of this mechanism and, as such, the metrics in the event package are largely tailored for voice quality measurements. The event package is designed to be extensible. However, the negotiation of such extensions is not defined in this document.
The event package supports reporting the voice quality metrics for both the inbound and outbound directions. Voice quality metrics for the inbound direction can generally be computed locally by the reporting endpoint; however, voice quality metrics for the outbound direction are computed by the remote endpoint and sent to the reporting endpoint using the RTCP Extended Reports [4].
Configuration of usage of the event package is not covered in this document. It is the recommendation of this document that the SIP configuration framework [8] be used. The authors have defined a configuration dataset that would facilitate this support in section 5.8.
The event package SHOULD be used with the SUBSCRIBE/NOTIFY method; however, it MAY also be used with the PUBLISH method for backward compatibility with some existing implementations. Message flow examples for both methods are provided in this document.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119.
This document defines a SIP events package [5] for Voice over IP performance reporting. A SIP UA can send these events to an entity which can make the information available to other applications. For purposes of illustration, the entities involved in SIP vq-rtcpxr event reporting will be referred to as follows:
o REPORTER is an entity involved in the measurement and reporting of media quality i.e. the SIP UA involved in a media session.
o COLLECTOR is an entity that receives SIP vq-rtcpxr events. A COLLECTOR may be a proxy server or another entity that is capable of supporting SIP vq-rtcpxr events.
The REPORTER SHOULD send the voice quality metric reports using the NOTIFY method. The COLLECTOR SHALL send a SUBSCRIBE to the REPORTER to explicitly establish the relationship. Configuration of an address of the COLLECTOR is not needed as explained below.
The REPORTER MUST NOT send any vq-rtcpxr events if a COLLECTOR address has not been configured.
The REPORTER populates the Request-URI according to the rules for an in-dialog request.
The COLLECTOR MAY send a SUBSCRIBE to a SIP Proxy acting on behalf of the reporting SIP UAs.
A SIP UA that supports this specification MAY also send the service quality metric reports using the PUBLISH method; however, this approach SHOULD NOT be used in unmanaged Internet services. The PUBLISH method MAY be supported for backward compatibility with existing implementations.
The REPORTER MAY therefore populate the Request-URI of the PUBLISH method with the address of the COLLECTOR. To ensure security of SIP proxies and the COLLECTOR, the REPORTER MUST be configured with the address of the COLLECTOR, preferably using the SIP UA configuration framework [8], as described in section 5.8.
It is recommended that the REPORTER send an OPTIONS message to the COLLECTOR to ensure support of the PUBLISH message.
A voice quality metric report may be sent for each session terminating at the REPORTER and may contain multiple report bodies. For a multi-party call the report MAY contain report bodies for the session between the reporting endpoint and each remote endpoint for which there was an RTP session during the call.
Multi-party services such as call hold and call transfer can result in the user participating in a series of concatenated sessions, potentially with different choices of codec or sample rate, although these may be perceived by the user as a single call. A REPORTER MAY send a voice quality metric report at the end of each session or MAY send a single voice quality metric report containing a report body for each segment of the call.
Users of this extension should ensure they implement general SIP mechanisms for avoiding overload. For instance, an overloaded proxy or COLLECTOR MUST send a 503 Service Unavailable or other 5xx response with an appropriate Retry-After time specified. REPORTERs MUST act on these responses and respect the Retry-After time interval. In addition, future SIP extensions to better handle overload as covered in [14] should be followed as they are standardized.
To avoid overload of SIP Proxies or COLLECTORS it is important to do capacity planning and to minimize the number of reports that are sent.
Approaches to avoiding overload include:
a. Send only one report at the end of each call
b. Use interval reports only on "problem" calls that are being closely monitored
c. Limit the number of alerts that can be sent to a maximum of one per call.
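As a non-normative illustration, the overload rules above can be sketched from the REPORTER side as follows. The class and method names are hypothetical and are not defined by this document. The Retry-After value is assumed to have already been parsed into a whole number of seconds, and only approach (c), one alert per call, is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportScheduler:
    """Hypothetical per-call helper for a REPORTER; not part of this document."""

    blocked_until: float = 0.0  # earliest time (in seconds) a report may be sent
    alert_sent: bool = False    # approach (c): at most one alert per call

    def on_response(self, status: int, retry_after: Optional[int], now: float) -> None:
        """Record a 5xx response that carries a Retry-After interval."""
        if 500 <= status <= 599 and retry_after is not None:
            self.blocked_until = max(self.blocked_until, now + retry_after)

    def allow(self, kind: str, now: float) -> bool:
        """Return True if a report of this kind may be sent now.

        A permitted alert is recorded, so later alerts on the same call
        are refused.
        """
        if now < self.blocked_until:
            return False
        if kind == "alert":
            if self.alert_sent:
                return False
            self.alert_sent = True
        return True
```

A REPORTER would consult `allow()` before each report and call `on_response()` for each final response received from the proxy or COLLECTOR.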
This document defines a SIP Event Package as defined in RFC 3265 [5].
No event package parameters are defined.
SUBSCRIBE bodies are described by this specification.
Subscriptions to this event package MAY range from minutes to weeks. Subscriptions in hours or days are more typical and are RECOMMENDED. The default subscription duration for this event package is one hour.
There are three notify bodies: a Session report, an Interval session report, and an Alert report.
The Session report SHOULD be used for reporting when a voice media session terminates or when a media change occurs, such as a codec change or a session fork, and MUST NOT be used for reporting at arbitrary points in time. This report MUST be used for cumulative metric reporting and the report timestamps MUST be from the start of a media session to the time at which the report is generated.
The Interval report SHOULD be used for periodic or interval reporting and MUST NOT be used for reporting for the complete media session. This report is intended to capture short duration metric reporting and the report intervals SHOULD be non-overlapping time windows.
The Alert report MAY be used when voice quality degrades during a session. The time window to which an Alert report relates MAY be a short time interval or from the start of the call to the point the alert is generated; this time window SHOULD be selected to provide the most useful information to support problem diagnosis.
Session, Interval and Alert reports MUST populate the metrics with values that are measured over the interval explicitly defined by the "start" and "stop" timestamps.
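As a non-normative illustration, the non-overlapping time windows recommended for Interval reports can be generated as sketched below. The function name and the fixed step size are assumptions made for the example; this document does not prescribe how a REPORTER chooses its reporting intervals.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def interval_windows(
    start: datetime, stop: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive, non-overlapping (start, stop) windows.

    Each window begins exactly where the previous one ended, and the
    last window is truncated at `stop`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < stop:
        end = min(cursor + step, stop)
        yield cursor, end
        cursor = end


# Example: a 25-second session reported in 10-second intervals.
session_start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
windows = list(
    interval_windows(
        session_start,
        session_start + timedelta(seconds=25),
        timedelta(seconds=10),
    )
)
```

Each pair would supply the "start" and "stop" timestamps of one Interval report, with the metrics measured over exactly that window.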
Voice quality summary reports reference only one codec (payload type). This payload type SHOULD be the main voice payload, not comfort noise or telephone event payloads. For applications that consistently and rapidly switch codecs, the most used codec should be reported. All values in the report, such as IP addresses, SSRC, etc., represent those values as received by the REPORTER. In some scenarios, these may not be the same on either end of the session; the COLLECTOR will need logic to be able to put these sessions together. The values of parameters such as sample rate, frame duration, frame octets, packets per second, round trip delay, etc., depend on the type of report they are present in. If present in a Session or an Interval report, they represent average values over the session or interval. If present in an Alert report, they represent instantaneous values.
The REPORTER always shares local quality reporting information and
should, if possible, share remote quality reporting information. This
remote quality could be available from received RTCP-XR reports or other
sources. Reporting this is useful in cases where the other end might
support RTCP-XR but not this voice quality reporting.
This specification defines a new MIME type application/vq-rtcpxr which is a text encoding of the RTCP and RTCP-XR statistics with some additional metrics and correlation information.
This section describes the syntax extensions required for event publication in SIP. The formal syntax definitions described in this section are expressed in the Augmented BNF [6] format used in SIP [2], and contain references to elements defined therein.
Additionally, the definition of the timestamp format is provided in [7]. Note that most of the parameters are optional. In practice, most implementations will send a subset of the parameters. It is not the intention of this document to define what parameters may or may not be useful for monitoring the quality of a voice session, but to enable reporting of voice quality. As such, the syntax allows the implementer to choose which metrics are most appropriate for their solution. As there are no "invalid", "unknown", or "not applicable" values in the syntax, the intention is to exclude any parameters for which values are not available, not applicable, or unknown.
The authors recognize that implementers may need to add new parameter lines to the reports and new metrics to the existing parameter lines. The extension tokens are intended to fulfill this need.
Parameter values, codec types and other aspects of the endpoints may change dynamically during a session. The reported values of metrics and configuration parameters SHALL be the current value at the time the report is generated.
The Packet Loss Rate and Packet Discard Rate parameters are calculated over the period between the starting and ending timestamps for the report. These are normally calculated from a count of the number of lost or discarded packets divided by the count of the number of packets, and hence are based on the current values of these counters at the time the report was generated.
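As a non-normative illustration, the counter-based calculation described above can be sketched as follows. The function name is hypothetical. Using the number of packets expected as the denominator, and expressing the result as a percentage, are assumptions based on the fixed-point mapping discussed elsewhere in this document.

```python
from typing import Optional


def rate_percent(count: int, expected: int) -> Optional[float]:
    """Return lost (or discarded) packets as a percentage of packets expected.

    Both counters are sampled at the time the report is generated.
    Returns None when no packets were expected, so that the caller can
    omit the parameter rather than report a meaningless value.
    """
    if expected <= 0:
        return None
    return 100.0 * count / expected
```

Returning None for an empty window follows the convention that parameters without an available value are excluded from the report.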
Packet delay variation, signal level, noise level, and echo level are computed as running or interval averages, based on the appropriate standard (e.g., RFC3550 for PDV), and the sampled value of these running averages is reported.
Delay, packet size, jitter buffer size and codec related data may change during a session and the current value of these parameters is reported as sampled at the time the report is generated.
RFC3611 uses an 8 bit, fixed point number with the binary point at the left edge of the field. This value is calculated by dividing the total number of packets lost by the total number of packets expected and multiplying the result by 256, and taking the integer part.
For any RTCP XR parameter in this format, to map into the equivalent SIP vq-rtcpxr parameter, simply reverse the equation, i.e., divide the value by 256 to recover the fraction, which is then expressed as a percentage.
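As a non-normative illustration, the forward and reverse mappings can be sketched as below. The function names are hypothetical. The sketch reads the reverse mapping as the 8-bit value divided by 256 and scaled to a percentage; that scaling is an assumption drawn from this document's references to mapping percentages from 8-bit fixed-point numbers. Clamping the encoded value to 255 is likewise an assumption, made so that a ratio of exactly 1 still fits in 8 bits.

```python
def to_fixed8(lost: int, expected: int) -> int:
    """Encode lost/expected as an 8-bit fixed-point fraction.

    This is the integer part of (lost / expected) * 256, clamped to 255.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return min(255, (lost * 256) // expected)


def fixed8_to_percent(value: int) -> float:
    """Map an 8-bit fixed-point fraction back to a percentage."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in 8 bits")
    return value / 256 * 100
```

For example, 5 packets lost out of 100 expected encodes to 12, which maps back to 4.6875 percent; the difference from 5 percent is the quantization error of the 8-bit field.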
Following SIP and other IETF convention, timestamps are provided in Coordinated Universal Time (UTC) using the ABNF format provided in RFC 3339 [7]. These timestamps SHOULD reflect, as closely as possible, the actual time during which the media session was running to enable correlation to related events occurring in the network and to accounting or billing records.
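As a non-normative illustration, a UTC timestamp in the RFC 3339 format can be produced as sketched below. The function name is hypothetical, and the one-second resolution is an assumption; RFC 3339 also permits fractional seconds.

```python
from datetime import datetime, timezone


def rfc3339_utc(dt: datetime) -> str:
    """Format a time-zone-aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None:
        # A naive datetime has no defined relation to UTC; refuse it
        # rather than guess the local offset.
        raise ValueError("naive datetime; attach a time zone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = rfc3339_utc(datetime(2004, 10, 10, 18, 23, 43, tzinfo=timezone.utc))
```

Converting to UTC before formatting keeps reports from endpoints in different time zones directly comparable at the COLLECTOR.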
The parameters in this field provide a shortened version of the session SDP(s), containing only the relevant parameters for session quality reporting purposes. Where values may change during a session, for example when a codec changes rate, the most recent value of the parameter is reported.
This is the "payload type" parameter used in the RTP packets i.e. the codec. This field can also be mapped from the SDP "rtpmap" attribute field "payload type". IANA registered types SHOULD be used.
This parameter provides a text name for the codec used in this session.
This parameter is mapped from the SDP "rtpmap" attribute field "clock rate". The field provides the rate at which voice was sampled, measured in Hertz (Hz).
This parameter is not contained in RTP or SDP but can usually be obtained from the device codec. The field reflects the amount of voice content in each frame within the RTP payload, measured in milliseconds. Note this value can be combined with the FramesPerPacket to determine the packetization rate.
This parameter is not contained in RTP or SDP but is usually provided by the device codec. The field provides the number of octets in each frame within the RTP payload. This field is usually not provided when the FrameDuration is provided.
This parameter is not contained in RTP or SDP but can usually be obtained from the device codec. This field provides the number of frames in each RTP packet. Note this value can be combined with the FrameDuration to determine the packetization rate.
This parameter is not contained in RTP or SDP but can usually be obtained from the device codec. Packets per second provides the (rounded) number of RTP packets that are transmitted per second.
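As a non-normative illustration, the relationship between FrameDuration, FramesPerPacket, and the packetization rate can be sketched as follows. The function name is hypothetical, and the sketch assumes a constant frame duration and continuous transmission (no silence suppression).

```python
def packets_per_second(frame_duration_ms: float, frames_per_packet: int) -> int:
    """Derive the rounded packet rate from frame duration and frames per packet.

    Each packet carries frame_duration_ms * frames_per_packet
    milliseconds of voice, so the rate is 1000 divided by that figure.
    """
    if frame_duration_ms <= 0 or frames_per_packet <= 0:
        raise ValueError("arguments must be positive")
    return round(1000.0 / (frame_duration_ms * frames_per_packet))
```

For example, 10 ms frames sent two per packet and 20 ms frames sent one per packet both yield 50 packets per second.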
This parameter is taken directly from the SDP attribute "fmtp".
This parameter does not correspond to SDP, RTP, or RTCP XR. It indicates whether silence suppression, also known as Voice Activity Detection (VAD) is enabled for the identified session.
This value corresponds to "PLC" in RFC3611 in the VoIP Metrics Report Block. The values defined by RFC3611 are reused by this recommendation and therefore no mapping is required.
This field provides the IP address, port and synchronization source (SSRC) for the session from the perspective of the endpoint that is measuring performance. The IPAddress can be IPv4 or IPv6 format. The SSRC is taken from SDP, RTCP, or RTCP XR input parameters.
In the presence of NAT, the MAPPED-ADDRESS as reported by the STUN [9] server (RFC 5389) MUST be reported, since the internal IP address is not visible to the network operator.
This field provides the IP address, port and SSRC of the session peer from the perspective of the remote endpoint measuring performance. In the presence of NAT, the MAPPED-ADDRESS as reported by the STUN [9] server MUST be reported, since the internal IP address is not visible to the network operator.
This value corresponds to "JBA" in RFC3611 in the VoIP Metrics Report Block. The values defined by RFC3611 are unchanged and therefore no mapping is required.
This value corresponds to "JB rate" in RFC3611 in the VoIP Metrics Report Block. The parameter does not require any conversion.
This value corresponds to "JB nominal" in RFC3611 in the VoIP Metrics Report Block. The parameter does not require any conversion.
This value corresponds to "JB maximum" in RFC3611 in the VoIP Metrics Report Block. The parameter does not require any conversion.
This value corresponds to "JB abs max" in RFC3611 in the VoIP Metrics Report Block. The parameter does not require any conversion.
This value corresponds to "loss rate" in RFC3611 in the VoIP Metrics Report Block. For conversion, see "General mapping percentages from 8 bit, fixed point numbers".
This value corresponds to "discard rate" in RFC3611 in the VoIP Metrics Report Block. For conversion, see "General mapping percentages from 8 bit, fixed point numbers".
This value corresponds to "burst density" in RFC3611 in the VoIP Metrics Report Block. For conversion, see "General mapping percentages from 8 bit, fixed point numbers".
This value corresponds to "burst duration" in RFC3611 in the VoIP Metrics Report Block. This value requires no conversion; the exact value sent in an RTCP XR VoIP Metrics Report Block can be included in the SIP vq-rtcpxr parameter.
This value corresponds to "gap density" in RFC3611 in the VoIP metrics Report Block.
This value corresponds to "gap duration" in RFC3611 in the VoIP Metrics Report Block. This value requires no conversion; the exact value sent in an RTCP XR VoIP Metrics Report Block can be reported.
This value corresponds to "Gmin" in RFC3611 in the VoIP Metrics Report Block. This value requires no conversion; the exact value sent in an RTCP XR VoIP Metrics Report Block can be reported.
This value corresponds to "round trip delay" in RFC3611 in the VoIP Metrics Report Block and may be measured using the method defined in RFC3550. The parameter is expressed in milliseconds.
This value corresponds to "end system delay" in RFC3611 in the VoIP Metrics Report Block. This parameter does not require any conversion. The parameter is expressed in milliseconds.
This value is computed by adding Round Trip Delay to the local and remote End System Delay and dividing by two.
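As a non-normative illustration, the computation can be sketched as below. The function name is hypothetical. The sketch reads the sentence above as the sum of the Round Trip Delay and both End System Delays, divided by two, with all values in milliseconds; that reading is an assumption.

```python
def symmetric_one_way_delay(
    round_trip_delay_ms: float,
    local_end_system_delay_ms: float,
    remote_end_system_delay_ms: float,
) -> float:
    """Estimate one-way delay, assuming both directions are symmetric."""
    total = (
        round_trip_delay_ms
        + local_end_system_delay_ms
        + remote_end_system_delay_ms
    )
    return total / 2
```

With a round trip delay of 100 ms and end system delays of 20 ms and 30 ms, the result is 75 ms.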
This value SHOULD be measured using the methods defined in IETF RFC2679. The parameter is expressed in milliseconds.
Inter-arrival jitter is defined in RFC 3550. The parameter is expressed in milliseconds.
It is recommended that MAJ be measured as defined in ITU-T G.1020 [10]. This parameter is often referred to as MAPDV. The parameter is expressed in milliseconds.
This field corresponds to "signal level" in RFC3611 in the VoIP Metrics Report Block. This field provides the voice signal relative level, defined as the ratio of the signal level to a 0 dBm0 reference, expressed in decibels. This value can be used directly without extra conversion.
This field corresponds to "noise level" in RFC3611 in the VoIP Metrics Report Block. This field provides the ratio of the silent period background noise level to a 0 dBm0 reference, expressed in decibels. This value can be used directly without extra conversion.
This field corresponds to "RERL" in RFC3611 in the VoIP Metrics Report Block. This field provides the ratio between the original signal and the echo level in decibels, as measured after echo cancellation or suppression has been applied. This value can be used directly without extra conversion.
This field reports the listening quality expressed as an R factor (per G.107). This does not include the effects of echo or delay. The range of R is 0-95 for narrowband calls and 0-120 for wideband calls. Algorithms for computing this value SHOULD be compliant with ITU-T Recommendations P.564 [11] and G.107 [12].
This field provides a text name for the algorithm used to estimate ListeningQualityR.
This field corresponds to "R factor" in RFC3611 in the VoIP Metrics Report Block. This parameter provides a cumulative measurement of voice quality from the start of the session to the reporting time. The range of R is 0-95 for narrowband calls and 0-120 for wideband calls. Algorithms for computing this value SHOULD be compliant with ITU-T Recommendation P.564 and G.107. Within RFC3611 a reported R factor of 127 indicates that this parameter is unavailable; in this case the ConversationalQualityR parameter MUST be omitted from the vq-rtcpxr event.
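As a non-normative illustration, the handling of the "unavailable" value can be sketched as follows. The function name is hypothetical. Treating a value outside the stated range as an error is an assumption; this document does not say how a REPORTER should handle such values.

```python
from typing import Optional

R_UNAVAILABLE = 127  # RFC3611 marker for "parameter unavailable"


def map_r_factor(raw: int, wideband: bool = False) -> Optional[int]:
    """Map an RFC3611 R factor to the value reported in vq-rtcpxr.

    Returns None when the value is unavailable, in which case the
    caller omits the parameter from the report altogether.
    """
    if raw == R_UNAVAILABLE:
        return None
    upper = 120 if wideband else 95
    if not 0 <= raw <= upper:
        raise ValueError(f"R factor {raw} outside 0-{upper}")
    return raw
```

Returning None rather than a sentinel number mirrors the rule that the syntax has no "unknown" value; the parameter is simply left out.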
This field provides a text name for the algorithm used to estimate ConversationalQualityR.
This field corresponds to "ext. R factor" in RFC3611 in the VoIP Metrics Report Block. This parameter reflects voice quality as measured by the local endpoint for the incoming connection on the "other" side (refer to RFC3611 for a more detailed explanation). The range of R is 0-95 for narrowband calls and 0-120 for wideband calls. Algorithms for computing this value SHOULD be compliant with ITU-T Recommendation P.564 and G.107. Within RFC3611 a reported R factor of 127 indicates that this parameter is unavailable; in this case the ExternalR-In parameter MUST be omitted from the vq-rtcpxr event.
This field provides a text name for the algorithm used to estimate ExternalR-In.
This field corresponds to "ext. R factor" in RFC3611 in the VoIP Metrics Report Block. Here, the value is copied from the RTCP XR message received from the remote endpoint on the "other" side of this endpoint (refer to RFC3611 for a more detailed explanation). The range of R is 0-95 for narrowband calls and 0-120 for wideband calls. Algorithms for computing this value SHOULD be compliant with ITU-T Recommendation P.564 and G.107. Within RFC3611 a reported R factor of 127 indicates that this parameter is unavailable; in this case the ExternalR-Out parameter SHALL be omitted from the vq-rtcpxr event.
This field provides a text name for the algorithm used to estimate ExternalR-Out.
Conversion of RFC3611 reported MOS scores for use in reporting MOS-LQ and MOS-CQ MUST be performed by dividing the RFC3611 reported value by 10 if this value is less than or equal to 50 or omitting the MOS-xQ parameter if the RFC3611 reported value is 127 (which indicates unavailable).
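As a non-normative illustration, the MOS conversion rule can be sketched as below. The function name is hypothetical. The rule above does not say what to do with values between 51 and 126; rejecting them is an assumption made for the example.

```python
from typing import Optional

MOS_UNAVAILABLE = 127  # RFC3611 marker for "parameter unavailable"


def map_mos(raw: int) -> Optional[float]:
    """Convert an RFC3611 MOS value to the MOS-LQ / MOS-CQ scale.

    Returns None when the score is unavailable, in which case the
    MOS-xQ parameter is omitted from the report.
    """
    if raw == MOS_UNAVAILABLE:
        return None
    if 0 <= raw <= 50:
        return raw / 10
    raise ValueError(f"unexpected RFC3611 MOS value: {raw}")
```

For example, a reported value of 41 becomes 4.1, and 127 results in the parameter being left out.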
This field corresponds to "MOSLQ" in RFC3611 in the VoIP Metrics Report Block. This parameter is the estimated mean opinion score for listening voice quality on a scale from 1 to 5, in which 5 represents "Excellent" and 1 represents "Unacceptable". Algorithms for computing this value SHOULD be compliant with ITU-T Recommendation P.564 [11].
This field provides a text name for the algorithm used to estimate MOS-LQ.
This field corresponds to "MOSCQ" in RFC3611 in the VoIP Metrics Report Block. This parameter is the estimated mean opinion score for conversation voice quality on a scale from 1 to 5, in which 5 represents excellent and 1 represents unacceptable. Algorithms for computing this value SHOULD be compliant with ITU-T Recommendation P.564 with regard to the listening quality element of the computed MOS score.
This field provides a text name for the algorithm used to estimate MOS-CQ.
This field provides a text description of the algorithm used to estimate all voice quality metrics. This parameter is provided as an alternative to the separate estimation algorithms for use when the same algorithm is used for all measurements.
This section shows a number of message flow examples showing how the event package works.
It is the suggestion of the authors that the SIP configuration framework [8] be used to establish the necessary parameters for usage of vq-rtcpxr events. A dataset for this purpose should be designed and documented in a separate draft upon completion of the framework.
This document registers a new SIP Event Package and a new MIME type.
RTCP reports can contain sensitive information since they can provide information about the nature and duration of a session established between two or more endpoints. As a result, any third party wishing to obtain this information SHOULD be properly authenticated by the SIP UA using standard SIP mechanisms and according to the recommendations in [5]. Additionally the event content MAY be encrypted to ensure confidentiality; the mechanisms for providing confidentiality are detailed in [2].
The authors would like to thank Rajesh Kumar, Dave Oran, Tom Redman, Shane Holthaus and Jack Ford for their comments and input.
[10] ITU-T Recommendation G.1020, "Performance parameter definitions for quality of speech and other voiceband applications utilizing IP networks".
[11] ITU-T Recommendation P.564, "Conformance testing for voice over IP transmission quality assessment models".
[12] ITU-T Recommendation G.107, "The E-model, a computational model for use in transmission planning".