What Is RTP (Real-Time Transport Protocol)? Meaning, Working, and Applications


Real-Time Transport Protocol (RTP) is a network communication protocol that typically runs on top of the User Datagram Protocol (UDP) and transports audio, video, and other media traffic in real time while helping receivers cope with jitter and packet loss. This article explains how RTP works, its critical applications, and associated protocols like RTCP and RTSP.

What Is RTP (Real-Time Transport Protocol)?

Real-Time Transport Protocol (RTP) is defined as a network communication protocol that typically runs on top of the User Datagram Protocol (UDP) and transports audio, video, and other media traffic in real time while helping receivers cope with jitter and packet loss.

RTP lets applications send real-time data, including video, audio, and simulation data, from one end of the network to the other over unicast or multicast network services. Many entertainment and communication systems that rely on streaming use RTP. To set up sessions between endpoints, RTP is usually paired with a signaling protocol such as the Session Initiation Protocol (SIP).

Even though RTP applications can run over Transmission Control Protocol (TCP), they typically use User Datagram Protocol (UDP) instead. UDP does not wait for acknowledgments or retransmit lost data, so packets arrive with less delay. On the sending side of a multimedia application, blocks of audio or video data are wrapped in RTP packets, and each RTP packet is in turn wrapped in a UDP datagram. In other words, RTP works on top of UDP.
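
The wrapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete RTP stack: the function name build_rtp_packet, the SSRC value, and the 160-byte dummy payload are assumptions chosen for the example, and the header has no CSRC entries or extension.

```python
import struct

RTP_VERSION = 2  # version number carried in the top two bits of the header


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prefix a media payload with a 12-byte fixed RTP header."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,             # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,   # 32-bit timestamp
                         ssrc & 0xFFFFFFFF)        # 32-bit SSRC
    return header + payload


# Hypothetical example: 160 bytes of silence as one 20 ms audio block.
packet = build_rtp_packet(b"\x00" * 160, seq=1, timestamp=160, ssrc=0x12345678)
```

The resulting bytes would then be handed to a UDP socket, for example with sendto() on a socket created with socket.SOCK_DGRAM, which adds the UDP header around the RTP packet.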

The Internet Engineering Task Force (IETF) first defined RTP in 1996 in RFC 1889; RFC 3550 replaced that specification in 2003. One can think of RTP as a sublayer of the transport layer since it provides multimedia applications with services like sequence numbers and timestamps. It also enables Voice over Internet Protocol (VoIP) apps to stream audio and video over the internet in real time. RTP itself does not provide delivery mechanisms such as multicasting or port numbers; it relies on the underlying protocols, typically UDP and IP, for these. RTP supports a variety of payload formats, including MPEG and MJPEG. Real-time media is sensitive to packet delay but can tolerate some packet loss.

Further, the RTP Control Protocol (RTCP) is often combined with RTP. While RTP delivers the media streams (such as audio and video), RTCP monitors transmission statistics and Quality of Service (QoS) and helps synchronize multiple streams. RTCP provides out-of-band statistics and control information for every RTP session. It does not carry any media data but instead aids in quality assurance.

RTP is a technical foundation of Voice over Internet Protocol (VoIP). It is usually used in tandem with signaling protocols like the Session Initiation Protocol (SIP), which set up the sessions. For the IP telephony protocols SIP and H.323, RTP is the standard for carrying the audio or video streams.

RTP takes a different approach from earlier network protocols. It is meant to be adaptable so that it carries the information a specific application demands, and it is typically built into the application’s processing instead of being implemented as a separate layer. RTP is deliberately left incomplete as a protocol framework. In contrast to conventional protocols, which gain functionality by making the protocol more general or by adding option mechanisms that require parsing, RTP is intended to be tailored as necessary by modifying or adding headers.

High-bandwidth RTP services, such as video, can seriously impair the quality of other network services. Therefore, implementers must take appropriate precautions to limit accidental bandwidth use. The application documentation should clearly state the limitations and the possible operational impact of high-bandwidth real-time services on the internet and on other network services.

How Does RTP Work?

RTP prioritizes the timely delivery and synchronization of audio and video over the integrity of the data being sent, and in practice it relies on other protocols. Chief among these is UDP, which is part of the TCP/IP architecture. Encapsulating RTP packets in UDP has several limitations, notably the lack of error correction. As a result, every lost or damaged packet is simply discarded rather than retransmitted.

In contrast to other transport protocols, RTP is usually implemented as part of the application itself. The RTP packet header carries a sequence number, a payload identifier, a frame indication, and several other fields, which are explained below:

1. Sequence number

If packets arrive out of order, the receiver must reorder them in real time. If a packet is lost, the loss must be detected and compensated for without retransmission. The sequence number, a 16-bit field, serves both purposes by assigning each RTP packet a serial number. The initial packet’s sequence number is chosen at random. For every RTP packet sent, the sequence number increases by one, so the receiver can use it to detect packet loss and restore the original packet order.

If, for example, the receiving application gets a stream of RTP packets with a gap between sequence numbers 76 and 79, it knows that packets 77 and 78 are missing from the stream. The receiver might then try to conceal the lost data. It’s worth noting that a voice call is made up of two unidirectional streams, each with its own starting value for the sequence numbers. The sequence numbers in one direction are therefore unrelated to those of the stream traveling in the other direction.
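
The gap check in this example can be sketched as follows. The helper name missing_sequence_numbers is hypothetical, and treating any backward jump of more than half the number space as a late or duplicate packet is a simplifying assumption; the arithmetic is done modulo 2^16 because the field is 16 bits wide and wraps around.

```python
def missing_sequence_numbers(prev_seq: int, curr_seq: int) -> list[int]:
    """Return the sequence numbers skipped between two received packets."""
    gap = (curr_seq - prev_seq) & 0xFFFF   # distance modulo 2**16
    if gap == 0 or gap >= 0x8000:
        return []                          # duplicate or late (reordered) packet
    return [(prev_seq + i) & 0xFFFF for i in range(1, gap)]


print(missing_sequence_numbers(76, 79))      # [77, 78]
print(missing_sequence_numbers(65534, 1))    # [65535, 0] across the wraparound
```

A real receiver would also keep a small reorder buffer so that a packet arriving slightly late is not immediately counted as lost.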

2. Synchronization Source Identifier (SSRC)

The SSRC is a 32-bit field that identifies the synchronization source. Its value is a random integer chosen by the source itself when a new stream begins; it is not the sender’s IP address. Each stream in an RTP session usually has its own SSRC, which lets the receiver tell streams apart even if two sources happen to start with the same sequence number. It is improbable that two streams will choose the same SSRC. Identifying the source of each stream supports both inter-media and intra-media synchronization.

  • Inter-media synchronization: If several types of media are used in a session, there must be a way to synchronize them so that, for example, the audio and video play in sync.
  • Intra-media synchronization: The amount of time that must elapse between successive packets being “played out.” For example, no data is normally sent during silent intervals in speech, so the receiver must accurately reproduce the duration of the silence.

3. Payload identification

On the internet, it is frequently necessary to change the encoding of the media (the “payload”) on the fly to adapt to changing bandwidth availability or to the capabilities of new users joining a group. Some method is therefore required to identify the encoding of each packet. The payload type is a 7-bit field that tells the recipient the format of the data in the packet. Its numeric value identifies the codec used to encode the media.

In the standard audio/video profile, low numbers (0-23) are assigned to audio codecs, whereas the next range of values is assigned to video, and applications can define other payload types dynamically. Each payload type denotes a distinct audio or video encoding. At any given moment, an RTP source sends a single payload type, although it may switch during a session. For instance, in the original audio/video profile, payload type “1” carried the encoding name 1016, which indicates that the stream is encoded using FS-1016 speech coding.
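
As an illustration, the payload type can be read from the second header byte and looked up in a table. The table below is a small, deliberately incomplete excerpt of the static assignments in the standard audio/video profile, and the function name describe_payload_type is an assumption for this sketch.

```python
# Excerpt of static payload types: number -> (encoding name, media, clock rate in Hz)
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "audio", 8000),
    3: ("GSM", "audio", 8000),
    8: ("PCMA", "audio", 8000),
    9: ("G722", "audio", 8000),
    18: ("G729", "audio", 8000),
    26: ("JPEG", "video", 90000),
    31: ("H261", "video", 90000),
    34: ("H263", "video", 90000),
}


def describe_payload_type(second_header_byte: int) -> str:
    """Mask off the marker bit and describe the 7-bit payload type."""
    pt = second_header_byte & 0x7F
    if pt in STATIC_PAYLOAD_TYPES:
        name, media, rate = STATIC_PAYLOAD_TYPES[pt]
        return f"{pt}: {name} ({media}, {rate} Hz)"
    if 96 <= pt <= 127:
        return f"{pt}: dynamic (meaning agreed through signaling)"
    return f"{pt}: not in this sketch's table"


print(describe_payload_type(0x80))  # marker bit set, payload type 0
```

Payload types in the dynamic range have no fixed meaning; the two endpoints agree on them when the session is set up.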

4. Frame indication

The RTP header includes a frame indicator in the form of the marker bit, whose exact meaning is defined by the profile in use. Video and audio are delivered in frames, which are logical units, and to help coordinate delivery to higher layers it is important to signal the beginning or end of a frame to the receiver. For instance, a voice application may set the marker bit at the beginning of a talkspurt.
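
A rough sketch of how a receiver might use the marker bit follows. It assumes a video-style convention in which the marker is set on the last packet of each frame, that every packet has only the fixed 12-byte header, and that packets are already in order; the function names are hypothetical.

```python
def marker_bit(packet: bytes) -> bool:
    """The marker is the top bit of the second header byte."""
    return bool(packet[1] & 0x80)


def group_into_frames(packets: list[bytes]) -> list[bytes]:
    """Concatenate payloads until a packet with the marker bit closes the frame."""
    frames, current = [], []
    for pkt in packets:
        current.append(pkt[12:])           # skip the fixed 12-byte header
        if marker_bit(pkt):
            frames.append(b"".join(current))
            current = []
    return frames                          # an unfinished trailing frame is dropped


# Two packets forming one frame: only the second has the marker bit set.
first = bytes([0x80, 0x60]) + bytes(10) + b"part1-"
last = bytes([0x80, 0xE0]) + bytes(10) + b"part2"
print(group_into_frames([first, last]))    # [b'part1-part2']
```

For a voice profile the same bit would instead flag the first packet of a talkspurt, so the grouping logic depends on the profile in use.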

5. The timestamp

The timestamp is a 32-bit field in RTP data packets that relates the timing of RTP packets to one another. The timestamp of the first packet is chosen at random, and each subsequent timestamp reflects the instant at which the first byte of that packet’s payload was sampled, measured in clock ticks from that starting value. The duration of one clock tick varies depending on the application.

If the source is being actively sampled, the timestamp clock advances by one at the end of each sampling period. Even when the source is inactive, the timestamp clock continues to advance at the same pace. Whatever the packetization technique or period, the data units must fit inside the payload and fall on whole-octet boundaries. The timestamp can also be compared with arrival times. The variation in packet transit time that this comparison reveals is called jitter.
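
The jitter estimate can be sketched with the running average commonly used by RTP receivers, in which each new transit-time difference moves the estimate by one sixteenth. The sample values are invented for illustration, and arrival times are assumed to be expressed in the same clock ticks as the RTP timestamps.

```python
def interarrival_jitter(packets: list[tuple[int, int]]) -> float:
    """Estimate jitter from (rtp_timestamp, arrival_time) pairs in clock ticks."""
    jitter = 0.0
    prev_transit = None
    for rtp_ts, arrival in packets:
        transit = arrival - rtp_ts                 # relative transit time
        if prev_transit is not None:
            d = abs(transit - prev_transit)        # change in transit time
            jitter += (d - jitter) / 16.0          # smoothed running estimate
        prev_transit = transit
    return jitter


# Packets sent every 160 ticks; the third one arrives 20 ticks late.
samples = [(0, 1000), (160, 1160), (320, 1340), (480, 1480)]
print(interarrival_jitter(samples))                # 2.421875
```

Dividing by 16 smooths out single spikes, so one late packet raises the estimate only slightly.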

6. Contributor identifier

A single RTP packet can carry samples that originate from several sources, for example after a mixer has combined them. These sources are distinguished using the contributor identifier.

In this case, RTP must offer a way to tell the contributing sources apart so that the receiver can attribute the media correctly. Therefore, each contributing source (CSRC) identifier is a 32-bit value that names one of the sources participating in the mixed packet. The mixer places its own identity in the synchronization source field, whereas the original sources are listed as contributor identifiers.

A separate 4-bit field in the header, the CSRC count, states how many contributor identifiers follow. If it is zero (0000 in binary), the packet comes from a single source; if it is nonzero, it gives the number of contributing sources listed. Different synchronization source identifiers are used when both audio and video come from the same node so that receivers do not confuse the two types of data.
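
Reading the contributor list can be sketched as below. The 4-bit count sits in the low bits of the first header byte, and the 32-bit identifiers follow the 12-byte fixed header; the function name parse_csrc_list and the example identifiers are assumptions for illustration.

```python
import struct


def parse_csrc_list(packet: bytes) -> list[int]:
    """Return the contributing source identifiers listed in an RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    count = packet[0] & 0x0F                  # 4-bit CSRC count
    end = 12 + 4 * count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    return list(struct.unpack(f"!{count}I", packet[12:end]))


# Hypothetical mixed packet: version 2, CSRC count 2, two contributor IDs.
mixed = bytes([0x82, 0x00]) + bytes(10) + struct.pack("!II", 0x11, 0x22)
print(parse_csrc_list(mixed))                 # [17, 34]
```

A packet straight from a single source has a count of zero, so the function returns an empty list.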

Even when two nodes exchange audio and video streams through the same program, as when Skyping with a webcam and microphone, separate synchronization sources are likely to be used for each stream. Conversion from one codec to another is also possible.

Top 4 Applications of RTP

Next, let us look at some key ways the RTP protocol is used. Keep in mind that these examples were chosen to show how RTP applications work and not to imply a limitation on RTP’s capabilities.

1. Conferencing using audio and video

When audio and video are used in the same conference, they are sent as separate RTP sessions. Separate RTP and RTCP packets are sent for each medium, using two different UDP port pairs and/or multicast addresses. The video and audio sessions are not directly coupled at the RTP level; nevertheless, a user taking part in both sessions must use the same distinguished (canonical) name in the RTCP packets of both so that the sessions can be associated.

One reason for this separation is to allow some conference attendees to receive only one medium if they want. Despite the separation, a source’s video and audio can be played back in sync by using the timing information carried in the RTCP packets of both sessions. RTP also allows data to be sent in parallel to numerous destination endpoints through IP multicast.
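
The synchronization step can be sketched as follows. An RTCP sender report pairs an RTP timestamp with a wall-clock time, so any later RTP timestamp from the same stream can be mapped onto the shared wall clock. The function name and the numbers are invented for illustration, and wall-clock times are simplified to plain seconds.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_wallclock: float,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock seconds using a sender report pair."""
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF      # 32-bit wraparound
    if delta >= 0x80000000:
        delta -= 0x100000000                       # timestamp precedes the report
    return sr_wallclock + delta / clock_rate


# Audio at 8000 Hz and video at 90000 Hz, each with its own sender report.
audio_time = rtp_to_wallclock(24000, sr_rtp_ts=16000, sr_wallclock=100.0,
                              clock_rate=8000)
video_time = rtp_to_wallclock(945000, sr_rtp_ts=900000, sr_wallclock=100.5,
                              clock_rate=90000)
print(audio_time, video_time)                      # 101.0 101.0
```

Because both packets map to the same wall-clock instant, a player would present them together even though their RTP timestamps are unrelated.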

The session itself is set up by a signaling protocol such as the Session Initiation Protocol (SIP). Media transmission begins once the sender and recipient have established a session. The sending computer divides each frame into packets, and the receiver reconstructs the frames from the packets it receives. Every packet begins with an RTP header that is at least 12 bytes long, and UDP is used to carry the packets.

2. Translation and media mixing 

Consider a scenario in which most conference participants enjoy high-bandwidth network connectivity, but a small group in one location is connected through a low-speed link. Rather than forcing everyone to use a lower-bandwidth, lower-quality audio encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area.

The mixer resynchronizes incoming audio packets to recreate the constant 20 ms spacing generated by the sender, mixes the reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one, and forwards the lower-bandwidth packet stream across the low-speed link. These packets may be unicast to a single recipient or multicast to multiple addresses.
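
The mixing step alone can be sketched as below, assuming the incoming streams have already been decoded to 16-bit linear PCM and aligned in time. Resynchronization and re-encoding to a lower-bandwidth codec are omitted, and the function name and SSRC values are hypothetical.

```python
def mix_streams(streams: dict[int, list[int]]) -> tuple[list[int], list[int]]:
    """Sum aligned 16-bit PCM samples from several sources, clamping overflow.

    `streams` maps each source's SSRC to its decoded samples. Returns the
    list of contributing SSRCs and the mixed samples.
    """
    contributors = sorted(streams)
    mixed = [max(-32768, min(32767, sum(frame)))   # clamp to the 16-bit range
             for frame in zip(*streams.values())]
    return contributors, mixed


contributors, mixed = mix_streams({
    0xAAAA: [1000, 30000, -30000],
    0xBBBB: [2000, 10000, -10000],
})
print(contributors, mixed)   # [43690, 48059] [3000, 32767, -32768]
```

The list of contributors is what a mixer would record in the outgoing packet so that receivers can still tell who was speaking.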

The RTP header provides a means for mixers to identify the original sources that contributed to a mixed packet so that receivers can indicate the correct talker. Some of the audio conference’s intended participants may have high-bandwidth connections but may not be directly reachable through IP multicast.

They could, for example, be behind an application-level firewall that does not let any IP packets pass. If mixing isn’t required at such locations, a translator, another sort of RTP-level relay, can be used instead. Two translators are installed, one on either side of the firewall. The outside translator funnels the multicast packets it receives through a secure connection to the translator inside the firewall, which then sends them again as multicast packets to a multicast group restricted to the site’s internal network.

3. Voice over Internet Protocol (VoIP)

Voice over Internet Protocol (VoIP) sends voice over data networks. This is a big change from the analog “plain old telephone service” (POTS), which is still in use after over a century. VoIP brings the benefits of packet-switched networks, such as lower costs and reliability, to the telephone.

Applications that need real-time streaming of multimedia data, such as VoIP, often demand rapid data transmission with different tolerances for packet loss. In a VoIP application, for example, audio packet loss might result in the loss of milliseconds of audio data. Error compensation algorithms can effectively handle this loss, making it minimal and undetectable to the caller(s).

VoIP applications commonly use SIP to set up and manage the call and Secure Real-Time Transport Protocol (SRTP) to encrypt it. PBX software such as Asterisk and 3CX are examples of VoIP servers that use RTP. SRTP secures VoIP by providing confidentiality, integrity, and authentication; it uses AES for confidentiality and SHA-1 based message authentication for integrity. While VoIP can offer significant cost savings, particularly for new locations without a major legacy voice investment, security issues remain. By default, several VoIP protocols, such as RTP, provide little or no security.
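
The integrity side can be illustrated with Python’s standard hmac module. This is only a sketch of the idea and not an SRTP implementation: it assumes an HMAC-SHA1 tag truncated to 80 bits and computed over the packet followed by a 32-bit rollover counter, and it skips key derivation, AES encryption, and replay protection. The key and packet bytes are placeholders.

```python
import hashlib
import hmac
import struct

TAG_LENGTH = 10  # bytes; assumes the commonly used 80-bit truncated tag


def auth_tag(auth_key: bytes, packet: bytes, rollover_counter: int = 0) -> bytes:
    """Compute a truncated HMAC-SHA1 tag over the packet and rollover counter."""
    message = packet + struct.pack("!I", rollover_counter)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LENGTH]


def verify(auth_key: bytes, packet: bytes, tag: bytes,
           rollover_counter: int = 0) -> bool:
    """Constant-time comparison of the received tag with the expected one."""
    return hmac.compare_digest(auth_tag(auth_key, packet, rollover_counter), tag)


key = b"demo-key-not-for-production"          # placeholder key for illustration
rtp_packet = bytes([0x80, 0x00]) + bytes(10) + b"voice"
tag = auth_tag(key, rtp_packet)
print(verify(key, rtp_packet, tag))            # True
print(verify(key, rtp_packet + b"x", tag))     # False
```

In practice a maintained SRTP library should be used, since the key management around these primitives is where most of the complexity lies.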

4. Real-time streaming 

Real-Time Streaming Protocol (RTSP), which is used to control the delivery of video between a server and a client, was designed to work together with RTP. RTSP is an application-layer protocol that lets users tell streaming media servers to play and pause. A client that connects to a server using RTSP can control the streaming media in real time, but RTSP itself does not carry the media data.

A prominent example of RTSP-capable software is the VLC media player from the non-profit VideoLAN. Many security cameras can also send footage via RTSP to a video security server for storage, and some live TV and streaming services use RTSP because it simplifies broadcasting to many people.

RTSP servers often use RTP and RTCP together to send the actual streaming data. When RTSP is used to start a video broadcast from an IP camera, the device sends an RTSP request to the streaming server. This initiates the setup process; after that, RTP can be used to send video and audio data. So, RTSP can be considered a TV remote control for streaming media, while RTP is the actual broadcast.
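
The “remote control” side can be sketched by building the text of an RTSP 1.0 request, which resembles an HTTP request. The helper name, the camera URL, and the session identifier below are placeholders for illustration; a real client would first issue requests such as DESCRIBE and SETUP and take the session identifier from the server’s reply.

```python
def build_rtsp_request(method: str, url: str, cseq: int, headers=None) -> str:
    """Format a text-based RTSP 1.0 request with a CSeq header."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"     # blank line ends the request


# Hypothetical PLAY command for an already established session.
request = build_rtsp_request("PLAY", "rtsp://camera.example/stream", 4,
                             {"Session": "12345678"})
print(request)
```

The request only tells the server what to do; the audio and video themselves travel separately as RTP packets.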

Understanding RTCP: The “Sister Protocol” to RTP

Real-Time Transport Control Protocol (RTCP) operates in combination with Real-Time Transport Protocol (RTP) to monitor data delivery, including in large multicast networks. Because a source’s RTP synchronization source identifier may change, for example after a restart or a collision, RTCP does not rely on it alone to identify participants. Instead, RTCP carries a persistent canonical name, often known as a CNAME.

Since manufacturers implement RTP and RTCP in diverse ways, network behavior is not as predictable as we would want. The RTP control protocol (often referred to as the real-time control protocol) serves primarily to provide feedback on the quality of the RTP stream. RTCP specifies a variety of packet types, including:

  • Sender reports, which active senders in a session use to report transmission and reception statistics.
  • Receiver reports, which participants that are not active senders use to report reception statistics.
  • Source descriptions, which convey CNAMEs and other descriptive information about the sender.
  • Application-specific control packets.

These RTCP packet types are transmitted via the lower-layer protocol, which is often UDP. Multiple RTCP packets may be combined in one lower-layer PDU as a compound packet. Every compound packet must contain at least two RTCP packets: a report packet and a source description packet. More packets may be added, subject to the size constraints of the lower-layer protocol.
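
A minimal compound packet can be sketched as below: an empty receiver report followed by a source description carrying a CNAME. The function names, the SSRC, and the CNAME are assumptions for illustration; the packet type numbers (201 for receiver reports, 202 for source descriptions) and the length field counted in 32-bit words minus one follow the RTCP header layout.

```python
import struct

RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def build_empty_rr(ssrc: int) -> bytes:
    """Receiver report with no report blocks: 2 words, so length field = 1."""
    return struct.pack("!BBHI", 0x80, 201, 1, ssrc)


def build_sdes_cname(ssrc: int, cname: str) -> bytes:
    """Source description with one chunk holding a single CNAME item."""
    text = cname.encode("utf-8")
    chunk = struct.pack("!IBB", ssrc, 1, len(text)) + text + b"\x00"  # item type 1 = CNAME
    chunk += b"\x00" * (-len(chunk) % 4)        # pad chunk to a 32-bit boundary
    length_words = (4 + len(chunk)) // 4 - 1
    return struct.pack("!BBH", 0x81, 202, length_words) + chunk


def split_compound(data: bytes) -> list[str]:
    """Walk a compound packet and name each RTCP packet it contains."""
    names, offset = [], 0
    while offset + 4 <= len(data):
        _, ptype, length = struct.unpack_from("!BBH", data, offset)
        names.append(RTCP_TYPES.get(ptype, str(ptype)))
        offset += 4 * (length + 1)
    return names


compound = build_empty_rr(0x1234) + build_sdes_cname(0x1234, "user@host")
print(split_compound(compound))                 # ['RR', 'SDES']
```

Both packets travel in the same UDP datagram, which is what lets a receiver tie the statistics in the report to the participant named by the CNAME.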

Takeaway

In an era of nearly ubiquitous connectivity, the Real-Time Transport Protocol, or RTP, is a foundational technology. VoIP providers rely on RTP to deliver cloud-based communication services to enterprises, and the protocol is widely used in consumer-facing media as well. Together with companion protocols like RTCP and RTSP, RTP will remain central to network communication for years to come.
