IP Video Basics

System Overview - IPTV

The term "IPTV" has a range of meanings.  Major telecommunications service providers regard IPTV as only meaning the delivery of IP based video over their IP/DSL networks.  Other services may stream video content over the public Internet, and in fact many cable networks deliver digital video services over IP.  We choose to define IPTV in the broader sense as "the delivery of selected video content over IP networks".

An IPTV system comprises a source of video streams (a video server or head-end), a means of delivering video over an IP network, and a playback system or IP Set Top Box.  Video is encapsulated in IP packets and delivered to the endpoint using Multicast, Unicast or potentially Broadcast.

Multicast and Unicast are more likely to be used in DSL, Internet based or mobile services, in which only a small number of channels can be delivered to the endpoint at once.  Multicast is the most efficient method of delivering content to a number of subscribers who are watching the same channel.  Unicast is often used for Video On Demand ("VoD") service.
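
To illustrate why multicast is efficient when many subscribers watch the same channel, the following sketch compares the load on a shared link under the two delivery modes.  The function names, the 4 Mbit/s stream rate and the viewer counts are illustrative assumptions, not figures from any particular service.

```python
def unicast_load_mbps(viewers_per_channel, stream_mbps):
    """Unicast: the link carries one copy of the stream per viewer."""
    return sum(viewers_per_channel.values()) * stream_mbps


def multicast_load_mbps(viewers_per_channel, stream_mbps):
    """Multicast: the link carries one copy per channel with at least one viewer."""
    active = [ch for ch, n in viewers_per_channel.items() if n > 0]
    return len(active) * stream_mbps


# Hypothetical audience: two watched channels and one idle channel.
viewers = {"news": 500, "sport": 300, "movies": 0}

print("unicast  :", unicast_load_mbps(viewers, 4.0), "Mbit/s")
print("multicast:", multicast_load_mbps(viewers, 4.0), "Mbit/s")
```

Under these assumptions the unicast load grows with the number of viewers, while the multicast load grows only with the number of channels being watched.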

Broadcast is typically used in cable networks which have the capability to simultaneously deliver very large numbers of streams over a single cable.

System Overview - IP Videoconferencing and HD Telepresence

IP Videoconferencing systems allow people to interact over distance, which saves on travel cost and allows more rapid business interactions.  HD (High Definition) Telepresence is an emerging term for high definition IP videoconferencing using life-sized images, which enhances the reality of the interaction.

An IP videoconferencing system comprises two or more conferencing units, where each conferencing unit has at least one monitor and camera.  Video conference sessions involving more than two locations may require some form of intermediate bridge to manage the flows of IP video from each location.

Desktop IP Videoconferencing is expected to become widespread, potentially due to the spread of this technology through low cost/free desktop VoIP services and mobile services.

Frames and Pictures

Video is transmitted as a series of frames or pictures, typically at a rate of 25-60 frames per second.  Some video systems (e.g. traditional TV) use interlacing, in which alternate frames contain the odd and the even scan lines, and hence use reduced bandwidth.

Resolution  Type  Horizontal      Vertical     Frames/sec  Interlaced
480i        SDTV  720 pixels      480 pixels   25-30 fps   Yes
480p        EDTV  640-852 pixels  480 pixels   50-60 fps   No
720i        HDTV  1024 pixels     720 pixels   25-30 fps   Yes
720p        HDTV  1024 pixels     720 pixels   50-60 fps   No
1080i       HDTV  1920 pixels     1080 pixels  25-30 fps   Yes
1080p       HDTV  1920 pixels     1080 pixels  50-60 fps   No

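Higher resolutions and frame rates lead to higher transmitted bit rates.  For example, a 1080p frame has six times as many pixels as a 480i frame (1920 x 1080 versus 720 x 480) and, at twice the frame rate, 1080p carries about twelve times as many pixels per second.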
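
The comparison can be worked through from the table values.  The sketch below counts raw pixels only, before any compression, and assumes 30 frames per second for 480i and 60 frames per second for 1080p; the function names are illustrative.

```python
def pixels_per_frame(width, height):
    """Number of pixels in one full frame."""
    return width * height


def pixels_per_second(width, height, fps):
    """Raw (uncompressed) pixel rate for a given frame rate."""
    return pixels_per_frame(width, height) * fps


sd_frame = pixels_per_frame(720, 480)      # 480i
hd_frame = pixels_per_frame(1920, 1080)    # 1080p

frame_ratio = hd_frame / sd_frame
rate_ratio = pixels_per_second(1920, 1080, 60) / pixels_per_second(720, 480, 30)

print("pixels per frame :", sd_frame, "vs", hd_frame, "ratio", frame_ratio)
print("pixels per second: ratio", rate_ratio)
```

Per frame the ratio is 6, and per second it is 12 once the higher frame rate is taken into account.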

IP Video Streams

Video frames ("pictures") are normally transmitted at a rate of 15-60 frames per second.  Each frame is compressed and then divided into small "chunks" for transmission over IP networks.  Frames may be compressed independently of other frames ("I" frames, which are intra-frame encoded) or compressed based on differences from other frames ("P" or "B" frames, which are inter-frame encoded).

A typical Group of Pictures (GoP) comprises an I frame and some number of P and potentially B frames.  For example, a GoP may comprise I, B, B, P, B, B, P, B, B, P, B, B, P, B, B sent at 30 frames per second.  The I frame is independently compressed and hence results in a much larger number of IP packets than the P or B frames (which encode only the changes relative to other frames).
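
As a small worked example, the GoP pattern quoted above can be counted and timed as follows.  The helper name describe_gop is an illustrative assumption.

```python
from collections import Counter


def describe_gop(pattern, fps):
    """Count the frame types in a comma-separated GoP pattern and compute its duration."""
    frames = [f.strip() for f in pattern.split(",")]
    return {
        "frames": len(frames),
        "counts": dict(Counter(frames)),
        "duration_s": len(frames) / fps,
    }


gop = "I, B, B, P, B, B, P, B, B, P, B, B, P, B, B"
info = describe_gop(gop, fps=30)
print(info)
```

For this pattern there is one I frame, four P frames and ten B frames, so at 30 frames per second a new I frame arrives every half second.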

Each compressed frame is divided into some number of transport units.  Each transport unit is first encapsulated in a transport packet (RTP or MPEG-2 Transport), then in UDP or TCP, and then in IP.  Hence an IP Video packet sent over an IP network will generally look like:

    [IP header] [ UDP or TCP header] [ RTP header or MPEG-2 Transport ] [ Video payload ] 

In some cases MPEG-2 Transport is carried over RTP.
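
The layering above can be sketched for the MPEG-2 Transport over RTP case.  This is a minimal illustration, not a complete sender: it assumes IPv4 without options, a 12 byte RTP header with no extensions, and seven 188 byte Transport Stream packets per RTP packet (a common choice because the result fits within a 1500 byte Ethernet MTU).  The function names, sequence number, timestamp and SSRC values are hypothetical.

```python
import struct

TS_PACKET_SIZE = 188   # MPEG-2 Transport Stream packet size in bytes
TS_SYNC_BYTE = 0x47    # first byte of every Transport Stream packet
RTP_PT_MP2T = 33       # static RTP payload type for MPEG-2 TS (RFC 3551)
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes


def rtp_header(seq, timestamp, ssrc, payload_type=RTP_PT_MP2T, marker=False):
    """Build a 12 byte RTP header: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def rtp_packet(ts_packets, seq, timestamp, ssrc):
    """Prefix a group of Transport Stream packets with an RTP header."""
    for p in ts_packets:
        if len(p) != TS_PACKET_SIZE or p[0] != TS_SYNC_BYTE:
            raise ValueError("not a valid Transport Stream packet")
    return rtp_header(seq, timestamp, ssrc) + b"".join(ts_packets)


# A dummy TS packet: sync byte followed by zero padding (illustration only).
dummy_ts = bytes([TS_SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
pkt = rtp_packet([dummy_ts] * 7, seq=1, timestamp=90000, ssrc=0x1234)

ip_size = IPV4_HEADER + UDP_HEADER + len(pkt)
print("RTP packet:", len(pkt), "bytes; IP packet:", ip_size, "bytes")
```

With these assumptions the headers add 40 bytes to 1316 bytes of Transport Stream data, giving a 1356 byte IP packet.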

IP packets may be errored or lost, which can lead to video quality degradation.  Some video systems use Forward Error Correction, which adds redundancy to the packet stream so that some proportion of lost packets can be reconstructed at the receiving end.  Another common approach is to use a retransmission based protocol to replace lost packets, for example Reliable UDP, TCP or multicast with unicast retransmission.
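
One simple way to see how Forward Error Correction can replace a lost packet is a single XOR parity packet sent with each group of media packets.  The sketch below is an illustration of the principle only; deployed schemes are more elaborate, and the function names and the assumption of equal-length packets are ours.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of a list of equal-length packets."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must all be the same length")
    parity = bytearray(size)
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)


def recover(received, parity):
    """Rebuild the single lost packet (marked None) from the others and the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("one parity packet can repair exactly one loss")
    present = [p for p in received if p is not None]
    return xor_parity(present + [parity])


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_parity(group)

# Simulate loss of the second packet in transit.
received = [group[0], None, group[2], group[3]]
print(recover(received, parity))
```

The cost is one extra packet per group, and a second loss within the same group cannot be repaired by this scheme.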


I frames are encoded independently of any other frame.  Each I frame is divided into blocks, typically 16 x 16 pixels in size, and each block is transformed using a Discrete Cosine Transform (DCT).  The coefficients of the DCT are quantized and then compressed further.
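
The transform and quantization steps can be sketched in a few lines.  This example assumes an 8 x 8 transform block (the size MPEG-2 applies within each 16 x 16 block) and a single uniform quantizer step of 16; real codecs use per-coefficient quantization tables and follow this with entropy coding.  The function names are illustrative.

```python
import math

N = 8  # transform block size assumed for this sketch


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by the step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat grey block: all of its energy ends up in the single DC coefficient.
flat = [[128] * N for _ in range(N)]
q = quantize(dct_2d(flat), step=16)
nonzero = sum(1 for row in q for c in row if c != 0)
print("DC coefficient:", q[0][0], "non-zero coefficients:", nonzero)
```

Smooth image areas produce few non-zero coefficients after quantization, which is what makes the subsequent compression stage effective.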

P frames are based on the difference from the previous I or P frame.

B frames are based on the previous I frame or the previous or next P frame (i.e. can be bidirectionally encoded).

Common video codecs used in video broadcast are MPEG-2, MPEG-4/H.264 and Microsoft's VC-1.  In videoconferencing applications H.261, H.263 and increasingly H.264 are used.

Conditions of use: The material on this site is copyright Telchemy and may be freely used but not copied or downloaded.  In making use of this site the user acknowledges that Telchemy or Contributor has no liability for any issues or problems that may arise directly or indirectly as a result of such use.  Telchemy and Contributor are providing this material as-is with no warranty as to correctness or completeness and do not accept any responsibility for any issues or problems of any nature whatsoever that may arise from the use of the material on this site.