Doug’s Q-Tips – How Does MPEG Encoding Work?

We refer to our QMOD solutions as HD Modulators – IPTV Encoders. That covers a lot of ground and requires a bit of explaining. A QMOD is an Encoder, as it accepts a video source and encodes it into an MPEG-2 stream. From there, it can package the stream to be carried over RF coax by an HD RF channel, a process called modulation. In addition, it can package the same stream for transport over an Ethernet system, called IPTV or Video Over Ethernet. These are essentially the same technologies at work when you tune in a cable channel or watch Netflix at home.

Pictures in an MPEG Stream


Back in the day when movies were made on celluloid, moving pictures were actually still pictures running in sequence on film. Twenty-four still pictures went by your eyes every second, and your brain perceived that as motion. The digital video you see on Netflix or in today’s digital theater is put together in a similar fashion, with a few more tricks to scrunch the stream down to the right bandwidth for the application.

MPEG streams are made of GOPs (Groups of Pictures). The groups can be small or large; typical broadcast streams use a GOP of 15 pictures. MPEG does more than just compress the picture in each frame: it looks at other frames and encodes only instructions describing what changes or moves. Those instructions take way less space than the pixels themselves. The encoder works through each GOP as follows.

  • Each GOP begins with an “I” (Intra) frame, which is a complete picture compressed on its own; it holds the most data in the group. This frame is stored in memory, and the other frames in the group refer back to it, directly or indirectly, to note changes.
  • The next two frames are called “B” (Bidirectional) frames. These are stored as well, since more information is needed to encode them.
  • Next, a “P” (Predictive) frame is processed. It doesn’t store a full picture; it records the difference between itself and the previous I or P frame. In a shot of a ball moving across a static background, for example, it would throw out all of the background that stayed the same and note that the “ball has moved”.
  • The encoder then returns to the two B frames and encodes each one by comparing it with the I or P frames before and after it.

As a result of this process, the encoder outputs the IBBP pattern as IPBB. The decoder at the other end of the stream stores the four frames and moves the Bs back to their original positions, restoring the IBBP pattern. Using that data, it rebuilds the stream into the video sent to the display.
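The reordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a QMOD or any particular encoder implements it: the function names and the frame labels (such as "I0" and "B1") are made up for the example, and the sketch assumes the sequence ends on an I or P frame.

```python
def display_to_coded(display_order):
    """Encoder side: move each run of B frames behind the I or P frame
    that follows it, since the Bs cannot be encoded until that frame exists."""
    coded = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # hold until the next I or P arrives
        else:
            coded.append(frame)       # send the I or P frame first
            coded.extend(pending_b)   # then the Bs that depend on it
            pending_b = []
    coded.extend(pending_b)           # assumption: normally empty (see lead-in)
    return coded


def coded_to_display(coded_order):
    """Decoder side: hold each I or P frame until the B frames that
    belong in front of it have been shown."""
    display = []
    held = None
    for frame in coded_order:
        if frame.startswith("B"):
            display.append(frame)
        else:
            if held is not None:
                display.append(held)
            held = frame
    if held is not None:
        display.append(held)
    return display


source = ["I0", "B1", "B2", "P3"]
sent = display_to_coded(source)
print(sent)                    # ['I0', 'P3', 'B1', 'B2']
print(coded_to_display(sent))  # ['I0', 'B1', 'B2', 'P3']
```

The decoder has to hold the P frame while it shows the two Bs, which is one reason the process adds delay.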

The MPEG encoder spends its bitrate budget with human perception in mind. Because the eye is more sensitive to black and white (luminance) than to color (chrominance), the encoder saves chrominance information at half the resolution of luminance. When an image is fairly static, the encoder uses the extra bitrate to fine-tune detail. When things are moving fast, it sacrifices detail for better motion rendering, much as our own eyes do.
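To put a number on the chrominance savings, here is a rough Python sketch. It assumes the common 4:2:0 arrangement, where each of the two color planes is stored at half the width and half the height of the luminance plane; the article does not name the exact scheme, so treat that as an assumption. The function name and the 1920 x 1080 frame size are chosen only for illustration.

```python
def samples_per_frame(width, height, chroma_h_div=2, chroma_v_div=2):
    """Count raw samples in one frame: one luminance plane plus two
    chrominance planes, each reduced by the given divisors."""
    luma = width * height
    chroma_plane = (width // chroma_h_div) * (height // chroma_v_div)
    return luma + 2 * chroma_plane


full_color = samples_per_frame(1920, 1080, 1, 1)  # no chroma reduction
subsampled = samples_per_frame(1920, 1080)        # assumed 4:2:0 layout
print(full_color)               # 6220800
print(subsampled)               # 3110400
print(subsampled / full_color)  # 0.5
```

Under that assumption, the raw data is cut in half before any other compression happens, and most viewers never notice the difference.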

MPEG and Latency

It’s good to understand how an MPEG stream is built and decoded, as it answers a basic question you may run into in your applications: latency. There is so much cross-referencing and storing in the process that making and playing the stream takes time.

Delay is baked into the process. It takes a minimum of about 250 milliseconds, typically 400 to 800 ms, to create a 60 Hz stream with a GOP of 15 frames. You can see this when you change channels, where there is a delay before you see the next program. The decoder has to wait for the first I frame, store that and the next PBB frames, and put them back in sequence to play the video. As the decoder has its own delays in rebuilding and playing the stream, the TV itself adds about 100 to 200 ms to the total latency. For our new-gen QMODs, the total latency averages 550 ms: about 400 ms for encoding and 150 ms for decoding.
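The figures above can be turned into a back-of-the-envelope budget. The Python sketch below is only arithmetic on the numbers quoted in this article; the function names are made up, and the frame rates are assumptions, since “60 Hz” can mean 60 full frames per second or 60 interlaced fields (30 frames) per second.

```python
def gop_duration_ms(gop_size, frames_per_second):
    """Time spanned by one GOP. A decoder that joins the stream just
    after an I frame waits roughly this long for the next one."""
    return 1000.0 * gop_size / frames_per_second


def total_latency_ms(encode_ms, decode_ms):
    """End-to-end delay as the simple sum of the two stages."""
    return encode_ms + decode_ms


print(gop_duration_ms(15, 60))     # 250.0 (assuming 60 frames per second)
print(gop_duration_ms(15, 30))     # 500.0 (assuming 30 frames per second)
print(total_latency_ms(400, 150))  # 550
```

A longer GOP or a lower frame rate stretches the wait for the next I frame, which is why shortening the GOP is one of the usual ways to trim latency.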

There are techniques for shortening latency, such as dropping all the B frames or using shorter GOPs, but there are quality tradeoffs, and many TV tuners aren’t set up for unorthodox encoding.

Evaluating Encoder Performance

Not all encoders are created equal. In today’s market, there are products that employ either consumer-quality or broadcast-quality encoding. With broadcast-quality encoding, the tuned channel looks very similar to the source video. Static images have clear lines and text; even small text is clear, with very little distortion at font edges. When playing video with fast motion, you can’t see a discernible loss in detail, and moving objects aren’t pixelated. Consumer encoding delivers an image that is noticeably softer than broadcast quality; a stressed encoder often compensates by softening edges. When there’s motion, you can see a visible loss in detail: an obvious fuzziness that appears as the encoder sacrifices a lot of detail to process the motion.


MPEG-2 has remained the standard for off-air and cable broadcasting, while DirecTV and IPTV providers have adopted MPEG-4 (H.264). IP networks in general have standardized on MPEG-4 for video over Ethernet, though many players can decode MPEG-2 as well. We will be adding MPEG-4 encoding to our new-generation QMODs in the near future.