
What is Video Transcoding?


To answer the question – what is video transcoding – we first need to understand a little bit about video lifecycles and compression. 
 

When we generate or capture an image, we generate a huge amount of data. If we consider a High Definition (HD) picture, there are over 2 million pixels, and for each pixel we need to capture data on color and brightness. In the case of video, depending on the frame rate we are capturing up to 60 or more of these images per second. 
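
As a rough illustration of the numbers above, the following sketch works out the pixel count and uncompressed data rate of an HD picture. The 1920×1080 resolution, 60 frames per second and 24 bits per pixel (8 bits for each of three color components) are assumptions chosen for the example, not figures from a particular camera.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Uncompressed video data rate in bits per second."""
    return width * height * fps * bits_per_pixel


# Assumed HD picture: 1920x1080, 60 frames per second, 24 bits per pixel.
hd_pixels = 1920 * 1080
hd_rate = raw_bitrate_bps(1920, 1080, 60, 24)

print(f"{hd_pixels:,} pixels per frame")          # 2,073,600 pixels per frame
print(f"{hd_rate / 1e9:.2f} Gbit/s uncompressed")  # 2.99 Gbit/s uncompressed
```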

The amount of data soon becomes unmanageable so we throw some of it, in fact eventually quite a lot of it, away by compressing it. At a high level, there are three levels of compression:

  • Lossless – where we throw away data that we can completely recreate later
  • Visually lossless – where we throw away data, but we can’t detect what we’ve thrown away
  • Lossy – where we throw away data and can see the difference, although in many cases, viewers aren’t aware of this.
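
The difference between lossless and lossy compression can be sketched in a few lines. This is a toy example, not how a video codec works internally: it uses Python's general-purpose `zlib` module for the lossless case and simple quantization (rounding each sample down to a multiple of 4) as a stand-in for the lossy case. The sample values are made up.

```python
import zlib

# Hypothetical 8-bit brightness samples, repeated to give zlib some redundancy.
samples = bytes([10, 10, 11, 11, 12, 200, 201, 201, 202, 202] * 100)

# Lossless: redundancy is removed, and decompression restores every byte.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)
lossless_ok = restored == samples

# Lossy: round each sample down to a multiple of 4. The discarded detail
# cannot be recovered, but the error per sample stays small.
quantized = bytes((s // 4) * 4 for s in samples)
max_error = max(abs(a - b) for a, b in zip(samples, quantized))

print(f"original: {len(samples)} bytes, lossless: {len(packed)} bytes")
print(f"lossless round trip exact: {lossless_ok}")
print(f"largest error after quantization: {max_error}")
```

Whether an error of this size is visible depends on the content and the viewer, which is the distinction the list above draws between visually lossless and lossy.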

We combine these different types of compression and use them to “encode” the video using algorithms known as “codecs”, short for (en)coder/decoder. Depending on where the video is in its lifecycle, we use different codecs and different levels of compression. Early in the lifecycle, we want to store as much image information as possible to enable processes such as:

  • color correction
  • green and blue screen compositing
  • similar editing and grading. 

For example, you might want to adjust the picture to bring out details in the shadows or highlights. If you shot your scenes using a codec that stores a lot of data, i.e. one that uses a high bitrate, that information is still available in the shadows and highlights when you make creative decisions about how to adjust and present the video.
 

A production facility will compensate for these higher bitrate requirements by using high performance storage and fast data networks between editing stations in order to provide the video data in real-time to the editors. 
 

To make the material usable downstream, for example viewable on a web client, these master files need to be converted to use a different codec and/or more compression. 
 

This conversion is called video transcoding.
 

Video transcoding happens many times in the video lifecycle, and almost every device you consume video on requires a different codec or level of compression, resulting in many video transcoding processes. If done well, video transcoding has little impact on the perceived viewing quality. That amazing 4K HDR picture on your new TV from your favorite streaming service probably has a bitrate of around 12 Mbps, while an uncompressed 4K image can exceed 12 Gbps: more than 1,000 times as much!
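
The ratio quoted above can be checked with some quick arithmetic. The sketch below assumes a 3840×2160 picture at 60 frames per second with 30 bits per pixel (10 bits for each of three color components). These parameters are assumptions for illustration, and other frame rates or bit depths give different totals.

```python
width, height, fps = 3840, 2160, 60
bits_per_pixel = 30  # assumption: 10 bits for each of three color components

uncompressed_bps = width * height * fps * bits_per_pixel
streaming_bps = 12_000_000  # the roughly 12 Mbps streaming figure from the text

ratio = uncompressed_bps / streaming_bps

print(f"uncompressed: {uncompressed_bps / 1e9:.1f} Gbps")  # uncompressed: 14.9 Gbps
print(f"ratio: about {ratio:.0f} to 1")                    # ratio: about 1244 to 1
```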

What are the types of file format used in Video Transcoding?

In this article, we focus on “file-based transcoding” – the process of taking one video file and converting it to create a second – but there are other types of transcoding, for example converting video streams.
 

Video formats can be broken down into 6 parts.

  • 1. Video Codec

    The video codec is the algorithm used to compress the video. Most codecs offer many levels of compression and different parameters that enable you to optimize that codec for different scenarios. For example, many codecs can either encode each image, or frame, individually, or encode a Group of Pictures (GOP) together. In many videos, there’s little movement between one frame and the next, so encoding a GOP can often achieve a much higher level of compression. However, decoding the video then requires a whole GOP to be decoded before the first image can be displayed. For non-linear applications such as editing, where we jump around in the video to different frames, this requires far more processing. So, like the overall compression level, these parameters will differ during the lifecycle.


    Common video codecs include:

    • MPEG-2
    • H.264 AVC (Advanced Video Coding)
    • H.265 HEVC (High Efficiency Video Coding)
    • VP8
    • VP9
    • JPEG 2000
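
The cost of jumping to an arbitrary frame, described under the video codec above, can be illustrated with a deliberately simplified model. It assumes every GOP starts with one independently coded frame, and that each following frame depends only on the frame before it. Real codecs use more elaborate structures, so treat `frames_to_decode` as a hypothetical helper rather than a description of any particular codec.

```python
def frames_to_decode(target_frame: int, gop_size: int) -> int:
    """Frames that must be decoded to display target_frame (0-based).

    Simplified model: decoding starts at the first frame of the GOP
    containing the target and proceeds frame by frame up to the target.
    """
    gop_start = (target_frame // gop_size) * gop_size
    return target_frame - gop_start + 1


# Every frame encoded individually (GOP of 1): any frame costs one decode.
print(frames_to_decode(47, gop_size=1))   # 1

# GOP of 12: frame 47 is the last frame of the GOP starting at frame 36.
print(frames_to_decode(47, gop_size=12))  # 12
```

Under this model an editor scrubbing through GOP-based material decodes several frames for each one displayed, which is one reason codecs that encode every frame individually are common earlier in the lifecycle.
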
  • 2. Audio Codec

    Recorded audio requires far less data than video, and in most cases it is far less compressed. However, audio is often processed as part of video transcoding, for example to meet loudness regulations, and will normally be compressed/encoded in the later stages of the lifecycle.


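    Common audio codecs and formats include:

    • WAV – Waveform Audio File Format, typically carrying uncompressed PCM audio
    • AIFF – Audio Interchange File Format, also typically carrying uncompressed PCM audio
    • MP3 – MPEG-1 Audio Layer III encoding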
    • DOLBY® E – surround sound encoding
    • DTS – surround sound encoding
  • 3. ANC

    Ancillary data, or ANC, plays a huge role in modern video workflows. ANC covers all the data, and metadata, that needs to be carried with the video and audio. This includes things like subtitles/closed captions, which are required by law in many regions when it comes to distribution, but earlier in the lifecycle it can also include things like GPS data from the camera and other production metadata. ANC data is often changed and added to during the video lifecycle, so ANC processing forms an important part of video transcoding.
     

    Ancillary data is often stored in de facto standard or proprietary sidecar formats, such as .STL, or in standardized containers within the wrapper.
     

  • 4. Wrapper

    A wrapper is often used, as the name suggests, to wrap together the video, audio and ANC components. The wrapper can be a separate file for those components, or be used to wrap them into a single file. Wrappers can be hugely important for interoperability between systems.

     

    The two most common wrappers are QuickTime, or .mov, and MXF, the Material eXchange Format.

  • 5. Metadata

    In addition to the ANC data, many video files will be accompanied by descriptive metadata. This is particularly common for video formats used to move files between systems or facilities. Metadata is commonly stored in a sidecar .xml file with a proprietary data schema.

  • 6. Packaging

    For certain video formats, the video and audio components themselves are broken into chunks of a certain duration. This is the case, for example, with some cinema formats, where each image/frame is stored individually, and with most OTT (over-the-top) streaming formats, where the video/audio is stored in longer chunks. In other formats, such as IMF (Interoperable Master Format), those components may be of different durations and/or include different versions. So that transcoders, streaming servers, etc. know what belongs where, in these cases there will be additional files or components alongside the video, audio, ANC and metadata that describe what’s what. These formats often use XML files for the purpose of packaging.


    Some video formats are standardized and come about through collaboration between technology vendors and users. This is typically to ensure interoperability between systems and facilities. Some examples include:
     

    • IMF – the Interoperable Master Format
    • AS-10 – Application Specification 10, file delivery format published by AMWA/DPP
    • AS-11 – Application Specification 11, file delivery format published by AMWA/DPP
    • MPEG-DASH – OTT streaming format now supported by most devices
       

    Because cameras kick off the video lifecycle, a lot of formats are created by camera manufacturers. Some examples include:
     

    • DVCPro / DVCPro50 – Panasonic Standard Definition (SD) DV (Digital Video) format
    • XDCAM DV – Sony equivalent to above
    • XDCAM / IMX / eVTR – Sony alternative to DV, based on MPEG-2 and MXF
    • DVCPro 100 – Panasonic High Definition (HD) DV format
    • HDV / XDCAM HD – Sony High Definition format based on MPEG-2 and MXF
    • AVC Intra – Panasonic HD format based on H.264 and MXF
    • AVC Ultra – Panasonic Ultra HD (UHD) format based on H.264 and MXF
    • REDCODE – proprietary format of camera manufacturer Red
    • ARRI RAW – proprietary format of camera manufacturer Arri


    In production, where the needs are slightly different, formats can be created by the vendors of video editing software looking to optimize the performance of their products. Two such examples are:
     

    • Avid DNxHD – a proprietary codec typically wrapped in MXF
    • Apple ProRes – a proprietary codec typically wrapped in QuickTime, but also MXF


    For distribution, device or operating system manufacturers typically drive the format choice. MPEG-DASH has consolidated the market here, but HLS (HTTP Live Streaming) was the Apple contender and Smooth Streaming the Microsoft alternative.
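
One way to keep the parts of a video format straight is to model them as a small data structure. The sketch below is only an illustration. The `VideoFormat` class and its field names are hypothetical, and the two example formats are made-up combinations rather than definitions of any real delivery specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VideoFormat:
    """Hypothetical description of a video format, one field per part."""
    video_codec: str
    audio_codec: str
    anc: list[str] = field(default_factory=list)  # e.g. subtitles, GPS data
    wrapper: Optional[str] = None                 # e.g. QuickTime or MXF
    metadata_sidecar: Optional[str] = None        # e.g. a descriptive .xml file
    packaging: Optional[str] = None               # e.g. IMF or MPEG-DASH

    def describe(self) -> str:
        parts = [self.video_codec, self.audio_codec]
        parts += [p for p in (self.wrapper, self.packaging) if p]
        return " / ".join(parts)


# Illustrative examples only, not taken from a real specification.
production_master = VideoFormat("Apple ProRes", "PCM", wrapper="QuickTime")
streaming_copy = VideoFormat("H.264", "AAC", anc=["subtitles"], packaging="MPEG-DASH")

print(production_master.describe())  # Apple ProRes / PCM / QuickTime
print(streaming_copy.describe())     # H.264 / AAC / MPEG-DASH
```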

How is the video transcoding process carried out?

The video transcoding process requires two main inputs: first, details of the input file or format, and second, the full set of parameters for the output. The details of the input are typically determined by analyzing the media. The parameters for the output are typically determined by rules in the media supply chain system, or derived from metadata-based variables in the workflow engine. From here, we create what is known as a pipe.
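
The two inputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the interface of any real media supply chain system. The `OutputProfile` class, the `RULES` table, the `destination` metadata key and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OutputProfile:
    """Hypothetical set of output parameters for a transcode."""
    video_codec: str
    wrapper: str
    height: int
    video_kbps: int


# Hypothetical rules: a test on the workflow metadata and the resulting profile.
RULES: list[tuple[Callable[[dict], bool], OutputProfile]] = [
    (lambda md: md.get("destination") == "web",
     OutputProfile("H.264", "QuickTime", 720, 3000)),
    (lambda md: md.get("destination") == "archive",
     OutputProfile("JPEG 2000", "MXF", 1080, 100000)),
]


def select_profile(metadata: dict) -> OutputProfile:
    """Return the profile of the first rule that matches the metadata."""
    for matches, profile in RULES:
        if matches(metadata):
            return profile
    raise LookupError(f"no transcode rule matches {metadata!r}")


def build_job(input_analysis: dict, metadata: dict) -> dict:
    """Combine the analyzed input details with the selected output parameters."""
    return {"input": input_analysis, "output": select_profile(metadata)}


# The analysis result would normally come from inspecting the media file.
analysis = {"video_codec": "MPEG-2", "wrapper": "MXF", "height": 1080}
job = build_job(analysis, {"destination": "web"})
print(job["output"])
```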

The pipe covers all the persisting components in the media file (video, audio, ANC and metadata), splitting between components where needed. By analyzing the file and the instructions, the transcoder can configure the whole pipe at the start of the transcode. The processes involved can then run chained together, each stage working on data as soon as the previous stage produces it, rather than one after another over the whole file, which reduces the duration of the transcode process.

For simplicity, we’ll start with just one component, the video.

Demux

The first step is to demux the input. This means to extract the compressed video data from the packaging and/or wrapper.

Decode

The second step is to decode the video, decompressing it back to, or as close as possible to, the original uncompressed frames.

Video Processing

The third step is video processing. A number of processes can, or need to, be applied to the video during a transcode. These include scaling, if the size of the output differs from the input, and de-interlacing or interlacing, if we’re going from an interlaced “I” picture to a progressive “P” picture or vice versa. Advanced transcoders may also include more advanced image processing that changes elements of the picture to improve the perceived results of the video encode process.
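
As a small example of the scaling step, the sketch below works out the output width for a requested output height while keeping the picture's aspect ratio. Rounding the width up to an even number is an assumption added here, because many encoder configurations expect even dimensions. The function name `scaled_size` is hypothetical.

```python
def scaled_size(src_width: int, src_height: int, dst_height: int) -> tuple[int, int]:
    """Output (width, height) for a target height, preserving aspect ratio."""
    dst_width = round(src_width * dst_height / src_height)
    dst_width += dst_width % 2  # bump an odd width up to the next even number
    return dst_width, dst_height


print(scaled_size(1920, 1080, 720))  # (1280, 720)
print(scaled_size(1920, 1080, 480))  # (854, 480)
```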

Encode

The fourth step is to encode the processed video with the required destination codec.

Mux

The final step is to mux the output: packing the encoded video into the wrapper and/or packaging and, where needed, combining it with the other components.
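
The five steps above, and the idea of a pipe that is configured once and then processes data as it flows through, can be sketched with chained Python generators. This is a toy model: each stage only records that it handled a frame, and the names `stage`, `run_chained` and `run_sequential` are hypothetical. It shows the order of work only. A real transcoder would also run the stages concurrently.

```python
from typing import Iterable, Iterator

STAGES = ["demux", "decode", "process", "encode", "mux"]


def stage(name: str, frames: Iterable[str], log: list[str]) -> Iterator[str]:
    """Toy stage: record that it handled a frame, then pass the frame on."""
    for frame in frames:
        log.append(f"{name} {frame}")
        yield frame


def run_chained(frames: list[str]) -> list[str]:
    """Connect all stages up front; frames flow through one at a time."""
    log: list[str] = []
    stream: Iterable[str] = frames
    for name in STAGES:
        stream = stage(name, stream, log)
    list(stream)  # pulling from the last stage drives the whole pipe
    return log


def run_sequential(frames: list[str]) -> list[str]:
    """Let each stage finish the whole file before the next one starts."""
    log: list[str] = []
    data = frames
    for name in STAGES:
        data = list(stage(name, data, log))
    return log


chained = run_chained(["f0", "f1"])
sequential = run_sequential(["f0", "f1"])
print(chained[:5])     # ['demux f0', 'decode f0', 'process f0', 'encode f0', 'mux f0']
print(sequential[:4])  # ['demux f0', 'demux f1', 'decode f0', 'decode f1']
```

In the chained version the first frame reaches the mux stage before the second frame has even been demuxed, so no stage has to wait for, or store, a complete intermediate file.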

Video transcoding: the next big thing?

There are two key drivers of change in video transcoding. The first is how the transcoders are implemented and operated; the second is the codecs used in the transcode process.

Looking at the first of these, one thing that is certainly on the rise is cloud transcoding. There’s now a wide range of transcoders available on demand in the cloud. However, many of these services are aimed at different parts of the video lifecycle and can be difficult to use together or in an automated way. That’s where a cloud-based media services platform comes in, making different services available on demand, but enabling you to drive them with metadata-based rules and integrated automation.


With regard to codecs, the latest buzz in the industry is the successor to HEVC, called H.266 or VVC, for Versatile Video Coding (ISO/IEC 23090-3). VVC was developed by the Joint Video Experts Team (JVET) with significant contributions from Fraunhofer HHI’s Video Coding and Analytics department.

Lab measurements already promise bitrates 30-50% lower than HEVC and AV1 at the same quality.
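
To put that range into numbers, the sketch below applies a 30% and a 50% saving to a reference stream. The 12 Mbps reference bitrate is an assumption borrowed from the streaming example earlier in this article. The result is plain arithmetic, not a measurement.

```python
def reduced_bitrate_kbps(reference_kbps: int, saving_percent: int) -> int:
    """Bitrate after applying a percentage saving (integer arithmetic)."""
    return reference_kbps * (100 - saving_percent) // 100


reference_kbps = 12_000  # assumed reference stream at 12 Mbps

print(reduced_bitrate_kbps(reference_kbps, 30))  # 8400
print(reduced_bitrate_kbps(reference_kbps, 50))  # 6000
```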


The new technology and its complexity will require significantly more processing power in the transcoder workflow and on the player side. This is where the availability of hardware decoders becomes very important.


Video transcoding is a critical stage at many points in the media supply chain. Video transcoders can also provide an overwhelming array of formats, options and parameters that can easily introduce errors into the workflow. Fortunately, a media services platform makes it easy to add the right transcoder for the task in question to a workflow, helping to maintain quality throughout the video lifecycle and avoid accidental errors.
 

VidiNet enables on-demand access to a growing number of transcode services, including our own VidiCoder and services using AWS Elemental MediaConvert and Bitmovin technology. VidiNet users can always choose the right video transcoder for the task and switch between services, since they are all made available through a single interface and the same API.


Your Contacts for Video Transcoding

John Proctor
Expert for Broadcast Solutions - North America
Peter Booth-Clibborn
Expert for Broadcast Solutions
Dirk Steinmeyer
Expert for Broadcast Solutions - Europe & MEA