Learn how to Build Your Rock-Solid Enterprise Video Solution – Pt 1.

By Vidispine - October 20, 2015 (Last updated: April 5, 2017)

Do you want to build a video application and make sure it is enterprise grade? From storage to high availability, our video fanatics will help you find your way around common pitfalls and mistakes in a multi-part blog series. This is the first post, where we go through storage, metadata and video formats.

Video – all of a sudden everybody is using it, for everything…

  • marketing
  • internal instruction videos
  • staff memos
  • local news
  • user generated content
  • video CVs
  • people talking about poodles shaved to look like camels! Don’t believe us?
  • …and the list just gets longer…

Video is everywhere, and there are numbers to prove it. Cisco predicts video will be 80% of all IP traffic in 2019. People now spend more time with video on the internet than on social networks, with over 40% of the viewing happening on mobile. We all know it’s big, and that it will get bigger. Sooner or later you will end up building a video application for your enterprise.

So where to start? And what are the pitfalls? Say you’re a seasoned developer but really have no clue where to start when it comes to building rock-solid video applications. To make sure you get safely on your way, I have talked to our video fanatics Erik and Nils here at Vidispine to get some pro tips and tricks for you. In this first part we’ll talk about the basic stuff you will run into when you get started building video applications, covering storage, video formats and metadata. In the second part we will talk about content structure, workflows and scalability.

Storage (do you have any idea how much space video takes?)

Video takes a lot of space, and on the way from raw footage to a finished video ready for viewing you will have created and stored several versions, whole or in pieces. You will have files located all over your file systems, and you need to decide how to handle that. You can go the route of centralizing all files, or you could go the other route, leaving content where it is but with a strategy to handle the decentralization.
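To get a feeling for the numbers, here is a minimal back-of-the-envelope sketch in Python. It only uses the relation size = bitrate × duration; the rendition names and bitrates are illustrative assumptions, not recommendations, so plug in the figures from your own cameras and encoders.

```python
def file_size_gb(bitrate_mbps: float, duration_minutes: float) -> float:
    """Approximate file size in gigabytes (decimal units, 1 GB = 1000 MB)."""
    megabits = bitrate_mbps * duration_minutes * 60  # megabits in the whole clip
    return megabits / 8 / 1000                       # megabits -> megabytes -> gigabytes


# Hypothetical renditions kept for one hour of footage (bitrates are assumptions).
RENDITIONS_MBPS = {
    "editing master": 150.0,
    "web version": 5.0,
    "low-res proxy": 1.0,
}


def total_footprint_gb(duration_minutes: float) -> float:
    """Sum the size of every rendition stored for the same footage."""
    return sum(file_size_gb(rate, duration_minutes) for rate in RENDITIONS_MBPS.values())


if __name__ == "__main__":
    for name, rate in RENDITIONS_MBPS.items():
        print(f"{name}: {file_size_gb(rate, 60):.2f} GB per hour")
    print(f"all versions together: {total_footprint_gb(60):.2f} GB per hour")
```

The point of the exercise is the last line: every extra version you keep adds to the total, and the master usually dominates.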


Deciding on storage is first a matter of how you will use the video files you store. Are you doing lots of editing? Are you streaming content? Are you building a huge library of content you will rarely use? The answer is often several of these, and that’s why many organisations end up with tiered storage.

Editing, especially on high-quality original footage, requires fast disk I/O and very often special attention to the bandwidth to the disk. If you are only doing small-scale editing and no team sharing, don’t worry about this; your laptop will do fine. Instead, pay attention to the second tier, the fast access repository.

In the fast access repository you will find content you work with on a weekly or monthly basis. This is likely to be on your SAN, for example. Here you run the risk of overspending for no particular reason: sales guys will push expensive editing storage when all you really need is to quickly find your clips so you can give them to someone who will make the edit. Also pay attention so that you don’t end up with a proprietary format or file system. Go standard as much as you can.

For organisations with a _lot_ of content, and to be clear, we are talking about hundreds of terabytes, there is a third tier called near-line archive. It is often object-based storage, often with smart solutions to quickly fetch files from a cheaper medium. It gives you a middle ground between accessibility and the cost-efficient “deep archive”. Again, pay attention to standards! Go simple.


The last tier is deep archive. This is where you place content you may not use for many years. It is sometimes called “the why-not-archive”. Some solutions out there are so cheap that it is more costly to go through the purging process than to just dump everything in the archive. We would recommend looking at cloud options, e.g., Amazon Glacier or Amazon S3. Regardless of solution, it is important that you get some basic structure in place so you can find your content later. We suggest you keep it simple and manage a low-res version, a thumbnail and some basic metadata. These you can store and manage on whatever storage you like, since they don’t take up much space.

Going wrong with storage

So, what happens if you go wrong with storage? The biggest risk is spending too much money. Once you have spent your money, it’s not coming back! The second biggest risk is latency. For this there is a fix: spend some more and upgrade your tiers until you are happy. Make sure you have a system in place where it is easy to move clips between the tiers, though, and maybe even to store them in multiple tiers seamlessly. The third biggest risk is that you don’t find your stuff. The fix is a basic content management system. Make sure you depend as little as possible on human processes. “Auto-anything” is the future, as the volumes tend to grow much faster than you can keep up with.
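As a small illustration of “auto-anything” for storage, the sketch below picks a tier from the date a clip was last used. The tier names follow the four tiers described above; the day thresholds and the `choose_tier` function are purely hypothetical and would need tuning to your own usage and budget.

```python
from datetime import date

# Hypothetical thresholds: maximum days since last access for each tier.
# Ordered from the fastest (most expensive) tier to the slowest.
TIER_THRESHOLDS = [
    (7, "editing"),        # touched within the last week
    (90, "fast-access"),   # used on a weekly/monthly basis
    (730, "near-line"),    # used now and then
]


def choose_tier(last_accessed: date, today: date) -> str:
    """Return the cheapest tier that still matches how recently the clip was used."""
    age_days = (today - last_accessed).days
    for max_age, tier in TIER_THRESHOLDS:
        if age_days <= max_age:
            return tier
    return "deep-archive"  # anything older goes to the "why-not-archive"


if __name__ == "__main__":
    today = date(2017, 4, 5)
    for last_used in (date(2017, 4, 1), date(2017, 2, 1), date(2015, 1, 1)):
        print(last_used, "->", choose_tier(last_used, today))
```

A real system would probably look at more than last access (project status, rights, planned reuse), but a rule this simple is often enough to stop the expensive tiers from filling up with forgotten clips.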

Multi-device input and output (yes, they will use different devices)

Video comes in all kinds of formats and containers, from your mobile phone to professional equipment. Depending on your application you might have to handle multiple formats from many different devices, and from audience platforms like web and social media. The same problem arises on the output side, where you want to make sure that different users get what they need, ranging from editable formats to the format you use to publish on your website. In short, it’s a mess and will not get easier any time soon!

Codecs and containers

When we talk about video formats we are actually talking about two different things: the container and the codec. Containers and codecs are often mixed up, for good reasons, as they sometimes have the same name and in some cases also overlap in functionality.


The container is what we typically would call the format of the video. The container is there to bundle the different parts of a video, i.e., the actual video stream and the audio stream, but also metadata/data tracks with information. Examples of containers are MOV (QuickTime), Ogg and AVI. The codec, on the other hand, is the actual compression standard that is used to turn the raw bytes of the video stream into a compressed format or, the other way around, to decode the compressed stream for playback. Examples of codecs are H.264 and Theora.
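One way to see the difference in practice is to look at what a probing tool reports for a file. The sketch below assumes you have the JSON output of `ffprobe -v error -show_format -show_streams -of json <file>` (ffprobe is part of the FFmpeg project, which this post does not otherwise cover). The `SAMPLE` string is hand-written and trimmed for illustration, not captured from a real file, and `describe_media` is a hypothetical helper.

```python
import json

# Hand-written, heavily trimmed example of ffprobe's JSON output for a QuickTime file.
SAMPLE = """
{
  "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"},
  "streams": [
    {"codec_type": "video", "codec_name": "h264"},
    {"codec_type": "audio", "codec_name": "aac"}
  ]
}
"""


def describe_media(ffprobe_json: str) -> dict:
    """Separate the container (the wrapper) from the codecs (the compression)."""
    info = json.loads(ffprobe_json)
    streams = info.get("streams", [])
    return {
        "container": info.get("format", {}).get("format_name"),
        "video_codecs": [s.get("codec_name") for s in streams if s.get("codec_type") == "video"],
        "audio_codecs": [s.get("codec_name") for s in streams if s.get("codec_type") == "audio"],
    }


if __name__ == "__main__":
    print(describe_media(SAMPLE))
```

The container shows up once, at the file level, while every stream inside it carries its own codec. That is why the same H.264 video can live in a MOV file today and in another container tomorrow.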


You can (and will) change the format of your video. This is called transcoding (when changing the codec) or transwrapping (when changing the container). This is something you need to be able to handle for several different formats, for both existing and future containers and codecs. As with many other things, there are multiple ways to do transcoding. There are high-end transcoders, very often on proprietary hardware, with a cost ranging from “very affordable” to “I’m running a TV-station budget”. If you are super ambitious and plan for your own corporate TV channel or VOD service, this could be your choice.
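To make the distinction concrete, here is a minimal sketch that builds the two kinds of command line for the open-source ffmpeg tool, which is our choice for illustration and not something prescribed above. The helper names are hypothetical, and the encoder choice (`libx264` for video, `aac` for audio) is an assumption that depends on how your ffmpeg was built.

```python
import subprocess


def transwrap_command(source: str, target: str) -> list:
    """Change only the container: streams are copied bit for bit, nothing is re-encoded."""
    return ["ffmpeg", "-i", source, "-c", "copy", target]


def transcode_command(source: str, target: str,
                      video_codec: str = "libx264",
                      audio_codec: str = "aac") -> list:
    """Change the codec: every frame is decoded and compressed again."""
    return ["ffmpeg", "-i", source, "-c:v", video_codec, "-c:a", audio_codec, target]


def run(command: list) -> None:
    """Execute a command built above; requires ffmpeg to be installed and on the PATH."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run(...) when ffmpeg and the files exist.
    print(" ".join(transwrap_command("input.mov", "output.mp4")))
    print(" ".join(transcode_command("input.avi", "output.mp4")))
```

Transwrapping is fast and lossless, but it only works when the target container can carry the codecs already in the file. Transcoding is slower and costs quality each time, so you generally want to do it from the best source you have.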

Then there is cloud-based, per-minute transcoding. Popular services include Zencoder and Amazon Elastic Transcoder. This is good for cheap, standard bulk transcoding. The downside is that you really don’t know in advance how much you will spend, and you risk running into specialized formats they don’t support. Also, if the rest of your workflow is not cloud-based, you will pay for the download of the transcoded content in addition to the transcoding cost.
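A rough way to reason about that spend is to add the per-minute fee and the download (egress) fee together, as in the sketch below. All prices in the example are made-up placeholders, not the rates of any service named above; check the current price lists before you trust any number that comes out.

```python
def cloud_transcode_cost(minutes: float,
                         price_per_minute: float,
                         output_gb: float,
                         egress_price_per_gb: float) -> float:
    """Transcoding fee plus the fee for downloading the results out of the cloud."""
    transcoding_fee = minutes * price_per_minute
    download_fee = output_gb * egress_price_per_gb
    return transcoding_fee + download_fee


if __name__ == "__main__":
    # Hypothetical month: 1000 minutes of output, 500 GB downloaded to on-premise storage.
    cost = cloud_transcode_cost(
        minutes=1000,
        price_per_minute=0.03,     # placeholder price, per output minute
        output_gb=500,
        egress_price_per_gb=0.09,  # placeholder price, per GB downloaded
    )
    print(f"estimated monthly cost: {cost:.2f}")
```

If the rest of your workflow also lives in the cloud, the download term drops to zero, which is exactly the trade-off described above.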

Finally, there are software-based, integrated and configurable transcoders. This is kind of a middle ground, with decent cost and usually good flexibility. The downside can be performance, as they usually run on commodity servers and use the CPU rather than a GPU.

Metadata (making sure you can find your assets later on)

What you are building is essentially an asset management system, where the assets are your videos. Assets need to be taken care of to be valuable, and you need to find the right asset when you need it. This is when you need to start thinking about metadata: data about the data, data about your assets. Metadata can be anything from technical metadata describing aspects of your assets such as capture time, author, resolution, etc., to descriptive metadata that describes the content of your assets or how your assets are connected, but also administrative and semantic metadata. Even if you don’t have a metadata strategy when you start out, you need to come up with one soon.


When you have come to the point where you create or use many hours of video, you will want to harvest metadata automatically to make it feasible. There are, for example, tools that create a transcript from a video or automatically detect logos or other images in your material, but also more advanced tools for mood recognition, sentiment analysis, scene recognition, etc.

Now that you have the metadata, it will be meaningless unless you can search in it to find what you are after. You want the search engine to handle any kind of metadata model, or maybe even multiple metadata models at the same time. You need to be able to find clips, sub-clips or even specific frames in all your content, no matter where it is located.
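To show what finding sub-clips can mean, here is a minimal in-memory sketch. It is not how any particular product implements search, and the `Asset`, `Annotation` and `search` names are hypothetical. The idea is simply that metadata can be attached both to the whole asset and to a time span inside it, and that a hit should tell you where the file is and which span matched.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Time-coded metadata: describes a span inside the video, in seconds."""
    start_s: float
    end_s: float
    text: str


@dataclass
class Asset:
    asset_id: str
    location: str                                     # where the file lives, any storage tier
    metadata: dict = field(default_factory=dict)      # asset-level key/value metadata
    annotations: list = field(default_factory=list)   # list of Annotation


def search(assets: list, term: str) -> list:
    """Return (asset_id, span) hits; span is None when the whole asset matched."""
    term = term.lower()
    hits = []
    for asset in assets:
        if any(term in str(value).lower() for value in asset.metadata.values()):
            hits.append((asset.asset_id, None))
        for note in asset.annotations:
            if term in note.text.lower():
                hits.append((asset.asset_id, (note.start_s, note.end_s)))
    return hits


if __name__ == "__main__":
    library = [
        Asset("a1", "s3://archive/a1.mov", {"title": "Staff memo March"},
              [Annotation(12.0, 30.5, "CEO talks about the new office")]),
        Asset("a2", "/san/projects/a2.mov", {"title": "Office tour"}),
    ]
    print(search(library, "office"))
```

With real volumes you would hand this job to a proper search engine, but the shape of the result stays the same: an asset, a location and, when possible, an exact time span.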

If you go wrong with metadata

There are three main things that can go wrong with metadata:

  • No metadata.
  • Too much metadata.
  • Bad metadata.

If you don’t attach any metadata to your assets, it’s easy to explain the consequences – you won’t find your assets, and not finding them means no way to monetize or repurpose them in any meaningful way. Too much metadata is usually a symptom of not finding the metadata in the first place, so you start building an advanced, all-encompassing metadata model with a complete taxonomy of the world, and then try to force the users to add all that metadata. It won’t happen. That’s when you run into the third issue – bad metadata. A taxonomy of the whole world combined with manual metadata management will lead to bad metadata, which means you will not find your assets.

That adds a fourth issue that will arise as your content grows:

  • Not using automatic metadata management and harvesting tools wherever possible.

Automatic is the way to go when working with metadata. As your content grows, you want to set up the metadata harvesting to be as automatic as possible and direct the human effort towards fixing what the automation cannot. The tools for extracting all kinds of metadata, including scene detection and closed captioning, are getting better every day, and depending on your needs you can incorporate the tools as they reach the level of maturity you require.
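One simple way to split the work between automation and people is to let each tool report how confident it is and only send the uncertain results to a human. The sketch below assumes your harvesting tools return some kind of confidence score, which not all of them do. The `Finding` type, the `triage` function and the 0.8 threshold are hypothetical.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """One piece of automatically harvested metadata."""
    field: str         # e.g. "logo", "scene", "transcript_language"
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the (assumed) harvesting tool


def triage(findings: list, threshold: float = 0.8) -> tuple:
    """Accept confident findings as metadata; queue the rest for human review."""
    accepted = {}
    needs_review = []
    for finding in findings:
        if finding.confidence >= threshold:
            accepted[finding.field] = finding.value
        else:
            needs_review.append(finding)
    return accepted, needs_review


if __name__ == "__main__":
    harvested = [
        Finding("scene", "office interior", 0.93),
        Finding("logo", "unknown brand", 0.41),
    ]
    accepted, needs_review = triage(harvested)
    print("stored automatically:", accepted)
    print("sent to a human:", needs_review)
```

As the tools mature, you can raise or lower the threshold per field instead of changing the workflow, which keeps the human effort focused on what the automation cannot handle.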

Shortcut your R&D with Vidispine – the world’s most comprehensive API-only media content management backbone for developers and media professionals. Discover, manage and transcode your local or cloud-based media files on S3, EBS, CIFS, FTP, SFTP, https, Azure. Integrate with Aspera, Signiant, FileCatalyst, NetApp, Cerify, Avid ++. Authenticate with OAuth2, LDAP, AD, BasicAuth ++. Manage metadata all the way down to frame level. Transcode any audio or video format. RESTful API with great .NET SDK. Supports all REST clients, including cURL and Postman.