AWS Image & Video Analysis in Vidinet Cognitive Services
Media content has always carried a limited amount of associated information: typically the file name, a short description, and some technical metadata that the ingest process may have extracted from the source file. If we are lucky, there may also be some global information about the media itself, such as title, actors, type of content, age, and licensing.
To go beyond this level and find information within time spans of the media, we have traditionally relied on manual audio-visual review to understand and describe what the media actually contains. This manual process takes time and resources.
There are many points in a media supply chain where information describing objects and environments can be very useful. Some examples might be:
- Ingest of media in a post-production workflow that requires logging of content.
- Acquisition of media into a distribution platform that needs information for compliance checking.
- Enhancing the user experience by detecting highlights in sports.
- Automating trailer/promo production using cognitive metadata.
- Finding a combination of objects, faces, and environments in a vast media archive.
Wouldn’t it be great if we could use technology to harvest this metadata automatically instead? Well, since you are already reading this article, you have of course asked yourself the same question.
The answer is machine learning video analysis – a technology that trains computer software to detect objects, faces, and environments in the media content. Note the wording here – train. Because what happens when you train? You get better, right?
Machine learning algorithms get better and better over time, as you have probably discovered just by comparing a Google Photos search for “ice cream” in your photo library today with the same search two years ago.
VidiNet Cognitive Services and AWS Rekognition
VidiNet is our media supply chain platform, where Vidispine customers add and configure different services for their on-premise, cloud, or hybrid environments. Here, you can now access VCS video and image analysis and add this service to your infrastructure – or just to your trial account.
VidiNet Cognitive Services (VCS) is a core architecture designed to manage cognitive services from a growing number of providers on the market. In this first release of VCS, you will find cognitive services based on the AWS Rekognition libraries. With the introduction of VCS, we are taking the VidiCore API and VidiNet to the next level.
The AWS Rekognition libraries in VidiNet Cognitive Services (VCS) can automatically detect enough information about what is inside your media content that humans need to intervene only when necessary. This way, you not only save human resources by offloading image detection to software, but you can also reserve your staff for the unique image recognition tasks that genuinely require manual review.
VCS video and image analysis decodes the video content and presents a list of the items found. All machine learning software depends on confidence values, both for training and for execution.
VCS video and image analysis provides a confidence value for each item found. This value is essential when designing an architecture that depends on the objects, faces, and environments found in the content.
Do you require a confidence value of 90% or higher to be sure you have the correct data? Or are you willing to lower the threshold to 60% to ensure you are not missing an object?
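To make that trade-off concrete, here is a minimal TypeScript sketch of a downstream step that filters detected items by a configurable confidence threshold. The `DetectedItem` shape and the sample data are assumptions for illustration, not the actual VCS response format:

```typescript
// Hypothetical shape of one detected item; the real VCS/Rekognition
// response format may differ.
interface DetectedItem {
  label: string;      // e.g. "Guitar"
  confidence: number; // 0-100, as reported by the analysis
}

// Keep only items at or above the threshold your workflow requires.
function filterByConfidence(items: DetectedItem[], threshold: number): DetectedItem[] {
  return items.filter((item) => item.confidence >= threshold);
}

// Illustrative sample detections.
const detections: DetectedItem[] = [
  { label: "Guitar", confidence: 97.2 },
  { label: "Crowd", confidence: 64.5 },
  { label: "Fireworks", confidence: 41.0 },
];

// Strict workflow: only near-certain matches survive.
const strict = filterByConfidence(detections, 90); // keeps "Guitar" only
// Permissive workflow: fewer missed objects, at the cost of some noise.
const permissive = filterByConfidence(detections, 60); // keeps "Guitar" and "Crowd"
```

The right threshold depends on the cost of a false positive versus a missed object in your particular workflow, so it is worth exposing it as a configuration value rather than hard-coding it.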
Note that AWS Rekognition in VCS also reports back an interpretation of the current event, such as “leisure activities”, and is aware that the detected object is a musical instrument and not only a guitar. The activity in the picture itself is described as “playing guitar”.
And most importantly, the VidiCore API of course provides the endpoints for a UI that allows manual adjustment of the results.
Our UI is your UI
The VidiCore Development Toolkit (VDTK) is free and lets you design precisely the UI that works best in your media supply chain. It includes multiple packages:
- React wrappers
- Prebuilt components using https://material-ui.com/ (React components following Google's Material Design)
The UI examples in this article are built with React. When you start a trial on VidiNet, we provide you with an easy-to-use UI for testing.
VCS video and image analysis and AWS Rekognition pricing
When you try out VCS video and image analysis, you get an automatic cost estimate based on AWS Rekognition pricing and the source duration of the job you are starting. Use this estimate as a basis for calculating the cost of automating VCS video and image analysis in your media supply chain.
Currently, we charge USD 0.10 per content minute and USD 0.001 per image, but remember that you only pay when you use the service. You can scale up or pause your media supply chain whenever your business model requires it.
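Using those list prices, the arithmetic behind such an estimate can be sketched as follows. This is a simple illustration of the calculation, not the actual VidiNet estimator, and the function name is ours:

```typescript
// List prices quoted in this article: USD 0.10 per content minute of
// video and USD 0.001 per analyzed image.
const PRICE_PER_VIDEO_MINUTE = 0.1;
const PRICE_PER_IMAGE = 0.001;

// Rough cost estimate for analyzing a batch of media (illustrative only).
function estimateCostUsd(videoMinutes: number, imageCount: number): number {
  return videoMinutes * PRICE_PER_VIDEO_MINUTE + imageCount * PRICE_PER_IMAGE;
}

// Example: a 90-minute feature plus 500 still images, roughly 9.50 USD.
const estimate = estimateCostUsd(90, 500);
```

Because you pay per job rather than for idle capacity, estimates like this can be multiplied by your expected monthly volume to budget the automated part of the supply chain.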
This flexibility is just one of many advantages when building your media supply chain with Vidispine.