Cognitive services are often referred to as AI, but a more accurate term is Deep Learning, a subset of Machine Learning. The history of Machine Learning starts as early as 1943, when the scientists Walter Pitts and Warren McCulloch modeled the neural networks of the human brain to create a computer model mimicking the “thought process” behind learning by trial and error. Pitts and McCulloch called this combination of algorithms and mathematics “threshold logic”. It is a precursor of the confidence values that cognitive services report today.
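The threshold-logic idea fits in a few lines of code: a unit “fires” only when the weighted sum of its binary inputs reaches a threshold. A minimal sketch (the weights and threshold here simply model a logical AND gate):

```python
# A minimal McCulloch-Pitts threshold unit: the neuron fires (outputs 1)
# only when the weighted sum of its binary inputs reaches the threshold.
def threshold_neuron(inputs, weights, threshold):
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Modeling a logical AND gate: both inputs must be active for the unit to fire.
print(threshold_neuron([1, 1], [1, 1], threshold=2))  # 1 (fires)
print(threshold_neuron([1, 0], [1, 1], threshold=2))  # 0 (does not fire)
```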
This method proved very successful and forms the foundation of today’s machine learning algorithms.
It is important to understand the technical background of this technology, since its capabilities evolve over time as the algorithms are trained on more and more material. As an example, AWS speech recognition is better today than it was a year ago, and it will be even better a year from now.
IN API 5.0, THE AWS AI SERVICES ARE ENABLED IN THE VIDICORE API. WHAT FEATURES DO WE HAVE SO FAR IN THE PRODUCT?
The VidiCore API platform has the advantage of being prepared to integrate with any number of cognitive services and suppliers. First out, we have integrated the VidiCore API with Amazon Rekognition and Amazon Transcribe. A user in VidiNet selects one of these AWS AI services and adds it to the current media supply chain. Being able to integrate many small cognitive services and orchestrate them together is likely the best strategy here, which is why Vidispine is well positioned to be the platform for any media supply chain relying on cognitive services, today and in the future.
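As a purely illustrative sketch of what triggering such an analysis could look like from a developer’s point of view (the endpoint path, item ID, and parameter name below are hypothetical placeholders, not the documented VidiCore API):

```python
# Hypothetical sketch only: the endpoint path, item ID, and parameter name
# are placeholders for illustration, not the documented VidiCore API.
def build_analyze_request(base_url, item_id, service):
    """Build the URL and query parameters for submitting an analysis job."""
    url = f"{base_url}/item/{item_id}/analyze"  # hypothetical endpoint
    params = {"service": service}               # hypothetical parameter
    return url, params

# In practice this request would be POSTed with an HTTP client, and the
# cognitive service attached in VidiNet would write time-based metadata
# back to the item.
url, params = build_analyze_request("https://vidinet.example/API", "VX-42", "transcribe")
print(url)  # https://vidinet.example/API/item/VX-42/analyze
```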
Being able to add more cognitive services as they appear in VidiNet is very important, since it requires no additional custom integration with the media supply chain in use. VidiNet takes care of the integration layers in the Vidispine database, and any requirement to scale performance for analyzing large batches of media is handled by VidiNet’s built-in cloud-native architecture.
If you suddenly need to analyze 1,000 files instead of 100 in a limited time span, VidiNet will scale to meet the new requirement.
WHAT KIND OF WORKFLOW DO YOU PREDICT TO BE THE MOST COMMON IN THE NEAR FUTURE?
It isn’t easy to point to a specific workflow, really. Besides the usual suspects like speech-to-text and face recognition, I see the advantages more from a media supply chain perspective, where business and content intelligence come together. Customers will recognize the importance of service-agnostic metadata extraction that offers various types of cognitive recognition models while unifying all this metadata in a single MAM system and media supply chain to drive business intelligence. In addition, I believe that, especially in the field of computer vision, there should be a simple way of training your own current and regional concepts with just a few examples (training data), integrated directly in the MAM. What this could look like, and what we are experimenting with in this respect, is beyond the scope of this issue and may be the subject of another one.
And once all the necessary time-accurate information is available, we can build value-added services on top: content intelligence such as searching and monetizing content; content recommendation based on genealogy patterns; (real-time) assistance systems that draw on rights ownership, or recommendations while cutting toward target program slots based on rating predictions; content compliance and automatic highlight cuts; domain-specific archive tagging packages; similarity search with respect to owned licenses; and much more.
Customers will start to find their own applications and use cases for cognitive services. At the same time, there is a growing number of cognitive service providers out there, with unique or overlapping functionality.
This is why the customizable metadata architecture, combined with centralized domain transformation from different metadata providers in the VidiCore API and the cloud-native VidiNet service portal, becomes the obvious media supply chain of the future.
HOW DOES IT WORK?
Let’s take a look at the user interface of VidiNet and how the AWS AI services can be used.
After adding the AWS AI services to your Media Supply Chain in VidiNet you are ready to explore the different features.
Vidispine comes with a simple interface, the VCV (Vidispine Content Viewer), where developers can test the results of our different services before putting them into production. In the example below we are testing the Amazon Transcribe results on a test clip of former President Obama. It is clear that we cannot fully trust any speech transcription service yet, but you can immediately see that the accuracy is quite high. A human reviewer therefore only needs to adjust a small part of the transcription.
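Transcribe reports a per-word confidence value, which is exactly what a review workflow can key on: only words below a chosen threshold need human attention. A small sketch of that idea (the JSON mirrors the shape of Transcribe’s output document, but the words, timings, and confidence values are made up for illustration):

```python
# Flag words that a human reviewer should check, based on the per-word
# confidence values in an Amazon Transcribe result document. The JSON
# below mirrors the shape of Transcribe's output; the words, timings,
# and confidence values are invented for this example.
import json

transcribe_result = json.loads("""
{
  "results": {
    "transcripts": [{"transcript": "the president said hello"}],
    "items": [
      {"type": "pronunciation", "start_time": "0.10", "end_time": "0.35",
       "alternatives": [{"confidence": "0.99", "content": "the"}]},
      {"type": "pronunciation", "start_time": "0.35", "end_time": "0.90",
       "alternatives": [{"confidence": "0.97", "content": "president"}]},
      {"type": "pronunciation", "start_time": "0.90", "end_time": "1.20",
       "alternatives": [{"confidence": "0.62", "content": "said"}]},
      {"type": "pronunciation", "start_time": "1.20", "end_time": "1.60",
       "alternatives": [{"confidence": "0.99", "content": "hello"}]}
    ]
  }
}
""")

def low_confidence_words(result, threshold=0.8):
    """Return (start_time, word, confidence) for words below the threshold."""
    flagged = []
    for item in result["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation items, which carry no timing
        best = item["alternatives"][0]
        if float(best["confidence"]) < threshold:
            flagged.append((item["start_time"], best["content"],
                            float(best["confidence"])))
    return flagged

print(low_confidence_words(transcribe_result))  # [('0.90', 'said', 0.62)]
```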
Amazon Rekognition is applied in the same way. By analyzing your content, you get time-based metadata back with information on the objects and faces recognized in your media.
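To make that concrete, here is a sketch of turning a Rekognition video label-detection response into simple time-based metadata, mapping each label to the timestamps (in milliseconds) where it was seen. The response mirrors the shape of Rekognition’s `GetLabelDetection` output, but the labels and numbers are invented for illustration:

```python
# Turn a Rekognition video label-detection response into simple time-based
# metadata: for each detected label, the timestamps (in ms) where it occurs.
# The dict below mirrors the shape of GetLabelDetection output; the labels
# and numbers are invented for this example.
from collections import defaultdict

rekognition_response = {
    "Labels": [
        {"Timestamp": 0,    "Label": {"Name": "Person", "Confidence": 99.1}},
        {"Timestamp": 0,    "Label": {"Name": "Podium", "Confidence": 91.4}},
        {"Timestamp": 1000, "Label": {"Name": "Person", "Confidence": 98.7}},
    ]
}

def labels_timeline(response, min_confidence=90.0):
    """Map each label name to the list of timestamps where it was detected."""
    timeline = defaultdict(list)
    for detection in response["Labels"]:
        label = detection["Label"]
        if label["Confidence"] >= min_confidence:
            timeline[label["Name"]].append(detection["Timestamp"])
    return dict(timeline)

print(labels_timeline(rekognition_response))
# {'Person': [0, 1000], 'Podium': [0]}
```

A timeline like this is what a MAM can index as time-based metadata, so that a search for “podium” jumps straight to the matching points in the clip.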