Results From The Easiest 2016 Video Predictions Ever – pt2

By Patrik - January 19, 2017 (Last updated: April 24, 2017)

In part two of my prediction recap, I take a look at emotion detection, automatic metadata harvesting, and automatic video editing. There was a lot of interesting activity in these areas, and I am looking forward to the coming years, as this will just explode.

This is part 2 (part 1) of going through last year’s predictions from the blog post The Six Easiest Predictions About Video In 2016. In this part I go through emotion recognition, which turned out to be spot on; automatic metadata harvesting, which is still struggling, mainly because of a perception that it is not useful yet; and last but not least automatic video editing, which still has a way to go but got further than I thought. Let us know your thoughts on the subject.

Emotion recognition is the new face recognition

Emotions are as important today as they were last year, and the activity in this area proves it. Of the three companies I used as examples last year, Apple acquired Emotient, and the other two still exist. Google released the Vision API with emotion detection capabilities, Amazon released the Rekognition API, also with emotion detection, and Facebook acquired emotion detection startup FacioMetrics. Trust me, we will see these capabilities rolled into their home automation systems, with Siri knowing whether you’re happy or need coffee.

In 2016 we also saw more usage of emotion detection: more market and audience analysis, with Kairos releasing Project Look; travel tips based on your mood; and even support when making financial decisions. Some are more gimmicky than others, but usage is picking up.
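To make the idea concrete, here is a minimal sketch of what consuming output from such an emotion detection API could look like. The response shape (a face with per-label `emotions` scores) is made up for illustration and does not match any particular vendor’s schema.

```python
# Toy sketch: pick the dominant emotion from a face-analysis result.
# The response structure below is a simplified stand-in, not a real API schema.

def dominant_emotion(face):
    """Return the emotion label with the highest confidence score."""
    scores = face["emotions"]  # e.g. {"joy": 0.92, "anger": 0.03, ...}
    return max(scores, key=scores.get)

# Hypothetical result for one detected face
sample_face = {
    "bounding_box": [34, 50, 120, 140],  # x, y, width, height (illustrative)
    "emotions": {"joy": 0.92, "sorrow": 0.02, "anger": 0.01, "surprise": 0.05},
}

print(dominant_emotion(sample_face))  # joy
```

An application (say, the coffee-aware assistant above) would branch on that label rather than on raw scores.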

Researchers from MIT released a paper on using wireless signals to detect emotion (hey, shopping mall). Other researchers created a machine-vision algorithm that can make snap judgments based on people’s faces (no, this robot will not trust you).

All in all, emotion detection is a vital part of a future where we interact more with machines, not only through a keyboard. It will also be one key to automating video storytelling. I say that 2016 was the year of emotion detection in computer vision.

Automatic metadata harvesting for the win

This is an area that has picked up speed. I can add Google, Microsoft, Amazon AWS, and Clarifai to last year’s list, all with APIs for anyone to use. One interesting research project is the NMR-led ReCAP project, which aims to introduce a platform for automatic content analysis services. Twitter is doing research into automatic tagging of live video with good results. I also found a project where an AI lip-reads better than humans.

Accuracy may not be on par with humans yet, but speed is far greater, and if the choice is between getting something tagged and nothing tagged, I prefer the former. Considering the scale of content the big players have to train their systems on, accuracy will improve fast. Together with humans, automated metadata harvesting will work great.

Automatic metadata harvesting is not used at scale yet, but it’s an essential part of handling huge amounts of video. The more types of metadata you can collect automatically, and correlate for accuracy, the better. I’d say I am half correct; wait for 2018, when we’ll see commercial products in professional use, and those not using them will be left behind.
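That correlation idea can be sketched in a few lines: keep only the tags that several independent taggers agree on, averaging their confidence scores. The two service outputs below are hypothetical, not real API responses.

```python
from collections import defaultdict

def merge_tags(*tag_sets, min_sources=2):
    """Keep tags reported by at least `min_sources` taggers,
    averaging their confidence scores."""
    scores = defaultdict(list)
    for tags in tag_sets:
        for label, conf in tags.items():
            scores[label].append(conf)
    return {label: sum(c) / len(c)
            for label, c in scores.items()
            if len(c) >= min_sources}

# Hypothetical outputs from two different tagging services
service_a = {"skiing": 0.95, "snow": 0.90, "dog": 0.40}
service_b = {"skiing": 0.88, "snow": 0.93, "mountain": 0.70}

# "skiing" and "snow" survive; the single-source "dog" and
# "mountain" tags are dropped as uncorroborated.
print(merge_tags(service_a, service_b))
```

A real pipeline would of course also weight sources by their known accuracy, but even this naive vote filters out a lot of noise.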

Automatic video editing becomes crowded

I listed three favorite automatic video editing applications last year. KnowMe seems to have died, Replay got acquired by GoPro, while ShredVideo still lives on. Magisto existed last year as well and has extended its product to small businesses, also adding an AI layer to it. An interesting new app is FLO, promising to create stories based on different narrative styles.

One interesting domain for automatic storytelling is action cameras, for three reasons. First, action cameras are becoming commodities, with small differences except for accessories. Second, the community is built for speed, and you are never better than your last uploaded video. Third, action cameras contain different sensors that can be used. The main differentiator going forward could very well be software. GoPro acquired Replay and now has automatic editing, which TomTom offers in its Bandit camera and Garmin in its Virb.

As I wrote, most action cameras have GPS, gyros, accelerometers, and even a heart rate monitor. This makes it easier to find highlights in the material, and therefore easier to automate editing. Last year, the startup Graava promised an action camera with similar features; instead they skipped the camera and focused on a mobile phone app that uses all available sensors on the phone to help the editing, probably because of fierce competition in the hardware area. With automatic editing, your latest chest-deep powder run is online before you can hike back up for another one.
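A toy version of that sensor-driven highlight finding could look like this, assuming a pre-computed series of acceleration magnitudes (in g). The threshold and sample data are made up for illustration; a real editor would fuse several sensors, not just one.

```python
def find_highlights(accel_magnitude, threshold=2.5, min_gap=3):
    """Return sample indices where acceleration spikes above `threshold`,
    keeping spikes at least `min_gap` samples apart to avoid duplicates."""
    highlights = []
    for i, g in enumerate(accel_magnitude):
        if g > threshold and (not highlights or i - highlights[-1] >= min_gap):
            highlights.append(i)
    return highlights

# Hypothetical g-force samples from a ski run: two jumps stand out
samples = [1.0, 1.1, 0.9, 3.2, 1.0, 1.2, 1.0, 2.8, 1.1, 1.0]
print(find_highlights(samples))  # [3, 7]
```

The returned indices would then be mapped back to timestamps and used as cut points for the automatically edited clip.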

What about real movies? Not even close, but IBM had Watson help create a trailer for the horror movie Morgan, cutting trailer creation time from weeks to 24 hours. Director Oscar Sharp and his technology partner had an AI write a script for a sci-fi short film after it read the scripts of hundreds of sci-fi movies. Then they spent a few days filming it. The result is entertaining, but not the next blockbuster.

There is of course research going into this as well, and maybe the folks at Vicarious can give computers some imagination. Another interesting research project touching this area is the BBC’s Object Based Composition, which breaks video down into its pieces. With those objects and corresponding metadata, it would be possible to automatically create new clips on a subject.

There are a lot of promising products, experiments, and research in this area, and with the help of all the money pouring into AI, I am sure we will see more of this in the years to come. I’d say it’s not crowded yet, but it’s heating up. It’s definitely useful for private use, and probably for smaller businesses with smaller budgets.