Counting Files Over Multiple Cloud Storages Made Easy

By Vidispine - May 11, 2016 (Last updated: April 9, 2017)

In today's guest post, Binagora shows how to count the media files in a library spread across several cloud storages, and how to total their size, using Vidispine and a node.js application.

Some time ago we had to deal with the following scenario…

One of our customers had a huge media library made up of many media files stored in the cloud, most of them in S3 buckets and a few others in Azure containers. They needed to answer two simple questions for the different environments they manage:

  1. How many media files do we have in total in our library?
  2. What’s the total size of all those media files?

The idea was not to answer these questions only once. We wanted to be able to check this information as often as we needed.

Luckily, Vidispine was already deployed in all environments so it was just a matter of using it properly, and this is how we did it:

We already had the different S3 buckets and Azure containers mapped in Vidispine as storages, so we wrote a small console application in node.js. This app gets all storages from Vidispine (executing an API request).

It then iterates through all the storages:

  • For each storage, gets all storage methods (another API request)
  • For those storage methods that are mapped to S3 or Azure, gets the bucket name or the container name (just parsing the method’s URL)
  • Gets the total number of files from the storage (another API request)
  • Gets the total number of importable files from the storage (again, another API request)
  • Displays the storage id, total capacity, total size, storage type, storage name and file counts on the console output
  • Displays the total size of files and importable files across all storages.
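The parsing and totalling steps above can be sketched in a few lines of node.js. This is only an illustration, not the code from the actual tool: the helper names (parseMethodUri, summarize, formatBytes), the record fields and the sample URIs are all made up for the example, and it assumes the bucket or container name is the host part of the method URL with any credentials percent-encoded.

```javascript
// Hypothetical helper: split a storage method URI into a type and a name.
// Assumes the bucket/container name is the host part of the URI.
const KNOWN_TYPES = new Map([['s3', 'S3'], ['azure', 'Azure']]);

function parseMethodUri(uri) {
  // scheme://[userinfo@]host/... ; userinfo is skipped, host is captured.
  const match = /^([a-z][a-z0-9+.-]*):\/\/(?:[^@\/]*@)?([^\/?#]*)/i.exec(uri);
  if (!match) return null;
  const type = KNOWN_TYPES.get(match[1].toLowerCase());
  // Other kinds of storage methods (file system, FTP, ...) are ignored here.
  return type ? { type, name: match[2] } : null;
}

// Adds up counts and sizes over all storages.
function summarize(storages) {
  const totals = { fileCount: 0, fileBytes: 0, importableCount: 0, importableBytes: 0 };
  for (const s of storages) {
    totals.fileCount += s.fileCount;
    totals.fileBytes += s.fileBytes;
    totals.importableCount += s.importableCount;
    totals.importableBytes += s.importableBytes;
  }
  return totals;
}

// Formats a byte count using binary units.
function formatBytes(bytes) {
  const units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB'];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return `${value.toFixed(i === 0 ? 0 : 2)} ${units[i]}`;
}

// Made-up records, standing in for what the API requests would return.
const storages = [
  { id: 'VX-1', uri: 's3://ACCESS:SECRET@example-bucket/media/',
    fileCount: 3, fileBytes: 3072, importableCount: 2, importableBytes: 2048 },
  { id: 'VX-2', uri: 'azure://:KEY@example-container/',
    fileCount: 1, fileBytes: 1024, importableCount: 1, importableBytes: 512 },
];

for (const s of storages) {
  const method = parseMethodUri(s.uri);
  console.log(s.id, method.type, method.name, s.fileCount, formatBytes(s.fileBytes));
}
const totals = summarize(storages);
console.log('Total files:', totals.fileCount, formatBytes(totals.fileBytes));
console.log('Total importable:', totals.importableCount, formatBytes(totals.importableBytes));
```

Keeping the parsing and the totalling as plain functions, separate from the API requests, makes them easy to check against sample data before pointing the tool at a real environment.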

[Image: Binagora console output]

You can find the complete Vidispine storage checker application on GitHub.

Some interesting things to keep in mind:

  • It only parses the storage type and name for S3 buckets and Azure containers, but of course it could easily be extended to other kinds of storage methods (such as file system, FTP, HTTPS, etc.)
  • The Vidispine address, port, username and password are read from a configuration file. The set of values to use is identified by an argument that you pass to the tool on the command line, so you can define your own sets of values, using convenient names if you want (e.g. DEV, QA, etc.). Remember to update this configuration file when you run the console on your environment.
  • The only external dependency for the tool is the Vidispine REST API. This is an important concept. Notice that it’s reading files from AWS and Azure services without needing to use those specific APIs or SDKs. That’s part of the magic of having a DMAM system in the middle! You forget about where the files are or how to read them; you just ask Vidispine about them when you need to.
  • Consuming a REST API is always pretty easy and straightforward, so you can do it from most of the popular programming languages. We used node.js because we like it and it’s trendy, but we could have done it from C#, PHP or Java, always consuming the same API, without needing to know how Vidispine was developed. That’s part of the magic of consuming REST APIs!
  • Performing these calculations by hand is very time consuming and error prone. And you’d also need some external tools, such as CloudBerry Explorer or Azure Storage Explorer, just to mention a couple of options.
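To make the configuration point more concrete, here is a minimal sketch of how named sets of connection values could be selected and turned into a request against the Vidispine REST API. It is an assumption-laden illustration rather than the tool's real code: the set names, hosts and credentials are invented, selectEnvironment and buildRequest are hypothetical helpers, the configuration is inlined instead of read from a file, and the sketch assumes plain HTTP with basic authentication and the API served under /API/.

```javascript
// Hypothetical configuration sets; in the real tool these live in a
// configuration file. All values below are placeholders.
const config = {
  DEV: { address: 'localhost', port: 8080, username: 'admin', password: 'changeme' },
  QA: { address: 'qa.example.com', port: 8080, username: 'admin', password: 'changeme' },
};

// Picks the named set, failing loudly if the name is not defined.
function selectEnvironment(allConfig, name) {
  if (!Object.prototype.hasOwnProperty.call(allConfig, name)) {
    const known = Object.keys(allConfig).join(', ');
    throw new Error(`Unknown configuration set "${name}". Known sets: ${known}`);
  }
  return allConfig[name];
}

// Builds the URL and headers for one API request (assumes basic auth
// and that the API is served under /API/).
function buildRequest(env, path) {
  const credentials = Buffer.from(`${env.username}:${env.password}`).toString('base64');
  return {
    url: `http://${env.address}:${env.port}/API/${path}`,
    headers: {
      Authorization: `Basic ${credentials}`,
      Accept: 'application/json',
    },
  };
}

// In a command-line tool the set name would come from process.argv[2];
// it is hard-coded here to keep the sketch self-contained.
const request = buildRequest(selectEnvironment(config, 'QA'), 'storage');
console.log(request.url);
```

The resulting url and headers can be handed to whichever HTTP client you prefer. Because only the set name changes between environments, the same code runs unmodified against DEV, QA or any other set you define.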

This blog post was written by our friends at Binagora. Check them out and see how they can help you with your next Media & Entertainment project.