Video Preview Generation – Wild walk with Innoval and AWS

In Blog, Cloud Technology by Jiju Thomas Mathew


Recently, a client challenged our AWS team to generate previews of video files. The files are sourced from an FTP site and synced frequently to Amazon S3.

Challenges:

  • Files can arrive at any time of day, 24 x 7
  • Files are organized into date-based folders at the bucket level
  • The stability of each file (whether its upload has finished) must be checked on arrival in the specified Amazon S3 bucket
  • File sizes range from 10 MB to 900 MB
  • The preview files must be stored in the same bucket
  • Lifecycle rules must discard the preview files after a week
  • Keep costs as low as possible without sacrificing timeliness
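Because previews share the bucket with the source videos, one way to satisfy the last two requirements is to write previews under their own key prefix. A lifecycle rule can then target that prefix. The sketch below assumes a hypothetical `previews/` prefix and an `.mp4` preview format. Neither detail comes from the post. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
import posixpath

# Hypothetical prefix for preview objects. Keeping previews under their own
# prefix lets a lifecycle rule expire them without touching the source videos.
PREVIEW_PREFIX = "previews/"


def preview_key_for(source_key: str, prefix: str = PREVIEW_PREFIX) -> str:
    """Map a source key such as '2017-08-01/clip.mov' to its preview key.

    The date-based folder is preserved; the extension is replaced with
    '.mp4', which is an assumed preview format.
    """
    base, _ext = posixpath.splitext(source_key)
    return f"{prefix}{base}.mp4"


def preview_lifecycle_config(prefix: str = PREVIEW_PREFIX, days: int = 7) -> dict:
    """Lifecycle configuration that expires every object under `prefix` after `days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-previews",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }
```

With boto3, the rule could be applied once per bucket with `s3.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=preview_lifecycle_config())`. Here `bucket_name` is a placeholder.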

Initial Solution:

  • AWS Lambda (Node.js 6.10), triggered by Amazon CloudWatch Events at a fixed interval
  • Take an S3 listing, compare it with the previous listing to check for repeat status, and store it back to S3
  • A file with the same size and timestamp in two consecutive listings is considered stable
  • Pass stable files to an On-Demand EC2 instance launched from a custom Amazon Machine Image (AMI)
  • On the instance, ffmpeg and some shell scripts pick up the file from Amazon S3, generate the preview, and store it back
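The stability rule above compares two listings. Here is a minimal sketch of that check, written in Python rather than the Node.js the post used. It assumes each listing is a plain dict mapping object key to `(size, last_modified)`. In practice that dict could be built from the `Key`, `Size` and `LastModified` fields of an S3 listing. The `processed` parameter is a hypothetical addition that keeps already-handled files from being dispatched twice.

```python
from typing import AbstractSet, Dict, Set, Tuple

# A listing maps object key -> (size in bytes, last-modified timestamp).
Listing = Dict[str, Tuple[int, str]]


def find_stable_keys(
    previous: Listing,
    current: Listing,
    processed: AbstractSet[str] = frozenset(),
) -> Set[str]:
    """Return keys whose size and timestamp are identical in both listings.

    A key seen for the first time (absent from `previous`) is not yet
    considered stable; it is re-checked on the next scheduled run.
    Keys in `processed` are skipped so they are not dispatched again.
    """
    return {
        key
        for key, meta in current.items()
        if previous.get(key) == meta and key not in processed
    }
```

After each run, `current` would be saved (for example as JSON in S3) and becomes `previous` for the next scheduled invocation.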

After about a week, we started to notice some issues, caused by our having overlooked the fine print of “Amazon Instance Hour Billing”. We were launching roughly 30 to 35 instances per day. Each one ran for only a minute or two before terminating, yet we were billed for a full instance hour for every launch.
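The arithmetic behind the surprise can be sketched as follows. The sketch assumes the hourly model described above, in which any partial instance hour is billed as a full hour. The 2-minute runtime per launch is an illustrative figure taken from the "minute or two" in the text.

```python
import math
from typing import List


def billed_hours(runtimes_minutes: List[float]) -> int:
    """Instance hours billed when every started hour counts as a full hour."""
    return sum(math.ceil(m / 60) for m in runtimes_minutes)


def actual_hours(runtimes_minutes: List[float]) -> float:
    """Instance hours actually used."""
    return sum(runtimes_minutes) / 60


# 35 launches per day, each running about 2 minutes (illustrative).
runs = [2] * 35
print(billed_hours(runs))            # 35 billed instance hours
print(round(actual_hours(runs), 2))  # 1.17 hours of real work
```

Under this model, many short-lived instances cost about as much as keeping one instance running for the same number of hours.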

After a round of brainstorming, and considering the following points, we arrived at a more solid solution.

  • AWS Lambda (FaaS) runs on Amazon Linux
  • Lambda can run a compiled binary if it is statically linked
  • Thanks to John Van Sickle, statically linked ffmpeg builds are available (https://johnvansickle.com/ffmpeg/)
  • AWS Lambda has 512 MB of ephemeral disk space
  • Lambda can be triggered from S3 events
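If a function is triggered directly by S3 events, it has to read the bucket and key from the notification record. It should also avoid reacting to its own output, because the previews land in the same bucket. The sketch below is a Python illustration that assumes the hypothetical `previews/` prefix for preview objects. Keys in S3 notifications are URL-encoded, so they are decoded with `unquote_plus`.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus

PREVIEW_PREFIX = "previews/"  # hypothetical prefix where previews are written


def keys_from_s3_event(
    event: Dict[str, Any], skip_prefix: str = PREVIEW_PREFIX
) -> List[Tuple[str, str, int]]:
    """Extract (bucket, key, size) triples from an S3 notification event.

    Keys arrive URL-encoded (a space becomes '+'), so they are decoded.
    Objects under `skip_prefix` are ignored so that previews written back
    to the same bucket do not re-trigger the function.
    """
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        if key.startswith(skip_prefix):
            continue
        results.append((bucket, key, size))
    return results


def handler(event, context):
    # Hypothetical Lambda entry point; the preview processing itself is omitted.
    return keys_from_s3_event(event)
```

Scoping the S3 event notification itself to the source folders, rather than filtering inside the function, is another way to prevent the feedback loop.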

Final Solution:

  • The function from the initial solution passes any file under 350 MB on to a new function, while larger files still go to EC2
  • The new function performs the same processing as the EC2 instance, using the statically linked ffmpeg binary
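The routing and the Lambda-side processing can be sketched as follows. Several details here are assumptions, not details from the post: the binary path, the treatment of "350MB" as MiB, and the preview settings (first 30 seconds, 320 px wide, H.264, no audio). The post does not say what its previews look like. One plausible reading of the 350 MB cutoff is that it leaves headroom in the 512 MB of ephemeral disk for the preview output.

```python
from typing import List

# Threshold from the post; treating "350MB" as MiB is an assumption.
LAMBDA_MAX_BYTES = 350 * 1024 * 1024

# Hypothetical location of the statically linked ffmpeg in the deployment package.
FFMPEG_BIN = "/var/task/bin/ffmpeg"


def choose_processor(size_bytes: int) -> str:
    """Route a stable file to 'lambda' (under the threshold) or 'ec2'."""
    return "lambda" if size_bytes < LAMBDA_MAX_BYTES else "ec2"


def preview_command(src: str, dst: str, seconds: int = 30, width: int = 320) -> List[str]:
    """Build an ffmpeg argv that writes a short, downscaled H.264 preview."""
    return [
        FFMPEG_BIN,
        "-y",                        # overwrite dst if it already exists
        "-i", src,
        "-t", str(seconds),          # keep only the first `seconds` seconds
        "-vf", f"scale={width}:-2",  # scale to `width`, keep aspect ratio (even height)
        "-c:v", "libx264",
        "-preset", "veryfast",
        "-an",                       # drop the audio track
        dst,
    ]
```

Inside the function, the flow would be as follows. First, download the source to `/tmp` with boto3's `download_file`. Then run `subprocess.run(preview_command(src, dst), check=True)`. Finally, upload the result with `upload_file` under the preview prefix.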

Outcome:

Last week, over 70 files per day were being processed through EC2, accounting for almost 35 instance hours a day. That has now come down to about 5 hours per day. In figures, it means a reduction from $100 to $15 per month.
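As a quick sanity check, the reported figures can be compared as below. The sketch assumes a 30-day month. It also treats the whole monthly cost as EC2 instance hours, which ignores any Lambda charges the $15 may include, so the implied hourly rates are approximate.

```python
DAYS_PER_MONTH = 30  # assumption used to convert daily hours to monthly

before_hours_day, after_hours_day = 35, 5
before_cost_month, after_cost_month = 100.0, 15.0

cost_reduction = (before_cost_month - after_cost_month) / before_cost_month
hours_reduction = (before_hours_day - after_hours_day) / before_hours_day

# Implied average cost per instance hour, before and after the change.
rate_before = before_cost_month / (before_hours_day * DAYS_PER_MONTH)
rate_after = after_cost_month / (after_hours_day * DAYS_PER_MONTH)

print(f"cost reduction:  {cost_reduction:.0%}")   # 85%
print(f"hours reduction: {hours_reduction:.1%}")  # 85.7%
print(f"implied $/instance-hour: {rate_before:.3f} before, {rate_after:.3f} after")
# implied $/instance-hour: 0.095 before, 0.100 after
```

The cost and hour reductions line up at roughly 85%. Both periods also imply a similar rate of about $0.10 per instance hour.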