Batch processing is a common, powerful pattern for high-CPU background workloads. Companies often use it for advanced simulations, rendering, media transcoding and processing, deep learning, and more. At Hudl, we're using AWS Batch to manage a video processing pipeline that includes a GPU-based deep learning algorithm.
AWS Batch is a recent addition to Amazon's cloud platform that makes it very simple to define and execute tasks without worrying about the infrastructure needed to run them. Once you define a task by providing a Docker image and the necessary parameters, you can create hundreds of thousands of jobs and let Batch deal with scaling, parallelization, and managing dependencies.
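To make the "define a task, then create jobs" flow concrete, here is a minimal sketch in Python. It only builds the request payloads that boto3's Batch client accepts for `register_job_definition` and `submit_job`; it doesn't call AWS. The helper names, image name, queue name, `process` command, and S3 path are all hypothetical placeholders, not what Hudl actually runs.

```python
from typing import List, Optional


def job_definition_request(name: str, image: str) -> dict:
    """Build keyword arguments for boto3's batch.register_job_definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,  # Docker image that does the actual work
            # "Ref::input" is filled in from the job's "input" parameter.
            # "process" is a made-up entry point for this example.
            "command": ["process", "Ref::input"],
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},
                {"type": "MEMORY", "value": "2048"},  # MiB
            ],
        },
    }


def submit_job_request(
    name: str,
    queue: str,
    definition: str,
    input_uri: str,
    depends_on: Optional[List[str]] = None,
) -> dict:
    """Build keyword arguments for boto3's batch.submit_job."""
    request = {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "parameters": {"input": input_uri},
    }
    if depends_on:
        # Batch holds this job until every listed job ID has succeeded.
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on]
    return request


# With AWS credentials configured, the payloads would be used roughly like:
#   batch = boto3.client("batch")
#   batch.register_job_definition(**job_definition_request(...))
#   first = batch.submit_job(**submit_job_request(...))
#   batch.submit_job(**submit_job_request(..., depends_on=[first["jobId"]]))
```

Keeping payload construction separate from the AWS call is just a convenience for this sketch: the dictionaries can be inspected or unit tested without an AWS account.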
In this talk I'll walk through setting up Batch jobs (including some basic Docker images and everything on the Batch side), explain how Batch handles scheduling and dependencies, describe scenarios where Batch excels, and touch on some pain points we've experienced so far.
Hudl is still in the early stages of using Batch, but so far it’s proven easy to use and very adaptable to what we need. We’re planning to move more of our workloads into Batch, including thumbnail generation, video transcoding and processing, and PDF generation.