A client wanted to run mostly embarrassingly parallel headless Matlab computations at scale.

MathWorks’ Matlab is a multi-paradigm numerical computing environment and proprietary programming language. It is not surprising MathWorks provides various cloud options – the computational power of the cloud enables some interesting opportunities.

GNU’s Octave provides similar capabilities and many Matlab scripts will run, often with no modifications, under Octave. This is especially true for heavy computational tasks (think ML, signal processing, linear programming/optimizations, large linear algebra problems, etc.). Of course, there are no limits on running Octave in the cloud.

We had the following options:

- Matlab Parallel Server
- Matlab Compiler + Matlab Runtime
- Octave from repository (or flatpak, snapcraft, …)
- custom optimized Octave build (size and computational efficiency)

The first option is the easiest to implement but infeasible at scale. Since the workflow can change often, the second option was possible but more difficult (also, the Matlab Runtime was a ~1.9GB zip, or 2.2GB fully extracted, for Linux). In comparison, the Octave packages found in repositories are lightweight (~310MB including dependencies) but typically old, and their computational throughput lagged behind Matlab’s.

Profiling the workflow, we discovered the following bottlenecks: a large linear programming problem, many Fourier transforms, and critical linear algebra routines. Further, benchmarking suggested that performance depended significantly on the (virtualized) processor types found in the cloud, in particular on which instruction set extensions they support (e.g. SSE, AVX, AVX2, AVX512).
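To make the processor dependence concrete, here is a minimal sketch of how one might detect the best vector extension a Linux x86 instance reports. It is not the tooling we used; the function names and the ordering in `SIMD_LEVELS` are assumptions for illustration. The flag names (`sse2`, `avx`, `avx2`, `avx512f`) are those the Linux kernel lists in `/proc/cpuinfo`.

```python
# Ordered from most to least capable. Names follow the Linux /proc/cpuinfo
# "flags" line on x86; this list and its ordering are illustrative assumptions.
SIMD_LEVELS = ["avx512f", "avx2", "avx", "sse4_2", "sse2"]


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_simd_level(cpuinfo_text):
    """Return the most capable extension present, or 'generic' if none match."""
    flags = cpu_flags(cpuinfo_text)
    for level in SIMD_LEVELS:
        if level in flags:
            return level
    return "generic"
```

On a running instance, passing the contents of `/proc/cpuinfo` to `best_simd_level` gives a label that can be used to pick a matching build.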

Our client *boldly* decided to pursue the fourth option: a custom build of Octave.

We compiled the three key libraries (GLPK, FFTW and OpenBLAS) and Octave under Amazon Linux 2. We did this on targeted EC2 spot instance types (this could also be done locally in a VM using an Amazon Linux 2 image). The resulting Octave (v5.1.0) build was almost 3x faster than the repository’s Octave and at least on par with Matlab.

There was one challenge. The workflow involved a few legacy Matlab *.mex files from a third party. Ideally these would be recompiled as native *.oct files. Unfortunately this was not an option. We found a way to ‘squish’ these *.mex files into Octave. However, the latest version of Octave that allowed this was 4.2.2. With native *.oct files we could have used 5.1.0.

In addition to an optimized Octave, we bundled in the ‘squished’ *.mex files **and** the required Octave packages (control, nan, io, statistics, optim and signal) for the workflow. We baked the compressed, optimized Octave builds (each ~15MB) into a custom Amazon Linux 2 AMI. At instance boot, *--user-data* scripts determined the instance type and extracted the corresponding Octave install (~85MB). To avoid EBS charges (and improve Octave load times), the install was extracted to a small tmpfs.
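The boot-time selection can be sketched roughly as follows. This is not our actual user-data script: the archive names, the family-to-build mapping and the function names are hypothetical placeholders. On EC2 the instance type itself can be read from the instance metadata service (`http://169.254.169.254/latest/meta-data/instance-type`); the sketch takes it as a plain string so the logic stays testable offline.

```python
import tarfile

# Hypothetical mapping from EC2 instance family to a pre-built archive baked
# into the AMI. Both the archive names and the mapping are illustrative only.
BUILD_FOR_FAMILY = {
    "c5": "octave-avx512.tar.xz",
    "m5": "octave-avx512.tar.xz",
    "c4": "octave-avx2.tar.xz",
    "m4": "octave-avx2.tar.xz",
}
DEFAULT_BUILD = "octave-generic.tar.xz"  # hypothetical fallback build


def instance_family(instance_type):
    """Return the family part of an instance type, e.g. 'c5' for 'c5.large'."""
    return instance_type.split(".", 1)[0]


def archive_for(instance_type):
    """Pick the archive for this instance type, falling back to the default."""
    return BUILD_FOR_FAMILY.get(instance_family(instance_type), DEFAULT_BUILD)


def extract_install(archive_path, target_dir):
    """Unpack the chosen archive into target_dir (e.g. a tmpfs mount point)."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(target_dir)
```

Keying on the instance family rather than the full type keeps the table small, on the assumption that sizes within a family share the same processor generation.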

We could have simply placed the Octave installs inside a (Docker) container, but this would have increased network traffic at scale (i.e. one instance could run multiple containers). Whichever path is taken, benchmarking identified the optimal throughput-to-price ratio with respect to instance type, size and spot price.
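As a rough illustration of the throughput-to-price comparison, the sketch below ranks candidates by jobs completed per dollar. The `Candidate` type, the field names and all numbers are placeholders invented for the example; they are not measurements from this project.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    instance_type: str
    jobs_per_hour: float   # benchmark throughput on this instance type
    price_per_hour: float  # spot price in USD at benchmark time


def jobs_per_dollar(candidate):
    """Throughput-to-price ratio: jobs completed per dollar spent."""
    return candidate.jobs_per_hour / candidate.price_per_hour


def rank(candidates):
    """Return candidates sorted from best to worst jobs-per-dollar."""
    return sorted(candidates, key=jobs_per_dollar, reverse=True)


# Placeholder numbers for illustration only, not real benchmark results.
candidates = [
    Candidate("type-a", 120.0, 0.08),
    Candidate("type-b", 90.0, 0.05),
    Candidate("type-c", 110.0, 0.10),
]
best = rank(candidates)[0]
```

Note that the fastest candidate is not necessarily the best one: with these placeholder numbers the slowest instance wins because its price is proportionally lower.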

Once we had the automated procedure for optimized Octave builds, it was trivial to extend it to any cloud provider and their individual solutions for high throughput computing (HTC). Our client now has access to previously unachievable scale across the major clouds **and** at a cost of just over bare compute. Here, the client’s bold leadership paid off handsomely.

Finally, we thought of this work as ‘infrastructure’: code for the common good, if you like. Our client agreed the basic build system could be shared.

With thanks and pleasure: Enjoy!

**Update 10 July 2019** An example of using the above: automating the build across different instance types and putting the resulting archive(s) into a custom AMI for AWS Batch or AWS ParallelCluster.

[…] move the build scripts (for details) to a tmpfs on the spot instance, execute the build and retrieves the archive. Avoids EBS entirely. […]