Myth #2 of Apache Spark Optimization: Cluster Autoscaling

In this blog series we're examining the Five Myths of Apache Spark Optimization. (Stay tuned for the entire series!) If you missed Myth #1, check it out here.

The second myth examines another assumption common among Spark practitioners: that Cluster Autoscaling stops applications from wasting resources.

How Cluster Autoscaling Improves Resource Utilization 

Cluster Autoscaling is a crucial feature in cloud computing, particularly for Kubernetes clusters. When a developer launches an application, Cluster Autoscaling responds to the developer’s request for resources and spins up the necessary instances or nodes. 

The autoscaling component dynamically adjusts the available compute resources to match current (and ever-changing) workload demands. It ensures that resources are provisioned when they are needed, and it prevents instances from running before a request arrives or lingering after the work is done. Once the application finishes, Cluster Autoscaling automatically terminates the instances. This automated, dynamic resource provisioning is essential to minimizing waste in a cloud cluster.

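To make this concrete, here is a minimal sketch in Scala of the resource request a Spark application declares up front. The application name and figures are illustrative, not a recommendation; the point is that the Cluster Autoscaler reacts only to these declared values:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative resource request. The Cluster Autoscaler sees only these
// declared figures, spins up nodes to satisfy them, and reclaims the
// nodes once the application exits.
val spark = SparkSession.builder()
  .appName("autoscaling-demo")
  .config("spark.executor.instances", "20") // 20 executors requested
  .config("spark.executor.memory", "16g")   // 16 GB per executor
  .config("spark.executor.cores", "4")      // 4 cores per executor
  .getOrCreate()

// ... application logic runs against the provisioned capacity ...

spark.stop() // signals that the instances can be terminated
```

Note that the request is purely declarative: nothing in this mechanism checks whether the application actually uses the 20 × 16 GB it asked for.
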
Cluster Autoscaling is only possible because compute resources in the cloud are elastic and available on demand. As a result, many data architects and engineers, especially those migrating Spark workloads to the cloud, believe that Cluster Autoscaling automatically solves the problem of application waste in a way that is not possible in on-premises environments.

The Limitations of Cluster Autoscaling

Despite all of the above benefits, Cluster Autoscaling does not address a fundamental problem: Apache Spark applications tend to be wasteful, as we saw in Myth #1.

This means that Spark applications can waste the very resources the Cluster Autoscaler provisions for them. Even with Cluster Autoscaling enabled, it is possible (and indeed common) for Spark applications to request resources and then never use them. In fact, 30 percent or more of the resources provisioned for a Spark application can go directly to waste. Cluster Autoscaling is simply not designed to remediate waste inside Spark applications: it does nothing to help with an application that has been written inefficiently or that doesn't use all the resources provisioned for it. In short, Cluster Autoscaling is not a silver bullet for waste in the cloud.

To illustrate the problem, let's consider a worst-case scenario in which a developer runs a simple application that contains nothing but a Thread.sleep call. Now imagine the developer requests a terabyte of memory for this application—an application that does nothing. The Cluster Autoscaler would do what was asked of it and deliver the requested terabyte, even though that request represents an absurd degree of overprovisioning. The Cluster Autoscaler is not designed to validate or rationalize the developer's request.
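
Here is what that worst case might look like as code. This is a deliberately pathological sketch; the object name and figures are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession

// A pathological application: it requests roughly a terabyte of memory
// across its executors and then does no work at all.
object SleepyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sleepy-app")
      .config("spark.executor.instances", "64")
      .config("spark.executor.memory", "16g") // 64 x 16 GB = ~1 TB requested
      .getOrCreate()

    // No job is ever submitted; the provisioned terabyte sits idle
    // while the driver sleeps for an hour.
    Thread.sleep(60L * 60L * 1000L)

    spark.stop()
  }
}
```

The autoscaler will dutifully keep the requested capacity alive for the full hour; nothing in the scaling loop distinguishes an idle executor from a busy one.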

Figure 1: Even when a developer sizes a resource request accurately for peak demand, the autoscaler grants resources based on the request, not on the utilization the application actually requires.

Although this is an extreme example, it underscores the point that the Cluster Autoscaler is not immune to pathological applications or excessive provisioning requests. And, as we have seen, even the most well-written Spark applications can contain significant and surprising amounts of waste.

In our next blog entry in this series, we’ll examine the third myth, which is related to instance rightsizing. Stay tuned!

Explore More

Looking for a safe, proven method to reduce waste and cost by up to 47% and maximize value for your cloud environment? Sign up now for a free Cost Optimization Proof-of-Value to see how Pepperdata Capacity Optimizer can help you start saving immediately.