Bonus Myth of Apache Spark Optimization

In this blog series we’ve examined Five Myths of Apache Spark Optimization. But one final, bonus myth remains unaddressed:

Bonus Myth: I’ve done everything I can. The rest of the application waste is just the cost of running Apache Spark.

Unfortunately, many companies running cloud environments have come to think of application waste as a cost of doing business, as inevitable as rent and taxes. This acquiescence to cloud waste has become pervasive, affecting even the most sophisticated IT teams. In fact, respondents to a recent survey reported that the amount by which their cloud spend exceeded budget grew 39 percent year over year, and a third of companies say they will exceed their cloud budget by up to 40 percent.

In this final installment of the series, we’ll demonstrate that things don’t have to be this way, and we’ll propose a solution.

Cost Optimization at the Infrastructure Level

In this blog series we’ve examined five different options that can help remediate cost overruns in the cloud, along with the benefits and limitations of each:

  1. Observability and monitoring
  2. Cluster Autoscaling
  3. Instance Rightsizing
  4. Manual application tuning
  5. Spark Dynamic Allocation (a sample configuration appears after this list)
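
For readers who want a concrete reminder of what these infrastructure-level knobs look like, the minimal PySpark sketch below enables Spark Dynamic Allocation (option 5). It assumes a standard Spark 3.x setup, and the executor bounds are illustrative values, not recommendations.

    from pyspark.sql import SparkSession

    # Illustrative Spark Dynamic Allocation settings (option 5 above).
    # The executor bounds are example values, not recommendations.
    spark = (
        SparkSession.builder
        .appName("dynamic-allocation-example")
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "2")
        .config("spark.dynamicAllocation.maxExecutors", "50")
        # Executors can only be released safely if their shuffle data outlives them,
        # e.g. via shuffle tracking (Spark 3.x) or the external shuffle service.
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .getOrCreate()
    )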

The fundamental gap is that none of these options addresses the significant waste inherent within Apache Spark applications themselves. All five are infrastructure-level optimizations. Optimization at the infrastructure level saves money at the hardware layer and ensures the best financial return on an infrastructure investment, but only about 60 percent of the waste in a cloud environment exists at the infrastructure level.

Figure 1: Only about 60 percent of the waste in a cloud environment exists at the infrastructure level.

What remains untouched with all these options is waste at the application/platform level where applications run. This waste typically comprises around 40 percent of the optimization potential in a cloud environment.

This application/platform-level waste is not your fault, nor is it the fault of your developers. It stems from an underlying issue in application resource provisioning, particularly with Apache Spark.

As we’ve seen in this series, developers must request a specific allocation of memory and CPU for their applications, and they typically size those requests for peak usage; otherwise their applications get killed. However, most applications run at peak provisioning levels for only a small fraction of their runtime, which means most applications have extra resource provisioning, or waste, built in from the start.
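
To make that concrete, the hypothetical PySpark configuration below is sized for the worst-case stage of a job. The values are illustrative, but they show how a peak-level request bakes idle capacity into every run whose typical stages need far less.

    from pyspark.sql import SparkSession

    # Hypothetical peak-level provisioning: sized so the heaviest stage survives,
    # which leaves much of this allocation idle during the rest of the run.
    spark = (
        SparkSession.builder
        .appName("peak-provisioned-etl")
        .config("spark.executor.instances", "40")
        .config("spark.executor.cores", "4")
        .config("spark.executor.memory", "16g")         # sized for the peak stage
        .config("spark.executor.memoryOverhead", "4g")  # typical stages use far less
        .getOrCreate()
    )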

The Challenge of Overprovisioning

Typical applications are overprovisioned by 30 to 50 percent, and sometimes more. The FinOps Foundation recently reported that reducing waste has become a top priority among cloud practitioners, while the Flexera State of the Cloud 2023 report found that cloud spend overran budget by an average of 18 percent among surveyed enterprises and that respondents estimated nearly a third of their cloud spend goes to waste.

That’s a lot of waste!

Capacity Optimizer Remediates Waste Inside Applications

Pepperdata built Capacity Optimizer to solve the problem of application waste, once and for all.

In contrast to all the solutions presented in this blog series, Capacity Optimizer detects unused resources inside Spark applications and provides that data to the native Kubernetes or YARN scheduler so it can run otherwise pending tasks. It continuously and autonomously reduces waste and cost by increasing node-level utilization in real time, without the need for application changes.

This type of optimization is known as Continuous Intelligent Tuning. Pepperdata’s Continuous Intelligent Tuning enables the scheduler to make use of the node resources that are allocated but not used so that it can launch more workloads on existing nodes before adding new nodes.
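
Capacity Optimizer’s internals are proprietary, but the gap it acts on, resources that are allocated yet sitting unused, is easy to observe for yourself. The sketch below is an illustrative script (not Pepperdata code) that queries Spark’s monitoring REST API for a running application and compares each executor’s used storage memory against its maximum; the driver UI address is a placeholder.

    import requests

    # Illustrative only: measure allocated-but-unused executor storage memory
    # through Spark's monitoring REST API. Replace the UI address with your own.
    SPARK_UI = "http://localhost:4040"   # driver UI of a running application (placeholder)

    apps = requests.get(f"{SPARK_UI}/api/v1/applications").json()
    app_id = apps[0]["id"]

    executors = requests.get(f"{SPARK_UI}/api/v1/applications/{app_id}/executors").json()
    for ex in executors:
        if ex["id"] == "driver":
            continue
        used_mb = ex["memoryUsed"] / 1024**2   # storage memory currently in use
        max_mb = ex["maxMemory"] / 1024**2     # storage memory available to the executor
        pct = 100 * used_mb / max_mb if max_mb else 0.0
        print(f"executor {ex['id']}: {used_mb:.0f} MB of {max_mb:.0f} MB in use ({pct:.0f}%)")

At cluster scale, this kind of headroom is what a better-informed scheduler can reclaim to launch otherwise pending tasks.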

Capacity Optimizer also optimizes autoscaling by ensuring that new instances are launched only when the existing instances are fully utilized. It targets roughly 85 percent utilization and prevents the autoscaler from adding nodes while currently running nodes still have idle capacity. The result: CPU and memory are autonomously optimized to run more workloads and increase savings.
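
To illustrate the idea (this is a simplified sketch, not Pepperdata’s actual algorithm), the snippet below gates a scale-out decision on an approximately 85 percent utilization target, so new nodes are requested only when the nodes already running are close to full.

    from __future__ import annotations
    from dataclasses import dataclass

    TARGET_UTILIZATION = 0.85  # illustrative target, mirroring the figure above

    @dataclass
    class Node:
        cpu_used: float   # cores currently in use
        cpu_total: float  # cores allocated to the node

    def should_scale_out(nodes: list[Node], pending_tasks: int) -> bool:
        """Request new nodes only when existing ones are near the utilization target."""
        if not nodes or pending_tasks == 0:
            return False
        utilization = sum(n.cpu_used for n in nodes) / sum(n.cpu_total for n in nodes)
        return utilization >= TARGET_UTILIZATION

    # Three nodes averaging 50 percent utilization: pack more work onto them
    # before asking the autoscaler for new instances.
    print(should_scale_out([Node(8, 16), Node(6, 16), Node(10, 16)], pending_tasks=12))  # False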

Figure 2: Pepperdata enables nodes to run at the greatest capacity and efficiency.

Capacity Optimizer reduces instance hours and costs by 30 to 47 percent on average by eliminating waste and maximizing resource utilization, removing the need to tweak and tune clusters and applications.

Capacity Optimizer Gives Time Back to Your Developers

As with any solution, there are some things that Capacity Optimizer does not do (and in this case, what we don’t do may be very helpful!): 

  1. Capacity Optimizer does not change application code or configurations, nor does it require developers to do so. Developers are often sensitive about other people or third-party services touching their applications, especially without their knowledge. Capacity Optimizer never modifies applications or configuration parameters. It also never requires developers to tweak their applications. Instead, Capacity Optimizer augments the scheduler and the Cluster Autoscaler with real-time data about waste inside Spark applications.
  2. Capacity Optimizer does not require any upfront data modeling. Unlike other solutions, Capacity Optimizer does not analyze a set of applications over a period of time and then tune them the next time they run. That type of effort is usually futile, because as we have seen in this blog series, modern data environments are highly dynamic; whatever happened last week may be completely different from whatever is happening this week. Instead, Capacity Optimizer is a real-time solution that empowers the native scheduler and Cluster Autoscaler with improved, point-of-action information that remediates waste as it occurs.
  3. Capacity Optimizer does not rely on tuning recommendations; it automates tuning. As we saw in Myth #4, manual application tuning in response to automated recommendations can be both onerous and of limited value. Capacity Optimizer automates tuning, and does this at scale so your developers can spend their time on other critical projects instead of application tuning. However, if a customer really wants a list of tuning recommendations, detailed metrics and recommendations for individual applications are always available in the Pepperdata dashboard.

The net result of all these benefits: developers are freed from the tedium of tuning applications and configurations so they can focus on higher-value projects that grow your business.

The Proof is in the ROI

As with all solutions, the real metric is: are customers happy with their results? The answer is a resounding YES.

Capacity Optimizer has been deployed by some of the largest and most demanding enterprises in the world, including members of the Fortune 5, security-conscious global banks, and other top-tier companies that have come to trust and rely on Pepperdata. Hardened and battle-tested in those environments over the last decade, Capacity Optimizer also provides deep application-level observability as it optimizes cloud clusters continuously, autonomously, and in real time.

The three anonymized but actual examples below demonstrate incredibly compelling and ongoing customer successes achieved with Capacity Optimizer. Remember—these are customers who already did all of the engineering necessary to optimize their systems and still achieved incredible savings once they implemented Capacity Optimizer. 

Pepperdata Customers Enjoy Significant Daily and Yearly Cost Savings

Figure 3: Daily and yearly customer savings with Pepperdata.

For even more detail on how our customers have saved, please see our Case Studies, which cover the Pepperdata savings achieved by Autodesk, Extole, and many others.

Are You Ready to Eliminate Application Waste, Once and For All?

To summarize the entire series of myths, we reviewed the common misconceptions around five key optimization strategies at the infrastructure and application level: deploying observability and monitoring tools and solutions, implementing Cluster Autoscaling, rightsizing instances, tuning applications manually, and enabling Spark Dynamic Allocation. These optimization strategies minimize waste in your cloud cluster generally, but do not address the waste inherent within the application itself.

To address this waste at the application level, we invite you to implement Pepperdata Capacity Optimizer to reduce these costs and extract the greatest value from your cloud environment. By maintaining applications and clusters in their sweet spot of utilization, Capacity Optimizer reduces hardware usage by up to 47 percent, on top of everything you’re already doing. The decreased instance hours translate directly to reduced cost and a lower monthly bill.

You don’t need an engineering sprint or a quarter to plan for Pepperdata. It’s super simple to try out. In a 60-minute call, we’ll create a Pepperdata dashboard account with you. Pepperdata is installed via a simple bootstrap script into your Amazon EMR environment and via a Helm chart into Amazon EKS. All the savings are automatic and immediate, with an average savings of 30 percent. It’s totally free to test in your environment, and the savings you gain during your Proof of Value are also free.

To learn more, visit pepperdata.com or contact us at sales@pepperdata.com.

Explore More

Looking for a safe, proven method to reduce waste and cost by up to 47% and maximize value for your cloud environment? Sign up now for a free cost optimization demo to learn how Pepperdata Capacity Optimizer can help you start saving immediately.