Adaptive Performance Core

Pepperdata Adaptive Performance Core™ software observes and reshapes applications’ usage of CPU, RAM, network, and disk, without user intervention, to ensure jobs complete on time. Pepperdata dynamically prevents bottlenecks in multi-tenant, multi-workload clusters so that many users and jobs can run reliably on a single cluster at maximum utilization. Unlike cluster management and tuning tools, which work from insufficient data and cannot respond to changing conditions, Pepperdata captures complete metrics for every process and solves performance problems automatically at scale.

Guarantee Quality of Service. Pepperdata senses contention for CPU, memory, disk I/O, and network at runtime and automatically slows down low-priority tasks when needed to ensure that your high-priority jobs complete on time — without the need to isolate workloads on separate clusters.

Increase cluster throughput by 30% to 50%. Pepperdata knows the true hardware resource capacity of your cluster and dynamically allows more work to be done by servers that have free resources at any given moment. Because Pepperdata software automatically and safely increases hardware utilization, you can run more jobs, run jobs faster, or reduce your hardware footprint.

Diagnose problems faster. Pepperdata gives you both a macro and a granular view of everything that’s happening across the cluster by monitoring the use of CPU, memory, disk I/O, and network for every job and container, by user or group, in real time. The software precisely pinpoints where problems are occurring so that IT teams can quickly identify and fix troublesome jobs. Because Pepperdata measures actual hardware usage in a centralized Hadoop deployment, the software also enables IT to accurately track and allocate the costs of shared cluster usage per department, user, and job.

By guaranteeing stable and reliable cluster performance, Pepperdata allows enterprises to realize untapped value from existing distributed infrastructures and finally apply big data to more use cases to meet business objectives.

 

Quality of Service

Make Hadoop run the way your business needs it to. Pepperdata software’s Adaptive Performance Core™ senses contention for CPU, memory, disk I/O, and network at runtime and automatically slows down low-priority tasks when needed to ensure that high-priority applications complete on time.

Enterprises deploying Hadoop benefit from its ability to scale across thousands of servers and offer unprecedented insight into business operations. Hadoop’s downside today is that it lacks predictability. Hadoop does not allow enterprises to ensure that the most important jobs complete on time, and it does not effectively use a cluster’s full capacity. As a result, companies deploying Hadoop are often forced to create separate clusters for different workloads and to overprovision those clusters, which results in lower ROI on their Hadoop investments and significantly higher recurring operational costs.

Guarantee on-time execution of critical jobs
Enterprises depend on production jobs executing reliably, but Hadoop applications are notoriously unpredictable, especially in the world of multi-tenancy and mixed workloads. A single poorly behaving job can unexpectedly consume far more than its fair share of network or disk, causing critical applications to miss their SLAs and leaving the operations team scrambling. Pepperdata constantly monitors the use of all computing resources by every relevant process on the cluster and takes automated action to ensure that each job, queue, user, or group receives the resources specified in the cluster policies.

Pepperdata enables Hadoop operators to prioritize the completion of business-critical production applications over ad hoc jobs in multi-tenant clusters with diverse workloads, for example by deploying a policy that limits the amount of bandwidth that low-priority jobs can use when that bandwidth is needed by other applications.
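The bandwidth-limiting policy described above can be sketched as follows. This is an illustrative model only: the function and field names are hypothetical, and it is not Pepperdata's actual policy engine or syntax.

```python
# Illustrative sketch only: not Pepperdata's actual policy engine or syntax.
# Models the idea of capping low-priority jobs' bandwidth when
# high-priority jobs need it.

def throttle_low_priority(jobs, link_capacity_mbps):
    """Return per-job bandwidth caps (Mbps) that favor high-priority jobs."""
    high = [j for j in jobs if j["priority"] == "high"]
    low = [j for j in jobs if j["priority"] == "low"]
    # High-priority jobs get the bandwidth they demand, up to link capacity.
    high_demand = sum(j["demand_mbps"] for j in high)
    caps = {j["name"]: min(j["demand_mbps"], link_capacity_mbps) for j in high}
    # Low-priority jobs share whatever is left over.
    leftover = max(link_capacity_mbps - high_demand, 0)
    share = leftover / len(low) if low else 0
    for j in low:
        caps[j["name"]] = min(j["demand_mbps"], share)
    return caps

jobs = [
    {"name": "etl-prod", "priority": "high", "demand_mbps": 600},
    {"name": "adhoc-1", "priority": "low", "demand_mbps": 400},
    {"name": "adhoc-2", "priority": "low", "demand_mbps": 400},
]
# etl-prod keeps its full 600 Mbps; each ad hoc job is capped at 200 Mbps.
print(throttle_low_priority(jobs, link_capacity_mbps=1000))
```

The key property is that the cap on low-priority jobs is dynamic: when high-priority demand drops, the leftover bandwidth (and hence the low-priority share) grows automatically.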

 

Increased Throughput

Hadoop is inefficient in its use of cluster hardware, because YARN and the scheduler only do up-front, conservative allocation when launching containers. This planning for the worst case means that most of the time, most servers are dramatically underutilized. Because Pepperdata software’s Adaptive Performance Core™ is aware of actual hardware usage on every server at every moment by every container, it can allocate more work to servers that currently have excess capacity, while ensuring that critical applications still complete on time and the cluster remains stable.

Pepperdata can improve your cluster throughput by 30% to 50%, letting you run more jobs in less time on your existing hardware.
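The contrast between reservation-based and usage-aware accounting can be illustrated with a small sketch. The numbers and helper names are hypothetical; this is not Pepperdata's algorithm, only a model of why worst-case allocation leaves capacity idle.

```python
# Illustrative sketch: static, worst-case container allocation counts
# reserved memory as used, while usage-aware accounting looks at what
# containers actually consume. Numbers are hypothetical.

NODE_MEM_GB = 128

def static_headroom(containers):
    """Headroom under YARN-style accounting: reservations count as used."""
    return NODE_MEM_GB - sum(c["reserved_gb"] for c in containers)

def usage_aware_headroom(containers, safety_margin_gb=8):
    """Headroom based on what containers actually consume right now."""
    return NODE_MEM_GB - sum(c["actual_gb"] for c in containers) - safety_margin_gb

# Seven containers each reserve 16 GB but actually use only 5 GB.
containers = [{"reserved_gb": 16, "actual_gb": 5} for _ in range(7)]
print(static_headroom(containers))       # 16 GB free by reservation
print(usage_aware_headroom(containers))  # 85 GB actually available
```

Under static accounting the node looks nearly full; measured usage shows most of its memory sitting idle, which is the headroom that usage-aware scheduling can safely reclaim.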

 

Troubleshooting

Diagnose problems faster. Pepperdata gives you both a macro and a granular view of everything that’s happening across the cluster by monitoring the use of CPU, memory, disk I/O, and network for every job and container, by user or group, in real time. These detailed performance metrics are captured second by second and saved so that you can analyze performance variations and anomalies over time.

You can also quickly and easily set up alerts on any metric of interest so that you’re notified when anything unexpected happens on your cluster. Alerts let you know that disks or network cards might be failing, that a job could benefit from tuning, or even that Pepperdata automatically took action to avert a cluster performance issue.
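A sustained-threshold alert of the kind described can be sketched like this. It is illustrative only, not Pepperdata's alerting API; the metric name and rule shape are hypothetical.

```python
# Illustrative sketch of a threshold alert over a per-second metric stream.
# Not Pepperdata's alerting API; names and rule shape are hypothetical.

def check_alerts(samples, metric, threshold, sustained_secs):
    """Fire when `metric` stays above `threshold` for `sustained_secs` samples."""
    run = 0
    alerts = []
    for t, sample in enumerate(samples):
        run = run + 1 if sample[metric] > threshold else 0
        if run == sustained_secs:
            alerts.append(f"{metric} > {threshold} for {sustained_secs}s at t={t}")
    return alerts

# Disk I/O wait climbing on one node: fire after 3 sustained seconds.
stream = [{"iowait_pct": v} for v in [5, 40, 45, 50, 55, 10]]
print(check_alerts(stream, "iowait_pct", threshold=30, sustained_secs=3))
```

Requiring the threshold to hold for several consecutive samples filters out one-second spikes, so the alert reflects genuine contention rather than noise.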


Pepperdata Technology

[Pepperdata architecture diagram]

Pepperdata software is easy to install and works with all Hadoop distributions without modifying the existing scheduler, workflows, or job submission process.

Architecture
Pepperdata software is composed of two main components.

Pepperdata Supervisor runs on the ResourceManager (or JobTracker) node and communicates with agents that run on every data node in the cluster. It collects hundreds of metrics on the consumption of CPU, memory, disk I/O, and network resources by container/task, job, user, and group on a second-by-second basis and dynamically optimizes the usage of those resources. It enables administrators to implement policies that guarantee the completion of high-priority jobs while keeping the cluster at peak performance.

Pepperdata Dashboard renders real-time and historical visualizations and reports of hardware usage with user-level, job-level, and developer views.
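The agent-to-Supervisor data flow described above can be sketched as a rollup of per-container, per-second samples to the job level. The field names here are hypothetical, not Pepperdata's actual metric schema.

```python
# Illustrative sketch of the agent -> Supervisor pipeline: agents emit one
# sample per container per second; the Supervisor rolls samples up to the
# job level. Field names are hypothetical.
from collections import defaultdict

def rollup_by_job(samples):
    """Aggregate per-container, per-second samples into per-job totals."""
    totals = defaultdict(lambda: {"cpu_secs": 0.0, "mem_gb_secs": 0.0})
    for s in samples:
        totals[s["job"]]["cpu_secs"] += s["cpu_pct"] / 100.0
        totals[s["job"]]["mem_gb_secs"] += s["mem_gb"]
    return dict(totals)

samples = [
    {"job": "etl-prod", "container": "c1", "cpu_pct": 80, "mem_gb": 4.0},
    {"job": "etl-prod", "container": "c2", "cpu_pct": 60, "mem_gb": 2.0},
    {"job": "adhoc-7",  "container": "c3", "cpu_pct": 20, "mem_gb": 1.0},
]
print(rollup_by_job(samples))
```

The same rollup can be keyed by user, group, or queue instead of job, which is what makes both the dashboard views and per-tenant accounting possible from one stream of samples.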

Key Features and Capabilities

  • Cluster configuration policies file — Hadoop administrators can specify how much cluster hardware to guarantee to specific users, groups, or jobs. Pepperdata software’s Adaptive Performance Core™ ensures that high-priority workloads get the hardware resources they need, while dynamically making any remaining capacity available to other jobs.
  • Spark support — Pepperdata monitors and controls hardware usage of Spark, in addition to MapReduce.
  • HBase protection — HBase jobs can safely run side-by-side with Spark, MapReduce, and other types of jobs on the same cluster. Pepperdata ensures that the other jobs do not interfere with HBase’s access to hardware, ensuring low latency and predictability for HBase queries.
  • Alerts — Operators can quickly and easily set up proactive alerts on any metric and level of granularity (job, user, queue, or the entire cluster) to be notified when anything unusual happens on the cluster that might require attention.
  • Chargeback reports — Operators can accurately allocate hardware expenses by measuring cluster resource consumption at the user, group, and job level over any time period.
  • Near-zero overhead — Pepperdata agents consume just 1-2% of a single core, out of the 8 to 24 cores on a typical Hadoop server.
  • Installs on any Hadoop cluster — Pepperdata runs on clusters using any standard distribution, including Apache, Cloudera, Hortonworks, IBM, and MapR. Pepperdata works with both classic Hadoop (Hadoop 1) and YARN (Hadoop 2), and supports clusters running on either physical nodes or virtual machines.
  • Complements schedulers — Pepperdata works with all popular schedulers (capacity scheduler, fair scheduler, etc.) without modification to workflows, job code, or existing cluster tuning parameters.
  • Complements YARN — The YARN ResourceManager allows a more diverse range of job types to be scheduled and launched on the cluster. Once those jobs start, Pepperdata’s optimization ensures they complete safely and on time.
  • Multi-cluster support — Multiple clusters can be monitored in a single dashboard.
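As an illustration of the chargeback idea from the feature list above, a cluster's cost can be split across tenants in proportion to measured usage. The names and figures below are made up; this is not Pepperdata's report format.

```python
# Illustrative chargeback sketch: allocate a cluster's monthly cost across
# users in proportion to measured core-seconds. Names and figures are
# hypothetical, not Pepperdata's actual report format.

def chargeback(core_secs_by_user, monthly_cost):
    """Split monthly_cost proportionally to each user's core-seconds."""
    total = sum(core_secs_by_user.values())
    return {user: round(monthly_cost * secs / total, 2)
            for user, secs in core_secs_by_user.items()}

usage = {"analytics": 600_000, "etl": 300_000, "adhoc": 100_000}
print(chargeback(usage, monthly_cost=10_000))
# → {'analytics': 6000.0, 'etl': 3000.0, 'adhoc': 1000.0}
```

Because the underlying metrics are per-second measurements of actual hardware consumption, the same split can be computed for any time window and at job or group granularity rather than by user.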