Apache Spark is the backbone of modern data processing, powering ETL pipelines, analytics, machine learning, and now GenAI workloads. But as datasets grow and infrastructure costs escalate, Spark’s performance and cost-efficiency gaps become more pronounced.
At DataPelago, we set out to solve this challenge, not by replacing Spark, but by accelerating it. We’re excited to introduce DataPelago Accelerator for Spark, the industry’s first plug-and-play accelerator for Apache Spark, built for heterogeneous compute and designed for immediate cost and performance gains.
DataPelago Accelerator for Spark, or simply the Accelerator, is a drop-in acceleration layer for Apache Spark, available for both self-managed and managed deployments. Without changing a line of code or moving a byte of data, DataPelago Accelerator delivers:
Powered by DataPelago Nucleus, DataPelago’s Universal Data Processing Engine, the Accelerator enables Spark workloads to run seamlessly across CPUs, GPUs, and other accelerators.
"With DataPelago Accelerator for Spark, we are able to complete a few of our heaviest OLAP cube jobs on OSS Spark, something that had been challenging due to data skew and performance bottlenecks. This opens the door for a migration from managed platforms without compromising speed or reliability while reducing our costs by 50%."
Apache Spark is flexible and widely adopted, but it's not inherently optimized for modern hardware. Open-source Spark offers cost control but lags in performance. Managed Spark services improve performance but at a significantly higher cost. Users are often forced to choose between speed and affordability.
DataPelago Accelerator removes this trade-off, delivering high performance and low cost — without compromising openness, flexibility, or compatibility.
The Accelerator optimizes Spark across the full execution pipeline — from query planning to code generation to runtime. Core enhancements include:
DataPelago Accelerator is already accelerating diverse production workloads:
* Acceleration and cost savings achieved with the same servers as before.
DataPelago Accelerator is designed for drop-in simplicity:
The Accelerator meets the demands of modern data teams:
DataPelago Accelerator is now generally available for both open-source and managed Apache Spark clusters. It supports hybrid and multi-cloud environments, including GPU cloud providers and AI factories.
You can activate the Accelerator today with a single line during cluster startup — and immediately accelerate your Spark workloads while cutting costs.
We believe Spark should be fast, efficient, and open — without compromise. DataPelago Accelerator delivers on that vision, giving data teams the performance they need with the control they want.
The Accelerator is available now at datapelago.ai. To request a demo, contact us at info@datapelago.com.