DataPelago transforms the economics of data processing, so your GenAI and analytics workloads aren't just faster: they're in a league of their own.

Any data. Lightning fast at any scale, with unbeatable economics
Structured, semi-structured, or unstructured, we supercharge it all. Whether you're training models, fine-tuning AI, powering RAG, or extracting insights, we accelerate data processing in every workload.
Discover new value with zero disruption or lock-in
90% of your data sits untapped because processing it is too slow and expensive. Process massive datasets in record time while preserving your existing applications and infrastructure, with zero vendor lock-in.
Go beyond Moore’s wall with an unprecedented price/performance advantage and unlock new workloads. The platform refactors data processing to exploit accelerated computing, leveraging its higher degree of parallelism and tightly coupled memory model to deliver orders-of-magnitude higher performance.
A novel computing abstraction enables heterogeneous accelerated computing across GPUs, FPGAs, and CPUs. The platform intelligently maps operations to execution units, which are dynamically reconfigured to match the query operators.
DataPelago’s engine accelerates data processing for GenAI and lakehouse analytics. It builds on Substrait-based open-source frameworks such as Gluten and related technologies, so Spark, Trino, and other query engines can fully exploit GPU, CPU, and FPGA acceleration.
Seamless integration with SQL, Python, and other programming languages; workflow automation tools such as Airflow; and query clients such as notebooks, Tableau, and Power BI. Deploy without any changes to your data, tools, or processes, with no vendor lock-in.
Accelerate GenAI from Data to Deployment
DataPelago accelerates multimodal GenAI data processing end to end. Extract, filter, chunk, tokenize, and embed your data, then deploy foundation models, fine-tune systems, or build RAG applications faster with always-fresh data.

At Akad Seguros, innovation is woven into our DNA, fueling our unwavering commitment to exceptional customer service. Our partnership with DataPelago exemplifies this dedication, as we modernize our data architecture and unify processing pipelines for GenAI and data analysis. Leveraging DataPelago's advanced platform, we can seamlessly process structured, semi-structured, and unstructured data, reducing our costs by more than 50% and enhancing operational performance. By fully utilizing AWS’s Accelerated Computing (GPU) infrastructure, this collaboration is transforming our capacity to deliver superior results and elevating the quality of service for our customers.
DataPelago enabled us to scale our analytics and AI workloads without any re-engineering. We saw proof of value within days: meaningful performance gains and cost savings, with zero application changes required. It’s truly production-ready from day one, and their customer support made the entire experience seamless. DataPelago serves as a powerful force multiplier that delivers real business impact.
With DataPelago, we were finally able to complete our heaviest OLAP cube jobs on OSS Spark—something that had been impossible due to data skew and performance bottlenecks. This opens the door for a full migration from managed platforms without compromising speed or reliability while reducing our costs by 50%.
The exponential growth of semi-structured and unstructured data along with rapid Gen AI/AI adoption is driving innovation, not only in AI, but in data management and data processing. McAfee has been proud to partner with DataPelago on the design of their technology that shows promising results, including significant performance and cost improvements on certain workloads.
Samsung SDS America has been working with DataPelago to evaluate their data processing platform in our AWS VPC, leveraging Accelerated Computing Infrastructure (GPUs). In testing with sample data, we’ve seen promising results in terms of performance and cost efficiency compared to traditional compute engines. DataPelago's platform shows potential in modernizing architecture and unifying data processing pipelines for GenAI and analytics, handling structured, semi-structured, and unstructured data types. This collaboration aligns with our interest in exploring innovative solutions that separate compute and storage, enhancing flexibility and reducing vendor lock-in.
Twingo is proud to partner with and serve as an official reseller for DataPelago, delivering cutting-edge Big Data solutions to the Israeli market. As an early design partner, we are excited to offer DataPelago’s unified data processing platform, accelerating engines like Spark and Trino using advanced CPU and GPU infrastructure across any data lakehouse format, including Iceberg, Hudi, and Delta Lake. The benchmarks from our collaboration are groundbreaking, reducing Total Cost of Ownership and delivering exceptional value. This partnership reinforces our commitment to innovation and next-gen solutions for data-driven organizations.
The growth in the volume of data processed by security systems is exponential as the adoption of AI and GenAI in cybersecurity continues to grow. DataPelago enables cost-effective expansion of AI/GenAI and cybersecurity systems by transforming the economics of data processing with its heterogeneous accelerated computing engine. As a security practitioner, I am excited by its modular architecture, which allows for seamless plug-and-play integration with open-source components like Spark and Apache Gluten, ensuring frictionless deployment without any vendor lock-in.
As Director of Engineering at Uber and Presto Foundation GB Chair, I have extensive experience developing and running open-source analytics software at an enterprise scale. Our workloads typically included heavy scan/filter/join operations, which are ideal for hardware acceleration. It's exciting to see how DataPelago disrupts the industry by accelerating open-source frameworks like Presto and Spark with custom hardware infrastructure. I'm particularly impressed with their dynamic mapping to heterogeneous computing elements and reconfigurable run-time techniques. By accelerating open-source frameworks, I think DataPelago will significantly transform today's performance/$ paradigm and reshape the economics of data processing.
Congratulations to DataPelago on their launch and announcement that their engine will extend Gluten, Substrait, and Velox to deliver the benefits of accelerated computing for Spark and address the performance and cost challenges in the Apache Spark community. Apache Gluten is designed to reuse Apache Spark's whole control flow while offloading the compute-intensive data processing part to high-performance native libraries in the backend. DataPelago is taking this a quantum leap forward by extending Gluten with native accelerated computing enhancements, yielding orders-of-magnitude performance and cost improvements for Spark workloads!