Slingshot: A modular framework for designing data processing systems

Bogdan Simion, Daniel N. Ilha, Suprio Ray, Leslie Barron, Angela Demke Brown, Ryan Johnson

IEEE Conference on Big Data, Santa Clara, California, October 2015

Abstract

Traditional relational database engines have been losing ground to specialized data processing engines in virtually every market segment, from data warehousing, OLTP, and stream processing to scientific applications [1], [2], [3]. Although relational database engines are evolving to leverage new technologies and more efficient processing paradigms, the generality of a large monolithic engine often makes this a significant effort. Our aim is to delineate database engine components and decouple their functionality, in order to design a more lightweight and flexible data processing engine that can support any application domain efficiently and without the effort of a complete redesign. We introduce a new data processing engine called Slingshot, in which modularity and implementation flexibility are the top priorities. In Slingshot, the core database engine is minimal and mainly handles the inter-operation of the database components. Each component, abstracted by an interface, can be implemented externally and plugged into the framework as a module that handles the component's functionality. As a result, component design decisions are offloaded to the modules, which gives designers the liberty to choose the features suitable for their target applications, to drop excess functionality, and to optimize code independently of the rest of the engine. We compare Slingshot to a traditional RDBMS and to custom solutions on queries that are representative of three application types (spatial, OLAP, and OLTP). We show that Slingshot outperforms the RDBMS in most cases and performs comparably in the others. Furthermore, Slingshot performs better than or comparably to custom solutions on most tests. Finally, we show that Slingshot's flexibility allows for easy and efficient integration of GPU support for spatial refinement, resulting in speedups of up to 70x over traditional database engines on long-running spatial analytics queries.
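The abstract describes a minimal core that only coordinates components, each hidden behind an interface and supplied by a pluggable module. The Python sketch below illustrates that idea under stated assumptions. The component names (`storage`, `filter`), the interfaces `StorageModule` and `FilterModule`, the modules `ListStorage` and `SimpleFilter`, and the `Core.plug` and `Core.select_at_least` methods are all hypothetical illustrations, not Slingshot's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical row type: (key, payload).
Row = Tuple[int, str]


class StorageModule(ABC):
    """Hypothetical interface for the storage component."""

    @abstractmethod
    def insert(self, row: Row) -> None: ...

    @abstractmethod
    def scan(self) -> Iterable[Row]: ...


class FilterModule(ABC):
    """Hypothetical interface for a filtering component."""

    @abstractmethod
    def apply(self, rows: Iterable[Row], min_key: int) -> Iterable[Row]: ...


class ListStorage(StorageModule):
    """One possible storage module: a trivial in-memory row list."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self._rows.append(row)

    def scan(self) -> Iterable[Row]:
        return iter(self._rows)


class SimpleFilter(FilterModule):
    """One possible filter module: keeps rows whose key is >= min_key."""

    def apply(self, rows: Iterable[Row], min_key: int) -> Iterable[Row]:
        return (r for r in rows if r[0] >= min_key)


class Core:
    """Minimal core: it only checks interfaces and routes data between modules."""

    # Each component slot is defined solely by the interface a module must implement.
    INTERFACES: Dict[str, type] = {"storage": StorageModule, "filter": FilterModule}

    def __init__(self) -> None:
        self._modules: Dict[str, Any] = {}

    def plug(self, component: str, module: Any) -> None:
        expected = self.INTERFACES[component]
        if not isinstance(module, expected):
            raise TypeError(f"{component} module must implement {expected.__name__}")
        self._modules[component] = module

    def select_at_least(self, min_key: int) -> List[Row]:
        # All actual work happens inside the plugged-in modules.
        rows = self._modules["storage"].scan()
        return list(self._modules["filter"].apply(rows, min_key))


# Usage: plug modules into the core, then run a query through it.
core = Core()
store = ListStorage()
for row in [(1, "a"), (5, "b"), (3, "c")]:
    store.insert(row)
core.plug("storage", store)
core.plug("filter", SimpleFilter())
print(core.select_at_least(3))  # [(5, 'b'), (3, 'c')]
```

Because `Core` depends only on the interfaces, replacing `SimpleFilter` with another `FilterModule` implementation (for example, one that offloads work to a GPU, loosely analogous to the GPU spatial refinement the abstract reports) would, in this sketch, require no change to the core.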