We’d like to recognize Steffen Kläbe, a Research Engineer at Actian in Ilmenau (Thuringia, Germany). He attended the 2023 joint EDBT/ICDT conference in Greece, one of the top database conferences worldwide, where he presented two research papers. His paper Patched Multi-Key Partitioning for Robust Query Performance received the Best Paper Award, a significant achievement in the research community.
View the abstract:
“Data partitioning is the key for parallel query processing in modern analytical database systems. Choosing the right partitioning key for a given dataset is a difficult task and crucial for query performance. Real world data warehouses contain a large amount of tables connected in complex schemes resulting in an overwhelming amount of partition key candidates. In this paper, we present the approach of patched multi-key partitioning, allowing to define multiple partition keys simultaneously without data replication. The key idea is to map the relational table partitioning problem to a graph partition problem in order to use existing graph partitioning algorithms to find connectivity components in the data and maintain exceptions (patches) to the partitioning separately. We show that patched multi-key partitioning offer opportunities for achieving robust query performance, i.e. reaching reasonably good performance for many queries instead of optimal performance for only a few queries.”
Kläbe’s second paper, Exploration of Approaches for In-Database ML, evaluates ways to run inference with ML models (for example, neural networks built with specialized frameworks) directly inside the database system for classification or prediction.
View the abstract:
“Database systems are no longer used only for the storage of plain structured data and basic analyses. An increasing role is also played by the integration of ML models, e.g., neural networks with specialized frameworks, and their use for classification or prediction. However, using such models on data stored in a database system might require downloading the data and performing the computations outside. In this paper, we evaluate approaches for integrating the ML inference step as a special query operator – the ModelJoin. We explore several options for this integration on different abstraction levels: relational representation of the models as well as SQL queries for inference, the use of UDFs, the use of APIs to existing ML runtimes and a native implementation of the ModelJoin as a query operator supporting both CPU and GPU execution. Our evaluation results show that integrating ML runtimes over APIs perform similarly to a native operator while being generic to support arbitrary model types. The solution of relational representation and SQL queries is most portable and works well for smaller inputs without any changes needed in the database engine.”
Congratulations, Steffen! We look forward to seeing more of your research and successes in the future.