Michael Abebe | Proteus: Autonomous Adaptive Storage for Mixed Workloads | #7

Disseminate: The Computer Science Research Podcast

Episode  ·  27:57  ·  Jul 18, 2022

About

Summary:

Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain a complete copy of data in row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in column-oriented storage format optimised for OLAP workloads. Maintaining these data copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, OLTP or OLAP workload performance suffers.

In this interview, Michael talks about Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. For HTAP workloads, Proteus delivers superior performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.

Questions:

0:56: Can you start off by explaining what a mixed workload is?
1:58: What is the challenge database systems face in trying to support these mixed workloads?
3:23: How have previous database systems tried to support mixed workloads?
5:19: What are the design goals of Proteus?
7:23: Can you elaborate more on the architecture of Proteus and how it makes decisions?
8:46: Can you dig into how you predict the transaction latency, what is the mechanism behind this?
10:35: It feels to me that you are accumulating a lot of metadata, this must have some overhead, how does this impact performance?
12:08: It sounds like the Adaptive Storage Advisor is a centralized coordinator, what are the limitations of this decision choice?
13:35: Are we in the context of a data-center here or can Proteus handle a geo-distributed deployment?
14:34: Changing the storage layout has some implicit cost, how does Proteus decide whether a storage layout change is good or bad?
16:57: How does Proteus predict what the transaction is going to be?
18:46: How did you evaluate Proteus?
20:20: If you had to summarize your work, what is the one key insight the listener can take away?
21:07: Is Proteus publicly available?
21:39: What are the next steps?
22:57: What is the most unexpected lesson you have learned whilst working on distributed database systems?
24:21: Do you think a single system catering for both workload types is better than two specialized engines?
26:10: What attracted you to work on this topic?

Links:

Paper: https://cs.uwaterloo.ca/~mtabebe/publications/abebeProteus2022SIGMOD.pdf
Presentation: https://www.youtube.com/watch?v=qbe29viYTas
Uni of Waterloo Data Systems Group: https://uwaterloo.ca/data-systems-group/

Contact:

Website: https://cs.uwaterloo.ca/~mtabebe/
Email: mtabebe@uwaterloo.ca
GitHub: @mtabebe

Hosted on Acast. See acast.com/privacy for more information.

© 2022 Acast AB (OG)