
Smart and Scalable Data Systems Lab

@ Department of Computer Science | Brandeis University

Latest News

Jan, 2025

We are hosting North-East Database Day 2025 at Brandeis! SSD Lab presents three posters!

Oct, 2024

Our paper "QuIT your B+-tree for the Quick Insertion Tree" is accepted for publication in EDBT 2025.

Sep, 2024

Congratulations, Alex Ott! Alex receives a full scholarship to attend the Brandeis x Possible program.

May, 2024

Our paper "Anatomy of the LSM Memory Buffer: Insights & Implications" is accepted for publication in DBTest 2024.

May, 2024

Our paper "KVBench: A Key-Value Benchmarking Suite" is accepted for publication in DBTest 2024.

Apr, 2024

SSD Lab goes to North-East Database Day 2024 with two posters!

Mar, 2024

Congratulations to Alex Ott for being selected for the Gordon Science Research Fellowship!

Aug, 2023

Our paper "Enabling Timely and Persistent Deletion in LSM-Engines" is now published in ACM TODS.

Explore Our Research

Welcome to the Smart & Scalable Data Systems (SSD) lab! We are a group of passionate systems enthusiasts who are obsessed with designing, optimizing, and building highly efficient data systems. At SSD Lab, we strive to solve cutting-edge research challenges at the intersection of data systems design, storage engine optimization, and data privacy protection in modern systems. Our mission is to advance knowledge across the broader domains of database and storage engine design, machine learning for systems, systems for machine learning, and privacy-aware data systems. Below are the three main research thrusts of the lab.

Query Path Optimization

Optimizing the read and write paths of modern storage engines, from sortedness-aware indexes to LSM memory buffers, to deliver fast ingestion without sacrificing query performance.

GitHub

Deletion-Compliant Databases

Designing storage engines that delete data in a timely and persistent manner, so that systems can comply with privacy regulations such as GDPR, CCPA, and VCDPA without sacrificing performance.

GitHub

Self-Designing Data Systems

Building data systems that adapt their design to the workload and the underlying hardware, drawing on machine learning for systems and systems for machine learning.

GitHub

Publications

  1. DBTest
    Anatomy of the LSM Memory Buffer: Insights & Implications
    Shubham Kaushik and Subhadeep Sarkar
    In Proceedings of the International Workshop on Testing Database Systems (DBTest), 2024
    Abs PDF

    Log-structured merge (LSM) tree is an ingestion-optimized data structure that is widely used in modern NoSQL key-value stores. To support high throughput for writes, LSM-trees maintain an in-memory buffer that absorbs the incoming entries before writing them to slower secondary storage. We point out that the choice of the data structure and implementation of the memory buffer has a significant impact on the overall performance of LSM-based storage engines. In fact, even with the same implementation of the buffer, the performance of a storage engine can vary by up to several orders of magnitude if there is a shift in the input workload.
    In this paper, we benchmark the performance of LSM-based storage engines with different memory buffer implementations and under different workload characteristics. We experiment with four buffer implementations, namely, (i) vector, (ii) skip-list, (iii) hash skip-list, and (iv) hash linked-list, and for each implementation, we vary the design choices (such as bucket count in a hash skip-list and prefix length in a hash linked-list). We present a comprehensive performance benchmark for each buffer configuration, and highlight how the relative performance of the different buffer implementations varies with a shift in input workload. Lastly, we present a guideline for selecting the appropriate buffer implementation for a given workload and performance goal.
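    The four buffer variants above correspond to RocksDB's pluggable memtable factories. As a minimal sketch (assuming RocksDB as the engine under test, which the abstract does not state, and with placeholder knob values), the snippet below shows how one would switch the buffer implementation and its design choices:

    // Minimal sketch: selecting the memory-buffer (memtable) implementation in
    // RocksDB. Engine choice, path, and knob values are illustrative assumptions.
    #include <rocksdb/db.h>
    #include <rocksdb/memtablerep.h>
    #include <rocksdb/options.h>
    #include <rocksdb/slice_transform.h>

    int main() {
      rocksdb::Options opt;
      opt.create_if_missing = true;

      // (i) vector buffer:
      //   opt.memtable_factory.reset(new rocksdb::VectorRepFactory());
      // (ii) skip-list buffer (the default):
      //   opt.memtable_factory.reset(new rocksdb::SkipListFactory());
      // (iii) hash skip-list, where the bucket count is a tunable design choice:
      //   opt.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));
      //   opt.memtable_factory.reset(rocksdb::NewHashSkipListRepFactory(1000000));

      // (iv) hash linked-list, where the prefix length (here 4 bytes) is a
      // tunable design choice.
      opt.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));
      opt.memtable_factory.reset(rocksdb::NewHashLinkListRepFactory(50000));
      // Non-default memtables do not support concurrent writes.
      opt.allow_concurrent_memtable_write = false;

      rocksdb::DB* db = nullptr;
      if (!rocksdb::DB::Open(opt, "/tmp/buffer_demo", &db).ok()) return 1;
      db->Put(rocksdb::WriteOptions(), "key1", "value1");
      delete db;
      return 0;
    }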

  2. DBTest
    KVBench: A Key-Value Benchmarking Suite
    Zichen Zhu, Arpita Saha, Manos Athanassoulis, and Subhadeep Sarkar
    In Proceedings of the International Workshop on Testing Database Systems (DBTest), 2024
    Abs PDF

    Key-value stores are at the core of several modern NoSQL-based data systems, and thus, a comprehensive benchmark tool is of paramount importance in evaluating their performance under different workloads. Prior research reveals that real-world workloads have a diverse range of characteristics, such as the fraction of point queries that target non-existing keys, point and range deletes, as well as different distributions for queries and updates, all of which have very different performance implications. State-of-the-art key-value workload generators, such as YCSB and db_bench, fail to generate workloads that emulate these practical workloads, limiting the dimensions on which we can benchmark the systems’ performance.
    In this paper, we present KVBench, a novel synthetic workload generator that fills the gap between classical key-value workload generators and more complex real-life workloads. KVBench supports a wide range of operations, including point queries, range queries, inserts, updates, deletes, and range deletes; among these, inserts, queries, and updates can be customized with different distributions. Compared to state-of-the-art key-value workload generators, KVBench offers a richer array of knobs, including the proportion of empty point queries, customized distributions for updates and queries, and range deletes with specific selectivity, constituting a significantly more flexible framework that can better emulate real-world workloads.
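    To illustrate what the "proportion of empty point queries" knob controls, here is a small, hypothetical sketch (not KVBench's actual interface): each point query is drawn either from the inserted key domain or from a disjoint, never-inserted domain, according to a configurable fraction.

    // Hypothetical illustration of an "empty point query" knob; function and
    // parameter names are not KVBench's API.
    #include <cstdint>
    #include <random>
    #include <vector>

    std::vector<uint64_t> generate_point_queries(size_t num_queries,
                                                 double empty_fraction,
                                                 uint64_t max_existing_key,
                                                 std::mt19937_64& rng) {
      std::uniform_real_distribution<double> coin(0.0, 1.0);
      std::uniform_int_distribution<uint64_t> existing(0, max_existing_key);
      // "Empty" queries target a disjoint part of the key domain that was
      // never inserted, so they are guaranteed to return no result.
      std::uniform_int_distribution<uint64_t> missing(max_existing_key + 1,
                                                      2 * max_existing_key);
      std::vector<uint64_t> queries;
      queries.reserve(num_queries);
      for (size_t i = 0; i < num_queries; ++i) {
        queries.push_back(coin(rng) < empty_fraction ? missing(rng) : existing(rng));
      }
      return queries;
    }

    int main() {
      std::mt19937_64 rng(42);
      // 30% of one million point queries target non-existing keys.
      auto queries = generate_point_queries(1'000'000, 0.3, 10'000'000, rng);
      return queries.empty() ? 1 : 0;
    }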

  3. EDBT
    QuIT your B+-tree for the Quick Insertion Tree
    Aneesh Raman, Konstantinos Karatsenidis, Shaolin Xie, Matthaios Olma, Subhadeep Sarkar, and Manos Athanassoulis
    In Proceedings of the International Conference on Extending Database Technology (EDBT), 2025
    Abs PDF

    Search trees, like B+-trees, are often used as index structures in data systems to improve query performance at the cost of index construction and maintenance. Production data systems drastically reduce the index construction cost when the data arrives fully sorted by employing a fast-path ingestion technique to their B+-tree that directly appends the incoming entries to the tail leaf. However, this optimization is only effective if the incoming data is fully sorted or has very few out-of-order entries. The state-of-the-art sortedness-aware design (SWARE) employs an in-memory buffer to capture near-sortedness to reduce the index construction cost when the data is nearly sorted. This, however, sacrifices performance during lookups and introduces additional design complexity.
    To address these challenges, we present Quick Insertion Tree (QuIT), a new sortedness-aware index that improves ingestion performance with minimal design complexity and no read overhead. QuIT maintains in memory a pointer to the predicted-ordered-leaf (pole) that provides a sortedness-aware fast-path optimization, and facilitates faster ingestion. The key benefit comes from accurately predicting the pole throughout data ingestion. Further, QuIT achieves high memory utilization by maintaining tightly packed leaf nodes when the ingested data arrives with high sortedness. This, in turn, helps improve performance during range lookups. Overall, QuIT outperforms B+-tree (SWARE) by up to 3× (2×) for ingestion, while also offering up to 1.32× faster (than SWARE) point lookup performance and accessing up to 2× fewer leaf nodes than the B+-tree during range lookups.
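    The fast-path idea can be made concrete with a heavily simplified sketch (illustrative only, not QuIT's actual implementation): the tree caches a pointer to the leaf where the next key is predicted to land, appends there when the prediction holds, and falls back to a regular root-to-leaf insert otherwise.

    // Toy sketch of a sortedness-aware fast-path insert; the leaf level is
    // modeled as a sorted vector of leaves and internal nodes are elided.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Leaf {
      std::vector<uint64_t> keys;  // kept sorted
      uint64_t upper_bound;        // smallest key that belongs to the next leaf
    };

    struct ToyTree {
      static constexpr size_t kLeafCapacity = 4;
      std::vector<Leaf> leaves;
      size_t predicted = 0;  // index of the cached predicted leaf

      ToyTree() { leaves.push_back(Leaf{{}, UINT64_MAX}); }

      void insert(uint64_t key) {
        Leaf& p = leaves[predicted];
        // Fast path: the key lands in the predicted leaf, after its largest
        // key, and the leaf has room, so append without descending the tree.
        if (p.keys.size() < kLeafCapacity && key < p.upper_bound &&
            (p.keys.empty() || key >= p.keys.back())) {
          p.keys.push_back(key);
          return;
        }
        // Slow path: locate the target leaf (stands in for a root-to-leaf walk).
        size_t idx = 0;
        while (key >= leaves[idx].upper_bound) ++idx;
        if (leaves[idx].keys.size() == kLeafCapacity) {  // split a full leaf
          std::vector<uint64_t> right_half(
              leaves[idx].keys.begin() + kLeafCapacity / 2, leaves[idx].keys.end());
          uint64_t split_key = right_half.front();
          leaves[idx].keys.resize(kLeafCapacity / 2);
          Leaf right{std::move(right_half), leaves[idx].upper_bound};
          leaves[idx].upper_bound = split_key;
          leaves.insert(leaves.begin() + idx + 1, std::move(right));
          if (key >= split_key) ++idx;
        }
        Leaf& target = leaves[idx];
        target.keys.insert(
            std::lower_bound(target.keys.begin(), target.keys.end(), key), key);
        predicted = idx;  // re-establish the predicted leaf after the slow path
      }
    };

    int main() {
      ToyTree tree;
      for (uint64_t k : {1, 2, 3, 5, 4, 6, 7, 9, 8, 10}) tree.insert(k);  // nearly sorted
      return 0;
    }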

  4. TODS
    Enabling Timely and Persistent Deletion in LSM-Engines
    Subhadeep Sarkar, Tarikul Islam Papon, Zichen Zhu, Dimitris Staratzis, and Manos Athanassoulis
    ACM Transactions on Database Systems, 2023
    Abs PDF

    Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete operation inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM-engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion.
    Further, LSM-engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written, leading to undesired latency spikes and increasing the overall operational cost of a database. Efficient and persistent deletion is key to support: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on data deletion, and (iii) en masse cloud deployment of data systems. Further, we document that LSM-based key-value engines perform suboptimally in the presence of deletes in a workload. Tombstone-driven logical deletes, by design, are unable to purge the deleted entries in a timely manner, and retaining the invalidated entries perpetually affects the overall performance of LSM-engines in terms of space amplification, write amplification, and read performance. Moreover, the potentially unbounded latency for persistent deletes brings in critical privacy concerns in light of data privacy protection regulations, such as the right to be forgotten in EU’s GDPR, the right to delete in California’s CCPA and CPRA, and the deletion right in Virginia’s VCDPA. Toward this, we introduce the delete design space for LSM-trees and highlight the performance implications of the different classes of delete operations.
    To address these challenges, in this article, we build a new key-value storage engine, Lethe+, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe+ supports any user-defined threshold for the delete persistence latency offering higher read throughput (1.17x-1.4x) and lower space amplification (2.1x-9.8x), with a modest increase in write amplification (between 4% and 25%) that can be further amortized to less than 1%. In addition, Lethe+ supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance or employing a costly full tree merge.
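    To make the tombstone mechanism concrete, the sketch below uses RocksDB as a representative LSM engine (an illustrative assumption, not necessarily the paper's experimental setup): a Delete() merely writes a tombstone, the deleted value physically persists in older sorted runs until compaction reaches it, and the stock way to force immediate physical deletion is a full-range compaction, exactly the costly operation that Lethe+ is designed to avoid.

    // Logical vs. physical deletion in an LSM engine, illustrated with RocksDB.
    #include <rocksdb/db.h>
    #include <rocksdb/options.h>

    int main() {
      rocksdb::Options opt;
      opt.create_if_missing = true;
      rocksdb::DB* db = nullptr;
      if (!rocksdb::DB::Open(opt, "/tmp/delete_demo", &db).ok()) return 1;

      db->Put(rocksdb::WriteOptions(), "user42", "sensitive-value");

      // Logical delete: inserts a tombstone that shadows older versions of the
      // key; the old value still resides on disk in older sorted runs.
      db->Delete(rocksdb::WriteOptions(), "user42");

      // Forcing the deleted value to be physically purged requires compacting
      // the full key range, i.e., rewriting (potentially) the entire tree.
      db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

      delete db;
      return 0;
    }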

Classes

Database Management Systems
Advanced Database Systems
Advanced Data Systems

Team

Subhadeep Sarkar

Assistant Professor
Office Volen 259
[email protected]

Shubham Kaushik

PhD Researcher
Office Volen 110
[email protected]

James Chen

PhD Researcher
Office Volen 111
[email protected]

Yu-Cheng Huang

Researcher
[email protected]

Arpita Saha

Researcher

Alexander Ott

Researcher
[email protected]

Find Us

© Copyright 2025 SSD Lab.