The Apache Cassandra storage engine

Sylvain Lebresne

Apache Cassandra is a distributed database built to handle massive amounts of data on large clusters of commodity servers. This talk will present the storage engine at the core of Cassandra, explaining why it uses a structure akin to a Log-Structured Merge Tree rather than the usual B-Tree, and what that choice implies for the data model. We will also introduce most of the engine's current features (secondary indexes, integrated caching, TTL...), including recent developments introduced in Cassandra 1.0 such as compression/checksumming and the new leveled compaction.

