
Cassandra stores memtables either on the Java heap, off-heap (native) memory, or both. The limits on heap and off-heap memory can be set via the properties memtable_heap_space_in_mb and memtable_offheap_space_in_mb, respectively. By default, Cassandra sets each of these values to 1/4 of the total heap size set in the cassandra-env.sh file. Allocating memory for memtables reduces the memory available for caching and other internal Cassandra structures, so tune carefully and in small increments.

You can influence how Cassandra allocates and manages memory via the memtable_allocation_type property. This property configures another of Cassandra’s pluggable interfaces, selecting which implementation of the abstract class org.apache.cassandra.utils.memory.MemtablePool is used to control the memory used by each memtable. The default value, heap_buffers, causes Cassandra to allocate memtables on the heap using the Java New I/O (NIO) API, while offheap_buffers uses Java NIO to split each memtable's allocation between on-heap and off-heap memory. The offheap_objects option uses native memory directly, making Cassandra entirely responsible for memory management and garbage collection of memtable memory. This is a less well-documented feature, so it’s best to stick with the default until you have gained more experience.

Another element related to tuning the memtables is memtable_flush_writers. This setting, which is 2 by default, indicates the number of threads used to write out the memtables when it becomes necessary. If your data directories are backed by SSD, you should increase this to the number of cores, without exceeding the maximum value of 8. If you have a very large heap, setting this count higher can improve performance, as these threads are blocked during disk I/O.

You can also enable metered flushing on each table via the CQL CREATE TABLE or ALTER TABLE command. The memtable_flush_period_in_ms option sets the interval at which the memtable will be flushed to disk. Setting this property results in more predictable write I/O, but will also result in more SSTables and more frequent compactions, possibly impacting read performance. The default value of 0 means that periodic flushing is disabled, and flushes will occur only when the commit log threshold or the memtable threshold is reached.
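The rules of thumb above can be collected into a small sketch. This is only an illustration of the guidance in the text, not a Cassandra tool: the function names, the example heap size and core count, and the keyspace and table names are all hypothetical, and the output is meant as a starting point for cassandra.yaml and an ALTER TABLE statement rather than a recommended configuration.

```python
def suggest_memtable_settings(heap_mb, cores, ssd_backed):
    """Derive memtable settings from the rules of thumb in the text (hypothetical helper)."""
    # Each memtable limit defaults to 1/4 of the total heap size.
    quarter_heap = heap_mb // 4
    # 2 flush writers by default; on SSD, raise to the core count, capped at 8.
    flush_writers = max(2, min(cores, 8)) if ssd_backed else 2
    return {
        "memtable_heap_space_in_mb": quarter_heap,
        "memtable_offheap_space_in_mb": quarter_heap,
        "memtable_allocation_type": "heap_buffers",  # the default named in the text
        "memtable_flush_writers": flush_writers,
    }


def render_yaml(settings):
    """Render the settings as simple 'key: value' lines for cassandra.yaml."""
    return "\n".join(f"{name}: {value}" for name, value in settings.items())


def flush_period_cql(keyspace, table, period_ms):
    """Build an ALTER TABLE statement that sets memtable_flush_period_in_ms."""
    if period_ms < 0:
        raise ValueError("period_ms must be 0 (disabled) or a positive interval")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH memtable_flush_period_in_ms = {period_ms};"
    )


if __name__ == "__main__":
    # Assumed example machine: 8 GB heap, 16 cores, SSD-backed data directories.
    print(render_yaml(suggest_memtable_settings(8192, 16, True)))
    # Hypothetical keyspace and table; flush every hour.
    print(flush_period_cql("my_keyspace", "my_table", 3600000))
```

Passing a period of 0 to flush_period_cql produces a statement that turns periodic flushing back off, matching the default described above.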
Commit Logs

There are two sets of files that Cassandra writes to as part of handling update operations: the commit log and the SSTable files. Their different purposes need to be considered in order to understand how to treat them during configuration.

Remember that the commit log can be thought of as short-term storage that helps ensure that data is not lost if a node crashes or is shut down before memtables can be flushed to disk. That’s because when a node is restarted, the commit log gets replayed. In fact, that’s the only time the commit log is read; clients never read from it. But the normal write operation to the commit log blocks, so it would damage performance to require clients to wait for the write to finish.
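The replay behavior can be illustrated with a toy model. This is a minimal sketch of the general write-ahead-log idea, not Cassandra's actual commit log format or code: the ToyNode class, the JSON-lines file, and the demo function are all assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyNode:
    """Toy node: each write is appended to a log file, then applied in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()  # the log is read only here, at startup

    def _replay(self):
        # Rebuild the in-memory table from whatever was logged before shutdown.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.memtable[key] = value

    def write(self, key, value):
        # Append to the log and force it to disk before updating memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memtable[key] = value

    def read(self, key):
        # Reads are served from memory; they never touch the log.
        return self.memtable.get(key)


def demo():
    """Write two rows, drop the node without flushing, and restart from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        node = ToyNode(path)
        node.write("user:1", "alice")
        node.write("user:2", "bob")
        del node  # simulate a crash: the in-memory table is lost
        restarted = ToyNode(path)
        return restarted.memtable
```

In this sketch every write waits for the log to reach disk, which is the blocking cost the paragraph above refers to.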