In this post we will explore B+tree data structure and how it is utilised by databases.
Why do we need indexes?
As we discussed in previous post we need indexes to efficiently read and write data. Lets look at how a B+ tree is structured
B+ tree structure
A B+tree has root nodes, internal nodes and leaf nodes.
Root nodes and internal nodes only contain keys and pointers to the children.
And only leaf nodes will contain the values. The values can be actual value or pointers to disk.
We typically align the node size with the OS Page size (often 4KB). This ensures that when the database requests a node, the disk fetches exactly one page, minimizing wasted I/O operations.
B+tree is a balanced tree and all the leaf nodes are at the same level. And each leaf node has a pointer to next leaf node. This makes it great for Sequential/ range based scan.
[20]
/ -------------
/ \
[10] [30]
/ \ / \
/ \ / \
[5 | 6 | 7] [10 | 12 | 17] [20 | 22 | 25] [30 | 35 | 40]
| | | |
+----=>--------+----=>---------+----=>----------+
Here we have root node, with key 20 and internal nodes 10, 30.
Internal Node 10, connects to two leaf nodes. One leaf node has all values less than 10 and other has all values greater than or eual to 10.
Similarly, for the other internal node too.
All the keys are sorted, so any child to right of node will have key >= to current node.
Note: In implementations each internal node may not have children on both sides.
B+tree over B tree
Compared to B tree, A B+ tree will always have values in the leaf nodes.
B tree will have values in internal nodes also, this leads to
- Increased size of indexes.
- Since all values are not at same level, range queries will become slow.
- Lastly all reads will not take consistent time.
If our index size is big that memory can’t fully hold it, we will have to offload parts of it to disk and then this introduces significant complexity. If we are querying something which is not loaded to memory we have to first move it to memory and then do searching.
In contrast just holding the keys as in B+tree, will considerably reduce the amount of memory needed.
B+ tree also fan out more, these are shorter trees with a lot of branches. The number of branches is determined internally and is called the branching order.
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Storage | Keys & Values in all nodes | Values only in Leaf nodes |
| Tree Height | Higher (pointers take up space) | Lower (more keys per node) |
| Range Scans | Slow (requires tree traversal) | Fast (linked list at leaves) |
B+ tree operation
Get Query
- Get is simplest operation in B+ tree.
- We start from the root node. And find the element key just smaller or equal to the searched value and find the pointer to child node.
- After finding the node, we traverse to the internal node and do the same thing again till we reach the leaf node.
- In the leaf node, we find the key which is equal to searched key and return value.
- If we don’t find exact match, we return with NOT FOUND.
Range Query
- The initial bit is similar to Get Query.
- Once we find the correct leaf node.
- We traverse via the leaf node pointers and accumulate all the values which satisfy the range conditions.
Insert Query
- To insert a value we find the internal node which is just before or equal to to be inserted key.
- If there is no internal node which satisfies this, we will create new internal node with key same as to be inserted key
- From the internal node, we find or create a leaf node where we will store the key value pair.
- Now any time new internal nodes or values are created, we also have to update the parents with their reference.
- Also these inserts, both leaf and internal nodes one can cause , the node size to increase before a threshold value (page size as discussed before) will cause splitting of nodes.
- Split can cause cascading effect across tree, leading to more splits in upward direction, eventually increasing the height of tree.
Delete Query
- To delete a value we find th einternal node again which is just before or equal to to be deleted key
- From the internal nodes we will traverse to leaf node, and in the leaf node we will delete the key value pair.
- Now as in case for insertion leading to split, deletion of key can cause the node to reach minimum threshold. This will cause a merge of current node to one of the siblings.
- This merge can also cascade through the tree vertically, eventually decreasing the height of tree.
Node management
Here is a simple demonstration how a Split and merge happen in B+tree.
B+ tree Visualisation: https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
Splitting Nodes
As we have seen splitting of nodes is caused due to hitting the size limit of a node.
Once node has been split, lets say we will have two new nodes created left and right.
Left node replaces the original node. The link from parent node to original node now points to left node.
The right node has no parent and the first key in right node is promoted as key to parent node.
In some cases, this promotion to parent node causes split of parent node and so on, this is called cascading split.
Cascading splits may cause even the root node to split and addition of new root node, causing increase in height of tree.
Merging nodes
Sometimes deletion of keys causes, node to have size less than a minimum threshold. It is not good to have nodes which do not have many items.
Here it makes sense to merge nodes which have underflowed with their sibling.
When a two nodes are merged, the key to right node is removed from parent.
Similar to insert, sometimes these merges might cascade vertically and even in some cases might lead to decrease in height of the tree.
Conclusion
As we can see we have to do a lot of things when we insert. That’s why B+ tree based DBs do not perform well for write heavy applications.
B+ tree based DBs shine in read-heavy applications.