Efficient Caching using Bloom Filter
Caching can improve the performance of heavy read applications because it can reduce the reading process from the database. Usually, Redis is used to store the data; but it’s expensive if we overuse it. So, the questions: How to improve the efficiency of the Caching?
The answer to that question is Bloom Filter. Bloom filter is a data structure designed to tell you, quickly and memory-efficiently, whether an element is in a filter. In the Bloom filter, there is the possibility of false positives, but not false negatives. In other words, the returned result is either “probably in the set” or “definitely not in the set”. But, How this is related to the question above, here is a simple illustration:
- For example, in Redis to store 1 million small Keys -> String Value pairs use ~ 85MB of memory.
- Using Bloom filter with 1 million items will require the size is about ~ 1MB (you can simulate in here)
How does Bloom Filter Works?
Bloom filter requires more than a hash function in its algorithm. For example: suppose we want to check the word “demo” into a filter using 2 hash functions and an array of 10 bits long. The steps are as follows:
- Initial the array with the value set to 0.
- Calculate the first hash function as follows:
h1("test") % 10 = 3
h2("test") % 10 = 2
3. Map the hashing value into an array. The hash value becomes an index of the array and sets the value into 1.
4. To check if the word exists in the array, then all the values from the array are 1.
Let us find the word test that does not exist in the filter. First, calculate the hash function as follows:
h1("test") % 10 = 3
h2("test") % 10 = 2
The word “test” has a value of 3 and 2, but in the array; index 3 has a value of 0, so that word does not exist in the filter.
Demo Using Java (SpringBoot)
As a demo, I will create a username checker. If it does not exist in the filter, then I add it into the filter. I will use the Guava library SpringBoot framework.
In the snippet above, I add guava dependency in pom.xml (I use maven).
This is the main controller for the hello rest API. This is straightforward. In the line 17, Bloom filter is initialized. For logic checking starts on line 29–32. Data is stored in memory, for persistence of data guava can be combined using Redis.