BitMatcher: Bit-level Counter Adjustment for Sketches (IEEE ICDE, 2024)
|
Sketch has been widely used in the field of large-scale data stream processing. However, common fixed-counter algorithms such as Count-Min Sketch have to allocate larger counters, which wastes a lot of memory due to the high skewness of real-world data streams. To reduce memory usage, we propose to dynamically adjust the counter size that matches the distribution of the data stream. We introduce BitMatcher, a fast global-adjusting algorithm that automatically adjusts the counter to the optimal size to match the data stream. During stream processing, BitMatcher identifies items hashed into a bucket based on isolated fingerprints. If it overflows, BitMatcher changes the flag bits in the bucket and dynamically increases or shrinks the size of some counters in a fine-grained manner. BitMatcher can also relocate a cold item in the bucket with the idea of cuckoo hashing to preserve the potential hot item while achieving global load balancing. Through the above way of dealing with overflow caused by skewed data, BitMatcher precisely manipulates allocated bits and maximizes the use of memory. The experiments show that BitMatcher has high throughput and can outperform SOTA by up to 4 orders of magnitude in terms of accuracy. We also deployed BitMatcher on several platforms, showing its good software and hardware scalability.
2024.6.1: The conference slides of BitMatcher has been uploaded.
2023.12.1: BitMatcher has been accepted by IEEE ICDE 2024.
2023.6.1: The source code of BitMatcher has also been uploaded at GitHub (https://github.com/wenjunpaper/BitMatcher).
2023.6.1: The website of BitMatcher is under construction and will be finished soon.
E-mail: wenjunli@pku.org.cn, wenjun0816@qq.com.