Python Bloom过滤器实现

发布时间:2026/6/16 4:24:00

Python Bloom过滤器实现 Bloom过滤器概率集合。可能误判但不会错判不存在。位数组k个哈希函数。不可删除(Counting Bloom可删除)。mmap位数组支持持久化。空间效率高。应用:缓存穿透、URL去重、垃圾邮件过滤。import mmap, hashlib, mathclass BloomFilter:def __init__(self, n, fpr0.01):self.n nself.p fprself.m int(-n * math.log(fpr) / (math.log(2) ** 2))self.k int(self.m / n * math.log(2))self.bit_array bytearray(self.m)def _hashes(self, item):result []for i in range(self.k):h hashlib.md5(f{i}:{item}.encode()).hexdigest()result.append(int(h, 16) % self.m)return resultdef add(self, item):for pos in self._hashes(item):byte_idx pos // 8bit_idx pos % 8self.bit_array[byte_idx] | (1 bit_idx)def __contains__(self, item):for pos in self._hashes(item):byte_idx pos // 8bit_idx pos % 8if not (self.bit_array[byte_idx] (1 bit_idx)):return Falsereturn Truebf BloomFilter(10000)bf.add(test)print(test in bf) # Trueprint(other in bf) # 可能True(误判)或False

相关新闻