Carleton University - School of Computer Science Honours Project
Fall 2018
Bloom Filters in Estimating Set Differences
William Dullemond
SCS Honours Project Image
ABSTRACT
With the rise of data collection and decentralized systems the need to understand and synchronize data across multiple systems is more present then ever. The purpose of this paper is to show how Bloom Filters can be utilized for finding the difference between two remote sets with minimum communication. In systems with large sets of data or small constrained systems, set reconciliation can require a lot of costly communication. Therefore an estimation of set differences can allow systems to make informed decisions about reconciliation. By using Bloom Filters communication would be lessened allowing for estimations at a reduced communication cost. This paper examines two approaches to estimating set differences. Method one is based on examining the bitwise subtraction of Bloom Filters to estimating the number of different elements. This Method is fast but is has larger error bars. Method two is based on testing each element of a set against the Bloom Filter of the opposing set. This method is slower and requires larger Bloom Filters but is much more reliable. Both methods provide a means of estimating set differences with different strengths and weaknesses.