Carleton University - School of Computer Science Honours Project
Fall 2020
The Effects of Dimensionality on Nearest Neighbor Search Algorithms
Jacob Boertjes
SCS Honours Project Image
ABSTRACT
Nearest neighbor searches affect our everyday lives in ways that we rarely think about. From social media, to scientific computing, to playlists on your favorite music app, nearest neighbor searches provide the means to quickly return relevant information from large sets of high-dimensionality data. The problem is described simply as searching for the nearest point from a given set, to a chosen query point. A naive algorithm can achieve linear runtime, although some clever tricks allow other algorithms to achieve far superior runtimes, while maintaining the guarantee of an optimal solution. In some cases, a sub-optimal solution is acceptable, and Approximate Nearest Neighbor Search algorithms can be used to obtain approximate solutions more quickly than exact methods. The effects of dimensional size is explored for Linear Search, k-d Trees, Ball Trees, Locality Sensitive Hashing, and Annoy. First, a review of each algorithm is conducted, followed by experimental analysis using datasets with different underlying distributions. The results found show an increase in runtime as the number of dimensions grows, as well as the set's overall size.