Honours Project: 2020 Fall ant | www.scs.carleton.ca

Carleton University - School of Computer Science Honours Project

Fall 2020

The Effects of Dimensionality on Nearest Neighbor Search Algorithms

Jacob Boertjes

ABSTRACT

Nearest neighbor searches affect our everyday lives in ways that we rarely think about. From social media, to scientific computing, to playlists on your favorite music app, nearest neighbor searches provide the means to quickly return relevant information from large sets of high-dimensionality data. The problem is described simply as searching for the nearest point from a given set, to a chosen query point. A naive algorithm can achieve linear runtime, although some clever tricks allow other algorithms to achieve far superior runtimes, while maintaining the guarantee of an optimal solution. In some cases, a sub-optimal solution is acceptable, and Approximate Nearest Neighbor Search algorithms can be used to obtain approximate solutions more quickly than exact methods. The effects of dimensional size is explored for Linear Search, k-d Trees, Ball Trees, Locality Sensitive Hashing, and Annoy. First, a review of each algorithm is conducted, followed by experimental analysis using datasets with different underlying distributions. The results found show an increase in runtime as the number of dimensions grows, as well as the set's overall size.