Carleton University - School of Computer Science Honours Project
Winter 2017
Applying Document Clustering to Wikipedia Articles
Matthew Diener
ABSTRACT
Document clustering methods provide insight into how the documents in a large corpus relate to each other. One common approach is to apply the k-means clustering algorithm to documents represented as vectors of tf-idf values. This project applies that approach to a large portion of the articles on Wikipedia and uses the results to demonstrate a realistic way to build recommender systems with document clustering. It also investigates potential methods for evaluating the effectiveness of this clustering approach through comparisons with user contribution history and crawl graphs.
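The approach named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the project's implementation. The four toy documents, the whitespace tokenizer, the smoothed idf formula, the fixed initial centroids, and the helper names (tfidf_vectors, kmeans, recommend) are all assumptions chosen for illustration. A real run over Wikipedia would need proper tokenization, sparse vectors, and a better initialization.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for a list of strings."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: a smoothed idf variant; other weightings are equally valid.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, init_indices, max_iter=100):
    """Lloyd's k-means; centroids start at the vectors named by init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def recommend(doc_index, labels):
    """Hypothetical recommender: other documents in the same cluster."""
    return [i for i, lab in enumerate(labels)
            if lab == labels[doc_index] and i != doc_index]


# Toy stand-ins for article text (invented for this example).
docs = [
    "ice hockey team league season",
    "hockey player league goal season",
    "galaxy star telescope orbit",
    "planet orbit star telescope",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2, init_indices=[0, 2])
print(labels)                # [0, 0, 1, 1]
print(recommend(0, labels))  # [1]
```

In this sketch a recommendation is simply cluster membership: a reader of one article is pointed at the other articles assigned to the same centroid. Ranking the members by distance to the query article would be a natural refinement.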