Confidentiality: 
Not Confidential
Firstname: 
Matthew
Lastname: 
Diener
Faculty: 
Tony White
Term: 
Winter
Year: 
2017
Honours Project Title: 
Applying Document Clustering to Wikipedia Articles
Abstract: 
Methods for document clustering provide insight into how documents in a large corpus relate to each other. One common approach to document clustering is to apply the k-means clustering algorithm to documents represented as vectors of tf-idf (term frequency-inverse document frequency) values. This project applies that approach to a large portion of Wikipedia's articles and uses the results to demonstrate a realistic way of using document clustering to build recommender systems. It also investigates potential methods for evaluating the effectiveness of this clustering method by comparing its results with user contribution history and crawl graphs.
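The pipeline summarised in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's own code. It makes several assumptions that the abstract does not state: tokenisation is naive whitespace splitting, tf is the term count divided by document length, idf is ln(N/df), and vectors are L2-normalised so that squared Euclidean distance tracks cosine distance. It also seeds k-means deterministically with farthest-first selection in place of random or k-means++ seeding. The recommender step simply suggests articles from the same cluster. All function names (`tfidf_vectors`, `kmeans`, `recommend`, and so on) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One sparse, L2-normalised tf-idf vector (term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # assumption: naive whitespace tokeniser
    n = len(tokenised)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # assumption: tf = count / document length, idf = ln(N / df)
        vec = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with farthest-first seeding (a simplification); returns one label per vector."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # next seed: the vector farthest from every centroid chosen so far
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(far)
    labels = []
    for _ in range(iterations):
        # assignment step: each vector joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels


def recommend(titles, labels, title):
    """Hypothetical recommender: suggest the other articles in the same cluster as `title`."""
    i = titles.index(title)
    return [t for j, t in enumerate(titles) if j != i and labels[j] == labels[i]]


titles = ["Cat", "Dog", "Python (language)", "Java (language)"]
docs = [
    "cat feline pet animal whiskers",
    "dog canine pet animal bark",
    "python programming language code interpreter",
    "java programming language code compiler",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(recommend(titles, labels, "Cat"))  # ['Dog']
```

On a corpus the size of Wikipedia, a sketch like this would be replaced by sparse-matrix libraries and a less naive tokeniser. The structure stays the same, however: vectorise, cluster, then recommend within a cluster.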
Cover Image: 
Upload description: 
HonoursReport.pdf is the report, and honours-final.zip contains all files used or created as part of the project.