Honours Project Title:
Applying Document Clustering to Wikipedia Articles
Document clustering methods provide insight into how the documents in a large corpus relate to one another. One common approach is to apply the k-means clustering algorithm to documents represented as vectors of tf-idf values. This project applies that approach to a large portion of the articles on Wikipedia and uses the results to demonstrate a realistic way of building recommender systems from document clusters. It also investigates potential methods for evaluating the effectiveness of this clustering method through comparisons with user contribution history and crawl graphs.
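The approach described above (tf-idf vectors, k-means, then cluster-based recommendations) can be illustrated with a minimal pure-Python sketch. It is not the project's actual code, which is in honours-final.zip. The function names `tfidf_vectors`, `kmeans` and `recommend`, the four toy documents, the smoothed idf formula and the deterministic farthest-first initialisation are all assumptions made for illustration. A run over Wikipedia would need sparse vectors and a dedicated library.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for tokenised docs."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumed variant; other weightings are possible).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Lloyd's k-means; returns one cluster label per vector."""
    # Farthest-first initialisation, chosen here only to keep the toy
    # example deterministic (random initialisation is more usual).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors,
                       key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def recommend(index, labels):
    """Recommend the other documents that share a cluster with `index`."""
    return [i for i, lab in enumerate(labels)
            if lab == labels[index] and i != index]


# Hypothetical toy corpus standing in for Wikipedia articles.
raw_docs = [
    "cat kitten purr whiskers cat",
    "kitten cat meow purr",
    "engine car wheel brake car",
    "car wheel engine gearbox",
]
docs = [text.split() for text in raw_docs]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels, recommend(0, labels))
```

In this sketch a recommendation is simply every other document in the same cluster. Ranking the documents within a cluster, for example by distance to the query document, would be a natural refinement.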
HonoursReport.pdf contains the report on its own, and honours-final.zip contains all files that were used or created as part of the project.