Carleton University - School of Computer Science Honours Project
Winter 2020
Wikifier
Noah Beeney
ABSTRACT
The goal of this project is to detect the important words in a given body of text and insert HTML hyperlinks that link each of those words to its Wikipedia page. A word's importance is estimated by measuring how closely it relates to the overall theme of the text, using the notion of Wikipedia paths. The length of a Wikipedia path from one article to another is the number of hyperlink clicks needed to navigate from the source article's page to the target article's page. A website called Six Degrees of Wikipedia is used to quickly find the shortest path length and the number of shortest paths from one Wikipedia article to another. These two metrics are used to predict whether a particular word is important in the given text. The output is an HTML file containing the original text with clickable hyperlinks inserted for the important words. This can be useful for bloggers and other online writers; however, the underlying concept of measuring the relatedness of words using Wikipedia has many other practical and theoretical applications.
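The two path metrics described above can be illustrated with a minimal sketch. It does not query Six Degrees of Wikipedia; instead it runs a breadth-first search over a small hand-made link graph. The graph `links`, the function names `shortest_path_stats` and `wikify`, the word-to-theme search direction, and the `max_length` threshold are all assumptions made for illustration and are not taken from the project itself.

```python
from collections import deque
from html import escape
from urllib.parse import quote


def shortest_path_stats(links, source, target):
    """Return (length, count) of the shortest paths from source to target.

    `links` maps an article title to the titles it links to.
    Returns (None, 0) when the target cannot be reached.
    """
    dist = {source: 0}    # clicks needed to reach each article
    count = {source: 1}   # number of distinct shortest paths to each article
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            break  # every article one click closer has already been expanded
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                count[nxt] = count[page]
                queue.append(nxt)
            elif dist[nxt] == dist[page] + 1:
                count[nxt] += count[page]  # another equally short route
    if target not in dist:
        return None, 0
    return dist[target], count[target]


def wikify(text, links, theme, max_length=2):
    """Link each word whose shortest path to `theme` is at most max_length.

    The threshold rule is hypothetical; the project may combine path
    length and path count differently.
    """
    pieces = []
    for token in text.split(" "):
        word = token.strip(".,;:!?")
        length = None
        if word in links:
            length, _ = shortest_path_stats(links, word, theme)
        if length is not None and length <= max_length:
            url = "https://en.wikipedia.org/wiki/" + quote(word)
            anchor = f'<a href="{url}">{escape(word)}</a>'
            pieces.append(token.replace(word, anchor, 1))
        else:
            pieces.append(escape(token))
    return " ".join(pieces)


# Toy link graph (hypothetical data, not real Wikipedia links).
links = {
    "Python": ["Programming", "Software"],
    "Programming": ["Computer"],
    "Software": ["Computer"],
    "Compiler": ["Computer"],
    "Banana": ["Fruit"],
    "Fruit": [],
    "Computer": [],
}

print(shortest_path_stats(links, "Python", "Computer"))
print(wikify("Python and Banana.", links, "Computer"))
```

In this toy graph, "Python" reaches "Computer" in two clicks by two different routes, so it is linked, while "Banana" has no path to the theme and is left as plain text. A real implementation would also need to handle multi-word article titles and disambiguation, which this sketch ignores.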