Carleton University - School of Computer Science Honours Project
Winter 2019
Creating a Benchmark Through Parsing and Validating Question-Answer Databases
Gabriel Sinhorin
ABSTRACT
The Resource Description Framework, or RDF, is a widely used method of representing data as triples: a subject, an object, and a predicate connecting the two. RDF data can span a wide variety of domains and can be queried with the SPARQL query language. However, users who are unfamiliar with SPARQL, or who lack base knowledge of the data sets, can find it difficult to make use of the stored data. Question Answering Systems have been proposed to remedy this. A Question Answering System, or QA System for short, takes natural language questions, converts them into queries, and uses those queries to retrieve accurate answers from a database. While QA Systems are abundant, they suffer from a lack of accuracy, because the data sets used to create them are ad hoc and come with no explanation of why they were chosen. For this reason, a universal benchmark is proposed. Creating this benchmark is the goal of this project, which draws on data sets from numerous sources. These data sets are scanned and updated to remove redundancy, and their answers are verified to be correct and up to date using their included SPARQL queries. The final data sets are compiled into a JSON file. In addition, a shallow analysis of the included SPARQL queries is performed.
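The pipeline described in the abstract (removing redundant questions, re-running each included SPARQL query to refresh its answers, and compiling the result to JSON) can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual code: the record schema (`question`, `sparql`, `answers`), the function names, the sample data, and the `fake_endpoint` stand-in for a real SPARQL endpoint are all assumptions made for the example.

```python
import json
import re


def normalize(question):
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", question.strip().lower())
    return text.rstrip("?.! ")


def build_benchmark(sources, run_query):
    """Merge records from several data sets into one deduplicated list.

    sources: dict mapping a data set name to a list of records, each a dict
        with 'question', 'sparql', and 'answers' keys (hypothetical schema).
    run_query: callable that takes a SPARQL string and returns a list of
        answers (a stand-in for a call to a real endpoint).
    """
    seen = set()
    benchmark = []
    for name, records in sources.items():
        for record in records:
            key = normalize(record["question"])
            if key in seen:
                continue  # redundant question, already kept from an earlier source
            seen.add(key)
            current = run_query(record["sparql"])
            benchmark.append({
                "source": name,
                "question": record["question"],
                "sparql": record["sparql"],
                "answers": current,  # stored answers are replaced by the fresh result
                "changed": sorted(current) != sorted(record["answers"]),
            })
    return benchmark


# Hypothetical sample data; real data sets and endpoint results will differ.
CAPITAL_QUERY = "SELECT ?c WHERE { :Canada :capital ?c }"
AUTHOR_QUERY = "SELECT ?a WHERE { :Hamlet :author ?a }"

FAKE_RESULTS = {
    CAPITAL_QUERY: ["Ottawa"],
    AUTHOR_QUERY: ["William Shakespeare"],
}


def fake_endpoint(query):
    """Return canned answers instead of contacting a SPARQL endpoint."""
    return FAKE_RESULTS.get(query, [])


sources = {
    "set_a": [
        {"question": "What is the capital of Canada?",
         "sparql": CAPITAL_QUERY, "answers": ["Ottawa"]},
    ],
    "set_b": [
        {"question": "what is the capital of  Canada",
         "sparql": CAPITAL_QUERY, "answers": ["Ottawa"]},
        {"question": "Who wrote Hamlet?",
         "sparql": AUTHOR_QUERY, "answers": ["Shakespeare"]},
    ],
}

benchmark = build_benchmark(sources, fake_endpoint)
benchmark_json = json.dumps(benchmark, indent=2)
```

Passing the query runner in as a parameter keeps the sketch runnable offline; in practice it would be replaced by a function that sends the query to the relevant SPARQL endpoint. The `changed` flag is one possible way to record which stored answers were out of date.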