Carleton University - School of Computer Science Honours Project
Fall 2020
Developing a Data Crawler to Collect Data for Predicting Developer Expertise
Eliza Moore
ABSTRACT
The goal of this project was to contribute to a larger project, led by a Master’s student, on predicting the expertise topics of software developers from their activity and contributions on GitHub and Stack Overflow. The predicted topics will then be compared with self-declared topics of expertise, which in this case come from the LinkedIn profiles of the software developers used as project subjects. This project contributes a data crawler that collects the self-declared data by extracting expertise topics from developers’ LinkedIn profiles and provides the resulting data to the parent project. To achieve this, the work was split into two parts: collecting LinkedIn profile URLs for a provided list of users, and collecting the expertise data from each user’s profile. The first part was done manually: the Stack Overflow user accounts included in the provided data were used to gather enough information to find each user’s matching GitHub and LinkedIn accounts. The output of part one was then used as input for part two, the data crawler. Using Selenium, ChromeDriver and Java, a program was created to navigate to each profile listed in the input and collect the needed information from it. The final output of the data crawler is a CSV file containing all relevant original data together with the newly gathered data, for use in the parent project.
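The last step described above, merging the original input data with the newly gathered expertise topics into one CSV row, might look roughly like the following minimal sketch. It is an illustration only, not the project's actual code: the class name, method names, column names and sample values are all hypothetical, and the Selenium navigation that would produce the list of topics is omitted so that the example uses only the Java standard library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds one output row from a user's original
// input fields plus the expertise topics gathered from their profile.
class ExpertiseCsvSketch {

    // Quote a field when it contains a comma, a double quote, or a line
    // break, doubling any embedded quotes (the usual CSV convention).
    static String escapeCsv(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Original fields come first; all topics are joined into one final
    // column, separated by "; " (an assumed layout, not the project's).
    static String toCsvRow(List<String> originalFields, List<String> topics) {
        StringJoiner row = new StringJoiner(",");
        for (String field : originalFields) {
            row.add(escapeCsv(field));
        }
        row.add(escapeCsv(String.join("; ", topics)));
        return row.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for one input record.
        List<String> original = Arrays.asList(
                "12345", "https://www.linkedin.com/in/example-user");
        List<String> topics = Arrays.asList("Java", "Machine Learning");

        System.out.println("so_user_id,linkedin_url,expertise_topics");
        System.out.println(toCsvRow(original, topics));
    }
}
```

Keeping the topics in a single delimited column keeps the row width fixed regardless of how many topics a profile lists; a real output format could just as reasonably use one row per topic.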