Carleton University - School of Computer Science Honours Project
Winter 2020
Profiling Stack Overflow users based on badge collection
Alexei Tipenko
ABSTRACT
Stack Overflow is arguably the most popular Q&A platform for software developers. Because of its standing within the tech community, the platform holds a wealth of information relevant to the field of mining software repositories. Public Stack Overflow datasets have already supported substantial research on expert recommendation, sentiment analysis, reputation building, and user post analysis. One promising area of research can still be explored further: badges. Badges represent a variety of engagement activities on the platform, such as creating posts, answering questions, commenting, flagging, voting, and editing. Because they span so many activities, badges can capture aspects of user participation that a reputation score alone does not. Using summary statistics on the badges users have collected, clustering methods can group users by their badge achievements, and association rules can highlight the activities of highly active users. This thesis explores user badge activity on Stack Overflow through exploratory data analysis, clustering methods, and association rules to better understand who these users are, how they tend to behave on the platform, and what improvements could encourage more user activity.
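To make the idea of association rules over badges concrete, the following is a minimal sketch and not the method used in the project itself. It assumes each user is represented as a set of badge names and mines only single-badge rules of the form "users with badge X also tend to hold badge Y". The toy data, the function name `pairwise_rules`, and the support and confidence thresholds are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each user mapped to the set of badges they hold.
user_badges = {
    "u1": {"Teacher", "Editor", "Supporter"},
    "u2": {"Teacher", "Editor"},
    "u3": {"Student", "Supporter"},
    "u4": {"Teacher", "Editor", "Student"},
}


def pairwise_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for badge pairs.

    support    = fraction of users holding both badges
    confidence = fraction of users with the antecedent who also hold the consequent
    The thresholds are illustrative assumptions, not values from the project.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for badges in baskets:
        item_counts.update(badges)
        # Sort so that each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(badges), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


rules = pairwise_rules(list(user_badges.values()))
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

On real data, the same counting would be applied to the badge sets of all users, and a dedicated frequent-itemset algorithm such as Apriori would typically replace the pairwise loop so that rules with larger antecedents can be found.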