Crowdsourced bug triaging
Bug triaging and assignment is a time-consuming task in large projects. Most research in this area examines the developers' prior development and bug-fixing activities in order to recognize their areas of expertise and assign relevant bug fixes to them. We propose a novel method that exploits a new source of evidence for the developers' expertise, namely their contributions to Q&A platforms such as Stack Overflow. We evaluated this method in the context of the 20 largest GitHub projects, considering 7144 bug reports. Our results demonstrate that our method is more accurate than other state-of-the-art methods, and that future bug-assignment algorithms should consider exploring other sources of expertise, beyond the project's version-control system and bug tracker.
University of Alberta | Publication | 2015-01-01 | Ali Sajedi, A Hindle, Eleni Stroulia |
Crowdsourced bug triaging: Leveraging Q&A platforms for bug assignment
Bug triaging, i.e., assigning a bug report to the “best” person to address it, involves identifying a list of developers that are qualified to understand and address the bug report, and then ranking them according to their expertise. Most research in this area examines the description of the bug report and the developers’ prior development and bug-fixing activities. In this paper, we propose a novel method that exploits a new source of evidence for the developers’ expertise, namely their contributions in Stack Overflow, the popular software Question and Answer (Q&A) platform. The key intuition of our method is that the questions a developer asks and answers in Stack Overflow, or more generally in software Q&A platforms, can potentially be an excellent indicator of his/her expertise. Motivated by this idea, our method uses the bug-report description as a guide for selecting relevant Stack Overflow contributions on the basis of which to identify developers with the necessary expertise to close the bug under examination. We evaluated this method in the context of the 20 largest GitHub projects, considering 7144 bug reports. Our results demonstrate that our method exhibits superior accuracy to other state-of-the-art methods.
University of Alberta | Publication | 2016-01-01 | Ali Sajedi, A Hindle, Eleni Stroulia |
GitHub's Big Data Adaptor: An Eclipse Plugin
The data of GitHub, the most popular code-sharing platform, fits the characteristics of "big data" (Volume, Variety and Velocity). To facilitate studies on this huge volume of GitHub data, the GHTorrent website publishes a MySQL dump of (some) GitHub data quarterly. Unfortunately, developers using these published data dumps face challenges with respect to the time required to parse and ingest the data, the space required to store it, and the latency of their queries. To help address these challenges, we developed a data adaptor as an Eclipse plugin, which efficiently handles this dump. The plugin offers an interactive interface through which users can explore and select any field in any table. After extracting the data selected by the user, the parser exports it in easy-to-use spreadsheets. We hope that using this plugin will facilitate further studies on the GitHub data as a whole.
University of Alberta | Publication | 2015-01-01 | Ali Sajedi, Vraj Shah, Eleni Stroulia |
Involvement, Contribution and Influence in GitHub and Stack Overflow
Software developers are increasingly adopting social-media platforms to contribute to software development, learn, and build a reputation for themselves. GitHub supports version-controlled code sharing and social-networking functionalities, and Stack Overflow is a social forum for question answering on programming topics. Motivated by the overlap in features between the two networks, we set out to mine, analyze, and correlate the members' core contributions, editorial activities and influence in the two networks. We aim to better understand the similarities and differences of the members' contributions in the two platforms and their evolution over time. In this context, while studying the activities of different user groups, we conducted a three-step investigation of GitHub activity, Stack Overflow activity and inter-network activity over a five-year period. We report our findings on interesting membership and activity patterns within each platform and some relations between the two.
University of Alberta | Publication | 2014-01-01 | Ali Sajedi, Afsaneh Esteki, Ameneh Gholipour, Abram Hindle, Eleni Stroulia |
Measuring User Influence in GitHub: The Million Follower Fallacy
Influence in social networks has been extensively studied for collaborative-filtering recommendations and marketing purposes. We are interested in the notion of influence in Software Social Networks (SSNs); more specifically, we want to answer the following questions: 1) What does "influence" mean in SSNs? Given the variety of types of interactions supported in these networks and the abundance of centrality-type metrics, what is the nature of the influence captured by these metrics? 2) Are there silos of influence in these platforms or does influence span across thematic boundaries? To investigate these two questions, we first conducted an in-depth comparison of three influence metrics in GitHub (the largest code-sharing and version-control system): number of followers, number of forked projects, and number of project watchers. Next, we examined how the influence of the most influential software-engineering people in GitHub is spread over different programming languages. Our results indicate (a) that the three influence metrics capture two major characteristics: popularity and content value (code reusability) and (b) that the influence of influentials is spread over more than one programming language, but there is no specific trend toward any two programming languages.
University of Alberta | Publication | 2016-01-01 | Ali Sajedi, Eleni Stroulia |
Realistic Bug Triaging
The task of assigning a bug report to the developer "best" able to address it involves identifying a list of developers qualified to understand and address the bug report and ranking them according to their expertise. Most research in this area addresses this task by matching the description of the bug report and the developers' prior development and bug-fixing activities.
This thesis puts forward a more realistic formulation of the bug-assignment task. First, we develop a novel model of the developers' expertise, taking into account relevant evidence from their code-development contributions, as well as their contributions to relevant Question-and-Answer (Q&A) platforms. Second, we adopt an economics perspective on the task, and we propose to generalize it from "assigning one bug to the best developer" to "cost-effectively assigning multiple pending bugs to a set of qualified and available developers." In this paper, we report on our early results on the value of broadening the notion of a developer's expertise to take into account evidence from Q&A platforms.
University of Alberta | Publication | 2016-01-01 | Ali Sajedi |