Tianpei Xia

I'm currently working as a software engineer at NewsBreak, a leading intelligent app focused on local news. Before that, I was an NSF-funded Ph.D. candidate in the Department of Computer Science at North Carolina State University under the supervision of Dr. Tim Menzies. I joined the Real-world Artificial Intelligence for Software Engineering (RAISE) lab in 2017.

Before coming to NC State, I earned my M.S. degree in Computer Science from The University of Texas at Dallas in 2015, where I also completed coursework in Electrical and Computer Engineering (mainly focusing on VLSI Circuit Design and Computer Architecture). I did my undergrad at Nanjing University of Posts and Telecommunications, China, where I received my B.S. degree in Microelectronics in 2013.

Email /  Resume  /  LinkedIn


My research interests include using data mining and artificial intelligence methods to solve real-world problems in software engineering, such as exploring new techniques in search-based optimization to improve the performance of current SE prediction tasks (effort estimation, text mining, software health, etc.). I believe these software engineering tasks do not always need to be hard (although the easy way to do them may not be easy to find), and I enjoy finding ways to make them better.

NSF Funded: Search-based Software Engineering Research
  • Evolutionary Algorithms for Hyperparameter Optimization: Developed a hyperparameter optimization framework called OIL (Optimized Inductive Learning), which applies evolutionary algorithms (e.g., Differential Evolution and NSGA-II) to supercharge software analytics tasks. OIL was tested on a wide range of optimizers with data from 945 projects. Experimental results show that OIL improved the performance of effort estimation in terms of both accuracy (winning in 16 out of 18 cases) and efficiency (reducing runtime from days to hours).
  • Sequential Model Optimization for Software Effort Estimation: Developed and applied a sequential model-based method (a.k.a. active learning) named "FLASH" for the first time in the software effort estimation domain. Under a fixed computational budget, FLASH can efficiently find good configurations of machine learning methods (e.g., CART) for effort estimation. Overall, it improves the accuracy of software effort estimation tasks by 11% on average.
  • Effort Estimation for Contemporary Projects on GitHub: Proposed and developed a method for project health prediction. A group of health indicators is defined based on the project development process and industrial domain knowledge. The study collected 78,455 months of data from 1,628 GitHub projects. According to the preliminary results, process actions at the project level can be predicted to a high level of accuracy (10% error rate) when hyperparameter tuning is applied to the prediction methods.

System Migration for Educational Computer Programming Application
  • Educational Application Platform Migration: Helped migrate BOTS, an educational programming game, from its original development platform, Unity 4, to Unity 5 using JavaScript. The migration enables more features for the game's future extension and development. BOTS is a serious puzzle game designed to teach programming fundamentals to novice computer users.

Satellite Image Change Detection Using a Gaussian Mixture Model
  • Image Change Detection: Developed and applied a Gaussian mixture model to identify landscape changes in high-resolution satellite images. This grid-based method has competitive performance in bi-temporal change detection. Given two very high-resolution satellite images of the same landscape area, it can achieve performance comparable to that of humans in landscape change detection.

Publications
    Sequential Model Optimization for Software Effort Estimation
    Tianpei Xia, Rui Shu, Xipeng Shen, Tim Menzies
    Transactions on Software Engineering (TSE), 2020

    Many methods have been proposed to estimate how much effort is required to build and maintain software. Much of that research tries to recommend a single method – an approach that makes the dubious assumption that one method can handle the diversity of software project data. To address this drawback, we apply a configuration technique called “ROME” (Rapid Optimizing Methods for Estimation), which uses sequential model-based optimization (SMO) to find what configuration settings of effort estimation techniques work best for a particular data set.

    Predicting Health Indicators for Open Source Projects (using Hyperparameter Optimization)
    Tianpei Xia, Wei Fu, Rui Shu, Rishabh Agrawal, Tim Menzies
    Empirical Software Engineering (EMSE), 2022

    Software developed on public platforms is a source of data that can be used to make predictions about those projects. While the activity of a single developer may be random and hard to predict, when large groups of developers work together on software projects, the resulting behavior can be predicted with good accuracy. In this study, we use 78,455 months of data from 1,628 GitHub projects to make various predictions about the current status of those projects. The prediction error rate can be greatly reduced using DECART hyperparameter optimization.

    How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization)
    Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies
    Empirical Software Engineering (EMSE), 2021

    In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort. Our approach can quickly optimize models that achieve better recalls than the prior state-of-the-art. These increases in recall are associated with moderate increases in false positive rates.

    Omni: Automated Ensemble with Unexpected Models against Adversarial Evasion Attack
    Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies
    Empirical Software Engineering (EMSE), 2022

    Machine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, such models are susceptible to adversarial evasion attacks. In this type of attack, inputs are specially crafted by intelligent malicious adversaries, with the aim of being misclassified by existing state-of-the-art models. Our methods can help security practitioners and researchers build a more robust model against adversarial evasion attacks through the use of ensemble learning with hyperparameter optimization.

Professional Services
    Reviewer: Transactions on Software Engineering

    Reviewer: Empirical Software Engineering

    Reviewer: Information and Software Technology

TA Experience
    CSC226 Discrete Mathematics for Computer Scientists

    CSC236 Concepts and Facilities of Operating Systems for Computer Scientists

    CSC495 Special Topics in Computer Science: Programming Languages & Modeling

    CSC540 Database Management Concepts and Systems