Effort Prediction using Machine Learning

Waquar Shamsi
Aug 6, 2021

What is Effort Prediction

Effort prediction is the process of forecasting how much effort is required to develop or maintain a software application. Effort is traditionally measured in person-hours or in monetary cost.

Contents:

  • What is Effort Prediction
  • Why is Effort Prediction Necessary
  • Non ML Approaches
  • Various Machine Learning based Solutions
  • Challenges in Effort Prediction
  • Better Learning Techniques
  • Advantages of Effort Prediction
  • Sample Standard Datasets
  • How to Learn More
  • Acknowledgement
  • References

Why is Effort Prediction Necessary

Effort prediction is crucial in the software development process because it helps the team ensure that a product is developed and delivered on time. It also helps in managing resources and foreseeing issues. This matters most when effort prediction is used during the early phases of the software development lifecycle.

Non ML Approaches

  • Lines of Code (LOC)
    It is a metric that measures the size of a computer program by counting the number of lines in the text of the program’s source code.
  • Function Points (FP)
    It is another metric used to measure the size and complexity of a program. It is calculated by counting the number and types of functions the program provides, called parameters, and assessing each parameter individually for complexity. FP can be used to estimate the time and effort required for a project. Unlike Lines of Code (LOC), it is independent of the programming language.
  • COCOMO
    COCOMO (Constructive Cost Model) is one of the most widely used software estimation models. It predicts the effort and schedule of a software product based on the size of the software.
  • Expert Estimation
    An expert with deep knowledge of the problem estimates how much effort a project requires, relying on intuition and previous experience.
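
As a rough illustration of the two formula-based approaches above (Function Points and COCOMO), the sketch below computes an adjusted function point count and a Basic COCOMO estimate. The weights are the commonly cited average-complexity function point weights and the coefficients are the published Basic COCOMO constants; the function names and the sample project are made up for illustration, and real projects would rate each parameter's complexity individually.

```python
# Basic COCOMO constants per project mode:
# effort = a * KLOC**b (person-months), schedule = c * effort**d (months).
COCOMO_MODES = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}

# Average-complexity weights for the five function point parameters.
FP_AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}


def adjusted_function_points(counts, adjustment_ratings):
    """Unadjusted FP multiplied by the value adjustment factor.

    adjustment_ratings holds the 14 general system characteristics,
    each rated from 0 to 5.
    """
    if len(adjustment_ratings) != 14:
        raise ValueError("expected 14 ratings, one per system characteristic")
    unadjusted = sum(FP_AVERAGE_WEIGHTS[name] * n for name, n in counts.items())
    return unadjusted * (0.65 + 0.01 * sum(adjustment_ratings))


def basic_cocomo(kloc, mode="organic"):
    """Return (effort in person-months, schedule in months) for a size in KLOC."""
    a, b, c, d = COCOMO_MODES[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule


# Hypothetical project used only to exercise the two functions.
counts = {
    "external_inputs": 10,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 3,
    "external_interface_files": 2,
}
fp = adjusted_function_points(counts, [3] * 14)
effort, months = basic_cocomo(10, "organic")
print(f"FP = {fp:.2f}, effort = {effort:.1f} person-months, schedule = {months:.1f} months")
```

Note that the function point count depends only on what the system does, while the COCOMO estimate needs a size in thousands of lines of code, which is why the former is language independent and the latter is not.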

Various Machine Learning based Solutions

  • Ensemble Based Approach
  • Better Learning Techniques
  • Data Augmentation
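
To make the ensemble idea concrete, here is a minimal sketch in which several simple "solo" estimators each predict effort from project size and the ensemble reports the mean of their predictions. This is not the method of any of the referenced papers: the solo estimators (nearest-neighbour analogy and a median productivity ratio), the combination rule, and the historical data are all assumptions chosen to keep the example small.

```python
from statistics import mean, median

# Hypothetical past projects: (size in KLOC, actual effort in person-months).
HISTORY = [(5, 12), (8, 20), (12, 33), (20, 60), (35, 110), (50, 170)]


def knn_estimator(k):
    """Build a solo estimator that averages the effort of the k most similar past projects."""
    def predict(size_kloc):
        nearest = sorted(HISTORY, key=lambda p: abs(p[0] - size_kloc))[:k]
        return mean(effort for _, effort in nearest)
    return predict


def ratio_estimator(size_kloc):
    """Solo estimator that scales size by the median historical effort per KLOC."""
    return size_kloc * median(effort / size for size, effort in HISTORY)


def ensemble_predict(estimators, size_kloc):
    """Combine the solo predictions by taking their mean."""
    return mean(est(size_kloc) for est in estimators)


solo = [knn_estimator(1), knn_estimator(3), ratio_estimator]
solo_predictions = [est(15) for est in solo]
prediction = ensemble_predict(solo, 15)
print(solo_predictions, prediction)
```

The appeal of combining estimators is that no single one has to be right on every dataset; the ensemble prediction always lies between the most pessimistic and most optimistic solo prediction.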

Challenges in Effort Prediction

  • Some techniques, such as Lines of Code (LOC), are programming-language dependent.
  • Lack of historical project data to learn from.
  • The model learned from one project may not work for another project (Cross Project Effort Prediction).
  • There is no single best model for prediction; performance varies with the dataset.

Better Learning Techniques

Software engineering data exhibits a large amount of variability, which leads to poor fits of Machine Learning models.
To overcome this poor fit, models can be trained on subsets of the dataset, an approach called local modelling. This can significantly increase the predictive power of statistical models.
However, while local models can identify the significant variables for each local region of the data, the recommendations for different regions can conflict. Hence, a global model that takes the local characteristics of the data into account gives the best of both worlds.

Data Preparation and Model Training:
  • Run a correlation analysis on the features of the dataset to identify highly correlated attributes, such as Lines of Code (LOC).
  • Variance Inflation Factor (VIF) analysis: measure the VIF of every feature and remove features with high VIF values.
  • To compare the global model with local modelling, split the dataset into training and testing sets for the global model.
  • For local modelling, after splitting into training and testing sets, apply a model-based clustering technique to the training set to generate the clusters.
  • For each cluster, train a separate model.
  • When a test instance arrives, identify its cluster and use the respective local model to generate the prediction.
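
The first two preparation steps above can be sketched as follows. The code computes pairwise Pearson correlations and, for the special case of exactly two predictors, the VIF, which then reduces to 1 / (1 - r²). The feature names, the data, and the 0.8 correlation threshold are assumptions for illustration; with more than two predictors, the VIF of a feature is obtained by regressing it on all the others, which a statistics library would normally handle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(features, threshold=0.8):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical project features; "statements" closely tracks "loc".
features = {
    "loc": [1200, 3400, 5600, 800, 9100, 4300],
    "statements": [1000, 2900, 4800, 700, 7900, 3600],
    "team_size": [3, 4, 2, 5, 6, 3],
}

pairs = highly_correlated_pairs(features)
vif_redundant = vif_two_predictors(features["loc"], features["statements"])
vif_independent = vif_two_predictors(features["loc"], features["team_size"])
print(pairs, vif_redundant, vif_independent)
```

In this made-up data, only one of `loc` and `statements` would be kept, since the two carry almost the same information.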

Source: Think Locally, Act Globally: Improving Defect and Effort Prediction Models, MSR’12 [2]
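
The local-modelling steps above (cluster the training set, train one model per cluster, route each test instance to its cluster's model) can be sketched as below. Two simplifications are assumed: a tiny one-dimensional k-means stands in for the model-based clustering technique, and each local model is a simple least-squares line on project size. The data and all function names are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of effort = slope * size + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest(centroids, value):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))


def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (k >= 2); a stand-in for model-based clustering."""
    ordered = sorted(values)
    # Spread the initial centroids evenly across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centroids, v)].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def train_local_models(train, k=2):
    """Cluster the training projects by size and fit one line per cluster."""
    centroids = kmeans_1d([size for size, _ in train], k)
    clusters = [[] for _ in range(k)]
    for size, effort in train:
        clusters[nearest(centroids, size)].append((size, effort))
    # Assumes every cluster contains at least two distinct sizes.
    return centroids, [fit_line(c) for c in clusters]


def predict_local(centroids, models, size):
    """Route the instance to its cluster and apply that cluster's model."""
    slope, intercept = models[nearest(centroids, size)]
    return slope * size + intercept


# Hypothetical training set with two regimes: small and large projects.
TRAIN = [(2, 5), (4, 9), (6, 13), (8, 17), (40, 150), (50, 200), (60, 250), (70, 300)]

centroids, models = train_local_models(TRAIN)
global_slope, global_intercept = fit_line(TRAIN)

small_local = predict_local(centroids, models, 5)
large_local = predict_local(centroids, models, 55)
small_global = global_slope * 5 + global_intercept
print(small_local, small_global, large_local)
```

Each local line fits its own region closely, while the single global line has to compromise between the two regimes.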

For a global model that takes the local characteristics of the data into account, Multivariate Adaptive Regression Splines (MARS) can be used.
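
MARS builds its prediction from hinge basis functions of the form max(0, x - c) and max(0, c - x), where c is called a knot. The sketch below is a hand-written model of that shape, not a fitted one: the knot and the coefficients are made-up values, whereas an actual MARS implementation selects them from the data. It is only meant to show how a single global model can behave differently in different regions of the input.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_like_predict(size_kloc):
    """Predict effort with a piecewise-linear model built from two hinge terms."""
    # Hypothetical, hand-picked values; a real MARS fit would learn these.
    intercept = 20.0
    knot = 10.0
    return (intercept
            + 5.0 * hinge(size_kloc, knot, +1)   # effort grows by 5 per KLOC above the knot
            - 2.0 * hinge(size_kloc, knot, -1))  # effort grows by 2 per KLOC below the knot


for size in (5, 10, 20):
    print(size, mars_like_predict(size))
```

Below the knot only the second hinge term is active and above it only the first, so the model is one function with region-specific slopes rather than a set of separate local models.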

Advantages of Effort Prediction

Various studies report that effort prediction leads to better overall project outcomes. The effect of an inaccurate estimate depends on the project. When cost is the top priority and the requirements specification is incomplete, teams tend to adjust the work to fit an overly optimistic estimate. For projects with quality as the top priority and well-specified requirements, overly optimistic estimates can instead lead to effort overruns.

Sample Standard Datasets

  • Nasa93
  • Cocomo81
  • Miyazaki94
  • ISBSG

How to Learn More

Learning is a never-ending process. To gain further knowledge on the application of Machine Learning to effort prediction, the following approaches can be taken:

  • Follow top software engineering conferences and journals such as ICSE, ASE, FSE, MSR, TOSEM, and TSE.
  • Read good survey papers and literature reviews.
  • Check out the reference sections of the research papers.

Acknowledgement

I would like to thank my professor Dr Pankaj Jalote.

References

[1]. A Novel Automated Approach for Software Effort Estimation Based on Data Augmentation, FSE’18

[2]. Think Locally, Act Globally: Improving Defect and Effort Prediction Models, MSR’12

[3]. On the Value of Ensemble Effort Estimation, TSE’12

[4]. Class Point Approach for Software Effort Estimation Using Various Support Vector Regression Kernel Methods, ISEC’14
