Executive Summary: | Software is a ubiquitous part of modern life, and India's software market alone is estimated at $235 billion. With the digitization of the public sector and the Digital India initiatives, Central and State Governments now offer thousands of software systems. However, maintaining software is becoming increasingly complex and time-consuming, especially for legacy systems whose documentation is insufficient or incomplete. In many projects, maintenance consumes approximately 80% of development costs. The proposed research addresses this problem by leveraging advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) to develop novel approaches and tools that semi-automatically generate documentation for legacy software systems. These approaches and tools will draw on both code and non-code artifacts, such as pull requests and issues, to support software maintenance. A preliminary study analyzed 1.38 million software artifacts from 950 public GitHub repositories using mixed methods, including surveys, interviews, card sorting, data analysis, NLP, and machine learning. The study produced a taxonomy of software documentation, which the researchers plan to extend and leverage to achieve the project's goals. The research aims to advance scientific knowledge in software engineering and to significantly reduce the effort involved in documenting and comprehending software systems in both the public and private sectors. |