Parallelization of Genetic Algorithms for Software Architecture Recovery

Software Architecture Recovery (SAR) techniques analyze dependencies between software modules and automatically cluster them to achieve high modularity. Many of these approaches employ genetic algorithms (GAs) for clustering software modules. A major drawback of these algorithms is their lack of scalability. In this paper, we address this drawback by introducing generic software components that can encapsulate subroutines (operators) of a GA to execute them in parallel. We use these components to implement a novel hybrid GA for SAR that exploits parallelism to find better solutions faster. We compare the effectiveness of parallel algorithms with respect to the sequential counterparts that are previously proposed for SAR. We observe that parallelization enables a greater number of iterations to be performed in the search for high-quality solutions. The increased efficiency achieved through parallel processing allows for faster convergence towards optimal solutions by harnessing the power of multiple processing units in a coordinated manner. The amount of improvement in modularity increases up to 125% particularly in the context of large-scale systems. Our algorithm can scale to recover the architecture of a large system, Chromium, which has more than 18,500 modules and 750,000 dependencies among these modules.