Ant Colony Optimization with Policy Gradients and Replay
| dc.contributor.author | Jardee, William | |
| dc.contributor.author | Sheppard, John W. | |
| dc.date.accessioned | 2025-12-01T19:18:50Z | |
| dc.date.issued | 2025-07 | |
| dc.description.abstract | Ant Colony Optimization (ACO) has served for decades as a widely used metaheuristic for solving combinatorial optimization problems. Since its introduction, ACO has seen a wide variety of modifications and connections to Reinforcement Learning (RL). Substantial parallels appear as early as 1995 with Ant-Q's relationship to Q-learning and continue through 2022 with ADACO's connection to Policy Gradient. In this work, we describe ACO, more specifically the Stochastic Gradient Descent ACO algorithm (ACOSGD), explicitly as an off-policy Policy Gradient (PG) method. We also incorporate experience replay into several ACO algorithm variants, including AS, MaxMin-ACO, ACOSGD, ADACO, and our two policy gradient-based versions, PGACO and PPOACO, drawing the connection to elitist ACO strategies. We show that our implementation of PG in ACO with experience replay and a baselined reward update strategy, applied to eight TSP problems of varying sizes, performs competitively with both fundamental ACO and SGD-based ACO versions. We also show through an ablation study that the replay buffer appears to consistently improve the performance of ACO algorithms. | |
| dc.identifier.citation | Jardee, W., & Sheppard, J. (2025, July). Ant Colony Optimization with Policy Gradients and Replay. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 240-248). | |
| dc.identifier.doi | 10.1145/3712256.3726452 | |
| dc.identifier.uri | https://scholarworks.montana.edu/handle/1/19560 | |
| dc.language.iso | en_US | |
| dc.publisher | ACM | |
| dc.rights | cc-by | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0 | |
| dc.subject | Ant Colony Optimization | |
| dc.subject | Ant Algorithms | |
| dc.subject | Metaheuristics | |
| dc.subject | Reinforcement Learning | |
| dc.subject | Replay Buffer | |
| dc.subject | Policy Gradient | |
| dc.title | Ant Colony Optimization with Policy Gradients and Replay | |
| dc.type | Article | |
| mus.citation.extentfirstpage | 1 | |
| mus.citation.extentlastpage | 9 | |
| mus.citation.journaltitle | Proceedings of the Genetic and Evolutionary Computation Conference | |
| mus.relation.college | College of Engineering | |
| mus.relation.department | Computer Science | |
| mus.relation.university | Montana State University - Bozeman |