Ant Colony Optimization with Policy Gradients and Replay
Publisher: ACM
Abstract
Ant Colony Optimization (ACO) has served for decades as a widely used metaheuristic for solving combinatorial optimization problems. Since its introduction, ACO has seen a wide variety of modifications and connections to Reinforcement Learning (RL). Substantial parallels date back to 1995, with Ant-Q's relationship to Q-learning, and continue through 2022, with ADACO's connection to policy gradient methods. In this work, we describe ACO, specifically the Stochastic Gradient Descent ACO algorithm (ACOSGD), explicitly as an off-policy Policy Gradient (PG) method. We also incorporate experience replay into several ACO algorithm variants, including AS, MaxMin-ACO, ACOSGD, ADACO, and our two policy gradient-based versions, PGACO and PPOACO, drawing the connection to elitist ACO strategies. We show that our implementation of PG in ACO with experience replay and a baselined reward update strategy, applied to eight TSP problems of varying sizes, performs competitively with both fundamental ACO and SGD-based ACO versions. An ablation study further suggests that the replay buffer consistently improves the performance of ACO algorithms.
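The abstract's core idea — treating the pheromone matrix as policy parameters, updating it with a baselined REINFORCE-style gradient, and reusing elite tours from a replay buffer — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual update rule: the instance, hyperparameters, mean-length baseline, and buffer policy below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6-city TSP instance (coordinates are illustrative).
coords = rng.random((6, 2))
n = len(coords)
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

tau = np.ones((n, n))  # pheromone matrix, treated as policy parameters

def sample_tour(tau, beta=2.0):
    """Sample a tour with edge probabilities proportional to tau * heuristic."""
    heur = 1.0 / (dist + np.eye(n))  # eye avoids division by zero on diagonal
    tour, visited = [0], {0}
    while len(tour) < n:
        i = tour[-1]
        w = tau[i] * heur[i] ** beta
        w[list(visited)] = 0.0       # mask cities already on the tour
        j = rng.choice(n, p=w / w.sum())
        tour.append(int(j))
        visited.add(int(j))
    return tour

def tour_length(tour):
    return sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))

def pg_update(tau, tours, lengths, lr=0.1):
    """REINFORCE-style pheromone update with a mean-length baseline
    (a sketch of the idea, not the paper's exact rule)."""
    baseline = np.mean(lengths)
    for tour, L in zip(tours, lengths):
        advantage = baseline - L  # shorter tour -> positive advantage
        for k in range(n):
            i, j = tour[k], tour[(k + 1) % n]
            tau[i, j] += lr * advantage
    np.clip(tau, 0.01, None, out=tau)  # keep pheromones positive
    return tau

replay = []  # elite replay buffer of (tour, length) pairs
best = np.inf
for it in range(60):
    tours = [sample_tour(tau) for _ in range(10)]
    lengths = [tour_length(t) for t in tours]
    replay = sorted(replay + list(zip(tours, lengths)), key=lambda tl: tl[1])[:10]
    # Update on fresh samples plus replayed elites (off-policy reuse).
    all_t = tours + [t for t, _ in replay]
    all_l = lengths + [l for _, l in replay]
    tau = pg_update(tau, all_t, all_l)
    best = min(best, min(lengths))
```

Mixing replayed elite tours into the gradient batch is what ties the replay buffer back to elitist ACO strategies: the best tours found so far keep reinforcing their edges on every iteration.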
Citation
Jardee, W., & Sheppard, J. (2025, July). Ant Colony Optimization with Policy Gradients and Replay. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 240-248).
Creative Commons license
Except where otherwise noted, this item's license is described as cc-by.

