Ant Colony Optimization with Policy Gradients and Replay

Jardee, William; Sheppard, John W.

doi:10.1145/3712256.3726452

dc.contributor.author	Jardee, William
dc.contributor.author	Sheppard, John W.
dc.date.accessioned	2025-12-01T19:18:50Z
dc.date.issued	2025-07
dc.description.abstract	Ant Colony Optimization (ACO) has been a widely used metaheuristic for solving combinatorial optimization problems for decades. Since its introduction, ACO has seen a wide variety of modifications and connections to Reinforcement Learning (RL). Substantial parallels appear as early as 1995, with Ant-Q's relationship to Q-learning, and continue through 2022, with ADACO's connection to Policy Gradient. In this work, we describe ACO, more specifically the Stochastic Gradient Descent ACO algorithm (ACOSGD), explicitly as an off-policy Policy Gradient (PG) method. We also incorporate experience replay into several ACO variants, including Ant System (AS), MaxMin-ACO, ACOSGD, ADACO, and our two policy gradient-based versions, PGACO and PPOACO, and draw the connection to elitist ACO strategies. We show that our implementation of PG in ACO, with experience replay and a baselined reward update strategy, performs competitively with both fundamental ACO and SGD-based ACO versions on eight Traveling Salesman Problem (TSP) instances of varying sizes. Through an ablation study, we also show that the replay buffer appears to consistently improve the performance of ACO algorithms.
dc.identifier.citation	Jardee, W., & Sheppard, J. W. (2025, July). Ant Colony Optimization with Policy Gradients and Replay. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 240-248).
dc.identifier.doi	10.1145/3712256.3726452
dc.identifier.uri	https://scholarworks.montana.edu/handle/1/19560
dc.language.iso	en_US
dc.publisher	ACM
dc.rights	cc-by
dc.rights.uri	https://creativecommons.org/licenses/by/4.0
dc.subject	Ant Colony Optimization
dc.subject	Ant Algorithms
dc.subject	Metaheuristics
dc.subject	Reinforcement Learning
dc.subject	Replay Buffer
dc.subject	Policy Gradient
dc.title	Ant Colony Optimization with Policy Gradients and Replay
dc.type	Article
mus.citation.extentfirstpage	1
mus.citation.extentlastpage	9
mus.citation.journaltitle	Proceedings of the Genetic and Evolutionary Computation Conference
mus.relation.college	College of Engineering
mus.relation.department	Computer Science
mus.relation.university	Montana State University - Bozeman
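The abstract above frames ACO as a policy gradient method with experience replay and a baselined reward update. The following is a minimal, hypothetical sketch of that general idea on a small TSP instance; it is not the authors' PGACO or PPOACO. It assumes a matrix of pheromone-like logits `theta` used as a softmax policy over unvisited cities, a REINFORCE update with a batch-mean baseline, and a small buffer that keeps the best tours seen so far (an elitist-style replay choice). All names and parameters (`pg_aco`, `lr`, `buffer_size`, and so on) are illustrative assumptions. The sketch omits the distance heuristic used in classical ACO and applies no importance weighting to replayed tours, so the replayed updates only approximate a true policy gradient.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def step_probs(theta, city, unvisited):
    """Softmax over logits theta[city][j], restricted to the unvisited cities j."""
    m = max(theta[city][j] for j in unvisited)  # subtract max for numerical stability
    w = {j: math.exp(theta[city][j] - m) for j in unvisited}
    z = sum(w.values())
    return {j: wj / z for j, wj in w.items()}


def sample_tour(theta, n, rng):
    """One ant: start at city 0 and sample each next city from the softmax policy."""
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        p = step_probs(theta, tour[-1], unvisited)
        cities = sorted(p)
        nxt = rng.choices(cities, weights=[p[j] for j in cities])[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def grad_log_prob(theta, tour, n):
    """Gradient of log pi(tour) w.r.t. theta: one-hot(chosen) minus softmax, per step."""
    g = [[0.0] * n for _ in range(n)]
    unvisited = set(range(n)) - {tour[0]}
    for k in range(1, n):
        city, chosen = tour[k - 1], tour[k]
        for j, pj in step_probs(theta, city, unvisited).items():
            g[city][j] += (1.0 if j == chosen else 0.0) - pj
        unvisited.remove(chosen)
    return g


def pg_aco(dist, ants=10, iters=200, lr=0.1, buffer_size=5, seed=0):
    """Hypothetical REINFORCE-style ACO with a best-tour replay buffer."""
    n = len(dist)
    rng = random.Random(seed)
    # Pheromone-like logits, uniform at start; assumption: no distance heuristic term.
    theta = [[0.0] * n for _ in range(n)]
    buffer = []  # (length, tour) pairs: the best tours seen so far
    for _ in range(iters):
        batch = [sample_tour(theta, n, rng) for _ in range(ants)]
        buffer = sorted(buffer + [(tour_length(t, dist), t) for t in batch])[:buffer_size]
        # Fresh tours plus replayed elites; no importance weights (a simplification).
        samples = batch + [t for _, t in buffer]
        rewards = [-tour_length(t, dist) for t in samples]  # shorter tour, higher reward
        baseline = sum(rewards) / len(rewards)  # batch-mean baseline
        grads = [grad_log_prob(theta, t, n) for t in samples]  # all at the current theta
        for g, r in zip(grads, rewards):
            scale = lr * (r - baseline) / len(samples)
            for i in range(n):
                for j in range(n):
                    theta[i][j] += scale * g[i][j]  # gradient ascent on expected reward
    return buffer[0]
```

Here `pg_aco(dist)` returns a `(length, tour)` pair for the shortest tour kept in the buffer, where `dist` is assumed to be a symmetric n-by-n list of lists. Keeping only the best tours in the buffer is what makes the replay resemble an elitist pheromone deposit: good past solutions keep reinforcing their edges even when the current batch does not rediscover them.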

Files

Original bundle

Name: ant-colony-opt-2025.pdf
Size: 925.46 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 825 B
Format: Item-specific license agreed to upon submission

Collections

Scholarly Work - Computer Science