Reinforcement Learning for Combinatorial Optimization Problems: Application to Scheduling Problems

Description

The objective of this research is to leverage machine learning capabilities to address challenges in Combinatorial Optimization Problems (COPs). COPs have been the center of interest for researchers and practitioners because of their wide applicability and significant impact across research

The objective of this research is to leverage machine learning capabilities to address challenges in Combinatorial Optimization Problems (COPs). COPs have been the center of interest for researchers and practitioners because of their wide applicability and significant impact across research domains and industries. Traditional approaches such as mathematical optimization and meta-heuristics often struggle with scalability issues when addressing COPs.
Furthermore, flexibility can be an issue, as small changes in the problem setting may necessitate a complete redevelopment of the solution approach. Here, attention-based machine learning methods are designed to address COPs, such as the representative Job Shop Scheduling Problem. Contributions are methods which integrate a policy gradient reinforcement learning approach with a modified transformer architecture. Importantly, the proposed model is trained fast with small problem instances and shown to provide high quality solutions to scaled problem instances. Empirical evidence demonstrates that the proposed approach surpasses the results of recent learning-based approaches and outperforms commonly implemented heuristic rules. Additional research contributions extend the model to leverage the pre-training during the inference state to further improve performance. The final research contribution is the modified machine learning architecture with an attention mechanism, which can be applied to different combinatorial optimization (COP) domains, such as the assignment problem, with minimal refinement. Furthermore, the designed framework is applied to an assignment problem. A more realistic model for a weapon target assignment problem (WTAP) is designed and experiments show that the model can be trained efficiently on small-scale problem instances while also scaling to solve larger problems. The pre-trained model outperforms a heuristic approach. Further enhancements designed for inference using beam search and rollout resulted in substantial improvements in solution quality.

Downloads

Public access restricted until 2026-05-01.

Details

Contributors
Date Created
2025
Embargo Release Date
Language
  • en
Note
  • Partial requirement for: Ph.D., Arizona State University, 2025
  • Field of study: Industrial Engineering
Additional Information
English
Extent
  • 122 pages
Open Access
Peer-reviewed