Learn to Tour: Operator Design For Feasible Solution Mapping

Published in , 2023

Recommended citation:

This paper revisits a class of traveling salesman problems (TSP), namely, the pickup-and-delivery TSP (PDTSP), which finds the shortest tour along a sequence of one-to-one pickup-and-delivery nodes. In PDTSP, precedence constraints need to be satisfied that each pickup node must be visited before its corresponding delivery node. Classic operations research (OR) algorithms for PDTSP are difficult to scale to large-sized problems. Recently, reinforcement learning (RL) has been applied to TSPs. The basic idea is to explore and evaluate visiting sequences in a solution space. However, this approach is time-consuming for PDTSP, as it has to evaluate many infeasible solutions of which precedence constraints are violated. To restrict solution search within a feasible space, we design learning operators that always map one feasible solution to another, without wasting time exploring the infeasible solution space. Such operators are evaluated and selected as policies to solve PDTSPs in an RL framework. We implement our method on different problem sizes in comparison to baselines, including classic OR algorithms and existing learning methods. Results show that our approach can find tours shorter than baselines. Codes and data are available at: L2T

Download paper here