The invention provides a mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning. A two-dimensional environment is represented by a grid method, in which each environment area block corresponds to a discrete location, and the state of the mobile robot at any moment is given by the environment location where the robot is located. Each search step of the mobile robot is based on the Q-learning iterative formula for a non-deterministic Markov decision process. Sequential backtracking is then carried out step by step along the single chain, from the Q value at the tail end of the chain (the current state) to the Q value at the head end of the chain, and this continues until the target state is reached. The mobile robot repeatedly searches for paths from the original state to the target state, with every search step carried out as described, and the Q values of the states are iteratively optimized until they converge. The mobile
robot path planning algorithm based on single-chain sequential backtracking Q-learning has the advantages that it needs far fewer steps to find the optimal path than the classic Q-learning algorithm and the Q(λ) algorithm, its learning time is shorter, and its learning efficiency is higher; these advantages are more pronounced in large environments.