Approximate dynamic programming-based approaches for input–output data-driven control of nonlinear processes.

Zbl 1092.93011

Summary: We propose two approximate dynamic programming (ADP)-based strategies for control of nonlinear processes using input–output data. In the first strategy, which we term ‘\(J\)-learning,’ one builds an empirical nonlinear model using closed-loop test data and performs dynamic programming with it to derive an improved control policy. In the second strategy, called ‘\(Q\)-learning,’ one learns an improved control policy in a model-free manner. Compared to the conventional model predictive control approach, the new approach offers some practical advantages for using nonlinear empirical models in process control. Besides a potential reduction in the on-line computational burden, it offers a convenient way to control the degree of model extrapolation in the calculation of optimal control moves. One major difficulty with using an empirical model in a multi-step predictive control setting is that the model can be excessively extrapolated into regions of the state space where identification data were scarce or nonexistent, leading to performance far worse than predicted by the model. Within the proposed ADP-based strategies, this problem is handled by imposing a penalty term designed on the basis of the local data distribution. A CSTR example illustrates the proposed approaches.
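The model-free strategy with the extrapolation penalty can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`parzen_density`, `extrapolation_penalty`, `q_update`), the Gaussian kernel, and all numeric parameters are assumptions; only the idea of folding a data-scarcity penalty, built from a Parzen-window density estimate over the identification data, into the stage cost of a Q-learning update is taken from the summary.

```python
import numpy as np

def parzen_density(x, data, h=0.5):
    """Parzen-window (Gaussian kernel) density estimate at query point x,
    given the identification data as an (n, d) array."""
    diff = data - x
    return np.mean(np.exp(-np.sum(diff * diff, axis=1) / (2.0 * h * h)))

def extrapolation_penalty(x, data, rho_min=1e-3, weight=10.0):
    """Penalize state-action pairs lying where identification data are scarce:
    zero penalty above the density threshold rho_min, growing as density falls."""
    rho = parzen_density(x, data)
    return weight * max(0.0, rho_min - rho) / rho_min

def q_update(Q, s, a, cost, s_next, actions, data, alpha=0.1, gamma=0.98):
    """One cost-minimizing Q-learning update with the extrapolation penalty
    folded into the observed stage cost. Q is a dict keyed by (state, action)."""
    penalty = extrapolation_penalty(np.array([s, a], dtype=float), data)
    target = cost + penalty + gamma * min(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```

Because the penalty enters the cost-to-go directly, the greedy policy derived from `Q` is steered away from regions the identification data never covered, which is the stated mechanism for limiting model extrapolation.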

##### MSC:

- 93B30: System identification
- 90C39: Dynamic programming
- 49L20: Dynamic programming in optimal control and differential games
- 93C55: Discrete-time control/observation systems

##### Keywords:

Nonlinear model identification; Nonlinear model predictive control; Approximate dynamic programming; NARX model; Reinforcement learning; \(Q\)-learning

*J. M. Lee* and *J. H. Lee*, Automatica 41, No. 7, 1281–1288 (2005; Zbl 1092.93011)

