×

Personalized optimization with user’s feedback. (English) Zbl 07430178

Summary: This paper develops an online algorithm to solve a time-varying optimization problem with an objective that comprises a known time-varying cost and an unknown function. This problem structure arises in a number of engineering systems and cyber-physical systems where the known function captures time-varying engineering costs, and the unknown function models user’s satisfaction; in this context, the objective is to strike a balance between given performance metrics and user’s satisfaction. Key challenges related to the problem at hand are related to (1) the time variability of the problem, and (2) the fact that learning of the user’s utility function is performed concurrently with the execution of the online algorithm. This paper leverages Gaussian processes (GP) to learn the unknown cost function from noisy functional evaluation and build pertinent upper confidence bounds. Using the GP formalism, the paper then advocates time-varying optimization tools to design an online algorithm that exhibits tracking of the oracle-based optimal trajectory within an error ball, while learning the user’s satisfaction function with no-regret. The algorithmic steps are inexact, to account for possible limited computational budgets or real-time implementation considerations. Numerical examples are illustrated based on a problem related to vehicle control.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
60G15 Gaussian processes
68W27 Online algorithms; streaming algorithms
90C30 Nonlinear programming
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the international conference on machine learning. Banff, Alberta, Canada.
[2] Agarwal, A., Dekel, O., & Xiao, L. (2010). Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proc. annual conf. on learning theory. Haifa, Israel.
[3] Akbari, M.; Gharesifard, B.; Linder, T., Individual regret bounds for the distributed online alternating direction method of multipliers, IEEE Transactions on Automatic Control, 64, 4, 1746-1752 (2019) · Zbl 1482.90144
[4] Azaïs, J.-M.; Wschebor, M., Level sets and extrema of random processes and fields (2009), John Wiley & Sons. Inc.: John Wiley & Sons. Inc. Hoboken, New Jersey · Zbl 1168.60002
[5] Bae, S., Han, S. M., & Moura, S. (2018). System analysis and optimization of human-actuated dynamical systems. In Proceedings of the American control conference. Milwaukee, WI, USA. (pp. 4539-4545).
[6] Berkenkamp, F., Moriconi, R., Schoellig, A. P., & Krause, A. (2016). Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. In Proceedings of the 55th conference on decision and control (pp. 4661-4666).
[7] Bernstein, A.; Dall’Anese, E.; Simonetto, A., Online primal-dual methods with measurement feedback for time-varying convex optimization, IEEE Transactions on Signal Processing, 67, 8, 1978-1991 (2019) · Zbl 1458.90508
[8] Besbes, O.; Gur, Y.; Zeevi, A., Non-stationary stochastic optimization (2013), http://arxiv.org/abs/1307.5449
[9] Blum, A.; Jackson, J.; Sandholm, T.; Zinkevich, M., Preference elicitation and query learning, Journal of Machine Learning Research, 5, 649-667 (2004) · Zbl 1222.68092
[10] Bogunovic, I., Scarlett, J., & Cevher, . (2016). Time-varying Gaussian process bandit optimization. In Proceedings of the 19th international conference on artificial intelligence and statistics, PMLR, (Vol. 51) (pp. 314-323).
[11] Bourgin, D. D., Peterson, J. C., Reichman, D., Griffiths, T. L., & Russell, S. J. (2019). Cognitive model priors for predicting human decisions. In Proceedings of the 36th international conference on machine learning. Long Beach, California.
[12] Breazeal, C., & Thomaz, A. L. (2008). Learning from human teachers with socially guided exploration. In Proceedings of the international conference on robotics and automation. Pasadena, CA, USA.
[13] Bubeck, S.; Cesa-Bianchi, N., Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, 5, 1, 1-122 (2012) · Zbl 1281.91051
[14] Cao, X.; Liu, K. J.R., Online convex optimization with time-varying constraints and bandit feedback, IEEE Transactions on Automatic Control, 1 (2018)
[15] Chatupromwong, P., & Yokoyama, A. (2012). Optimization of charging sequence of plug-in electric vehicles in smart grid considering user’s satisfaction. In Proceedings of the IEEE International conference on power system technology (pp. 1-6).
[16] Chen, T.; Giannakis, G. B., Bandit convex optimization for scalable and dynamic IoT management, IEEE Internet of Things Journal (2018)
[17] Chu, W., & Ghahramani, Z. (2005). Preference learning with Gaussian processes. In Proceedings of the 22nd international conference on machine learning. Bonn, Germany. (pp. 137-144).
[18] Dall’Anese, E.; Simonetto, A., Optimal power flow pursuit, IEEE Transactions on Smart Grid, 9, 2, 942-952 (2018)
[19] Dall’Anese, E.; Simonetto, A.; Becker, S.; Madden, L., Optimization and learning with information streams: Time-varying algorithms and applications, Signal Processing Magazine, 37, 3, 71-83 (2020)
[20] Deisenroth, M. P., Efficient reinforcement learning using Gaussian processes (2010), Faculty of Informatics, Karlsruhe Institute of Technology, (Ph.D. thesis)
[21] Deisenroth, M.; Fox, D.; Rasmussen, C. E., Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 2, 408-423 (2015)
[22] Dixit, R.; Bedi, A. S.; Tripathi, R.; Rajawat, K., Online learning with inexact proximal online gradient descent algorithms, IEEE Transactions on Signal Processing, 67, 5, 1338-1352 (2019) · Zbl 1414.90282
[23] Duchi, J.; Jordan, M.; Wainwright, M. J.; Wibisono, A., Optimal rates for zero-order convex optimization: The power of two function evaluations, IEEE Transactions on Information Theory, 61, 5, 2788-2806 (2015) · Zbl 1359.90155
[24] El Chamie, M.; Janak, D.; Acikmese, B., Markov decision processes with sequential sensor measurements, Automatica, 103, 450-460 (2019) · Zbl 1421.90159
[25] Epperlein, J. P.; Zhuk, S.; Shorten, R., Recovering Markov models from closed-loop data, Automatica, 103, 116-125 (2019) · Zbl 1415.93243
[26] Fazlyab, M.; Paternain, S.; Preciado, V.; Ribeiro, A., Prediction-correction interior-point method for time-varying convex optimization, IEEE Transactions on Automatic Control, 63, 7 (2018) · Zbl 1423.90274
[27] Flaxman, A., Kalai, A. T., & McMahan, H. (2005). Online convex optimization in the bandit setting: Gradient descent without gradient. In Proceedings of the ACM-SIAM symposium on discrete algorithms. Vancouver, Canada. (pp. 385-394). · Zbl 1297.90117
[28] Ghavamzadeh, M.; Mannor, S.; Pineau, J.; Tamar, A., Bayesian reinforcement learning: A survey, Foundations and Trends(R) in Machine Learning, 8, 5-6, 359-492 (2015) · Zbl 1382.68190
[29] Ghosal, S.; Roy, A., Posterior constistency of Gaussian pprocess prior for nonparametric binary regression, The Annals of Statistics, 34, 5, 2413-2429 (2006) · Zbl 1106.62039
[30] Greenberg, I., The log normal distribution of headways, Australian Road Research, 2, 7 (1966)
[31] Hauswirth, A., Zanardi, A., Bolognani, S., Dörfler, F., & Hug, G. (2017). Online optimization in closed loop on the power flow manifold. In Proceedings of the IEEE powertech conference. Manchester, UK.
[32] Hosseini, S.; Chapman, A.; Mesbahi, M., Online distributed convex optimization on dynamic networks, IEEE Transactions on Automatic Control, 61, 11, 3545-3550 (2016) · Zbl 1359.90094
[33] Houlsby, N.; Hernández-Lobato, J.; Huszár, F.; Ghahramani, Z., Collaborative Gaussian processes for preference learning, Advances in Neural Information Processing Systems, 3, 2096-2104 (2012)
[34] Hours, J.-H.; Jones, C. N., A parametric non-convex decomposition algorithm for real-time and distributed NMPC, IEEE Transactions on Automatic Control, 61, 2, 287-302 (2016) · Zbl 1359.90134
[35] Huber, J.; Wittink, D. R.; Fiedler, J. A.; Miller, R., The effectiveness of alternative preference elicitation procedures in predicting choice, Journal of Marketing Research, 30, 2, 105-114 (1993)
[36] Human Factors behind Autonomous Vehicles - Expert Article, J. (2018), https://www.robsonforensic.com/articles/autonomous-vehicle-human-factors-expert/, 25 April 2018
[37] Jadbabaie, A., Rakhlin, A., Shahrampour, S., & Sridharan, K. Online optimization: Competing with dynamic comparators. In Proceedings of the eighteenth international conference on artificial intelligence and statistics, PMLR. 38, (pp. 398-406).
[38] Jain, A., Nghiem, T. X., Morari, M., & Mangharam, R. (2018). Learning and control using Gaussian processes: Towards bridging machine learning and controls for physical systems. In Proceedings of the 9th ACM/IEEE international conference on cyber-physical systems. Porto, Portugal. (pp. 140-149).
[39] Kahneman, D.; Tversky, A., Prospect theory: An analysis of decision under risk, Econometrica, 47, 2, 263-291 (1979) · Zbl 0411.90012
[40] Karimi, H., Nutini, J., & Schmidt, M. (2016). Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition. In Proceedings of the european conference of machine learning and knowledge discovery in databases. Riva del Garda, Italy. (pp. 795-811).
[41] Koppel, A.; Paternain, S.; Richard, C.; Ribeiro, A., Decentralized online learning with kernels, IEEE Transactions on Signal Processing, 66, 12, 3240-3255 (2018) · Zbl 1414.90245
[42] Kübler, T. C.; Kasneci, E.; Rosenstiel, W.; Schiefer, U.; Nagel, K.; Papageorgiou, E., Stress-indicators and exploratory gaze for the analysis of hazard perception in patients with visual field loss, Transportation Research Part F: Traffic Psychology and Behaviour, 24, 231-243 (2014)
[43] Lepri, B.; Staiano, J.; Sangokoya, D.; Letouzé, E.; Oliver, N., The tyranny of data? The bright and dark sides of data-driven decision-making for social good, (Transparent Data Mining for Big and Small Data, (Vol. 32) (2017), Springer), 3-24
[44] Levine, S.; Popovic, Z.; Koltun, V., Nonlinear inverse reinforcement learning with Gaussian processes, (Advances in neural information processing systems, (Vol.24) (2011)), 19-27
[45] Linehan, C.; Murphy, G.; Hicks, K.; Gerling, K.; Morrissey, K., Handing over the keys: A qualitative study of the experience of automation in driving, International Journal of Human-Computer Interaction, 35, 18, 1681-1692 (2019)
[46] Liu, S.; Chen, P.-Y.; Kailkhura, B.; Zhang, G.; Hero, A.; Varshney, P. K., A primer on zeroth-order optimization in signal processing and machine learning (2020), ArXiv:2006.06224
[47] Liu, M.; Chowdhary, G.; da Silva, B. C.; Liu, S.; How, J. P., Gaussian processes for learning and control: A tutorial with examples, IEEE Control Systems Magazine, 38, 5, 53-86 (2018) · Zbl 1477.93176
[48] Luo, X., Zhang, Y., & Zavlanos, M. M. (2020). Socially-aware robot planning via bandit human feedback. In 2020 ACM/IEEE 11th international conference on cyber-physical systems (pp. 216-225).
[49] Ma, W.; Gupta, V.; Topcu, U., Distributed charging control of electric vehicles using online learning, IEEE Transactions on Automatic Control, 62, 10, 5289-5295 (2017) · Zbl 1390.93590
[50] McFadden, D.; Train, K., Mixed MNL models for discrete response, Journal of Applied Econometrics, 15, 447-470 (2000)
[51] Monteil, J.; Bouroche, M.; Leith, D. J., \( \mathcal{L}_2\) And \(\mathcal{L}_\infty\) stability analysis of heterogeneous traffic with application to parameter optimization for the control of automated vehicles, IEEE Transactions on Control Systems Technology, 1-16 (2018)
[52] Monteil, J.; Russo, G.; Shorten, R., On \(\mathcal{L}_\infty\) string stability of nonlinear bidirectional asymmetric heterogeneous platoon systems, Automatica, 105, 198-205 (2019) · Zbl 1429.93249
[53] Nedić, A.; Olshevsky, A.; Uribe, C. A., Fast convergence rates for distributed non-Bayesian learning, IEEE Transactions on Automatic Control, 62, 11, 5538-5553 (2017) · Zbl 1458.62116
[54] Nghiem, X. T., & Jones, C. N. (2017). Data-driven demand response modeling and control of buildings with Gaussian Processes. In Proceeding of the American control conference. Seattle, WA, USA.
[55] Oldewurtel, F.; Parisio, A.; Jones, C. N.; Gyalistras, D.; Gwerder, M.; Stauch, P. K., Use of model predictive control and weather forecasts for energy efficient building climate control, Energy and Buildings, 45, 15-27 (2012)
[56] Paternain, S., Morari, M., & Ribeiro, A. (2018). A prediction-correction method for model predictive control. In Proceedings of the American control conference. Milwaukee, WI, USA.
[57] Pentland, A.; Liu, A., Modeling and prediction of human behavior, Neural Computation, 11, 1, 229-242 (1999)
[58] Pinsler, R., Akrour, R., Osa, T., Peters, J., & Neumann, G. (2018). Sample and feedback efficient hierarchical reinforcement learning from human preferences. In 2018 IEEE international conference on robotics and automation (pp. 596-601).
[59] Quercia, D., Schifanella, R., & Aiello, L. M. (2014). The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of conference on hypertext and social media. Santiago, Chile. (pp. 116-125).
[60] Rasmussen, C. E.; Williams, C. K.I., Gaussian processes for machine learning (2006), The MIT Press: The MIT Press Cambridge, MA, US · Zbl 1177.68165
[61] Roulet, V.; d’Aspremont, A., Sharpness, restart and acceleration, (Advances in Neural Information Processing Systems, (Vol. 30) (2017)), 1119-1129
[62] Seeger, M. W.; Kakade, S. M.; Foster, D. P., Information consistency of nonparametric Gaussian process methods, IEEE Transactions on Information Theory, 54, 5, 2376-2382 (2008) · Zbl 1328.94027
[63] Shahrampour, S.; Jadbabaie, A., Distributed online optimization in dynamic environments using mirror descent, IEEE Transactions on Automatic Control, 63, 3, 714-725 (2018) · Zbl 1390.90125
[64] Shalev-Shwartz, S., Online learning and online convex optimization, Foundations and Trends® in Machine Learning, 4, 2, 107-194 (2012) · Zbl 1253.68190
[65] Simonetto, A.; Dall’Anese, E., Prediction-correction algorithms for time-varying constrained optimization, IEEE Transactions on Signal Processing, 65, 20, 5481-5494 (2017) · Zbl 1414.90376
[66] Simonetto, A.; Dall’Anese, E.; Paternain, S.; Leus, G.; Giannakis, G. B., Time-varying convex optimization: Time-structured algorithms and applications, Proceedings of the IEEE, 108, 11, 2032-2048 (2020)
[67] Slivkins, A., & Upfal, E. (2008). Adapting to a changing environment: the Brownian restless bandits. In Proceedings of the conference on learning theory. Helsinki, Finland. (pp. 343-354).
[68] Solak, E.; Murray-Smith, R.; Leithead, W. E.; Leith, D. J.; Rasmussen, C. E., Derivative observations in Gaussian process models of dynamic systems, (Advances in Neural Information Processing Systems, (Vol. 15) (2003), MIT Press), 1057-1064
[69] Spaulding, W.; Deogun, J., A pathway to personalization of integrated treatment: Informatics and decision science in psychiatric rehabilitation, Schizophrenia Bulletin, 37, 2, 129-137 (2011)
[70] Srinivas, N.; Krause, A.; Kakade, S. M.; Seeger, M. W., Information-theoretic regret bounds for Gaussian process optimization in the bandit setting, IEEE Transactions on Information Theory, 58, 5, 3250-3265 (2012) · Zbl 1365.94131
[71] Stüdli, S.; Seron, M. M.; Middleton, R. H., Vehicular platoons in cyclic interconnections, Automatica, 94, 283-293 (2018) · Zbl 1401.93145
[72] Themelis, A.; Patrinos, P., Douglas-rachford splitting and ADMM for nonconvex optimization: tight convergence results (2018) · Zbl 1434.90158
[73] van der Vaart, A. W.; van Zanten, J. H., Rates of contraction of posterior distributions based on Gaussian process priors, The Annals of Statistics, 36, 3, 1435-1463 (2008) · Zbl 1141.60018
[74] Wang, Y.; Yin, W.; Zeng, J., Global convergence of ADMM in nonconvex nonsmooth optimization, Journal of Scientific Computing (2018)
[75] Weernink, M.; Janus, S.; van Til, J.; Raisch, D.; van Manen, J.; IJzerman, M., A systematic review to identify the use of preference elicitation methods in health care decision making, Pharmaceutical Medicine, 28, 4, 175-185 (2014)
[76] Yang, Y.; Bhattacharya, A.; Pati, D., Frequentist coverage and sup-norm convergence rate in Gaussian process regression (2017), ArXiv:1708.04753
[77] Zhou, X.; Dall’Anese, E.; Chen, L.; Simonetto, A., An incentive-based online optimization framework for distribution grids, IEEE Transactions on Automatic Control, 63, 7 (2018) · Zbl 1423.90038
[78] Zhu, H.; Williams, C. K.I.; Rohwer, R.; Morciniec, M., (Bishop, C., Gaussian regression and optimal finite dimensional linear models; in neural networks and machine learning. Gaussian regression and optimal finite dimensional linear models; in neural networks and machine learning, NATO ASI series, vol. 168 (1998), Springer)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.