Villarreal, Ruben; Vlassis, Nikolaos N.; Phan, Nhon N.; Catanach, Tommie A.; Jones, Reese E.; Trask, Nathaniel A.; Kramer, Sharlotte L. B.; Sun, WaiChing. Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter. (English) Zbl 07691253 Comput. Mech. 72, No. 1, 95-124 (2023). MSC: 74-XX
Cho, Hunyong; Holloway, Shannon T.; Couper, David J.; Kosorok, Michael R. Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring. (English) Zbl 07689563 Biometrika 110, No. 2, 395-410 (2023). MSC: 62-XX
Zhu, Yuhua; Ying, Lexing. Variational actor-critic algorithms. (English) Zbl 07683230 ESAIM, Control Optim. Calc. Var. 29, Paper No. 20, 26 p. (2023). MSC: 90C40 93E20
Crimaldi, Irene; Louis, Pierre-Yves; Minelli, Ida G. Interacting nonlinear reinforced stochastic processes: synchronization or non-synchronization. (English) Zbl 07682781 Adv. Appl. Probab. 55, No. 1, 275-320 (2023). MSC: 60K35 62L20 91D30 60F15 62P20 62P25
Dellnitz, Michael; Hüllermeier, Eyke; Lücke, Marvin; Ober-Blöbaum, Sina; Offen, Christian; Peitz, Sebastian; Pfannschmidt, Karlson. Efficient time-stepping for numerical integration using reinforcement learning. (English) Zbl 07682246 SIAM J. Sci. Comput. 45, No. 2, A579-A595 (2023). MSC: 34A12 34A38 65D32 65L05 65L06 68T05
Olshevsky, Alex; Gharesifard, Bahman. A small gain analysis of single timescale actor critic. (English) Zbl 07681925 SIAM J. Control Optim. 61, No. 2, 980-1007 (2023). MSC: 68Txx 90C26
Guo, Xin; Hu, Anran; Zhang, Yufei. Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls. (English) Zbl 07681917 SIAM J. Control Optim. 61, No. 2, 755-787 (2023). MSC: 93D15 68T05
Bhola, Sahil; Pawar, Suraj; Balaprakash, Prasanna; Maulik, Romit. Multi-fidelity reinforcement learning framework for shape optimization. (English) Zbl 07679166 J. Comput. Phys. 482, Article ID 112018, 11 p. (2023). MSC: 68Txx 76Dxx 76-XX
Darendeliler, Alp; Claeys, Dieter; Aghezzaf, El-Houssaine. Integrated condition-based maintenance and multi-item lot-sizing with stochastic demand. (English) Zbl 07677917 J. Ind. Manag. Optim. 19, No. 9, 6908-6947 (2023). MSC: 58F15 58F17 53C35
Alyazidi, Nezar M.; Hassanine, Abdalrahman M.; Mahmoud, Magdi S. An online adaptive policy iteration-based reinforcement learning for a class of a nonlinear 3D overhead crane. (English) Zbl 07677299 Appl. Math. Comput. 447, Article ID 127810, 19 p. (2023). MSC: 93Cxx 93Bxx 70Qxx
Ueda, Masahiko. Memory-two strategies forming symmetric mutual reinforcement learning equilibrium in repeated prisoners’ dilemma game. (English) Zbl 07677169 Appl. Math. Comput. 444, Article ID 127819, 15 p. (2023). MSC: 91Axx 68Txx 92Dxx
Li, Dongdong; Dong, Jiuxiang. Performance-constrained fault-tolerant DSC based on reinforcement learning for nonlinear systems with uncertain parameters. (English) Zbl 07677125 Appl. Math. Comput. 443, Article ID 127759, 21 p. (2023). MSC: 93Cxx 93Bxx 93-XX
Sun, Zhongshi; Jia, Guangyan. Reinforcement learning for exploratory linear-quadratic two-person zero-sum stochastic differential games. (English) Zbl 07677071 Appl. Math. Comput. 442, Article ID 127763, 16 p. (2023). MSC: 91Axx 49Nxx 93Exx
Wang, Xiao; Zhang, Guowei; Li, Yongqiang; Qu, Na. A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line. (English) Zbl 07668841 J. Ind. Manag. Optim. 19, No. 4, 2381-2395 (2023). MSC: 58F15 58F17 53C35
Monter, Samuel; Heuthe, Veit-Lorenz; Panizon, Emanuele; Bechinger, Clemens. Dynamics and risk sharing in groups of selfish individuals. (English) Zbl 1508.92337 J. Theor. Biol. 562, Article ID 111433, 8 p. (2023). MSC: 92D50
Kirk, Robert; Zhang, Amy; Grefenstette, Edward; Rocktäschel, Tim. A survey of zero-shot generalisation in deep reinforcement learning. (English) Zbl 1506.68106 J. Artif. Intell. Res. (JAIR) 76, 201-264 (2023). MSC: 68T07
Neumann, Niels M. P.; de Heer, Paolo B. U. L.; Phillipson, Frank. Quantum reinforcement learning. Comparing quantum annealing and gate-based quantum computing with classical deep reinforcement learning. (English) Zbl 07658652 Quantum Inf. Process. 22, No. 2, Paper No. 125, 18 p. (2023). MSC: 81P68
Huang, Zhen; Tu, Yidong; Fang, Haiyang; Wang, Hai; Zhang, Liang; Shi, Kaibo; He, Shuping. Off-policy reinforcement learning for tracking control of discrete-time Markov jump linear systems with completely unknown dynamics. (English) Zbl 1507.93140 J. Franklin Inst. 360, No. 3, 2361-2378 (2023). MSC: 93C55 93E03 93C05 49N90
Stuckey, K.; Newton, P. K. COVID-19 vaccine incentive scheduling using an optimally controlled reinforcement learning model. (English) Zbl 1507.92061 Physica D 445, Article ID 133613, 11 p. (2023). MSC: 92C60 91A22 68T05
Zhang, Ke; Lin, Xi; Li, Meng. Graph attention reinforcement learning with flexible matching policies for multi-depot vehicle routing problems. (English) Zbl 07652854 Physica A 611, Article ID 128451, 16 p. (2023). MSC: 68T07 90-XX
Nasir, Yusuf; Durlofsky, Louis J. Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology. (English) Zbl 07652827 J. Comput. Phys. 477, Article ID 111945, 25 p. (2023). MSC: 86Axx 90Cxx 68Txx
Elhaki, Omid; Shojaei, Khoshnam. A novel adaptive fuzzy reinforcement learning controller for a platoon of off-axle hitching tractor-trailers with a prescribed performance and path curvature compensation. (English) Zbl 1507.93135 Eur. J. Control 69, Article ID 100735, 16 p. (2023). MSC: 93C42 93C40 93D05 93C85
Kokolakis, Nick-Marios T.; Vamvoudakis, Kyriakos G. Bounded rational Dubins vehicle coordination for target tracking using reinforcement learning. (English) Zbl 1507.93161 Automatica 149, Article ID 110732, 14 p. (2023). MSC: 93C85 93C30 91A80
Jiang, Yi; Gao, Weinan; Wu, Jin; Chai, Tianyou; Lewis, Frank L. Reinforcement learning and cooperative \(H_\infty\) output regulation of linear continuous-time multi-agent systems. (English) Zbl 1507.93063 Automatica 148, Article ID 110768, 11 p. (2023). MSC: 93B36 93A16 93C05 93B52
Zhao, Jianguo; Yang, Chunyu; Gao, Weinan; Modares, Hamidreza; Chen, Xinkai; Dai, Wei. Linear quadratic tracking control of unknown systems: a two-phase reinforcement learning method. (English) Zbl 1507.93158 Automatica 148, Article ID 110761, 10 p. (2023). MSC: 93C70 93C05 49N10
Moerland, Thomas M.; Broekens, Joost; Plaat, Aske; Jonker, Catholijn M. Model-based reinforcement learning: a survey. (English) Zbl 07644944 Found. Trends Mach. Learn. 16, No. 1, 1-118 (2023). MSC: 68T05 68-02
Cohen, Max H.; Serlin, Zachary; Leahy, Kevin; Belta, Calin. Temporal logic guided safe model-based reinforcement learning: a hybrid systems approach. (English) Zbl 1505.93116 Nonlinear Anal., Hybrid Syst. 47, Article ID 101295, 23 p. (2023). MSC: 93C30 93C40 03B44 93C10
Chandak, Siddharth; Borkar, Vivek S.; Dolhare, Harsh. A concentration bound for \(\operatorname{LSPE}(\lambda)\). (English) Zbl 1505.93252 Syst. Control Lett. 171, Article ID 105418, 9 p. (2023). MSC: 93E03
Hsu, Kai-Chieh; Ren, Allen Z.; Nguyen, Duy P.; Majumdar, Anirudha; Fisac, Jaime F. Sim-to-lab-to-real: safe reinforcement learning with shielding and generalization guarantees. (English) Zbl 07638289 Artif. Intell. 314, Article ID 103811, 24 p. (2023). MSC: 68Txx
Kovařík, Vojtěch; Seitz, Dominik; Lisý, Viliam; Rudolf, Jan; Sun, Shuo; Ha, Karel. Value functions for depth-limited solving in zero-sum imperfect-information games. (English) Zbl 07638283 Artif. Intell. 314, Article ID 103805, 51 p. (2023). MSC: 68Txx
Hildebrandt, Florentin D.; Thomas, Barrett W.; Ulmer, Marlin W. Opportunities for reinforcement learning in stochastic dynamic vehicle routing. (English) Zbl 07634103 Comput. Oper. Res. 150, Article ID 106071, 14 p. (2023). MSC: 90Bxx
da Costa, Paulo; Verleijsdonk, Peter; Voorberg, Simon; Akcay, Alp; Kapodistria, Stella; van Jaarsveld, Willem; Zhang, Yingqian. Policies for the dynamic traveling maintainer problem with alerts. (English) Zbl 07632159 Eur. J. Oper. Res. 305, No. 3, 1141-1152 (2023). MSC: 90Bxx
Lai, Jing; Xiong, Junlin; Shu, Zhan. Model-free optimal control of discrete-time systems with additive and multiplicative noises. (English) Zbl 1505.93287 Automatica 147, Article ID 110685, 9 p. (2023). MSC: 93E20 93C55 49N10
Cohen, Max H.; Belta, Calin. Safe exploration in model-based reinforcement learning using control barrier functions. (English) Zbl 1505.93123 Automatica 147, Article ID 110684, 9 p. (2023). MSC: 93C40 68T05 93C10 49L20
Merrell, David; Chandereng, Thevaa; Park, Yeonhee. A Markov decision process for response-adaptive randomization in clinical trials. (English) Zbl 07626670 Comput. Stat. Data Anal. 178, Article ID 107599, 11 p. (2023). MSC: 62-XX
He, Jing; Liu, Xinglu; Duan, Qiyao; Chan, Wai Kin (Victor); Qi, Mingyao. Reinforcement learning for multi-item retrieval in the puzzle-based storage system. (English) Zbl 07619287 Eur. J. Oper. Res. 305, No. 2, 820-837 (2023). MSC: 90Bxx
Karimi-Mamaghan, Maryam; Mohammadi, Mehrdad; Pasdeloup, Bastien; Meyer, Patrick. Learning to select operators in meta-heuristics: an integration of Q-learning into the iterated greedy algorithm for the permutation flowshop scheduling problem. (English) Zbl 07594703 Eur. J. Oper. Res. 304, No. 3, 1296-1330 (2023). MSC: 90Bxx
Thul, Lawrence; Powell, Warren. Stochastic optimization for vaccine and testing kit allocation for the COVID-19 pandemic. (English) Zbl 07583175 Eur. J. Oper. Res. 304, No. 1, 325-338 (2023). MSC: 90Bxx
Antonov, N.; Ishmukhametov, Sh. An intelligent choice of witnesses in the Miller-Rabin primality test. Reinforcement learning approach. (English) Zbl 07676580 Lobachevskii J. Math. 43, No. 12, 3420-3429 (2022). MSC: 11Yxx 11Axx 68Qxx
Minashina, I. K.; Gorbachev, R. A.; Zakharova, E. M. Scheduling in multiagent systems using reinforcement learning. (English. Russian original) Zbl 07667554 Dokl. Math. 106, Suppl. 1, S70-S78 (2022); translation from Dokl. Ross. Akad. Nauk, Mat. Inform. Protsessy Upr. 508, 79-87 (2022). MSC: 90B35
Liu, Zhongshan; Yu, Bin; Zhang, Li; Wang, Wensi. A hybrid control strategy for a dynamic scheduling problem in transit networks. (English) Zbl 07659884 Int. J. Appl. Math. Comput. Sci. 32, No. 4, 553-567 (2022). MSC: 90B35 90B06 90B25
Wang, Yang; Chen, Zhibin. Dynamic graph conv-LSTM model with dynamic positional encoding for the large-scale traveling salesman problem. (English) Zbl 07659460 Math. Biosci. Eng. 19, No. 10, 9730-9748 (2022). MSC: 90C27 90C59
Denizdurduran, Berat; Markram, Henry; Gewaltig, Marc-Oliver. Optimum trajectory learning in musculoskeletal systems with model predictive control and deep reinforcement learning. (English) Zbl 1508.92022 Biol. Cybern. 116, No. 5-6, 711-726 (2022). MSC: 92C10 93B45 68T07
Yao, Shixuan; Liu, Xiaochen; Zhang, Yinghui; Cui, Ze. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning. (English) Zbl 07659305 Math. Biosci. Eng. 19, No. 9, 9258-9290 (2022). MSC: 49J20 93C10 35F21
Casgrain, Philippe; Ning, Brian; Jaimungal, Sebastian. Deep Q-learning for Nash equilibria: Nash-DQN. (English) Zbl 1508.91523 Appl. Math. Finance 29, No. 1, 62-78 (2022). MSC: 91G15 91A15 68T07
Chandak, Siddharth; Borkar, Vivek S.; Dodhia, Parth. Concentration of contractive stochastic approximation and reinforcement learning. (English) Zbl 07644932 Stoch. Syst. 12, No. 4, 411-430 (2022). MSC: 62L20 68T05
Jarne Ornia, Daniel; Mazo, Manuel jun. Robust event-driven interactions in cooperative multi-agent learning. (English) Zbl 07643440 Bogomolov, Sergiy (ed.) et al., Formal modeling and analysis of timed systems. 20th international conference, FORMATS 2022, Warsaw, Poland, September 13–15, 2022. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 13465, 281-297 (2022). MSC: 68Qxx
Yang, Wenhao; Zhang, Liangyu; Zhang, Zhihua. Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics. (English) Zbl 07641124 Ann. Stat. 50, No. 6, 3223-3248 (2022). MSC: 62C05 62F12 68Q32
Qu, Guannan; Wierman, Adam; Li, Na. Scalable reinforcement learning for multiagent networked systems. (English) Zbl 07640312 Oper. Res. 70, No. 6, 3601-3628 (2022). MSC: 90B10
Shah, Devavrat; Xie, Qiaomin; Xu, Zhi. Nonasymptotic analysis of Monte Carlo tree search. (English) Zbl 07640292 Oper. Res. 70, No. 6, 3234-3260 (2022). MSC: 90C40
Chang, Hsiang-Chuan; Okubo, Tomohiro; Kobayashi, Akihiro; Morimoto, Akinori. Artificial intelligence (AI) applications using big data and survey data for exploring the existence of the potential users of public transportation system. (English) Zbl 1505.90034 Int. J. Inf. Manage. Sci. 33, No. 4, 271-290 (2022). MSC: 90B20 68T05
Khetarpal, Khimya; Riemer, Matthew; Rish, Irina; Precup, Doina. Towards continual reinforcement learning: a review and perspectives. (English) Zbl 07639824 J. Artif. Intell. Res. (JAIR) 75, 1401-1476 (2022). MSC: 68Txx
Mazoure, Bogdan; Doan, Thang; Li, Tianyu; Makarenkov, Vladimir; Pineau, Joelle; Precup, Doina; Rabusseau, Guillaume. Low-rank representation of reinforcement learning policies. (English) Zbl 1502.68262 J. Artif. Intell. Res. (JAIR) 75, 597-636 (2022). MSC: 68T05 90C40
Ma, Xiaoteng; Ma, Shuai; Xia, Li; Zhao, Qianchuan. Mean-semivariance policy optimization via risk-averse reinforcement learning. (English) Zbl 1502.68261 J. Artif. Intell. Res. (JAIR) 75, 569-595 (2022). MSC: 68T05 68T20 90C40
Guo, Xin; Xu, Renyuan; Zariphopoulou, Thaleia. Entropy regularization for mean field games with learning. (English) Zbl 1505.91061 Math. Oper. Res. 47, No. 4, 3239-3260 (2022). MSC: 91A16 35Q89 94A17 68T05
Aletti, Giacomo; Crimaldi, Irene. The rescaled Pólya urn: local reinforcement and chi-squared goodness-of-fit test. (English) Zbl 1503.60026 Adv. Appl. Probab. 54, No. 3, 849-879 (2022). MSC: 60F05 62F03 60J05 62F05
Bayer, Christian; Belomestny, Denis; Hager, Paul; Pigato, Paolo; Schoenmakers, John; Spokoiny, Vladimir. Reinforced optimal control. (English) Zbl 1503.93050 Commun. Math. Sci. 20, No. 7, 1951-1978 (2022). MSC: 93E20 93E24 49L20 90C40
He, Xue-Zhong; Lin, Shen. Reinforcement learning equilibrium in limit order markets. (English) Zbl 07631989 J. Econ. Dyn. Control 144, Article ID 104497, 20 p. (2022). MSC: 91-XX
Huang, Xiaowei; Peng, Bei; Zhao, Xingyu. Dependable learning-enabled multiagent systems. (English) Zbl 07631903 AI Commun. 35, No. 4, 407-420 (2022). MSC: 68Txx
Ahmed, Ibrahim H.; Brewitt, Cillian; Carlucho, Ignacio; Christianos, Filippos; Dunion, Mhairi; Fosong, Elliot; Garcin, Samuel; Guo, Shangmin; Gyevnar, Balint; McInroe, Trevor; Papoudakis, Georgios; Rahman, Arrasy; Schäfer, Lukas; Tamborski, Massimiliano; Vecchio, Giuseppe; Wang, Cheng; Albrecht, Stefano V. Deep reinforcement learning for multi-agent interaction. (English) Zbl 07631899 AI Commun. 35, No. 4, 357-368 (2022). MSC: 68Txx
Black, Elizabeth; Brandão, Martim; Cocarascu, Oana; de Keijzer, Bart; Du, Yali; Long, Derek; Luck, Michael; McBurney, Peter; Meroño-Peñuela, Albert; Miles, Simon; Modgil, Sanjay; Moreau, Luc; Polukarov, Maria; Rodrigues, Odinaldo; Ventre, Carmine. Reasoning and interaction for social artificial intelligence. (English) Zbl 07631896 AI Commun. 35, No. 4, 309-325 (2022). MSC: 68Txx
Gemp, Ian; Anthony, Thomas; Bachrach, Yoram; Bhoopchand, Avishkar; Bullard, Kalesha; Connor, Jerome; Dasagi, Vibhavari; de Vylder, Bart; Duéñez-Guzmán, Edgar A.; Elie, Romuald; Everett, Richard; Hennes, Daniel; Hughes, Edward; Khan, Mina; Lanctot, Marc; Larson, Kate; Lever, Guy; Liu, Siqi; Marris, Luke; McKee, Kevin R.; Muller, Paul; Pérolat, Julien; Strub, Florian; Tacchetti, Andrea; Tarassov, Eugene; Wang, Zhe; Tuyls, Karl. Developing, evaluating and scaling learning agents in multi-agent environments. (English) Zbl 07631893 AI Commun. 35, No. 4, 271-284 (2022). MSC: 68Txx
Farzanegan, Behzad; Suratgar, Amir Abolfazl; Menhaj, Mohammad Bagher; Zamani, Mohsen. Distributed optimal control for continuous-time nonaffine nonlinear interconnected systems. (English) Zbl 1505.49023 Int. J. Control 95, No. 12, 3462-3476 (2022). MSC: 49L20 93C10
Padalkar, Abhishek; Nieuwenhuisen, Matthias; Schulz, Dirk; Stulp, Freek. Closing the gap: combining task specification and reinforcement learning for compliant vegetable cutting. (English) Zbl 1504.93256 Gusikhin, Oleg (ed.) et al., Informatics in control, automation and robotics. 17th international conference, ICINCO 2020, Lieusaint, Paris, France, July 7–9, 2020. Revised selected papers. Cham: Springer. Lect. Notes Electr. Eng. 793, 187-206 (2022). MSC: 93C85 68T05
Wang, Haowen; Song, Zhaoyang; Wang, Yinuo; Tian, Yanbing; Ma, Hongyang. Target-generating quantum error correction coding scheme based on generative confrontation network. (English) Zbl 1508.81631 Quantum Inf. Process. 21, No. 8, Paper No. 280, 17 p. (2022). MSC: 81P73 81P68
Joshi, Anant A.; Taghvaei, Amirhossein; Mehta, Prashant G.; Meyn, Sean P. Controlled interacting particle algorithms for simulation-based reinforcement learning. (English) Zbl 1505.49027 Syst. Control Lett. 170, Article ID 105392, 15 p. (2022). MSC: 49N10 49N15
Cappart, Quentin; Bergman, David; Rousseau, Louis-Martin; Prémont-Schwarz, Isabeau; Parjadis, Augustin. Improving variable orderings of approximate decision diagrams using reinforcement learning. (English) Zbl 1502.90088 INFORMS J. Comput. 34, No. 5, 2552-2570 (2022). MSC: 90B50 68T05
Nguyen, Luong-Ha; Goulet, James-A. Analytically tractable hidden-states inference in Bayesian neural networks. (English) Zbl 07625203 J. Mach. Learn. Res. 23, Paper No. 50, 33 p. (2022). MSC: 68T05
Wei, Kaixuan; Aviles-Rivero, Angelica; Liang, Jingwei; Fu, Ying; Huang, Hua; Schönlieb, Carola-Bibiane. TFPnP: tuning-free plug-and-play proximal algorithms with applications to inverse imaging problems. (English) Zbl 07625169 J. Mach. Learn. Res. 23, Paper No. 16, 48 p. (2022). MSC: 68T05
Subramanian, Jayakumar; Sinha, Amit; Seraj, Raihan; Mahajan, Aditya. Approximate information state for approximate planning and reinforcement learning in partially observed systems. (English) Zbl 07625165 J. Mach. Learn. Res. 23, Paper No. 12, 83 p. (2022). MSC: 68T05
Bertsimas, Dimitris; Paskov, Alex. World-class interpretable poker. (English) Zbl 07624265 Mach. Learn. 111, No. 8, 3063-3083 (2022). MSC: 68T05
Han, Bingyan. Cooperation between independent market makers. (English) Zbl 1505.91368 Quant. Finance 22, No. 11, 2005-2019 (2022). MSC: 91G15 91A80 68T07
Tang, Wenpin; Zhang, Yuming Paul; Zhou, Xun Yu. Exploratory HJB equations and their convergence. (English) Zbl 1501.35132 SIAM J. Control Optim. 60, No. 6, 3191-3216 (2022). MSC: 35F21 60J60 93E15 93E20
Chen, Zaiwei; Zhang, Sheng; Doan, Thinh T.; Clarke, John-Paul; Maguluri, Siva Theja. Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning. (English) Zbl 1504.93364 Automatica 146, Article ID 110623, 14 p. (2022). MSC: 93E03 93C10 68T05
Gros, Sebastien; Zanon, Mario. Learning for MPC with stability & safety guarantees. (English) Zbl 1504.93085 Automatica 146, Article ID 110598, 13 p. (2022). MSC: 93B45 93D05
Chen, Ci; Xie, Lihua; Xie, Kan; Lewis, Frank L.; Xie, Shengli. Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning. (English) Zbl 1504.93197 Automatica 146, Article ID 110581, 14 p. (2022). MSC: 93C40 93B52 49J15
Hottung, André; Tierney, Kevin. Neural large neighborhood search for routing problems. (English) Zbl 07613164 Artif. Intell. 313, Article ID 103786, 17 p. (2022). MSC: 68Txx
Roveda, Loris; Testa, Andrea; Shahid, Asad Ali; Braghin, Francesco; Piga, Dario. Q-learning-based model predictive variable impedance control for physical human-robot collaboration. (English) Zbl 07613153 Artif. Intell. 312, Article ID 103771, 32 p. (2022). MSC: 68Txx
Liu, Rex G.; Frank, Michael J. Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning. (English) Zbl 07613152 Artif. Intell. 312, Article ID 103770, 27 p. (2022). MSC: 68Txx
Chen, Xu; Wang, Jun. Inhomogeneous deep Q-network for time sensitive applications. (English) Zbl 07613149 Artif. Intell. 312, Article ID 103757, 22 p. (2022). MSC: 68Txx
Nair, Biji; Bhanu, S. Mary Saira. A reinforcement learning algorithm for rescheduling preempted tasks in fog nodes. (English) Zbl 1501.90031 J. Sched. 25, No. 5, 547-565 (2022). MSC: 90B35 68M20
Kious, Daniel; Mailler, Cécile; Schapira, Bruno Finding geodesics on graphs using reinforcement learning. (English) Zbl 1501.05029 Ann. Appl. Probab. 32, No. 5, 3889-3929 (2022). MSC: 05C81 05C10 05C12 05C35 60K35 62L20 PDF BibTeX XML Cite \textit{D. Kious} et al., Ann. Appl. Probab. 32, No. 5, 3889--3929 (2022; Zbl 1501.05029) Full Text: DOI arXiv OpenURL
Chen, Siqi; Yang, Yang; Su, Ran Deep reinforcement learning with emergent communication for coalitional negotiation games. (English) Zbl 1501.91012 Math. Biosci. Eng. 19, No. 5, 4592-4609 (2022). MSC: 91A12 68T07 91A90 PDF BibTeX XML Cite \textit{S. Chen} et al., Math. Biosci. Eng. 19, No. 5, 4592--4609 (2022; Zbl 1501.91012) Full Text: DOI OpenURL
Ischuk, Igor N.; Telnykh, Bogdan K.; Tyapkin, Valeriy N.; Kremez, Nikolay S. Intellectual ways of solving the problem of constructing the optimal route of an unmanned aerial vehicle in the conditions of counteraction. (English) Zbl 07604842 J. Sib. Fed. Univ., Math. Phys. 15, No. 4, 431-443 (2022). MSC: 68-XX 68Txx 62-XX PDF BibTeX XML Cite \textit{I. N. Ischuk} et al., J. Sib. Fed. Univ., Math. Phys. 15, No. 4, 431--443 (2022; Zbl 07604842) Full Text: DOI MNR OpenURL
Liu, Ruo-Ze; Pang, Zhen-Jia; Meng, Zhou-Yu; Wang, Wenhai; Yu, Yang; Lu, Tong On efficient reinforcement learning for full-length game of StarCraft II. (English) Zbl 07603112 J. Artif. Intell. Res. (JAIR) 75, 213-260 (2022). MSC: 68Txx PDF BibTeX XML Cite \textit{R.-Z. Liu} et al., J. Artif. Intell. Res. (JAIR) 75, 213--260 (2022; Zbl 07603112) Full Text: DOI arXiv OpenURL
Suriyanarayana, Varun; Tavaslıoğlu, Onur; Patel, Ankit B.; Schaefer, Andrew J. Reinforcement learning of simplex pivot rules: a proof of concept. (English) Zbl 1503.90069 Optim. Lett. 16, No. 8, 2513-2525 (2022). MSC: 90C05 90C49 PDF BibTeX XML Cite \textit{V. Suriyanarayana} et al., Optim. Lett. 16, No. 8, 2513--2525 (2022; Zbl 1503.90069) Full Text: DOI OpenURL
Giuseppi, Alessandro; Pietrabissa, Antonio Bellman’s principle of optimality and deep reinforcement learning for time-varying tasks. (English) Zbl 1500.93144 Int. J. Control 95, No. 9, 2448-2459 (2022). MSC: 93E20 90C40 PDF BibTeX XML Cite \textit{A. Giuseppi} and \textit{A. Pietrabissa}, Int. J. Control 95, No. 9, 2448--2459 (2022; Zbl 1500.93144) Full Text: DOI OpenURL
Jin, Dan; Chen, Bo; Yu, Li; Liu, Shichao Adaptive output regulation for cyber-physical systems under time-delay attacks. (English) Zbl 1500.93054 Control Theory Technol. 20, No. 1, 20-31 (2022). MSC: 93C40 93B70 93C83 93C43 PDF BibTeX XML Cite \textit{D. Jin} et al., Control Theory Technol. 20, No. 1, 20--31 (2022; Zbl 1500.93054) Full Text: DOI OpenURL
Gao, Weinan; Jiang, Zhong-Ping Learning-based adaptive optimal output regulation of linear and nonlinear systems: an overview. (English) Zbl 1500.93053 Control Theory Technol. 20, No. 1, 1-19 (2022). MSC: 93C40 93C05 93C10 93A16 90C39 93-02 PDF BibTeX XML Cite \textit{W. Gao} and \textit{Z.-P. Jiang}, Control Theory Technol. 20, No. 1, 1--19 (2022; Zbl 1500.93053) Full Text: DOI OpenURL
Bisi, Lorenzo; Santambrogio, Davide; Sandrelli, Federico; Tirinzoni, Andrea; Ziebart, Brian D.; Restelli, Marcello Risk-averse policy optimization via risk-neutral policy optimization. (English) Zbl 07596159 Artif. Intell. 311, Article ID 103765, 16 p. (2022). MSC: 68Txx PDF BibTeX XML Cite \textit{L. Bisi} et al., Artif. Intell. 311, Article ID 103765, 16 p. (2022; Zbl 07596159) Full Text: DOI OpenURL
Wang, Yuheng; Chapman, Margaret P. Risk-averse autonomous systems: a brief history and recent developments from the perspective of optimal control. (English) Zbl 07596154 Artif. Intell. 311, Article ID 103743, 25 p. (2022). MSC: 68Txx PDF BibTeX XML Cite \textit{Y. Wang} and \textit{M. P. Chapman}, Artif. Intell. 311, Article ID 103743, 25 p. (2022; Zbl 07596154) Full Text: DOI arXiv OpenURL
Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Chai, Tianyou Inverse reinforcement learning for multi-player noncooperative apprentice games. (English) Zbl 1498.91014 Automatica 145, Article ID 110524, 9 p. (2022). MSC: 91A10 91A06 49N45 PDF BibTeX XML Cite \textit{B. Lian} et al., Automatica 145, Article ID 110524, 9 p. (2022; Zbl 1498.91014) Full Text: DOI OpenURL
Ahmadi, Ehsan; Mosadegh, Hadi; Maihami, Reza; Ghalehkhondabi, Iman; Sun, Minghe; Süer, Gürsel A. Intelligent inventory management approaches for perishable pharmaceutical products in a healthcare supply chain. (English) Zbl 07593207 Comput. Oper. Res. 147, Article ID 105968, 20 p. (2022). MSC: 90Bxx PDF BibTeX XML Cite \textit{E. Ahmadi} et al., Comput. Oper. Res. 147, Article ID 105968, 20 p. (2022; Zbl 07593207) Full Text: DOI OpenURL
Le, Richard; Ku, Hyejin Reducing systemic risk in a multi-layer network using reinforcement learning. (English) Zbl 07592478 Physica A 605, Article ID 128029, 21 p. (2022). MSC: 82-XX PDF BibTeX XML Cite \textit{R. Le} and \textit{H. Ku}, Physica A 605, Article ID 128029, 21 p. (2022; Zbl 07592478) Full Text: DOI OpenURL
Clempner, Julio B. A Lyapunov approach for stable reinforcement learning. (English) Zbl 07592290 Comput. Appl. Math. 41, No. 6, Paper No. 279, 15 p. (2022). MSC: 90C40 46N10 60J20 65C40 68T07 PDF BibTeX XML Cite \textit{J. B. Clempner}, Comput. Appl. Math. 41, No. 6, Paper No. 279, 15 p. (2022; Zbl 07592290) Full Text: DOI OpenURL
Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie Fast global convergence of natural policy gradient methods with entropy regularization. (English) Zbl 1500.90086 Oper. Res. 70, No. 4, 2563-2578 (2022). MSC: 90C52 90C40 PDF BibTeX XML Cite \textit{S. Cen} et al., Oper. Res. 70, No. 4, 2563--2578 (2022; Zbl 1500.90086) Full Text: DOI arXiv OpenURL
Gao, Jinmin; Le, Meilong; Fang, Yuan Dynamic air ticket pricing using reinforcement learning method. (English) Zbl 1497.90068 RAIRO, Oper. Res. 56, No. 4, 2475-2493 (2022). MSC: 90B22 90C40 91B24 PDF BibTeX XML Cite \textit{J. Gao} et al., RAIRO, Oper. Res. 56, No. 4, 2475--2493 (2022; Zbl 1497.90068) Full Text: DOI OpenURL
Panov, A. I. Simultaneous learning and planning in a hierarchical control system for a cognitive agent. (English. Russian original) Zbl 1496.93084 Autom. Remote Control 83, No. 6, 869-883 (2022); translation from Avtom. Telemekh. 2022, No. 6, 53-71 (2022). MSC: 93C85 PDF BibTeX XML Cite \textit{A. I. Panov}, Autom. Remote Control 83, No. 6, 869--883 (2022; Zbl 1496.93084); translation from Avtom. Telemekh. 2022, No. 6, 53--71 (2022) Full Text: DOI OpenURL
L. A., Prashanth; Fu, Michael C. Risk-sensitive reinforcement learning via policy gradient search. (English) Zbl 07582341 Found. Trends Mach. Learn. 15, No. 5, 537-693 (2022). MSC: 68T05 68-02 PDF BibTeX XML Cite \textit{P. L. A.} and \textit{M. C. Fu}, Found. Trends Mach. Learn. 15, No. 5, 537--693 (2022; Zbl 07582341) Full Text: DOI arXiv OpenURL
Silvestri, Mattia; De Filippo, Allegra; Ruggeri, Federico; Lombardi, Michele Hybrid offline/online optimization for energy management via reinforcement learning. (English) Zbl 1504.90188 Schaus, Pierre (ed.), Integration of constraint programming, artificial intelligence, and operations research. 19th international conference, CPAIOR 2022, Los Angeles, CA, USA, June 20–23, 2022. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 13292, 358-373 (2022). MSC: 90C40 68T07 PDF BibTeX XML Cite \textit{M. Silvestri} et al., Lect. Notes Comput. Sci. 13292, 358--373 (2022; Zbl 1504.90188) Full Text: DOI OpenURL