Academic literature on the topic 'Actor-critic algorithm'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Actor-critic algorithm.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Actor-critic algorithm"

1

Wang, Jing, and Ioannis Ch. Paschalidis. "An Actor-Critic Algorithm With Second-Order Actor and Critic." IEEE Transactions on Automatic Control 62, no. 6 (2017): 2689–703. http://dx.doi.org/10.1109/tac.2016.2616384.

2

Zheng, Liyuan, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, and Lillian J. Ratliff. "Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (2022): 9217–24. http://dx.doi.org/10.1609/aaai.v36i8.20908.

Abstract:
The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient t
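
For orientation, the "total derivative" that the leader follows in the Stackelberg meta-framework above can be written out explicitly. Below is a minimal NumPy sketch of that leader gradient under the usual implicit-function assumptions (an invertible follower Hessian); the function and argument names are illustrative and are not taken from the paper's code.

    import numpy as np

    def stackelberg_leader_gradient(g1_theta, g1_w, h2_ww, h2_w_theta):
        # Total derivative of the leader loss L1 w.r.t. its parameters theta
        # when the follower plays a best response on its own loss L2:
        #   dL1/dtheta = (partial) dL1/dtheta
        #                - (d2 L2 / dw dtheta)^T (d2 L2 / dw dw)^{-1} (partial) dL1/dw
        # g1_theta:   partial gradient of L1 w.r.t. theta
        # g1_w:       partial gradient of L1 w.r.t. the follower parameters w
        # h2_ww:      Hessian of L2 w.r.t. w (assumed invertible)
        # h2_w_theta: mixed second derivative of L2, shape (dim_w, dim_theta)
        return g1_theta - h2_w_theta.T @ np.linalg.solve(h2_ww, g1_w)

The usual individual gradient corresponds to dropping the correction term and keeping only g1_theta.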
3

Iwaki, Ryo, and Minoru Asada. "Implicit incremental natural actor critic algorithm." Neural Networks 109 (January 2019): 103–12. http://dx.doi.org/10.1016/j.neunet.2018.10.007.

4

Kim, Gi-Soo, Jane P. Kim, and Hyun-Joon Yang. "Robust Tests in Online Decision-Making." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (2022): 10016–24. http://dx.doi.org/10.1609/aaai.v36i9.21240.

Abstract:
Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. One potential application is mobile health, where the goal is to promote the user's health through personalized interventions based on user specific information acquired through wearable devices. Important considerations include the type of, and frequency with which data is collected (e.g. GPS, or continuous monitoring), as such factors can severely impact app performance and users’ adherence. In order to balance the need to collect data that is useful with the constraint of impacting app perfo
5

Denisov, Sergey, and Jee-Hyong Lee. "Actor-Critic Algorithm with Transition Cost Estimation." International Journal of Fuzzy Logic and Intelligent Systems 16, no. 4 (2016): 270–75. http://dx.doi.org/10.5391/ijfis.2016.16.4.270.

6

Ahmed, Ayman Elshabrawy M. "Controller parameter tuning using actor-critic algorithm." IOP Conference Series: Materials Science and Engineering 610 (October 11, 2019): 012054. http://dx.doi.org/10.1088/1757-899x/610/1/012054.

7

Ding, Siyuan, Shengxiang Li, Guangyi Liu, et al. "Decentralized Multiagent Actor-Critic Algorithm Based on Message Diffusion." Journal of Sensors 2021 (December 8, 2021): 1–14. http://dx.doi.org/10.1155/2021/8739206.

Abstract:
The exponential explosion of joint actions and massive data collection are two main challenges in multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper, we propose a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. To this end, the agents are assumed to be placed in a time-varying communication network. Each agent makes limited observations regarding the global state and joint actions; therefore, it needs to obtain and share information with others over the network
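
The abstract above leaves the information-sharing step unspecified; one standard way to realize "sharing over a time-varying network" is for each agent to average its critic parameters with those received from its current neighbours. The sketch below is my own generic rendering of such a diffusion step, not the update rule from the paper.

    import numpy as np

    def diffuse_parameters(local_params, neighbor_params):
        # Average the local critic parameters with those received from the
        # neighbours visible at this step (the communication graph is allowed
        # to change from one step to the next).
        stacked = np.stack([local_params] + list(neighbor_params))
        return stacked.mean(axis=0)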
8

Hafez, Muhammad Burhan, Cornelius Weber, Matthias Kerzel, and Stefan Wermter. "Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning." Paladyn, Journal of Behavioral Robotics 10, no. 1 (2019): 14–29. http://dx.doi.org/10.1515/pjbr-2019-0005.

Abstract:
Abstract In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the
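
As a rough illustration of the reward shaping described in this entry, learning progress is often measured as the drop in a world model's prediction error between consecutive evaluations and then added to the task reward with a weighting coefficient. The sketch below is a generic version of that idea; the coefficient and the error bookkeeping are my assumptions, not details from the paper.

    def combined_reward(extrinsic_reward, prev_prediction_error,
                        curr_prediction_error, intrinsic_weight=0.1):
        # Learning progress: how much the world model's prediction error fell
        # since the previous evaluation, clipped at zero so that a model that
        # got worse does not generate negative curiosity.
        learning_progress = max(0.0, prev_prediction_error - curr_prediction_error)
        return extrinsic_reward + intrinsic_weight * learning_progress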
9

Zhang, Haifei, Jian Xu, Jian Zhang, and Quan Liu. "Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms." Computational Intelligence and Neuroscience 2022 (November 18, 2022): 1–10. http://dx.doi.org/10.1155/2022/1117781.

Abstract:
The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it still suffers from the problems of easily falling into local optima and large error fluctuations. Aiming at these deficiencies, this paper proposes a dual-actor-dual-critic DDPG algorithm (DN-DDPG). First, on the basis of the original actor-critic network architecture of the algorithm, a critic network is added to assist the training, and the smallest Q value of the two critic networks is taken as the estimated value of the action in each update. Reduce the probability o
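
The dual-critic device summarized above (using the smaller of two critics' estimates in every update) is a clipped double-Q style target. A compact PyTorch-flavoured sketch of that target computation, with hypothetical critic modules q1 and q2, might read:

    import torch

    def double_critic_target(reward, next_state, next_action, q1, q2, done,
                             gamma=0.99):
        # Take the element-wise minimum of the two critics' estimates at the
        # next state-action pair, so a single overestimating critic cannot
        # inflate the bootstrapped target on its own.
        with torch.no_grad():
            q_next = torch.min(q1(next_state, next_action),
                               q2(next_state, next_action))
            return reward + gamma * (1.0 - done) * q_next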
10

Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.

Abstract:
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this paper, we propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return. Previous work uses the second moment of return to estimate the variance indirectly. Instead, we use a much simpler recently proposed direct variance estimator which updates the estimates incrementally using tem
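
The entry above optimizes a trade-off between the mean and the variance of the return. One common way to maintain the variance term incrementally, in the spirit of the direct estimator the abstract mentions, is to treat the squared TD error as a meta-reward with a squared discount. The tabular sketch below is a simplified rendering of that general recipe, not the paper's exact update.

    def variance_critic_step(value, var_est, state, next_state, reward,
                             gamma=0.99, alpha=0.05):
        # Ordinary one-step TD error of the value critic.
        delta = reward + gamma * value[next_state] - value[state]
        # Direct variance estimate: the squared TD error plays the role of the
        # reward and the discount factor is squared.
        var_delta = delta ** 2 + gamma ** 2 * var_est[next_state] - var_est[state]
        value[state] += alpha * delta
        var_est[state] += alpha * var_delta
        return delta, var_delta

The actor would then follow the usual policy gradient minus a penalty coefficient times the gradient of this variance estimate.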

Dissertations / Theses on the topic "Actor-critic algorithm"

1

Konda, Vijaymohan (Vijaymohan Gao) 1973. "Actor-critic algorithms." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8120.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (leaves 143-147). Many complex decision making problems like scheduling in manufacturing systems, portfolio management in finance, admission control in communication networks etc., with clear and precise objectives, can be formulated as stochastic dynamic programming problems in which the objective of decision making is to maximize a single "overall" reward. In these formulations, finding an optimal decision policy involves computing a ce
2

Saxena, Naman. "Average Reward Actor-Critic with Deterministic Policy Search." Thesis, 2023. https://etd.iisc.ac.in/handle/2005/6175.

Abstract:
The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first sho
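
For context on the average-reward criterion studied in this thesis, the critic in such algorithms bootstraps on a differential TD error that subtracts a running estimate of the average reward instead of discounting. A minimal tabular sketch in the standard textbook form (not the thesis's specific ARO-DDPG update) is:

    def average_reward_td_step(value, state, next_state, reward, avg_reward,
                               alpha_v=0.1, alpha_r=0.01):
        # Differential TD error: reward measured relative to the long-run
        # average reward, with no discount factor.
        delta = reward - avg_reward + value[next_state] - value[state]
        value[state] += alpha_v * delta
        avg_reward += alpha_r * delta   # track the average-reward estimate
        return delta, avg_reward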
3

Diddigi, Raghuram Bharadwaj. "Reinforcement Learning Algorithms for Off-Policy, Multi-Agent Learning and Applications to Smart Grids." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5673.

Abstract:
Reinforcement Learning (RL) algorithms are a popular class of algorithms for training an agent to learn desired behavior through interaction with an environment whose dynamics is unknown to the agent. RL algorithms combined with neural network architectures have enjoyed much success in various disciplines like games, medicine, energy management, economics and supply chain management. In our thesis, we study interesting extensions of standard single-agent RL settings, like off-policy and multi-agent settings. We discuss the motivations and importance of these settings and propose convergen
4

Lakshmanan, K. "Online Learning and Simulation Based Algorithms for Stochastic Optimization." Thesis, 2012. http://etd.iisc.ac.in/handle/2005/3245.

Abstract:
In many optimization problems, the relationship between the objective and parameters is not known. The objective function itself may be stochastic such as a long-run average over some random cost samples. In such cases finding the gradient of the objective is not possible. It is in this setting that stochastic approximation algorithms are used. These algorithms use some estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and Smoothed Functional(SF) scheme are widely used. In this thesis we hav
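
The abstract above relies on Simultaneous Perturbation Stochastic Approximation (SPSA), which estimates a gradient from only two noisy function evaluations regardless of the parameter dimension. A small NumPy sketch of the classic two-sided estimator follows; the parameter names are generic.

    import numpy as np

    def spsa_gradient(f, theta, c=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Perturb every coordinate at once with a random Rademacher (+1/-1) sign.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two noisy evaluations of the objective, one per perturbation sign.
        f_plus = f(theta + c * delta)
        f_minus = f(theta - c * delta)
        # Two-sided SPSA estimate: the same scalar difference is divided by
        # each coordinate's perturbation.
        return (f_plus - f_minus) / (2.0 * c * delta)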
5

Lakshmanan, K. "Online Learning and Simulation Based Algorithms for Stochastic Optimization." Thesis, 2012. http://hdl.handle.net/2005/3245.

Abstract:
In many optimization problems, the relationship between the objective and parameters is not known. The objective function itself may be stochastic such as a long-run average over some random cost samples. In such cases finding the gradient of the objective is not possible. It is in this setting that stochastic approximation algorithms are used. These algorithms use some estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and Smoothed Functional(SF) scheme are widely used. In this thesis we hav

Book chapters on the topic "Actor-critic algorithm"

1

Kim, Chayoung, Jung-min Park, and Hye-young Kim. "An Actor-Critic Algorithm for SVM Hyperparameters." In Information Science and Applications 2018. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1056-0_64.

2

Zha, ZhongYi, XueSong Tang, and Bo Wang. "An Advanced Actor-Critic Algorithm for Training Video Game AI." In Neural Computing for Advanced Applications. Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-7670-6_31.

3

Melo, Francisco S., and Manuel Lopes. "Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs." In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-87481-2_5.

4

Sun, Qifeng, Hui Ren, Youxiang Duan, and Yanan Yan. "The Adaptive PID Controlling Algorithm Using Asynchronous Advantage Actor-Critic Learning Method." In Simulation Tools and Techniques. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32216-8_48.

5

Liu, Guiliang, Xu Li, Mingming Sun, and Ping Li. "An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction." In Proceedings of the 2020 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2020. http://dx.doi.org/10.1137/1.9781611976236.25.

6

Cheng, Yuhu, Huanting Feng, and Xuesong Wang. "Actor-Critic Algorithm Based on Incremental Least-Squares Temporal Difference with Eligibility Trace." In Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-25944-9_24.

7

Jiang, Haobo, Jianjun Qian, Jin Xie, and Jian Yang. "Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm." In Pattern Recognition and Computer Vision. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03398-9_48.

8

Chuyen, T. D., Dao Huy Du, N. D. Dien, R. V. Hoa, and N. V. Toan. "Building Intelligent Navigation System for Mobile Robots Based on the Actor – Critic Algorithm." In Advances in Engineering Research and Application. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-92574-1_24.

9

Zhang, Huaqing, Hongbin Ma, and Ying Jin. "An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control." In Intelligent Robotics and Applications. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13841-6_41.

10

Park, Jooyoung, Jongho Kim, and Daesung Kang. "An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm." In Computational Intelligence and Security. Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11596448_9.


Conference papers on the topic "Actor-critic algorithm"

1

Wang, Jing, and Ioannis Ch. Paschalidis. "A Hessian actor-critic algorithm." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039533.

2

Yaputra, Jordi, and Suyanto Suyanto. "The Effect of Discounting Actor-loss in Actor-Critic Algorithm." In 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). IEEE, 2021. http://dx.doi.org/10.1109/isriti54043.2021.9702883.

3

Aleixo, Everton, Juan Colonna, and Raimundo Barreto. "SVC-A2C - Actor Critic Algorithm to Improve Smart Vacuum Cleaner." In IX Simpósio Brasileiro de Engenharia de Sistemas Computacionais. Sociedade Brasileira de Computação - SBC, 2019. http://dx.doi.org/10.5753/sbesc_estendido.2019.8637.

Abstract:
This work presents a new approach to developing a vacuum cleaner using an actor-critic algorithm. We run tests against three other algorithms for comparison. In addition, we develop a new simulator based on Gym to execute the tests.
4

Prabuchandran K.J., Shalabh Bhatnagar, and Vivek S. Borkar. "An actor critic algorithm based on Grassmanian search." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039948.

5

Yang, Zhuoran, Kaiqing Zhang, Mingyi Hong, and Tamer Basar. "A Finite Sample Analysis of the Actor-Critic Algorithm." In 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018. http://dx.doi.org/10.1109/cdc.2018.8619440.

6

Vrushabh, D., Shalini K, and K. Sonam. "Actor-Critic Algorithm for Optimal Synchronization of Kuramoto Oscillator." In 2020 7th International Conference on Control, Decision and Information Technologies (CoDIT). IEEE, 2020. http://dx.doi.org/10.1109/codit49905.2020.9263785.

7

Paschalidis, Ioannis Ch., and Yingwei Lin. "Mobile agent coordination via a distributed actor-critic algorithm." In 2011 19th Mediterranean Conference on Control & Automation (MED). IEEE, 2011. http://dx.doi.org/10.1109/med.2011.5983038.

8

Diddigi, Raghuram Bharadwaj, Prateek Jain, Prabuchandran K. J, and Shalabh Bhatnagar. "Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892303.

9

Liu, Bo, Yue Zhang, Shupo Fu, and Xuan Liu. "Reduce UAV Coverage Energy Consumption through Actor-Critic Algorithm." In 2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN). IEEE, 2019. http://dx.doi.org/10.1109/msn48538.2019.00069.

10

Zhong, Shan, Quan Liu, Shengrong Gong, Qiming Fu, and Jin Xu. "Efficient actor-critic algorithm with dual piecewise model learning." In 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2017. http://dx.doi.org/10.1109/ssci.2017.8280911.

We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!