A selection of scientific literature on the topic "Soft Actor-Critic"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles

Browse the lists of relevant journal articles, books, dissertations, conference papers, and other scholarly sources on the topic "Soft Actor-Critic".

Next to every work in the list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Soft Actor-Critic":

1. Hyeon, Soo-Jong, Tae-Young Kang, and Chang-Kyung Ryoo. "A Path Planning for Unmanned Aerial Vehicles Using SAC (Soft Actor Critic) Algorithm." Journal of Institute of Control, Robotics and Systems 28, no. 2 (February 28, 2022): 138–45. http://dx.doi.org/10.5302/j.icros.2022.21.0220.

2. Ding, Feng, Guanfeng Ma, Zhikui Chen, Jing Gao, and Peng Li. "Averaged Soft Actor-Critic for Deep Reinforcement Learning." Complexity 2021 (April 1, 2021): 1–16. http://dx.doi.org/10.1155/2021/6658724.

Abstract:
With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of the DRL algorithm have an important impact on its performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value network to alleviate some of these problems. However, SAC still has some problems. In order to reduce the error caused by the overestimation of SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned action-state estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improving performance. We evaluate the performance of Averaged-SAC through some games in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.
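
As a quick illustration of the averaging idea summarized above, the sketch below builds a soft Bellman target from the mean of the K most recently stored Q-estimates instead of a single estimate; the snapshot count, the temperature, and the helper names are assumptions made for this example, not the authors' implementation.

from collections import deque

K = 5                      # number of stored Q-estimates to average (assumed)
GAMMA, ALPHA = 0.99, 0.2   # discount factor and entropy temperature (typical values)

q_snapshots = deque(maxlen=K)   # callables q(state, action) saved after past updates

def averaged_soft_target(reward, next_state, next_action, next_log_prob, done):
    """Soft Bellman target computed from the averaged Q-value (illustrative sketch)."""
    avg_q = sum(q(next_state, next_action) for q in q_snapshots) / max(len(q_snapshots), 1)
    soft_value = avg_q - ALPHA * next_log_prob          # soft state value under the current policy
    return reward + GAMMA * (1.0 - done) * soft_value
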
3. Qin, Chenjie, Lijun Zhang, Dawei Yin, Dezhong Peng, and Yongzhong Zhuang. "Some effective tricks are used to improve Soft Actor Critic." Journal of Physics: Conference Series 2010, no. 1 (September 1, 2021): 012061. http://dx.doi.org/10.1088/1742-6596/2010/1/012061.

4. Yang, Qisong, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan. "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10639–46. http://dx.doi.org/10.1609/aaai.v35i12.17272.

Abstract:
Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
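
In rough notation chosen here for illustration (not taken from the paper), the worst-case constraint described above replaces the usual expected-cost bound with a tail-risk bound while keeping the maximum-entropy objective:

\max_{\pi}\ \mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t}\big(r(s_t,a_t)+\beta\,\mathcal{H}(\pi(\cdot\mid s_t))\big)\Big]
\quad\text{s.t.}\quad
\mathrm{CVaR}_{\alpha}\Big[\textstyle\sum_{t}\gamma^{t}c(s_t,a_t)\Big]\le d,

where c is the safety cost signal, d is the cost budget, and CVaR_α denotes the expected cost over the worst α-fraction of outcomes, so that α = 1 recovers the ordinary expectation-based constraint and smaller α pushes the constraint further into the tail of the cost distribution.
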
5. Wong, Ching-Chang, Shao-Yu Chien, Hsuan-Ming Feng, and Hisasuki Aoyama. "Motion Planning for Dual-Arm Robot Based on Soft Actor-Critic." IEEE Access 9 (2021): 26871–85. http://dx.doi.org/10.1109/access.2021.3056903.

6. Wu, Xiongwei, Xiuhua Li, Jun Li, P. C. Ching, Victor C. M. Leung, and H. Vincent Poor. "Caching Transient Content for IoT Sensing: Multi-Agent Soft Actor-Critic." IEEE Transactions on Communications 69, no. 9 (September 2021): 5886–901. http://dx.doi.org/10.1109/tcomm.2021.3086535.

7. Ali, Hamid, Hammad Majeed, Imran Usman, and Khaled A. Almejalli. "Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network." Wireless Communications and Mobile Computing 2021 (June 10, 2021): 1–13. http://dx.doi.org/10.1155/2021/9920591.

Abstract:
In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment. In it, the agent tries to maximize entropy along with the expected discounted rewards. In SAC, the agent tries to be as random as possible while moving towards the maximum reward. This randomness allows the agent to explore the environment and stops it from getting stuck in local optima. We believe that maximizing the entropy causes overestimation of the entropy term, which results in slow policy learning. This is because of the drastic change in the action distribution whenever the agent revisits similar states. To overcome this problem, we propose a dual policy optimization framework, in which two independent policies are trained. Both policies try to maximize entropy by choosing actions against the minimum entropy to reduce the overestimation. The use of two policies results in better and faster convergence. We demonstrate our approach on several well-known simulated continuous-control environments. Results show that our proposed technique achieves better results than the state-of-the-art SAC algorithm and learns better policies.
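
For reference, the entropy-regularized objective this abstract refers to is the standard soft actor-critic objective, which augments the discounted return with a per-step entropy bonus weighted by a temperature α:

J(\pi)=\mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t}\big(r(s_t,a_t)+\alpha\,\mathcal{H}(\pi(\cdot\mid s_t))\big)\Big],
\qquad
\mathcal{H}(\pi(\cdot\mid s))=-\mathbb{E}_{a\sim\pi(\cdot\mid s)}\big[\log\pi(a\mid s)\big].

The overestimation discussed by the authors concerns this entropy term; their remedy, roughly, is to train two independent policies and use the smaller of the two entropy estimates during the update.
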
8. Sola, Yoann, Gilles Le Chenadec, and Benoit Clement. "Simultaneous Control and Guidance of an AUV Based on Soft Actor–Critic." Sensors 22, no. 16 (August 14, 2022): 6072. http://dx.doi.org/10.3390/s22166072.

Abstract:
The marine environment is a hostile setting for robotics. It is strongly unstructured, uncertain, and includes many external disturbances that cannot be easily predicted or modeled. In this work, we attempt to control an autonomous underwater vehicle (AUV) to perform a waypoint tracking task, using a machine learning-based controller. There has been great progress in machine learning (in many different domains) in recent years; in the subfield of deep reinforcement learning, several algorithms suitable for the continuous control of dynamical systems have been designed. We implemented the soft actor–critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that allows fulfilling a learning task and encourages the exploration of the environment simultaneously. We compared a SAC-based controller with a proportional integral derivative (PID) controller on a waypoint tracking task using specific performance metrics. All tests were simulated via the UUV simulator. We applied these two controllers to the RexROV 2, a six-degrees-of-freedom, cube-shaped remotely operated underwater vehicle (ROV) converted into an AUV. We propose several interesting contributions as a result of these tests, such as making the SAC control and guide the AUV simultaneously, outperforming the PID controller in terms of energy saving, and reducing the amount of information needed by the SAC algorithm inputs. Moreover, our implementation of this controller facilitates the transfer towards real-world robots. The code corresponding to this work is available on GitHub.

9. Yu, Xin, Yushan Sun, Xiangbin Wang, and Guocheng Zhang. "End-to-End AUV Motion Planning Method Based on Soft Actor-Critic." Sensors 21, no. 17 (September 1, 2021): 5893. http://dx.doi.org/10.3390/s21175893.

Abstract:
This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use the method of generative adversarial imitation learning (GAIL) to assist its training to overcome the problem that learning a policy for the first time is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point, and the distance and time are optimized as much as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the basis of the Unity simulation platform. Results show that the algorithm has an optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up the AUV training speed and minimize the training time without affecting the planning effect of the SAC algorithm.
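
The "comprehensive external reward" mentioned above is not spelled out in the abstract; the sketch below is only a hypothetical example of such a shaped reward for waypoint tracking (progress toward the goal, a per-step time penalty, a smoothness penalty on control changes, and a terminal bonus), with weights and names invented for illustration rather than taken from the paper.

W_PROGRESS, W_TIME, W_SMOOTH, R_GOAL = 10.0, 0.05, 0.5, 100.0   # illustrative weights
GOAL_RADIUS = 1.0                                                # metres (assumed)

def auv_step_reward(prev_dist, dist, action_delta):
    """Composite shaping reward for one control step (hypothetical example)."""
    progress = W_PROGRESS * (prev_dist - dist)                   # positive when the AUV gets closer
    smoothness = -W_SMOOTH * sum(d * d for d in action_delta)    # discourage abrupt control changes
    terminal = R_GOAL if dist < GOAL_RADIUS else 0.0             # bonus on reaching the waypoint
    return progress - W_TIME + smoothness + terminal
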
10. Al Younes, Younes, and Martin Barczyk. "Adaptive Nonlinear Model Predictive Horizon Using Deep Reinforcement Learning for Optimal Trajectory Planning." Drones 6, no. 11 (October 27, 2022): 323. http://dx.doi.org/10.3390/drones6110323.

Abstract:
This paper presents an adaptive trajectory planning approach for nonlinear dynamical systems based on deep reinforcement learning (DRL). This methodology is applied to the authors’ recently published optimization-based trajectory planning approach named nonlinear model predictive horizon (NMPH). The resulting design, which we call ‘adaptive NMPH’, generates optimal trajectories for an autonomous vehicle based on the system’s states and its environment. This is done by tuning the NMPH’s parameters online using two different actor-critic DRL-based algorithms, deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Both adaptive NMPH variants are trained and evaluated on an aerial drone inside a high-fidelity simulation environment. The results demonstrate the learning curves, sample complexity, and stability of the DRL-based adaptation scheme and show the superior performance of adaptive NMPH relative to our earlier designs.
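
Schematically, the ‘adaptive NMPH’ loop described above treats the planner's tuning parameters as the action of a DRL agent queried at every planning cycle. The stub below only illustrates that wiring; sac_policy, nmph_plan, the parameter count, and their ranges are invented placeholders, not the authors' interfaces.

import random

def sac_policy(state):                  # stand-in for a trained SAC (or DDPG) actor
    return [random.uniform(0.1, 5.0) for _ in range(3)]   # e.g. cost weights / horizon scaling

def nmph_plan(state, params):           # stand-in for the NMPH trajectory optimizer
    return [state + 0.01 * p for p in params]             # dummy "trajectory"

state = 0.0
for cycle in range(50):                 # each cycle: observe, tune the planner, re-plan
    params = sac_policy(state)          # the DRL action is the planner's parameter set
    trajectory = nmph_plan(state, params)
    state = trajectory[-1]              # track the freshly planned trajectory (simplified)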

Dissertations on the topic "Soft Actor-Critic":

1. Sola, Yoann. "Contributions to the development of deep reinforcement learning-based controllers for AUV." Thesis, Brest, École nationale supérieure de techniques avancées Bretagne, 2021. http://www.theses.fr/2021ENTA0015.

Abstract:
The marine environment is a very hostile setting for robotics. It is strongly unstructured, very uncertain, and includes many external disturbances which cannot be easily predicted or modelled. In this work, we try to control an autonomous underwater vehicle (AUV) in order to perform a waypoint tracking task, using a machine learning-based controller. Machine learning has enabled impressive progress in many different domains in recent years, and the subfield of deep reinforcement learning has produced several algorithms well suited to the continuous control of dynamical systems. We chose to implement the Soft Actor-Critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that fulfills a learning task while encouraging exploration of the environment. We compared a SAC-based controller with a Proportional-Integral-Derivative (PID) controller on a waypoint tracking task, using specific performance metrics. All the tests were performed in simulation using the UUV Simulator. We applied these two controllers to the RexROV 2, a six-degrees-of-freedom, cube-shaped remotely operated underwater vehicle (ROV) converted into an AUV. Through these tests, we propose several interesting contributions, such as making the SAC achieve end-to-end control of the AUV, outperforming the PID controller in terms of energy savings, and reducing the amount of information needed by the SAC algorithm. Moreover, we propose a methodology for training deep reinforcement learning algorithms on control tasks, as well as a discussion of the absence of guidance algorithms for our end-to-end AUV controller.

Book chapters on the topic "Soft Actor-Critic":

1. Chen, Tao, Xingxing Ma, Shixun You, and Xiaoli Zhang. "Soft Actor-Critic-Based Continuous Control Optimization for Moving Target Tracking." In Lecture Notes in Computer Science, 630–41. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-34110-7_53.

2. Huang, Shiyu, Bin Wang, Hang Su, Dong Li, Jianye Hao, Jun Zhu, and Ting Chen. "Off-Policy Training for Truncated TD(λ) Boosted Soft Actor-Critic." In PRICAI 2021: Trends in Artificial Intelligence, 46–59. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-89370-5_4.

3. Liu, Pengbo, Shuxin Ge, Xiaobo Zhou, Chaokun Zhang, and Keqiu Li. "Soft Actor-Critic-Based DAG Tasks Offloading in Multi-access Edge Computing with Inter-user Cooperation." In Algorithms and Architectures for Parallel Processing, 313–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-95391-1_20.

4. Lin, Fangze, Wei Ning, and Zhengrong Zou. "Fed-MT-ISAC: Federated Multi-task Inverse Soft Actor-Critic for Human-Like NPCs in the Metaverse Games." In Intelligent Computing Methodologies, 492–503. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13832-4_41.

Conference papers on the topic "Soft Actor-Critic":

1. Pu, Yuan, Shaochen Wang, Xin Yao, and Bin Li. "Latent Context Based Soft Actor-Critic." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207008.

2. Fan, Ting-Han, and Yubo Wang. "Soft Actor-Critic With Integer Actions." In 2022 American Control Conference (ACC). IEEE, 2022. http://dx.doi.org/10.23919/acc53348.2022.9867395.

3. Nishio, Daichi, Toi Tsuneda, Daiki Kuyoshi, and Satoshi Yamane. "Discriminator Soft Actor Critic without Extrinsic Rewards." In 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE). IEEE, 2020. http://dx.doi.org/10.1109/gcce50665.2020.9292009.

4. Tan, Mingxi, Andong Tian, and Ludovic Denoyer. "Regularized Soft Actor-Critic for Behavior Transfer Learning." In 2022 IEEE Conference on Games (CoG). IEEE, 2022. http://dx.doi.org/10.1109/cog51982.2022.9893655.

5. Savari, Maryam, and Yoonsuck Choe. "Online Virtual Training in Soft Actor-Critic for Autonomous Driving." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533791.

6. Rao, Ning, Hua Xu, Balin Song, and Yunhao Shi. "Soft Actor-Critic Deep Reinforcement Learning Based Interference Resource Allocation." In ICCAI '21: 2021 7th International Conference on Computing and Artificial Intelligence. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3467707.3467766.

7. Choi, Jinyoung, Christopher Dance, Jung-Eun Kim, Seulbin Hwang, and Kyung-Sik Park. "Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation." In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021. http://dx.doi.org/10.1109/icra48506.2021.9560962.

8. Nematollahi, Iman, Erick Rosete-Beas, Adrian Röfer, Tim Welschehold, Abhinav Valada, and Wolfram Burgard. "Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models." In 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2022. http://dx.doi.org/10.1109/icra46639.2022.9811770.

9. Li, Dingcheng, Xu Li, Jun Wang, and Ping Li. "Video Recommendation with Multi-gate Mixture of Experts Soft Actor Critic." In SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3397271.3401238.

10. Ren, Yangang, Jingliang Duan, Shengbo Eben Li, Yang Guan, and Qi Sun. "Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic." In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020. http://dx.doi.org/10.1109/itsc45102.2020.9294300.
