Scientific literature on the topic "Soft Actor-Critic"
Journal articles on the topic "Soft Actor-Critic":
Hyeon, Soo-Jong, Tae-Young Kang, and Chang-Kyung Ryoo. "A Path Planning for Unmanned Aerial Vehicles Using SAC (Soft Actor Critic) Algorithm." Journal of Institute of Control, Robotics and Systems 28, no. 2 (February 28, 2022): 138–45. http://dx.doi.org/10.5302/j.icros.2022.21.0220.
Ding, Feng, Guanfeng Ma, Zhikui Chen, Jing Gao, and Peng Li. "Averaged Soft Actor-Critic for Deep Reinforcement Learning." Complexity 2021 (April 1, 2021): 1–16. http://dx.doi.org/10.1155/2021/6658724.
Qin, Chenjie, Lijun Zhang, Dawei Yin, Dezhong Peng, and Yongzhong Zhuang. "Some effective tricks are used to improve Soft Actor Critic." Journal of Physics: Conference Series 2010, no. 1 (September 1, 2021): 012061. http://dx.doi.org/10.1088/1742-6596/2010/1/012061.
Yang, Qisong, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan. "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10639–46. http://dx.doi.org/10.1609/aaai.v35i12.17272.
Wong, Ching-Chang, Shao-Yu Chien, Hsuan-Ming Feng, and Hisasuki Aoyama. "Motion Planning for Dual-Arm Robot Based on Soft Actor-Critic." IEEE Access 9 (2021): 26871–85. http://dx.doi.org/10.1109/access.2021.3056903.
Wu, Xiongwei, Xiuhua Li, Jun Li, P. C. Ching, Victor C. M. Leung, and H. Vincent Poor. "Caching Transient Content for IoT Sensing: Multi-Agent Soft Actor-Critic." IEEE Transactions on Communications 69, no. 9 (September 2021): 5886–901. http://dx.doi.org/10.1109/tcomm.2021.3086535.
Ali, Hamid, Hammad Majeed, Imran Usman, and Khaled A. Almejalli. "Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network." Wireless Communications and Mobile Computing 2021 (June 10, 2021): 1–13. http://dx.doi.org/10.1155/2021/9920591.
Sola, Yoann, Gilles Le Chenadec, and Benoit Clement. "Simultaneous Control and Guidance of an AUV Based on Soft Actor–Critic." Sensors 22, no. 16 (August 14, 2022): 6072. http://dx.doi.org/10.3390/s22166072.
Yu, Xin, Yushan Sun, Xiangbin Wang, and Guocheng Zhang. "End-to-End AUV Motion Planning Method Based on Soft Actor-Critic." Sensors 21, no. 17 (September 1, 2021): 5893. http://dx.doi.org/10.3390/s21175893.
Al Younes, Younes, and Martin Barczyk. "Adaptive Nonlinear Model Predictive Horizon Using Deep Reinforcement Learning for Optimal Trajectory Planning." Drones 6, no. 11 (October 27, 2022): 323. http://dx.doi.org/10.3390/drones6110323.
Theses on the topic "Soft Actor-Critic":
Sola, Yoann. "Contributions to the development of deep reinforcement learning-based controllers for AUV." Thesis, Brest, École nationale supérieure de techniques avancées Bretagne, 2021. http://www.theses.fr/2021ENTA0015.
The marine environment is very hostile to robotics: it is strongly unstructured, highly uncertain, and subject to many external disturbances that cannot easily be predicted or modelled. In this work, we control an autonomous underwater vehicle (AUV) performing a waypoint tracking task with a machine learning-based controller. Machine learning has enabled impressive progress in many domains in recent years, and the subfield of deep reinforcement learning has produced several algorithms well suited to the continuous control of dynamical systems. We chose to implement the Soft Actor-Critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that pursues the learning task and encourages exploration of the environment simultaneously. We compared a SAC-based controller with a Proportional-Integral-Derivative (PID) controller on a waypoint tracking task, using specific performance metrics. All tests were performed in simulation with the UUV Simulator. We applied both controllers to the RexROV 2, a six-degrees-of-freedom, cube-shaped remotely operated underwater vehicle (ROV) converted into an AUV. Through these tests we contribute several results: the SAC achieves end-to-end control of the AUV, outperforms the PID controller in terms of energy saving, and requires a reduced amount of input information. We also propose a methodology for training deep reinforcement learning algorithms on control tasks, and discuss the absence of guidance algorithms for our end-to-end AUV controller.
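The entropy regularization that the abstract attributes to SAC amounts to valuing a state by the expected critic value of sampled actions plus an entropy bonus, V(s) ≈ E[Q(s, a) − α log π(a|s)]. A minimal NumPy sketch of that quantity (the function name and sample values are illustrative, not taken from the thesis):

```python
import numpy as np

def soft_value(q_values, log_probs, alpha=0.2):
    """Soft (entropy-regularized) state value as used in SAC:
    a sample average of Q(s, a) - alpha * log pi(a|s) over sampled actions.
    With alpha = 0 this reduces to a plain expected Q-value."""
    q_values = np.asarray(q_values, dtype=float)
    log_probs = np.asarray(log_probs, dtype=float)
    return float(np.mean(q_values - alpha * log_probs))

# Three sampled actions: critic values and policy log-probabilities.
q = [1.0, 2.0, 3.0]
logp = [-1.0, -1.0, -1.0]
print(soft_value(q, logp, alpha=0.2))  # 2.2: the entropy bonus lifts the mean Q of 2.0
```

The temperature α trades off reward maximization against exploration, which is how SAC "fulfils a learning task and encourages exploration simultaneously."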
Book chapters on the topic "Soft Actor-Critic":
Chen, Tao, Xingxing Ma, Shixun You, and Xiaoli Zhang. "Soft Actor-Critic-Based Continuous Control Optimization for Moving Target Tracking." In Lecture Notes in Computer Science, 630–41. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-34110-7_53.
Huang, Shiyu, Bin Wang, Hang Su, Dong Li, Jianye Hao, Jun Zhu, and Ting Chen. "Off-Policy Training for Truncated TD(λ) Boosted Soft Actor-Critic." In PRICAI 2021: Trends in Artificial Intelligence, 46–59. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-89370-5_4.
Liu, Pengbo, Shuxin Ge, Xiaobo Zhou, Chaokun Zhang, and Keqiu Li. "Soft Actor-Critic-Based DAG Tasks Offloading in Multi-access Edge Computing with Inter-user Cooperation." In Algorithms and Architectures for Parallel Processing, 313–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-95391-1_20.
Lin, Fangze, Wei Ning, and Zhengrong Zou. "Fed-MT-ISAC: Federated Multi-task Inverse Soft Actor-Critic for Human-Like NPCs in the Metaverse Games." In Intelligent Computing Methodologies, 492–503. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13832-4_41.
Conference papers on the topic "Soft Actor-Critic":
Pu, Yuan, Shaochen Wang, Xin Yao, and Bin Li. "Latent Context Based Soft Actor-Critic." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207008.
Fan, Ting-Han, and Yubo Wang. "Soft Actor-Critic With Integer Actions." In 2022 American Control Conference (ACC). IEEE, 2022. http://dx.doi.org/10.23919/acc53348.2022.9867395.
Nishio, Daichi, Toi Tsuneda, Daiki Kuyoshi, and Satoshi Yamane. "Discriminator Soft Actor Critic without Extrinsic Rewards." In 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE). IEEE, 2020. http://dx.doi.org/10.1109/gcce50665.2020.9292009.
Tan, Mingxi, Andong Tian, and Ludovic Denoyer. "Regularized Soft Actor-Critic for Behavior Transfer Learning." In 2022 IEEE Conference on Games (CoG). IEEE, 2022. http://dx.doi.org/10.1109/cog51982.2022.9893655.
Savari, Maryam, and Yoonsuck Choe. "Online Virtual Training in Soft Actor-Critic for Autonomous Driving." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533791.
Rao, Ning, Hua Xu, Balin Song, and Yunhao Shi. "Soft Actor-Critic Deep Reinforcement Learning Based Interference Resource Allocation." In ICCAI '21: 2021 7th International Conference on Computing and Artificial Intelligence. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3467707.3467766.
Choi, Jinyoung, Christopher Dance, Jung-Eun Kim, Seulbin Hwang, and Kyung-Sik Park. "Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation." In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021. http://dx.doi.org/10.1109/icra48506.2021.9560962.
Nematollahi, Iman, Erick Rosete-Beas, Adrian Röfer, Tim Welschehold, Abhinav Valada, and Wolfram Burgard. "Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models." In 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2022. http://dx.doi.org/10.1109/icra46639.2022.9811770.
Li, Dingcheng, Xu Li, Jun Wang, and Ping Li. "Video Recommendation with Multi-gate Mixture of Experts Soft Actor Critic." In SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3397271.3401238.
Ren, Yangang, Jingliang Duan, Shengbo Eben Li, Yang Guan, and Qi Sun. "Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic." In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020. http://dx.doi.org/10.1109/itsc45102.2020.9294300.