Academic literature on the topic "Soft Actor-Critic"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Browse the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Soft Actor-Critic".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Soft Actor-Critic":

1

Hyeon, Soo-Jong, Tae-Young Kang, and Chang-Kyung Ryoo. "A Path Planning for Unmanned Aerial Vehicles Using SAC (Soft Actor Critic) Algorithm". Journal of Institute of Control, Robotics and Systems 28, no. 2 (February 28, 2022): 138–45. http://dx.doi.org/10.5302/j.icros.2022.21.0220.

2

Ding, Feng, Guanfeng Ma, Zhikui Chen, Jing Gao, and Peng Li. "Averaged Soft Actor-Critic for Deep Reinforcement Learning". Complexity 2021 (April 1, 2021): 1–16. http://dx.doi.org/10.1155/2021/6658724.

Abstract
With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value network to alleviate some of these problems. However, SAC still has some problems. In order to reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned state-action value estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improving performance. We evaluate the performance of Averaged-SAC on several games in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.
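
The averaging step this abstract describes can be pictured with a short sketch. The following is a hypothetical PyTorch illustration written from the abstract alone, not the authors' code; the names QNet and AveragedCritic and the values of k, gamma, and alpha are assumptions made for the example.

```python
# Sketch only: soft Bellman target built from the average of the K most recent
# critic snapshots, which is the overestimation-damping idea the abstract describes.
import copy
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Minimal soft Q-network Q(s, a)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class AveragedCritic:
    """Keeps the K most recent critic snapshots and averages their estimates."""
    def __init__(self, critic: QNet, k: int = 5):
        self.snapshots = deque([copy.deepcopy(critic)], maxlen=k)

    def push(self, critic: QNet):
        # Call whenever the target critic would normally be refreshed.
        self.snapshots.append(copy.deepcopy(critic))

    @torch.no_grad()
    def target(self, next_obs, next_act, next_logp, reward, done,
               gamma: float = 0.99, alpha: float = 0.2):
        # Average Q(s', a') over the stored snapshots instead of using only the
        # latest target network, then apply the entropy-regularized backup
        # r + gamma * (1 - done) * (avg_Q - alpha * log pi(a'|s')).
        q_avg = torch.stack(
            [snap(next_obs, next_act) for snap in self.snapshots]
        ).mean(dim=0)
        return reward + gamma * (1.0 - done) * (q_avg - alpha * next_logp)
```
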
3

Qin, Chenjie, Lijun Zhang, Dawei Yin, Dezhong Peng, and Yongzhong Zhuang. "Some effective tricks are used to improve Soft Actor Critic". Journal of Physics: Conference Series 2010, no. 1 (September 1, 2021): 012061. http://dx.doi.org/10.1088/1742-6596/2010/1/012061.

4

Yang, Qisong, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan. "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10639–46. http://dx.doi.org/10.1609/aaai.v35i12.17272.

Abstract
Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
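
As a rough illustration of the risk-control loop sketched in this abstract (not the paper's implementation), the snippet below computes the CVaR of a Gaussian cost estimate and nudges a Lagrange-style safety weight according to the constraint violation. The function names, the Gaussian cost model, and the values of risk_level, cost_budget, and lr are assumptions for the example.

```python
# Sketch only: CVaR of a Gaussian cost critic and an adaptive safety weight.
import torch


def gaussian_cvar(mean: torch.Tensor, std: torch.Tensor, risk_level: float):
    """CVaR of a Gaussian cost: mean + std * pdf(z) / (1 - risk_level),
    where z is the risk_level quantile of the standard normal."""
    normal = torch.distributions.Normal(0.0, 1.0)
    z = normal.icdf(torch.tensor(risk_level))
    return mean + std * torch.exp(normal.log_prob(z)) / (1.0 - risk_level)


def update_safety_weight(log_weight: torch.Tensor, cvar: torch.Tensor,
                         cost_budget: float, lr: float = 1e-3):
    """Raise the safety weight when the CVaR of the cost exceeds the budget,
    lower it otherwise (dual ascent on the constraint, parameterized via log)."""
    violation = (cvar.mean() - cost_budget).detach()
    with torch.no_grad():
        log_weight += lr * violation
    return log_weight.exp()  # multiplies the safety term in the actor loss


# Toy usage with made-up numbers.
cvar = gaussian_cvar(torch.tensor([4.0]), torch.tensor([1.5]), risk_level=0.9)
weight = update_safety_weight(torch.zeros(1), cvar, cost_budget=5.0)
print(cvar.item(), weight.item())
```
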
5

Wong, Ching-Chang, Shao-Yu Chien, Hsuan-Ming Feng, and Hisasuki Aoyama. "Motion Planning for Dual-Arm Robot Based on Soft Actor-Critic". IEEE Access 9 (2021): 26871–85. http://dx.doi.org/10.1109/access.2021.3056903.

6

Wu, Xiongwei, Xiuhua Li, Jun Li, P. C. Ching, Victor C. M. Leung, and H. Vincent Poor. "Caching Transient Content for IoT Sensing: Multi-Agent Soft Actor-Critic". IEEE Transactions on Communications 69, no. 9 (September 2021): 5886–901. http://dx.doi.org/10.1109/tcomm.2021.3086535.

7

Ali, Hamid, Hammad Majeed, Imran Usman, and Khaled A. Almejalli. "Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network". Wireless Communications and Mobile Computing 2021 (June 10, 2021): 1–13. http://dx.doi.org/10.1155/2021/9920591.

Abstract
In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment. In it, the agent tries to maximize entropy along with the expected discounted rewards. In SAC, the agent tries to be as random as possible while moving towards the maximum reward. This randomness allows the agent to explore the environment and stops it from getting stuck in local optima. We believe that maximizing the entropy causes overestimation of the entropy term, which results in slow policy learning. This is because of the drastic change in the action distribution whenever the agent revisits similar states. To overcome this problem, we propose a dual policy optimization framework in which two independent policies are trained. Both policies try to maximize entropy by choosing actions against the minimum entropy to reduce the overestimation. The use of two policies results in better and faster convergence. We demonstrate our approach on several well-known simulated continuous-control environments. Results show that our proposed technique outperforms the state-of-the-art SAC algorithm and learns better policies.
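
One possible reading of the dual-policy mechanism, sketched below purely as an illustration (the paper's actual update may differ): two independent stochastic policies are trained, and the entropy bonus for a state uses the smaller of their two entropy estimates, in the spirit of clipped double-Q learning. GaussianPolicy, min_entropy_bonus, and alpha are names and values invented for the example.

```python
# Sketch only: take the per-state minimum of two policies' entropy estimates.
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(obs)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std)


def min_entropy_bonus(pi_a: GaussianPolicy, pi_b: GaussianPolicy,
                      obs: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """Entropy bonus using the smaller of the two policies' entropies per state."""
    ent_a = pi_a.dist(obs).entropy().sum(-1)
    ent_b = pi_b.dist(obs).entropy().sum(-1)
    return alpha * torch.minimum(ent_a, ent_b)
```
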
8

Sola, Yoann, Gilles Le Chenadec, and Benoit Clement. "Simultaneous Control and Guidance of an AUV Based on Soft Actor–Critic". Sensors 22, no. 16 (August 14, 2022): 6072. http://dx.doi.org/10.3390/s22166072.

Abstract
The marine environment is a hostile setting for robotics. It is strongly unstructured, uncertain, and includes many external disturbances that cannot be easily predicted or modeled. In this work, we attempt to control an autonomous underwater vehicle (AUV) to perform a waypoint tracking task, using a machine learning-based controller. There has been great progress in machine learning (in many different domains) in recent years; in the subfield of deep reinforcement learning, several algorithms suitable for the continuous control of dynamical systems have been designed. We implemented the soft actor–critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that fulfills a learning task while encouraging exploration of the environment. We compared a SAC-based controller with a proportional integral derivative (PID) controller on a waypoint tracking task using specific performance metrics. All tests were simulated via the UUV simulator. We applied these two controllers to the RexROV 2, a six-degrees-of-freedom, cube-shaped remotely operated underwater vehicle (ROV) converted into an AUV. We propose several interesting contributions as a result of these tests, such as having the SAC perform control and guidance of the AUV simultaneously, outperforming the PID controller in terms of energy saving, and reducing the amount of information needed by the SAC algorithm inputs. Moreover, our implementation of this controller facilitates transfer to real-world robots. The code corresponding to this work is available on GitHub.
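
For readers unfamiliar with the baseline, the toy snippet below shows the general shape of a PID waypoint controller of the kind such a comparison uses; the gains, time step, and single-axis dynamics are placeholders, not the authors' tuned controller or vehicle model.

```python
# Sketch only: generic PID loop driving one axis toward a waypoint.
class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy waypoint tracking with crude double-integrator dynamics.
pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.1)
pos, vel, waypoint = 0.0, 0.0, 5.0
for _ in range(200):
    thrust = pid.step(waypoint - pos)
    vel += 0.1 * thrust
    pos += 0.1 * vel
print(round(pos, 2))  # should settle near the waypoint
```
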
9

Yu, Xin, Yushan Sun, Xiangbin Wang, and Guocheng Zhang. "End-to-End AUV Motion Planning Method Based on Soft Actor-Critic". Sensors 21, no. 17 (September 1, 2021): 5893. http://dx.doi.org/10.3390/s21175893.

Abstract
This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use generative adversarial imitation learning (GAIL) to assist training and overcome the problem that learning a policy from scratch is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point while optimizing distance and time as much as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the Unity simulation platform. Results show that the algorithm has optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up AUV training and minimize the training time without affecting the planning performance of the SAC algorithm.
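
The way GAIL assists training, as described here, can be sketched roughly as follows; this is an illustration written from the abstract, not the authors' system, and Discriminator, imitation_reward, mixed_reward, and beta are assumed names and values.

```python
# Sketch only: a GAIL-style discriminator turned into a shaping reward that is
# mixed with the (possibly sparse) environment reward seen by the RL learner.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """D(s, a) -> probability that the transition comes from the expert data."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return torch.sigmoid(self.net(torch.cat([obs, act], dim=-1))).squeeze(-1)


def imitation_reward(disc: Discriminator, obs, act, eps: float = 1e-8):
    """Common GAIL-style shaping term: -log(1 - D(s, a))."""
    with torch.no_grad():
        d = disc(obs, act)
    return -torch.log(1.0 - d + eps)


def mixed_reward(env_reward, disc, obs, act, beta: float = 0.5):
    """Blend the environment reward with the imitation reward."""
    return env_reward + beta * imitation_reward(disc, obs, act)
```
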
10

Al Younes, Younes Al, and Martin Barczyk. "Adaptive Nonlinear Model Predictive Horizon Using Deep Reinforcement Learning for Optimal Trajectory Planning". Drones 6, no. 11 (October 27, 2022): 323. http://dx.doi.org/10.3390/drones6110323.

Abstract
This paper presents an adaptive trajectory planning approach for nonlinear dynamical systems based on deep reinforcement learning (DRL). This methodology is applied to the authors' recently published optimization-based trajectory planning approach named nonlinear model predictive horizon (NMPH). The resulting design, which we call 'adaptive NMPH', generates optimal trajectories for an autonomous vehicle based on the system's states and its environment. This is done by tuning the NMPH's parameters online using two different actor-critic DRL-based algorithms, deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Both adaptive NMPH variants are trained and evaluated on an aerial drone inside a high-fidelity simulation environment. The results demonstrate the learning curves, sample complexity, and stability of the DRL-based adaptation scheme and show the superior performance of adaptive NMPH relative to our earlier designs.
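
The "action = planner parameters" pattern this abstract describes might look roughly like the gym-style sketch below; run_planner and PlannerTuningEnv are placeholders invented for illustration and do not reproduce the published NMPH formulation.

```python
# Sketch only: a DRL agent (e.g. SAC or DDPG) picks planner parameters as its
# action, the planner runs with them, and tracking quality becomes the reward.
import numpy as np


def run_planner(params: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Placeholder planner: returns a short trajectory whose endpoint depends
    on the chosen parameters (stand-in for the optimization-based planner)."""
    horizon = 20
    return np.linspace(state, state * (1.0 - params[0]), horizon)


class PlannerTuningEnv:
    """Minimal environment: observation = vehicle state, action = planner params."""
    def __init__(self, param_dim: int = 2):
        self.param_dim = param_dim  # dimension of the agent's action vector
        self.state = np.ones(3)

    def reset(self) -> np.ndarray:
        self.state = np.random.uniform(0.5, 1.5, size=3)
        return self.state

    def step(self, params: np.ndarray):
        traj = run_planner(np.clip(params, 0.0, 1.0), self.state)
        tracking_error = float(np.abs(traj[-1]).sum())  # distance from the origin goal
        reward = -tracking_error                         # the agent maximizes this
        self.state = traj[-1]
        done = tracking_error < 1e-2
        return self.state, reward, done, {}
```
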

Theses on the topic "Soft Actor-Critic":

1

Sola, Yoann. "Contributions to the development of deep reinforcement learning-based controllers for AUV". Thesis, Brest, École nationale supérieure de techniques avancées Bretagne, 2021. http://www.theses.fr/2021ENTA0015.

Abstract
The marine environment is a very hostile setting for robotics. It is strongly unstructured, very uncertain, and includes a lot of external disturbances which cannot be easily predicted or modelled. In this work, we try to control an autonomous underwater vehicle (AUV) in order to perform a waypoint tracking task, using a machine learning-based controller. Machine learning has enabled impressive progress in many different domains in recent years, and the subfield of deep reinforcement learning has produced several algorithms very suitable for the continuous control of dynamical systems. We chose to implement the Soft Actor-Critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that fulfills a learning task while encouraging exploration of the environment. We compared a SAC-based controller with a Proportional-Integral-Derivative (PID) controller on a waypoint tracking task, using specific performance metrics. All the tests were performed in simulation using the UUV Simulator. We decided to apply these two controllers to the RexROV 2, a six-degrees-of-freedom, cube-shaped remotely operated underwater vehicle (ROV) converted into an AUV. Through these tests, we propose several interesting contributions, such as having the SAC achieve end-to-end control of the AUV, outperforming the PID controller in terms of energy saving, and reducing the amount of information needed by the SAC algorithm. Moreover, we propose a methodology for training deep reinforcement learning algorithms on control tasks, as well as a discussion of the absence of guidance algorithms for our end-to-end AUV controller.

Book chapters on the topic "Soft Actor-Critic":

1

Chen, Tao, Xingxing Ma, Shixun You, and Xiaoli Zhang. "Soft Actor-Critic-Based Continuous Control Optimization for Moving Target Tracking". In Lecture Notes in Computer Science, 630–41. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-34110-7_53.

2

Huang, Shiyu, Bin Wang, Hang Su, Dong Li, Jianye Hao, Jun Zhu, and Ting Chen. "Off-Policy Training for Truncated TD(λ) Boosted Soft Actor-Critic". In PRICAI 2021: Trends in Artificial Intelligence, 46–59. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-89370-5_4.

3

Liu, Pengbo, Shuxin Ge, Xiaobo Zhou, Chaokun Zhang, and Keqiu Li. "Soft Actor-Critic-Based DAG Tasks Offloading in Multi-access Edge Computing with Inter-user Cooperation". In Algorithms and Architectures for Parallel Processing, 313–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-95391-1_20.

4

Lin, Fangze, Wei Ning, and Zhengrong Zou. "Fed-MT-ISAC: Federated Multi-task Inverse Soft Actor-Critic for Human-Like NPCs in the Metaverse Games". In Intelligent Computing Methodologies, 492–503. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13832-4_41.


Conference papers on the topic "Soft Actor-Critic":

1

Pu, Yuan, Shaochen Wang, Xin Yao, and Bin Li. "Latent Context Based Soft Actor-Critic". In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207008.

2

Fan, Ting-Han, and Yubo Wang. "Soft Actor-Critic With Integer Actions". In 2022 American Control Conference (ACC). IEEE, 2022. http://dx.doi.org/10.23919/acc53348.2022.9867395.

3

Nishio, Daichi, Toi Tsuneda, Daiki Kuyoshi, and Satoshi Yamane. "Discriminator Soft Actor Critic without Extrinsic Rewards". In 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE). IEEE, 2020. http://dx.doi.org/10.1109/gcce50665.2020.9292009.

4

Tan, Mingxi, Andong Tian, and Ludovic Denoyer. "Regularized Soft Actor-Critic for Behavior Transfer Learning". In 2022 IEEE Conference on Games (CoG). IEEE, 2022. http://dx.doi.org/10.1109/cog51982.2022.9893655.

5

Savari, Maryam, and Yoonsuck Choe. "Online Virtual Training in Soft Actor-Critic for Autonomous Driving". In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533791.

6

Rao, Ning, Hua Xu, Balin Song, and Yunhao Shi. "Soft Actor-Critic Deep Reinforcement Learning Based Interference Resource Allocation". In ICCAI '21: 2021 7th International Conference on Computing and Artificial Intelligence. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3467707.3467766.

7

Choi, Jinyoung, Christopher Dance, Jung-Eun Kim, Seulbin Hwang, and Kyung-Sik Park. "Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation". In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021. http://dx.doi.org/10.1109/icra48506.2021.9560962.

8

Nematollahi, Iman, Erick Rosete-Beas, Adrian Röfer, Tim Welschehold, Abhinav Valada, and Wolfram Burgard. "Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models". In 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2022. http://dx.doi.org/10.1109/icra46639.2022.9811770.

9

Li, Dingcheng, Xu Li, Jun Wang, and Ping Li. "Video Recommendation with Multi-gate Mixture of Experts Soft Actor Critic". In SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3397271.3401238.

10

Ren, Yangang, Jingliang Duan, Shengbo Eben Li, Yang Guan, and Qi Sun. "Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic". In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020. http://dx.doi.org/10.1109/itsc45102.2020.9294300.

