
Journal articles on the topic 'Q-learning'


Consult the top 50 journal articles for your research on the topic 'Q-learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8, no. 3-4 (1992): 279–92. http://dx.doi.org/10.1007/bf00992698.

2

Clausen, C., and H. Wechsler. "Quad-Q-learning." IEEE Transactions on Neural Networks 11, no. 2 (2000): 279–94. http://dx.doi.org/10.1109/72.839000.

3

ten Hagen, Stephan, and Ben Kröse. "Neural Q-learning." Neural Computing & Applications 12, no. 2 (2003): 81–88. http://dx.doi.org/10.1007/s00521-003-0369-9.

4

Wang, Yin-Hao, Tzuu-Hseng S. Li, and Chih-Jui Lin. "Backward Q-learning: The combination of Sarsa algorithm and Q-learning." Engineering Applications of Artificial Intelligence 26, no. 9 (2013): 2184–93. http://dx.doi.org/10.1016/j.engappai.2013.06.016.

5

Evseenko, Alla, and Dmitrii Romannikov. "Application of Deep Q-learning and double Deep Q-learning algorithms to the task of control an inverted pendulum." Transaction of Scientific Papers of the Novosibirsk State Technical University, no. 1-2 (August 26, 2020): 7–25. http://dx.doi.org/10.17212/2307-6879-2020-1-2-7-25.

Abstract:
Today, such a branch of science as «artificial intelligence» is booming in the world. Systems built on the basis of artificial intelligence methods have the ability to perform functions that are traditionally considered the prerogative of man. Artificial intelligence has a wide range of research areas. One such area is machine learning. This article discusses the algorithms of one of the approaches of machine learning – reinforcement learning (RL), according to which a lot of research and development has been carried out over the past seven years. Development and research on this approach is m
6

Abedalguni, Bilal. "Bat Q-learning Algorithm." Jordanian Journal of Computers and Information Technology 3, no. 1 (2017): 51. http://dx.doi.org/10.5455/jjcit.71-1480540385.

7

Zhu, Rong, and Mattia Rigotti. "Self-correcting Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (2021): 11185–92. http://dx.doi.org/10.1609/aaai.v35i12.17334.

Abstract:
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method
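
The abstract above contrasts the maximization bias of standard Q-learning with the two-estimator fix of Double Q-learning. For orientation, here is a minimal tabular sketch of both update rules, assuming made-up state/action sizes and hyperparameters; the paper's own self-correcting estimator is not reproduced here.

```python
# Minimal tabular sketch of the two update rules discussed in the abstract.
# State/action counts and hyperparameters below are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 3
alpha, gamma = 0.1, 0.99
rng = np.random.default_rng(0)
q = np.zeros((n_states, n_actions))    # single table: standard Q-learning
qa = np.zeros((n_states, n_actions))   # table A: Double Q-learning
qb = np.zeros((n_states, n_actions))   # table B: Double Q-learning

def standard_update(s, a, r, s_next):
    # The target maximizes over the same table being updated; this coupling
    # is the source of the systematic overestimation (maximization bias).
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])

def double_update(s, a, r, s_next):
    # Action selection and evaluation use different tables, removing the upward
    # bias at the cost of some underestimation and doubled memory.
    if rng.random() < 0.5:
        best = int(qa[s_next].argmax())
        qa[s, a] += alpha * (r + gamma * qb[s_next, best] - qa[s, a])
    else:
        best = int(qb[s_next].argmax())
        qb[s, a] += alpha * (r + gamma * qa[s_next, best] - qb[s, a])

standard_update(0, 1, 1.0, 2)   # one illustrative transition per variant
double_update(0, 1, 1.0, 2)
```
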
8

Borkar, Vivek S., and Siddharth Chandak. "Prospect-theoretic Q-learning." Systems & Control Letters 156 (October 2021): 105009. http://dx.doi.org/10.1016/j.sysconle.2021.105009.

9

Ganger, Michael, and Wei Hu. "Quantum Multiple Q-Learning." International Journal of Intelligence Science 09, no. 01 (2019): 1–22. http://dx.doi.org/10.4236/ijis.2019.91001.

10

John, Indu, Chandramouli Kamanchi, and Shalabh Bhatnagar. "Generalized Speedy Q-Learning." IEEE Control Systems Letters 4, no. 3 (2020): 524–29. http://dx.doi.org/10.1109/lcsys.2020.2970555.

11

Horiuchi, Tadashi, Akinori Fujino, Osamu Katai, and Tetsuo Sawaragi. "Q-PSP Learning: An Exploitation-Oriented Q-Learning Algorithm and Its Applications." Transactions of the Society of Instrument and Control Engineers 35, no. 5 (1999): 645–53. http://dx.doi.org/10.9746/sicetr1965.35.645.

12

Ghazanfari, Behzad, and Nasser Mozayani. "Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks." Journal of Intelligent & Fuzzy Systems 26, no. 6 (2014): 2771–83. http://dx.doi.org/10.3233/ifs-130945.

13

Kim, Min-Soeng, Sun-Gi Hong, and Ju-Jang Lee. "Self-Learning Fuzzy Logic Controller using Q-Learning." Journal of Advanced Computational Intelligence and Intelligent Informatics 4, no. 5 (2000): 349–54. http://dx.doi.org/10.20965/jaciii.2000.p0349.

Abstract:
Fuzzy logic controllers consist of if-then fuzzy rules generally adopted from a priori expert knowledge. However, it is not always easy or cheap to obtain expert knowledge. Q-learning can be used to acquire knowledge from experiences even without the model of the environment. The conventional Q-learning algorithm cannot deal with continuous states and continuous actions. However, the fuzzy logic controller can inherently receive continuous input values and generate continuous output values. Thus, in this paper, the Q-learning algorithm is incorporated into the fuzzy logic controller to compens
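
The abstract notes that conventional Q-learning handles only discrete states and actions, which is what the fuzzy controller compensates for. Below is a minimal sketch of that limitation, assuming an illustrative one-dimensional continuous state binned into a Q-table; the fuzzy inference itself is not shown.

```python
# A Q-table needs discrete indices, so a continuous state must be binned first;
# the bin edges, table shape, and hyperparameters are illustrative assumptions.
import numpy as np

bins = np.linspace(-1.0, 1.0, 11)        # bucket edges for a 1-D continuous state
q_table = np.zeros((len(bins) + 1, 2))   # one row per bucket, 2 discrete actions
alpha, gamma = 0.1, 0.95

def discretize(x):
    """Map a continuous state to a Q-table row index."""
    return int(np.digitize(x, bins))

def update(x, a, r, x_next):
    s, s_next = discretize(x), discretize(x_next)
    q_table[s, a] += alpha * (r + gamma * q_table[s_next].max() - q_table[s, a])

update(0.12, 1, 1.0, 0.18)   # one illustrative transition
```
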
14

Yang, Min-Gyu, Kuk-Hyun Ahn, and Jae-Bok Song. "Tidy-up Task Planner based on Q-learning." Journal of Korea Robotics Society 16, no. 1 (2021): 56–63. http://dx.doi.org/10.7746/jkros.2021.16.1.056.

15

Moodie, Erica E. M., Nema Dean, and Yue Ru Sun. "Q-Learning: Flexible Learning About Useful Utilities." Statistics in Biosciences 6, no. 2 (2013): 223–43. http://dx.doi.org/10.1007/s12561-013-9103-z.

16

Hatcho, Yasuyo, Kiyohiko Hattori, and Keiki Takadama. "Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 6 (2009): 667–74. http://dx.doi.org/10.20965/jaciii.2009.p0667.

Abstract:
This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring the method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) Q-table selection method and (2) Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulation on the bargaining game as sequential interaction gam
17

Clifton, Jesse, and Eric Laber. "Q-Learning: Theory and Applications." Annual Review of Statistics and Its Application 7, no. 1 (2020): 279–301. http://dx.doi.org/10.1146/annurev-statistics-031219-041220.

Abstract:
Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such a
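
The review distinguishes finite-horizon Q-learning, used to estimate treatment regimes, from the incremental infinite-horizon algorithm. The sketch below illustrates the backward-induction idea on two synthetic stages; the linear working models, variable names, and data-generating process are illustrative assumptions, not an example from the cited review.

```python
# Two-stage backward-induction sketch with linear working models and synthetic
# data; all names and the data-generating process here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 500
s1 = rng.normal(size=n)                       # stage-1 state
a1 = rng.integers(0, 2, size=n)               # stage-1 binary action
s2 = s1 + a1 + rng.normal(scale=0.5, size=n)  # stage-2 state
a2 = rng.integers(0, 2, size=n)               # stage-2 binary action
y = s2 + 2.0 * a2 * (s2 > 0) + rng.normal(scale=0.5, size=n)  # final outcome

def fit_q(state, action, target):
    """Least-squares fit of Q(s, a) = b0 + b1*s + b2*a + b3*s*a."""
    X = np.column_stack([np.ones_like(state), state, action, state * action])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

def q_hat(beta, state, action):
    X = np.column_stack([np.ones_like(state), state, action, state * action])
    return X @ beta

beta2 = fit_q(s2, a2, y)                        # stage-2 regression on the outcome
v2 = np.maximum(q_hat(beta2, s2, np.zeros(n)),  # pseudo-outcome: value of acting
                q_hat(beta2, s2, np.ones(n)))   # optimally at stage 2
beta1 = fit_q(s1, a1, v2)                       # stage-1 regression on the pseudo-outcome
print("stage-1 Q coefficients:", np.round(beta1, 2))
```
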
18

He, Ningxia. "Image Sampling Using Q-Learning." International Journal of Computer Science and Engineering 8, no. 1 (2021): 5–12. http://dx.doi.org/10.14445/23488387/ijcse-v8i1p102.

19

Ganapathi Subramanian, Sriram, Matthew E. Taylor, Kate Larson, and Mark Crowley. "Multi-Agent Advisor Q-Learning." Journal of Artificial Intelligence Research 74 (May 5, 2022): 1–74. http://dx.doi.org/10.1613/jair.1.13445.

Abstract:
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled fram
20

Hu, Yuepeng, Lehan Yang, and Yizhu Lou. "Path Planning with Q-Learning." Journal of Physics: Conference Series 1948, no. 1 (2021): 012038. http://dx.doi.org/10.1088/1742-6596/1948/1/012038.

21

Sarigül, Mehmet, and Mutlu Avci. "Q Learning Regression Neural Network." Neural Network World 28, no. 5 (2018): 415–31. http://dx.doi.org/10.14311/nnw.2018.28.023.

22

Kamanchi, Chandramouli, Raghuram Bharadwaj Diddigi, and Shalabh Bhatnagar. "Successive Over-Relaxation Q-Learning." IEEE Control Systems Letters 4, no. 1 (2020): 55–60. http://dx.doi.org/10.1109/lcsys.2019.2921158.

23

Patnaik, Srikanta, and N. P. Mahalik. "Multiagent coordination utilising Q-learning." International Journal of Automation and Control 1, no. 4 (2007): 377. http://dx.doi.org/10.1504/ijaac.2007.015863.

24

Lecué, Guillaume, and Philippe Rigollet. "Optimal learning with Q-aggregation." Annals of Statistics 42, no. 1 (2014): 211–24. http://dx.doi.org/10.1214/13-aos1190.

25

Ahmadabadi, M. N., and M. Asadpour. "Expertness based cooperative Q-learning." IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 32, no. 1 (2002): 66–76. http://dx.doi.org/10.1109/3477.979961.

26

Linn, Kristin A., Eric B. Laber, and Leonard A. Stefanski. "Interactive Q-Learning for Quantiles." Journal of the American Statistical Association 112, no. 518 (2017): 638–49. http://dx.doi.org/10.1080/01621459.2016.1155993.

27

Goldberg, Yair, and Michael R. Kosorok. "Q-learning with censored data." Annals of Statistics 40, no. 1 (2012): 529–60. http://dx.doi.org/10.1214/12-aos968.

28

Peng, Jing, and Ronald J. Williams. "Incremental multi-step Q-learning." Machine Learning 22, no. 1-3 (1996): 283–90. http://dx.doi.org/10.1007/bf00114731.

29

El Wafi, Mouna, My Abdelkader Youssefi, Rachid Dakir, and Mohamed Bakir. "Intelligent Robot in Unknown Environments: Walk Path Using Q-Learning and Deep Q-Learning." Automation 6, no. 1 (2025): 12. https://doi.org/10.3390/automation6010012.

Abstract:
Autonomous navigation is essential for mobile robots to efficiently operate in complex environments. This study investigates Q-learning and Deep Q-learning to improve navigation performance. The research examines their effectiveness in complex maze configurations, focusing on how the epsilon-greedy strategy influences the agent’s ability to reach its goal in minimal time using Q-learning. A distinctive aspect of this work is the adaptive tuning of hyperparameters, where alpha and gamma values are dynamically adjusted throughout training. This eliminates the need for manually fixed parameters a
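
The abstract emphasizes how the epsilon-greedy strategy governs exploration. Here is a minimal sketch of epsilon-greedy selection with a simple decay schedule; the schedule and constants are illustrative assumptions, not the paper's adaptive tuning of alpha and gamma.

```python
# Epsilon-greedy with a simple geometric decay; all constants are assumptions,
# not the adaptive hyperparameter schedule used in the cited paper.
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_row, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

epsilon, eps_min, eps_decay = 1.0, 0.05, 0.995
q_row = np.array([0.1, 0.4, 0.2])          # Q-values of one state (illustrative)
for episode in range(1000):
    action = epsilon_greedy(q_row, epsilon)
    # ... environment step and Q-update would go here ...
    epsilon = max(eps_min, epsilon * eps_decay)   # explore less over time
```
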
30

Mustafa, Hasan Kathim, Azma Zakaria Nurul, Abidin Z.Zainal, Kamil Maseer Ziadoon, and Hasan Alzamili Ali. "Online Sequential Extreme Learning Machine (OSELM) based Q-learning (OSELM-QL)." Seybold Report 16, no. 11 (2021): 1–14. https://doi.org/10.5281/zenodo.6553518.

Abstract:
The usage of reinforcement learning (RL) for many types of applications is increasing. The quick development of machine learning models in recent years has motivated researchers to integrate Q-learning with deep learning, which has opened the door for many vision-based applications of RL. However, using RL with shallow types of neural network has not been tackled adequately in the literature despite its need for real-time applications such as control systems or time-constrained decision-based systems. In this article, we propose a novel online sequential extreme learning machine …
31

Hosoya, Yu, and Motohide Umano. "Improvement of Updating Method of Q Values in Fuzzy Q-Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 27, no. 6 (2015): 942–48. http://dx.doi.org/10.3156/jsoft.27.942.

32

Duryea, Ethan, Michael Ganger, and Wei Hu. "Exploring Deep Reinforcement Learning with Multi Q-Learning." Intelligent Control and Automation 07, no. 04 (2016): 129–44. http://dx.doi.org/10.4236/ica.2016.74012.

33

Hwang, Kao-Shing, Wei-Cheng Jiang, and Yu-Jen Chen. "Adaptive Model Learning Based on Dyna-Q Learning." Cybernetics and Systems 44, no. 8 (2013): 641–62. http://dx.doi.org/10.1080/01969722.2013.803387.

34

Bochok, Viacheslav, and Nataliia Fedorova. "Centralized Learning for the Deep Q-Learning Models." Information Technology and Society, no. 2 (13) (2024): 6–11. http://dx.doi.org/10.32689/maup.it.2024.2.1.

35

da Costa, Luis Antonio L. F., Rafael Kunst, and Edison Pignaton de Freitas. "Q-FANET: Improved Q-learning based routing protocol for FANETs." Computer Networks 198 (October 2021): 108379. http://dx.doi.org/10.1016/j.comnet.2021.108379.

36

Meng, Xiao-Li. "Discussion: The Q-q Dynamic for Deeper Learning and Research." International Statistical Review 84, no. 2 (2015): 181–89. http://dx.doi.org/10.1111/insr.12151.

37

Guo, Yanqin. "Enhancing Flappy Bird Performance With Q-Learning and DQN Strategies." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 396–402. http://dx.doi.org/10.54097/qrded191.

Abstract:
Flappy Bird, a classic single-player game, boasts a deceptively simple premise yet proves to be a formidable challenge in achieving high scores. Various algorithms have been employed to improve its performance, yet a comprehensive assessment of Q-Learning and Deep Q-Network (DQN) in the context of this game remains elusive. This study undertakes the task of training Flappy Bird using both Q-Learning and DQN methodologies, showcasing the potency of reinforcement learning within the realm of gaming. Through meticulous comparisons and analyses, the paper uncovers the inherent strengths and weakne
38

D'Orazio, Tiziana, and Grazia Cicirelli. "Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks." Journal of Experimental & Theoretical Artificial Intelligence 13, no. 3 (2001): 241–70. http://dx.doi.org/10.1080/09528130110063100.

39

Chen, Bo-Wei, Shih-Hung Yang, Yu-Chun Lo, et al. "Enhancement of Hippocampal Spatial Decoding Using a Dynamic Q-Learning Method With a Relative Reward Using Theta Phase Precession." International Journal of Neural Systems 30, no. 09 (2020): 2050048. http://dx.doi.org/10.1142/s0129065720500483.

Abstract:
Hippocampal place cells and interneurons in mammals have stable place fields and theta phase precession profiles that encode spatial environmental information. Hippocampal CA1 neurons can represent the animal’s location and prospective information about the goal location. Reinforcement learning (RL) algorithms such as Q-learning have been used to build the navigation models. However, the traditional Q-learning limits the reward function once the animals arrive at the goal location, leading to unsatisfactory location accuracy and convergence rates. Therefore, we …
40

Liu, Peiyi. "Q-Learning: Applications and Convergence Rate Optimization." Highlights in Science, Engineering and Technology 63 (August 8, 2023): 210–15. http://dx.doi.org/10.54097/hset.v63i.10878.

Abstract:
As an important algorithm of artificial intelligence technology, Q-learning algorithm plays a significant part in a number of fields, such as driverless technology, industrial automation, health care, intelligent search, game, etc. As a classical learning algorithm in reinforcement learning, with the help of an experienced action sequence in a Markov environment, an agent can learn to select the best course of action using the model-free learning technique known as Q-learning. This paper mainly discusses the addition of received signal strength (RSS) to the Q-learning algorithm to navigate unm
41

Raza, Ali, Asfand Ali, Alaptageen Qayyum, Ghulam Shabir, Zahid Hussain, and Ghulam Murtaza. "Hyperparameter Impact on Learning Efficiency in Q-Learning and DQN Using OpenAI Gymnasium Environments." International Journal of Advanced Research 13, no. 05 (2025): 1164–76. https://doi.org/10.21474/ijar01/21007.

Abstract:
This study compares the Q-learning and DQN methodologies within the CartPole-v1 environment. All methods were executed, trained, and evaluated according to test outcomes and training improvements. The research paper includes statistical summaries, and visualizations of performance and learning. This paper examines the impact of hyperparameters on the learning efficiency of Q-Learning and Deep Q-Network (DQN) in the CartPole-v1 environment of OpenAI Gymnasium. The results indicate that DQN substantially outperforms Q-Learning. DQN achieved a peak training reward of 500, whereas Q-Learning maint
42

古, 彭. "Improvement and Implementation of Q-Learning Algorithm." Computer Science and Application 11, no. 07 (2021): 1994–2007. http://dx.doi.org/10.12677/csa.2021.117204.

43

Xu, Shenghua, Yang Gu, Xiaoyan Li, et al. "Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm." ISPRS International Journal of Geo-Information 11, no. 1 (2022): 66. http://dx.doi.org/10.3390/ijgi11010066.

Abstract:
The internal structure of buildings is becoming increasingly complex. Providing a scientific and reasonable evacuation route for trapped persons in a complex indoor environment is important for reducing casualties and property losses. In emergency and disaster relief environments, indoor path planning has great uncertainty and higher safety requirements. Q-learning is a value-based reinforcement learning algorithm that can complete path planning tasks through autonomous learning without establishing mathematical models and environmental maps. Therefore, we propose an indoor emergency path plan
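
The abstract describes Q-learning as a value-based method that plans paths by autonomous learning, without an explicit model or map. A minimal grid-world sketch of that idea follows; the grid size, rewards, and hyperparameters are illustrative assumptions rather than the paper's indoor-evacuation setup.

```python
# Minimal grid-world path planning with tabular Q-learning; grid size, rewards,
# and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
size, goal = 5, (4, 4)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
q = np.zeros((size, size, len(moves)))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    s = (0, 0)
    while s != goal:
        a = int(rng.integers(4)) if rng.random() < epsilon else int(q[s].argmax())
        ns = (min(max(s[0] + moves[a][0], 0), size - 1),
              min(max(s[1] + moves[a][1], 0), size - 1))
        r = 1.0 if ns == goal else -0.01            # small penalty per step
        q[s][a] += alpha * (r + gamma * q[ns].max() - q[s][a])
        s = ns

print("greedy first move from the start cell:", int(q[0, 0].argmax()))
```
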
44

Zhang, Chunyuan, Qi Song, and Zeng Meng. "Minibatch Recursive Least Squares Q-Learning." Computational Intelligence and Neuroscience 2021 (October 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/5370281.

Abstract:
The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximat
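
The abstract contrasts DQN with traditional linear function approximation, which tends to converge faster and more stably. For reference, below is a minimal sketch of classical semi-gradient Q-learning with linear features, assuming an arbitrary feature dimension and step size; it is not the paper's minibatch recursive least squares method.

```python
# Semi-gradient Q-learning with linear features; dimensions and constants are
# illustrative assumptions, not the cited paper's RLS-based variant.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 3
w = np.zeros((n_actions, n_features))   # one linear weight vector per action
alpha, gamma = 0.01, 0.95

def q(phi, a):
    """Linear action-value estimate for feature vector phi and action a."""
    return float(w[a] @ phi)

def semi_gradient_update(phi, a, r, phi_next):
    # TD target bootstraps from the greedy value at the next feature vector.
    target = r + gamma * max(q(phi_next, b) for b in range(n_actions))
    w[a] += alpha * (target - q(phi, a)) * phi

phi, phi_next = rng.normal(size=n_features), rng.normal(size=n_features)
semi_gradient_update(phi, a=1, r=0.5, phi_next=phi_next)
```
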
45

Sato, Takashi, and Fumiko Shirasaki. "A Comparative Study on the Performances of Q-Learning and Neural Q-Learning Agents toward Analysis of Emergence of Communication." Journal of Information and Communication Engineering (JICE) 3, no. 5 (2020): 128–35. https://doi.org/10.5281/zenodo.4309746.

Abstract:
In this paper, we consider the gesture theory, one theory of the origin of language, which tries to establish that speech originated from gestures. Based on the theory, we assume that “actions” having some purposes can be used as “symbols” in communication through a learning process. The purpose of this study is to clarify what abilities of agents and what conditions are necessary to acquire usages of the actions as the symbols. To investigate them, we adopt a collision avoidance game and compare the performances of Q-learning agents with that of Neural Q-learning agents …
46

Sun, Wei-Kai, Xiao-Mei Wang, Bin Wang, Jia-Sen Zhang, and Hai-Yang Du. "MR-SFAMA-Q: A MAC Protocol based on Q-Learning for Underwater Acoustic Sensor Networks." 電腦學刊 35, no. 1 (2024): 51–63. http://dx.doi.org/10.53106/199115992024023501004.

Abstract:
In recent years, with the rapid development of science and technology, many new technologies have made people’s exploration of the ocean deeper and deeper, and due to the requirements of national defense and marine development, the underwater acoustic sensor network (UASN) has received more and more attention. Nevertheless, the underwater acoustic channel has the properties of considerable propagation delay, limited bandwidth, and unstable network topology. In order to improve the performance of the medium access control (MAC) protocol in UASN, we propose a new MAC protocol …
47

Raihen, Md Nurul, and Jason Tran. "Optimizing reinforcement learning in complex environments using neural networks." International Journal of Science and Research Archive 12, no. 2 (2024): 2047–62. http://dx.doi.org/10.30574/ijsra.2024.12.2.1471.

Abstract:
This paper presents the distinct mechanisms and applications of traditional Q-learning (QL) and Deep Q-learning (DQL) within the realm of reinforcement learning (RL). Traditional Q-learning (QL) utilizes the Bellman equation to update Q-values stored in a Q-table, making it suitable for simple environments. However, its scalability is limited due to the exponential growth of state-action pairs in complex environments. Deep Q-learning (DQL) addresses this limitation by using neural networks to approximate Q-values, thus eliminating the need for a Q-table, and enabling efficient handling of comp
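
The abstract attributes the poor scalability of tabular Q-learning to the growth of state-action pairs, which Deep Q-learning avoids by approximating Q-values with a neural network. A quick back-of-the-envelope illustration of that growth, using made-up state variables and counts:

```python
# Hypothetical discretization of a composite state; every extra variable
# multiplies the table size, which is the scalability problem DQL sidesteps
# by approximating Q-values with a network instead of a table.
state_vars = {"x_position": 100, "y_position": 100, "velocity": 50, "heading": 36}
n_actions = 6

n_states = 1
for levels in state_vars.values():
    n_states *= levels

print(f"{n_states:,} states x {n_actions} actions = {n_states * n_actions:,} Q-values")
# -> 18,000,000 states x 6 actions = 108,000,000 Q-values
```
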
48

Hao, Qixuan. "The Achievement of Dynamic Obstacle Avoidance Based on Improved Q-Learning Algorithm." Highlights in Science, Engineering and Technology 63 (August 8, 2023): 252–58. http://dx.doi.org/10.54097/hset.v63i.10883.

Abstract:
Dynamic obstacle avoidance is a classic problem in robot control, which involves the ability of a robot to avoid obstacles in the environment and reach its destination. Among various path planning algorithms, the dynamic obstacle avoidance issue may be resolved using the reinforcement learning algorithm Q-learning. This article provides a comprehensive review of the recent research progress and achievements in the field of dynamic obstacle avoidance, through the analysis and improvement of the Q-learning algorithm. The article begins by introducing the background and research status of dynamic
49

Shin, YongWoo. "Q-learning to improve learning speed using Minimax algorithm." Journal of Korea Game Society 18, no. 4 (2018): 99–106. http://dx.doi.org/10.7583/jkgs.2018.18.4.99.

50

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potential large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We s