To view the other types of publications on this topic, follow this link: Q-learning.

Journal articles on the topic "Q-learning"


Consult the top 50 journal articles for research on the topic "Q-learning".

Next to each work in the bibliography, the option "Add to bibliography" is available. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read its abstract online, whenever the relevant parameters are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8, no. 3-4 (May 1992): 279–92. http://dx.doi.org/10.1007/bf00992698.

2

Clausen, C., and H. Wechsler. "Quad-Q-learning." IEEE Transactions on Neural Networks 11, no. 2 (March 2000): 279–94. http://dx.doi.org/10.1109/72.839000.

3

ten Hagen, Stephan, and Ben Kröse. "Neural Q-learning." Neural Computing & Applications 12, no. 2 (November 1, 2003): 81–88. http://dx.doi.org/10.1007/s00521-003-0369-9.

4

Wang, Yin-Hao, Tzuu-Hseng S. Li, and Chih-Jui Lin. "Backward Q-learning: The combination of Sarsa algorithm and Q-learning." Engineering Applications of Artificial Intelligence 26, no. 9 (October 2013): 2184–93. http://dx.doi.org/10.1016/j.engappai.2013.06.016.

5

Evseenko, Alla, and Dmitrii Romannikov. "Application of Deep Q-learning and double Deep Q-learning algorithms to the task of control an inverted pendulum." Transaction of Scientific Papers of the Novosibirsk State Technical University, no. 1-2 (August 26, 2020): 7–25. http://dx.doi.org/10.17212/2307-6879-2020-1-2-7-25.

Abstract:
Today, such a branch of science as «artificial intelligence» is booming in the world. Systems built on the basis of artificial intelligence methods have the ability to perform functions that are traditionally considered the prerogative of man. Artificial intelligence has a wide range of research areas. One such area is machine learning. This article discusses the algorithms of one of the approaches of machine learning – reinforcement learning (RL), according to which a lot of research and development has been carried out over the past seven years. Development and research on this approach is mainly carried out to solve problems in Atari 2600 games or in other similar ones. In this article, reinforcement training will be applied to one of the dynamic objects – an inverted pendulum. As a model of this object, we consider a model of an inverted pendulum on a cart taken from the Gym library, which contains many models that are used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms from this approach, Deep Q-learning and Double Deep Q-learning. As a result, training, testing and training time graphs for each algorithm are presented, on the basis of which it is concluded that it is desirable to use the Double Deep Q-learning algorithm, because the training time is approximately 2 minutes and provides the best control for the model of an inverted pendulum on a cart.
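The abstract above compares Deep Q-learning with Double Deep Q-learning on the Gym cart-pole (inverted pendulum on a cart) environment. Purely as an illustration of the difference the paper evaluates, the sketch below shows the two target computations for a minibatch of transitions; the array shapes, hyperparameters, and toy numbers are assumptions for this example, not the authors' implementation.

    import numpy as np

    def dqn_target(q_target_next, rewards, dones, gamma=0.99):
        # Standard DQN: the target network both selects and evaluates the next action.
        return rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)

    def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
        # Double DQN: the online network selects the action, the target network evaluates it,
        # which reduces the overestimation that plain DQN suffers from.
        best_actions = q_online_next.argmax(axis=1)
        evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
        return rewards + gamma * (1.0 - dones) * evaluated

    # Toy minibatch: 3 transitions, 2 actions (e.g. push the cart left or right).
    q_online_next = np.array([[1.0, 2.0], [0.5, 0.1], [3.0, 2.5]])
    q_target_next = np.array([[1.5, 1.0], [0.4, 0.2], [2.0, 2.8]])
    rewards = np.array([1.0, 1.0, 0.0])
    dones = np.array([0.0, 0.0, 1.0])
    print(dqn_target(q_target_next, rewards, dones))
    print(double_dqn_target(q_online_next, q_target_next, rewards, dones))
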
6

Abedalguni, Bilal. "Bat Q-learning Algorithm." Jordanian Journal of Computers and Information Technology 3, no. 1 (2017): 51. http://dx.doi.org/10.5455/jjcit.71-1480540385.

7

Zhu, Rong, and Mattia Rigotti. "Self-correcting Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 11185–92. http://dx.doi.org/10.1609/aaai.v35i12.17334.

Abstract:
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
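The self-correcting estimator itself is not spelled out in the abstract, so the sketch below only contrasts the two baselines it interpolates between: the single-estimator (Watkins) Q-learning update, whose max over noisy estimates causes the overestimation discussed above, and the double-estimator Double Q-learning update, which tends to underestimate. State and action counts and the learning rate are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, alpha, gamma = 5, 3, 0.1, 0.95

    # Single estimator: selecting and evaluating with the same table biases values upward.
    Q = np.zeros((n_states, n_actions))
    def q_update(s, a, r, s2):
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

    # Double estimator: one table selects the action, the other evaluates it.
    QA = np.zeros((n_states, n_actions))
    QB = np.zeros((n_states, n_actions))
    def double_q_update(s, a, r, s2):
        if rng.random() < 0.5:
            QA[s, a] += alpha * (r + gamma * QB[s2, QA[s2].argmax()] - QA[s, a])
        else:
            QB[s, a] += alpha * (r + gamma * QA[s2, QB[s2].argmax()] - QB[s, a])

    q_update(0, 1, 1.0, 2)
    double_q_update(0, 1, 1.0, 2)
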
8

Borkar, Vivek S., and Siddharth Chandak. "Prospect-theoretic Q-learning." Systems & Control Letters 156 (October 2021): 105009. http://dx.doi.org/10.1016/j.sysconle.2021.105009.

9

Ganger, Michael, and Wei Hu. "Quantum Multiple Q-Learning." International Journal of Intelligence Science 09, no. 01 (2019): 1–22. http://dx.doi.org/10.4236/ijis.2019.91001.

10

John, Indu, Chandramouli Kamanchi, and Shalabh Bhatnagar. "Generalized Speedy Q-Learning." IEEE Control Systems Letters 4, no. 3 (July 2020): 524–29. http://dx.doi.org/10.1109/lcsys.2020.2970555.

11

HORIUCHI, Tadashi, Akinori FUJINO, Osamu KATAI, and Tetsuo SAWARAGI. "Q-PSP Learning: An Exploitation-Oriented Q-Learning Algorithm and Its Applications." Transactions of the Society of Instrument and Control Engineers 35, no. 5 (1999): 645–53. http://dx.doi.org/10.9746/sicetr1965.35.645.

12

Ghazanfari, Behzad, and Nasser Mozayani. "Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks." Journal of Intelligent & Fuzzy Systems 26, no. 6 (2014): 2771–83. http://dx.doi.org/10.3233/ifs-130945.

13

Yang, Min-Gyu, Kuk-Hyun Ahn, and Jae-Bok Song. "Tidy-up Task Planner based on Q-learning." Journal of Korea Robotics Society 16, no. 1 (February 1, 2021): 56–63. http://dx.doi.org/10.7746/jkros.2021.16.1.056.

14

Kim, Min-Soeng, Sun-Gi Hong, and Ju-Jang Lee. "Self-Learning Fuzzy Logic Controller using Q-Learning." Journal of Advanced Computational Intelligence and Intelligent Informatics 4, no. 5 (September 20, 2000): 349–54. http://dx.doi.org/10.20965/jaciii.2000.p0349.

Abstract:
Fuzzy logic controllers consist of if-then fuzzy rules generally adopted from a priori expert knowledge. However, it is not always easy or cheap to obtain expert knowledge. Q-learning can be used to acquire knowledge from experiences even without the model of the environment. The conventional Q-learning algorithm cannot deal with continuous states and continuous actions. However, the fuzzy logic controller can inherently receive continuous input values and generate continuous output values. Thus, in this paper, the Q-learning algorithm is incorporated into the fuzzy logic controller to compensate for each method’s disadvantages. Modified fuzzy rules are proposed in order to incorporate the Q-learning algorithm into the fuzzy logic controller. This combination results in the fuzzy logic controller that can learn through experience. Since Q-values in Q-learning are functional values of the state and the action, we cannot directly apply the conventional Q-learning algorithm to the proposed fuzzy logic controller. Interpolation is used in each modified fuzzy rule so that the Q-value is updatable.
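The paper's modified fuzzy rules are not reproduced here; the sketch below only illustrates the generic fuzzy Q-learning idea the abstract describes, in which each rule keeps a q-value per candidate action and the continuous action and Q-value are interpolated from the rule firing strengths. The triangular memberships, rule centres, and update rule are assumptions for this example.

    import numpy as np

    def memberships(x, centers, width=1.0):
        # Triangular membership degrees for a 1-D input, normalised to sum to 1.
        mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
        return mu / (mu.sum() + 1e-12)

    centers = np.linspace(-2.0, 2.0, 5)          # one fuzzy rule per centre
    actions = np.array([-1.0, 0.0, 1.0])         # discrete candidate actions per rule
    q = np.zeros((len(centers), len(actions)))   # one q-value per (rule, action) pair
    alpha, gamma = 0.1, 0.9

    def act_and_value(x, eps=0.1):
        phi = memberships(x, centers)
        chosen = np.array([np.random.randint(len(actions)) if np.random.rand() < eps
                           else int(q[i].argmax()) for i in range(len(centers))])
        u = float(phi @ actions[chosen])                        # interpolated continuous action
        Q_sa = float(phi @ q[np.arange(len(centers)), chosen])  # interpolated Q-value
        return u, Q_sa, phi, chosen

    def update(phi, chosen, reward, x_next, Q_old):
        V_next = float(memberships(x_next, centers) @ q.max(axis=1))
        delta = reward + gamma * V_next - Q_old
        q[np.arange(len(centers)), chosen] += alpha * delta * phi

    u, Q_old, phi, chosen = act_and_value(0.3)
    update(phi, chosen, reward=1.0, x_next=0.2, Q_old=Q_old)
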
15

Moodie, Erica E. M., Nema Dean, and Yue Ru Sun. "Q-Learning: Flexible Learning About Useful Utilities." Statistics in Biosciences 6, no. 2 (September 12, 2013): 223–43. http://dx.doi.org/10.1007/s12561-013-9103-z.

16

Hatcho, Yasuyo, Kiyohiko Hattori, and Keiki Takadama. "Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 6 (November 20, 2009): 667–74. http://dx.doi.org/10.20965/jaciii.2009.p0667.

Abstract:
This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring the method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) Q-table selection method and (2) Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulation on the bargaining game as sequential interaction game have revealed the following implications: (1) both Q-table selection and merging timing methods help replicate the subject experimental results without ad-hoc parameter setting; and (2) such replication succeeds by agents using the proposed methods with smaller numbers of Q-tables.
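The abstract leaves the concrete merge operator for the selected Q-tables open; one simple choice, shown below purely as an assumed illustration, is an element-wise weighted average of same-shaped tables.

    import numpy as np

    def merge_q_tables(tables, weights=None):
        # Element-wise weighted average of same-shaped Q-tables (one possible merge operator;
        # the paper's selection and timing criteria are not reproduced here).
        tables = np.stack(tables)                 # shape (k, n_states, n_actions)
        if weights is None:
            weights = np.full(len(tables), 1.0 / len(tables))
        weights = np.asarray(weights, dtype=float)
        return (weights[:, None, None] * tables).sum(axis=0)

    merged = merge_q_tables([np.random.rand(4, 2), np.random.rand(4, 2)], weights=[0.7, 0.3])
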
17

Clifton, Jesse, and Eric Laber. "Q-Learning: Theory and Applications." Annual Review of Statistics and Its Application 7, no. 1 (March 9, 2020): 279–301. http://dx.doi.org/10.1146/annurev-statistics-031219-041220.

Abstract:
Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such as game-playing and robotics. In this article, we (a) review the history of Q-learning in computer science and statistics, (b) formalize finite-horizon Q-learning within the potential outcomes framework and discuss the inferential difficulties for which it is infamous, and (c) review variants of infinite-horizon Q-learning and the exploration-exploitation problem, which arises in decision problems with a long time horizon. We close by discussing issues arising with the use of Q-learning in practice, including arguments for combining Q-learning with direct-search methods; sample size considerations for sequential, multiple assignment randomized trials; and possibilities for combining Q-learning with model-based methods.
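Since the review singles out finite-horizon Q-learning as the workhorse for estimating treatment regimes, the sketch below shows the textbook two-stage, regression-based form of that procedure on simulated data. The data-generating model, the linear working models, and the OLS fits are illustrative assumptions, not anything taken from the review itself.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500

    # Simulated two-stage trial: binary treatments A1, A2 and a final outcome Y.
    X1 = rng.normal(size=n)                     # baseline covariate
    A1 = rng.integers(0, 2, size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)          # intermediate covariate
    A2 = rng.integers(0, 2, size=n)
    Y = X1 + X2 + A1 * (1.0 - X1) + A2 * (X2 - 0.5) + rng.normal(size=n)

    def fit_ols(features, y):
        beta, *_ = np.linalg.lstsq(features, y, rcond=None)
        return beta

    def stage2_features(x1, a1, x2, a2):
        return np.column_stack([np.ones_like(x1), x1, a1, x2, a2, a2 * x2])

    # Stage 2: regress Y on the history and A2, then maximise over a2 (pseudo-outcome).
    beta2 = fit_ols(stage2_features(X1, A1, X2, A2), Y)
    q2 = lambda a2: stage2_features(X1, A1, X2, np.full(n, a2)) @ beta2
    pseudo = np.maximum(q2(0), q2(1))

    # Stage 1: regress the pseudo-outcome on the baseline history and A1.
    def stage1_features(x1, a1):
        return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1])
    beta1 = fit_ols(stage1_features(X1, A1), pseudo)
    q1 = lambda a1: stage1_features(X1, np.full(n, a1)) @ beta1
    optimal_a1 = (q1(1) > q1(0)).astype(int)    # estimated stage-1 treatment rule
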
18

He, Ningxia. "Image Sampling Using Q-Learning." International Journal of Computer Science and Engineering 8, no. 1 (January 25, 2021): 5–12. http://dx.doi.org/10.14445/23488387/ijcse-v8i1p102.

19

Ganapathi Subramanian, Sriram, Matthew E. Taylor, Kate Larson, and Mark Crowley. "Multi-Agent Advisor Q-Learning." Journal of Artificial Intelligence Research 74 (May 5, 2022): 1–74. http://dx.doi.org/10.1613/jair.1.13445.

Abstract:
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.
20

Hu, Yuepeng, Lehan Yang, and Yizhu Lou. "Path Planning with Q-Learning." Journal of Physics: Conference Series 1948, no. 1 (June 1, 2021): 012038. http://dx.doi.org/10.1088/1742-6596/1948/1/012038.

21

Sarigül, Mehmet, and Mutlu Avci. "Q LEARNING REGRESSION NEURAL NETWORK." Neural Network World 28, no. 5 (2018): 415–31. http://dx.doi.org/10.14311/nnw.2018.28.023.

22

Kamanchi, Chandramouli, Raghuram Bharadwaj Diddigi, and Shalabh Bhatnagar. "Successive Over-Relaxation Q-Learning." IEEE Control Systems Letters 4, no. 1 (January 2020): 55–60. http://dx.doi.org/10.1109/lcsys.2019.2921158.

23

Patnaik, Srikanta, and N. P. Mahalik. "Multiagent coordination utilising Q-learning." International Journal of Automation and Control 1, no. 4 (2007): 377. http://dx.doi.org/10.1504/ijaac.2007.015863.

24

Lecué, Guillaume, and Philippe Rigollet. "Optimal learning with Q-aggregation." Annals of Statistics 42, no. 1 (February 2014): 211–24. http://dx.doi.org/10.1214/13-aos1190.

25

Ahmadabadi, M. N., and M. Asadpour. "Expertness based cooperative Q-learning." IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 32, no. 1 (2002): 66–76. http://dx.doi.org/10.1109/3477.979961.

26

Linn, Kristin A., Eric B. Laber, and Leonard A. Stefanski. "Interactive Q-Learning for Quantiles." Journal of the American Statistical Association 112, no. 518 (March 31, 2017): 638–49. http://dx.doi.org/10.1080/01621459.2016.1155993.

27

Goldberg, Yair, and Michael R. Kosorok. "Q-learning with censored data." Annals of Statistics 40, no. 1 (February 2012): 529–60. http://dx.doi.org/10.1214/12-aos968.

28

Peng, Jing, and Ronald J. Williams. "Incremental multi-step Q-learning." Machine Learning 22, no. 1-3 (1996): 283–90. http://dx.doi.org/10.1007/bf00114731.

29

HOSOYA, Yu, and Motohide UMANO. "Improvement of Updating Method of Q Values in Fuzzy Q-Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 27, no. 6 (2015): 942–48. http://dx.doi.org/10.3156/jsoft.27.942.

30

Duryea, Ethan, Michael Ganger, and Wei Hu. "Exploring Deep Reinforcement Learning with Multi Q-Learning." Intelligent Control and Automation 07, no. 04 (2016): 129–44. http://dx.doi.org/10.4236/ica.2016.74012.

31

Hwang, Kao-Shing, Wei-Cheng Jiang, and Yu-Jen Chen. "ADAPTIVE MODEL LEARNING BASED ON DYNA-Q LEARNING." Cybernetics and Systems 44, no. 8 (November 17, 2013): 641–62. http://dx.doi.org/10.1080/01969722.2013.803387.

32

da Costa, Luis Antonio L. F., Rafael Kunst, and Edison Pignaton de Freitas. "Q-FANET: Improved Q-learning based routing protocol for FANETs." Computer Networks 198 (October 2021): 108379. http://dx.doi.org/10.1016/j.comnet.2021.108379.

33

Meng, Xiao-Li. "Discussion: The Q-q Dynamic for Deeper Learning and Research." International Statistical Review 84, no. 2 (December 16, 2015): 181–89. http://dx.doi.org/10.1111/insr.12151.

34

Guo, Yanqin. "Enhancing Flappy Bird Performance With Q-Learning and DQN Strategies." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 396–402. http://dx.doi.org/10.54097/qrded191.

Abstract:
Flappy Bird, a classic single-player game, boasts a deceptively simple premise yet proves to be a formidable challenge in achieving high scores. Various algorithms have been employed to improve its performance, yet a comprehensive assessment of Q-Learning and Deep Q-Network (DQN) in the context of this game remains elusive. This study undertakes the task of training Flappy Bird using both Q-Learning and DQN methodologies, showcasing the potency of reinforcement learning within the realm of gaming. Through meticulous comparisons and analyses, the paper uncovers the inherent strengths and weaknesses embedded within these algorithms. This exploration not only fosters a nuanced grasp of Q-Learning and DQN but does so by leveraging a simplistic gaming environment as the proving ground. Strikingly, the experimental results unveil an initial disadvantage for DQN during training, followed by a rapid surge in performance surpassing Q-Learning in mid-training. Conversely, Q-Learning demonstrates an aptitude for swiftly reaching its performance zenith. Both algorithms tout distinct merits: Q-Learning's adeptness in simpler tasks and DQN's reliability in tackling complex states. In conclusion, this study not only discerns algorithmic prowess but lays a foundational framework for broader application across diverse gaming scenarios. By delving into the nuances of Q-Learning and DQN, the paper establishes a clearer path for harnessing the advantages in shaping the future landscape of game optimization.
35

D'Orazio, Tiziana, and Grazia Cicirelli. "Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks." Journal of Experimental & Theoretical Artificial Intelligence 13, no. 3 (July 2001): 241–70. http://dx.doi.org/10.1080/09528130110063100.

36

古, 彭. "Improvement and Implementation of Q-Learning Algorithm." Computer Science and Application 11, no. 07 (2021): 1994–2007. http://dx.doi.org/10.12677/csa.2021.117204.

37

Sun, Wei-Kai, Xiao-Mei Wang, Bin Wang, Jia-Sen Zhang, and Hai-Yang Du. "MR-SFAMA-Q: A MAC Protocol based on Q-Learning for Underwater Acoustic Sensor Networks." 電腦學刊 35, no. 1 (February 2024): 051–63. http://dx.doi.org/10.53106/199115992024023501004.

Abstract:
In recent years, with the rapid development of science and technology, many new technologies have made people's exploration of the ocean deeper and deeper, and due to the requirements of national defense and marine development, the underwater acoustic sensor network (UASN) has been paid more and more attention. Nevertheless, the underwater acoustic channel has the properties of considerable propagation delay, limited bandwidth, and unstable network topology. In order to improve the performance of the medium access control (MAC) protocol in UASN, we propose a new MAC protocol based on the Slotted-FAMA of Multiple Reception (MR-SFAMA) protocol. The protocol uses the Q-Learning algorithm to optimize the multi-receiver handshake mechanism. The current state is judged according to the received node request, and the Q-table is established. Through the multi-round interaction between the node and the environment, the Q-table is continuously updated to obtain the optimal strategy and determine the optimal data transmission scheduling scheme. The reward function is set according to the total back-off time and frame error rate, which can reduce the packet loss rate during network data transmission while reducing the delay. In addition, the matching asynchronous operation and uniform random back-off algorithm are used to solve the problem of long channel idle time and low channel utilization. This new protocol can be well applied to unstable network topology. The simulation results show that the protocol performs better than Slotted-FAMA and MR-SFAMA regarding delay and normalized throughput.
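As described above, the protocol's reward depends on the total back-off time and the frame error rate. The snippet below is only a schematic of that idea: a tabular Q-update whose reward penalises both quantities. The weights, the discretised states, and the scheduling actions are invented for illustration; the actual MR-SFAMA-Q state space and handshake scheduling are not reproduced.

    import numpy as np

    n_states, n_actions = 16, 4            # assumed discretised network states / schedules
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9
    w_backoff, w_fer = 0.01, 1.0           # illustrative weights only

    def reward(total_backoff_ms, frame_error_rate):
        # Longer back-off and a higher frame error rate both lower the reward.
        return -(w_backoff * total_backoff_ms + w_fer * frame_error_rate)

    def update(state, action, total_backoff_ms, frame_error_rate, next_state):
        r = reward(total_backoff_ms, frame_error_rate)
        Q[state, action] += alpha * (r + gamma * Q[next_state].max() - Q[state, action])

    update(state=3, action=1, total_backoff_ms=120.0, frame_error_rate=0.05, next_state=7)
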
38

Liu, Peiyi. "Q-Learning: Applications and Convergence Rate Optimization." Highlights in Science, Engineering and Technology 63 (August 8, 2023): 210–15. http://dx.doi.org/10.54097/hset.v63i.10878.

Abstract:
As an important artificial intelligence technique, the Q-learning algorithm plays a significant part in a number of fields, such as driverless technology, industrial automation, health care, intelligent search, and games. As a classical reinforcement learning algorithm, Q-learning is a model-free technique with which an agent can learn to select the best course of action from experienced action sequences in a Markov environment. This paper discusses adding received signal strength (RSS) to the Q-learning algorithm to navigate an unmanned aerial vehicle (UAV); summarizes the main content and results of a neural Q-learning algorithm that helps a UAV avoid obstacles, for which an adaptive and random exploration (ARE) method is proposed to address UAV route-planning tasks; summarizes route planning for a mobile robot that uses obstacle characteristics as Q-learning states and actions, in which the Q-learning algorithm employs a novel exploration technique combining ε-greedy exploration with Boltzmann exploration to help the mobile robot plan its path; and analyzes the convergence speed of a staged Q-learning path-planning algorithm against the traditional Q-learning path-planning algorithm. When there are many states and actions, the efficiency of the Q-learning algorithm is greatly reduced, so it is necessary to study how to reduce its running time and increase its convergence speed.
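The exact way the cited work combines ε-greedy exploration with Boltzmann exploration is not given in the abstract, so the sketch below shows one plausible variant, stated only as an assumption: with probability ε the action is drawn from a Boltzmann (softmax) distribution over the Q-values, otherwise the greedy action is taken.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x, temperature=1.0):
        z = (x - x.max()) / temperature     # shift for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def select_action(q_values, eps=0.2, temperature=0.5):
        # With probability eps explore via a Boltzmann draw, otherwise act greedily.
        if rng.random() < eps:
            return int(rng.choice(len(q_values), p=softmax(q_values, temperature)))
        return int(np.argmax(q_values))

    action = select_action(np.array([0.1, 0.5, 0.3]))
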
39

Chen, Bo-Wei, Shih-Hung Yang, Yu-Chun Lo, Ching-Fu Wang, Han-Lin Wang, Chen-Yang Hsu, Yun-Ting Kuo et al. "Enhancement of Hippocampal Spatial Decoding Using a Dynamic Q-Learning Method With a Relative Reward Using Theta Phase Precession." International Journal of Neural Systems 30, no. 09 (August 12, 2020): 2050048. http://dx.doi.org/10.1142/s0129065720500483.

Abstract:
Hippocampal place cells and interneurons in mammals have stable place fields and theta phase precession profiles that encode spatial environmental information. Hippocampal CA1 neurons can represent the animal's location and prospective information about the goal location. Reinforcement learning (RL) algorithms such as Q-learning have been used to build the navigation models. However, traditional Q-learning limits the reward function once the animals arrive at the goal location, leading to unsatisfactory location accuracy and convergence rates. Therefore, we proposed a revised version of the Q-learning algorithm, dynamical Q-learning, which assigns the reward function adaptively to improve the decoding performance. Firing rate was the input of the neural network of dynamical Q-learning and was used to predict the movement direction. On the other hand, phase precession was the input of the reward function to update the weights of dynamical Q-learning. Trajectory predictions using dynamical and traditional Q-learning were compared by the root mean squared error (RMSE) between the actual and predicted rat trajectories. Using dynamical Q-learning, significantly higher prediction accuracy and faster convergence rate were obtained compared with traditional Q-learning in all cell types. Moreover, combining place cells and interneurons with theta phase precession improved the convergence rate and prediction accuracy. The proposed dynamical Q-learning algorithm is a quick and more accurate method to perform trajectory reconstruction and prediction.
40

Zhang, Chunyuan, Qi Song, and Zeng Meng. "Minibatch Recursive Least Squares Q-Learning." Computational Intelligence and Neuroscience 2021 (October 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/5370281.

Abstract:
The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.
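MRLS-Q's recursive least squares optimiser is not detailed in the abstract and is not reproduced below; the sketch only illustrates the surrounding structure the abstract describes, i.e. a linear Q-value approximation over state features trained with experience replay and minibatches (here with plain semi-gradient steps, and with all sizes and features assumed).

    import random
    from collections import deque

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions = 4, 2
    W = np.zeros((n_actions, n_features))     # linear approximation: Q(s, a) = W[a] @ features(s)
    replay = deque(maxlen=10_000)
    alpha, gamma, batch_size = 0.01, 0.99, 32

    def q_values(features):
        return W @ features

    def store(features, action, reward, next_features, done):
        replay.append((features, action, reward, next_features, done))

    def train_step():
        if len(replay) < batch_size:
            return
        for f, a, r, f2, done in random.sample(list(replay), batch_size):
            target = r + (0.0 if done else gamma * q_values(f2).max())
            td_error = target - q_values(f)[a]
            W[a] += alpha * td_error * f      # semi-gradient update of the chosen action's row

    # Toy usage with random transitions standing in for an environment.
    for _ in range(200):
        f, f2 = rng.normal(size=n_features), rng.normal(size=n_features)
        store(f, int(rng.integers(n_actions)), float(rng.normal()), f2, bool(rng.random() < 0.05))
        train_step()
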
41

Shin, YongWoo. "Q-learning to improve learning speed using Minimax algorithm." Journal of Korea Game Society 18, no. 4 (August 31, 2018): 99–106. http://dx.doi.org/10.7583/jkgs.2018.18.4.99.

42

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potential large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
43

Charypar, David, and Kai Nagel. "Q-Learning for Flexible Learning of Daily Activity Plans." Transportation Research Record: Journal of the Transportation Research Board 1935, no. 1 (January 2005): 163–69. http://dx.doi.org/10.1177/0361198105193500119.

Abstract:
Q-learning is a method from artificial intelligence to solve the reinforcement learning problem (RLP), defined as follows. An agent is faced with a set of states, S. For each state s there is a set of actions, A(s), that the agent can take and that takes the agent (deterministically or stochastically) to another state. For each state the agent receives a (possibly stochastic) reward. The task is to select actions such that the reward is maximized. Activity generation is for demand generation in the context of transportation simulation. For each member of a synthetic population, a daily activity plan stating a sequence of activities (e.g., home-work-shop-home), including locations and times, needs to be found. Activities at different locations generate demand for transportation. Activity generation can be modeled as an RLP with the states given by the triple (type of activity, starting time of activity, time already spent at activity). The possible actions are either to stay at a given activity or to move to another activity. Rewards are given as “utility per time slice,” which corresponds to a coarse version of marginal utility. Q-learning has the property that, by repeating similar experiences over and over again, the agent looks forward in time; that is, the agent can also go on paths through state space in which high rewards are given only at the end. This paper presents computational results with such an algorithm for daily activity planning.
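Because the abstract spells out the states, actions, and rewards of this RLP quite concretely, a stripped-down tabular version is sketched below: states are (current activity, time slice) pairs, the action chooses the activity for the next slice (staying or switching), and the reward is an assumed utility per time slice. The utility functions, time resolution, and simplified state (without the time-already-spent component) are illustrative assumptions, not the paper's specification.

    import itertools

    import numpy as np

    rng = np.random.default_rng(0)
    activities = ["home", "work", "shop"]
    n_slices = 24                              # one-hour time slices over a day
    # Assumed utility per time slice for occupying each activity at hour t.
    utility = {"home": lambda t: 0.5 if t < 7 or t >= 20 else 0.1,
               "work": lambda t: 1.0 if 8 <= t < 17 else -0.5,
               "shop": lambda t: 0.6 if 17 <= t < 19 else -0.2}

    states = list(itertools.product(range(len(activities)), range(n_slices)))
    index = {s: i for i, s in enumerate(states)}
    Q = np.zeros((len(states), len(activities)))
    alpha, gamma, eps = 0.1, 0.99, 0.1

    for _ in range(2000):                      # repeated simulated days
        act, t = 0, 0                          # start at home in the first slice
        while t < n_slices - 1:
            s = index[(act, t)]
            a = int(rng.integers(len(activities))) if rng.random() < eps else int(Q[s].argmax())
            r = utility[activities[a]](t)      # utility of occupying the chosen activity now
            s2 = index[(a, t + 1)]
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            act, t = a, t + 1

    # Greedy read-out of the learned daily plan.
    plan, act = [], 0
    for t in range(n_slices - 1):
        act = int(Q[index[(act, t)]].argmax())
        plan.append(activities[act])
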
44

Tan, Chunxi, Ruijian Han, Rougang Ye, and Kani Chen. "Adaptive Learning Recommendation Strategy Based on Deep Q-learning." Applied Psychological Measurement 44, no. 4 (July 25, 2019): 251–66. http://dx.doi.org/10.1177/0146621619858674.

Abstract:
Personalized recommendation system has been widely adopted in E-learning field that is adaptive to each learner’s own learning pace. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and then, the well-designed recommendation strategy selects a sequence of actions to meet the objective of maximizing learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by the deep Q-learning algorithms, which are the techniques that contributed to the success of AlphaGo Zero to achieve the super-human level in playing the game of go. The proposed algorithm incorporates an early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners with efficient learning paths that vary from person to person. The authors showcase concrete examples with numeric analysis of substantive learning scenarios to further demonstrate the power of the proposed method.
45

Gokul, Vignesh, Parinitha Kannan, Sharath Kumar, and Shomona Gracia. "Deep Q-Learning for Home Automation." International Journal of Computer Applications 152, no. 6 (October 17, 2016): 1–5. http://dx.doi.org/10.5120/ijca2016911873.

46

NOTSU, Akira, and Katsuhiro HONDA. "Discounted UCB1-tuned for Q-Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 26, no. 6 (2014): 913–23. http://dx.doi.org/10.3156/jsoft.26.913.

47

Hu, Wei, and James Hu. "Q Learning with Quantum Neural Networks." Natural Science 11, no. 01 (2019): 31–39. http://dx.doi.org/10.4236/ns.2019.111005.

48

Zheng, Zhang, Ji-Hoon Seung, Tae-Yeong Kim, and Kil-To Chong. "Traffic Control using Q-Learning Algorithm." Journal of the Korea Academia-Industrial cooperation Society 12, no. 11 (November 30, 2011): 5135–42. http://dx.doi.org/10.5762/kais.2011.12.11.5135.

49

Liu, Jingchen, Gongjun Xu, and Zhiliang Ying. "Theory of self-learning Q-matrix." Bernoulli 19, no. 5A (November 2013): 1790–817. http://dx.doi.org/10.3150/12-bej430.

50

Ma, Yu chien (Calvin), Zoe Wang, and Alexander Fleiss. "Deep Q-Learning for Trading Cryptocurrency." Journal of Financial Data Science 3, no. 3 (June 8, 2021): 121–27. http://dx.doi.org/10.3905/jfds.2021.1.064.
